Gowalla Dataset: Understanding Location Check-ins, Social Ties, and Mobility Patterns

The Gowalla dataset, a historical benchmark from the now-defunct location-based social network, offers rich check-in and social graph data that has powered foundational research in Point-of-Interest (POI) recommendations, human mobility modeling, and social influence on real-world behavior. Despite its age, Gowalla remains valuable for studying how time, geography, and social context shape user activity. This post explores its structure, use cases, limitations, and how to leverage it with Shaped to build context-aware recommendation models.

Before the current era of ubiquitous location sharing, platforms like Gowalla pioneered the concept of the Location-Based Social Network (LBSN). While the Gowalla service itself is no longer active (it shut down in 2012), the Gowalla dataset, primarily curated and distributed by the Stanford Network Analysis Platform (SNAP), remains a significant historical benchmark for researchers studying user mobility, Point-of-Interest (POI) recommendations, and the interplay between social connections and real-world location visits.

Understanding this dataset is valuable for anyone interested in the foundations of LBSN analysis, context-aware recommendations, and modeling human mobility patterns.

What is the Gowalla Dataset?

The Gowalla dataset typically consists of check-in data collected from the Gowalla LBSN platform before its shutdown. Users on Gowalla could "check in" to specific physical locations (Points of Interest - POIs), sharing their whereabouts with friends. The dataset captures these interactions and the underlying social structure.

Key components usually include:

  1. Check-in Data: Records of users checking into specific locations at particular times.
    • user_id: Identifier for the user performing the check-in.
    • check_in_time: Timestamp of the check-in event.
    • latitude: Latitude coordinate of the check-in location.
    • longitude: Longitude coordinate of the check-in location.
    • location_id (or spot_id): Identifier for the specific POI being checked into.
  2. Social Graph Data: Information about the friendship links between users on the platform.
    • Pairs of user_ids representing a mutual friendship.
  3. Location/POI Metadata (Limited): The SNAP release ships no separate POI table. What is known about a location is its ID and the coordinates recorded with each check-in, so attributes such as category or name are generally unavailable unless taken from another source.
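
The components above can be sketched as plain Python records. This is a minimal sketch, assuming the tab-separated layout described in this post (five columns for check-ins, two for friendship edges). The names CheckIn, parse_checkins, and parse_edges are hypothetical helpers, and the sample rows are made up for illustration rather than taken from the dataset.

```python
import csv
import io
from datetime import datetime, timezone
from typing import NamedTuple


class CheckIn(NamedTuple):
    user_id: int
    check_in_time: datetime
    latitude: float
    longitude: float
    location_id: int


def parse_checkins(lines):
    """Parse tab-separated rows: user, time, latitude, longitude, location."""
    rows = []
    for user, ts, lat, lon, loc in csv.reader(lines, delimiter="\t"):
        # The trailing "Z" marks UTC, so attach the UTC timezone explicitly.
        when = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
        rows.append(CheckIn(int(user), when, float(lat), float(lon), int(loc)))
    return rows


def parse_edges(lines):
    """Parse tab-separated user pairs into a set of unordered friendships."""
    return {frozenset((int(a), int(b))) for a, b in csv.reader(lines, delimiter="\t")}


# Made-up rows for illustration only.
sample_checkins = io.StringIO(
    "1\t2010-05-01T12:00:00Z\t40.0\t-74.0\t100\n"
    "1\t2010-05-01T15:00:00Z\t40.5\t-74.5\t101\n"
    "2\t2010-05-02T09:30:00Z\t40.0\t-74.0\t100\n"
)
sample_edges = io.StringIO("1\t2\n2\t1\n")

checkins = parse_checkins(sample_checkins)
friendships = parse_edges(sample_edges)
print(len(checkins), "check-ins,", len(friendships), "friendship(s)")
```

Storing each friendship as an unordered pair means a link listed in both directions is counted once.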

Key Characteristics

The Gowalla dataset is defined by:

  • Domain: Location-Based Social Networking (LBSN).
  • Primary Signal: Implicit Feedback via user check-ins. A check-in implies user presence and potential interest in a location.
  • Geo-Spatial Focus: Latitude and longitude coordinates are central, enabling analysis of spatial patterns and location-based recommendations.
  • Social Dimension: The inclusion of the friendship graph allows for studying social influence on location choices and mobility.
  • Temporal Aspect: Timestamps on check-ins permit the study of sequential patterns, daily/weekly rhythms, and user mobility over time.
  • Historical Snapshot: The SNAP release covers check-ins collected between February 2009 and October 2010, well before Gowalla's shutdown in 2012. It does not reflect current user behavior or locations.
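
To make the geo-spatial and temporal characteristics concrete, here is a small sketch that derives context features for a check-in from the one before it. The function names (haversine_km, context_features) and the feature names are illustrative assumptions, not part of the dataset or of any library. Because the timestamps are in UTC, the hour and weekday are UTC values; converting to local time would need a time zone lookup that is not shown here.

```python
import math
from datetime import datetime, timezone

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def context_features(prev, curr):
    """prev and curr are (utc_datetime, lat, lon) for consecutive check-ins by one user."""
    (t0, lat0, lon0), (t1, lat1, lon1) = prev, curr
    return {
        "hour_of_day": t1.hour,        # UTC hour, 0-23
        "day_of_week": t1.weekday(),   # Monday = 0
        "hours_since_prev": (t1 - t0).total_seconds() / 3600,
        "km_from_prev": haversine_km(lat0, lon0, lat1, lon1),
    }


# Made-up pair of check-ins one degree of longitude apart on the equator.
prev = (datetime(2010, 5, 1, 12, 0, tzinfo=timezone.utc), 0.0, 0.0)
curr = (datetime(2010, 5, 1, 15, 0, tzinfo=timezone.utc), 0.0, 1.0)
features = context_features(prev, curr)
print(features)
```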

Why is the Gowalla Dataset Important (Historical Significance)?

Despite its age, the Gowalla dataset remains influential:

  1. Pioneering LBSN Benchmark: It was one of the first large-scale, publicly available datasets capturing real-world LBSN activity, establishing a benchmark for early research in this area.
  2. Foundation for POI Recommendation: Provided crucial data for developing and evaluating algorithms specifically designed to recommend Points of Interest, considering factors like location proximity, user history, time, and social ties.
  3. Human Mobility Pattern Analysis: Enabled numerous studies on understanding how people move within cities, popular routes, home/work detection, and the predictability of movement.
  4. Social Influence on Location: Allowed researchers to quantify how friends' check-ins and proximity influence a user's own location choices.
  5. Context-Aware Recommendation Research: Served as a testbed for incorporating context (time, location, social connections) into recommendation models.
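
One simple way to look at the social signal is to compare how much the visited locations of friends overlap against the overlap for pairs who are not friends. The sketch below uses Jaccard similarity for that comparison; it is one possible measure chosen for illustration, not the method of any particular paper, and the helper names and toy data are hypothetical. A higher overlap among friends shows correlation only, since friends may simply live near each other.

```python
from collections import defaultdict


def visited_sets(checkins):
    """Map each user to the set of locations they checked into."""
    visited = defaultdict(set)
    for user, location in checkins:
        visited[user].add(location)
    return dict(visited)


def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_pair_similarity(pairs, visited):
    """Average Jaccard similarity of visited locations over the given user pairs."""
    sims = [jaccard(visited.get(u, set()), visited.get(v, set())) for u, v in pairs]
    return sum(sims) / len(sims) if sims else 0.0


# Made-up (user, location) check-ins and user pairs.
toy_checkins = [(1, 100), (1, 101), (2, 100), (2, 101), (2, 102), (3, 200)]
friend_pairs = [(1, 2)]
non_friend_pairs = [(1, 3), (2, 3)]

visited = visited_sets(toy_checkins)
friend_similarity = mean_pair_similarity(friend_pairs, visited)
non_friend_similarity = mean_pair_similarity(non_friend_pairs, visited)
print(friend_similarity, non_friend_similarity)
```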

Strengths of the Gowalla Dataset

  • Real-World LBSN Data: Captured genuine user check-in behavior and social connections from a popular platform of its time.
  • Combines Location, Time, and Social Data: Offers a rich multi-faceted view of user activity.
  • Implicit Signal: Check-ins provide a strong implicit signal of user presence and context.
  • Widely Used Benchmark: Facilitates comparison across numerous research papers focusing on LBSNs, POI recommendations, and mobility.

Weaknesses & Considerations

  • Historical Data: The data is over a decade old (pre-2012) and does not reflect current mobility patterns, POI popularity, or the modern LBSN landscape. This is its primary limitation.
  • Platform Shutdown: No possibility of new data or updates from the original source.
  • Data Sparsity: Users typically check into a limited number of locations compared to the total available.
  • Potential Biases: Check-in behavior might be biased towards certain types of users, locations (e.g., social venues vs. mundane places), or geographic areas covered by the platform's user base.
  • Privacy Considerations: While anonymized, location data is inherently sensitive. Ethical usage according to the dataset's terms is crucial.
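
Sparsity can be quantified before any modeling. The sketch below computes the density of the user-location interaction matrix and applies an iterative minimum-count filter (often called a k-core filter), a preprocessing step commonly used for sparse interaction data. The function names and toy pairs are hypothetical, and a suitable threshold k depends on the experiment.

```python
from collections import Counter


def interaction_density(pairs):
    """Share of possible (user, location) cells that have at least one check-in."""
    unique = set(pairs)
    if not unique:
        return 0.0
    users = {u for u, _ in unique}
    items = {i for _, i in unique}
    return len(unique) / (len(users) * len(items))


def k_core(pairs, k):
    """Repeatedly drop users and locations with fewer than k distinct interactions."""
    kept = set(pairs)
    while True:
        user_counts = Counter(u for u, _ in kept)
        item_counts = Counter(i for _, i in kept)
        pruned = {(u, i) for u, i in kept if user_counts[u] >= k and item_counts[i] >= k}
        if pruned == kept:
            return kept
        kept = pruned


# Made-up (user, location) pairs; repeat visits collapse to one interaction.
toy_pairs = [(1, 100), (1, 100), (1, 101), (2, 100), (2, 101), (3, 102)]

density_before = interaction_density(toy_pairs)
core = k_core(toy_pairs, k=2)
density_after = interaction_density(core)
print(density_before, density_after)
```

Filtering raises density but also removes the least active users and least visited locations, which shifts the evaluation away from cold-start cases.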

Common Use Cases & Applications (Primarily Historical/Benchmarking)

  • Developing and benchmarking Point-of-Interest (POI) recommendation algorithms.
  • Modeling next check-in prediction or sequential location patterns.
  • Analyzing the influence of the social network on location choices.
  • Studying human mobility patterns and urban dynamics.
  • Evaluating context-aware recommendation models incorporating time and location.
  • Researching algorithms for friend recommendation based on location similarity.
  • Testing cold-start recommendation strategies in LBSNs.
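
For next check-in prediction, a common evaluation setup holds out each user's most recent check-in and trains on everything before it. The sketch below shows one such leave-last-out split. The function name and event tuples are hypothetical, and other protocols (for example a global time cutoff) are equally valid choices.

```python
from collections import defaultdict


def leave_last_out(events):
    """Split (user_id, item_id, created_at) events into train and test lists.

    Each user's latest event goes to test; users with a single event stay in train.
    """
    by_user = defaultdict(list)
    for user, item, created_at in events:
        by_user[user].append((created_at, item))

    train, test = [], []
    for user, history in by_user.items():
        history.sort()  # chronological order
        if len(history) < 2:
            train.extend((user, item, ts) for ts, item in history)
            continue
        *past, (last_ts, last_item) = history
        train.extend((user, item, ts) for ts, item in past)
        test.append((user, last_item, last_ts))
    return train, test


# Made-up events with small integers standing in for epoch seconds.
toy_events = [(1, 100, 30), (1, 101, 10), (1, 102, 20), (2, 100, 5)]
train_events, test_events = leave_last_out(toy_events)
print(train_events, test_events)
```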

How to Access the Gowalla Dataset

The most common and reliable source for the Gowalla dataset is the SNAP (Stanford Network Analysis Platform) repository, which hosts a dedicated page for it.

That page typically provides access to both the check-in data and the social network edges, along with basic statistics and citation information. Always review and adhere to the terms of use specified by SNAP.

Connecting the Gowalla Dataset to Shaped

Shaped can effectively model the spatio-temporal and social aspects inherent in datasets like Gowalla. Connecting the SNAP Gowalla dataset involves mapping its core components to Shaped's expected structure, allowing you to build powerful POI recommendation models. Here’s a conceptual guide:

1. Setup: Ensure you have the Shaped CLI, pyyaml, and pandas installed, and initialize the Shaped client with your API key.

init-script.sh

# Install the Shaped CLI and the Python packages used in this guide
pip install shaped pyyaml pandas

# Take the API key from the environment, falling back to a placeholder
SHAPED_API_KEY="${TEST_SHAPED_API_KEY:-<YOUR_API_KEY>}"

shaped init --api-key "$SHAPED_API_KEY"

2. Dataset Preparation (Conceptual): Download the Gowalla check-in data file (e.g., loc-gowalla_totalCheckins.txt.gz) from SNAP. Load it, likely needing to specify the tab separator and column names (user_id, check_in_time, latitude, longitude, location_id).

Map the Gowalla fields to Shaped's requirements:

  • user_id -> user_id (direct mapping)
  • location_id -> item_id (treating POIs as items)
  • check_in_time -> created_at (convert Gowalla's ISO 8601 UTC timestamps, in the form YYYY-MM-DDTHH:MM:SSZ, to Unix epoch seconds)
  • latitude, longitude: Keep these as valuable contextual features for the check-in event.

prepare_gowalla.py

import pandas as pd

data_dir = "path/to/gowalla/data"
# pandas infers gzip compression from the .gz extension
checkins_file = f"{data_dir}/loc-gowalla_totalCheckins.txt.gz"

# Load check-in data (tab-separated, no header row)
checkins_df = pd.read_csv(
    checkins_file,
    sep='\t',
    header=None,
    names=['user_id', 'check_in_time', 'latitude', 'longitude', 'location_id']
)

# Parse the timestamps as UTC (the trailing "Z"), then convert to epoch seconds.
# Parsing them as naive datetimes would shift every value by the local UTC offset.
timestamps = pd.to_datetime(
    checkins_df['check_in_time'], format='%Y-%m-%dT%H:%M:%SZ', utc=True
)
epoch_start = pd.Timestamp('1970-01-01', tz='UTC')
checkins_df['created_at'] = (timestamps - epoch_start) // pd.Timedelta(seconds=1)

# Rename for Shaped standard
checkins_df = checkins_df.rename(columns={'location_id': 'item_id'})

# Select and reorder relevant columns
shaped_df = checkins_df[['user_id', 'item_id', 'created_at', 'latitude', 'longitude']]

# Export to JSONL so the file exists for the upload step
prepared_file_path = f"{data_dir}/shaped_ready_gowalla.jsonl"
shaped_df.to_json(prepared_file_path, orient='records', lines=True)
print(f"Gowalla data prepared at: {prepared_file_path}")

# Note: You can also process loc-gowalla_edges.txt.gz for social graph features if desired.
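
As a follow-up to the note on the social graph, the sketch below turns friendship edges into a simple per-user feature, the number of friends. It assumes the edge file holds one tab-separated pair of user IDs per line; the function name friend_count_records and the friend_count field are hypothetical, and how such a user table would be wired into a model is not covered here. The sample lines are made up. For the real file, the lines could come from gzip.open(path, "rt").

```python
import json
from collections import defaultdict


def friend_count_records(edge_lines):
    """Build one {"user_id", "friend_count"} record per user from edge lines."""
    friends = defaultdict(set)
    for line in edge_lines:
        line = line.strip()
        if not line:
            continue
        a, b = (int(part) for part in line.split("\t"))
        if a == b:
            continue  # ignore self-links
        # Friendship is mutual, so record it in both directions.
        friends[a].add(b)
        friends[b].add(a)
    return [
        {"user_id": user, "friend_count": len(others)}
        for user, others in sorted(friends.items())
    ]


# Made-up edges; the pair (1, 2) appears in both directions and is counted once.
sample_edge_lines = ["1\t2\n", "2\t1\n", "1\t3\n"]
user_records = friend_count_records(sample_edge_lines)
user_jsonl = "\n".join(json.dumps(record) for record in user_records)
print(user_jsonl)
```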

3. Create Shaped Dataset using URI: Instead of defining a YAML schema first, we can directly create the dataset and upload the local file using the create-dataset-from-uri command. This command handles both creation and the initial data insertion.

Make sure your prepared JSONL file exists at the path specified (e.g., path/to/gowalla/data/shaped_ready_gowalla.jsonl).
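
Before uploading, it can help to check that every line of the prepared file parses and carries the expected fields. The sketch below is an optional sanity check written for this guide, not part of the Shaped CLI; validate_jsonl and REQUIRED_FIELDS are hypothetical names, and the sample lines are made up. For the real file, pass an open file handle instead of the list.

```python
import json

REQUIRED_FIELDS = {"user_id", "item_id", "created_at", "latitude", "longitude"}


def _is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_jsonl(lines):
    """Return (line_number, problem) tuples; an empty list means nothing was flagged."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            problems.append((number, "not valid JSON"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
            continue
        missing = REQUIRED_FIELDS - set(record)
        if missing:
            problems.append((number, "missing " + ", ".join(sorted(missing))))
            continue
        lat, lon = record["latitude"], record["longitude"]
        in_range = _is_number(lat) and _is_number(lon) and -90 <= lat <= 90 and -180 <= lon <= 180
        if not in_range:
            problems.append((number, "coordinates out of range"))
    return problems


sample_lines = [
    '{"user_id": 1, "item_id": 100, "created_at": 1272715200, "latitude": 40.0, "longitude": -74.0}',
    '{"user_id": 1, "item_id": 101, "created_at": 1272718800, "latitude": 40.0}',
    "not json",
    '{"user_id": 2, "item_id": 100, "created_at": 1272722400, "latitude": 123.0, "longitude": -74.0}',
]
problems = validate_jsonl(sample_lines)
print(problems)
```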

upload-gowalla-checkins.sh

# Replace the path with the actual path to your prepared JSONL file
shaped create-dataset-from-uri --name gowalla_checkins \
                                --path path/to/gowalla/data/shaped_ready_gowalla.jsonl \
                                --type jsonl

This command creates a dataset named gowalla_checkins and uploads the content of the specified JSONL file. You can monitor the dataset status using shaped list-datasets --filter-name gowalla_checkins.

4. Create Shaped Model: Define the model schema, specifying how to fetch data and include the location features from the dataset created above.

generate_gowalla_model_yaml.py

import yaml
import os

dir_path = "gowalla_assets"  # Created below if it does not exist
os.makedirs(dir_path, exist_ok=True)

gowalla_poi_model_schema = {
    "model": {
        "name": "gowalla_poi_recommendations"
        # You might specify objectives like 'ranking' or 'next-item'
    },
    "connectors": [
        {
            "type": "Dataset",
            "id": "gowalla_checkins",  # Must match the dataset name
            "name": "gowalla_checkins"
        }
        # Add a second connector here if including a social graph dataset
    ],
    "fetch": {
        "events": """
SELECT
    user_id,
    item_id,     -- Corresponds to location_id in original data
    created_at,  -- Timestamp of check-in
    1 as label
FROM gowalla_checkins
""",
        # gowalla_checkins is the only connector defined above, so the item
        # table is derived from it: one row per POI with averaged coordinates.
        "items": """
SELECT
    item_id,                      -- Corresponds to location_id in original data
    AVG(latitude) AS latitude,    -- Geo-coordinate feature
    AVG(longitude) AS longitude   -- Geo-coordinate feature
FROM gowalla_checkins
GROUP BY item_id
"""
        # Optionally add user features if using social graph
    }
}

# Write to file
with open(f'{dir_path}/gowalla_poi_model_schema.yaml', 'w') as file:
    yaml.dump(gowalla_poi_model_schema, file)

Create the model:

create-gowalla-model.sh

# The path matches dir_path in generate_gowalla_model_yaml.py
shaped create-model --file gowalla_assets/gowalla_poi_model_schema.yaml

Once trained, Shaped can provide POI recommendations, implicitly learning from the location sequences, time patterns, and potentially incorporating the explicit latitude/longitude features provided in the item data.

Conclusion: A Foundational Dataset for LBSN Research

The Gowalla dataset, curated by SNAP, holds a significant place in the history of recommender systems and network analysis. As one of the first large-scale public datasets from a Location-Based Social Network, it fueled foundational research into Point-of-Interest (POI) recommendations, human mobility modeling, and the impact of social ties on real-world behavior. While its historical nature means it doesn't represent current trends, it remains a valuable benchmark for understanding the principles and challenges of LBSN data analysis and serves as a testament to the early exploration of integrating location, time, and social context into intelligent systems.

Request a demo of Shaped today to see it in action with your specific use case. Or, start exploring immediately with our free trial sandbox.
