Powering AI Personalization with Your BigQuery Data and Shaped

BigQuery is a powerhouse for large-scale analytics, storing deep interaction histories, rich item catalogs, and user segments. But what if that same data could drive intelligent, real-time personalization? That’s where Shaped comes in. By connecting BigQuery to Shaped, teams can transform their warehouse into a real-time engine for recommendations and search, no custom pipelines or MLOps required. This post breaks down how BigQuery + Shaped unlock smarter, faster relevance from the data you already have.

June 5, 2025

Tullie Murrell

Activating Your Data Warehouse for Intelligent Experiences

Google BigQuery stands as a cornerstone for many organizations, serving as a powerful, scalable data warehouse for storing vast amounts of historical user interaction data, detailed user profiles, rich item catalogs, and critical business metrics. While BigQuery excels at analytics and batch processing, the challenge often lies in activating this wealth of data to drive real-time personalized experiences like recommendations and search ranking.

How do you leverage the deep historical insights locked within your BigQuery tables to predict what a user wants right now? How do you use your carefully curated item metadata and user segments from BigQuery to power sophisticated AI models without building complex, fragile ETL pipelines and ML infrastructure? This is where Shaped's dedicated BigQuery connector provides a seamless solution.

Shaped is an AI-native relevance platform designed to connect directly to your data warehouse, ingest relevant data, train state-of-the-art machine learning models, and serve personalized recommendations and search rankings via simple APIs. This post outlines the value of connecting BigQuery to Shaped and provides a step-by-step guide to setting up the integration.

Why Connect BigQuery to Shaped? Leveraging Your Data Asset

Connecting your BigQuery warehouse directly to Shaped allows you to transform your historical data and curated catalogs into powerful drivers for personalization and insight generation across various use cases:

Data-Rich Recommendations: Utilize the comprehensive data in BigQuery to fuel highly relevant suggestions:
- Leverage Long-Term History: Generate recommendations based on deep historical interaction patterns stored in BigQuery, complementing real-time signals.
- Catalog-Aware Recommendations: Incorporate rich item metadata (attributes, categories, descriptions) directly from your BigQuery catalog tables.
- User Profile Personalization: Utilize user segments, demographics, or calculated attributes stored in BigQuery to tailor recommendations.
- "Similar Item" based on Rich Attributes: Find related items based on detailed metadata alongside behavioral signals.
- Cold-Start Mitigation: Use historical data and item attributes from BigQuery to provide better recommendations for new users or items with sparse interaction data.
Enhanced Search Relevance: Improve search results by incorporating warehouse insights:
- Attribute-Based Filtering & Faceting: Easily use item attributes synced from BigQuery for powerful filtering in search results (via Shaped's APIs).
- Offline Metric Optimization: Train models that learn from historical conversion data or business metrics stored in BigQuery to optimize search ranking.
- Enriching Search with Metadata: Ensure search models have access to the latest, most accurate item attributes directly from your source of truth in BigQuery.
Deeper Analytics & Insights: Connect warehouse data to powerful ML models for analysis:
- Historical Trend Analysis: Train models on specific time windows from BigQuery to understand how user behavior or item relevance has evolved.
- Attribute Importance: Understand which item attributes from your BigQuery tables are most predictive of user engagement.
- Offline Evaluation: Use historical interaction datasets from BigQuery to evaluate the potential performance of different personalization strategies before deploying them live.
Simplified Data Flow: Avoid building and maintaining complex export/import jobs or reverse ETL processes. Shaped's connector handles the data synchronization directly from BigQuery.
Scheduled Training: Automatically retrain models on a schedule as new data lands in your BigQuery tables, ensuring models stay fresh without manual intervention.

How it Works: The BigQuery Dataset Connector

Shaped's BigQuery connector works by securely accessing your specified BigQuery tables using a dedicated Google Cloud Platform (GCP) service account granted read-only permissions. You configure the connection within Shaped, defining which table, columns, and filters to use. Shaped then periodically syncs data from BigQuery based on a timestamp column you specify, ensuring that the models are trained on up-to-date information without requiring real-time streaming infrastructure between BigQuery and Shaped.
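
To make the timestamp-based sync more concrete, here is a minimal sketch of the kind of query a watermark-based incremental sync might issue. It is an illustration only, not Shaped's actual implementation: the function name incremental_sync_sql and its argument order are hypothetical, and it assumes the datetime column has the BigQuery TIMESTAMP type.

```shell
# incremental_sync_sql TABLE COLUMNS DATETIME_KEY LAST_SYNCED
# Hypothetical helper: prints a query that selects only rows whose
# datetime key is newer than the last synced value (the "watermark").
incremental_sync_sql() {
  printf 'SELECT %s FROM `%s` WHERE `%s` > TIMESTAMP("%s")\n' \
    "$2" "$1" "$3" "$4"
}

# Example (requires the bq CLI and read access to the table):
# bq query --use_legacy_sql=false \
#   "$(incremental_sync_sql 'your-gcp-project.your_bq_dataset.your_bq_table' \
#      'user_id, item_id, event_type, timestamp' timestamp 2023-01-01T00:00:00Z)"
```

After each run, a sync of this style would record the largest datetime value it saw and use it as the next watermark, which is why the choice of timestamp column matters.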

Connecting BigQuery to Shaped

Setting up the connection involves granting Shaped read access and configuring the dataset in Shaped.

Step 1: Grant Shaped Read-Only Permissions in GCP

To allow Shaped's secure service account to read data from your BigQuery project, you need to grant it specific IAM roles.

Contact Shaped: Reach out to the Shaped team (via your support channel or sales contact) to obtain the specific email address of Shaped's dedicated GCP service account (<OUR_SERVICE_ACCOUNT>).
Install gcloud CLI: Ensure you have the gcloud command-line tool installed and configured for your GCP project.
Grant Roles: Execute the following gcloud commands, replacing <YOUR_PROJECT> with your GCP project ID and <OUR_SERVICE_ACCOUNT> with the email provided by Shaped. These commands grant the necessary read-only permissions:

grant-bigquery-access.sh

# Allows viewing data in tables
gcloud projects add-iam-policy-binding <YOUR_PROJECT> \
   --member='serviceAccount:<OUR_SERVICE_ACCOUNT>' \
   --role='roles/bigquery.dataViewer'

# Allows running BigQuery jobs (like export/read jobs)
gcloud projects add-iam-policy-binding <YOUR_PROJECT> \
   --member='serviceAccount:<OUR_SERVICE_ACCOUNT>' \
   --role='roles/bigquery.jobUser'

# Allows reading data via the BigQuery Storage Read API for efficiency
gcloud projects add-iam-policy-binding <YOUR_PROJECT> \
   --member='serviceAccount:<OUR_SERVICE_ACCOUNT>' \
   --role='roles/bigquery.readSessionUser'

  

None of these roles grants write access to your tables, so Shaped can read data but cannot modify it. Note that the bindings are made at the project level, so they apply to every dataset in the project.
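
Before moving on, it can be worth confirming that all three bindings took effect. The sketch below lists the roles held by the service account and reports any that are missing. The function names list_granted_roles and missing_roles are hypothetical helpers written for this post; only the gcloud projects get-iam-policy command and its --flatten, --filter, and --format flags come from the gcloud CLI itself.

```shell
# Roles this guide grants to Shaped's service account.
REQUIRED_ROLES="roles/bigquery.dataViewer
roles/bigquery.jobUser
roles/bigquery.readSessionUser"

# list_granted_roles PROJECT SERVICE_ACCOUNT_EMAIL
# Prints one role per line for the given member (requires gcloud).
list_granted_roles() {
  gcloud projects get-iam-policy "$1" \
    --flatten="bindings[].members" \
    --filter="bindings.members:$2" \
    --format="value(bindings.role)"
}

# missing_roles
# Reads granted roles on stdin and prints each required role that is absent.
missing_roles() {
  granted=$(cat)
  for role in $REQUIRED_ROLES; do
    printf '%s\n' "$granted" | grep -qxF "$role" || printf '%s\n' "$role"
  done
}

# Example (substitute your project ID and the email provided by Shaped):
# list_granted_roles "$YOUR_PROJECT" "$SHAPED_SERVICE_ACCOUNT" | missing_roles
```

Empty output means all three roles are present; any role printed still needs to be granted.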

Step 2: Configure the Shaped Dataset (YAML)

Next, define the connection parameters in a YAML configuration file. This file tells Shaped what data to sync from BigQuery.

Create a YAML file (e.g., bq_dataset.yaml):

bq_dataset.yaml

name: your_bigquery_dataset # Choose a descriptive name for your Shaped dataset

# --- Required Fields ---

schema_type: BIGQUERY # Specifies the connector type

# Fully qualified BigQuery table name: project.dataset.table
# NOTE: If your project ID contains special characters like hyphens,
# wrap the project ID in backticks and the whole value in double quotes.
table: "`your-gcp-project`.your_bq_dataset.your_bq_table"

columns: ["user_id", "item_id", "event_type", "timestamp", "item_category", "item_price"]

# The column in your BigQuery table containing a timestamp or datetime.
# Shaped uses this to perform efficient incremental syncs after the initial load.
datetime_key: "timestamp"

# --- Optional Fields ---

# start_datetime: "2023-01-01T00:00:00Z"
# filters: ["country = 'US'", "event_type IN ('purchase', 'view')"]
# unique_keys: ["user_id", "item_id", "timestamp"]
# batch_size: 100000

  

Key Configuration Points:

table: Must be the fully qualified name (project.dataset.table). Pay attention to quoting if your project ID has special characters.
columns: Select only the columns needed for your personalization models (user IDs, item IDs, timestamps, event types, relevant metadata).
datetime_key: Crucial for efficient incremental updates. Choose a column that reliably indicates when a row was created or last updated (e.g., updated_at, created_at, event_timestamp).
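
One way to sanity-check a candidate datetime_key before creating the dataset is to look at its null count and value range in BigQuery. The sketch below builds such a query; the helper name datetime_key_check_sql is hypothetical, and the check itself is a suggestion rather than something Shaped requires.

```shell
# datetime_key_check_sql TABLE COLUMN
# Hypothetical helper: prints a query reporting row count, NULL count,
# and the earliest and latest values of the candidate datetime column.
datetime_key_check_sql() {
  printf 'SELECT COUNT(*) AS total_rows, COUNTIF(`%s` IS NULL) AS null_keys, MIN(`%s`) AS earliest, MAX(`%s`) AS latest FROM `%s`\n' \
    "$2" "$2" "$2" "$1"
}

# Example (requires the bq CLI and read access to the table):
# bq query --use_legacy_sql=false \
#   "$(datetime_key_check_sql 'your-gcp-project.your_bq_dataset.your_bq_table' timestamp)"
```

A non-zero null_keys count, or a latest value that lags well behind when rows were actually written, suggests the column may not be a reliable basis for incremental syncs.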

Step 3: Create the Dataset in Shaped

Use the Shaped CLI to create the dataset using the YAML file you just configured:

shaped create-dataset --file bq_dataset.yaml

Shaped will validate the configuration and attempt to connect to your BigQuery table using the granted permissions. You can monitor the dataset's status (syncing progress, potential errors) on the Shaped Dashboard or via the CLI (shaped view-dataset --dataset-name your_bigquery_dataset).

What Happens Next? Syncing, Training, Serving

Once the connection is successfully established:

Initial Sync: Shaped performs an initial full sync of the data from your specified BigQuery table, respecting any start_datetime or filters you configured. This may take time depending on table size.
Incremental Syncs: After the initial load, Shaped periodically checks for new or updated rows in your BigQuery table based on the datetime_key and syncs only the changes.
Model Training: As data syncs, Shaped uses it to train its powerful AI models tailored for search ranking and recommendations. You can configure training schedules within Shaped.
API Serving: Once models are trained, you can query Shaped's APIs to get personalized results based on the insights derived from your BigQuery data.
Ongoing Retraining: Shaped automatically retrains models based on the latest synced data according to your schedule, keeping personalization fresh.

Conclusion: Activate Your BigQuery Data Powerhouse

Your BigQuery data warehouse is a rich source of information critical for deep personalization. Shaped's BigQuery connector provides a secure, efficient bridge to activate this data, allowing you to leverage state-of-the-art AI for recommendations and search without building complex custom ML infrastructure. By connecting BigQuery to Shaped, you can transform historical data and curated catalogs into dynamic, personalized experiences that drive engagement and business value.

Ready to unlock the personalization potential hidden in your BigQuery data?

Request a demo of Shaped today to see it in action with your specific use case. Or, start exploring immediately with our free trial sandbox.
