Activating ClickHouse Data for AI-Powered Personalization with Shaped

ClickHouse is built for high-performance analytics at scale and is trusted to power fast, flexible querying across billions of events. But what if the same data fueling dashboards could also drive real-time personalization? That’s where Shaped comes in. By connecting ClickHouse to Shaped, teams can transform rich historical interaction data into intelligent recommendations, personalized search, and predictive insights, without straining their analytics stack. This post explores how ClickHouse and Shaped together unlock the full potential of your event data.

May 21, 2025

Tullie Murrell

Leveraging Your High-Performance Analytics Database for Real-Time Relevance

ClickHouse has gained immense popularity as an open-source, column-oriented database management system known for its speed on large-scale analytical (OLAP) queries. Organizations often use ClickHouse to store and analyze massive event streams, logs, time-series data, and user interaction histories. While this data is invaluable for analytics, the next step is often activating it to drive intelligent, personalized experiences in real time.

How do you transform the billions of user events stored efficiently in ClickHouse into accurate predictions of what a user will engage with next? How do you personalize search results or recommendation carousels based on patterns hidden within that vast dataset without overloading your ClickHouse cluster with complex, per-user queries? This is where Shaped's dedicated ClickHouse connector provides a powerful and efficient solution.

Shaped is an AI-native relevance platform designed to securely connect to data sources like ClickHouse, ingest relevant data, train state-of-the-art machine learning models, and serve personalized search rankings and recommendations via simple APIs. This post explains the benefits of using your ClickHouse data with Shaped and guides you through the straightforward integration process.

Why Connect ClickHouse to Shaped? From Fast Analytics to Smart Actions

Connecting your ClickHouse database to Shaped allows you to bridge the gap between high-speed analytics and sophisticated AI-driven personalization, enabling powerful use cases:

Deeply Personalized Recommendations: Utilize the extensive historical interaction data often stored in ClickHouse:
- Long-Term Behavior Modeling: Train models on vast event streams in ClickHouse to understand user preferences over extended periods.
- Leverage Detailed Event Properties: Incorporate rich attributes stored alongside events in ClickHouse (e.g., device type, location, specific interaction details) into personalization models.
- Fast-Updating Catalog Awareness: If item metadata is stored or updated frequently in ClickHouse, Shaped can sync it to ensure recommendations reflect the latest catalog state.
- "Similar Item" Discovery: Identify related items based on behavioral patterns learned from large-scale ClickHouse interaction logs.

Enhanced Search Personalization: Improve search relevance using insights derived from ClickHouse data:
- Behaviorally-Informed Ranking: Train search ranking models using historical engagement metrics (clicks, conversions) stored in ClickHouse.
- Personalize Based on Historical Activity: Tailor search results based on a user's long-term interaction patterns captured in ClickHouse event tables.

Advanced Analytics & Model Insights: Apply sophisticated ML to your ClickHouse data for deeper understanding:
- Complex Journey Analysis: Model intricate user paths and predict future actions based on patterns learned from large ClickHouse datasets.
- User/Item Embedding Generation: Create powerful vector representations from ClickHouse data for cohort analysis, anomaly detection, or downstream ML tasks.
- Offline Performance Simulation: Use historical data slices from ClickHouse to evaluate potential personalization strategies.

Efficient Data Synchronization: Shaped's connector directly syncs data from ClickHouse based on a replication key, avoiding the need to build and maintain complex ETL pipelines specifically for personalization ML.

Optimized Resource Usage: Offload the computationally intensive task of training and serving complex ML models from your ClickHouse cluster to Shaped's specialized infrastructure.

How it Works: The ClickHouse Dataset Connector

Shaped connects to your ClickHouse instance using dedicated read-only credentials you provide. You configure which table and columns Shaped should sync. After the initial data load, Shaped periodically queries your ClickHouse table, using a specified replication_key (such as a timestamp or a monotonically increasing ID) to fetch only rows whose key is greater than the last value it has seen. New rows are picked up this way, and updated rows are picked up when the key advances on update, as with an updated_at timestamp. This keeps the data in Shaped fresh without requiring constant streaming or heavy querying of your ClickHouse database.
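To make the replication-key idea concrete, here is a minimal Python sketch of watermark-based incremental fetching. It is an illustration only and not Shaped's implementation: the in-memory `events` list stands in for a ClickHouse table, and `fetch_incremental` is a hypothetical helper name.

```python
from typing import Any

Row = dict[str, Any]


def fetch_incremental(
    table: list[Row], replication_key: str, last_seen: Any = None
) -> tuple[list[Row], Any]:
    """Return rows whose replication key exceeds last_seen, plus the new watermark."""
    if last_seen is None:
        # Initial load: no watermark yet, so take every row.
        new_rows = list(table)
    else:
        # Incremental load: only rows strictly past the watermark.
        new_rows = [row for row in table if row[replication_key] > last_seen]
    if new_rows:
        last_seen = max(row[replication_key] for row in new_rows)
    return new_rows, last_seen


# Hypothetical stand-in for a ClickHouse events table.
events = [
    {"event_id": 1, "user_id": "u1", "event_timestamp": 100},
    {"event_id": 2, "user_id": "u2", "event_timestamp": 105},
]

first_batch, watermark = fetch_incremental(events, "event_timestamp")

# A new event arrives between syncs.
events.append({"event_id": 3, "user_id": "u1", "event_timestamp": 110})
second_batch, watermark = fetch_incremental(events, "event_timestamp", watermark)

print(len(first_batch), len(second_batch), watermark)  # prints: 2 1 110
```

Note what the sketch implies: a row written late with a key at or below the current watermark would never be fetched, which is why the key has to increase reliably.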

Connecting ClickHouse to Shaped

The setup involves creating a read-only user in ClickHouse and then configuring the dataset connection within Shaped.

Step 1: Prepare ClickHouse - Create a Read-Only User

As a security best practice, Shaped requires a dedicated ClickHouse user with only the necessary read permissions.

Connect to your ClickHouse instance using a client or CLI with administrative privileges.

Execute the following SQL commands, replacing your_database_name and the table names (your_events_table, your_items_table) with the actual database and table(s) Shaped needs access to, and choosing a strong password:

create_read_only_user.sql
-- 1. Create a new user with a secure password
CREATE USER shaped_read_only IDENTIFIED BY 'YOUR_SECURE_PASSWORD_HERE!';

-- 2. Grant SELECT privileges on the specific database(s) and table(s)

-- Option A: Grant access to all tables in a database
GRANT SELECT ON your_database_name.* TO shaped_read_only;

-- Option B: Grant access only to specific tables (Recommended for stricter security)
-- GRANT SELECT ON your_database_name.your_events_table TO shaped_read_only;
-- GRANT SELECT ON your_database_name.your_items_table TO shaped_read_only;

Securely store the username (shaped_read_only in this example) and the password you created. You will need them for the Shaped configuration.
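If you need a way to produce the strong password mentioned above, the following Python sketch uses the standard library's `secrets` module. The helper name `generate_password` and the 32-character length are illustrative choices, not Shaped or ClickHouse requirements. The alphabet is limited to letters and digits so the value needs no escaping inside the SQL string literal or the YAML file.

```python
import secrets
import string

# Letters and digits only, so the password is safe to paste into
# a single-quoted SQL literal and a YAML value without escaping.
ALPHABET = string.ascii_letters + string.digits


def generate_password(length: int = 32) -> str:
    """Return a random password drawn from a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


password = generate_password()
print(len(password))  # prints: 32
```

Store the generated value in a secrets manager rather than in source control.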

Step 2: Configure the Shaped Dataset (YAML)

Define the connection details and data synchronization parameters in a YAML configuration file.

Create a YAML file (e.g., clickhouse_dataset.yaml):

clickhouse_dataset.yaml
name: your_clickhouse_dataset_name # Choose a descriptive name

# --- Required Fields ---
schema_type: CLICKHOUSE # Specifies the connector type
table: your_table_name # The specific table in ClickHouse to sync from
user: shaped_read_only # The read-only username created in Step 1
password: "YOUR_SECURE_PASSWORD_HERE!" # The password for the read-only user (quoted so special characters are not misread by YAML)
host: your.clickhouse.host.com # Hostname or IP address of your ClickHouse server
port: 9440 # Port ClickHouse is listening on (e.g., 8443 for HTTPS, 8123 for HTTP, 9440 for secure native)

# The column Shaped uses to track changes for incremental syncs.
# MUST be a column that reliably increases over time (e.g., event timestamp,
# auto-incrementing ID, updated_at timestamp).
replication_key: event_timestamp # Or created_at, id, updated_at etc.

# --- Optional Fields ---
database: your_database_name # The database containing the table (if not default)

# List specific columns to sync. If omitted, Shaped syncs all columns.
# columns: ["user_id", "item_id", "timestamp", "event_type", "some_property"]

# Columns uniquely identifying a row (for deduplication based on replication_key).
# unique_keys: ["event_id"]

# Schedule for periodic syncs (Cron format, e.g., "@hourly", "@daily", "*/15 * * * *").
# Defaults to "@hourly" if omitted.
# schedule_interval: "@hourly"

# description: "User interaction events from ClickHouse"

Key Configuration Points:

- Credentials: Ensure user and password match the read-only user created in ClickHouse.
- host & port: Provide the correct connection details for your ClickHouse instance.
- replication_key: This is critical for efficient syncing after the initial load. Choose a column that guarantees new/updated records will have a greater value than previous records (timestamps or increasing IDs work well).
- columns (Optional): Selecting only necessary columns improves sync efficiency and reduces data transfer.
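Before submitting the file, it can help to sanity-check the configuration locally. The Python sketch below is a hypothetical pre-flight check, not Shaped's own validation: it works on a plain dictionary mirroring the YAML above (load the file with whichever YAML parser you prefer), and it treats name plus the fields listed under "Required Fields" as mandatory.

```python
# Fields the example configuration marks as required, plus the dataset name.
REQUIRED_FIELDS = (
    "name", "schema_type", "table", "user",
    "password", "host", "port", "replication_key",
)


def validate_config(config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for field in REQUIRED_FIELDS:
        if config.get(field) in (None, ""):
            problems.append(f"missing required field: {field}")
    if config.get("schema_type") not in (None, "CLICKHOUSE"):
        problems.append("schema_type should be CLICKHOUSE")
    port = config.get("port")
    if port is not None:
        is_int = isinstance(port, int) and not isinstance(port, bool)
        if not (is_int and 1 <= port <= 65535):
            problems.append("port must be an integer between 1 and 65535")
    return problems


config = {
    "name": "your_clickhouse_dataset_name",
    "schema_type": "CLICKHOUSE",
    "table": "your_table_name",
    "user": "shaped_read_only",
    "password": "YOUR_SECURE_PASSWORD_HERE!",
    "host": "your.clickhouse.host.com",
    "port": 9440,
    "replication_key": "event_timestamp",
}
print(validate_config(config))  # prints: []

# A config with a quoted port and no replication key fails both checks.
broken = {**config, "port": "9440"}
del broken["replication_key"]
print(validate_config(broken))
```

A check like this only catches structural mistakes. Whether the credentials and host actually work is confirmed when Shaped attempts the connection in Step 3.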

Step 3: Create the Dataset in Shaped

Use the Shaped CLI to create the dataset from your YAML configuration file:

create_clickhouse_dataset.sh
shaped create-dataset --file clickhouse_dataset.yaml

Shaped will validate the configuration and credentials, then attempt to connect to your ClickHouse database. You can monitor the initial sync progress and ongoing status via the Shaped Dashboard or CLI (shaped view-dataset --dataset-name your_clickhouse_dataset_name).

What Happens Next? Syncing, Training, Personalizing

Once the connection is active:

- Initial Sync: Shaped performs a full sync of the specified table based on your configuration (respecting columns, etc.).
- Incremental Syncs: Based on the schedule_interval (defaulting to hourly), Shaped queries ClickHouse for rows where the replication_key is greater than the maximum value seen in the previous sync, efficiently fetching only new data.
- Model Training: Shaped uses the synced data to train its advanced AI models for personalization. Training can be scheduled within Shaped.
- API Serving: After models are trained, Shaped's APIs are ready to serve personalized search rankings, recommendations, or analytics embeddings derived from your ClickHouse data.
- Continuous Updates: Scheduled syncs and model retraining keep the personalization fresh based on the latest data available in your ClickHouse instance.
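When the replication key is something like an updated_at timestamp, successive syncs can return several versions of the same logical row. The optional unique_keys setting from Step 2 is described as deduplicating on the replication key. This post does not describe how Shaped implements that internally, so the Python sketch below shows one common interpretation: keep, for each unique key, the row with the greatest replication key. The `deduplicate` helper and the sample rows are hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def deduplicate(rows: list[Row], unique_keys: list[str], replication_key: str) -> list[Row]:
    """Keep only the most recent version of each row, judged by the replication key."""
    latest: dict[tuple, Row] = {}
    for row in rows:
        identity = tuple(row[key] for key in unique_keys)
        current = latest.get(identity)
        # Replace the stored version when this one is at least as recent.
        if current is None or row[replication_key] >= current[replication_key]:
            latest[identity] = row
    return list(latest.values())


synced = [
    {"event_id": "e1", "status": "viewed", "updated_at": 10},
    {"event_id": "e2", "status": "viewed", "updated_at": 12},
    {"event_id": "e1", "status": "purchased", "updated_at": 15},
]

for row in deduplicate(synced, ["event_id"], "updated_at"):
    print(row["event_id"], row["status"], row["updated_at"])
# prints:
# e1 purchased 15
# e2 viewed 12
```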

Conclusion: Bridge High-Speed Analytics with AI-Powered Relevance

Your ClickHouse database is a powerhouse for storing and querying vast amounts of data at speed. By connecting it to Shaped, you can effectively leverage this valuable asset to fuel state-of-the-art AI personalization without overburdening your ClickHouse cluster or investing heavily in building custom ML infrastructure. Shaped provides the specialized AI layer, enabling you to transform ClickHouse data into dynamic, engaging user experiences efficiently.

Ready to activate your ClickHouse data for intelligent recommendations and search?

Request a demo of Shaped today to see it in action with your specific use case. Or, start exploring immediately with our free trial sandbox.
