Snowplow has set the standard for collecting high-fidelity, granular behavioral data. With its enterprise-grade Behavioral Data Platform (BDP), Snowplow empowers businesses to capture rich, event-level details about user interactions across all platforms with unparalleled flexibility and control over their data pipeline and schema. This granular data is a goldmine for understanding how and why users engage, but unlocking its full potential means activating it – transforming the raw event stream into intelligent, real-time personalization.
How do you leverage the detailed session context, custom entities, and fine-grained event data captured by Snowplow to instantly tailor search results? How do you recommend the next best action or product based on the subtle nuances revealed in the event stream? This is where integrating Snowplow's rich data pipeline with Shaped's AI capabilities creates immense value.
Shaped is an AI-native relevance platform built to ingest real-time event streams, like those generated by Snowplow, and use cutting-edge machine learning to understand user behavior, predict intent, and deliver personalized search, recommendations, and analytics through developer-friendly APIs. This post outlines the benefits of connecting Snowplow to Shaped and provides a guide for integration using AWS Kinesis.
Why Connect Snowplow to Shaped? Driving Value from Granular Events
Connecting Snowplow's detailed behavioral data stream to Shaped's AI engine enables you to move beyond data collection to intelligent, real-time action. It allows you to leverage the richness and flexibility of your Snowplow pipeline to power sophisticated personalization use cases:
- Highly Contextual Recommendations: Utilize the fine-grained event data and custom contexts from Snowplow to deliver deeply relevant recommendations:
  - Session-Aware Feeds: Generate dynamic "For You" feeds that adapt based on a user's immediate actions and context within the current session, as captured by Snowplow events.
  - Behaviorally Driven Product Suggestions: Recommend items based on complex interaction patterns learned from the granular Snowplow data, going beyond simple co-views or co-purchases.
  - "Truly Similar" Item Discovery: Identify items related through nuanced behavioral signals captured in the event stream, not just metadata similarity.
  - Next Best Action/Content Prediction: Leverage detailed journey data to predict the most relevant next step for a user – be it an article, video, product, or feature engagement.
- Intelligent & Personalized Search: Enhance search functionality by incorporating the depth of Snowplow behavioral data:
  - Personalized Search Ranking: Re-rank search results based on individual user behavior, session context, and inferred intent derived from Snowplow events.
  - Deep Intent Understanding: Use models trained on granular event sequences to better understand the underlying goal behind search queries.
  - Behaviorally Boosted Search: Improve the relevance of keyword or vector search by incorporating signals learned from Snowplow interaction data.
- Advanced Behavioral Analytics: Gain deeper insights by applying Shaped's ML models to your rich Snowplow data:
  - Complex User Journey Modeling: Analyze intricate user paths and predict future sequences based on patterns learned from detailed Snowplow event streams.
  - Granular User & Item Embeddings: Generate powerful vector representations reflecting nuanced behaviors captured by Snowplow, useful for segmentation, analysis, and understanding relationships.
  - Personalization Impact Measurement: Quantify the effectiveness of AI-driven personalization fueled by your Snowplow data pipeline.
  - Explainable AI: Understand model predictions based on the rich, granular features derived from Snowplow events.
- Real-Time Adaptability: Shaped models continuously learn from the live Snowplow event stream via Kinesis, ensuring personalization adapts instantly to the latest interactions and contextual shifts.
- Leverage Your Data Asset: Activate the investment made in your Snowplow pipeline by directly using its high-fidelity data to power customer-facing AI features.
- Simplified ML Infrastructure: Avoid the complexity of building and maintaining bespoke ML systems to process and model Snowplow data for relevance; Shaped provides the managed AI layer.
How it Works: Snowplow -> Kinesis -> Shaped

The integration relies on AWS Kinesis Data Streams as a robust, scalable intermediary. Snowplow events are forwarded to a Kinesis stream managed by Shaped, from which Shaped securely ingests the data in real time to train its AI models.
Snowplow offers two main ways to achieve this forwarding:
- Snowbridge: Snowplow's native tooling designed to forward enriched events to various destinations, including Kinesis, often requiring minimal configuration. This is generally the recommended approach for simplicity.
- Custom Kinesis Forwarding: For users with specific needs or existing custom applications reading from Snowplow's enriched stream, a custom Kinesis Client Library (KCL) application can be used to forward relevant events to the Shaped Kinesis stream.
Connecting Snowplow to Shaped via AWS Kinesis
Here’s the step-by-step process:
Step 1: Create the Shaped Dataset
First, you need to define and create a dataset within Shaped configured to receive real-time Snowplow events via Kinesis.
Create a YAML configuration file (e.g., snowplow_events.yaml). Since Snowplow schemas are highly customizable, you'll use the CUSTOM schema type and need to define the structure based on your specific Snowplow enriched event format.
Important Notes on column_schema:
- This schema defines the structure Shaped expects in the Kinesis stream.
- It should reflect the key fields from your Snowplow enriched event data that are necessary for training personalization models (user/item IDs, event type, timestamp, relevant context).
- You'll need to ensure your Snowplow forwarding mechanism (Snowbridge or custom app) sends events matching this defined structure. Consulting with the Shaped team is recommended to design an optimal schema.
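One way to catch mismatches before events reach Kinesis is to check each outgoing record against a local copy of the schema. The sketch below is illustrative only: the column names (user_id, item_id, event_name, derived_tstamp) and the mapping to Python types are assumptions made for this example, not Shaped's column_schema syntax or type names. Replace them with whatever you actually define.

```python
from datetime import datetime

# Hypothetical local mirror of the dataset's column_schema.
# Column names and the use of Python types are assumptions for illustration.
EXPECTED_COLUMNS = {
    "user_id": str,
    "item_id": str,
    "event_name": str,
    "derived_tstamp": str,  # ISO 8601 string, parsed separately below
}


def validate_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for name, expected_type in EXPECTED_COLUMNS.items():
        if name not in record:
            problems.append(f"missing column: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(f"wrong type for {name}: {type(record[name]).__name__}")
    for name in record:
        if name not in EXPECTED_COLUMNS:
            problems.append(f"unexpected column: {name}")
    # The timestamp must parse; a trailing "Z" is rewritten as an explicit
    # UTC offset so older Python versions accept it.
    ts = record.get("derived_tstamp")
    if isinstance(ts, str):
        try:
            datetime.fromisoformat(ts.replace("Z", "+00:00"))
        except ValueError:
            problems.append(f"unparseable timestamp: {ts}")
    return problems
```

Running a check like this in a staging pipeline, or on a sample of forwarded events, can surface missing or mistyped fields before they affect model training.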
Use the Shaped CLI to create the dataset:
Monitor the dataset status on the Shaped Dashboard or via shaped list-datasets. It will transition from provisioning to ACTIVE. When deploy_realtime is true, Shaped automatically provisions the necessary AWS Kinesis stream and an IAM role within its own AWS account.
Step 2: Retrieve Shaped Kinesis Details
Once the dataset is ACTIVE, retrieve the Kinesis stream name and the IAM Role ARN provisioned by Shaped. These are required for configuring Snowplow to forward data.
Use the Shaped CLI:
The output will include kinesis_stream_arn and kinesis_iam_role_arn (values will be unique):
You need these two values for Snowplow:
- Stream Name: Extract the name from the kinesis_stream_arn (e.g., ShapedDatasetStream-xyz789).
- Full IAM Role ARN: The value of kinesis_iam_role_arn (e.g., arn:aws:iam::11111111111:role/ShapedDatasetAccessRole-xyz789).
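Extracting the stream name by hand is easy to get wrong, so it can help to script it. The sketch below relies on the standard Kinesis stream ARN layout (arn:aws:kinesis:region:account-id:stream/stream-name) and also returns the region, which is needed later when configuring forwarding. The function name and the example ARN are placeholders, not values produced by Shaped.

```python
def parse_kinesis_stream_arn(arn: str) -> tuple[str, str]:
    """Return (region, stream_name) from a Kinesis stream ARN."""
    # Layout: arn:<partition>:kinesis:<region>:<account-id>:stream/<stream-name>
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[2] != "kinesis" or not parts[5].startswith("stream/"):
        raise ValueError(f"not a Kinesis stream ARN: {arn}")
    region = parts[3]
    stream_name = parts[5][len("stream/"):]
    return region, stream_name


# Placeholder ARN; the account ID and stream suffix are made up.
example_arn = "arn:aws:kinesis:us-east-2:111111111111:stream/ShapedDatasetStream-xyz789"
region, stream_name = parse_kinesis_stream_arn(example_arn)
print(region, stream_name)  # us-east-2 ShapedDatasetStream-xyz789
```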
Step 3: Configure Snowplow Event Forwarding
Now, configure your Snowplow pipeline to send enriched events to the Shaped Kinesis stream.
Option A: Using Snowbridge (Recommended for Simplicity)
- Refer to the Snowplow documentation for configuring Snowbridge destinations.
- You will typically need to provide:
  - The target Kinesis Stream Name obtained in Step 2.
  - The IAM Role ARN obtained in Step 2 (which Snowbridge will assume to write to the stream).
  - The AWS Region of the Kinesis stream (Shaped typically provisions these in us-east-2, but confirm if necessary).
- Ensure Snowbridge is configured to forward events in the format matching the column_schema you defined in Shaped. You might need to configure transformations or filters within Snowplow/Snowbridge.
Option B: Using Custom Kinesis Forwarding
- If you have a custom application (e.g., using KCL) reading from your primary Snowplow enriched stream, modify it to:
  - Filter/select the relevant events and fields needed by Shaped.
  - Transform the events to match the column_schema defined in Shaped.
  - Use the AWS SDK to write these transformed records to the target Kinesis Stream Name obtained in Step 2.
- Ensure the application has permissions to assume the IAM Role ARN obtained in Step 2 to write to the Shaped stream.
Important: Regardless of the method, ensure only the necessary fields defined in your column_schema are sent, and that they match the expected data types.
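For the custom forwarding route, the filter, transform, and write steps might look roughly like the sketch below. It is a minimal outline under several assumptions: enriched events have already been parsed into flat Python dicts, the four field names are hypothetical stand-ins for your own column_schema (item_id in particular would usually come from a custom entity), and the function names are invented for illustration. The put_records and assume_role calls are the standard boto3 Kinesis and STS operations.

```python
import json

# Hypothetical set of columns; this must mirror the column_schema you defined.
SCHEMA_FIELDS = ("user_id", "item_id", "event_name", "derived_tstamp")
MAX_BATCH = 500  # Kinesis PutRecords accepts at most 500 records per call


def to_shaped_record(event: dict):
    """Project a parsed enriched event onto SCHEMA_FIELDS.

    Returns None when a required field is missing so the event can be skipped.
    """
    record = {field: event.get(field) for field in SCHEMA_FIELDS}
    if any(value is None for value in record.values()):
        return None
    return record


def forward(events, kinesis_client, stream_name: str) -> int:
    """Send conforming events to the stream; return how many Kinesis rejected."""
    records = [r for r in map(to_shaped_record, events) if r is not None]
    failed = 0
    for start in range(0, len(records), MAX_BATCH):
        batch = records[start:start + MAX_BATCH]
        response = kinesis_client.put_records(
            StreamName=stream_name,
            Records=[
                {
                    "Data": json.dumps(r).encode("utf-8"),
                    # Same partition key means the same shard for a given user.
                    "PartitionKey": r["user_id"],
                }
                for r in batch
            ],
        )
        failed += response.get("FailedRecordCount", 0)
    return failed


def client_for_role(role_arn: str, region: str):
    """Build a Kinesis client from temporary credentials for the given role."""
    import boto3  # imported lazily so the functions above need no AWS SDK

    creds = boto3.client("sts").assume_role(
        RoleArn=role_arn, RoleSessionName="snowplow-to-shaped"
    )["Credentials"]
    return boto3.client(
        "kinesis",
        region_name=region,
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
```

Passing the client in as an argument keeps the transform and batching logic testable without AWS access. A production forwarder would also retry the records reported in FailedRecordCount and refresh the temporary credentials before they expire; both are omitted here for brevity.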
What Happens Next? Fueling AI with Granular Data
Once the connection is live and data is flowing:
- Real-Time Ingestion: Shaped securely ingests the granular event stream from Snowplow via Kinesis.
- AI Model Training: Shaped automatically trains its state-of-the-art ML models on this rich behavioral data, learning intricate user patterns, session dynamics, and item relationships specific to your business.
- Personalization APIs: After models are trained, Shaped's APIs provide real-time personalized rankings, recommendations, and embeddings – all powered by your detailed Snowplow data.
- Continuous Adaptation: Models constantly update based on the incoming event stream, keeping personalization relevant and adaptive to the latest user behavior.
Conclusion: Activate Your Granular Data for Intelligent Experiences
Integrating Snowplow with Shaped bridges the gap between collecting highly detailed behavioral data and activating it for real-time AI-driven personalization. By following the Kinesis integration steps, you can leverage the granularity and flexibility of your Snowplow pipeline to power truly sophisticated recommendations, hyper-personalized search results, and deeper behavioral analytics. Stop letting valuable event-level insights sit untapped – activate your Snowplow data with Shaped to create adaptive, intelligent, and highly engaging customer experiences.
Ready to unlock the full potential of your Snowplow data with AI?
Request a demo of Shaped today to see it in action with your specific use case. Or, start exploring immediately with our free trial sandbox.