Activating Your MongoDB Data for AI Personalization with Shaped

MongoDB is a flexible, developer-friendly database widely used for storing rich, evolving user and product data. But turning that operational data into real-time personalization has traditionally required complex ETL and custom ML pipelines. With Shaped’s native MongoDB connector, you can skip the heavy lifting. Shaped ingests schemaless MongoDB documents directly, transforms them for training, and serves personalized recommendations and search via simple APIs, bridging the gap between document stores and intelligent, AI-powered user experiences.

May 27, 2025

Tullie Murrell

From Flexible Documents to Intelligent Experiences

MongoDB is a leading NoSQL document database, favoured for its flexibility, scalability, and ease of development. It's often used as the primary operational database for applications, storing rich user profiles, dynamic product catalogs, content metadata, and even interaction logs within its flexible document structure. While excellent for application development and storing diverse data, activating this data for sophisticated, real-time AI-driven personalization requires bridging the gap between your operational database and specialized machine learning platforms.

How do you leverage the nested user preferences stored in MongoDB documents to power truly personalized recommendations? How do you keep your AI models updated with the latest product attributes added to your MongoDB catalog collection? How do you train models on interaction data stored across potentially schemaless documents without complex data transformation pipelines? This is where Shaped's dedicated MongoDB connector provides a seamless solution.

Shaped is an AI-native relevance platform designed to connect directly to your MongoDB database, ingest data from specified collections, handle the conversion of BSON documents, train state-of-the-art models, and serve personalized search rankings and recommendations via simple APIs. This post explains the benefits of connecting MongoDB to Shaped and guides you through the integration process.

Why Connect MongoDB to Shaped? Leverage Your Operational Data

Connecting your MongoDB database directly to Shaped allows you to activate the rich, often real-time data stored within your application's core database for powerful personalization and analytics use cases:

Power Recommendations with Rich Document Data: Utilize the detailed information stored in your MongoDB collections:
- Deep User Profile Personalization: Incorporate complex user attributes, preferences, or computed segments stored as nested fields within user documents.
- Dynamic Catalog Awareness: Sync detailed product or content metadata directly from your MongoDB catalog collection, ensuring recommendations reflect the latest attributes (even newly added flexible fields).
- Contextual Recommendations: Leverage session data or recent interactions stored in MongoDB to provide timely suggestions.
- "Similar Item" Discovery: Identify related items based on potentially complex attributes and relationships captured within MongoDB documents, combined with behavioral signals.
Enhance Search with Operational Data: Improve search relevance by leveraging data straight from the source:
- Attribute-Based Filtering: Use up-to-date attributes synced from your MongoDB catalog for powerful filtering via Shaped's APIs.
- Personalize Ranking with Profile Data: Tailor search result order based on user profile information stored in MongoDB.
Flexible Data Ingestion for AI: Easily feed data from MongoDB's flexible schema into structured AI models:
- Handle Schemaless Data: Shaped ingests the entire MongoDB document as JSON, allowing you to extract relevant fields for model training using flexible query functions (DuckDB JSON functions within Shaped), even if your schema evolves.
- Sync Interaction Logs: If user events are stored in MongoDB collections, sync them to train behavioral models.
Simplified Data Pipeline: Avoid building and maintaining complex ETL jobs or CDC (Change Data Capture) systems just to get operational data from MongoDB into an ML platform. Shaped's connector handles the synchronization.
Scheduled Syncing: Keep Shaped's models updated by periodically syncing data from MongoDB based on your chosen replication strategy (incremental or full collection).

How it Works: The MongoDB Connector

Shaped connects to your MongoDB instance using a standard MongoDB connection string containing read-only credentials you provide. You specify the database and collection to sync.

Shaped offers two modes for syncing data (replication_mode):

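INCREMENTAL (Default): After an initial sync, Shaped periodically checks for new documents added to the collection. It identifies new documents based on the MongoDB document _id (default ObjectId values begin with a creation timestamp, so they generally increase over time) or a custom replication_key field you specify (e.g., created_at). This is efficient for collections where data is primarily appended (like event logs). Because only new documents are detected, in-place updates to documents that were already synced are not picked up in this mode.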
FULL_COLLECTION: On each scheduled run, Shaped reads the entire specified collection from MongoDB. It then uses the _id (or specified unique_keys) to deduplicate records, effectively replacing the dataset in Shaped with the latest snapshot of the collection. This is suitable for catalog collections or user profiles where documents might be updated frequently in place.
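
To make the difference between the two modes concrete, here is a minimal in-memory sketch. It is not Shaped's implementation: the functions `incremental_sync` and `full_collection_sync`, the integer `_id` values, and the toy product documents are all assumptions chosen for illustration. It only mirrors the behavior described above: incremental sync appends documents whose key exceeds the highest key already seen, while full-collection sync re-reads everything and deduplicates on a unique key.

```python
from typing import Any

Doc = dict[str, Any]


def incremental_sync(source: list[Doc], dataset: list[Doc], key: str = "_id") -> list[Doc]:
    """Append only documents whose key is greater than the highest key already synced."""
    high_water_mark = max((doc[key] for doc in dataset), default=None)
    new_docs = [
        doc for doc in source
        if high_water_mark is None or doc[key] > high_water_mark
    ]
    return dataset + sorted(new_docs, key=lambda doc: doc[key])


def full_collection_sync(source: list[Doc], unique_key: str = "_id") -> list[Doc]:
    """Re-read the whole collection and keep one record per unique key (last one wins)."""
    latest: dict[Any, Doc] = {}
    for doc in source:
        latest[doc[unique_key]] = doc
    return list(latest.values())


# A toy "collection": _id values increase as documents are inserted (illustrative only).
collection = [
    {"_id": 1, "title": "Blue shirt", "price": 20},
    {"_id": 2, "title": "Red shirt", "price": 25},
]
incremental_copy = incremental_sync(collection, [])
snapshot_copy = full_collection_sync(collection)

# The application updates document 1 in place and inserts document 3.
collection[0] = {"_id": 1, "title": "Blue shirt", "price": 15}
collection.append({"_id": 3, "title": "Green shirt", "price": 30})

incremental_copy = incremental_sync(collection, incremental_copy)
snapshot_copy = full_collection_sync(collection)

# The incremental copy gains document 3 but still holds the old price for document 1.
print([doc["price"] for doc in incremental_copy])
# The full-collection snapshot reflects the in-place price change.
print([doc["price"] for doc in snapshot_copy])
```

In this toy run the incremental copy picks up the new document but keeps the stale price, which is why FULL_COLLECTION tends to suit catalogs and profiles that change in place.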

Data Handling:

Since MongoDB is schemaless, Shaped ingests each BSON document, converts it into a JSON structure, and stores it within a primary document column in the Shaped dataset. Additional metadata columns (_id, internal timestamps, namespace) are also added. When building models or features within Shaped, you use powerful JSON extraction functions (based on DuckDB) to pull out specific nested fields from the document column as needed.
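
As a rough picture of what this means in practice, the sketch below models two synced rows, each with an `_id` and a JSON `document` column, and pulls one nested field out of them. The `extract_string` helper is a hypothetical, heavily simplified stand-in written for this example (dotted object keys only); it is not DuckDB's or Shaped's implementation, and the user documents are invented sample data.

```python
import json
from typing import Any, Optional


def extract_string(document: str, path: str) -> Optional[str]:
    """Follow a simple '$.a.b.c' path through a JSON document.

    Returns the value as a string, or None when any step is missing.
    Only dotted object keys are supported (no array indexes or wildcards).
    """
    if not path.startswith("$."):
        raise ValueError("path must start with '$.'")
    node: Any = json.loads(document)
    for key in path[2:].split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    if node is None:
        return None
    return node if isinstance(node, str) else json.dumps(node)


# Two rows as they might look after a sync: the second document has no
# "preferences" field, as can happen when a schema evolves over time.
rows = [
    {"_id": "u1", "document": json.dumps({"name": "Ada", "preferences": {"genre": "sci-fi"}})},
    {"_id": "u2", "document": json.dumps({"name": "Lin"})},
]

features = [
    {"user_id": row["_id"], "favorite_genre": extract_string(row["document"], "$.preferences.genre")}
    for row in rows
]
print(features)
```

Documents that lack the field simply yield a missing value instead of failing, which is the property that makes per-field extraction tolerant of evolving schemas.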

Connecting MongoDB to Shaped

The setup involves creating a read-only user in MongoDB, ensuring network accessibility, and configuring the dataset in Shaped.

Step 1: Prepare MongoDB - Create Read-Only User & Allow Network Access

Create Read-Only User: For security, create a dedicated MongoDB user with only read permissions on the specific database or collection Shaped needs to access.
- Connect to your MongoDB instance using an admin account.
- Switch to the target database (use your_database_name;).
- Execute the db.createUser command to create a user with the read role on the database, or create a custom role granting only find action on the specific collection for tighter security (see docs example).

    create_mongo_user.js
    
// Example: Read access to a specific database

db.createUser({
  user: "shaped_readonly",
  pwd: "YOUR_SECURE_PASSWORD_HERE",
  roles: [{ role: "read", db: "your_database_name" }]
});

Securely store the username (shaped_readonly) and password you created.

IP Allowlisting: MongoDB instances (especially managed ones like Atlas) often restrict incoming connections based on IP addresses. You will need to contact the Shaped team to obtain the specific IP addresses used by the Shaped connector and add them to your MongoDB instance's network access list / IP allowlist.

Step 2: Configure the Shaped Dataset (YAML)

Define the MongoDB connection details and sync parameters in a Shaped dataset configuration file.

Create a YAML file (e.g., mongodb_dataset.yaml):

    mongodb_dataset.yaml
    
name: your_mongodb_dataset_name # Choose a descriptive name

# --- Required Fields ---

schema_type: MONGODB # Specifies the connector type

# Standard MongoDB connection string including read-only user/password,
# host, port (default 27017), and database.
# Ensure special characters in password are properly URL-encoded if needed.
mongodb_connection_string: "mongodb://shaped_readonly:YOUR_SECURE_PASSWORD_HERE@your_mongo_host.com:27017/your_database_name"
collection: your_collection_name # The specific MongoDB collection to sync from
database: your_database_name # The database containing the collection

# --- Optional Fields ---

# Sync data starting from this date (YYYY-MM-DD). Uses the _id or replication_key
# to filter during the initial sync.
# start_date: "2024-01-01"

# Field to use for incremental syncs (instead of _id). Must be a field
# that strictly increases over time (e.g., a creation timestamp).
# replication_key: "created_at"

# Sync mode: INCREMENTAL (default, syncs new docs based on _id/replication_key)
# or FULL_COLLECTION (syncs entire collection each time, deduplicates on _id/unique_keys).
# replication_mode: FULL_COLLECTION # Use for catalogs/profiles that update in place

# Schedule for periodic syncs (Cron format, e.g., "@hourly", "@daily").
# Defaults to "@hourly" if omitted.
# schedule_interval: "@daily"

# description: "Product catalog data from MongoDB"

Key Configuration Points:

mongodb_connection_string: Ensure this is correctly formatted with the read-only credentials, host, port, and target database. URL-encode special characters in the password if necessary.
collection & database: Specify the exact source collection and database.
replication_mode: Choose carefully based on your data characteristics. INCREMENTAL is efficient for append-only data like events. FULL_COLLECTION is better for data that gets updated in place, like product catalogs or user profiles, ensuring Shaped always has the latest version.
replication_key: Only needed for INCREMENTAL mode if you want to use a field other than _id for tracking new documents.
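
Passwords containing characters such as @, : or / will break the connection string unless they are percent-encoded. One way to build the string safely is sketched below using Python's standard library; `build_connection_string` is a hypothetical helper written for this example, and the password shown is a made-up placeholder.

```python
from urllib.parse import quote


def build_connection_string(user: str, password: str, host: str,
                            database: str, port: int = 27017) -> str:
    """Percent-encode the credentials and assemble a mongodb:// URI."""
    # safe="" makes quote() encode every reserved character, including "/".
    encoded_user = quote(user, safe="")
    encoded_password = quote(password, safe="")
    return f"mongodb://{encoded_user}:{encoded_password}@{host}:{port}/{database}"


uri = build_connection_string(
    user="shaped_readonly",
    password="p@ss:w/rd",  # placeholder containing characters that need encoding
    host="your_mongo_host.com",
    database="your_database_name",
)
print(uri)
# mongodb://shaped_readonly:p%40ss%3Aw%2Frd@your_mongo_host.com:27017/your_database_name
```

The resulting value can be pasted into the mongodb_connection_string field of the YAML file.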

Step 3: Create the Dataset in Shaped

Use the Shaped CLI to create the dataset using your YAML configuration:

    create-mongodb-dataset.sh
    
shaped create-dataset --file mongodb_dataset.yaml

Shaped will validate the configuration, attempt to connect to your MongoDB instance (ensure IP allowlisting is done!), and begin the data sync process based on your chosen replication_mode. Monitor the status via the Shaped Dashboard or CLI (shaped view-dataset --dataset-name your_mongodb_dataset_name).
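
Before running the command, it can help to sanity-check the configuration locally. The sketch below is a hypothetical pre-flight check, not part of the Shaped CLI: it assumes the configuration has already been loaded into a Python dict, treats `name` plus the four fields under "Required Fields" as mandatory, and flags a `replication_key` set alongside FULL_COLLECTION since that key only applies to incremental syncs. Shaped's own validation may check more or different things.

```python
REQUIRED_FIELDS = ("name", "schema_type", "mongodb_connection_string", "collection", "database")
REPLICATION_MODES = ("INCREMENTAL", "FULL_COLLECTION")


def check_config(config: dict) -> list[str]:
    """Return a list of human-readable problems found in a dataset config."""
    problems = [
        f"missing required field: {field}"
        for field in REQUIRED_FIELDS
        if not config.get(field)
    ]
    if config.get("schema_type") not in (None, "MONGODB"):
        problems.append("schema_type should be MONGODB for this connector")
    # INCREMENTAL is the default mode when replication_mode is omitted.
    mode = config.get("replication_mode", "INCREMENTAL")
    if mode not in REPLICATION_MODES:
        problems.append(f"unknown replication_mode: {mode}")
    if mode == "FULL_COLLECTION" and "replication_key" in config:
        problems.append("replication_key only applies to INCREMENTAL mode")
    return problems


# Example config with two deliberate mistakes: no database, and a
# replication_key combined with FULL_COLLECTION.
config = {
    "name": "your_mongodb_dataset_name",
    "schema_type": "MONGODB",
    "mongodb_connection_string": "mongodb://shaped_readonly:YOUR_SECURE_PASSWORD_HERE@your_mongo_host.com:27017/your_database_name",
    "collection": "your_collection_name",
    "replication_mode": "FULL_COLLECTION",
    "replication_key": "created_at",
}
for problem in check_config(config):
    print(problem)
```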

What Happens Next? Ingesting Documents, Training Models

Once connected:

Data Sync: Shaped connects to MongoDB on the schedule_interval (default: hourly) and performs either an incremental or full collection sync based on replication_mode.
JSON Conversion: BSON documents are converted to JSON and stored in the document column within Shaped.
Model Training: Shaped uses this synced data for training. When defining features for your models in Shaped, you'll use JSON extraction functions (e.g., json_extract_string(document, '$.path.to.field')) to access specific fields within the JSON documents.
API Serving: Trained models power Shaped's APIs, serving personalized results derived from your MongoDB data.

Conclusion: Activate Your Operational MongoDB Data for AI

Your MongoDB database is likely a rich source of up-to-date operational data. Shaped's MongoDB connector provides a direct bridge to that data for AI personalization, without complex ETL pipelines and without significantly affecting your application's primary database performance. By securely connecting Shaped, you can turn flexible document data into recommendation and search experiences while handling schema evolution gracefully.

Ready to power personalization with your MongoDB data?

Request a demo of Shaped today to see it in action with your specific use case. Or, start exploring immediately with our free trial sandbox.
