From Flexible Documents to Intelligent Experiences
MongoDB is a leading NoSQL document database, favoured for its flexibility, scalability, and ease of development. It's often used as the primary operational database for applications, storing rich user profiles, dynamic product catalogs, content metadata, and even interaction logs within its flexible document structure. While excellent for application development and storing diverse data, activating this data for sophisticated, real-time AI-driven personalization requires bridging the gap between your operational database and specialized machine learning platforms.
How do you leverage the nested user preferences stored in MongoDB documents to power truly personalized recommendations? How do you keep your AI models updated with the latest product attributes added to your MongoDB catalog collection? How do you train models on interaction data stored across potentially schemaless documents without complex data transformation pipelines? This is where Shaped's dedicated MongoDB connector provides a seamless solution.
Shaped is an AI-native relevance platform designed to connect directly to your MongoDB database, ingest data from specified collections, handle the conversion of BSON documents, train state-of-the-art models, and serve personalized search rankings and recommendations via simple APIs. This post explains the benefits of connecting MongoDB to Shaped and guides you through the integration process.
Why Connect MongoDB to Shaped? Leverage Your Operational Data
Connecting your MongoDB database directly to Shaped allows you to activate the rich, often real-time data stored within your application's core database for powerful personalization and analytics use cases:
- Power Recommendations with Rich Document Data: Utilize the detailed information stored in your MongoDB collections:
  - Deep User Profile Personalization: Incorporate complex user attributes, preferences, or computed segments stored as nested fields within user documents.
  - Dynamic Catalog Awareness: Sync detailed product or content metadata directly from your MongoDB catalog collection, ensuring recommendations reflect the latest attributes (even newly added flexible fields).
  - Contextual Recommendations: Leverage session data or recent interactions stored in MongoDB to provide timely suggestions.
  - "Similar Item" Discovery: Identify related items based on potentially complex attributes and relationships captured within MongoDB documents, combined with behavioral signals.
- Enhance Search with Operational Data: Improve search relevance by leveraging data straight from the source:
  - Attribute-Based Filtering: Use up-to-date attributes synced from your MongoDB catalog for powerful filtering via Shaped's APIs.
  - Personalize Ranking with Profile Data: Tailor search result order based on user profile information stored in MongoDB.
- Flexible Data Ingestion for AI: Easily feed data from MongoDB's flexible schema into structured AI models:
  - Handle Schemaless Data: Shaped ingests the entire MongoDB document as JSON, allowing you to extract relevant fields for model training using flexible query functions (DuckDB JSON functions within Shaped), even if your schema evolves.
  - Sync Interaction Logs: If user events are stored in MongoDB collections, sync them to train behavioral models.
- Simplified Data Pipeline: Avoid building and maintaining complex ETL jobs or CDC (Change Data Capture) systems just to get operational data from MongoDB into an ML platform. Shaped's connector handles the synchronization.
  - Scheduled Syncing: Keep Shaped's models updated by periodically syncing data from MongoDB based on your chosen replication strategy (incremental or full collection).
How it Works: The MongoDB Connector
Shaped connects to your MongoDB instance using a standard MongoDB connection string containing read-only credentials you provide. You specify the database and collection to sync.
Shaped offers two modes for syncing data (replication_mode):
- INCREMENTAL (Default): After an initial sync, Shaped periodically checks for new documents added to the collection. It identifies new documents based on the MongoDB document _id (which generally increases over time) or a custom replication_key field you specify (e.g., created_at). This is efficient for collections where data is primarily appended (like event logs).
- FULL_COLLECTION: On each scheduled run, Shaped reads the entire specified collection from MongoDB. It then uses the _id (or specified unique_keys) to deduplicate records, effectively replacing the dataset in Shaped with the latest snapshot of the collection. This is suitable for catalog collections or user profiles where documents might be updated frequently in place.
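The difference between the two modes can be sketched in a few lines of Python. This is an illustrative simulation over in-memory lists, not Shaped's actual implementation: the function names sync_incremental and sync_full_collection are hypothetical, and the cursor logic (remember the highest replication key seen so far) is an assumption about how such a sync typically works.

```python
from typing import Any

Doc = dict[str, Any]


def sync_incremental(source: list[Doc], dataset: list[Doc],
                     cursor: Any, replication_key: str = "_id") -> Any:
    """Append only documents whose replication key is greater than the cursor.

    Returns the new cursor (the highest key value seen so far).
    """
    new_docs = [d for d in source if cursor is None or d[replication_key] > cursor]
    new_docs.sort(key=lambda d: d[replication_key])
    dataset.extend(new_docs)
    return new_docs[-1][replication_key] if new_docs else cursor


def sync_full_collection(source: list[Doc], unique_key: str = "_id") -> list[Doc]:
    """Re-read everything and keep one record per unique key (last one wins)."""
    latest: dict[Any, Doc] = {}
    for doc in source:
        latest[doc[unique_key]] = doc
    return list(latest.values())


# Hypothetical catalog collection; integer _id values are used for readability.
collection = [{"_id": 1, "price": 10}, {"_id": 2, "price": 20}]

incremental_copy: list[Doc] = []
cursor = sync_incremental(collection, incremental_copy, cursor=None)

# An in-place update and a new document arrive before the next run.
collection[0] = {"_id": 1, "price": 12}
collection.append({"_id": 3, "price": 30})

cursor = sync_incremental(collection, incremental_copy, cursor)
full_copy = sync_full_collection(collection)
```

After the second run, the incremental copy has picked up the new document but still holds the stale price for _id 1, while the full-collection copy reflects the update. That is the trade-off behind choosing FULL_COLLECTION for data that changes in place.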
Data Handling:
Since MongoDB is schemaless, Shaped ingests each BSON document, converts it into a JSON structure, and stores it within a primary document column in the Shaped dataset. Additional metadata columns (_id, internal timestamps, namespace) are also added. When building models or features within Shaped, you use powerful JSON extraction functions (based on DuckDB) to pull out specific nested fields from the document column as needed.
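As a rough picture of the resulting row shape, the sketch below serializes one document to JSON and wraps it with metadata columns. The helper to_dataset_row, the synced_at column name, and the namespace format are hypothetical, and a plain string stands in for a BSON ObjectId so the example runs with the standard library only.

```python
import json
from datetime import datetime, timezone
from typing import Any


def to_dataset_row(doc: dict[str, Any], namespace: str) -> dict[str, Any]:
    """Wrap one MongoDB-style document as a row with a JSON `document` column."""
    return {
        "_id": str(doc["_id"]),
        # default=str turns non-JSON types (e.g. datetime) into strings.
        "document": json.dumps(doc, default=str),
        "namespace": namespace,  # assumed format: "<database>.<collection>"
        "synced_at": datetime.now(timezone.utc).isoformat(),  # hypothetical column
    }


user_doc = {
    "_id": "65f1c0ffee0000000000abcd",  # stand-in for an ObjectId
    "name": "Ada",
    "preferences": {"genres": ["sci-fi", "drama"], "language": "en"},
    "created_at": datetime(2024, 3, 1, tzinfo=timezone.utc),
}

row = to_dataset_row(user_doc, namespace="app.users")
restored = json.loads(row["document"])
print(restored["preferences"]["genres"])
```

The nested preferences survive the round trip intact, which is what lets you pull out individual fields later instead of flattening the schema up front.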
Connecting MongoDB to Shaped

The setup involves creating a read-only user in MongoDB, ensuring network accessibility, and configuring the dataset in Shaped.
Step 1: Prepare MongoDB - Create Read-Only User & Allow Network Access
- Create Read-Only User: For security, create a dedicated MongoDB user with only read permissions on the specific database or collection Shaped needs to access.
  - Connect to your MongoDB instance using an admin account.
  - Switch to the target database (use your_database_name;).
  - Execute the db.createUser command to create a user with the read role on the database, or create a custom role granting only the find action on the specific collection for tighter security (see docs example).
  - Securely store the username (shaped_readonly) and password you created.
- IP Allowlisting: MongoDB instances (especially managed ones like Atlas) often restrict incoming connections based on IP addresses. You will need to contact the Shaped team to obtain the specific IP addresses used by the Shaped connector and add them to your MongoDB instance's network access list / IP allowlist.
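The tighter, collection-level option can be sketched as two MongoDB database commands, createRole and createUser, built here as plain Python dictionaries. The role name shaped_find_only, the collection name, and the helper function are placeholders for illustration. In practice you would send each command through a driver (for example pymongo's Database.command) or run the equivalent db.createRole and db.createUser calls in the MongoDB shell.

```python
from typing import Any


def readonly_user_commands(database: str, collection: str,
                           username: str, password: str) -> list[dict[str, Any]]:
    """Build createRole/createUser command documents for a find-only user."""
    role_name = "shaped_find_only"  # hypothetical role name
    create_role = {
        "createRole": role_name,
        "privileges": [
            {
                "resource": {"db": database, "collection": collection},
                "actions": ["find"],  # read documents, nothing else
            }
        ],
        "roles": [],  # no inherited roles
    }
    create_user = {
        "createUser": username,
        "pwd": password,
        "roles": [{"role": role_name, "db": database}],
    }
    return [create_role, create_user]


commands = readonly_user_commands(
    database="your_database_name",
    collection="your_collection_name",  # placeholder
    username="shaped_readonly",
    password="replace-with-a-strong-password",
)

# With pymongo installed and an admin connection, each command could be sent as:
#   client["your_database_name"].command(cmd)
for cmd in commands:
    print(next(iter(cmd)))  # name of each command, in order
```

The role must be created before the user that references it, which is why the commands are returned in that order.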
Step 2: Configure the Shaped Dataset (YAML)
Define the MongoDB connection details and sync parameters in a Shaped dataset configuration file.
Create a YAML file (e.g., mongodb_dataset.yaml):
Key Configuration Points:
- mongodb_connection_string: Ensure this is correctly formatted with the read-only credentials, host, port, and target database. URL-encode special characters in the password if necessary.
- collection & database: Specify the exact source collection and database.
- replication_mode: Choose carefully based on your data characteristics. INCREMENTAL is efficient for append-only data like events. FULL_COLLECTION is better for data that gets updated in place, like product catalogs or user profiles, ensuring Shaped always has the latest version.
- replication_key: Only needed for INCREMENTAL mode if you want to use a field other than _id for tracking new documents.
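For the URL-encoding point, here is a small sketch that builds a standard mongodb:// URI with Python's urllib.parse.quote_plus. The host, port, and credentials are placeholders, and the helper name build_mongodb_uri is hypothetical; an Atlas deployment would typically use the mongodb+srv:// scheme instead.

```python
from urllib.parse import quote_plus


def build_mongodb_uri(username: str, password: str, host: str,
                      port: int, database: str) -> str:
    """Return a mongodb:// URI with percent-encoded credentials."""
    user = quote_plus(username)
    pwd = quote_plus(password)  # escapes characters such as @ : / %
    return f"mongodb://{user}:{pwd}@{host}:{port}/{database}"


uri = build_mongodb_uri(
    username="shaped_readonly",
    password="p@ss:w/rd",          # placeholder with reserved characters
    host="mongo.example.com",      # placeholder host
    port=27017,
    database="your_database_name",
)
print(uri)
# mongodb://shaped_readonly:p%40ss%3Aw%2Frd@mongo.example.com:27017/your_database_name
```

Without the encoding step, the @ and : in the password would be read as URI delimiters and the connection string would fail to parse.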
Step 3: Create the Dataset in Shaped
Use the Shaped CLI to create the dataset using your YAML configuration:
Shaped will validate the configuration, attempt to connect to your MongoDB instance (ensure IP allowlisting is done!), and begin the data sync process based on your chosen replication_mode. Monitor the status via the Shaped Dashboard or CLI (shaped view-dataset --dataset-name your_mongodb_dataset_name).
What Happens Next? Ingesting Documents, Training Models

Once connected:
- Data Sync: Shaped connects to MongoDB on the schedule_interval (default: hourly) and performs either an incremental or full collection sync based on replication_mode.
- JSON Conversion: BSON documents are converted to JSON and stored in the document column within Shaped.
- Model Training: Shaped uses this synced data for training. When defining features for your models in Shaped, you'll use JSON extraction functions (e.g., json_extract_string(document, '$.path.to.field')) to access specific fields within the JSON documents.
- API Serving: Trained models power Shaped's APIs, serving personalized results derived from your MongoDB data.
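To make the path syntax concrete, the following sketch mimics a simple dotted-path lookup such as '$.path.to.field' in pure Python. It only illustrates the semantics (walk the nested keys, return a string or nothing). It is not DuckDB's implementation, it ignores array indexing, and the function name extract_string is hypothetical.

```python
import json
from typing import Optional


def extract_string(document: str, path: str) -> Optional[str]:
    """Look up a dotted path like '$.a.b' in a JSON string.

    Returns the value as a string, or None if any key along the path is missing.
    """
    if not path.startswith("$."):
        raise ValueError("path must start with '$.'")
    current = json.loads(document)
    for key in path[2:].split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    # Strings are returned as-is; other values are re-serialized as JSON text.
    return current if isinstance(current, str) else json.dumps(current)


document = json.dumps({
    "title": "Trail Shoe",
    "attributes": {"brand": "Acme", "sizes": [41, 42]},
})

brand = extract_string(document, "$.attributes.brand")
sizes = extract_string(document, "$.attributes.sizes")
missing = extract_string(document, "$.attributes.color")
```

Returning nothing for a missing path, rather than raising an error, is what makes this style of extraction tolerant of documents that lack a field or gained it only recently.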
Conclusion: Activate Your Operational MongoDB Data for AI
Your MongoDB database is likely a rich source of up-to-date operational data. Shaped's MongoDB connector provides a direct bridge to that data for sophisticated AI personalization, without complex ETL and without significantly affecting the performance of your application's primary database. By securely connecting Shaped, you can transform flexible document data into powerful recommendation and search experiences, handling schema evolution gracefully and activating your core data assets for intelligent action.
Ready to power personalization with your MongoDB data?
Request a demo of Shaped today to see it in action with your specific use case. Or, start exploring immediately with our free trial sandbox.