Quick Answer: How to Prevent Repetitive Agent Recommendations
Building AI agents that never repeat the same recommendations requires implementing stateful filtering at the retrieval layer. Here's what you need to know:
- Use prebuilt('exclude_seen') filters to automatically filter out items users have already interacted with
- Track interactions in real-time by storing user-item engagement in an interaction table
- Apply filters after retrieval but before scoring so you never spend compute on items you'll discard
- Configure personal filters once in your engine config, then reference them in any query
- Achieve sub-50ms filtering on millions of items without manual state management
- Avoid the trap of client-side deduplication which creates race conditions and scales poorly
Nothing kills an agent's UX faster than watching it recommend the same three articles, products, or restaurants you've already seen. Users lose trust immediately when an AI assistant ignores their history and serves up content they've already engaged with. Yet most retrieval systems handle this poorly—either ignoring the problem entirely or implementing fragile client-side solutions that break under load.
The $800K Problem: When Agents Forget What Users Have Seen
A leading travel booking platform discovered their AI concierge was costing them $800,000 per month in lost revenue. The culprit? Their recommendation engine kept suggesting hotels users had already viewed, declined, or booked. Users would ask "What are some good hotels in Paris?" three times in a conversation, and each time the agent would confidently recommend the same five properties.
The engineering team tried multiple fixes. First, they implemented client-side deduplication—tracking seen items in the session state and filtering them out before displaying results. This worked in development but created race conditions in production when users opened multiple tabs or switched devices. Then they attempted a Redis cache with 24-hour TTLs to track viewed items, which added 200ms of latency to every query and still missed interactions that happened in the last few seconds.
The real issue ran deeper than implementation details. Their architecture fundamentally separated retrieval from state management. The vector search engine had no concept of user history. The recommendation model knew nothing about what had already been shown. The application layer tried to bridge this gap, but by then it was too late—the damage was done in wasted compute, poor rankings, and frustrated users.
This pattern repeats across industries. Content platforms re-suggest articles users read yesterday. E-commerce sites show purchased items in "recommended for you" feeds. Restaurant apps propose venues where users already have reservations. Each repetition erodes trust and wastes valuable recommendation slots that could introduce genuinely new options.
The Traditional Approach: Why Client-Side Filtering Fails
Most engineering teams start with the obvious solution: filter out seen items in the application layer after retrieval. The logic seems sound—pull a large set of candidates from the retrieval system, then remove items the user has already interacted with before displaying results.
Here's what that typically looks like:
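A minimal sketch of the pattern, with hypothetical vector_search, get_seen_items, and mark_seen helpers standing in for your retrieval system and state store:

```python
from typing import NamedTuple

class Item(NamedTuple):
    id: str
    score: float

# Hypothetical stand-ins for your retrieval system and state store.
def vector_search(query: str, limit: int) -> list[Item]: ...
def get_seen_items(user_id: str) -> set[str]: ...
def mark_seen(user_id: str, item_ids: list[str]) -> None: ...

def recommend(user_id: str, query: str, k: int = 20) -> list[str]:
    # Over-fetch, because an unknown fraction of candidates will be discarded.
    candidates = vector_search(query, limit=100)
    seen = get_seen_items(user_id)                      # state-store read
    unseen = [c for c in candidates if c.id not in seen]
    results = [c.id for c in unseen[:k]]
    mark_seen(user_id, results)                         # state-store write
    return results
```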
This approach has five critical failures that become apparent under production load:
1. Wasted Compute and Cost. You're embedding the query, searching the vector index, computing similarity scores, and retrieving metadata for potentially hundreds of items you'll immediately discard. If 60% of candidates are already seen (common for engaged users), you're burning 60% of your compute budget on items you'll never show. At scale, that means paying for roughly 150,000 wasted candidate retrievals per day on top of the 100,000 useful ones.
2. Inconsistent Ranking Quality. Retrieving the top 100 candidates and then filtering out 60 seen items leaves you ranking only the 40 survivors, and for engaged users those survivors skew toward the bottom of the true similarity ranking; in the worst case the seen items were your 60 most relevant results, and everything you show comes from positions 61-100. The user gets recommendations from the second tier of quality, not the first. Your carefully tuned ranking model optimizes for the wrong set.
3. Race Conditions and State Sync. User opens three tabs, each querying recommendations. Tab 1 gets items A, B, C. Tab 2 queries before Tab 1 updates the seen set, also gets A, B, C. Tab 3 now sees A, B, C again. Multi-device usage compounds this. The same user on mobile and desktop sees duplicates because state synchronization has ~2-5 second lag.
4. Database Pressure from State Lookups. Every query hits your state store (Redis, PostgreSQL, etc.) twice: once to fetch the seen set before filtering, once to update it after showing results. For a system serving 10,000 queries per second, that's 20,000 database operations per second just for deduplication. Your state store becomes the bottleneck, not your ML model.
5. Complex Application Logic. Your application code now handles state management, deduplication, re-ranking after filtering, and seen-set updates. This complexity leaks across services. Your recommendation API needs to know about state. Your web frontend needs to track interactions. Your mobile app implements a different version. Testing becomes a nightmare because behavior depends on temporal state.
Here's what the architecture looks like with traditional client-side filtering:
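In sketch form, reduced to its hops:

```text
Client ──> Recommendation API ──> Vector DB             retrieve ~100 candidates
                  │
                  ├──> State store (read)               fetch seen set
                  │    filter + re-rank in app code
                  ├──> Ranking model                    score the survivors
                  └──> State store (write)              record shown items
```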
Every request requires four network hops, two database queries, and filtering logic spread across three services. Latency ranges from 150-300ms depending on cache hit rates. Error handling is complex because failures can occur at each step. And the whole system gets more fragile as the seen set grows—users with thousands of interactions take longer to filter than new users.
The fundamental mistake is treating retrieval and state as separate concerns. Your retrieval system should understand user history natively, filtering candidates before expensive scoring operations. This isn't just about performance—it's about architectural coherence. When retrieval knows what's been seen, it can make better decisions about what to retrieve in the first place.
The Shaped Edge: Stateful Retrieval with Prebuilt Filters
Shaped takes a different approach: state-aware retrieval where the engine knows about user history and applies filters at the optimal point in the pipeline. Instead of fetching candidates then filtering in application code, you declare filters once in your engine configuration and reference them in queries.
Here's the complete architecture with Shaped's prebuilt filters:
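At a high level, the flow collapses into a single engine call:

```text
Client ──> Shaped engine: retrieve ──> filter (exclude_seen) ──> score ──> reorder ──> results
                               ▲
           interaction stream keeps the per-user filter index current
```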
Step 1: Define a Personal Filter in Engine Config
First, you configure the filter in your engine YAML. This tells Shaped what data constitutes "seen" items and how to match them to users.
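An illustrative sketch: personal_filter and the user_interactions table come from this article, while the surrounding keys are assumptions about the config schema.

```yaml
# Illustrative sketch, not a verbatim schema: personal_filter and
# user_interactions come from this article; other keys are assumptions.
filters:
  - name: exclude_seen
    type: personal_filter
    query: |
      SELECT user_id, item_id
      FROM user_interactions
```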
This configuration creates a personal filter dataset that maps users to items they've interacted with. The personal_filter type tells Shaped to maintain a per-user index of seen items. The filter automatically updates as new interactions arrive in the user_interactions table.
Step 2: Apply the Filter in Queries
Now you can reference this filter in any query using prebuilt():
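A sketch of what that looks like: prebuilt('exclude_seen') is from this article, while the builder method names around it are assumptions about the SDK.

```python
from shaped import RankQueryBuilder, Similarity

# Hypothetical builder usage; method names are assumptions about the SDK.
query = (
    RankQueryBuilder()
    .retrieve(Similarity(query_text="good hotels in Paris", limit=100))
    .filter("prebuilt('exclude_seen')")
    .limit(10)
)
results = query.execute(user_id="user_123")
```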
The filter runs after retrieval but before scoring, removing seen items from the candidate set before expensive model inference. This is the optimal placement in the pipeline—you only score items the user hasn't seen.
Step 3: Track Interactions in Real-Time
Shaped automatically ingests new interactions from your interaction table. When a user clicks, views, or purchases an item, that interaction flows into the filter dataset without manual synchronization:
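Your application simply keeps writing to the interaction table it already logs to; for example:

```sql
-- A click lands in the interaction log; the exclude_seen filter
-- picks it up automatically through the connector.
INSERT INTO user_interactions (user_id, item_id, interaction_type, created_at)
VALUES ('user_123', 'item_987', 'click', NOW());
```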
How This Changes the Architecture
Compare the data flow:
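```text
Before (client-side):
  app ──> vector DB ──> state store (read) ──> filter in app
      ──> ranking model ──> state store (write)
  4 network hops, 2 state-store queries per request

After (prebuilt filter):
  app ──> Shaped engine (retrieve → filter → score → reorder) ──> app
  1 round trip; interactions stream into the filter independently
```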
The filter runs in the same system that handles retrieval and scoring. No state synchronization across services. No race conditions. No manual cache invalidation. The engine maintains the seen set as a materialized view that updates in real-time as interactions stream in.
Why This Works: The Four-Stage Pipeline with Optimal Filter Placement
Understanding where filtering happens in the retrieval pipeline explains why prebuilt filters outperform client-side approaches. Modern ranking systems operate in four stages:
Stage 1: Retrieval pulls candidate items from indexes. This is cheap (vector similarity, lexical search) but imprecise—you retrieve 50-1000 candidates knowing most won't be perfect.
Stage 2: Filtering removes unwanted items based on business rules. This is where prebuilt('exclude_seen') runs. Filtering after retrieval but before scoring saves compute on items you'll discard anyway.
Stage 3: Scoring applies expensive ML models to predict engagement, conversion, or other objectives. This is your most computationally costly step—you want to score only items that passed filtering.
Stage 4: Reordering applies diversity and exploration to avoid echo chambers and surface variety.
Here's where traditional client-side filtering gets the order wrong:
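```text
Client-side ordering (wrong):
  1. Retrieval ──> 3. Scoring ──> filter in app ──> truncate
  (expensive ML inference runs on seen items that are then discarded)

Prebuilt filter ordering (right):
  1. Retrieval ──> 2. Filtering ──> 3. Scoring ──> 4. Reordering
  (the model only ever scores items that can actually be shown)
```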
By filtering before scoring, you avoid running ML inference on items you'll immediately discard. For a model with 50ms inference time, filtering 60 items saves 3,000ms of compute per query. At 10,000 queries per second, that's 8.3 hours of compute saved every second.
The placement matters for ranking quality too. Filtering immediately after retrieval means scoring works with every unseen item from the full similarity-ranked candidate set. If you retrieve the top 100 by similarity and filter in-engine, you're selecting from the most relevant unseen items, not from whatever happens to survive over-fetch-then-filter.
Implementation Guide: Building Exclusion Filters Step by Step
Let's build a complete recommendation system with exclude_seen filtering, walking through each component.
Setting Up Your Data Schema
Start by defining your item table and interaction table. The interaction table tracks every user engagement:
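A representative schema; column names beyond user_id, item_id, and the timestamp are illustrative:

```sql
CREATE TABLE products (
    item_id     TEXT PRIMARY KEY,
    title       TEXT NOT NULL,
    description TEXT,
    category    TEXT,
    price       NUMERIC
);

CREATE TABLE user_interactions (
    user_id          TEXT NOT NULL,
    item_id          TEXT NOT NULL REFERENCES products (item_id),
    interaction_type TEXT NOT NULL,   -- 'view', 'click', 'purchase', 'shown'
    created_at       TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_interactions_user_time
    ON user_interactions (user_id, created_at);
```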
Configuring the Engine with Filters
Now create your engine configuration with the exclude_seen filter:
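An illustrative sketch of that configuration; the exact keys and type names are assumptions about Shaped's config schema, but the pieces map one-to-one onto the list below:

```yaml
# Illustrative sketch only; keys and type names are assumptions.
embeddings:
  - name: product_text
    type: content                  # semantic search over product text
    input: "products.title || ' ' || products.description"
  - name: user_item_cf
    type: collaborative            # ALS over user-item interactions
    interactions: user_interactions

models:
  - name: ctr
    objective: click_through_rate  # predicts probability of a click

filters:
  - name: exclude_seen
    type: personal_filter
    query: |
      SELECT user_id, item_id
      FROM user_interactions
      WHERE created_at > NOW() - INTERVAL '90 days'
```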
This configuration creates:
- A content embedding for semantic search over product text
- A collaborative filtering embedding (ALS) based on user-item interactions
- A click-through rate prediction model
- An exclude_seen filter that looks at the last 90 days of interactions
Building Queries with Exclusion
With the filter configured, you can now build queries that automatically exclude seen items. The sketch below keeps the SDK import as given; the builder methods chained after it are assumptions about the API:
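```python
from shaped import RankQueryBuilder, Similarity, ColumnOrder

# Hypothetical query combining semantic retrieval with a popularity
# retriever; builder method names are assumptions about the SDK.
query = (
    RankQueryBuilder()
    .retrieve(Similarity(query_text="wireless headphones", limit=100))
    .retrieve(ColumnOrder(column="popularity", descending=True, limit=50))
    .filter("prebuilt('exclude_seen')")
    .limit(20)
)
results = query.execute(user_id="user_123")
for item in results:
    print(item.item_id, item.score)
```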
Handling Edge Cases
New users with no history: The filter gracefully handles users with zero interactions—it simply doesn't filter anything, returning pure ranked results.
Combining multiple filters: You can apply multiple prebuilt filters in a single query:
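For instance (exclude_out_of_stock is an assumed second filter defined in the engine config; builder methods as before are assumptions):

```python
# Hypothetical: two prebuilt filters chained in one query.
query = (
    RankQueryBuilder()
    .retrieve(Similarity(query_text="weekend reading", limit=200))
    .filter("prebuilt('exclude_seen')")
    .filter("prebuilt('exclude_out_of_stock')")  # assumed second filter
    .limit(20)
)
```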
Filter-specific interaction types: Different recommendation surfaces might exclude different interaction types. A "browse again" widget might exclude purchases but allow views, while a main feed excludes everything:
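Sketched as two filters in the engine config, using the same names the FAQ below references (keys are assumptions):

```yaml
# Illustrative: one filter per surface.
filters:
  - name: exclude_all_interactions   # main feed: hide everything seen
    type: personal_filter
    query: |
      SELECT user_id, item_id
      FROM user_interactions
  - name: exclude_only_purchases     # "browse again": hide purchases only
    type: personal_filter
    query: |
      SELECT user_id, item_id
      FROM user_interactions
      WHERE interaction_type = 'purchase'
```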
Advanced Patterns: Beyond Basic Exclusion
Once you have basic exclusion working, several advanced patterns become possible.
Pattern 1: Category-Aware Exclusion for Diversity
Instead of excluding all seen items, exclude items from categories the user has already explored recently. This maintains diversity while avoiding exact duplicates:
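Assuming, as the FAQ below suggests, that a filter query can be any SQL producing (user_id, item_id) pairs to exclude, one way to sketch it:

```sql
-- Illustrative: exclude every item in a category the user has
-- engaged with during the past 7 days.
SELECT i.user_id, p_other.item_id
FROM user_interactions i
JOIN products p_seen  ON p_seen.item_id   = i.item_id
JOIN products p_other ON p_other.category = p_seen.category
WHERE i.created_at > NOW() - INTERVAL '7 days'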
Pattern 2: Time-Decayed Exclusion
Instead of hard filtering, apply a penalty to recently seen items using value models. Items seen yesterday get a large penalty, items seen 30 days ago get a small penalty:
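The decay itself is simple; here's a sketch of the multiplier such a value model would apply (the wiring into Shaped's value models is an assumption, the shape of the curve is the point):

```python
import math

def seen_penalty(last_seen_days: float) -> float:
    """Score multiplier for previously seen items: ~0.22 for an item
    seen yesterday, ~0.99 for one seen 30 days ago."""
    return 1.0 - 0.9 * math.exp(-last_seen_days / 7.0)
```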
This requires adding a view that computes last_seen_days:
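```sql
-- Illustrative view: days since each user last touched each item.
CREATE VIEW user_item_last_seen AS
SELECT
    user_id,
    item_id,
    EXTRACT(EPOCH FROM (NOW() - MAX(created_at))) / 86400.0 AS last_seen_days
FROM user_interactions
GROUP BY user_id, item_id;
```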
Pattern 3: Collaborative Exclusion
Exclude items that similar users have already engaged with extensively. This is useful for discovery feeds where you want to surface items that your peer group hasn't saturated:
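A sketch, assuming a precomputed (hypothetical) user_neighbors table of peer relationships:

```sql
-- Illustrative: exclude items that at least 5 of a user's nearest
-- neighbors have already engaged with.
SELECT n.user_id, i.item_id
FROM user_neighbors n
JOIN user_interactions i ON i.user_id = n.neighbor_id
GROUP BY n.user_id, i.item_id
HAVING COUNT(DISTINCT n.neighbor_id) >= 5
```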
Pattern 4: Session-Aware Filtering
For conversational agents, exclude items mentioned earlier in the current conversation without excluding items from previous sessions:
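One way to sketch this, using the item.item_id NOT IN clause described in the FAQ below (the params argument is an assumption about the SDK):

```python
from shaped import RankQueryBuilder, Similarity

# Session-local exclusion: hide items already surfaced in this
# conversation, while cross-session history stays available.
session_items = ["item_42", "item_77"]   # tracked by the agent per session

query = (
    RankQueryBuilder()
    .retrieve(Similarity(query_text="good hotels in Paris", limit=100))
    .filter("item.item_id NOT IN $session_items",
            params={"session_items": session_items})
    .limit(10)
)
```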
Pattern 5: Progressive Exploration
Combine exclusion with exploration to progressively expand the user's horizon. Start with close-to-history recommendations, then gradually introduce more diverse items:
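A sketch of the idea; the diversity knob on the reordering stage is an assumption, but the progression is the point:

```python
from shaped import RankQueryBuilder, Similarity

# Hypothetical: dial exploration up as the user's history grows, so
# results drift from close-to-history toward more diverse items.
seen_count = 250                               # e.g., from your analytics
exploration = min(0.5, seen_count / 1000.0)    # 0.0 -> 0.5 as history grows

query = (
    RankQueryBuilder()
    .retrieve(Similarity(query_text="for you", limit=200))
    .filter("prebuilt('exclude_seen')")
    .diversity(exploration)                    # assumed reordering parameter
    .limit(20)
)
```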
Hot Take: Client-Side Deduplication Is an Anti-Pattern
Here's a controversial opinion: if you're filtering seen items in your application code, you're doing it wrong. Full stop.
The industry has normalized this pattern. Nearly every recommendation tutorial shows filtering in the app layer. Major platforms run variations of this approach at scale. But that doesn't make it right—it makes it cargo-culted technical debt.
Filtering belongs in the retrieval system, not the application layer. Here's why this isn't just an optimization, it's an architectural imperative:
Retrieval systems should be state-aware by design. Just like databases maintain indexes and constraints, retrieval engines should maintain user state and apply it during candidate selection. Offloading this to the application is like offloading SQL WHERE clauses to application code—technically possible, but architecturally wrong.
State and ranking are coupled, not separate. You can't rank effectively without knowing what the user has seen. A model that predicts "click-through rate" needs to know if the item is novel or repeated. Separating state from ranking means your model optimizes for the wrong objective.
Scaling requires pushing logic down the stack. As query volume grows, you need systems that can apply filters efficiently at the data layer. Pulling candidates into application memory to filter them costs network transfer and CPU that grow linearly with traffic. Filtering in the engine runs against pre-built indexes, so per-query cost stays close to constant even as seen sets grow.
The resistance to this idea comes from how we've historically built these systems. Vector databases didn't support user state, so we added it in Redis. Recommendation APIs didn't track interactions, so we added event streams. We've built complex distributed systems to compensate for retrieval engines that lack basic stateful capabilities.
But modern retrieval platforms like Shaped prove this complexity is unnecessary. When retrieval natively understands user history, filtering becomes declarative configuration instead of imperative code. You express "exclude items the user has seen" once in a config file, not repeatedly in every service that calls the API.
When to Use Exclude Seen (and When Not To)
Prebuilt exclusion filters aren't appropriate for every scenario. Here's a framework for deciding when they make sense:
Use Exclude Seen When:
You have high-engagement users. If users typically interact with 50+ items, exclusion prevents frustrating repetition. Content platforms, e-commerce sites, and social feeds fall into this category.
Recommendations refresh frequently. If users return daily or weekly expecting fresh content, exclusion is essential. Without it, you'll show the same popular items every time.
Your catalog is large relative to engagement. With 10,000 items and users who interact with 100, you have plenty of unseen options to recommend. Exclusion improves quality without exhausting the catalog.
You track interactions already. If you're logging views, clicks, or purchases for analytics, you already have the data needed for exclusion filters. The marginal cost is minimal.
You want to optimize for discovery. Exclusion forces the system to surface items users haven't found yet, expanding their awareness of your catalog.
Skip Exclude Seen When:
You have a small catalog. With 100 items and active users, exclusion might eliminate most of your inventory. A user who's seen 80 items has only 20 options left—you might want to re-recommend rather than scrape the bottom of the catalog.
Re-engagement is the goal. Email digest recommendations might benefit from showing items users engaged with before but didn't complete (e.g., "finish watching this series"). Exclusion would hide these opportunities.
Interactions don't signal saturation. A view doesn't mean the user is done with an item. News articles users "viewed" by scrolling past might deserve a second chance. In these cases, time-decay or soft penalties work better than hard exclusion.
You're doing pure exploration. For "random" or "serendipity" features where users explicitly want surprise, exclusion works against the goal. Let them encounter familiar items mixed with new ones.
Cold start is your primary challenge. If most users are new with few interactions, exclusion doesn't buy you much. Focus on solving cold start before worrying about repetition.
Decision Matrix

| Factor | Use exclude_seen | Skip or soften it |
|---|---|---|
| Catalog vs. engagement | Large catalog, heavy engagement | Small catalog users have mostly seen |
| Primary goal | Discovery, fresh content | Re-engagement ("finish watching") |
| What a view means | Saturation: user is done with it | Item deserves a second chance |
| Surface | Main feeds, daily/weekly returns | Serendipity or "random" features |
| User base | Established users with history | Mostly cold-start users |
Common Pitfalls When Implementing Exclusion Filters
Even with prebuilt filters, there are ways to shoot yourself in the foot. Here are the five mistakes teams make most often:
Pitfall 1: Filtering Too Aggressively
Excluding everything a user has ever interacted with can backfire. A user who's been on your platform for three years might have "seen" 80% of your catalog. With aggressive filtering, you're left recommending from the dregs—low-quality items that didn't make the cut in previous sessions.
Solution: Use time windows or interaction types selectively. Exclude views from the last 30 days, but allow views from six months ago. Exclude purchases permanently (they own it), but allow clicks to resurface after time passes.
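Sketched as the filter's SQL, with purchases excluded permanently and views expiring after 30 days:

```sql
-- Illustrative: purchases never resurface; views expire after 30 days.
SELECT user_id, item_id
FROM user_interactions
WHERE interaction_type = 'purchase'
   OR (interaction_type = 'view'
       AND created_at > NOW() - INTERVAL '30 days')
```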
Pitfall 2: Ignoring Filter Performance
Filters run on every query. If your filter query scans millions of rows or does complex joins, you'll add 50-100ms to every request. Users notice this latency.
Solution: Ensure your filter query has proper indexes and uses efficient joins. The filter table should be materialized or use indexed lookups, not table scans.
Pitfall 3: Not Tracking Impressions
You might exclude items the user clicked but forget to exclude items they scrolled past without engaging. If you show the same 20 items across multiple sessions because the user never clicked, you're still being repetitive.
Solution: Track impressions (items shown to the user) in addition to engagement. Update your interaction table with "shown" events.
Pitfall 4: Forgetting to Test Cold Start
Your exclusion logic might work great for active users but break the experience for new users with no history. An empty filter returns all candidates, but if you're not prepared for this, you might have bugs.
Solution: Explicitly test with users who have zero interactions. Make sure the fallback behavior (no filtering) is intentional and produces good results.
Pitfall 5: Over-Filtering in Multi-Retrieval Queries
When you use multiple retrievers (content similarity + collaborative filtering + trending), and all three return overlapping items, aggressive filtering might eliminate most candidates before scoring.
Solution: Retrieve more candidates per retriever (e.g., 100 instead of 50) when using exclusion filters. This ensures you have enough candidates after filtering.
FAQ: Common Questions About Exclude Seen Filters
Q: How do I exclude items from the current session without affecting other sessions?
A: Use a separate session-based filter or pass session-specific items as parameters. For session context, you can use the item.item_id NOT IN clause with parameters:
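```python
# Hypothetical parameterized filter: the NOT IN clause comes from the
# answer above; the params argument is an assumption about the SDK.
query = query.filter(
    "item.item_id NOT IN $session_items",
    params={"session_items": ["item_42", "item_77"]},
)
```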
Q: What happens if the filter removes all candidates?
A: If all retrieved candidates are filtered out, you'll get fewer results than requested. To prevent empty results, retrieve significantly more candidates than you need (e.g., retrieve 200 to return 20 final results). You can also add fallback retrievers like popularity ranking that don't depend on personalization.
Q: Can I exclude items based on complex rules beyond simple seen/not-seen?
A: Yes. The filter query can use any SQL logic. For example, exclude items from categories the user has bought from five times, or exclude items similar to ones they've returned:
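For example, the category-saturation rule could be sketched as:

```sql
-- Illustrative: exclude all items in categories the user has purchased
-- from five or more times.
WITH saturated AS (
    SELECT i.user_id, p.category
    FROM user_interactions i
    JOIN products p ON p.item_id = i.item_id
    WHERE i.interaction_type = 'purchase'
    GROUP BY i.user_id, p.category
    HAVING COUNT(*) >= 5
)
SELECT s.user_id, p.item_id
FROM saturated s
JOIN products p ON p.category = s.category
```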
Q: How often do filters update when new interactions come in?
A: Shaped maintains filters as materialized views that update in real-time as new data arrives in your interaction table. The latency depends on your connector—streaming connectors (Kafka, PostgreSQL CDC) update within seconds, batch connectors within 15 minutes. For most applications, this is fast enough. If you need instant updates, use a streaming connector.
Q: Can I have different exclusion rules for different recommendation surfaces?
A: Absolutely. Create multiple filters with different names and reference them in the appropriate queries:
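For example (same sketch as earlier; config keys are assumptions):

```yaml
filters:
  - name: exclude_all_interactions
    type: personal_filter
    query: SELECT user_id, item_id FROM user_interactions
  - name: exclude_only_purchases
    type: personal_filter
    query: |
      SELECT user_id, item_id
      FROM user_interactions
      WHERE interaction_type = 'purchase'
```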
Then use prebuilt('exclude_all_interactions') in main feeds and prebuilt('exclude_only_purchases') in browse-again widgets.
Q: What's the performance impact of filtering on large user histories?
A: Negligible. Shaped maintains filters as pre-computed indexes optimized for lookups. Even for users with 100,000 interactions, filter application adds <5ms to query latency. The filter is essentially a hash set lookup, not a table scan.
Next Steps: Implementing Exclude Seen in Your System
Here's how to go from reading this article to having exclude_seen filters running in production:
Quick Start (Under 30 Minutes)
Step 1: Verify Your Interaction Data (5 min)
Check that you have an interaction table with user_id, item_id, and timestamp columns. If you don't, create one:
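```sql
-- Minimal interaction log (illustrative column names).
CREATE TABLE user_interactions (
    user_id          TEXT NOT NULL,
    item_id          TEXT NOT NULL,
    interaction_type TEXT NOT NULL,
    created_at       TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_interactions_user_time
    ON user_interactions (user_id, created_at);
```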
Step 2: Add Filter to Engine Config (10 min)
Open your engine YAML and add the filters section:
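```yaml
# Keys beyond the names used in this article are assumptions.
filters:
  - name: exclude_seen
    type: personal_filter
    query: |
      SELECT user_id, item_id
      FROM user_interactions
      WHERE created_at > NOW() - INTERVAL '90 days'
```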
Step 3: Update One Query to Use the Filter (10 min)
Modify an existing recommendation query to include the filter:
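```python
from shaped import RankQueryBuilder, Similarity

# The only functional change is the .filter(...) line.
# Builder method names are assumptions about the SDK.
user_query = "running shoes"
query = (
    RankQueryBuilder()
    .retrieve(Similarity(query_text=user_query, limit=100))
    .filter("prebuilt('exclude_seen')")   # new
    .limit(20)
)
```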
Step 4: Deploy and Test (5 min)
Deploy your updated engine config and test with a user who has interaction history:
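A hypothetical smoke test (fetch_seen_item_ids stands in for however you read the interaction log):

```python
# Results for a user with history should never overlap their seen set.
results = query.execute(user_id="user_with_history")
seen = fetch_seen_item_ids("user_with_history")  # hypothetical helper
overlap = {r.item_id for r in results} & seen
assert not overlap, f"Filter leaked seen items: {overlap}"
```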
Production Checklist
Before rolling out to all users, verify these items:
- [ ] Filter query has proper indexes on timestamp and user_id columns
- [ ] Time window for exclusion is appropriate (30-90 days typically)
- [ ] Cold start users (no interactions) get reasonable results
- [ ] Candidate retrieval sizes are increased to account for filtering (retrieve 100+ to return 20)
- [ ] Monitoring is in place for query latency and filter performance
- [ ] Edge cases are handled (all candidates filtered, user with 10,000+ interactions)
- [ ] Interaction logging is reliable and captures all relevant events
- [ ] Different interaction types (view, click, purchase) have appropriate filters
- [ ] A/B test is configured to measure impact on engagement metrics
Getting Help
If you encounter issues:
- Documentation: Full filter reference at docs.shaped.ai
- Slack Community: Join the Shaped community Slack for real-time help
- Support: Email support@shaped.ai with your engine config and error logs
- Office Hours: Schedule a session with the Shaped team for a demo
Conclusion: Stateful Retrieval Is the Future
The shift from stateless to stateful retrieval systems mirrors the evolution of databases in the 1970s. Early database systems were simple stores—you queried them, they returned data, that's it. As applications grew complex, databases absorbed more logic: constraints, triggers, stored procedures, and indexes. The same progression is happening in retrieval.
For too long, we've treated retrieval engines as stateless functions: send a query, get candidates, filter and rank in application code. This made sense when vector databases were pure similarity search, nothing more. But modern retrieval demands more—personalization, behavioral signals, business rules, and yes, state awareness.
Exclude_seen filters represent a small but significant piece of this evolution. They move state management from fragile application logic into the retrieval layer where it belongs. The result is faster queries, simpler code, better rankings, and users who trust your recommendations because they never see the same content twice.
The broader trend is clear: retrieval systems will absorb more intelligence traditionally implemented in application layers. Filtering, ranking, personalization, exploration, and real-time adaptation will all migrate down the stack into purpose-built engines. The applications that embrace this shift early will scale better and iterate faster than those clinging to the old client-side patterns.
Start small. Add exclude_seen to one recommendation surface. Measure the impact on engagement and user satisfaction. Then expand from there. Your users will notice the difference immediately—and they'll wonder why they ever had to see the same recommendations twice.
Ready to implement stateful retrieval? Try Shaped for free with $100 credits here.