Your Agent's Retrieval Is Broken. Here's What We Built to Fix It.

Your AI agent hallucinates. Not because the LLM is bad — because the retrieval is.

GPT-4o stuffed with 50,000 tokens of barely-relevant context will hallucinate just as confidently as GPT-3.5 did. The model isn’t the bottleneck. The retrieval layer is. It determines the quality ceiling for every agent system you build, and right now, for most teams, that ceiling is embarrassingly low.

We’ve spent the past two years building Shaped — a retrieval engine for AI agents. Today we’re launching the Shaped MCP Server, which means any MCP-compatible agent (Cursor, Claude Code, Windsurf, VS Code Copilot, Gemini, OpenAI) can use Shaped for retrieval natively. No custom integration. No glue code.

But before we get to what we built, let’s talk about why we built it.


The $1.50 Answer

Here’s what happens when your agent tries to answer a question today.

A user asks something. The agent embeds the query, searches a vector database, and gets back 200 results ranked by cosine similarity. Maybe it passes those through a static reranker. Then it stuffs everything — 50,000 tokens of context — into the LLM prompt and hopes the model finds the right answer somewhere in there.

Most of the time, it doesn’t. Not because the answer isn’t in those 50,000 tokens, but because it’s buried in paragraph 47 of document 183, surrounded by 49,500 tokens of noise. The LLM can’t find the needle, so it fabricates something plausible. That’s not a hallucination problem. That’s a retrieval problem.

But it gets worse. When the agent detects that its answer is bad — the user rephrases, or an evaluation step flags low confidence — it re-retrieves. Another 200 documents. Another 50,000 tokens. Another $0.50 in LLM costs. Same imprecise retrieval, same mediocre results.

If you’ve used Cursor, you’ve seen this. The agent retrieves code context, evaluates whether it found the right files, and frequently has to search again. Each cycle adds 1-3 seconds of latency and doubles the token cost. We’ve watched agents burn through three retrieval cycles to answer a single question. That’s 150,000 tokens and roughly $1.50 for one answer.

At 10 queries per session, that’s $15. At scale across a product with thousands of users, retrieval costs become the single biggest line item in your agent infrastructure.
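The arithmetic above is easy to sketch. A minimal cost model, using the article's own numbers (50,000 tokens and roughly $0.50 of LLM spend per retrieval cycle):

```python
# Back-of-envelope cost model for the retry loop described above.
# The constants come straight from the article's figures.
TOKENS_PER_CYCLE = 50_000
COST_PER_CYCLE = 0.50  # USD of LLM spend per 50k-token prompt

def answer_cost(cycles: int) -> tuple[int, float]:
    """Total tokens and dollars for one answer that takes `cycles` retrievals."""
    return cycles * TOKENS_PER_CYCLE, round(cycles * COST_PER_CYCLE, 2)

tokens, dollars = answer_cost(3)  # three retrieval cycles for one answer
print(tokens, dollars)            # 150000 1.5
print(round(dollars * 10, 2))     # 15.0, i.e. ten such answers per session
```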

And the worst part: it never gets better. The 200 results your agent retrieves on day 100 are ranked by the same cosine similarity as day 1. Nothing learns. Nothing adapts. Nothing improves.

Metric                   Without Shaped     With Shaped
Results returned         200                10
Tokens per query         50,000             2,500
Latency                  ~400ms             <50ms
LLM cost per query       ~$0.50             ~$0.03
Retries needed           2-3 per answer     0
Total cost per answer    ~$1.50             ~$0.03
Improves over time       No                 Yes

Why Your Current Stack Can’t Fix This

Most teams building agents today are running some variation of the same architecture: a vector database (Pinecone, Weaviate, Chroma) for embedding search, maybe Elasticsearch for keyword matching, a reranker (Cohere, a cross-encoder) to re-order results, Redis or something similar for user features, and a few hundred lines of glue code stitching it all together.

Five services. Five bills. Five points of failure. And despite all that complexity, the retrieval still isn’t good enough.

Vector search alone returns “sort of relevant” results. Cosine similarity finds documents that are semantically close to your query. But “semantically close” and “actually useful” are different things. Two documents can be equidistant in embedding space — one is exactly what the user needs, the other is tangentially related. The embedding doesn’t know the difference. It can’t, because it has no concept of what this specific user has found useful in the past, or what the agent actually needs to generate a good answer.
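The equidistance problem is easy to demonstrate with toy vectors. In this hypothetical 3-dimensional embedding space, the document the user needs and the tangential one score identically against the query:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-d embeddings, invented for illustration.
query       = [1.0, 1.0, 0.0]
exact_match = [1.0, 0.0, 1.0]  # the doc the user actually needs
tangential  = [0.0, 1.0, 1.0]  # merely related

print(cosine(query, exact_match))  # 0.5
print(cosine(query, tangential))   # 0.5, identical score, very different utility
```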

Static rerankers don’t learn. Cohere’s reranker and open-source cross-encoders can improve the ordering of results, but they apply the same model on day 100 as they did on day 1. They don’t personalize. They don’t adapt to your specific data, your users, or your query patterns. They’re a fixed function applied identically to every query.

Nobody is optimizing across the full pipeline. Each service in the DIY stack solves one piece. The vector DB retrieves. The reranker reorders. The feature store provides user signals. The glue code pipes data between them. But nobody is looking at the end-to-end picture: given this user, this query, and this data, what are the 10 most useful results? That’s a ranking problem, and ranking requires a trained model that considers all the signals together — not five separate services each optimizing their own piece.

[Diagram: DIY stack vs. Shaped. DIY: Pinecone (vector DB), Elasticsearch (text), Cohere (reranker), Redis (features), plus roughly 2K lines of glue code; 5 services, 5 bills, 5 failure points. Shaped: 1 API call, roughly 40 lines of code; 1 service, 1 bill, 1 endpoint.]

How Shaped Fixes Retrieval

Shaped is a retrieval engine that replaces the entire DIY stack with a single API call. You connect your data, write a query, and get back ranked results in under 50 milliseconds.

But the real difference isn’t consolidation — it’s intelligence. Shaped doesn’t just search your data. It learns what’s relevant.

Hybrid Retrieval in a Single Query

Every Shaped query can combine multiple retrieval strategies simultaneously. Vector search catches semantic meaning (“configure single sign-on” matches “set up SSO”). Lexical search catches exact terms that embeddings sometimes miss (“SAML” or “Okta”). And user-specific behavioral signals surface results that this particular user has found useful in the past.

You express this in ShapedQL, a SQL-like query language:

SELECT content, title
FROM
  text_search('SSO enterprise configuration', mode='vector'),
  text_search('SSO enterprise', mode='lexical'),
  similarity(user_id=$user_id)
WHERE doc_type = 'guide'
ORDER BY relevance(user, item)
LIMIT 10

One query. Three retrieval strategies. Ten ranked results. Under 50 milliseconds.
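Shaped’s actual fusion happens inside its trained ranking model, but to see why merging several retrieval strategies beats any single one, here is a sketch using reciprocal-rank fusion, a standard list-merging technique. The document IDs are made up:

```python
from collections import defaultdict

def rrf(result_lists, k=60):
    """Reciprocal-rank fusion: merge several ranked lists into one.

    A document that ranks well in multiple lists accumulates a higher
    fused score than one that ranks well in only a single list.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical outputs of the three strategies in the query above.
vector_hits  = ["sso-guide", "saml-config", "billing-faq"]
lexical_hits = ["saml-config", "sso-guide", "okta-setup"]
behavioral   = ["sso-guide", "okta-setup"]

fused = rrf([vector_hits, lexical_hits, behavioral])
print(fused)  # "sso-guide" wins: it ranks highly in all three lists
```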

Trained Scoring Models

This is where Shaped fundamentally diverges from vector databases. When you create a Shaped engine, it doesn’t just build an index — it trains scoring models on your data.

The ORDER BY relevance(user, item) in the query above isn’t cosine similarity. It’s a machine learning model that has learned, from your interaction data, what “relevant” actually means for your users. It considers the retrieval signals (vector similarity, keyword match score, behavioral affinity), item attributes (recency, category, metadata), and user context (what they’ve clicked, bookmarked, or rated before).

The result: instead of 200 approximate matches ranked by embedding distance, you get 10 precisely ranked results scored by a model that understands your specific data. That’s why Shaped returns 2,500 tokens instead of 50,000 — it’s not just returning fewer results, it’s returning the right results.
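To make the distinction concrete, here is a toy scorer in the spirit of a trained relevance model. The feature names and weights are invented for illustration; Shaped’s real models are learned from your interaction data rather than hand-set:

```python
import math

# Invented weights standing in for parameters learned offline
# from clicks, bookmarks, and ratings.
WEIGHTS = {
    "vector_sim":    1.8,
    "keyword_match": 1.1,
    "user_affinity": 2.4,  # has this user engaged with similar items?
    "recency":       0.6,
    "bias":         -2.0,
}

def relevance(features: dict) -> float:
    """Probability-like score that this user finds this item useful."""
    z = WEIGHTS["bias"] + sum(WEIGHTS[k] * v for k, v in features.items())
    return 1 / (1 + math.exp(-z))  # logistic link

doc_a = {"vector_sim": 0.71, "keyword_match": 1.0, "user_affinity": 0.9, "recency": 0.8}
doc_b = {"vector_sim": 0.74, "keyword_match": 0.0, "user_affinity": 0.1, "recency": 0.2}

# doc_b is *closer* in embedding space, but doc_a scores far higher
# once keyword and behavioral signals are weighed in.
print(relevance(doc_a) > relevance(doc_b))  # True
```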

A 4-Stage Ranking Pipeline

Every Shaped query runs through a four-stage pipeline that mirrors how the best recommendation systems in the world work:

  1. Retrieve. Fetch candidates using text search, vector similarity, and behavioral signals.

  2. Filter. Remove candidates based on business rules, access controls, and deduplication.

  3. Score. Rank using trained value models learned from your interaction data.

  4. Reorder. Apply diversity constraints.

The result: 10 ranked results, 2,500 tokens, in under 50 milliseconds.

Most retrieval systems stop at stage one. The best ones reach stage two. Shaped runs all four in a single query.
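A runnable toy version of the four stages, with invented data shapes and rules (not Shaped’s internals):

```python
def run_pipeline(user, candidates, limit=10):
    """Toy retrieve -> filter -> score -> reorder pipeline.

    Stage 1 (retrieve) is assumed done: `candidates` came from the
    hybrid retrieval strategies.
    """
    # Stage 2: filter on access control and business rules.
    visible = [c for c in candidates if user["org"] in c["acl"]]
    # Stage 3: score with a trained value model (stubbed as a stored score).
    ranked = sorted(visible, key=lambda c: c["score"], reverse=True)
    # Stage 4: reorder with a crude diversity pass, max two per category.
    out, per_category = [], {}
    for c in ranked:
        n = per_category.get(c["category"], 0)
        if n < 2:
            out.append(c["id"])
            per_category[c["category"]] = n + 1
    return out[:limit]

user = {"org": "acme"}
candidates = [
    {"id": "sso-guide",    "score": 0.92, "category": "guide", "acl": {"acme"}},
    {"id": "saml-config",  "score": 0.88, "category": "guide", "acl": {"acme"}},
    {"id": "old-guide",    "score": 0.80, "category": "guide", "acl": {"acme"}},   # cut by diversity
    {"id": "internal-rfc", "score": 0.99, "category": "rfc",   "acl": {"other"}},  # cut by ACL
]
print(run_pipeline(user, candidates))  # ['sso-guide', 'saml-config']
```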

Continuous Learning from Agent Feedback

Here’s the part we’re most excited about — and the part that no vector database can replicate.

When a user rephrases a question after getting a bad answer, that’s a signal. When they give a thumbs-down, that’s a signal. When the agent has to re-retrieve because the first results weren’t good enough, that’s a signal. Shaped’s scoring models retrain on these signals automatically.

This creates a flywheel: better retrieval → fewer retries → more successful interactions → more positive training signal → even better retrieval. Day 100 is dramatically better than day 1. And you don’t have to do anything — the improvement happens automatically as your agent is used.

[Diagram: the continuous learning flywheel. Better retrieval → fewer retries → better answers → satisfied users → model retrains → better retrieval. Day 100 is dramatically better than day 1.]

This directly addresses the retry loop problem. As the scoring model learns which results lead to successful agent responses, retrieval precision improves, and the agent stops needing to re-retrieve. The $1.50 answer becomes a $0.03 answer — not just because you’re returning fewer tokens, but because you’re returning the right ones on the first try.
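In practice, each of those signals can be captured as a structured event and turned into labels for the next retrain. The event schema below is illustrative, not Shaped’s actual format:

```python
# Hypothetical feedback events of the kinds described above.
events = [
    {"user": "u1", "item": "sso-guide",   "signal": "clicked"},
    {"user": "u1", "item": "billing-faq", "signal": "thumbs_down"},
    {"user": "u2", "item": "sso-guide",   "signal": "re_retrieved"},  # first results failed
]

POSITIVE = {"clicked", "bookmarked", "thumbs_up"}

def to_training_labels(events):
    """Turn raw feedback into (user, item, label) pairs for the next retrain."""
    return [
        (e["user"], e["item"], 1 if e["signal"] in POSITIVE else 0)
        for e in events
    ]

labels = to_training_labels(events)
print(labels)
# [('u1', 'sso-guide', 1), ('u1', 'billing-faq', 0), ('u2', 'sso-guide', 0)]
```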

AI Enrichment at the Storage Layer

One more thing that’s unique to Shaped: before the agent even queries, Shaped can automatically enrich your data using LLMs. It adds semantic metadata, descriptions, and structured fields at the storage layer — so the ranking models have richer signals to work with from the start.

If your product catalog has sparse descriptions, Shaped can generate detailed semantic tags. If your documentation lacks structured metadata, Shaped can classify and categorize it. This enrichment is materialized (stored and indexed), so there’s no LLM cost at query time. It’s a one-time investment that improves every subsequent retrieval.
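The pattern is simple: pay the LLM once at ingest, never at query time. A sketch with a stubbed LLM and invented field names:

```python
def enrich(item: dict, llm) -> dict:
    """Materialize LLM-generated metadata at ingest time, not query time."""
    if not item.get("tags"):
        # One LLM call per item; the result is stored and indexed,
        # so queries never pay this cost again.
        item["tags"] = llm(f"List 3 short topical tags for: {item['description']}")
    return item

# Stub LLM so the sketch runs without any API access.
fake_llm = lambda prompt: ["sso", "saml", "enterprise-auth"]

doc = {"id": "d1", "description": "How to set up SAML single sign-on for your org"}
print(enrich(doc, fake_llm)["tags"])  # ['sso', 'saml', 'enterprise-auth']
```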


The MCP Server: Shaped for Any Agent

We’ve known for a while that Shaped’s retrieval quality is a step change. The question was: how do we make it accessible to the broadest possible set of agents?

The answer is MCP.

The Model Context Protocol has become the standard for connecting AI agents to external tools. It’s supported by Cursor, Claude Desktop, Claude Code, Windsurf, VS Code Copilot, Gemini, OpenAI’s Agents SDK, LangChain, LlamaIndex, and thousands of custom agents. When we build one MCP server, it works with all of them.

Getting started takes two steps. First, install the package:

pip install shaped-mcp

Then add Shaped to your agent’s MCP config:

{
  "mcpServers": {
    "shaped": {
      "command": "shaped-mcp",
      "env": { "SHAPED_API_KEY": "your-key" }
    }
  }
}

Your agent now has a shaped_search tool that returns ranked results directly — no script generation, no terminal execution, no custom API integration. The agent calls shaped_search, gets 10 ranked results, and passes them to the LLM. Done.

This matters more than it might seem. Without MCP, every retrieval in a coding agent like Cursor requires the agent to: read the API docs (if they’re in context), generate a Python or curl script, execute it in the terminal, parse the raw JSON output, and use the results to continue. Every single retrieval burns tokens on code generation and is fragile — the script can fail on authentication, parameter formatting, or JSON parsing.

With MCP, the agent calls shaped_search(query="configure SSO") as naturally as it reads a file. Zero overhead. Zero wasted tokens on glue code.

We also provide a hosted endpoint at https://mcp.shaped.ai for environments that can’t run local processes — browser-based agents, serverless functions, mobile apps, and multi-tenant platforms. Same tools, same capabilities, no local install required.
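Under the hood, an MCP tool invocation is a JSON-RPC tools/call message. The envelope below follows the MCP specification; the shaped_search arguments (including the limit parameter) are assumptions for illustration:

```python
import json

# The JSON-RPC message an MCP client sends when the agent invokes a tool.
# The method name and envelope come from the MCP spec; the tool name and
# argument names mirror the example above and are illustrative.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "shaped_search",
        "arguments": {"query": "configure SSO", "limit": 10},
    },
}
print(json.dumps(request, indent=2))
```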


What It Looks Like in Practice

Same question. Two retrieval stacks.

With Shaped: The agent asks “How do I configure SSO for enterprise accounts?” Shaped returns 10 ranked results — the SSO setup guide, the SAML configuration doc, and the identity provider reference — in 38 milliseconds. The LLM generates specific, step-by-step instructions: navigate to Admin → Org Settings → Authentication, select your identity provider, upload your SAML metadata XML.

Without Shaped: The agent stuffs 50,000 tokens of context into the prompt — 200 documents returned by cosine similarity, 90% of which are tangentially related. The LLM generates a vague overview: “SSO allows users to authenticate using a single set of credentials. Enterprise accounts can configure SSO through the admin panel.” No specific steps. No references to the actual documentation. The user rephrases. The agent re-retrieves. Another 50,000 tokens. Another cycle.

10 results. 2,500 tokens. 38 milliseconds. $0.03. The agent gets exactly what it needs — nothing more.


Getting Started

Shaped connects to your existing data via 20+ connectors — Postgres, S3, BigQuery, Snowflake, MongoDB, Amplitude, Segment, and more. Both real-time streaming and batch sync are supported, so your agent always retrieves from fresh data.

The setup takes one morning:

  1. Connect a data source in the Shaped console. Point at your database, warehouse, or blob storage. No ETL, no data migration.

  2. Configure an engine. Define what data to index, which embeddings to generate, and what scoring models to train. Shaped handles the rest — embedding generation, model training, and index building happen automatically.

  3. Write a query and test it. Use ShapedQL in the playground to experiment with different retrieval strategies. Also available via the Python SDK, TypeScript SDK, or MCP.

  4. Deploy. Your agent connects via MCP or the REST API. The scoring model retrains on outcomes automatically. Retrieval improves every day.

We’re offering $100 in free credits with no credit card required.

→ Learn more: shaped.ai/agent-context

→ Docs: docs.shaped.ai

→ MCP setup: pip install shaped-mcp

If you’re building agents and retrieval quality is your bottleneck, we’d love to hear from you. What are you running into? What have you tried? We’re at hello@shaped.ai and we respond to everything.

See Shaped in action

Talk to an engineer about your specific use case — search, recommendations, or feed ranking.

Book a demo →
