A Furniture Discovery Example: Why LLMs Fail at Retrieval
Imagine you’re building an agent for a high-end furniture marketplace. A user asks: “Show me some modern lounge chairs that are actually comfortable and won’t take forever to ship.”
If you build this using only an LLM + a standard Vector DB (RAG), here is the failure mode:
- The Semantic Trap: The Vector DB finds 50 “lounge chairs” where the text embedding matches “modern.” But it doesn’t know what “comfortable” means. To a database, comfort is just a word; to a business, comfort is a behavioral signal found in low return rates and high repeat purchase data.
- The Data Stale-mate: The LLM recommends a beautiful chair that went out of stock ten minutes ago because the vector index is only updated via weekly batch jobs.
- The Context Window Tax: You can’t shovel 500 candidate chairs into a prompt to “let the LLM decide.” It’s expensive, high-latency, and LLMs suffer from positional bias, often picking the first items they see regardless of quality.
The outcome? An agent that is a “stochastic parrot”—it looks smart but makes low-quality decisions that lose revenue and erode user trust.
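To make the context-window tax concrete, here's a back-of-envelope cost estimate in Python. Every number below is an illustrative assumption (token counts per item and per-token pricing vary by model), not a measured figure:

```python
# Rough cost of "let the LLM decide" over a large candidate set.
# All numbers below are illustrative assumptions.
TOKENS_PER_ITEM = 150        # assumed tokens per product description
CANDIDATES = 500             # candidate chairs shoveled into the prompt
PRICE_PER_1K_TOKENS = 0.01   # assumed input price in USD

prompt_tokens = TOKENS_PER_ITEM * CANDIDATES
cost_per_query = prompt_tokens / 1000 * PRICE_PER_1K_TOKENS

print(prompt_tokens)              # 75000 input tokens per request
print(round(cost_per_query, 2))   # 0.75 USD per request, before output tokens
```

At even modest traffic, that per-request cost (and the latency of processing 75k tokens) adds up fast, and the model still has to fight positional bias across 500 items.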
Step 1: Connecting Your Data (The Foundation)
High-fidelity retrieval requires a unified view of your product catalog, user profiles, and event streams. Shaped provides three ways to do this:
- The Console: Select from 20+ native connectors (BigQuery, Snowflake, Segment, Amplitude) and authenticate in a few clicks.
- The CLI: Use shaped create-table to provision schemas.
- The SDKs: Use the Python or TypeScript SDKs to stream data directly from your application.
For our furniture agent, we’ll use the Python SDK to connect our catalog.
# connect_data.py
import shaped

client = shaped.Client(api_key="your_api_key")

# Connect catalog from Postgres (or use the Console to select Snowflake/BigQuery)
client.create_table(
    name="furniture_catalog",
    schema_type="POSTGRES",
    connection_config={
        "host": "db.marketplace.com",
        "database": "inventory",
        "table": "products"
    },
    replication_key="updated_at"  # 15-min incremental syncs
)
Step 2: AI Enrichment (Transforming Content to Intelligence)
Raw data like “Walnut Chair” isn’t enough for an agent to determine if something is “ergonomic.” Shaped allows you to create AI Views—materialized, LLM-enriched representations of your data. This creates “agent-ready” features before a user even asks a question.
# ai_view.json
// Define an AI View in the Console or via the API
{
  "name": "enriched_furniture",
  "view_type": "AI_ENRICHMENT",
  "source_table": "furniture_catalog",
  "source_columns": ["description", "specs", "review_summary"],
  "enriched_output_columns": ["comfort_score", "aesthetic_profile"],
  "prompt": "Analyze the specs and reviews. Rate ergonomics from 0-1 and identify the specific design style (e.g., Scandic, Bauhaus)."
}
Outcome: Your agent now has access to a “comfort_score” feature that was derived from actual customer sentiment, not just marketing copy.
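To see what this buys you downstream, here's a hypothetical sketch of enriched rows (the field names follow the AI View config above; the item names and scores are made up):

```python
# Hypothetical rows produced by the AI View above.
enriched = [
    {"name": "Walnut Lounge Chair", "comfort_score": 0.87, "aesthetic_profile": "Scandic"},
    {"name": "Chrome Sling Chair",  "comfort_score": 0.41, "aesthetic_profile": "Bauhaus"},
]

# An agent can now filter on derived signals instead of raw marketing copy.
comfortable = [r for r in enriched if r["comfort_score"] >= 0.7]
print([r["name"] for r in comfortable])  # ['Walnut Lounge Chair']
```

The key point: "comfortable" is now a numeric, filterable column, not a word the LLM has to guess about at query time.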
Step 3: Defining the Engine (The Intelligence Layer)
An Engine is where Shaped trains models on your data. Unlike a simple vector DB, a Shaped Engine combines semantic embeddings with behavioral models like ELSA or Two-Tower.
You can define this in a YAML file via the CLI or configure it through the Shaped Console.
# engine.yaml
name: furniture_agent_engine
data:
  item_table: { name: "enriched_furniture" }
  interaction_table: { name: "user_purchases" }  # Learn what people actually buy
index:
  embeddings:
    - name: furniture_embeddings
      encoder:
        type: hugging_face
        model_name: "sentence-transformers/all-MiniLM-L6-v2"
training:
  models:
    - name: conversion_model
      policy_type: elsa  # Scalable autoencoder for behavioral signals
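For intuition on why behavioral models differ from a plain vector index: a two-tower model learns separate user and item embeddings from interaction data and scores relevance with a dot product. The following is a minimal sketch of that scoring step, with made-up vectors, not Shaped's implementation:

```python
# Minimal two-tower scoring sketch: user and item towers each produce an
# embedding; relevance is their dot product. All vectors here are made up.
user_embedding = [0.2, 0.7, 0.1]          # learned from purchase history
item_embeddings = {
    "walnut_chair": [0.3, 0.8, 0.0],      # learned from who bought (and kept) it
    "chrome_chair": [0.9, 0.1, 0.2],
}

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

scores = {item: dot(user_embedding, vec) for item, vec in item_embeddings.items()}
best = max(scores, key=scores.get)
print(best)  # 'walnut_chair'
```

Because the item embeddings are trained on who actually buys and keeps each item, two chairs with identical descriptions can score very differently for the same user.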
Step 4: High-Signal Retrieval with ShapedQL
Now, when your agent needs to find the “best” chairs, it doesn’t just do a vector lookup. It executes a ShapedQL query that handles the four stages of discovery: Retrieve, Filter, Score, and Reorder.
# agent_query.sql
-- This query is sent by your agent to the Shaped Query API
SELECT *
FROM text_search(
    query='modern comfortable lounge chair',
    mode='vector',
    text_embedding_ref='furniture_embeddings',
    limit=100
)
WHERE lead_time_days <= 3 AND inventory_count > 0
ORDER BY score(
    expression='0.6 * conversion_model + 0.4 * item.comfort_score',
    input_user_id=$user_id
)
LIMIT 5
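For intuition, the four stages this query performs can be sketched in plain Python. This mirrors the logic only, not Shaped's internals; the candidate data and scoring weights are made up:

```python
# Toy sketch of the four stages: Retrieve -> Filter -> Score -> Reorder.
candidates = [
    {"id": 1, "similarity": 0.92, "conversion": 0.30, "comfort_score": 0.9,
     "lead_time_days": 2, "inventory_count": 4},
    {"id": 2, "similarity": 0.95, "conversion": 0.05, "comfort_score": 0.3,
     "lead_time_days": 10, "inventory_count": 7},   # ships too slowly
    {"id": 3, "similarity": 0.88, "conversion": 0.25, "comfort_score": 0.8,
     "lead_time_days": 1, "inventory_count": 0},    # out of stock
]

# 1. Retrieve: top-N candidates by vector similarity (already gathered above).
# 2. Filter: hard business constraints -- these are non-negotiable.
in_stock = [c for c in candidates
            if c["lead_time_days"] <= 3 and c["inventory_count"] > 0]

# 3. Score: blend the behavioral model with the enriched comfort feature.
def score(c):
    return 0.6 * c["conversion"] + 0.4 * c["comfort_score"]

# 4. Reorder: best first, then truncate to a small, high-signal set.
top = sorted(in_stock, key=score, reverse=True)[:5]
print([c["id"] for c in top])  # [1]
```

Note that item 2 had the highest vector similarity but never survives the filter stage, which is exactly the guarantee a prompt-only approach can't give you.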
Step 5: Defining the Agent Logic
Finally, your LLM agent (using OpenAI, Anthropic, etc.) calls Shaped as a tool. Instead of giving the LLM a “junk drawer” of 50 semi-relevant chairs, you give it 5 deterministic, business-validated, and behaviorally-ranked items.
The Python Integration:
# agent_tool.py
# The agent tool definition
def get_furniture_context(user_query, user_id):
    # Call the Shaped Engine with the ShapedQL query from Step 4
    response = client.query(
        engine_name="furniture_agent_engine",
        query=SHAPED_QL_STRING,
        parameters={"agent_query": user_query, "user_id": user_id}
    )
    return response['results']

# The Agent can now confidently say:
# "I found 3 modern chairs available for 2-day shipping.
#  Based on 500+ reviews, the 'Eames-style' is rated highest for comfort."
The Outcomes: Why This Wins
By moving retrieval and ranking from the prompt into Shaped, you achieve:
- Zero Hallucinations on Constraints: If ShapedQL filters for inventory > 0, the agent cannot recommend an out-of-stock item.
- Behavioral Accuracy: Your agent suggests items people actually buy and keep (via the ELSA model), not just items that match the word “modern.”
- Low Latency: Shaped’s fast_tier serves these queries in <50ms, ensuring your agent doesn’t feel sluggish.
- Developer Agility: Change your business logic (e.g., “boost high-margin items”) by editing a single ShapedQL line in the Console, without re-deploying your agent code.
Stop treating your agent’s context window like a search bar. Give it the high-signal, behavioral intelligence it needs to actually work.
Want to try Shaped with your own data? Sign up for a free trial with $300 credits here.
See Shaped in action
Talk to an engineer about your specific use case — search, recommendations, or feed ranking.