Quick Answer: Why Agents Can’t Read Your Database
An AI agent answers from the context it retrieves. If your retrieval index contains shipping_tier: 4, order_status: 7, and priority_level: 2, the agent has to guess what those mean. It can’t.
That’s not a model problem. That’s a semantic gap — your database speaks in codes, your agent needs sentences.
Key Takeaways:
- Status codes break agents — shipping_tier: 4 is meaningless without a lookup table the agent doesn’t have
- AI Views close the gap — materialize human-readable summaries at ingestion: “Ships in 2 days via FedEx”
- This runs once, at write time — Enrichment happens when data enters Shaped, not every time the agent queries
- Works on any structured data — Product tiers, order statuses, shipping codes, user segments, error codes
- Zero inference-time cost — The agent retrieves enriched text, not raw codes. No LLM call needed during retrieval
The Status Code Problem
Imagine a customer support agent for an e-commerce platform. A user asks:
User: “When will my order arrive?”
The agent retrieves this from the database:
// order_record.json
{
"order_id": "ORD-2847",
"user_id": "USR-9472",
"shipping_tier": 4,
"order_status": 7,
"priority_level": 2,
"created_at": "2026-02-15T14:23:00Z"
}
What does shipping_tier: 4 mean? The agent doesn’t know. It might mean:
- Standard ground (5-7 days)
- Express air (2 days)
- Same-day delivery
- International freight
The actual meaning lives in a lookup table somewhere — or worse, in tribal knowledge. The agent has no way to translate 4 into “Ships in 2 days via FedEx.”
This Breaks in Production
Here’s what happens when an agent retrieves status codes instead of semantic context:
Agent response:
“Your order ORD-2847 has shipping tier 4 and status 7.”
What the user needed:
“Your order ships in 2 days via FedEx and is currently in transit to the distribution center.”
The model didn’t fail. The retrieval layer handed it unusable data.
Why This Happens
Most production databases store structured, coded data for efficiency:
- Shipping tiers → integers (1-10)
- Order statuses → enums (PENDING, PROCESSING, SHIPPED, DELIVERED)
- User segments → codes (GOLD, SILVER, BRONZE)
- Priority levels → numbers (0-5)
This is correct database design. It’s compact, queryable, and joins efficiently. But it’s semantically opaque to an LLM.
The semantic gap:
# semantic_gap
Database column → shipping_tier: 4
Agent needs → "Ships in 2 days via FedEx"
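The gap is easy to state in code: the translation requires a mapping that lives in your database, not in the agent's context. A minimal sketch (the tier meaning is taken from the lookup table defined later in this article):

```python
# The mapping lives in a database lookup table; the agent never sees it.
SHIPPING_TIERS = {4: "Express air shipping, 2 business days via FedEx"}

raw_record = {"order_id": "ORD-2847", "shipping_tier": 4}

# With the mapping available, translation is trivial:
print(SHIPPING_TIERS[raw_record["shipping_tier"]])

# Without it, the agent's context contains only the opaque integer 4.
```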
What AI Views Actually Do
An AI View is a materialized enrichment layer that runs at ingestion time. It takes structured columns from your source table and generates semantic summaries that get indexed alongside the raw data.
The Pipeline
# enrichment_pipeline
Raw row → shipping_tier: 4, order_status: 7, priority: 2
+ lookup rules (in the AI View prompt)
→ LLM generates: “Ships in 2 days via FedEx. Currently in transit.”
→ materialized in the index as human-readable text
Key Properties
1. Write-time enrichment
The AI View runs when data enters Shaped, not when the agent queries. Enrichment cost is paid once per record, not on every retrieval.
2. Materialized in the index
The enriched text is stored and indexed. The agent retrieves it like any other field — zero inference-time latency.
3. Automatic updates
When a record updates (e.g., order_status: 7 becomes order_status: 8), the AI View re-enriches that row. The index stays fresh.
4. Source columns stay untouched
The raw shipping_tier: 4 is preserved. The AI View adds a new column (ai_shipping_summary) — it doesn’t replace anything.
Part 1: The Traditional Fix (Post-Retrieval Enrichment)
The standard solution is to enrich after retrieval, in the agent’s context layer. You retrieve the raw codes, then hydrate them with a lookup service before passing context to the LLM.
Architecture
# post_retrieval_architecture
Agent query → Vector retrieval (returns raw codes) → Hydration service (lookup-table queries) → LLM context
This adds two round trips before the agent can answer: one to retrieve codes, one to look them up.
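The round trips add up directly in the latency budget; a quick worked example using the illustrative ranges cited in the cost summary later in this section:

```python
# Illustrative pre-LLM latency budget for two-phase retrieval.
vector_retrieval_ms = (50, 150)  # vector DB query
lookup_queries_ms = (20, 80)     # lookup-table hydration

low = vector_retrieval_ms[0] + lookup_queries_ms[0]
high = vector_retrieval_ms[1] + lookup_queries_ms[1]
print(f"Latency before the LLM call: {low}-{high} ms")
```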
Implementation
Step 1: Shipping tier lookup table
-- shipping_tiers_lookup.sql
CREATE TABLE shipping_tiers (
tier_id INT PRIMARY KEY,
tier_name VARCHAR(100),
description TEXT,
estimated_days INT,
carrier VARCHAR(50)
);
INSERT INTO shipping_tiers VALUES
(1, 'Economy Ground', 'Standard ground shipping, 7-10 business days', 10, 'USPS'),
(2, 'Standard Ground', 'Ground shipping, 5-7 business days', 7, 'UPS'),
(3, 'Expedited', 'Expedited shipping, 3-5 business days', 5, 'UPS'),
(4, 'Express', 'Express air shipping, 2 business days', 2, 'FedEx'),
(5, 'Overnight', 'Overnight delivery, 1 business day', 1, 'FedEx'),
(6, 'Same-Day', 'Same-day delivery in metro areas', 0, 'Local Courier');
Step 2: Order status lookup
-- order_statuses_lookup.sql
CREATE TABLE order_statuses (
status_id INT PRIMARY KEY,
status_name VARCHAR(50),
customer_message TEXT
);
INSERT INTO order_statuses VALUES
(1, 'PENDING', 'Your order is being processed'),
(2, 'PAYMENT_CONFIRMED', 'Payment confirmed, order is being prepared'),
(3, 'PICKING', 'Your order is being picked from the warehouse'),
(4, 'PACKING', 'Your order is being packed for shipment'),
(5, 'READY_TO_SHIP', 'Your order is ready to ship'),
(6, 'LABEL_CREATED', 'Shipping label created'),
(7, 'IN_TRANSIT', 'Your order is in transit to the distribution center'),
(8, 'OUT_FOR_DELIVERY', 'Your order is out for delivery'),
(9, 'DELIVERED', 'Your order has been delivered'),
(10, 'EXCEPTION', 'There is an issue with your order');
Step 3: Context hydration service
# context_hydrator.py
import psycopg2
from typing import Dict


class ContextHydrator:
    def __init__(self, db_config):
        self.conn = psycopg2.connect(**db_config)

    def hydrate_order(self, order: Dict) -> str:
        """
        Takes raw order dict with codes, returns human-readable summary.
        """
        # Look up shipping tier
        shipping = self._lookup_shipping_tier(order['shipping_tier'])
        # Look up order status
        status = self._lookup_order_status(order['order_status'])
        # Look up priority
        priority_text = self._lookup_priority(order.get('priority_level', 0))
        # Assemble summary
        summary = f"""
Order {order['order_id']}:
- Shipping: {shipping['description']} (estimated {shipping['estimated_days']} days via {shipping['carrier']})
- Status: {status['customer_message']}
- Priority: {priority_text}
- Ordered: {order['created_at']}
""".strip()
        return summary

    def _lookup_shipping_tier(self, tier_id: int) -> Dict:
        cursor = self.conn.cursor()
        cursor.execute("""
            SELECT tier_name, description, estimated_days, carrier
            FROM shipping_tiers
            WHERE tier_id = %s
        """, (tier_id,))
        row = cursor.fetchone()
        if not row:
            return {'description': f'Unknown tier {tier_id}', 'estimated_days': 0, 'carrier': 'Unknown'}
        return {
            'tier_name': row[0],
            'description': row[1],
            'estimated_days': row[2],
            'carrier': row[3],
        }

    def _lookup_order_status(self, status_id: int) -> Dict:
        cursor = self.conn.cursor()
        cursor.execute("""
            SELECT status_name, customer_message
            FROM order_statuses
            WHERE status_id = %s
        """, (status_id,))
        row = cursor.fetchone()
        if not row:
            return {'customer_message': f'Status code {status_id}'}
        return {'status_name': row[0], 'customer_message': row[1]}

    def _lookup_priority(self, priority: int) -> str:
        priority_map = {
            0: 'Standard priority',
            1: 'Low priority',
            2: 'Medium priority',
            3: 'High priority',
            4: 'Urgent',
            5: 'Critical',
        }
        return priority_map.get(priority, f'Priority level {priority}')


# Usage in agent retrieval layer
hydrator = ContextHydrator(db_config={
    'host': 'localhost',
    'database': 'ecommerce',
    'user': 'agent',
    'password': 'xxx'
})

# After retrieving from vector DB
raw_orders = vector_db.query("user order status", top_k=5)

# Hydrate each result
enriched_context = []
for order in raw_orders:
    enriched_context.append(hydrator.hydrate_order(order))

# Pass to LLM
context = "\n\n".join(enriched_context)
llm_response = llm.chat(user_query, context=context)
What You’re Operating
| Component | What It Is | Failure Mode |
|---|---|---|
| Lookup tables | PostgreSQL tables with tier/status definitions | Schema drift, missing codes |
| Hydration service | Python service that queries lookup tables | Connection pool exhaustion, query latency |
| Two-phase retrieval | Vector query → lookup query → LLM | 2x latency, doubles infrastructure load |
| Deployment | Long-running Python process | Crashes, memory leaks |
| Monitoring | Custom metrics on lookup hit rate | Invisible when lookup fails silently |
The cost:
- Latency: 50-150ms for vector retrieval + 20-80ms for lookup queries = 70-230ms before LLM call
- Infrastructure: Dedicated lookup service + PostgreSQL read replicas
- Code to maintain: approx. 400 lines (lookup tables, hydration logic, connection pooling, error handling)
- Operational complexity: Monitor lookup service health, handle schema changes, cache invalidation
Part 2: The Shaped Way — AI Views
Shaped’s AI Views move enrichment from retrieval time to ingestion time. You define a view that takes source columns (the raw codes) and generates enriched output columns (the semantic summaries). This runs once when data enters the index.
Architecture
# ingestion_architecture
Ingestion: raw row → AI View (prompt: “Translate codes into customer-facing summary”, enriched_output_columns: [ai_order_summary]) → Enriched text: “Ships in 2 days via FedEx. Currently in transit.” → indexed
Retrieval: one query returns the enriched text directly
Implementation
Step 1: Connect your source table
# orders_table.yaml
version: v2
name: orders
schema_type: POSTGRES
host: postgres.example.com
port: 5432
database: ecommerce
table_name: orders
schema:
  - name: order_id
    type: STRING
  - name: user_id
    type: STRING
  - name: shipping_tier
    type: INTEGER
  - name: order_status
    type: INTEGER
  - name: priority_level
    type: INTEGER
  - name: created_at
    type: TIMESTAMP
  - name: updated_at
    type: TIMESTAMP
# terminal
$ shaped create-table --file orders_table.yaml
Step 2: Create the AI View
# create_ai_view.sh
curl -X POST "https://api.shaped.ai/v2/views" \
-H "Content-Type: application/json" \
-H "x-api-key: $SHAPED_API_KEY" \
-d '{
"name": "orders_with_enrichment",
"view_type": "AI_ENRICHMENT",
"source_table": "orders",
"source_columns": [
"shipping_tier",
"order_status",
"priority_level",
"created_at"
],
"source_columns_in_output": [
"order_id",
"user_id",
"shipping_tier",
"order_status"
],
"enriched_output_columns": [
"ai_order_summary"
],
"prompt": "You are a customer support assistant. Translate the shipping tier, order status, and priority level into a clear, customer-facing summary. Use this mapping:\n\nShipping tiers:\n1 = Economy Ground (7-10 days, USPS)\n2 = Standard Ground (5-7 days, UPS)\n3 = Expedited (3-5 days, UPS)\n4 = Express (2 days, FedEx)\n5 = Overnight (1 day, FedEx)\n6 = Same-Day (metro areas, Local Courier)\n\nOrder statuses:\n1 = Being processed\n2 = Payment confirmed, being prepared\n3 = Being picked from warehouse\n4 = Being packed\n5 = Ready to ship\n6 = Shipping label created\n7 = In transit to distribution center\n8 = Out for delivery\n9 = Delivered\n10 = Exception / issue\n\nPriority levels:\n0 = Standard\n1 = Low\n2 = Medium\n3 = High\n4 = Urgent\n5 = Critical\n\nGenerate a 2-3 sentence summary that tells the customer when they will receive their order and what is happening now."
}'
What this does:
- Reads shipping_tier, order_status, priority_level, and created_at from the orders table
- Passes them to an LLM with the enrichment prompt
- The LLM generates a customer-facing summary in the ai_order_summary column
- This enrichment runs once, when the row is inserted or updated
- The enriched view is now a queryable table: orders_with_enrichment
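One operational note: the lookup rules now live as prose inside the prompt, so it is worth keeping a machine-checkable copy in your test suite to catch drift between the prompt and your database enums. A minimal sketch (mappings copied from the prompt above; the helper name is illustrative):

```python
# Machine-checkable copy of the shipping-tier mapping embedded in the AI View prompt.
SHIPPING_TIERS = {
    1: "Economy Ground (7-10 days, USPS)",
    2: "Standard Ground (5-7 days, UPS)",
    3: "Expedited (3-5 days, UPS)",
    4: "Express (2 days, FedEx)",
    5: "Overnight (1 day, FedEx)",
    6: "Same-Day (metro areas, Local Courier)",
}

def missing_from_prompt(prompt: str) -> list:
    """Return tier codes whose mapping line does not appear in the prompt text."""
    return [code for code, meaning in SHIPPING_TIERS.items()
            if f"{code} = {meaning}" not in prompt]
```

Run this against the prompt string before deploying a prompt change; a non-empty result means the prompt and the database enum have drifted apart.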
Output example:
Given this input row:
// input_row.json
{
"order_id": "ORD-2847",
"user_id": "USR-9472",
"shipping_tier": 4,
"order_status": 7,
"priority_level": 2,
"created_at": "2026-02-15T14:23:00Z"
}
The AI View generates:
// enriched_output.json
{
"order_id": "ORD-2847",
"user_id": "USR-9472",
"shipping_tier": 4,
"order_status": 7,
"ai_order_summary": "Your order ships via FedEx Express with an estimated delivery in 2 business days. It's currently in transit to the distribution center. This is a medium-priority order placed on February 15, 2026."
}
Step 3: Index the enriched view in your engine
# agent_engine.yaml
version: v2
name: customer_support_agent
data:
  order_table:
    name: orders_with_enrichment  # Use the AI View as the data source
    type: table
encoder:
  name: text-embedding-3-small
  provider: openai
  columns:
    - name: ai_order_summary  # This is what gets embedded and retrieved
      weight: 1.0
# terminal
$ shaped create-engine --file agent_engine.yaml
Step 4: Query from your agent
# agent_with_shaped.py
import requests
import openai

SHAPED_API_KEY = "your-api-key"


def get_order_context(user_query: str, user_id: str) -> str:
    """
    Retrieve enriched order context for the agent.
    """
    response = requests.post(
        "https://api.shaped.ai/v2/rank",
        headers={"x-api-key": SHAPED_API_KEY},
        json={
            "engine_name": "customer_support_agent",
            "query": user_query,
            "candidates": {
                "table": "orders_with_enrichment",
                "filter": f"user_id = '{user_id}'"
            },
            "limit": 5
        }
    )
    results = response.json()

    # Extract enriched summaries
    context = []
    for result in results['results']:
        context.append(result['ai_order_summary'])
    return "\n\n".join(context)


# Agent workflow
user_query = "When will my order arrive?"
user_id = "USR-9472"

# One retrieval call gets enriched context
enriched_context = get_order_context(user_query, user_id)

# Pass to LLM
llm_response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a customer support agent."},
        {"role": "user", "content": f"Context:\n{enriched_context}\n\nQuestion: {user_query}"}
    ]
)
print(llm_response.choices[0].message.content)
What the agent gets back:
# agent_output
Your order ships via FedEx Express with an estimated delivery in 2 business days.
It's currently in transit to the distribution center.
This is a medium-priority order placed on February 15, 2026.
The agent now has semantic, human-readable context in a single retrieval call. No lookup tables, no hydration service, no two-phase queries.
Real-World Examples
Example 1: Product Tiers
Raw data:
// product_tier_raw.json
{
"product_id": "PROD-4821",
"tier_code": "PREM_PLUS",
"feature_flags": 147,
"max_users": 50
}
AI View prompt:
# product_tier_prompt
Translate the tier code and feature flags into a customer-facing product description.
Tier codes: BASIC, STANDARD, PREM, PREM_PLUS, ENTERPRISE.
Feature flags are bitwise: 1=API access, 2=SSO, 4=Advanced analytics, 8=White-label,
16=Priority support, 32=Dedicated account manager, 64=Custom SLA, 128=On-prem deployment.
Enriched output:
# product_tier_enriched
Premium Plus plan with API access, SSO, priority support, and on-prem deployment.
Supports up to 50 users.
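The bitwise flags decode with a simple AND; a sketch using the bit assignments from the prompt above (the function name is illustrative):

```python
# Bit assignments copied from the AI View prompt above.
FEATURE_FLAGS = {
    1: "API access", 2: "SSO", 4: "Advanced analytics", 8: "White-label",
    16: "Priority support", 32: "Dedicated account manager",
    64: "Custom SLA", 128: "On-prem deployment",
}

def decode_flags(flags: int) -> list:
    """Return the feature names whose bit is set in the flags integer."""
    return [name for bit, name in sorted(FEATURE_FLAGS.items()) if flags & bit]
```

Including the bit table in the prompt lets the LLM perform this decoding implicitly during enrichment; keeping a deterministic copy like this in your tests makes it easy to spot-check the generated summaries.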
Example 2: User Segments
Raw data:
// user_segment_raw.json
{
"user_id": "USR-7724",
"segment_id": 8,
"ltv_bucket": 4,
"churn_risk": 2
}
AI View prompt:
# user_segment_prompt
Translate user segment, LTV bucket, and churn risk into an internal sales summary.
Segments: 1=Trial, 2=Free, 3=Paid Monthly, 4=Paid Annual, 5=Enterprise Trial,
6=Enterprise Paid, 7=Churned, 8=VIP.
LTV buckets: 1=$0-500, 2=$500-2K, 3=$2K-10K, 4=$10K-50K, 5=$50K+.
Churn risk: 0=Low, 1=Medium-Low, 2=Medium, 3=Medium-High, 4=High.
Enriched output:
# user_segment_enriched
VIP customer with lifetime value between $10K-50K.
Currently shows medium churn risk — recommend proactive outreach.
Example 3: Error Codes
Raw data:
// error_code_raw.json
{
"error_id": "ERR-9284",
"error_code": "AUTH_403_QUOTA",
"service": "api_gateway",
"user_tier": 2
}
AI View prompt:
# error_code_prompt
Translate error codes into user-facing messages.
AUTH_403_QUOTA = User exceeded their API quota for their tier.
Tier 1 = 1K requests/day, Tier 2 = 10K/day, Tier 3 = 100K/day, Tier 4 = Unlimited.
Enriched output:
# error_code_enriched
You've exceeded your API quota of 10,000 requests per day (Tier 2).
Quota resets at midnight UTC. Upgrade to Tier 3 for 100K requests/day.
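Since this particular mapping is small and fully deterministic, it is also the kind of case where a plain lookup remains a reasonable alternative. For comparison, the deterministic equivalent is a few lines (illustrative sketch; quotas taken from the prompt above):

```python
# Tier quotas copied from the error-code prompt above; Tier 4 is unlimited.
TIER_QUOTAS = {1: 1_000, 2: 10_000, 3: 100_000}

def quota_message(tier: int) -> str:
    """Deterministic version of the enriched quota message."""
    quota = TIER_QUOTAS.get(tier)
    if quota is None:
        return "Your tier has no daily API quota."
    return f"You've exceeded your API quota of {quota:,} requests per day (Tier {tier})."
```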
Comparison: Traditional vs Shaped
| Component | Traditional Stack | Shaped |
|---|---|---|
| Lookup tables | PostgreSQL tables (manual maintenance) | AI View prompt (declarative) |
| Hydration service | Custom Python service (long-running) | AI View (automatic, serverless) |
| Enrichment timing | Post-retrieval (every query) | Pre-indexing (once per write) |
| Latency impact | +20-80ms per query | 0ms (materialized at write time) |
| Schema changes | Update lookup tables + restart service | Update AI View prompt, re-backfill |
| Missing codes | Silent failures, fallback to code | LLM infers from context in prompt |
| Infrastructure cost | Lookup service + read replicas (~$800/month) | Included in Shaped (~$0) |
| Code to maintain | ~400 lines (hydrator + lookups + pooling) | ~30 lines (YAML + prompt) |
| Time to production | 2-4 weeks | 1-2 days |
FAQ
Q: When should I use AI Views vs traditional lookup tables?
Use AI Views when you have structured codes (status IDs, tier codes, feature flags) that need to be translated into human-readable text for agent retrieval. Use traditional lookup tables when you need exact, deterministic mappings that never change or when enrichment logic is too complex for a prompt (multi-step business rules, external API calls).
Q: What happens if the AI View generates incorrect enrichment?
The raw columns are always preserved in the output. You can query both shipping_tier: 4 (exact) and ai_order_summary (enriched). If the enrichment is wrong, update the prompt and re-backfill. The source data is never modified.
Q: How much does AI enrichment cost?
Enrichment runs once per row at write time. For a table with 1M rows and 100 new rows/hour, the LLM cost is approximately $0.01 per 1K enrichments (using GPT-3.5-turbo). Total: approximately $10 for the initial backfill, and about $0.001/hour ongoing at that write rate. Compare this to the infrastructure cost of running a dedicated hydration service (approximately $800/month).
Q: Can I enrich multiple columns in one AI View?
Yes. You can generate multiple enriched output columns from the same source columns. For example, generate both ai_order_summary (customer-facing) and ai_internal_notes (support agent context) in one view.
Q: How do I handle codes that aren’t in the prompt?
The LLM will infer from context or return “Unknown tier X”. To handle new codes, update the AI View prompt with the new mappings and trigger a re-backfill for affected rows. This is faster than updating lookup tables and redeploying services.
Q: Does this work with real-time data?
Yes. AI Views run on every insert/update. If an order’s status changes from 7 (in transit) to 8 (out for delivery), the AI View re-enriches that row within 30 seconds. The agent always retrieves the latest enriched summary.
Q: Can I use AI Views for image or audio data?
Yes. AI Views support image URLs in source_columns. You can extract visual metadata (product color, style, defects) from images and materialize it as searchable text. Audio enrichment requires pre-processing (transcription) before the AI View.
Q: What if I need different enrichments for different use cases?
Create multiple AI Views from the same source table. For example, orders_customer_view for customer-facing summaries and orders_support_view for internal support context. Each view gets indexed separately and can be queried independently.
Conclusion
The “status code problem” is structural: databases store codes, agents need sentences. The traditional fix — post-retrieval hydration via lookup tables — works, but it doubles your latency, adds infrastructure, and increases operational complexity.
AI Views solve this at the ingestion layer. Enrichment runs once, at write time. The agent retrieves materialized, human-readable context in a single query. No lookup service, no two-phase retrieval, no inference-time LLM calls.
If your agent is retrieving shipping_tier: 4 instead of “Ships in 2 days via FedEx,” you have a semantic gap. AI Views close it.
Ready to close the semantic gap? Sign up for Shaped and get $300 in free credits. See how AI Views transform your agent’s retrieval quality. Visit console.shaped.ai/register to get started.