The 'Status Code' Problem: Using AI Views to Enrich Legacy Data

AI agents fail when your database stores shipping_tier: 4 instead of 'Ships in 2 days via FedEx.' Learn how to use AI Views to automatically materialize semantic summaries of structured data at ingestion—so your agent retrieves human-readable context, not status codes.

Quick Answer: Why Agents Can’t Read Your Database

An AI agent answers from the context it retrieves. If your retrieval index contains shipping_tier: 4, order_status: 7, and priority_level: 2, the agent has to guess what those mean. It can’t.

That’s not a model problem. That’s a semantic gap — your database speaks in codes, your agent needs sentences.

Key Takeaways:

  • Status codes break agents — shipping_tier: 4 is meaningless without a lookup table the agent doesn’t have
  • AI Views close the gap — Materialize human-readable summaries at ingestion: “Ships in 2 days via FedEx”
  • This runs once, at write time — Enrichment happens when data enters Shaped, not every time the agent queries
  • Works on any structured data — Product tiers, order statuses, shipping codes, user segments, error codes
  • Zero inference-time cost — The agent retrieves enriched text, not raw codes. No LLM call needed during retrieval

The Status Code Problem

Imagine a customer support agent for an e-commerce platform. A user asks:

User: “When will my order arrive?”

The agent retrieves this from the database:

// order_record.json
{
  "order_id": "ORD-2847",
  "user_id": "USR-9472",
  "shipping_tier": 4,
  "order_status": 7,
  "priority_level": 2,
  "created_at": "2026-02-15T14:23:00Z"
}

What does shipping_tier: 4 mean? The agent doesn’t know. It might mean:

  • Standard ground (5-7 days)
  • Express air (2 days)
  • Same-day delivery
  • International freight

The actual meaning lives in a lookup table somewhere — or worse, in tribal knowledge. The agent has no way to translate 4 into “Ships in 2 days via FedEx.”

This Breaks in Production

Here’s what happens when an agent retrieves status codes instead of semantic context:

Agent response:

“Your order ORD-2847 has shipping tier 4 and status 7.”

What the user needed:

“Your order ships in 2 days via FedEx and is currently in transit to the distribution center.”

The model didn’t fail. The retrieval layer handed it unusable data.

Why This Happens

Most production databases store structured, coded data for efficiency:

  • Shipping tiers → integers (1-10)
  • Order statuses → enums (PENDING, PROCESSING, SHIPPED, DELIVERED)
  • User segments → codes (GOLD, SILVER, BRONZE)
  • Priority levels → numbers (0-5)

This is correct database design. It’s compact, queryable, and joins efficiently. But it’s semantically opaque to an LLM.

The semantic gap:

# semantic_gap
Database column → shipping_tier: 4
Agent needs      → "Ships in 2 days via FedEx"

What AI Views Actually Do

An AI View is a materialized enrichment layer that runs at ingestion time. It takes structured columns from your source table and generates semantic summaries that get indexed alongside the raw data.

The Pipeline

Raw Data (shipping_tier: 4, order_status: 7, priority: 2)
    → AI View (LLM prompt + lookup rules)
    → Shaped Index (“Ships in 2 days via FedEx. Currently in transit.”)
    → Agent Retrieval (agent gets readable text)

Key Properties

1. Write-time enrichment — The AI View runs when data enters Shaped, not when the agent queries. Enrichment cost is paid once per record, not on every retrieval.

2. Materialized in the index — The enriched text is stored and indexed. The agent retrieves it like any other field, with zero inference-time latency.

3. Automatic updates — When a record updates (e.g., order_status: 7 becomes order_status: 8), the AI View re-enriches that row. The index stays fresh.

4. Source columns stay untouched — The raw shipping_tier: 4 is preserved. The AI View adds a new column (ai_shipping_summary); it doesn’t replace anything.

Part 1: The Traditional Fix (Post-Retrieval Enrichment)

The standard solution is to enrich after retrieval, in the agent’s context layer. You retrieve the raw codes, then hydrate them with a lookup service before passing context to the LLM.

Architecture

Agent Query
    → Vector DB Retrieval (returns shipping_tier: 4, order_status: 7)
    → Lookup Service (queries the shipping_tiers and order_statuses tables)
    → Context Hydration (“Ships in 2 days via FedEx. Currently in transit.”)
    → LLM Call (agent response)

This adds two round trips before the agent can answer: one to retrieve codes, one to look them up.

Implementation

Step 1: Shipping tier lookup table

-- shipping_tiers_lookup.sql
CREATE TABLE shipping_tiers (
  tier_id INT PRIMARY KEY,
  tier_name VARCHAR(100),
  description TEXT,
  estimated_days INT,
  carrier VARCHAR(50)
);

INSERT INTO shipping_tiers VALUES
  (1, 'Economy Ground', 'Standard ground shipping, 7-10 business days', 10, 'USPS'),
  (2, 'Standard Ground', 'Ground shipping, 5-7 business days', 7, 'UPS'),
  (3, 'Expedited', 'Expedited shipping, 3-5 business days', 5, 'UPS'),
  (4, 'Express', 'Express air shipping, 2 business days', 2, 'FedEx'),
  (5, 'Overnight', 'Overnight delivery, 1 business day', 1, 'FedEx'),
  (6, 'Same-Day', 'Same-day delivery in metro areas', 0, 'Local Courier');

Step 2: Order status lookup

-- order_statuses_lookup.sql
CREATE TABLE order_statuses (
  status_id INT PRIMARY KEY,
  status_name VARCHAR(50),
  customer_message TEXT
);

INSERT INTO order_statuses VALUES
  (1, 'PENDING', 'Your order is being processed'),
  (2, 'PAYMENT_CONFIRMED', 'Payment confirmed, order is being prepared'),
  (3, 'PICKING', 'Your order is being picked from the warehouse'),
  (4, 'PACKING', 'Your order is being packed for shipment'),
  (5, 'READY_TO_SHIP', 'Your order is ready to ship'),
  (6, 'LABEL_CREATED', 'Shipping label created'),
  (7, 'IN_TRANSIT', 'Your order is in transit to the distribution center'),
  (8, 'OUT_FOR_DELIVERY', 'Your order is out for delivery'),
  (9, 'DELIVERED', 'Your order has been delivered'),
  (10, 'EXCEPTION', 'There is an issue with your order');

Step 3: Context hydration service

# context_hydrator.py
import psycopg2
from typing import Dict

class ContextHydrator:
    def __init__(self, db_config):
        self.conn = psycopg2.connect(**db_config)

    def hydrate_order(self, order: Dict) -> str:
        """
        Takes raw order dict with codes, returns human-readable summary.
        """
        # Look up shipping tier
        shipping = self._lookup_shipping_tier(order['shipping_tier'])

        # Look up order status
        status = self._lookup_order_status(order['order_status'])

        # Look up priority
        priority_text = self._lookup_priority(order.get('priority_level', 0))

        # Assemble summary
        summary = f"""
Order {order['order_id']}:
- Shipping: {shipping['description']} (estimated {shipping['estimated_days']} days via {shipping['carrier']})
- Status: {status['customer_message']}
- Priority: {priority_text}
- Ordered: {order['created_at']}
        """.strip()

        return summary

    def _lookup_shipping_tier(self, tier_id: int) -> Dict:
        cursor = self.conn.cursor()
        cursor.execute("""
            SELECT tier_name, description, estimated_days, carrier
            FROM shipping_tiers
            WHERE tier_id = %s
        """, (tier_id,))
        row = cursor.fetchone()
        if not row:
            return {'description': f'Unknown tier {tier_id}', 'estimated_days': 0, 'carrier': 'Unknown'}
        return {
            'tier_name': row[0],
            'description': row[1],
            'estimated_days': row[2],
            'carrier': row[3]
        }

    def _lookup_order_status(self, status_id: int) -> Dict:
        cursor = self.conn.cursor()
        cursor.execute("""
            SELECT status_name, customer_message
            FROM order_statuses
            WHERE status_id = %s
        """, (status_id,))
        row = cursor.fetchone()
        if not row:
            return {'customer_message': f'Status code {status_id}'}
        return {'status_name': row[0], 'customer_message': row[1]}

    def _lookup_priority(self, priority: int) -> str:
        priority_map = {
            0: 'Standard priority',
            1: 'Low priority',
            2: 'Medium priority',
            3: 'High priority',
            4: 'Urgent',
            5: 'Critical'
        }
        return priority_map.get(priority, f'Priority level {priority}')


# Usage in agent retrieval layer
hydrator = ContextHydrator(db_config={
    'host': 'localhost',
    'database': 'ecommerce',
    'user': 'agent',
    'password': 'xxx'
})

# After retrieving from vector DB
raw_orders = vector_db.query("user order status", top_k=5)

# Hydrate each result
enriched_context = []
for order in raw_orders:
    enriched = hydrator.hydrate_order(order)
    enriched_context.append(enriched)

# Pass to LLM
context = "\n\n".join(enriched_context)
llm_response = llm.chat(user_query, context=context)
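Note that every hydrate_order call above fires live lookup queries against Postgres. Since lookup tables are small and change rarely, production hydrators usually cache them in process memory. A minimal sketch with functools.lru_cache, where an in-memory dict stands in for the shipping_tiers query:

```python
from functools import lru_cache

# Stand-in for the shipping_tiers table; in the real hydrator this dict
# would be replaced by the Postgres query shown above.
_SHIPPING_TIERS = {
    4: {"tier_name": "Express", "description": "Express air shipping, 2 business days",
        "estimated_days": 2, "carrier": "FedEx"},
}

@lru_cache(maxsize=256)
def lookup_shipping_tier_cached(tier_id: int) -> tuple:
    # lru_cache requires hashable return values, so return a tuple of items.
    # The first call per tier_id pays the lookup cost; repeats are served
    # from process memory without touching the database.
    row = _SHIPPING_TIERS.get(tier_id)
    if row is None:
        row = {"tier_name": "Unknown", "description": f"Unknown tier {tier_id}",
               "estimated_days": 0, "carrier": "Unknown"}
    return tuple(sorted(row.items()))

dict(lookup_shipping_tier_cached(4))["carrier"]  # → 'FedEx'
```

Caching cuts the per-query lookup latency, but it adds the invalidation problem: when a tier definition changes, every hydrator instance must drop its cache.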

What You’re Operating

| Component | What It Is | Failure Mode |
|---|---|---|
| Lookup tables | PostgreSQL tables with tier/status definitions | Schema drift, missing codes |
| Hydration service | Python service that queries lookup tables | Connection pool exhaustion, query latency |
| Two-phase retrieval | Vector query → lookup query → LLM | 2x latency, doubles infrastructure load |
| Deployment | Long-running Python process | Crashes, memory leaks |
| Monitoring | Custom metrics on lookup hit rate | Invisible when lookup fails silently |

The cost:

  • Latency: 50-150ms for vector retrieval + 20-80ms for lookup queries = 70-230ms before LLM call
  • Infrastructure: Dedicated lookup service + PostgreSQL read replicas
  • Code to maintain: approx. 400 lines (lookup tables, hydration logic, connection pooling, error handling)
  • Operational complexity: Monitor lookup service health, handle schema changes, cache invalidation

Part 2: The Shaped Way — AI Views

Shaped’s AI Views move enrichment from retrieval time to ingestion time. You define a view that takes source columns (the raw codes) and generates enriched output columns (the semantic summaries). This runs once when data enters the index.

Architecture

Your Database (orders table)
    → Shaped Ingestion (batch or streaming sync)
    → AI View: order_enrichment
        source_columns: [shipping_tier, order_status, priority_level]
        prompt: “Translate codes into customer-facing summary”
        enriched_output_columns: [ai_order_summary]
        (LLM enrichment runs once per row)
    → Shaped Index (raw and enriched stored together:
        shipping_tier: 4, order_status: 7,
        ai_order_summary: “Ships in 2 days via FedEx. Currently in transit.”)
    → Agent Retrieval (gets raw codes AND enriched summary in one query)

Implementation

Step 1: Connect your source table

# orders_table.yaml
version: v2
name: orders
schema_type: POSTGRES
host: postgres.example.com
port: 5432
database: ecommerce
table_name: orders
schema:
  - name: order_id
    type: STRING
  - name: user_id
    type: STRING
  - name: shipping_tier
    type: INTEGER
  - name: order_status
    type: INTEGER
  - name: priority_level
    type: INTEGER
  - name: created_at
    type: TIMESTAMP
  - name: updated_at
    type: TIMESTAMP
# terminal
$ shaped create-table --file orders_table.yaml

Step 2: Create the AI View

# create_ai_view.sh
curl -X POST "https://api.shaped.ai/v2/views" \
  -H "Content-Type: application/json" \
  -H "x-api-key: $SHAPED_API_KEY" \
  -d '{
    "name": "orders_with_enrichment",
    "view_type": "AI_ENRICHMENT",
    "source_table": "orders",
    "source_columns": [
      "shipping_tier",
      "order_status",
      "priority_level",
      "created_at"
    ],
    "source_columns_in_output": [
      "order_id",
      "user_id",
      "shipping_tier",
      "order_status"
    ],
    "enriched_output_columns": [
      "ai_order_summary"
    ],
    "prompt": "You are a customer support assistant. Translate the shipping tier, order status, and priority level into a clear, customer-facing summary. Use this mapping:\n\nShipping tiers:\n1 = Economy Ground (7-10 days, USPS)\n2 = Standard Ground (5-7 days, UPS)\n3 = Expedited (3-5 days, UPS)\n4 = Express (2 days, FedEx)\n5 = Overnight (1 day, FedEx)\n6 = Same-Day (metro areas, Local Courier)\n\nOrder statuses:\n1 = Being processed\n2 = Payment confirmed, being prepared\n3 = Being picked from warehouse\n4 = Being packed\n5 = Ready to ship\n6 = Shipping label created\n7 = In transit to distribution center\n8 = Out for delivery\n9 = Delivered\n10 = Exception / issue\n\nPriority levels:\n0 = Standard\n1 = Low\n2 = Medium\n3 = High\n4 = Urgent\n5 = Critical\n\nGenerate a 2-3 sentence summary that tells the customer when they will receive their order and what is happening now."
  }'
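The same request can be issued from Python with only the standard library. This sketch mirrors the curl call above (same endpoint, headers, and payload fields; the enrichment prompt is abbreviated here with an ellipsis):

```python
import json
import urllib.request

# Payload copied from the curl example above (prompt shortened for readability).
AI_VIEW_PAYLOAD = {
    "name": "orders_with_enrichment",
    "view_type": "AI_ENRICHMENT",
    "source_table": "orders",
    "source_columns": ["shipping_tier", "order_status", "priority_level", "created_at"],
    "source_columns_in_output": ["order_id", "user_id", "shipping_tier", "order_status"],
    "enriched_output_columns": ["ai_order_summary"],
    "prompt": "You are a customer support assistant. Translate the shipping tier, "
              "order status, and priority level into a clear, customer-facing summary. ...",
}

def create_ai_view(api_key: str) -> dict:
    # One POST creates the view; Shaped then enriches rows at ingestion time.
    req = urllib.request.Request(
        "https://api.shaped.ai/v2/views",
        data=json.dumps(AI_VIEW_PAYLOAD).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```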

What this does:

  • Reads shipping_tier, order_status, priority_level, created_at from the orders table
  • Passes them to an LLM with the enrichment prompt
  • LLM generates a customer-facing summary in the ai_order_summary column
  • This enrichment runs once when the row is inserted or updated
  • The enriched view is now a queryable table: orders_with_enrichment

Output example:

Given this input row:

// input_row.json
{
  "order_id": "ORD-2847",
  "user_id": "USR-9472",
  "shipping_tier": 4,
  "order_status": 7,
  "priority_level": 2,
  "created_at": "2026-02-15T14:23:00Z"
}

The AI View generates:

// enriched_output.json
{
  "order_id": "ORD-2847",
  "user_id": "USR-9472",
  "shipping_tier": 4,
  "order_status": 7,
  "ai_order_summary": "Your order ships via FedEx Express with an estimated delivery in 2 business days. It's currently in transit to the distribution center. This is a medium-priority order placed on February 15, 2026."
}

Step 3: Index the enriched view in your engine

# agent_engine.yaml
version: v2
name: customer_support_agent
data:
  order_table:
    name: orders_with_enrichment  # Use the AI View as the data source
    type: table
encoder:
  name: text-embedding-3-small
  provider: openai
  columns:
    - name: ai_order_summary  # This is what gets embedded and retrieved
      weight: 1.0
# terminal
$ shaped create-engine --file agent_engine.yaml

Step 4: Query from your agent

# agent_with_shaped.py
import openai
import requests

SHAPED_API_KEY = "your-api-key"

def get_order_context(user_query: str, user_id: str):
    """
    Retrieve enriched order context for the agent.
    """
    response = requests.post(
        "https://api.shaped.ai/v2/rank",
        headers={"x-api-key": SHAPED_API_KEY},
        json={
            "engine_name": "customer_support_agent",
            "query": user_query,
            "candidates": {
                "table": "orders_with_enrichment",
                "filter": f"user_id = '{user_id}'"
            },
            "limit": 5
        }
    )

    results = response.json()

    # Extract enriched summaries
    context = []
    for result in results['results']:
        context.append(result['ai_order_summary'])

    return "\n\n".join(context)


# Agent workflow
user_query = "When will my order arrive?"
user_id = "USR-9472"

# One retrieval call gets enriched context
enriched_context = get_order_context(user_query, user_id)

# Pass to LLM
llm_response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a customer support agent."},
        {"role": "user", "content": f"Context:\n{enriched_context}\n\nQuestion: {user_query}"}
    ]
)

print(llm_response.choices[0].message.content)

What the agent gets back:

# agent_output
Your order ships via FedEx Express with an estimated delivery in 2 business days.
It's currently in transit to the distribution center.
This is a medium-priority order placed on February 15, 2026.

The agent now has semantic, human-readable context in a single retrieval call. No lookup tables, no hydration service, no two-phase queries.

Real-World Examples

Example 1: Product Tiers

Raw data:

// product_tier_raw.json
{
  "product_id": "PROD-4821",
  "tier_code": "PREM_PLUS",
  "feature_flags": 147,
  "max_users": 50
}

AI View prompt:

# product_tier_prompt
Translate the tier code and feature flags into a customer-facing product description.
Tier codes: BASIC, STANDARD, PREM, PREM_PLUS, ENTERPRISE.
Feature flags are bitwise: 1=API access, 2=SSO, 4=Advanced analytics, 8=White-label,
16=Priority support, 32=Dedicated account manager, 64=Custom SLA, 128=On-prem deployment.

Enriched output:

# product_tier_enriched
Premium Plus plan with API access, SSO, priority support, and on-prem
deployment. Supports up to 50 users.
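The flag arithmetic is worth spelling out: 147 = 128 + 16 + 2 + 1, which decodes to on-prem deployment, priority support, SSO, and API access. A small deterministic decoder for the mapping in the prompt, useful for spot-checking the AI View’s output:

```python
# Bitwise feature flags, copied from the prompt above.
FEATURE_FLAGS = {
    1: "API access",
    2: "SSO",
    4: "Advanced analytics",
    8: "White-label",
    16: "Priority support",
    32: "Dedicated account manager",
    64: "Custom SLA",
    128: "On-prem deployment",
}

def decode_flags(value: int) -> list[str]:
    # A feature is enabled when its bit is set in the integer column.
    return [name for bit, name in FEATURE_FLAGS.items() if value & bit]

decode_flags(147)  # → ['API access', 'SSO', 'Priority support', 'On-prem deployment']
```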

Example 2: User Segments

Raw data:

// user_segment_raw.json
{
  "user_id": "USR-7724",
  "segment_id": 8,
  "ltv_bucket": 4,
  "churn_risk": 2
}

AI View prompt:

# user_segment_prompt
Translate user segment, LTV bucket, and churn risk into an internal sales summary.
Segments: 1=Trial, 2=Free, 3=Paid Monthly, 4=Paid Annual, 5=Enterprise Trial,
6=Enterprise Paid, 7=Churned, 8=VIP.
LTV buckets: 1=$0-500, 2=$500-2K, 3=$2K-10K, 4=$10K-50K, 5=$50K+.
Churn risk: 0=Low, 1=Medium-Low, 2=Medium, 3=Medium-High, 4=High.

Enriched output:

# user_segment_enriched
VIP customer with lifetime value in the $10K-$50K range.
Currently shows medium churn risk — recommend proactive outreach.

Example 3: Error Codes

Raw data:

// error_code_raw.json
{
  "error_id": "ERR-9284",
  "error_code": "AUTH_403_QUOTA",
  "service": "api_gateway",
  "user_tier": 2
}

AI View prompt:

# error_code_prompt
Translate error codes into user-facing messages.
AUTH_403_QUOTA = User exceeded their API quota for their tier.
Tier 1 = 1K requests/day, Tier 2 = 10K/day, Tier 3 = 100K/day, Tier 4 = Unlimited.

Enriched output:

# error_code_enriched
You've exceeded your API quota of 10,000 requests per day (Tier 2).
Quota resets at midnight UTC. Upgrade to Tier 3 for 100K requests/day.
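Since this quota mapping is fixed, the same translation can be written as deterministic code. This sketch (tier quotas copied from the prompt above) is one way to validate what the AI View generates:

```python
# Per-tier daily quotas, copied from the prompt above.
TIER_QUOTAS = {1: "1,000", 2: "10,000", 3: "100,000", 4: "unlimited"}

def quota_message(error_code: str, user_tier: int) -> str:
    # Translate the coded error into the user-facing message the view produces.
    if error_code == "AUTH_403_QUOTA":
        return (f"You've exceeded your API quota of {TIER_QUOTAS[user_tier]} "
                f"requests per day (Tier {user_tier}).")
    return f"An error occurred: {error_code}"

quota_message("AUTH_403_QUOTA", 2)
# → "You've exceeded your API quota of 10,000 requests per day (Tier 2)."
```

Where the rules are this exact, a hybrid works well: deterministic lookups for the code-to-text mapping, with the LLM only stitching the pieces into fluent prose.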

Comparison: Traditional vs Shaped

| Component | Traditional Stack | Shaped |
|---|---|---|
| Lookup tables | PostgreSQL tables (manual maintenance) | AI View prompt (declarative) |
| Hydration service | Custom Python service (long-running) | AI View (automatic, serverless) |
| Enrichment timing | Post-retrieval (every query) | Pre-indexing (once per write) |
| Latency impact | +20-80ms per query | 0ms (materialized at write time) |
| Schema changes | Update lookup tables + restart service | Update AI View prompt, re-backfill |
| Missing codes | Silent failures, fallback to code | LLM infers from context in prompt |
| Infrastructure cost | Lookup service + read replicas (~$800/month) | Included in Shaped (~$0) |
| Code to maintain | ~400 lines (hydrator + lookups + pooling) | ~30 lines (YAML + prompt) |
| Time to production | 2-4 weeks | 1-2 days |

FAQ

Q: When should I use AI Views vs traditional lookup tables?

Use AI Views when you have structured codes (status IDs, tier codes, feature flags) that need to be translated into human-readable text for agent retrieval. Use traditional lookup tables when you need exact, deterministic mappings that never change or when enrichment logic is too complex for a prompt (multi-step business rules, external API calls).

Q: What happens if the AI View generates incorrect enrichment?

The raw columns are always preserved in the output. You can query both shipping_tier: 4 (exact) and ai_order_summary (enriched). If the enrichment is wrong, update the prompt and re-backfill. The source data is never modified.

Q: How much does AI enrichment cost?

Enrichment runs once per row at write time. For a table with 1M rows and 100 new rows/hour, the LLM cost is approximately $0.01 per 1K enrichments (using GPT-3.5-turbo). That works out to roughly $10 for the initial backfill and about $0.001/hour (a few cents per day) ongoing. Compare this to the infrastructure cost of running a dedicated hydration service (approximately $800/month).

Q: Can I enrich multiple columns in one AI View?

Yes. You can generate multiple enriched output columns from the same source columns. For example, generate both ai_order_summary (customer-facing) and ai_internal_notes (support agent context) in one view.

Q: How do I handle codes that aren’t in the prompt?

The LLM will infer from context or return “Unknown tier X”. To handle new codes, update the AI View prompt with the new mappings and trigger a re-backfill for affected rows. This is faster than updating lookup tables and redeploying services.

Q: Does this work with real-time data?

Yes. AI Views run on every insert/update. If an order’s status changes from 7 (in transit) to 8 (out for delivery), the AI View re-enriches that row within 30 seconds. The agent always retrieves the latest enriched summary.

Q: Can I use AI Views for image or audio data?

Yes. AI Views support image URLs in source_columns. You can extract visual metadata (product color, style, defects) from images and materialize it as searchable text. Audio enrichment requires pre-processing (transcription) before the AI View.

Q: What if I need different enrichments for different use cases?

Create multiple AI Views from the same source table. For example, orders_customer_view for customer-facing summaries and orders_support_view for internal support context. Each view gets indexed separately and can be queried independently.

Conclusion

The “status code problem” is structural: databases store codes, agents need sentences. The traditional fix — post-retrieval hydration via lookup tables — works, but it doubles your latency, adds infrastructure, and increases operational complexity.

AI Views solve this at the ingestion layer. Enrichment runs once, at write time. The agent retrieves materialized, human-readable context in a single query. No lookup service, no two-phase retrieval, no inference-time LLM calls.

If your agent is retrieving shipping_tier: 4 instead of “Ships in 2 days via FedEx,” you have a semantic gap. AI Views close it.

Ready to close the semantic gap? Sign up for Shaped and get $300 in free credits. See how AI Views transform your agent’s retrieval quality. Visit console.shaped.ai/register to get started.

