Quick Answer: Your Agent’s Logic Should Live in Git
You have 47 versions of your application code in Git. You can diff, rollback, and deploy them through CI/CD. But your agent’s retrieval logic—the ranking expressions, filters, and signal definitions that determine what context it sees—lives in:
- A hardcoded string in your application
- A prompt template in production without version history
- Manual configuration in a web UI with no audit trail
- Environment variables scattered across deployment configs
When your agent breaks, you can’t diff what changed. When you need to A/B test retrieval strategies, you duplicate code. When you want to roll back a bad ranking model, you hope you remember the previous configuration.
This is infrastructure as chaos, not infrastructure as code.
Key Takeaways:
- Retrieval logic is infrastructure — Engines, queries, and ranking models should be versioned like any other code
- YAML > Hardcoded strings — Declarative configuration files enable diff, review, and rollback
- Git is your source of truth — Changes to agent behavior go through PR review, not live edits in production
- CI/CD deploys your agent — Merge to main triggers automated deployment with zero downtime
- Shaped CLI enables GitOps — `shaped create-engine --file engine.yaml` deploys from your repo
Time to read: 20 minutes | Includes: 8 code examples, 2 architecture diagrams, 1 deployment workflow
The Version Control Gap
Imagine your product recommendation agent breaks. Conversion rate drops 40% overnight. You need to find what changed.
For your application code, this is trivial:
```bash
git log --oneline --since="1 week ago"
git diff HEAD~5 HEAD -- agent/retrieval.py
```
You see exactly what changed. You can roll back with `git revert`. You can compare behavior across branches.
But for your agent’s retrieval logic—the part that determines which products get ranked, which filters apply, which signals feed the model—you have:
Option 1: Hardcoded in application
```python
# agent.py (committed to Git ✓)
def get_recommendations(user_id, query):
    results = vector_db.search(
        query=query,
        filter="category='electronics' AND price < 500",  # ← Hardcoded
        limit=100
    )
    # Rank by score formula (also hardcoded)
    scored = [(r, r['embedding_score'] * 0.6 + r['popularity'] * 0.4) for r in results]
    return sorted(scored, key=lambda x: x[1], reverse=True)[:10]
```
This is versioned in Git, but changing the filter requires a code deploy. Every ranking experiment needs a new application release.
Option 2: Environment variables
```bash
# .env (NOT in Git ✗)
RETRIEVAL_FILTER="category='electronics' AND price < 500"
RANKING_FORMULA="embedding_score * 0.6 + popularity * 0.4"
```
This decouples retrieval from code, but now your configuration has no version history. You don’t know when `RANKING_FORMULA` changed or why.
Option 3: Database-stored config
```python
# Stored in PostgreSQL config table
config = db.query("SELECT value FROM agent_config WHERE key = 'ranking_formula'").fetchone()
ranking_formula = eval(config['value'])  # ← Executing arbitrary strings from DB
```
Now you have an audit trail (if you log changes), but no diff, no PR review, and no way to test changes in staging before production.
Option 4: Manual UI configuration
You log into a vector DB console and change filters/ranking via web forms. Changes take effect immediately. No history, no rollback, no review.
The Problem
None of these approaches treat retrieval logic as infrastructure—versioned, reviewed, tested, deployed through CI/CD.
When your agent breaks:
- You can’t diff what changed in retrieval logic vs application code
- You can’t roll back just the ranking model without a full code deploy
- You can’t A/B test retrieval strategies across branches
- You can’t review changes to filters/signals/ranking in a PR
- You can’t deploy staging → production with confidence
What GitOps for Agents Looks Like
GitOps for agents means: All agent configuration lives in Git as declarative YAML files.
The Repository Structure
```
my-agent/
├── src/
│   └── agent.py                      # Application code
├── infrastructure/
│   ├── engines/
│   │   ├── product_search.yaml       # Search engine config
│   │   ├── recommendations.yaml      # Recommendation engine
│   │   └── content_feed.yaml         # Feed ranking
│   ├── signals/
│   │   └── features.yaml             # Feature definitions
│   └── queries/
│       ├── personalized_search.sql   # Saved query templates
│       └── trending_products.sql
├── .github/
│   └── workflows/
│       └── deploy.yml                # CI/CD pipeline
└── README.md
```
What’s in Git: Engine definitions (indexing, models, deployment config), signal definitions (features, aggregations, crosses), query templates (ranking expressions, filters), and deployment configuration (replicas, data tier, auto-scaling).
What’s NOT in Git: API keys (stored in GitHub Secrets), model weights (stored in Shaped, versioned automatically), and runtime data (logs, metrics).
The GitOps Workflow
```bash
shaped plan --file engine.yaml
```
Key benefits:
- Every change goes through review — No one can change ranking logic without a PR
- Diff-able — `git diff` shows exactly what changed in retrieval behavior
- Rollback-able — `git revert` + redeploy restores previous config
- Testable — Deploy to staging first, run automated tests
- Auditable — Full history of who changed what and when
Part 1: The Traditional Approach (Hardcoded Retrieval)
The standard approach is to hardcode retrieval logic in your application, with configuration split between code, environment variables, and manual UI settings.
Architecture
Problems:
- Configuration is fragmented (code + env + UI)
- No single source of truth
- Changes require code deploys OR manual UI updates
- No version history for non-code config
- Difficult to A/B test or stage changes
Implementation
Step 1: Hardcoded retrieval in application
```python
# agent.py
import os

from pinecone import Pinecone
from sentence_transformers import SentenceTransformer

pc = Pinecone(api_key=os.getenv('PINECONE_API_KEY'))
index = pc.Index("product-index")
embedder = SentenceTransformer('all-MiniLM-L6-v2')  # loaded once at import time

def search_products(user_id: str, query: str):
    """
    Search for products.
    All logic is hardcoded or in env vars.
    """
    query_vector = embedder.encode(query).tolist()
    results = index.query(
        vector=query_vector,
        filter={
            "category": {"$eq": "electronics"},
            "price": {"$lt": 500},
            "in_stock": {"$eq": True}
        },
        top_k=100,
        include_metadata=True
    )
    # The env var exists, but the formula below is still hardcoded —
    # the two can silently drift apart.
    ranking_formula = os.getenv(
        'RANKING_FORMULA',
        'embedding_score * 0.6 + popularity * 0.4'
    )
    scored_results = []
    for match in results['matches']:
        embedding_score = match['score']
        popularity = match['metadata'].get('view_count', 0) / 10000
        score = embedding_score * 0.6 + popularity * 0.4
        scored_results.append({
            'product_id': match['id'],
            'name': match['metadata']['name'],
            'score': score
        })
    scored_results.sort(key=lambda x: x['score'], reverse=True)
    return scored_results[:10]
```
Step 2: Environment configuration
```bash
# .env (not in version control)
PINECONE_API_KEY=your-key
RANKING_FORMULA=embedding_score * 0.6 + popularity * 0.4
TOP_K=100
PRICE_LIMIT=500
```
What You’re Operating
| Component | Config Method | Version Control | Rollback |
|---|---|---|---|
| Filter logic | Hardcoded Python | Yes (Git) | Code deploy |
| Ranking formula | Environment variable | No | Manual change + restart |
| Index settings | Vector DB UI | No | Remember + recreate |
| Embedding model | Hardcoded in code | Yes (Git) | Code deploy |
| Top-K / limits | Environment variable | No | Manual change + restart |
The problem: Changing retrieval behavior requires mixing code deploys, environment updates, and manual UI changes. There’s no single workflow, no unified version history.
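This drift is easy to reproduce: when the same formula lives both in an env var and as a code fallback, nothing enforces that they agree. A minimal sketch (all names here are illustrative, not from any real deployment):

```python
import os

# The formula is defined twice: once as a hardcoded fallback, once in .env.
HARDCODED_FORMULA = "embedding_score * 0.6 + popularity * 0.4"

def effective_formula() -> str:
    """Return whichever formula actually applies — the env var wins if set."""
    return os.getenv("RANKING_FORMULA", HARDCODED_FORMULA)

# Ops changes a weight via .env, but the code fallback never changes:
os.environ["RANKING_FORMULA"] = "embedding_score * 0.8 + popularity * 0.2"

# Git history now shows one formula while production runs another,
# and neither `git log` nor `git diff` can explain the discrepancy.
assert effective_formula() != HARDCODED_FORMULA
```

The assert at the end is exactly the failure mode described above: the repository and the running system disagree, and only the running system knows it.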
Part 2: The Shaped Way — Declarative Engines
Shaped treats engines as declarative infrastructure. You define your entire retrieval pipeline in a YAML file that lives in Git.
Architecture
- ✓ Validates config
- ✓ Compiles to execution graph
- ✓ Deploys with zero downtime
- ✓ Versions model weights
Everything is versioned, everything is code.
Implementation
Step 1: Define engine in YAML
```yaml
# infrastructure/engines/product_search.yaml
version: v2
name: product_search

data:
  item_table:
    name: products
    type: table
  user_table:
    name: users
    type: table
  interaction_table:
    name: user_clicks
    type: table

index:
  - name: product_embedding
    encoder:
      name: text-embedding-3-small
      provider: openai
    columns:
      - name: product_name
        weight: 0.6
      - name: description
        weight: 0.4

training:
  models:
    - name: collaborative_ranker
      policy_type: elsa
      strategy: early_stopping

deployment:
  data_tier: fast_tier
  server:
    worker_count: 4
  autoscaling:
    enabled: true
    min_instances: 2
    max_instances: 10
    target_qps: 1000
```
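CI runs `shaped validate`, but a cheap local pre-check can catch a missing section before a commit ever happens. A hypothetical sketch — the required-section list below is an assumption for illustration, not Shaped’s published schema:

```python
# Hypothetical pre-commit check. The section list is an assumption —
# `shaped validate` remains the real authority on the schema.
REQUIRED_SECTIONS = ("version", "name", "data", "index", "training", "deployment")

def missing_sections(config: dict) -> list:
    """Return required top-level sections absent from a parsed engine config."""
    return [key for key in REQUIRED_SECTIONS if key not in config]

# A draft config with no `deployment` block is flagged before CI runs:
draft = {"version": "v2", "name": "product_search",
         "data": {}, "index": [], "training": {}}
print(missing_sections(draft))  # → ['deployment']
```

Wiring this into a pre-commit hook keeps obviously broken YAML out of the PR queue entirely.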
Step 2: Define ranking logic in ShapedQL
```sql
-- infrastructure/queries/personalized_search.sql
SELECT
    item_id,
    product_name,
    price,
    category,
    score(
        expression = '
            (1.0 / (1.0 + rank(embedding="product_embedding"))) * 0.6
            + (1.0 / (1.0 + item._derived_popular_rank)) * 0.3
            + user_category_affinity * 0.1
        '
    ) AS final_score
FROM products
WHERE
    category = 'electronics'
    AND price < 500
    AND in_stock = true
ORDER BY final_score DESC
LIMIT 10
```
Step 3: Deploy via CLI in CI/CD
```yaml
# .github/workflows/deploy-agent.yml
name: Deploy Agent Infrastructure

on:
  push:
    branches: [main]
    paths:
      - 'infrastructure/engines/**'
      - 'infrastructure/queries/**'

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Install Shaped CLI
        run: pip install shaped-cli

      - name: Validate engine config
        run: shaped validate --file infrastructure/engines/product_search.yaml
        env:
          SHAPED_API_KEY: ${{ secrets.SHAPED_API_KEY }}

      - name: Deploy to staging
        run: |
          shaped create-engine \
            --file infrastructure/engines/product_search.yaml \
            --env staging
        env:
          SHAPED_API_KEY: ${{ secrets.SHAPED_API_KEY_STAGING }}

      - name: Run integration tests
        run: python tests/test_retrieval.py --env staging

      - name: Deploy to production
        run: |
          shaped create-engine \
            --file infrastructure/engines/product_search.yaml \
            --env production
        env:
          SHAPED_API_KEY: ${{ secrets.SHAPED_API_KEY_PROD }}
```
Step 4: Query from application (no config in code)
```python
# agent.py — now just API calls, no retrieval config
import os

import requests

SHAPED_API_KEY = os.getenv('SHAPED_API_KEY')

def search_products(user_id: str, query: str):
    response = requests.post(
        "https://api.shaped.ai/v2/rank",
        headers={"x-api-key": SHAPED_API_KEY},
        json={
            "engine_name": "product_search",
            "user_id": user_id,
            "query": query,
            # Ranking logic lives in Git, not here
        },
        timeout=5,
    )
    response.raise_for_status()
    return response.json()['results']
```
Now your application code is thin—it just calls the Shaped API. All retrieval logic lives in version-controlled YAML.
The GitOps Workflow in Practice
1. Make Changes via PR
A developer wants to change the ranking formula:
```bash
# Create branch
git checkout -b experiment/boost-recency

# Edit engine config
vim infrastructure/engines/product_search.yaml
```

```diff
--- a/infrastructure/engines/product_search.yaml
+++ b/infrastructure/engines/product_search.yaml
@@ -23,7 +23,8 @@ training:
     - name: collaborative_ranker
       policy_type: elsa
       ranking_expression: |
-        (1.0 / (1.0 + rank(embedding="product_embedding"))) * 0.6
-        + (1.0 / (1.0 + item._derived_popular_rank)) * 0.4
+        (1.0 / (1.0 + rank(embedding="product_embedding"))) * 0.5
+        + (1.0 / (1.0 + item._derived_popular_rank)) * 0.3
+        + (1.0 / (1.0 + days_since_published)) * 0.2
```
Reviewers can see exactly what changed—recency boost added at 20% weight, embedding down from 60% to 50%, popularity down from 40% to 30%. They can comment on the weights, suggest alternatives, and approve once satisfied.
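Reviews like this can also be backed by an automated guard. Below is a sketch of a hypothetical CI check that pulls the trailing `* weight` terms out of a ranking expression and verifies they sum to 1.0 — both the regex and the sum-to-one convention are assumptions for illustration, not Shaped requirements:

```python
import re

def ranking_weights(expression: str) -> list:
    """Extract the trailing `* <number>` weights from a ranking expression."""
    return [float(w) for w in re.findall(r"\*\s*([0-9]*\.?[0-9]+)", expression)]

expression = """
(1.0 / (1.0 + rank(embedding="product_embedding"))) * 0.5
+ (1.0 / (1.0 + item._derived_popular_rank)) * 0.3
+ (1.0 / (1.0 + days_since_published)) * 0.2
"""

weights = ranking_weights(expression)
assert abs(sum(weights) - 1.0) < 1e-9, f"weights {weights} do not sum to 1.0"
```

Run as a CI step, this turns a reviewer’s mental arithmetic into a failing check whenever a PR rebalances weights and the total drifts from 1.0.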
2. CI Validates
```bash
shaped validate --file infrastructure/engines/product_search.yaml
# ✓ YAML is valid
# ✓ All referenced columns exist in data tables
# ✓ Ranking expression syntax is correct
# ✓ No breaking changes detected
```
3. Deploy to Staging, Test, Promote
```python
# tests/test_retrieval.py
def test_recency_boost():
    """Verify recent products rank higher after the change."""
    results = search_products(user_id="test-user", query="laptop")
    recent_items = [r for r in results[:5] if r['days_since_published'] < 7]
    assert len(recent_items) >= 2, "Recent items should rank higher"
```
4. Rollback if Needed
```bash
git revert HEAD
git push origin main
# CI/CD automatically redeploys previous config
```
Or manually: `shaped rollback-engine --name product_search --to-version v23`
Real-World Examples
Example 1: A/B Test Ranking Strategies
Deploy two engine variants simultaneously, split traffic in application code, measure, commit the winner to main.
```yaml
# infrastructure/engines/product_search_popular.yaml
training:
  models:
    - name: popular_ranker
      ranking_expression: |
        embedding * 0.4 + popularity * 0.6
```

```yaml
# infrastructure/engines/product_search_personalized.yaml
training:
  models:
    - name: personalized_ranker
      ranking_expression: |
        embedding * 0.3 + user_affinity * 0.7
```
```python
# agent.py
import random

def search_products(user_id, query):
    engine = "product_search_popular" if random.random() < 0.5 else "product_search_personalized"
    return shaped.rank(engine_name=engine, user_id=user_id, query=query)
```
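One caveat with `random.random()`: a user can flip between variants on every request, which muddies per-user metrics. A common alternative is deterministic bucketing by hashing the user ID — a sketch of that pattern, not part of the Shaped API:

```python
import hashlib

VARIANTS = ("product_search_popular", "product_search_personalized")

def assign_engine(user_id: str) -> str:
    """Deterministically bucket a user so their variant is stable across requests."""
    digest = hashlib.sha256(user_id.encode()).digest()
    bucket = digest[0] / 255  # first hash byte mapped to [0, 1]
    return VARIANTS[0] if bucket < 0.5 else VARIANTS[1]

# The same user always lands in the same bucket:
assert assign_engine("user-42") == assign_engine("user-42")
```

Because assignment depends only on the user ID, the split survives restarts and scales across stateless application instances without any shared storage.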
Example 2: Multi-Environment Config
Staging runs an aggressive recency experiment while production stays conservative—two branches, two YAML files, same workflow.
```yaml
# staging branch: product_search.yaml
ranking_expression: |
  embedding * 0.3 + recency * 0.5 + popularity * 0.2
```

```yaml
# main branch: product_search.yaml
ranking_expression: |
  embedding * 0.5 + recency * 0.2 + popularity * 0.3
```
Comparison: Traditional vs. GitOps with Shaped
| Aspect | Traditional | GitOps w/ Shaped |
|---|---|---|
| Version Control | Partial (code only) | Complete (config + code) |
| Rollback | Full app redeploy | Single git revert |
| A/B Testing | Manual code duplication | Branch-based experiments |
| Review Process | Code + manual UI changes | Single PR diff |
| Deployment | Tied to releases | Independent CI/CD |
| Audit Trail | Git log only | Git + Shaped versioning |
| Debugging | "Why did this change?" → ??? | "Why did this change?" → `git log -p` |
FAQ
Q: Do I have to use Shaped to do GitOps for agents?
Not necessarily. You can build similar workflows with any infrastructure that supports declarative configuration (Terraform, Ansible, custom tools). Shaped is purpose-built for retrieval, so the workflow is simpler.
Q: Won’t GitOps slow down my experimentation?
The opposite. PR review actually accelerates experiments because multiple people can spot issues before production, you can revert instantly if something breaks, staging deploys are automatic and fast, and you have a complete history of what worked.
Q: What if I need to change ranking in real-time (e.g., emergency flash sale)?
Shaped supports both static (versioned) and dynamic configuration. You can have base rankings in Git and override them via API for time-limited events.
Q: How do I handle secrets (API keys, model credentials)?
Keep secrets in GitHub Secrets or your CI/CD provider. The YAML references them by name but never contains actual values.
Q: Can I revert just the ranking formula without changing filters?
Yes. Each component (embeddings, models, filters, signals) is independently versioned. You can roll back any single piece.
Getting Started
To implement GitOps for your AI agents:
- Audit your current setup — Where does your retrieval logic live? (code, env, UI, DB?)
- Export to declarative format — Convert it to YAML. This is a one-time effort.
- Set up CI/CD — Add validation and deployment steps to your pipeline
- Build in stages — Start with staging, validate, then production
- Monitor and iterate — Track which experiments work, commit winners to main
The investment pays for itself the first time you need to roll back, debug, or run an A/B test.