GitOps for AI Agents: Versioning Your Retrieval Logic

You wouldn't deploy code without version control. Why deploy your agent's retrieval logic as a hardcoded prompt? Learn how to declare engines, queries, and ranking logic as YAML files that live in your repo and deploy via CI/CD.

Quick Answer: Your Agent’s Logic Should Live in Git

You have 47 versions of your application code in Git. You can diff, rollback, and deploy them through CI/CD. But your agent’s retrieval logic—the ranking expressions, filters, and signal definitions that determine what context it sees—lives in:

  • A hardcoded string in your application
  • A prompt template in production without version history
  • Manual configuration in a web UI with no audit trail
  • Environment variables scattered across deployment configs

When your agent breaks, you can’t diff what changed. When you need to A/B test retrieval strategies, you duplicate code. When you want to rollback a bad ranking model, you hope you remember the previous configuration.

This is infrastructure as chaos, not infrastructure as code.

Key Takeaways:

  • Retrieval logic is infrastructure — Engines, queries, and ranking models should be versioned like any other code
  • YAML > Hardcoded strings — Declarative configuration files enable diff, review, and rollback
  • Git is your source of truth — Changes to agent behavior go through PR review, not live edits in production
  • CI/CD deploys your agent — Merge to main triggers automated deployment with zero downtime
  • Shaped CLI enables GitOps — shaped create-engine --file engine.yaml deploys from your repo

Time to read: 20 minutes | Includes: 8 code examples, 2 architecture diagrams, 1 deployment workflow


The Version Control Gap

Imagine your product recommendation agent breaks. Conversion rate drops 40% overnight. You need to find what changed.

For your application code, this is trivial:

git log --oneline --since="1 week ago"
git diff HEAD~5 HEAD -- agent/retrieval.py

You see exactly what changed. You can rollback with git revert. You can compare behavior across branches.

But for your agent’s retrieval logic—the part that determines which products get ranked, which filters apply, which signals feed the model—you have:

Option 1: Hardcoded in application

# agent.py (committed to Git ✓)
def get_recommendations(user_id, query):
    results = vector_db.search(
        query=query,
        filter="category='electronics' AND price < 500",  # ← Hardcoded
        limit=100
    )
    # Rank by score formula (also hardcoded)
    scored = [(r, r['embedding_score'] * 0.6 + r['popularity'] * 0.4) for r in results]
    return sorted(scored, key=lambda x: x[1], reverse=True)[:10]

This is versioned in Git, but changing the filter requires a code deploy. Every ranking experiment needs a new application release.

Option 2: Environment variables

# .env (NOT in Git ✗)
RETRIEVAL_FILTER="category='electronics' AND price < 500"
RANKING_FORMULA="embedding_score * 0.6 + popularity * 0.4"

This decouples retrieval from code, but now your configuration has no version history. You don’t know when RANKING_FORMULA changed or why.
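Even consuming that env var safely is non-trivial. A restrained parser (a hypothetical sketch that avoids eval; the RANKING_FORMULA name comes from the snippet above) shows how much machinery the "simple" env-var approach hides:

```python
import os
import re

# Only allow formulas of the form "<name> * <weight> + <name> * <weight> ..."
_TERM = re.compile(r"^\s*(\w+)\s*\*\s*([0-9.]+)\s*$")

def parse_weights(formula: str) -> dict[str, float]:
    """Parse a weighted-sum formula string into {feature: weight}, without eval."""
    weights = {}
    for term in formula.split("+"):
        m = _TERM.match(term)
        if not m:
            raise ValueError(f"unsupported term: {term!r}")
        weights[m.group(1)] = float(m.group(2))
    return weights

formula = os.getenv("RANKING_FORMULA", "embedding_score * 0.6 + popularity * 0.4")
print(parse_weights(formula))  # {'embedding_score': 0.6, 'popularity': 0.4}
```

And this still gives you no history of when the formula changed, only a safer way to run whatever it currently says.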

Option 3: Database-stored config

# Stored in PostgreSQL config table
config = db.query("SELECT value FROM agent_config WHERE key = 'ranking_formula'").fetchone()
ranking_formula = eval(config['value'])  # ← Executing arbitrary strings from DB

Now you have an audit trail (if you log changes), but no diff, no PR review, and no way to test changes in staging before production.

Option 4: Manual UI configuration

You log into a vector DB console and change filters/ranking via web forms. Changes take effect immediately. No history, no rollback, no review.

The Problem

None of these approaches treat retrieval logic as infrastructure—versioned, reviewed, tested, deployed through CI/CD.

When your agent breaks:

  • You can’t diff what changed in retrieval logic vs application code
  • You can’t rollback just the ranking model without a full code deploy
  • You can’t A/B test retrieval strategies across branches
  • You can’t review changes to filters/signals/ranking in a PR
  • You can’t deploy staging → production with confidence

What GitOps for Agents Looks Like

GitOps for agents means: All agent configuration lives in Git as declarative YAML files.

The Repository Structure

my-agent/
├── src/
│   └── agent.py                    # Application code
├── infrastructure/
│   ├── engines/
│   │   ├── product_search.yaml     # Search engine config
│   │   ├── recommendations.yaml    # Recommendation engine
│   │   └── content_feed.yaml       # Feed ranking
│   ├── signals/
│   │   └── features.yaml           # Feature definitions
│   └── queries/
│       ├── personalized_search.sql # Saved query templates
│       └── trending_products.sql
├── .github/
│   └── workflows/
│       └── deploy.yml              # CI/CD pipeline
└── README.md

What’s in Git: Engine definitions (indexing, models, deployment config), signal definitions (features, aggregations, crosses), query templates (ranking expressions, filters), and deployment configuration (replicas, data tier, auto-scaling).

What’s NOT in Git: API keys (stored in GitHub Secrets), model weights (stored in Shaped, versioned automatically), and runtime data (logs, metrics).

The GitOps Workflow

  1. Developer edits engine.yaml
  2. Create a PR in GitHub
  3. CI runs validation:
       shaped validate --file engine.yaml
       shaped plan --file engine.yaml
  4. PR review + approval
  5. Merge to main
  6. CD deploys to staging:
       shaped create-engine --file engine.yaml --env staging
  7. Automated tests pass
  8. CD deploys to production:
       shaped create-engine --file engine.yaml --env production
  9. Zero-downtime rollout

Key benefits:

  • Every change goes through review — No one can change ranking logic without a PR
  • Diff-able — git diff shows exactly what changed in retrieval behavior
  • Rollback-able — git revert + redeploy restores the previous config
  • Testable — Deploy to staging first, run automated tests
  • Auditable — Full history of who changed what and when
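The diff and rollback benefits need nothing vendor-specific; they fall out of putting config files under Git. A self-contained demo session (using a throwaway repo and a simplified one-line config, purely for illustration):

```shell
# Demo: config files in Git are diff-able and revert-able like any code.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git config user.email demo@example.com && git config user.name demo

mkdir -p infrastructure/engines
printf 'ranking: embedding * 0.6 + popularity * 0.4\n' > infrastructure/engines/product_search.yaml
git add -A && git commit -qm "Initial ranking weights"

printf 'ranking: embedding * 0.5 + popularity * 0.3 + recency * 0.2\n' > infrastructure/engines/product_search.yaml
git commit -qam "Experiment: add recency boost"

# Exactly what changed in retrieval behavior, straight from history:
git log --oneline -- infrastructure/engines/product_search.yaml
git diff HEAD~1 HEAD -- infrastructure/engines/product_search.yaml

# Roll the weights back without touching application code:
git revert --no-edit HEAD
cat infrastructure/engines/product_search.yaml
```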

Part 1: The Traditional Approach (Hardcoded Retrieval)

The standard approach is to hardcode retrieval logic in your application, with configuration split between code, environment variables, and manual UI settings.

Architecture

  • Agent Application (Python) — hardcoded filters, .env ranking, DB config
  • Fetch User State — Redis / PostgreSQL; a separate service call with no link to retrieval
  • Vector Search — Pinecone / Weaviate; hardcoded filter expressions, no version history
  • Manual Re-rank — custom Python scoring logic; the formula lives in .env and is changed manually
  • Results returned to the agent

Problems:

  • Configuration is fragmented (code + env + UI)
  • No single source of truth
  • Changes require code deploys OR manual UI updates
  • No version history for non-code config
  • Difficult to A/B test or stage changes

Implementation

Step 1: Hardcoded retrieval in application

# agent.py
import os
from pinecone import Pinecone

pc = Pinecone(api_key=os.getenv('PINECONE_API_KEY'))
index = pc.Index("product-index")

def search_products(user_id: str, query: str):
    """
    Search for products.
    All logic is hardcoded or in env vars.
    """
    from sentence_transformers import SentenceTransformer
    embedder = SentenceTransformer('all-MiniLM-L6-v2')
    query_vector = embedder.encode(query).tolist()

    results = index.query(
        vector=query_vector,
        filter={
            "category": {"$eq": "electronics"},
            "price": {"$lt": 500},
            "in_stock": {"$eq": True}
        },
        top_k=100,
        include_metadata=True
    )

    # The formula string lives in an env var but is never actually applied:
    # the weights below are duplicated inline, so the two can silently drift apart.
    ranking_formula = os.getenv(
        'RANKING_FORMULA',
        'embedding_score * 0.6 + popularity * 0.4'
    )

    scored_results = []
    for match in results['matches']:
        embedding_score = match['score']
        popularity = match['metadata'].get('view_count', 0) / 10000
        score = embedding_score * 0.6 + popularity * 0.4
        scored_results.append({
            'product_id': match['id'],
            'name': match['metadata']['name'],
            'score': score
        })

    scored_results.sort(key=lambda x: x['score'], reverse=True)
    return scored_results[:10]

Step 2: Environment configuration

# .env (not in version control)
PINECONE_API_KEY=your-key
RANKING_FORMULA=embedding_score * 0.6 + popularity * 0.4
TOP_K=100
PRICE_LIMIT=500

What You’re Operating

| Component | Config Method | Version Control | Rollback |
| --- | --- | --- | --- |
| Filter logic | Hardcoded Python | Yes (Git) | Code deploy |
| Ranking formula | Environment variable | No | Manual change + restart |
| Index settings | Vector DB UI | No | Remember + recreate |
| Embedding model | Hardcoded in code | Yes (Git) | Code deploy |
| Top-K / limits | Environment variable | No | Manual change + restart |

The problem: Changing retrieval behavior requires mixing code deploys, environment updates, and manual UI changes. There’s no single workflow, no unified version history.


Part 2: The Shaped Way — Declarative Engines

Shaped treats engines as declarative infrastructure. You define your entire retrieval pipeline in a YAML file that lives in Git.

Architecture

  • Git Repository — infrastructure/engines/product_search.yaml
  • CI/CD Pipeline — GitHub Actions runs shaped validate and shaped plan
  • Shaped CLI — shaped create-engine --file …
  • Shaped Control Plane — validates config, compiles it to an execution graph, deploys with zero downtime, versions model weights
  • Agent queries the Shaped API — all config from Git, nothing hardcoded

Everything is versioned, everything is code.

Implementation

Step 1: Define engine in YAML

# infrastructure/engines/product_search.yaml
version: v2
name: product_search
data:
  item_table:
    name: products
    type: table
  user_table:
    name: users
    type: table
  interaction_table:
    name: user_clicks
    type: table

index:
  - name: product_embedding
    encoder:
      name: text-embedding-3-small
      provider: openai
      columns:
        - name: product_name
          weight: 0.6
        - name: description
          weight: 0.4

training:
  models:
    - name: collaborative_ranker
      policy_type: elsa
      strategy: early_stopping

deployment:
  data_tier: fast_tier
  server:
    worker_count: 4
  autoscaling:
    enabled: true
    min_instances: 2
    max_instances: 10
    target_qps: 1000

Step 2: Define ranking logic in ShapedQL

-- infrastructure/queries/personalized_search.sql
SELECT 
  item_id,
  product_name,
  price,
  category,
  score(
    expression = '
      (1.0 / (1.0 + rank(embedding="product_embedding"))) * 0.6
      + (1.0 / (1.0 + item._derived_popular_rank)) * 0.3
      + user_category_affinity * 0.1
    '
  ) as final_score
FROM products
WHERE 
  category = 'electronics'
  AND price < 500
  AND in_stock = true
ORDER BY final_score DESC
LIMIT 10

Step 3: Deploy via CLI in CI/CD

# .github/workflows/deploy-agent.yml
name: Deploy Agent Infrastructure

on:
  push:
    branches: [main]
    paths:
      - 'infrastructure/engines/**'
      - 'infrastructure/queries/**'

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      
      - name: Install Shaped CLI
        run: pip install shaped-cli
      
      - name: Validate engine config
        run: shaped validate --file infrastructure/engines/product_search.yaml
        env:
          SHAPED_API_KEY: ${{ secrets.SHAPED_API_KEY }}
      
      - name: Deploy to staging
        run: |
          shaped create-engine \
            --file infrastructure/engines/product_search.yaml \
            --env staging
        env:
          SHAPED_API_KEY: ${{ secrets.SHAPED_API_KEY_STAGING }}
      
      - name: Run integration tests
        run: python tests/test_retrieval.py --env staging
      
      - name: Deploy to production
        run: |
          shaped create-engine \
            --file infrastructure/engines/product_search.yaml \
            --env production
        env:
          SHAPED_API_KEY: ${{ secrets.SHAPED_API_KEY_PROD }}

Step 4: Query from application (no config in code)

# agent.py — now just API calls, no retrieval config
import requests
import os

SHAPED_API_KEY = os.getenv('SHAPED_API_KEY')

def search_products(user_id: str, query: str):
    response = requests.post(
        "https://api.shaped.ai/v2/rank",
        headers={"x-api-key": SHAPED_API_KEY},
        json={
            "engine_name": "product_search",
            "user_id": user_id,
            "query": query,
            # Ranking logic lives in Git, not here
        }
    )
    return response.json()['results']

Now your application code is thin—it just calls the Shaped API. All retrieval logic lives in version-controlled YAML.
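In production that thin client still deserves a timeout, error handling, and a retry. A minimal sketch (the endpoint and payload shape come from the snippet above; the retry policy and the injectable post parameter are assumptions added for robustness and testability):

```python
import os
import time

import requests

SHAPED_API_KEY = os.getenv("SHAPED_API_KEY", "")
RANK_URL = "https://api.shaped.ai/v2/rank"

def search_products(user_id: str, query: str, retries: int = 2, post=requests.post):
    """Call the rank endpoint with a timeout and a simple retry.

    `post` is injectable so the wrapper can be unit-tested without a network.
    """
    payload = {"engine_name": "product_search", "user_id": user_id, "query": query}
    last_err = None
    for attempt in range(retries + 1):
        try:
            resp = post(
                RANK_URL,
                headers={"x-api-key": SHAPED_API_KEY},
                json=payload,
                timeout=5,  # fail fast instead of hanging the agent
            )
            resp.raise_for_status()
            return resp.json()["results"]
        except requests.RequestException as err:
            last_err = err
            time.sleep(0.2 * (attempt + 1))  # brief linear backoff before retrying
    raise RuntimeError(f"rank request failed after {retries + 1} attempts") from last_err
```

The retrieval logic itself still lives entirely in Git; this wrapper only hardens the transport.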


The GitOps Workflow in Practice

1. Make Changes via PR

A developer wants to change the ranking formula:

# Create branch
git checkout -b experiment/boost-recency

# Edit engine config
vim infrastructure/engines/product_search.yaml

# The resulting diff:
--- a/infrastructure/engines/product_search.yaml
+++ b/infrastructure/engines/product_search.yaml
@@ -23,7 +23,8 @@ training:
     - name: collaborative_ranker
       policy_type: elsa
       ranking_expression: |
-        (1.0 / (1.0 + rank(embedding="product_embedding"))) * 0.6
-        + (1.0 / (1.0 + item._derived_popular_rank)) * 0.4
+        (1.0 / (1.0 + rank(embedding="product_embedding"))) * 0.5
+        + (1.0 / (1.0 + item._derived_popular_rank)) * 0.3
+        + (1.0 / (1.0 + days_since_published)) * 0.2

Reviewers can see exactly what changed—recency boost added at 20% weight, embedding down from 60% to 50%, popularity down from 40% to 30%. They can comment on the weights, suggest alternatives, and approve once satisfied.

2. CI Validates

shaped validate --file infrastructure/engines/product_search.yaml
# ✓ YAML is valid
# ✓ All referenced columns exist in data tables
# ✓ Ranking expression syntax is correct
# ✓ No breaking changes detected

3. Deploy to Staging, Test, Promote

# tests/test_retrieval.py
def test_recency_boost():
    """Verify recent products rank higher after change"""
    results = search_products(user_id="test-user", query="laptop")
    recent_items = [r for r in results[:5] if r['days_since_published'] < 7]
    assert len(recent_items) >= 2, "Recent items should rank higher"

4. Rollback if Needed

git revert HEAD
git push origin main
# CI/CD automatically redeploys previous config

Or manually:

shaped rollback-engine --name product_search --to-version v23

Real-World Examples

Example 1: A/B Test Ranking Strategies

Deploy two engine variants simultaneously, split traffic in application code, measure, commit the winner to main.

# infrastructure/engines/product_search_popular.yaml
training:
  models:
    - name: popular_ranker
      ranking_expression: |
        embedding * 0.4 + popularity * 0.6
# infrastructure/engines/product_search_personalized.yaml
training:
  models:
    - name: personalized_ranker
      ranking_expression: |
        embedding * 0.3 + user_affinity * 0.7
# agent.py
import random

def search_products(user_id, query):
    engine = "product_search_popular" if random.random() < 0.5 else "product_search_personalized"
    return shaped.rank(engine_name=engine, user_id=user_id, query=query)
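One refinement worth considering: random.random() re-assigns the user on every call, so the same user can bounce between variants mid-session. A hash-based bucket keeps assignment stable (a sketch; the engine names come from the example above):

```python
import hashlib

def assign_engine(user_id: str, split: float = 0.5) -> str:
    """Deterministically bucket a user so they always see the same variant."""
    # Hash the user id to a stable number in [0, 1).
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "product_search_popular" if bucket < split else "product_search_personalized"

# Same user, same engine, every call:
print(assign_engine("user-42") == assign_engine("user-42"))  # True
```

Stable assignment also makes the measurement cleaner, since each user's metrics accrue to exactly one variant.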

Example 2: Multi-Environment Config

Staging runs an aggressive recency experiment while production stays conservative—two branches, two YAML files, same workflow.

# staging branch: product_search.yaml
ranking_expression: |
  embedding * 0.3 + recency * 0.5 + popularity * 0.2
# main branch: product_search.yaml
ranking_expression: |
  embedding * 0.5 + recency * 0.2 + popularity * 0.3

Comparison: Traditional vs. GitOps with Shaped

| Aspect | Traditional | GitOps w/ Shaped |
| --- | --- | --- |
| Version Control | Partial (code only) | Complete (config + code) |
| Rollback | Full app redeploy | Single git revert |
| A/B Testing | Manual code duplication | Branch-based experiments |
| Review Process | Code + manual UI changes | Single PR diff |
| Deployment | Tied to releases | Independent CI/CD |
| Audit Trail | Git log only | Git + Shaped versioning |
| Debugging | “Why did this change?” → ??? | “Why did this change?” → git log -p |

FAQ

Q: Do I have to use Shaped to do GitOps for agents?

Not necessarily. You can build similar workflows with any infrastructure that supports declarative configuration (Terraform, Ansible, custom tools). Shaped is purpose-built for retrieval, so the workflow is simpler.

Q: Won’t GitOps slow down my experimentation?

The opposite. PR review actually accelerates experiments because multiple people can spot issues before production, you can revert instantly if something breaks, staging deploys are automatic and fast, and you have a complete history of what worked.

Q: What if I need to change ranking in real-time (e.g., emergency flash sale)?

Shaped supports both static (versioned) and dynamic configuration. You can have base rankings in Git and override them via API for time-limited events.

Q: How do I handle secrets (API keys, model credentials)?

Keep secrets in GitHub Secrets or your CI/CD provider. The YAML references them by name but never contains actual values.

Q: Can I revert just the ranking formula without changing filters?

Yes. Each component (embeddings, models, filters, signals) is independently versioned. You can rollback any single piece.


Getting Started

To implement GitOps for your AI agents:

  1. Audit your current setup — Where does your retrieval logic live? (code, env, UI, DB?)
  2. Export to declarative format — Convert it to YAML. This is a one-time effort.
  3. Set up CI/CD — Add validation and deployment steps to your pipeline
  4. Build in stages — Start with staging, validate, then production
  5. Monitor and iterate — Track which experiments work, commit winners to main
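Step 3 can start small even before wiring up a vendor CLI. A minimal pre-merge sanity check over the parsed engine config (a sketch: the required keys mirror the example YAML earlier, and the config is assumed already parsed, e.g. with yaml.safe_load):

```python
def validate_engine_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the config passes."""
    problems = []
    # Required top-level sections, per the example engine YAML.
    for key in ("name", "data", "deployment"):
        if key not in config:
            problems.append(f"missing required top-level key: {key}")
    # Cheap cross-field check: autoscaling bounds must be ordered.
    autoscaling = config.get("deployment", {}).get("autoscaling", {})
    if autoscaling.get("enabled"):
        lo = autoscaling.get("min_instances", 0)
        hi = autoscaling.get("max_instances", 0)
        if lo > hi:
            problems.append(f"min_instances ({lo}) exceeds max_instances ({hi})")
    return problems

config = {
    "name": "product_search",
    "data": {"item_table": {"name": "products"}},
    "deployment": {"autoscaling": {"enabled": True, "min_instances": 2, "max_instances": 10}},
}
print(validate_engine_config(config))  # []
```

Run it in CI against every changed YAML file and fail the build on a non-empty result; richer checks can layer on later.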

The investment pays for itself the first time you need to rollback, debug, or run an A/B test.

