AI/ML · April 19, 2026
Tags: prompt-engineering, llm, azure-openai, cosmos-db, semantic-kernel, langfuse, production, enterprise-ai, versioning

Prompt Engineering in Production — Part 1: Anatomy, Storage, and Versioning

Prompts are not strings in your code. They are versioned, audited, environment-aware artifacts stored in Cosmos DB and Git, retrieved via a Prompt SDK, and deployed with the same discipline as application code. Part 1 of 4.

A prompt is not a string in your code. In production, it is a versioned artifact — with an environment, an owner, an audit trail, a deployment pipeline, and a rollback procedure.

The teams that treat prompts as code comments ship fast and break things in ways they can't explain six months later. The teams that build prompt infrastructure ship slower initially and operate with confidence at scale.

This is Part 1 of a 4-part series on production prompt engineering. We start with the foundation: what a prompt actually is, how to store it, how to version it, and how your application code references it at runtime — in both open source and Azure stacks.


Part 1 covers:

  • The four layers of a production prompt
  • Why prompts must leave your codebase
  • Storage: Cosmos DB vs Git-based versioning
  • The Prompt SDK — how code references prompts at runtime
  • Environment promotion: dev → staging → prod
  • Rollback and blue/green prompt deployment

The Four Layers of a Production Prompt

Every LLM request is built from four distinct prompt layers. Each layer has a different owner, a different change frequency, and a different risk profile.

Why layer ownership matters: when a response is wrong, you need to know which layer caused it. A system prompt bug affects every user. A RAG context bug affects queries on a specific topic. A user message manipulation is a security event. Layered architecture is also layered debugging.

Layer 1 — System Prompt

The system prompt defines the model's identity, constraints, and behavior for the entire session. It's the highest-risk layer because it affects every user on every request.

You are SO, a mortgage loan assistant for MortgageIQ. You help loan officers 
understand loan guidelines, eligibility requirements, and underwriting decisions.

Rules:
- Always cite the specific guideline section and document version for any factual claim
- Never provide a final loan approval or denial — flag exceptions for human review
- If a question is outside mortgage lending, respond: "I can only assist with 
  mortgage-related questions."
- Respond in the same language as the user's question
- Format responses with: Summary (2 sentences), Details (bullet points), Source Citations

You are operating under FHA/VA/Conventional loan guidelines as of {{guideline_version}}.
Current date: {{current_date}}
User role: {{user_role}}
Tenant: {{tenant_id}}

Template variables ({{guideline_version}}, {{user_role}}) are resolved at runtime from the prompt store — they make a single prompt template serve multiple contexts without hardcoding.
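Resolution itself is a small substitution pass. A minimal sketch, assuming the {{name}} placeholder syntax shown above (the function name and error handling are illustrative, not a specific library's API):

```python
import re
from datetime import date

def resolve_template(template: str, variables: dict) -> str:
    """Replace {{name}} placeholders; fail loudly if a value is missing."""
    merged = {"current_date": date.today().isoformat(), **variables}

    def replace(match: re.Match) -> str:
        key = match.group(1)
        if key not in merged:
            raise KeyError(f"Required variable '{key}' not provided")
        return str(merged[key])

    return re.sub(r"\{\{(\w+)\}\}", replace, template)

prompt = resolve_template(
    "Guidelines as of {{guideline_version}}. User role: {{user_role}}",
    {"guideline_version": "2026-Q1", "user_role": "loan_officer"},
)
# prompt == "Guidelines as of 2026-Q1. User role: loan_officer"
```

Failing on a missing variable (rather than silently leaving the placeholder in place) matters: a half-resolved system prompt is a behavior bug that is very hard to spot in logs.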

Layer 2 — Few-Shot Examples

Few-shot examples teach the model the expected input/output pattern. They're more stable than the system prompt but change when the desired behavior changes.

Example 1:
User: What's the maximum DTI for a conventional loan?
Assistant: **Summary:** Conventional loans generally allow a maximum DTI of 45%, 
with exceptions up to 50% for strong compensating factors.
**Details:**
- Standard limit: 45% back-end DTI (Fannie Mae B3-6-02)
- Exception: up to 50% with 720+ credit score and 12 months reserves
- Front-end DTI: no hard limit, lender overlay typically 36%
**Source:** Fannie Mae Selling Guide B3-6-02 (2025-Q1)

Example 2:
User: Can a borrower use gift funds for FHA down payment?
Assistant: **Summary:** Yes — FHA allows 100% of the down payment to come from 
gift funds with no minimum borrower contribution required.
...

Layer 3 — RAG Context

Retrieved chunks injected per query. The template wrapper matters as much as the chunks themselves:

Context from MortgageIQ Knowledge Base:
---
[Source: FHA Handbook 4000.1, Section II.A.4.c, p.147, v2025-Q1]
The maximum qualifying ratios for FHA loans are 31% housing expense ratio 
and 43% total debt ratio. Exceptions may be granted up to 50% with documented 
compensating factors...
---
[Source: FHA Handbook 4000.1, Section II.A.4.d, p.149, v2025-Q1]
Acceptable compensating factors include: verified and documented cash reserves 
equal to at least three months of total housing payment...
---
Use only the above sources to answer the question. If the answer is not in the 
sources, state: "I don't have reliable information on this in the current guidelines."

Layer 4 — User Message

The raw user query. This is the attack surface for prompt injection — covered in Part 3.


Why Prompts Must Leave Your Codebase

The worst pattern in production LLM systems:

# ❌ Anti-pattern — prompt hardcoded in application code
import openai

def get_response(user_query: str) -> str:
    system_prompt = """You are a mortgage assistant. Be helpful.
    Always cite sources. Don't approve loans."""  # v??? deployed when???
    
    return openai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_query}
        ]
    )

Problems with this:

  • No versioning — what prompt was live on March 3rd when the compliance audit found a wrong answer?
  • No rollback — a bad prompt change requires a code deployment to fix
  • No environment separation — dev and prod run the same prompt (or different hardcoded ones scattered across branches)
  • No A/B testing — you can't test two system prompts without a code branch
  • No non-engineer access — product and legal can't update prompt behavior without a PR
  • No audit trail — no record of who changed what and when

The correct pattern treats prompts as configuration — externalized, versioned, and retrieved at runtime.
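As a counterpart to the anti-pattern above, the corrected shape looks like this. A sketch with an in-memory dict standing in for the real store (PROMPT_STORE, get_prompt, and build_messages are illustrative names; in production the lookup goes to Cosmos DB or Langfuse):

```python
# ✅ Correct pattern: prompt externalized, versioned, fetched at runtime.
PROMPT_STORE = {
    ("mortgage-assistant-system", "stable"): {
        "version": "1.2.0",
        "template": "You are a mortgage assistant. Always cite sources.",
        "config": {"model": "gpt-4o", "temperature": 0.1},
    }
}

def get_prompt(name: str, version: str = "stable") -> dict:
    try:
        return PROMPT_STORE[(name, version)]
    except KeyError:
        raise LookupError(f"Prompt '{name}' ({version}) not found") from None

def build_messages(user_query: str) -> tuple[list[dict], str]:
    doc = get_prompt("mortgage-assistant-system")
    messages = [
        {"role": "system", "content": doc["template"]},
        {"role": "user", "content": user_query},
    ]
    return messages, doc["version"]   # version travels with the call for audit

messages, prompt_version = build_messages("Max DTI for a conventional loan?")
# prompt_version == "1.2.0", logged alongside the LLM response
```

The key property: the service never contains prompt text, and every response can be traced back to the exact prompt version that produced it.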


Storage Architecture

Production prompt storage needs two things: durable versioned history (Git) and fast runtime retrieval (database).

Git-Based Versioning

Prompts live in a dedicated Git repository (or a /prompts directory in your monorepo). Each prompt is a YAML file with full metadata:

# prompts/mortgage-assistant/system-prompt.v1.2.0.yaml
apiVersion: prompts/v1
kind: SystemPrompt

metadata:
  name: mortgage-assistant-system
  version: "1.2.0"
  previous_version: "1.1.3"
  status: stable                    # draft | review | staging | stable | deprecated
  owner: ml-platform-team
  approvers:
    - jane.smith@mortgageiq.com     # compliance
    - raj.patel@mortgageiq.com      # product
  created_at: "2026-04-01T09:00:00Z"
  approved_at: "2026-04-03T14:22:00Z"
  changelog: |
    v1.2.0: Added explicit citation format requirement per compliance review CR-2048.
            Restricted loan approval language per legal review LR-0312.
    v1.1.3: Added Spanish language support via respond-in-user-language rule.
    v1.1.0: Added user role context variable.

config:
  model: gpt-4o
  temperature: 0.1                  # low — deterministic for compliance
  max_tokens: 1500
  top_p: 0.95

template: |
  You are SO, a mortgage loan assistant for MortgageIQ.
  You help {{user_role}} understand loan guidelines and underwriting decisions.

  Rules:
  - Always cite the specific guideline section and document version
  - Never provide a final loan approval or denial
  - If outside mortgage lending: "I can only assist with mortgage-related questions."
  - Respond in the same language as the user's question
  - Format: Summary → Details → Source Citations

  Guidelines version: {{guideline_version}}
  Current date: {{current_date}}
  Tenant: {{tenant_id}}
  User role: {{user_role}}

variables:
  required:
    - user_role
    - tenant_id
    - current_date
    - guideline_version
  optional:
    - user_name
    - preferred_language

token_estimate:
  template_tokens: 187
  max_variable_tokens: 50
  total_reserved: 237

tags:
  - mortgage
  - loan-officer
  - compliance-reviewed
  - pii-safe

Semantic versioning for prompts:

  • Patch (1.2.0 → 1.2.1): typo fix, minor wording, does not change behavior
  • Minor (1.2.0 → 1.3.0): adds new capability or constraint, backward compatible
  • Major (1.2.0 → 2.0.0): breaking change — different output format, removed capability, requires re-evaluation
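A CI gate can classify the bump mechanically and decide how much evaluation the change needs. A minimal sketch using plain string parsing (bump_type is an illustrative helper, not part of any SDK):

```python
def bump_type(prev: str, new: str) -> str:
    """Classify a semver bump: 'major', 'minor', 'patch', or 'none'."""
    p_major, p_minor, p_patch = (int(x) for x in prev.split("."))
    n_major, n_minor, n_patch = (int(x) for x in new.split("."))
    if n_major != p_major:
        return "major"       # breaking change: requires full re-evaluation
    if n_minor != p_minor:
        return "minor"       # new behavior: run the eval suite
    if n_patch != p_patch:
        return "patch"       # wording only: spot check
    return "none"

assert bump_type("1.2.0", "2.0.0") == "major"
assert bump_type("1.2.0", "1.3.0") == "minor"
assert bump_type("1.2.0", "1.2.1") == "patch"
```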

Cosmos DB — Runtime Storage

Git is for history. Cosmos DB is for fast runtime retrieval. The CI pipeline promotes approved prompts from Git into Cosmos DB per environment.

Schema:

{
  "id": "mortgage-assistant-system-v1.2.0",
  "partitionKey": "mortgage-assistant",
  "promptName": "mortgage-assistant-system",
  "version": "1.2.0",
  "status": "stable",
  "environment": "production",
  "template": "You are SO, a mortgage loan assistant...",
  "config": {
    "model": "gpt-4o",
    "temperature": 0.1,
    "maxTokens": 1500
  },
  "variables": {
    "required": ["user_role", "tenant_id", "current_date", "guideline_version"],
    "optional": ["user_name"]
  },
  "tokenEstimate": 237,
  "owner": "ml-platform-team",
  "approvedBy": ["jane.smith@mortgageiq.com", "raj.patel@mortgageiq.com"],
  "approvedAt": "2026-04-03T14:22:00Z",
  "changelog": "v1.2.0: Added citation format requirement...",
  "tags": ["mortgage", "compliance-reviewed"],
  "createdAt": "2026-04-01T09:00:00Z",
  "_ts": 1743685340
}

Cosmos DB container design:

  • Partition key: promptName — all versions of a prompt in the same partition, fast version lookup
  • Unique constraint: promptName + version + environment — prevents duplicate deployments
  • TTL: none on production records — prompts are compliance artifacts, never auto-deleted
  • Change feed: triggers cache invalidation in the application layer when a new version is promoted
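The promotion step from Git to Cosmos DB is essentially a field mapping from the YAML file to the runtime document above, followed by an upsert. A sketch of that mapping under the two schemas shown (yaml_to_cosmos_doc is an illustrative name; the commented upsert_item call is the standard azure-cosmos container method):

```python
def yaml_to_cosmos_doc(prompt_yaml: dict, environment: str) -> dict:
    """Map a Git-side YAML prompt document onto the Cosmos runtime schema."""
    meta = prompt_yaml["metadata"]
    return {
        "id": f"{meta['name']}-v{meta['version']}",
        "partitionKey": meta["name"],            # partition key = promptName
        "promptName": meta["name"],
        "version": meta["version"],
        "status": meta["status"],
        "environment": environment,
        "template": prompt_yaml["template"],
        "config": prompt_yaml["config"],
        "variables": prompt_yaml["variables"],
        "tokenEstimate": prompt_yaml["token_estimate"]["total_reserved"],
        "owner": meta["owner"],
        "approvedBy": meta.get("approvers", []),
        "changelog": meta.get("changelog", ""),
        "tags": prompt_yaml.get("tags", []),
    }

# In the CI pipeline, the result is upserted per environment:
#   container.upsert_item(yaml_to_cosmos_doc(parsed_yaml, "staging"))
```

Keeping the transform pure makes the promotion script trivially testable: the only side effect is the final upsert.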

The Prompt SDK

The Prompt SDK is the interface between your application code and the prompt store. It abstracts storage, caching, variable resolution, and version pinning.

Azure Stack — Python SDK

# prompt_sdk/client.py
from azure.cosmos import CosmosClient
from azure.identity import DefaultAzureCredential   # note: azure-identity package, not azure-core
import re
from datetime import datetime, date


class PromptNotFoundError(Exception):
    """Raised when no matching prompt document exists in the store."""


class MissingVariableError(Exception):
    """Raised when a required template variable is missing at resolve time."""

class PromptClient:
    def __init__(self, cosmos_url: str, database: str, container: str, env: str):
        self.client = CosmosClient(
            url=cosmos_url,
            credential=DefaultAzureCredential()   # managed identity — no secrets
        )
        self.container = (
            self.client
            .get_database_client(database)
            .get_container_client(container)
        )
        self.env = env
        # in-process cache: key -> (document, fetched_at); 5-minute TTL
        self._cache: dict[str, tuple[dict, float]] = {}
        self._cache_ttl = 300.0

    def get_prompt(
        self,
        name: str,
        version: str = "stable",                 # "stable" | "latest" | "1.2.0"
        variables: dict | None = None
    ) -> "ResolvedPrompt":

        cache_key = f"{name}:{version}:{self.env}"

        # Cache hit — prompts change rarely; entries expire after the TTL
        cached = self._cache.get(cache_key)
        if cached and datetime.now().timestamp() - cached[1] < self._cache_ttl:
            prompt_doc = cached[0]
        else:
            prompt_doc = self._fetch(name, version)
            self._cache[cache_key] = (prompt_doc, datetime.now().timestamp())
        
        # Resolve template variables
        resolved_template = self._resolve(prompt_doc["template"], variables or {})
        
        return ResolvedPrompt(
            name=name,
            version=prompt_doc["version"],
            template=resolved_template,
            config=prompt_doc["config"],
            token_estimate=prompt_doc["tokenEstimate"],
            prompt_id=prompt_doc["id"]            # for audit logging
        )

    def _fetch(self, name: str, version: str) -> dict:
        if version == "stable":
            query = """
                SELECT TOP 1 * FROM c
                WHERE c.promptName = @name
                  AND c.environment = @env
                  AND c.status = 'stable'
                ORDER BY c._ts DESC
            """
        elif version == "latest":
            query = """
                SELECT TOP 1 * FROM c
                WHERE c.promptName = @name
                  AND c.environment = @env
                ORDER BY c._ts DESC
            """
        else:
            # Exact version pin
            query = """
                SELECT TOP 1 * FROM c
                WHERE c.promptName = @name
                  AND c.version = @version
                  AND c.environment = @env
            """
        
        params = [
            {"name": "@name", "value": name},
            {"name": "@env", "value": self.env},
            {"name": "@version", "value": version}
        ]
        
        results = list(self.container.query_items(
            query=query,
            parameters=params,
            enable_cross_partition_query=False    # partition key = promptName
        ))
        
        if not results:
            raise PromptNotFoundError(f"Prompt '{name}' version '{version}' not found in {self.env}")
        
        return results[0]

    def _resolve(self, template: str, variables: dict) -> str:
        # Copy before mutating, then inject current_date if not provided
        variables = dict(variables)
        variables.setdefault("current_date", date.today().isoformat())
        
        # Replace {{variable}} placeholders
        def replace(match):
            key = match.group(1).strip()
            if key not in variables:
                raise MissingVariableError(f"Required variable '{key}' not provided")
            return str(variables[key])
        
        return re.sub(r'\{\{(\w+)\}\}', replace, template)

    def invalidate_cache(self, name: str = None):
        if name:
            self._cache = {k: v for k, v in self._cache.items() if not k.startswith(f"{name}:")}
        else:
            self._cache.clear()


class ResolvedPrompt:
    def __init__(self, name, version, template, config, token_estimate, prompt_id):
        self.name = name
        self.version = version
        self.template = template
        self.config = config
        self.token_estimate = token_estimate
        self.prompt_id = prompt_id              # logged with every LLM call

Using the Prompt SDK in Application Code

# loan_assistant/service.py
from datetime import date

from openai import AzureOpenAI
from prompt_sdk.client import PromptClient

# Singleton — initialized once at app startup
prompt_client = PromptClient(
    cosmos_url=settings.COSMOS_URL,
    database="prompts-db",
    container="prompts",
    env=settings.ENVIRONMENT    # "development" | "staging" | "production"
)

openai_client = AzureOpenAI(
    azure_endpoint=settings.AZURE_OPENAI_ENDPOINT,
    api_version="2024-12-01"
)

async def answer_loan_question(
    user_query: str,
    user_role: str,
    tenant_id: str,
    rag_chunks: list[dict],
    guideline_version: str = "2026-Q1"
) -> dict:
    
    # Fetch system prompt — version pinned to "stable" in production
    system_prompt = prompt_client.get_prompt(
        name="mortgage-assistant-system",
        version="stable",
        variables={
            "user_role": user_role,
            "tenant_id": tenant_id,
            "guideline_version": guideline_version,
            "current_date": date.today().isoformat()
        }
    )
    
    # Fetch few-shot examples
    few_shot = prompt_client.get_prompt(
        name="mortgage-assistant-few-shot",
        version="stable"
    )
    
    # Build RAG context layer
    rag_context = build_rag_context(rag_chunks)
    
    # Compose messages
    messages = [
        {"role": "system", "content": system_prompt.template},
        {"role": "user", "content": few_shot.template},          # few-shot as first user turn
        {"role": "assistant", "content": "Understood. I will follow these examples."},
        {"role": "user", "content": f"{rag_context}\n\nQuestion: {user_query}"}
    ]
    
    response = openai_client.chat.completions.create(
        model=system_prompt.config["model"],
        messages=messages,
        temperature=system_prompt.config["temperature"],
        max_tokens=system_prompt.config["maxTokens"]
    )
    
    return {
        "answer": response.choices[0].message.content,
        "prompt_versions": {
            "system": system_prompt.version,
            "few_shot": few_shot.version,
        },
        "prompt_ids": {
            "system": system_prompt.prompt_id,  # logged for audit
            "few_shot": few_shot.prompt_id,
        },
        "usage": {
            "prompt_tokens": response.usage.prompt_tokens,
            "completion_tokens": response.usage.completion_tokens
        }
    }

How Azure Manages Prompts End-to-End

This is worth answering explicitly: yes, get_prompt() reads from Cosmos DB at runtime. Every call checks a short-lived in-process cache first and goes to Cosmos DB on a miss.

Here is the full flow:

What happens on every get_prompt() call:

  1. PromptClient checks the in-process dictionary cache (_cache) with key {name}:{version}:{env}
  2. On cache miss: CosmosClient.query_items() runs a parameterized SQL query against the prompts container, filtered by promptName, environment, and status = 'stable'
  3. Authentication is DefaultAzureCredential() — no secrets in code, uses the managed identity of the Azure app service or AKS pod
  4. The returned document's template field has {{variables}} resolved, producing the final prompt string
  5. The resolved document is cached for 5 minutes — Cosmos DB change feed invalidates the cache when a new prompt version is promoted

Why Cosmos DB and not just Git at runtime: Git is a history store, not a query store. Cosmos DB gives you sub-10ms reads at scale, per-environment isolation, and change-feed-driven cache invalidation that Git cannot provide.


Open Source Stack — Langfuse Prompt Management

Langfuse is commonly described as an LLM observability tool, but it also has a first-class prompt management SDK — the same get_prompt() / create_prompt() pattern as the Azure stack, backed by Langfuse's hosted or self-hosted store.

Create and version a prompt:

import os

from langfuse import Langfuse

langfuse = Langfuse(
    public_key=os.environ["LANGFUSE_PUBLIC_KEY"],
    secret_key=os.environ["LANGFUSE_SECRET_KEY"],
    host=os.environ["LANGFUSE_HOST"]   # https://cloud.langfuse.com or self-hosted
)

# Create a new version — Langfuse auto-increments version number
langfuse.create_prompt(
    name="mortgage-assistant-system",
    prompt=(
        "You are SO, a mortgage loan assistant for MortgageIQ. "
        "You help {{user_role}} understand loan guidelines and underwriting decisions.\n\n"
        "Rules:\n"
        "- Always cite the specific guideline section and document version\n"
        "- Never provide a final loan approval or denial\n"
        "- Tenant: {{tenant_id}}\n"
        "- Guidelines version: {{guideline_version}}\n"
        "- Current date: {{current_date}}"
    ),
    config={
        "model": "gpt-4o",
        "temperature": 0.1,
        "max_tokens": 1500
    },
    labels=["staging"]         # labels: staging | production
)

Promote to production (label management):

# Creating a version with the 'production' label promotes it:
# Langfuse removes the label from whichever version previously held it
langfuse.create_prompt(
    name="mortgage-assistant-system",
    prompt="...",              # content unchanged; elided here
    config={...},              # elided
    labels=["production"]
)

Read at runtime — langfuse.get_prompt():

# open_source/prompt_store.py
import os
from datetime import date

from langfuse import Langfuse

langfuse = Langfuse(
    public_key=os.environ["LANGFUSE_PUBLIC_KEY"],
    secret_key=os.environ["LANGFUSE_SECRET_KEY"],
    host=os.environ["LANGFUSE_HOST"]
)

def get_prompt(
    name: str,
    label: str = "production",
    variables: dict = None
) -> str:
    """
    Fetches prompt from Langfuse by name and label.
    Langfuse SDK caches responses locally — configurable TTL.
    """
    prompt = langfuse.get_prompt(name, label=label)
    
    # Compile resolves {{variable}} placeholders
    variables = variables or {}
    variables.setdefault("current_date", date.today().isoformat())
    
    return prompt.compile(**variables)

# Usage in application code
system_prompt = get_prompt(
    "mortgage-assistant-system",
    label="production",
    variables={
        "user_role": "loan_officer",
        "tenant_id": "acme-bank",
        "guideline_version": "2026-Q1"
    }
)

Langfuse label strategy (mirrors Cosmos DB status field):

Label                       Equivalent             Use
staging                     status: staging        Testing in non-prod
production                  status: stable         Active runtime version
(no label, version only)    status: deprecated     Historical reference only

Langfuse vs MongoDB for open source prompt management:

                              Langfuse                                       MongoDB
Prompt versioning             Built-in, auto-incremented                     Manual — version field in schema
Label/environment promotion   Native label API                               Custom status field + queries
Audit trail                   Built-in (who created, when)                   Custom — add created_by, approved_by fields
Observability integration     Native — version auto-tagged on traces         Separate setup required
Self-hosted                   Yes (Docker Compose)                           Yes
CI/CD integration             langfuse.create_prompt() in pipeline           Upsert script against MongoDB

When to use Langfuse: teams that want prompt management and observability from one tool, or that are already using Langfuse for tracing. Langfuse's prompt version is automatically attached to every LLM trace, so you can see which prompt version produced which output without extra instrumentation.

When to use MongoDB: teams that need full control over the schema (custom fields, complex multi-tenant isolation, org hierarchy routing) or that already operate MongoDB infrastructure and want to avoid a new dependency.
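If you go the MongoDB route, the runtime read that corresponds to the Cosmos "stable" query is a filter plus a sort. A sketch: the filter builder is plain Python, and the commented find_one call is standard pymongo (the collection and field names follow this article's schema, not a pymongo convention):

```python
def stable_prompt_filter(name: str, environment: str) -> dict:
    """Build the MongoDB filter for the active stable version of a prompt."""
    return {"promptName": name, "environment": environment, "status": "stable"}

# Runtime read with pymongo (assumed installed):
#   from pymongo import MongoClient, DESCENDING
#   coll = MongoClient(uri)["prompts-db"]["prompts"]
#   doc = coll.find_one(
#       stable_prompt_filter("mortgage-assistant-system", "production"),
#       sort=[("approvedAt", DESCENDING)],   # newest stable version wins
#   )
```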

Semantic Kernel — .NET Integration

// PromptService.cs
using Microsoft.Azure.Cosmos;
using Microsoft.Extensions.Caching.Memory;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.OpenAI;   // OpenAIPromptExecutionSettings

public class PromptService
{
    private readonly Container _container;
    private readonly IMemoryCache _cache;
    private readonly string _env;

    public PromptService(Container container, IMemoryCache cache, string env)
    {
        _container = container;
        _cache = cache;
        _env = env;
    }

    public async Task<KernelFunction> GetPromptFunctionAsync(
        Kernel kernel,
        string promptName,
        string version = "stable",
        CancellationToken ct = default)
    {
        var cacheKey = $"{promptName}:{version}:{_env}";
        
        if (!_cache.TryGetValue(cacheKey, out PromptDocument doc))
        {
            doc = await FetchFromCosmosAsync(promptName, version, ct);
            _cache.Set(cacheKey, doc, TimeSpan.FromMinutes(5));
        }

        // Register as Semantic Kernel prompt function
        return kernel.CreateFunctionFromPrompt(
            promptTemplate: doc.Template,
            functionName: promptName.Replace("-", "_"),
            description: $"Prompt: {promptName} v{doc.Version}",
            executionSettings: new OpenAIPromptExecutionSettings
            {
                Temperature = doc.Config.Temperature,
                MaxTokens = doc.Config.MaxTokens,
                ModelId = doc.Config.Model
            }
        );
    }

    private async Task<PromptDocument> FetchFromCosmosAsync(
        string name, string version, CancellationToken ct)
    {
        var query = new QueryDefinition(
            version == "stable"
                ? "SELECT TOP 1 * FROM c WHERE c.promptName = @name AND c.environment = @env AND c.status = 'stable' ORDER BY c._ts DESC"
                : "SELECT TOP 1 * FROM c WHERE c.promptName = @name AND c.version = @ver AND c.environment = @env"
        )
        .WithParameter("@name", name)
        .WithParameter("@env", _env)
        .WithParameter("@ver", version);

        using var feed = _container.GetItemQueryIterator<PromptDocument>(query);
        if (feed.HasMoreResults)
        {
            var page = await feed.ReadNextAsync(ct);
            return page.FirstOrDefault() ?? throw new KeyNotFoundException($"Prompt '{name}' not found");
        }
        throw new KeyNotFoundException($"Prompt '{name}' not found");
    }
}

// Usage in loan service
public class LoanAssistantService
{
    private readonly Kernel _kernel;
    private readonly PromptService _promptService;

    public async Task<string> AnswerAsync(string query, string userRole, string tenantId)
    {
        var systemPromptFn = await _promptService.GetPromptFunctionAsync(
            _kernel, "mortgage-assistant-system");

        var result = await _kernel.InvokeAsync(systemPromptFn, new KernelArguments
        {
            ["user_role"] = userRole,
            ["tenant_id"] = tenantId,
            ["current_date"] = DateTime.UtcNow.ToString("yyyy-MM-dd"),
            ["guideline_version"] = "2026-Q1",
            ["user_query"] = query
        });

        return result.ToString();
    }
}

Environment Promotion Pipeline

Prompts flow through environments with the same discipline as application code: dev → staging → prod, with a validation gate at each promotion.

CI validation script:

# ci/validate_prompt.py
import re
import sys
from pathlib import Path

import tiktoken
import yaml

def validate_prompt_file(filepath: str) -> list[str]:
    errors = []
    doc = yaml.safe_load(Path(filepath).read_text())

    # Schema validation
    required_fields = ["metadata", "template", "variables", "token_estimate"]
    for f in required_fields:
        if f not in doc:
            errors.append(f"Missing required field: {f}")

    # Changelog entry required for every version
    if not doc["metadata"].get("changelog"):
        errors.append("Changelog entry required")

    # Token count validation — allow 10% headroom over the declared budget
    enc = tiktoken.encoding_for_model("gpt-4o")
    actual_tokens = len(enc.encode(doc["template"]))
    budget = doc["token_estimate"]["template_tokens"]
    if actual_tokens > budget * 1.1:
        errors.append(f"Token count {actual_tokens} exceeds budget {budget}")

    # No hardcoded PII patterns
    pii_patterns = [r'\b\d{3}-\d{2}-\d{4}\b',   # SSN
                    r'\b\d{16}\b']              # credit card number
    for pattern in pii_patterns:
        if re.search(pattern, doc["template"]):
            errors.append(f"Potential PII pattern found in template: {pattern}")

    # Every template variable must be documented
    template_vars = set(re.findall(r'\{\{(\w+)\}\}', doc["template"]))
    documented_vars = set(doc["variables"].get("required", []) + doc["variables"].get("optional", []))
    undocumented = template_vars - documented_vars
    if undocumented:
        errors.append(f"Undocumented variables in template: {undocumented}")

    return errors

if __name__ == "__main__":
    errors = validate_prompt_file(sys.argv[1])
    if errors:
        print("❌ Validation failed:")
        for e in errors:
            print(f"  - {e}")
        sys.exit(1)
    print("✓ Validation passed")

Rollback

When a prompt causes production issues, rollback must be instant — not a code deployment.

# Emergency rollback — promote the previous stable version
from datetime import datetime

async def rollback_prompt(name: str, target_version: str, env: str = "production"):
    """
    Sets the target version to 'stable' status.
    Demotes the current stable version to 'rolled-back'.
    Takes effect within 5 minutes (cache TTL).
    """
    # Find current stable
    current = await cosmos.query_single(
        f"SELECT TOP 1 * FROM c WHERE c.promptName='{name}' "
        f"AND c.environment='{env}' AND c.status='stable'"
    )
    
    # Demote current
    current["status"] = "rolled-back"
    current["rolled_back_at"] = datetime.utcnow().isoformat()
    current["rolled_back_reason"] = "emergency rollback"
    await cosmos.upsert_item(current)
    
    # Promote target version
    target = await cosmos.query_single(
        f"SELECT TOP 1 * FROM c WHERE c.promptName='{name}' "
        f"AND c.version='{target_version}' AND c.environment='{env}'"
    )
    target["status"] = "stable"
    target["promoted_at"] = datetime.utcnow().isoformat()
    await cosmos.upsert_item(target)
    
    # Invalidate all application caches via Cosmos change feed
    # (change feed triggers cache invalidation in all app instances)
    print(f"✓ Rolled back {name} to v{target_version} in {env}")

Blue/green prompt deployment: run two prompt versions simultaneously — route 5% of traffic to the new version while 95% stays on stable. Covered in Part 2 (multi-user routing).
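The routing decision itself can be a deterministic hash of the user ID against the rollout percentage, so each user consistently sees one version across requests. A minimal sketch (prompt_version_for_user is an illustrative helper; the full pattern is in Part 2):

```python
import hashlib

def prompt_version_for_user(user_id: str, canary_version: str,
                            stable_version: str, canary_pct: int = 5) -> str:
    """Deterministically route canary_pct% of users to the canary version."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return canary_version if bucket < canary_pct else stable_version

# Same user always lands in the same bucket — stable assignment across requests
v1 = prompt_version_for_user("user-42", "1.3.0", "1.2.0")
v2 = prompt_version_for_user("user-42", "1.3.0", "1.2.0")
assert v1 == v2
```

Hashing rather than random sampling keeps a user's experience consistent during the rollout and makes the assignment reproducible when debugging a bad canary response.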


Key Takeaways — Part 1

  • Prompts are configuration, not code — they belong in a versioned prompt store (Git for history, Cosmos DB for runtime retrieval), not hardcoded in application files.
  • Semantic versioning for prompts — patch for wording, minor for new behavior, major for format changes. Every version gets a changelog entry.
  • The Prompt SDK decouples application code from prompt versions — your service calls get_prompt("mortgage-assistant", version="stable") and never knows which version number that resolves to.
  • Environment promotion mirrors application deployment — dev → staging → prod with validation gates, automated eval, and soak periods.
  • Rollback is a database operation, not a code deployment — a status field change in Cosmos DB + cache invalidation takes effect within minutes.

What's Next

  • Part 2: Multi-user routing, multi-tenant isolation, organizational management across business units and teams, approval workflows, and fallback chains
  • Part 3: Security — prompt injection, jailbreaking, extraction attacks, indirect injection, and governance — audit trails, compliance archiving, drift detection
  • Part 4: Observability, A/B testing, feature flags, guardrails, cost governance, and structured output enforcement