A prompt is not a string in your code. In production, it is a versioned artifact — with an environment, an owner, an audit trail, a deployment pipeline, and a rollback procedure.
Teams that treat prompts as throwaway strings buried in code ship fast and break things in ways they can't explain six months later. Teams that build prompt infrastructure ship slower at first and operate with confidence at scale.
This is Part 1 of a 4-part series on production prompt engineering. We start with the foundation: what a prompt actually is, how to store it, how to version it, and how your application code references it at runtime — in both open source and Azure stacks.
Part 1 covers:
- The four layers of a production prompt
- Why prompts must leave your codebase
- Storage: Cosmos DB vs Git-based versioning
- The Prompt SDK — how code references prompts at runtime
- Environment promotion: dev → staging → prod
- Rollback and blue/green prompt deployment
The Four Layers of a Production Prompt
Every LLM request is built from four distinct prompt layers. Each layer has a different owner, a different change frequency, and a different risk profile.
Why layer ownership matters: when a response is wrong, you need to know which layer caused it. A system prompt bug affects every user. A RAG context bug affects queries on a specific topic. A user message manipulation is a security event. Layered architecture is also layered debugging.
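To make the layers concrete before walking through each one, here is an illustrative sketch of how the four layers compose into a single chat request while keeping per-layer version information for debugging. The prompt objects and the full composition code appear later in the Prompt SDK section; compose_request itself is a hypothetical helper:
# Illustrative sketch — compose the four layers into one request, keeping
# per-layer versions so a bad answer can be traced to the layer that produced it
def compose_request(system_prompt, few_shot, rag_context: str, user_message: str):
    messages = [
        {"role": "system", "content": system_prompt.template},    # Layer 1 — system prompt
        {"role": "user", "content": few_shot.template},           # Layer 2 — few-shot examples
        {"role": "assistant", "content": "Understood. I will follow these examples."},
        {"role": "user", "content": f"{rag_context}\n\nQuestion: {user_message}"},  # Layers 3 + 4
    ]
    provenance = {"system": system_prompt.version, "few_shot": few_shot.version}
    return messages, provenance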
Layer 1 — System Prompt
The system prompt defines the model's identity, constraints, and behavior for the entire session. It's the highest-risk layer because it affects every user on every request.
You are SO, a mortgage loan assistant for MortgageIQ. You help loan officers
understand loan guidelines, eligibility requirements, and underwriting decisions.
Rules:
- Always cite the specific guideline section and document version for any factual claim
- Never provide a final loan approval or denial — flag exceptions for human review
- If a question is outside mortgage lending, respond: "I can only assist with
mortgage-related questions."
- Respond in the same language as the user's question
- Format responses with: Summary (2 sentences), Details (bullet points), Source Citations
You are operating under FHA/VA/Conventional loan guidelines as of {{guideline_version}}.
Current date: {{current_date}}
User role: {{user_role}}
Tenant: {{tenant_id}}
Template variables ({{guideline_version}}, {{user_role}}) are resolved at runtime from the prompt store — they make a single prompt template serve multiple contexts without hardcoding.
Layer 2 — Few-Shot Examples
Few-shot examples teach the model the expected input/output pattern. They're more stable than the system prompt but change when the desired behavior changes.
Example 1:
User: What's the maximum DTI for a conventional loan?
Assistant: **Summary:** Conventional loans generally allow a maximum DTI of 45%,
with exceptions up to 50% for strong compensating factors.
**Details:**
- Standard limit: 45% back-end DTI (Fannie Mae B3-6-02)
- Exception: up to 50% with 720+ credit score and 12 months reserves
- Front-end DTI: no hard limit, lender overlay typically 36%
**Source:** Fannie Mae Selling Guide B3-6-02 (2025-Q1)
Example 2:
User: Can a borrower use gift funds for FHA down payment?
Assistant: **Summary:** Yes — FHA allows 100% of the down payment to come from
gift funds with no minimum borrower contribution required.
...
Layer 3 — RAG Context
Retrieved chunks injected per query. The template wrapper matters as much as the chunks themselves:
Context from MortgageIQ Knowledge Base:
---
[Source: FHA Handbook 4000.1, Section II.A.4.c, p.147, v2025-Q1]
The maximum qualifying ratios for FHA loans are 31% housing expense ratio
and 43% total debt ratio. Exceptions may be granted up to 50% with documented
compensating factors...
---
[Source: FHA Handbook 4000.1, Section II.A.4.d, p.149, v2025-Q1]
Acceptable compensating factors include: verified and documented cash reserves
equal to at least three months of total housing payment...
---
Use only the above sources to answer the question. If the answer is not in the
sources, state: "I don't have reliable information on this in the current guidelines."
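Application code later in this article calls a build_rag_context() helper to produce exactly this wrapper. A minimal sketch of that helper, assuming each retrieved chunk carries source, section, page, version, and text fields:
# Sketch — format retrieved chunks into the RAG context wrapper shown above
def build_rag_context(chunks: list[dict]) -> str:
    blocks = []
    for c in chunks:
        header = f"[Source: {c['source']}, Section {c['section']}, p.{c['page']}, v{c['version']}]"
        blocks.append(f"---\n{header}\n{c['text']}")
    return (
        "Context from MortgageIQ Knowledge Base:\n"
        + "\n".join(blocks)
        + "\n---\n"
        "Use only the above sources to answer the question. If the answer is not in the\n"
        'sources, state: "I don\'t have reliable information on this in the current guidelines."'
    )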
Layer 4 — User Message
The raw user query. This is the attack surface for prompt injection — covered in Part 3.
Why Prompts Must Leave Your Codebase
The worst pattern in production LLM systems:
# ❌ Anti-pattern — prompt hardcoded in application code
def get_response(user_query: str) -> str:
system_prompt = """You are a mortgage assistant. Be helpful.
Always cite sources. Don't approve loans.""" # v??? deployed when???
return openai.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_query}
]
)
Problems with this:
- No versioning — what prompt was live on March 3rd when the compliance audit found a wrong answer?
- No rollback — a bad prompt change requires a code deployment to fix
- No environment separation — dev and prod run the same prompt (or different hardcoded ones scattered across branches)
- No A/B testing — you can't test two system prompts without a code branch
- No non-engineer access — product and legal can't update prompt behavior without a PR
- No audit trail — no record of who changed what and when
The correct pattern treats prompts as configuration — externalized, versioned, and retrieved at runtime.
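For contrast, the corrected shape looks roughly like this — the PromptClient and ResolvedPrompt it relies on are built out in the Prompt SDK section below:
# ✅ Corrected pattern (sketch) — prompt fetched from the prompt store at runtime
def get_response(user_query: str, user_role: str, tenant_id: str) -> str:
    system_prompt = prompt_client.get_prompt(
        name="mortgage-assistant-system",
        version="stable",                  # resolves per environment in the prompt store
        variables={
            "user_role": user_role,
            "tenant_id": tenant_id,
            "guideline_version": "2026-Q1",
        }
    )
    response = openai_client.chat.completions.create(
        model=system_prompt.config["model"],
        messages=[
            {"role": "system", "content": system_prompt.template},
            {"role": "user", "content": user_query}
        ],
        temperature=system_prompt.config["temperature"]
    )
    return response.choices[0].message.content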
Storage Architecture
Production prompt storage needs two things: durable versioned history (Git) and fast runtime retrieval (database).
Git-Based Versioning
Prompts live in a dedicated Git repository (or a /prompts directory in your monorepo). Each prompt is a YAML file with full metadata:
# prompts/mortgage-assistant/system-prompt.v1.2.0.yaml
apiVersion: prompts/v1
kind: SystemPrompt
metadata:
name: mortgage-assistant-system
version: "1.2.0"
previous_version: "1.1.3"
status: stable # draft | review | staging | stable | deprecated
owner: ml-platform-team
approvers:
- jane.smith@mortgageiq.com # compliance
- raj.patel@mortgageiq.com # product
created_at: "2026-04-01T09:00:00Z"
approved_at: "2026-04-03T14:22:00Z"
changelog: |
v1.2.0: Added explicit citation format requirement per compliance review CR-2048.
Restricted loan approval language per legal review LR-0312.
v1.1.3: Added Spanish language support via respond-in-user-language rule.
v1.1.0: Added user role context variable.
config:
model: gpt-4o
temperature: 0.1 # low — deterministic for compliance
max_tokens: 1500
top_p: 0.95
template: |
You are SO, a mortgage loan assistant for MortgageIQ.
You help {{user_role}} understand loan guidelines and underwriting decisions.
Rules:
- Always cite the specific guideline section and document version
- Never provide a final loan approval or denial
- If outside mortgage lending: "I can only assist with mortgage-related questions."
- Respond in the same language as the user's question
- Format: Summary → Details → Source Citations
Guidelines version: {{guideline_version}}
Current date: {{current_date}}
Tenant: {{tenant_id}}
User role: {{user_role}}
variables:
required:
- user_role
- tenant_id
- current_date
- guideline_version
optional:
- user_name
- preferred_language
token_estimate:
template_tokens: 187
max_variable_tokens: 50
total_reserved: 237
tags:
- mortgage
- loan-officer
- compliance-reviewed
- pii-safe
Semantic versioning for prompts:
- Patch (1.2.0 → 1.2.1): typo fix, minor wording, does not change behavior
- Minor (1.2.0 → 1.3.0): adds new capability or constraint, backward compatible
- Major (1.2.0 → 2.0.0): breaking change — different output format, removed capability, requires re-evaluation
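A CI check can enforce at least the mechanical side of this convention — the version must move forward from previous_version and the changelog must contain an entry for it. A rough sketch against the YAML layout above (check_version_bump is a hypothetical helper):
# Sketch — mechanical semver checks for a prompt YAML file
from packaging.version import Version

def check_version_bump(metadata: dict) -> list[str]:
    errors = []
    new = Version(metadata["version"])
    old = Version(metadata.get("previous_version", "0.0.0"))
    if new <= old:
        errors.append(f"version {new} must be greater than previous_version {old}")
    if f"v{new}" not in (metadata.get("changelog") or ""):
        errors.append(f"changelog is missing an entry for v{new}")
    return errors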
Cosmos DB — Runtime Storage
Git is for history. Cosmos DB is for fast runtime retrieval. The CI pipeline promotes approved prompts from Git into Cosmos DB per environment.
Schema:
{
"id": "mortgage-assistant-system-v1.2.0",
"partitionKey": "mortgage-assistant",
"promptName": "mortgage-assistant-system",
"version": "1.2.0",
"status": "stable",
"environment": "production",
"template": "You are SO, a mortgage loan assistant...",
"config": {
"model": "gpt-4o",
"temperature": 0.1,
"maxTokens": 1500
},
"variables": {
"required": ["user_role", "tenant_id", "current_date", "guideline_version"],
"optional": ["user_name"]
},
"tokenEstimate": 237,
"owner": "ml-platform-team",
"approvedBy": ["jane.smith@mortgageiq.com", "raj.patel@mortgageiq.com"],
"approvedAt": "2026-04-03T14:22:00Z",
"changelog": "v1.2.0: Added citation format requirement...",
"tags": ["mortgage", "compliance-reviewed"],
"createdAt": "2026-04-01T09:00:00Z",
"_ts": 1743685340
}
Cosmos DB container design:
- Partition key: promptName — all versions of a prompt live in the same partition, so version lookups are fast
- Unique constraint: promptName + version + environment — prevents duplicate deployments
- TTL: none on production records — prompts are compliance artifacts, never auto-deleted
- Change feed: triggers cache invalidation in the application layer when a new version is promoted
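The promotion step itself can be a short script in the CI pipeline: read the approved YAML from Git, build the runtime document in the shape of the schema above, and upsert it into the container for the target environment. A sketch — the partition-key derivation and field mapping are assumptions based on that schema:
# Sketch — CI step that promotes an approved prompt YAML from Git into Cosmos DB
import yaml
from pathlib import Path
from azure.cosmos import CosmosClient
from azure.identity import DefaultAzureCredential

def promote_prompt(yaml_path: str, environment: str, cosmos_url: str):
    doc = yaml.safe_load(Path(yaml_path).read_text())
    meta = doc["metadata"]
    name, version = meta["name"], meta["version"]
    item = {
        "id": f"{name}-v{version}",
        "partitionKey": name.rsplit("-", 1)[0],   # prompt family, per the schema above
        "promptName": name,
        "version": version,
        "status": meta["status"],
        "environment": environment,
        "template": doc["template"],
        "config": doc["config"],
        "variables": doc["variables"],
        "tokenEstimate": doc["token_estimate"]["total_reserved"],
        "owner": meta["owner"],
        "approvedBy": meta.get("approvers", []),
        "changelog": meta.get("changelog", ""),
        "tags": doc.get("tags", []),
    }
    container = (
        CosmosClient(url=cosmos_url, credential=DefaultAzureCredential())
        .get_database_client("prompts-db")
        .get_container_client("prompts")
    )
    container.upsert_item(item)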
The Prompt SDK
The Prompt SDK is the interface between your application code and the prompt store. It abstracts storage, caching, variable resolution, and version pinning.
Azure Stack — Python SDK
# prompt_sdk/client.py
from azure.cosmos import CosmosClient
from azure.identity import DefaultAzureCredential  # note: azure.identity, not azure.core.credentials
import re
import time
from datetime import date

class PromptNotFoundError(Exception): pass
class MissingVariableError(Exception): pass
class PromptClient:
def __init__(self, cosmos_url: str, database: str, container: str, env: str):
self.client = CosmosClient(
url=cosmos_url,
credential=DefaultAzureCredential() # managed identity — no secrets
)
self.container = (
self.client
.get_database_client(database)
.get_container_client(container)
)
self.env = env
self._cache: dict[str, tuple[float, dict]] = {}  # in-process cache: key -> (fetched_at, doc)
self._cache_ttl = 300  # seconds — the 5-minute TTL referenced throughout this article
def get_prompt(
self,
name: str,
version: str = "stable", # "stable" | "latest" | "1.2.0"
variables: dict = None
) -> "ResolvedPrompt":
cache_key = f"{name}:{version}:{self.env}"
# Cache hit — prompts change rarely; entries expire after the 5-minute TTL
cached = self._cache.get(cache_key)
if cached and time.monotonic() - cached[0] < self._cache_ttl:
    prompt_doc = cached[1]
else:
    prompt_doc = self._fetch(name, version)
    self._cache[cache_key] = (time.monotonic(), prompt_doc)
# Resolve template variables
resolved_template = self._resolve(prompt_doc["template"], variables or {})
return ResolvedPrompt(
name=name,
version=prompt_doc["version"],
template=resolved_template,
config=prompt_doc["config"],
token_estimate=prompt_doc["tokenEstimate"],
prompt_id=prompt_doc["id"] # for audit logging
)
def _fetch(self, name: str, version: str) -> dict:
if version == "stable":
query = """
SELECT TOP 1 * FROM c
WHERE c.promptName = @name
AND c.environment = @env
AND c.status = 'stable'
ORDER BY c._ts DESC
"""
elif version == "latest":
query = """
SELECT TOP 1 * FROM c
WHERE c.promptName = @name
AND c.environment = @env
ORDER BY c._ts DESC
"""
else:
# Exact version pin
query = """
SELECT TOP 1 * FROM c
WHERE c.promptName = @name
AND c.version = @version
AND c.environment = @env
"""
params = [
{"name": "@name", "value": name},
{"name": "@env", "value": self.env},
{"name": "@version", "value": version}
]
results = list(self.container.query_items(
query=query,
parameters=params,
enable_cross_partition_query=False # partition key = promptName
))
if not results:
raise PromptNotFoundError(f"Prompt '{name}' version '{version}' not found in {self.env}")
return results[0]
def _resolve(self, template: str, variables: dict) -> str:
# Inject current_date automatically if not provided
variables.setdefault("current_date", date.today().isoformat())
# Replace {{variable}} placeholders
def replace(match):
key = match.group(1).strip()
if key not in variables:
raise MissingVariableError(f"Required variable '{key}' not provided")
return str(variables[key])
return re.sub(r'\{\{(\w+)\}\}', replace, template)
def invalidate_cache(self, name: str = None):
if name:
self._cache = {k: v for k, v in self._cache.items() if not k.startswith(f"{name}:")}
else:
self._cache.clear()
class ResolvedPrompt:
def __init__(self, name, version, template, config, token_estimate, prompt_id):
self.name = name
self.version = version
self.template = template
self.config = config
self.token_estimate = token_estimate
self.prompt_id = prompt_id # logged with every LLM call
Using the Prompt SDK in Application Code
# loan_assistant/service.py
from datetime import date
from openai import AzureOpenAI
from prompt_sdk.client import PromptClient
# Singleton — initialized once at app startup
prompt_client = PromptClient(
cosmos_url=settings.COSMOS_URL,
database="prompts-db",
container="prompts",
env=settings.ENVIRONMENT # "development" | "staging" | "production"
)
openai_client = AzureOpenAI(
    azure_endpoint=settings.AZURE_OPENAI_ENDPOINT,
    api_key=settings.AZURE_OPENAI_API_KEY,  # or azure_ad_token_provider with a managed identity
    api_version="2024-12-01"
)
async def answer_loan_question(
user_query: str,
user_role: str,
tenant_id: str,
rag_chunks: list[dict],
guideline_version: str = "2026-Q1"
) -> dict:
# Fetch system prompt — version pinned to "stable" in production
system_prompt = prompt_client.get_prompt(
name="mortgage-assistant-system",
version="stable",
variables={
"user_role": user_role,
"tenant_id": tenant_id,
"guideline_version": guideline_version,
"current_date": date.today().isoformat()
}
)
# Fetch few-shot examples
few_shot = prompt_client.get_prompt(
name="mortgage-assistant-few-shot",
version="stable"
)
# Build RAG context layer
rag_context = build_rag_context(rag_chunks)
# Compose messages
messages = [
{"role": "system", "content": system_prompt.template},
{"role": "user", "content": few_shot.template}, # few-shot as first user turn
{"role": "assistant", "content": "Understood. I will follow these examples."},
{"role": "user", "content": f"{rag_context}\n\nQuestion: {user_query}"}
]
response = openai_client.chat.completions.create(
model=system_prompt.config["model"],
messages=messages,
temperature=system_prompt.config["temperature"],
max_tokens=system_prompt.config["maxTokens"]
)
return {
"answer": response.choices[0].message.content,
"prompt_versions": {
"system": system_prompt.version,
"few_shot": few_shot.version,
},
"prompt_ids": {
"system": system_prompt.prompt_id, # logged for audit
"few_shot": few_shot.prompt_id,
},
"usage": {
"prompt_tokens": response.usage.prompt_tokens,
"completion_tokens": response.usage.completion_tokens
}
}
How Azure Manages Prompts End-to-End
This is the question worth answering explicitly: yes, prompts are read from Cosmos DB at runtime — every get_prompt() call resolves against Cosmos DB, fronted by a short-lived in-process cache.
What happens on every get_prompt() call:
- PromptClient checks the in-process dictionary cache (_cache) with key {name}:{version}:{env}
- On cache miss: CosmosClient.query_items() runs a parameterized SQL query against the prompts container, filtered by promptName, environment, and status = 'stable'
- Authentication is DefaultAzureCredential() — no secrets in code; it uses the managed identity of the Azure App Service or AKS pod
- The returned document's template field has {{variables}} resolved, producing the final prompt string
- The resolved document is cached for 5 minutes — the Cosmos DB change feed invalidates the cache when a new prompt version is promoted
Why Cosmos DB and not just Git at runtime: Git is a history store, not a query store. Cosmos DB gives you sub-10ms reads at scale, per-environment isolation, and change-feed-driven cache invalidation that Git cannot provide.
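What the change-feed-driven invalidation can look like in practice — a minimal polling sketch using the azure-cosmos SDK's change feed (an Azure Functions Cosmos DB trigger is the managed alternative); the continuation handling follows the SDK's change feed samples:
# Sketch — poll the Cosmos DB change feed and invalidate cached prompts on promotion
import time

def watch_prompt_changes(container, prompt_client, poll_seconds: int = 30):
    continuation = None
    while True:
        feed = container.query_items_change_feed(
            is_start_from_beginning=False, continuation=continuation
        )
        for doc in feed:
            # Any new or updated prompt document → drop cached copies of that prompt
            prompt_client.invalidate_cache(doc["promptName"])
        # Continuation token for the next poll (per the azure-cosmos change feed samples)
        continuation = container.client_connection.last_response_headers.get("etag")
        time.sleep(poll_seconds)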
Open Source Stack — Langfuse Prompt Management
Langfuse is commonly described as an LLM observability tool, but it also has a first-class prompt management SDK — the same get_prompt() / create_prompt() pattern as the Azure stack, backed by Langfuse's hosted or self-hosted store.
Create and version a prompt:
import os
from langfuse import Langfuse
langfuse = Langfuse(
public_key=os.environ["LANGFUSE_PUBLIC_KEY"],
secret_key=os.environ["LANGFUSE_SECRET_KEY"],
host=os.environ["LANGFUSE_HOST"] # https://cloud.langfuse.com or self-hosted
)
# Create a new version — Langfuse auto-increments version number
langfuse.create_prompt(
name="mortgage-assistant-system",
prompt=(
"You are SO, a mortgage loan assistant for MortgageIQ. "
"You help {{user_role}} understand loan guidelines and underwriting decisions.\n\n"
"Rules:\n"
"- Always cite the specific guideline section and document version\n"
"- Never provide a final loan approval or denial\n"
"- Tenant: {{tenant_id}}\n"
"- Guidelines version: {{guideline_version}}\n"
"- Current date: {{current_date}}"
),
config={
"model": "gpt-4o",
"temperature": 0.1,
"max_tokens": 1500
},
labels=["staging"] # labels: staging | production
)
Promote to production (label management):
# Publish a version carrying the 'production' label — Langfuse keeps each label on a
# single version, so 'production' moves off the previously labeled version automatically
langfuse.create_prompt(
name="mortgage-assistant-system",
prompt="...", # same content
config={...},
labels=["production"] # Langfuse moves 'production' label to this version
)
Read at runtime — langfuse.get_prompt():
# open_source/prompt_store.py
import os
from datetime import date
from langfuse import Langfuse
langfuse = Langfuse(
public_key=os.environ["LANGFUSE_PUBLIC_KEY"],
secret_key=os.environ["LANGFUSE_SECRET_KEY"],
host=os.environ["LANGFUSE_HOST"]
)
def get_prompt(
name: str,
label: str = "production",
variables: dict = None
) -> str:
"""
Fetches prompt from Langfuse by name and label.
Langfuse SDK caches responses locally — configurable TTL.
"""
prompt = langfuse.get_prompt(name, label=label)
# Compile resolves {{variable}} placeholders
variables = variables or {}
variables.setdefault("current_date", date.today().isoformat())
return prompt.compile(**variables)
# Usage in application code
system_prompt = get_prompt(
"mortgage-assistant-system",
label="production",
variables={
"user_role": "loan_officer",
"tenant_id": "acme-bank",
"guideline_version": "2026-Q1"
}
)
Langfuse label strategy (mirrors Cosmos DB status field):
| Label | Equivalent | Use |
|---|---|---|
| staging | status: staging | Testing in non-prod |
| production | status: stable | Active runtime version |
| (no label / version number) | status: deprecated | Historical reference only |
Langfuse vs MongoDB for open source prompt management:
| | Langfuse | MongoDB |
|---|---|---|
| Prompt versioning | Built-in, auto-incremented | Manual — version field in schema |
| Label/environment promotion | Native label API | Custom status field + queries |
| Audit trail | Built-in (who created, when) | Custom — add created_by, approved_by fields |
| Observability integration | Native — prompt version auto-tagged on traces | Separate setup required |
| Self-hosted | Yes (Docker Compose) | Yes |
| CI/CD integration | langfuse.create_prompt() in pipeline | Upsert script against MongoDB |
When to use Langfuse: teams that want prompt management and observability from one tool, or that are already using Langfuse for tracing. Langfuse's prompt version is automatically attached to every LLM trace, so you can see which prompt version produced which output without extra instrumentation.
When to use MongoDB: teams that need full control over the schema (custom fields, complex multi-tenant isolation, org hierarchy routing) or that already operate MongoDB infrastructure and want to avoid a new dependency.
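For reference, the MongoDB version of the runtime read is a short pymongo query against a collection that mirrors the Cosmos DB schema — field names below follow that schema and are otherwise assumptions:
# Sketch — MongoDB equivalent of get_prompt(), mirroring the Cosmos DB schema
import os
from pymongo import MongoClient, DESCENDING

prompts = MongoClient(os.environ["MONGODB_URI"])["prompts_db"]["prompts"]

def get_prompt_doc(name: str, env: str, version: str = "stable") -> dict:
    query = {"promptName": name, "environment": env}
    if version == "stable":
        query["status"] = "stable"
    elif version != "latest":
        query["version"] = version
    doc = prompts.find_one(query, sort=[("createdAt", DESCENDING)])
    if doc is None:
        raise KeyError(f"Prompt '{name}' ({version}) not found in {env}")
    return doc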
Semantic Kernel — .NET Integration
// PromptService.cs
using Microsoft.Azure.Cosmos;
using Microsoft.Extensions.Caching.Memory;          // IMemoryCache
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.OpenAI;   // OpenAIPromptExecutionSettings
using Microsoft.SemanticKernel.PromptTemplates.Handlebars;
public class PromptService
{
private readonly Container _container;
private readonly IMemoryCache _cache;
private readonly string _env;
public async Task<KernelFunction> GetPromptFunctionAsync(
Kernel kernel,
string promptName,
string version = "stable",
CancellationToken ct = default)
{
var cacheKey = $"{promptName}:{version}:{_env}";
if (!_cache.TryGetValue(cacheKey, out PromptDocument doc))
{
doc = await FetchFromCosmosAsync(promptName, version, ct);
_cache.Set(cacheKey, doc, TimeSpan.FromMinutes(5));
}
// Register as a Semantic Kernel prompt function
// ({{variable}} placeholders use the Handlebars template format)
return kernel.CreateFunctionFromPrompt(
    promptTemplate: doc.Template,
    functionName: promptName.Replace("-", "_"),
    description: $"Prompt: {promptName} v{doc.Version}",
    templateFormat: "handlebars",
    promptTemplateFactory: new HandlebarsPromptTemplateFactory(),
    executionSettings: new OpenAIPromptExecutionSettings
{
Temperature = doc.Config.Temperature,
MaxTokens = doc.Config.MaxTokens,
ModelId = doc.Config.Model
}
);
}
private async Task<PromptDocument> FetchFromCosmosAsync(
string name, string version, CancellationToken ct)
{
var query = new QueryDefinition(
version == "stable"
? "SELECT TOP 1 * FROM c WHERE c.promptName = @name AND c.environment = @env AND c.status = 'stable' ORDER BY c._ts DESC"
: "SELECT TOP 1 * FROM c WHERE c.promptName = @name AND c.version = @ver AND c.environment = @env"
)
.WithParameter("@name", name)
.WithParameter("@env", _env)
.WithParameter("@ver", version);
using var feed = _container.GetItemQueryIterator<PromptDocument>(query);
if (feed.HasMoreResults)
{
var page = await feed.ReadNextAsync(ct);
return page.FirstOrDefault() ?? throw new KeyNotFoundException($"Prompt '{name}' not found");
}
throw new KeyNotFoundException($"Prompt '{name}' not found");
}
}
// Usage in loan service
public class LoanAssistantService
{
private readonly Kernel _kernel;
private readonly PromptService _promptService;
public async Task<string> AnswerAsync(string query, string userRole, string tenantId)
{
var systemPromptFn = await _promptService.GetPromptFunctionAsync(
_kernel, "mortgage-assistant-system");
var result = await _kernel.InvokeAsync(systemPromptFn, new KernelArguments
{
["user_role"] = userRole,
["tenant_id"] = tenantId,
["current_date"] = DateTime.UtcNow.ToString("yyyy-MM-dd"),
["guideline_version"] = "2026-Q1",
["user_query"] = query
});
return result.ToString();
}
}
Environment Promotion Pipeline
Prompts flow through environments with the same discipline as application code: a prompt is authored and validated in dev, promoted to staging for evaluation, and only then promoted to production.
CI validation script:
# ci/validate_prompt.py
import yaml, tiktoken, sys
from pathlib import Path
def validate_prompt_file(filepath: str) -> list[str]:
errors = []
doc = yaml.safe_load(Path(filepath).read_text())
# Schema validation
required_fields = ["metadata", "template", "variables", "token_estimate"]
for f in required_fields:
if f not in doc:
errors.append(f"Missing required field: {f}")
# Changelog entry required for every version
if not doc["metadata"].get("changelog"):
    errors.append("Changelog entry required")
# Token count validation
enc = tiktoken.encoding_for_model("gpt-4o")
actual_tokens = len(enc.encode(doc["template"]))
budget = doc["token_estimate"]["template_tokens"]
if actual_tokens > budget * 1.1:
errors.append(f"Token count {actual_tokens} exceeds budget {budget}")
# No hardcoded PII patterns
import re
pii_patterns = [r'\b\d{3}-\d{2}-\d{4}\b', # SSN
r'\b\d{16}\b'] # credit card
for pattern in pii_patterns:
if re.search(pattern, doc["template"]):
errors.append(f"Potential PII pattern found in template")
# Required variables documented
template_vars = set(re.findall(r'\{\{(\w+)\}\}', doc["template"]))
documented_vars = set(doc["variables"].get("required", []) + doc["variables"].get("optional", []))
undocumented = template_vars - documented_vars
if undocumented:
errors.append(f"Undocumented variables in template: {undocumented}")
return errors
if __name__ == "__main__":
errors = validate_prompt_file(sys.argv[1])
if errors:
print("❌ Validation failed:")
for e in errors: print(f" - {e}")
sys.exit(1)
print("✓ Validation passed")
Rollback
When a prompt causes production issues, rollback must be instant — not a code deployment.
# Emergency rollback — promote previous stable version
# ('cosmos' here is an async wrapper around the Cosmos DB container client)
from datetime import datetime

async def rollback_prompt(name: str, target_version: str, env: str = "production"):
"""
Sets the target version to 'stable' status.
Demotes the current stable version to 'rolled-back'.
Takes effect within 5 minutes (cache TTL).
"""
# Find current stable
current = await cosmos.query_single(
f"SELECT TOP 1 * FROM c WHERE c.promptName='{name}' "
f"AND c.environment='{env}' AND c.status='stable'"
)
# Demote current
current["status"] = "rolled-back"
current["rolled_back_at"] = datetime.utcnow().isoformat()
current["rolled_back_reason"] = "emergency rollback"
await cosmos.upsert_item(current)
# Promote target version
target = await cosmos.query_single(
f"SELECT TOP 1 * FROM c WHERE c.promptName='{name}' "
f"AND c.version='{target_version}' AND c.environment='{env}'"
)
target["status"] = "stable"
target["promoted_at"] = datetime.utcnow().isoformat()
await cosmos.upsert_item(target)
# Invalidate all application caches via Cosmos change feed
# (change feed triggers cache invalidation in all app instances)
print(f"✓ Rolled back {name} to v{target_version} in {env}")
Blue/green prompt deployment: run two prompt versions simultaneously — route 5% of traffic to the new version while 95% stays on stable. Covered in Part 2 (multi-user routing).
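The routing mechanic is simple even before Part 2's full treatment: hash a stable key (user or session ID) so each user deterministically lands on either the candidate or the stable version. A sketch — the "candidate" version label is hypothetical, not one of the statuses defined above:
# Sketch — deterministic 5% / 95% split between a candidate and the stable prompt
import hashlib

def pick_prompt_version(user_id: str, candidate_percent: int = 5) -> str:
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "candidate" if bucket < candidate_percent else "stable"  # "candidate" is a hypothetical label

version_label = pick_prompt_version(user_id="lo-4821")
system_prompt = prompt_client.get_prompt("mortgage-assistant-system", version=version_label)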
Key Takeaways — Part 1
- Prompts are configuration, not code — they belong in a versioned prompt store (Git for history, Cosmos DB for runtime retrieval), not hardcoded in application files.
- Semantic versioning for prompts — patch for wording, minor for new behavior, major for format changes. Every version gets a changelog entry.
- The Prompt SDK decouples application code from prompt versions — your service calls get_prompt("mortgage-assistant", version="stable") and never knows which version number that resolves to.
- Environment promotion mirrors application deployment — dev → staging → prod with validation gates, automated eval, and soak periods.
- Rollback is a database operation, not a code deployment — a status field change in Cosmos DB + cache invalidation takes effect within minutes.
What's Next
- Part 2: Multi-user routing, multi-tenant isolation, organizational management across business units and teams, approval workflows, and fallback chains
- Part 3: Security — prompt injection, jailbreaking, extraction attacks, indirect injection, and governance — audit trails, compliance archiving, drift detection
- Part 4: Observability, A/B testing, feature flags, guardrails, cost governance, and structured output enforcement