A system prompt that's perfect for a loan officer produces dangerous output for a compliance auditor. A prompt configured for Tenant A must never bleed into Tenant B's context. A new prompt version from the retail banking team should not require the mortgage team's approval.
This is Part 2 of the production prompt engineering series. We cover how enterprise RAG systems route users to the right prompt configuration, isolate tenants at the prompt layer, and manage prompts across business units, teams, and approval tiers.
Part 2 covers:
- Role-based prompt routing — different prompts for different user roles
- Multi-tenant prompt isolation — Tenant A's context never reaches Tenant B
- Organizational prompt management — business units, teams, inheritance
- Approval workflows — who approves what, and when
- Fallback chains — what happens when the primary prompt fails
- Blue/green prompt deployment — traffic splitting between versions
Role-Based Prompt Routing
Different user roles have legitimately different needs — and different risk profiles. The same underlying LLM must behave differently depending on who is asking.
Why different prompts per role — not just different instructions:
A loan officer prompt allows product recommendations. An underwriter prompt restricts to decision support only — no recommendations. A compliance auditor prompt adds explicit audit-mode instructions: "Flag any response that cannot be traced to a cited source." A borrower-facing prompt uses plain language, avoids jargon, and has stricter length limits.
These are behavioral differences that cannot be handled with a single prompt and conditional instructions — the permutation complexity explodes, and a single misconfiguration exposes the wrong behavior to the wrong role.
# prompt_router.py
from dataclasses import dataclass
from datetime import date

from prompt_sdk.client import PromptClient, PromptNotFoundError

ROLE_PROMPT_MAP = {
    "loan_officer": "mortgage-assistant-loan-officer",
    "underwriter": "mortgage-assistant-underwriter",
    "compliance_auditor": "mortgage-assistant-compliance",
    "borrower": "mortgage-assistant-borrower",
    "admin": "mortgage-assistant-admin",
}

FALLBACK_PROMPT = "mortgage-assistant-default"  # safe, restrictive baseline

@dataclass
class PromptContext:
    user_id: str
    role: str
    tenant_id: str
    business_unit: str
    permissions: list[str]

class PromptRouter:
    def __init__(self, prompt_client: PromptClient):
        self.prompt_client = prompt_client

    def resolve(self, ctx: PromptContext) -> "ResolvedPrompt":
        prompt_name = ROLE_PROMPT_MAP.get(ctx.role, FALLBACK_PROMPT)

        # 1. Check for tenant-specific override
        tenant_override = f"{prompt_name}-{ctx.tenant_id}"
        try:
            return self.prompt_client.get_prompt(
                name=tenant_override,
                variables=self._build_variables(ctx)
            )
        except PromptNotFoundError:
            pass  # No tenant override — use role default

        # 2. Check for business unit override
        bu_override = f"{prompt_name}-{ctx.business_unit}"
        try:
            return self.prompt_client.get_prompt(
                name=bu_override,
                variables=self._build_variables(ctx)
            )
        except PromptNotFoundError:
            pass

        # 3. Role default
        return self.prompt_client.get_prompt(
            name=prompt_name,
            variables=self._build_variables(ctx)
        )

    def _build_variables(self, ctx: PromptContext) -> dict:
        return {
            "user_role": ctx.role,
            "tenant_id": ctx.tenant_id,
            "business_unit": ctx.business_unit,
            "current_date": date.today().isoformat(),
            "guideline_version": "2026-Q1",
            "permissions": ",".join(ctx.permissions)
        }
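The lookup order in `resolve()` reduces to building candidate names from most to least specific. A minimal standalone sketch (the map entries mirror `ROLE_PROMPT_MAP` above; `candidate_names` is a hypothetical helper, not part of the router):

```python
ROLE_PROMPT_MAP = {
    "loan_officer": "mortgage-assistant-loan-officer",
    "underwriter": "mortgage-assistant-underwriter",
}
FALLBACK_PROMPT = "mortgage-assistant-default"

def candidate_names(role: str, tenant_id: str, business_unit: str) -> list[str]:
    # Resolution order used by PromptRouter.resolve: tenant override,
    # then business-unit override, then the role default.
    base = ROLE_PROMPT_MAP.get(role, FALLBACK_PROMPT)
    return [f"{base}-{tenant_id}", f"{base}-{business_unit}", base]
```

An unknown role resolves through the restrictive default, so a misconfigured role can never reach a more permissive prompt by accident.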
Multi-Tenant Prompt Isolation
Multi-tenancy at the prompt layer has two distinct problems:
- Context isolation — Tenant A's data must never appear in Tenant B's context
- Configuration isolation — Tenant A can have a custom prompt without affecting Tenant B
Tenant Context Injection
Every prompt template includes a tenant context block that is injected at runtime — never hardcoded:
def build_tenant_context(tenant_id: str, tenant_config: dict) -> str:
    """
    Tenant context is injected into the system prompt at runtime.
    It is NOT stored in the prompt template — it's resolved from
    the tenant configuration service.
    """
    return f"""
Tenant Context:
- Organization: {tenant_config['name']}
- Tenant ID: {tenant_id}
- Allowed loan types: {', '.join(tenant_config['allowed_loan_types'])}
- Geographic scope: {', '.join(tenant_config['states'])}
- Custom guidelines: {tenant_config.get('custom_guideline_ref', 'None')}
- Escalation contact: {tenant_config['escalation_email']}
"""

# Usage — tenant context is appended to the resolved system prompt
system_prompt = prompt_client.get_prompt("mortgage-assistant-loan-officer", variables=ctx_vars)
tenant_ctx = build_tenant_context(tenant_id, await tenant_service.get_config(tenant_id))
final_system = system_prompt.template + "\n" + tenant_ctx
Tenant-Specific Prompt Overrides
Tenants with premium tiers can have custom prompt behavior — different tone, additional constraints, custom output format:
# prompts/mortgage-assistant-loan-officer-tenant-acme.v1.0.0.yaml
metadata:
  name: mortgage-assistant-loan-officer-tenant-acme
  version: "1.0.0"
  tenant: acme-mortgage
  inherits: mortgage-assistant-loan-officer:1.2.0  # base prompt
  override_fields:
    - tone
    - output_format
  status: stable
  owner: tenant-acme-admin

template: |
  {{base_template}}

  Additional ACME Mortgage requirements:
  - Always mention ACME Mortgage's First-Time Buyer Program when DTI < 40%
  - Use "team member" instead of "loan officer"
  - Include ACME's NMLS ID: 9876543 in every response footer
  - Responses must not exceed 200 words
Inheritance — tenant overrides extend the base role prompt. The CI pipeline resolves inheritance at promotion time, producing a fully composed template stored in Cosmos DB. Runtime code never resolves inheritance — it always gets the fully composed template.
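A sketch of that composition step, assuming base and override templates are plain strings and the override embeds a `{{base_template}}` placeholder as in the YAML above (`compose_template` is a hypothetical CI helper):

```python
def compose_template(base_template: str, override_template: str) -> str:
    """Resolve prompt inheritance at promotion time.

    The override's {{base_template}} placeholder is replaced with the fully
    resolved parent template, so runtime code only ever loads a flat,
    pre-composed string and never walks the inheritance chain itself.
    """
    if "{{base_template}}" not in override_template:
        raise ValueError("override template must embed {{base_template}}")
    return override_template.replace("{{base_template}}", base_template, 1)
```

Failing fast on a missing placeholder keeps a malformed override from silently shipping without its parent's instructions.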
RAG Index Isolation at the Prompt Layer
The prompt must enforce that RAG retrieval is scoped to the tenant's knowledge base:
async def get_rag_context(query: str, tenant_id: str, user_role: str) -> str:
    """
    RAG retrieval is always scoped to the tenant.
    The filter is set at the retrieval layer — not enforceable by the prompt alone.
    Defense in depth: both the retrieval filter AND the prompt instruct tenant scoping.
    """
    results = await search_client.search(
        search_text=query,
        vector_queries=[...],
        filter=f"tenant_id eq '{tenant_id}'",  # primary enforcement — index filter
        top=5
    )
    if not results:
        return "No relevant documents found in your organization's knowledge base."

    chunks = []
    for r in results:
        # Verify tenant_id on each chunk before including. Use an explicit
        # raise rather than assert: asserts are stripped under `python -O`,
        # which would silently disable this security check.
        if r["tenant_id"] != tenant_id:
            raise RuntimeError(f"Cross-tenant data leak: {r['tenant_id']}")
        chunks.append(f"[{r['doc_title']}, {r['section']}]\n{r['content']}")

    return (
        f"Context from {tenant_id} knowledge base:\n"
        + "\n---\n".join(chunks)
        + "\n\nUse only these sources. Do not reference external knowledge."
    )
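One detail worth hardening: the filter string is built by interpolation, so a `tenant_id` containing a single quote could break out of the string literal. A small helper (hypothetical name) that escapes per OData rules, where a quote inside a literal is doubled:

```python
def tenant_filter(tenant_id: str) -> str:
    # OData string literals escape embedded single quotes by doubling them;
    # escaping here prevents filter injection via a crafted tenant_id.
    escaped = tenant_id.replace("'", "''")
    return f"tenant_id eq '{escaped}'"
```

Tenant IDs are usually system-generated, so this is belt-and-suspenders, but it costs one line.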
Organizational Prompt Management
At enterprise scale, prompt management is a multi-team, multi-business-unit challenge. The prompt hierarchy mirrors the org chart.
Inheritance rules:
- Child prompts inherit and extend parent prompts — they cannot override safety constraints set at the global level
- Global prompt changes cascade to all children — triggers re-evaluation of all child prompts
- Team-level prompt changes require BU ML Lead approval only — not CISO
- Application-level changes (tone, format) can be self-approved by the owning team if the parent template is unchanged
Prompt Registry — Organizational View
# Prompt registry — tracks ownership and approval chains
from datetime import datetime

class PromptRegistry:
    async def get_approval_chain(self, prompt_name: str) -> list[str]:
        """
        Returns required approvers based on prompt level and change type.
        """
        doc = await self.cosmos.get_prompt_metadata(prompt_name)
        level = doc["level"]  # "global" | "business_unit" | "team" | "application"
        change_type = doc["pending_change_type"]  # "major" | "minor" | "patch"

        chain = []
        if level == "global" or change_type == "major":
            chain.extend(["ciso@mortgageiq.com", "legal@mortgageiq.com"])
        if level in ("global", "business_unit") or change_type in ("major", "minor"):
            chain.append(doc["bu_compliance_contact"])
            chain.append(doc["bu_ml_lead"])
        if change_type == "patch" and not chain:
            # Patch below the BU level: team lead only. The guard keeps a
            # global-level patch from bypassing the global approvers.
            chain = [doc["team_lead"]]
        return chain

    async def submit_for_approval(self, prompt_name: str, version: str, author: str):
        approval_chain = await self.get_approval_chain(prompt_name)

        # Create approval record in Cosmos DB
        approval_doc = {
            "id": f"approval-{prompt_name}-{version}",
            "partitionKey": "approvals",
            "prompt_name": prompt_name,
            "version": version,
            "author": author,
            "status": "pending",
            "approvers_required": approval_chain,
            "approvers_completed": [],
            "submitted_at": datetime.utcnow().isoformat()
        }
        await self.cosmos.upsert_item(approval_doc)

        # Notify first approver
        await self.notify(approval_chain[0], prompt_name, version)

    async def approve(self, prompt_name: str, version: str, approver: str):
        approval = await self.cosmos.get_approval(prompt_name, version)
        if approver not in approval["approvers_required"]:
            raise PermissionError(f"{approver} is not in the approval chain")

        approval["approvers_completed"].append({
            "approver": approver,
            "approved_at": datetime.utcnow().isoformat()
        })

        # Check if all required approvals are complete
        required = set(approval["approvers_required"])
        completed = {a["approver"] for a in approval["approvers_completed"]}
        if required == completed:
            approval["status"] = "approved"
            await self.promote_to_staging(prompt_name, version)
        else:
            # Notify next approver in chain
            remaining = [a for a in approval["approvers_required"]
                         if a not in completed]
            await self.notify(remaining[0], prompt_name, version)

        await self.cosmos.upsert_item(approval)
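For unit testing, the chain-building rules can be condensed into a pure function. This is a sketch with the same field names as the registry document, plus a guard (an assumption on our part) so that a global-level patch still requires the global approvers:

```python
def approval_chain(level: str, change_type: str, meta: dict) -> list[str]:
    """Build the required approver list from prompt level and change type."""
    chain: list[str] = []
    if level == "global" or change_type == "major":
        chain.extend(["ciso@mortgageiq.com", "legal@mortgageiq.com"])
    if level in ("global", "business_unit") or change_type in ("major", "minor"):
        chain.append(meta["bu_compliance_contact"])
        chain.append(meta["bu_ml_lead"])
    if change_type == "patch" and not chain:
        chain = [meta["team_lead"]]  # patch below the BU level: team lead only
    return chain
```

A team-level minor change therefore reaches the BU compliance contact and BU ML Lead but never the CISO, while a team-level patch stops at the team lead.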
Fallback Chains
Production systems must degrade gracefully when a prompt is unavailable, fails validation, or produces an error.
# fallback_chain.py
class PromptFallbackChain:
    def __init__(self, prompt_client: PromptClient):
        self.client = prompt_client

    async def get_with_fallback(
        self,
        name: str,
        variables: dict,
        fallback_chain: list[tuple[str, str]] | None = None
    ) -> tuple["ResolvedPrompt", dict]:
        # Default fallback chain: (prompt name, version alias) pairs
        if fallback_chain is None:
            fallback_chain = [
                (name, "stable"),
                (name, "previous_stable"),  # special alias for N-1 stable
                ("mortgage-assistant-default", "stable"),
            ]

        last_error = None
        for fallback_name, fallback_version in fallback_chain:
            try:
                prompt = self.client.get_prompt(
                    name=fallback_name,
                    version=fallback_version,
                    variables=variables
                )
                meta = {
                    "prompt_used": fallback_name,
                    "version_used": fallback_version,
                    "is_fallback": fallback_name != name or fallback_version != "stable",
                    "fallback_level": fallback_chain.index((fallback_name, fallback_version))
                }
                if meta["is_fallback"]:
                    logger.warning(f"Prompt fallback: {name} → {fallback_name}@{fallback_version}")
                    if meta["fallback_level"] >= 2:
                        await alerting.fire("prompt_fallback_critical", meta)
                return prompt, meta
            except Exception as e:  # PromptNotFoundError, validation errors, timeouts — keep falling back
                last_error = e
                logger.error(f"Prompt {fallback_name}@{fallback_version} failed: {e}")
                continue

        # All fallbacks exhausted
        await alerting.fire("prompt_all_fallbacks_failed", {"name": name})
        raise RuntimeError(f"All prompt fallbacks exhausted. Last error: {last_error}")
Blue/Green Prompt Deployment
Test a new prompt version on a percentage of live traffic before full rollout.
# blue_green.py
import hashlib

class BlueGreenRouter:
    def __init__(self, prompt_client: PromptClient, experiment_store):
        self.prompt_client = prompt_client
        self.experiments = experiment_store

    def get_prompt_for_user(
        self,
        prompt_name: str,
        user_id: str,
        variables: dict
    ) -> tuple["ResolvedPrompt", str]:
        experiment = self.experiments.get_active(prompt_name)
        if experiment is None:
            # No active experiment — return stable
            prompt = self.prompt_client.get_prompt(prompt_name, "stable", variables)
            return prompt, "stable"

        # Deterministic assignment — same user always gets same variant
        bucket = int(hashlib.md5(f"{user_id}:{experiment['id']}".encode()).hexdigest(), 16) % 100
        if bucket < experiment["blue_traffic_pct"]:
            version = experiment["blue_version"]
            variant = "blue"
        else:
            version = experiment["green_version"]
            variant = "green"

        prompt = self.prompt_client.get_prompt(prompt_name, version, variables)
        return prompt, variant
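Deterministic assignment is what makes experiment results comparable across sessions: the same user always lands in the same variant. The bucketing in isolation, using the same hashing as the router above:

```python
import hashlib

def assign_variant(user_id: str, experiment_id: str, blue_traffic_pct: int) -> str:
    # MD5 of "user:experiment" gives a stable bucket in [0, 100);
    # the blue (candidate) version gets the low buckets.
    bucket = int(hashlib.md5(f"{user_id}:{experiment_id}".encode()).hexdigest(), 16) % 100
    return "blue" if bucket < blue_traffic_pct else "green"
```

Hashing the experiment ID together with the user ID also reshuffles assignments between experiments, so the same users are not always the guinea pigs.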
Experiment configuration in Cosmos DB:
{
  "id": "experiment-mortgage-assistant-v1.3.0",
  "prompt_name": "mortgage-assistant-loan-officer",
  "status": "active",
  "green_version": "1.2.0",
  "blue_version": "1.3.0",
  "blue_traffic_pct": 5,
  "start_date": "2026-04-19",
  "success_metrics": {
    "primary": "faithfulness",
    "threshold": 0.90,
    "min_samples": 500
  },
  "auto_promote": false,
  "owner": "ml-platform-team"
}
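How that configuration might feed a promotion decision: a sketch assuming the caller has aggregated the primary metric (mean faithfulness) over blue-variant traffic. `should_promote` is a hypothetical helper, not part of any SDK:

```python
def should_promote(experiment: dict, blue_metric_mean: float, blue_sample_count: int) -> bool:
    """Return True when the blue version meets the experiment's success criteria."""
    metrics = experiment["success_metrics"]
    if blue_sample_count < metrics["min_samples"]:
        return False  # not enough data to decide yet
    return blue_metric_mean >= metrics["threshold"]
```

With `auto_promote: false` as above, a True result would open a promotion request for human review rather than shipping directly.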
Prompt Management Across the Organization
Immutable global constraints — the AI Platform team defines a set of constraints that cannot be overridden at any level:
# prompts/global/safety-constraints.v1.0.0.yaml
# These constraints are injected into EVERY prompt at composition time
# No business unit, team, or tenant can remove or weaken them
immutable_constraints:
- "Never generate, reproduce, or assist with personally identifiable information (PII) of third parties"
- "Never claim to be a human when asked"
- "Never provide legal, medical, or financial advice presented as professional opinion"
- "Always acknowledge uncertainty — never state a guess as a fact"
- "Content safety: follow Azure OpenAI content policy at all times"
injected_at: "system_prompt_footer"
override_allowed: false
version: "1.0.0"
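Composition-time injection can be sketched as a final pipeline step that runs after all inheritance is resolved, so no child template can touch the footer. The function name and footer wording are assumptions:

```python
def inject_global_constraints(composed_template: str, constraints: list[str]) -> str:
    # Appended as the last composition step, so tenant, team, and BU
    # overrides cannot remove, reorder, or weaken the constraint block.
    footer = "\n\nNon-negotiable constraints (apply to every response):\n" + "\n".join(
        f"- {c}" for c in constraints
    )
    return composed_template + footer
```

Because the footer is part of the stored, fully composed template, the runtime path needs no extra logic and the constraint block is auditable per version.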
Key Takeaways — Part 2
- Role-based prompt routing is an architectural requirement — a single prompt serving all user roles is a misconfiguration waiting to happen. Route by role first, then apply tenant overrides.
- Multi-tenant isolation requires enforcement at two layers — the RAG retrieval filter (primary) and the system prompt instruction (secondary). The prompt instruction alone is not sufficient — a sufficiently adversarial user can bypass prompt-only isolation.
- Prompt hierarchy mirrors the org chart — global → business unit → team → application → tenant. Changes at each level trigger re-evaluation of all children.
- Fallback chains prevent outages — always define N-1 stable version as fallback, and a restrictive default as the last resort. Log and alert on every fallback activation.
- Blue/green prompt deployment — test new prompt versions on 5% of live traffic with deterministic user assignment before full rollout. Collect faithfulness and latency metrics before deciding.
What's Next
- Part 3: Security — prompt injection, jailbreaking, extraction attacks, indirect injection via RAG, governance, audit trails, compliance archiving, and drift detection
- Part 4: Observability, A/B testing, feature flags, guardrails, cost governance, structured output enforcement, and the full open source vs Azure tooling comparison