A system prompt that's perfect for a loan officer produces dangerous output for a compliance auditor. A prompt configured for Tenant A must never bleed into Tenant B's context. A new prompt version from the retail banking team should not require the mortgage team's approval.
This is Part 2 of the production prompt engineering series. We cover how enterprise RAG systems route users to the right prompt configuration, isolate tenants at the prompt layer, and manage prompts across business units, teams, and approval tiers.
Part 2 covers:
- Role-based prompt routing — different prompts for different user roles
- Multi-tenant prompt isolation — Tenant A's context never reaches Tenant B
- Organizational prompt management — business units, teams, inheritance
- Approval workflows — who approves what, and when
- Fallback chains — what happens when the primary prompt fails
- Blue/green prompt deployment — traffic splitting between versions
Role-Based Prompt Routing
Different user roles have legitimately different needs — and different risk profiles. The same underlying LLM must behave differently depending on who is asking.
Why different prompts per role — not just different instructions:
A loan officer prompt allows product recommendations. An underwriter prompt restricts to decision support only — no recommendations. A compliance auditor prompt adds explicit audit-mode instructions: "Flag any response that cannot be traced to a cited source." A borrower-facing prompt uses plain language, avoids jargon, and has stricter length limits.
These are behavioral differences that cannot be handled with a single prompt and conditional instructions — the permutation complexity explodes, and a single misconfiguration exposes the wrong behavior to the wrong role.
# prompt_router.py
from dataclasses import dataclass
from datetime import date

from prompt_sdk.client import PromptClient, PromptNotFoundError

ROLE_PROMPT_MAP = {
    "loan_officer": "mortgage-assistant-loan-officer",
    "underwriter": "mortgage-assistant-underwriter",
    "compliance_auditor": "mortgage-assistant-compliance",
    "borrower": "mortgage-assistant-borrower",
    "admin": "mortgage-assistant-admin",
}

FALLBACK_PROMPT = "mortgage-assistant-default"  # safe, restrictive baseline

@dataclass
class PromptContext:
    user_id: str
    role: str
    tenant_id: str
    business_unit: str
    permissions: list[str]

class PromptRouter:
    def __init__(self, prompt_client: PromptClient):
        self.prompt_client = prompt_client

    def resolve(self, ctx: PromptContext) -> "ResolvedPrompt":
        prompt_name = ROLE_PROMPT_MAP.get(ctx.role, FALLBACK_PROMPT)

        # 1. Check for tenant-specific override
        tenant_override = f"{prompt_name}-{ctx.tenant_id}"
        try:
            return self.prompt_client.get_prompt(
                name=tenant_override,
                variables=self._build_variables(ctx)
            )
        except PromptNotFoundError:
            pass  # No tenant override — use role default

        # 2. Check for business unit override
        bu_override = f"{prompt_name}-{ctx.business_unit}"
        try:
            return self.prompt_client.get_prompt(
                name=bu_override,
                variables=self._build_variables(ctx)
            )
        except PromptNotFoundError:
            pass

        # 3. Role default
        return self.prompt_client.get_prompt(
            name=prompt_name,
            variables=self._build_variables(ctx)
        )

    def _build_variables(self, ctx: PromptContext) -> dict:
        return {
            "user_role": ctx.role,
            "tenant_id": ctx.tenant_id,
            "business_unit": ctx.business_unit,
            "current_date": date.today().isoformat(),
            "guideline_version": "2026-Q1",
            "permissions": ",".join(ctx.permissions)
        }
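The lookup order in `resolve()` reduces to building candidate names from most to least specific. A minimal standalone sketch (the map entries mirror `ROLE_PROMPT_MAP` above; `candidate_names` is a hypothetical helper, not part of the router):

```python
ROLE_PROMPT_MAP = {
    "loan_officer": "mortgage-assistant-loan-officer",
    "underwriter": "mortgage-assistant-underwriter",
}
FALLBACK_PROMPT = "mortgage-assistant-default"

def candidate_names(role: str, tenant_id: str, business_unit: str) -> list[str]:
    # Resolution order used by PromptRouter.resolve: tenant override,
    # then business-unit override, then the role default.
    base = ROLE_PROMPT_MAP.get(role, FALLBACK_PROMPT)
    return [f"{base}-{tenant_id}", f"{base}-{business_unit}", base]
```

An unknown role resolves through the restrictive default, so a misconfigured role can never reach a more permissive prompt by accident.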
Multi-Tenant Prompt Isolation
Multi-tenancy at the prompt layer has two distinct problems:
- Context isolation — Tenant A's data must never appear in Tenant B's context
- Configuration isolation — Tenant A can have a custom prompt without affecting Tenant B
Tenant Context Injection
Every prompt template includes a tenant context block that is injected at runtime — never hardcoded:
def build_tenant_context(tenant_id: str, tenant_config: dict) -> str:
    """
    Tenant context is injected into the system prompt at runtime.
    It is NOT stored in the prompt template — it's resolved from
    the tenant configuration service.
    """
    return f"""
Tenant Context:
- Organization: {tenant_config['name']}
- Tenant ID: {tenant_id}
- Allowed loan types: {', '.join(tenant_config['allowed_loan_types'])}
- Geographic scope: {', '.join(tenant_config['states'])}
- Custom guidelines: {tenant_config.get('custom_guideline_ref', 'None')}
- Escalation contact: {tenant_config['escalation_email']}
"""

# Usage — tenant context is appended to the resolved system prompt
system_prompt = prompt_client.get_prompt("mortgage-assistant-loan-officer", variables=ctx_vars)
tenant_ctx = build_tenant_context(tenant_id, await tenant_service.get_config(tenant_id))
final_system = system_prompt.template + "\n" + tenant_ctx
Tenant-Specific Prompt Overrides
Tenants with premium tiers can have custom prompt behavior — different tone, additional constraints, custom output format:
# prompts/mortgage-assistant-loan-officer-tenant-acme.v1.0.0.yaml
metadata:
  name: mortgage-assistant-loan-officer-tenant-acme
  version: "1.0.0"
  tenant: acme-mortgage
  inherits: mortgage-assistant-loan-officer:1.2.0  # base prompt
  override_fields:
    - tone
    - output_format
  status: stable
  owner: tenant-acme-admin

template: |
  {{base_template}}

  Additional ACME Mortgage requirements:
  - Always mention ACME Mortgage's First-Time Buyer Program when DTI < 40%
  - Use "team member" instead of "loan officer"
  - Include ACME's NMLS ID: 9876543 in every response footer
  - Responses must not exceed 200 words
Inheritance — tenant overrides extend the base role prompt. The CI pipeline resolves inheritance at promotion time, producing a fully composed template stored in Cosmos DB. Runtime code never resolves inheritance — it always gets the fully composed template.
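A sketch of that composition step, assuming base and override templates are plain strings and the override embeds a `{{base_template}}` placeholder as in the YAML above (`compose_template` is a hypothetical CI helper):

```python
def compose_template(base_template: str, override_template: str) -> str:
    """Resolve prompt inheritance at promotion time.

    The override's {{base_template}} placeholder is replaced with the fully
    resolved parent template, so runtime code only ever loads a flat,
    pre-composed string and never walks the inheritance chain itself.
    """
    if "{{base_template}}" not in override_template:
        raise ValueError("override template must embed {{base_template}}")
    return override_template.replace("{{base_template}}", base_template, 1)
```

Failing fast on a missing placeholder keeps a malformed override from silently shipping without its parent's instructions.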
RAG Index Isolation at the Prompt Layer
The prompt must enforce that RAG retrieval is scoped to the tenant's knowledge base:
async def get_rag_context(query: str, tenant_id: str, user_role: str) -> str:
    """
    RAG retrieval is always scoped to the tenant.
    The filter is set at the retrieval layer — not enforceable by the prompt alone.
    Defense in depth: both the retrieval filter AND the prompt instruct tenant scoping.
    """
    results = await search_client.search(
        search_text=query,
        vector_queries=[...],
        filter=f"tenant_id eq '{tenant_id}'",  # primary enforcement — index filter
        top=5
    )
    if not results:
        return "No relevant documents found in your organization's knowledge base."

    chunks = []
    for r in results:
        # Verify tenant_id on each chunk before including. Use an explicit
        # raise rather than assert: asserts are stripped under `python -O`,
        # which would silently disable this security check.
        if r["tenant_id"] != tenant_id:
            raise RuntimeError(f"Cross-tenant data leak: {r['tenant_id']}")
        chunks.append(f"[{r['doc_title']}, {r['section']}]\n{r['content']}")

    return (
        f"Context from {tenant_id} knowledge base:\n"
        + "\n---\n".join(chunks)
        + "\n\nUse only these sources. Do not reference external knowledge."
    )
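One detail worth hardening: the filter string is built by interpolation, so a `tenant_id` containing a single quote could break out of the string literal. A small helper (hypothetical name) that escapes per OData rules, where a quote inside a literal is doubled:

```python
def tenant_filter(tenant_id: str) -> str:
    # OData string literals escape embedded single quotes by doubling them;
    # escaping here prevents filter injection via a crafted tenant_id.
    escaped = tenant_id.replace("'", "''")
    return f"tenant_id eq '{escaped}'"
```

Tenant IDs are usually system-generated, so this is belt-and-suspenders, but it costs one line.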
Organizational Prompt Management
At enterprise scale, prompt management is a multi-team, multi-business-unit challenge. The prompt hierarchy mirrors the org chart.
Inheritance rules:
- Child prompts inherit and extend parent prompts — they cannot override safety constraints set at the global level
- Global prompt changes cascade to all children — triggers re-evaluation of all child prompts
- Team-level prompt changes require BU ML Lead approval only — not CISO
- Application-level changes (tone, format) can be self-approved by the owning team if the parent template is unchanged
Prompt Registry — Organizational View
# Prompt registry — tracks ownership and approval chains
from datetime import datetime

class PromptRegistry:
    async def get_approval_chain(self, prompt_name: str) -> list[str]:
        """
        Returns required approvers based on prompt level and change type.
        """
        doc = await self.cosmos.get_prompt_metadata(prompt_name)
        level = doc["level"]  # "global" | "business_unit" | "team" | "application"
        change_type = doc["pending_change_type"]  # "major" | "minor" | "patch"

        chain = []
        if level == "global" or change_type == "major":
            chain.extend(["ciso@mortgageiq.com", "legal@mortgageiq.com"])
        if level in ("global", "business_unit") or change_type in ("major", "minor"):
            chain.append(doc["bu_compliance_contact"])
            chain.append(doc["bu_ml_lead"])
        if change_type == "patch" and not chain:
            # Patch below the BU level: team lead only. The guard keeps a
            # global-level patch from bypassing the global approvers.
            chain = [doc["team_lead"]]
        return chain

    async def submit_for_approval(self, prompt_name: str, version: str, author: str):
        approval_chain = await self.get_approval_chain(prompt_name)

        # Create approval record in Cosmos DB
        approval_doc = {
            "id": f"approval-{prompt_name}-{version}",
            "partitionKey": "approvals",
            "prompt_name": prompt_name,
            "version": version,
            "author": author,
            "status": "pending",
            "approvers_required": approval_chain,
            "approvers_completed": [],
            "submitted_at": datetime.utcnow().isoformat()
        }
        await self.cosmos.upsert_item(approval_doc)

        # Notify first approver
        await self.notify(approval_chain[0], prompt_name, version)

    async def approve(self, prompt_name: str, version: str, approver: str):
        approval = await self.cosmos.get_approval(prompt_name, version)
        if approver not in approval["approvers_required"]:
            raise PermissionError(f"{approver} is not in the approval chain")

        approval["approvers_completed"].append({
            "approver": approver,
            "approved_at": datetime.utcnow().isoformat()
        })

        # Check if all required approvals are complete
        required = set(approval["approvers_required"])
        completed = {a["approver"] for a in approval["approvers_completed"]}
        if required == completed:
            approval["status"] = "approved"
            await self.promote_to_staging(prompt_name, version)
        else:
            # Notify next approver in chain
            remaining = [a for a in approval["approvers_required"]
                         if a not in completed]
            await self.notify(remaining[0], prompt_name, version)

        await self.cosmos.upsert_item(approval)
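For unit testing, the chain-building rules can be condensed into a pure function. This is a sketch with the same field names as the registry document, plus a guard (an assumption on our part) so that a global-level patch still requires the global approvers:

```python
def approval_chain(level: str, change_type: str, meta: dict) -> list[str]:
    """Build the required approver list from prompt level and change type."""
    chain: list[str] = []
    if level == "global" or change_type == "major":
        chain.extend(["ciso@mortgageiq.com", "legal@mortgageiq.com"])
    if level in ("global", "business_unit") or change_type in ("major", "minor"):
        chain.append(meta["bu_compliance_contact"])
        chain.append(meta["bu_ml_lead"])
    if change_type == "patch" and not chain:
        chain = [meta["team_lead"]]  # patch below the BU level: team lead only
    return chain
```

A team-level minor change therefore reaches the BU compliance contact and BU ML Lead but never the CISO, while a team-level patch stops at the team lead.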
Fallback Chains
Production systems must degrade gracefully when a prompt is unavailable, fails validation, or produces an error.
# fallback_chain.py
class PromptFallbackChain:
    def __init__(self, prompt_client: PromptClient):
        self.client = prompt_client

    async def get_with_fallback(
        self,
        name: str,
        variables: dict,
        fallback_chain: list[tuple[str, str]] | None = None
    ) -> tuple["ResolvedPrompt", dict]:
        # Default fallback chain: (prompt name, version alias) pairs
        if fallback_chain is None:
            fallback_chain = [
                (name, "stable"),
                (name, "previous_stable"),  # special alias for N-1 stable
                ("mortgage-assistant-default", "stable"),
            ]

        last_error = None
        for fallback_name, fallback_version in fallback_chain:
            try:
                prompt = self.client.get_prompt(
                    name=fallback_name,
                    version=fallback_version,
                    variables=variables
                )
                meta = {
                    "prompt_used": fallback_name,
                    "version_used": fallback_version,
                    "is_fallback": fallback_name != name or fallback_version != "stable",
                    "fallback_level": fallback_chain.index((fallback_name, fallback_version))
                }
                if meta["is_fallback"]:
                    logger.warning(f"Prompt fallback: {name} → {fallback_name}@{fallback_version}")
                    if meta["fallback_level"] >= 2:
                        await alerting.fire("prompt_fallback_critical", meta)
                return prompt, meta
            except Exception as e:  # PromptNotFoundError, validation errors, timeouts — keep falling back
                last_error = e
                logger.error(f"Prompt {fallback_name}@{fallback_version} failed: {e}")
                continue

        # All fallbacks exhausted
        await alerting.fire("prompt_all_fallbacks_failed", {"name": name})
        raise RuntimeError(f"All prompt fallbacks exhausted. Last error: {last_error}")
Blue/Green Prompt Deployment
Test a new prompt version on a percentage of live traffic before full rollout.
# blue_green.py
import hashlib

class BlueGreenRouter:
    def __init__(self, prompt_client: PromptClient, experiment_store):
        self.prompt_client = prompt_client
        self.experiments = experiment_store

    def get_prompt_for_user(
        self,
        prompt_name: str,
        user_id: str,
        variables: dict
    ) -> tuple["ResolvedPrompt", str]:
        experiment = self.experiments.get_active(prompt_name)
        if experiment is None:
            # No active experiment — return stable
            prompt = self.prompt_client.get_prompt(prompt_name, "stable", variables)
            return prompt, "stable"

        # Deterministic assignment — same user always gets same variant
        bucket = int(hashlib.md5(f"{user_id}:{experiment['id']}".encode()).hexdigest(), 16) % 100
        if bucket < experiment["blue_traffic_pct"]:
            version = experiment["blue_version"]
            variant = "blue"
        else:
            version = experiment["green_version"]
            variant = "green"

        prompt = self.prompt_client.get_prompt(prompt_name, version, variables)
        return prompt, variant
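Deterministic assignment is what makes experiment results comparable across sessions: the same user always lands in the same variant. The bucketing in isolation, using the same hashing as the router above:

```python
import hashlib

def assign_variant(user_id: str, experiment_id: str, blue_traffic_pct: int) -> str:
    # MD5 of "user:experiment" gives a stable bucket in [0, 100);
    # the blue (candidate) version gets the low buckets.
    bucket = int(hashlib.md5(f"{user_id}:{experiment_id}".encode()).hexdigest(), 16) % 100
    return "blue" if bucket < blue_traffic_pct else "green"
```

Hashing the experiment ID together with the user ID also reshuffles assignments between experiments, so the same users are not always the guinea pigs.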
Experiment configuration in Cosmos DB:
{
  "id": "experiment-mortgage-assistant-v1.3.0",
  "prompt_name": "mortgage-assistant-loan-officer",
  "status": "active",
  "green_version": "1.2.0",
  "blue_version": "1.3.0",
  "blue_traffic_pct": 5,
  "start_date": "2026-04-19",
  "success_metrics": {
    "primary": "faithfulness",
    "threshold": 0.90,
    "min_samples": 500
  },
  "auto_promote": false,
  "owner": "ml-platform-team"
}
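How that configuration might feed a promotion decision: a sketch assuming the caller has aggregated the primary metric (mean faithfulness) over blue-variant traffic. `should_promote` is a hypothetical helper, not part of any SDK:

```python
def should_promote(experiment: dict, blue_metric_mean: float, blue_sample_count: int) -> bool:
    """Return True when the blue version meets the experiment's success criteria."""
    metrics = experiment["success_metrics"]
    if blue_sample_count < metrics["min_samples"]:
        return False  # not enough data to decide yet
    return blue_metric_mean >= metrics["threshold"]
```

With `auto_promote: false` as above, a True result would open a promotion request for human review rather than shipping directly.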
Prompt Management Across the Organization
Immutable global constraints — the AI Platform team defines a set of constraints that cannot be overridden at any level:
# prompts/global/safety-constraints.v1.0.0.yaml
# These constraints are injected into EVERY prompt at composition time
# No business unit, team, or tenant can remove or weaken them
immutable_constraints:
- "Never generate, reproduce, or assist with personally identifiable information (PII) of third parties"
- "Never claim to be a human when asked"
- "Never provide legal, medical, or financial advice presented as professional opinion"
- "Always acknowledge uncertainty — never state a guess as a fact"
- "Content safety: follow Azure OpenAI content policy at all times"
injected_at: "system_prompt_footer"
override_allowed: false
version: "1.0.0"
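Composition-time injection can be sketched as a final pipeline step that runs after all inheritance is resolved, so no child template can touch the footer. The function name and footer wording are assumptions:

```python
def inject_global_constraints(composed_template: str, constraints: list[str]) -> str:
    # Appended as the last composition step, so tenant, team, and BU
    # overrides cannot remove, reorder, or weaken the constraint block.
    footer = "\n\nNon-negotiable constraints (apply to every response):\n" + "\n".join(
        f"- {c}" for c in constraints
    )
    return composed_template + footer
```

Because the footer is part of the stored, fully composed template, the runtime path needs no extra logic and the constraint block is auditable per version.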
Key Takeaways — Part 2
- Role-based prompt routing is an architectural requirement — a single prompt serving all user roles is a misconfiguration waiting to happen. Route by role first, then apply tenant overrides.
- Multi-tenant isolation requires enforcement at two layers — the RAG retrieval filter (primary) and the system prompt instruction (secondary). The prompt instruction alone is not sufficient — a sufficiently adversarial user can bypass prompt-only isolation.
- Prompt hierarchy mirrors the org chart — global → business unit → team → application → tenant. Changes at each level trigger re-evaluation of all children.
- Fallback chains prevent outages — always define N-1 stable version as fallback, and a restrictive default as the last resort. Log and alert on every fallback activation.
- Blue/green prompt deployment — test new prompt versions on 5% of live traffic with deterministic user assignment before full rollout. Collect faithfulness and latency metrics before deciding.
What's Next
- Part 3: Security — prompt injection, jailbreaking, extraction attacks, indirect injection via RAG, governance, audit trails, compliance archiving, and drift detection
- Part 4: Observability, A/B testing, feature flags, guardrails, cost governance, structured output enforcement, and the full open source vs Azure tooling comparison