LangChain, LangGraph, LangFlow, and LangSmith are not competing frameworks. They are four layers of one enterprise AI stack — and using any one of them without the others leaves a gap that will cost you in production.
Most teams pick LangChain because they saw it in a tutorial. They build an agent. It works in development. Then production arrives: concurrent users, cost spikes, hallucinated financial advice, non-deterministic agent behavior, and no trace of what happened. The three missing layers were always LangGraph, LangSmith, and a governed deployment path.
This post builds a complete reference architecture around a real business case — an Investment Coach AI — and shows exactly where each component lives, what data flows through it, and how the entire system runs on Azure with production-grade governance.
The Business Case — Investment Coach AI
An Investment Coach AI answers questions like:
- "Based on my risk profile, should I rebalance my portfolio this quarter?"
- "Explain the tax implications of selling my tech holdings before year-end."
- "What is the historical performance of my current allocation vs. S&P 500?"
This is not a simple chatbot. It requires:
- Multi-source retrieval: regulatory documents, live market data, user portfolio from a database, historical performance APIs
- Multi-step reasoning: understand risk profile → retrieve relevant data → reason over it → generate grounded advice → validate against compliance rules
- Stateful conversation: context carries across turns (the user's portfolio, previous questions, session intent)
- Compliance enforcement: financial advice is regulated. Every answer must cite sources, stay within licensed advisor boundaries, and log the full reasoning chain for audit
This is exactly the use case that exposes the limits of a single-layer implementation — and shows why all four components are necessary.
The Enterprise AI Architecture — Where Each Tool Fits
Each tool has one primary job. Each job is indispensable in production.
Layer 1 — LangFlow: Visual Prototyping and Stakeholder Alignment
What it is: LangFlow is a visual, drag-and-drop interface for building LangChain flows. It generates LangChain-compatible Python code. It is not a production serving layer — it is a design and alignment tool.
Where it fits: before writing code. A compliance officer, product manager, or business analyst can open LangFlow, see the flow of data through the Investment Coach system, and validate the logic without reading Python. When they approve the flow, an engineer exports it to code.
What LangFlow produces: a JSON flow definition that exports to LangChain Python. The business stakeholder sees the intent-routing logic. The compliance team sees that validation runs before output. Engineers get a working first draft they refine into production code.
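As a quick illustration, an exported flow can be smoke-tested directly from Python before any refactoring. A minimal sketch, assuming the flow was exported as investment_coach_flow.json and using the langflow package's run_flow_from_json helper:

# Sketch: smoke-test an exported LangFlow flow before refining it into code.
# Assumes a local investment_coach_flow.json export and an installed langflow package.
from langflow.load import run_flow_from_json

result = run_flow_from_json(
    flow="investment_coach_flow.json",
    input_value="Should I rebalance my portfolio this quarter?",
)
print(result)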
When to use LangFlow:
- Sprint 0: align product, compliance, and engineering on the flow before writing code
- Stakeholder demos: show the AI reasoning path without live code
- Rapid prototyping: validate retrieval strategy before committing to an architecture
- Non-technical team members who need to understand or modify simple flows
When NOT to use LangFlow for production: LangFlow is not a production serving runtime. It lacks stateful agent coordination, production-grade error handling, multi-tenancy, and the concurrency model needed for enterprise scale. Export the flow, refine in code, deploy via LangGraph + Azure.
Layer 2 — LangChain: Application Logic and LLM Abstraction
What it is: LangChain provides the building blocks — LLM wrappers, prompt templates, retrieval chains, tool integrations, memory abstractions, and output parsers. It is the application logic layer that connects your LLM to your data and tools.
Where it fits: inside every agent in the LangGraph orchestration. Each agent uses LangChain components internally; in a multi-agent system, LangChain does not run standalone in production.
LangChain Components for Investment Coach
# investment_coach/chains.py
from operator import itemgetter
import os

from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
from langchain_community.vectorstores.azuresearch import AzureSearch
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.tools import tool

# --- LLM setup via Azure OpenAI ---
llm = AzureChatOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    azure_deployment="gpt-4o",
    api_version="2024-12-01-preview",
    temperature=0.1,  # low temperature reduces output variance; financial advice should be consistent
    max_tokens=1500
)

# --- Embedding model shared by both Azure AI Search indexes ---
embeddings = AzureOpenAIEmbeddings(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    azure_deployment="text-embedding-3-large",  # deployment name is illustrative
    api_version="2024-12-01-preview"
)
# --- Azure AI Search vector store for portfolio docs ---
portfolio_store = AzureSearch(
    azure_search_endpoint=os.environ["AZURE_SEARCH_ENDPOINT"],
    azure_search_key=os.environ["AZURE_SEARCH_KEY"],
    index_name="investment-portfolio-index",
    embedding_function=embeddings.embed_query
)

# --- Azure AI Search for regulatory / tax documents ---
regulatory_store = AzureSearch(
    azure_search_endpoint=os.environ["AZURE_SEARCH_ENDPOINT"],
    azure_search_key=os.environ["AZURE_SEARCH_KEY"],
    index_name="financial-regulatory-index",
    embedding_function=embeddings.embed_query
)
# --- Prompt template for portfolio analysis ---
portfolio_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a licensed investment analysis assistant.
Answer questions about portfolio performance and allocation using ONLY the provided context.

Rules:
- Never recommend specific securities by name without citing data
- Always state the data date for market figures
- Flag when data may be stale (older than 30 days)
- If the question requires licensed advice beyond analysis, say:
  "This requires review by a licensed financial advisor."

User risk profile: {risk_profile}
Portfolio as of: {data_date}

Context:
{context}"""),
    ("human", "{question}")
])
# --- RAG chain for portfolio questions ---
portfolio_retriever = portfolio_store.as_retriever(
    search_type="hybrid",
    search_kwargs={"k": 6, "score_threshold": 0.70}
)

def format_docs(docs):
    return "\n\n".join(
        f"[Source: {d.metadata.get('source', 'unknown')} | "
        f"Date: {d.metadata.get('data_date', 'unknown')}]\n{d.page_content}"
        for d in docs
    )

# itemgetter pulls each field out of the input dict; a bare RunnablePassthrough
# here would hand the entire dict to the retriever and the prompt variables
portfolio_chain = (
    {
        "context": itemgetter("question") | portfolio_retriever | format_docs,
        "question": itemgetter("question"),
        "risk_profile": itemgetter("risk_profile"),
        "data_date": itemgetter("data_date")
    }
    | portfolio_prompt
    | llm
    | StrOutputParser()
)
# --- Tool: Live market data ---
@tool
def get_market_data(ticker: str) -> dict:
    """Fetch current price, 52-week range, and P/E ratio for a ticker symbol."""
    # market_data_client: Alpha Vantage / Yahoo Finance wrapper, configured at startup
    response = market_data_client.get_quote(ticker)
    return {
        "ticker": ticker,
        "price": response["price"],
        "52w_high": response["52w_high"],
        "52w_low": response["52w_low"],
        "pe_ratio": response["pe_ratio"],
        "data_timestamp": response["timestamp"]
    }

@tool
def get_portfolio_allocation(user_id: str) -> dict:
    """Retrieve current portfolio allocation for a user from Cosmos DB."""
    # cosmos_client: azure.cosmos database proxy, initialized at startup
    container = cosmos_client.get_container_client("portfolios")
    portfolio = container.read_item(user_id, partition_key=user_id)
    return {
        "user_id": user_id,
        "allocations": portfolio["allocations"],  # [{ticker, weight, cost_basis}]
        "total_value": portfolio["total_value"],
        "last_rebalanced": portfolio["last_rebalanced"],
        "risk_profile": portfolio["risk_profile"]  # conservative | moderate | aggressive
    }

@tool
def get_tax_implications(action: str, ticker: str, user_id: str) -> str:
    """Look up tax implications for a trade action — uses regulatory RAG."""
    query = f"{action} {ticker} tax implications capital gains"
    docs = regulatory_store.similarity_search(query, k=4)
    return format_docs(docs)
What LangChain provides here:
- `AzureChatOpenAI` — Azure OpenAI connection with managed identity support
- `AzureSearch` — hybrid retrieval (BM25 + vector) from Azure AI Search
- `ChatPromptTemplate` — structured prompt with system + human turns
- `@tool` decorator — converts Python functions into LLM-callable tools with schema generation
- LCEL (the `|` pipe operator) — composes retrieval + prompt + LLM + parser into a typed, streamable chain
Data flowing into LangChain: user question, risk profile, user_id, retrieved chunks from Azure AI Search, live market data from external APIs, portfolio data from Cosmos DB.
Data flowing out: grounded answer string with source citations embedded, or structured tool call results that LangGraph routes between agents.
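To make that data flow concrete, here is how the portfolio_chain defined above would be invoked; the field values are illustrative:

# Illustrative invocation of portfolio_chain: the input keys match the
# itemgetter fields in the chain definition above
answer = portfolio_chain.invoke({
    "question": "How has my allocation performed vs. the S&P 500?",
    "risk_profile": "moderate",
    "data_date": "2024-11-30",
})
print(answer)  # grounded answer string with [Source: ... | Date: ...] citations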
Layer 3 — LangGraph: Stateful Multi-Agent Orchestration
What it is: LangGraph models an agent system as a typed state graph — nodes are agents or functions, edges define routing logic, and state persists across all nodes in the graph. It is the orchestration layer that LangChain chains cannot provide by themselves.
Where it fits: LangGraph is the top-level runtime. It receives the user query, routes it through specialist agents, manages conversation memory across turns, and coordinates parallel agent execution.
Why LangChain alone is not enough for this use case:
A single LangChain chain cannot:
- Run portfolio analysis and market data retrieval in parallel
- Route to a compliance agent only when the primary answer touches regulated advice
- Maintain multi-turn conversation state (what the user said two turns ago affects this answer)
- Retry a failed tool call without re-running the entire chain
- Branch to a human escalation node when confidence is low
LangGraph solves all of these.
Investment Coach — Agent Graph Design
LangGraph Implementation
# investment_coach/graph.py
import asyncio
from typing import TypedDict, Literal

from langchain_core.output_parsers import StrOutputParser, JsonOutputParser
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.aiosqlite import AsyncSqliteSaver  # langgraph.checkpoint.sqlite.aio in newer releases

from investment_coach.chains import (
    llm, portfolio_chain, get_market_data, get_portfolio_allocation,
    intent_classifier_prompt, tax_prompt, synthesis_prompt,
    compliance_prompt, extract_citations,  # prompts/helpers defined alongside chains.py
)
# --- Typed state — every node reads from and writes to this ---
class InvestmentCoachState(TypedDict):
    # Input
    user_id: str
    session_id: str
    question: str
    # User context (loaded at intake)
    risk_profile: str            # conservative | moderate | aggressive
    portfolio: dict              # current allocation
    conversation_history: list[dict]
    # Agent outputs
    intent: str                  # portfolio | market | tax | general
    portfolio_analysis: str
    market_data: dict
    tax_guidance: str
    # Synthesis
    draft_answer: str
    compliance_status: str       # PASS | FAIL
    compliance_failures: list[str]
    compliance_attempts: int
    # Final output
    final_answer: str
    citations: list[dict]
    requires_human_review: bool
# --- Node: Intake Agent ---
async def intake_agent(state: InvestmentCoachState) -> dict:
    """Parse intent and load user profile from Cosmos DB."""
    # Classify intent
    intent_chain = intent_classifier_prompt | llm | StrOutputParser()
    intent = await intent_chain.ainvoke({
        "question": state["question"],
        "history": state["conversation_history"][-3:]  # last 3 turns
    })
    # Load portfolio from Cosmos DB
    portfolio = await get_portfolio_allocation.ainvoke({"user_id": state["user_id"]})
    return {
        "intent": intent.strip().lower(),
        "portfolio": portfolio,
        "risk_profile": portfolio["risk_profile"]
    }

# --- Node: Portfolio Analyst Agent ---
async def portfolio_analyst_agent(state: InvestmentCoachState) -> dict:
    """Run RAG over portfolio documents and allocation data."""
    answer = await portfolio_chain.ainvoke({
        "question": state["question"],
        "risk_profile": state["risk_profile"],
        "data_date": state["portfolio"].get("last_rebalanced", "unknown")
    })
    return {"portfolio_analysis": answer}

# --- Node: Market Data Agent ---
async def market_data_agent(state: InvestmentCoachState) -> dict:
    """Fetch live market data for tickers in user's portfolio."""
    tickers = [a["ticker"] for a in state["portfolio"].get("allocations", [])]
    market_results = await asyncio.gather(*[
        get_market_data.ainvoke({"ticker": t}) for t in tickers[:10]
    ])
    return {"market_data": {r["ticker"]: r for r in market_results}}

# --- Node: Tax Advisor Agent ---
async def tax_advisor_agent(state: InvestmentCoachState) -> dict:
    """Retrieve tax guidance from regulatory RAG index."""
    tax_chain = tax_prompt | llm | StrOutputParser()
    guidance = await tax_chain.ainvoke({
        "question": state["question"],
        "portfolio": state["portfolio"]
    })
    return {"tax_guidance": guidance}

# --- Node: Synthesizer Agent ---
async def synthesizer_agent(state: InvestmentCoachState) -> dict:
    """Combine outputs from all specialist agents into a unified answer."""
    synthesis_chain = synthesis_prompt | llm | StrOutputParser()
    draft = await synthesis_chain.ainvoke({
        "question": state["question"],
        "portfolio_analysis": state.get("portfolio_analysis", ""),
        "market_data": state.get("market_data", {}),
        "tax_guidance": state.get("tax_guidance", ""),
        "risk_profile": state["risk_profile"]
    })
    return {"draft_answer": draft, "compliance_attempts": 0}

# --- Node: Compliance Agent ---
async def compliance_agent(state: InvestmentCoachState) -> dict:
    """Validate answer against financial compliance rules."""
    compliance_chain = compliance_prompt | llm | JsonOutputParser()
    result = await compliance_chain.ainvoke({
        "answer": state["draft_answer"],
        "risk_profile": state["risk_profile"]
    })
    # result: {"status": "PASS"|"FAIL", "failures": [...], "revised_answer": "..."}
    return {
        "compliance_status": result["status"],
        "compliance_failures": result.get("failures", []),
        "draft_answer": result.get("revised_answer", state["draft_answer"]),
        "compliance_attempts": state.get("compliance_attempts", 0) + 1
    }

# --- Node: Response Formatter ---
async def format_response(state: InvestmentCoachState) -> dict:
    """Attach citations, disclaimer, and data timestamps."""
    citations = extract_citations(state["draft_answer"])
    final = f"""{state['draft_answer']}

---
**Sources:** {', '.join(c['source'] for c in citations)}
**Disclaimer:** This analysis is for informational purposes only and does not constitute licensed financial advice. Consult a licensed financial advisor before making investment decisions.
**Data as of:** {state['portfolio'].get('last_rebalanced', 'unknown')}"""
    return {
        "final_answer": final,
        "citations": citations,
        "requires_human_review": False
    }

# --- Node: Human Escalation ---
async def escalate_to_human(state: InvestmentCoachState) -> dict:
    return {
        "final_answer": (
            "This question requires review by a licensed financial advisor. "
            "A member of our advisory team will follow up within one business day. "
            f"Reference: {state['session_id']}"
        ),
        "requires_human_review": True
    }
# --- Routing functions ---
def route_by_intent(state: InvestmentCoachState) -> Literal["portfolio", "market", "tax", "synthesizer"]:
    intent = state["intent"]
    if "portfolio" in intent:
        return "portfolio"
    if "market" in intent:
        return "market"
    if "tax" in intent:
        return "tax"
    return "synthesizer"  # general questions go straight to synthesis

def route_compliance(state: InvestmentCoachState) -> Literal["format", "rewrite", "escalate"]:
    if state["compliance_status"] == "PASS":
        return "format"
    if state.get("compliance_attempts", 0) >= 2:
        return "escalate"
    return "rewrite"
# --- Build the graph ---
def build_investment_coach_graph():
    workflow = StateGraph(InvestmentCoachState)

    # Add nodes
    workflow.add_node("intake", intake_agent)
    workflow.add_node("portfolio", portfolio_analyst_agent)
    workflow.add_node("market", market_data_agent)
    workflow.add_node("tax", tax_advisor_agent)
    workflow.add_node("synthesizer", synthesizer_agent)
    workflow.add_node("compliance", compliance_agent)
    workflow.add_node("format", format_response)
    workflow.add_node("escalate", escalate_to_human)

    # Entry point
    workflow.set_entry_point("intake")

    # Conditional routing after intake
    workflow.add_conditional_edges(
        "intake",
        route_by_intent,
        {
            "portfolio": "portfolio",
            "market": "market",
            "tax": "tax",
            "synthesizer": "synthesizer"
        }
    )

    # All specialist agents feed into synthesizer
    workflow.add_edge("portfolio", "synthesizer")
    workflow.add_edge("market", "synthesizer")
    workflow.add_edge("tax", "synthesizer")

    # Synthesizer → Compliance
    workflow.add_edge("synthesizer", "compliance")

    # Compliance routing
    workflow.add_conditional_edges(
        "compliance",
        route_compliance,
        {
            "format": "format",
            "rewrite": "compliance",  # loop back: compliance revises the draft, then re-validates
            "escalate": "escalate"
        }
    )

    workflow.add_edge("format", END)
    workflow.add_edge("escalate", END)

    # Persistent memory — multi-turn conversation state
    # (in recent langgraph releases from_conn_string returns an async context
    # manager; construct the saver accordingly for your version)
    memory = AsyncSqliteSaver.from_conn_string("checkpoints.db")
    return workflow.compile(checkpointer=memory)
# --- Entry point ---
graph = build_investment_coach_graph()

async def run_investment_coach(
    user_id: str,
    session_id: str,
    question: str,
    conversation_history: list[dict] | None = None
) -> dict:
    result = await graph.ainvoke(
        {
            "user_id": user_id,
            "session_id": session_id,
            "question": question,
            "conversation_history": conversation_history or []
        },
        config={"configurable": {"thread_id": session_id}}  # LangGraph checkpoint key
    )
    return {
        "answer": result["final_answer"],
        "citations": result["citations"],
        "requires_human_review": result["requires_human_review"],
        "compliance_attempts": result.get("compliance_attempts", 0)
    }
What LangGraph provides that LangChain alone cannot:
- Typed state graph: every node reads and writes to a shared, typed state object — no passing arguments manually between chains
- Conditional routing: compliance failure routes back for rewrite, second failure escalates — impossible in a linear LangChain chain
- Persistent memory: `AsyncSqliteSaver` (or a custom Cosmos DB checkpointer) persists conversation state across turns — the user's risk profile is loaded once and carried through the session
- Parallel execution: portfolio, market, and tax agents can run concurrently when the question spans all three (see the fan-out sketch after this list)
- Retry loops: compliance rewrite loop with max-attempt guard prevents infinite retries
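The parallel case deserves a sketch. In LangGraph, a routing function that returns a list of node names fans out to all of them concurrently. A hedged variant of route_by_intent for questions that span multiple specialists might look like this (the multi-label intent string is an assumption):

# Sketch: fan-out routing. Returning a list of node names runs those
# specialists concurrently; each writes a different state key, so their
# updates merge without conflict before the synthesizer runs.
def route_by_intents(state: InvestmentCoachState) -> list[str]:
    targets = [t for t in ("portfolio", "market", "tax") if t in state["intent"]]
    return targets or ["synthesizer"]  # general questions skip the specialists

workflow.add_conditional_edges("intake", route_by_intents,
                               ["portfolio", "market", "tax", "synthesizer"])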
Data flowing through the state graph:
| State Field | Set By | Read By |
|---|---|---|
| `intent` | Intake Agent | Router |
| `portfolio`, `risk_profile` | Intake Agent | Portfolio, Market, Tax, Synthesizer |
| `portfolio_analysis` | Portfolio Agent | Synthesizer |
| `market_data` | Market Data Agent | Synthesizer |
| `tax_guidance` | Tax Advisor Agent | Synthesizer |
| `draft_answer` | Synthesizer | Compliance |
| `compliance_status` | Compliance Agent | Router |
| `final_answer`, `citations` | Formatter | API response |
| `requires_human_review` | Escalation Node | Downstream workflow |
Layer 4 — LangSmith: Observability, Evaluation, and Production Reliability
What it is: LangSmith traces every node execution in the LangGraph graph, measures cost and latency per node, stores runs in a dataset for evaluation, and monitors production for regressions. It is the production reliability layer — without it, you are flying blind.
Where it fits: instrumented at startup, invisible at runtime. Every graph.ainvoke() call automatically emits a trace to LangSmith. No code changes required after initial setup.
Setup — One-Line Instrumentation
# investment_coach/config.py
import os
# LangSmith is enabled by setting environment variables before any LangChain import
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = os.environ["LANGSMITH_API_KEY"]
os.environ["LANGCHAIN_PROJECT"] = "investment-coach-production"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
# That's it — every LangGraph node, LangChain chain, and tool call is now traced
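Beyond the environment variables, individual runs can carry metadata and tags, which appear as filterable fields in the LangSmith UI. A small sketch extending the invoke call from run_investment_coach (the metadata keys here are illustrative):

# Sketch: attach searchable metadata/tags to a traced run
result = await graph.ainvoke(
    inputs,
    config={
        "configurable": {"thread_id": session_id},
        "metadata": {"user_tier": "retail", "app_version": "1.3.0"},  # illustrative keys
        "tags": ["investment-coach", "production"],
    },
)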
What LangSmith Captures Per Run
LangSmith captures for every run:
- Full input/output at every node
- Token usage and cost (prompt + completion) per LLM call
- Latency per node — identifies which agent is the bottleneck
- Tool call inputs and outputs (market data API, Cosmos DB reads)
- Compliance attempts count — high values signal prompt quality issues
- Error traces with full stack for failed runs
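Those captured fields are also queryable from the SDK. A sketch that aggregates token usage per LLM run name over the last 24 hours (assuming total_tokens is populated on LLM runs, as it is for supported providers):

from collections import defaultdict
from datetime import datetime, timedelta, timezone

from langsmith import Client

client = Client()

# Sketch: token usage per LLM run name over the last 24 hours
usage: dict[str, int] = defaultdict(int)
runs = client.list_runs(
    project_name="investment-coach-production",
    run_type="llm",
    start_time=datetime.now(timezone.utc) - timedelta(hours=24),
)
for run in runs:
    usage[run.name] += run.total_tokens or 0

for name, tokens in sorted(usage.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {tokens} tokens")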
Evaluation Datasets and Regression Testing
# eval/langsmith_evaluation.py
import asyncio

from langsmith import Client
from langsmith.evaluation import evaluate, LangChainStringEvaluator

from investment_coach.graph import run_investment_coach

client = Client()

# Create a labeled evaluation dataset
dataset = client.create_dataset(
    "investment-coach-eval-v1",
    description="200 investment questions with expected answer quality labels"
)

# Add test cases
examples = [
    {
        "inputs": {
            "user_id": "test-user-conservative",
            "question": "Should I rebalance my bond allocation given current rates?",
        },
        "outputs": {
            "expected_citations": ["Federal Reserve Policy Statement", "Bond Duration Guide"],
            "must_include": ["interest rate risk", "duration"],
            "must_not_include": ["guaranteed", "certain return"],
            "compliance_required": True
        }
    },
    # ... 199 more examples
]

client.create_examples(
    inputs=[e["inputs"] for e in examples],
    outputs=[e["outputs"] for e in examples],
    dataset_id=dataset.id
)

def eval_target(inputs: dict) -> dict:
    # run_investment_coach is async and requires a session_id; supply a
    # synthetic one per example and drive the coroutine to completion
    return asyncio.run(
        run_investment_coach(session_id=f"eval-{inputs['user_id']}", **inputs)
    )

# Run evaluation against the graph
results = evaluate(
    eval_target,
    data=dataset.name,
    evaluators=[
        LangChainStringEvaluator("criteria", config={
            "criteria": {
                "groundedness": "Is every factual claim supported by a cited source?",
                "compliance": "Does the answer avoid making specific investment guarantees?",
                "completeness": "Does the answer address the full question?",
                "citation_present": "Does the answer include source citations?"
            }
        }),
    ],
    experiment_prefix="investment-coach-v1.3",
    num_repetitions=3  # run each test 3x — LLM non-determinism
)
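The experiment results can back a CI gate, as the capabilities table below notes. A rough sketch, assuming each result row exposes its evaluator scores as shown (the row structure varies across langsmith versions, so treat this as a template):

# Sketch of a CI deployment gate over the results object returned by
# evaluate(). The row structure is an assumption; adjust to your langsmith
# version. Fails the pipeline when the mean evaluator score drops too low.
def quality_gate(results, min_mean_score: float = 0.9) -> None:
    scores = [
        r.score
        for row in results
        for r in row["evaluation_results"]["results"]
        if r.score is not None
    ]
    mean = sum(scores) / max(len(scores), 1)
    if mean < min_mean_score:
        raise SystemExit(f"Quality gate failed: mean score {mean:.2f} < {min_mean_score}")

quality_gate(results)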
Production Monitoring — Automated Alerts
# monitoring/langsmith_monitor.py
# LangSmith alerting is configured through the dashboard (Monitors /
# Automations on a project) rather than a single SDK call; the rules below
# document the intended alert policy for the production project.
PROJECT = "investment-coach-production"

ALERT_RULES = [
    {
        "name": "High compliance failure rate",
        "filter": "compliance_attempts >= 2",
        "threshold": 0.05,   # alert if >5% of runs need 2+ compliance attempts
        "window_hours": 24,
        "severity": "warning"
    },
    {
        "name": "Escalation rate spike",
        "filter": "requires_human_review == true",
        "threshold": 0.10,   # alert if >10% escalate to human
        "window_hours": 1,
        "severity": "critical"
    },
    {
        "name": "P99 latency breach",
        "filter": "total_latency_ms > 15000",
        "threshold": 0.02,   # alert if >2% of runs exceed 15s
        "window_hours": 1,
        "severity": "warning"
    },
    {
        "name": "Daily cost spike",
        "filter": "cost_usd > 0.05",  # per-run cost anomaly
        "threshold": 0.01,
        "window_hours": 24,
        "severity": "info"
    }
]
Key LangSmith production capabilities:
| Capability | What It Enables |
|---|---|
| Full trace per run | Debug any production failure — see exactly which node failed, with what inputs |
| Cost per node | Identify which agent burns the most tokens — portfolio? synthesis? compliance? |
| Evaluation datasets | Run 200 labeled questions against every new prompt version before deployment |
| Regression testing in CI/CD | Evaluation runs as a deployment gate — fail the pipeline if quality drops |
| Production monitors | Alert on compliance failure rate, latency, escalation rate, cost anomalies |
| Human feedback loop | Tag runs as good/bad from the dashboard — builds a labeled dataset for fine-tuning |
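The feedback loop in the last row also has an SDK surface: runs can be scored programmatically, for example from a thumbs-up/down handler in the app, and those scores accumulate into a labeled dataset. A minimal sketch, where run_id comes from the trace context of the request being rated (the handler wiring is assumed):

from langsmith import Client

client = Client()

# Sketch: attach user feedback to a traced run; run_id is supplied by the
# hypothetical rating handler for the request being scored
client.create_feedback(
    run_id=run_id,
    key="user_rating",
    score=1,  # 1 = thumbs up, 0 = thumbs down
    comment="Clear, well-cited rebalancing analysis",
)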
The Full Azure Deployment Architecture
Azure-specific decisions in this architecture:
- APIM as the front door: rate limiting protects against cost abuse (financial AI can be expensive per query). Subscription tiers give institutional clients higher limits than retail users.
- Managed identity throughout: LangGraph app → Azure OpenAI, LangGraph app → Cosmos DB, LangGraph app → Azure AI Search — zero secrets in environment variables, all authenticated via Entra ID managed identity (see the token-provider sketch after this list).
- Private endpoints for all AI services: Azure OpenAI and Azure AI Search on private endpoints inside a VNet — financial data never crosses the public internet.
- Content Safety at the gateway: screens both input (jailbreak attempts, prompt injection) and output (harmful financial advice) before it reaches or leaves the LangGraph runtime.
- LangSmith alongside Azure Monitor: LangSmith for LLM-specific observability (cost per node, groundedness, compliance failures), Azure Monitor for infrastructure (CPU, memory, HTTP error rates, APIM throttling).
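A sketch of the managed-identity wiring for the Azure OpenAI connection referenced above, using DefaultAzureCredential (deployment name and API version are illustrative):

import os

from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from langchain_openai import AzureChatOpenAI

# Sketch: Entra ID managed identity instead of API keys; the token provider
# refreshes AAD tokens for the Cognitive Services scope automatically
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

llm = AzureChatOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    azure_deployment="gpt-4o",
    api_version="2024-12-01-preview",
    azure_ad_token_provider=token_provider,  # no AZURE_OPENAI_API_KEY needed
)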
Tool Comparison — When to Use Which
| | LangChain | LangGraph | LangFlow | LangSmith |
|---|---|---|---|---|
| Primary job | LLM + tool abstraction | Stateful multi-agent orchestration | Visual flow prototyping | Tracing, eval, monitoring |
| Who uses it | Engineers | Engineers | Engineers + Business | Engineers + ML teams |
| When | Building chains, tools, RAG | Building multi-step agents | Design, prototype, demo | Testing, scaling, monitoring |
| Requires | Python | Python + LangChain | Browser | LangChain / LangGraph |
| Production-ready | Yes (as a library) | Yes (as a runtime) | No (export first) | Yes (always on) |
| Azure integration | AzureChatOpenAI, AzureSearch | Via LangChain components | Via LangChain export | Azure Monitor integration |
| Investment Coach role | Chain logic, tools, prompts | Agent graph, routing, memory | Prototype → stakeholder sign-off | Every run traced and evaluated |
Decision rules:
- Start with LangChain when the task is a single-step or linear LLM workflow — one chain, one retriever, one output
- Upgrade to LangGraph when you have multiple agents, conditional routing, loops, or multi-turn state that must persist
- Use LangFlow when you need to show non-technical stakeholders the flow, validate logic before writing code, or prototype quickly without scaffolding
- Add LangSmith always — not when you scale, but from the first day. Cost data, traces, and eval datasets built during development pay off immediately, and retrofitting observability after a production launch is always painful
Key Takeaways
- The four tools are layers, not alternatives — LangChain is the building block, LangGraph is the orchestration runtime, LangFlow is the design interface, LangSmith is the production safety net. Choosing one over the others leaves a gap.
- LangGraph is the most underused of the four — most teams stop at LangChain chains and wonder why their agents behave non-deterministically. LangGraph's typed state graph, conditional routing, and persistent memory are what make multi-agent systems reliable.
- LangFlow does not belong in your serving path — it is a design and alignment tool. Export the flow, refine it in code, serve via LangGraph + Azure App Service.
- LangSmith pays for itself on day one — the cost breakdown by node shows you exactly which agent is burning your token budget. In an architecture like this one, the synthesis agent is typically the most expensive — and also the easiest to optimize with better few-shot examples.
- Compliance-critical AI needs the compliance agent loop — financial, medical, and legal AI cannot rely on a single LLM call to get it right. LangGraph's conditional loop (synthesize → validate → rewrite → validate → escalate) is the pattern that keeps regulated AI deployments out of liability exposure.
- Azure private endpoints + managed identity + APIM is the enterprise control plane — the LangChain/LangGraph application layer is stateless and replaceable; the Azure control plane provides identity, governance, networking, and rate limiting that the OSS tools cannot.
- Observability and evaluation are not optional at production scale — LangSmith evaluation datasets running in CI/CD, combined with Azure Monitor for infrastructure, give you the feedback loop that turns an LLM prototype into a maintained production system.