Writing

Long-form thinking on Azure, AI systems, enterprise architecture, and the craft of building things that last.

April 27, 2026

Python Runs AI — TensorFlow, PyTorch, and Scikit-learn Decoded for Enterprise Architects

Python is not just a language for AI — it is the operating system every framework, cloud SDK, and ML pipeline is built on top of. Choosing the wrong framework for the wrong problem costs teams months of rework.

python, tensorflow, pytorch, scikit-learn, machine-learning, deep-learning, enterprise-ai, azure, mlops

April 25, 2026

AI Ethics, Bias Detection, and Fairness in Production — The Architect's Complete Guide

Undetected bias in production AI does not stay hidden — it surfaces as discriminatory loan decisions, inequitable healthcare triage, and regulatory violations. This post covers what AI bias is, why it is a production architecture problem not just an ethics problem, and how to implement bias detection and fairness controls on Azure and open source.

ai-ethics, bias-detection, fairness, azure, responsible-ai, llm, enterprise-ai, hipaa, fintech, mlops, llmops

April 24, 2026

AI Compliance in Production — Finance, Healthcare, and the Enterprise Architect's Playbook

AI compliance is not a checkbox. In financial services, SR 11-7 requires model validation, risk tiering, and ongoing monitoring for every LLM in production. In healthcare, HIPAA mandates PHI controls and audit trails. The EU AI Act adds conformity assessments for high-risk AI. This post covers the full compliance stack — regulation, architecture, code, and real examples.

ai-compliance, sr-11-7, hipaa, eu-ai-act, azure, llm, enterprise-ai, fintech, healthcare, audit, governance, responsible-ai

April 24, 2026

LangChain, LangGraph, LangFlow, and LangSmith — The Complete Enterprise AI Stack

LangChain, LangGraph, LangFlow, and LangSmith are not competing tools — they are four layers of the same enterprise AI stack. This post shows where each fits, how they wire together, and what the architecture looks like in production using an Investment Coach AI as the reference system.

langchain, langgraph, langflow, langsmith, azure, enterprise-ai, ai-agents, rag, llm, architecture, investment-ai, production

April 22, 2026

MLOps vs LLMOps — The Complete Architect's Deep Dive

MLOps and LLMOps share the same goal — reliable AI in production — but solve fundamentally different problems. This deep dive covers the full lifecycle, tooling, cost models, and what an AI architect must get right for each.

mlops, llmops, azure, mlflow, kubeflow, langfuse, azure-ai-foundry, databricks, production, enterprise-ai, governance, observability, cost

April 21, 2026

RAG Streaming Responses in Production — How to Fix the 30-Second Freeze

A RAG system that makes users wait 30 seconds before showing anything is not a latency problem — it's a UX architecture problem. Here's how to design progressive streaming so users see useful tokens in under 2 seconds while the full pipeline continues running.

rag, streaming, sse, websocket, llm, azure-openai, production, enterprise-ai, latency, observability

April 20, 2026

Azure Prompt Flow — The Orchestration Layer Your Enterprise AI Platform Is Missing

Prompt engineering without Prompt Flow is scripting. With it, it's platform engineering. Here's how Azure Prompt Flow fits into enterprise AI architecture — orchestration, evaluation, governance, and production deployment controls.

azure, prompt-flow, azure-ai-foundry, llm, rag, enterprise-ai, orchestration, evaluation, governance, observability

April 20, 2026

Enterprise AI Platform Comparison 2026 — The Architect's Decision Guide

Most enterprises pick a great AI framework for one layer and discover they have no governance, tracing, or release gates for the rest. This guide compares Azure, AWS, Google, Databricks, Snowflake, Oracle, IBM, and the key open-source tools across the six jobs every enterprise AI platform must cover — with a final recommended architecture.

azure, aws, gcp, databricks, snowflake, langchain, llamaindex, enterprise-ai, platform-engineering, architecture, ai-governance

April 19, 2026

Embedding Models in Production: How They Work, How They're Built, and Which One to Use

A deep technical guide to embedding models — how transformers produce vectors, how embedding models are trained, and a complete production comparison of open source vs Azure models across cost, latency, and multilingual support.

embeddings, rag, azure-openai, sentence-transformers, multilingual, vector-search, enterprise-ai

April 19, 2026

Prompt Engineering in Production — Part 1: Anatomy, Storage, and Versioning

Prompts are not strings in your code. They are versioned, audited, environment-aware artifacts stored in Cosmos DB and Git, retrieved via a Prompt SDK, and deployed with the same discipline as application code. Part 1 of 4.

prompt-engineering, llm, azure-openai, cosmos-db, semantic-kernel, langfuse, production, enterprise-ai, versioning

April 19, 2026

Prompt Engineering in Production — Part 2: Multi-User, Multi-Tenant, and Organizational Management

How to route different users to different prompt configurations, isolate tenants at the prompt layer, manage prompts across business units and teams, and build approval workflows with fallback chains. Part 2 of 4.

prompt-engineering, multi-tenant, rbac, enterprise-ai, llm, azure-openai, semantic-kernel, production, governance

April 19, 2026

Prompt Engineering in Production — Part 3: Security, Governance, and Compliance

Prompt injection, jailbreaking, extraction attacks, indirect injection via RAG chunks, compliance audit trails, change governance, and drift detection. The security and governance layer every enterprise LLM system needs. Part 3 of 4.

prompt-engineering, prompt-injection, llm-security, ai-governance, compliance, audit-trail, drift-detection, azure-content-safety, enterprise-ai

April 19, 2026

Prompt Engineering in Production — Part 4: Observability, Cost Governance, and Testing

How to observe prompt behavior in production, govern token costs with caching and compression, run statistically valid A/B tests on prompt versions, use feature flags for prompt components, and enforce structured output. Part 4 of 4 — the complete open source vs Azure tooling reference.

prompt-engineering, observability, llm-cost, prompt-caching, a-b-testing, feature-flags, structured-output, guardrails, azure-openai, enterprise-ai

April 19, 2026

Chunking Is the Most Underestimated Decision in RAG — Here's How to Get It Right

Wrong chunk size breaks retrieval before a single query runs. A complete guide to every chunking strategy — fixed-size, recursive, semantic, document-aware, late chunking — with open-source and Azure implementations, and what we run in production at MortgageIQ.

rag, chunking, llama-index, azure-ai-search, embeddings, retrieval, enterprise-ai

April 19, 2026

Your Knowledge Isn't in PDFs. How to Index Every Enterprise Data Source into RAG.

Enterprise knowledge lives in SQL databases, SharePoint, emails, APIs, Teams, and code repos — not just PDFs. A complete guide to extracting, chunking, and embedding every major data source type into a production RAG index.

rag, data-ingestion, sharepoint, azure-ai-search, llama-index, embeddings, enterprise-ai, structured-data, unstructured-data

April 19, 2026

Why Your LLM Doesn't Know What Happened Last Tuesday

Standard LLMs have hard limits: training cutoffs, no private data, no source traceability. RAG fixes all three — but only if your retrieval layer is built right.

rag, retrieval, embeddings, hybrid-search, reranking, llm, azure-openai

April 19, 2026

Every RAG Pattern Explained — and Which One to Run in Production

From naive RAG to auto-merging and hierarchical retrieval — every major pattern mapped to real open source and Azure tooling, plus what we run in production at MortgageIQ.

rag, retrieval, llama-index, semantic-kernel, azure-ai-search, embeddings, hybrid-search, reranking, enterprise-ai

April 19, 2026

Every Search Algorithm Explained — Azure AI Search vs Open Source, With Code

A complete guide to every search algorithm used in enterprise RAG — keyword BM25, vector HNSW/eKNN, hybrid RRF, semantic reranking, and multimodal search — with Python code for Azure AI Search and open source stacks, and a performance comparison of each.

rag, azure-ai-search, vector-search, hybrid-search, bm25, hnsw, rrf, semantic-ranker, multimodal, qdrant, elasticsearch, enterprise-ai

April 19, 2026

RAG vs Fine-Tuning vs AI Agents vs Traditional ML — How to Choose the Right AI Strategy

The most consequential AI architecture decision isn't which model to use — it's which paradigm to build. A complete decision framework for RAG vs fine-tuning vs AI agents vs traditional ML, with real enterprise examples and the three RAG paradigms explained.

rag, fine-tuning, ai-agents, machine-learning, azure-openai, enterprise-ai, llm, architecture-decision

April 19, 2026

Vector Database Showdown: Pinecone vs pgvector vs Azure AI Search vs Weaviate vs Qdrant in Production

A production-focused comparison of the five major vector databases — indexing speed, query latency, hybrid search, filtering precision, cost at scale, and enterprise readiness. What we actually run at MortgageIQ and when each database wins.

vector-database, pinecone, pgvector, azure-ai-search, weaviate, qdrant, rag, embeddings, enterprise-ai, production

April 3, 2026

Why Your Multi-Region Azure Architecture Will Fail — And the Three Rings That Prevent It

Most teams deploy to two Azure regions and call it resilient. True resilience is three concentric rings — each owning a different failure scope, each protecting a different blast radius.

azure, aks, apim, azure-front-door, kubernetes, resilience, multi-region

March 24, 2026

Two Models, One Platform: Fine-Tuning XGBoost and GPT-4o in Azure for ABC Pizza

ABC Pizza needed two custom models — one to predict dispatch time, one to explain it. XGBoost on Azure ML for the number. GPT-4o on Azure AI Foundry for the language. Same platform, two completely different fine-tuning paradigms. Here's how both work and when to use each.

fine-tuning, azure, openai, machine-learning, xgboost, azure-ml, ai-foundry, abc-pizza, tabular-ml

March 24, 2026

Fine-Tuning GPT-4o on Borrower Conversations: How MortgageIQ Learned to Speak Like a Loan Officer

GPT-4o could explain a payment change. But it explained it like a contract, not like a loan officer. Fine-tuning on 50,000 scrubbed borrower Q&A transcripts changed that — and cut repeat call volume by 34%.

fine-tuning, azure, openai, mortgageiq, fintech, responsible-ai, nlp, ai-foundry, borrower-experience

March 24, 2026

The Complete Azure AI Stack: 9 Layers, 40+ Services, One Reference

A complete end-to-end map of the Azure AI stack — from user channels through governance — with What/Why/How/When/Who for every layer and Key Notes for architecture reviews.

azure, ai, architecture, azure-openai, mlops, security, enterprise

March 23, 2026

AI Governance for Regulated Industries: How to Build Accountability into the Architecture

AI governance in a regulated industry is not a compliance checkbox. It is an architectural property — either the system produces an auditable evidence chain for every AI decision, or it doesn't.

ai-governance, responsible-ai, azure, fintech, compliance, model-risk, audit-trail

March 23, 2026

AI Guardrails in Production: How to Build Safety into the Inference Path

Guardrails are not a feature you add at the end. They are constraints you design into the inference path from day one — and in a regulated domain, the architecture enforces them, not the model.

azure, openai, guardrails, responsible-ai, rag, hallucination, grounding

March 23, 2026

Azure AI Foundry Model Selection: The Mental Model for Picking the Right Model

There are four model families in Azure AI Foundry, each with a distinct job. Picking the wrong one is the most common — and most expensive — mistake in enterprise AI.

azure, openai, ai-foundry, gpt-4o, rag, model-routing, finops

March 23, 2026

Event-Driven AI: How Kafka Delivers Real-Time Context to LLMs

Most enterprise AI reads stale data from a database. The architectures that scale read live events from a stream — and Kafka is what makes the difference between an AI that knows what happened and one that knows what's happening.

kafka, azure, openai, event-driven, rag, ai-agents, uwm, fintech

March 23, 2026

Supervised, Unsupervised, Reinforcement Learning: How ABC Pizza Picked the Wrong One First

There are four learning paradigms in ML. Picking the wrong one doesn't mean your model fails — it means your model succeeds at solving the wrong problem. ABC Pizza learned this the hard way.

machine-learning, supervised-learning, unsupervised-learning, reinforcement-learning, deep-learning, abc-pizza, foundations

March 23, 2026

What Is Machine Learning? A Production Engineer's Map of the Whole Landscape

ML is not one thing — it is a stack of decisions. This is the map of the entire landscape, from raw data to deployed model, told through the story of how ABC Pizza replaced a rules engine that broke at 3AM in Tokyo.

machine-learning, mlops, azure, foundations, abc-pizza, supervised-learning, deep-learning

March 22, 2026

FinOps for AI: Why Your GPT-4o Bill Will Surprise Your CFO

Token costs are predictable — if you design for them. Most teams don't. Here's how to build a model routing and cost governance strategy before the invoice arrives.

azure, openai, finops, cost, model-routing, prompt-caching, genai

March 22, 2026

Prompt Engineering Is Software Engineering. Treat It That Way.

A prompt is an API contract with the model. Version it, test it, and evaluate it — the same way you would any other interface your system depends on.

prompt-engineering, azure, openai, rag, structured-output, grounding

March 22, 2026

RAG Is Not the Hard Part. Retrieval Is.

Most RAG failures are retrieval failures — the model receives bad context and generates a bad answer. Here's how to diagnose and fix the retrieval layer.

rag, azure, openai, retrieval, chunking, embeddings, grounding

March 22, 2026

The Enterprise GenAI Stack on Azure: What Actually Works in Production

The Azure GenAI platform is five services working together — and most teams assemble them wrong. Here's what the integrated stack looks like and why each component exists.

azure, openai, genai, rag, semantic-kernel, ai-foundry, architecture

March 18, 2026

Service-to-Service Communication in a Microservices World

How modern architectures solve east-west service communication — mTLS, SPIFFE SVIDs, sidecar proxies, and zero-trust authorization at scale.

microservices, service-mesh, mtls, consul, zero-trust, architecture