ai-agents · production · March 22, 2026 · azure · openai · rag · dotnet · react · prompt-engineering · grounding

MortgageIQ — Azure AI Loan Copilot

A domain-specific mortgage assistant that grounds every GPT-4o answer in a versioned loan knowledge base — with clickable citations back to the source document.

GitHub →
.NET 10 · C# 13 · Azure OpenAI GPT-4o · React 19 · TypeScript 5.9 · ASP.NET Core

Problem

Mortgage borrowers face hundreds of questions before closing — credit scores, DTI limits, down payments, closing costs. Generic AI answers are unreliable because the model generates from training data, not from actual loan guidelines. The answer to "What credit score do I need for an FHA loan?" changes depending on lender, year, and loan type. A hallucinated answer in a financial context isn't just wrong — it erodes trust and exposes the institution to liability.

The challenge isn't connecting to an LLM. It's connecting the LLM to the right knowledge, reliably, with a traceable evidence chain.


Solution

MortgageIQ is a RAG-based loan assistant built on Azure OpenAI GPT-4o. It answers borrower questions by first retrieving relevant passages from a curated, version-controlled loan knowledge base — then composing a grounded prompt that tells the model: answer only from what you've been given, and cite your source.

Three properties distinguish this from a ChatGPT wrapper:

  1. Groundedness — every answer is drawn from retrieved document chunks, not the model's parametric memory
  2. Traceability — every response includes structured sources[] metadata; borrowers can click through to the exact source document
  3. Auditability — the knowledge base lives in the repository alongside the code; every change goes through a pull request review

Screenshot: MortgageIQ home screen with grounded answer and source chips


What It Answers (Phase 4A — Basic)

The knowledge base covers the five most common pre-approval question categories:

| Borrower asks | Knowledge base source |
| --- | --- |
| What credit score do I need for an FHA loan? | fha-loan-requirements.md — Credit Score Requirements |
| How much should I save for closing costs? | closing-costs.md — Closing Cost Breakdown |
| What documents do I need for pre-approval? | pre-approval-process.md |
| What is the difference between pre-qual and pre-approval? | pre-approval-process.md |
| How does my credit score affect my interest rate? | credit-score-guidelines.md |

Questions outside this domain — live rates, underwriter decisions, property appraisals — fall through to the model's general knowledge with a disclaimer. This boundary is deliberate: the system is designed to be confidently grounded within scope and transparently limited outside it.


Architecture

System Context

The request flow has one rule: the model never answers from memory alone.

Component Architecture

The retrieval layer is intentionally separated from the API layer behind an interface. This is the architectural decision that makes Phase 4B (Azure AI Search) a one-line dependency injection swap.


How RAG Works — Inside the Retrieval Pipeline

This is where the architectural decisions live. The retrieval pipeline has four stages: chunking, tokenization, scoring, and token budgeting. Understanding each stage explains both why the system works and exactly where it breaks.

Stage 1: Chunking

Each knowledge base document is split into sections at ## markdown headers. This produces semantically coherent chunks — each chunk covers one concept (Credit Score Requirements, Down Payment Requirements, etc.) rather than arbitrary character counts.

fha-loan-requirements.md  →  6 chunks
  ├── fha loan requirements — Credit Score Requirements
  ├── fha loan requirements — Down Payment Requirements
  ├── fha loan requirements — Debt-to-Income Ratio
  ├── fha loan requirements — Loan Limits
  ├── fha loan requirements — Mortgage Insurance Premium
  └── fha loan requirements — Property Requirements

Across 5 files, this produces approximately 28 chunks held in memory and scored on every query.

Why section-based chunking? Fixed-size chunking (e.g., 512 tokens) splits mid-sentence and loses section context. Semantic chunking at section boundaries keeps related content together — a borrower question about FHA credit scores gets the full Credit Score Requirements section, not a fragment that starts mid-paragraph.
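
Since the splitting step is only a few lines, here is a minimal sketch of it. The KbChunk record and SectionChunker names are assumptions chosen for illustration, not the repository's actual types:

using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical names: the real LocalFileRetriever may structure this differently.
public sealed record KbChunk(string SourceName, string SectionTitle, string Text);

public static class SectionChunker
{
    // Split a markdown document into one chunk per "## " section.
    public static IEnumerable<KbChunk> ChunkBySection(string sourceName, string markdown)
    {
        string? currentTitle = null;
        var body = new StringBuilder();

        foreach (var line in markdown.Split('\n'))
        {
            if (line.StartsWith("## "))
            {
                // A new header closes the previous section.
                if (currentTitle is not null)
                    yield return new KbChunk(sourceName, currentTitle, body.ToString().Trim());
                currentTitle = line[3..].Trim();
                body.Clear();
            }
            else if (currentTitle is not null) // preamble before the first header is skipped here
            {
                body.AppendLine(line);
            }
        }

        if (currentTitle is not null)
            yield return new KbChunk(sourceName, currentTitle, body.ToString().Trim());
    }
}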

Stage 2: Tokenization and Scoring

The query "What credit score do I need for an FHA loan?" is tokenized with stop word filtering:

Raw terms:     [what, credit, score, do, i, need, for, an, fha, loan]
After filter:  { credit, score, need, fha, loan }  ← 5 meaningful terms

Each chunk is scored by keyword overlap:

score = overlapping query terms in chunk / total query terms

For the FHA Credit Score chunk:

"credit" ✓  "score" ✓  "need" ✗  "fha" ✓  "loan" ✓
score = 4/5 = 0.80

Top-scoring chunks for this query:

| Chunk | Score |
| --- | --- |
| fha loan requirements — Credit Score Requirements | 0.80 |
| credit score guidelines — Credit Score Ranges | 0.80 |
| fha loan requirements — Down Payment Requirements | 0.40 |
| conventional loan requirements — Credit Score Reqs | 0.40 |
| closing costs — What Are Closing Costs | 0.00 → filtered |
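
A compact sketch of this tokenize-and-score pass follows; the StopWords set, Tokenize, and Score are assumed names chosen to mirror the behavior described above, not the project's actual code:

using System;
using System.Collections.Generic;
using System.Linq;

public static class KeywordScorer
{
    // A small illustrative stop word list; the real filter is presumably larger.
    private static readonly HashSet<string> StopWords = new(StringComparer.OrdinalIgnoreCase)
        { "what", "how", "do", "i", "is", "a", "an", "the", "for", "my", "of", "between" };

    public static HashSet<string> Tokenize(string text) =>
        text.ToLowerInvariant()
            .Split(new[] { ' ', '?', '.', ',', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(term => !StopWords.Contains(term))
            .ToHashSet();

    // score = overlapping query terms in chunk / total query terms
    public static double Score(string question, string chunkText)
    {
        var queryTerms = Tokenize(question);
        if (queryTerms.Count == 0) return 0.0;

        var chunkTerms = Tokenize(chunkText);
        return (double)queryTerms.Count(chunkTerms.Contains) / queryTerms.Count;
    }
}

Scoring the example query against a chunk that contains "credit", "score", "fha", and "loan" but not "need" reproduces the 4/5 = 0.80 above.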

Stage 3: Token Budget Enforcement

Before assembling the prompt, each surviving chunk is checked against a 2,000-token budget. The budget prevents context overflow — GPT-4o's context window is large, but unbounded context injection degrades answer quality and drives up cost.

Chunk 1: ~155 tokens  ✓  (running total: 155)
Chunk 2: ~185 tokens  ✓  (running total: 340)
Chunk 3: ~140 tokens  ✓  (running total: 480)
→ All 3 fit. Budget enforced before prompt assembly.
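
A sketch of this greedy budget pass, reusing the KbChunk record from the chunking sketch. The length/4 token estimate matches the approximation noted in the Tradeoffs section; whether an over-budget chunk stops the scan or is merely skipped is an assumption (this sketch stops):

using System.Collections.Generic;

public static class TokenBudget
{
    private const int MaxContextTokens = 2000;

    // Rough heuristic: ~4 characters per token for English prose (no real tokenizer).
    public static int EstimateTokens(string text) => text.Length / 4;

    public static List<KbChunk> Enforce(IEnumerable<KbChunk> rankedChunks)
    {
        var kept = new List<KbChunk>();
        var runningTotal = 0;

        foreach (var chunk in rankedChunks) // assumed sorted by score, descending
        {
            var cost = EstimateTokens(chunk.Text);
            if (runningTotal + cost > MaxContextTokens)
                break; // budget exhausted: this and all lower-ranked chunks are dropped
            kept.Add(chunk);
            runningTotal += cost;
        }

        return kept;
    }
}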

Stage 4: Prompt Composition

The retrieved chunks are injected into the system message as structured context. The model is instructed to answer only from the provided context and to cite its source:

System:
  You are a helpful mortgage and loan assistant. Give concise, practical guidance,
  note when rules vary by lender or location, and avoid pretending to know
  borrower-specific facts you were not given.

  Use the following loan knowledge to answer accurately:

  [fha loan requirements — Credit Score Requirements]
  FHA loans require a minimum credit score of 580 to qualify for the 3.5% down payment option.
  Borrowers with scores between 500–579 may be eligible with 10% down...

  [credit score guidelines — Credit Score Ranges]
  ...

  If the provided context does not address the question, answer from your general
  knowledge and say so.

User: What credit score do I need for an FHA loan?

This pattern — grounded system prompt, constrained generation instruction, explicit citation requirement — is what separates a production RAG system from a demo. The model can only hallucinate if it ignores the system message; the system is designed to make that path explicit and detectable.


Request Flows

Grounded Answer (Retrieval Hit)

Graceful Degradation (Retrieval Miss)

The retrieval-miss tag is a first-class observability signal. It tells you which questions fall outside your knowledge base — which is exactly the input you need to decide what to add next.
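
As a sketch of how that signal might be attached (ChatResult is a real type in the project, but these members and the "retrieval-miss" literal are illustrative assumptions):

using System;
using System.Collections.Generic;

// Hypothetical shape: the project's actual ChatResult may differ.
public sealed record ChatResult(
    string Answer,
    IReadOnlyList<RetrievalResult> Sources,
    IReadOnlyList<string> Tags);

public static class ChatResultFactory
{
    public static ChatResult Build(string answer, IReadOnlyList<RetrievalResult> sources)
    {
        // An empty sources list means the model answered from general knowledge;
        // tagging the response makes misses countable in telemetry without
        // parsing answer text.
        var tags = sources.Count == 0
            ? new[] { "retrieval-miss" }
            : Array.Empty<string>();

        return new ChatResult(answer, sources, tags);
    }
}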


Code

The Retrieval Abstraction

The most important line of code in this project is the interface:

// src/RetrievalService/IRetrievalService.cs
public interface IRetrievalService
{
    Task<IReadOnlyList<RetrievalResult>> QueryAsync(
        string question,
        CancellationToken cancellationToken);
}

Every retrieval backend — local keyword search today, Azure AI Search in Phase 4B, a vector database in Advanced — implements this interface. AzureOpenAiChatResponder depends on IRetrievalService, not on any concrete retrieval implementation. The upgrade path is a dependency injection registration change:

// Phase 4A (Basic) — local keyword search
builder.Services.AddSingleton<IRetrievalService>(sp =>
    new LocalFileRetriever(sp.GetRequiredService<IOptions<RetrievalOptions>>().Value));

// Phase 4B (Intermediate) — Azure AI Search hybrid retrieval
builder.Services.AddSingleton<IRetrievalService>(sp =>
    new AzureSearchRetriever(sp.GetRequiredService<IOptions<AzureSearchOptions>>().Value));

AzureOpenAiChatResponder, the REST endpoints, ChatResult, and the React frontend do not change between phases.

Prompt Composition

// src/ChatApi/Services/AzureOpenAiChatResponder.cs
private static string BuildSystemPrompt(string basePrompt, IReadOnlyList<RetrievalResult> sources)
{
    // No retrieval hits: fall back to the base persona prompt unchanged.
    if (sources.Count == 0)
        return basePrompt;

    // Label each chunk with its source name so the model can cite it.
    var contextBlock = string.Join("\n\n", sources.Select(s =>
        $"[{s.SourceName}]\n{s.Snippet}"));

    return $"{basePrompt}\n\nUse the following loan knowledge to answer accurately:" +
           $"\n\n{contextBlock}\n\n" +
           $"If the provided context does not address the question, " +
           $"answer from your general knowledge and say so.";
}

The citation requirement in the system prompt is reinforced by returning structured sources[] from the retrieval layer — the UI renders these as clickable chips regardless of what the model says. Citations are retrieval metadata, not model output. This eliminates the hallucination risk in the citation chain entirely.


Tradeoffs

These are the tradeoffs accepted deliberately in Phase 4A. Each one is a conscious scope decision, not a gap.

| Area | What was accepted | Why | Fixed in |
| --- | --- | --- | --- |
| Retrieval quality | Keyword overlap only — "cash upfront" won't match "closing costs" | Zero infrastructure; proves the RAG composition pattern end-to-end | Phase 4B (Azure AI Search vector + BM25 hybrid) |
| Chunking at query time | Files re-read and chunked on every request | Simplicity over performance; acceptable at demo scale | Phase 4B (pre-built persistent index) |
| Token estimate | length / 4 approximation, no real tokenizer | Avoids a tokenizer dependency in Phase 4A; accurate enough for English prose | Phase 4B |
| No streaming | Full response buffered before sending to UI | Frontend simplicity; streaming requires coordinated backend + frontend change | Advanced |
| No semantic cache | Every query calls Azure OpenAI | Low volume; cache adds Redis/Cosmos dependency not justified at demo scale | Advanced |
| No auth | API has no authentication | Local development only | Advanced |
| No evaluation pipeline | Response quality tracked via tags only | Tags provide sufficient signal at this scale; formal RAGAS eval deferred | Advanced |

The tradeoff that matters most for Phase 4A to 4B: keyword retrieval misses semantic queries. "How much cash do I need upfront?" fails to match "closing costs" because no words overlap after tokenization. This is a known, bounded limitation — not a bug. It's the exact problem Azure AI Search's vector retrieval solves.
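
Reusing the KeywordScorer sketch from Stage 2 makes the failure mode concrete (the chunk text below is paraphrased for illustration, not quoted from the knowledge base):

var score = KeywordScorer.Score(
    "How much cash do I need upfront?",
    "Closing costs typically run two to five percent of the loan amount.");

// The filtered query terms { much, cash, need, upfront } share no words with
// the chunk, so score == 0.0: the chunk is filtered out and the question
// falls through to the model's general knowledge with a disclaimer.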


Phase Roadmap

The abstraction investment in Phase 4A — the IRetrievalService interface — exists entirely to make this progression smooth. The architecture is designed to evolve, not to be replaced.

Phase 4B changes: Swap LocalFileRetriever → AzureSearchRetriever. Add text-embedding-3-small for query embedding. CI-triggered re-indexing on data/loan-kb/ file changes. No changes to ChatApi, no changes to the React frontend.


Tech Stack

| Layer | Technology | Version | Role |
| --- | --- | --- | --- |
| Frontend | React | 19.2.4 | Chat UI, message rendering, source chips |
| Frontend | TypeScript | 5.9 | Type-safe API contract |
| Frontend | Vite | 8.x | Dev server, build, HMR |
| Backend | ASP.NET Core | .NET 10 | REST API, static file serving |
| Backend | C# | 13 | Application language |
| Backend | Azure.AI.OpenAI SDK | 2.1.0 | Azure OpenAI client |
| AI | Azure OpenAI | GPT-4o | Chat completion |
| Retrieval | LocalFileRetriever | Phase 4A | Keyword scoring, section chunking, token budget |
| Knowledge Base | Markdown files | 5 documents | Versioned in repo, human-auditable |

Impact

This project demonstrates the foundational pattern that every enterprise GenAI system is built on: grounding LLM output in a controlled knowledge source with a traceable evidence chain. The techniques here — RAG, section-based chunking, token budget management, system prompt composition, retrieval abstraction — are the same patterns used at production scale in regulated financial and healthcare systems.

The architecture is sized for a demo. The decisions are sized for production.


Related Blog Articles

Each concept in this project is covered in depth as a standalone post: