Azure AI Document Pipeline for SMB Medical Claim Processing
Ingest EOBs and insurance claims from PDFs, repair malformed LLM outputs, and route low-confidence extractions for human review—all with cost-aware caching.
Small medical practices and billing companies manually extract data from Explanation of Benefits (EOB) documents and insurance claims, leading to errors, delays, and high administrative costs.
This recipe builds an automated document pipeline for medical claim processing. You’ll create a Next.js App Router API that ingests claim PDFs and images, extracts text with Azure Document Intelligence, converts it to structured JSON via Azure OpenAI, repairs malformed LLM outputs with @reaatech/structured-repair-core, routes low-confidence extractions to a Postgres-backed human review queue using @reaatech/confidence-router-core, and enforces daily spend limits with @reaatech/agent-budget-engine.
Prerequisites
Node.js >= 22 and pnpm >= 10
An Azure Document Intelligence resource (endpoint + key)
An Azure OpenAI resource with a GPT-4o deployment (endpoint + key + deployment name)
A Postgres database (local or remote) with a review_queue table
A Redis instance (optional — caching degrades gracefully if absent)
A Langfuse project (optional — instrumentation initializes Langfuse at startup if keys are set)
Familiarity with TypeScript, Next.js App Router, and basic Azure resource provisioning
Step 1: Configure environment variables
The pipeline reads its configuration from environment variables. Every variable has a placeholder in .env.example. Copy the file and fill in your real values:
terminal
cp .env.example .env.local
Open .env.local and replace the placeholders. Here’s the full set of variables the pipeline expects:
The Postgres connection string and Langfuse keys are optional — the pipeline works without them, though the review queue and observability features won’t be available.
Expected output: The file exists at the project root and the pipeline reads from process.env at runtime.
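As a rough illustration of how this configuration can be consumed, the sketch below collects the variables that the code in this recipe reads (AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_DEPLOYMENT, AZURE_OPENAI_KEY, POSTGRES_URL, BUDGET_DAILY_LIMIT) into one typed object. The names loadConfig and PipelineConfig are hypothetical and not part of the project; variables for Document Intelligence, Redis, and Langfuse are left out because their names are not shown here.

```typescript
// Hypothetical helper: loadConfig and PipelineConfig are illustrative names, not part of the recipe.
interface PipelineConfig {
  openAiEndpoint: string;
  openAiDeployment: string;
  openAiKey: string;
  postgresUrl: string | undefined; // optional: without it the review queue is unavailable
  dailyBudget: number;
}

type EnvMap = Record<string, string | undefined>;

function loadConfig(env: EnvMap): PipelineConfig {
  // Fail fast on missing required values instead of sending empty strings to Azure.
  const required = (name: string): string => {
    const value = env[name];
    if (value === undefined || value.trim() === "") {
      throw new Error(`Missing required environment variable: ${name}`);
    }
    return value;
  };

  const parsedBudget = Number(env["BUDGET_DAILY_LIMIT"]);

  return {
    // Strip a trailing slash so URLs built from the endpoint do not contain "//".
    openAiEndpoint: required("AZURE_OPENAI_ENDPOINT").replace(/\/$/, ""),
    openAiDeployment: required("AZURE_OPENAI_DEPLOYMENT"),
    openAiKey: required("AZURE_OPENAI_KEY"),
    postgresUrl: env["POSTGRES_URL"],
    // Default of 10 mirrors the fallback used by the budget service in Step 10.
    dailyBudget: parsedBudget > 0 ? parsedBudget : 10,
  };
}
```

In a Next.js server module this would be called once as loadConfig(process.env); passing the environment as a parameter keeps the function easy to test.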
Step 2: Install dependencies and verify the scaffold
The project was scaffolded with all dependencies already pinned in package.json. Install them:
terminal
pnpm install
The scaffold includes a few files you’ll work with or replace:
app/page.tsx — a default Next.js page (replace with your own UI later)
src/index.ts — a barrel re-export for all shared types and the schema
tests/index.test.ts — a placeholder test
Run the type checker to confirm the scaffold is healthy:
terminal
pnpm typecheck
Expected output: TypeScript exits with no errors.
Step 3: Define the claim schema and shared types
Medical claims have a fixed structure: patient information, diagnosis codes, procedure codes, amounts. Define it as a Zod schema so every stage of the pipeline — LLM extraction, repair, confidence routing — validates against the same contract.
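To make the contract concrete, here is a dependency-free sketch of the claim shape. The field names are taken from the extraction prompt in Step 6; the recipe itself defines this as a Zod schema (ClaimSchema), so the hand-written validator below is only a stand-in that mirrors the shape, and the names ExtractedClaimShape and validateClaimShape are hypothetical.

```typescript
// Field names come from the extraction prompt in Step 6; the validator is an illustrative
// stand-in for the recipe's Zod schema, not the schema itself.
interface ExtractedClaimShape {
  patientName: string;
  patientId: string;
  dateOfService: string;
  providerName: string;
  diagnosisCodes: string[];
  procedureCodes: string[];
  totalAmount: number;
  claimNumber: string;
  insuranceProvider: string;
}

const STRING_FIELDS = [
  "patientName", "patientId", "dateOfService",
  "providerName", "claimNumber", "insuranceProvider",
] as const;
const STRING_ARRAY_FIELDS = ["diagnosisCodes", "procedureCodes"] as const;

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Returns a list of problems; an empty list means the value has the claim shape.
function validateClaimShape(value: unknown): string[] {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return ["claim must be a JSON object"];
  }
  const record = value as Record<string, unknown>;
  const problems: string[] = [];
  for (const field of STRING_FIELDS) {
    if (typeof record[field] !== "string") problems.push(`${field} must be a string`);
  }
  for (const field of STRING_ARRAY_FIELDS) {
    if (!isStringArray(record[field])) problems.push(`${field} must be an array of strings`);
  }
  const total = record["totalAmount"];
  if (typeof total !== "number" || !Number.isFinite(total)) {
    problems.push("totalAmount must be a finite number");
  }
  return problems;
}

function isExtractedClaim(value: unknown): value is ExtractedClaimShape {
  return validateClaimShape(value).length === 0;
}
```

Returning a list of problems rather than a boolean keeps the reason for a rejection available to later stages such as repair and review.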
src/lib/id.ts — wraps the uuid package for generating unique IDs:
ts
import { v4 as uuidv4 } from "uuid";

// Generates a random (version 4) UUID string.
export const generateId = uuidv4;
src/lib/json-utils.ts — a fast first-pass syntax fixer using jsonrepair before handing data to structured-repair-core:
ts
import { jsonrepair } from "jsonrepair";

// First-pass syntax repair before the schema-aware repair step.
export function preRepair(raw: string): string {
  return jsonrepair(raw);
}
Expected output: Four files under src/lib/ that each export one or two small utilities.
Step 5: Build the Azure Document Intelligence service
src/services/document-intelligence.ts sends a PDF buffer to Azure’s prebuilt-layout model and extracts paragraphs and table cells into a flat key-value map.
Expected output: A module that takes a Buffer and returns a ProcessingResult with structured document fields or an error message.
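The flattening step can be pictured roughly as follows. This is a sketch under assumptions: the interfaces cover only the paragraph and table-cell fields of a layout result that the step needs, and the key format ("paragraph_0", "table_0_r1_c2") and the function names are hypothetical choices for illustration.

```typescript
// Minimal subset of a layout analysis result; the real SDK types carry many more fields.
interface LayoutParagraph { content: string }
interface LayoutCell { rowIndex: number; columnIndex: number; content: string }
interface LayoutTable { cells: LayoutCell[] }
interface LayoutResult { paragraphs?: LayoutParagraph[]; tables?: LayoutTable[] }

// Flattens paragraphs and table cells into one key-value map.
// The key naming scheme is an assumption made for this sketch.
function flattenLayout(result: LayoutResult): Record<string, string> {
  const fields: Record<string, string> = {};
  (result.paragraphs ?? []).forEach((paragraph, index) => {
    fields[`paragraph_${index}`] = paragraph.content.trim();
  });
  (result.tables ?? []).forEach((table, tableIndex) => {
    for (const cell of table.cells) {
      fields[`table_${tableIndex}_r${cell.rowIndex}_c${cell.columnIndex}`] = cell.content.trim();
    }
  });
  return fields;
}

// Joins the flat map into plain text, one "key: value" pair per line, for the LLM prompt.
function toPromptText(fields: Record<string, string>): string {
  return Object.entries(fields)
    .map(([key, value]) => `${key}: ${value}`)
    .join("\n");
}
```

Keeping row and column indices in the key preserves enough table structure for the model to pair a procedure code with its amount.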
Step 6: Build the Azure OpenAI LLM service
src/services/llm.ts takes the raw text from Document Intelligence, truncates it to 128,000 characters, and sends it to an Azure OpenAI GPT-4o deployment with a system prompt that asks for JSON matching the ClaimSchema. extractClaimJson() returns the model's raw JSON string and retries up to 3 times on failure via p-retry; mapToSchema() then parses that string and validates it against the schema.
ts
// Side-effect import only; the request below uses fetch directly.
import "@azure/openai";
import pRetry from "p-retry";
import { ClaimSchema, type ExtractedClaim } from "../types/claim.js";
import type { ProcessingResult } from "../types/document.js";

// Upper bound on the characters of document text sent to the model.
const MAX_CHARS = 128_000;

export async function extractClaimJson(rawText: string): Promise<string> {
  const truncated = rawText.length > MAX_CHARS ? rawText.slice(0, MAX_CHARS) : rawText;

  const run = async () => {
    // Strip a trailing slash so the URL below does not contain "//".
    const endpoint = (process.env.AZURE_OPENAI_ENDPOINT ?? "").replace(/\/$/, "");
    const deployment = process.env.AZURE_OPENAI_DEPLOYMENT ?? "";
    const url = `${endpoint}/openai/deployments/${deployment}/chat/completions?api-version=${String(2024)}-10-21`;

    const response = await fetch(url, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "api-key": process.env.AZURE_OPENAI_KEY ?? "",
      },
      body: JSON.stringify({
        messages: [
          {
            role: "system",
            content: `You are a medical claim extraction assistant. Extract structured JSON from the following document text. The JSON must match this Zod schema exactly:
{
  patientName: string,
  patientId: string,
  dateOfService: string,
  providerName: string,
  diagnosisCodes: string[],
  procedureCodes: string[],
  totalAmount: number,
  claimNumber: string,
  insuranceProvider: string
}
Return ONLY valid JSON without markdown fences or explanation.`,
          },
          { role: "user", content: truncated },
        ],
        max_tokens: 4096,
        temperature: 0.1,
      }),
    });

    if (!response.ok) {
      // Throwing lets p-retry schedule another attempt.
      throw new Error(`Azure OpenAI API error ${String(response.status)}`);
    }

    const data = (await response.json()) as {
      choices?: Array<{ message?: { content?: string | null } }>;
    };
    return data.choices?.[0]?.message?.content ?? "";
  };

  return pRetry(run, { retries: 3 });
}

// Parses the model output and validates it against ClaimSchema.
export function mapToSchema(rawJson: string): ProcessingResult<ExtractedClaim> {
  try {
    return { success: true, data: ClaimSchema.parse(JSON.parse(rawJson)) };
  } catch (err) {
    return {
      success: false,
      error: err instanceof Error ? err.message : "llm returned non-json",
    };
  }
}
Note that String(2024) in the API version simply evaluates to "2024", so the request is sent with api-version=2024-10-21. The query parameter is always a string, and writing the version as a plain literal would produce the same URL.
Expected output: A module that sends document text to Azure OpenAI and returns either a valid JSON string or a processing error.
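The URL construction and truncation can also be factored into two small pure functions, which makes both easy to test in isolation. This is a sketch only: buildChatCompletionsUrl and truncateForPrompt are hypothetical names, and the default API version is the one used in the request above.

```typescript
// Same cap as in src/services/llm.ts.
const MAX_PROMPT_CHARS = 128_000;

// Builds the chat completions URL for an Azure OpenAI deployment.
// The api-version default matches the value used in this step.
function buildChatCompletionsUrl(
  endpoint: string,
  deployment: string,
  apiVersion: string = "2024-10-21",
): string {
  const base = endpoint.replace(/\/+$/, ""); // tolerate one or more trailing slashes
  const path = `/openai/deployments/${encodeURIComponent(deployment)}/chat/completions`;
  return `${base}${path}?api-version=${encodeURIComponent(apiVersion)}`;
}

// Cuts the document text to the prompt cap and reports whether anything was dropped,
// so callers can log or flag truncated documents.
function truncateForPrompt(
  text: string,
  maxChars: number = MAX_PROMPT_CHARS,
): { text: string; truncated: boolean } {
  if (text.length <= maxChars) {
    return { text, truncated: false };
  }
  return { text: text.slice(0, maxChars), truncated: true };
}
```

Reporting the truncated flag is a design choice for this sketch: a claim extracted from a cut-off document is a natural candidate for a lower confidence score or human review.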
Step 7: Build the claim repair service
The LLM sometimes returns malformed JSON — trailing commas, single-quoted strings, or markdown-fenced blocks. The @reaatech/structured-repair-core package handles these cases. src/services/claim-repair.ts wraps it:
ts
import { repair, repairOutput, isValid, analyzeInput } from "@reaatech/structured-repair-core";
import { ClaimSchema, type ExtractedClaim } from "../types/claim.js";
import type { ProcessingResult } from "../types/document.js";

export async function repairExtraction(rawJson: string): Promise<ProcessingResult<ExtractedClaim>> {
  if (rawJson === "") {
    return { success: false, error: "empty input" };
  }

  // Fast path: input that already satisfies the schema skips repair.
  if (isValid(ClaimSchema, rawJson)) {
    const data = ClaimSchema.parse(JSON.parse(rawJson));
    return { success: true, data };
  }

  // Profiles the input before repair; the result is not used here.
  analyzeInput(rawJson);

  try {
    const data = await repair(ClaimSchema, rawJson);
    return { success: true, data };
  } catch (err) {
    return {
      success: false,
      error: err instanceof Error ? err.message : "unknown repair error",
    };
  }
}

// Variant that returns the package's full repair output instead of a ProcessingResult.
export function repairWithDiagnostics(rawJson: string) {
  return repairOutput({ schema: ClaimSchema, input: rawJson, debug: false });
}
If the JSON is already valid against the schema, it skips repair entirely. If not, analyzeInput() profiles the quality of the input before repair() attempts to fix it.
Expected output: A module that returns repaired claim data or a descriptive error for unrepairable inputs.
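To show what two of these failure modes look like in practice, the sketch below strips a markdown fence and removes trailing commas before parsing. It is a deliberately naive illustration of the problem, not the algorithm used by @reaatech/structured-repair-core or jsonrepair, and all function names are hypothetical.

```typescript
// Three backticks, built at runtime so this example contains no literal fence marker.
const FENCE = "`".repeat(3);

// Removes a surrounding markdown code fence (with or without a "json" tag) if present.
function stripMarkdownFence(raw: string): string {
  let text = raw.trim();
  if (text.startsWith(FENCE)) {
    const firstNewline = text.indexOf("\n");
    // Drop the opening fence line, which may carry a language tag.
    text = firstNewline === -1 ? "" : text.slice(firstNewline + 1);
    const withoutTrailingSpace = text.trimEnd();
    if (withoutTrailingSpace.endsWith(FENCE)) {
      text = withoutTrailingSpace.slice(0, -FENCE.length);
    }
  }
  return text.trim();
}

// Removes commas that directly precede a closing brace or bracket.
// Naive on purpose: it would also alter a ",}" sequence inside a string value.
function dropTrailingCommas(text: string): string {
  return text.replace(/,\s*([}\]])/g, "$1");
}

// Applies both clean-ups, then parses. Throws SyntaxError if the result is still invalid.
function parseLooseJson(raw: string): unknown {
  return JSON.parse(dropTrailingCommas(stripMarkdownFence(raw)));
}
```

A regex-based clean-up like this breaks on edge cases such as commas inside strings, which is one reason to delegate to a dedicated repair package in the real pipeline.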
Step 8: Build the confidence router
Every extracted claim gets a confidence score based on how many fields were filled. The @reaatech/confidence-router-core package decides whether to route the claim straight through, request human review, or reject it outright.
The engine returns a RoutingDecision whose type field is "ROUTE" (pass through), "CLARIFY" (queue for human review), or "FALLBACK" (rejected). The thresholds come from environment variables, defaulting to 0.8 for route and 0.3 for fallback.
Expected output: A module that takes an extracted claim and a confidence score and returns a routing decision.
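The scoring and threshold logic described in this step can be sketched without the package as follows. The decision names and the default thresholds (0.8 and 0.3) come from the text above; how empty values are counted and whether the boundaries are inclusive are assumptions, and fieldFillConfidence and routeByConfidence are hypothetical names rather than the package's API.

```typescript
type RoutingType = "ROUTE" | "CLARIFY" | "FALLBACK";

// Confidence as the fraction of expected fields that are filled.
// Assumption: null, undefined, blank strings, and empty arrays count as missing.
function fieldFillConfidence(claim: Record<string, unknown>, fields: readonly string[]): number {
  if (fields.length === 0) return 0;
  const filled = fields.filter((name) => {
    const value = claim[name];
    if (value === null || value === undefined) return false;
    if (typeof value === "string") return value.trim() !== "";
    if (Array.isArray(value)) return value.length > 0;
    return true;
  }).length;
  return filled / fields.length;
}

// Maps a confidence score to a routing decision.
// Assumption: a score equal to routeThreshold routes through, and a score equal to
// fallbackThreshold still goes to human review.
function routeByConfidence(
  confidence: number,
  routeThreshold: number = 0.8,
  fallbackThreshold: number = 0.3,
): RoutingType {
  if (confidence >= routeThreshold) return "ROUTE";
  if (confidence < fallbackThreshold) return "FALLBACK";
  return "CLARIFY";
}
```

With the defaults, a claim with half of its fields filled scores 0.5 and lands in the review queue rather than being rejected.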
Step 9: Build the LLM cache service with Redis
To avoid re-processing similar documents, the pipeline checks a cache before calling Azure Document Intelligence or the LLM. src/services/llm-cache-service.ts wraps the @reaatech/llm-cache package with a Redis storage adapter and a fallback to in-memory storage if Redis is unavailable:
The InMemoryEmbedder returns zero-filled embedding vectors, so exact-match caching works even without an external embedding API. If Redis is unreachable, the cache falls back to pure in-memory storage.
Expected output: A cache module that stores and retrieves claim extractions keyed by document hash.
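The exact-match path can be approximated with a SHA-256 key and a Map, which is roughly what the in-memory fallback amounts to. This is a sketch under assumptions, not the @reaatech/llm-cache API: documentHash and InMemoryClaimCache are hypothetical names, and the time-to-live behavior is an added assumption.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 of the document bytes, used as the cache key.
function documentHash(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Minimal in-memory cache with a time-to-live. The TTL is an assumption of this sketch.
class InMemoryClaimCache<T> {
  private readonly entries = new Map<string, { value: T; expiresAt: number }>();
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Returns the cached value, or undefined when the key is missing or expired.
  get(key: string, now: number = Date.now()): T | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= now) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T, now: number = Date.now()): void {
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }
}
```

Because the key is a hash of the raw bytes, only byte-identical uploads hit the cache; a rescanned copy of the same claim produces a different key.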
Step 10: Build cost telemetry and budget enforcement
Every LLM call records its cost. src/services/cost-telemetry.ts uses @reaatech/llm-cost-telemetry to build a CostSpan:
src/services/budget-service.ts enforces a daily spend limit using @reaatech/agent-budget-engine. It defines a budget for the claims-pipeline scope and checks every extraction against it:
ts
import { BudgetController } from "@reaatech/agent-budget-engine";
import { SpendStore } from "@reaatech/agent-budget-spend-tracker";
import { BudgetScope } from "@reaatech/agent-budget-types";
import { generateId } from "@reaatech/llm-cost-telemetry";
import { BudgetExceededError } from "../lib/errors.js";

// Daily limit; falls back to 10 when BUDGET_DAILY_LIMIT is unset, zero, or not a number.
const DAILY_BUDGET = Number(process.env.BUDGET_DAILY_LIMIT) || 10.0;

// Process-local spend store: totals are lost on restart and not shared between instances.
class InMemorySpendStore extends SpendStore {
  private _map = new Map<string, number>();

  record(entry: { scopeType: string; scopeKey: string; cost: number }): number {
    const key = `${entry.scopeType}:${entry.scopeKey}`;
    const current = this._map.get(key) ?? 0;
    this._map.set(key, current + entry.cost);
    return 1;
  }

  getSpend(scopeType: string, scopeKey: string): number {
    return this._map.get(`${scopeType}:${scopeKey}`) ?? 0;
  }

  getTotal(scopeType: string, scopeKey: string): number {
    return this._map.get(`${scopeType}:${scopeKey}`) ?? 0;
  }

  reset(scopeType: string, scopeKey: string): void {
    this._map.delete(`${scopeType}:${scopeKey}`);
  }
}

const store = new InMemorySpendStore();
const controller = new BudgetController({ spendTracker: store });

controller.defineBudget({
  scopeType: BudgetScope.User,
  scopeKey: "claims-pipeline",
  limit: DAILY_BUDGET,
  policy: { softCap: 0.8, hardCap: 1.0 },
});

// Throws BudgetExceededError when the estimated cost is not allowed.
export function checkBudget(estimatedCost: number) {
  const result = controller.check({
    scopeType: BudgetScope.User,
    scopeKey: "claims-pipeline",
    estimatedCost,
    modelId: "azure-gpt-4o",
    tools: [],
  });
  if (!result.allowed) {
    throw new BudgetExceededError("budget exceeded");
  }
  return result;
}

export function recordSpend(cost: number, inputTokens: number, outputTokens: number): void {
  controller.record({
    requestId: generateId(),
    scopeType: BudgetScope.User,
    scopeKey: "claims-pipeline",
    cost,
    inputTokens,
    outputTokens,
    modelId: "azure-gpt-4o",
    provider: "azure",
    timestamp: new Date(),
  });
}

export function getBudgetState() {
  return (
    controller.getState(BudgetScope.User, "claims-pipeline") ??
    { state: "Active" as const, spent: 0, remaining: DAILY_BUDGET }
  );
}
Expected output: Two modules — one records per-call cost telemetry, the other enforces a daily dollar budget and throws BudgetExceededError when exhausted.
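One way to read the softCap and hardCap policy is as fractions of the daily limit: crossing the soft cap signals that the budget is running low, and crossing the hard cap denies the call. The sketch below implements that reading in plain TypeScript. It is an assumption about the semantics, not the behavior of @reaatech/agent-budget-engine, and DailyBudget is a hypothetical class.

```typescript
type BudgetStatus = "ok" | "soft-cap" | "denied";

// Illustrative daily budget with caps expressed as fractions of the limit.
class DailyBudget {
  private spent = 0;
  private readonly limit: number;
  private readonly softCap: number;
  private readonly hardCap: number;

  constructor(limit: number, softCap: number = 0.8, hardCap: number = 1.0) {
    this.limit = limit;
    this.softCap = softCap;
    this.hardCap = hardCap;
  }

  // Checks whether a call with the given estimated cost fits into the budget.
  check(estimatedCost: number): BudgetStatus {
    const projected = this.spent + estimatedCost;
    if (projected > this.limit * this.hardCap) return "denied";
    if (projected >= this.limit * this.softCap) return "soft-cap";
    return "ok";
  }

  // Records the actual cost after the call has completed.
  record(cost: number): void {
    this.spent += cost;
  }

  get remaining(): number {
    return Math.max(0, this.limit - this.spent);
  }
}
```

Checking against the estimate before the call and recording the actual cost afterwards matches the split between checkBudget() and recordSpend() in the module above.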
Step 11: Build the image preprocessor and review queue
Image files (JPEG, PNG, TIFF, BMP) are preprocessed with sharp before document analysis. src/services/image-preprocessor.ts:
ts
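import sharp from "sharp";

// Downscales wide images to at most 2000 px and re-encodes them as JPEG.
export async function preprocessDocument(buffer: Buffer): Promise<Buffer> {
  try {
    const result = await sharp(buffer)
      .resize({ width: 2000, withoutEnlargement: true })
      .jpeg({ mozjpeg: true })
      .toBuffer();
    return result;
  } catch {
    // Fall back to the original bytes so the pipeline can still try to analyze them.
    console.warn("image preprocessing failed, returning original buffer");
    return buffer;
  }
}

// Matches .jpg, .jpeg, .png, .tif, .tiff, and .bmp, case-insensitively.
export function isImage(filename: string): boolean {
  return /\.(jpe?g|png|tiff?|bmp)$/i.test(filename);
}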
The Postgres-backed review queue stores low-confidence claims. src/services/review-queue.ts uses the postgres package:
ts
import postgres from "postgres";
import type { ExtractedClaim } from "../types/claim.js";
import type { ReviewQueueEntry } from "../types/document.js";

const sql = postgres(process.env.POSTGRES_URL ?? "");

// Inserts a claim into the queue. Errors are logged and swallowed.
export async function queueForReview(entry: ReviewQueueEntry): Promise<void> {
  try {
    await sql`
      insert into review_queue (id, claim_data, confidence, reason, created_at)
      values (${entry.id}, ${sql.json(entry.claim)}, ${entry.confidence}, ${entry.reason}, ${entry.createdAt})
    `;
  } catch (err) {
    console.error("failed to queue review entry:", err);
  }
}

// Returns pending entries, oldest first. Returns an empty list on error.
export async function getReviewQueue(): Promise<ReviewQueueEntry[]> {
  try {
    const rows = await sql`
      select id, claim_data, confidence, reason, created_at
      from review_queue
      where status = 'pending'
      order by created_at asc
    `;
    return rows.map((row: Record<string, unknown>) => ({
      id: row.id as string,
      claim: row.claim_data as ExtractedClaim,
      confidence: row.confidence as number,
      reason: row.reason as string,
      createdAt: row.created_at as Date,
      status: "pending" as const,
    }));
  } catch (err) {
    console.error("failed to get review queue:", err);
    return [];
  }
}

// Marks an entry as approved or rejected.
export async function resolveReview(id: string, approved: boolean): Promise<void> {
  try {
    await sql`
      update review_queue
      set status = ${approved ? "approved" : "rejected"}
      where id = ${id}
    `;
  } catch (err) {
    console.error("failed to resolve review:", err);
  }
}
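Make sure your Postgres database has a review_queue table with the columns these queries use: id, claim_data, confidence, reason, created_at, and status. Every function catches its own errors, so if process.env.POSTGRES_URL is not set or the database is unreachable, the failure is only logged with console.error and the claim is not queued for review.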
Expected output: Two modules — one preprocesses scanned images, the other provides CRUD operations on a Postgres review queue.
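For reference, a table definition compatible with the queries above could look like the following sketch. The column names and the three status values come from src/services/review-queue.ts; the column types, the 'pending' default, and the index are assumptions, and REVIEW_QUEUE_DDL is a hypothetical constant name.

```typescript
// Status values used by getReviewQueue() and resolveReview().
const REVIEW_STATUSES = ["pending", "approved", "rejected"] as const;

// Column names match the queries in src/services/review-queue.ts.
// Column types, the default status, and the index are assumptions of this sketch.
const REVIEW_QUEUE_DDL = `
create table if not exists review_queue (
  id          text primary key,
  claim_data  jsonb not null,
  confidence  double precision not null,
  reason      text not null,
  status      text not null default 'pending',
  created_at  timestamptz not null default now()
);

create index if not exists review_queue_pending_idx
  on review_queue (status, created_at);
`;
```

The default on status matters because queueForReview() does not set that column, while getReviewQueue() filters on status = 'pending'. The statements can be run once with psql or a migration tool before the first upload.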
Step 12: Build the pipeline orchestrator
src/services/pipeline.ts is the central orchestrator that wires everything together. It runs inside a concurrency limiter (max 3 documents at once). Here’s the flow:
If the file is an image, preprocess it with sharp.
Compute a SHA-256 hash and check the Redis-backed cache.
If cache hit, return immediately — skip all downstream work.
Check the daily budget — if exceeded, return error without calling Azure.
Analyze the document with Azure Document Intelligence.
Send the raw text to Azure OpenAI for structured JSON extraction.
Repair malformed JSON with structured-repair-core.
Parse the result through the Zod schema.
Record LLM cost telemetry and update the budget tracker.
Store the result in the cache.
Evaluate confidence — if below threshold, queue for human review.
Return the final result with confidence and status.
ts
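import { createHash } from "node:crypto";
import { analyzeDocument } from "./document-intelligence.js";
import { extractClaimJson } from "./llm.js";
import { repairExtraction } from "./claim-repair.js";
import { evaluateClaim } from "./confidence-router.js";
import { checkBudget, recordSpend } from "./budget-service.js";
import { recordLlmCall } from "./cost-telemetry.js";
import { preprocessDocument, isImage } from "./image-preprocessor.js";
import { queueForReview } from "./review-queue.js";
import { generateId } from "../lib/id.js";
import { ClaimSchema, type ExtractedClaim } from "../types/claim.js";
import type { ProcessingResult, ReviewQueueEntry } from "../types/document.js";
import { limit } from "../lib/concurrency.js";
import { createCacheEngine, checkCache, storeCache } from "./llm-cache-service.js";

// Estimated cost of processing one document.
const ESTIMATED_COST = 0.05;

// Cache engine handle; starts as null.
let cacheEngine: Awaited<ReturnType<typeof createCacheEngine>> | null = null;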
Expected output: A single processDocument() function that runs the full pipeline — cache check → budget → DI → LLM → repair → confidence → review queue — and never throws uncaught errors.
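The control flow of the steps listed above can be sketched with the stages passed in as dependencies. This is not the recipe's processDocument(): the PipelineDeps interface, the outcome type, and the function name are hypothetical, the extract stage stands in for Document Intelligence, LLM extraction, repair, and schema parsing combined, and image preprocessing and the concurrency limiter are omitted for brevity.

```typescript
// Hypothetical dependency bundle; each member stands in for one or more services.
interface PipelineDeps<T> {
  hash(bytes: Uint8Array): string;
  cacheGet(key: string): Promise<T | undefined>;
  cacheSet(key: string, value: T): Promise<void>;
  budgetAllows(estimatedCost: number): boolean;
  extract(bytes: Uint8Array): Promise<T>; // DI + LLM + repair + schema parse
  recordSpend(cost: number): void;
  score(value: T): number; // confidence in [0, 1]
  queueForReview(value: T, confidence: number): Promise<void>;
}

type PipelineOutcome<T> =
  | { success: true; status: "cached"; data: T }
  | { success: true; status: "routed" | "needs-review"; data: T; confidence: number }
  | { success: false; error: string };

async function processDocumentSketch<T>(
  bytes: Uint8Array,
  deps: PipelineDeps<T>,
  routeThreshold: number = 0.8,
  estimatedCost: number = 0.05,
): Promise<PipelineOutcome<T>> {
  try {
    // Cache check first: a hit skips budget, extraction, and routing.
    const key = deps.hash(bytes);
    const cached = await deps.cacheGet(key);
    if (cached !== undefined) return { success: true, status: "cached", data: cached };

    // Budget check before any paid call.
    if (!deps.budgetAllows(estimatedCost)) return { success: false, error: "budget exceeded" };

    const data = await deps.extract(bytes);
    deps.recordSpend(estimatedCost);
    await deps.cacheSet(key, data);

    // Low-confidence results are queued for a human reviewer.
    const confidence = deps.score(data);
    if (confidence < routeThreshold) {
      await deps.queueForReview(data, confidence);
      return { success: true, status: "needs-review", data, confidence };
    }
    return { success: true, status: "routed", data, confidence };
  } catch (err) {
    // Convert any stage failure into an error result instead of throwing.
    return { success: false, error: err instanceof Error ? err.message : "unknown pipeline error" };
  }
}
```

Passing the stages in as functions is a choice made for this sketch; it lets the ordering be tested with stubs, without Azure, Redis, or Postgres.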
Step 13: Create the API routes
The pipeline is exposed through four Next.js App Router route handlers.
app/api/claims/upload/route.ts — accepts a file upload via multipart/form-data:
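A handler of this kind might look roughly like the sketch below, built on the standard Request, Response, and File globals that App Router route handlers use. The 10 MB limit, the accepted extensions, the "file" field name, and the mapping of conditions to status codes are all assumptions, and createUploadHandler and validateUpload are hypothetical names.

```typescript
// Assumed upload limit for this sketch.
const MAX_UPLOAD_BYTES = 10 * 1024 * 1024;
const ALLOWED_EXTENSIONS = /\.(pdf|jpe?g|png|tiff?|bmp)$/i;

// Pure validation step: returns a status and message, or null when the upload is acceptable.
function validateUpload(name: string, size: number): { status: number; error: string } | null {
  if (size === 0) return { status: 400, error: "file is empty" };
  if (size > MAX_UPLOAD_BYTES) return { status: 413, error: "file too large" };
  if (!ALLOWED_EXTENSIONS.test(name)) return { status: 422, error: "unsupported file type" };
  return null;
}

type ProcessFn = (
  bytes: Uint8Array,
  filename: string,
) => Promise<{ success: boolean; error?: string }>;

// Builds a POST handler around a processing function such as the pipeline orchestrator.
function createUploadHandler(processDocument: ProcessFn) {
  return async function POST(request: Request): Promise<Response> {
    try {
      const form = await request.formData();
      const file = form.get("file");
      if (!(file instanceof File)) {
        return Response.json({ error: "expected a 'file' field" }, { status: 400 });
      }
      const problem = validateUpload(file.name, file.size);
      if (problem !== null) {
        return Response.json({ error: problem.error }, { status: problem.status });
      }
      const bytes = new Uint8Array(await file.arrayBuffer());
      const result = await processDocument(bytes, file.name);
      return Response.json(result, { status: result.success ? 200 : 422 });
    } catch {
      // Anything unexpected, including an unparsable request body, ends up here.
      return Response.json({ error: "internal error" }, { status: 500 });
    }
  };
}
```

In a route file, the handler would be exported under the name POST, for example as the result of createUploadHandler() applied to the pipeline's processing function. Keeping validateUpload pure makes the status-code decisions testable without constructing multipart requests.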
Expected output: A configured next.config.ts with instrumentationHook: true and a register() function that initializes Langfuse at server startup.
Step 15: Run the type checker and tests
Verify the whole project compiles:
terminal
pnpm typecheck
pnpm lint
Then run the test suite — every service, route handler, and utility has tests covering happy paths, error paths, and edge cases:
terminal
pnpm vitest run --coverage --reporter=json --outputFile=vitest-report.json
The test suite covers:
The full pipeline happy path (buffer in → claim data out)
Cache hit short-circuit (no DI or LLM calls on repeat documents)
Budget denial (pipeline returns error without calling Azure)
LLM failure and repair recovery
Low-confidence extraction queueing for review
Image preprocessing for JPEG/PNG/TIFF/BMP files
Document Intelligence failure propagation
Each route handler (200, 400, 413, 422, 500 status codes)
Review queue CRUD operations
Confidence router at boundary thresholds (0.8, 0.5, 0.15)
Expected output: numFailedTests=0 and all coverage metrics (lines, branches, functions, statements) at 90% or higher for the src/ and app/api/ runtime code.
Next steps
Add an embedding model for semantic caching — replace the stub InMemoryEmbedder with a real Azure OpenAI embedding deployment so the cache can match semantically similar claims, not just exact duplicates.
Deploy with a Postgres migration — create a db/migrations/ folder with a SQL script that creates the review_queue table, and integrate node-pg-migrate or a similar tool so the pipeline is self-provisioning on deploy.
Build a review dashboard — replace the placeholder app/page.tsx with a React dashboard that shows pending reviews, lets human reviewers approve or reject claims, and displays budget usage and cost telemetry.