Google Gemini Medical Claim Extraction for SMB Practices
Automatically pull patient demographics, diagnosis codes, and billing line items from scanned claim forms and PDFs, with built-in PII redaction and audit.
Small medical practices spend 5–8 hours per week manually rekeying data from faxed or scanned claim forms. Errors in insurance coding cause denials and delayed payments, while privacy regulations demand strict data handling.
A complete, working implementation of this recipe — downloadable as a zip or browsable file by file. Generated by our build pipeline; tested with full coverage before publishing.
This recipe builds an end-to-end medical claim extraction pipeline for small medical practices. You’ll create a Next.js API that accepts uploaded claim-form PDFs, extracts text via LlamaParse with an OCR fallback, sends the text to Google Gemini for structured data extraction, repairs and validates the JSON output against a Zod schema, redacts PII through a guardrail chain, tracks cost via LLM cost telemetry, enforces a daily budget cap, and maintains session state across batch processing. The pipeline is backed by BullMQ for async job processing and Supabase for storage.
This tutorial is for developers familiar with TypeScript and Next.js who want to see how multiple REAA packages snap together into a document pipeline.
A Supabase project with storage and a claim_extractions table
A LlamaCloud API key (for LlamaParse PDF parsing)
Redis server running locally on port 6379 (for BullMQ)
Basic familiarity with Next.js App Router route handlers
Step 1: Set up environment variables
The scaffold ships a .env.example with placeholder entries. Copy it to .env.local and fill in your credentials:
terminal
cp .env.example .env.local
The file defines every variable the pipeline reads:
env
# Env vars used by google-gemini-medical-claim-extraction-for-smb-practices.
# The builder adds entries here as it wires up each integration.
# Keep placeholders only; never commit real values.
NODE_ENV=development
GEMINI_API_KEY=<your-gemini-api-key>
GOOGLE_GENAI_USE_ENTERPRISE=false
GOOGLE_CLOUD_PROJECT=<your-gcp-project-id>
GOOGLE_CLOUD_LOCATION=us-central1
LLAMA_CLOUD_API_KEY=<your-llamacloud-key>
SUPABASE_URL=<your-supabase-project-url>
SUPABASE_SERVICE_ROLE_KEY=<your-service-role-key>
REDIS_URL=redis://127.0.0.1:6379
GEMINI_API_KEY is for consumer API key mode. If you use GCP Vertex AI, set GOOGLE_GENAI_USE_ENTERPRISE=true and fill in GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_LOCATION. LLAMA_CLOUD_API_KEY powers LlamaParse for PDF text extraction. SUPABASE_URL and SUPABASE_SERVICE_ROLE_KEY point to your Supabase project. REDIS_URL defaults to a local Redis instance.
Step 2: Define the claim extraction schemas
Every service in the pipeline shares the same type definitions. Create src/schemas/claim.ts with Zod schemas for patient demographics, diagnoses, billing line items, and the top-level claim form:
PatientDemographicsSchema covers the patient info block with optional address and phone fields. DiagnosisCodeSchema and BillingLineItemSchema model ICD codes and CPT-coded line items. ClaimFormSchema is the top-level shape Gemini must produce — patient, diagnoses, line items, provider NPI, claim date, and total charges. ExtractionResultSchema wraps the structured output with metadata: confidence score, repair steps, and any field-level errors.
Expected output: Type-safe Zod schemas and inferred TypeScript types that the repair service, guardrail, and extraction pipeline all reference.
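As a rough picture of the shape described above, here is a dependency-free sketch in plain TypeScript. The recipe itself defines these with Zod; every field name below is an assumption made for illustration, since only the blocks (patient, diagnoses, line items, provider NPI, claim date, total charges) are named in the text. The hypothetical `isClaimForm` guard is a shallow stand-in for a schema check and does not validate array elements.

```typescript
// Hypothetical field names: the text names the blocks, not the exact keys.
interface PatientDemographics {
  fullName: string;
  dateOfBirth: string;
  address?: string; // optional, per the recipe
  phone?: string; // optional, per the recipe
}
interface DiagnosisCode { code: string; description?: string }
interface BillingLineItem { cptCode: string; units: number; charge: number }
interface ClaimForm {
  patient: PatientDemographics;
  diagnoses: DiagnosisCode[];
  lineItems: BillingLineItem[];
  providerNpi: string;
  claimDate: string;
  totalCharges: number;
}

function isPlainRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Shallow structural check: top-level keys and types only.
function isClaimForm(value: unknown): value is ClaimForm {
  if (!isPlainRecord(value)) return false;
  const patient = value.patient;
  if (!isPlainRecord(patient)) return false;
  return (
    typeof patient.fullName === "string" &&
    typeof patient.dateOfBirth === "string" &&
    Array.isArray(value.diagnoses) &&
    Array.isArray(value.lineItems) &&
    typeof value.providerNpi === "string" &&
    typeof value.claimDate === "string" &&
    typeof value.totalCharges === "number"
  );
}
```

A claim with an empty `lineItems` array and `totalCharges` of 0 passes this check, while one with no `diagnoses` array fails, mirroring the cases the schema tests cover.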
Step 3: Create the PDF ingestion service
The src/services/pdf-ingest.ts service handles text extraction from uploaded claim-form PDFs. It first tries LlamaParseReader from @llamaindex/cloud, falls back to unpdf's native text extraction if LlamaParse throws, and, if the text is still empty, runs Tesseract OCR on a page image preprocessed with sharp:
extractTextFromPdf wraps the full pipeline in pRetry for resilience. Inside, it first tries LlamaParseReader — this handles modern PDFs with embedded text. If LlamaParse throws (e.g., the PDF is scanned), it falls back to unpdf’s native text extraction. extractWithOcrFallback is the top-level function: if the combined PDF extraction returns zero-length text, it preprocesses the page image with sharp (grayscale + contrast boost) and runs Tesseract OCR.
Expected output: A service that returns extracted text and a usedOcr boolean, callable by the pipeline.
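The fallback order can be sketched without any of the real libraries by injecting the three extractors as functions. This is a minimal sketch, not the recipe's code: `extractWithFallbacks` and the `Extractor` type are hypothetical names, the pRetry wrapper is omitted, and treating whitespace-only text as empty (and a throwing second extractor as empty) are assumptions.

```typescript
// Each extractor stands in for one real call: LlamaParse, unpdf, Tesseract.
type Extractor = (pdf: Uint8Array) => Promise<string>;

interface IngestResult {
  text: string;
  usedOcr: boolean;
}

async function extractWithFallbacks(
  pdf: Uint8Array,
  primary: Extractor,
  secondary: Extractor,
  ocr: Extractor,
): Promise<IngestResult> {
  let text = "";
  try {
    text = await primary(pdf);
  } catch {
    // Primary parser threw: try native text extraction instead.
    try {
      text = await secondary(pdf);
    } catch {
      text = ""; // assumption: a second failure falls through to OCR
    }
  }
  if (text.trim().length > 0) {
    return { text, usedOcr: false };
  }
  // No embedded text was recovered, so the page is probably a scan.
  return { text: await ocr(pdf), usedOcr: true };
}
```

Injecting the extractors keeps the ordering logic testable on its own, which is roughly how a test suite that mocks every external dependency would exercise it.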
Step 4: Create the Google Gemini extraction service
The src/services/gemini-extractor.ts service calls Google Gemini with a structured prompt to produce JSON from raw claim text:
ts
import { GoogleGenAI } from "@google/genai";
import pRetry from "p-retry";

function createGenAIClient(): GoogleGenAI {
  if (process.env.GOOGLE_GENAI_USE_ENTERPRISE === "true") {
    return new GoogleGenAI({
      enterprise: true,
      project: process.env.GOOGLE_CLOUD_PROJECT,
      location: process.env.GOOGLE_CLOUD_LOCATION ?? "us-central1",
    });
  }
  return new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY ?? "" });
}

const ai = createGenAIClient();

export async function extractClaimFromText(rawText: string): Promise<string> {
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash",
    contents: `Extract structured patient demographics, diagnosis codes, and billing line items from the following medical claim text. Return the data as JSON:\n\n${rawText}`,
  });
  return response.text ?? "";
}

export async function extractClaimWithRetry(rawText: string): Promise<string> {
  return pRetry(() => extractClaimFromText(rawText), { retries: 3, minTimeout: 2000 });
}
createGenAIClient switches between consumer API key mode and GCP Vertex AI enterprise mode based on the GOOGLE_GENAI_USE_ENTERPRISE env var. The extraction prompt asks Gemini to return a JSON object. extractClaimWithRetry wraps the call in pRetry with up to three retries and a minimum delay of 2 seconds before the first retry, because Gemini rate limits can hit even small practices during batch processing.
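To see what those retry settings mean in wall-clock terms, the sketch below computes the wait before each retry under plain exponential backoff. It assumes p-retry's documented defaults of a factor of 2 and no randomization; `backoffDelaysMs` is a hypothetical helper, not part of p-retry.

```typescript
// Delay before each retry: minTimeout * factor^attempt, with no jitter.
function backoffDelaysMs(retries: number, minTimeout: number, factor = 2): number[] {
  return Array.from({ length: retries }, (_, attempt) => minTimeout * factor ** attempt);
}

// Settings from extractClaimWithRetry: retries 3, minTimeout 2000.
const delays = backoffDelaysMs(3, 2000); // [2000, 4000, 8000]
const worstCaseWaitMs = delays.reduce((sum, d) => sum + d, 0); // 14000
```

Under these assumptions a claim that fails every attempt spends about 14 seconds waiting, on top of the time taken by the four requests themselves.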
Expected output: Raw JSON string from Gemini that can be passed to the repair service for validation.
Step 5: Wire up structured-repair-core for output repair
Gemini’s output is JSON-shaped but might contain markdown fences, trailing commas, truncated objects, or extra fields. The @reaatech/structured-repair-core package handles these edge cases. The src/services/repair-service.ts wraps its three main functions:
ts
import { repair, repairOutput, isValid, type RepairResult } from "@reaatech/structured-repair-core";
import { ClaimFormSchema, type ClaimForm } from "../schemas/claim.js";

export function repairClaimOutput(rawLlmOutput: string): RepairResult<ClaimForm> {
  return repairOutput({ schema: ClaimFormSchema, input: rawLlmOutput, debug: false });
}

export async function quickRepair(rawLlmOutput: string): Promise<ClaimForm> {
  return repair(ClaimFormSchema, rawLlmOutput);
}

export function validateClaimJson(input: string): boolean {
  return isValid(ClaimFormSchema, input);
}
repairOutput takes the raw LLM string, strips markdown fences, and attempts to fix common JSON issues (trailing commas, truncation, extra fields). quickRepair is the async variant that throws UnrepairableError when the input can’t be salvaged. validateClaimJson is a fast boolean check you can use before attempting repair.
Expected output: A RepairResult with a success boolean, a data field containing a validated ClaimForm, and a steps array documenting what was fixed.
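For intuition about the kind of cleanup involved, here is a deliberately naive sketch that strips a Markdown fence and trailing commas before parsing. It is an illustration only, not how @reaatech/structured-repair-core is implemented; `naiveJsonCleanup` is a hypothetical name, and the sketch does not handle truncation or extra fields.

```typescript
// Three backticks, built indirectly so this snippet can sit inside a fenced block.
const FENCE = "`".repeat(3);

function naiveJsonCleanup(raw: string): unknown {
  let text = raw.trim();
  if (text.startsWith(FENCE)) {
    // Drop the opening fence line (with any language tag) and the closing fence.
    text = text.slice(text.indexOf("\n") + 1);
    if (text.trimEnd().endsWith(FENCE)) {
      text = text.trimEnd().slice(0, -FENCE.length);
    }
  }
  // Remove commas that directly precede a closing brace or bracket.
  // Caveat: this also rewrites such sequences inside string values.
  const withoutTrailingCommas = text.replace(/,\s*([}\]])/g, "$1");
  return JSON.parse(withoutTrailingCommas);
}
```

The string-value caveat is one reason to prefer a dedicated repair library over regex cleanup for production claim data.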
Step 6: Build the PII redaction guardrail chain
HIPAA restricts how protected health information (PHI) like SSNs, dates of birth, phone numbers, and patient names is stored and disclosed, so this recipe redacts those identifiers before storage. The @reaatech/guardrail-chain package provides a composable guardrail system. The src/services/guardrail-service.ts defines a PIIRedactionGuardrail and assembles it into a GuardrailChain:
ts
import {
  ChainBuilder,
  setLogger,
  ConsoleLogger,
  GuardrailChain,
  type Guardrail,
  type GuardrailResult,
  type ChainContext,
} from "@reaatech/guardrail-chain";

setLogger(new ConsoleLogger());

export class PIIRedactionGuardrail implements Guardrail<string, string> {
  readonly id = "pii-redaction";
  readonly name = "PII Redaction";
  readonly type = "output" as const;
  enabled = true;

  execute(input: string, _context: ChainContext): Promise<GuardrailResult<string>> {
    void _context;
    const redactedText = input
      .replace(/\d{3}-\d{2}-\d{4}/g, "[REDACTED-SSN]")
      .replace(/\d{2}\/\d{2}\/\d{4}/g, "[REDACTED-DOB]")
      .replace(/\d{3}-\d{3}-\d{4}/g, "[REDACTED-PHONE]")
      .replace(/\b[A-Z][a-z]+(?:\s[A-Z][a-z]+)+\b/g, "[REDACTED-NAME]");
    return Promise.resolve({ passed: true, output: redactedText, confidence: 1.0 });
  }
}

export function createPIIRedactionChain(): GuardrailChain {
  return new ChainBuilder()
    .withBudget({ maxLatencyMs: 2000, maxTokens: 16000 })
    .withGuardrail(new PIIRedactionGuardrail())
    .withSlowGuardrailSkipping(true)
    .build();
}

export async function redactPii(text: string): Promise<string> {
  const result = await createPIIRedactionChain().execute(text);
  return result.success ? (result.output as string) : text;
}
The guardrail uses four regexes targeting SSNs (123-45-6789), DOBs (01/15/1990), phone numbers (555-123-4567), and capitalized names of two or more words (John Doe). The ChainBuilder sets a 2-second latency budget and a 16,000-token budget, and enables slow-guardrail skipping so extraction isn't blocked if PII redaction takes too long. redactPii is fail-open: if the chain does not report success, it returns the original, unredacted text rather than blocking the pipeline.
Expected output: Redacted JSON with [REDACTED-SSN], [REDACTED-DOB], [REDACTED-PHONE], and [REDACTED-NAME] placeholders.
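A quick way to check what the four patterns catch is to run them outside the guardrail chain. The sketch below copies the regexes from PIIRedactionGuardrail into a standalone function; `redactWithRecipePatterns` and the sample string are made up for illustration.

```typescript
// Same four patterns as PIIRedactionGuardrail, applied in the same order.
function redactWithRecipePatterns(input: string): string {
  return input
    .replace(/\d{3}-\d{2}-\d{4}/g, "[REDACTED-SSN]")
    .replace(/\d{2}\/\d{2}\/\d{4}/g, "[REDACTED-DOB]")
    .replace(/\d{3}-\d{3}-\d{4}/g, "[REDACTED-PHONE]")
    .replace(/\b[A-Z][a-z]+(?:\s[A-Z][a-z]+)+\b/g, "[REDACTED-NAME]");
}

const sample =
  "patient John Doe, ssn 123-45-6789, dob 01/15/1990, phone 555-123-4567, insurer Blue Cross";
const redacted = redactWithRecipePatterns(sample);
// redacted ===
// "patient [REDACTED-NAME], ssn [REDACTED-SSN], dob [REDACTED-DOB], phone [REDACTED-PHONE], insurer [REDACTED-NAME]"
```

Note that the name pattern also replaces "Blue Cross", and the date pattern matches any MM/DD/YYYY value, not only birth dates. Fields such as payer names and claim dates may need to be excluded from redaction or matched more narrowly.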
Step 7: Track extraction cost with llm-cost-telemetry
Every Gemini API call costs money, and small practices need visibility into per-claim processing costs. The src/services/cost-telemetry.ts service builds cost spans and persists them to Supabase:
createGeminiCostSpan builds a CostSpan with a unique ID, provider/model metadata, and a dollar cost calculated at a flat $7.50 per million tokens, the rate this recipe assumes for Gemini 2.5 Flash (check it against Google's current price list). recordExtractionSpan inserts the span into a cost_spans table in Supabase for later analysis.
Expected output: A CostSpan object with costUsd > 0 and a persisted row in the cost_spans table.
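The cost arithmetic itself is small enough to sketch directly. The function below is a hypothetical stand-in for the calculation inside createGeminiCostSpan; it assumes input and output tokens are summed and billed at the single flat rate quoted in the text, which may not match how the recipe or the provider actually splits them.

```typescript
// Flat rate quoted in the recipe text; treat it as an assumption, not a price list.
const ASSUMED_USD_PER_MILLION_TOKENS = 7.5;

interface CostEstimate {
  totalTokens: number;
  costUsd: number;
}

function estimateGeminiCost(inputTokens: number, outputTokens: number): CostEstimate {
  const totalTokens = inputTokens + outputTokens;
  const costUsd = (totalTokens / 1_000_000) * ASSUMED_USD_PER_MILLION_TOKENS;
  return { totalTokens, costUsd };
}

// A 1,200-token claim with a 300-token response: 1,500 tokens, about $0.01125.
const estimate = estimateGeminiCost(1200, 300);
```

At that rate, a batch of 100 similar claims would come to roughly $1.13, which gives a sense of scale for setting a daily cap.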
Step 8: Cap daily spend with the agent budget engine
To prevent runaway costs, the @reaatech/agent-budget-engine package enforces a daily budget. The src/services/budget-service.ts wraps BudgetController with a SpendStore:
defineDailyBudget sets a dollar limit with an 80% soft cap (triggers a warning) and a 100% hard cap (rejects requests). The pipeline calls checkExtractionBudget before every extraction; if allowed is false, the pipeline throws "Budget exceeded". After extraction, recordExtractionSpend logs the actual cost.
Expected output: Budget checks return { allowed: true } when under the cap and { allowed: false, action: "hard-stop" } when exceeded.
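The soft-cap and hard-cap behaviour can be sketched as a pure function. This is not the @reaatech/agent-budget-engine API: `checkDailyBudget` is a hypothetical name, and the "allow" and "warn" action labels are assumptions (only "hard-stop" appears in the text above).

```typescript
type BudgetAction = "allow" | "warn" | "hard-stop";

interface BudgetDecision {
  allowed: boolean;
  action: BudgetAction;
}

function checkDailyBudget(
  spentUsd: number,
  estimatedUsd: number,
  limitUsd: number,
  softCapRatio = 0.8,
): BudgetDecision {
  const projected = spentUsd + estimatedUsd;
  // Hard cap: the projected spend would exceed 100% of the limit.
  if (projected > limitUsd) return { allowed: false, action: "hard-stop" };
  // Soft cap: still allowed, but the caller should surface a warning.
  if (projected >= limitUsd * softCapRatio) return { allowed: true, action: "warn" };
  return { allowed: true, action: "allow" };
}
```

With a $5.00 daily limit, a request that brings the projected total to $4.20 is allowed with a warning, while one that brings it to $5.10 is rejected.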
Step 9: Track batch sessions with session-continuity
Processing a batch of claim forms is async — a practice uploads five PDFs and the pipeline processes them over several minutes. The @reaatech/session-continuity package tracks progress. The src/services/session-service.ts implements a SupabaseSessionAdapter and wraps it with the SessionManager:
ts
import {
  SessionManager,
  type IStorageAdapter,
  type TokenCounter,
  type Session,
  type Message,
  type MessageQueryOptions,
  type SessionFilters,
  type UpdateSessionOptions,
  type HealthStatus,
} from "@reaatech/session-continuity";
import { getSupabaseClient } from "./storage.js";

class SupabaseSessionAdapter implements IStorageAdapter {
  async createSession(
    sessionData: Omit<Session, "id" | "createdAt" | "lastActivityAt">,
  ): Promise<Session> {
    const supabase = getSupabaseClient();
    const now = new
SupabaseSessionAdapter implements the IStorageAdapter interface from @reaatech/session-continuity, mapping every CRUD operation to Supabase tables (sessions, messages). CharBasedTokenCounter provides a simple 4:1 character-to-token ratio useful for estimating context window usage. The SessionManager is configured with an 8K token budget and sliding-window compression. The exported helper functions (createClaimBatchSession, trackClaimProgress, getClaimBatchStatus, endClaimBatchSession) are the public API the pipeline uses.
Expected output: A session created in Supabase, messages appended as each claim finishes, and the session ended when the batch is complete.
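The 4:1 character-to-token estimate mentioned above is simple to sketch. The class below is a hypothetical illustration, not the recipe's CharBasedTokenCounter: the method names are assumptions, the package's TokenCounter interface may look different, and rounding up is a choice made here.

```typescript
class CharRatioTokenCounter {
  private readonly charsPerToken: number;

  constructor(charsPerToken = 4) {
    this.charsPerToken = charsPerToken;
  }

  // Rough estimate: one token per `charsPerToken` characters, rounded up.
  count(text: string): number {
    return Math.ceil(text.length / this.charsPerToken);
  }

  // True when the combined estimate for all texts fits the token budget.
  fitsBudget(texts: string[], budgetTokens: number): boolean {
    const total = texts.reduce((sum, t) => sum + this.count(t), 0);
    return total <= budgetTokens;
  }
}
```

The estimate is cheap and deterministic, which suits progress tracking, but it can drift noticeably from a model's real token count on code-heavy or non-English text.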
Step 10: Wire up the Supabase storage layer
The src/services/storage.ts provides a thin wrapper around Supabase for PDF upload/download and extraction result persistence:
The Supabase client is lazily initialized and cached. uploadClaimPdf stores raw PDFs in a claims storage bucket. downloadClaimPdf retrieves them by path. storeExtractionResult upserts into the claim_extractions table, while getClaimStatus queries the same table and returns null (not an error) for missing claims, which the status route handler uses for its 404 response.
Expected output: PDFs stored and retrieved from Supabase Storage, extraction results persisted to the claim_extractions table.
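The "null, not an error" behaviour of getClaimStatus can be sketched as a small mapping over a query result. This is a minimal sketch with hypothetical names (`toClaimStatus`, `ClaimStatusRow`, `PostgrestLikeError`); the PGRST116 code for a missing row comes from the storage tests, while rethrowing every other error is an assumption.

```typescript
interface PostgrestLikeError {
  code: string;
  message: string;
}

interface ClaimStatusRow {
  status: string;
  result: unknown;
  updatedAt: string;
}

function toClaimStatus(
  data: ClaimStatusRow | null,
  error: PostgrestLikeError | null,
): ClaimStatusRow | null {
  if (error) {
    // PGRST116: the single-row query matched nothing, so the claim is unknown.
    if (error.code === "PGRST116") return null;
    // Assumption: any other error is a real failure and should surface.
    throw new Error(`Claim status lookup failed: ${error.message}`);
  }
  return data;
}
```

Separating "not found" from real failures lets the status route answer 404 for unknown claims without masking database outages as missing data.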
Step 11: Wire up the BullMQ job queue
The src/services/queue-service.ts creates the Redis-backed BullMQ queue that decouples upload from processing:
ts
import { Queue, QueueEvents, type Job } from "bullmq";
import { Redis } from "ioredis";

const connection = new Redis(process.env.REDIS_URL ?? "redis://127.0.0.1:6379", {
  maxRetriesPerRequest: null,
});

const claimExtractionQueue = new Queue("claim-extraction", { connection });
const queueEvents = new QueueEvents("claim-extraction", { connection });

async function enqueueClaimExtraction(claimId: string, storagePath: string): Promise<Job> {
  return claimExtractionQueue.add("extract", { claimId, storagePath });
}

async function getQueueCounts(): Promise<{
  waiting: number;
  active: number;
  completed: number;
  failed: number;
}> {
  const { waiting = 0, active = 0, completed = 0, failed = 0 } =
    await claimExtractionQueue.getJobCounts();
  return { waiting, active, completed, failed };
}

export { connection, claimExtractionQueue, enqueueClaimExtraction, queueEvents, getQueueCounts };
The connection is an ioredis client pointed at REDIS_URL with maxRetriesPerRequest: null, which BullMQ requires for blocking connections such as those used by workers. The queue name is "claim-extraction". enqueueClaimExtraction adds jobs under the job name "extract". getQueueCounts reports how many claims are waiting, active, completed, or failed.
The worker lives in src/pipeline/worker.ts and consumes from the same queue:
processSingleClaim runs the full pipeline: budget check, PDF ingestion, Gemini extraction, structured output repair, PII redaction, cost telemetry logging, and storage. If repair fails entirely, it uses emptyClaimForm() as a fallback and sets confidence to 0.5 instead of 0.9. processClaimBatch wraps this in a SessionManager-tracked batch, reporting progress after each claim.
Expected output: A fully processed ExtractionResult with redacted, validated claim data persisted to Supabase.
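The repair fallback described for processSingleClaim reduces to a small decision, sketched here as a generic helper. `finalizeRepair`, `RepairOutcome`, and `Finalized` are hypothetical names; the 0.9 and 0.5 confidence values come from the description above, and the factory argument stands in for emptyClaimForm().

```typescript
interface RepairOutcome<T> {
  success: boolean;
  data?: T;
}

interface Finalized<T> {
  claim: T;
  confidence: number;
}

function finalizeRepair<T>(outcome: RepairOutcome<T>, emptyFallback: () => T): Finalized<T> {
  if (outcome.success && outcome.data !== undefined) {
    // Repair produced a schema-valid claim.
    return { claim: outcome.data, confidence: 0.9 };
  }
  // Repair failed entirely: keep the pipeline moving with an empty claim
  // and a lower confidence so the record can be flagged for review.
  return { claim: emptyFallback(), confidence: 0.5 };
}
```

Downstream consumers can then treat a confidence of 0.5 as a signal that the stored claim needs manual rekeying rather than as a usable extraction.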
Step 13: Create the API route handlers
Two Next.js App Router route handlers expose the pipeline. The upload endpoint at app/api/claim-upload/route.ts:
The handler validates that a non-empty file was uploaded, generates a UUID, uploads the PDF to Supabase Storage, and enqueues a BullMQ job. It returns a 202 Accepted response with the claimId so the frontend can poll for results.
The status endpoint at app/api/claim/[id]/status/route.ts:
ts
import { type NextRequest, NextResponse } from "next/server";
import { getClaimStatus } from "../../../../../src/services/storage.js";

export async function GET(
  _req: NextRequest,
  { params }: { params: Promise<{ id: string }> },
): Promise<NextResponse> {
  const { id } = await params;
  const data = await getClaimStatus(id);
  if (!data) {
    return NextResponse.json({ error: "Claim not found" }, { status: 404 });
  }
  return NextResponse.json({
    claimId: id,
    status: data.status,
    result: data.result,
    updatedAt: data.updatedAt,
  });
}
Because the recipe targets Next.js 16, params is a Promise and must be awaited. Both handlers use NextRequest and NextResponse from next/server (not bare Request or new Response) to ensure proper Content-Type headers and access to Next.js extensions.
Expected output: POST /api/claim-upload returns 202 { claimId, status: "queued" } and GET /api/claim/:id/status returns 200 { claimId, status, result, updatedAt } or 404 { error }.
Step 14: Run the tests
The recipe ships with a comprehensive test suite under tests/. Run it with:
terminal
pnpm test
This runs vitest with coverage. The test output includes these suites:
Schemas test validates that a fully-formed claim parses, a missing diagnoses array throws a ZodError, and empty line items with zero total charges are accepted.
Guardrail test calls PIIRedactionGuardrail.execute directly with SSN, DOB, phone, and name patterns and asserts the output contains the four [REDACTED-*] tokens.
Pipeline ingest test mocks every external dependency and verifies that processClaimBatch with two files returns two results, sets confidence to 0.5 on repair failure, and throws "Budget exceeded" when the budget check fails.
Session service test exercises the full SupabaseSessionAdapter lifecycle: create, add messages, list, update, delete, health check.
Storage test covers PDF upload/download, getClaimStatus returning null for missing claims (Supabase error code PGRST116), and extraction upserts.
Add a webhook or another push-style notification so the frontend learns when extraction completes instead of polling /api/claim/:id/status on a timer.
Extend the guardrail chain with additional PHI patterns — medical record numbers (MRN), health plan beneficiary numbers, and device identifiers — by adding new Guardrail implementations.
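Replace the char-based token counter with a real token count (a tokenizer such as cl100k_base, or the Gemini API's token-counting endpoint, which reflects the model actually being billed) for more accurate cost estimation in the telemetry and session services.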