# Google Gemini Medical Claim Extraction for SMB Practices
> Automatically pull patient demographics, diagnosis codes, and billing line items from scanned claim forms and PDFs, with built-in PII redaction and audit.
A tutorial-style reference solution from [reaatech.com](https://reaatech.com), demonstrating how to build production-grade AI systems with the `@reaatech/*` package family.
## Problem
Small practices spend 5-8 hours per week rekeying faxed claim forms; errors cause denials and delayed reimbursement. Manual data entry from scanned PDFs and paper forms is error-prone, expensive, and scales poorly.
## Architecture
The pipeline is orchestrated by a worker on your infrastructure; the Gemini call in step 5 goes to Google's hosted API:
```
POST /api/claim-upload ──→ Supabase Storage ──→ BullMQ Queue
                                                     │
                                         ┌───────────▼───────────┐
                                         │ Worker (concurrent=2) │
                                         │                       │
                                         │    unpdf / sharp +    │
                                         │     Tesseract OCR     │
                                         │           │           │
                                         │   Gemini 2.5 Flash    │
                                         │           │           │
                                         │   structured-repair   │
                                         │   -core validation    │
                                         │           │           │
                                         │    guardrail-chain    │
                                         │     PII redaction     │
                                         │           │           │
                                         │   cost telemetry +    │
                                         │     budget gating     │
                                         │           │           │
                                         └───────────▼───────────┘
                                                 Supabase
                                             claim_extractions
```
1. Client uploads a PDF via `POST /api/claim-upload`
2. File stored in Supabase Storage; job enqueued in BullMQ
3. Worker downloads PDF, attempts text extraction via `unpdf`
4. If extracted text is empty, falls back to OCR (sharp + Tesseract.js)
5. Raw text sent to Gemini 2.5 Flash for structured claim extraction
6. LLM output validated and repaired via `@reaatech/structured-repair-core`
7. PII redacted via `@reaatech/guardrail-chain`
8. Cost recorded via `@reaatech/llm-cost-telemetry`; budget checked via `@reaatech/agent-budget-engine`
9. Result persisted to Supabase `claim_extractions` table
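Steps 3 and 4 can be sketched as a small function with both extractors injected, which keeps the fallback rule easy to test in isolation. This is a minimal sketch, not the project's actual worker code: `TextExtractor`, `needsOcrFallback`, and `extractClaimText` are hypothetical names, and treating whitespace-only output as "empty" is an assumption.

```typescript
// Hypothetical types and names; the real worker may be structured differently.
type TextExtractor = (pdf: Uint8Array) => Promise<string>;

interface ExtractedText {
  text: string;
  source: "pdf-text" | "ocr";
}

// Assumption: whitespace-only output counts as "empty" for step 4.
function needsOcrFallback(extractedText: string): boolean {
  return extractedText.trim().length === 0;
}

// Tries the PDF's embedded text layer first, then falls back to OCR.
// In the real pipeline `extractPdfText` would wrap unpdf and `runOcr`
// would wrap sharp + Tesseract.js; here both are passed in by the caller.
async function extractClaimText(
  pdf: Uint8Array,
  extractPdfText: TextExtractor,
  runOcr: TextExtractor,
): Promise<ExtractedText> {
  const text = await extractPdfText(pdf);
  if (!needsOcrFallback(text)) {
    return { text, source: "pdf-text" };
  }
  return { text: await runOcr(pdf), source: "ocr" };
}
```

Recording which path produced the text (`source`) is optional, but it makes it easier to compare extraction quality between born-digital PDFs and scanned faxes later on.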
## Setup
### Prerequisites
- Node.js >= 22
- Redis server (for BullMQ job queue)
- Supabase project (storage bucket + database table)
- Google Gemini API key (AI Studio or Vertex AI)
### Environment Variables
Configure these in `.env` (see `.env.example`):
| Variable | Description |
|---|---|
| `GEMINI_API_KEY` | Google Gemini API key (AI Studio) |
| `GOOGLE_GENAI_USE_ENTERPRISE` | Set `true` for Vertex AI, `false` for AI Studio |
| `GOOGLE_CLOUD_PROJECT` | GCP project ID (Vertex only) |
| `GOOGLE_CLOUD_LOCATION` | GCP location (default `us-central1`) |
| `LLAMA_CLOUD_API_KEY` | LlamaCloud API key |
| `SUPABASE_URL` | Supabase project URL |
| `SUPABASE_SERVICE_ROLE_KEY` | Supabase service role key |
| `REDIS_URL` | Redis connection string |
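
Failing fast on a missing variable is usually easier to debug than a worker that crashes mid-job. The sketch below shows one way to validate the table above at startup. `AppConfig`, `requireVar`, and `loadConfig` are hypothetical names, and which variables count as required in each mode is an assumption drawn from the descriptions; `LLAMA_CLOUD_API_KEY` is left out because the README does not say whether it is required.

```typescript
// Hypothetical config shape; variable names come from the table above.
type Env = Record<string, string | undefined>;

interface AppConfig {
  gemini:
    | { mode: "ai-studio"; apiKey: string }
    | { mode: "vertex"; project: string; location: string };
  supabaseUrl: string;
  supabaseServiceRoleKey: string;
  redisUrl: string;
}

// Returns the variable's value or throws with the variable name.
function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

function loadConfig(env: Env): AppConfig {
  // Assumption: only the literal string "true" selects Vertex AI.
  const useVertex = env["GOOGLE_GENAI_USE_ENTERPRISE"] === "true";
  return {
    gemini: useVertex
      ? {
          mode: "vertex",
          project: requireVar(env, "GOOGLE_CLOUD_PROJECT"),
          location: env["GOOGLE_CLOUD_LOCATION"] || "us-central1",
        }
      : { mode: "ai-studio", apiKey: requireVar(env, "GEMINI_API_KEY") },
    supabaseUrl: requireVar(env, "SUPABASE_URL"),
    supabaseServiceRoleKey: requireVar(env, "SUPABASE_SERVICE_ROLE_KEY"),
    redisUrl: requireVar(env, "REDIS_URL"),
  };
}
```

In a Node process this would typically be called once at startup as `loadConfig(process.env)`.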
### Install & Run
```bash
pnpm install
pnpm dev # next dev — http://localhost:3000
```
## API
### `POST /api/claim-upload`
Upload a claim PDF for extraction.
- **Method:** POST
- **Content-Type:** `multipart/form-data`
- **Body:** `file` — the PDF file to process
- **Success (202):** `{ "claimId": "<uuid>", "status": "queued" }`
- **Error (400):** `{ "error": "No file uploaded" }`
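
A client call might look like the following sketch, which relies on the `fetch`, `FormData`, and `Blob` globals available in Node.js 22. `UploadResult`, `parseUploadResponse`, and `uploadClaim` are hypothetical names, and the handling of any response other than the 202 and 400 shapes listed above is an assumption.

```typescript
// Response shapes taken from the endpoint description above.
type UploadResult =
  | { ok: true; claimId: string; status: string }
  | { ok: false; error: string };

// Pure helper: maps an HTTP status and parsed JSON body to a typed result.
function parseUploadResponse(httpStatus: number, body: unknown): UploadResult {
  const record = (typeof body === "object" && body !== null ? body : {}) as Record<string, unknown>;
  const claimId = record["claimId"];
  const status = record["status"];
  if (httpStatus === 202 && typeof claimId === "string" && typeof status === "string") {
    return { ok: true, claimId, status };
  }
  const error = record["error"];
  return {
    ok: false,
    error: typeof error === "string" ? error : `Unexpected response (HTTP ${httpStatus})`,
  };
}

// Sends the PDF as multipart/form-data under the `file` field.
async function uploadClaim(baseUrl: string, pdf: Blob, filename: string): Promise<UploadResult> {
  const form = new FormData();
  form.append("file", pdf, filename);
  // No explicit Content-Type header: fetch adds the multipart boundary itself.
  const response = await fetch(`${baseUrl}/api/claim-upload`, { method: "POST", body: form });
  const body: unknown = await response.json().catch(() => null);
  return parseUploadResponse(response.status, body);
}
```

Keeping the response parsing separate from the network call means the status handling can be unit-tested without a running server.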
### `GET /api/claim/:id/status`
Check extraction status for a previously uploaded claim.
- **Success (200):** `{ "claimId": "<uuid>", "status": "completed", "result": { ... }, "updatedAt": "<iso-timestamp>" }`
- **Not Found (404):** `{ "error": "Claim not found" }`
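
Because the upload returns `202` before extraction finishes, clients need to poll this endpoint. The sketch below shows one simple polling loop. `ClaimStatus`, `isTerminalStatus`, and `waitForClaim` are hypothetical names; `queued` and `completed` come from the examples above, while `failed` as a terminal state and the default interval and attempt limit are assumptions.

```typescript
interface ClaimStatus {
  claimId: string;
  status: string;
  result?: unknown;
  updatedAt?: string;
}

// "completed" is documented above; "failed" is an assumed terminal state.
function isTerminalStatus(status: string): boolean {
  return status === "completed" || status === "failed";
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls until the claim reaches a terminal status or the attempt limit is hit.
// `fetchStatus` is injected; it would wrap GET /api/claim/:id/status.
async function waitForClaim(
  fetchStatus: (claimId: string) => Promise<ClaimStatus>,
  claimId: string,
  { intervalMs = 2000, maxAttempts = 30 } = {},
): Promise<ClaimStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const current = await fetchStatus(claimId);
    if (isTerminalStatus(current.status)) {
      return current;
    }
    await sleep(intervalMs);
  }
  throw new Error(`Claim ${claimId} did not finish after ${maxAttempts} polls`);
}
```

A fixed interval keeps the example short; a production client would likely add backoff and treat a `404` as a distinct error rather than retrying.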
## Testing
```bash
pnpm install
pnpm typecheck # TypeScript type checking
pnpm lint # ESLint
pnpm test # vitest run with coverage (≥90%)
```
## License
MIT — see [LICENSE](./LICENSE).