# Google Gemini Medical Claim Extraction for SMB Practices
 
> Automatically pull patient demographics, diagnosis codes, and billing line items from scanned claim forms and PDFs, with built-in PII redaction and audit.
 
A tutorialized reference solution from [reaatech.com](https://reaatech.com), demonstrating how to build production-grade AI systems with the `@reaatech/*` package family.
 
## Problem
 
Small practices spend 5-8 hours per week rekeying faxed claim forms; errors cause denials and delayed reimbursement. Manual data entry from scanned PDFs and paper forms is error-prone, expensive, and scales poorly.
 
## Architecture
 
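The API route and worker run on your own infrastructure; the structured-extraction step calls the hosted Gemini API, and files and results are stored in your Supabase project: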
 
```
POST /api/claim-upload ──→ Supabase Storage ──→ BullMQ Queue
                                                    │
                                         ┌──────────▼──────────┐
                                         │  Worker             │
                                         │  concurrency = 2    │
                                         │                     │
                                         │  unpdf / sharp +    │
                                         │  Tesseract OCR      │
                                         │          │          │
                                         │  Gemini 2.5 Flash   │
                                         │          │          │
                                         │  structured-repair  │
                                         │  -core validation   │
                                         │          │          │
                                         │  guardrail-chain    │
                                         │  PII redaction      │
                                         │          │          │
                                         │  cost telemetry +   │
                                         │  budget gating      │
                                         │          │          │
                                         └──────────▼──────────┘
                                                Supabase
                                            claim_extractions
```
 
1. Client uploads a PDF via `POST /api/claim-upload`
2. File stored in Supabase Storage; job enqueued in BullMQ
3. Worker downloads PDF, attempts text extraction via `unpdf`
4. If the extracted text is empty (as with an image-only scan), the worker falls back to OCR (sharp + Tesseract.js)
5. Raw text sent to Gemini 2.5 Flash for structured claim extraction
6. LLM output validated and repaired via `@reaatech/structured-repair-core`
7. PII redacted via `@reaatech/guardrail-chain`
8. Cost recorded via `@reaatech/llm-cost-telemetry`; budget checked via `@reaatech/agent-budget-engine`
9. Result persisted to Supabase `claim_extractions` table
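
The worker steps above can be sketched as one function over injected stages. This is a minimal sketch, not the project's worker: `PipelineStages`, its method names, and `processClaim` are hypothetical stand-ins for the real `unpdf`, sharp + Tesseract.js, Gemini, and `@reaatech/*` calls, whose APIs are not shown in this README. What to do when the budget check fails is also not documented, so the sketch only reports it.

```typescript
// Hypothetical stage interface; none of these method names come from
// unpdf, Tesseract.js, the Gemini SDK, or the @reaatech/* packages.
type Claim = Record<string, unknown>;

interface PipelineStages {
  extractPdfText(pdf: Uint8Array): Promise<string>;
  ocr(pdf: Uint8Array): Promise<string>;
  extractWithLlm(text: string): Promise<{ raw: string; costUsd: number }>;
  validateAndRepair(raw: string): Claim;
  redactPii(claim: Claim): Claim;
  recordCost(claimId: string, costUsd: number): void;
  isWithinBudget(): boolean;
  persist(claimId: string, result: Claim): Promise<void>;
}

interface ProcessResult {
  textSource: "pdf-text" | "ocr";
  withinBudget: boolean;
  result: Claim;
}

async function processClaim(
  claimId: string,
  pdf: Uint8Array,
  stages: PipelineStages,
): Promise<ProcessResult> {
  // Step 3: use the PDF's embedded text layer when it has one.
  let text = await stages.extractPdfText(pdf);
  let textSource: ProcessResult["textSource"] = "pdf-text";

  // Step 4: image-only scans yield no text, so fall back to OCR.
  // Treating whitespace-only output as empty is an assumption of this sketch.
  if (text.trim().length === 0) {
    text = await stages.ocr(pdf);
    textSource = "ocr";
  }

  // Step 5: structured extraction by the LLM.
  const { raw, costUsd } = await stages.extractWithLlm(text);

  // Step 6: validate and repair the model output into a claim object.
  const claim = stages.validateAndRepair(raw);

  // Step 7: redact PII so only the redacted object is persisted.
  const redacted = stages.redactPii(claim);

  // Step 8: record spend, then check the budget (result is only reported here).
  stages.recordCost(claimId, costUsd);
  const withinBudget = stages.isWithinBudget();

  // Step 9: persist the redacted result.
  await stages.persist(claimId, redacted);

  return { textSource, withinBudget, result: redacted };
}
```

Keeping each stage behind an interface lets a unit test check the ordering (OCR only for empty text layers, redaction before persistence) without Redis, Supabase, or a Gemini key.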
 
## Setup
 
### Prerequisites
 
- Node.js >= 22
- Redis server (for BullMQ job queue)
- Supabase project (storage bucket + database table)
- Google Gemini access: an AI Studio API key, or a GCP project with Vertex AI enabled
 
### Environment Variables
 
Configure these in `.env` (see `.env.example`):
 
| Variable | Description |
|---|---|
| `GEMINI_API_KEY` | Google Gemini API key (AI Studio) |
| `GOOGLE_GENAI_USE_ENTERPRISE` | Set `true` for Vertex AI, `false` for AI Studio |
| `GOOGLE_CLOUD_PROJECT` | GCP project ID (Vertex only) |
| `GOOGLE_CLOUD_LOCATION` | GCP region (Vertex only; default `us-central1`) |
| `LLAMA_CLOUD_API_KEY` | LlamaCloud API key |
| `SUPABASE_URL` | Supabase project URL |
| `SUPABASE_SERVICE_ROLE_KEY` | Supabase service role key (server-side only; bypasses Row Level Security) |
| `REDIS_URL` | Redis connection string (e.g. `redis://localhost:6379`) |
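
A loader that validates these variables once at startup might look like the following sketch. The variable names come from the table; `loadConfig`, the case-insensitive `"true"` check, and the `us-central1` fallback are assumptions about how the app reads them. `LLAMA_CLOUD_API_KEY` is left out because this README does not say which step uses it.

```typescript
// Hypothetical startup config loader. Variable names come from the README table;
// the parsing rules are assumptions of this sketch.
type Env = Record<string, string | undefined>;

type GeminiConfig =
  | { mode: "ai-studio"; apiKey: string }
  | { mode: "vertex"; project: string; location: string };

interface AppConfig {
  gemini: GeminiConfig;
  supabaseUrl: string;
  supabaseServiceRoleKey: string;
  redisUrl: string;
}

// Returns the trimmed value, or fails fast with the variable's name.
function required(env: Env, name: string): string {
  const value = env[name]?.trim();
  if (!value) throw new Error(`Missing required environment variable ${name}`);
  return value;
}

function loadConfig(env: Env): AppConfig {
  // "true" (any case) selects Vertex AI; anything else, or unset, selects AI Studio.
  const useVertex = env["GOOGLE_GENAI_USE_ENTERPRISE"]?.trim().toLowerCase() === "true";

  const gemini: GeminiConfig = useVertex
    ? {
        mode: "vertex",
        project: required(env, "GOOGLE_CLOUD_PROJECT"),
        location: env["GOOGLE_CLOUD_LOCATION"]?.trim() || "us-central1",
      }
    : { mode: "ai-studio", apiKey: required(env, "GEMINI_API_KEY") };

  return {
    gemini,
    supabaseUrl: required(env, "SUPABASE_URL"),
    supabaseServiceRoleKey: required(env, "SUPABASE_SERVICE_ROLE_KEY"),
    redisUrl: required(env, "REDIS_URL"),
  };
}
```

Calling `loadConfig(process.env)` when the app and worker boot means a missing key fails immediately rather than on the first queued job.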
 
### Install & Run
 
```bash
pnpm install
pnpm dev             # next dev — http://localhost:3000
```
 
## API
 
### `POST /api/claim-upload`
 
Upload a claim PDF for extraction.
 
- **Method:** POST
- **Content-Type:** `multipart/form-data`
- **Body:** `file` — the PDF file to process
- **Success (202):** `{ "claimId": "<uuid>", "status": "queued" }`
- **Error (400):** `{ "error": "No file uploaded" }`
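
A minimal TypeScript client for this endpoint, assuming the global `fetch`, `FormData`, and `Blob` available in Node.js 18+ and browsers. `uploadClaim` and its injectable `fetchFn` parameter are illustrative and not part of the project; the `file` field name and the 202/400 response shapes are taken from the description above.

```typescript
// Hypothetical client for POST /api/claim-upload (not part of the project).
type FetchLike = (input: string, init?: RequestInit) => Promise<Response>;

interface UploadAccepted {
  claimId: string;
  status: "queued";
}

async function uploadClaim(
  baseUrl: string,
  pdf: Blob,
  filename: string,
  fetchFn: FetchLike = fetch, // injectable so the client can be tested without a server
): Promise<UploadAccepted> {
  const form = new FormData();
  // The endpoint reads the PDF from the multipart field named "file".
  form.append("file", pdf, filename);

  // No explicit Content-Type: fetch sets multipart/form-data with the boundary.
  const res = await fetchFn(`${baseUrl}/api/claim-upload`, { method: "POST", body: form });
  const body = (await res.json()) as Partial<UploadAccepted> & { error?: string };

  if (res.status !== 202 || typeof body.claimId !== "string") {
    throw new Error(`Upload failed (${res.status}): ${body.error ?? "unexpected response"}`);
  }
  return { claimId: body.claimId, status: "queued" };
}
```

In a browser, a `File` from an `<input type="file">` can be passed as `pdf` directly, since `File` extends `Blob`.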
 
### `GET /api/claim/:id/status`
 
Check extraction status for a previously uploaded claim.
 
- **Success (200):** `{ "claimId": "<uuid>", "status": "completed", "result": { ... }, "updatedAt": "<iso-timestamp>" }`
- **Not Found (404):** `{ "error": "Claim not found" }`
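
Because extraction runs asynchronously in the worker, a client typically polls this endpoint after uploading. The helper below is a hypothetical sketch (`waitForClaim` and its options are not part of the project). Only `queued` and `completed` appear in this README, so by default it stops on `completed`, and the `terminal` option lets you add other final states your deployment may report, such as a failure status.

```typescript
// Hypothetical polling helper for GET /api/claim/:id/status (not part of the project).
type FetchLike = (input: string, init?: RequestInit) => Promise<Response>;

interface ClaimStatus {
  claimId: string;
  status: string;
  result?: unknown;
  updatedAt?: string;
}

interface PollOptions {
  intervalMs?: number; // delay between polls
  maxAttempts?: number; // give up after this many requests
  terminal?: string[]; // statuses that end polling; only "completed" is documented
  fetchFn?: FetchLike; // injectable for tests
}

async function waitForClaim(
  baseUrl: string,
  claimId: string,
  { intervalMs = 2000, maxAttempts = 30, terminal = ["completed"], fetchFn = fetch }: PollOptions = {},
): Promise<ClaimStatus> {
  const url = `${baseUrl}/api/claim/${encodeURIComponent(claimId)}/status`;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const res = await fetchFn(url);
    if (res.status === 404) throw new Error(`Claim ${claimId} not found`);
    if (!res.ok) throw new Error(`Status check failed with HTTP ${res.status}`);

    const body = (await res.json()) as ClaimStatus;
    if (terminal.includes(body.status)) return body;

    // Not finished yet: wait before the next request (skipped after the last one).
    if (attempt < maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`Claim ${claimId} not finished after ${maxAttempts} attempts`);
}
```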
 
## Testing
 
```bash
pnpm install
pnpm typecheck        # TypeScript type checking
pnpm lint             # ESLint
pnpm test             # vitest run with coverage (≥90%)
```
 
## License
 
MIT — see [LICENSE](./LICENSE).