# Google Gemini Bank Statement Extraction for SMB Accounting
Upload scanned bank statements and receipts, automatically extract line-item transactions with Gemini, and output categorized accounting entries ready for QuickBooks or Xero.
## Architecture
1. **Upload**: `POST /api/extract` accepts multipart form data (PDF or image)
2. **Extract**: `unpdf` extracts text from PDF pages; `sharp` pre-processes images
3. **Pipeline**: `@reaatech/media-pipeline-mcp-core` orchestrates the extraction steps
4. **Gemini**: `@google/genai` calls Gemini 2.5 Flash with structured extraction prompts
5. **Repair**: `@reaatech/structured-repair-core` fixes JSON formatting errors in the model output
6. **Cache**: `@reaatech/llm-cache` avoids reprocessing identical documents
7. **Telemetry**: `@reaatech/llm-cost-telemetry` tracks token usage and cost per tenant
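
The steps above can be read as one control flow. The sketch below is a minimal, dependency-free illustration of that ordering and is not the API of any `@reaatech/*` package: `runExtraction`, `Stages`, and the in-memory `Map` cache are hypothetical names, and each stage is injected as a plain function so that the real libraries could be plugged in.

```typescript
import { createHash } from "node:crypto";

// Hypothetical stage signatures; the real packages expose their own APIs.
interface Stages {
  extractText: (file: Uint8Array) => Promise<string>; // steps 2-3: unpdf / sharp
  callModel: (text: string) => Promise<string>; // step 4: Gemini request
  repairJson: (raw: string) => unknown; // step 5: JSON repair
  recordCost: (tenantId: string, cached: boolean) => void; // step 7: telemetry
}

interface ExtractionResult {
  data: unknown;
  cached: boolean;
}

// Exact-match cache keyed by the SHA-256 of the uploaded bytes (step 6).
const cache = new Map<string, unknown>();

async function runExtraction(
  file: Uint8Array,
  tenantId: string,
  stages: Stages,
): Promise<ExtractionResult> {
  const key = createHash("sha256").update(file).digest("hex");

  // Identical bytes were processed before: skip extraction and the model call.
  if (cache.has(key)) {
    stages.recordCost(tenantId, true);
    return { data: cache.get(key), cached: true };
  }

  const text = await stages.extractText(file);
  const raw = await stages.callModel(text);
  const data = stages.repairJson(raw);

  cache.set(key, data);
  stages.recordCost(tenantId, false);
  return { data, cached: false };
}
```

Hashing the raw bytes means only byte-identical uploads hit the cache. Whether the key should also include the tenant is a design decision this README does not state.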
## API
`POST /api/extract` accepts a bank statement (PDF or image) as multipart form data and returns the extracted transactions.
```bash
curl -X POST http://localhost:3000/api/extract \
  -F "file=@statement.pdf" \
  -F "tenantId=acme-corp"
```
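
The same request could be issued from TypeScript roughly as follows. This is a sketch that assumes a runtime with global `fetch`, `FormData`, and `Blob` (for example Node 18 or later); `buildExtractForm` and `uploadStatement` are hypothetical helper names, and the base URL is taken from the curl example.

```typescript
// Builds the same multipart body as the curl example:
// a `file` part plus a `tenantId` text field.
function buildExtractForm(file: Blob, filename: string, tenantId: string): FormData {
  const form = new FormData();
  form.append("file", file, filename);
  form.append("tenantId", tenantId);
  return form;
}

async function uploadStatement(
  form: FormData,
  baseUrl = "http://localhost:3000",
): Promise<unknown> {
  // fetch derives the multipart Content-Type (including the boundary) from the
  // FormData body, so the header is not set manually here.
  const res = await fetch(`${baseUrl}/api/extract`, { method: "POST", body: form });
  if (!res.ok) {
    throw new Error(`Extraction failed with HTTP ${res.status}`);
  }
  return res.json();
}
```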
Response:
```json
{
  "transactions": [
    {
      "id": "tx-1",
      "date": "2024-01-15",
      "description": "Office supplies",
      "debit": 50.0,
      "credit": null,
      "balance": 1000.0,
      "memo": "Paid via check",
      "category": "office"
    }
  ],
  "totalDebits": 50.0,
  "totalCredits": 0,
  "pageCount": 1,
  "costUsd": 0.00015,
  "cached": false
}
```
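
For typed clients, the response shape could be modelled along these lines. The field names come from the example above; everything else is an assumption made for illustration: that `debit` is nullable in the same way as `credit`, that amounts use two decimal places, and that `totalDebits` and `totalCredits` are the sums of the line items. `sumTotals` and `totalsAreConsistent` are hypothetical helpers.

```typescript
interface Transaction {
  id: string;
  date: string; // calendar date, e.g. "2024-01-15"
  description: string;
  debit: number | null; // assumed nullable by symmetry with `credit`
  credit: number | null;
  balance: number;
  memo: string;
  category: string;
}

interface ExtractResponse {
  transactions: Transaction[];
  totalDebits: number;
  totalCredits: number;
  pageCount: number;
  costUsd: number;
  cached: boolean;
}

// Converts an amount to integer cents; null counts as zero.
const toCents = (amount: number | null): number => Math.round((amount ?? 0) * 100);

// Sums in integer cents so that values like 0.1 + 0.2 do not accumulate float error.
function sumTotals(transactions: Transaction[]): { totalDebits: number; totalCredits: number } {
  let debitCents = 0;
  let creditCents = 0;
  for (const tx of transactions) {
    debitCents += toCents(tx.debit);
    creditCents += toCents(tx.credit);
  }
  return { totalDebits: debitCents / 100, totalCredits: creditCents / 100 };
}

// True when the reported totals match the line items, compared in cents.
function totalsAreConsistent(res: ExtractResponse): boolean {
  const sums = sumTotals(res.transactions);
  return (
    toCents(sums.totalDebits) === toCents(res.totalDebits) &&
    toCents(sums.totalCredits) === toCents(res.totalCredits)
  );
}
```

A check like this can flag extractions where the model dropped or duplicated a line item before the entries reach the accounting system.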
## Packages
| Package | Role |
|---------|------|
| @reaatech/media-pipeline-mcp-core | Pipeline orchestration engine |
| @reaatech/media-pipeline-mcp-doc-extraction | Document extraction operations |
| @reaatech/structured-repair-core | Zod-schema-driven JSON repair |
| @reaatech/llm-cache | Exact-match cache for LLM responses |
| @reaatech/llm-cost-telemetry | Cost tracking and telemetry types |
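
To give a sense of what a JSON repair step deals with, here is a small, dependency-free sketch. It does not use or reproduce the `@reaatech/structured-repair-core` API, and it replaces Zod with a hand-written type guard to stay self-contained; `repairAndParse` and `hasTransactions` are hypothetical names. It handles only two common model-output problems: a Markdown code fence around the JSON and trailing commas.

```typescript
// Three backticks, built at runtime so this example nests cleanly inside a Markdown fence.
const FENCE = "`".repeat(3);
const FENCED = new RegExp("^" + FENCE + "(?:json)?\\s*([\\s\\S]*?)\\s*" + FENCE + "$");

function repairAndParse(raw: string): unknown {
  let text = raw.trim();

  // Unwrap a surrounding code fence if the model added one.
  const fenced = FENCED.exec(text);
  if (fenced) text = fenced[1] ?? text;

  // Drop trailing commas before a closing brace or bracket.
  // Caveat: this naive regex would also alter such sequences inside string values.
  text = text.replace(/,\s*([}\]])/g, "$1");

  // Anything still malformed surfaces as a SyntaxError from JSON.parse.
  return JSON.parse(text);
}

// Minimal structural check standing in for schema validation.
function hasTransactions(value: unknown): value is { transactions: unknown[] } {
  return (
    typeof value === "object" &&
    value !== null &&
    Array.isArray((value as { transactions?: unknown }).transactions)
  );
}
```

A schema-driven approach goes further than this by validating every field and reporting which part of the output failed.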
## Running locally
```bash
pnpm install
pnpm dev
```
## Environment variables
See `.env.example`. `GEMINI_API_KEY` is required.
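
One way to fail fast when the key is missing is a small startup check. This is a sketch only; `requireEnv` is a hypothetical helper and not part of this project.

```typescript
// Returns the variable's value, or throws if it is unset or blank.
function requireEnv(name: string, env: Record<string, string | undefined>): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage at startup (Node): const apiKey = requireEnv("GEMINI_API_KEY", process.env);
```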
## License
MIT