# Automated Receipt Classifier for Small CPA Firms
> Eliminate manual receipt categorization with an AI agent that extracts vendor, amount, and GL category from client uploads. CPA bookkeepers spend 10+ hours per week manually sorting receipts — this agent automates extraction and GL classification end-to-end.
A tutorialized reference solution from [reaatech.com](https://reaatech.com), demonstrating how to build production-grade AI systems with the `@reaatech/*` package family.
---
## Architecture
```
Upload (PDF/image) → OCR/Extraction → LLM Classification → Guardrail Chain → Spend Tracking → Eval Trajectories
```
1. **Upload** — Receipt file (PDF or image) submitted via Next.js App Router or Fastify endpoint.
2. **OCR/Extraction** — Text extracted via `unpdf` (PDF) or `tesseract.js` (image). Source type detected from magic bytes.
3. **LLM Classification** — Vercel AI SDK (`generateObject`) with OpenAI extracts vendor, amount, and GL category.
4. **Guardrail Chain** — `@reaatech/guardrail-chain` validates amount sanity, GL category membership, and vendor presence.
5. **Spend Tracking** — `@reaatech/agent-budget-spend-tracker` records per-user/per-client LLM costs and detects spikes.
6. **Eval Trajectories** — `@reaatech/agent-eval-harness-golden` stores golden trajectories for regression detection and curation.
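Step 2 routes each upload by its magic bytes rather than by its file extension. Below is a minimal sketch of that detection. It assumes PNG and JPEG are the accepted image formats, because this README does not say which image types are supported. `detectSourceType` and `extractorFor` are hypothetical names, not functions exported by this project.

```typescript
// Hypothetical helper names for illustration; not this project's actual exports.
type SourceType = "pdf" | "png" | "jpeg" | "unknown";

// Leading byte signatures ("magic bytes") for the formats assumed to be accepted.
const SIGNATURES: ReadonlyArray<{ type: Exclude<SourceType, "unknown">; bytes: readonly number[] }> = [
  { type: "pdf", bytes: [0x25, 0x50, 0x44, 0x46, 0x2d] }, // "%PDF-"
  { type: "png", bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] }, // "\x89PNG\r\n\x1a\n"
  { type: "jpeg", bytes: [0xff, 0xd8, 0xff] }, // SOI marker followed by the next marker's 0xFF
];

// Compare the first bytes of the upload against each known signature.
function detectSourceType(data: Uint8Array): SourceType {
  for (const sig of SIGNATURES) {
    if (data.length >= sig.bytes.length && sig.bytes.every((b, i) => data[i] === b)) {
      return sig.type;
    }
  }
  return "unknown";
}

// Choose the extractor named in the Architecture section for a detected type.
function extractorFor(type: SourceType): "unpdf" | "tesseract.js" | null {
  if (type === "pdf") return "unpdf";
  if (type === "png" || type === "jpeg") return "tesseract.js";
  return null;
}
```

The function inspects only content bytes. A JPEG renamed to `.pdf` is therefore still routed to OCR.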
---
## Tech Stack
| Layer | Technology |
|---|---|
| Framework | Next.js 16 (App Router) |
| API Server | Fastify (optional, toggled via `ENABLE_FASTIFY`) |
| AI SDK | `ai` + `@ai-sdk/openai` |
| PDF Extraction | `unpdf` |
| Image OCR | `tesseract.js` |
| Validation | `zod` |
| Observability | `langfuse` |
## REAA Packages
| Package | Version | Role |
|---|---|---|
| `@reaatech/agents-markdown` | 1.0.1 | Shared types (`ValidationResult`, `Finding`) and the `randomId` utility |
| `@reaatech/agent-mesh` | 1.0.0 | Context packets (`ContextPacket`, `IncomingRequest`, `AgentResponse`) |
| `@reaatech/llm-router-core` | 1.0.0 | `CostTelemetry` type for LLM cost accounting |
| `@reaatech/agent-eval-harness-golden` | 0.1.0 | Golden trajectory creation, comparison, curation |
| `@reaatech/guardrail-chain` | 0.1.0 | Guardrail composition, execution, latency management |
| `@reaatech/agent-budget-spend-tracker` | 0.1.1 | Per-user/per-client spend recording and spike detection |
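The three checks from step 4 of the Architecture section can be illustrated without the package. The sketch below is a plain TypeScript stand-in and does not use the `@reaatech/guardrail-chain` API. `Classification`, `runGuardrails`, the sample GL categories, and the `MAX_AMOUNT` ceiling are all hypothetical.

```typescript
// Hypothetical shapes; the real @reaatech/guardrail-chain types are not reproduced here.
interface Classification {
  vendor: string;
  amount: number;
  glCategory: string;
}

interface CheckResult {
  name: string;
  passed: boolean;
  message?: string; // set only when the check fails
}

type Check = (c: Classification) => CheckResult;

// Assumed chart of accounts; a real firm would load its own GL categories.
const GL_CATEGORIES = new Set(["Meals", "Travel", "Office Supplies", "Software"]);
// Assumed upper bound for a single receipt amount.
const MAX_AMOUNT = 10000;

function check(name: string, passed: boolean, message: string): CheckResult {
  return passed ? { name, passed } : { name, passed, message };
}

// Amount sanity, GL category membership, and vendor presence, in that order.
const CHECKS: Check[] = [
  (c) =>
    check(
      "amount-sanity",
      Number.isFinite(c.amount) && c.amount > 0 && c.amount <= MAX_AMOUNT,
      `amount ${c.amount} is outside (0, ${MAX_AMOUNT}]`,
    ),
  (c) => check("gl-category", GL_CATEGORIES.has(c.glCategory), `unknown GL category "${c.glCategory}"`),
  (c) => check("vendor-present", c.vendor.trim().length > 0, "vendor is empty"),
];

// Run every check and collect all failures instead of stopping at the first one.
function runGuardrails(c: Classification): { ok: boolean; failures: CheckResult[] } {
  const failures = CHECKS.map((fn) => fn(c)).filter((r) => !r.passed);
  return { ok: failures.length === 0, failures };
}
```

Collecting every failure reports all problems with a receipt at once. A latency-sensitive chain might instead short-circuit on the first failure.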
---
## Quick Start
```bash
pnpm install
cp .env.example .env
# Fill in OPENAI_API_KEY and other values in .env
pnpm dev
```
Open [http://localhost:3000](http://localhost:3000) — the upload form lets you submit a receipt PDF or image and see the classification result.
---
## API Reference
### Next.js App Router Endpoints
| Method | Path | Description | Status Codes |
|---|---|---|---|
| POST | `/api/upload` | Upload receipt file (multipart) | 200, 400, 413, 422 |
| POST | `/api/classify` | Classify receipt text (JSON body `{ text: string }`) | 200, 400, 500 |
| GET | `/api/health` | Health check | 200 |
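Below is a minimal client for `/api/classify`. It assumes the endpoint accepts the JSON body shown in the table and returns the extracted fields as JSON. The response field names (`vendor`, `amount`, `glCategory`) and the `classifyReceipt` helper are hypothetical. The fetch function is injected so the sketch can run against a stub.

```typescript
// Assumed response shape; this README does not document the response body.
interface ClassifyResponse {
  vendor?: string;
  amount?: number;
  glCategory?: string;
}

// Minimal fetch-compatible signature; the global fetch satisfies it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ status: number; json(): Promise<unknown> }>;

// Hypothetical helper: POST receipt text and return the parsed classification.
async function classifyReceipt(
  baseUrl: string,
  text: string,
  fetchImpl: FetchLike,
): Promise<ClassifyResponse> {
  const res = await fetchImpl(`${baseUrl}/api/classify`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
  // The table lists 400 and 500 as error statuses; treat anything but 200 as failure.
  if (res.status !== 200) {
    throw new Error(`classify failed with HTTP ${res.status}`);
  }
  return (await res.json()) as ClassifyResponse;
}
```

For a real request, pass the global `fetch` (Node 18+ or a browser), for example `classifyReceipt("http://localhost:3000", receiptText, fetch)`.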
### Fastify Endpoints (when `ENABLE_FASTIFY=true`)
| Method | Path | Description | Status Codes |
|---|---|---|---|
| POST | `/api/receipts/upload` | Upload receipt (multipart) | 200, 400, 413, 422 |
| POST | `/api/receipts/classify` | Classify receipt text | 200, 400, 500 |
| GET | `/api/receipts/spend/:userId` | Get total recorded LLM spend for a user | 200 |
| GET | `/api/health` | Health check | 200 |
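The spend endpoint and the spike detection attributed to `@reaatech/agent-budget-spend-tracker` can be approximated with an in-memory ledger. This hypothetical sketch does not reproduce the package's API. `SpendLedger` is an illustrative name. The spike rule flags the latest cost when it exceeds `factor` times the mean of the earlier costs; this is an assumed heuristic, not the package's documented behavior.

```typescript
// Hypothetical record shape: one LLM call's cost, tagged by user and client.
interface SpendRecord {
  userId: string;
  clientId: string;
  costUsd: number;
}

class SpendLedger {
  private readonly records: SpendRecord[] = [];

  record(r: SpendRecord): void {
    this.records.push(r);
  }

  // Sum of a user's recorded costs, the kind of value the spend endpoint returns.
  totalForUser(userId: string): number {
    return this.records.filter((r) => r.userId === userId).reduce((sum, r) => sum + r.costUsd, 0);
  }

  // Same aggregation keyed by client, for per-client reporting.
  totalForClient(clientId: string): number {
    return this.records.filter((r) => r.clientId === clientId).reduce((sum, r) => sum + r.costUsd, 0);
  }

  // Assumed rule: the latest cost is a spike if it exceeds factor * mean(earlier costs).
  isSpike(userId: string, factor = 3): boolean {
    const costs = this.records.filter((r) => r.userId === userId).map((r) => r.costUsd);
    const latest = costs[costs.length - 1];
    if (latest === undefined || costs.length < 2) return false; // not enough history
    const history = costs.slice(0, -1);
    const mean = history.reduce((a, b) => a + b, 0) / history.length;
    return latest > factor * mean;
  }
}
```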
---
## Testing
```bash
pnpm test # vitest run with coverage (target >= 90% on all metrics)
```
All external calls are mocked, so no live network access is required: `vi.mock` handles module mocking and MSW handles HTTP mocking. Test files live under `tests/`, mirroring the `src/` structure.
---
## Environment Variables
| Variable | Default | Description |
|---|---|---|
| `OPENAI_API_KEY` | — | OpenAI API key for classification |
| `ANTHROPIC_API_KEY` | — | Anthropic API key (reserved) |
| `GOOGLE_GENERATIVE_AI_API_KEY` | — | Google AI API key (reserved) |
| `RECEIPT_CLASSIFIER_MODEL` | `gpt-5.2-mini` | Model ID passed to AI SDK |
| `LANGFUSE_PUBLIC_KEY` | — | Langfuse observability public key |
| `LANGFUSE_SECRET_KEY` | — | Langfuse observability secret key |
| `ENABLE_FASTIFY` | `false` | Set to `true` to run the Fastify server alongside Next.js |
| `FASTIFY_PORT` | `3001` | Port for Fastify server |
| `NODE_ENV` | `development` | Node environment |
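Below is a typed loader for the variables in the table, applying the defaults listed there. This is a hypothetical sketch. The `AppConfig` field names are illustrative. Treating a missing `OPENAI_API_KEY` as a startup error is an assumption based on the key's role in classification. The loader takes the environment as a parameter, so it can be tested without touching `process.env`.

```typescript
// Hypothetical config shape; field names are illustrative.
interface AppConfig {
  openaiApiKey: string;
  model: string;
  enableFastify: boolean;
  fastifyPort: number;
  nodeEnv: string;
}

type Env = Record<string, string | undefined>;

function loadConfig(env: Env): AppConfig {
  // Assumption: fail fast, since classification cannot run without the key.
  const openaiApiKey = env["OPENAI_API_KEY"];
  if (!openaiApiKey) {
    throw new Error("OPENAI_API_KEY is required for classification");
  }
  // Default port 3001, validated as an integer in the TCP port range.
  const fastifyPort = Number(env["FASTIFY_PORT"] ?? "3001");
  if (!Number.isInteger(fastifyPort) || fastifyPort <= 0 || fastifyPort > 65535) {
    throw new Error(`FASTIFY_PORT must be a valid port, got "${env["FASTIFY_PORT"]}"`);
  }
  return {
    openaiApiKey,
    model: env["RECEIPT_CLASSIFIER_MODEL"] ?? "gpt-5.2-mini",
    enableFastify: env["ENABLE_FASTIFY"] === "true", // any other value keeps it off
    fastifyPort,
    nodeEnv: env["NODE_ENV"] ?? "development",
  };
}
```

At startup, this would be called as `loadConfig(process.env)`.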
---
## License
MIT — see [LICENSE](./LICENSE).