# Automated Receipt Classifier for Small CPA Firms
 
> Eliminate manual receipt categorization with an AI agent that extracts vendor, amount, and GL category from client uploads. CPA bookkeepers spend 10+ hours per week manually sorting receipts — this agent automates extraction and GL classification end-to-end.
 
A tutorial-style reference solution from [reaatech.com](https://reaatech.com) that demonstrates how to build production-grade AI systems with the `@reaatech/*` package family.
 
---
 
## Architecture
 
```
Upload (PDF/image) → OCR/Extraction → LLM Classification → Guardrail Chain → Spend Tracking → Eval Trajectories
```
 
1. **Upload**: Receipt file (PDF or image) submitted via Next.js App Router or Fastify endpoint.
2. **OCR/Extraction**: Text extracted via `unpdf` (PDF) or `tesseract.js` (image). Source type detected from magic bytes.
3. **LLM Classification**: Vercel AI SDK (`generateObject`) with OpenAI extracts vendor, amount, and GL category.
4. **Guardrail Chain**: `@reaatech/guardrail-chain` validates amount sanity, GL category membership, and vendor presence.
5. **Spend Tracking**: `@reaatech/agent-budget-spend-tracker` records per-user/per-client LLM costs and detects spikes.
6. **Eval Trajectories**: `@reaatech/agent-eval-harness-golden` stores golden trajectories for regression detection and curation.
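
Step 2 routes each upload by its magic bytes rather than by file extension. The following is a minimal sketch of that idea, not the repository's implementation: `detectSourceType` and `pickExtractor` are hypothetical names, and the choice of PNG and JPEG as the supported image formats is an assumption.

```typescript
// Hypothetical helper names; only the PDF/image split comes from the README.
type SourceType = "pdf" | "png" | "jpeg" | "unknown";

// Well-known file signatures found at byte offset 0.
const SIGNATURES: ReadonlyArray<{ type: SourceType; bytes: readonly number[] }> = [
  { type: "pdf", bytes: [0x25, 0x50, 0x44, 0x46] }, // "%PDF"
  { type: "png", bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { type: "jpeg", bytes: [0xff, 0xd8, 0xff] },
];

// Compare the leading bytes of the upload against each known signature.
function detectSourceType(data: Uint8Array): SourceType {
  for (const { type, bytes } of SIGNATURES) {
    if (data.length >= bytes.length && bytes.every((b, i) => data[i] === b)) {
      return type;
    }
  }
  return "unknown";
}

// Map the detected type to the extractor named in the architecture list.
function pickExtractor(data: Uint8Array): "unpdf" | "tesseract.js" | null {
  const type = detectSourceType(data);
  if (type === "pdf") return "unpdf";
  if (type === "png" || type === "jpeg") return "tesseract.js";
  return null; // Unrecognized input; a caller would likely reject the upload.
}
```

Checking the content itself means a file renamed from `.exe` to `.pdf` is not handed to the PDF extractor.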
 
---
 
## Tech Stack
 
| Layer | Technology |
|---|---|
| Framework | Next.js 16 (App Router) |
| API Server | Fastify (optional, toggled via `ENABLE_FASTIFY`) |
| AI SDK | `ai` + `@ai-sdk/openai` |
| PDF Extraction | `unpdf` |
| Image OCR | `tesseract.js` |
| Validation | `zod` |
| Observability | `langfuse` |
 
## REAA Packages
 
| Package | Version | Role |
|---|---|---|
| `@reaatech/agents-markdown` | 1.0.1 | Shared types (`ValidationResult`, `Finding`), `randomId` |
| `@reaatech/agent-mesh` | 1.0.0 | Context packets (`ContextPacket`, `IncomingRequest`, `AgentResponse`) |
| `@reaatech/llm-router-core` | 1.0.0 | `CostTelemetry` type for LLM cost accounting |
| `@reaatech/agent-eval-harness-golden` | 0.1.0 | Golden trajectory creation, comparison, curation |
| `@reaatech/guardrail-chain` | 0.1.0 | Guardrail composition, execution, latency management |
| `@reaatech/agent-budget-spend-tracker` | 0.1.1 | Per-user/per-client spend recording and spike detection |
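
To make the guardrail idea concrete, the three checks named in the architecture (amount sanity, GL category membership, vendor presence) can be sketched as plain functions composed into a chain. This sketch deliberately does not use the `@reaatech/guardrail-chain` API, which is not documented in this README; every type, the category list, and the amount ceiling below are assumptions for illustration.

```typescript
// Plain-TypeScript sketch; it does NOT use the @reaatech/guardrail-chain API.
interface Classification {
  vendor: string;
  amount: number;
  glCategory: string;
}

interface GuardrailResult {
  guardrail: string;
  passed: boolean;
  message?: string;
}

type Guardrail = (c: Classification) => GuardrailResult;

// Assumed chart of accounts; a real firm would supply its own.
const GL_CATEGORIES = new Set(["Meals", "Travel", "Office Supplies", "Software"]);
// Assumed ceiling for a single receipt.
const MAX_AMOUNT = 100_000;

// Attach a message only when the check fails.
function result(guardrail: string, passed: boolean, failure: string): GuardrailResult {
  return passed ? { guardrail, passed } : { guardrail, passed, message: failure };
}

const amountSanity: Guardrail = (c) =>
  result(
    "amount-sanity",
    Number.isFinite(c.amount) && c.amount > 0 && c.amount <= MAX_AMOUNT,
    `Implausible amount: ${c.amount}`,
  );

const glCategoryMembership: Guardrail = (c) =>
  result("gl-category", GL_CATEGORIES.has(c.glCategory), `Unknown GL category: ${c.glCategory}`);

const vendorPresence: Guardrail = (c) =>
  result("vendor-presence", c.vendor.trim().length > 0, "Vendor is missing");

// Run every guardrail and collect the failures instead of stopping at the first.
function runChain(
  c: Classification,
  chain: readonly Guardrail[],
): { ok: boolean; failures: GuardrailResult[] } {
  const failures = chain.map((g) => g(c)).filter((r) => !r.passed);
  return { ok: failures.length === 0, failures };
}
```

Collecting all failures rather than short-circuiting gives a bookkeeper the full list of problems with a receipt in one pass.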
 
---
 
## Quick Start
 
```bash
pnpm install
cp .env.example .env
# Fill in OPENAI_API_KEY and other values in .env
pnpm dev
```
 
Open [http://localhost:3000](http://localhost:3000) — the upload form lets you submit a receipt PDF or image and see the classification result.
 
---
 
## API Reference
 
### Next.js App Router Endpoints
 
| Method | Path | Description | Status Codes |
|---|---|---|---|
| POST | `/api/upload` | Upload receipt file (multipart) | 200, 400, 413, 422 |
| POST | `/api/classify` | Classify receipt text `{ text: string }` | 200, 400, 500 |
| GET | `/api/health` | Health check | 200 |
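
As a rough illustration of calling `/api/classify`, the sketch below builds the JSON request and sends it through an injected fetch function. Only the path, method, and `{ text: string }` body come from the table above; the helper names, the default base URL (the Quick Start dev server), and the `unknown` response type are assumptions, since the response shape is not documented here.

```typescript
// Hypothetical client helpers; only the route and body shape come from the table.
interface ClassifyRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Build the request without sending it, which keeps this part easy to test.
function buildClassifyRequest(text: string, baseUrl = "http://localhost:3000"): ClassifyRequest {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/api/classify`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ text }),
    },
  };
}

// Minimal structural types so the global fetch or a test double can be passed in.
type HttpResponse = { ok: boolean; status: number; json(): Promise<unknown> };
type FetchLike = (url: string, init: ClassifyRequest["init"]) => Promise<HttpResponse>;

async function classifyText(text: string, fetchImpl: FetchLike): Promise<unknown> {
  const { url, init } = buildClassifyRequest(text);
  const res = await fetchImpl(url, init);
  if (!res.ok) {
    // The table lists 400 and 500 as the error statuses for this route.
    throw new Error(`Classification failed with HTTP ${res.status}`);
  }
  // The response shape is not specified in this README, so it stays unknown.
  return res.json();
}
```

In Node 18+ or a browser, the global `fetch` can be passed as `fetchImpl`, for example `await classifyText("ACME Corp  Total 12.50", fetch)`.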
 
### Fastify Endpoints (when `ENABLE_FASTIFY=true`)
 
| Method | Path | Description | Status Codes |
|---|---|---|---|
| POST | `/api/receipts/upload` | Upload receipt (multipart) | 200, 400, 413, 422 |
| POST | `/api/receipts/classify` | Classify receipt text | 200, 400, 500 |
| GET | `/api/receipts/spend/:userId` | Get total spend for user | 200 |
| GET | `/api/health` | Health check | 200 |
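
The spend endpoint reports a per-user total, and the tracker behind it is described as detecting spikes. One way to picture that behavior is the in-memory ledger below. It is a sketch only and does not use the `@reaatech/agent-budget-spend-tracker` API; the `SpendLedger` class and the spike rule (a call costing more than `factor` times the user's mean cost over earlier calls) are assumptions, since the package's actual rule is not described here.

```typescript
// In-memory sketch; NOT the @reaatech/agent-budget-spend-tracker API.
interface SpendEntry {
  userId: string;
  clientId: string;
  costUsd: number;
}

class SpendLedger {
  private readonly entries: SpendEntry[] = [];
  private readonly factor: number;

  // Assumed spike rule: cost > factor * mean of the user's previous costs.
  constructor(factor = 3) {
    this.factor = factor;
  }

  // Record one LLM call and report whether it looks like a spike.
  record(entry: SpendEntry): { spike: boolean } {
    const previous = this.entries.filter((e) => e.userId === entry.userId);
    const mean =
      previous.length === 0
        ? 0
        : previous.reduce((sum, e) => sum + e.costUsd, 0) / previous.length;
    // The first call for a user has no baseline, so it is never a spike.
    const spike = previous.length > 0 && entry.costUsd > this.factor * mean;
    this.entries.push(entry);
    return { spike };
  }

  totalForUser(userId: string): number {
    return this.entries
      .filter((e) => e.userId === userId)
      .reduce((sum, e) => sum + e.costUsd, 0);
  }

  totalForClient(clientId: string): number {
    return this.entries
      .filter((e) => e.clientId === clientId)
      .reduce((sum, e) => sum + e.costUsd, 0);
  }
}
```

A handler for `GET /api/receipts/spend/:userId` would then amount to returning something like `ledger.totalForUser(userId)`.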
 
---
 
## Testing
 
```bash
pnpm test            # vitest run with coverage (target >= 90% on all metrics)
```
 
All external calls are mocked — no live network required. Uses `vi.mock` for module mocking and MSW for HTTP mocking. Test files live under `tests/` mirroring the `src/` structure.
 
---
 
## Environment Variables
 
| Variable | Default | Description |
|---|---|---|
| `OPENAI_API_KEY` | — | OpenAI API key for classification |
| `ANTHROPIC_API_KEY` | — | Anthropic API key (reserved) |
| `GOOGLE_GENERATIVE_AI_API_KEY` | — | Google AI API key (reserved) |
| `RECEIPT_CLASSIFIER_MODEL` | `gpt-5.2-mini` | Model ID passed to AI SDK |
| `LANGFUSE_PUBLIC_KEY` | — | Langfuse observability public key |
| `LANGFUSE_SECRET_KEY` | — | Langfuse observability secret key |
| `ENABLE_FASTIFY` | `false` | Toggle Fastify server alongside Next.js |
| `FASTIFY_PORT` | `3001` | Port for Fastify server |
| `NODE_ENV` | `development` | Node environment |
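
The defaults in the table can be applied with a small typed loader. This is a hypothetical sketch rather than the project's own configuration code: `loadConfig` and `AppConfig` are illustrative names, and treating `OPENAI_API_KEY` as required and `ENABLE_FASTIFY` as enabled only for the exact string `true` are assumptions.

```typescript
// Hypothetical loader; the defaults mirror the table above.
interface AppConfig {
  openaiApiKey: string;
  model: string;
  enableFastify: boolean;
  fastifyPort: number;
  nodeEnv: string;
}

type Env = Record<string, string | undefined>;

function loadConfig(env: Env): AppConfig {
  // Assumption: classification cannot run without an OpenAI key.
  const openaiApiKey = env["OPENAI_API_KEY"];
  if (!openaiApiKey) {
    throw new Error("OPENAI_API_KEY is required for classification");
  }

  const fastifyPort = Number(env["FASTIFY_PORT"] ?? "3001");
  if (!Number.isInteger(fastifyPort) || fastifyPort < 1 || fastifyPort > 65535) {
    throw new Error(`FASTIFY_PORT must be a valid port, got: ${env["FASTIFY_PORT"]}`);
  }

  return {
    openaiApiKey,
    model: env["RECEIPT_CLASSIFIER_MODEL"] ?? "gpt-5.2-mini",
    // Assumption: only the literal string "true" turns the Fastify server on.
    enableFastify: env["ENABLE_FASTIFY"] === "true",
    fastifyPort,
    nodeEnv: env["NODE_ENV"] ?? "development",
  };
}
```

In a Node process this would typically be called once at startup as `loadConfig(process.env)`, so a missing key or a malformed port fails fast instead of surfacing on the first upload.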
 
---
 
## License
 
MIT — see [LICENSE](./LICENSE).