# Anthropic Document Pipeline for Square SMB Receipt Extraction
> Automatically extract line items, totals, and vendor info from Square receipts and push structured data to accounting systems.
A tutorial-style reference solution from [reaatech.com](https://reaatech.com) that demonstrates how to build production-grade AI systems with the `@reaatech/*` package family.
## What it does
This pipeline ingests receipt image URLs and preprocesses them with Unstructured for OCR and text extraction. `@reaatech/confidence-router` then gates the result on OCR confidence to filter out low-quality scans. Scans that pass go to Anthropic Claude for structured extraction, `@reaatech/structured-repair-core` repairs any malformed JSON in the response, and the final structured receipt is pushed to Square. `@reaatech/agent-budget-engine` enforces the daily USD spend cap.
## Pipeline flow
```
Image URL → Unstructured partition → Confidence threshold check → Claude structured extraction → JSON repair → Square push
```
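The flow above can be read as a chain of stages with one early exit at the confidence check. The following is a minimal sketch of that shape, not the project's actual code: the `Stages` interface, `runPipeline`, and the `FlowReceipt` type are hypothetical names, and the real services under `src/` may be organized differently.

```typescript
// Hypothetical stage signatures, injected so the flow can be tested with fakes.
type OcrResult = { text: string; confidence: number };
type FlowReceipt = { vendorName: string; total: number; currency: string };

interface Stages {
  partition: (imageUrl: string) => Promise<OcrResult>; // Unstructured partition
  extract: (text: string) => Promise<string>;          // Claude call, returns raw JSON text
  repair: (rawJson: string) => FlowReceipt;            // JSON repair + parse
  push: (receipt: FlowReceipt) => Promise<void>;       // Square push
}

type PipelineResult =
  | { status: "success"; extractedData: FlowReceipt }
  | { status: "low_confidence"; error: string };

async function runPipeline(
  imageUrl: string,
  stages: Stages,
  routeThreshold = 0.8, // mirrors the CONFIDENCE_ROUTE_THRESHOLD default
): Promise<PipelineResult> {
  const ocr = await stages.partition(imageUrl);

  // Exit before the LLM call when the scan is too noisy to trust.
  if (ocr.confidence < routeThreshold) {
    return {
      status: "low_confidence",
      error: `OCR confidence ${ocr.confidence} is below ${routeThreshold}`,
    };
  }

  const rawJson = await stages.extract(ocr.text);
  const receipt = stages.repair(rawJson);
  await stages.push(receipt);
  return { status: "success", extractedData: receipt };
}
```

Passing the stages in as functions keeps each external service (Unstructured, Claude, Square) replaceable by a fake in tests.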
## Configuration
| Variable | Description |
|---|---|
| `NODE_ENV` | Runtime environment (`development`, `production`, `test`) |
| `ANTHROPIC_API_KEY` | Anthropic API credential |
| `ANTHROPIC_MODEL` | Claude model ID (default: `claude-sonnet-4-6`) |
| `ANTHROPIC_MAX_TOKENS` | Max output tokens per extraction call (default: `4096`) |
| `SQUARE_ACCESS_TOKEN` | Square SDK auth token |
| `SQUARE_LOCATION_ID` | Target Square location for expense pushes |
| `UNSTRUCTURED_API_KEY` | Unstructured partition API key |
| `LANGFUSE_PUBLIC_KEY` | Langfuse public key for tracing |
| `LANGFUSE_SECRET_KEY` | Langfuse secret key for tracing |
| `LANGFUSE_BASE_URL` | Langfuse base URL (default: `https://cloud.langfuse.com`) |
| `CONFIDENCE_ROUTE_THRESHOLD` | ConfidenceRouter route threshold (default: `0.8`) |
| `CONFIDENCE_FALLBACK_THRESHOLD` | ConfidenceRouter fallback threshold (default: `0.3`) |
| `BUDGET_DAILY_LIMIT` | Daily USD spend cap (default: `5.0`) |
| `BUDGET_SOFT_CAP` | Soft-cap ratio for budget warnings (default: `0.8`) |
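As an illustration of how the defaults in the table could be applied, here is a dependency-free sketch of a config loader. The project lists Zod for config validation, so the real loader probably looks different. `loadConfig`, `numberFromEnv`, and `PipelineConfig` are hypothetical names, and the check that the fallback threshold does not exceed the route threshold is an assumption rather than documented behavior.

```typescript
// Plain-TypeScript sketch; the project itself lists Zod for this job.
type Env = Record<string, string | undefined>;

interface PipelineConfig {
  anthropicModel: string;
  anthropicMaxTokens: number;
  langfuseBaseUrl: string;
  confidenceRouteThreshold: number;
  confidenceFallbackThreshold: number;
  budgetDailyLimit: number;
  budgetSoftCap: number;
}

// Read a numeric variable, falling back to the documented default when unset.
function numberFromEnv(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw === "") return fallback;
  const value = Number(raw);
  if (!Number.isFinite(value)) {
    throw new Error(`${key} must be a number, got "${raw}"`);
  }
  return value;
}

function loadConfig(env: Env): PipelineConfig {
  const config: PipelineConfig = {
    anthropicModel: env["ANTHROPIC_MODEL"] || "claude-sonnet-4-6",
    anthropicMaxTokens: numberFromEnv(env, "ANTHROPIC_MAX_TOKENS", 4096),
    langfuseBaseUrl: env["LANGFUSE_BASE_URL"] || "https://cloud.langfuse.com",
    confidenceRouteThreshold: numberFromEnv(env, "CONFIDENCE_ROUTE_THRESHOLD", 0.8),
    confidenceFallbackThreshold: numberFromEnv(env, "CONFIDENCE_FALLBACK_THRESHOLD", 0.3),
    budgetDailyLimit: numberFromEnv(env, "BUDGET_DAILY_LIMIT", 5.0),
    budgetSoftCap: numberFromEnv(env, "BUDGET_SOFT_CAP", 0.8),
  };
  // Assumed sanity check: the fallback band should sit below the route band.
  if (config.confidenceFallbackThreshold > config.confidenceRouteThreshold) {
    throw new Error("CONFIDENCE_FALLBACK_THRESHOLD must not exceed CONFIDENCE_ROUTE_THRESHOLD");
  }
  return config;
}
```

In a Node runtime this would be called as `loadConfig(process.env)`. Taking the environment as a parameter keeps the function easy to test.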
## API endpoints
### POST /api/ingest
Ingest a single receipt image URL.
**Request:**
```json
{
  "receiptImageUrl": "https://example.com/receipt.jpg",
  "source": "mobile-upload",
  "callbackUrl": "https://hooks.example.com/callback"
}
```
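A request body of this shape needs validating before any paid API call is made. The sketch below shows one dependency-free way to do that. The project lists Zod for request validation, so treat `parseIngestRequest` as a hypothetical stand-in. Treating `source` and `callbackUrl` as optional is an assumption, based on the batch example omitting them.

```typescript
// Hypothetical validator; optionality of source/callbackUrl is assumed.
interface IngestRequest {
  receiptImageUrl: string;
  source?: string;
  callbackUrl?: string;
}

type ParseResult =
  | { ok: true; value: IngestRequest }
  | { ok: false; error: string };

// Deliberately loose check: http(s) scheme followed by non-whitespace.
const HTTP_URL = /^https?:\/\/\S+$/i;

function parseIngestRequest(body: unknown): ParseResult {
  if (typeof body !== "object" || body === null) {
    return { ok: false, error: "body must be a JSON object" };
  }
  const { receiptImageUrl, source, callbackUrl } = body as Record<string, unknown>;

  if (typeof receiptImageUrl !== "string" || !HTTP_URL.test(receiptImageUrl)) {
    return { ok: false, error: "receiptImageUrl must be an http(s) URL" };
  }
  if (source !== undefined && typeof source !== "string") {
    return { ok: false, error: "source must be a string" };
  }
  if (
    callbackUrl !== undefined &&
    (typeof callbackUrl !== "string" || !HTTP_URL.test(callbackUrl))
  ) {
    return { ok: false, error: "callbackUrl must be an http(s) URL" };
  }

  // Copy only the known fields so unexpected keys never reach the pipeline.
  const value: IngestRequest = { receiptImageUrl };
  if (typeof source === "string") value.source = source;
  if (typeof callbackUrl === "string") value.callbackUrl = callbackUrl;
  return { ok: true, value };
}
```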
**Response (200):**
```json
{
  "receiptId": "rcpt_abc123",
  "status": "success",
  "extractedData": {
    "vendorName": "Acme Coffee",
    "date": "2025-06-01",
    "lineItems": [
      { "name": "Latte", "quantity": 1, "unitPrice": 4.50, "totalPrice": 4.50 }
    ],
    "subtotal": 4.50,
    "total": 5.13,
    "currency": "USD"
  },
  "costUsd": 0.0023
}
```
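Extracted amounts like these can be cross-checked arithmetically before they are pushed anywhere. The following sketch is an illustration only: `checkReceiptTotals` is a hypothetical helper, not part of the documented pipeline. It verifies that each line item's `quantity` times `unitPrice` matches its `totalPrice` and that the line items sum to `subtotal`. It reports the gap between `total` and `subtotal` (tax, tip, or fees) without judging it.

```typescript
interface LineItem {
  name: string;
  quantity: number;
  unitPrice: number;
  totalPrice: number;
}

interface ExtractedReceipt {
  vendorName: string;
  date: string;
  lineItems: LineItem[];
  subtotal: number;
  total: number;
  currency: string;
}

// Compare money in integer cents to sidestep floating-point drift.
const toCents = (amount: number): number => Math.round(amount * 100);

function checkReceiptTotals(
  receipt: ExtractedReceipt,
): { issues: string[]; extraCents: number } {
  const issues: string[] = [];
  let lineSumCents = 0;

  for (const item of receipt.lineItems) {
    if (toCents(item.quantity * item.unitPrice) !== toCents(item.totalPrice)) {
      issues.push(`line item "${item.name}": quantity x unitPrice != totalPrice`);
    }
    lineSumCents += toCents(item.totalPrice);
  }
  if (lineSumCents !== toCents(receipt.subtotal)) {
    issues.push("line items do not sum to subtotal");
  }

  // Difference between total and subtotal, in cents (tax, tip, fees).
  const extraCents = toCents(receipt.total) - toCents(receipt.subtotal);
  return { issues, extraCents };
}
```

For the sample response above, this yields no issues and a 63-cent difference between `total` and `subtotal`.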
**Response (422):**
```json
{
  "receiptId": "rcpt_abc123",
  "status": "low_confidence",
  "error": "OCR confidence below route threshold (0.45 < 0.80)",
  "costUsd": 0.0004
}
```
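The 422 response comes from comparing the OCR confidence against the configured thresholds. The sketch below shows one plausible form of that gate. It does not use the `@reaatech/confidence-router` API. `gateOnConfidence` is a hypothetical function, and splitting scores into three bands (at or above the route threshold, between the two thresholds, below the fallback threshold) is an assumption about how the two settings interact.

```typescript
// Assumed three-band reading of the route and fallback thresholds.
type Band = "route" | "between" | "fallback";

interface GateResult {
  band: Band;
  status: "ok" | "low_confidence";
  error?: string;
}

function gateOnConfidence(
  confidence: number,
  routeThreshold = 0.8,    // CONFIDENCE_ROUTE_THRESHOLD default
  fallbackThreshold = 0.3, // CONFIDENCE_FALLBACK_THRESHOLD default
): GateResult {
  if (confidence >= routeThreshold) {
    return { band: "route", status: "ok" };
  }
  // Same message format as the 422 example, with two-decimal scores.
  const error =
    `OCR confidence below route threshold ` +
    `(${confidence.toFixed(2)} < ${routeThreshold.toFixed(2)})`;
  const band: Band = confidence < fallbackThreshold ? "fallback" : "between";
  return { band, status: "low_confidence", error };
}
```

A handler could map any `low_confidence` result to HTTP 422 and use `band` to decide whether the scan is worth a manual review or should be discarded.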
### GET /api/health
**Response (200):**
```json
{
  "status": "ok",
  "timestamp": "2025-06-22T12:00:00.000Z"
}
```
### POST /api/batch
Ingest up to 50 receipt image URLs in a single request.
**Request:**
```json
{
  "requests": [
    { "receiptImageUrl": "https://example.com/receipt1.jpg" },
    { "receiptImageUrl": "https://example.com/receipt2.jpg" }
  ]
}
```
**Response (200):**
```json
{
  "results": [
    { "receiptId": "rcpt_001", "status": "success", ... },
    { "receiptId": "rcpt_002", "status": "budget_exceeded", ... }
  ]
}
```
**Response (400):**
```json
{
  "error": "max batch size 50"
}
```
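The batch responses suggest two checks: a size limit applied up front, and a budget check applied per item so that later items can come back as `budget_exceeded`. The sketch below illustrates that behavior under stated assumptions. It does not use the `@reaatech/agent-budget-engine` API. `DailyBudget`, `runBatch`, and `processOne` are hypothetical, and processing is synchronous here for brevity where the real calls would be awaited.

```typescript
const MAX_BATCH_SIZE = 50;

type BudgetState = "ok" | "soft_cap" | "exceeded";

// In-memory spend tracker; a real one would persist and reset daily.
class DailyBudget {
  private spentUsd = 0;
  private readonly limitUsd: number;
  private readonly softCapRatio: number;

  constructor(limitUsd = 5.0, softCapRatio = 0.8) {
    this.limitUsd = limitUsd;
    this.softCapRatio = softCapRatio;
  }

  record(costUsd: number): void {
    this.spentUsd += costUsd;
  }

  state(): BudgetState {
    if (this.spentUsd >= this.limitUsd) return "exceeded";
    if (this.spentUsd >= this.limitUsd * this.softCapRatio) return "soft_cap";
    return "ok";
  }
}

interface BatchRequestItem { receiptImageUrl: string }
interface BatchResultItem {
  receiptImageUrl: string;
  status: "success" | "budget_exceeded";
  costUsd: number;
}

function runBatch(
  requests: BatchRequestItem[],
  budget: DailyBudget,
  processOne: (item: BatchRequestItem) => number, // returns the item's cost in USD
): { status: 200; results: BatchResultItem[] } | { status: 400; error: string } {
  if (requests.length > MAX_BATCH_SIZE) {
    return { status: 400, error: `max batch size ${MAX_BATCH_SIZE}` };
  }
  const results = requests.map((item): BatchResultItem => {
    // Skip the paid call once the daily cap has been reached.
    if (budget.state() === "exceeded") {
      return { receiptImageUrl: item.receiptImageUrl, status: "budget_exceeded", costUsd: 0 };
    }
    const costUsd = processOne(item);
    budget.record(costUsd);
    return { receiptImageUrl: item.receiptImageUrl, status: "success", costUsd };
  });
  return { status: 200, results };
}
```

Checking the budget before each item, not once per batch, is what allows a single response to mix `success` and `budget_exceeded` results.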
## Tech Stack
- **Next.js 16+ App Router** — API route handlers and server infrastructure
- **@anthropic-ai/sdk** — Claude structured extraction from receipt text
- **Square SDK v44** — Pushing structured receipts to Square accounting
- **unstructured-client** — OCR and text extraction from receipt images
- **Zod** — Runtime schema validation for requests, config, and receipt data
- **@reaatech/confidence-router** — Quality gating on OCR confidence scores
- **@reaatech/structured-repair-core** — Repair malformed Claude JSON output
- **@reaatech/llm-cost-telemetry** — Token and cost tracking per LLM call
- **@reaatech/agent-budget-engine** — Daily USD spend caps and soft-cap warnings
- **Langfuse** — Observability and tracing across the pipeline
- **vitest** — Test runner with v8 coverage at ≥90%
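To make the JSON repair role in this stack concrete, here is a small sketch of the kind of cleanup such a step might perform on model output. It is not the `@reaatech/structured-repair-core` API. `repairJson` is a hypothetical function that handles only three common problems (markdown code fences, prose around the object, and trailing commas), and its trailing-comma rule is naive enough to alter string values that happen to contain `,}` or `,]`.

```typescript
// Built at runtime so the example never contains a literal code fence.
const FENCE = "`".repeat(3);

function repairJson(raw: string): unknown {
  // 1. Drop markdown code fences; a leftover "json" label is cut in step 2.
  let text = raw.split(FENCE).join("");

  // 2. Keep only the outermost object, discarding surrounding prose.
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("no JSON object found in model output");
  }
  text = text.slice(start, end + 1);

  // 3. Remove trailing commas before a closing brace or bracket.
  text = text.replace(/,\s*([}\]])/g, "$1");

  // JSON.parse still throws on anything this sketch cannot fix.
  return JSON.parse(text);
}
```

A production repair step would normally also validate the parsed value against the receipt schema and not trust its shape.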
## Running locally
```bash
pnpm install
pnpm test # vitest run with coverage
pnpm dev # next dev
```
## Project layout
```
app/          Next.js App Router pages + API routes
src/          services, lib, adapters
tests/        vitest suite (mirrors src/)
packages/     API references for every dependency (read these first)
DEV_PLAN.md   build plan for this recipe
```
## License
MIT — see [LICENSE](./LICENSE).