Files · Anthropic Document Pipeline for QuickBooks Online Invoice Automation
67 (1 binary, 556.3 kB total)attempt 2
README.md·4676 B·markdown
markdown
# Anthropic Document Pipeline for QuickBooks Online Invoice Automation
> Automatically extract invoice details from PDFs, images, and text documents using Claude AI, then sync the results to QuickBooks Online — all with built-in budget enforcement.
Small businesses spend hours manually keying invoice data into QuickBooks. This pipeline automates the entire flow: upload a file (PDF, scanned image, or plain text), extract structured invoice fields with Claude, validate against a Zod schema, and push the result to QuickBooks Online via the QuickBooks API. An integrated budget controller prevents runaway LLM costs.
## Architecture
```
Upload (PDF/image/text)
│
▼
Ingestion Pipeline
├─ unpdf (PDF text extraction)
├─ tesseract.js (OCR for images)
└─ hybrid-rag-ingestion → chunking → OpenAI embeddings
│
▼
Claude Invoice Extraction
├─ Anthropic SDK → structured JSON output
└─ Zod schema validation (with retry on parse failure)
│
▼
QuickBooks Online Sync
└─ node-quickbooks → createInvoice API
│
▼
Budget Enforcement (pre-flight & post-hoc)
├─ @reaatech/agent-budget-engine
├─ PricingProvider (cost estimation)
└─ InMemorySpendStore (usage tracking)
```
## Quick Start
### Prerequisites
- Node.js >= 22
- pnpm 10+
### Setup
```bash
cp .env.example .env
# Fill in your API keys (see Environment Variables below)
pnpm install
pnpm dev
```
## API Documentation
### `POST /api/invoice/upload`
Upload an invoice file for processing.
**Request:** Multipart form-data with a `file` field.
| Content-Type | Description |
|---|---|
| `application/pdf` | PDF invoice |
| `image/png`, `image/jpeg`, etc. | Scanned invoice |
| `text/plain` | Plain text invoice |
**Response (200):**
```json
{
"success": true,
"invoice": {
"vendorName": "Acme Corp",
"vendorAddress": "123 Main St",
"invoiceNumber": "INV-001",
"invoiceDate": "2025-01-15",
"dueDate": "2025-02-14",
"lineItems": [
{
"description": "Widget",
"quantity": 2,
"unitAmount": 25.0,
"totalAmount": 50.0,
"accountCode": "401"
}
],
"subtotal": 50.0,
"taxTotal": 5.0,
"totalAmount": 55.0,
"currency": "USD",
"notes": "Net 30"
},
"confidence": 0.87,
"sync": {
"qboInvoiceId": "QB-12345",
"status": "created"
},
"cost": 0.0152
}
```
**Error Responses:**
| Status | Body |
|---|---|
| 400 | `{ "error": "No file uploaded" }` or unsupported type |
| 429 | `{ "error": "budget_exceeded" }` |
| 500 | `{ "error": "<message>" }` |
## Environment Variables
| Variable | Description |
|---|---|
| `ANTHROPIC_API_KEY` | API key for Claude (invoice extraction) |
| `OPENAI_API_KEY` | API key for OpenAI (embeddings) |
| `QBO_CONSUMER_KEY` | QuickBooks Online OAuth consumer key |
| `QBO_CONSUMER_SECRET` | QuickBooks Online OAuth consumer secret |
| `QBO_REALM_ID` | QuickBooks Online company/realm ID |
| `QBO_ACCESS_TOKEN` | QuickBooks Online OAuth access token |
| `QBO_REFRESH_TOKEN` | QuickBooks Online OAuth refresh token |
| `QBO_USE_SANDBOX` | Set to `true` to use QuickBooks sandbox |
| `INVOICE_BUDGET_LIMIT` | Max spend per user session (default: 10.0 USD) |
## REAA Packages
| Package | Role |
|---|---|
| `@reaatech/hybrid-rag` | Core domain types (Document, Chunk, ChunkingConfig) |
| `@reaatech/hybrid-rag-ingestion` | Document loading, preprocessing, validation, chunking |
| `@reaatech/hybrid-rag-embedding` | OpenAI embedding service |
| `@reaatech/hybrid-rag-evaluation` | Evaluation runner for pipeline benchmarks |
| `@reaatech/agent-budget-engine` | Budget enforcement with pre-flight checks and spend recording |
## Cost Tracking
The pipeline uses `@reaatech/agent-budget-engine` to enforce spending limits:
1. **PricingProvider** (`AnthropicPricingProvider`) — hard-coded per-model token pricing for Claude Sonnet, Opus, Haiku, and text-embedding-3-small.
2. **SpendStore** (`InMemorySpendStore`) — in-memory tracking of cumulative spend per scope.
3. **BudgetController** — pre-flight `check()` prevents requests that would exceed the hard cap; post-hoc `record()` updates the spend tracker.
When the soft cap (80%) is reached, the controller warns via a `threshold-breach` event. When the hard cap (100%) is reached, further requests are blocked with `allowed: false` and a `hard-stop` event is emitted.
## Running Tests
```bash
pnpm test # vitest run with coverage
pnpm typecheck # TypeScript type checking
pnpm lint # ESLint
```
## License
MIT — see [LICENSE](./LICENSE).