Skip to content
reaatechREAATECH

Files · Anthropic Document Pipeline for QuickBooks Online Invoice Automation

67 (1 binary, 556.3 kB total)attempt 2

README.md·4676 B·markdown
markdown
# Anthropic Document Pipeline for QuickBooks Online Invoice Automation
 
> Automatically extract invoice details from PDFs, images, and text documents using Claude AI, then sync the results to QuickBooks Online — all with built-in budget enforcement.
 
Small businesses spend hours manually keying invoice data into QuickBooks. This pipeline automates the entire flow: upload a file (PDF, scanned image, or plain text), extract structured invoice fields with Claude, validate against a Zod schema, and push the result to QuickBooks Online via the QuickBooks API. An integrated budget controller prevents runaway LLM costs.
 
## Architecture
 
```
Upload (PDF/image/text)


  Ingestion Pipeline
  ├─ unpdf (PDF text extraction)
  ├─ tesseract.js (OCR for images)
  └─ hybrid-rag-ingestion → chunking → OpenAI embeddings


  Claude Invoice Extraction
  ├─ Anthropic SDK → structured JSON output
  └─ Zod schema validation (with retry on parse failure)


  QuickBooks Online Sync
  └─ node-quickbooks → createInvoice API


  Budget Enforcement (pre-flight & post-hoc)
  ├─ @reaatech/agent-budget-engine
  ├─ PricingProvider (cost estimation)
  └─ InMemorySpendStore (usage tracking)
```
 
## Quick Start
 
### Prerequisites
 
- Node.js >= 22
- pnpm 10+
 
### Setup
 
```bash
cp .env.example .env
# Fill in your API keys (see Environment Variables below)
pnpm install
pnpm dev
```
 
## API Documentation
 
### `POST /api/invoice/upload`
 
Upload an invoice file for processing.
 
**Request:** Multipart form-data with a `file` field.
 
| Content-Type | Description |
|---|---|
| `application/pdf` | PDF invoice |
| `image/png`, `image/jpeg`, etc. | Scanned invoice |
| `text/plain` | Plain text invoice |
 
**Response (200):**
 
```json
{
  "success": true,
  "invoice": {
    "vendorName": "Acme Corp",
    "vendorAddress": "123 Main St",
    "invoiceNumber": "INV-001",
    "invoiceDate": "2025-01-15",
    "dueDate": "2025-02-14",
    "lineItems": [
      {
        "description": "Widget",
        "quantity": 2,
        "unitAmount": 25.0,
        "totalAmount": 50.0,
        "accountCode": "401"
      }
    ],
    "subtotal": 50.0,
    "taxTotal": 5.0,
    "totalAmount": 55.0,
    "currency": "USD",
    "notes": "Net 30"
  },
  "confidence": 0.87,
  "sync": {
    "qboInvoiceId": "QB-12345",
    "status": "created"
  },
  "cost": 0.0152
}
```
 
**Error Responses:**
 
| Status | Body |
|---|---|
| 400 | `{ "error": "No file uploaded" }` or unsupported type |
| 429 | `{ "error": "budget_exceeded" }` |
| 500 | `{ "error": "<message>" }` |
 
## Environment Variables
 
| Variable | Description |
|---|---|
| `ANTHROPIC_API_KEY` | API key for Claude (invoice extraction) |
| `OPENAI_API_KEY` | API key for OpenAI (embeddings) |
| `QBO_CONSUMER_KEY` | QuickBooks Online OAuth consumer key |
| `QBO_CONSUMER_SECRET` | QuickBooks Online OAuth consumer secret |
| `QBO_REALM_ID` | QuickBooks Online company/realm ID |
| `QBO_ACCESS_TOKEN` | QuickBooks Online OAuth access token |
| `QBO_REFRESH_TOKEN` | QuickBooks Online OAuth refresh token |
| `QBO_USE_SANDBOX` | Set to `true` to use QuickBooks sandbox |
| `INVOICE_BUDGET_LIMIT` | Max spend per user session (default: 10.0 USD) |
 
## REAA Packages
 
| Package | Role |
|---|---|
| `@reaatech/hybrid-rag` | Core domain types (Document, Chunk, ChunkingConfig) |
| `@reaatech/hybrid-rag-ingestion` | Document loading, preprocessing, validation, chunking |
| `@reaatech/hybrid-rag-embedding` | OpenAI embedding service |
| `@reaatech/hybrid-rag-evaluation` | Evaluation runner for pipeline benchmarks |
| `@reaatech/agent-budget-engine` | Budget enforcement with pre-flight checks and spend recording |
 
## Cost Tracking
 
The pipeline uses `@reaatech/agent-budget-engine` to enforce spending limits:
 
1. **PricingProvider** (`AnthropicPricingProvider`) — hard-coded per-model token pricing for Claude Sonnet, Opus, Haiku, and text-embedding-3-small.
2. **SpendStore** (`InMemorySpendStore`) — in-memory tracking of cumulative spend per scope.
3. **BudgetController** — pre-flight `check()` prevents requests that would exceed the hard cap; post-hoc `record()` updates the spend tracker.
 
When the soft cap (80%) is reached, the controller warns via a `threshold-breach` event. When the hard cap (100%) is reached, further requests are blocked with `allowed: false` and a `hard-stop` event is emitted.
 
## Running Tests
 
```bash
pnpm test            # vitest run with coverage
pnpm typecheck       # TypeScript type checking
pnpm lint            # ESLint
```
 
## License
 
MIT — see [LICENSE](./LICENSE).