# Azure AI Document Pipeline for QuickBooks Online Invoice Processing
 
Automatically extract, validate, and post vendor invoice data from PDFs or images into QuickBooks Online using Azure OpenAI.
 
## Problem
 
SMBs waste hours manually keying invoice data into QuickBooks, risking errors, delayed payments, and poor cash-flow visibility. This pipeline automates the full flow: PDF/image ingestion, AI-powered field extraction, confidence-based routing, PII redaction, budget enforcement, and direct QBO bill posting.
 
## Getting Started
 
### Prerequisites
 
- Node.js 22+
- Azure OpenAI deployment (GPT-4 or equivalent)
- QuickBooks Online OAuth2 app (client ID, secret, realm ID, refresh token)
 
### Setup
 
```bash
pnpm install
cp .env.example .env
# Fill in .env with your Azure OpenAI and QuickBooks credentials
```
 
### Run
 
```bash
pnpm dev       # Development server on http://localhost:3000
pnpm build     # Production build
pnpm test      # Run test suite
```
 
## Environment Variables
 
| Variable | Description |
|---|---|
| `AZURE_OPENAI_ENDPOINT` | Azure OpenAI endpoint URL |
| `AZURE_OPENAI_API_KEY` | Azure OpenAI API key |
| `AZURE_OPENAI_DEPLOYMENT` | Azure OpenAI deployment name |
| `AZURE_PRICE_INPUT` | Input token price in USD per 1K tokens (default: 0.001) |
| `AZURE_PRICE_OUTPUT` | Output token price in USD per 1K tokens (default: 0.003) |
| `QUICKBOOKS_CLIENT_ID` | QuickBooks OAuth client ID |
| `QUICKBOOKS_CLIENT_SECRET` | QuickBooks OAuth client secret |
| `QUICKBOOKS_REALM_ID` | QuickBooks company realm ID |
| `QUICKBOOKS_REFRESH_TOKEN` | QuickBooks OAuth refresh token |
| `BUDGET_PER_DOCUMENT` | Max cost per document in USD (default: 0.05) |
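
The pricing and budget variables above can be combined into a per-document cost check. The following is a minimal sketch, not the project's actual code: the helper names (`loadPricing`, `estimateCost`, `withinBudget`) are hypothetical, and it assumes the prices are in USD and that cost is linear in token counts.

```typescript
// Hypothetical helpers for illustration; names are not taken from the project.
interface Pricing {
  inputPer1K: number;        // USD per 1K input tokens (assumed unit)
  outputPer1K: number;       // USD per 1K output tokens (assumed unit)
  budgetPerDocument: number; // USD cap per document
}

type Env = Record<string, string | undefined>;

// Read a numeric variable, falling back to the documented default
// when the value is unset, empty, negative, or not a number.
function readNumber(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  return Number.isFinite(parsed) && parsed >= 0 ? parsed : fallback;
}

function loadPricing(env: Env): Pricing {
  return {
    inputPer1K: readNumber(env, "AZURE_PRICE_INPUT", 0.001),
    outputPer1K: readNumber(env, "AZURE_PRICE_OUTPUT", 0.003),
    budgetPerDocument: readNumber(env, "BUDGET_PER_DOCUMENT", 0.05),
  };
}

// Linear cost model: tokens are billed per thousand at the configured rates.
function estimateCost(p: Pricing, inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1000) * p.inputPer1K + (outputTokens / 1000) * p.outputPer1K;
}

function withinBudget(p: Pricing, costUsd: number): boolean {
  return costUsd <= p.budgetPerDocument;
}
```

In a Node.js process, `loadPricing(process.env)` would pick up the values from `.env` once they are loaded into the environment.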
 
## Key Features
 
- **AI Parsing**: Converts PDFs/images to PNG, sends to Azure OpenAI with structured output schema for invoice field extraction
- **Confidence Routing**: Uses `@reaatech/confidence-router-core` to classify field confidence and flag low-quality extractions for human review
- **Structured Repair**: Zod-based repair layer handles malformed AI JSON outputs, numeric string coercion, and field name normalization
- **PII Redaction**: `@reaatech/guardrail-chain-guardrails` `PIIRedaction` scans vendor names and line items before posting to QBO
- **Budget Enforcement**: `@reaatech/agent-budget-engine` enforces per-document spend limits with soft/hard caps
- **QuickBooks Bill Posting**: OAuth2 token management via `jose`, maps invoices to QBO `Bill` format, handles 401/429 retries
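
To make the repair step more concrete, here is a minimal sketch of the three behaviours it names: recovering JSON from a malformed model reply, coercing numeric strings, and normalizing field names. It is written in plain TypeScript as a stand-in and does not use Zod or reflect the project's real schema; the `RepairedInvoice` shape, the alias table, and all function names are assumptions made for illustration.

```typescript
// Illustrative target shape; the project's actual invoice schema is not shown in this README.
interface RepairedInvoice {
  vendorName: string;
  invoiceNumber: string;
  total: number;
}

// Hypothetical aliases a model might emit for each canonical field.
const FIELD_ALIASES: Record<string, keyof RepairedInvoice> = {
  vendorname: "vendorName",
  vendor: "vendorName",
  invoicenumber: "invoiceNumber",
  invoiceno: "invoiceNumber",
  total: "total",
  totalamount: "total",
};

// Lowercase and strip separators so "Vendor_Name" and "vendor-name" both become "vendorname".
function normalizeKey(key: string): string {
  return key.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// Coerce values such as "$1,234.50" to a number; return null when the value is not numeric.
function coerceNumber(value: unknown): number | null {
  if (typeof value === "number") return Number.isFinite(value) ? value : null;
  if (typeof value !== "string") return null;
  const cleaned = value.replace(/[$,\s]/g, "");
  if (cleaned === "") return null;
  const parsed = Number(cleaned);
  return Number.isFinite(parsed) ? parsed : null;
}

// Parse the span from the first "{" to the last "}", ignoring surrounding prose or code fences.
function extractJson(raw: string): Record<string, unknown> {
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) throw new Error("No JSON object found in model output");
  return JSON.parse(raw.slice(start, end + 1)) as Record<string, unknown>;
}

function repairInvoice(raw: string): RepairedInvoice {
  const out: Partial<RepairedInvoice> = {};
  for (const [key, value] of Object.entries(extractJson(raw))) {
    const canonical = FIELD_ALIASES[normalizeKey(key)];
    if (canonical === undefined) continue; // unknown fields are dropped
    if (canonical === "total") {
      const n = coerceNumber(value);
      if (n !== null) out.total = n;
    } else if (typeof value === "string") {
      out[canonical] = value.trim();
    }
  }
  if (out.vendorName === undefined || out.invoiceNumber === undefined || out.total === undefined) {
    throw new Error("Repair failed: a required field is missing");
  }
  return out as RepairedInvoice;
}
```

A schema library such as Zod would typically replace the hand-written checks at the end with a declarative schema; the control flow stays the same.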
 
## API Endpoints
 
### Upload Invoice
 
```bash
curl -X POST http://localhost:3000/api/upload \
  -F "document=@invoice.pdf"
```
 
Returns `202 Accepted` with a JSON body of the form `{ "jobId": "...", "status": "..." }`, where `status` is one of `Processing`, `Completed`, `Failed`, or `NeedsReview`.
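
The `NeedsReview` status comes from the confidence routing step. As a rough sketch of how per-field confidence could map to a job status: the threshold value, the `FieldConfidence` shape, and `routeByConfidence` below are hypothetical and do not reflect the API of `@reaatech/confidence-router-core`.

```typescript
type JobState = "Processing" | "Completed" | "Failed" | "NeedsReview";

// Assumed shape: one confidence score in [0, 1] per extracted field.
interface FieldConfidence {
  field: string;
  confidence: number;
}

interface RoutingDecision {
  status: JobState;
  lowConfidenceFields: string[];
}

// Hypothetical cut-off; a real deployment would tune this per field.
const REVIEW_THRESHOLD = 0.8;

function routeByConfidence(
  fields: FieldConfidence[],
  threshold: number = REVIEW_THRESHOLD,
): RoutingDecision {
  // Nothing extracted at all is treated as a failure rather than a review case.
  if (fields.length === 0) return { status: "Failed", lowConfidenceFields: [] };

  // Any field below the threshold sends the whole document to human review.
  const low = fields.filter((f) => f.confidence < threshold).map((f) => f.field);
  return {
    status: low.length > 0 ? "NeedsReview" : "Completed",
    lowConfidenceFields: low,
  };
}
```

Routing the whole document on its weakest field is a conservative choice: a single doubtful total or vendor name is enough to hold a bill back from posting.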
 
### Check Status
 
```bash
curl "http://localhost:3000/api/status/$JOB_ID"   # JOB_ID: the jobId returned by the upload call
```
 
Returns the processing result, including the extracted data and any errors.
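
Because uploads are accepted asynchronously, a client usually polls the status endpoint until the job leaves `Processing`. The sketch below shows one way to do that. The `StatusResponse` shape is an assumption (the README does not specify the status payload beyond extracted data and errors), and `pollUntilDone`, `fetchStatus`, the interval, and the attempt limit are illustrative names and defaults.

```typescript
// Assumed minimal shape; the real response also carries extracted data and errors.
interface StatusResponse {
  jobId: string;
  status: "Processing" | "Completed" | "Failed" | "NeedsReview";
}

// The transport is injected so the loop can be exercised without a running server.
type GetStatus = (url: string) => Promise<StatusResponse>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function pollUntilDone(
  baseUrl: string,
  jobId: string,
  getStatus: GetStatus,
  intervalMs: number = 1000,
  maxAttempts: number = 30,
): Promise<StatusResponse> {
  const url = `${baseUrl}/api/status/${encodeURIComponent(jobId)}`;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const result = await getStatus(url);
    // Any status other than "Processing" is terminal for polling purposes.
    if (result.status !== "Processing") return result;
    await sleep(intervalMs);
  }
  throw new Error(`Job ${jobId} still processing after ${maxAttempts} attempts`);
}

// Default transport built on the global fetch available in Node.js 18+.
const fetchStatus: GetStatus = async (url) => {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`Status check failed: HTTP ${res.status}`);
  return (await res.json()) as StatusResponse;
};
```

A typical call would be `await pollUntilDone("http://localhost:3000", jobId, fetchStatus)`, with `jobId` taken from the upload response.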
 
## Architecture
 
```
PDF/Image → Parser (pdfjs-dist + sharp) → Azure OpenAI
    → Repair (Zod) → Validate (confidence-router)
    → Redact PII (guardrail-chain-guardrails)
    → Post Bill (QuickBooks REST API)
    ↑ All guarded by agent-budget-engine
```
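
The flow in the diagram can be read as a sequence of stages that share one spending limit. The following is a simplified sketch of that idea only: it models a single hard cap, not the soft/hard caps of `@reaatech/agent-budget-engine`, and the `Stage` interface and `runPipeline` function are hypothetical.

```typescript
// Each stage returns its output plus what it cost, so spend can be tracked centrally.
interface StageOutcome<T> {
  value: T;
  costUsd: number;
}

interface Stage<T> {
  name: string;
  run: (input: T) => Promise<StageOutcome<T>>;
}

interface PipelineResult<T> {
  value: T;
  spentUsd: number;
  completed: string[];
}

async function runPipeline<T>(
  input: T,
  stages: Stage<T>[],
  hardCapUsd: number,
): Promise<PipelineResult<T>> {
  let value = input;
  let spentUsd = 0;
  const completed: string[] = [];

  for (const stage of stages) {
    const outcome = await stage.run(value);
    value = outcome.value;
    spentUsd += outcome.costUsd;
    completed.push(stage.name);

    // Abort as soon as the cap is exceeded so later stages (such as posting) never run.
    if (spentUsd > hardCapUsd) {
      throw new Error(
        `Budget exceeded after "${stage.name}": spent ${spentUsd.toFixed(4)} USD, cap ${hardCapUsd} USD`,
      );
    }
  }
  return { value, spentUsd, completed };
}
```

Checking the budget after every stage, rather than once at the end, means an expensive extraction stops the document before anything is written to QuickBooks.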