# Azure AI Document Pipeline for Sage Intacct Invoice Automation
> Turns uploaded PDF invoices into structured Sage Intacct AR entries, using Azure OpenAI extraction and REAA repair to eliminate manual data entry.
A tutorial-style reference solution from [reaatech.com](https://reaatech.com) that demonstrates how to build production-grade AI systems with the `@reaatech/*` package family.
**Problem:** SMBs manually re-key paper and PDF invoices into Sage Intacct, a slow, error-prone process that delays month-end close and leads to mis-posted transactions.
## Architecture
The pipeline runs in 8 stages through a single `POST /api/invoices` endpoint:
1. **PDF text extraction** — `unpdf` extracts raw text from the uploaded PDF buffer
2. **Azure OpenAI extraction** — Raw text is sent to Azure OpenAI's chat completions API with a structured output prompt requesting JSON invoice fields
3. **JSON repair** — `@reaatech/structured-repair-core` repairs malformed LLM JSON (markdown fences, trailing commas, type coercion, extra hallucinated fields, fuzzy key matching)
4. **Confidence routing** — `@reaatech/confidence-router-core` evaluates per-field confidence and decides whether to auto-post (ROUTE), request human review (CLARIFY), or reject (FALLBACK)
5. **Sage Intacct posting** — Transforms the extracted invoice fields into the Sage Intacct AR invoice shape and POSTs the result, authenticating with OAuth2 client credentials
6. **LLM caching** — `@reaatech/llm-cache` with Redis avoids reprocessing identical PDFs (SHA-256 exact-match)
7. **Cost telemetry** — `@reaatech/llm-cost-telemetry` records per-invoice Azure OpenAI token spend
8. **Observability** — Langfuse tracing across pipeline stages (optional, fail-open)
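To make stages 3 and 4 more concrete, here is a minimal, dependency-free sketch. It does not call the `@reaatech/*` packages. The helper names (`stripFences`, `dropTrailingCommas`, `repairJson`, `routeByConfidence`), the 0.8 and 0.3 thresholds, and the choice to route on the lowest per-field confidence are all assumptions made for illustration.

```typescript
// Hypothetical stand-ins for stages 3 and 4; not the @reaatech/* implementations.
type Decision = "ROUTE" | "CLARIFY" | "FALLBACK";

// Build the fence marker programmatically so this file contains no literal fence.
const FENCE = "`".repeat(3);
const FENCED = new RegExp("^" + FENCE + "(?:json)?\\s*([\\s\\S]*?)\\s*" + FENCE + "$");

// Remove a surrounding markdown code fence if the model wrapped its JSON in one.
function stripFences(raw: string): string {
  const text = raw.trim();
  const match = FENCED.exec(text);
  return match?.[1] ?? text;
}

// Naive trailing-comma removal; it would also touch a ",}" inside a string value.
function dropTrailingCommas(text: string): string {
  return text.replace(/,\s*([}\]])/g, "$1");
}

function repairJson(raw: string): unknown {
  return JSON.parse(dropTrailingCommas(stripFences(raw)));
}

// Assumed policy: decide on the weakest field, with illustrative thresholds.
function routeByConfidence(
  fieldConfidence: Record<string, number>,
  thresholds = { route: 0.8, fallback: 0.3 },
): Decision {
  const scores = Object.values(fieldConfidence);
  if (scores.length === 0) return "FALLBACK"; // nothing extracted, nothing to post
  const weakest = Math.min(...scores);
  if (weakest >= thresholds.route) return "ROUTE";
  if (weakest < thresholds.fallback) return "FALLBACK";
  return "CLARIFY";
}
```

Under these assumed thresholds, an invoice whose weakest field scores 0.45 lands in `CLARIFY`, which corresponds to the human-review path.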
## Prerequisites
- Node.js >=22, pnpm 10.x
- Redis (for LLM cache backend)
- Azure OpenAI resource with a deployed model (e.g. gpt-4o-mini)
- Sage Intacct OAuth2 app credentials
- Langfuse project (optional — pipeline degrades gracefully)
## Quick Start
```bash
pnpm install
cp .env.example .env
# Fill in your credentials
pnpm dev # starts Next.js dev server
```
## API Reference
### `POST /api/invoices`
Upload a PDF invoice for processing.
**Request:** `multipart/form-data` with a `file` field containing the PDF.
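A client-side upload might look like the sketch below. It relies only on the standard `fetch`, `FormData`, and `Blob` globals available in Node.js 22 and browsers. The helper names `buildInvoiceForm` and `uploadInvoice` are hypothetical, and the base URL is whatever host the dev server runs on.

```typescript
// Hypothetical client helpers; only the endpoint path and the "file" field come from this README.
function buildInvoiceForm(pdf: Blob, filename: string): FormData {
  const form = new FormData();
  form.append("file", pdf, filename); // the route expects the PDF under "file"
  return form;
}

async function uploadInvoice(
  baseUrl: string,
  pdf: Blob,
  filename: string,
): Promise<{ status: number; body: unknown }> {
  const response = await fetch(`${baseUrl}/api/invoices`, {
    method: "POST",
    // Do not set Content-Type manually; fetch adds the multipart boundary itself.
    body: buildInvoiceForm(pdf, filename),
  });
  return { status: response.status, body: await response.json() };
}
```

For example, `await uploadInvoice("http://localhost:3000", pdfBlob, "invoice.pdf")`, assuming the dev server listens on the default Next.js port.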
**Success (200):**
```json
{ "status": "posted", "invoiceId": "AR-001", "confidence": 0.92, "costUsd": 0.015 }
```
**Review Required (422):**
```json
{ "status": "review_required", "confidence": 0.45, "message": "Invoice flagged for manual review due to low extraction confidence" }
```
**Invalid Input (400):**
```json
{ "error": "invalid_file_type", "expected": "application/pdf", "received": "text/plain" }
```
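A validation step that produces this 400 payload could be sketched as follows. The function names are hypothetical, and the README does not say whether the route also inspects file contents. The `looksLikePdf` check is an optional hardening idea based on the fact that PDF files begin with the bytes `%PDF-`.

```typescript
// Hypothetical validators; the payload shape mirrors the 400 example above.
interface InvalidFileTypeError {
  error: "invalid_file_type";
  expected: "application/pdf";
  received: string;
}

// Returns the 400 payload for a non-PDF upload, or null when the declared type is acceptable.
function checkUploadType(declaredType: string): InvalidFileTypeError | null {
  // Ignore MIME parameters such as "; charset=binary" and compare case-insensitively.
  const normalized = (declaredType.split(";")[0] ?? "").trim().toLowerCase();
  if (normalized === "application/pdf") return null;
  return { error: "invalid_file_type", expected: "application/pdf", received: declaredType };
}

const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d]; // the bytes of "%PDF-"

// Optional content check: a declared MIME type can be wrong, the header bytes are harder to fake by accident.
function looksLikePdf(bytes: Uint8Array): boolean {
  return PDF_MAGIC.every((byte, i) => bytes[i] === byte);
}
```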
**Server Error (500):**
```json
{ "status": "failed", "error": "Sage Intacct auth failed with status 401" }
```
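Callers can model the four example payloads above as a TypeScript union and branch on it. This is a sketch: the type name `InvoiceResponse` and the function `summarize` are hypothetical, and the union assumes the examples above are the complete set of response shapes.

```typescript
// Hypothetical client-side types derived from the example payloads in this section.
type InvoiceResponse =
  | { status: "posted"; invoiceId: string; confidence: number; costUsd: number }
  | { status: "review_required"; confidence: number; message: string }
  | { status: "failed"; error: string }
  | { error: "invalid_file_type"; expected: string; received: string };

function summarize(res: InvoiceResponse): string {
  // The 400 payload is the only shape without a "status" field.
  if (!("status" in res)) {
    return `Rejected upload: expected ${res.expected}, received ${res.received}`;
  }
  switch (res.status) {
    case "posted":
      return `Posted ${res.invoiceId} (confidence ${res.confidence}, cost $${res.costUsd})`;
    case "review_required":
      return `Needs review (confidence ${res.confidence}): ${res.message}`;
    case "failed":
      return `Failed: ${res.error}`;
  }
}
```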
## REAA Packages
| Package | Role | Key Exports |
|---|---|---|
| `@reaatech/structured-repair-core` | foundation | `repair()`, `repairOutput()`, `isValid()` |
| `@reaatech/confidence-router-core` | supporting | `DecisionEngine`, `mergeConfig()` |
| `@reaatech/llm-cache` | supporting | `CacheEngine`, `CacheResult` |
| `@reaatech/llm-cost-telemetry` | supporting | `generateId()`, `calculateCostFromTokens()`, `CostSpanSchema` |
| `@reaatech/media-pipeline-mcp-doc-extraction` | supporting | `createDocumentExtractionOperations()` |
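As an illustration of what per-invoice cost tracking involves, the arithmetic behind a token-based cost estimate is sketched below. This is not the `calculateCostFromTokens()` export, whose signature is not shown in this README. `estimateCostUsd`, `TokenUsage`, and `Rates` are hypothetical names, and the rates in the usage note are placeholders rather than real Azure OpenAI pricing.

```typescript
// Hypothetical cost arithmetic; not the @reaatech/llm-cost-telemetry API.
interface TokenUsage {
  promptTokens: number;
  completionTokens: number;
}

interface Rates {
  inputUsdPerMillion: number; // price per 1,000,000 prompt tokens
  outputUsdPerMillion: number; // price per 1,000,000 completion tokens
}

function estimateCostUsd(usage: TokenUsage, rates: Rates): number {
  const inputCost = usage.promptTokens * rates.inputUsdPerMillion;
  const outputCost = usage.completionTokens * rates.outputUsdPerMillion;
  return (inputCost + outputCost) / 1_000_000;
}
```

With placeholder rates of 1 USD and 2 USD per million tokens, an invoice that uses 2,000 prompt tokens and 500 completion tokens would cost (2000 × 1 + 500 × 2) / 1,000,000 = 0.003 USD.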
## Project Layout
```
app/
  api/invoices/route.ts    Next.js API route handler
  page.tsx                 Landing page
src/
  lib/
    text-extraction.ts     PDF text extraction (unpdf wrapper)
    sage-intacct.ts        Sage Intacct REST client (OAuth2 + AR invoice)
  services/
    azure-openai.ts        Azure OpenAI chat completions wrapper
    extraction.ts          Composes text extraction + LLM extraction
    repair.ts              JSON repair via structured-repair-core
    confidence-router.ts   Confidence evaluation via confidence-router-core
    cache.ts               LLM cache with Redis (via @reaatech/llm-cache)
    cost-telemetry.ts      Cost tracking via @reaatech/llm-cost-telemetry
    observability.ts       Langfuse tracing wrapper
    pipeline.ts            Orchestrator composing all pipeline stages
  types/
    config.ts              Pipeline configuration + env loading
    invoice.ts             Invoice schema (Zod) + result types
    sage-intacct.ts        Sage Intacct API types
    errors.ts              Discriminated error classes
tests/                     Vitest suite (mirrors src/)
packages/                  API references for every dependency
DEV_PLAN.md                Build plan for this recipe
```
## License
MIT — see [LICENSE](./LICENSE).