# Azure AI Document Pipeline for Sage Intacct Invoice Automation
 
> Turns uploaded PDF invoices into structured Sage Intacct AR entries, using Azure OpenAI extraction and REAA repair to eliminate manual data entry.
 
A tutorialized reference solution from [reaatech.com](https://reaatech.com), demonstrating how to build production-grade AI systems with the `@reaatech/*` package family.
 
**Problem:** SMBs manually re-key paper and PDF invoices into Sage Intacct, a slow, error-prone process that delays month-end close and leads to mis-posted transactions.
 
## Architecture
 
The pipeline runs in 8 stages through a single `POST /api/invoices` endpoint:
 
1. **PDF text extraction:** `unpdf` extracts raw text from the uploaded PDF buffer
2. **Azure OpenAI extraction:** Raw text is sent to Azure OpenAI's chat completions API with a structured output prompt requesting JSON invoice fields
3. **JSON repair:** `@reaatech/structured-repair-core` repairs malformed LLM JSON (markdown fences, trailing commas, type coercion, extra hallucinated fields, fuzzy key matching)
4. **Confidence routing:** `@reaatech/confidence-router-core` evaluates per-field confidence and decides whether to auto-post (ROUTE), request human review (CLARIFY), or reject (FALLBACK)
5. **Sage Intacct posting:** Transforms extracted invoice fields into Sage Intacct AR invoice shape and POSTs via OAuth2 client credentials
6. **LLM caching:** `@reaatech/llm-cache` with Redis avoids reprocessing identical PDFs (SHA-256 exact-match)
7. **Cost telemetry:** `@reaatech/llm-cost-telemetry` records per-invoice Azure OpenAI token spend
8. **Observability:** Langfuse tracing across pipeline stages (optional, fail-open)
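
To make the three-way decision in stage 4 concrete, here is a minimal sketch in plain TypeScript. It does not use or reproduce the `@reaatech/confidence-router-core` API. The threshold values, the `decide` and `overallConfidence` names, and the choice to score an invoice by its weakest field are all assumptions made for illustration.

```typescript
// Illustrative only: not the confidence-router-core implementation.
type Decision = "ROUTE" | "CLARIFY" | "FALLBACK";

interface RoutingThresholds {
  autoPost: number; // score at or above this: post automatically (ROUTE)
  review: number;   // score at or above this, but below autoPost: human review (CLARIFY)
}

// Hypothetical thresholds, not taken from the project configuration.
const DEFAULT_THRESHOLDS: RoutingThresholds = { autoPost: 0.85, review: 0.4 };

// Assumption: the invoice is only as trustworthy as its least confident field.
function overallConfidence(fieldConfidence: Record<string, number>): number {
  const values = Object.values(fieldConfidence);
  return values.length === 0 ? 0 : Math.min(...values);
}

function decide(
  fieldConfidence: Record<string, number>,
  thresholds: RoutingThresholds = DEFAULT_THRESHOLDS,
): Decision {
  const score = overallConfidence(fieldConfidence);
  if (score >= thresholds.autoPost) return "ROUTE";
  if (score >= thresholds.review) return "CLARIFY";
  return "FALLBACK"; // too uncertain to post or to ask for a quick review
}
```

Taking the minimum rather than the mean keeps one badly extracted field, such as the total, from being masked by several confident ones.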
 
## Prerequisites
 
- Node.js >=22, pnpm 10.x
- Redis (for LLM cache backend)
- Azure OpenAI resource with a deployed model (e.g. gpt-4o-mini)
- Sage Intacct OAuth2 app credentials
- Langfuse project (optional — pipeline degrades gracefully)
 
## Quick Start
 
```bash
pnpm install
cp .env.example .env
# Fill in your credentials
pnpm dev             # starts Next.js dev server
```
 
## API Reference
 
### `POST /api/invoices`
 
Upload a PDF invoice for processing.
 
**Request:** `multipart/form-data` with a `file` field containing the PDF.
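
A client call matching this request shape might look like the sketch below. It relies on the standard `fetch`, `FormData`, and `Blob` globals available in Node.js 22 and in browsers. The helper names `buildInvoiceUpload` and `uploadInvoice` and the `baseUrl` parameter are hypothetical and not part of this repository.

```typescript
// Builds the multipart body: a single "file" field holding the PDF.
function buildInvoiceUpload(pdf: Blob, filename: string): FormData {
  const form = new FormData();
  form.append("file", pdf, filename);
  return form;
}

// Sends the PDF to the endpoint and returns the HTTP status with the parsed JSON body.
async function uploadInvoice(
  baseUrl: string,
  pdf: Blob,
  filename: string,
): Promise<{ httpStatus: number; body: unknown }> {
  const response = await fetch(new URL("/api/invoices", baseUrl), {
    method: "POST",
    // No Content-Type header here: fetch sets it, including the multipart boundary.
    body: buildInvoiceUpload(pdf, filename),
  });
  return { httpStatus: response.status, body: await response.json() };
}
```

Assuming the dev server listens on `http://localhost:3000`, a call could be `await uploadInvoice("http://localhost:3000", pdfBlob, "invoice.pdf")`, where `pdfBlob` is a `Blob` created with type `application/pdf`.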
 
**Success (200):**
```json
{ "status": "posted", "invoiceId": "AR-001", "confidence": 0.92, "costUsd": 0.015 }
```
 
**Review Required (422):**
```json
{ "status": "review_required", "confidence": 0.45, "message": "Invoice flagged for manual review due to low extraction confidence" }
```
 
**Invalid Input (400):**
```json
{ "error": "invalid_file_type", "expected": "application/pdf", "received": "text/plain" }
```
 
**Server Error (500):**
```json
{ "status": "failed", "error": "Sage Intacct auth failed with status 401" }
```
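
For callers, the four response bodies above can be modelled as one TypeScript union. This is a sketch derived only from the example payloads shown here. The type name `InvoiceApiResponse` and the `summarize` helper are hypothetical, and the real route may return additional fields or error codes.

```typescript
// Shapes inferred from the example payloads in this section, nothing more.
type InvoiceApiResponse =
  | { status: "posted"; invoiceId: string; confidence: number; costUsd: number }
  | { status: "review_required"; confidence: number; message: string }
  | { status: "failed"; error: string }
  | { error: string; expected: string; received: string }; // 400: no "status" field

// Turns a response body into a one-line, human-readable summary.
function summarize(body: InvoiceApiResponse): string {
  if (!("status" in body)) {
    return `rejected (${body.error}): expected ${body.expected}, received ${body.received}`;
  }
  switch (body.status) {
    case "posted":
      return `posted ${body.invoiceId} (confidence ${body.confidence}, cost $${body.costUsd})`;
    case "review_required":
      return `needs review (confidence ${body.confidence}): ${body.message}`;
    case "failed":
      return `failed: ${body.error}`;
  }
}
```

Checking for the `status` key first matters because the 400 body in the example carries only `error`, `expected`, and `received`.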
 
## REAA Packages
 
| Package | Role | Key Exports |
|---|---|---|
| `@reaatech/structured-repair-core` | foundation | `repair()`, `repairOutput()`, `isValid()` |
| `@reaatech/confidence-router-core` | supporting | `DecisionEngine`, `mergeConfig()` |
| `@reaatech/llm-cache` | supporting | `CacheEngine`, `CacheResult` |
| `@reaatech/llm-cost-telemetry` | supporting | `generateId()`, `calculateCostFromTokens()`, `CostSpanSchema` |
| `@reaatech/media-pipeline-mcp-doc-extraction` | supporting | `createDocumentExtractionOperations()` |
 
## Project layout
 
```
app/
  api/invoices/route.ts    Next.js API route handler
  page.tsx                 Landing page
src/
  lib/
    text-extraction.ts     PDF text extraction (unpdf wrapper)
    sage-intacct.ts        Sage Intacct REST client (OAuth2 + AR invoice)
  services/
    azure-openai.ts        Azure OpenAI chat completions wrapper
    extraction.ts          Composes text extraction + LLM extraction
    repair.ts              JSON repair via structured-repair-core
    confidence-router.ts   Confidence evaluation via confidence-router-core
    cache.ts               LLM cache with Redis (via @reaatech/llm-cache)
    cost-telemetry.ts      Cost tracking via @reaatech/llm-cost-telemetry
    observability.ts       Langfuse tracing wrapper
    pipeline.ts            Orchestrator composing all pipeline stages
  types/
    config.ts              Pipeline configuration + env loading
    invoice.ts             Invoice schema (Zod) + result types
    sage-intacct.ts        Sage Intacct API types
    errors.ts              Discriminated error classes
tests/                     Vitest suite (mirrors src/)
packages/                  API references for every dependency
DEV_PLAN.md                Build plan for this recipe
```
 
## License
 
MIT — see [LICENSE](./LICENSE).