Azure AI Document Pipeline for Sage Intacct Invoice Automation

Turns uploaded PDF invoices into structured Sage Intacct AR entries, using Azure OpenAI extraction and REAA repair to eliminate manual data entry.

azure-ai document-pipeline invoice-automation sage-intacct structured-repair-core confidence-router llm-cache cost-telemetry express

The problem

SMBs manually re‑key paper and PDF invoices into Sage Intacct, a slow, error‑prone process that delays month‑end close and leads to mis‑posted transactions.

Intro

This tutorial builds a Next.js API endpoint that turns uploaded PDF invoices into structured Sage Intacct AR entries. You’ll compose seven pipeline stages: PDF text extraction (unpdf), Azure OpenAI field extraction, JSON repair (structured-repair-core), confidence routing (confidence-router-core), Sage Intacct posting via OAuth2, LLM caching with Redis (llm-cache), and cost telemetry (llm-cost-telemetry). By the end, you’ll have a document automation pipeline that auto-posts high-confidence invoices and flags low-confidence ones for human review.
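The final routing decision (auto-post or flag for review) can be sketched as a small pure function. This is only an illustration: `routeInvoice`, `RoutingDecision`, and the 0.85 default threshold are assumptions made here, not the confidence-router-core API.

```typescript
// Hypothetical routing sketch; the names and the default threshold are assumptions.
type RoutingDecision =
  | { action: "auto_post"; confidence: number }
  | { action: "human_review"; confidence: number; reason: string };

function routeInvoice(confidence: number, threshold = 0.85): RoutingDecision {
  // Treat NaN or out-of-range scores as untrustworthy instead of guessing.
  if (!Number.isFinite(confidence) || confidence < 0 || confidence > 1) {
    return { action: "human_review", confidence, reason: "invalid confidence score" };
  }
  // Scores at or above the threshold are posted without human involvement.
  if (confidence >= threshold) {
    return { action: "auto_post", confidence };
  }
  return {
    action: "human_review",
    confidence,
    reason: `confidence ${confidence} is below threshold ${threshold}`,
  };
}
```

Returning a tagged union rather than a boolean lets the caller attach a review reason to the flagged invoice without a second lookup.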

Prerequisites

Node.js >= 22 and pnpm 10.x installed
Redis running locally (default: redis://localhost:6379)
Azure OpenAI resource with an API key, endpoint URL, and deployment name
Sage Intacct OAuth2 credentials (client ID, client secret, company ID)
Langfuse account (optional, used for tracing; credentials can stay as placeholders)
Familiarity with TypeScript, Next.js App Router, and REST APIs
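Before starting, it can help to confirm that the credentials listed above are actually present in the environment. The sketch below is one way to do that; the environment variable names (`AZURE_OPENAI_API_KEY`, `SAGE_INTACCT_CLIENT_ID`, `REDIS_URL`, and so on) are assumptions chosen for illustration and may differ from the names the recipe uses.

```typescript
// Hypothetical environment check; every variable name below is an assumption.
const REQUIRED_ENV_VARS = [
  "AZURE_OPENAI_API_KEY",
  "AZURE_OPENAI_ENDPOINT",
  "AZURE_OPENAI_DEPLOYMENT",
  "SAGE_INTACCT_CLIENT_ID",
  "SAGE_INTACCT_CLIENT_SECRET",
  "SAGE_INTACCT_COMPANY_ID",
] as const;

type EnvSource = Record<string, string | undefined>;

function findMissingEnvVars(env: EnvSource): string[] {
  // An unset, empty, or whitespace-only value counts as missing.
  return REQUIRED_ENV_VARS.filter((name) => (env[name] ?? "").trim() === "");
}

function resolveRedisUrl(env: EnvSource): string {
  // Fall back to the local default listed in the prerequisites.
  return env.REDIS_URL?.trim() || "redis://localhost:6379";
}
```

In a Node.js process you would call `findMissingEnvVars(process.env)` at startup and fail fast if the returned list is not empty.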

Step 1: Scaffold the Next.js project

Start by creating a new Next.js project and installing the pipeline’s dependencies. The package.json pins every dependency so you get repeatable builds.

terminal

pnpm create next-app@latest azure-invoice-pipeline --typescript --tailwind --eslint

Example artifact

A complete, working implementation of this recipe, downloadable as a zip or browsable file by file. It is generated by our build pipeline and tested before publishing.

188 kB · 99 tests · 97.6% coverage · vitest passing

SHA-256: 02ef53af4d05325999840de629d478f5ed80ff1d1d631fd2e2598d7a003d0d2e

// Typed error classes, one per pipeline stage (imported elsewhere as "../types/errors.js").
// Each class carries a literal `code` that callers can use to tell the errors apart.

export class PdfExtractionError extends Error {
  readonly code = "PDF_EXTRACTION_ERROR" as const;
  readonly cause?: unknown;

  constructor(message: string, cause?: unknown) {
    super(message);
    this.name = "PdfExtractionError";
    this.cause = cause;
  }
}

export class AzureOpenAiError extends Error {
  readonly code = "AZURE_OPENAI_ERROR" as const;
  readonly statusCode?: number;
  readonly responseBody?: string;

  constructor(message: string, statusCode?: number, responseBody?: string) {
    super(message);
    this.name = "AzureOpenAiError";
    this.statusCode = statusCode;
    this.responseBody = responseBody;
  }
}

export class RepairFailedError extends Error {
  readonly code = "REPAIR_FAILED" as const;
  // Whatever could be salvaged from the model output, plus per-field problems.
  readonly partialData?: unknown;
  readonly fieldErrors?: Array<{ path: string; message: string }>;

  constructor(
    message: string,
    partialData?: unknown,
    fieldErrors?: Array<{ path: string; message: string }>,
  ) {
    super(message);
    this.name = "RepairFailedError";
    this.partialData = partialData;
    this.fieldErrors = fieldErrors;
  }
}

export class LowConfidenceError extends Error {
  readonly code = "LOW_CONFIDENCE" as const;
  readonly confidence: number;

  constructor(confidence: number) {
    super(`Low confidence: ${confidence}`);
    this.name = "LowConfidenceError";
    this.confidence = confidence;
  }
}

export class SageIntacctError extends Error {
  readonly code = "SAGE_INTACCT_ERROR" as const;
  readonly statusCode?: number;
  readonly responseBody?: string;
  readonly endpoint?: string;

  constructor(
    message: string,
    statusCode?: number,
    responseBody?: string,
    endpoint?: string,
  ) {
    super(message);
    this.name = "SageIntacctError";
    this.statusCode = statusCode;
    this.responseBody = responseBody;
    this.endpoint = endpoint;
  }
}

export class CacheConnectionError extends Error {
  readonly code = "CACHE_CONNECTION_ERROR" as const;

  constructor(message: string, options?: ErrorOptions) {
    super(message, options);
    this.name = "CacheConnectionError";
  }
}
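Because every error class above exposes a literal `code`, an API route can translate failures into HTTP statuses with a single lookup. The sketch below shows one possible mapping; `toHttpStatus` and the specific status codes are assumptions made for illustration, not something the recipe prescribes.

```typescript
// Hypothetical mapping from error `code` values to HTTP statuses.
// The status choices are assumptions, not taken from the recipe.
const STATUS_BY_CODE = new Map<string, number>([
  ["PDF_EXTRACTION_ERROR", 422], // the upload could not be read as a PDF
  ["AZURE_OPENAI_ERROR", 502], // upstream model call failed
  ["REPAIR_FAILED", 422], // model output could not be repaired
  ["LOW_CONFIDENCE", 202], // accepted, but queued for human review
  ["SAGE_INTACCT_ERROR", 502], // upstream accounting call failed
  ["CACHE_CONNECTION_ERROR", 503], // Redis unavailable
]);

function toHttpStatus(err: unknown): number {
  // Anything that is not an Error instance is treated as an internal failure.
  if (!(err instanceof Error)) {
    return 500;
  }
  const code: unknown = (err as Error & { code?: unknown }).code;
  return typeof code === "string" ? (STATUS_BY_CODE.get(code) ?? 500) : 500;
}
```

Looking up by `code` rather than by `instanceof` keeps the mapping usable even when errors cross module or bundle boundaries.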

import { type PipelineConfig } from "../types/config.js";
import { AzureOpenAiError } from "../types/errors.js";

const SYSTEM_PROMPT =
  "Extract invoice fields from this text as JSON. " +
  "Return a JSON object with keys: invoice_number (string), invoice_date (string), " +
  "due_date (string), vendor_name (string), vendor_tax_id (string), subtotal (number), " +
  "tax (number), total (number), is_paid (boolean), " +
  "line_items (array of objects with description, quantity, unit_price, amount).";

// Backoff delays applied before the second, third, and fourth attempts.
const RETRY_DELAYS_MS = [1_000, 2_000, 4_000];
// HTTP statuses treated as transient and therefore retried.
const STATUS_RETRY = new Set([429, 500, 503]);

export type AzureOpenAiCostCallback = (tokens: {
  inputTokens: number;
  outputTokens: number;
}) => void;

function delay(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

export async function extractInvoiceFields(
  rawText: string,
  config: PipelineConfig,
  onCost?: AzureOpenAiCostCallback,
): Promise<string> {
  // Nothing to extract: return an empty JSON object without calling the API.
  if (rawText.length === 0) {
    return "{}";
  }

  const url = `${config.azureOpenAiEndpoint}/openai/deployments/${config.azureOpenAiDeployment}/chat/completions?api-version=${config.azureOpenAiApiVersion}`;
  const body = {
    messages: [
      { role: "system", content: SYSTEM_PROMPT },
      { role: "user", content: rawText },
    ],
    response_format: { type: "json_object" as const },
  };

  let lastError: unknown;
  // The first attempt runs immediately; later attempts wait for the backoff delay.
  for (const delayMs of [0, ...RETRY_DELAYS_MS]) {
    if (delayMs > 0) {
      await delay(delayMs);
    }

    const response = await fetch(url, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // Key-based auth uses the `api-key` header; `Authorization: Bearer`
        // is reserved for Microsoft Entra ID tokens.
        "api-key": config.azureOpenAiApiKey,
      },
      body: JSON.stringify(body),
    });

    if (!response.ok) {
      const error = new AzureOpenAiError(
        `Azure OpenAI request failed with status ${response.status}`,
        response.status,
        await response.text(),
      );
      // Transient statuses are remembered and retried; anything else fails fast.
      if (STATUS_RETRY.has(response.status)) {
        lastError = error;
        continue;
      }
      throw error;
    }

    const data = (await response.json()) as {
      choices: Array<{ message: { content: string } }>;
      usage?: { prompt_tokens: number; completion_tokens: number };
    };

    // Report token usage so the caller can track cost.
    if (data.usage && onCost) {
      onCost({
        inputTokens: data.usage.prompt_tokens,
        outputTokens: data.usage.completion_tokens,
      });
    }

    // Guard against an empty `choices` array instead of crashing on index access.
    const content = data.choices[0]?.message.content;
    if (typeof content !== "string") {
      throw new AzureOpenAiError(
        "Azure OpenAI response contained no message content",
        response.status,
      );
    }
    return content;
  }

  // All attempts hit a retryable status: surface the last one.
  throw lastError instanceof AzureOpenAiError
    ? lastError
    : new AzureOpenAiError("Azure OpenAI request failed after retries");
}
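The `onCost` callback above receives input and output token counts for each successful call. A minimal sketch of how a caller might accumulate those counts into a running cost follows. `createCostTracker` and the `PriceTable` shape are hypothetical, the prices are placeholders supplied by the caller, and none of this is the llm-cost-telemetry API.

```typescript
// Token shape matches the argument passed to the onCost callback.
type TokenUsage = { inputTokens: number; outputTokens: number };

// Hypothetical per-1,000-token prices; real rates depend on the deployment.
type PriceTable = { inputPer1k: number; outputPer1k: number };

function createCostTracker(prices: PriceTable) {
  let inputTokens = 0;
  let outputTokens = 0;

  return {
    // Pass this method as the onCost argument.
    record(usage: TokenUsage): void {
      inputTokens += usage.inputTokens;
      outputTokens += usage.outputTokens;
    },
    // Running totals plus the cost implied by the supplied price table.
    totals(): { inputTokens: number; outputTokens: number; cost: number } {
      const cost =
        (inputTokens / 1000) * prices.inputPer1k +
        (outputTokens / 1000) * prices.outputPer1k;
      return { inputTokens, outputTokens, cost };
    },
  };
}
```

A caller would create one tracker per request (or per batch), pass `tracker.record` as `onCost`, and read `tracker.totals()` once the pipeline finishes.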

import Langfuse, { type LangfuseTraceClient } from "langfuse";
import { type PipelineConfig } from "../types/config.js";

// Maps each adapter handed to callers back to the underlying Langfuse trace.
const realTraces = new WeakMap<TraceLike, LangfuseTraceClient>();

export interface TraceLike {
  addTags?(tags: string[]): void;
}

export interface ObservabilityClient {
  createTrace(name: string, traceId: string): TraceLike;
  wrapWithSpan<T>(trace: TraceLike, name: string, fn: () => Promise<T>): Promise<T>;
}

export function createObservabilityClient(config: PipelineConfig): ObservabilityClient {
  try {
    const langfuse = new Langfuse({
      publicKey: config.langfuse.publicKey,
      secretKey: config.langfuse.secretKey,
      baseUrl: config.langfuse.host,
    });

    return {
      createTrace(name: string, traceId: string): TraceLike {
        const realTrace = langfuse.trace({
          name,
          id: traceId,
          metadata: {
            recipe: "azure-ai-document-pipeline-for-sage-intacct-invoice-automation",
          },
        });
        const adapted: TraceLike = {
          addTags(tags: string[]): void {
            realTrace.update({ tags });
          },
        };
        realTraces.set(adapted, realTrace);
        return adapted;
      },

      async wrapWithSpan<T>(
        trace: TraceLike,
        name: string,
        fn: () => Promise<T>,
      ): Promise<T> {
        const realTrace = realTraces.get(trace);
        // Traces not created by this client (for example the no-op trace) run untraced.
        if (!realTrace) {
          return fn();
        }
        const span = realTrace.span({ name });
        try {
          return await fn();
        } finally {
          // End the span whether fn resolved or threw.
          span.end();
        }
      },
    };
  } catch {
    // Tracing is optional: fall back to a client that does nothing.
    console.warn("Langfuse init failed, using no-op observability");
    return createNoopClient();
  }
}

function createNoopClient(): ObservabilityClient {
  return {
    createTrace: () => noopTrace,
    wrapWithSpan: async <T>(
      _trace: TraceLike,
      _name: string,
      fn: () => Promise<T>,
    ): Promise<T> => fn(),
  };
}

const noopTrace: TraceLike = {
  addTags: () => {},
};
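Since the pipeline only depends on the `ObservabilityClient` interface, stages can be exercised in tests without a Langfuse account. The sketch below repeats the two interfaces so it stands alone; `createRecordingClient`, `runStages`, and the span names are hypothetical and exist only to illustrate the wrapping pattern.

```typescript
// Interfaces repeated from the module above so this sketch is self-contained.
interface TraceLike {
  addTags?(tags: string[]): void;
}

interface ObservabilityClient {
  createTrace(name: string, traceId: string): TraceLike;
  wrapWithSpan<T>(trace: TraceLike, name: string, fn: () => Promise<T>): Promise<T>;
}

// Hypothetical in-memory client that records span names as they finish.
function createRecordingClient(spans: string[]): ObservabilityClient {
  return {
    createTrace: () => ({ addTags: () => {} }),
    async wrapWithSpan<T>(
      _trace: TraceLike,
      name: string,
      fn: () => Promise<T>,
    ): Promise<T> {
      try {
        return await fn();
      } finally {
        // Record the span even if the wrapped stage throws.
        spans.push(name);
      }
    },
  };
}

// Hypothetical pipeline fragment: each stage runs inside its own span.
async function runStages(client: ObservabilityClient, traceId: string): Promise<string> {
  const trace = client.createTrace("invoice-pipeline", traceId);
  const text = await client.wrapWithSpan(trace, "pdf-extract", async () => "raw invoice text");
  return client.wrapWithSpan(trace, "llm-extract", async () => text.toUpperCase());
}
```

The stand-in stage bodies return fixed strings; in the real pipeline they would call the extraction and posting services.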

Intro

Prerequisites

Step 1: Scaffold the Next.js project

Step 2: Configure environment variables

Step 3: Create the pipeline configuration loader

Step 4: Define invoice schemas, API types, and error classes

Step 5: Build the PDF text extraction module

Step 6: Build the Azure OpenAI structured extraction service

Step 7: Build the extraction orchestrator

Step 8: Build the JSON repair service

Step 9: Build the confidence router

Step 10: Build the LLM cache with Redis

Step 11: Build the cost telemetry service

Step 12: Build the Sage Intacct REST client

Step 13: Build the pipeline orchestrator

Step 14: Build the Next.js API route

Step 15: Build the landing page

Step 16: Run the tests

Step 17: Add the Langfuse observability service (optional)

Step 18: Wire up the entry point

Next steps