SMB finance teams spend hours manually extracting data from supplier PDF invoices and entering it into Sage Intacct, leading to errors and delayed payments.
This tutorial walks you through building an automated invoice processing pipeline for Sage Intacct. You’ll build a Next.js API route that accepts a PDF invoice, extracts text with pdfjs-dist, parses line-item data with Anthropic’s Claude, repairs and validates the JSON output with @reaatech/structured-repair-core, routes the result based on confidence scores with @reaatech/confidence-router, and pushes the approved bill into Sage Intacct’s REST API. Cost tracking and budget enforcement are handled by @reaatech/agent-budget-engine and @reaatech/llm-cost-telemetry, with optional Langfuse observability wired through Next.js instrumentation.
Prerequisites
Node.js 22+ and pnpm 10+ installed
A Sage Intacct developer account (or the API credentials for one)
Expected output: Your package.json now lists each dependency at its exact version (no ^ or ~ prefixes). The scripts section includes typecheck, lint, and test.
Step 2: Configure environment variables
Create .env.example with placeholder values for every variable your pipeline reads, then copy it to .env and fill in your real credentials:
```env
# Env vars used by anthropic-document-pipeline-for-sage-intacct-smb-invoice-processing
# Keep placeholders only; never commit real values.
NODE_ENV=development

# Anthropic
ANTHROPIC_API_KEY=<your-anthropic-api-key>

# Sage Intacct
SAGE_INTACCT_COMPANY_ID=<your-company-id>
SAGE_INTACCT_SENDER_ID=<your-sender-id>
SAGE_INTACCT_SENDER_PASSWORD=<your-sender-password>
SAGE_INTACCT_USER_ID=<your-user-id>
SAGE_INTACCT_USER_PASSWORD=<your-user-password>
SAGE_INTACCT_BASE_URL=<https://your-sage-intacct-api-url>

# Langfuse
LANGFUSE_PUBLIC_KEY=<your-langfuse-public-key>
LANGFUSE_SECRET_KEY=<your-langfuse-secret-key>
LANGFUSE_HOST=<https://your-langfuse-host>

# Unstructured
UNSTRUCTURED_API_KEY=<your-unstructured-api-key>
```
Expected output: A .env.example file in the project root listing 12 environment variables (NODE_ENV plus 11 credentials and endpoints) with placeholder values. The real .env file you create for local development holds the actual keys and is never committed.
Step 3: Enable the instrumentation hook in Next.js config
Because you’ll add src/instrumentation.ts later (for Langfuse init), tell Next.js to run it at startup. Open next.config.ts and set the experimental flag:
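A minimal sketch of that file, mirroring the step as written. Two hedges apply: a TypeScript `next.config.ts` is only read from Next.js 15 onward, and Next.js 15 loads `instrumentation.ts` by default, so on that version the flag is redundant and Next may warn that it can be removed. The object is left untyped because newer `NextConfig` typings may no longer declare this key.

```typescript
// next.config.ts
// Opt in to the instrumentation hook so register() in src/instrumentation.ts
// runs at server startup. Needed on Next.js 13.2 to 14.x; default on 15+.
const nextConfig = {
  experimental: {
    instrumentationHook: true,
  },
};

export default nextConfig;
```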
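Expected output: The config exports an object with experimental.instrumentationHook set to true. On Next.js 13.2 through 14.x, the register() function in src/instrumentation.ts is dead code without this flag. Next.js 15, the first release that reads a TypeScript next.config.ts, made the instrumentation hook stable and loads it by default, so there the flag is redundant.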
Step 4: Create the Zod schemas for invoice data
Define the shape of your invoice data with Zod. These schemas validate both the raw output from Claude and the final result returned to the caller. Create src/schemas/invoiceSchema.ts:
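The schema file itself isn't reproduced in this section. Here is a minimal sketch assuming Zod; the field names (`invoiceNumber`, `vendorName`, `poNumber`, `totalAmount`, and so on) and the `action` values are illustrative guesses, not the project's actual shape. It does follow the stated contract: three schemas, three inferred types, and `currency` defaulting to "USD".

```typescript
import { z } from "zod";

// One billable line on the invoice (field names are illustrative).
export const InvoiceLineItemSchema = z.object({
  description: z.string().min(1),
  quantity: z.number().positive(),
  unitPrice: z.number().nonnegative(),
  amount: z.number(),
});

// The invoice as extracted from Claude's output.
export const InvoiceSchema = z.object({
  invoiceNumber: z.string().min(1),
  vendorName: z.string().min(1),
  invoiceDate: z.string(), // ISO date string, e.g. "2024-05-01"
  dueDate: z.string().optional(),
  poNumber: z.string().optional(),
  // A missing currency falls back to USD.
  currency: z.string().length(3).default("USD"),
  lineItems: z.array(InvoiceLineItemSchema).min(1),
  totalAmount: z.number(),
});

// What the API route returns (action names are hypothetical).
export const ProcessingResultSchema = z.object({
  action: z.enum(["auto_approved", "needs_review", "rejected"]),
  confidence: z.number().min(0).max(1),
  invoice: InvoiceSchema.nullable(),
  sageBillId: z.string().optional(),
});

export type InvoiceLineItem = z.infer<typeof InvoiceLineItemSchema>;
export type InvoiceData = z.infer<typeof InvoiceSchema>;
export type ProcessingResult = z.infer<typeof ProcessingResultSchema>;
```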
Expected output: A file at src/schemas/invoiceSchema.ts with three schemas and three inferred types. The currency field defaults to "USD" when omitted.
Step 5: Define the shared TypeScript types for Sage Intacct
Create a types module for the Sage Intacct client — the credentials object, the vendor bill payload you’ll send, and the response you’ll receive. Create src/types/index.ts:
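The interfaces aren't listed here. A sketch of what four such interfaces could look like: the credential fields mirror the SAGE_INTACCT_* variables from Step 2, while every other field name, the fourth interface `SageIntacctTokenResponse`, and the `credentialsFromEnv` helper are assumptions rather than Sage Intacct's documented API shape.

```typescript
// Mirrors the SAGE_INTACCT_* variables from Step 2.
export interface SageIntacctCredentials {
  companyId: string;
  senderId: string;
  senderPassword: string;
  userId: string;
  userPassword: string;
  baseUrl: string;
}

// Hypothetical OAuth2 token response (standard RFC 6749 field names).
export interface SageIntacctTokenResponse {
  access_token: string;
  token_type: string;
  expires_in: number;
}

// Hypothetical vendor bill body sent to Sage Intacct.
export interface VendorBillPayload {
  vendorId: string;
  billNumber: string;
  billDate: string; // ISO date
  dueDate?: string;
  currency: string;
  lines: Array<{ glAccount: string; amount: number; memo?: string }>;
}

// Hypothetical response after a bill is created.
export interface VendorBillResponse {
  id: string;
  status: string;
}

// Hypothetical helper: build credentials from an env map, failing fast on gaps.
export function credentialsFromEnv(
  env: Record<string, string | undefined>,
): SageIntacctCredentials {
  const read = (key: string): string => {
    const value = env[key];
    if (!value) throw new Error(`Missing environment variable ${key}`);
    return value;
  };
  return {
    companyId: read("SAGE_INTACCT_COMPANY_ID"),
    senderId: read("SAGE_INTACCT_SENDER_ID"),
    senderPassword: read("SAGE_INTACCT_SENDER_PASSWORD"),
    userId: read("SAGE_INTACCT_USER_ID"),
    userPassword: read("SAGE_INTACCT_USER_PASSWORD"),
    baseUrl: read("SAGE_INTACCT_BASE_URL"),
  };
}
```

In the app you would call `credentialsFromEnv(process.env)` once at startup so a missing variable fails loudly instead of surfacing as a 401 later.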
Expected output: A file at src/types/index.ts with four interfaces that define the contract between your pipeline and Sage Intacct’s REST API.
Step 6: Build the PDF text extractor
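Use pdfjs-dist to read the embedded text layer from a PDF buffer. Scanned, image-only invoices have no text layer and come back empty or nearly so; those would need OCR first. Create src/services/pdfExtractor.ts: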
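```ts
import { getDocument } from "pdfjs-dist";
// If the default build misbehaves under plain Node.js, recent pdfjs-dist
// versions also ship a legacy build at "pdfjs-dist/legacy/build/pdf.mjs".

export class PdfExtractionError extends Error {
  constructor(message: string, options?: ErrorOptions) {
    super(message, options);
    this.name = "PdfExtractionError";
  }
}

export async function extractPdfText(buffer: Buffer): Promise<string> {
  // An empty upload has no pages to read.
  if (buffer.length === 0) {
    return "";
  }
  try {
    // pdfjs-dist wants a plain Uint8Array rather than a Node Buffer;
    // copying also keeps the caller's buffer intact.
    const data = new Uint8Array(buffer);
    const doc = await getDocument({ data }).promise;
    try {
      const pageTexts: string[] = [];
      // Pages are 1-indexed in pdfjs-dist.
      for (let i = 1; i <= doc.numPages; i++) {
        const page = await doc.getPage(i);
        const content = await page.getTextContent();
        // Marked-content items have no `str`; keep only text items.
        const text = content.items
          .map((item) => ("str" in item ? item.str : ""))
          .join(" ");
        pageTexts.push(text);
      }
      return pageTexts.join("\n\n").trim();
    } finally {
      // Release resources held by the loaded document.
      await doc.destroy();
    }
  } catch (cause) {
    throw new PdfExtractionError("Failed to extract PDF text", { cause });
  }
}
```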
Expected output: The file exports extractPdfText which takes a Buffer and returns the concatenated text of every page, separated by double newlines. An empty buffer returns ""; a corrupt PDF throws PdfExtractionError.
Step 7: Build the Anthropic invoice extractor
Call Claude to turn raw invoice text into structured JSON. Create src/services/anthropicExtractor.ts:
```ts
import Anthropic from "@anthropic-ai/sdk";

export class InvoiceExtractionError extends Error {
  constructor(message: string, options?: ErrorOptions) {
    super(message, options);
    this.name = "InvoiceExtractionError";
  }
}

// One client per module; the key comes from ANTHROPIC_API_KEY (Step 2).
const client = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY ?? "",
});

export async function extractInvoiceFromText(
  text: string,
): Promise<{ rawOutput: string; inputTokens: number; outputTokens: number }> {
  try {
    // Cap very long documents at 80,000 characters.
    const truncated = text.slice(0, 80000);
    const message = await client.messages.create({
      model: "claude-sonnet-4-6",
      max_tokens: 4096,
      // Note: this prompt mentions "the schema provided", but no schema is
      // sent with the request; describe the expected fields in the prompt
      // if you want the output to track InvoiceSchema more closely.
      system:
        "You are a financial data extraction specialist. Extract invoice fields as JSON matching the schema provided. Return ONLY valid JSON, no commentary.",
      messages: [
        {
          role: "user",
          content: `Extract invoice data from this document text:\n\n${truncated}`,
        },
      ],
    });
    // Use the first content block; a missing or non-text block yields "".
    const contentBlock = message.content[0];
    const rawOutput = contentBlock?.type === "text" ? contentBlock.text : "";
    return {
      rawOutput,
      inputTokens: message.usage.input_tokens,
      outputTokens: message.usage.output_tokens,
    };
  } catch (cause) {
    throw new InvoiceExtractionError("Anthropic API call failed", { cause });
  }
}
```
Expected output: A module that initializes the Anthropic client at module scope with your API key. The extractInvoiceFromText function truncates input at 80,000 characters, calls claude-sonnet-4-6, and returns the raw JSON text plus token counts. API errors throw InvoiceExtractionError.
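The system prompt above asks Claude to match "the schema provided", yet the request never includes one. One way to close that gap is to spell the expected keys out in the user message. The sketch below uses a hypothetical `buildExtractionPrompt` helper and an illustrative field list that you would keep in sync with your own `InvoiceSchema`:

```typescript
// Illustrative field descriptions; keep these aligned with your Zod schema.
const INVOICE_FIELDS: Record<string, string> = {
  invoiceNumber: "string, the supplier's invoice number",
  vendorName: "string, the supplier's legal name",
  invoiceDate: "string, ISO 8601 date (YYYY-MM-DD)",
  currency: "string, ISO 4217 code; omit if not stated",
  lineItems: "array of { description, quantity, unitPrice, amount }",
  totalAmount: "number, invoice grand total",
};

// Same truncation limit that extractInvoiceFromText applies.
const MAX_CHARS = 80_000;

export function buildExtractionPrompt(documentText: string): string {
  const fieldList = Object.entries(INVOICE_FIELDS)
    .map(([name, description]) => `- ${name}: ${description}`)
    .join("\n");
  const truncated = documentText.slice(0, MAX_CHARS);
  return [
    "Return a single JSON object with exactly these keys:",
    fieldList,
    "",
    "Document text:",
    truncated,
  ].join("\n");
}
```

You would pass the result as the user message `content` in place of the bare "Extract invoice data..." string.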
Step 8: Build the repair and validation service
Claude’s output may contain JSON syntax errors, extra fields, or type mismatches. The @reaatech/structured-repair-core package is designed to repair these before validation. Create src/services/repairService.ts:
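```ts
import {
  repair,
  repairOutput,
  isValid,
  analyzeInput,
  type RepairResult,
  UnrepairableError,
} from "@reaatech/structured-repair-core";
import { z } from "zod";

// Re-export so callers (e.g. the API route) can catch UnrepairableError.
export { isValid, analyzeInput, UnrepairableError };
export type { RepairResult };

// Repair raw model output and return data typed by the given schema.
export async function repairInvoiceData<TSchema extends z.ZodType>(
  schema: TSchema,
  raw: string,
): Promise<z.infer<TSchema>> {
  try {
    return await repair(schema, raw);
  } catch (err) {
    if (err instanceof UnrepairableError) {
      // Last resort: the raw text may already be schema-valid JSON.
      // If that fails too, rethrow the original UnrepairableError so callers
      // can still map it to a 422 instead of a generic SyntaxError/ZodError.
      try {
        const parsed: unknown = JSON.parse(raw);
        return schema.parse(parsed);
      } catch {
        throw err;
      }
    }
    throw err;
  }
}

// Same repair, but returns the full RepairResult (steps, errors, diagnostics).
export function repairInvoiceDataWithDiagnostics<TSchema extends z.ZodType>(
  schema: TSchema,
  raw: string,
): RepairResult<z.infer<TSchema>> {
  return repairOutput<TSchema>({ schema, input: raw });
}

// True when the data, serialized back to JSON, passes the schema.
export function validateInvoiceJSON(
  data: unknown,
  schema: z.ZodType,
): boolean {
  return isValid(schema, JSON.stringify(data));
}

// Inspect raw output without repairing it.
export function analyzeExtractionOutput(
  raw: string,
): ReturnType<typeof analyzeInput> {
  return analyzeInput(raw);
}
```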
Expected output: The service wraps @reaatech/structured-repair-core which applies six default strategies: strip fences, extract JSON, fix JSON syntax, coerce types, fuzzy-match keys, and remove extra fields. repairInvoiceData returns typed data or throws; repairInvoiceDataWithDiagnostics returns a full RepairResult with steps, errors, and field-level diagnostics.
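To make the first two strategies concrete, here is a dependency-free sketch of what "strip fences" and "extract JSON" typically do. It illustrates the idea only; it is not @reaatech/structured-repair-core's implementation, and `stripFences`, `extractJsonCandidate`, and `prepareForParse` are hypothetical names:

```typescript
// Remove a surrounding Markdown code fence (three backticks plus an
// optional language tag such as "json"); otherwise return the trimmed text.
export function stripFences(raw: string): string {
  const trimmed = raw.trim();
  const match = trimmed.match(/^`{3}[\w-]*[^\S\n]*\n([\s\S]*?)\n?`{3}$/);
  return match?.[1] ?? trimmed;
}

// Cut out the first balanced {...} object, ignoring braces inside strings.
export function extractJsonCandidate(text: string): string | null {
  const start = text.indexOf("{");
  if (start === -1) return null;
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}") {
      depth--;
      if (depth === 0) return text.slice(start, i + 1);
    }
  }
  // Unbalanced braces: leave it to a syntax-repair step.
  return null;
}

// Chain both steps, then parse; returns null when no object was found.
export function prepareForParse(raw: string): unknown {
  const candidate = extractJsonCandidate(stripFences(raw));
  return candidate === null ? null : JSON.parse(candidate);
}
```

Tracking string state matters: without it, a brace inside a value such as a line-item description would end the object early.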
Step 9: Build the confidence router service
Based on how clean the repaired data is, decide whether to auto-approve, flag for human review, or reject. Create src/services/confidenceRouterService.ts:
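The router code isn't shown in this section. As a stand-in, here is a dependency-free sketch of the same three-way decision using the thresholds given for this step (0.85 to route, 0.3 for fallback). It does not use the @reaatech/confidence-router API; the `RepairSummary` shape, the penalty weights in `scoreRepairQuality`, and the action names are assumptions for illustration.

```typescript
export type InvoiceAction = "auto_approve" | "human_review" | "reject";

// Hypothetical summary of what the repair step had to do.
export interface RepairSummary {
  succeeded: boolean;
  stepsApplied: number; // repair strategies that changed the text
  fieldErrors: number; // fields still failing validation
}

const ROUTE_THRESHOLD = 0.85;
const FALLBACK_THRESHOLD = 0.3;

// Assumed heuristic: start at 1.0, subtract a small penalty per repair
// step and a larger one per remaining field error, clamp to [0, 1].
export function scoreRepairQuality(summary: RepairSummary): number {
  if (!summary.succeeded) return 0;
  const score = 1 - 0.05 * summary.stepsApplied - 0.2 * summary.fieldErrors;
  return Math.min(1, Math.max(0, score));
}

// High confidence auto-approves, the middle band goes to a human,
// and anything below the fallback threshold is rejected.
export function decideAction(confidence: number): InvoiceAction {
  if (confidence >= ROUTE_THRESHOLD) return "auto_approve";
  if (confidence >= FALLBACK_THRESHOLD) return "human_review";
  return "reject";
}
```

With these weights, one repair step plus one field error scores 0.75, which lands in the human-review band.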
Expected output: The ConfidenceRouter is initialized with a route threshold of 0.85 and fallback threshold of 0.3. buildClassification converts repair quality into a confidence score, and mapDecisionType translates the router’s internal types into your public ProcessingResult actions.
Step 10: Build the budget engine service
Track and enforce spend limits on Claude API calls. Create src/services/budgetService.ts:
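The budget service isn't reproduced here, and this sketch does not rely on the @reaatech/agent-budget-engine API. Instead, it is a self-contained illustration of the policy the step describes: a hard limit, a soft cap at 80% that switches from Sonnet to Haiku, and a check before each call. The `SimpleBudget` class is hypothetical, and the Haiku model ID is an assumption.

```typescript
const PRIMARY_MODEL = "claude-sonnet-4-6"; // model used in Step 7
const FALLBACK_MODEL = "claude-haiku-4-5"; // assumed cheaper fallback ID

export type BudgetDecision =
  | { allowed: true; model: string; downgraded: boolean }
  | { allowed: false; reason: string };

export class SimpleBudget {
  private spentUsd = 0;
  private readonly limitUsd: number;
  private readonly softCapRatio: number;

  constructor(limitUsd: number, softCapRatio = 0.8) {
    this.limitUsd = limitUsd;
    this.softCapRatio = softCapRatio;
  }

  // Gate a call before it happens, given an estimated cost.
  check(estimatedCostUsd: number): BudgetDecision {
    if (this.spentUsd + estimatedCostUsd > this.limitUsd) {
      return { allowed: false, reason: `Budget of $${this.limitUsd} would be exceeded` };
    }
    // Past the soft cap, keep working but on the cheaper model.
    const overSoftCap = this.spentUsd >= this.limitUsd * this.softCapRatio;
    return {
      allowed: true,
      model: overSoftCap ? FALLBACK_MODEL : PRIMARY_MODEL,
      downgraded: overSoftCap,
    };
  }

  // Record the actual cost once the call completes.
  record(actualCostUsd: number): void {
    this.spentUsd += actualCostUsd;
  }

  get spent(): number {
    return this.spentUsd;
  }
}
```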
Expected output: The BudgetController is backed by a real SpendStore instance. definePipelineBudget sets a limit with a soft cap at 80% and auto-downgrade from Sonnet to Haiku. checkBudget lets you gate calls before they happen, and recordSpend logs each usage.
Step 11: Build the LLM cost telemetry service
Record every Claude invocation’s cost in an in-memory span array. Create src/services/telemetryService.ts:
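The telemetry module isn't listed here. A minimal sketch of the cost arithmetic, using the stated rates of $3 per million input tokens and $15 per million output tokens; the `CostSpan` fields are assumptions. For example, 10,000 input and 2,000 output tokens cost 10,000 × 3 / 1,000,000 + 2,000 × 15 / 1,000,000 = $0.03 + $0.03 = $0.06.

```typescript
const INPUT_USD_PER_MILLION = 3;
const OUTPUT_USD_PER_MILLION = 15;

// Hypothetical span shape; adjust to whatever your telemetry needs.
export interface CostSpan {
  invoiceFile: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
  recordedAt: Date;
}

// In-memory store; contents are lost on restart.
const costSpans: CostSpan[] = [];

export function createInvoiceCostSpan(
  invoiceFile: string,
  model: string,
  inputTokens: number,
  outputTokens: number,
): CostSpan {
  const costUsd =
    (inputTokens * INPUT_USD_PER_MILLION + outputTokens * OUTPUT_USD_PER_MILLION) / 1_000_000;
  return { invoiceFile, model, inputTokens, outputTokens, costUsd, recordedAt: new Date() };
}

export function recordCostSpan(span: CostSpan): void {
  costSpans.push(span);
}

export function getTotalCostSpent(): number {
  return costSpans.reduce((sum, span) => sum + span.costUsd, 0);
}

// Empty the store, e.g. between tests.
export function clearAllCostSpans(): void {
  costSpans.length = 0;
}
```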
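Expected output: createInvoiceCostSpan computes cost using Anthropic’s $3/1M input and $15/1M output pricing. recordCostSpan stores each span, and getTotalCostSpent returns the running sum. The clearAllCostSpans helper is useful for test teardown. Those are Sonnet rates; if the budget engine downgrades a call to Haiku, applying them unchanged will overstate that call’s cost.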
Step 12: Build the Sage Intacct REST client
Implement an HTTP client that authenticates with OAuth2 password grant and creates vendor bills. Create src/lib/sageIntacct.ts:
```ts
import type {
  SageIntacctCredentials,
  VendorBillPayload,
  VendorBillResponse,
} from "../types/index.js";

// Carries the HTTP status and an error code from non-2xx responses.
export class SageIntacctError extends Error {
  status: number;
  code: string;

  constructor(status: number, code: string, message: string) {
    super(message);
    this.name = "SageIntacctError";
    this.status = status;
    this.code = code;
  }
}

export class SageIntacctClient {
  // … (class body not included in this excerpt)
}
```
Expected output: The client handles OAuth2 authentication with automatic token refresh on 401. createVendorBill maps your InvoiceData to Sage Intacct’s API shape, and getVendors returns the vendor list. All non-2xx responses throw SageIntacctError with status and code.
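The client body isn't shown above, so here is a sketch of just the authentication pattern described: request a token with an OAuth2 password grant (the standard RFC 6749 form fields), cache it, and retry once with a fresh token on a 401. The `/oauth2/token` path, the mapping of credentials to `client_id`/`client_secret`, and the `TokenRefreshingClient` and `HttpStatusError` names are assumptions, not Sage Intacct's documented endpoints. `fetchFn` is injectable so the flow can be tested without a network.

```typescript
type FetchFn = (url: string, init?: RequestInit) => Promise<Response>;

export class HttpStatusError extends Error {
  status: number;
  constructor(status: number, message: string) {
    super(message);
    this.name = "HttpStatusError";
    this.status = status;
  }
}

export class TokenRefreshingClient {
  private accessToken: string | null = null;
  private readonly baseUrl: string;
  private readonly form: Record<string, string>;
  private readonly fetchFn: FetchFn;

  constructor(
    baseUrl: string,
    creds: { clientId: string; clientSecret: string; username: string; password: string },
    fetchFn: FetchFn = (url, init) => fetch(url, init),
  ) {
    this.baseUrl = baseUrl;
    this.fetchFn = fetchFn;
    // Resource-owner password grant fields (RFC 6749, section 4.3).
    this.form = {
      grant_type: "password",
      client_id: creds.clientId,
      client_secret: creds.clientSecret,
      username: creds.username,
      password: creds.password,
    };
  }

  private async fetchToken(): Promise<string> {
    // Assumed token path; check your Sage Intacct API docs.
    const res = await this.fetchFn(`${this.baseUrl}/oauth2/token`, {
      method: "POST",
      headers: { "Content-Type": "application/x-www-form-urlencoded" },
      body: new URLSearchParams(this.form).toString(),
    });
    if (!res.ok) throw new HttpStatusError(res.status, "Token request failed");
    const json = (await res.json()) as { access_token: string };
    return json.access_token;
  }

  // Authenticated JSON request; on a 401, drop the token and retry once.
  async request<T>(path: string, init: { method: string; body?: unknown }): Promise<T> {
    const send = async (): Promise<Response> => {
      this.accessToken ??= await this.fetchToken();
      return this.fetchFn(`${this.baseUrl}${path}`, {
        method: init.method,
        headers: {
          Authorization: `Bearer ${this.accessToken}`,
          "Content-Type": "application/json",
        },
        body: init.body === undefined ? undefined : JSON.stringify(init.body),
      });
    };
    let res = await send();
    if (res.status === 401) {
      this.accessToken = null; // force a fresh token on the retry
      res = await send();
    }
    if (!res.ok) throw new HttpStatusError(res.status, `Request to ${path} failed`);
    return (await res.json()) as T;
  }
}
```

Retrying only once keeps a persistently rejected credential from looping forever.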
Step 13: Wire the Langfuse observability with instrumentation
Set up Langfuse tracing so you can monitor pipeline health. First create the init module at src/lib/langfuseInit.ts:
```ts
import { Langfuse } from "langfuse";

// Module-level singleton so every import shares one client.
let langfuse: Langfuse | null = null;

export function initLangfuse(): Langfuse {
  if (!langfuse) {
    langfuse = new Langfuse({
      publicKey: process.env.LANGFUSE_PUBLIC_KEY ?? "",
      secretKey: process.env.LANGFUSE_SECRET_KEY ?? "",
      baseUrl: process.env.LANGFUSE_HOST,
    });
  }
  return langfuse;
}

// Returns null until initLangfuse() has run (e.g. from instrumentation.ts).
export function getLangfuse(): Langfuse | null {
  return langfuse;
}
```
Then create src/instrumentation.ts with the dynamic import guard so it only runs in the Node.js runtime (not Edge):
```ts
export async function register(): Promise<void> {
  // Skip the Edge runtime so Node-only dependencies are never loaded there.
  if (process.env.NEXT_RUNTIME === "nodejs") {
    const { initLangfuse } = await import("./lib/langfuseInit.js");
    initLangfuse();
  }
}
```
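Expected output: initLangfuse() creates a singleton Langfuse client. The register() function in instrumentation.ts dynamically imports it only when NEXT_RUNTIME is "nodejs", preventing Edge runtime errors from loading Node-only dependencies. The experimental.instrumentationHook: true flag in next.config.ts makes this function fire at startup. Next.js expects the instrumentation file alongside the app/ directory (the project root, or src/ when using src/app/). Because this project’s route lives in app/ at the project root, confirm that src/instrumentation.ts is actually picked up; if it isn’t, move it to the project root and adjust its import path.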
Step 14: Assemble the pipeline orchestrator
This is the core of the project — it chains all the services together. Create src/services/pipelineOrchestrator.ts:
```ts
import { extractPdfText } from "./pdfExtractor.js";
import { extractInvoiceFromText } from "./anthropicExtractor.js";
import { repairInvoiceDataWithDiagnostics } from "./repairService.js";
import {
  buildClassification,
  decideInvoiceAction,
  mapDecisionType,
} from "./confidenceRouterService.js";
import { checkBudget, recordSpend } from "./budgetService.js";
import { createInvoiceCostSpan, recordCostSpan } from "./telemetryService.js";
import { SageIntacctClient } from "../lib/sageIntacct.js";
import { InvoiceSchema } from "../schemas/invoiceSchema.js";
import type {
  ProcessingResult,
  InvoiceData,
  InvoiceLineItem,
} from "../schemas/invoiceSchema.js";
import type { VendorBillPayload } from "../types/index.js";
```
Expected output: The orchestrator runs eight stages in sequence — budget check, PDF extraction, Claude extraction, structured repair, confidence classification, decision action (auto-approve pushes to Sage Intacct), bookkeeping, and result return. The function signature accepts a Buffer and filename string, returning a fully typed ProcessingResult.
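The orchestrator body isn't included above. To show how the eight stages chain, here is a condensed sketch in which each service is passed in as a function, so the sequencing can be exercised with fakes. The `PipelineDeps` interface, its method names, `runPipeline`, and the result shape are hypothetical stand-ins for the services imported above; it takes a `Uint8Array`, which a Node `Buffer` satisfies.

```typescript
export type Action = "auto_approve" | "human_review" | "reject";

// Hypothetical stand-ins for the imported services.
export interface PipelineDeps<TInvoice> {
  checkBudget(): boolean;
  extractText(pdf: Uint8Array): Promise<string>;
  callModel(text: string): Promise<{ rawOutput: string; inputTokens: number; outputTokens: number }>;
  repair(raw: string): { data: TInvoice | null; confidence: number };
  decide(confidence: number): Action;
  pushBill(invoice: TInvoice, filename: string): Promise<string>;
  recordUsage(filename: string, inputTokens: number, outputTokens: number): void;
}

export interface PipelineResult<TInvoice> {
  action: Action;
  confidence: number;
  invoice: TInvoice | null;
  billId?: string;
}

export async function runPipeline<TInvoice>(
  deps: PipelineDeps<TInvoice>,
  pdf: Uint8Array,
  filename: string,
): Promise<PipelineResult<TInvoice>> {
  // 1. Budget gate before any paid call.
  if (!deps.checkBudget()) throw new Error("LLM budget exhausted");
  // 2. PDF text extraction.
  const text = await deps.extractText(pdf);
  // 3. Claude extraction.
  const { rawOutput, inputTokens, outputTokens } = await deps.callModel(text);
  // 4. Structured repair.
  const { data, confidence } = deps.repair(rawOutput);
  // 5. Confidence classification; a failed repair never auto-approves.
  const action = data === null ? "reject" : deps.decide(confidence);
  // 6. Decision action: only auto-approved invoices reach Sage Intacct.
  let billId: string | undefined;
  if (action === "auto_approve" && data !== null) {
    billId = await deps.pushBill(data, filename);
  }
  // 7. Bookkeeping: record token usage whatever the outcome.
  deps.recordUsage(filename, inputTokens, outputTokens);
  // 8. Return the result.
  return { action, confidence, invoice: data, billId };
}
```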
Step 15: Wire the API route handler
Create the Next.js API route that accepts the PDF upload and calls the orchestrator. Create app/api/invoice/process/route.ts:
```ts
import { type NextRequest, NextResponse } from "next/server";
import { processInvoice } from "../../../../src/services/pipelineOrchestrator.js";
import { UnrepairableError } from "../../../../src/services/repairService.js";

// Run on Node.js, not Edge, so Buffer and pdfjs-dist are available.
export const runtime = "nodejs";
// Allow up to 60 seconds per request.
export const maxDuration = 60;

export async function POST(req: NextRequest): Promise<NextResponse> {
  try {
    // Only multipart uploads are accepted.
    const contentType = req.headers.get("content-type") ?? "";
    if (!contentType.includes("multipart/form-data")) {
      return NextResponse.json(
        { error: "Content-Type must be multipart/form-data" },
        { status: 400 },
      );
    }

    // The PDF must arrive in a form field named "file".
    const formData = await req.formData();
    const file = formData.get("file");
    if (!file || !(file instanceof File)) {
      return NextResponse.json(
        { error: "Missing file field in multipart form data" },
        { status: 400 },
      );
    }
    if (file.type !== "application/pdf") {
      return NextResponse.json(
        { error: "File must be a PDF" },
        { status: 400 },
      );
    }

    const buffer = Buffer.from(await file.arrayBuffer());
    const result = await processInvoice(buffer, file.name);
    return NextResponse.json(result, { status: 200 });
  } catch (err) {
    // Model output that could not be repaired into valid invoice data.
    if (err instanceof UnrepairableError) {
      return NextResponse.json(
        { error: "Invoice data could not be extracted", details: err.message },
        { status: 422 },
      );
    }
    // Anything else is reported as a pipeline failure.
    const message = err instanceof Error ? err.message : "Unknown error";
    return NextResponse.json(
      { error: "Pipeline processing failed", message },
      { status: 500 },
    );
  }
}
```
Expected output: The route exports POST (not a default export) using NextRequest and NextResponse. It validates Content-Type, extracts the file from multipart/form-data, rejects non-PDF files with 400, returns 422 on UnrepairableError, 500 on generic errors, and 200 on success. The runtime is set to nodejs (required by pdfjs-dist and node:fs).
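To call the route from a script, a client posts the PDF as multipart form data. A sketch assuming Node.js 22 (global `fetch`, `FormData`, and `File`); the `uploadInvoice` helper name and the localhost URL in the usage note are illustrative:

```typescript
type FetchFn = (url: string, init?: RequestInit) => Promise<Response>;

export async function uploadInvoice(
  endpoint: string,
  pdfBytes: Uint8Array,
  filename: string,
  fetchFn: FetchFn = fetch,
): Promise<unknown> {
  const form = new FormData();
  // The route checks file.type, so label the part explicitly as a PDF.
  form.append("file", new File([pdfBytes], filename, { type: "application/pdf" }));
  // Don't set Content-Type yourself: fetch adds the multipart boundary.
  const res = await fetchFn(endpoint, { method: "POST", body: form });
  const body: unknown = await res.json();
  if (!res.ok) {
    throw new Error(`Invoice upload failed with HTTP ${res.status}: ${JSON.stringify(body)}`);
  }
  return body;
}
```

For example, with the dev server running you could call `uploadInvoice("http://localhost:3000/api/invoice/process", await readFile("invoice.pdf"), "invoice.pdf")`, where `readFile` comes from `node:fs/promises` and returns a Buffer, which is a Uint8Array.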
Step 16: Set up the MSW mock server for tests
Mock the external APIs so your tests run offline. Create tests/mocks/server.ts:
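The mock server file isn't reproduced here. A sketch assuming MSW v2 (`http` and `HttpResponse` from `msw`, `setupServer` from `msw/node`) that answers the Anthropic Messages endpoint with a response shaped like the real API. The invoice fields, the `SAGE_BASE` host, and both Sage Intacct paths are placeholders you would align with your own client and schema.

```typescript
import { http, HttpResponse } from "msw";
import { setupServer } from "msw/node";

// Placeholder host; point SAGE_INTACCT_BASE_URL here in tests.
const SAGE_BASE = "https://sage.example.test";

// Illustrative invoice returned inside Claude's text block.
const sampleInvoice = {
  invoiceNumber: "INV-1001",
  vendorName: "Acme Supplies",
  invoiceDate: "2024-05-01",
  currency: "USD",
  lineItems: [{ description: "Paper", quantity: 10, unitPrice: 4.5, amount: 45 }],
  totalAmount: 45,
};

export const handlers = [
  // Anthropic Messages API: one text block containing the JSON invoice.
  http.post("https://api.anthropic.com/v1/messages", () =>
    HttpResponse.json({
      id: "msg_test",
      type: "message",
      role: "assistant",
      model: "claude-sonnet-4-6",
      content: [{ type: "text", text: JSON.stringify(sampleInvoice) }],
      stop_reason: "end_turn",
      stop_sequence: null,
      usage: { input_tokens: 1200, output_tokens: 300 },
    }),
  ),
  // Sage Intacct token and bill endpoints (paths are assumptions).
  http.post(`${SAGE_BASE}/oauth2/token`, () =>
    HttpResponse.json({ access_token: "test-token", token_type: "Bearer", expires_in: 3600 }),
  ),
  http.post(`${SAGE_BASE}/bills`, () =>
    HttpResponse.json({ id: "BILL-1", status: "posted" }, { status: 201 }),
  ),
];

export const server = setupServer(...handlers);
```

In Vitest, a `setupTestServer()` helper would typically call `server.listen({ onUnhandledRequest: "error" })` in `beforeAll`, `server.resetHandlers()` in `afterEach`, and `server.close()` in `afterAll`, so any request the handlers don't cover fails the test instead of reaching the network.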
Expected output: The MSW server mocks three endpoints: the Anthropic Messages API (returns a realistic invoice JSON object), Sage Intacct OAuth2 token endpoint, and Sage Intacct vendor bills endpoint. The setupTestServer() helper wires up Vitest lifecycle hooks so every test gets clean handlers.
Step 17: Run the tests
The project ships a full test suite covering schemas, every service, the orchestrator, and the API route handler with edge cases for API failures, repair failures, budget rejections, and boundary conditions. From the project root:
```bash
pnpm test
```
Expected output: Vitest runs the suite and prints a coverage report. All 73 tests pass across 17 test files — 0 failed — with coverage above the configured thresholds (lines > 80%, branches > 55%, functions > 75%, statements > 80%).
Next steps
Add a human review dashboard — build a Next.js page at /review/invoice/[id] that displays the extracted data alongside the original PDF for manual approval or correction.
Deploy with long-running functions — set maxDuration to 300 seconds on Vercel or use a queue-worker pattern with AWS SQS so large PDFs don’t time out.
Extract purchase order matching — extend the pipeline to cross-reference poNumber against a database of open purchase orders before auto-approving the vendor bill.
Add multi-pipeline routing — use the same extraction + repair + confidence pattern for other document types (purchase orders, receipts, credit memos) and route them to different downstream systems.
Persist cost telemetry — replace the in-memory costSpans array with a database-backed store so you can query historical spend per tenant.