SMB bookkeepers waste hours on manual entry from supplier invoices into NetSuite, leading to errors, late payments, and missed early-payment discounts.
A complete, working implementation of this recipe — downloadable as a zip or browsable file by file. Generated by our build pipeline; tested with full coverage before publishing.
This recipe builds an AI-powered invoice processing pipeline for small and medium businesses using NetSuite. You’ll wire together Claude (Anthropic), a multi-strategy JSON repair engine, a Redis-backed semantic cache, a tenant budget controller with auto-downgrade, and an OAuth 1.0a NetSuite client — all behind a Next.js upload endpoint. By the end you’ll have a system that takes a PDF invoice, extracts structured fields via Claude, repairs malformed JSON output, caches results for recurring vendors, enforces monthly cost caps, and pushes the resulting vendor bill to NetSuite’s REST API.
Prerequisites
Node.js >= 22 and pnpm 10 installed on your machine
An Anthropic API key with access to Claude models (claude-sonnet-4-6 and claude-haiku-4-5-20251001)
An OpenAI API key (required by the cache engine’s embedding provider — even though the extraction model is Anthropic, the semantic cache needs embeddings)
A Redis instance (local or remote) for the cache storage adapter
NetSuite REST Web Services enabled on your account, with OAuth 1.0a credentials (consumer key/secret, token key/secret, account ID)
A Langfuse account (or self-hosted instance) for tracing
Familiarity with TypeScript, Next.js App Router, and basic REST API concepts
Step 1: Scaffold the project and install dependencies
Start with a fresh Next.js 16 project using the App Router. The scaffold includes package.json, tsconfig.json, next.config.ts, and the test runner configuration.
The project depends on eight @reaatech/* packages plus third-party libraries. Every version is pinned exactly:
Expected output: A pnpm-lock.yaml is generated and all packages resolve without warnings.
Step 2: Configure environment variables
Create .env.example with placeholders for every integration your pipeline touches:
env
# Env vars used by anthropic-document-pipeline-for-netsuite-smb-invoice-processing.# The builder adds entries here as it wires up each integration.# Keep placeholders only — never commit real values.NODE_ENV=development# Anthropic SDKANTHROPIC_API_KEY=<your-anthropic-key># OpenAI API key (required by the LLM-Cache engine's embedder for semantic caching)OPENAI_API_KEY=<your-openai-key># Redis (for LLM-Cache storage adapter)REDIS_URL=redis://localhost:6379# NetSuite OAuth 1.0a credentials (for REST API vendor bill creation)NETSUITE_CONSUMER_KEY=<your-consumer-key>NETSUITE_CONSUMER_SECRET=<your-consumer-secret>NETSUITE_TOKEN_KEY=<your-token-key>NETSUITE_TOKEN_SECRET=<your-token-secret>NETSUITE_ACCOUNT_ID=<your-account-id># Monthly budget cap per tenantDEFAULT_TENANT_MONTHLY_BUDGET=100.0# Langfuse observabilityLANGFUSE_PUBLIC_KEY=<your-langfuse-public-key>LANGFUSE_SECRET_KEY=<your-langfuse-secret-key>LANGFUSE_HOST=https://cloud.langfuse.com# LlamaCloud (@llamaindex/cloud is deprecated, kept for reference)LLAMA_CLOUD_API_KEY=<your-llamacloud-key># LLM Cost TelemetryTELEMETRY_SERVICE_NAME=anthropic-invoice-pipelineDEFAULT_DAILY_BUDGET=5.0
The config module at src/config/index.ts reads these into a typed AppConfig object:
The DEFAULT_TENANT_MONTHLY_BUDGET sets the monthly spend cap per tenant (default $100) and the redisUrl defaults to a local Redis instance.
Step 3: Define the invoice types
Create the core type definitions your pipeline will carry through every stage. src/types/invoice.ts defines the ExtractedInvoice interface, the LineItem sub-type, and the InvoiceComplexity union:
The parser at src/extract/parser.ts accepts a buffer and MIME type, then dispatches to the right text extraction library. PDFs go through pdf-parse, DOCX through mammoth, CSV passes through as raw UTF-8 text, and anything else throws an error.
ts
import { PDFParse } from "pdf-parse";import mammoth from "mammoth";import { ParseError, UnsupportedFormatError } from "../lib/errors.js";import type { Buffer } from "node:buffer";export async function parseFile(buffer: Buffer, mimeType: string): Promise<string> { switch (mimeType) { case "application/pdf": { try { const parser = new PDFParse({ data: buffer }); const result = await parser.getText(); return result.text; } catch (err) { throw new ParseError("Failed to parse PDF", mimeType, err instanceof Error ? err : undefined); } } case "application/vnd.openxmlformats-officedocument.wordprocessingml.document": { try { const result = await mammoth.extractRawText({ buffer }); return result.value; } catch (err) { throw new ParseError("Failed to parse DOCX", mimeType, err instanceof Error ? err : undefined); } } case "text/csv": { return buffer.toString("utf-8"); } default: throw new UnsupportedFormatError(mimeType); }}
Error types at src/lib/errors.ts provide structured error classes for each failure mode. These propagate up through the pipeline and are caught by the API route to return appropriate HTTP status codes:
Every constructor sets this.name explicitly so that instanceof checks and error handling work reliably across the pipeline.
Step 5: Create the Anthropic service
The Anthropic service at src/services/anthropic.ts wraps two Claude calls. The first classifies invoice complexity (simple vs complex) — this determines which model to use later. The second extracts structured invoice fields using a prompt that asks Claude to return valid JSON matching your schema.
ts
import Anthropic from "@anthropic-ai/sdk";import { config } from "../config/index.js";import { ExtractionError } from "../lib/errors.js";import type { InvoiceComplexity } from "../types/invoice.js";const client = new Anthropic({ apiKey: config.anthropicApiKey });export async function extractInvoiceFromText(rawText: string, modelId: string): Promise<string> { const prompt = `Extract invoice information from the following document. Return ONLY valid JSON with these fields: vendorName, invoiceNumber, invoiceDate, dueDate (optional), totalAmount, lineItems (array of {description, quantity, unitPrice, total, taxRate?}), taxAmount (optional), currency, poNumber (optional). Document:\n\n${rawText}`; try { const message = await client.messages.create({ model: modelId, max_tokens: 4096, system: "You are an invoice extraction assistant. Return only valid JSON matching the requested schema.", messages: [{ role: "user", content: prompt }], }); const block = message.content[0]; if (block.type === "text") { return block.text; } return "{}"; } catch (err) { const errorMessage = err instanceof Error ? err.message : "Anthropic API call failed"; const originalError = err instanceof Error ? err : undefined; throw new ExtractionError(errorMessage, originalError); }}export async function classifyInvoiceComplexity(text: string): Promise<InvoiceComplexity> { const prompt = `Classify the following invoice document as either "simple" (short, few fields, single line item) or "complex" (long, many line items, multiple pages). Document:\n\n${text}\n\nRespond with ONLY the word "simple" or "complex".`; try { const message = await client.messages.create({ model: "claude-haiku-4-5-20251001", max_tokens: 10, messages: [{ role: "user", content: prompt }], }); const block = message.content[0]; const reply = block.type === "text" ? block.text.trim().toLowerCase() : "complex"; return reply.startsWith("simple") ? "simple" : "complex"; } catch { return "complex"; }}
Key details: the apiKey is camelCase per the Anthropic SDK. Every messages.create call passes max_tokens. The response content array is handled as typed blocks — you check block.type === "text" and read .text, never treating content as a raw string.
Step 6: Implement the model router
The model router at src/router/model-router.ts defines two Claude models — a powerful one (claude-sonnet-4-6) for complex invoices and a cheaper one (claude-haiku-4-5-20251001) for simple invoices or when the budget is running low. The selectModel function returns the cost-effective choice based on complexity and remaining budget ratio.
ts
import { ModelDefinitionSchema, type ModelDefinition, RoutingRequestSchema, type RoutingRequest,} from "@reaatech/llm-router-core";import type { InvoiceComplexity } from "../types/invoice.js";export const modelDefinitions: ModelDefinition[] = [ { id: "claude-sonnet-4-6", provider: "anthropic", costPerMillionInput: 3, costPerMillionOutput: 15, maxTokens: 200000, capabilities: ["code", "reasoning", "analysis"], }, { id: "claude-haiku-4-5-20251001", provider: "anthropic", costPerMillionInput: 0.8, costPerMillionOutput: 4, maxTokens: 200000, capabilities: ["general", "summarization"], },];for (const def of modelDefinitions) { ModelDefinitionSchema.parse(def);}export function selectModel( complexity: InvoiceComplexity, budgetRemaining: number, budgetLimit: number,): string { const ratio = budgetRemaining / budgetLimit; if (complexity === "simple" || ratio < 0.2) { return "claude-haiku-4-5-20251001"; } return "claude-sonnet-4-6";}export function getModelConfig(modelId: string): ModelDefinition | undefined { return modelDefinitions.find((m) => m.id === modelId);}
The model definitions are validated at startup with ModelDefinitionSchema.parse() from @reaatech/llm-router-core, catching any configuration typos before the server starts.
Step 7: Build the budget guard
The budget guard at src/budget/guard.ts is the gatekeeper. It uses @reaatech/agent-budget-engine to define a monthly budget with a soft cap (80%) and hard cap (100%). When spending crosses the soft cap, the system auto-suggests downgrading from claude-sonnet-4-6 to claude-haiku-4-5-20251001. When it hits the hard cap, extraction is blocked entirely.
The return type is a BudgetGuard interface — this keeps the rest of the code decoupled from the budget engine internals. Every check and spend record goes through this interface.
Step 8: Create the vendor cache
The vendor cache at src/cache/vendor-cache.ts wraps @reaatech/llm-cache with a Redis storage adapter. It caches extraction results keyed by the vendor’s raw invoice text, using both exact-match and semantic lookup. Future invoices from the same vendor skip the LLM call entirely.
The CacheResult is a discriminated union: { hit: true, type: "exact" | "semantic", entry: ... } on success or { hit: false, reason: "not_found" | "below_threshold" | ... } on miss. The ingest route checks this before calling Claude.
Step 9: Implement the NetSuite client
The NetSuite client at src/services/netsuite.ts transforms an ExtractedInvoice into a NetSuite vendor bill and pushes it via the REST API. It uses OAuth 1.0a with HMAC-SHA1 signing, provided by the oauth-1.0a package and Node’s built-in crypto module.
The OAuth signature is computed from the HTTP method and URL, then attached as an Authorization header. NetSuite’s REST API requires this for every request.
Step 10: Set up Langfuse observability
The observability module at src/lib/observability.ts wraps the Langfuse SDK. It provides a wrapWithTracing helper that creates a trace, runs your function, and records success or failure — useful for the ingest endpoint.
The pipeline at src/extract/invoice.ts is the core orchestrator. It runs seven steps: parse the file, classify complexity, select a model, check the budget, call Claude, repair the JSON output, and record the spend.
The same Zod schema (ExtractedInvoiceSchema) that defines the expected shape is also passed to isValid(), repair(), and repairOutput() — the structured repair engine uses it to validate and fix the output. If Claude returns valid JSON on the first try, the repair step is skipped entirely via the isValid() fast-path.
Step 12: Create the API routes
Health endpoint
Create app/api/health/route.ts — a simple health check that returns status and timestamp:
ts
import { type NextRequest, NextResponse } from "next/server";export function GET(_req: NextRequest): NextResponse { void _req; return NextResponse.json({ status: "ok", timestamp: new Date().toISOString(), });}
Invoice ingest endpoint
Create app/api/ingest/route.ts — the main upload endpoint. It accepts multipart form-data with a file field and an optional vendorName. The handler runs the full pipeline: parse, classify, check cache, check budget, extract, repair, cache result, push to NetSuite.
ts
import { type NextRequest, NextResponse } from "next/server";import { config } from "../../../src/config/index.js";import { extractInvoice } from "../../../src/extract/invoice.js";import { createBudgetGuard } from "../../../src/budget/guard.js";import { createVendorBill } from "../../../src/services/netsuite.js";import { wrapWithTracing } from "../../../src/lib/observability.js";import { parseFile } from "../../../src/extract/parser.js";import { createVendorCache, getCachedExtraction, setCachedExtraction } from "../../../src/cache/vendor-cache.js";import { selectModel } from "../../../src/router/model-router.js";import { UnsupportedFormatError, BudgetExceededError, NetSuiteError, ExtractionError } from "../../../src/lib/errors.js"
Step 13: Wire the main exports
src/index.ts re-exports every public function so consumers can import from a single entry point:
ts
export { extractInvoice } from "./extract/invoice.js";export { createVendorCache, getCachedExtraction, setCachedExtraction } from "./cache/vendor-cache.js";export { createBudgetGuard } from "./budget/guard.js";export { createVendorBill } from "./services/netsuite.js";export { parseFile } from "./extract/parser.js";export { extractInvoiceFromText, classifyInvoiceComplexity } from "./services/anthropic.js";export { selectModel, getModelConfig } from "./router/model-router.js";
Step 14: Run the tests and verify
The test suite covers every module with mocked external services. Run all three quality gates:
terminal
pnpm typecheck
Expected output:tsc --noEmit exits 0 with no type errors. The project uses bundler module resolution, and all relative imports use .js extensions.
terminal
pnpm lint
Expected output: ESLint exits 0 with zero errors.
terminal
pnpm test
Expected output: Vitest runs the full suite — 84 tests passing across all modules with line, branch, function, and statement coverage at or above 90%. Coverage targets runtime code only (under src/ and app/**/route.ts).
Try the API with a real invoice file (requires a running Next.js dev server and configured environment variables):
Expected output: A JSON response with the extracted invoice fields, the NetSuite vendor bill ID, and whether the result was served from cache.
Next steps
Add multi-tenant isolation by scoping the budget controller and cache per tenant ID from the request header
Extend file format support by adding handlers for XLSX spreadsheets and email attachments (.eml, .msg) using additional parsing libraries
Implement webhook notifications — after a successful NetSuite push, POST a status webhook to a configured URL for integration with accounting dashboards
Add a manual review queue — route invoices that fail structured repair (low confidence scores) to a human-in-the-loop review page
Deploy via Docker — containerize the Next.js app with a sidecar Redis instance for production deployment
;
const budgetGuard = createBudgetGuard({
monthlyLimit: config.defaultMonthlyBudget,
});
let cache: Awaited<ReturnType<typeof createVendorCache>> | null = null;
async function getCache(): Promise<Awaited<ReturnType<typeof createVendorCache>>> {
if (!cache) {
cache = await createVendorCache(config.redisUrl);
}
return cache;
}
export async function POST(req: NextRequest): Promise<NextResponse> {
try {
const formData = await req.formData();
const file = formData.get("file") as File | null;
const vendorName = formData.get("vendorName") as string | null;