Small businesses using Square accumulate hundreds of digital receipts from coffee runs, supply purchases, and vendor payments. Manually entering every receipt into an accounting system is slow, error-prone, and causes lost deductions. This tutorial builds an automated document pipeline that ingests receipt image URLs, extracts structured data using Anthropic Claude, and pushes the results to Square — all while enforcing daily cost budgets and confidence-based quality gates. You’ll use the @reaatech/* package family for confidence routing, structured JSON repair, LLM cost telemetry, and agent budget enforcement.
By the end, you’ll have a Next.js 16+ application with three API routes (/api/ingest, /api/batch, /api/health), a full pipeline orchestrator, and a test suite with 90%+ coverage.
Prerequisites
Node.js 22+ and pnpm 10 installed
An Anthropic API key for Claude (set as ANTHROPIC_API_KEY)
A Square access token and location ID (set as SQUARE_ACCESS_TOKEN, SQUARE_LOCATION_ID)
An Unstructured API key for document preprocessing (set as UNSTRUCTURED_API_KEY)
A Langfuse account for observability (optional, with LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY)
Familiarity with TypeScript, Next.js App Router, and Zod
Step 1: Scaffold the project and install dependencies
Create the Next.js project root and install all dependencies. Pinning exact versions keeps builds reproducible.
Your package.json should pin every version to an exact semver (no ^ or ~). After install, set up tsconfig.json with strict mode and moduleResolution: "bundler", and next.config.ts with experimental.instrumentationHook: true, since you’ll add an instrumentation.ts file later. (Next.js 15 and later treat instrumentation.ts as stable and no longer need this flag; it matters only on older versions.)
Expected output: A working Next.js project with all dependencies installed and pnpm typecheck passing.
Step 2: Define receipt and pipeline schemas with Zod
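The pipeline processes three data shapes: individual line items, the full receipt, and the pipeline request/result envelope. Define the receipt shapes in src/schemas/receipt.ts and the request/result envelope in src/schemas/pipeline.ts, then re-export both from a barrel file in the same directory: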
export {
  ReceiptLineItemSchema,
  ReceiptSchema,
  ExtractedReceiptSchema,
  type ReceiptLineItem,
  type Receipt,
  type ExtractedReceipt,
} from "./receipt.js";

export {
  IngestionRequestSchema,
  PipelineResultSchema,
  type IngestionRequest,
  type PipelineResult,
  type PipelineStatus,
} from "./pipeline.js";
Expected output: Three schema files under src/schemas/. pnpm typecheck should pass with no errors.
Step 3: Load configuration from environment variables
Create src/lib/config.ts to parse all environment variables through a single Zod schema. This centralises your env setup and gives you typed config objects everywhere:
Expected output: A config module that reads all env vars and validates them with Zod. loadConfig() will throw immediately if any required variable is missing.
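The fail-fast behaviour described above can be sketched without any dependencies. The tutorial's module validates with Zod; this sketch swaps in plain checks so that it runs on its own. The names `AppConfigSketch`, `loadConfigSketch`, and `requireVar` are hypothetical, and only the environment variable names come from the prerequisites list.

```typescript
// Hypothetical config shape; the real AppConfig type is defined by the tutorial's Zod schema.
export interface AppConfigSketch {
  anthropicApiKey: string;
  squareAccessToken: string;
  squareLocationId: string;
  unstructuredApiKey: string;
  langfuse?: { publicKey: string; secretKey: string };
}

type Env = Record<string, string | undefined>;

// Return the variable's value, or throw if it is missing or blank.
function requireVar(env: Env, key: string): string {
  const value = env[key];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${key}`);
  }
  return value;
}

// Read every variable in one place so a bad environment fails at startup.
export function loadConfigSketch(env: Env): AppConfigSketch {
  const config: AppConfigSketch = {
    anthropicApiKey: requireVar(env, "ANTHROPIC_API_KEY"),
    squareAccessToken: requireVar(env, "SQUARE_ACCESS_TOKEN"),
    squareLocationId: requireVar(env, "SQUARE_LOCATION_ID"),
    unstructuredApiKey: requireVar(env, "UNSTRUCTURED_API_KEY"),
  };

  // Langfuse is optional: enable it only when both keys are present.
  const publicKey = env["LANGFUSE_PUBLIC_KEY"];
  const secretKey = env["LANGFUSE_SECRET_KEY"];
  if (publicKey && secretKey) {
    config.langfuse = { publicKey, secretKey };
  }
  return config;
}
```

In an application this would be called once as `loadConfigSketch(process.env)`. Passing the environment in as an argument, rather than reading it inside the function, keeps the loader easy to test.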
Step 4: Create the Anthropic extraction client
Create src/lib/anthropic.ts to wrap the @anthropic-ai/sdk. The extractReceiptData function builds a structured extraction prompt, calls Claude, and wraps API errors in a custom ExtractionError class:
ts
import Anthropic, {
  APIError,
  APIConnectionError,
  APIConnectionTimeoutError,
  RateLimitError,
} from "@anthropic-ai/sdk";

export function createAnthropicClient(): Anthropic {
  return new Anthropic({
    apiKey: process.env.ANTHROPIC_API_KEY,
    maxRetries: 0,
  });
}

export class ExtractionError extends Error {
  public readonly statusCode: number | undefined;
  public readonly originalError: Error;

  constructor(message: string, statusCode: number | undefined, originalError: Error) {
    super(message);
    this.name = "ExtractionError";
    this.statusCode = statusCode;
    this.originalError = originalError;
  }
}

function buildExtractionPrompt(text: string): string {
  return `You are a receipt data extraction assistant. Extract structured data from the following receipt text.

Return ONLY valid JSON matching this schema:
{
  "vendorName": string,
  "vendorAddress": string (optional),
  "date": string (ISO date),
  "time": string (optional),
  "lineItems": [{ "name": string, "quantity": number, "unitPrice": number, "totalPrice": number, "category": string (optional) }],
  "subtotal": number,
  "tax": number (optional),
  "tip": number (optional),
  "total": number,
  "paymentMethod": string (optional),
  "receiptNumber": string (optional),
  "currency": string (e.g. "USD"),
  "confidence": number (0-1),
  "sourceImageUrl": string,
  "processingTimestamp": string (ISO datetime)
}

Receipt text:
${text}`;
}

export async function extractReceiptData(
  client: Anthropic,
  unstructuredText: string,
): Promise<Anthropic.Message> {
  const model = process.env.ANTHROPIC_MODEL ?? "claude-sonnet-4-6";
  const maxTokens = parseInt(process.env.ANTHROPIC_MAX_TOKENS ?? "4096", 10);

  try {
    const message = await client.messages.create({
      model,
      max_tokens: maxTokens,
      messages: [
        {
          role: "user",
          content: [
            {
              type: "text" as const,
              text: buildExtractionPrompt(unstructuredText),
            },
          ],
        },
      ],
    });
    return message;
  } catch (error) {
    // Check subclasses before their parents: RateLimitError and
    // APIConnectionError extend APIError, and APIConnectionTimeoutError
    // extends APIConnectionError.
    if (error instanceof RateLimitError) {
      throw new ExtractionError("Rate limit exceeded for Anthropic API", 429, error);
    }
    if (error instanceof APIConnectionTimeoutError) {
      throw new ExtractionError("Connection timeout for Anthropic API", 408, error);
    }
    if (error instanceof APIConnectionError) {
      throw new ExtractionError("Connection error for Anthropic API", undefined, error);
    }
    if (error instanceof APIError) {
      throw new ExtractionError(error.message, error.status as number | undefined, error);
    }
    // Anything else is not an SDK error; let it propagate unchanged.
    throw error;
  }
}
Expected output: A typed Anthropic wrapper that maps HTTP 429, network timeouts, and server errors into structured ExtractionError instances.
Step 5: Preprocess receipt images with Unstructured
Before Claude can extract structured data, raw receipt images must be converted to text. Create src/lib/unstructured.ts to fetch the image and run Unstructured’s partition API:
ts
import { UnstructuredClient } from "unstructured-client";
import { Strategy } from "unstructured-client/sdk/models/shared";

export function createUnstructuredClient(): UnstructuredClient {
  return new UnstructuredClient({
    security: { apiKeyAuth: process.env.UNSTRUCTURED_API_KEY },
  });
}

export class PreprocessingError extends Error {
  public readonly statusCode?: number;

  constructor(message: string, statusCode?: number) {
    super(message);
    this.name = "PreprocessingError";
    this.statusCode = statusCode;
  }
}

export async function preprocessImage(
  client: UnstructuredClient,
  imageUrl: string,
): Promise<{ text: string }> {
  // Download the receipt image.
  let arrayBuffer: ArrayBuffer;
  try {
    const response = await fetch(imageUrl);
    if (!response.ok) {
      throw new PreprocessingError(
        `Failed to fetch image: ${response.statusText}`,
        response.status,
      );
    }
    arrayBuffer = await response.arrayBuffer();
  } catch (error) {
    if (error instanceof PreprocessingError) {
      throw error;
    }
    throw new PreprocessingError(
      error instanceof Error ? error.message : "Unknown fetch error",
    );
  }

  // Send the image bytes to Unstructured's partition endpoint.
  let partitionResponse;
  try {
    partitionResponse = await client.general.partition({
      partitionParameters: {
        files: {
          content: Buffer.from(arrayBuffer),
          fileName: "receipt",
        },
        strategy: Strategy.Auto,
      },
    });
  } catch (error) {
    throw new PreprocessingError(
      error instanceof Error ? error.message : "Partition failed",
    );
  }

  // Accept either an object with an `elements` array or a bare array of elements.
  const rawElements: Array<unknown> = [];
  if (partitionResponse && typeof partitionResponse === "object" && !Array.isArray(partitionResponse)) {
    const maybe = partitionResponse as Record<string, unknown>;
    if (Array.isArray(maybe.elements)) {
      for (const el of maybe.elements) {
        rawElements.push(el);
      }
    }
  }
  if (Array.isArray(partitionResponse)) {
    for (const el of partitionResponse) {
      rawElements.push(el);
    }
  }

  // Join the text of every element, skipping empty entries.
  const text = rawElements
    .map((el) => {
      if (typeof el === "string") return el;
      if (el && typeof el === "object" && "text" in (el as Record<string, unknown>)) {
        return String((el as Record<string, unknown>).text);
      }
      return "";
    })
    .filter(Boolean)
    .join("\n");

  return { text };
}
Expected output: A preprocessing module that downloads a receipt image by URL, partitions it through Unstructured, and returns the concatenated text.
Step 6: Gate quality with the confidence router
Poor-quality scans shouldn’t waste LLM calls. Create src/lib/confidence.ts to gate extraction based on OCR text length (a proxy for confidence):
In the pipeline orchestrator, you’ll compute a confidence score from text length: texts over 500 characters score 0.95, between 100 and 500 score 0.7, and shorter texts score progressively lower. The router returns "ROUTE" when the score meets the threshold and "FALLBACK" when it doesn’t.
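The scoring rule in the paragraph above can be sketched as two small functions. This is a minimal illustration, not the API of @reaatech/confidence-router: the names `scoreFromTextLength` and `routeByConfidence`, the 0.6 default threshold, the inclusive 100 to 500 range, and the linear ramp below 100 characters are all assumptions. Only the 0.95 and 0.7 tiers and the "ROUTE" / "FALLBACK" outcomes come from the text.

```typescript
export type RouteDecision = "ROUTE" | "FALLBACK";

// Map OCR text length to a confidence score.
export function scoreFromTextLength(text: string): number {
  const length = text.length;
  if (length > 500) return 0.95; // long, detailed text: high confidence
  if (length >= 100) return 0.7; // mid-length text: moderate confidence
  // Assumed ramp: scale linearly from 0 (empty) up to 0.7 (100 characters).
  return 0.7 * (length / 100);
}

// Route to the LLM only when the score meets the threshold (assumed default: 0.6).
export function routeByConfidence(score: number, threshold = 0.6): RouteDecision {
  return score >= threshold ? "ROUTE" : "FALLBACK";
}
```

With these assumed values, a 50-character scan scores 0.35 and falls back, so no LLM call is spent on it.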
Expected output: A confidence gating module that prevents low-quality receipts from reaching the LLM.
Step 7: Enforce LLM spend with the budget engine
AI pipelines can rack up costs quickly. Create src/lib/cost.ts to define a daily USD budget, check before each call, and record spend afterward:
Expected output: A cost module that defines a $5 daily cap with an 80% soft-cap warning, estimates costs before each LLM call, and records actual spend after.
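The check-then-record flow can be sketched as follows. The $5 cap and 80% soft cap come from the step above; the `BudgetState` shape and these function signatures are assumptions and do not reflect the actual @reaatech budget engine. The prices in the usage note are placeholders, not real model pricing.

```typescript
// Hypothetical budget state for one day of spend.
export interface BudgetState {
  dailyCapUsd: number; // hard cap, $5 in the tutorial
  softCapRatio: number; // 0.8 means warn at 80% of the cap
  spentUsd: number; // spend recorded so far today
}

export type BudgetCheck = "ok" | "soft_cap_warning" | "exceeded";

// Estimate a call's cost from token counts and per-million-token prices.
export function estimateCostUsd(
  inputTokens: number,
  outputTokens: number,
  inputPricePerM: number,
  outputPricePerM: number,
): number {
  return (inputTokens / 1_000_000) * inputPricePerM + (outputTokens / 1_000_000) * outputPricePerM;
}

// Decide before the call whether the projected spend still fits the budget.
export function checkBudget(state: BudgetState, estimatedUsd: number): BudgetCheck {
  const projected = state.spentUsd + estimatedUsd;
  if (projected > state.dailyCapUsd) return "exceeded";
  if (projected >= state.dailyCapUsd * state.softCapRatio) return "soft_cap_warning";
  return "ok";
}

// Record the actual cost after the call; returns a new state instead of mutating.
export function recordSpend(state: BudgetState, actualUsd: number): BudgetState {
  return { ...state, spentUsd: state.spentUsd + actualUsd };
}
```

A caller would estimate with something like `estimateCostUsd(2000, 500, 3, 15)` (placeholder prices), skip the LLM call when `checkBudget` returns "exceeded", and pass the real usage to `recordSpend` afterwards.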
Step 8: Repair malformed LLM JSON with structured repair core
LLMs don’t always output perfect JSON. Create src/lib/repair.ts to clean code fences, strip trailing commas, extract JSON from surrounding text, and validate against the ExtractedReceiptSchema:
ts
import { repair, repairOutput, isValid, analyzeInput } from "@reaatech/structured-repair-core";
import { z } from "zod";
import { ExtractedReceiptSchema, type ExtractedReceipt } from "../schemas/receipt.js";

// Remove the Markdown code fences that models often wrap around JSON output.
function stripCodeFences(raw: string): string {
  return raw
    .replace(/^```(?:json)?\s*\n?/gm, "")
    .replace(/\n?\s*```\s*$/gm, "")
    .trim();
}

// ...
The primary entry point is repairReceiptOutput() — it first tries @reaatech/structured-repair-core’s repair(), then falls back to manual JSON cleaning strategies. If all strategies fail, it throws UnrepairableReceiptError with diagnostic issue data.
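The manual fallback chain can be sketched on its own. `stripCodeFences` and `fixTrailingCommas` follow the helpers shown in the tutorial's listing; `sliceOuterBraces` and `parseWithFallbacks` are hypothetical names for illustration. The sketch leaves out the first attempt through the package's repair() and the final validation against ExtractedReceiptSchema, and it throws a plain Error rather than UnrepairableReceiptError.

```typescript
// Remove Markdown code fences around the payload.
function stripCodeFences(raw: string): string {
  return raw.replace(/^```(?:json)?\s*\n?/gm, "").replace(/\n?\s*```\s*$/gm, "").trim();
}

// Drop commas that sit directly before a closing brace or bracket.
function fixTrailingCommas(json: string): string {
  return json.replace(/,(\s*[}\]])/g, "$1");
}

// Hypothetical helper: keep only the span from the first "{" to the last "}".
function sliceOuterBraces(raw: string): string | null {
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  return start !== -1 && end > start ? raw.slice(start, end + 1) : null;
}

// Try progressively more aggressive cleanups until one candidate parses.
export function parseWithFallbacks(raw: string): unknown {
  const stripped = stripCodeFences(raw);
  const candidates = [raw, stripped, fixTrailingCommas(stripped)];
  const sliced = sliceOuterBraces(stripped);
  if (sliced !== null) {
    candidates.push(fixTrailingCommas(sliced));
  }
  for (const candidate of candidates) {
    try {
      return JSON.parse(candidate);
    } catch {
      // This candidate is still malformed; try the next strategy.
    }
  }
  throw new Error("Unable to repair JSON output");
}
```

Ordering the candidates from least to most invasive means well-formed output is parsed untouched, and the brace-slicing step only runs when the gentler cleanups fail.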
Expected output: A repair module that handles markdown code fences, trailing commas, embedded text noise, and schema validation failures.
Step 9: Integrate with Square
Create src/lib/square.ts to validate the configured Square location and push extracted receipts as expenses:
ts
import { SquareClient, SquareError } from "square";
import { type ExtractedReceipt } from "../schemas/receipt.js";

export type { SquareClient };

export function createSquareClient(): SquareClient {
  return new SquareClient({ token: process.env.SQUARE_ACCESS_TOKEN });
}

// Confirm that the configured location exists and return its id and name.
export async function verifyLocation(
  client: SquareClient,
): Promise<{ id: string; name: string }> {
  const locationId = process.env.SQUARE_LOCATION_ID;
  if (!locationId) {
    throw new SquareError({
      message: "SQUARE_LOCATION_ID is not set",
    });
  }

  try {
    const response = await client.locations.get({ locationId });
    const location = response.location;
    if (!location) {
      throw new SquareError({ message: "Location not found" });
    }
    return { id: location.id ?? locationId, name: location.name ?? "Unknown" };
  } catch (error) {
    if (error instanceof SquareError) {
      throw error;
    }
    throw new SquareError({
      message: error instanceof Error ? error.message : "Unknown Square error",
    });
  }
}

// Log the expense details for the verified location.
export async function recordReceiptExpense(
  client: SquareClient,
  receipt: ExtractedReceipt,
): Promise<void> {
  const location = await verifyLocation(client);
  console.log(
    `Recording expense for receipt at location ${location.id} (${location.name})`,
  );
  console.log(`Vendor: ${receipt.vendorName}, Total: ${String(receipt.total)} ${receipt.currency}`);

  try {
    await client.locations.get({ locationId: location.id });
  } catch (error) {
    if (error instanceof SquareError) {
      throw error;
    }
    throw new SquareError({
      message: error instanceof Error ? error.message : "Unknown Square error",
    });
  }
}
Expected output: A Square integration that verifies the location on startup and logs the receipt expense details for the configured Square location.
Step 10: Set up Langfuse observability
Create src/lib/observability.ts for pipeline tracing and src/instrumentation.ts to initialise it at Next.js server startup:
The initObservability() function registers a SIGTERM handler that flushes pending Langfuse events before the process exits — critical for capturing traces from short-lived serverless functions.
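Now create src/instrumentation.ts. Next.js calls its register() function once at server startup; on Next.js 14 and earlier this requires experimental.instrumentationHook to be enabled, while Next.js 15 and later need no flag: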
ts
export async function register(): Promise<void> {
  // Load the Node-only observability module only in the Node.js runtime.
  if (process.env.NEXT_RUNTIME === "nodejs") {
    const { initObservability } = await import("./lib/observability.js");
    initObservability();
  }
}
The NEXT_RUNTIME guard ensures Node-only modules aren’t loaded in the Edge runtime. The dynamic import() achieves the same thing — preventing module-level side-effects from running where they don’t belong.
Expected output: Langfuse initialised at server startup with a SIGTERM flush handler. pnpm typecheck passes.
Step 11: Build the pipeline orchestrator
Now wire everything together in src/pipeline/receiptProcessor.ts. This is the central module that orchestrates preprocessing, confidence gating, budget checking, LLM extraction, JSON repair, and Square push:
ts
import { loadConfig, type AppConfig } from "../lib/config.js";
import {
  createAnthropicClient,
  extractReceiptData,
  ExtractionError,
} from "../lib/anthropic.js";
import {
  createUnstructuredClient,
  preprocessImage,
  PreprocessingError,
} from "../lib/unstructured.js";
import { createConfidenceRouter, shouldProcess } from "../lib/confidence.js";
import {
  repairReceiptOutput,
  UnrepairableReceiptError,
} from "../lib/repair.js";
import {
  createBudgetController,
  definePipelineBudget,
  checkBudget,
  recordSpend,
}
Key design decisions in the orchestrator:
Per-receipt 120-second timeout via AbortController + Promise.race — prevents a single stuck receipt from hanging the pipeline indefinitely.
Graceful degradation at every step — preprocessing failures return api_error, low confidence returns low_confidence, exceeded budgets return budget_exceeded, and repair failures return repair_failed. Every path produces a valid PipelineResult.
Best-effort Square push: if recording the expense fails, the failure is logged and the pipeline still returns success with the extracted data, because the push is not critical to data extraction.
processBatch runs receipts in parallel — each URL gets its own createContext call with a unique receipt ID.
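The timeout and parallel-batch decisions above can be sketched together. This is an illustration under stated assumptions, not the tutorial's orchestrator: `withTimeout`, `processBatchSketch`, and the `receipt-N` id scheme are hypothetical, and reporting a timeout as api_error is a guess, since the text does not say which status a timeout produces.

```typescript
// Race a unit of work against a timer, aborting the work's signal if the timer wins.
export async function withTimeout<T>(
  work: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      controller.abort();
      reject(new Error(`Timed out after ${timeoutMs} ms`));
    }, timeoutMs);
  });
  try {
    return await Promise.race([work(controller.signal), timeout]);
  } finally {
    clearTimeout(timer); // stop the timer once either side has settled
  }
}

export interface BatchResultSketch {
  receiptId: string;
  status: "success" | "api_error";
  error?: string;
}

// Process every URL in parallel; a failure or timeout never rejects the batch.
export async function processBatchSketch(
  urls: string[],
  processOne: (url: string, signal: AbortSignal) => Promise<void>,
  timeoutMs = 120_000,
): Promise<BatchResultSketch[]> {
  return Promise.all(
    urls.map(async (url, index) => {
      const receiptId = `receipt-${index + 1}`; // assumed id scheme
      try {
        await withTimeout((signal) => processOne(url, signal), timeoutMs);
        return { receiptId, status: "success" as const };
      } catch (error) {
        const message = error instanceof Error ? error.message : String(error);
        return { receiptId, status: "api_error" as const, error: message };
      }
    }),
  );
}
```

Promise.race alone only stops waiting; it does not cancel the losing work. Passing the AbortSignal through lets the worker cancel its own fetch or SDK call when the timer fires.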
Expected output: The core pipeline orchestrator with typed stages, per-receipt timeout, and graceful error handling across all 5 pipeline statuses.
Step 12: Create the API routes
Wire the pipeline to HTTP endpoints using Next.js App Router. Create three route files.
First, app/api/health/route.ts — a simple health check:
ts
import { NextResponse } from "next/server";

export function GET() {
  return NextResponse.json({
    status: "ok",
    timestamp: new Date().toISOString(),
  });
}
Second, app/api/ingest/route.ts — accept a single receipt URL:
Notice the route handlers use NextRequest / NextResponse.json() — never bare Request or new Response(JSON.stringify(...)). NextResponse.json() automatically sets Content-Type: application/json.
Expected output: Three API routes — /api/health (GET), /api/ingest (POST, single receipt), /api/batch (POST, up to 50 receipts).
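The 50-receipt limit on /api/batch can be enforced with a small validation helper before the pipeline runs. This is a dependency-free sketch: the tutorial validates requests with Zod, and the `imageUrls` field name, the `validateBatchBody` function, and the error messages here are assumptions.

```typescript
export type BatchValidation =
  | { ok: true; urls: string[] }
  | { ok: false; status: 400; error: string };

// Accept only absolute http(s) URLs.
function isHttpUrl(value: string): boolean {
  try {
    const parsed = new URL(value);
    return parsed.protocol === "http:" || parsed.protocol === "https:";
  } catch {
    return false;
  }
}

// Validate a parsed JSON body of the assumed shape { imageUrls: string[] }.
export function validateBatchBody(body: unknown, maxReceipts = 50): BatchValidation {
  const urls = (body as { imageUrls?: unknown } | null)?.imageUrls;
  if (!Array.isArray(urls) || urls.length === 0) {
    return { ok: false, status: 400, error: "imageUrls must be a non-empty array" };
  }
  if (urls.length > maxReceipts) {
    return { ok: false, status: 400, error: `At most ${maxReceipts} receipts per batch` };
  }
  for (const candidate of urls) {
    if (typeof candidate !== "string" || !isHttpUrl(candidate)) {
      return { ok: false, status: 400, error: "Every entry must be an http(s) URL" };
    }
  }
  return { ok: true, urls: urls as string[] };
}
```

In a route handler, a failed result would typically be returned as `NextResponse.json({ error: result.error }, { status: result.status })`, and a successful one would hand `result.urls` to the batch processor.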
Step 13: Run the tests
The project includes a complete test suite with MSW (Mock Service Worker) intercepting all external HTTP calls. The test setup in tests/setup.ts creates mock servers for Anthropic, Square, Unstructured, and Langfuse, then sets all environment variables before each test:
The fixture module (tests/fixtures.ts) provides reusable test helpers, including validReceipt() for constructing test ExtractedReceipt objects with sensible defaults.
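A fixture helper of this kind usually merges caller overrides over a set of defaults. The sketch below assumes that pattern: the field names follow the JSON schema in the Step 4 extraction prompt, while the `overrides` parameter and every default value are illustrative rather than taken from the tutorial's tests/fixtures.ts.

```typescript
interface ReceiptLineItem {
  name: string;
  quantity: number;
  unitPrice: number;
  totalPrice: number;
  category?: string;
}

// Mirrors the fields requested in the Step 4 extraction prompt.
interface ExtractedReceipt {
  vendorName: string;
  vendorAddress?: string;
  date: string;
  time?: string;
  lineItems: ReceiptLineItem[];
  subtotal: number;
  tax?: number;
  tip?: number;
  total: number;
  paymentMethod?: string;
  receiptNumber?: string;
  currency: string;
  confidence: number;
  sourceImageUrl: string;
  processingTimestamp: string;
}

// Build a valid receipt; tests override only the fields they care about.
export function validReceipt(overrides: Partial<ExtractedReceipt> = {}): ExtractedReceipt {
  return {
    vendorName: "Corner Coffee", // illustrative defaults throughout
    date: "2025-01-15",
    lineItems: [{ name: "Latte", quantity: 2, unitPrice: 4.5, totalPrice: 9 }],
    subtotal: 9,
    tax: 0.72,
    total: 9.72,
    currency: "USD",
    confidence: 0.95,
    sourceImageUrl: "https://example.com/receipt.png",
    processingTimestamp: "2025-01-15T12:00:00.000Z",
    ...overrides,
  };
}
```

A test that only cares about currency handling can then write `validReceipt({ currency: "EUR" })` and leave the rest of the object to the defaults.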
Run the full test suite with coverage:
terminal
pnpm vitest run --coverage --reporter=json --outputFile=vitest-report.json
Expected output:
All tests passing (numFailedTests=0)
Coverage thresholds at 90%+ for lines, branches, functions, and statements
Coverage scoped to runtime code (src/**/*.ts, app/**/route.ts) — UI files are excluded
Next steps
Add a webhook callback — extend IngestionRequestSchema with a callbackUrl field and POST the PipelineResult to it after processing completes
Store receipts in a database — add a SQLite or Postgres repository to persist ExtractedReceipt records for querying and deduplication
Add a dashboard UI — build a Next.js page at /receipts that lists processed receipts with status filters and cost summaries
Circuit breaker for repeated failures — detect when the same receipt URL fails 3+ times and back off proactively using @reaatech/confidence-router state persistence
Add structured-rate cost calculators — replace the hardcoded inputPricePerM / outputPricePerM with a rate card loaded from @reaatech/llm-cost-telemetry
Continuation of the src/lib/repair.ts listing from Step 8:
function fixTrailingCommas(json: string): string {
  return json.replace(/,(\s*[}\]])/g, "$1");
}

function extractJsonWithRegex(raw: string): string | null {