Anthropic Document Pipeline for Square SMB Receipt Extraction

Automatically extract line items, totals, and vendor info from Square receipts and push structured data to accounting systems.

anthropic document-pipeline receipt-extraction square nextjs typescript confidence-router structured-repair llm-cost-telemetry agent-budget-engine

The problem

Small businesses using Square accumulate hundreds of digital receipts that must be manually entered into accounting or expense trackers, leading to delays, errors, and lost deductions.


Intro

Small businesses using Square accumulate hundreds of digital receipts from coffee runs, supply purchases, and vendor payments. Entering each receipt into an accounting system by hand is slow and error-prone, and receipts that slip through mean lost deductions. This tutorial builds an automated document pipeline. It ingests receipt image URLs, extracts structured data with Anthropic Claude, and pushes the results to Square. Along the way it enforces a daily cost budget and confidence-based quality gates. You’ll use the @reaatech/* package family for confidence routing, structured JSON repair, LLM cost telemetry, and agent budget enforcement.
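The flow described above can be sketched as one orchestrator function. This is a minimal, hypothetical sketch rather than the tutorial's orchestrator. The `PipelineStages` interface, the stage names, and the `Outcome` statuses are invented for illustration. The 0.8 and 0.3 defaults mirror CONFIDENCE_ROUTE_THRESHOLD and CONFIDENCE_FALLBACK_THRESHOLD from the tutorial's test setup. However, what happens in each confidence band (push, review, reject) is an assumption, not the documented policy of the @reaatech confidence router.

```typescript
// Hypothetical types for this sketch; the tutorial's own schemas are richer.
interface ExtractedReceipt {
  vendorName: string;
  total: number;
  confidence: number; // 0-1, as requested in the extraction prompt
}

type Outcome =
  | { status: "pushed"; receipt: ExtractedReceipt }
  | { status: "needs_review"; receipt: ExtractedReceipt }
  | { status: "rejected"; reason: string };

// Each stage is injected so the control flow can run without network calls.
interface PipelineStages {
  preprocess(imageUrl: string): Promise<string>; // image URL -> receipt text
  budgetAllows(estimatedCostUsd: number): boolean; // spend gate
  extract(text: string): Promise<ExtractedReceipt>; // LLM extraction + JSON parsing
  pushToSquare(receipt: ExtractedReceipt): Promise<void>;
}

async function processReceipt(
  imageUrl: string,
  stages: PipelineStages,
  routeThreshold = 0.8,
  fallbackThreshold = 0.3,
  estimatedCostUsd = 0.01, // placeholder estimate for this sketch
): Promise<Outcome> {
  // Gate on budget before making any paid API call.
  if (!stages.budgetAllows(estimatedCostUsd)) {
    return { status: "rejected", reason: "budget exhausted" };
  }
  const text = await stages.preprocess(imageUrl);
  const receipt = await stages.extract(text);

  // Confidence gate: push automatically, hold for review, or reject.
  if (receipt.confidence >= routeThreshold) {
    await stages.pushToSquare(receipt);
    return { status: "pushed", receipt };
  }
  if (receipt.confidence >= fallbackThreshold) {
    return { status: "needs_review", receipt };
  }
  return { status: "rejected", reason: `confidence ${receipt.confidence} is below ${fallbackThreshold}` };
}

const stub: PipelineStages = {
  preprocess: async () => "Test Store\nTotal: $10.00",
  budgetAllows: () => true,
  extract: async () => ({ vendorName: "Test Store", total: 10, confidence: 0.95 }),
  pushToSquare: async () => {},
};
processReceipt("https://example.com/receipt.jpg", stub).then((o) => console.log(o.status)); // pushed
```

Injecting the stages keeps the routing logic testable on its own. The real Anthropic, Unstructured, and Square clients can then be wired in behind the same function signatures.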

By the end, you’ll have a Next.js 16+ application with three API routes (/api/ingest, /api/batch, /api/health), a full pipeline orchestrator, and a test suite with 90%+ coverage.
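In the App Router, each route is a `route.ts` file that exports one function per HTTP method, such as `POST`. Each function receives a Web `Request` and returns a `Response`. The sketch below shows the validation side of a hypothetical `/api/ingest` handler. The `{ imageUrl }` body shape and the injected `runPipeline` callback are assumptions. It also validates by hand rather than with the tutorial's Zod schemas, so it runs without dependencies.

```typescript
// Hypothetical request body for /api/ingest.
interface IngestBody {
  imageUrl: string;
}

// Returns the parsed body, or null when it is not { imageUrl: "http(s)://..." }.
function parseIngestBody(value: unknown): IngestBody | null {
  if (typeof value !== "object" || value === null) return null;
  const imageUrl = (value as Record<string, unknown>).imageUrl;
  if (typeof imageUrl !== "string" || !/^https?:\/\//.test(imageUrl)) return null;
  return { imageUrl };
}

function json(body: unknown, status: number): Response {
  return new Response(JSON.stringify(body), {
    status,
    headers: { "content-type": "application/json" },
  });
}

// `runPipeline` is injected so the sketch runs without the rest of the pipeline.
async function handleIngest(
  request: Request,
  runPipeline: (imageUrl: string) => Promise<unknown>,
): Promise<Response> {
  let raw: unknown;
  try {
    raw = await request.json();
  } catch {
    return json({ error: "Body must be valid JSON" }, 400);
  }
  const body = parseIngestBody(raw);
  if (!body) {
    return json({ error: "Expected { imageUrl: string } with an http(s) URL" }, 400);
  }
  const result = await runPipeline(body.imageUrl);
  return json(result, 200);
}
```

In `app/api/ingest/route.ts`, the exported `POST` could be a thin wrapper, for example `export async function POST(request: Request) { return handleIngest(request, runPipeline); }`, where `runPipeline` is your orchestrator.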

Prerequisites

Node.js 22+ and pnpm 10 installed
An Anthropic API key for Claude (set as ANTHROPIC_API_KEY)
A Square access token and location ID (set as SQUARE_ACCESS_TOKEN, SQUARE_LOCATION_ID)
An Unstructured API key for document preprocessing (set as UNSTRUCTURED_API_KEY)
A Langfuse account for observability (optional, with LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY)
Familiarity with TypeScript, Next.js App Router, and Zod
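The pipeline reads all of these values from the environment. A minimal fail-fast loader could look like the sketch below. The `PipelineConfig` shape and the `loadConfig` name are hypothetical. The variable names and the `claude-sonnet-4-6` and `5.0` defaults come from this tutorial's code. The loader takes the environment as a parameter, so you can test it without mutating `process.env`.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical config shape for this sketch.
interface PipelineConfig {
  anthropicApiKey: string;
  anthropicModel: string;
  squareAccessToken: string;
  squareLocationId: string;
  unstructuredApiKey: string;
  budgetDailyLimit: number;
  langfuse?: { publicKey: string; secretKey: string };
}

const REQUIRED = [
  "ANTHROPIC_API_KEY",
  "SQUARE_ACCESS_TOKEN",
  "SQUARE_LOCATION_ID",
  "UNSTRUCTURED_API_KEY",
] as const;

function loadConfig(env: Env): PipelineConfig {
  // Report every missing variable at once instead of failing on the first.
  const missing = REQUIRED.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }

  const budget = Number(env.BUDGET_DAILY_LIMIT ?? "5.0");
  if (!Number.isFinite(budget) || budget <= 0) {
    throw new Error(`BUDGET_DAILY_LIMIT must be a positive number, got "${env.BUDGET_DAILY_LIMIT}"`);
  }

  // Langfuse is optional; enable it only when both keys are set.
  const publicKey = env.LANGFUSE_PUBLIC_KEY;
  const secretKey = env.LANGFUSE_SECRET_KEY;

  return {
    anthropicApiKey: env.ANTHROPIC_API_KEY as string,
    anthropicModel: env.ANTHROPIC_MODEL ?? "claude-sonnet-4-6",
    squareAccessToken: env.SQUARE_ACCESS_TOKEN as string,
    squareLocationId: env.SQUARE_LOCATION_ID as string,
    unstructuredApiKey: env.UNSTRUCTURED_API_KEY as string,
    budgetDailyLimit: budget,
    langfuse: publicKey && secretKey ? { publicKey, secretKey } : undefined,
  };
}

// In the app: const config = loadConfig(process.env);
```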

Step 1: Scaffold the project and install dependencies

Create the Next.js project and install its dependencies. Pinning exact versions keeps builds reproducible.

terminal

Example artifact

A complete, working implementation of this recipe, downloadable as a zip or browsable file by file. It was generated by our build pipeline and tested before publishing.


181 kB · 137 tests · 97.4% coverage · vitest passing

SHA-256: 3bf1ca7e3e6e9f58bbf0dcca1671976c6c98318d899455119f96e080a86db20a
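To check that a downloaded zip matches the published digest, hash it locally. The sketch below is minimal and uses only Node's built-in `node:crypto` and `node:fs`. `sha256Hex` and `verifyDownload` are illustrative helper names, and `example.zip` is a placeholder for wherever you saved the file.

```typescript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

// Lowercase hex SHA-256 digest of the given bytes.
function sha256Hex(data: Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// True when the file's digest equals the published one (case and surrounding whitespace ignored).
function verifyDownload(path: string, expectedHex: string): boolean {
  return sha256Hex(readFileSync(path)) === expectedHex.trim().toLowerCase();
}

// Usage after downloading (placeholder path):
// verifyDownload("example.zip", "3bf1ca7e3e6e9f58bbf0dcca1671976c6c98318d899455119f96e080a86db20a");
```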


Step 4: Create the Anthropic extraction client

```typescript
import Anthropic, {
  APIError,
  APIConnectionError,
  APIConnectionTimeoutError,
  RateLimitError,
} from "@anthropic-ai/sdk";

// SDK-level retries are disabled (maxRetries: 0); failures surface immediately.
export function createAnthropicClient(): Anthropic {
  return new Anthropic({
    apiKey: process.env.ANTHROPIC_API_KEY,
    maxRetries: 0,
  });
}

// Wraps Anthropic SDK errors with an HTTP-style status code and the original error.
export class ExtractionError extends Error {
  public readonly statusCode: number | undefined;
  public readonly originalError: Error;

  constructor(message: string, statusCode: number | undefined, originalError: Error) {
    super(message);
    this.name = "ExtractionError";
    this.statusCode = statusCode;
    this.originalError = originalError;
  }
}

// The model only sees the receipt text, so it cannot know the real sourceImageUrl
// or processingTimestamp; treat those fields in its output as placeholders to overwrite.
function buildExtractionPrompt(text: string): string {
  return `You are a receipt data extraction assistant. Extract structured data from the following receipt text.

Return ONLY valid JSON matching this schema:
{
  "vendorName": string,
  "vendorAddress": string (optional),
  "date": string (ISO date),
  "time": string (optional),
  "lineItems": [{ "name": string, "quantity": number, "unitPrice": number, "totalPrice": number, "category": string (optional) }],
  "subtotal": number,
  "tax": number (optional),
  "tip": number (optional),
  "total": number,
  "paymentMethod": string (optional),
  "receiptNumber": string (optional),
  "currency": string (e.g. "USD"),
  "confidence": number (0-1),
  "sourceImageUrl": string,
  "processingTimestamp": string (ISO datetime)
}

Receipt text:
${text}`;
}

export async function extractReceiptData(
  client: Anthropic,
  unstructuredText: string,
): Promise<Anthropic.Message> {
  const model = process.env.ANTHROPIC_MODEL ?? "claude-sonnet-4-6";
  const maxTokens = parseInt(process.env.ANTHROPIC_MAX_TOKENS ?? "4096", 10);

  try {
    const message = await client.messages.create({
      model,
      max_tokens: maxTokens,
      messages: [
        {
          role: "user",
          content: [{ type: "text" as const, text: buildExtractionPrompt(unstructuredText) }],
        },
      ],
    });
    return message;
  } catch (error) {
    // Check subclasses before their parents: APIConnectionTimeoutError extends
    // APIConnectionError, and all of these SDK errors extend APIError.
    if (error instanceof RateLimitError) {
      throw new ExtractionError("Rate limit exceeded for Anthropic API", 429, error);
    }
    if (error instanceof APIConnectionTimeoutError) {
      throw new ExtractionError("Connection timeout for Anthropic API", 408, error);
    }
    if (error instanceof APIConnectionError) {
      throw new ExtractionError("Connection error for Anthropic API", undefined, error);
    }
    if (error instanceof APIError) {
      throw new ExtractionError(error.message, error.status as number | undefined, error);
    }
    throw error;
  }
}
```
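`extractReceiptData` returns the raw `Message`, so the receipt JSON still has to be pulled out of its text content blocks and parsed. Models occasionally wrap JSON in a Markdown code fence or add a sentence around it, which is the problem the tutorial's structured-repair step addresses. The sketch below is a simplified, hypothetical stand-in for that step, not the API of the @reaatech structured-repair package. It joins the text blocks, unwraps a fence if one is present, and parses the span from the first `{` to the last `}`. `ContentBlock` and `MessageLike` are minimal local types standing in for the SDK's message types.

```typescript
// Minimal local stand-ins for the SDK's content block and message shapes.
interface ContentBlock {
  type: string;
  text?: string;
}
interface MessageLike {
  content: ContentBlock[];
}

// Concatenates the text of all text blocks, ignoring other block types.
function messageText(message: MessageLike): string {
  return message.content
    .filter((block) => block.type === "text" && typeof block.text === "string")
    .map((block) => block.text as string)
    .join("\n");
}

// Simplified, hypothetical repair: unwrap a Markdown fence, keep the outermost {...} span.
function parseReceiptJson(raw: string): Record<string, unknown> {
  let candidate = raw.trim();
  const fenced = candidate.match(/`{3}(?:json)?\s*([\s\S]*?)`{3}/);
  if (fenced && fenced[1] !== undefined) candidate = fenced[1].trim();

  const start = candidate.indexOf("{");
  const end = candidate.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("No JSON object found in model output");
  }
  const parsed: unknown = JSON.parse(candidate.slice(start, end + 1));
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("Model output is not a JSON object");
  }
  return parsed as Record<string, unknown>;
}

const sample: MessageLike = {
  content: [{ type: "text", text: 'Here is the receipt: {"vendorName":"Test Store","total":10}' }],
};
console.log(parseReceiptJson(messageText(sample)).vendorName); // Test Store
```

This only recovers syntactically valid JSON that is surrounded by extra text. The result should still be validated against the receipt schema before it moves on.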

Step 5: Preprocess receipt images with Unstructured

```typescript
import { UnstructuredClient } from "unstructured-client";
import { Strategy } from "unstructured-client/sdk/models/shared";

export function createUnstructuredClient(): UnstructuredClient {
  return new UnstructuredClient({
    security: { apiKeyAuth: process.env.UNSTRUCTURED_API_KEY },
  });
}

export class PreprocessingError extends Error {
  public readonly statusCode?: number;

  constructor(message: string, statusCode?: number) {
    super(message);
    this.name = "PreprocessingError";
    this.statusCode = statusCode;
  }
}

export async function preprocessImage(
  client: UnstructuredClient,
  imageUrl: string,
): Promise<{ text: string }> {
  // 1. Download the receipt image.
  let arrayBuffer: ArrayBuffer;
  try {
    const response = await fetch(imageUrl);
    if (!response.ok) {
      throw new PreprocessingError(`Failed to fetch image: ${response.statusText}`, response.status);
    }
    arrayBuffer = await response.arrayBuffer();
  } catch (error) {
    if (error instanceof PreprocessingError) {
      throw error;
    }
    throw new PreprocessingError(error instanceof Error ? error.message : "Unknown fetch error");
  }

  // 2. Partition the image into document elements with Unstructured.
  let partitionResponse;
  try {
    partitionResponse = await client.general.partition({
      partitionParameters: {
        files: {
          content: Buffer.from(arrayBuffer),
          fileName: "receipt",
        },
        strategy: Strategy.Auto,
      },
    });
  } catch (error) {
    throw new PreprocessingError(error instanceof Error ? error.message : "Partition failed");
  }

  // 3. Accept either an { elements: [...] } object or a bare element array.
  const rawElements: Array<unknown> = [];
  if (partitionResponse && typeof partitionResponse === "object" && !Array.isArray(partitionResponse)) {
    const maybe = partitionResponse as Record<string, unknown>;
    if (Array.isArray(maybe.elements)) {
      for (const el of maybe.elements) {
        rawElements.push(el);
      }
    }
  }
  if (Array.isArray(partitionResponse)) {
    for (const el of partitionResponse) {
      rawElements.push(el);
    }
  }

  // 4. Join each element's text into one newline-separated string.
  const text = rawElements
    .map((el) => {
      if (typeof el === "string") return el;
      if (el && typeof el === "object" && "text" in (el as Record<string, unknown>)) {
        return String((el as Record<string, unknown>).text);
      }
      return "";
    })
    .filter(Boolean)
    .join("\n");

  return { text };
}
```
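`preprocessImage` uploads every file with `fileName: "receipt"`, with no extension. One option for giving Unstructured a file-type hint is to derive the name from the URL instead. Whether the partitioner needs that hint depends on how your Unstructured endpoint detects file types, so treat this as optional. The sketch below assumes receipts arrive as images or PDFs. `receiptFileName` and the extension list are hypothetical, and the helper falls back to the tutorial's extensionless name.

```typescript
// Extensions this sketch passes through (assumed receipt formats; adjust as needed).
const KNOWN_EXTENSIONS = new Set(["jpg", "jpeg", "png", "heic", "tif", "tiff", "webp", "pdf"]);

// Hypothetical helper: builds an upload name such as "receipt.png" from the image URL.
function receiptFileName(imageUrl: string, fallback = "receipt"): string {
  // Drop any query string or fragment, then take the last path segment.
  const path = imageUrl.split(/[?#]/)[0] ?? "";
  const lastSegment = path.split("/").pop() ?? "";
  const dot = lastSegment.lastIndexOf(".");
  if (dot <= 0) return fallback;
  const ext = lastSegment.slice(dot + 1).toLowerCase();
  return KNOWN_EXTENSIONS.has(ext) ? `receipt.${ext}` : fallback;
}

console.log(receiptFileName("https://example.com/receipt.jpg")); // receipt.jpg
console.log(receiptFileName("https://cdn.example.com/r/8f2a?sig=abc")); // receipt
```

To try it, pass `fileName: receiptFileName(imageUrl)` in place of `fileName: "receipt"` in `preprocessImage`.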

Step 7: Enforce LLM spend with the budget engine

```typescript
import { BudgetController } from "@reaatech/agent-budget-engine";
import { SpendStore } from "@reaatech/agent-budget-spend-tracker";
import { BudgetScope } from "@reaatech/agent-budget-types";
import {
  generateId,
  now,
  calculateCostFromTokens,
  type CostSpan,
} from "@reaatech/llm-cost-telemetry";

// Controller backed by a SpendStore, with defaultEstimateTokens set to 1000.
export function createBudgetController(): BudgetController {
  return new BudgetController({
    spendTracker: new SpendStore(),
    defaultEstimateTokens: 1000,
  });
}

// Registers the pipeline budget: limit from BUDGET_DAILY_LIMIT (default 5.0),
// soft cap from BUDGET_SOFT_CAP (default 0.8), hard cap fixed at 1.0.
export function definePipelineBudget(controller: BudgetController): void {
  controller.defineBudget({
    scopeType: BudgetScope.User,
    scopeKey: "receipt-pipeline",
    limit: parseFloat(process.env.BUDGET_DAILY_LIMIT ?? "5.0"),
    policy: {
      softCap: parseFloat(process.env.BUDGET_SOFT_CAP ?? "0.8"),
      hardCap: 1.0,
    },
  });
}

// Asks whether a request with this estimated cost may run; may return a suggestedModel.
export function checkBudget(
  controller: BudgetController,
  estimatedCost: number,
  modelId: string,
): { allowed: boolean; suggestedModel?: string } {
  const result = controller.check({
    scopeType: BudgetScope.User,
    scopeKey: "receipt-pipeline",
    estimatedCost,
    modelId,
    tools: [],
  });
  return { allowed: result.allowed, suggestedModel: result.suggestedModel };
}

// Records spend for one receipt; the receipt ID is used as the request ID.
export function recordSpend(
  controller: BudgetController,
  cost: number,
  inputTokens: number,
  outputTokens: number,
  modelId: string,
  provider: string,
  receiptId: string,
): void {
  controller.record({
    requestId: receiptId,
    scopeType: BudgetScope.User,
    scopeKey: "receipt-pipeline",
    cost,
    inputTokens,
    outputTokens,
    modelId,
    provider,
    timestamp: new Date(),
  });
}

// Builds a cost-telemetry span; provider and model are fixed to anthropic / claude-sonnet-4-6.
export function createCostSpan(
  inputTokens: number,
  outputTokens: number,
  pricePerMInput: number,
  pricePerMOutput: number,
): CostSpan {
  const inputCost = calculateCostFromTokens(inputTokens, pricePerMInput);
  const outputCost = calculateCostFromTokens(outputTokens, pricePerMOutput);
  return {
    spanId: generateId(),
    id: generateId(),
    provider: "anthropic",
    model: "claude-sonnet-4-6",
    inputTokens,
    outputTokens,
    costUsd: inputCost + outputCost,
    timestamp: now(),
  };
}

// Current budget state for the pipeline scope, or undefined.
export function getBudgetState(
  controller: BudgetController,
):
  | {
      scopeType: BudgetScope;
      scopeKey: string;
      limit: number;
      spent: number;
      remaining: number;
      policy: { softCap: number; hardCap: number };
      state: string;
      lastThreshold?: number;
      breachCount: number;
      lastReset?: Date;
      lastUpdated: Date;
    }
  | undefined {
  return controller.getState(BudgetScope.User, "receipt-pipeline");
}
```
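The budget gate comes down to simple arithmetic. This sketch assumes per-request cost is tokens / 1,000,000 × the price per million tokens. That is what the `pricePerMInput` and `pricePerMOutput` parameters of `createCostSpan` suggest, but it is an inference, not documented library behavior. The prices of $3 and $15 per million tokens are illustrative, so substitute your model's current rates. `capState` is a hypothetical illustration, not `BudgetController`'s implementation. It assumes caps are fractions of the limit, so `softCap: 0.8` and `hardCap: 1.0` become $4.00 and $5.00 against the default $5.00 limit.

```typescript
// Cost of `tokens` at a price quoted in USD per million tokens.
function costFromTokens(tokens: number, pricePerMillionUsd: number): number {
  return (tokens / 1_000_000) * pricePerMillionUsd;
}

// Illustrative prices only; check current model pricing.
const INPUT_PRICE_PER_M = 3;
const OUTPUT_PRICE_PER_M = 15;

function requestCost(inputTokens: number, outputTokens: number): number {
  return costFromTokens(inputTokens, INPUT_PRICE_PER_M) + costFromTokens(outputTokens, OUTPUT_PRICE_PER_M);
}

type CapState = "ok" | "soft_cap" | "hard_cap";

// Hypothetical classifier: compares projected spend with fractional caps of the limit.
// Reaching the soft cap flags the request; only exceeding the hard cap blocks it.
function capState(
  spentUsd: number,
  nextCostUsd: number,
  limitUsd = 5.0,
  softCap = 0.8,
  hardCap = 1.0,
): CapState {
  const projected = spentUsd + nextCostUsd;
  if (projected > limitUsd * hardCap) return "hard_cap";
  if (projected >= limitUsd * softCap) return "soft_cap";
  return "ok";
}

// 50 input / 30 output tokens, as in the test mock's response.
console.log(requestCost(50, 30).toFixed(4)); // 0.0006
console.log(capState(3.99, 0.05)); // soft_cap
```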

Shared Vitest setup. MSW mocks the Anthropic, Square, Unstructured, and Langfuse endpoints, and each test starts with the environment variables the pipeline reads:

```typescript
import { beforeAll, afterEach, afterAll, beforeEach } from "vitest";
import { setupServer } from "msw/node";
import { http, HttpResponse } from "msw";

export const server = setupServer(
  // Anthropic Messages API: returns a high-confidence receipt.
  http.post("https://api.anthropic.com/v1/messages", () =>
    HttpResponse.json({
      id: "msg_test",
      type: "message",
      role: "assistant",
      model: "claude-sonnet-4-6",
      content: [
        {
          type: "text",
          text: JSON.stringify({
            vendorName: "Test Store",
            vendorAddress: "123 Test St",
            date: "2025-01-15",
            lineItems: [{ name: "Item", quantity: 1, unitPrice: 10, totalPrice: 10 }],
            subtotal: 10,
            total: 10,
            currency: "USD",
            confidence: 0.95,
            sourceImageUrl: "https://example.com/receipt.jpg",
            processingTimestamp: "2025-01-15T12:00:00.000Z",
          }),
        },
      ],
      stop_reason: "end_turn",
      usage: { input_tokens: 50, output_tokens: 30 },
    }),
  ),
  // Square: location lookup.
  http.get("https://connect.squareup.com/v2/locations/:id", () =>
    HttpResponse.json({ location: { id: "test-loc", name: "Test Location" } }),
  ),
  // Unstructured: partition response as a bare element array.
  http.post("https://api.unstructuredapp.io/general/v0/general", () =>
    HttpResponse.json([
      { type: "Title", text: "Receipt from Test Store" },
      { type: "ListItem", text: "Item 1 - $10.00" },
      { type: "ListItem", text: "Total: $10.00" },
    ]),
  ),
  // Langfuse: accept any POST under the base URL.
  http.post("https://cloud.langfuse.com/*", () =>
    HttpResponse.json({ id: "trace_test", status: "ok" }),
  ),
);

// Fail any request that no handler matches.
beforeAll(() => {
  server.listen({ onUnhandledRequest: "error" });
});
afterEach(() => {
  server.resetHandlers();
});
afterAll(() => {
  server.close();
});

beforeEach(() => {
  process.env.ANTHROPIC_API_KEY = "test-key";
  process.env.ANTHROPIC_MODEL = "claude-sonnet-4-6";
  process.env.ANTHROPIC_MAX_TOKENS = "4096";
  process.env.SQUARE_ACCESS_TOKEN = "test-token";
  process.env.SQUARE_LOCATION_ID = "test-loc";
  process.env.UNSTRUCTURED_API_KEY = "test-key";
  process.env.LANGFUSE_PUBLIC_KEY = "pk-test";
  process.env.LANGFUSE_SECRET_KEY = "sk-test";
  process.env.LANGFUSE_BASE_URL = "https://cloud.langfuse.com";
  process.env.CONFIDENCE_ROUTE_THRESHOLD = "0.8";
  process.env.CONFIDENCE_FALLBACK_THRESHOLD = "0.3";
  process.env.BUDGET_DAILY_LIMIT = "5.0";
  process.env.BUDGET_SOFT_CAP = "0.8";
});
```
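The default Anthropic handler always returns a receipt with `confidence: 0.95`, so it only exercises the high-confidence path. Individual tests can override that handler with MSW's `server.use()`, which adds runtime handlers that take precedence until `server.resetHandlers()` runs in `afterEach`. A small factory keeps those overrides short. `makeMessageBody` and `ReceiptFixture` are hypothetical helpers whose defaults copy the mock above.

```typescript
// Hypothetical fixture shape mirroring the receipt in the default mock.
interface ReceiptFixture {
  vendorName: string;
  vendorAddress?: string;
  date: string;
  lineItems: { name: string; quantity: number; unitPrice: number; totalPrice: number }[];
  subtotal: number;
  total: number;
  currency: string;
  confidence: number;
  sourceImageUrl: string;
  processingTimestamp: string;
}

const DEFAULT_RECEIPT: ReceiptFixture = {
  vendorName: "Test Store",
  vendorAddress: "123 Test St",
  date: "2025-01-15",
  lineItems: [{ name: "Item", quantity: 1, unitPrice: 10, totalPrice: 10 }],
  subtotal: 10,
  total: 10,
  currency: "USD",
  confidence: 0.95,
  sourceImageUrl: "https://example.com/receipt.jpg",
  processingTimestamp: "2025-01-15T12:00:00.000Z",
};

// Builds a Messages API response body. Pass `rawText` to return arbitrary
// (for example, malformed) text instead of the serialized receipt.
function makeMessageBody(overrides: Partial<ReceiptFixture> = {}, rawText?: string) {
  const receipt: ReceiptFixture = { ...DEFAULT_RECEIPT, ...overrides };
  return {
    id: "msg_test",
    type: "message",
    role: "assistant",
    model: "claude-sonnet-4-6",
    content: [{ type: "text", text: rawText ?? JSON.stringify(receipt) }],
    stop_reason: "end_turn",
    usage: { input_tokens: 50, output_tokens: 30 },
  };
}

// In a test that imports `server` from the setup file (MSW v2 API):
// server.use(
//   http.post("https://api.anthropic.com/v1/messages", () =>
//     HttpResponse.json(makeMessageBody({ confidence: 0.2 })),
//   ),
// );
```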


Contents

Intro
Prerequisites
Step 1: Scaffold the project and install dependencies
Step 2: Define receipt and pipeline schemas with Zod
Step 3: Load configuration from environment variables
Step 4: Create the Anthropic extraction client
Step 5: Preprocess receipt images with Unstructured
Step 6: Gate quality with the confidence router
Step 7: Enforce LLM spend with the budget engine
Step 8: Repair malformed LLM JSON with structured repair core
Step 9: Integrate with Square
Step 10: Set up Langfuse observability
Step 11: Build the pipeline orchestrator
Step 12: Create the API routes
Step 13: Run the tests
Next steps