Anthropic Document Pipeline for SMB Accounting

Automated invoice processing with AI that catches errors and tracks costs, built on reliable agent orchestration.

typescript nextjs anthropic claude invoice-processing document-pipeline agent-orchestration redis

The problem

Accounting teams waste hours manually entering invoice data from PDFs and emails, leading to errors and delayed payments.

Built from

Intro

You’ll build an invoice processing pipeline that accepts PDF and image uploads, extracts structured data from text-based PDFs using Anthropic’s Claude, validates every result against a schema, retries on failure, and tracks per-invoice costs, all orchestrated through a Redis-backed job queue. Image files and image-only PDFs are detected and rejected by the job processor, because OCR is out of scope for the MVP. By the end, you’ll have a working Next.js API with three endpoints, a background worker that processes up to four invoices concurrently, and a test suite that confirms every gate fires correctly.

Prerequisites

Node.js >= 22 (required by the engines field in package.json)
pnpm 10.0.0 (the project’s package manager, declared as pnpm@10.0.0 in the packageManager field of package.json)
Redis running locally on port 6379 (or a remote REDIS_URL you supply)
Anthropic API key with access to Claude 3 Haiku (ANTHROPIC_API_KEY)
Familiarity with TypeScript, Next.js App Router, and async job processing. You don’t need prior experience with Bull or the REAA agent libraries — they’re wired up step by step.
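The two environment variables named above can be checked once at startup so that a missing key fails fast instead of surfacing inside a job. The following is a minimal sketch, not the project's actual code: the names AppConfig and loadConfig are hypothetical, and the fallback URL simply encodes the "Redis running locally on port 6379" prerequisite.

```typescript
// Hypothetical helper: AppConfig and loadConfig are illustrative names,
// not identifiers from the project.
interface AppConfig {
  redisUrl: string;
  anthropicApiKey: string;
}

// Takes the environment as a parameter so it can be exercised without
// touching the real process environment.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const anthropicApiKey = (env['ANTHROPIC_API_KEY'] ?? '').trim();
  if (anthropicApiKey === '') {
    throw new Error('ANTHROPIC_API_KEY is required');
  }

  // Fall back to a local Redis instance on the default port when no
  // REDIS_URL is supplied.
  const redisUrl = (env['REDIS_URL'] ?? '').trim() || 'redis://localhost:6379';

  return { redisUrl, anthropicApiKey };
}
```

In an application this would typically be called as loadConfig(process.env) before any client is constructed.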

Step 1: Scaffold the project

Create a new directory and add the project configuration files. These pin every version and tell Next.js how to handle binary packages like sharp and ioredis.
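As an illustration of the Next.js side of that configuration, here is a rough sketch of a next.config.ts that keeps sharp and ioredis out of the server bundle. It assumes Next.js 15 or later, where the option is called serverExternalPackages; the project's real config file is not shown in this section and may differ.

```typescript
// next.config.ts (sketch). Assumption: Next.js 15+, where this option is
// named `serverExternalPackages`. Earlier releases exposed it as
// `experimental.serverComponentsExternalPackages`.
const nextConfig = {
  // Packages listed here are loaded with a native require at runtime
  // instead of being bundled into the server build.
  serverExternalPackages: ['sharp', 'ioredis'],
};

export default nextConfig;
```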

Create package.json:

Example artifact

A complete, working implementation of this recipe — downloadable as a zip or browsable file by file. Generated by our build pipeline; tested with full coverage before publishing.


109 tests · 99.3% coverage · vitest passing


// invoke-claude.ts (file name inferred from the processor's import of '../lib/invoke-claude.js')
import { getAnthropicClient } from './anthropic-client.js';
import type { Anthropic } from '@anthropic-ai/sdk';
import { buildExtractionPrompt } from './prompt-builder.js';
import { getCostTracker } from './cost-tracker.js';
import type { ExtractedData } from '../types/index.js';

// The model's reply was empty or was not valid JSON.
export class ParseError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'ParseError';
  }
}

// HTTP 429 from the API; retryAfter is in seconds.
export class RateLimitError extends Error {
  retryAfter: number;

  constructor(retryAfter: number) {
    super(`Rate limit exceeded. Retry after ${retryAfter}s`);
    this.name = 'RateLimitError';
    this.retryAfter = retryAfter;
  }
}

// HTTP 5xx from the API.
export class AnthropicServerError extends Error {
  constructor(message: string) {
    super(`Anthropic server error: ${message}`);
    this.name = 'AnthropicServerError';
  }
}

// HTTP 4xx other than 429 (bad request, auth failure, and so on).
export class AnthropicClientError extends Error {
  constructor(message: string, status?: number) {
    super(`Anthropic client error (${status ?? 400}): ${message}`);
    this.name = 'AnthropicClientError';
  }
}

const MODEL_ID = 'claude-3-haiku-20240307';
const DEFAULT_RETRY_AFTER_SECONDS = 30;

export async function extractInvoice(
  documentText: string,
  scopeKey: string,
): Promise<ExtractedData> {
  const prompt = buildExtractionPrompt(documentText);

  // Rough pre-call estimate (about 4 characters per token) used only for the budget check.
  const estimatedInputTokens = Math.ceil(prompt.length / 4);
  const estimatedOutputTokens = 1024;

  const costTracker = getCostTracker();
  await costTracker.checkBeforeCall(scopeKey, estimatedInputTokens, estimatedOutputTokens, MODEL_ID);

  const client = getAnthropicClient();
  let response: Anthropic.Message;
  try {
    response = await client.messages.create({
      model: MODEL_ID,
      max_tokens: 4096,
      temperature: 0,
      messages: [{ role: 'user', content: prompt }],
    });
  } catch (err: unknown) {
    // Map HTTP status codes onto the typed errors above so callers can decide what to retry.
    const errObj = err as Error & { status?: number; headers?: Record<string, string> };
    if (errObj.status === 429) {
      const headers = errObj.headers ?? {};
      const parsedRetryAfter = parseInt(
        headers['retry-after'] ?? headers['Retry-After'] ?? String(DEFAULT_RETRY_AFTER_SECONDS),
        10,
      );
      // Retry-After may also be an HTTP date, which parseInt turns into NaN; use the default then.
      const retryAfter = Number.isFinite(parsedRetryAfter)
        ? parsedRetryAfter
        : DEFAULT_RETRY_AFTER_SECONDS;
      throw new RateLimitError(retryAfter);
    }
    if (errObj.status !== undefined && errObj.status >= 500) {
      throw new AnthropicServerError(errObj.message);
    }
    if (errObj.status !== undefined && errObj.status >= 400) {
      throw new AnthropicClientError(errObj.message, errObj.status);
    }
    throw err;
  }

  const content = response.content[0];
  if (!content || content.type !== 'text') {
    throw new ParseError('Empty response from Anthropic');
  }

  let parsed: unknown;
  try {
    parsed = JSON.parse(content.text);
  } catch {
    throw new ParseError(`Failed to parse response as JSON: ${content.text.slice(0, 200)}`);
  }

  // Record actual usage, falling back to the estimates if usage is missing.
  const actualInputTokens = response.usage.input_tokens ?? estimatedInputTokens;
  const actualOutputTokens = response.usage.output_tokens ?? estimatedOutputTokens;
  await costTracker.recordAfterCall(
    scopeKey,
    response.id,
    actualInputTokens,
    actualOutputTokens,
    MODEL_ID,
  );

  // The cast is unchecked; schema validation happens in a later gate.
  return parsed as ExtractedData;
}
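extractInvoice passes the model's text straight to JSON.parse, so any reply that wraps the JSON in a Markdown code fence or adds a sentence around it becomes a ParseError and costs a retry. One possible mitigation is a more tolerant parser, sketched below. The helper parseModelJson is hypothetical and not part of the original module; it only handles the two wrappers mentioned here.

```typescript
// Hypothetical helper, not part of the original module.
// Matches a reply that is entirely wrapped in a Markdown code fence,
// optionally tagged "json". Built with `{3} so this sample contains no
// literal fence of its own.
const FENCED_REPLY = new RegExp('^`{3}(?:json)?\\s*([\\s\\S]*?)\\s*`{3}$', 'i');

function parseModelJson(text: string): unknown {
  const trimmed = text.trim();
  const fenced = FENCED_REPLY.exec(trimmed);
  const candidate = fenced?.[1] ?? trimmed;

  try {
    return JSON.parse(candidate);
  } catch {
    // Fall through to the brace-matching fallback below.
  }

  // Fallback: take the span from the first "{" to the last "}".
  const start = candidate.indexOf('{');
  const end = candidate.lastIndexOf('}');
  if (start === -1 || end <= start) {
    throw new Error('No JSON object found in model response');
  }
  return JSON.parse(candidate.slice(start, end + 1));
}
```

Whatever this returns is still untyped, so it would need to go through the same schema validation as before. Instructing the model in the prompt to return bare JSON remains the first line of defense; the tolerant parser only reduces wasted retries.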

// cost-tracker.ts (file name inferred from the './cost-tracker.js' import in invoke-claude.ts)
import { BudgetController } from '@reaatech/agent-budget-engine';
import type { Redis } from 'ioredis';
import { RedisSpendStore } from './spend-store.js';
import { AnthropicPricingProvider } from './pricing.js';
import { getRedis } from './redis.js';

const BudgetScope = { User: 'user' } as const;

export class CostTracker {
  private controller: BudgetController;

  constructor() {
    const redis: Redis = getRedis();
    const spendStore = new RedisSpendStore(redis);
    const pricingProvider = new AnthropicPricingProvider();
    this.controller = new BudgetController({
      spendTracker: spendStore as never,
      pricing: pricingProvider as never,
    });
  }

  // Register a USD budget for one scope (here, one invoice job).
  async defineInvoiceBudget(scopeKey: string, limitUsd: number): Promise<void> {
    this.controller.defineBudget({
      scopeType: BudgetScope.User as never,
      scopeKey,
      limit: limitUsd,
      policy: { softCap: 0.8, hardCap: 1.0, autoDowngrade: [], disableTools: [] } as never,
    });
  }

  // Throws before the API call if the estimated cost would exceed the budget.
  async checkBeforeCall(
    scopeKey: string,
    estimatedInputTokens: number,
    estimatedOutputTokens: number,
    modelId: string,
  ): Promise<void> {
    const pricingProvider = new AnthropicPricingProvider();
    const estimatedCost = pricingProvider.estimate(modelId, estimatedInputTokens, estimatedOutputTokens);
    const result = this.controller.check({
      scopeType: BudgetScope.User as never,
      scopeKey,
      estimatedCost,
      modelId,
      tools: [],
    } as never);
    const resultObj = result as never as { allowed: boolean };
    if (!resultObj.allowed) {
      throw Object.assign(
        new Error(`Budget exceeded for scope ${scopeKey}`),
        { code: 'BudgetExceededError' },
      );
    }
  }

  // Records the actual cost once the API has reported real token usage.
  async recordAfterCall(
    scopeKey: string,
    requestId: string,
    inputTokens: number,
    outputTokens: number,
    modelId: string,
  ): Promise<void> {
    const pricingProvider = new AnthropicPricingProvider();
    const cost = pricingProvider.estimate(modelId, inputTokens, outputTokens);
    this.controller.record({
      requestId,
      scopeType: BudgetScope.User as never,
      scopeKey,
      cost,
      inputTokens,
      outputTokens,
      modelId,
      provider: 'anthropic',
      timestamp: new Date(),
    } as never);
  }

  async getSpendForScope(
    scopeKey: string,
  ): Promise<{ spent: number; remaining: number; status: string }> {
    const state = this.controller.getState(BudgetScope.User as never, scopeKey);
    const stateObj = (state ?? {}) as never as { spent: number; remaining: number; state: string };
    return {
      spent: stateObj.spent ?? 0,
      remaining: stateObj.remaining ?? 0,
      status: stateObj.state ?? 'Active',
    };
  }
}

// Lazily created singleton so every caller shares one BudgetController.
let costTrackerInstance: CostTracker | null = null;

export function getCostTracker(): CostTracker {
  if (!costTrackerInstance) {
    costTrackerInstance = new CostTracker();
  }
  return costTrackerInstance;
}
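Both checkBeforeCall and recordAfterCall reduce to the same arithmetic: token counts multiplied by a per-token price. The AnthropicPricingProvider from './pricing.js' is not shown in this section, so the following is only a sketch of what its estimate method might compute. The price table is an assumption based on Claude 3 Haiku's published list prices in USD per million tokens and should be checked against Anthropic's current pricing page; the names ModelPrice, PRICES, estimateCostUsd, and estimateTokens are hypothetical.

```typescript
// Hypothetical stand-in for the pricing logic; not the project's pricing.ts.
interface ModelPrice {
  inputPerMTok: number;  // USD per million input tokens
  outputPerMTok: number; // USD per million output tokens
}

// Assumed list prices; verify against Anthropic's pricing page before use.
const PRICES: Record<string, ModelPrice> = {
  'claude-3-haiku-20240307': { inputPerMTok: 0.25, outputPerMTok: 1.25 },
};

function estimateCostUsd(modelId: string, inputTokens: number, outputTokens: number): number {
  const price = PRICES[modelId];
  if (!price) {
    throw new Error(`No pricing configured for model ${modelId}`);
  }
  return (inputTokens * price.inputPerMTok + outputTokens * price.outputPerMTok) / 1_000_000;
}

// Mirrors the pre-call heuristic in extractInvoice: about 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}
```

Under these assumed prices, a 4,000-character prompt (about 1,000 input tokens) with the 1,024-token output estimate comes to (1000 × 0.25 + 1024 × 1.25) / 1,000,000 = 0.00153 USD, which is the figure the budget check would compare against the invoice's limit.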

// Job processor, run by the Bull worker for each queued invoice.
import type { Job } from 'bull';
import { parsePdf } from '../lib/parse-pdf.js';
import { extractInvoice } from '../lib/invoke-claude.js';
import { validateExtractedInvoice } from '../lib/validation.js';
import { retryExtraction } from '../lib/runbook.js';
import { saveInvoice } from '../lib/invoice-store.js';
import type { JobData, ExtractedData } from '../types/index.js';

const IMAGE_EXTENSIONS = new Set(['.png', '.jpg', '.jpeg', '.tiff', '.tif']);

export default async function processInvoice(job: Job<JobData>): Promise<void> {
  const { document, fileName, jobId } = job.data;
  const logCtx = { jobId };
  console.log('Processing job', logCtx);

  let documentText = '';
  const lowerName = fileName.toLowerCase();
  // Compare only the extension against the image set; the full file name never matches it.
  const dotIndex = lowerName.lastIndexOf('.');
  const extension = dotIndex >= 0 ? lowerName.slice(dotIndex) : '';

  if (extension === '.pdf') {
    console.log('Parsing PDF', logCtx);
    try {
      documentText = await parsePdf(document);
    } catch (err: unknown) {
      console.error('PDF parse error', {
        ...logCtx,
        error: err instanceof Error ? err.message : String(err),
      });
      throw err;
    }
    if (documentText.trim() === '') {
      console.warn('No extractable text found (image-based PDF)', logCtx);
      throw new Error('No extractable text found (image-based PDF)');
    }
  } else if (IMAGE_EXTENSIONS.has(extension)) {
    console.warn('Image-based input detected; OCR is out of scope for MVP', logCtx);
    throw new Error('Image processing requires OCR which is out of scope for MVP');
  } else {
    throw new Error(`Unsupported file type: ${fileName}`);
  }

  console.log('Extracting invoice data', logCtx);
  try {
    // Up to 3 attempts; each result must pass validation before it is accepted.
    const validData = await retryExtraction(
      () => extractInvoice(documentText, jobId),
      (data: unknown) => validateExtractedInvoice(data),
      3,
    );
    console.log('Saving extracted invoice', logCtx);
    await saveInvoice(jobId, validData);
    // Bull reports progress through job.progress(); updateProgress() belongs to BullMQ.
    await job.progress(100);
    console.log('Job completed successfully', logCtx);
  } catch (err: unknown) {
    const errMessage = err instanceof Error ? err.message : String(err);
    const errName = err instanceof Error ? err.name : '';
    console.error('Extraction failed', { ...logCtx, error: errMessage });

    // Budget-exceeded and client (4xx, including auth) errors are non-retryable.
    // Match on the error name and on the message text that AnthropicClientError
    // actually produces, in case the error was re-wrapped on the way up.
    if (
      errMessage.includes('Budget exceeded') ||
      errName === 'AnthropicClientError' ||
      errMessage.includes('Anthropic client error')
    ) {
      throw new Error(errMessage);
    }

    // Save failure metadata so the job's outcome can still be queried.
    const failureData: ExtractedData = { error: errMessage, jobId };
    await saveInvoice(jobId, failureData);
    throw new Error(`All retries exhausted: ${errMessage}`);
  }
}
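The processor relies on retryExtraction from '../lib/runbook.js', which is not shown in this section. The sketch below is one way such a helper could work, under stated assumptions: the validator is assumed to return a tagged result (the real validateExtractedInvoice may throw or use a different shape), the names ValidationResult, isNonRetryable, and retryExtractionSketch are hypothetical, and no backoff delay is included.

```typescript
// Assumed validator result shape; the project's validator may differ.
type ValidationResult<T> =
  | { ok: true; data: T }
  | { ok: false; error: string };

// Errors that retrying cannot fix: client (4xx) errors and exhausted budgets.
// The name and code values match the ones thrown elsewhere in this pipeline.
function isNonRetryable(err: unknown): boolean {
  if (!(err instanceof Error)) return false;
  const code = (err as Error & { code?: string }).code;
  return err.name === 'AnthropicClientError' || code === 'BudgetExceededError';
}

// Hypothetical sketch of a validate-and-retry loop; not the project's runbook.ts.
async function retryExtractionSketch<T>(
  extract: () => Promise<unknown>,
  validate: (data: unknown) => ValidationResult<T>,
  maxAttempts: number,
): Promise<T> {
  let lastError = 'no attempts made';
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const result = validate(await extract());
      if (result.ok) return result.data;
      // Validation failed: treat it like a transient failure and try again.
      lastError = result.error;
    } catch (err: unknown) {
      if (isNonRetryable(err)) throw err;
      lastError = err instanceof Error ? err.message : String(err);
    }
  }
  throw new Error(`Extraction failed after ${maxAttempts} attempts: ${lastError}`);
}
```

Rethrowing non-retryable errors unchanged keeps their name and code intact, so the processor can recognize them without inspecting message text. A production version would likely also wait between attempts, for example honoring RateLimitError.retryAfter.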

Intro

Prerequisites

Step 1: Scaffold the project

Step 2: Install dependencies

Step 3: Set environment variables

Step 4: Create shared types and Zod schema

Step 5: Create infrastructure clients

Step 6: Create the document extraction pipeline

Step 7: Create cost tracking

Step 8: Create validation gates and retry logic

Step 9: Create the job queue, processor, and worker

Step 10: Create the API routes

Step 11: Test and verify

Next steps