Mistral AI Invoice Extraction for SMB Accounting
Automatically extract vendor, amount, and line items from invoices using Mistral Large, with cost-aware processing.
The problem
SMB accountants spend hours manually entering invoice data into accounting software, and errors and delays cost money.
Example artifact
A complete, working implementation of this recipe, downloadable as a zip or browsable file by file. Generated by our build pipeline; tested with full coverage before publishing.
115 kB · 125 tests · 99.7% coverage · vitest passing
SHA-256 429147a7d1ee02e682f6f9d3e485fa4409c24580d69fb0563f0b12fb1a8956aa
© 2026 REAA Technologies Inc. — Open-Source AI Solutions for Small Business.
Intro
In this tutorial you'll build an Express.js server that accepts invoice uploads (PDF, PNG/JPEG images, DOCX), parses them into text, and extracts structured accounting data (vendor, line items, totals) using a Mistral model (mistral-large-latest by default). Low-confidence extractions are automatically routed to a pending review queue for a human to confirm, and the estimated cost of every extraction is recorded against a configurable monthly budget cap to help keep spending predictable. By the end you'll have a working API you can test with real invoices.
Prerequisites
Node.js >= 22
pnpm 10.x (the package.json specifies "packageManager": "pnpm@10.0.0")
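A Mistral AI API key (sign up at console.mistral.ai)
A LlamaCloud API key (sign up at cloud.llamaindex.ai). It is only used to parse image-based invoices, but src/config.ts requires LLAMA_CLOUD_API_KEY to be set, so provide a value even if you only process PDFs and DOCX files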
Familiarity with TypeScript, Express.js, and async/await
Step 1: Scaffold the project
Start from an empty directory. Create the project structure and configuration files so TypeScript, Vitest, and ESLint are ready before you write any application code.
mkdir mistral-invoice-extraction && cd mistral-invoice-extraction
pnpm init
Open package.json and replace its contents with the following (dependencies are added in Step 2):
{
"name" : "mistral-invoice-extraction" ,
"version" : "0.1.0" ,
"private" : true ,
"type" : "module" ,
"engines" : {
"node" : ">=22"
},
"packageManager" : "pnpm@10.0.0" ,
"scripts" : {
"typecheck" : "tsc --noEmit" ,
"lint" : "eslint ." ,
"test" : "vitest run --coverage --reporter=json --outputFile=vitest-report.json"
}
}
Create tsconfig.json:
{
"compilerOptions" : {
"target" : "ES2022" ,
"module" : "NodeNext" ,
"moduleResolution" : "NodeNext" ,
"strict" : true ,
"esModuleInterop" : true ,
"forceConsistentCasingInFileNames" : true ,
"skipLibCheck" : true ,
"resolveJsonModule" : true ,
"isolatedModules" : true ,
"noUncheckedIndexedAccess" : true ,
"exactOptionalPropertyTypes" : true ,
"outDir" : "dist"
},
"include" : [ "src/**/*" , "tests/**/*" , "*.config.ts" , "*.config.mjs" ]
}
Create vitest.config.ts. Coverage thresholds are set to 90% for lines, branches, functions, and statements:
import { defineConfig } from "vitest/config" ;
export default defineConfig ({
test: {
globals: true ,
environment: "node" ,
coverage: {
provider: "v8" ,
reporter: [ "text" , "json" , "json-summary" ],
reportsDirectory: "./coverage" ,
thresholds: {
lines: 90 ,
branches: 90 ,
functions: 90 ,
statements: 90 ,
},
exclude: [
"node_modules/**" ,
"dist/**" ,
"coverage/**" ,
"**/*.config.{ts,mjs,js}" ,
"**/*.d.ts" ,
],
},
},
});
Create eslint.config.mjs:
import tseslint from "typescript-eslint" ;
export default tseslint. config (
{
ignores: [ "eslint.config.mjs" ],
},
... tseslint.configs.strictTypeChecked,
{
languageOptions: {
parserOptions: { project: "./tsconfig.json" },
},
rules: {
"@typescript-eslint/no-explicit-any" : "error" ,
"@typescript-eslint/ban-ts-comment" : [
"error" ,
{ "ts-ignore" : true , "ts-expect-error" : true , "ts-nocheck" : true },
],
"@typescript-eslint/no-unnecessary-type-assertion" : "error" ,
"@typescript-eslint/no-unsafe-assignment" : "off" ,
},
},
{
files: [ "tests/**/*.test.ts" ],
rules: {
"@typescript-eslint/no-unsafe-member-access" : "off" ,
"@typescript-eslint/no-unsafe-call" : "off" ,
"@typescript-eslint/no-unsafe-return" : "off" ,
"@typescript-eslint/require-await" : "off" ,
"@typescript-eslint/no-confusing-void-expression" : "off" ,
"@typescript-eslint/restrict-template-expressions" : "off" ,
"@typescript-eslint/no-unused-vars" : "off" ,
"@typescript-eslint/unbound-method" : "off" ,
"@typescript-eslint/no-non-null-assertion" : "off" ,
},
},
);
Create the source directories:
mkdir -p src/api src/lib tests
Step 2: Install dependencies
Run these two commands to install the runtime and dev dependencies:
pnpm add @llamaindex/cloud@4.1.3 @mistralai/mistralai@2.2.1 @reaatech/agent-budget-spend-tracker@0.1.0 @reaatech/agent-handoff-protocol@0.1.0 @reaatech/agent-memory-extraction@0.1.0 express@5.2.1 mammoth@1.12.0 multer@2.1.1 sharp@0.34.5 unpdf@1.6.2 zod@4.4.3
pnpm add -D @types/express@5.0.6 @types/multer@2.1.0 @types/node@25.7.0 @types/supertest@7.2.0 @vitest/coverage-v8@4.1.6 eslint@10.3.0 msw@2.14.6 supertest@7.2.2 tsx@4.21.0 typescript@6.0.3 typescript-eslint@8.59.3 vitest@4.1.6
Expected output: pnpm resolves and installs the packages, and a node_modules directory and a pnpm-lock.yaml file appear.
Step 3: Set environment variables and config
Create .env.example so collaborators know which keys are required:
MISTRAL_API_KEY=<your-mistral-key>
LLAMA_CLOUD_API_KEY=<your-llama-cloud-key>
PORT=<port>
BUDGET_CAP_MONTHLY_USD=<monthly-cap>
HANDOFF_RECIPIENT_URL=<recipient-url>
CONFIDENCE_THRESHOLD=<threshold>
MISTRAL_MODEL=<mistral-model>
Copy it to .env and fill in your real keys:
cp .env.example .env
Now create src/config.ts. It validates the environment variables with Zod (start() calls it at startup) and throws a typed ConfigError that lists every variable that is missing or invalid:
import { z } from "zod" ;
const configSchema = z. object ({
MISTRAL_API_KEY: z. string (). min ( 1 , "MISTRAL_API_KEY is required" ),
LLAMA_CLOUD_API_KEY: z. string (). min ( 1 , "LLAMA_CLOUD_API_KEY is required" ),
PORT: z.coerce. number (). default ( 3000 ),
BUDGET_CAP_MONTHLY_USD: z.coerce. number (). default ( 50 ),
HANDOFF_RECIPIENT_URL: z. string (). min ( 1 , "HANDOFF_RECIPIENT_URL is required" ),
CONFIDENCE_THRESHOLD: z.coerce. number (). default ( 0.7 ),
MISTRAL_MODEL: z. string (). default ( "mistral-large-latest" ),
});
export type Config = z . infer < typeof configSchema>;
export class ConfigError extends Error {
public readonly details : Array <{ path : string ; message : string }>;
constructor (details : Array <{ path : string ; message : string }>) {
const keys = details. map ((d) => d.path). join ( ", " );
super( `Config validation failed for: ${ keys }` );
this.name = "ConfigError" ;
this.details = details;
}
}
export function getConfig () : Config {
const result = configSchema. safeParse (process.env);
if ( ! result.success) {
const issues = result.error.issues. map ((i) => ({
path: i.path. join ( "." ),
message: i.message,
}));
throw new ConfigError (issues);
}
return result.data;
}
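The schema splits the variables into three required keys and four keys with defaults. As a dependency-free sketch of the same rules (checkEnv is a hypothetical helper, not part of the project), this shows what getConfig() would complain about and what it would fill in. Keep in mind that getConfig() reads process.env directly; nothing in the project loads .env for you, so you need something like Node's --env-file flag at startup.

```typescript
// Hypothetical, dependency-free mirror of the rules in src/config.ts.
// It only illustrates which variables are required and which have defaults.
type Env = Record<string, string | undefined>;

const REQUIRED_KEYS = ["MISTRAL_API_KEY", "LLAMA_CLOUD_API_KEY", "HANDOFF_RECIPIENT_URL"];

// Defaults copied from the .default(...) calls in configSchema.
const DEFAULTS: Record<string, string> = {
  PORT: "3000",
  BUDGET_CAP_MONTHLY_USD: "50",
  CONFIDENCE_THRESHOLD: "0.7",
  MISTRAL_MODEL: "mistral-large-latest",
};

function checkEnv(env: Env): { missing: string[]; resolved: Record<string, string> } {
  // A required key counts as missing when absent or empty, like z.string().min(1).
  const missing = REQUIRED_KEYS.filter((key) => {
    const value = env[key];
    return value === undefined || value.length === 0;
  });
  // Defaults apply only when a key is undefined, as with Zod's .default().
  const resolved: Record<string, string> = { ...DEFAULTS };
  for (const [key, value] of Object.entries(env)) {
    if (value !== undefined) resolved[key] = value;
  }
  return { missing, resolved };
}

const { missing, resolved } = checkEnv({ MISTRAL_API_KEY: "sk-test", PORT: "8080" });
console.log(missing); // [ 'LLAMA_CLOUD_API_KEY', 'HANDOFF_RECIPIENT_URL' ]
console.log(resolved["PORT"], resolved["MISTRAL_MODEL"]); // 8080 mistral-large-latest
```

The real getConfig() reports the same keys through ConfigError.details, with Zod's own message for each issue.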
Step 4: Define types and logger
Create src/types.ts. These Zod schemas and TypeScript interfaces define the shape of every data structure flowing through the pipeline:
import { z } from "zod" ;
export const InvoiceLineItemSchema = z. object ({
description: z. string (),
quantity: z. number (). positive (),
unitPrice: z. number (). positive (),
total: z. number (). positive (),
});
export type InvoiceLineItem = z . infer < typeof InvoiceLineItemSchema>;
export const InvoiceExtractionSchema = z. object ({
vendor: z. string (),
invoiceDate: z. string (),
totalAmount: z. number (),
lineItems: z. array (InvoiceLineItemSchema),
currency: z. string (),
});
export type InvoiceExtraction = z . infer < typeof InvoiceExtractionSchema>;
export const ConfidenceResultSchema = z. object ({
score: z. number (). min ( 0 ). max ( 1 ),
indicators: z. array (z. string ()),
threshold: z. number (),
});
export type ConfidenceResult = z . infer < typeof ConfidenceResultSchema>;
export const InvoiceResultSchema = z. object ({
fileId: z. string (),
extraction: InvoiceExtractionSchema. nullable (),
confidence: ConfidenceResultSchema,
reviewRequired: z. boolean (),
});
export type InvoiceResult = z . infer < typeof InvoiceResultSchema>;
export interface ParsedDocument {
text : string ;
sourceType : "pdf" | "image" | "docx" ;
pageCount ?: number ;
}
export interface PendingEntry {
fileId : string ;
extraction : InvoiceExtraction ;
originalText : string ;
confidence : ConfidenceResult ;
submittedAt : Date ;
}
Create src/logger.ts, a thin wrapper around console that prepends ISO timestamps:
function timestamp () : string {
return new Date (). toISOString ();
}
export interface Logger {
info (message : string , ... args : unknown []) : void ;
warn (message : string , ... args : unknown []) : void ;
error (message : string , ... args : unknown []) : void ;
}
export function log (message : string , ... args : unknown []) : void {
console. log ( `[${ timestamp () }] INFO: ${ message }` , ... args);
}
export function warn (message : string , ... args : unknown []) : void {
console. warn ( `[${ timestamp () }] WARN: ${ message }` , ... args);
}
export function error (message : string , ... args : unknown []) : void {
console. error ( `[${ timestamp () }] ERROR: ${ message }` , ... args);
}
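With the types in place, here is a hypothetical payload that satisfies InvoiceExtractionSchema from src/types.ts, written against local copies of the types so it runs without Zod. It also shows a consequence of the .positive() rules on line items: a credit or discount line with a negative amount fails them, so InvoiceExtractionSchema.parse() rejects the whole extraction. In extractInvoice that candidate is skipped, and an invoice with no other valid candidate comes back with extraction_failed and is flagged for review.

```typescript
// Local copies of the shapes in src/types.ts so this runs without Zod.
interface InvoiceLineItem {
  description: string;
  quantity: number;
  unitPrice: number;
  total: number;
}

interface InvoiceExtraction {
  vendor: string;
  invoiceDate: string;
  totalAmount: number;
  lineItems: InvoiceLineItem[];
  currency: string;
}

// Hypothetical example of a successful extraction.
const sample: InvoiceExtraction = {
  vendor: "Acme Office Supply",
  invoiceDate: "2026-04-30",
  totalAmount: 137.5,
  currency: "USD",
  lineItems: [
    { description: "Printer paper (box)", quantity: 5, unitPrice: 22.5, total: 112.5 },
    { description: "Toner cartridge", quantity: 1, unitPrice: 25, total: 25 },
  ],
};

// Mirrors the .positive() rules on InvoiceLineItemSchema:
// quantity, unitPrice, and total must all be strictly greater than zero.
function violatesPositivity(item: InvoiceLineItem): boolean {
  return !(item.quantity > 0 && item.unitPrice > 0 && item.total > 0);
}

const discount: InvoiceLineItem = { description: "Loyalty discount", quantity: 1, unitPrice: -10, total: -10 };

console.log(sample.lineItems.some(violatesPositivity)); // false
console.log(violatesPositivity(discount)); // true
```

If your vendors issue credits or discounts, relaxing those constraints (for example to .finite()) is a schema decision worth making deliberately.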
Step 5: Build the document parser
Create src/lib/document-parser.ts. This module dispatches incoming buffers to the right parser based on the MIME type: unpdf for PDFs, sharp plus @llamaindex/cloud's LlamaParseReader for images (sharp converts the image to PNG first), and mammoth for DOCX files. Only the image path, which calls the remote LlamaParse service, is wrapped in retry logic with exponential backoff; the PDF and DOCX parsers run locally and are not retried.
import { extractText, getDocumentProxy } from "unpdf" ;
import sharp from "sharp" ;
import mammoth from "mammoth" ;
import type { ParsedDocument } from "../types.js" ;
import { log, warn } from "../logger.js" ;
export class UnsupportedDocumentError extends Error {
public readonly mimeType : string ;
constructor (mimeType : string ) {
super( `Unsupported document type: ${ mimeType }` );
this.name = "UnsupportedDocumentError" ;
this.mimeType = mimeType;
Step 6: Build the Mistral LLM adapter and client
You need two related modules. First, create src/lib/llm-adapter.ts. It provides the LLMProvider and EmbeddingProvider interfaces required by the REAA MemoryExtractor, along with a general-purpose chatComplete wrapper:
import { Mistral } from "@mistralai/mistralai" ;
import { warn } from "../logger.js" ;
import { getConfig } from "../config.js" ;
export class AuthError extends Error {
constructor (message : string ) {
super(message);
this.name = "AuthError" ;
}
}
export class ProviderError extends Error {
constructor (message : string ) {
super(message);
this.name = "ProviderError" ;
}
}
function createMistralClient
Next, create src/lib/mistral-client.ts. This module offers a richer chatComplete that uses Mistral’s tool-calling API and falls back to JSON mode if tool calls aren’t available. It also includes API key redaction in error messages and structured error classification:
import { Mistral } from "@mistralai/mistralai" ;
import type { z } from "zod" ;
import { getConfig } from "../config.js" ;
import { log, warn, error as logError } from "../logger.js" ;
import { AuthError, ProviderError } from "./llm-adapter.js" ;
function createMistralClient () : Mistral {
const config = getConfig ();
return new Mistral ({ apiKey: config.MISTRAL_API_KEY });
}
async function retryWithBackoff < T >(
fn : () => Promise <
Step 7: Build extraction and review libraries
Now create the five supporting library modules. Each does one thing.
Create src/lib/confidence.ts, a rule-based scorer that deducts points for a missing vendor, a non-positive total, no line items, a missing currency, or a line item whose total differs from quantity × unitPrice by more than 0.01:
import type { InvoiceExtraction, ConfidenceResult } from "../types.js" ;
import { getConfig } from "../config.js" ;
export function assessConfidence (
extraction : InvoiceExtraction ,
threshold ?: number ,
) : ConfidenceResult {
// Read the config only when no explicit threshold is passed in.
const effectiveThreshold = threshold ?? getConfig().CONFIDENCE_THRESHOLD;
let score = 1.0 ;
const indicators : string [] = [];
if ( ! extraction.vendor || extraction.vendor. trim ().length === 0 ) {
score -= 0.3 ;
indicators. push ( "missing_vendor" );
}
if (extraction.totalAmount <= 0 ) {
score -= 0.3 ;
indicators. push ( "non_positive_total" );
}
if (extraction.lineItems.length === 0 ) {
score -= 0.3 ;
indicators. push ( "no_line_items" );
}
if ( ! extraction.currency || extraction.currency. trim ().length === 0 ) {
score -= 0.1 ;
indicators. push ( "missing_currency" );
}
for ( const item of extraction.lineItems) {
const expectedTotal = item.quantity * item.unitPrice;
if (Math. abs (item.total - expectedTotal) > 0.01 ) {
score -= 0.15 ;
indicators. push ( "line_item_total_mismatch" );
break ;
}
}
const clampedScore = Math. max ( 0 , Math. min ( 1 , score));
return {
score: clampedScore,
indicators,
threshold: effectiveThreshold,
};
}
Create src/lib/memory-extractor.ts. It wraps the REAA MemoryExtractor with the Mistral provider and feeds it the parsed document text as a single conversation turn:
import { MemoryExtractor, type ConversationTurn } from "@reaatech/agent-memory-extraction" ;
import type { InvoiceExtraction } from "../types.js" ;
import { InvoiceExtractionSchema } from "../types.js" ;
import { log, error } from "../logger.js" ;
import { createExtractorConfig } from "./llm-adapter.js" ;
import { getConfig } from "../config.js" ;
export async function extractInvoice (
text : string ,
recordSpendFn ?: (params : {
requestId : string ;
model : string ;
inputTokens : number ;
outputTokens : number ;
cost : number ;
}) => void ,
) : Promise <{ extraction : InvoiceExtraction | null ; confidence : number }> {
try {
const config = getConfig ();
const { llmProvider, embeddingProvider } = createExtractorConfig (config.MISTRAL_API_KEY);
const extractor = new MemoryExtractor (llmProvider, embeddingProvider, {
batchSize: 1 ,
confidenceThreshold: 0.5 ,
enabledTypes: [ "FACT" as never ],
tenantId: "invoice-extraction" ,
});
const turns : ConversationTurn [] = [
{ speaker: "user" , content: text, timestamp: new Date () },
];
const result = await extractor. extractFromConversation (turns);
if (recordSpendFn) {
const inputTokens = Math. ceil (text.length / 4 );
const outputTokens = 200 ;
const inputCost = (inputTokens / 1_000_000 ) * 2.0 ;
const outputCost = (outputTokens / 1_000_000 ) * 6.0 ;
recordSpendFn ({
requestId: `extract-${ String ( Date . now ()) }` ,
model: config.MISTRAL_MODEL,
inputTokens,
outputTokens,
cost: inputCost + outputCost,
});
}
const candidateCount = result.candidates.length;
if (candidateCount > 0 ) {
for ( const candidate of result.candidates) {
try {
const content =
typeof candidate.content === "string"
? candidate.content
: JSON. stringify (candidate.content);
let parsedContent : unknown ;
try {
parsedContent = JSON. parse (content);
} catch {
parsedContent = content;
}
const extraction = InvoiceExtractionSchema. parse (parsedContent);
return { extraction, confidence: result.confidence };
} catch {
continue ;
}
}
}
log ( "[memory-extractor] No parseable candidates found" );
return { extraction: null , confidence: 0 };
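} catch (err) {
// Let budget errors propagate so processInvoice can report "budget_exceeded".
if (err instanceof Error && err.name === "BudgetExceededError") throw err;
error(`[memory-extractor] extraction failed: ${err instanceof Error ? err.message : String(err)}`);
return { extraction: null, confidence: 0 };
}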
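}
Create src/lib/budget.ts. It records each LLM call's cost in REAA's SpendStore and throws a BudgetExceededError once the recorded total exceeds BUDGET_CAP_MONTHLY_USD (by then, the call that crossed the cap has already been made). It imports BudgetScope from @reaatech/agent-budget-types, which the Step 2 install commands do not list; if pnpm cannot resolve the import, add that package as a direct dependency: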
import { SpendStore } from "@reaatech/agent-budget-spend-tracker" ;
import { BudgetScope } from "@reaatech/agent-budget-types" ;
import { getConfig } from "../config.js" ;
export class BudgetExceededError extends Error {
constructor ( public readonly totalSpend : number , public readonly cap : number ) {
super( `Monthly budget cap reached: $${ totalSpend . toFixed ( 2 ) } exceeded $${ cap . toFixed ( 2 ) }` );
this.name = "BudgetExceededError" ;
}
}
const store = new SpendStore ({ maxEntries: 10_000 });
export function getSpendStore () : SpendStore {
return store;
}
export function recordSpend (params : {
requestId : string ;
model : string ;
inputTokens : number ;
outputTokens : number ;
cost : number ;
}) : void {
const config = getConfig ();
store. record ({
requestId: params.requestId,
scopeType: BudgetScope.Org,
scopeKey: "invoice-extraction" ,
cost: params.cost,
inputTokens: params.inputTokens,
outputTokens: params.outputTokens,
modelId: params.model,
provider: "mistral" ,
timestamp: new Date (),
});
const totalSpend = store. getSpend (BudgetScope.Org, "invoice-extraction" );
if (totalSpend > config.BUDGET_CAP_MONTHLY_USD) {
throw new BudgetExceededError (totalSpend, config.BUDGET_CAP_MONTHLY_USD);
}
}
export function getSpendSummary () : {
totalSpend : number ;
ratePerMinute : number ;
projectedHourly : number ;
} {
const totalSpend = store. getSpend (BudgetScope.Org, "invoice-extraction" );
const ratePerMinute = store. getRate (BudgetScope.Org, "invoice-extraction" , 60 );
const projectedHourly = store. projectTotal (BudgetScope.Org, "invoice-extraction" , 1 );
return { totalSpend, ratePerMinute, projectedHourly };
}
export { BudgetScope };
Create src/lib/pending-store.ts, an in-memory store for invoices awaiting human review:
import type { PendingEntry } from "../types.js" ;
const pendingStore = new Map < string , PendingEntry >();
export function addPending (entry : PendingEntry ) : void {
pendingStore. set (entry.fileId, entry);
}
export function getPending (fileId : string ) : PendingEntry | undefined {
return pendingStore. get (fileId);
}
export function listPending () : Array <{
fileId : string ;
vendor : string ;
total : number ;
confidence : Record < string , unknown >;
submittedAt : Date ;
}> {
return Array. from (pendingStore. values ()). map ((entry) => ({
fileId: entry.fileId,
vendor: entry.extraction.vendor,
total: entry.extraction.totalAmount,
confidence: {
score: entry.confidence.score,
indicators: entry.confidence.indicators,
threshold: entry.confidence.threshold,
},
submittedAt: entry.submittedAt,
}));
}
export function removePending (fileId : string ) : boolean {
return pendingStore. delete (fileId);
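}
Create src/lib/handoff.ts. It sets up a HandoffManager from the REAA handoff protocol, registers a human-review agent, and provides a queueForHumanReview function. Its type imports come from @reaatech/agent-handoff, which is not in the Step 2 install commands; add it as a direct dependency if TypeScript cannot resolve it: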
import { HandoffManager, createHandoffConfig, TransportFactory, A2ATransport } from "@reaatech/agent-handoff-protocol" ;
import type { ConfidenceTooLow, ConversationState, UserMetadata, Message, AgentCapabilities } from "@reaatech/agent-handoff" ;
import type { InvoiceExtraction, ConfidenceResult } from "../types.js" ;
import { log, warn, error } from "../logger.js" ;
export type HttpClient = {
get (url : string ) : Promise <{ data : unknown }>;
post (url : string , body : unknown ) : Promise <{ data : unknown }>;
};
export function
Step 8: Wire up the invoice processor
Create src/lib/invoice-processor.ts, the orchestrator that ties together document parsing, extraction, confidence scoring, handoff, and budget tracking:
import type { ParsedDocument, InvoiceResult } from "../types.js" ;
import { log, error } from "../logger.js" ;
import { parseDocument } from "./document-parser.js" ;
import { extractInvoice } from "./memory-extractor.js" ;
import { assessConfidence } from "./confidence.js" ;
import { queueForHumanReview } from "./handoff.js" ;
import { recordSpend, BudgetExceededError } from "./budget.js" ;
import { addPending } from "./pending-store.js" ;
import { getConfig } from "../config.js" ;
export async function processInvoice (
fileBuffer : Buffer ,
mimeType : string ,
fileId : string ,
) : Promise < InvoiceResult > {
log ( `[invoice-processor] start fileId=${ fileId }` );
try {
const parsed : ParsedDocument = await parseDocument (fileBuffer, mimeType);
const { extraction } = await extractInvoice (parsed.text, recordSpend);
if (extraction === null ) {
const result : InvoiceResult = {
fileId,
extraction: null ,
confidence: {
score: 0 ,
indicators: [ "extraction_failed" ],
threshold: getConfig ().CONFIDENCE_THRESHOLD,
},
reviewRequired: true ,
};
log ( `[invoice-processor] complete fileId=${ fileId } (extraction failed)` );
return result;
}
const confidence = assessConfidence (extraction);
if (confidence.score < confidence.threshold) {
await queueForHumanReview (extraction, parsed.text, fileId, confidence);
addPending ({
fileId,
extraction,
originalText: parsed.text,
confidence,
submittedAt: new Date (),
});
}
const result : InvoiceResult = {
fileId,
extraction,
confidence,
reviewRequired: confidence.score < confidence.threshold,
};
log ( `[invoice-processor] complete fileId=${ fileId }` );
return result;
} catch (err) {
if (err instanceof BudgetExceededError ) {
error ( `[invoice-processor] budget exceeded for fileId=${ fileId }: ${ err . message }` );
return {
fileId,
extraction: null ,
confidence: {
score: 0 ,
indicators: [ "budget_exceeded" ],
threshold: getConfig ().CONFIDENCE_THRESHOLD,
},
reviewRequired: true ,
};
}
error (
`[invoice-processor] error fileId=${ fileId }: ${ err instanceof Error ? err . message : String ( err ) }` ,
);
return {
fileId,
extraction: null ,
confidence: {
score: 0 ,
indicators: [ "processing_error" ],
threshold: getConfig ().CONFIDENCE_THRESHOLD,
},
reviewRequired: true ,
};
}
}
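processInvoice passes recordSpend into extractInvoice, and extractInvoice does not read token usage from the API response. It estimates it instead: text.length / 4 input tokens and a flat 200 output tokens, priced at hardcoded rates of $2.00 and $6.00 per million tokens. Those are the tutorial's constants and may not match Mistral's current pricing, and the estimate may undercount if the extractor makes more than one model call per invoice. The sketch below reproduces that arithmetic and adds a hypothetical pre-call check, wouldExceedCap, because recordSpend in budget.ts only throws after a call has been made and recorded.

```typescript
// Same heuristic as extractInvoice: about 4 characters per input token,
// a flat 200 output tokens, priced per million tokens.
const INPUT_USD_PER_MTOK = 2.0;
const OUTPUT_USD_PER_MTOK = 6.0;

function estimateCostUsd(textLength: number, outputTokens = 200): number {
  const inputTokens = Math.ceil(textLength / 4);
  return (inputTokens / 1_000_000) * INPUT_USD_PER_MTOK + (outputTokens / 1_000_000) * OUTPUT_USD_PER_MTOK;
}

// Hypothetical pre-flight check: skip the call if its estimated cost would
// push recorded spend past the cap, instead of finding out afterwards.
function wouldExceedCap(spentUsd: number, estimateUsd: number, capUsd: number): boolean {
  return spentUsd + estimateUsd > capUsd;
}

// An 8,000-character invoice: 2,000 input tokens ($0.0040) + 200 output tokens ($0.0012).
const perInvoice = estimateCostUsd(8_000);
console.log(perInvoice.toFixed(4)); // 0.0052
// Invoices of that size that fit under the default $50 cap.
console.log(Math.floor(50 / perInvoice)); // 9615
console.log(wouldExceedCap(49.999, perInvoice, 50)); // true
```

To use a check like this, you would call it before extractor.extractFromConversation() with the current total from getSpendSummary().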
Step 9: Create API routes
Create src/api/webhook.ts. It accepts file uploads via POST /api/invoices/webhook with multer, rejects files over 10 MB (413) and unsupported MIME types (415), kicks off asynchronous processing, and exposes a polling endpoint at GET /api/invoices/:fileId:
import { Router, type Request, type Response } from "express" ;
import multer from "multer" ;
import { randomUUID } from "node:crypto" ;
import { processInvoice } from "../lib/invoice-processor.js" ;
import { log, error } from "../logger.js" ;
import type { InvoiceResult } from "../types.js" ;
const upload = multer ({
storage: multer. memoryStorage (),
limits: { fileSize: 10 * 1024 * 1024 },
fileFilter : (_req, file, cb) => {
const allowed = [
"application/pdf" ,
"image/png" ,
"image/jpeg" ,
"application/vnd.openxmlformats-officedocument.wordprocessingml.document" ,
];
if (allowed. includes (file.mimetype)) {
cb ( null , true );
} else {
cb ( new Error ( `Unsupported MIME type: ${ file . mimetype }` ));
}
},
});
const webhookRouter = Router ();
const processingResults = new Map < string , InvoiceResult | "processing" >();
webhookRouter. post ( "/webhook" , (req : Request , res : Response ) => {
upload. single ( "file" )(req, res, (err) => {
if (err) {
if (err instanceof Error ) {
if ( "code" in err && String (err.code) === "LIMIT_FILE_SIZE" ) {
res. status ( 413 ). json ({ error: "file_too_large" });
return ;
}
const msg = err.message;
if (msg. startsWith ( "Unsupported MIME type" )) {
const mimeType = msg. replace ( "Unsupported MIME type: " , "" );
res. status ( 415 ). json ({ error: "unsupported_type" , mimeType });
return ;
}
}
res. status ( 400 ). json ({ error: "upload_error" , message: err instanceof Error ? err.message : String (err) });
return ;
}
const file = req.file;
if ( ! file) {
res. status ( 400 ). json ({ error: "no_file" });
return ;
}
const fileId = randomUUID ();
processingResults. set (fileId, "processing" );
log ( `[webhook] received file fileId=${ fileId } mimeType=${ file . mimetype }` );
res. status ( 202 ). json ({ fileId, statusUrl: `/api/invoices/${ fileId }` });
setImmediate (() => {
void processInvoice (file.buffer, file.mimetype, fileId)
. then ((result) => {
processingResults. set (fileId, result);
})
. catch ((processingErr : unknown ) => {
error (
`[webhook] processing error fileId=${ fileId }: ${ processingErr instanceof Error ? processingErr . message : String ( processingErr ) }` ,
);
processingResults. set (fileId, {
fileId,
extraction: null ,
confidence: { score: 0 , indicators: [ "processing_error" ], threshold: 0.7 },
reviewRequired: true ,
});
});
});
});
});
webhookRouter. get ( "/:fileId" , (req : Request , res : Response ) => {
const fileId = req.params[ "fileId" ] as string ;
const result = processingResults. get (fileId);
if ( ! result) {
res. status ( 404 ). json ({ error: "not_found" });
return ;
}
if (result === "processing" ) {
res. status ( 202 ). json ({ status: "processing" });
return ;
}
res. json (result);
});
export default webhookRouter;
Create src/api/invoices.ts with the pending review list and confirmation endpoints:
import { Router, type Request, type Response } from "express" ;
import { listPending, getPending, removePending } from "../lib/pending-store.js" ;
import { log } from "../logger.js" ;
const invoicesRouter = Router ();
// GET /api/invoices/pending
invoicesRouter. get ( "/pending" , (_req : Request , res : Response ) => {
const pending = listPending ();
res. json (pending);
});
// POST /api/invoices/:fileId/confirm
invoicesRouter. post ( "/:fileId/confirm" , (req : Request , res : Response ) => {
const fileId = req.params[ "fileId" ] as string ;
const entry = getPending (fileId);
if ( ! entry) {
res. status ( 404 ). json ({ error: "not_found" });
return ;
}
removePending (fileId);
log ( `[invoices] confirmed fileId=${ fileId }` );
res. json ({ status: "confirmed" });
});
export default invoicesRouter;
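The webhook answers uploads with 202 and a statusUrl, then GET on that URL returns 202 { status: "processing" } until the result is stored. A minimal client-side polling sketch under those assumptions (pollInvoiceResult and FetchLike are hypothetical names; the fetch function is injected so the loop can be exercised without a server):

```typescript
// Minimal shape of fetch that the poller needs.
type FetchLike = (url: string) => Promise<{ status: number; json(): Promise<unknown> }>;

async function pollInvoiceResult(
  baseUrl: string,
  statusUrl: string,
  fetchFn: FetchLike,
  intervalMs = 1000,
  maxAttempts = 30,
): Promise<unknown> {
  const url = new URL(statusUrl, baseUrl).toString();
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const res = await fetchFn(url);
    // 200 means processing finished; the body may still say reviewRequired: true.
    if (res.status === 200) return res.json();
    // Anything other than 202 is unexpected, e.g. 404 for an unknown fileId.
    if (res.status !== 202) throw new Error(`Unexpected status ${String(res.status)}`);
    if (attempt < maxAttempts) await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Result not ready after ${String(maxAttempts)} attempts`);
}
```

In Node 22 you can pass the global fetch directly, for example pollInvoiceResult("http://localhost:3000", statusUrl, fetch). Because processingResults lives in memory, a server restart turns a pending fileId into a 404.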
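Step 10: Create the main server
Create src/index.ts, the Express entry point. It mounts the routers, adds a global error handler that distinguishes Zod validation errors from config errors, and wires up graceful shutdown on SIGTERM/SIGINT. invoicesRouter is mounted before webhookRouter so that GET /api/invoices/pending is handled before the webhook router's GET /:fileId route can match it: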
import express, { type Request, type Response, type NextFunction } from "express" ;
import { getConfig, ConfigError } from "./config.js" ;
import { log, error } from "./logger.js" ;
import webhookRouter from "./api/webhook.js" ;
import invoicesRouter from "./api/invoices.js" ;
import { ZodError } from "zod" ;
const app = express ();
app. use (express. json ());
app. get ( "/health" , (_req : Request , res : Response ) => {
res. json ({ status: "ok" , uptime: process. uptime () });
});
app. use ( "/api/invoices" , invoicesRouter);
app. use ( "/api/invoices" , webhookRouter);
function globalErrorHandler (
err : Error ,
_req : Request ,
res : Response ,
_next : NextFunction ,
) : void {
void _next;
if (err instanceof ZodError ) {
res. status ( 400 ). json ({
error: "validation_error" ,
details: err.issues,
});
return ;
}
if (err instanceof ConfigError ) {
res. status ( 500 ). json ({ error: "config_error" });
return ;
}
error ( `Unhandled error: ${ err . message }` );
res. status ( 500 ). json ({ error: "internal_error" });
}
app. use (globalErrorHandler);
function start () : void {
const config = getConfig ();
const server = app. listen (config.PORT, () => {
log ( `Server listening on port ${ String ( config . PORT ) }` );
const shutdown = (signal : string ) : void => {
log ( `Received ${ signal }, shutting down...` );
server. close (() => {
log ( "Server closed" );
process. exit ( 0 );
});
};
process. on ( "SIGTERM" , () => { shutdown ( "SIGTERM" ); });
process. on ( "SIGINT" , () => { shutdown ( "SIGINT" ); });
});
}
export { app, start, globalErrorHandler };
Expected output: start() calls getConfig() before it begins listening, so if a required variable (MISTRAL_API_KEY, LLAMA_CLOUD_API_KEY, or HANDOFF_RECIPIENT_URL) is missing from the environment, it throws a ConfigError and the server does not start.
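In start(), shutdown() calls process.exit(0) only from inside the server.close() callback, and server.close() waits for in-flight requests to finish, so one slow request can hold shutdown open. A minimal sketch of a bounded close using only node:http (gracefulClose and the timeout are hypothetical; closeAllConnections() is a real http.Server method in Node 18.2 and later):

```typescript
import { createServer, type Server } from "node:http";

// Resolves once the server has closed. If connections are still open
// after timeoutMs, force-close every remaining connection.
function gracefulClose(server: Server, timeoutMs: number): Promise<void> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      server.closeAllConnections();
    }, timeoutMs);
    timer.unref(); // do not keep the process alive just for this timer
    server.close((err) => {
      clearTimeout(timer);
      if (err) reject(err);
      else resolve();
    });
  });
}

const server = createServer((_req, res) => {
  res.end("ok");
});
server.listen(0, () => {
  void gracefulClose(server, 5_000).then(() => {
    console.log("closed, listening =", server.listening); // closed, listening = false
  });
});
```

In src/index.ts, shutdown() could then call gracefulClose(server, 10_000) and exit in its .then() instead of passing a callback to server.close().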
Step 11: Write and run the tests
Create tests/app.test.ts. It imports your Express app, mocks processInvoice and pending-store, and tests the full request/response cycle with supertest:
import { describe, it, expect, vi, beforeAll } from "vitest" ;
import request from "supertest" ;
import type { Express } from "express" ;
process.env.MISTRAL_API_KEY = "k" ;
process.env.LLAMA_CLOUD_API_KEY = "k" ;
process.env.HANDOFF_RECIPIENT_URL = "https://review.example.com" ;
process.env.CONFIDENCE_THRESHOLD = "0.7" ;
process.env.MISTRAL_MODEL = "mistral-large-latest" ;
process.env.BUDGET_CAP_MONTHLY_USD = "100" ;
process.env.PORT = "0" ;
const mockProcessInvoice = vi. fn (). mockResolvedValue ({
fileId:
Run the suite:
pnpm test
Expected output: Vitest runs all test files and prints a coverage summary; the run fails if lines, branches, functions, or statements fall below the 90% thresholds set in vitest.config.ts. The command also writes a machine-readable vitest-report.json to disk.
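When you write tests for src/lib/confidence.ts, it helps to work the deductions out by hand. The sketch below restates the same rules without Zod or getConfig (scoreExtraction is a hypothetical name, not the tutorial's function) and checks three cases against the default threshold of 0.7. One consequence worth knowing: a missing vendor on its own scores exactly 0.7, and because processInvoice routes to review only when score < threshold, that invoice is not sent to review.

```typescript
// Dependency-free restatement of the deductions in assessConfidence.
interface LineItem { quantity: number; unitPrice: number; total: number }
interface Extraction { vendor: string; totalAmount: number; lineItems: LineItem[]; currency: string }

function scoreExtraction(e: Extraction): { score: number; indicators: string[] } {
  let s = 1.0;
  const indicators: string[] = [];
  if (e.vendor.trim().length === 0) { s -= 0.3; indicators.push("missing_vendor"); }
  if (e.totalAmount <= 0) { s -= 0.3; indicators.push("non_positive_total"); }
  if (e.lineItems.length === 0) { s -= 0.3; indicators.push("no_line_items"); }
  if (e.currency.trim().length === 0) { s -= 0.1; indicators.push("missing_currency"); }
  // At most one mismatch deduction, however many line items disagree.
  if (e.lineItems.some((i) => Math.abs(i.total - i.quantity * i.unitPrice) > 0.01)) {
    s -= 0.15;
    indicators.push("line_item_total_mismatch");
  }
  return { score: Math.max(0, Math.min(1, s)), indicators };
}

const threshold = 0.7; // default CONFIDENCE_THRESHOLD
const a = scoreExtraction({ vendor: "", totalAmount: 20, lineItems: [{ quantity: 2, unitPrice: 10, total: 20 }], currency: "USD" });
const b = scoreExtraction({ vendor: "Acme", totalAmount: 25, lineItems: [{ quantity: 2, unitPrice: 10, total: 25 }], currency: "" });
const c = scoreExtraction({ vendor: "Acme", totalAmount: 0, lineItems: [], currency: "USD" });

console.log(a.score, a.score < threshold); // 0.7 false
console.log(b.score.toFixed(2), b.indicators); // 0.75 [ 'missing_currency', 'line_item_total_mismatch' ]
console.log(c.score.toFixed(2), c.score < threshold); // 0.40 true
```

If a missing vendor should always trigger review, raise CONFIDENCE_THRESHOLD slightly above 0.7 or increase that deduction.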
You can also run type-checking and linting:
pnpm typecheck
pnpm lint
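Step 12: Run the server and try it
The start() function in src/index.ts is exported but not called automatically, which lets tests import app without starting the server. To run it, append a start() call at the end of the file. Because the tests import src/index.ts, that call will also start a listener whenever a test imports the module; moving it into a separate entry file that imports start() avoids this.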
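printf '\nstart();\n' >> src/index.ts
Now start the server with tsx. Nothing in the project loads .env automatically, so pass it with Node's --env-file flag:
pnpm exec tsx --env-file=.env src/index.ts
Expected output: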
[2026-05-13T19:00:00.000Z] INFO: Server listening on port 3000
In another terminal, upload an invoice:
curl -X POST http://localhost:3000/api/invoices/webhook \
-F "file=@/path/to/sample-invoice.pdf"
Expected output: a 202 response with a fileId and a statusUrl:
{ "fileId": "a1b2c3d4-...", "statusUrl": "/api/invoices/a1b2c3d4-..." }
Poll the status URL:
curl http://localhost:3000/api/invoices/a1b2c3d4-...
Expected output: while processing is still running, a 202 response with { "status": "processing" }. Once processing completes, a 200 response whose extraction object holds the vendor, totalAmount, and lineItems, alongside the confidence result and reviewRequired.
Check the pending review queue:
curl http://localhost:3000/api/invoices/pending
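The same upload can be made from TypeScript with Node 22's built-in fetch, FormData, and Blob. A minimal sketch (buildUploadForm and uploadInvoice are hypothetical helpers); the form field must be named "file" to match upload.single("file") in webhook.ts, and the MIME type must be one the fileFilter allows:

```typescript
import { readFile } from "node:fs/promises";
import { basename } from "node:path";

// Builds the multipart body that POST /api/invoices/webhook expects.
function buildUploadForm(bytes: Uint8Array, filename: string, mimeType: string): FormData {
  const form = new FormData();
  // Copy into a fresh Uint8Array so the Blob gets an ArrayBuffer-backed view.
  form.append("file", new Blob([new Uint8Array(bytes)], { type: mimeType }), filename);
  return form;
}

// Uploads a local file and returns { fileId, statusUrl } from the 202 response.
// fetch sets the multipart Content-Type header (with boundary) itself.
async function uploadInvoice(
  baseUrl: string,
  filePath: string,
  mimeType: string,
): Promise<{ fileId: string; statusUrl: string }> {
  const bytes = await readFile(filePath);
  const res = await fetch(new URL("/api/invoices/webhook", baseUrl), {
    method: "POST",
    body: buildUploadForm(bytes, basename(filePath), mimeType),
  });
  // 413 means the file is over 10 MB; 415 means the MIME type is not allowed.
  if (res.status !== 202) throw new Error(`Upload failed with status ${String(res.status)}`);
  return (await res.json()) as { fileId: string; statusUrl: string };
}
```

For example: uploadInvoice("http://localhost:3000", "/path/to/sample-invoice.pdf", "application/pdf").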
Next steps
Add a database (PostgreSQL or SQLite) to replace the in-memory processingResults map and pendingStore so data survives server restarts
Replace the @llamaindex/cloud dependency with llama-cloud-services for production image parsing
Add authentication middleware so the webhook isn’t open to the public internet
Build a frontend dashboard that lists pending reviews and lets an accountant confirm or correct extractions in one click
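For the authentication item above, one dependency-free option is a shared API key compared in constant time. This is a sketch under assumptions: the Authorization: Bearer header format and a WEBHOOK_API_KEY variable are hypothetical and not part of the project's config schema. Hashing both values first gives timingSafeEqual the equal-length buffers it requires.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Extracts the token from an "Authorization: Bearer <token>" header value.
function bearerToken(header: string | undefined): string | undefined {
  const match = /^Bearer\s+(.+)$/i.exec(header ?? "");
  return match?.[1];
}

// Compares SHA-256 digests rather than raw strings: timingSafeEqual throws
// on a length mismatch, and the digests are always 32 bytes.
function apiKeyMatches(provided: string | undefined, expected: string): boolean {
  if (provided === undefined || expected.length === 0) return false;
  const a = createHash("sha256").update(provided).digest();
  const b = createHash("sha256").update(expected).digest();
  return timingSafeEqual(a, b);
}

console.log(apiKeyMatches(bearerToken("Bearer s3cret"), "s3cret")); // true
console.log(apiKeyMatches(bearerToken("Bearer wrong"), "s3cret")); // false
console.log(apiKeyMatches(bearerToken(undefined), "s3cret")); // false
```

In Express, a small middleware mounted before webhookRouter could call apiKeyMatches(bearerToken(req.get("authorization")), key) and respond with 401 when it returns false.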
Continuation of src/lib/document-parser.ts (Step 5):
}
}
async function retry < T >(
fn : () => Promise < T >,
options : { maxAttempts : number ; baseDelayMs : number },
) : Promise < T > {
let lastError : Error | undefined ;
for ( let attempt = 1 ; attempt <= options.maxAttempts; attempt ++ ) {
try {
return await fn ();
} catch (err) {
const errorMessage = err instanceof Error ? err.message : String (err);
lastError = err instanceof Error ? err : new Error (errorMessage);
if (attempt < options.maxAttempts) {
const delay = options.baseDelayMs * Math. pow ( 2 , attempt - 1 );
warn ( `[retry] attempt ${ String ( attempt ) } failed, retrying in ${ String ( delay ) }ms` );
await new Promise ((resolve) => setTimeout (resolve, delay));
}
}
}
throw lastError ?? new Error ( "retry exhausted" );
}
export async function parsePdf (buffer : Buffer ) : Promise < ParsedDocument > {
try {
const uint8 = new Uint8Array (buffer);
const pdf = await getDocumentProxy(uint8);
const result = await extractText(pdf, { mergePages: true });
return {
text: result.text,
sourceType: "pdf" ,
pageCount: result.totalPages,
};
} catch (cause) {
throw new Error (
`PDF parsing failed: ${ cause instanceof Error ? cause . message : String ( cause ) }` ,
{ cause },
);
}
}
export async function parseImage (buffer : Buffer ) : Promise < ParsedDocument > {
try {
const pngBuffer = await sharp (buffer). toFormat ( "png" ). toBuffer ();
const { LlamaParseReader } = await import ( "@llamaindex/cloud" );
const reader = new LlamaParseReader ({
apiKey: process.env.LLAMA_CLOUD_API_KEY ?? "" ,
resultType: "markdown" ,
});
const documents = await retry (
() => reader. loadDataAsContent ( new Uint8Array (pngBuffer), "invoice.png" ),
{ maxAttempts: 3 , baseDelayMs: 1000 },
);
const text = documents
. map ((d) => d.text)
. join ( "\n" );
return { text, sourceType: "image" };
} catch (cause) {
throw new Error (
`Image parsing failed: ${ cause instanceof Error ? cause . message : String ( cause ) }` ,
{ cause },
);
}
}
export async function parseDocx (buffer : Buffer ) : Promise < ParsedDocument > {
try {
const result = await mammoth. extractRawText ({ buffer });
return { text: result.value, sourceType: "docx" };
} catch (cause) {
throw new Error (
`DOCX parsing failed: ${ cause instanceof Error ? cause . message : String ( cause ) }` ,
{ cause },
);
}
}
const mimeTypeMap : Record < string , (buffer : Buffer ) => Promise < ParsedDocument >> = {
"application/pdf" : parsePdf,
"image/png" : parseImage,
"image/jpeg" : parseImage,
"application/vnd.openxmlformats-officedocument.wordprocessingml.document" : parseDocx,
};
export async function parseDocument (buffer : Buffer , mimeType : string ) : Promise < ParsedDocument > {
const parser = mimeTypeMap[mimeType. toLowerCase ()];
if ( ! parser) {
throw new UnsupportedDocumentError (mimeType);
}
log ( `[parseDocument] dispatching mimeType=${ mimeType }` );
const result = await parser (buffer);
log ( `[parseDocument] done result length=${ String ( result . text .length) }` );
return result;
}
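The parser module above chooses a parser by looking up the MIME type in a plain object. The sketch below shows the same dispatch-table idea without any dependencies, with two variations that are assumptions rather than the tutorial's code. It uses a `Map`, so inherited object keys such as `"constructor"` can never match. It also strips any `; charset=...` parameter a client might send before doing the lookup. The stub parsers, `normalizeMimeType`, and `dispatchDocument` are hypothetical stand-ins for `parsePdf`, `parseImage`, `parseDocx`, and `parseDocument`, and the error class here is a local copy, not the tutorial's `UnsupportedDocumentError`.

```typescript
// Minimal sketch of MIME-type dispatch; the parsers are hypothetical stubs
// standing in for parsePdf, parseImage, and parseDocx.
interface ParsedDocument {
  text: string;
  sourceType: "pdf" | "image" | "docx";
  pageCount?: number;
}

type Parser = (bytes: Uint8Array) => Promise<ParsedDocument>;

class UnsupportedDocumentError extends Error {
  readonly mimeType: string;
  constructor(mimeType: string) {
    super(`Unsupported document type: ${mimeType}`);
    this.name = "UnsupportedDocumentError";
    this.mimeType = mimeType;
  }
}

// Each stub reports the byte count so the chosen branch is observable.
const stub = (sourceType: ParsedDocument["sourceType"]): Parser =>
  (bytes) => Promise.resolve({ text: `${sourceType}:${String(bytes.length)}`, sourceType });

// Map.get only sees keys that were set explicitly; a plain-object lookup
// would also resolve inherited names such as "constructor" or "toString".
const parsers = new Map<string, Parser>([
  ["application/pdf", stub("pdf")],
  ["image/png", stub("image")],
  ["image/jpeg", stub("image")],
  ["application/vnd.openxmlformats-officedocument.wordprocessingml.document", stub("docx")],
]);

// "Application/PDF; charset=binary" becomes "application/pdf".
function normalizeMimeType(mimeType: string): string {
  const [base = ""] = mimeType.split(";");
  return base.trim().toLowerCase();
}

async function dispatchDocument(bytes: Uint8Array, mimeType: string): Promise<ParsedDocument> {
  const parser = parsers.get(normalizeMimeType(mimeType));
  if (!parser) throw new UnsupportedDocumentError(mimeType);
  return parser(bytes);
}
```

A `Buffer` from multer can be passed directly, because `Buffer` is a subclass of `Uint8Array`.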
(): Mistral {
const config = getConfig ();
return new Mistral ({ apiKey: config.MISTRAL_API_KEY });
}
async function retryWithBackoff < T >(
fn : () => Promise < T >,
maxAttempts : number ,
baseDelayMs : number ,
) : Promise < T > {
let lastError : Error | undefined ;
for ( let attempt = 1 ; attempt <= maxAttempts; attempt ++ ) {
try {
return await fn ();
} catch (err) {
lastError = err instanceof Error ? err : new Error ( String (err));
if (err instanceof AuthError ) throw err;
if (attempt < maxAttempts) {
const delay = baseDelayMs * Math. pow ( 2 , attempt - 1 );
warn ( `[mistral-client] attempt ${ String ( attempt ) } failed, retrying in ${ String ( delay ) }ms` );
await new Promise ((resolve) => setTimeout (resolve, delay));
}
}
}
throw lastError ?? new ProviderError ( "max retries exceeded" );
}
export interface ChatCompleteResult < T > {
data : T ;
usage : {
inputTokens : number ;
outputTokens : number ;
};
}
export async function chatComplete < T >(
prompt : string ,
parseFn : (raw : string ) => T ,
recordSpendFn ?: (params : {
requestId : string ;
model : string ;
inputTokens : number ;
outputTokens : number ;
cost : number ;
}) => void ,
) : Promise < ChatCompleteResult < T >> {
const config = getConfig ();
const model = config.MISTRAL_MODEL;
const client = createMistralClient ();
const requestId = `mistral-${ String ( Date . now ()) }-${ Math . random (). toString ( 36 ). slice ( 2 , 9 ) }` ;
try {
const result = await retryWithBackoff ( async () => {
const response = await client.chat. complete ({
model,
messages: [{ role: "user" , content: prompt }],
});
const choice = response.choices[ 0 ];
if ( ! choice?.message) {
throw new ProviderError ( "No response from Mistral" );
}
const content = choice.message.content;
const contentStr = typeof content === "string" ? content : "" ;
const usage = {
// The SDK's usage object is camelCase (promptTokens, completionTokens), like toolCalls and responseFormat.
inputTokens: response.usage.promptTokens ?? 0,
outputTokens: response.usage.completionTokens ?? 0,
};
return { data: parseFn (contentStr), usage };
}, 3 , 1000 );
if (recordSpendFn) {
const inputTokens = result.usage.inputTokens;
const outputTokens = result.usage.outputTokens;
const inputCost = (inputTokens / 1_000_000 ) * 2.0 ;
const outputCost = (outputTokens / 1_000_000 ) * 6.0 ;
recordSpendFn ({
requestId,
model,
inputTokens,
outputTokens,
cost: inputCost + outputCost,
});
}
return result;
} catch (err) {
if (err instanceof AuthError || err instanceof ProviderError ) throw err;
const msg = err instanceof Error ? err.message : String (err);
const msgLower = msg. toLowerCase ();
if (msgLower. includes ( "401" ) || msgLower. includes ( "unauthorized" )) {
throw new AuthError ( "Invalid Mistral API key" );
}
if (msgLower. includes ( "429" ) || msgLower. includes ( "rate limit" ) || msgLower. includes ( "too many requests" )) {
throw new ProviderError ( `Rate limited: ${ msg }` );
}
if (/\b5\d{2}\b/.test(msg) || msgLower.includes("internal")) {
throw new ProviderError ( `Mistral API error: ${ msg }` );
}
throw new ProviderError ( `Mistral client error: ${ msg }` );
}
}
// --- LLM Provider Adapter for REAA MemoryExtractor ---
export interface LLMProvider {
complete (prompt : string ) : Promise < string >;
completeStructured < T >(prompt : string , schema : object ) : Promise < T >;
}
export interface EmbeddingProvider {
embed (text : string ) : Promise < number []>;
embedBatch (texts : string []) : Promise < number [][]>;
getModelInfo () : { name : string ; dimensions : number ; maxInputLength : number };
}
export class MistralLLMProvider implements LLMProvider {
private readonly client : Mistral ;
private readonly model : string ;
constructor (options : { apiKey : string ; model : string }) {
this.client = new Mistral ({ apiKey: options.apiKey });
this.model = options.model;
}
async complete (prompt : string ) : Promise < string > {
const response = await this.client.chat. complete ({
model: this.model,
messages: [{ role: "user" , content: prompt }],
});
const content = response.choices[ 0 ]?.message?.content;
return typeof content === "string" ? content : "" ;
}
async completeStructured < T >(prompt : string , schema : object ) : Promise < T > {
void schema;
const response = await this.client.chat. complete ({
model: this.model,
messages: [{ role: "user" , content: prompt }],
});
const content = response.choices[ 0 ]?.message?.content;
const contentStr = typeof content === "string" ? content : "" ;
try {
return JSON. parse (contentStr) as T ;
} catch {
return {} as T ;
}
}
}
export class NoOpEmbeddingProvider implements EmbeddingProvider {
embed (text : string ) : Promise < number []> {
void text;
return Promise . resolve ([]);
}
embedBatch (texts : string []) : Promise < number [][]> {
return Promise . resolve (texts. map (() => []));
}
getModelInfo () : { name : string ; dimensions : number ; maxInputLength : number } {
return { name: "noop" , dimensions: 0 , maxInputLength: 0 };
}
}
export function createExtractorConfig (apiKey : string ) : {
llmProvider : LLMProvider ;
embeddingProvider : EmbeddingProvider ;
} {
return {
llmProvider: new MistralLLMProvider ({
apiKey,
model: "mistral-large-latest" ,
}),
embeddingProvider: new NoOpEmbeddingProvider (),
};
}
export { Mistral };
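`chatComplete` prices each call as (inputTokens / 1,000,000) × 2.0 + (outputTokens / 1,000,000) × 6.0. In other words, it assumes USD 2 per million input tokens and USD 6 per million output tokens. Those rates are hard-coded in the tutorial and may not match Mistral's current price list. For example, 10,000 input tokens and 2,000 output tokens cost 0.02 + 0.012 = 0.032 USD. The sketch below moves that arithmetic into a pure function. It also adds a hypothetical in-memory `MonthlyBudget` that a `recordSpendFn` callback could feed. The class name, its methods, and the UTC year-month key are illustrative assumptions, not the tutorial's budget module.

```typescript
// Rates assumed by the tutorial's chatComplete (USD per 1M tokens).
// Check Mistral's current pricing before relying on these numbers.
const INPUT_USD_PER_MTOK = 2.0;
const OUTPUT_USD_PER_MTOK = 6.0;

function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1_000_000) * INPUT_USD_PER_MTOK +
    (outputTokens / 1_000_000) * OUTPUT_USD_PER_MTOK;
}

// Same shape as the object chatComplete passes to recordSpendFn.
interface SpendRecord {
  requestId: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  cost: number;
}

// Hypothetical in-memory budget tracker keyed by calendar month (UTC).
class MonthlyBudget {
  private readonly spentByMonth = new Map<string, number>();
  private readonly capUsd: number;

  constructor(capUsd: number) {
    this.capUsd = capUsd;
  }

  private static monthKey(date: Date): string {
    const month = String(date.getUTCMonth() + 1).padStart(2, "0");
    return `${String(date.getUTCFullYear())}-${month}`;
  }

  record(rec: SpendRecord, at: Date = new Date()): void {
    const key = MonthlyBudget.monthKey(at);
    this.spentByMonth.set(key, (this.spentByMonth.get(key) ?? 0) + rec.cost);
  }

  spent(at: Date = new Date()): number {
    return this.spentByMonth.get(MonthlyBudget.monthKey(at)) ?? 0;
  }

  // True when one more call with this estimated cost would go over the cap.
  wouldExceed(estimatedCostUsd: number, at: Date = new Date()): boolean {
    return this.spent(at) + estimatedCostUsd > this.capUsd;
  }
}
```

A caller could wire this up as `chatComplete(prompt, schema, (rec) => budget.record(rec))` and check `budget.wouldExceed(...)` before sending the next request. Because the tracker is in memory, it resets when the process restarts. A persistent store would be needed for a real monthly cap.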
T>,
maxAttempts : number ,
baseDelayMs : number ,
) : Promise < T > {
let lastError : Error | undefined ;
for ( let attempt = 1 ; attempt <= maxAttempts; attempt ++ ) {
try {
return await fn ();
} catch (err) {
lastError = err instanceof Error ? err : new Error ( String (err));
if (err instanceof AuthError ) throw err;
if (attempt < maxAttempts) {
const delay = baseDelayMs * Math. pow ( 2 , attempt - 1 );
warn ( `[mistral-client] attempt ${ String ( attempt ) } failed, retrying in ${ String ( delay ) }ms` );
await new Promise ((resolve) => setTimeout (resolve, delay));
}
}
}
throw lastError ?? new ProviderError ( "max retries exceeded" );
}
function buildToolsFromSchema (schema : z . ZodType ) : Array <{
type : "function" ;
function : {
name : string ;
description : string ;
parameters : Record < string , unknown >;
};
}> {
const description = schema.description ?? "Extract structured data" ;
return [
{
type: "function" as const ,
function: {
name: "extract_data" ,
description,
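// Note: this advertises an empty object schema, so the tool definition does not tell
// the model which fields to return; the prompt must describe them, and schema.parse()
// in chatComplete does the actual validation.
parameters: { type: "object", properties: {} },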
},
},
];
}
function extractApiKeyFromMessage (msg : string ) : boolean {
const config = getConfig ();
const key = config.MISTRAL_API_KEY;
return key.length > 0 && msg. includes (key);
}
function sanitizeErrorMessage (msg : string ) : string {
const config = getConfig ();
const key = config.MISTRAL_API_KEY;
if (key.length > 0 && msg. includes (key)) {
return msg. replaceAll (key, "[REDACTED]" );
}
return msg;
}
export interface ChatCompleteResult < T > {
data : T ;
usage : {
inputTokens : number ;
outputTokens : number ;
};
}
export async function chatComplete < T >(
prompt : string ,
schema : z . ZodType < T >,
recordSpendFn ?: (params : {
requestId : string ;
model : string ;
inputTokens : number ;
outputTokens : number ;
cost : number ;
}) => void ,
) : Promise < ChatCompleteResult < T >> {
const config = getConfig ();
const model = config.MISTRAL_MODEL;
const client = createMistralClient ();
const requestId = `mistral-${ String ( Date . now ()) }-${ Math . random (). toString ( 36 ). slice ( 2 , 9 ) }` ;
try {
const tools = buildToolsFromSchema (schema);
const result = await retryWithBackoff ( async () => {
const response = await client.chat. complete ({
model,
messages: [{ role: "user" , content: prompt }],
tools,
});
const choice = response.choices[ 0 ];
if ( ! choice?.message) {
throw new ProviderError ( "No response from Mistral" );
}
const toolCalls = choice.message.toolCalls;
const rawContent =
typeof choice.message.content === "string" ? choice.message.content : "" ;
let parsed : T ;
if (toolCalls && toolCalls.length > 0 ) {
const firstCall = toolCalls[ 0 ];
if ( ! firstCall) {
throw new ProviderError ( "Empty tool calls array" );
}
const fnArgs = firstCall.function.arguments;
const argsRaw =
typeof fnArgs === "string"
? fnArgs
: JSON. stringify (fnArgs);
const parsedArgs = JSON. parse (argsRaw) as unknown ;
parsed = schema. parse (parsedArgs);
} else if (rawContent.length > 0 ) {
const parsedJson = JSON. parse (rawContent) as unknown ;
parsed = schema. parse (parsedJson);
} else {
throw new ProviderError ( "No tool calls or content in response" );
}
const usage = {
inputTokens: response.usage.promptTokens ?? 0,
outputTokens: response.usage.completionTokens ?? 0,
};
return { data: parsed, usage };
}, 3 , 1000 );
if (recordSpendFn) {
const inputTokens = result.usage.inputTokens;
const outputTokens = result.usage.outputTokens;
const inputCost = (inputTokens / 1_000_000 ) * 2.0 ;
const outputCost = (outputTokens / 1_000_000 ) * 6.0 ;
recordSpendFn ({
requestId,
model,
inputTokens,
outputTokens,
cost: inputCost + outputCost,
});
}
return result;
} catch (err) {
if (err instanceof AuthError || err instanceof ProviderError ) throw err;
const msg = err instanceof Error ? err.message : String (err);
const sanitized = sanitizeErrorMessage (msg);
const msgLower = sanitized. toLowerCase ();
if ( extractApiKeyFromMessage (msg)) {
logError ( "[mistral-client] API key leaked in error message - redacting" );
}
if (msgLower. includes ( "401" ) || msgLower. includes ( "unauthorized" )) {
throw new AuthError ( "Invalid Mistral API key" );
}
if (msgLower. includes ( "429" ) || msgLower. includes ( "rate limit" ) || msgLower. includes ( "too many requests" )) {
throw new ProviderError ( "Rate limited by Mistral API" );
}
if (/\b5\d{2}\b/.test(msg) || msgLower.includes("internal")) {
throw new ProviderError ( `Mistral API error: ${ sanitized }` );
}
throw new ProviderError ( `Mistral client error: ${ sanitized }` );
}
}
export async function chatCompleteJsonFallback < T >(
prompt : string ,
schema : z . ZodType < T >,
recordSpendFn ?: (params : {
requestId : string ;
model : string ;
inputTokens : number ;
outputTokens : number ;
cost : number ;
}) => void ,
) : Promise < ChatCompleteResult < T >> {
try {
return await chatComplete (prompt, schema, recordSpendFn);
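} catch (err) {
// An invalid API key fails the same way in JSON mode, so don't fall back for it.
if (err instanceof AuthError) throw err;
log("[mistral-client] Tool-call failed, falling back to JSON mode");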
const config = getConfig ();
const model = config.MISTRAL_MODEL;
const client = createMistralClient ();
const requestId = `mistral-${ String ( Date . now ()) }-${ Math . random (). toString ( 36 ). slice ( 2 , 9 ) }` ;
const fallbackResult = await retryWithBackoff ( async () => {
const response = await client.chat. complete ({
model,
messages: [
{
role: "user" ,
content: `${ prompt }\n\nRespond with JSON only.` ,
},
],
responseFormat: { type: "json_object" },
});
const choice = response.choices[ 0 ];
if ( ! choice?.message) {
throw new ProviderError ( "No response from Mistral on fallback" );
}
const rawContent =
typeof choice.message.content === "string" ? choice.message.content : "" ;
const parsedJson = JSON. parse (rawContent) as unknown ;
const data = schema. parse (parsedJson);
const usage = {
inputTokens: response.usage.promptTokens ?? 0,
outputTokens: response.usage.completionTokens ?? 0,
};
return { data, usage };
}, 3 , 1000 );
if (recordSpendFn) {
const inputTokens = fallbackResult.usage.inputTokens;
const outputTokens = fallbackResult.usage.outputTokens;
const inputCost = (inputTokens / 1_000_000 ) * 2.0 ;
const outputCost = (outputTokens / 1_000_000 ) * 6.0 ;
recordSpendFn ({
requestId,
model,
inputTokens,
outputTokens,
cost: inputCost + outputCost,
});
}
return fallbackResult;
}
}
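In the tool-calling `chatComplete`, the first tool call's `function.arguments` may arrive either as a JSON string or as an already-decoded object. That is why the code stringifies non-string values and parses them again before validating. The sketch below is a dependency-free take on that step. It returns decoded objects as-is instead of round-tripping them through `JSON.stringify`. `decodeToolArguments`, `extractInvoiceFields`, and the hand-written type guards are hypothetical and stand in for the tutorial's Zod `schema.parse`. They cover only a subset of the fields in the tutorial's test fixture (`vendor`, `totalAmount`, `lineItems`).

```typescript
// Hypothetical sketch: turn a tool call's `arguments` into validated invoice fields.
// Hand-written type guards stand in for the tutorial's Zod schema.parse().
interface LineItem {
  description: string;
  quantity: number;
  unitPrice: number;
  total: number;
}

interface InvoiceFields {
  vendor: string;
  totalAmount: number;
  lineItems: LineItem[];
}

// Arguments may be a JSON string or an already-decoded object.
function decodeToolArguments(fnArgs: unknown): unknown {
  if (typeof fnArgs === "string") return JSON.parse(fnArgs) as unknown;
  if (fnArgs !== null && typeof fnArgs === "object") return fnArgs;
  throw new Error(`Unexpected tool-call arguments of type ${typeof fnArgs}`);
}

const isFiniteNumber = (v: unknown): v is number =>
  typeof v === "number" && Number.isFinite(v);

function isLineItem(v: unknown): v is LineItem {
  if (v === null || typeof v !== "object") return false;
  const o = v as Record<string, unknown>;
  return typeof o["description"] === "string" &&
    isFiniteNumber(o["quantity"]) && isFiniteNumber(o["unitPrice"]) && isFiniteNumber(o["total"]);
}

function isInvoiceFields(v: unknown): v is InvoiceFields {
  if (v === null || typeof v !== "object") return false;
  const o = v as Record<string, unknown>;
  const items = o["lineItems"];
  return typeof o["vendor"] === "string" && isFiniteNumber(o["totalAmount"]) &&
    Array.isArray(items) && items.every(isLineItem);
}

// Throws on malformed JSON (from JSON.parse) or on a shape mismatch.
function extractInvoiceFields(fnArgs: unknown): InvoiceFields {
  const decoded = decodeToolArguments(fnArgs);
  if (!isInvoiceFields(decoded)) {
    throw new Error("Tool-call arguments do not match the invoice shape");
  }
  return decoded;
}
```

Both failure modes throw, so inside `chatComplete` they would reach `chatCompleteJsonFallback`'s catch block and trigger the JSON-mode retry, just as a Zod validation error does.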
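function createHttpClient(): HttpClient {
return {
async get (url : string ) {
// A GET has no body, so ask for JSON via Accept rather than Content-Type.
const response = await fetch(url, { method: "GET", headers: { Accept: "application/json" } });
if (!response.ok) throw new Error(`GET ${url} failed with HTTP ${String(response.status)}`);
return { data: await response.json() as unknown };
},
async post (url : string , body : unknown ) {
const response = await fetch(url, { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify(body) });
if (!response.ok) throw new Error(`POST ${url} failed with HTTP ${String(response.status)}`);
return { data: await response.json() as unknown };
},
};
}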
function createManager () : HandoffManager {
const httpClient = createHttpClient ();
const transport = new A2ATransport (httpClient);
const transportFactory = new TransportFactory ([transport]);
const handoffConfig = createHandoffConfig ({
routing: { minConfidenceThreshold: 0.6 , policy: "best_effort" },
});
const manager = new HandoffManager (handoffConfig, {
transportFactory,
});
const humanReviewAgent : AgentCapabilities = {
agentId: "human-review" ,
agentName: "Human Invoice Reviewer" ,
skills: [ "invoice-review" ],
domains: [ "accounting" ],
maxConcurrentSessions: 10 ,
currentLoad: 0 ,
languages: [ "en" ],
specializations: [],
availability: "available" ,
version: "1.0.0" ,
metadata: { endpoint: process.env.HANDOFF_RECIPIENT_URL ?? "" },
};
manager. registerAgent (humanReviewAgent);
manager. on ( "handoffStart" , ({ handoffId, trigger }) => {
log ( `[handoff] ${ handoffId } started (${ trigger . type })` );
});
manager. on ( "handoffComplete" , ({ handoffId, duration }) => {
log ( `[handoff] ${ handoffId } complete (${ String ( duration ) }ms)` );
});
manager. on ( "handoffReject" , ({ handoffId, reason }) => {
warn ( `[handoff] ${ handoffId } rejected: ${ reason ?? "unknown"}` );
});
manager. on ( "handoffError" , ({ handoffId, error: err }) => {
error ( `[handoff] ${ handoffId } error: ${ err . message }` );
});
return manager;
}
let handoffManagerInstance : HandoffManager | undefined ;
export function getHandoffManager () : HandoffManager {
if ( ! handoffManagerInstance) {
handoffManagerInstance = createManager ();
}
return handoffManagerInstance;
}
export async function queueForHumanReview (
extraction : InvoiceExtraction ,
originalText : string ,
fileId : string ,
confidence : ConfidenceResult ,
) : Promise < void > {
try {
const manager = getHandoffManager ();
const messages : Message [] = [
{ id: "m1" , role: "user" , content: originalText, timestamp: new Date () },
];
const userMetadata : UserMetadata = {
userId: "system" ,
language: "en" ,
};
const state : ConversationState = {
resolvedEntities: {},
openQuestions: [],
contextVariables: {
extraction,
fileId,
confidence,
},
};
const trigger : ConfidenceTooLow = {
type: "confidence_too_low" ,
currentConfidence: confidence.score,
threshold: confidence.threshold,
message: "Invoice extraction confidence below threshold - human review needed" ,
};
const result = await manager. executeHandoff ({
sessionId: fileId,
conversationId: fileId,
messages,
trigger,
userMetadata,
state,
availableAgents: [],
});
if ( ! result.success) {
warn ( `[handoff] queueForHumanReview handoff not accepted for fileId=${ fileId }` );
} else {
log ( `[handoff] queued fileId=${ fileId } for human review` );
}
} catch (err) {
error (
`[handoff] queueForHumanReview failed for fileId=${ fileId }: ${ err instanceof Error ? err . message : String ( err ) }` ,
);
}
}
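`queueForHumanReview` builds a `confidence_too_low` trigger from the extraction's `ConfidenceResult`. The test fixtures suggest the rule behind it: a score of 0.95 against a threshold of 0.7 is marked `reviewRequired: false`, and 0.5 against 0.7 is marked `reviewRequired: true`. The sketch below is a minimal, self-contained version of that decision. `decideReview` is a hypothetical helper. Treating a score exactly equal to the threshold as passing is an assumption, based on the trigger message's wording "below threshold".

```typescript
// Hypothetical sketch of the review decision that precedes queueForHumanReview.
interface ConfidenceResult {
  score: number;
  indicators: string[];
  threshold: number;
}

interface ConfidenceTooLowTrigger {
  type: "confidence_too_low";
  currentConfidence: number;
  threshold: number;
  message: string;
}

type ReviewDecision =
  | { reviewRequired: false }
  | { reviewRequired: true; trigger: ConfidenceTooLowTrigger };

// Assumption: a score equal to the threshold passes; only scores strictly
// below it are routed to a human reviewer.
function decideReview(confidence: ConfidenceResult): ReviewDecision {
  if (confidence.score >= confidence.threshold) {
    return { reviewRequired: false };
  }
  return {
    reviewRequired: true,
    trigger: {
      type: "confidence_too_low",
      currentConfidence: confidence.score,
      threshold: confidence.threshold,
      message: "Invoice extraction confidence below threshold - human review needed",
    },
  };
}
```

Keeping the decision in a pure function separates it from the handoff plumbing. A caller would invoke `queueForHumanReview` only when `reviewRequired` is true, and the threshold boundary can be unit-tested without a `HandoffManager`.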
"mock-uuid",
extraction: {
vendor: "TestCorp" ,
invoiceDate: "2025-04-01" ,
totalAmount: 1200 ,
lineItems: [{ description: "Svc" , quantity: 1 , unitPrice: 1200 , total: 1200 }],
currency: "USD" ,
},
confidence: { score: 0.95 , indicators: [], threshold: 0.7 },
reviewRequired: false ,
});
vi. mock ( "../src/lib/invoice-processor.js" , () => ({
processInvoice: mockProcessInvoice,
}));
vi. mock ( "../src/lib/pending-store.js" , () => ({
addPending: vi. fn (),
getPending: vi. fn (),
listPending: vi. fn (). mockReturnValue ([]),
removePending: vi. fn (). mockReturnValue ( true ),
}));
describe ( "Express App (index)" , () => {
let app : Express ;
beforeAll ( async () => {
vi. resetModules ();
const index = await import ( "../src/index.js" );
app = index.app;
});
it ( "GET /health returns ok with uptime" , async () => {
const res = await request (app). get ( "/health" );
expect (res.status). toBe ( 200 );
expect (res.body.status). toBe ( "ok" );
expect ( typeof res.body.uptime). toBe ( "number" );
});
it ( "POST /api/invoices/webhook with valid file returns 202" , async () => {
const res = await request (app)
. post ( "/api/invoices/webhook" )
. attach ( "file" , Buffer. from ( "fake-pdf" ), {
filename: "invoice.pdf" ,
contentType: "application/pdf" ,
});
expect (res.status). toBe ( 202 );
expect (res.body.fileId). toBeDefined ();
expect (res.body.statusUrl). toBeDefined ();
});
it ( "POST /api/invoices/webhook without file returns 400" , async () => {
const res = await request (app). post ( "/api/invoices/webhook" );
expect (res.status). toBe ( 400 );
expect (res.body.error). toBe ( "no_file" );
});
it ( "POST /api/invoices/webhook with unsupported type returns 415" , async () => {
const res = await request (app)
. post ( "/api/invoices/webhook" )
. attach ( "file" , Buffer. from ( "plain" ), {
filename: "test.txt" ,
contentType: "text/plain" ,
});
expect (res.status). toBe ( 415 );
expect (res.body.error). toBe ( "unsupported_type" );
expect (res.body.mimeType). toBe ( "text/plain" );
});
it ( "GET /api/invoices/:fileId returns processing result" , async () => {
const uploadRes = await request (app)
. post ( "/api/invoices/webhook" )
. attach ( "file" , Buffer. from ( "pdf" ), {
filename: "inv.pdf" ,
contentType: "application/pdf" ,
});
const fileId = uploadRes.body.fileId;
await new Promise ((r) => setTimeout (r, 100 ));
const res = await request (app). get ( `/api/invoices/${ fileId }` );
expect (res.status). toBe ( 200 );
expect (res.body.fileId). toBeDefined ();
});
it ( "GET /api/invoices/:fileId for unknown returns 404" , async () => {
const res = await request (app). get ( "/api/invoices/unknown-id" );
expect (res.status). toBe ( 404 );
});
it ( "GET /api/invoices/pending returns list" , async () => {
const res = await request (app). get ( "/api/invoices/pending" );
expect (res.status). toBe ( 200 );
expect (Array. isArray (res.body)). toBe ( true );
});
it ( "POST /api/invoices/:fileId/confirm for missing returns 404" , async () => {
const res = await request (app). post ( "/api/invoices/nonexistent/confirm" );
expect (res.status). toBe ( 404 );
expect (res.body.error). toBe ( "not_found" );
});
it ( "webhook processing error sets processingResults with error status" , async () => {
mockProcessInvoice. mockRejectedValueOnce ( new Error ( "processing failed" ));
const uploadRes = await request (app)
. post ( "/api/invoices/webhook" )
. attach ( "file" , Buffer. from ( "bad" ), {
filename: "bad.pdf" ,
contentType: "application/pdf" ,
});
const fileId = uploadRes.body.fileId;
await new Promise ((r) => setTimeout (r, 100 ));
const res = await request (app). get ( `/api/invoices/${ fileId }` );
expect (res.status). toBe ( 200 );
expect (res.body.extraction). toBeNull ();
expect (res.body.confidence.indicators). toContain ( "processing_error" );
});
it ( "webhook handles string rejection in catch handler" , async () => {
mockProcessInvoice. mockRejectedValueOnce ( "string error message" );
const uploadRes = await request (app)
. post ( "/api/invoices/webhook" )
. attach ( "file" , Buffer. from ( "bad" ), {
filename: "bad.pdf" ,
contentType: "application/pdf" ,
});
const fileId = uploadRes.body.fileId;
await new Promise ((r) => setTimeout (r, 100 ));
const res = await request (app). get ( `/api/invoices/${ fileId }` );
expect (res.status). toBe ( 200 );
expect (res.body.confidence.indicators). toContain ( "processing_error" );
});
it ( "webhook GET shows processing status before async completion" , async () => {
mockProcessInvoice. mockImplementationOnce (() => {
return new Promise ((resolve) => setTimeout (() => resolve ({
fileId: "slow-uuid" ,
extraction: null ,
confidence: { score: 0.5 , indicators: [ "test" ], threshold: 0.7 },
reviewRequired: true ,
}), 500 ));
});
const uploadRes = await request (app)
. post ( "/api/invoices/webhook" )
. attach ( "file" , Buffer. from ( "pdf" ), {
filename: "slow.pdf" ,
contentType: "application/pdf" ,
});
const fileId = uploadRes.body.fileId;
const res = await request (app). get ( `/api/invoices/${ fileId }` );
expect (res.status). toBe ( 202 );
expect (res.body.status). toBe ( "processing" );
await new Promise ((r) => setTimeout (r, 600 ));
const res2 = await request (app). get ( `/api/invoices/${ fileId }` );
expect (res2.status). toBe ( 200 );
expect (res2.body.fileId). toBeDefined ();
});
it ( "webhook rejects file over 10MB with 413" , async () => {
const bigFile = Buffer. alloc ( 11 * 1024 * 1024 );
const res = await request (app)
. post ( "/api/invoices/webhook" )
. attach ( "file" , bigFile, {
filename: "big.pdf" ,
contentType: "application/pdf" ,
});
expect (res.status). toBe ( 413 );
expect (res.body.error). toBe ( "file_too_large" );
});
it ( "webhook rejects mismatched field name" , async () => {
const res = await request (app)
. post ( "/api/invoices/webhook" )
. attach ( "wrongfield" , Buffer. from ( "test" ), {
filename: "test.pdf" ,
contentType: "application/pdf" ,
});
expect (res.status). toBe ( 400 );
});
});
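The tests above describe the asynchronous contract from a client's point of view. `POST /api/invoices/webhook` answers 202 with a `fileId` and a `statusUrl`. `GET /api/invoices/:fileId` returns 202 with `status: "processing"` until the work finishes, then 200 with the result, and returns 404 for unknown IDs. Below is a minimal polling sketch against that contract. `waitForInvoiceResult` and the `FetchLike` type are hypothetical. The sketch also assumes `statusUrl` is, or has been resolved into, an absolute URL, because Node's `fetch` rejects relative URLs.

```typescript
// Hypothetical client-side poller for the 202-then-200 status contract.
// FetchLike matches the subset of the global fetch used here, so `fetch`
// can be passed directly and a fake can be injected in tests.
type FetchLike = (url: string) => Promise<{ status: number; json(): Promise<unknown> }>;

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function waitForInvoiceResult(
  fetchFn: FetchLike,
  statusUrl: string,
  intervalMs = 500,
  maxAttempts = 20,
): Promise<unknown> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const res = await fetchFn(statusUrl);
    if (res.status === 200) return res.json(); // processing finished
    if (res.status !== 202) {
      // 404 (unknown fileId) or any other status is not worth retrying.
      throw new Error(`Unexpected status ${String(res.status)} from ${statusUrl}`);
    }
    if (attempt < maxAttempts) await sleep(intervalMs); // still processing
  }
  throw new Error(`Invoice still processing after ${String(maxAttempts)} polls`);
}
```

With a server on a local port (the origin here is an assumption), usage could look like `await waitForInvoiceResult(fetch, new URL(body.statusUrl, "http://localhost:3000").toString())`. A fixed interval keeps the sketch simple. Longer-running parses might justify growing the interval between polls instead.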