Mistral AI Invoice Extraction for SMB Accounting

Automatically extract vendor, amount, and line items from invoices using Mistral Large, with cost-aware processing.

mistral invoice-extraction document-pipeline ai-accounting llm express budget-tracking human-in-the-loop

The problem

SMB accountants spend hours manually entering invoice data into accounting software. Errors and delays cost money.

Intro

In this tutorial you’ll build an Express.js server that accepts invoice uploads (PDF, images, DOCX), parses them into text, and extracts structured accounting data — vendor, line items, totals — using Mistral AI’s language model. Low-confidence extractions are automatically routed to a pending review queue for a human to confirm, and every LLM call is tracked against a configurable monthly budget cap so you never get a surprise bill. By the end you’ll have a working API you can test with real invoices.
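To make the budget cap idea concrete, here is a minimal sketch of a monthly spend tracker. It is not the recipe's actual budget module, which is not shown in this section. The class name `MonthlyBudget`, its methods, and the injectable clock are hypothetical; only the `BudgetExceededError` name is borrowed from the recipe's imports, and its shape here is assumed.

```typescript
// Hypothetical sketch of a monthly budget cap; names and behavior are assumptions.
class BudgetExceededError extends Error {
  constructor(attempted: number, cap: number) {
    super(`Monthly budget exceeded: ${attempted.toFixed(4)} would exceed cap ${cap.toFixed(2)}`);
    this.name = "BudgetExceededError";
  }
}

class MonthlyBudget {
  private readonly cap: number;
  private readonly now: () => Date;
  private month: string;
  private spent = 0;

  // `now` is injectable so tests can simulate a month boundary.
  constructor(cap: number, now: () => Date = () => new Date()) {
    this.cap = cap;
    this.now = now;
    this.month = this.currentMonth();
  }

  // "YYYY-MM" in UTC, used as the key for the current billing month.
  private currentMonth(): string {
    return this.now().toISOString().slice(0, 7);
  }

  // Reset the running total when the calendar month changes.
  private rollOver(): void {
    const month = this.currentMonth();
    if (month !== this.month) {
      this.month = month;
      this.spent = 0;
    }
  }

  // Record a cost; throws without recording if it would push spend past the cap.
  record(cost: number): void {
    this.rollOver();
    if (this.spent + cost > this.cap) {
      throw new BudgetExceededError(this.spent + cost, this.cap);
    }
    this.spent += cost;
  }

  spentThisMonth(): number {
    this.rollOver();
    return this.spent;
  }
}
```

In this sketch a charge that would cross the cap is rejected and not counted, so the caller can catch the error and route the invoice to manual review instead of calling the model.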

Prerequisites

Node.js >= 22
pnpm 10.x (the package.json specifies "packageManager": "pnpm@10.0.0")
A Mistral AI API key (sign up at console.mistral.ai)
A LlamaCloud API key (sign up at cloud.llamaindex.ai) — needed only for image-based invoices
Familiarity with TypeScript, Express.js, and async/await

Step 1: Scaffold the project

Start from an empty directory. Create the project structure and configuration files so TypeScript, Vitest, and ESLint are ready before you write any application code.

terminal

mkdir mistral-invoice-extraction && cd mistral-invoice-extraction
pnpm init

Open package.json and replace it with the following, pinning every version exactly:

Example artifact

A complete, working implementation of this recipe — downloadable as a zip or browsable file by file. Generated by our build pipeline; tested with full coverage before publishing.


115 kB·125 tests·99.7% coverage·vitest passing

SHA-256: 429147a7d1ee02e682f6f9d3e485fa4409c24580d69fb0563f0b12fb1a8956aa


// lib/memory-extractor.ts
import { MemoryExtractor, type ConversationTurn } from "@reaatech/agent-memory-extraction";
import type { InvoiceExtraction } from "../types.js";
import { InvoiceExtractionSchema } from "../types.js";
import { log, error } from "../logger.js";
import { createExtractorConfig } from "./llm-adapter.js";
import { getConfig } from "../config.js";

export async function extractInvoice(
  text: string,
  recordSpendFn?: (params: {
    requestId: string;
    model: string;
    inputTokens: number;
    outputTokens: number;
    cost: number;
  }) => void,
): Promise<{ extraction: InvoiceExtraction | null; confidence: number }> {
  try {
    const config = getConfig();
    const { llmProvider, embeddingProvider } = createExtractorConfig(config.MISTRAL_API_KEY);
    const extractor = new MemoryExtractor(llmProvider, embeddingProvider, {
      batchSize: 1,
      confidenceThreshold: 0.5,
      enabledTypes: ["FACT" as never],
      tenantId: "invoice-extraction",
    });

    // The whole invoice text is submitted as a single "user" turn.
    const turns: ConversationTurn[] = [
      { speaker: "user", content: text, timestamp: new Date() },
    ];
    const result = await extractor.extractFromConversation(turns);

    if (recordSpendFn) {
      // Spend is estimated, not metered: about 4 characters per input token,
      // a flat 200 output tokens, and hard-coded rates of 2.0 (input) and
      // 6.0 (output) per million tokens.
      const inputTokens = Math.ceil(text.length / 4);
      const outputTokens = 200;
      const inputCost = (inputTokens / 1_000_000) * 2.0;
      const outputCost = (outputTokens / 1_000_000) * 6.0;
      // Anything recordSpendFn throws is caught by the outer catch below.
      recordSpendFn({
        requestId: `extract-${String(Date.now())}`,
        model: config.MISTRAL_MODEL,
        inputTokens,
        outputTokens,
        cost: inputCost + outputCost,
      });
    }

    const candidateCount = result.candidates.length;
    if (candidateCount > 0) {
      // Return the first candidate that validates against the invoice schema.
      for (const candidate of result.candidates) {
        try {
          const content =
            typeof candidate.content === "string"
              ? candidate.content
              : JSON.stringify(candidate.content);
          let parsedContent: unknown;
          try {
            parsedContent = JSON.parse(content);
          } catch {
            // Not JSON: pass the raw string to the schema, which will reject it.
            parsedContent = content;
          }
          const extraction = InvoiceExtractionSchema.parse(parsedContent);
          return { extraction, confidence: result.confidence };
        } catch {
          continue;
        }
      }
    }

    log("[memory-extractor] No parseable candidates found");
    return { extraction: null, confidence: 0 };
  } catch (err) {
    error(`[memory-extractor] extraction failed: ${err instanceof Error ? err.message : String(err)}`);
    return { extraction: null, confidence: 0 };
  }
}
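The spend estimate inside extractInvoice can be pulled out into a small pure function, which makes the arithmetic easier to test and to re-price. The sketch below is one way to do that, not part of the recipe: `estimateCost`, `Pricing`, and `ASSUMED_PRICING` are hypothetical names, and the rates are simply the constants that appear in extractInvoice, which may not match current Mistral pricing.

```typescript
// Hypothetical helper; the rates mirror the hard-coded constants in extractInvoice.
interface Pricing {
  inputPerMillion: number;  // cost per 1,000,000 input tokens
  outputPerMillion: number; // cost per 1,000,000 output tokens
}

const ASSUMED_PRICING: Pricing = { inputPerMillion: 2.0, outputPerMillion: 6.0 };

interface CostEstimate {
  inputTokens: number;
  outputTokens: number;
  cost: number;
}

// Estimate cost from text length using the ~4 characters per token heuristic.
// outputTokens defaults to the flat 200 used in extractInvoice.
function estimateCost(
  text: string,
  outputTokens: number = 200,
  pricing: Pricing = ASSUMED_PRICING,
): CostEstimate {
  const inputTokens = Math.ceil(text.length / 4);
  const inputCost = (inputTokens / 1_000_000) * pricing.inputPerMillion;
  const outputCost = (outputTokens / 1_000_000) * pricing.outputPerMillion;
  return { inputTokens, outputTokens, cost: inputCost + outputCost };
}

// Example: a 4,000-character invoice is treated as 1,000 input tokens.
const sample = estimateCost("x".repeat(4000));
console.log(sample.inputTokens, sample.outputTokens, sample.cost.toFixed(4));
```

Because this is a heuristic, the recorded spend can drift from the real bill. If the provider response reports actual token usage, recording that instead would be more accurate.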

// lib/invoice-processor.ts
import type { ParsedDocument, InvoiceResult } from "../types.js";
import { log, error } from "../logger.js";
import { parseDocument } from "./document-parser.js";
import { extractInvoice } from "./memory-extractor.js";
import { assessConfidence } from "./confidence.js";
import { queueForHumanReview } from "./handoff.js";
import { recordSpend, BudgetExceededError } from "./budget.js";
import { addPending } from "./pending-store.js";
import { getConfig } from "../config.js";

export async function processInvoice(
  fileBuffer: Buffer,
  mimeType: string,
  fileId: string,
): Promise<InvoiceResult> {
  log(`[invoice-processor] start fileId=${fileId}`);
  try {
    const parsed: ParsedDocument = await parseDocument(fileBuffer, mimeType);

    // Only the extraction is used here; confidence is recomputed by assessConfidence.
    const { extraction } = await extractInvoice(parsed.text, recordSpend);

    if (extraction === null) {
      const result: InvoiceResult = {
        fileId,
        extraction: null,
        confidence: {
          score: 0,
          indicators: ["extraction_failed"],
          threshold: getConfig().CONFIDENCE_THRESHOLD,
        },
        reviewRequired: true,
      };
      log(`[invoice-processor] complete fileId=${fileId} (extraction failed)`);
      return result;
    }

    const confidence = assessConfidence(extraction);

    // Below-threshold extractions go to the human review queue and the pending store.
    if (confidence.score < confidence.threshold) {
      await queueForHumanReview(extraction, parsed.text, fileId, confidence);
      addPending({
        fileId,
        extraction,
        originalText: parsed.text,
        confidence,
        submittedAt: new Date(),
      });
    }

    const result: InvoiceResult = {
      fileId,
      extraction,
      confidence,
      reviewRequired: confidence.score < confidence.threshold,
    };
    log(`[invoice-processor] complete fileId=${fileId}`);
    return result;
  } catch (err) {
    // Every failure path returns a result flagged for review; nothing is rethrown.
    if (err instanceof BudgetExceededError) {
      error(`[invoice-processor] budget exceeded for fileId=${fileId}: ${err.message}`);
      return {
        fileId,
        extraction: null,
        confidence: {
          score: 0,
          indicators: ["budget_exceeded"],
          threshold: getConfig().CONFIDENCE_THRESHOLD,
        },
        reviewRequired: true,
      };
    }
    error(
      `[invoice-processor] error fileId=${fileId}: ${err instanceof Error ? err.message : String(err)}`,
    );
    return {
      fileId,
      extraction: null,
      confidence: {
        score: 0,
        indicators: ["processing_error"],
        threshold: getConfig().CONFIDENCE_THRESHOLD,
      },
      reviewRequired: true,
    };
  }
}
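The processor relies on assessConfidence to decide whether an invoice needs review, but that module is not shown in this section. As an illustration only, a rule-based check could look like the sketch below. The `InvoiceLike` shape, the penalty weights, and the indicator names are all hypothetical; the `{ score, indicators, threshold }` result shape and the `score < threshold` routing rule follow what processInvoice uses, and the 0.7 default is the fallback threshold that appears in the webhook route.

```typescript
// Hypothetical invoice shape; the recipe's real InvoiceExtraction type may differ.
interface LineItem {
  description: string;
  amount: number;
}

interface InvoiceLike {
  vendor: string;
  total: number;
  lineItems: LineItem[];
}

interface Confidence {
  score: number;
  indicators: string[];
  threshold: number;
}

// Start from full confidence and subtract a penalty for each failed sanity check.
function assessConfidenceSketch(invoice: InvoiceLike, threshold: number = 0.7): Confidence {
  const indicators: string[] = [];
  let score = 1;

  if (invoice.vendor.trim() === "") {
    indicators.push("missing_vendor");
    score -= 0.4;
  }

  if (invoice.lineItems.length === 0) {
    indicators.push("no_line_items");
    score -= 0.3;
  } else {
    // Allow one cent of rounding slack between the line items and the total.
    const sum = invoice.lineItems.reduce((acc, item) => acc + item.amount, 0);
    if (Math.abs(sum - invoice.total) > 0.01) {
      indicators.push("total_mismatch");
      score -= 0.4;
    }
  }

  return { score: Math.max(0, score), indicators, threshold };
}

// Same routing rule as processInvoice: review when the score is below the threshold.
function needsReview(confidence: Confidence): boolean {
  return confidence.score < confidence.threshold;
}
```

A real check would need to account for tax, discounts, and shipping before comparing line items with the total, so treat the mismatch rule as a placeholder for whatever reconciliation your invoices require.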

// Webhook router: accepts invoice uploads and serves processing status.
import { Router, type Request, type Response } from "express";
import multer from "multer";
import { randomUUID } from "node:crypto";
import { processInvoice } from "../lib/invoice-processor.js";
import { log, error } from "../logger.js";
import type { InvoiceResult } from "../types.js";

// Files are held in memory, capped at 10 MiB, and limited to PDF, PNG, JPEG, and DOCX.
const upload = multer({
  storage: multer.memoryStorage(),
  limits: { fileSize: 10 * 1024 * 1024 },
  fileFilter: (_req, file, cb) => {
    const allowed = [
      "application/pdf",
      "image/png",
      "image/jpeg",
      "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ];
    if (allowed.includes(file.mimetype)) {
      cb(null, true);
    } else {
      cb(new Error(`Unsupported MIME type: ${file.mimetype}`));
    }
  },
});

const webhookRouter = Router();

// In-memory status store: results are lost when the process restarts.
const processingResults = new Map<string, InvoiceResult | "processing">();

webhookRouter.post("/webhook", (req: Request, res: Response) => {
  upload.single("file")(req, res, (err) => {
    if (err) {
      if (err instanceof Error) {
        if ("code" in err && String(err.code) === "LIMIT_FILE_SIZE") {
          res.status(413).json({ error: "file_too_large" });
          return;
        }
        const msg = err.message;
        if (msg.startsWith("Unsupported MIME type")) {
          const mimeType = msg.replace("Unsupported MIME type: ", "");
          res.status(415).json({ error: "unsupported_type", mimeType });
          return;
        }
      }
      res.status(400).json({
        error: "upload_error",
        message: err instanceof Error ? err.message : String(err),
      });
      return;
    }

    const file = req.file;
    if (!file) {
      res.status(400).json({ error: "no_file" });
      return;
    }

    const fileId = randomUUID();
    processingResults.set(fileId, "processing");
    log(`[webhook] received file fileId=${fileId} mimeType=${file.mimetype}`);

    // Respond immediately with 202, then process in the background.
    res.status(202).json({ fileId, statusUrl: `/api/invoices/${fileId}` });

    setImmediate(() => {
      void processInvoice(file.buffer, file.mimetype, fileId)
        .then((result) => {
          processingResults.set(fileId, result);
        })
        .catch((processingErr: unknown) => {
          error(
            `[webhook] processing error fileId=${fileId}: ${processingErr instanceof Error ? processingErr.message : String(processingErr)}`,
          );
          // Note: this fallback hard-codes the threshold at 0.7 rather than reading the config.
          processingResults.set(fileId, {
            fileId,
            extraction: null,
            confidence: { score: 0, indicators: ["processing_error"], threshold: 0.7 },
            reviewRequired: true,
          });
        });
    });
  });
});

// Status lookup: 404 if unknown, 202 while processing, 200 with the result when done.
webhookRouter.get("/:fileId", (req: Request, res: Response) => {
  const fileId = req.params["fileId"] as string;
  const result = processingResults.get(fileId);
  if (!result) {
    res.status(404).json({ error: "not_found" });
    return;
  }
  if (result === "processing") {
    res.status(202).json({ status: "processing" });
    return;
  }
  res.json(result);
});

export default webhookRouter;
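Because the upload endpoint answers with 202 and a statusUrl before processing finishes, a client has to poll for the result. The sketch below shows one way to do that under the status codes used by the router above (202 while processing, 200 when done, anything else treated as an error). `pollInvoice`, `FetchLike`, and the option names are hypothetical, and the fetch function is passed in so the helper can be exercised without a running server.

```typescript
// Minimal response shape the poller needs; the global fetch in Node 22 satisfies it.
type FetchLike = (url: string) => Promise<{ status: number; json(): Promise<unknown> }>;

interface PollOptions {
  intervalMs?: number;  // delay between attempts
  maxAttempts?: number; // give up after this many requests
}

// Poll the status URL until the server returns 200, then resolve with the JSON body.
async function pollInvoice(
  baseUrl: string,
  statusUrl: string,
  fetchFn: FetchLike,
  options: PollOptions = {},
): Promise<unknown> {
  const intervalMs = options.intervalMs ?? 500;
  const maxAttempts = options.maxAttempts ?? 20;

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetchFn(baseUrl + statusUrl);
    if (res.status === 200) {
      return res.json();
    }
    if (res.status !== 202) {
      // For example, 404 when the fileId is unknown to the server.
      throw new Error(`Unexpected status ${String(res.status)} while polling ${statusUrl}`);
    }
    // Still processing: wait before the next attempt.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Timed out after ${String(maxAttempts)} attempts polling ${statusUrl}`);
}
```

Against a real server you would pass the global fetch as fetchFn together with the statusUrl from the upload response. The base URL depends on where the router is mounted and which port the server listens on, neither of which is shown in this section.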

Intro

Prerequisites

Step 1: Scaffold the project

Step 2: Install dependencies

Step 3: Set environment variables and config

Step 4: Define types and logger

Step 5: Build the document parser

Step 6: Build the Mistral LLM adapter and client

Step 7: Build extraction and review libraries

Step 8: Wire up the invoice processor

Step 9: Create API routes

Step 10: Create the main server

Step 11: Write and run the tests

Step 12: Run the server and try it

Next steps