Mistral AI Document Pipeline for Xero Expense Report Processing

Automatically extract line items from receipts and invoices, categorize expenses, and push them into Xero for seamless expense reporting.

mistral document-pipeline xero expense-reporting nextjs receipt-extraction structured-repair budget-enforcement

The problem

Small businesses spend hours manually entering receipt data into Xero, leading to data entry errors, delayed reimbursements, and lost tax deductions.

Intro

This tutorial walks you through building a document processing pipeline that extracts line items from PDF receipts and Excel expense sheets using Mistral AI, repairs malformed JSON output with a six-strategy repair engine, enforces daily spend budgets, and pushes validated expense data into Xero as ACCREC invoices. You’ll build each layer from scratch — schemas, extractors, planners, parsers, telemetry, budget enforcement, and the Next.js API routes that tie them together.

Prerequisites

Node.js 22+ and pnpm 10 installed
A Mistral AI API key — get one at https://console.mistral.ai
A Xero custom connection app with a client ID and client secret (or leave those values as placeholders if you only want to test the pipeline without pushing to Xero)
Basic familiarity with TypeScript, Next.js App Router, and Zod schemas
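
The later steps read these credentials from the environment: the parser uses MISTRAL_API_KEY, and the Xero client uses XERO_CLIENT_ID and XERO_CLIENT_SECRET. A minimal sketch of a startup check follows, assuming a hypothetical helper named checkEnv that is not part of the recipe's source. It reports which variables are missing and whether the Xero push can run, which matches the "test without Xero" option above.

```typescript
// Hypothetical helper, not part of the recipe's source.
// Takes the environment as a plain record so it can be tested without process.env.
type EnvSource = Record<string, string | undefined>;

interface EnvReport {
  missing: string[];
  canCallMistral: boolean;
  canPushToXero: boolean;
}

// Variable names taken from the parser and Xero client code in this recipe.
const MISTRAL_VARS = ["MISTRAL_API_KEY"];
const XERO_VARS = ["XERO_CLIENT_ID", "XERO_CLIENT_SECRET"];

// A variable counts as set only if it is a non-blank string.
function isSet(env: EnvSource, name: string): boolean {
  const value = env[name];
  return typeof value === "string" && value.trim().length > 0;
}

function checkEnv(env: EnvSource): EnvReport {
  const missing = [...MISTRAL_VARS, ...XERO_VARS].filter(
    (name) => !isSet(env, name),
  );
  return {
    missing,
    canCallMistral: MISTRAL_VARS.every((name) => isSet(env, name)),
    canPushToXero: XERO_VARS.every((name) => isSet(env, name)),
  };
}
```

In a Node.js process you would call checkEnv(process.env) once at startup and skip the Xero push when canPushToXero is false.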

Step 1: Scaffold the project and install dependencies

Start from an empty directory. Create the Next.js project, then install all dependencies at exact pinned versions.

terminal

npx create-next-app@latest . --typescript --app --src-dir

Example artifact

A complete, working implementation of this recipe — downloadable as a zip or browsable file by file. Generated by our build pipeline; tested with full coverage before publishing.

152 kB · 60 tests · 100.0% coverage · vitest passing

SHA-256: b95ce31833e9149e926f07597d93d8053804d8f6399a5fe26174f0610f866419
Step 6: Build the expense parser (Mistral AI)

import { Mistral } from "@mistralai/mistralai";
import { repair, UnrepairableError } from "@reaatech/structured-repair-core";
import { ExpenseDocumentSchema } from "../schemas/expense-schema.js";
import type { ExpenseDocument } from "../schemas/expense-schema.js";
import type { TelemetryContext } from "@reaatech/llm-cost-telemetry";

// Raised for any failure between the Mistral call and schema validation.
export class ParseError extends Error {
  code: string;
  rawInput: string;

  constructor(message: string, rawInput: string) {
    super(message);
    this.name = "ParseError";
    this.code = "PARSE_FAILED";
    this.rawInput = rawInput;
  }
}

const SYSTEM_PROMPT = `You extract expense line items from receipt/invoice text.
Return a JSON array of objects matching this schema:
{
  vendorName: string,
  date: string,
  totalAmount: number,
  currency: string,
  lineItems: [
    {
      itemDescription: string,
      quantity: number,
      unitAmount: number,
      taxType: string,
      lineAmount: number,
      category: string
    }
  ],
  receiptNumber?: string
}`;

export async function parseExpenses(
  extractedText: string,
  _telemetryContext?: TelemetryContext,
): Promise<ExpenseDocument[]> {
  const mistral = new Mistral({
    apiKey: process.env.MISTRAL_API_KEY ?? "",
  });
  void _telemetryContext;

  // Nothing to parse: skip the API call entirely.
  if (!extractedText || extractedText.trim().length === 0) {
    return [];
  }

  try {
    const result = await mistral.chat.complete({
      model: "mistral-large-latest",
      messages: [
        { role: "system", content: SYSTEM_PROMPT },
        { role: "user", content: extractedText },
      ],
      responseFormat: { type: "text" },
    });

    // Message content is not always a plain string; serialize anything else.
    const messageContent = result.choices[0]?.message?.content;
    const rawOutput =
      typeof messageContent === "string"
        ? messageContent
        : JSON.stringify(messageContent ?? "");

    try {
      // Repair the raw model output against the expense schema.
      const data = await repair(ExpenseDocumentSchema, rawOutput);
      if (Array.isArray(data)) {
        return data;
      }
      return [data];
    } catch (repairErr) {
      if (repairErr instanceof UnrepairableError) {
        throw new ParseError(
          "Could not parse Mistral output into expense schema",
          rawOutput,
        );
      }
      throw new ParseError(
        repairErr instanceof Error ? repairErr.message : "Unknown repair error",
        rawOutput,
      );
    }
  } catch (parseErr) {
    // Repair failures are already ParseErrors; wrap everything else (API errors).
    if (parseErr instanceof ParseError) throw parseErr;
    const message =
      parseErr instanceof Error ? parseErr.message : "Unknown Mistral API error";
    throw new ParseError(message, extractedText);
  }
}
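
The parser hands the raw model output to repair from @reaatech/structured-repair-core. To illustrate the kind of cleanup such a step has to do, here is a hand-rolled sketch that is not the library's algorithm and does not reproduce its six strategies. The function name looseJsonParse is hypothetical. It handles three common failure modes: markdown code fences around the JSON, chatty text before or after it, and trailing commas.

```typescript
// Hypothetical helper for illustration only; NOT how structured-repair-core works.

// Matches a markdown fence marker (three backticks), optionally tagged "json".
const FENCE = new RegExp("`{3}(?:json)?", "gi");

function looseJsonParse(raw: string): unknown {
  // 1. Drop markdown fence markers the model may have wrapped around the JSON.
  let text = raw.replace(FENCE, "").trim();

  // 2. Keep only the span from the first opening bracket to the last closing one,
  //    discarding any prose before or after the payload.
  const start = text.search(/[\[{]/);
  const end = Math.max(text.lastIndexOf("]"), text.lastIndexOf("}"));
  if (start === -1 || end <= start) {
    throw new SyntaxError("No JSON object or array found in model output");
  }
  text = text.slice(start, end + 1);

  // 3. Remove trailing commas before a closing bracket.
  //    Naive: this would also alter a string value that contains ",}" or ",]".
  text = text.replace(/,\s*([\]}])/g, "$1");

  return JSON.parse(text);
}
```

The result is still untyped, so it would need to go through schema validation (for example the Zod schema from Step 2) before anything downstream trusts it.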

Step 8: Build the budget enforcer

import { BudgetController } from "@reaatech/agent-budget-engine";
import { BudgetScope } from "@reaatech/agent-budget-types";

// Lazily created singleton shared by all helpers below.
let controller: BudgetController | null = null;

// Placeholder store: records nothing and always reports zero spend.
export class InMemorySpendStore {
  record(): void {}
  getSpend(): number {
    return 0;
  }
  reset(): void {}
}

export function getBudgetController(): BudgetController {
  if (!controller) {
    controller = new BudgetController({
      spendTracker: new InMemorySpendStore() as never,
    });
  }
  return controller;
}

export function initializeBudget(dailyLimit: number): void {
  const ctrl = getBudgetController();
  ctrl.defineBudget({
    scopeType: BudgetScope.User,
    scopeKey: "default",
    limit: dailyLimit,
    policy: { softCap: 0.8, hardCap: 1.0 },
  });
}

// Throws when the estimated cost is not allowed; otherwise returns the check result.
export function checkBudget(
  estimatedCost: number,
): { allowed: boolean; remaining: number; suggestedModel?: string } {
  const ctrl = getBudgetController();
  const result = ctrl.check({
    scopeType: BudgetScope.User,
    scopeKey: "default",
    estimatedCost,
    modelId: "mistral-large-latest",
    tools: [],
  });
  if (!result.allowed) {
    throw new Error("Budget exceeded");
  }
  return {
    allowed: result.allowed,
    remaining: result.remaining,
    suggestedModel: result.suggestedModel,
  };
}

export function recordSpend(
  cost: number,
  inputTokens: number,
  outputTokens: number,
): void {
  const ctrl = getBudgetController();
  ctrl.record({
    requestId: generateFallbackId(),
    scopeType: BudgetScope.User,
    scopeKey: "default",
    cost,
    inputTokens,
    outputTokens,
    modelId: "mistral-large-latest",
    provider: "mistral",
    timestamp: new Date(),
  });
}

export function getBudgetStatus(): {
  spent: number;
  remaining: number;
  state: string;
} {
  const ctrl = getBudgetController();
  const state = ctrl.getState(BudgetScope.User, "default");
  return {
    spent: state?.spent ?? 0,
    remaining: state?.remaining ?? 0,
    state: state?.state ?? "Active",
  };
}

// Timestamp plus a short random suffix; good enough for request correlation.
function generateFallbackId(): string {
  return `req_${String(Date.now())}_${Math.random().toString(36).slice(2, 9)}`;
}
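
The InMemorySpendStore above is a no-op stub whose getSpend() always returns 0. As a rough illustration of what daily tracking with a soft cap and a hard cap could look like, here is a standalone sketch. The class DailySpendStore, its method signatures, and the state names "Warned" and "Exceeded" are assumptions made for this example; they are not the interface that BudgetController expects from a spend tracker.

```typescript
// Hypothetical standalone store; NOT the spendTracker interface of agent-budget-engine.
type BudgetState = "Active" | "Warned" | "Exceeded"; // state names are assumptions

interface CheckResult {
  allowed: boolean;
  remaining: number;
  state: BudgetState;
}

class DailySpendStore {
  private readonly spentByDay = new Map<string, number>();
  private readonly dailyLimit: number;
  private readonly softCap: number;
  private readonly hardCap: number;

  constructor(dailyLimit: number, softCap = 0.8, hardCap = 1.0) {
    this.dailyLimit = dailyLimit;
    this.softCap = softCap;
    this.hardCap = hardCap;
  }

  // Buckets spend by UTC calendar day, e.g. "2025-01-15".
  private static dayKey(at: Date): string {
    return at.toISOString().slice(0, 10);
  }

  getSpend(at: Date = new Date()): number {
    return this.spentByDay.get(DailySpendStore.dayKey(at)) ?? 0;
  }

  record(cost: number, at: Date = new Date()): void {
    this.spentByDay.set(DailySpendStore.dayKey(at), this.getSpend(at) + cost);
  }

  // Classifies a request by the spend it would reach if it went ahead.
  check(estimatedCost: number, at: Date = new Date()): CheckResult {
    const spent = this.getSpend(at);
    const hardLimit = this.dailyLimit * this.hardCap;
    const softLimit = this.dailyLimit * this.softCap;
    const projected = spent + estimatedCost;
    const remaining = Math.max(0, hardLimit - spent);

    if (projected > hardLimit) {
      return { allowed: false, remaining, state: "Exceeded" };
    }
    if (projected >= softLimit) {
      return { allowed: true, remaining, state: "Warned" };
    }
    return { allowed: true, remaining, state: "Active" };
  }
}
```

Keying by UTC day means the budget resets at midnight UTC rather than local midnight, and the map lives in process memory, so a restart clears it. A deployment with multiple instances would need a shared store instead.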

Step 9: Build the Xero client

import { XeroClient, Invoice } from "xero-node";
import type { ExpenseDocument } from "../schemas/expense-schema.js";

export class XeroPushError extends Error {
  code: string;
  invoiceIndex: number;

  constructor(message: string, invoiceIndex: number) {
    super(message);
    this.name = "XeroPushError";
    this.code = "XERO_PUSH_FAILED";
    this.invoiceIndex = invoiceIndex;
  }
}

// Cached client so the token is requested once per process.
let xeroClient: XeroClient | null = null;

export async function initXeroClient(): Promise<XeroClient> {
  if (xeroClient) return xeroClient;
  const client = new XeroClient({
    clientId: process.env.XERO_CLIENT_ID ?? "",
    clientSecret: process.env.XERO_CLIENT_SECRET ?? "",
    grantType: "client_credentials",
  });
  await client.getClientCredentialsToken();
  xeroClient = client;
  return client;
}

export async function pushExpensesToXero(
  documents: ExpenseDocument[],
): Promise<{ success: boolean; invoiceIds: string[] }> {
  if (documents.length === 0) {
    return { success: false, invoiceIds: [] };
  }

  const invoiceIds: string[] = [];

  for (let i = 0; i < documents.length; i++) {
    const doc = documents[i];
    try {
      const client = await initXeroClient();
      const invoice = {
        // In Xero, ACCREC is a sales invoice and ACCPAY is a supplier bill.
        type: Invoice.TypeEnum.ACCREC,
        contact: { name: doc.vendorName },
        lineItems: doc.lineItems.map((li) => ({
          description: li.itemDescription,
          quantity: li.quantity,
          unitAmount: li.unitAmount,
          // Every line is posted to a single hard-coded account code.
          accountCode: "500" as const,
          taxType: li.taxType,
          lineAmount: li.lineAmount,
        })),
        date: doc.date,
        dueDate: doc.date,
        reference: doc.receiptNumber ?? doc.vendorName,
        status: Invoice.StatusEnum.AUTHORISED,
      };

      // Custom connections are bound to one organisation, so the tenant ID is empty.
      const response = await client.accountingApi.createInvoices("", {
        invoices: [invoice],
      });
      const createdId = response.body.invoices?.[0]?.invoiceID;
      if (createdId) {
        invoiceIds.push(createdId);
      }
    } catch (err) {
      // A failed document is logged and skipped; the rest of the batch continues.
      const message = err instanceof Error ? err.message : "Xero API error";
      console.warn(`Failed to push invoice ${String(i)}: ${message}`);
    }
  }

  return { success: invoiceIds.length > 0, invoiceIds };
}
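
pushExpensesToXero sends whatever amounts the model extracted. One inexpensive safeguard before the push is an arithmetic check on each document. The sketch below assumes a hypothetical helper named reconcile and assumes that lineAmount and totalAmount are on the same tax basis (both inclusive or both exclusive), which the recipe does not state. It works in integer cents to avoid floating-point drift.

```typescript
// Hypothetical pre-push check; field names follow the schema in the system prompt.
interface LineItemAmounts {
  quantity: number;
  unitAmount: number;
  lineAmount: number;
}

interface ExpenseAmounts {
  totalAmount: number;
  lineItems: LineItemAmounts[];
}

// Convert a decimal amount to integer cents so comparisons are exact.
function toCents(amount: number): number {
  return Math.round(amount * 100);
}

// Returns a list of human-readable problems; an empty list means the document adds up.
function reconcile(doc: ExpenseAmounts, toleranceCents = 1): string[] {
  const problems: string[] = [];
  let sumCents = 0;

  doc.lineItems.forEach((item, index) => {
    const expected = toCents(item.quantity * item.unitAmount);
    const actual = toCents(item.lineAmount);
    if (Math.abs(expected - actual) > toleranceCents) {
      problems.push(
        `lineItems[${index}]: quantity x unitAmount is ${expected / 100}, lineAmount is ${actual / 100}`,
      );
    }
    sumCents += actual;
  });

  // Assumption: totalAmount and lineAmount share the same tax basis.
  const totalCents = toCents(doc.totalAmount);
  if (Math.abs(sumCents - totalCents) > toleranceCents) {
    problems.push(
      `line items sum to ${sumCents / 100}, totalAmount is ${totalCents / 100}`,
    );
  }

  return problems;
}
```

A caller could skip or flag any document for which reconcile returns a non-empty list instead of creating an authorised invoice from it.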

Step 11: Build the upload UI

"use client";

import { useState, type SyntheticEvent } from "react";

export default function Home() {
  const [status, setStatus] = useState<string>("idle");
  const [responseData, setResponseData] = useState<string>("");
  const [error, setError] = useState<string>("");

  function handleSubmit(e: SyntheticEvent<HTMLFormElement>) {
    e.preventDefault();
    setStatus("uploading");
    setError("");
    setResponseData("");

    const form = e.currentTarget;
    const formData = new FormData(form);

    fetch("/api/process-expense", {
      method: "POST",
      body: formData,
    })
      .then(async (res) => {
        const data: unknown = await res.json();
        setResponseData(JSON.stringify(data, null, 2));
        setStatus(res.ok ? "completed" : "failed");
      })
      .catch((err: unknown) => {
        setError(err instanceof Error ? err.message : "Unknown error");
        setStatus("failed");
      });
  }

  return (
    <main
      style={{
        maxWidth: 720,
        margin: "2rem auto",
        fontFamily: "system-ui, sans-serif",
        padding: "0 1rem",
      }}
    >
      <h1>Mistral AI ⇢ Xero Expense Pipeline</h1>
      <p>
        Upload a PDF receipt or Excel expense sheet to extract line items and
        push them to Xero.
      </p>
      <form onSubmit={handleSubmit} style={{ marginTop: "1.5rem" }}>
        <input
          type="file"
          name="file"
          accept=".pdf,.xlsx,.xls"
          required
          disabled={status === "uploading"}
          style={{ display: "block", marginBottom: "1rem" }}
        />
        <button type="submit" disabled={status === "uploading"}>
          {status === "uploading" ? "Processing..." : "Process Expense"}
        </button>
      </form>
      {error && <p style={{ color: "red", marginTop: "1rem" }}>{error}</p>}
      {responseData && (
        <pre
          style={{
            background: "#f4f4f4",
            padding: "1rem",
            borderRadius: 4,
            overflow: "auto",
            marginTop: "1rem",
            fontSize: "0.85rem",
          }}
        >
          {responseData}
        </pre>
      )}
    </main>
  );
}

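The accept attribute on the file input is only a hint to the browser's file picker, so the API route should repeat the check on the server. A minimal sketch of such a guard follows, assuming a hypothetical helper named classifyUpload. It looks only at the file extension and does not inspect file contents.

```typescript
// Hypothetical helper mirroring the extensions in the form's accept attribute.
type DocumentKind = "pdf" | "spreadsheet";

// Returns the kind of document to extract, or null if the file should be rejected.
function classifyUpload(fileName: string): DocumentKind | null {
  const dot = fileName.lastIndexOf(".");
  // No extension, or a dotfile such as ".pdf" with no base name.
  if (dot <= 0) return null;

  const extension = fileName.slice(dot).toLowerCase();
  if (extension === ".pdf") return "pdf";
  if (extension === ".xlsx" || extension === ".xls") return "spreadsheet";
  return null;
}
```

A route handler could call classifyUpload(file.name) on the uploaded File and return a 400 response when the result is null, then use the returned kind to choose between the PDF and spreadsheet extractors.
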
Intro

Prerequisites

Step 1: Scaffold the project and install dependencies

Step 2: Define the expense Zod schemas

Step 3: Create shared processing types

Step 4: Build the document extractor (PDF + XLSX)

Step 5: Build the context-window planner

Step 6: Build the expense parser (Mistral AI)

Step 7: Build the cost telemetry service

Step 8: Build the budget enforcer

Step 9: Build the Xero client

Step 10: Create the API route handlers

Step 11: Build the upload UI

Step 12: Create barrel exports and run the tests

Next steps