xAI Grok Invoice Extraction for QuickBooks SMBs

Extract structured invoice data from PDFs using xAI Grok, with automatic repair of malformed LLM output and cost monitoring.

xai-grok invoice-extraction quickbooks document-pipeline structured-repair cost-telemetry idempotency express

The problem

Small businesses receive hundreds of PDF invoices that must be entered into QuickBooks by hand. Existing OCR tools are brittle and expensive, and LLMs sometimes return JSON that does not parse.

Built from

Intro

This tutorial walks you through building an invoice extraction pipeline with Next.js and xAI Grok. You'll build an API endpoint that accepts a PDF invoice, parses it with Unstructured.io, extracts structured fields using Grok, repairs any malformed LLM output with structured-repair-core, tracks LLM spend per tenant with agent-budget-engine, and pushes the final invoice to QuickBooks Online. The pipeline is idempotent: a duplicate upload with the same idempotency key returns the cached result instead of charging you twice.
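To make the idempotency guarantee concrete, here is a minimal in-memory sketch of the idea. It is not the API of @reaatech/idempotency-middleware; the class name `IdempotencyCache` and its `run` method are hypothetical, and a production setup would presumably use a shared store rather than a process-local map.

```typescript
// Hypothetical helper for illustration only; not the idempotency middleware's API.
type Work<T> = () => Promise<T>;

export class IdempotencyCache<T> {
  // Holds the in-flight or settled promise for each key, so duplicates share one run.
  private readonly entries = new Map<string, Promise<T>>();

  run(key: string, work: Work<T>): Promise<T> {
    const existing = this.entries.get(key);
    if (existing) {
      // Same key seen before: return the cached result without calling work again.
      return existing;
    }

    const pending = work().catch((err: unknown) => {
      // Forget failed runs so the caller can retry with the same key.
      this.entries.delete(key);
      throw err;
    });
    this.entries.set(key, pending);
    return pending;
  }
}
```

With this shape, calling `cache.run(key, () => extractInvoice(file))` twice with the same key invokes the extraction once (`extractInvoice` is a placeholder name). Sharing the pending promise is a design choice of this sketch; a middleware could just as well reject a concurrent duplicate with a conflict error.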

This is for TypeScript developers who want a copy-paste-along recipe combining LLM extraction, repair, cost governance, and a third-party accounting API.
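The repair step is the least familiar part of that combination, so a rough sketch may help. The function below is an assumption-laden illustration of what "repairing malformed LLM output" can mean: it drops chatter or Markdown fencing around the object and removes trailing commas before parsing. The name `repairJson` is hypothetical, and structured-repair-core's real API and strategy are not shown in this section.

```typescript
// Hypothetical sketch; structured-repair-core may work quite differently.
export function repairJson(raw: string): unknown {
  // 1. Keep only the outermost {...} span. This drops surrounding chatter and
  //    any Markdown code fence the model wrapped around the object.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new SyntaxError("No JSON object found in model output");
  }
  let text = raw.slice(start, end + 1);

  // 2. Remove trailing commas before a closing brace or bracket.
  //    Caveat: this naive pattern would also touch a ",}" inside a string value.
  text = text.replace(/,\s*([}\]])/g, "$1");

  // 3. Parse; anything still malformed surfaces as a SyntaxError.
  return JSON.parse(text);
}
```

The result is typed `unknown` on purpose: in a pipeline like this one, the parsed value would still be validated against the invoice schema (for example with Zod) before anything is sent to QuickBooks.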

Prerequisites

Node.js 22+ and pnpm 10 installed
xAI API key — sign up at console.x.ai and create a key
Unstructured.io API key — sign up at unstructured.io and get a free API key
QuickBooks Online sandbox account — create one at developer.intuit.com and note the OAuth credentials (consumer key, consumer secret, OAuth token, realm ID, refresh token)
Familiarity with Next.js App Router, TypeScript, and Zod

Step 1: Scaffold the project and configure environment variables

Start by creating a Next.js App Router project. Pin every dependency to an exact version so the build is reproducible.
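For the environment variables, one option is to validate them once at startup so a missing key fails fast rather than midway through a request. The sketch below is dependency-free, and every variable name in it is an illustrative assumption based on the prerequisites list, not a name taken from the recipe's repository.

```typescript
// Variable names are assumptions for illustration; match them to your own .env file.
const REQUIRED_ENV = [
  "XAI_API_KEY",
  "UNSTRUCTURED_API_KEY",
  "QBO_CONSUMER_KEY",
  "QBO_CONSUMER_SECRET",
  "QBO_OAUTH_TOKEN",
  "QBO_REALM_ID",
  "QBO_REFRESH_TOKEN",
] as const;

type EnvKey = (typeof REQUIRED_ENV)[number];
export type AppEnv = Record<EnvKey, string>;

export function loadEnv(source: Record<string, string | undefined>): AppEnv {
  // Treat unset and whitespace-only values as missing.
  const missing = REQUIRED_ENV.filter((key) => !source[key]?.trim());
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }

  const env = {} as AppEnv;
  for (const key of REQUIRED_ENV) {
    env[key] = source[key] as string;
  }
  return env;
}
```

Calling `loadEnv(process.env)` during startup would be the typical use. Since the project already depends on Zod, the same check could be expressed as a Zod object schema instead.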

Example artifact

A complete, working implementation of this recipe — downloadable as a zip or browsable file by file. Generated by our build pipeline; tested with full coverage before publishing.

165 kB · 83 tests · 98.9% coverage · vitest passing

SHA-256: 8a84aa9352d3746dfbee200d4ea10202af2477372184c66694223a11889851e2

// src/services/budget-tracker.ts
import { SpendStore } from "@reaatech/agent-budget-spend-tracker";
import { BudgetController } from "@reaatech/agent-budget-engine";
import { BudgetScope } from "@reaatech/agent-budget-types";
import { generateId, now } from "@reaatech/llm-cost-telemetry";
import { calculateCost } from "@reaatech/llm-cost-telemetry-calculator";

export class BudgetExceededError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "BudgetExceededError";
  }
}

export function createBudgetEngine(): { store: SpendStore; controller: BudgetController } {
  const store = new SpendStore({ maxEntries: 500_000 });
  const controller = new BudgetController({ spendTracker: store });

  controller.on("threshold-breach", (event) => {
    console.warn(`Budget threshold breached: ${JSON.stringify(event)}`);
  });
  controller.on("hard-stop", (event) => {
    console.error(`Budget hard stop reached: ${JSON.stringify(event)}`);
  });

  return { store, controller };
}

export function configureTenantBudget(
  controller: BudgetController,
  tenantId: string,
  limitUsd: number,
): void {
  controller.defineBudget({
    scopeType: BudgetScope.User,
    scopeKey: tenantId,
    limit: limitUsd,
    policy: {
      softCap: 0.8,
      hardCap: 1.0,
    },
  });
}

export function checkPreCallBudget(
  controller: BudgetController,
  tenantId: string,
  estimatedCost: number,
): { allowed: boolean; action: string; suggestedModel: string | null } {
  const result = controller.check({
    scopeType: BudgetScope.User,
    scopeKey: tenantId,
    estimatedCost,
    modelId: "grok-2-latest",
    tools: [],
  });
  return {
    allowed: result.allowed,
    action: result.action,
    suggestedModel: result.suggestedModel ?? null,
  };
}

export function recordSpend(
  controller: BudgetController,
  tenantId: string,
  costUsd: number,
  inputTokens: number,
  outputTokens: number,
): void {
  controller.record({
    requestId: generateId(),
    scopeType: BudgetScope.User,
    scopeKey: tenantId,
    cost: costUsd,
    inputTokens,
    outputTokens,
    modelId: "grok-2-latest",
    provider: "openai",
    timestamp: now(),
  });
}

export function calculateInvoiceCost(
  modelId: string,
  inputTokens: number,
  outputTokens: number,
): number {
  const result = calculateCost({
    provider: "openai",
    model: modelId,
    inputTokens,
    outputTokens,
  });
  return result.costUsd;
}
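The budget tracker configures `softCap: 0.8` and `hardCap: 1.0` but leaves the evaluation to the library. The sketch below shows one plausible reading of those numbers as fractions of the tenant limit. The semantics here are an assumption for illustration: `evaluateBudget` and `BudgetAction` are hypothetical names, and agent-budget-engine may compute its decision differently.

```typescript
// Assumed semantics for illustration; not the behavior of agent-budget-engine.
export type BudgetAction = "allow" | "warn" | "block";

export interface BudgetPolicy {
  softCap: number; // fraction of the limit at which to start warning, e.g. 0.8
  hardCap: number; // fraction of the limit beyond which calls are blocked, e.g. 1.0
}

export function evaluateBudget(
  spentUsd: number,
  estimatedCostUsd: number,
  limitUsd: number,
  policy: BudgetPolicy,
): BudgetAction {
  // A non-positive limit leaves no room for any spend.
  if (limitUsd <= 0) return "block";

  // Project usage as if the upcoming call had already been paid for.
  const projected = (spentUsd + estimatedCostUsd) / limitUsd;
  if (projected > policy.hardCap) return "block";
  if (projected >= policy.softCap) return "warn";
  return "allow";
}
```

Under these assumptions, a tenant with a 10 USD limit who has spent 8 USD gets a warning on a 0.50 USD call, and a block once the projected total would exceed 10 USD. Checking before the call, with an estimated cost, is what lets the route return a budget error without paying for the request first.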

// app/api/invoices/route.ts
import { promises as fs } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { type NextRequest, NextResponse } from "next/server";
import { IdempotencyError } from "@reaatech/idempotency-middleware";
import { processInvoice } from "../../../src/pipeline/process-invoice.js";
import { BudgetExceededError } from "../../../src/services/budget-tracker.js";

export async function POST(req: NextRequest): Promise<NextResponse> {
  let tempPath: string | null = null;
  try {
    const formData = await req.formData();
    // formData.get() can also return a string, so accept only real file uploads.
    const entry = formData.get("file");
    const file = entry instanceof File ? entry : null;
    // Header names are case-insensitive, so a single lookup covers every casing.
    const idempotencyKey = req.headers.get("Idempotency-Key");

    if (!file) {
      return NextResponse.json({ error: "Missing file" }, { status: 400 });
    }
    if (!idempotencyKey) {
      return NextResponse.json({ error: "Missing Idempotency-Key header" }, { status: 400 });
    }

    const tenantId = req.nextUrl.searchParams.get("tenantId") ?? "default";

    // The key and file name are client-controlled: replace any character that
    // could escape the temp directory before building the path.
    const safeName = `${idempotencyKey}-${file.name}`.replace(/[^A-Za-z0-9._-]/g, "_");
    tempPath = join(tmpdir(), safeName);
    await fs.writeFile(tempPath, Buffer.from(await file.arrayBuffer()));

    const result = await processInvoice({ filePath: tempPath, tenantId, idempotencyKey });

    if (result.success) {
      return NextResponse.json(
        {
          success: true,
          invoice: result.invoice,
          cost: result.cost,
          processingTimeMs: result.processingTimeMs,
        },
        { status: 200 },
      );
    }
    return NextResponse.json({ success: false, error: result.error }, { status: 500 });
  } catch (err) {
    if (err instanceof IdempotencyError && err.getStatusCode() === 409) {
      return NextResponse.json({ error: "Duplicate request", code: "CONFLICT" }, { status: 409 });
    }
    if (err instanceof BudgetExceededError) {
      return NextResponse.json({ error: err.message, code: "BUDGET_EXCEEDED" }, { status: 402 });
    }
    return NextResponse.json({ error: "Internal server error" }, { status: 500 });
  } finally {
    // Always remove the temp file, whether processing succeeded or failed.
    if (tempPath) {
      await fs.unlink(tempPath).catch(() => {});
    }
  }
}

export function GET(): NextResponse {
  return NextResponse.json({ status: "ok", service: "xai-grok-invoice-extraction" });
}
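From the client side, the handler above expects three things: a multipart `file` field, an `Idempotency-Key` header, and an optional `tenantId` query parameter. A small helper that assembles such a request could look like the sketch below. The helper name `buildInvoiceRequest` is hypothetical, and `http://localhost:3000` in the usage note is an assumption (the usual Next.js dev server address).

```typescript
// Hypothetical client helper; it only assembles the request and performs no network call.
export interface InvoiceRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: FormData };
}

export function buildInvoiceRequest(
  baseUrl: string,
  pdf: Blob,
  fileName: string,
  idempotencyKey: string,
  tenantId?: string,
): InvoiceRequest {
  const url = new URL("/api/invoices", baseUrl);
  // Without tenantId, the handler falls back to the "default" tenant.
  if (tenantId) url.searchParams.set("tenantId", tenantId);

  // The handler reads the upload from the multipart field named "file".
  const body = new FormData();
  body.append("file", pdf, fileName);

  return {
    url: url.toString(),
    // Do not set Content-Type by hand: fetch adds the multipart boundary itself.
    init: { method: "POST", headers: { "Idempotency-Key": idempotencyKey }, body },
  };
}
```

A caller would then run `const { url, init } = buildInvoiceRequest("http://localhost:3000", pdfBlob, "invoice.pdf", key, "acme")` followed by `await fetch(url, init)`, reusing the same `key` on retries. Per the handler, the response is 200 on success, 400 for a missing file or key, 402 when the budget is exceeded, 409 for a conflicting duplicate, and 500 otherwise.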

Intro

Prerequisites

Step 1: Scaffold the project and configure environment variables

Step 2: Define the invoice and pipeline types with Zod

Step 3: Build the PDF processor with Unstructured.io and sharp

Step 4: Build the LLM extractor with fallback and repair

Step 5: Build the budget tracker with per-tenant spend control

Step 6: Build the QuickBooks integration

Step 7: Build the pipeline orchestrator with idempotency

Step 8: Build the API route handler

Step 9: Run the tests

Step 10: Try the API with curl

Next steps