Vertex AI Invoice Extraction for SMB Accounting

Turn stacks of invoices and receipts into clean QuickBooks transactions with Vertex AI document parsing and structured repair, so manual work shrinks to reviewing the items the pipeline flags.

vertex-ai invoice-extraction document-pipeline quickbooks gemini structured-repair confidence-router cost-telemetry typescript nextjs

The problem

Small business owners waste hours each week manually entering invoice data into QuickBooks. Off-the-shelf OCR tools produce messy, unstructured text that still requires fixing, and generic AI pipelines fail when formats vary.

Intro

This recipe turns stacks of PDF invoices and receipts into clean QuickBooks transactions using Google’s Gemini 2.5 Flash on Vertex AI. You’ll build a complete document pipeline that extracts structured data from PDF invoices, repairs malformed LLM output, routes high-confidence fields straight to QuickBooks, flags low-confidence items for human review, and tracks per-document processing costs. By the end you’ll have a Next.js API route and a CLI batch processor that both feed the same extraction pipeline.

Prerequisites

Node.js 22+ and pnpm 10 installed
A Google Cloud Platform project with the Vertex AI API enabled
A service-account JSON key file downloaded to your machine
(Optional) A webhook endpoint that accepts QuickBooks-style transaction payloads
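
It can help to fail fast when credentials or project settings are missing. The sketch below is a hypothetical startup check, not part of the recipe: `readPipelineEnv`, `PipelineEnv`, and `EnvCheck` are illustrative names. It reads the variables the API route uses (GOOGLE_CLOUD_PROJECT, GOOGLE_CLOUD_LOCATION, QUICKBOOKS_WEBHOOK_URL, QUICKBOOKS_API_TOKEN), plus GOOGLE_APPLICATION_CREDENTIALS, which Application Default Credentials uses to locate a service-account key file. Requiring that last variable is an assumption that you authenticate with the downloaded key file.

```typescript
// Hypothetical startup check. Variable names match the ones the API route reads,
// plus GOOGLE_APPLICATION_CREDENTIALS (assumed: you authenticate with a key file).
type Env = Record<string, string | undefined>;

export interface PipelineEnv {
  project: string;
  location: string;
  credentialsPath: string;
  // Optional, like the QuickBooks webhook in the prerequisites.
  quickbooksWebhookUrl?: string;
  quickbooksApiToken?: string;
}

export type EnvCheck = { ok: true; value: PipelineEnv } | { ok: false; missing: string[] };

const REQUIRED = ["GOOGLE_CLOUD_PROJECT", "GOOGLE_CLOUD_LOCATION", "GOOGLE_APPLICATION_CREDENTIALS"];

export function readPipelineEnv(env: Env): EnvCheck {
  // Unset and empty-string values are both treated as missing.
  const missing = REQUIRED.filter((name) => !env[name]);
  if (missing.length > 0) return { ok: false, missing };
  return {
    ok: true,
    value: {
      project: env["GOOGLE_CLOUD_PROJECT"] as string,
      location: env["GOOGLE_CLOUD_LOCATION"] as string,
      credentialsPath: env["GOOGLE_APPLICATION_CREDENTIALS"] as string,
      quickbooksWebhookUrl: env["QUICKBOOKS_WEBHOOK_URL"] || undefined,
      quickbooksApiToken: env["QUICKBOOKS_API_TOKEN"] || undefined,
    },
  };
}
```

Call it as `readPipelineEnv(process.env)` at startup. If you authenticate with `gcloud auth application-default login` instead of a key file, drop GOOGLE_APPLICATION_CREDENTIALS from the required list.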

Step 1: Scaffold the project, configure vitest, and install dependencies

Start with a fresh Next.js 16 project. The App Router lives at the project root under app/, and service code is organized under src/.

terminal

npx create-next-app@latest vertex-ai-invoice-extraction --typescript --app --use-pnpm
cd vertex-ai-invoice-extraction

Create vitest.config.ts at the project root. This sets up the @ path alias that the API route uses for its imports, and configures the 90% coverage thresholds across lines, branches, functions, and statements:
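
The config file itself is not reproduced on this page. A minimal sketch that matches the description might look like the following; it assumes `@vitest/coverage-v8` is installed as a dev dependency and that `@` should map to the project root, because the route imports paths such as `@/src/services/vertex-extractor`.

```typescript
// vitest.config.ts: a sketch, assuming @vitest/coverage-v8 is installed as a dev dependency.
import { dirname } from "node:path";
import { fileURLToPath } from "node:url";
import { defineConfig } from "vitest/config";

const rootDir = dirname(fileURLToPath(import.meta.url));

export default defineConfig({
  resolve: {
    // "@/src/services/..." resolves to <project root>/src/services/...
    alias: { "@": rootDir },
  },
  test: {
    environment: "node",
    coverage: {
      provider: "v8",
      // The run fails if any of the four metrics drops below 90%.
      thresholds: { lines: 90, branches: 90, functions: 90, statements: 90 },
    },
  },
});
```

The thresholds are only checked when coverage is enabled, for example with `vitest run --coverage`.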

Example artifact

A complete, working implementation of this recipe, downloadable as a zip or browsable file by file. It was generated by our build pipeline and tested with vitest before publishing.

163 kB · 89 tests · 96.6% coverage · vitest passing

SHA-256: c3d90f76c7800b0e6b671b2f76e54b8be3282ea3318b7fa0a10098aa2572d112

Step 4: Build the Vertex AI extraction service

src/services/vertex-extractor.ts

import { GoogleGenAI } from "@google/genai";
import pdf from "../lib/pdf-adapter.js";
import { InvoiceExtractionSchema, type InvoiceExtraction } from "../schemas/invoice.js";

// Error carrying a machine-readable code: API_ERROR, JSON_PARSE_ERROR, or VALIDATION_FAILED.
export class ExtractionError extends Error {
  code: string;

  constructor(message: string, code: string, cause?: Error) {
    super(message);
    this.name = "ExtractionError";
    this.code = code;
    if (cause) this.cause = cause;
  }
}

export class VertexExtractor {
  // Token usage from the most recent call; the API route reads it for cost telemetry.
  lastUsageMetadata: unknown = undefined;

  constructor(private readonly ai: GoogleGenAI) {}

  // Sends the OCR text to Gemini and returns the raw, unvalidated model output.
  async extractRawFromText(rawText: string): Promise<{ rawOutput: string; usageMetadata?: unknown }> {
    const prompt = `Extract structured invoice data from the following OCR text. Return ONLY valid JSON matching this schema: header contains invoiceNumber, invoiceDate, dueDate, currency, vendorName, vendorAddress (line1, city, state, zip, country), customerName; lineItems is an array with description, quantity, unitPrice, total; summary has subtotal, taxTotal, grandTotal. Text:\n${rawText}`;
    try {
      const response = await this.ai.models.generateContent({
        model: "gemini-2.5-flash",
        contents: prompt,
      });
      this.lastUsageMetadata = response.usageMetadata;
      return {
        rawOutput: typeof response.text === "string" ? response.text : "",
        usageMetadata: response.usageMetadata,
      };
    } catch (err) {
      throw new ExtractionError(
        err instanceof Error ? err.message : "Vertex API call failed",
        "API_ERROR",
        err instanceof Error ? err : undefined,
      );
    }
  }

  // Strict path: parse and validate the model output directly, with no repair step.
  async extractFromText(rawText: string): Promise<InvoiceExtraction> {
    const { rawOutput } = await this.extractRawFromText(rawText);
    return this.tryParse(rawOutput);
  }

  async extractFromPdfBuffer(pdfBuffer: Buffer): Promise<InvoiceExtraction> {
    const result = await pdf(pdfBuffer);
    return this.extractFromText(result.text);
  }

  private tryParse(raw: string): InvoiceExtraction {
    let parsed: unknown;
    try {
      parsed = JSON.parse(raw);
    } catch {
      throw new ExtractionError("Vertex response is not valid JSON", "JSON_PARSE_ERROR");
    }
    const result = InvoiceExtractionSchema.safeParse(parsed);
    if (!result.success) {
      throw new ExtractionError(`Schema validation failed: ${result.error.message}`, "VALIDATION_FAILED");
    }
    return result.data;
  }
}
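
The API route hands the raw output to `repairInvoiceOutput` from src/services/repair-pipeline, whose source is not shown on this page. As a hedged illustration of one failure such a repair step commonly has to handle, the hypothetical `extractJsonCandidate` below pulls the outermost `{...}` span out of model text that arrives wrapped in code fences or a sentence of prose, then parses it. It is a sketch, not the recipe's repair pipeline; it does not fix syntax errors such as trailing commas, and a stray `}` in trailing prose would defeat it.

```typescript
// Hypothetical first repair pass: keep only the outermost {...} span so that code
// fences or prose around the JSON do not break JSON.parse.
// Returns null when no parseable object is found.
export function extractJsonCandidate(raw: string): unknown {
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) return null;
  try {
    return JSON.parse(raw.slice(start, end + 1));
  } catch {
    // Still malformed (e.g. trailing commas); a fuller repair step would go further.
    return null;
  }
}
```

A non-null result would then go through `InvoiceExtractionSchema.safeParse`, the same validation `tryParse` applies.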

Step 6: Build the confidence router wrapper

src/services/invoice-router.ts

import { ConfidenceRouter } from "@reaatech/confidence-router";
import { KeywordClassifier } from "@reaatech/confidence-router-classifiers";
import type { InvoiceExtraction } from "../schemas/invoice.js";

export interface FieldRoutingDecision {
  field: string;
  type: "ROUTE" | "CLARIFY" | "FALLBACK";
  target: string;
}

export interface InvoiceRoutingResult {
  overallType: "ROUTE" | "CLARIFY" | "FALLBACK";
  highConfidenceFields: string[];
  lowConfidenceFields: string[];
  fieldDecisions: FieldRoutingDecision[];
}

export function createInvoiceRouter(): ConfidenceRouter {
  const router = new ConfidenceRouter({
    routeThreshold: 0.85,
    fallbackThreshold: 0.3,
    clarificationEnabled: true,
  });
  router.registerClassifier(
    new KeywordClassifier(
      [
        { label: "invoice_number", keywords: ["invoice", "inv-", "#"] },
        { label: "vendor", keywords: ["vendor", "supplier", "from"] },
        { label: "line_items", keywords: ["line", "item", "qty", "description"] },
        { label: "totals", keywords: ["total", "subtotal", "grand", "amount"] },
      ],
      { name: "invoice-fields", caseSensitive: false },
    ),
  );
  return router;
}

// Routes a single field by passing its confidence to the router as an explicit prediction.
export function routeField(router: ConfidenceRouter, fieldName: string, confidence: number): FieldRoutingDecision {
  const decision = router.decide({
    predictions: [{ label: fieldName, confidence }],
  });
  return { field: fieldName, type: decision.type, target: decision.target ?? fieldName };
}

export function routeWholeInvoice(router: ConfidenceRouter, extraction: InvoiceExtraction): InvoiceRoutingResult {
  // Every field currently reuses the document-level confidence score.
  const fields: Array<{ name: string; confidence: number }> = [
    { name: "invoiceNumber", confidence: extraction.confidence },
    { name: "vendorName", confidence: extraction.confidence },
    { name: "lineItems", confidence: extraction.confidence },
    { name: "grandTotal", confidence: extraction.confidence },
  ];
  const decisions = fields.map((f) => routeField(router, f.name, f.confidence));
  const high = decisions.filter((d) => d.type === "ROUTE").map((d) => d.field);
  const low = decisions.filter((d) => d.type !== "ROUTE").map((d) => d.field);
  // The invoice as a whole takes the most severe field decision.
  const priority: Record<FieldRoutingDecision["type"], number> = { ROUTE: 0, CLARIFY: 1, FALLBACK: 2 };
  const worst = decisions.reduce<FieldRoutingDecision["type"]>(
    (w, d) => (priority[d.type] > priority[w] ? d.type : w),
    "ROUTE",
  );
  return { overallType: worst, highConfidenceFields: high, lowConfidenceFields: low, fieldDecisions: decisions };
}
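
To reason about what the 0.85 and 0.3 thresholds in `createInvoiceRouter` do, here is a hypothetical local model of the decision bands. It assumes ROUTE at or above `routeThreshold`, FALLBACK below `fallbackThreshold`, and CLARIFY in between; that is an assumption about @reaatech/confidence-router, so check its documentation for the exact boundary rules. `worstOf` reproduces the "most severe decision wins" reduction in `routeWholeInvoice`.

```typescript
type DecisionType = "ROUTE" | "CLARIFY" | "FALLBACK";

// Hypothetical model of the configured bands (assumed boundary rules, not the library's code).
export function classifyConfidence(
  confidence: number,
  routeThreshold = 0.85,
  fallbackThreshold = 0.3,
): DecisionType {
  if (confidence >= routeThreshold) return "ROUTE";
  if (confidence < fallbackThreshold) return "FALLBACK";
  return "CLARIFY";
}

// Same "most severe decision wins" rule that routeWholeInvoice applies.
const SEVERITY: Record<DecisionType, number> = { ROUTE: 0, CLARIFY: 1, FALLBACK: 2 };

export function worstOf(decisions: DecisionType[]): DecisionType {
  return decisions.reduce<DecisionType>((worst, d) => (SEVERITY[d] > SEVERITY[worst] ? d : worst), "ROUTE");
}
```

Because `routeWholeInvoice` feeds the same `extraction.confidence` to all four fields, under these assumed bands a score of 0.9 routes every field and 0.6 flags every field. Genuinely field-level routing would need per-field confidence scores from the extractor.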

Step 8: Implement the human review queue

src/queues/review.ts

import { generateId, now } from "@reaatech/llm-cost-telemetry";
import type { ConfidenceRouter } from "@reaatech/confidence-router";
import type { InvoiceExtraction } from "../schemas/invoice.js";
import { routeWholeInvoice, type InvoiceRoutingResult } from "../services/invoice-router.js";
import { sendToQuickBooks, type QuickBooksConfig } from "../services/quickbooks-sender.js";

export interface ReviewItem {
  id: string;
  invoice: InvoiceExtraction;
  routingResult: InvoiceRoutingResult;
  submittedAt: string;
  status: "pending" | "resolved" | "dismissed";
  correctedInvoice?: InvoiceExtraction;
}

// In-memory store: items are lost when the process restarts.
export class ReviewQueue {
  private items: Map<string, ReviewItem> = new Map();

  enqueue(invoice: InvoiceExtraction, routingResult: InvoiceRoutingResult): string {
    const id = generateId();
    this.items.set(id, {
      id,
      invoice,
      routingResult,
      submittedAt: now().toISOString(),
      status: "pending",
    });
    return id;
  }

  // Looks an item up by id without removing it from the queue.
  dequeue(id: string): ReviewItem | undefined {
    return this.items.get(id);
  }

  listAll(status?: ReviewItem["status"]): ReviewItem[] {
    const all = [...this.items.values()];
    return status ? all.filter((i) => i.status === status) : all;
  }

  // Stores the reviewer's corrected invoice and marks the item resolved.
  resolve(id: string, correctedInvoice: InvoiceExtraction): boolean {
    const item = this.items.get(id);
    if (!item) return false;
    item.status = "resolved";
    item.correctedInvoice = correctedInvoice;
    return true;
  }

  dismiss(id: string): boolean {
    const item = this.items.get(id);
    if (!item) return false;
    item.status = "dismissed";
    return true;
  }
}

// Re-routes each reviewer-corrected invoice and sends the ones that now clear the
// ROUTE threshold to QuickBooks. Sent items are dismissed so they are not sent twice.
export async function processReadyReviews(
  queue: ReviewQueue,
  routerFactory: () => ConfidenceRouter,
  quickbooksConfig: QuickBooksConfig,
): Promise<{ sent: number; failed: number }> {
  // resolve() sets the status to "resolved" when it stores a correction, so pending
  // items never carry one; scan the resolved items instead.
  const ready = queue.listAll("resolved");
  let sent = 0;
  let failed = 0;
  for (const item of ready) {
    if (!item.correctedInvoice) continue;
    const router = routerFactory();
    const result = routeWholeInvoice(router, item.correctedInvoice);
    if (result.overallType === "ROUTE") {
      const response = await sendToQuickBooks(item.correctedInvoice, quickbooksConfig);
      if (response.ok) {
        queue.dismiss(item.id);
        sent++;
      } else {
        failed++;
      }
    } else {
      // Still below the ROUTE threshold after correction; left in the queue.
      failed++;
    }
  }
  return { sent, failed };
}
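
`ReviewQueue` does not enforce a lifecycle: `resolve()` and `dismiss()` accept an item in any status, so a dismissed item can be resolved again. If you want to guard against that, one option is a transition table like the hypothetical one below (`canTransition` and `ALLOWED_TRANSITIONS` are illustrative names). The allowed moves are one reasonable reading of the statuses, not rules stated by the recipe.

```typescript
type ReviewStatus = "pending" | "resolved" | "dismissed";

// Hypothetical lifecycle rules for ReviewQueue items.
const ALLOWED_TRANSITIONS: Record<ReviewStatus, readonly ReviewStatus[]> = {
  pending: ["resolved", "dismissed"], // reviewer corrects or drops the invoice
  resolved: ["resolved", "dismissed"], // reviewer re-corrects, or the item is sent or dropped
  dismissed: [], // terminal
};

export function canTransition(from: ReviewStatus, to: ReviewStatus): boolean {
  return ALLOWED_TRANSITIONS[from].includes(to);
}
```

Inside `resolve()`, a guard such as `if (!canTransition(item.status, "resolved")) return false;` before mutating the item would reject corrections to dismissed items.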

Step 9: Add cost telemetry

src/lib/cost-telemetry.ts

import { loadConfig, generateId, now, type CostSpan } from "@reaatech/llm-cost-telemetry";
import { calculateCost } from "@reaatech/llm-cost-telemetry-calculator";

type Provider = "openai" | "anthropic" | "google";

// Telemetry config is loaded once and cached; resetTelemetryConfig() clears the cache (useful in tests).
let telemetryConfig: ReturnType<typeof loadConfig> | null = null;

export function resetTelemetryConfig(): void {
  telemetryConfig = null;
}

export function getTelemetryConfig(): ReturnType<typeof loadConfig> {
  if (!telemetryConfig) telemetryConfig = loadConfig();
  return telemetryConfig;
}

// Prices one extraction call and returns the cost plus a CostSpan tagged "invoice-extraction".
export function trackExtractionCost(args: {
  provider: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  tenant?: string;
}): { costUsd: number; span: CostSpan } {
  const provider = args.provider as Provider;
  const result = calculateCost({
    provider,
    model: args.model,
    inputTokens: args.inputTokens,
    outputTokens: args.outputTokens,
  });
  const span: CostSpan = {
    id: generateId(),
    provider,
    model: args.model,
    inputTokens: args.inputTokens,
    outputTokens: args.outputTokens,
    costUsd: result.costUsd,
    tenant: args.tenant ?? "default",
    feature: "invoice-extraction",
    timestamp: now(),
  };
  return { costUsd: result.costUsd, span };
}

// Returns true when no global daily budget is configured, or when this single estimate
// fits within it. Spend already incurred today is not counted, and `tenant` is unused.
export function checkBudget(tenant: string, estimatedCostUsd: number): boolean {
  const config = getTelemetryConfig();
  const globalBudget = config.budget.global;
  if (!globalBudget) return true;
  const dailyBudget = globalBudget.daily;
  if (typeof dailyBudget !== "number") return true;
  return estimatedCostUsd <= dailyBudget;
}

// Plain-text report of cost per feature and each feature's share of the total.
export function formatCostReport(spans: Array<{ costUsd: number; feature: string }>): string {
  const byFeature = new Map<string, number>();
  let total = 0;
  for (const s of spans) {
    byFeature.set(s.feature, (byFeature.get(s.feature) ?? 0) + s.costUsd);
    total += s.costUsd;
  }
  const lines = ["Cost Report:", "Feature | Cost | %"];
  for (const [feature, cost] of byFeature) {
    const pct = total > 0 ? ((cost / total) * 100).toFixed(1) + "%" : "0%";
    lines.push(`${feature} | $${cost.toFixed(4)} | ${pct}`);
  }
  lines.push(`Total | $${total.toFixed(4)} | 100%`);
  return lines.join("\n");
}
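
Per-document cost is usually token counts multiplied by per-token rates. The sketch below spells out that arithmetic. Assuming `calculateCost` works the same way is an assumption, and `TokenPricing`, `estimateCostUsd`, and `withinDailyBudget` are hypothetical names. The rates are passed in by the caller; take real ones for your model and region from Vertex AI pricing rather than from this example. `withinDailyBudget` also counts spend already incurred today, which `checkBudget` above does not.

```typescript
// Hypothetical pricing shape: USD per one million tokens.
export interface TokenPricing {
  inputPerMillionUsd: number;
  outputPerMillionUsd: number;
}

// cost = inputTokens / 1e6 * inputRate + outputTokens / 1e6 * outputRate
export function estimateCostUsd(inputTokens: number, outputTokens: number, pricing: TokenPricing): number {
  return (
    (inputTokens / 1_000_000) * pricing.inputPerMillionUsd +
    (outputTokens / 1_000_000) * pricing.outputPerMillionUsd
  );
}

// Cumulative check: today's spend plus the next call must stay within the daily budget.
export function withinDailyBudget(spentTodayUsd: number, nextCostUsd: number, dailyBudgetUsd: number): boolean {
  return spentTodayUsd + nextCostUsd <= dailyBudgetUsd;
}
```

For example, 2,000 input tokens and 500 output tokens at illustrative rates of $1.00 and $4.00 per million cost 0.002 + 0.002 = $0.004.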

Step 10: Build the API route handler

API route (route.ts under app/)

import { NextRequest, NextResponse } from "next/server";
import { GoogleGenAI } from "@google/genai";
import { VertexExtractor } from "@/src/services/vertex-extractor";
import { repairInvoiceOutput } from "@/src/services/repair-pipeline";
import { createInvoiceRouter, routeWholeInvoice } from "@/src/services/invoice-router";
import { sendToQuickBooks } from "@/src/services/quickbooks-sender";
import pdf from "@/src/lib/pdf-adapter";
import { ReviewQueue } from "@/src/queues/review";
import { trackExtractionCost } from "@/src/lib/cost-telemetry";

// Module-level, in-memory queue: shared by requests served by this process and lost on restart.
const reviewQueue = new ReviewQueue();

export async function POST(req: NextRequest) {
  try {
    // 1. Read the uploaded PDF from the multipart "file" field.
    const formData = await req.formData();
    const fileField = formData.get("file");
    if (!fileField || !(fileField instanceof File)) {
      return NextResponse.json({ error: "file required" }, { status: 400 });
    }
    const bytes = await fileField.arrayBuffer();
    const buffer = Buffer.from(bytes);

    // 2. Extract the PDF text and ask Gemini on Vertex AI for structured JSON.
    const ai = new GoogleGenAI({
      vertexai: true,
      project: process.env["GOOGLE_CLOUD_PROJECT"],
      location: process.env["GOOGLE_CLOUD_LOCATION"],
    });
    const extractor = new VertexExtractor(ai);
    const pdfData = await pdf(buffer);
    const rawResult = await extractor.extractRawFromText(pdfData.text);

    // 3. Repair and validate the model output.
    const repairResult = repairInvoiceOutput(rawResult.rawOutput);
    if (!repairResult.success || !repairResult.data) {
      return NextResponse.json(
        { error: "extraction failed", details: repairResult.errors },
        { status: 422 },
      );
    }
    const repairedInvoice = repairResult.data;

    // 4. Price the call. The returned cost and span are discarded here; persist or log
    //    them if you need per-document cost records.
    const usage = extractor.lastUsageMetadata as { promptTokenCount?: number; candidatesTokenCount?: number } | undefined;
    trackExtractionCost({
      provider: "google",
      model: "gemini-2.5-flash",
      inputTokens: usage?.promptTokenCount ?? 0,
      outputTokens: usage?.candidatesTokenCount ?? 0,
    });

    // 5. Confident invoices go straight to QuickBooks; anything else, or a failed send, goes to review.
    const router = createInvoiceRouter();
    const routingResult = routeWholeInvoice(router, repairedInvoice);
    if (routingResult.overallType === "ROUTE") {
      const qbConfig = {
        webhookUrl: process.env["QUICKBOOKS_WEBHOOK_URL"] ?? "",
        apiToken: process.env["QUICKBOOKS_API_TOKEN"] ?? "",
      };
      const qbResult = await sendToQuickBooks(repairedInvoice, qbConfig);
      if (qbResult.ok) {
        return NextResponse.json({ status: "sent", transactionId: qbResult.transactionId });
      }
    }
    const reviewId = reviewQueue.enqueue(repairedInvoice, routingResult);
    return NextResponse.json({ status: "review_required", reviewId });
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return NextResponse.json({ error: "internal error", message }, { status: 500 });
  }
}

// Health check.
export function GET() {
  return NextResponse.json({ status: "ok", version: "0.1.0" });
}

Recipe outline

Intro
Prerequisites
Step 1: Scaffold the project, configure vitest, and install dependencies
Step 2: Define the invoice Zod schemas
Step 3: Create the PDF text extraction adapter
Step 4: Build the Vertex AI extraction service
Step 5: Wire up the structured repair pipeline
Step 6: Build the confidence router wrapper
Step 7: Create the QuickBooks webhook sender
Step 8: Implement the human review queue
Step 9: Add cost telemetry
Step 10: Build the API route handler
Step 11: Create the CLI batch processor
Step 12: Wire up barrel exports
Step 13: Configure environment variables and run tests
Next steps