xAI Grok Expense Report Extraction for SMB Finance

Automatically extract line items, totals, and merchant names from scanned receipts and invoices using xAI Grok's vision model, then export to spreadsheets.

xai-grok expense-report receipt-ocr document-pipeline nextjs typescript aws-s3

The problem

SMB finance teams spend hours manually entering data from paper receipts and PDF invoices, leading to errors and delayed expense reporting.


Intro

This tutorial walks you through building an expense-report extraction pipeline that processes scanned receipts and PDF invoices using xAI Grok’s vision model. You’ll create a Next.js API that accepts document uploads and extracts their text with tesseract.js (OCR) and unpdf (PDF text). That text goes to Grok for structured expense extraction. The pipeline then repairs malformed JSON, validates accuracy against golden reference data, tracks API costs, and exports the results as CSV files stored in S3.
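Model replies are not guaranteed to be valid JSON, which is why the pipeline includes a repair step. Below is a minimal sketch of that idea. It assumes the common failure modes are a Markdown code fence around the payload, chatty prose before or after the object, and trailing commas. `repairJson` is a hypothetical helper for illustration. It is not the recipe's Step 6 module.

```typescript
// Hypothetical helper: best-effort cleanup of a model reply before parsing.
// Handles three failure modes only: a Markdown code fence around the JSON,
// prose before/after the object, and trailing commas before } or ].
export function repairJson(raw: string): unknown {
  let text = raw.trim();

  // 1. Unwrap a fenced block (three backticks, optionally tagged "json").
  const fenced = text.match(/`{3}(?:json)?\s*([\s\S]*?)`{3}/i);
  if (fenced && fenced[1] !== undefined) text = fenced[1].trim();

  // 2. Keep only the span from the first "{" to the last "}".
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("No JSON object found in model output");
  }
  text = text.slice(start, end + 1);

  // 3. Parse as-is; on failure, drop trailing commas and retry once.
  //    Naive: the regex would also rewrite ",}" or ",]" inside string values.
  try {
    return JSON.parse(text);
  } catch {
    return JSON.parse(text.replace(/,\s*([}\]])/g, "$1"));
  }
}
```

The helper parses the untouched text first, so well-formed replies never pass through the comma regex. A production version would also validate the parsed object against the shared expense types.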

Prerequisites

Node.js 22+ and pnpm installed on your machine
An xAI API key with access to the Grok model (set as XAI_API_KEY)
An AWS S3 bucket and IAM credentials (set as AWS_REGION, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, S3_BUCKET_NAME)
Familiarity with TypeScript and Next.js App Router basics
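Step 2 configures these variables. The sketch below is a fail-fast loader that reports every missing variable in a single error instead of failing on the first S3 or Grok call. It assumes the five names above are the complete set. `loadEnv` and `AppEnv` are hypothetical names, not the recipe's code.

```typescript
// Hypothetical Step 2 helper: validate the env vars listed in the prerequisites.
const REQUIRED_ENV = [
  "XAI_API_KEY",
  "AWS_REGION",
  "AWS_ACCESS_KEY_ID",
  "AWS_SECRET_ACCESS_KEY",
  "S3_BUCKET_NAME",
] as const;

type EnvName = (typeof REQUIRED_ENV)[number];
export type AppEnv = Record<EnvName, string>;

// Reads from process.env by default; pass a plain object in tests.
export function loadEnv(source: Record<string, string | undefined> = process.env): AppEnv {
  // Treat unset and whitespace-only values as missing.
  const missing = REQUIRED_ENV.filter((name) => !source[name]?.trim());
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  const env = {} as AppEnv;
  // Safe to assert non-null: every name passed the check above.
  for (const name of REQUIRED_ENV) env[name] = source[name]!.trim();
  return env;
}
```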

Step 1: Scaffold the project and install dependencies

Start by creating a Next.js project with the App Router and installing the required packages.

terminal

npx create-next-app@latest xai-grok-expense-extraction --typescript --tailwind --eslint --app --src-dir --import-alias "@/*" --use-pnpm
cd xai-grok-expense-extraction

Next, install the exact pinned versions of the dependency packages:

Example artifact

A complete, working implementation of this recipe, downloadable as a zip or browsable file by file. It was generated by our build pipeline and tested before publishing.


172 kB · 99 tests · 99.0% coverage · vitest passing

SHA-256: 4f58dfeb40ac45116f5efff9f340b6da88d105a75a3a6722c8ed513a304902c5
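The published SHA-256 lets you confirm the downloaded zip is intact before unpacking it. Here is a small Node.js sketch using the built-in `node:crypto` and `node:fs` modules. The helper names and the example file name are hypothetical.

```typescript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

// Hex-encoded SHA-256 digest of the given bytes or string.
export function sha256Hex(data: Buffer | string): string {
  return createHash("sha256").update(data).digest("hex");
}

// True when the file at `path` hashes to the published checksum.
// Reads the whole file into memory, which is fine for a 172 kB zip.
export function verifyChecksum(path: string, expectedHex: string): boolean {
  return sha256Hex(readFileSync(path)) === expectedHex.trim().toLowerCase();
}

// Usage (file name is hypothetical; use whatever the download saved as):
// verifyChecksum("example.zip", "4f58dfeb40ac45116f5efff9f340b6da88d105a75a3a6722c8ed513a304902c5");
```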


Step 8: Create the cost tracker

import {
  generateId,
  now,
  calculateCostFromTokens,
  CostSpanSchema,
  loadConfig,
  retryWithBackoff,
} from "@reaatech/llm-cost-telemetry";
import type { CostSpan } from "@reaatech/llm-cost-telemetry";

// USD per 1M tokens, applied to every Grok call.
const XAI_INPUT_PRICE_PER_MILLION = 2.50;
const XAI_OUTPUT_PRICE_PER_MILLION = 10.00;

export class CostTracker {
  private spans: CostSpan[] = [];

  recordCall(params: {
    provider: string;
    model: string;
    inputTokens: number;
    outputTokens: number;
    feature: string;
  }): void {
    const fn = (): Promise<void> => {
      const inputCost = calculateCostFromTokens(params.inputTokens, XAI_INPUT_PRICE_PER_MILLION);
      const outputCost = calculateCostFromTokens(params.outputTokens, XAI_OUTPUT_PRICE_PER_MILLION);
      const span: CostSpan = {
        id: generateId(),
        // Recorded under the "openai" provider label; the caller's
        // provider is preserved in metadata.actualProvider.
        provider: "openai",
        model: params.model,
        inputTokens: params.inputTokens,
        outputTokens: params.outputTokens,
        costUsd: inputCost + outputCost,
        feature: params.feature,
        timestamp: now(),
        metadata: { actualProvider: params.provider },
      };
      // Throws if the span does not match the telemetry schema.
      CostSpanSchema.parse(span);
      this.spans.push(span);
      return Promise.resolve();
    };
    // Fire-and-forget, but log failures instead of leaving an unhandled rejection.
    retryWithBackoff(fn, {
      maxRetries: 2,
      initialDelayMs: 200,
      maxDelayMs: 2000,
      backoffMultiplier: 2,
    }).catch((err: unknown) => {
      console.error("CostTracker: failed to record cost span", err);
    });
  }

  // Sum of all spans recorded in this process (in memory, lost on restart).
  getSessionCost(): number {
    return this.spans.reduce((sum, s) => sum + s.costUsd, 0);
  }

  // Compares spend against the configured daily budget (falls back to $5.00).
  // Session cost stands in for daily usage, so it resets when the process restarts.
  budgetCheck(): { withinBudget: boolean; dailyUsed: number; dailyLimit: number } {
    const config = loadConfig();
    const dailyLimit = config.budget.global?.daily ?? 5.0;
    const dailyUsed = this.getSessionCost();
    return { withinBudget: dailyUsed < dailyLimit, dailyUsed, dailyLimit };
  }

  // Spend grouped by the `feature` tag passed to recordCall.
  getCostByFeature(): Record<string, number> {
    const grouped: Record<string, number> = {};
    for (const span of this.spans) {
      const feature = span.feature ?? "unknown";
      grouped[feature] = (grouped[feature] ?? 0) + span.costUsd;
    }
    return grouped;
  }
}
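The tracker's arithmetic is easy to check by hand. The standalone sketch below assumes that `calculateCostFromTokens(tokens, pricePerMillion)` returns tokens / 1,000,000 × pricePerMillion; the package's implementation is not shown here. It reproduces one call's cost at the $2.50 / $10.00 rates and computes how many such calls fit under the $5.00 fallback daily limit. `estimateCallCostUsd` and `callsWithinBudget` are hypothetical names.

```typescript
// Rates copied from the CostTracker constants (USD per 1M tokens).
const INPUT_PRICE_PER_MILLION = 2.5;
const OUTPUT_PRICE_PER_MILLION = 10.0;

// Assumed formula: tokens / 1e6 * pricePerMillion, summed over input and output.
export function estimateCallCostUsd(inputTokens: number, outputTokens: number): number {
  const inputCost = (inputTokens / 1_000_000) * INPUT_PRICE_PER_MILLION;
  const outputCost = (outputTokens / 1_000_000) * OUTPUT_PRICE_PER_MILLION;
  return inputCost + outputCost;
}

// Whole calls of a given cost that stay under a daily limit (default $5.00).
export function callsWithinBudget(costPerCall: number, dailyLimitUsd = 5.0): number {
  if (costPerCall <= 0) throw new RangeError("costPerCall must be positive");
  return Math.floor(dailyLimitUsd / costPerCall);
}
```

Consider a receipt that uses 1,200 input tokens and 300 output tokens. Its cost is 0.003 + 0.003 = $0.006 per call, so roughly 833 such calls fit under the $5.00 fallback limit before `budgetCheck()` reports the budget as exceeded.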

Step 10: Create the CSV exporter

import type { ExtractedExpense } from "../types/index.js";

// Quote a field when it contains a comma, quote, or line break (CR or LF);
// embedded quotes are doubled. null/undefined become empty fields.
function escapeCsvValue(value: string | number | boolean | undefined | null): string {
  if (value === undefined || value === null) return "";
  const str = String(value);
  if (/[",\r\n]/.test(str)) {
    return `"${str.replace(/"/g, '""')}"`;
  }
  return str;
}

const HEADERS = [
  "merchantName", "invoiceDate", "invoiceNumber", "subtotal", "tax", "total",
  "currency", "isPaid", "lineItemDescription", "lineItemAmount", "lineItemQuantity", "notes",
];

// One row per line item, with expense-level fields repeated on every row.
// An expense with no line items still gets one row with empty line-item columns.
export function exportCsv(expenses: ExtractedExpense[]): string {
  const rows: string[] = [HEADERS.join(",")];
  for (const expense of expenses) {
    const expenseColumns = [
      expense.merchantName, expense.invoiceDate, expense.invoiceNumber,
      expense.subtotal, expense.tax, expense.total, expense.currency, expense.isPaid,
    ];
    const items = expense.lineItems.length > 0 ? expense.lineItems : [undefined];
    for (const item of items) {
      rows.push(
        [...expenseColumns, item?.description, item?.amount, item?.quantity, expense.notes]
          .map(escapeCsvValue)
          .join(","),
      );
    }
  }
  return rows.join("\n");
}
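The exporter writes whatever it receives, so accuracy has to be checked before this point. The intro describes validating extractions against golden reference data. One way to sketch that gate is a field-by-field comparison against a hand-verified record. This version assumes a one-cent tolerance on money fields and trimmed, case-insensitive matching on text fields. The `ExpenseFields` subset, the field lists, and the 0.9 pass threshold are all hypothetical, not the recipe's Step 7 values.

```typescript
// Hypothetical subset of the fields the CSV exporter writes.
interface ExpenseFields {
  merchantName: string;
  invoiceDate: string;
  total: number;
  tax: number;
  currency: string;
}

type FieldName = keyof ExpenseFields;
const TEXT_FIELDS: FieldName[] = ["merchantName", "invoiceDate", "currency"];
const MONEY_FIELDS: FieldName[] = ["total", "tax"];

export interface GateResult {
  accuracy: number; // fraction of compared fields that matched, 0..1
  mismatches: FieldName[];
  passed: boolean;
}

// Compare one extraction against its golden reference, field by field.
export function qualityGate(
  extracted: ExpenseFields,
  golden: ExpenseFields,
  threshold = 0.9,
  moneyTolerance = 0.01, // within one cent counts as a match (inclusive)
): GateResult {
  const mismatches: FieldName[] = [];
  for (const f of TEXT_FIELDS) {
    const a = String(extracted[f]).trim().toLowerCase();
    const b = String(golden[f]).trim().toLowerCase();
    if (a !== b) mismatches.push(f);
  }
  for (const f of MONEY_FIELDS) {
    if (Math.abs(Number(extracted[f]) - Number(golden[f])) > moneyTolerance) mismatches.push(f);
  }
  const compared = TEXT_FIELDS.length + MONEY_FIELDS.length;
  const accuracy = (compared - mismatches.length) / compared;
  return { accuracy, mismatches, passed: accuracy >= threshold };
}
```

Returning the mismatched field names along with the score lets a failed document be routed to manual review with the suspect fields already identified.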


Intro

Prerequisites

Step 1: Scaffold the project and install dependencies

Step 2: Configure environment variables

Step 3: Define shared types

Step 4: Build the document ingestion pipeline

Step 5: Create the Grok extraction module

Step 6: Implement JSON repair

Step 7: Build the quality gate

Step 8: Create the cost tracker

Step 9: Build the S3 storage module

Step 10: Create the CSV exporter

Step 11: Wire everything into the pipeline orchestrator

Step 12: Create the upload API route

Step 13: Add barrel exports and entry point

Step 14: Run the tests

Next steps