xAI Grok Tax Form Extraction for SMB Accounting

Automatically extract and normalize line items from tax forms (1040, W-2, 1099) for small business bookkeeping using Grok’s reasoning and REAA’s output repair engine.

The problem

SMB accountants spend hours transcribing numbers from PDF tax forms into spreadsheets. Manual entry is slow and error-prone, and the usual shortcuts have their own failure modes: OCR output is often garbled, and LLM-based extraction can return malformed JSON that downstream systems can't use.

Intro

This tutorial walks you through building a tax form extraction pipeline for small-business accounting. You'll create a Next.js route handler that accepts PDF tax forms (Form 1040, W-2, 1099-NEC, 1099-MISC) and extracts their text with unpdf, falling back to tesseract.js OCR when a PDF has no usable text layer. The handler sends that text to xAI Grok in JSON mode with field-level extraction instructions, repairs malformed output with REAA's structured repair engine, and validates the result against Zod schemas. It also enforces a daily budget cap, records the cost of every call, and uses a semantic cache so near-identical documents don't trigger repeat API calls. By the end, you'll have a complete document extraction endpoint that returns schema-validated JSON ready for upload to QuickBooks or Xero.
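The stages described above can be sketched as a small orchestrator. This is a minimal sketch under stated assumptions, not the recipe's actual pipeline module: the `PipelineDeps` interface and every function name in it are hypothetical stand-ins for the services built in later steps. They are injected so the control flow (extract, check cache, check budget, call the model, record cost, repair, validate) can be read and exercised without real PDFs, Redis, or API calls.

```typescript
// Hypothetical service interfaces; the recipe's real modules may differ.
interface ExtractedText {
  text: string;
  method: "pdf-text" | "ocr" | "hybrid";
  totalPages: number;
}

interface ModelReply {
  content: string;
  totalTokens: number;
  costUsd: number;
}

interface PipelineDeps {
  extractText(pdf: Uint8Array): Promise<ExtractedText>;
  cacheGet(text: string): Promise<string | null>;
  cacheSet(text: string, json: string): Promise<void>;
  assertWithinBudget(): Promise<void>; // expected to throw once the daily cap is hit
  recordCost(costUsd: number): Promise<void>;
  callModel(text: string): Promise<ModelReply>;
  repairJson(raw: string): string; // returns text that JSON.parse accepts
  validate(value: unknown): { documents: unknown[] }; // expected to throw on schema errors
}

export async function runPipeline(pdf: Uint8Array, deps: PipelineDeps) {
  // 1. Text extraction (the OCR fallback lives inside extractText).
  const extracted = await deps.extractText(pdf);

  // 2. Cache lookup before spending anything.
  const cached = await deps.cacheGet(extracted.text);
  let raw: string;
  let tokensUsed = 0;
  let costUsd = 0;

  if (cached !== null) {
    raw = cached;
  } else {
    // 3. Budget check, paid call, then cost telemetry.
    await deps.assertWithinBudget();
    const reply = await deps.callModel(extracted.text);
    raw = reply.content;
    tokensUsed = reply.totalTokens;
    costUsd = reply.costUsd;
    await deps.recordCost(costUsd);
  }

  // 4. Repair and validate; only validated output is written to the cache.
  const repaired = deps.repairJson(raw);
  const { documents } = deps.validate(JSON.parse(repaired));
  if (cached === null) await deps.cacheSet(extracted.text, repaired);

  return {
    documents,
    processingMetadata: {
      extractionMethod: extracted.method,
      totalPages: extracted.totalPages,
      tokensUsed,
      costUsd,
    },
  };
}
```

Running the budget check only on a cache miss means cached answers stay free even after the daily cap is reached. The returned metadata covers only part of `ProcessingMetadataSchema`; `confidence` would come from whatever scoring the real pipeline uses.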

Prerequisites

Node.js >= 22 and pnpm >= 10 installed
An xAI API key — set it as XAI_API_KEY in your environment
Redis running locally on port 6379 (or a remote Redis URL) — used by the LLM cache
An OpenAI API key (optional, for text-embedding-3-small embeddings) — falls back to your xAI API key if not set
Basic familiarity with TypeScript, Next.js App Router, and REST APIs
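The Redis-backed LLM cache and the embeddings key listed above work together: a semantic cache reuses a previous answer when a new document's embedding is close enough to a cached one, instead of requiring an exact key match. Below is a minimal in-memory sketch of that lookup using cosine similarity. The `SemanticCache` class is hypothetical, the array stands in for Redis, the 0.85 default mirrors the `cacheSemanticThreshold` value in the recipe's test config, and the vectors are assumed to come from an embedding model such as text-embedding-3-small.

```typescript
// Hypothetical in-memory stand-in for a Redis-backed semantic cache.
interface CacheEntry {
  embedding: number[];
  value: string;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

export class SemanticCache {
  private readonly threshold: number;
  private readonly entries: CacheEntry[] = [];

  constructor(threshold = 0.85) {
    this.threshold = threshold;
  }

  // Returns the value of the most similar entry if it clears the threshold.
  lookup(embedding: number[]): string | null {
    let best: CacheEntry | null = null;
    let bestScore = -Infinity;
    for (const entry of this.entries) {
      const score = cosineSimilarity(embedding, entry.embedding);
      if (score > bestScore) {
        best = entry;
        bestScore = score;
      }
    }
    return best !== null && bestScore >= this.threshold ? best.value : null;
  }

  store(embedding: number[], value: string): void {
    this.entries.push({ embedding, value });
  }
}
```

A linear scan is fine for a sketch; a production cache would use a vector index (or Redis vector search) so lookups don't grow with the number of cached documents.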

Step 1: Scaffold the Next.js project

Create a new Next.js project with the App Router. The scaffold gives you TypeScript, ESLint, and the App Router layout. You’ll then add Vitest for testing along with this recipe’s pinned dependencies.

terminal

npx create-next-app@latest xai-grok-tax-form-extraction-for-smb-accounting --typescript --eslint --app
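With the project scaffolded, the app needs to read its settings from the environment. The sketch below shows one way a typed config loader could look. Only `XAI_API_KEY` and the rule that embeddings fall back to the xAI key come from this recipe's prerequisites; the other variable names (`XAI_BASE_URL`, `XAI_MODEL`, `REDIS_URL`, `OPENAI_API_KEY`, `DAILY_BUDGET_USD`, `CACHE_ENABLED`, `CACHE_SEMANTIC_THRESHOLD`) and the `loadConfig` function are assumptions, and the defaults simply mirror the values in the recipe's test configuration.

```typescript
// Hypothetical typed config loader; field names follow the recipe's config object.
export interface AppConfig {
  xaiApiKey: string;
  xaiBaseUrl: string;
  xaiModel: string;
  redisUrl: string;
  embeddingApiKey: string;
  dailyBudgetUsd: number;
  cacheEnabled: boolean;
  cacheSemanticThreshold: number;
}

type Env = Record<string, string | undefined>;

function readNumber(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw === "") return fallback;
  const value = Number(raw);
  if (!Number.isFinite(value)) {
    throw new Error(`${name} must be a number, got "${raw}"`);
  }
  return value;
}

export function loadConfig(env: Env): AppConfig {
  const xaiApiKey = env.XAI_API_KEY;
  if (!xaiApiKey) throw new Error("XAI_API_KEY is required");

  return {
    xaiApiKey,
    xaiBaseUrl: env.XAI_BASE_URL ?? "https://api.x.ai/v1",
    xaiModel: env.XAI_MODEL ?? "grok-3",
    redisUrl: env.REDIS_URL ?? "redis://localhost:6379",
    // Per the prerequisites, embeddings fall back to the xAI key when no OpenAI key is set.
    embeddingApiKey: env.OPENAI_API_KEY || xaiApiKey,
    dailyBudgetUsd: readNumber(env, "DAILY_BUDGET_USD", 5),
    cacheEnabled: env.CACHE_ENABLED !== "false",
    cacheSemanticThreshold: readNumber(env, "CACHE_SEMANTIC_THRESHOLD", 0.85),
  };
}
```

In the app this would run once at startup, for example `loadConfig(process.env)`, so a missing key fails fast instead of surfacing as an authentication error on the first request.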

Example artifact

A complete, working implementation of this recipe, downloadable as a zip or browsable file by file. Generated by our build pipeline and tested with Vitest before publishing.

169 kB · 92 tests · 97.2% coverage · Vitest passing

SHA-256: 32b45d4372dc3d967361738ebf17ca2702a8979401abf439e85551893b5d14be

import { z } from "zod";

export const Form1040Schema = z.object({
  formType: z.literal("1040"),
  filingStatus: z.enum([
    "single",
    "married_joint",
    "married_separate",
    "head_of_household",
    "qualifying_widow",
  ]),
  wages: z.number(),
  taxableInterest: z.number(),
  adjustedGrossIncome: z.number(),
  totalTax: z.number(),
  refund: z.number().optional(),
  amountOwed: z.number().optional(),
});

export const W2Schema = z.object({
  formType: z.literal("W-2"),
  employerEIN: z.string(),
  employerName: z.string(),
  wagesTips: z.number(),
  federalIncomeTaxWithheld: z.number(),
  socialSecurityWages: z.number(),
  socialSecurityTaxWithheld: z.number(),
  medicareWages: z.number(),
  medicareTaxWithheld: z.number(),
});

export const Form1099Schema = z.object({
  formType: z.enum(["1099-NEC", "1099-MISC"]),
  payerEIN: z.string(),
  payerName: z.string(),
  nonemployeeCompensation: z.number().optional(),
  rents: z.number().optional(),
  otherIncome: z.number().optional(),
  federalTaxWithheld: z.number(),
});

export const ExtractedTaxDocumentSchema = z.discriminatedUnion("formType", [
  Form1040Schema,
  W2Schema,
  Form1099Schema,
]);

export const ProcessingMetadataSchema = z.object({
  extractionMethod: z.enum(["pdf-text", "ocr", "hybrid"]),
  confidence: z.number().min(0).max(1),
  tokensUsed: z.number().optional(),
  costUsd: z.number().optional(),
  totalPages: z.number(),
});

export const TaxExtractionOutputSchema = z.object({
  documents: z.array(ExtractedTaxDocumentSchema),
  processingMetadata: ProcessingMetadataSchema,
});

export type Form1040 = z.infer<typeof Form1040Schema>;
export type W2Form = z.infer<typeof W2Schema>;
export type Form1099 = z.infer<typeof Form1099Schema>;
export type ExtractedTaxDocument = z.infer<typeof ExtractedTaxDocumentSchema>;
export type TaxExtractionOutput = z.infer<typeof TaxExtractionOutputSchema>;
export type ProcessingMetadata = z.infer<typeof ProcessingMetadataSchema>;

export const SYSTEM_PROMPT_TEMPLATE = `You are a tax form data extraction assistant.
Extract the following fields from the tax form text and return valid JSON.

Supported form types and their fields:

For Form 1040: formType ("1040"), filingStatus (one of: single, married_joint, married_separate, head_of_household, qualifying_widow), wages (number), taxableInterest (number), adjustedGrossIncome (number), totalTax (number), refund (optional number), amountOwed (optional number).

For Form W-2: formType ("W-2"), employerEIN (string), employerName (string), wagesTips (number), federalIncomeTaxWithheld (number), socialSecurityWages (number), socialSecurityTaxWithheld (number), medicareWages (number), medicareTaxWithheld (number).

For Form 1099-NEC or 1099-MISC: formType ("1099-NEC" or "1099-MISC"), payerEIN (string), payerName (string), nonemployeeCompensation (optional number), rents (optional number), otherIncome (optional number), federalTaxWithheld (number).

Return ONLY valid JSON. Do not include any explanatory text.`;
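The schemas declare every dollar amount as `z.number()`, but text coming out of unpdf or OCR (and occasionally the model itself) carries amounts as strings such as `$52,340.17` or `(1,200.00)`. One option is to normalize those values before validation. The helpers below are a hypothetical sketch, not part of the recipe's code; reading parenthesized values as negatives follows common accounting notation and is an assumption about how your forms present losses.

```typescript
// Hypothetical normalizer for currency values found in form text or model output.
export function parseAmount(input: unknown): number | undefined {
  if (typeof input === "number") return Number.isFinite(input) ? input : undefined;
  if (typeof input !== "string") return undefined;

  let text = input.trim();
  if (text === "") return undefined;

  // Accounting notation: "(1,200.00)" means -1200.
  let negative = false;
  if (text.startsWith("(") && text.endsWith(")")) {
    negative = true;
    text = text.slice(1, -1);
  }
  if (text.startsWith("-")) {
    negative = !negative;
    text = text.slice(1);
  }

  // Drop currency symbols, thousands separators, and whitespace.
  text = text.replace(/[$,\s]/g, "");
  if (!/^\d+(\.\d+)?$/.test(text)) return undefined;

  const value = Number(text);
  return negative ? -value : value;
}

// Applies parseAmount to the listed fields. Unparseable values are removed, so
// Zod then reports a missing required field instead of accepting a bad number.
export function normalizeAmounts(
  raw: Record<string, unknown>,
  amountFields: readonly string[],
): Record<string, unknown> {
  const out: Record<string, unknown> = { ...raw };
  for (const field of amountFields) {
    if (!(field in out)) continue;
    const value = parseAmount(out[field]);
    if (value === undefined) delete out[field];
    else out[field] = value;
  }
  return out;
}
```

For a W-2, the field list would be the numeric keys of `W2Schema` (`wagesTips`, `federalIncomeTaxWithheld`, and so on), and the normalized object would then go through `ExtractedTaxDocumentSchema.parse`.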

import OpenAI from "openai";
import { config } from "../lib/config.js";
import { GrokApiError } from "../types.js";

export interface GrokResponse {
  content: string;
  usage: {
    promptTokens: number;
    completionTokens: number;
    totalTokens: number;
  };
}

// Sends one chat completion request in JSON mode and normalizes the result.
async function requestCompletion(
  client: OpenAI,
  promptText: string,
  systemPrompt: string,
): Promise<GrokResponse> {
  const completion = await client.chat.completions.create({
    model: config.xaiModel,
    messages: [
      // "system" is the instruction role that OpenAI-compatible endpoints such as xAI's accept.
      { role: "system", content: systemPrompt },
      { role: "user", content: promptText },
    ],
    response_format: { type: "json_object" },
  });

  return {
    // Empty string when the API returns no choices or no content.
    content: completion.choices[0]?.message?.content ?? "",
    usage: {
      promptTokens: completion.usage?.prompt_tokens ?? 0,
      completionTokens: completion.usage?.completion_tokens ?? 0,
      totalTokens: completion.usage?.total_tokens ?? 0,
    },
  };
}

export async function callGrok(
  promptText: string,
  systemPrompt: string,
): Promise<GrokResponse> {
  // SDK retries are disabled so that the single retry below is the only one.
  const client = new OpenAI({
    apiKey: config.xaiApiKey,
    baseURL: config.xaiBaseUrl,
    timeout: 60 * 1000,
    maxRetries: 0,
  });

  try {
    // "return await" keeps rejections inside this try block.
    return await requestCompletion(client, promptText, systemPrompt);
  } catch (error: unknown) {
    if (error instanceof OpenAI.AuthenticationError) {
      throw new GrokApiError("Authentication failed: check XAI_API_KEY");
    }
    if (error instanceof OpenAI.RateLimitError) {
      throw new GrokApiError("Rate limit exceeded", 60);
    }
    if (error instanceof OpenAI.APIConnectionError) {
      // Network failures (timeouts included) get exactly one retry.
      try {
        return await requestCompletion(client, promptText, systemPrompt);
      } catch (retryError: unknown) {
        if (retryError instanceof OpenAI.APIConnectionError) {
          throw new GrokApiError(
            "Connection failed after retry: " + retryError.message,
          );
        }
        throw retryError;
      }
    }
    throw error;
  }
}
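`callGrok` returns the model's raw `content` string, and even in JSON mode that string can arrive wrapped in Markdown fences, padded with prose, or carrying trailing commas. The recipe hands this string to REAA's repair engine; the function below is not REAA's API, just a hypothetical and deliberately small sketch of the kind of cleanup such a step performs before the result is validated with the Zod schemas.

```typescript
// Hypothetical minimal repair pass (not REAA's API).
export function repairJson(raw: string): unknown {
  // Slice from the first "{" to the last "}" to drop Markdown fences or surrounding prose.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end < start) {
    throw new Error("No JSON object found in model output");
  }
  const candidate = raw.slice(start, end + 1);

  try {
    return JSON.parse(candidate);
  } catch {
    // Second attempt: remove trailing commas before a closing } or ].
    // Caveat: the regex does not skip string contents, so a literal ",}" inside
    // a string value would also be rewritten.
    return JSON.parse(candidate.replace(/,\s*([}\]])/g, "$1"));
  }
}
```

Because `callGrok` maps a missing choice to an empty string, this sketch throws on that case rather than inventing an empty result, which leaves the retry-or-fail decision to the caller.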

// tests/services/text-extractor.test.ts
import { describe, it, expect, vi, beforeEach } from "vitest";

// vi.mock() calls are hoisted to the top of the file, so the shared mocks are
// created with vi.hoisted() to guarantee they exist before the factories run.
const { mockExtractText, mockRecognize } = vi.hoisted(() => ({
  mockExtractText: vi.fn(),
  mockRecognize: vi.fn(),
}));

vi.mock("unpdf", () => ({
  getDocumentProxy: vi.fn().mockResolvedValue({}),
  extractText: mockExtractText,
}));

vi.mock("tesseract.js", () => ({
  createWorker: vi.fn().mockResolvedValue({
    recognize: mockRecognize,
    terminate: vi.fn().mockResolvedValue(undefined),
  }),
}));

describe("extractPdfText", () => {
  beforeEach(() => {
    // Restore the happy path: unpdf finds text, and OCR would succeed too.
    mockExtractText.mockClear();
    mockExtractText.mockResolvedValue({
      totalPages: 3,
      text: "Form 1040 line items",
    });
    mockRecognize.mockClear();
    mockRecognize.mockResolvedValue({
      data: { text: "OCR extracted text" },
    });
  });

  it("extracts text using unpdf on first attempt", async () => {
    const { extractPdfText } = await import(
      "../../src/services/text-extractor.js"
    );
    const buffer = new Uint8Array([1, 2, 3]);
    const result = await extractPdfText(buffer);
    expect(result.method).toBe("pdf-text");
    expect(result.text).toBe("Form 1040 line items");
    expect(result.totalPages).toBe(3);
  });

  it("falls back to tesseract.js when unpdf returns empty text", async () => {
    mockExtractText.mockResolvedValue({
      totalPages: 0,
      text: "",
    });
    const { extractPdfText } = await import(
      "../../src/services/text-extractor.js"
    );
    const buffer = new Uint8Array([4, 5, 6]);
    const result = await extractPdfText(buffer);
    expect(result.method).toBe("ocr");
    expect(result.text).toBe("OCR extracted text");
  });

  it("throws ExtractionError when both methods fail", async () => {
    mockExtractText.mockRejectedValue(new Error("unpdf error"));
    mockRecognize.mockRejectedValue(new Error("ocr error"));
    const { ExtractionError } = await import("../../src/types.js");
    const { extractPdfText } = await import(
      "../../src/services/text-extractor.js"
    );
    await expect(extractPdfText(new Uint8Array([7, 8, 9]))).rejects.toThrow(
      ExtractionError,
    );
  });

  it("falls back to OCR when unpdf throws an error", async () => {
    mockExtractText.mockRejectedValue(new Error("PDF parse error"));
    const { extractPdfText } = await import(
      "../../src/services/text-extractor.js"
    );
    const buffer = new Uint8Array([10, 11, 12]);
    const result = await extractPdfText(buffer);
    expect(result.method).toBe("ocr");
    expect(result.text).toBe("OCR extracted text");
  });
});
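The tests above pin down the contract of `extractPdfText`: try unpdf first, fall back to OCR when unpdf throws or returns empty text, and raise `ExtractionError` only when both fail. A minimal sketch of that control flow follows, with the two extractors injected as plain functions so it runs without unpdf or tesseract.js. The `Extractor` type and `extractWithFallback` name are hypothetical, and the local `ExtractionError` class stands in for the one the recipe exports from its `types.js`.

```typescript
// Stand-in for the recipe's ExtractionError.
class ExtractionError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ExtractionError";
  }
}

interface ExtractionResult {
  text: string;
  method: "pdf-text" | "ocr";
  totalPages: number;
}

// Hypothetical extractor signature, mirroring the { totalPages, text } shape in the unpdf mock.
type Extractor = (pdf: Uint8Array) => Promise<{ text: string; totalPages: number }>;

export async function extractWithFallback(
  pdf: Uint8Array,
  pdfText: Extractor,
  ocr: Extractor,
): Promise<ExtractionResult> {
  const failures: string[] = [];

  try {
    const result = await pdfText(pdf);
    // Scanned PDFs often have no text layer, so empty text also triggers OCR.
    if (result.text.trim() !== "") {
      return { ...result, method: "pdf-text" };
    }
    failures.push("pdf-text: no text layer");
  } catch (error) {
    failures.push(`pdf-text: ${String(error)}`);
  }

  try {
    const result = await ocr(pdf);
    return { ...result, method: "ocr" };
  } catch (error) {
    failures.push(`ocr: ${String(error)}`);
  }

  throw new ExtractionError(`All extraction methods failed (${failures.join("; ")})`);
}
```

Collecting both failure reasons in the error message makes it easier to tell a corrupt PDF from a missing OCR language pack when the endpoint returns an error.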

// tests/services/grok-client.test.ts
import { describe, it, expect, beforeAll, afterAll, afterEach, vi } from "vitest";

// Hoisted above the imports below, so callGrok reads this config instead of real env vars.
vi.mock("../../src/lib/config.js", () => ({
  config: {
    xaiApiKey: "test-key-123",
    xaiBaseUrl: "https://api.x.ai/v1",
    xaiModel: "grok-3",
    redisUrl: "redis://localhost:6379",
    dailyBudgetUsd: 5.0,
    cacheEnabled: true,
    cacheSemanticThreshold: 0.85,
  },
}));

import { setupServer } from "msw/node";
import { http, HttpResponse } from "msw";
import { callGrok } from "../../src/services/grok-client.js";

// Default handler: a successful completion with known token usage.
const server = setupServer(
  http.post("https://api.x.ai/v1/chat/completions", () =>
    HttpResponse.json({
      id: "cmpl-test",
      model: "grok-3",
      choices: [
        {
          index: 0,
          message: {
            role: "assistant",
            content: '{"name":"John","age":30}',
          },
          finish_reason: "stop",
        },
      ],
      usage: {
        prompt_tokens: 50,
        completion_tokens: 20,
        total_tokens: 70,
      },
    }),
  ),
);

beforeAll(() => {
  server.listen({ onUnhandledRequest: "error" });
});

afterEach(() => {
  server.resetHandlers();
});

afterAll(() => {
  server.close();
});

describe("callGrok", () => {
  it("returns parsed content on successful response", async () => {
    const result = await callGrok("Extract data", "Return JSON");
    expect(result.content).toBe('{"name":"John","age":30}');
    // Usage is mapped from snake_case to camelCase.
    expect(result.usage).toEqual({
      promptTokens: 50,
      completionTokens: 20,
      totalTokens: 70,
    });
  });

  it("returns empty string when content is empty", async () => {
    server.use(
      http.post("https://api.x.ai/v1/chat/completions", () =>
        HttpResponse.json({
          id: "cmpl-empty",
          model: "grok-3",
          choices: [
            {
              index: 0,
              message: { role: "assistant", content: "" },
              finish_reason: "stop",
            },
          ],
        }),
      ),
    );
    const result = await callGrok("test", "test");
    expect(result.content).toBe("");
  });

  it("returns empty string when choices array is empty", async () => {
    server.use(
      http.post("https://api.x.ai/v1/chat/completions", () =>
        HttpResponse.json({
          id: "cmpl-no-choices",
          model: "grok-3",
          choices: [],
        }),
      ),
    );
    const result = await callGrok("test", "test");
    expect(result.content).toBe("");
  });

  it("throws GrokApiError on 429 rate limit", async () => {
    server.use(
      http.post("https://api.x.ai/v1/chat/completions", () =>
        HttpResponse.json(
          { error: { message: "Rate limit exceeded" } },
          { status: 429 },
        ),
      ),
    );
    const { GrokApiError } = await import("../../src/types.js");
    await expect(callGrok("test", "test")).rejects.toThrow(GrokApiError);
  });
});
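The `usage` block that `callGrok` returns (the fixture above reports 50 prompt and 20 completion tokens) is the raw input for per-call cost telemetry and the daily budget cap. The sketch below shows the arithmetic under stated assumptions: the per-million-token prices are placeholders you must replace with xAI's current pricing for your model, `costUsd` and `DailyBudget` are hypothetical names, and the running total lives in memory, whereas a deployment with several server instances would need to keep it in Redis.

```typescript
interface TokenUsage {
  promptTokens: number;
  completionTokens: number;
}

// Placeholder prices in USD per million tokens; look up current xAI pricing.
interface Pricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

export function costUsd(usage: TokenUsage, pricing: Pricing): number {
  return (
    (usage.promptTokens * pricing.inputPerMillion +
      usage.completionTokens * pricing.outputPerMillion) /
    1_000_000
  );
}

// Hypothetical in-memory daily budget guard that resets at UTC midnight.
export class DailyBudget {
  private readonly limitUsd: number;
  private readonly now: () => Date;
  private day: string;
  private spentUsd = 0;

  constructor(limitUsd: number, now: () => Date = () => new Date()) {
    this.limitUsd = limitUsd;
    this.now = now;
    this.day = this.today();
  }

  private today(): string {
    return this.now().toISOString().slice(0, 10); // UTC calendar day
  }

  private rollOver(): void {
    const today = this.today();
    if (today !== this.day) {
      this.day = today;
      this.spentUsd = 0;
    }
  }

  // Call before a request; throws once the cap has been reached.
  assertAvailable(): void {
    this.rollOver();
    if (this.spentUsd >= this.limitUsd) {
      throw new Error(`Daily budget of $${this.limitUsd.toFixed(2)} exhausted`);
    }
  }

  // Call after a request with its computed cost; returns the day's running total.
  record(cost: number): number {
    this.rollOver();
    this.spentUsd += cost;
    return this.spentUsd;
  }
}
```

Because the check runs before each call, one request can push the day's total slightly past the cap; a stricter guard would reserve an estimated cost up front and settle it once the real usage is known.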

Intro

Prerequisites

Step 1: Scaffold the Next.js project

Step 2: Configure environment variables

Step 3: Define shared types and error classes

Step 4: Create Zod schemas for each tax form

Step 5: Build the configuration loader

Step 6: Implement PDF text extraction with OCR fallback

Step 7: Create the Grok API client

Step 8: Build the output repair service

Step 9: Wire up the LLM cache with Redis

Step 10: Add budget enforcement and cost telemetry

Step 11: Create the pipeline orchestrator

Step 12: Build the API route

Step 13: Write and run tests

Next steps