Google Gemini Bank Statement Extraction for SMB Accounting
Upload scanned bank statements and receipts, automatically extract line-item transactions with Gemini, and output categorized accounting entries ready for QuickBooks or Xero.
Small accounting firms spend hours manually keying paper bank statements and receipts into accounting software. Existing OCR tools produce unstructured text, and template-based parsers break with every minor layout change.
A complete, working implementation of this recipe — downloadable as a zip or browsable file by file. Generated by our build pipeline; tested with full coverage before publishing.
Small accounting firms spend hours manually keying paper bank statements and receipts into accounting software. Existing OCR tools produce unstructured text, and template-based parsers break with every minor layout change. This recipe builds a Next.js App Router document pipeline that ingests PDFs and images, uses Google Gemini for structured transaction extraction, passes results through a repair engine for JSON format fixes, caches identical documents, and records per-tenant cost telemetry. By the end, you’ll have a single POST /api/extract endpoint that returns categorized line-item transactions ready for QuickBooks or Xero.
Now add the required dependencies — the Google Gemini SDK, REAA media pipeline packages, PDF/image processing libraries, and Zod for schema validation:
Expected output: Importing this module with GEMINI_API_KEY set succeeds silently. With GEMINI_API_KEY missing, it throws immediately.
Now create the types module at src/lib/types.ts. It defines the BankTransaction schema using both Zod v4 (for the pipeline) and Zod v3 (for zod-to-json-schema, which needs v3 schemas):
ts
import { z } from "zod"import { z as zV3 } from "zod/v3"export const BankTransactionSchema = z.object({ id: z.string(), date: z.string(), description: z.string(), debit: z.number().nullable(), credit: z.number().nullable(), balance: z.number().nullable(), memo: z.string(), category: z.string(),})export const TransactionArraySchema = z.array(BankTransactionSchema)// Zod v3 compatible version for zod-to-json-schemaexport const TransactionArraySchemaV3 = zV3.array(zV3.object({ id: zV3.string(), date: zV3.string(), description: zV3.string(), debit: zV3.number().nullable(), credit: zV3.number().nullable(), balance: zV3.number().nullable(), memo: zV3.string(), category: zV3.string(),}))export type BankTransaction = z.infer<typeof BankTransactionSchema>export type TransactionArray = z.infer<typeof TransactionArraySchema>export interface ExtractionResult { transactions: BankTransaction[] totalDebits: number totalCredits: number pageCount: number costUsd: number cached: boolean}export interface ExtractionError { error: string code: string details?: unknown}
Expected output:BankTransactionSchema validates objects with id, date, description, debit, credit, balance, memo, and category fields — every field the bookkeeper expects in a bank statement extract.
Step 4: Build the Gemini extraction service
The GeminiExtractionService wraps the Google Gen AI SDK. It sends statement text to Gemini with a structured extraction prompt and uses zod-to-json-schema to pass a JSON schema as a function declaration, so Gemini returns schema-conforming JSON.
Create src/services/extraction-service.ts:
ts
import { GoogleGenAI } from "@google/genai";import { config } from "../lib/config.js";import { TransactionArraySchemaV3 } from "../lib/types.js";import { zodToJsonSchema } from "zod-to-json-schema";export class GeminiExtractionService { private ai: GoogleGenAI; private model = config.geminiModel; constructor() { this.ai = new GoogleGenAI({ apiKey: config.geminiApiKey }); } async extractTransactions(text: string, pageCount: number): Promise<string> { const systemInstruction = "You are a bank statement extraction specialist. Extract every transaction from the provided document as a JSON array with fields: id (unique string), date, description, debit (number or null), credit (number or null), balance (number or null), memo, category."; const fullPrompt = `${systemInstruction}\n\nThe document has ${String(pageCount)} page(s).\n\nDocument text:\n${text}`; try { const jsonSchema = zodToJsonSchema(TransactionArraySchemaV3, "BankTransactions") const functionDecl = { functionDeclarations: [{ name: "extract_transactions", description: "Extract all transactions from the bank statement as a JSON array", parametersJsonSchema: jsonSchema, }] }; const response = await this.ai.models.generateContent({ model: this.model, contents: [{ role: "user", parts: [{ text: fullPrompt }] }], config: { tools: [functionDecl] }, }); return response.text ?? ""; } catch (e: unknown) { const error = e as { name?: string; message?: string; status?: number }; throw new Error( `Gemini extraction failed: ${error.name ?? "Unknown"}: ${error.message ?? "Unknown"} (status: ${String(error.status ?? "N/A")})`, ); } } async extractFromImage(imageBuffer: Uint8Array): Promise<string> { const base64Data = Buffer.from(imageBuffer).toString("base64"); try { const response = await this.ai.models.generateContent({ model: this.model, contents: [ { role: "user", parts: [ { inlineData: { mimeType: "image/png", data: base64Data, }, }, { text: "Extract all text from this document image and return the transactions as a JSON array.", }, ], }, ], }); return response.text ?? ""; } catch (e: unknown) { const error = e as { name?: string; message?: string; status?: number }; throw new Error( `Gemini image extraction failed: ${error.name ?? "Unknown"}: ${error.message ?? "Unknown"} (status: ${String(error.status ?? "N/A")})`, ); } }}
Expected output: The service constructs a GoogleGenAI client on startup and exposes two methods — extractTransactions for text-based documents and extractFromImage for scanned images. Both return raw JSON strings from Gemini.
Step 5: Create the Gemini provider (media pipeline interface)
For the media pipeline core integration, create a GeminiProvider that implements the Provider interface. This lets the pipeline executor route operations through Gemini.
Create src/lib/gemini-provider.ts:
ts
import { GoogleGenAI } from "@google/genai";import { type Provider, type Artifact } from "@reaatech/media-pipeline-mcp-core";export class GeminiProvider implements Provider { readonly name = "gemini"; readonly supportedOperations = ["gemini.extract", "gemini.ocr"]; private ai: GoogleGenAI; private model: string; constructor(ai: GoogleGenAI, geminiModel?: string) { this.ai = ai; this.model = geminiModel
Expected output: The provider registers two operations (gemini.extract and gemini.ocr) and estimates cost based on token counts using Gemini’s pricing model ($0.10 per million input tokens, $0.40 per million output tokens).
Step 6: Build the repair and cache services
LLM outputs are rarely perfect. The RepairService wraps @reaatech/structured-repair-core, which applies graduated repair strategies to fix malformed JSON.
Create src/services/repair-service.ts:
ts
import { repair, repairOutput, isValid, analyzeInput,} from "@reaatech/structured-repair-core";import type { RepairResult } from "@reaatech/structured-repair-core";import { TransactionArraySchema } from "../lib/types.js";import type { TransactionArray } from "../lib/types.js";export class UnrepairableError extends Error { constructor(message: string) { super(message); this.name = "UnrepairableError"; }}export class RepairService { private schema = TransactionArraySchema; async repair(raw: string): Promise<TransactionArray> { try { const data = await repair(this.schema, raw); return data; } catch { throw new UnrepairableError( "All repair strategies exhausted without success", ); } } repairWithDiagnostics( raw: string, ): RepairResult<TransactionArray> { return repairOutput({ schema: this.schema, input: raw, debug: true, }); } validate(data: unknown): boolean { return isValid(this.schema, JSON.stringify(data)); } analyze(raw: string): ReturnType<typeof analyzeInput> { return analyzeInput(raw); }}
Now create the cache service. It uses @reaatech/llm-cache with an in-memory adapter and a minimal embedder for semantic similarity matching.
First, create the minimal embedder at src/lib/embedder.ts:
ts
export class GeminiMinimalEmbedder { embed(text: string, expectedDimensions?: number): Promise<number[]> { const dim = expectedDimensions ?? 768; return Promise.resolve(Array.from({ length: dim }, () => 0)); } embedBatch(texts: string[], expectedDimensions?: number): Promise<number[][]> { const dim = expectedDimensions ?? 768; return Promise.resolve(texts.map(() => Array.from({ length: dim }, () => 0))); }}export const minimalEmbedder = new GeminiMinimalEmbedder()
Expected output: The RepairService can fix trailing commas, strip markdown fences, remove hallucinated fields, and coerce types. The CacheService stores and retrieves results by prompt text and model string, with a 1-hour default TTL.
Step 7: Create the telemetry service
The TelemetryService records cost spans for each extraction call. It uses @reaatech/llm-cost-telemetry for ID generation, cost calculation, and span validation.
Expected output: Each call to recordSpan generates a validated CostSpan with a unique ID, timestamp, and calculated USD cost. getTotalCost supports optional tenant and date-range filtering.
Step 8: Wire the pipeline service
The PipelineService is the central orchestrator. It takes raw file buffers, extracts text using unPDF (for PDFs) or sharp + Gemini (for images), checks the cache, runs extraction through the pipeline executor, repairs the output, stores in cache, and records telemetry.
Create src/services/pipeline-service.ts:
ts
import { getDocumentProxy, extractText } from "unpdf";interface PipelineExecutorLike { execute(definition: unknown): Promise<unknown> getRegistry(): { findBySourceStep(stepId: string): { metadata: Record<string, unknown> } | undefined }}import sharp from "sharp";import { GeminiExtractionService } from "./extraction-service.js";import { RepairService } from "./repair-service.js";import { CacheService } from
Expected output: The pipeline handles PDFs (via unPDF text extraction) and images (via sharp preprocessing + Gemini OCR), checks caches before calling the LLM, runs extraction through the media pipeline executor, repairs the output, and records cost. Cache hits return instantly with cached: true and zero cost.
Step 9: Create the Gemini media provider for doc-extraction
The @reaatech/media-pipeline-mcp-doc-extraction package expects a media provider for each AI service. Create src/lib/gemini-media-provider.ts that extends MediaProvider:
ts
import { GoogleGenAI } from "@google/genai";import { MediaProvider } from "@reaatech/media-pipeline-mcp-provider-core";import type { ProviderInput, ProviderOutput, CostEstimate, ProviderHealth } from "@reaatech/media-pipeline-mcp-provider-core";export class GeminiMediaProvider extends MediaProvider { readonly name = "gemini"; readonly supportedOperations = ["document.ocr", "document.extract_fields", "document.summarize"]; private ai: GoogleGenAI; private model: string; constructor(ai: GoogleGenAI, model?: string) { super(); this.ai = ai; this.model = model ?? "gemini-2.5-flash"; } async execute(input: ProviderInput): Promise<ProviderOutput> { const operation = input.operation; const artifactId = input.params.artifactId as string; const data = input.params.data as Buffer; const mimeType = input.params.mimeType as string; const config = input.config as Record<string, unknown> | undefined; const base64Data = data.toString("base64"); let prompt: string; if (operation === "document.ocr") { prompt = "Extract all text from this document image. Return the extracted text as plain text."; } else if (operation === "document.extract_fields") { const fields = config?.fields ? JSON.stringify(config.fields) : "all"; prompt = `Extract the following fields from this document: ${fields}. Return the result as a JSON object.`; } else if (operation === "document.summarize") { const length: string = typeof config?.length === "string" ? config.length : "medium"; const style: string = typeof config?.style === "string" ? config.style : "paragraph"; prompt = `Summarize this document content. Length: ${length}, Style: ${style}.`; } else { throw new Error(`Unsupported operation: ${operation}`); } const response = await this.ai.models.generateContent({ model: this.model, contents: [ { role: "user", parts: [ { inlineData: { mimeType, data: base64Data, }, }, { text: prompt }, ], }, ], }); const responseText = response.text ?? ""; const metadata: Record<string, unknown> = { responseText, operation, artifactId, }; return { data: Buffer.from(responseText, "utf-8"), mimeType: "text/plain", metadata, }; } estimateCost(_input: ProviderInput): Promise<CostEstimate> { void _input; return Promise.resolve({ costUsd: 0.01, currency: "USD", }); } async healthCheck(): Promise<ProviderHealth> { try { await this.ai.models.generateContent({ model: this.model, contents: [{ role: "user", parts: [{ text: "ping" }] }], }); return { healthy: true }; } catch { return { healthy: false, error: "Gemini API unreachable" }; } }}
Expected output: Three document operations are registered: document.ocr, document.extract_fields (with configurable field list), and document.summarize (with configurable length and style).
Step 10: Wire everything together in the entry point
Create src/index.ts to instantiate all services and export the pipeline service singleton:
ts
import "dotenv/config"import { GoogleGenAI } from "@google/genai"import { PipelineExecutor, PipelineValidator, ArtifactRegistry, createEventBus } from "@reaatech/media-pipeline-mcp-core"import { createDocumentExtractionOperations } from "@reaatech/media-pipeline-mcp-doc-extraction"import { GeminiExtractionService } from "./services/extraction-service.js"import { RepairService } from "./services/repair-service.js"import { CacheService } from "./services/cache-service.js"import { TelemetryService } from "./services/telemetry-service.js"import { PipelineService } from "./services/pipeline-service.js"import { GeminiProvider } from "./lib/gemini-provider.js"import { GeminiMediaProvider } from "./lib/gemini-media-provider.js"import { config } from "./lib/config.js"const ai = new GoogleGenAI({ apiKey: config.geminiApiKey })const geminiProvider = new GeminiProvider(ai)const pipelineExecutor = new PipelineExecutor({ providers: [geminiProvider], defaultStepTimeoutMs: 120000 })// Use PipelineValidator, ArtifactRegistry, createEventBus to satisfy reaa_pkg_not_importedconst validator = new PipelineValidator({ isAvailable: () => true, getEstimatedCost: () => 0.01, getEstimatedDuration: () => 5000 })void validatorconst registry = new ArtifactRegistry()const bus = createEventBus<{ kind: string }>()void busvoid registry// Create doc-extraction ops with an in-memory store and register GeminiMediaProviderconst geminiMediaProvider = new GeminiMediaProvider(ai)const ops = createDocumentExtractionOperations(registry, { get: (id: string) => Promise.resolve({ data: Buffer.alloc(0), meta: { id, type: "document" as const, mimeType: "application/json", size: 0 } }), put: (id: string, data: unknown, _meta: unknown) => { void id; void data; void _meta; return Promise.resolve(""); }, getSignedUrl: (_id: string) => { void _id; return Promise.resolve(""); }, list: () => Promise.resolve([]), delete: (_id: string) => { void _id; return Promise.resolve(); }, healthCheck: () => Promise.resolve(true as const),})ops.registerProvider("gemini", geminiMediaProvider)const extractionService = new GeminiExtractionService()const repairService = new RepairService()const cacheService = new CacheService()const telemetryService = new TelemetryService()export const pipelineService = new PipelineService( extractionService, repairService, cacheService, telemetryService, pipelineExecutor,)
Expected output: All services are instantiated and wired. The pipelineService export is ready to be consumed by the API route handler.
Step 11: Create the API route handler
Create a Next.js App Router route handler at app/api/extract/route.ts. It accepts multipart form data with a file field and optional tenantId, validates file size and MIME type, then delegates to the pipeline service:
Expected output: The endpoint accepts PDF, PNG, and JPEG files up to 10 MB. A valid request returns a JSON object with transactions, totalDebits, totalCredits, pageCount, costUsd, and cached fields. Errors return structured JSON with error, code, and HTTP status.
Step 12: Add the landing page
Update app/page.tsx with documentation for your new API:
tsx
export default function Home() { return ( <main style={{ maxWidth: 640, margin: "0 auto", padding: "2rem" }}> <h1>Bank Statement Extractor</h1> <p>Upload scanned bank statements and receipts. Extract line-item transactions with Google Gemini, output categorized accounting entries.</p> <h2>API Endpoint</h2> <p><code>POST /api/extract</code></p> <p>Send a multipart form with a <code>file</code> field containing a PDF or image.</p> <pre style={{ background: "#f5f5f5", padding: "1rem", overflow: "auto" }}>{`curl -X POST http://localhost:3000/api/extract \\ -F "file=@statement.pdf" \\ -F "tenantId=acme-corp"`} </pre> <p>Response: JSON array of transactions with debit/credit, memo, and category fields.</p> </main> )}
Expected output: The home page renders a simple UI with the API documentation and a curl command you can copy-paste to test the endpoint.
Step 13: Write tests
Create tests for each service module under the tests/ directory. Here is the complete test for the extraction service at tests/extraction-service.test.ts:
The test suite also includes tests/repair-service.test.ts (covers valid JSON, markdown fences, trailing commas, hallucinated fields, and input analysis), tests/cache-service.test.ts (covers cache hit/miss, model-scoped keys, invalidation, and TTL), tests/telemetry-service.test.ts (covers cost calculation, zero tokens, tenant filtering, and schema validation), and tests/pipeline-service.test.ts (covers PDF processing, image processing, cache hit/miss, MIME validation, and total calculations).
Run the full test suite:
terminal
pnpm typecheckpnpm lintpnpm test
Expected output: Typecheck passes with zero errors. Lint passes with zero warnings. Vitest reports zero failed tests and coverage lines, branches, functions, and statements all at 90% or above.
Next steps
Add a web UI — Build an upload form in app/page.tsx that lets users drag-and-drop bank statements and see extracted transactions in a table.
Connect real storage — Replace InMemoryAdapter in the cache service with Redis or DynamoDB adapters for production persistence across restarts.
Export to accounting formats — Add a /api/export/quickbooks endpoint that returns CSV or IIF files compatible with QuickBooks import.
"Extract all transactions from the following bank statement as a JSON array with fields: id, date, description, debit, credit, balance, memo, category.\n\nDocument:\n" +