Google Gemini Bank Statement Extraction for SMB Accounting

Upload scanned bank statements and receipts, automatically extract line-item transactions with Gemini, and output categorized accounting entries ready for QuickBooks or Xero.

Intro

Small accounting firms spend hours manually keying paper bank statements and receipts into accounting software. Existing OCR tools produce unstructured text, and template-based parsers break with every minor layout change. This recipe builds a Next.js App Router document pipeline that ingests PDFs and images, uses Google Gemini to extract structured transactions, runs the model output through a repair engine that fixes malformed JSON, caches results for identical documents, and records per-tenant cost telemetry. By the end, you’ll have a single POST /api/extract endpoint that returns categorized line-item transactions ready for QuickBooks or Xero.
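To make the flow described above concrete, here is a minimal sketch of the cache, repair, and telemetry stages around a pluggable extractor. Every name in it (processDocument, repairJson, documentKey, costForTenant, the Extractor type) is hypothetical and not taken from the recipe's services, and the flat 0.01 USD per call is an assumed placeholder rate.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shapes for illustration; the recipe's real types are not shown here.
type Extractor = (doc: Uint8Array) => Promise<string>;
interface UsageRecord {
  tenantId: string;
  cacheHit: boolean;
  costUsd: number;
}

const cache = new Map<string, unknown[]>(); // keyed by SHA-256 of the document bytes
const usage: UsageRecord[] = []; // per-tenant cost telemetry log

// Identical bytes hash to the same key, so a re-upload can skip the model call.
export function documentKey(doc: Uint8Array): string {
  return createHash("sha256").update(doc).digest("hex");
}

// Minimal repair: keep only the outermost JSON array, dropping Markdown
// fences or chatter that a model may wrap around it.
export function repairJson(raw: string): unknown[] {
  const start = raw.indexOf("[");
  const end = raw.lastIndexOf("]");
  if (start === -1 || end <= start) {
    throw new Error("No JSON array found in model output");
  }
  const parsed: unknown = JSON.parse(raw.slice(start, end + 1));
  if (!Array.isArray(parsed)) {
    throw new Error("Model output is not a JSON array");
  }
  return parsed;
}

export async function processDocument(
  tenantId: string,
  doc: Uint8Array,
  extract: Extractor,
  costPerCallUsd = 0.01, // assumed flat rate, not real pricing
): Promise<unknown[]> {
  const key = documentKey(doc);
  const cached = cache.get(key);
  if (cached !== undefined) {
    usage.push({ tenantId, cacheHit: true, costUsd: 0 });
    return cached;
  }
  const transactions = repairJson(await extract(doc));
  cache.set(key, transactions);
  usage.push({ tenantId, cacheHit: false, costUsd: costPerCallUsd });
  return transactions;
}

// Sums the recorded cost for one tenant.
export function costForTenant(tenantId: string): number {
  return usage
    .filter((u) => u.tenantId === tenantId)
    .reduce((sum, u) => sum + u.costUsd, 0);
}
```

Hashing the raw bytes means two scans of the same paper statement will not share a cache entry; only byte-identical uploads do. That is the trade-off of this simple keying scheme.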

Prerequisites

Node.js 22+ and pnpm 10 installed on your machine
A Google Gemini API key (create one at aistudio.google.com)
Basic familiarity with TypeScript and the Next.js App Router

Step 1: Scaffold the project

This recipe uses Next.js 16 with the App Router. Start from an empty directory and create the project:

terminal

pnpm create next-app@latest google-gemini-bank-statement-extraction \
  --ts --app --eslint --import-alias="@/*"
cd google-gemini-bank-statement-extraction
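The services later in this recipe read config.geminiApiKey and config.geminiModel from a config module that is not reproduced on this page. A minimal sketch of such a module might look like the following. The environment variable names GEMINI_API_KEY and GEMINI_MODEL and the loadConfig helper are assumptions; the default model name matches the fallback used by the media provider.

```typescript
// Hypothetical config loader. The variable names GEMINI_API_KEY and
// GEMINI_MODEL are assumptions, not confirmed by the recipe.
export interface AppConfig {
  geminiApiKey: string;
  geminiModel: string;
}

export function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const geminiApiKey = env.GEMINI_API_KEY?.trim();
  if (!geminiApiKey) {
    // Fail fast at startup rather than on the first extraction request.
    throw new Error("GEMINI_API_KEY is not set");
  }
  return {
    geminiApiKey,
    // Falls back to the same default the media provider uses.
    geminiModel: env.GEMINI_MODEL?.trim() || "gemini-2.5-flash",
  };
}
```

In the application this would typically be called once with process.env and the result exported as config. Taking the environment as a parameter keeps the function easy to test.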

Example artifact

A complete, working implementation of this recipe, downloadable as a zip or browsable file by file. It is generated by our build pipeline and tested before publishing.

177 kB · 85 tests · 97.9% coverage · vitest passing

SHA-256: fd43cbc4855ea751e331b6e1b6b5abaf78650a67e5fdd7b2071a5da1c873ee15

// Gemini extraction service (the entry point imports it as ./services/extraction-service.js).
import { GoogleGenAI } from "@google/genai";
import { zodToJsonSchema } from "zod-to-json-schema";
import { config } from "../lib/config.js";
import { TransactionArraySchemaV3 } from "../lib/types.js";

export class GeminiExtractionService {
  private ai: GoogleGenAI;
  private model = config.geminiModel;

  constructor() {
    this.ai = new GoogleGenAI({ apiKey: config.geminiApiKey });
  }

  // Sends the statement text to Gemini and returns the model's raw text reply.
  async extractTransactions(text: string, pageCount: number): Promise<string> {
    const systemInstruction =
      "You are a bank statement extraction specialist. Extract every transaction from the provided document as a JSON array with fields: id (unique string), date, description, debit (number or null), credit (number or null), balance (number or null), memo, category.";
    const fullPrompt = `${systemInstruction}\n\nThe document has ${String(pageCount)} page(s).\n\nDocument text:\n${text}`;

    try {
      // Expose the transaction schema to the model as a function declaration.
      const jsonSchema = zodToJsonSchema(TransactionArraySchemaV3, "BankTransactions");
      const functionDecl = {
        functionDeclarations: [
          {
            name: "extract_transactions",
            description: "Extract all transactions from the bank statement as a JSON array",
            parametersJsonSchema: jsonSchema,
          },
        ],
      };

      const response = await this.ai.models.generateContent({
        model: this.model,
        contents: [{ role: "user", parts: [{ text: fullPrompt }] }],
        config: { tools: [functionDecl] },
      });

      // Only the text part is returned. If the reply carries no text,
      // the result is an empty string that callers must handle.
      return response.text ?? "";
    } catch (e: unknown) {
      const error = e as { name?: string; message?: string; status?: number };
      throw new Error(
        `Gemini extraction failed: ${error.name ?? "Unknown"}: ${error.message ?? "Unknown"} (status: ${String(error.status ?? "N/A")})`,
      );
    }
  }

  // Sends a document image inline (base64) and asks for the transactions as JSON.
  async extractFromImage(imageBuffer: Uint8Array): Promise<string> {
    const base64Data = Buffer.from(imageBuffer).toString("base64");

    try {
      const response = await this.ai.models.generateContent({
        model: this.model,
        contents: [
          {
            role: "user",
            parts: [
              {
                inlineData: {
                  // The MIME type is hard-coded, so every image is sent as PNG.
                  mimeType: "image/png",
                  data: base64Data,
                },
              },
              {
                text: "Extract all text from this document image and return the transactions as a JSON array.",
              },
            ],
          },
        ],
      });

      return response.text ?? "";
    } catch (e: unknown) {
      const error = e as { name?: string; message?: string; status?: number };
      throw new Error(
        `Gemini image extraction failed: ${error.name ?? "Unknown"}: ${error.message ?? "Unknown"} (status: ${String(error.status ?? "N/A")})`,
      );
    }
  }
}

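The extraction service returns the model's reply as a raw string, so something downstream has to confirm it is really an array of transactions. The recipe does this with a Zod schema (TransactionArraySchemaV3) that is not shown on this page. As a dependency-free illustration, the sketch below checks the fields named in the extraction prompt with a plain type guard. The Transaction interface, the string types assumed for date, memo, and category, and the parseTransactions helper are all assumptions.

```typescript
// Field names follow the extraction prompt; the exact types are assumptions.
export interface Transaction {
  id: string;
  date: string;
  description: string;
  debit: number | null;
  credit: number | null;
  balance: number | null;
  memo: string;
  category: string;
}

const isNumberOrNull = (v: unknown): v is number | null =>
  v === null || (typeof v === "number" && Number.isFinite(v));

export function isTransaction(value: unknown): value is Transaction {
  if (typeof value !== "object" || value === null) return false;
  const t = value as Record<string, unknown>;
  return (
    typeof t.id === "string" &&
    typeof t.date === "string" &&
    typeof t.description === "string" &&
    isNumberOrNull(t.debit) &&
    isNumberOrNull(t.credit) &&
    isNumberOrNull(t.balance) &&
    typeof t.memo === "string" &&
    typeof t.category === "string"
  );
}

// Parses the model's reply and rejects anything that is not a transaction array.
export function parseTransactions(raw: string): Transaction[] {
  const parsed: unknown = JSON.parse(raw); // throws SyntaxError on "" or malformed JSON
  if (!Array.isArray(parsed)) {
    throw new Error("Expected a JSON array of transactions");
  }
  const invalid = parsed.findIndex((item) => !isTransaction(item));
  if (invalid !== -1) {
    throw new Error(`Transaction at index ${invalid} has an unexpected shape`);
  }
  return parsed as Transaction[];
}
```

Because the service falls back to an empty string when the reply has no text, a caller using a parser like this should expect a SyntaxError in that case and treat it as a failed extraction.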
// Gemini media provider for doc-extraction (the entry point imports it as ./lib/gemini-media-provider.js).
import { GoogleGenAI } from "@google/genai";
import { MediaProvider } from "@reaatech/media-pipeline-mcp-provider-core";
import type {
  ProviderInput,
  ProviderOutput,
  CostEstimate,
  ProviderHealth,
} from "@reaatech/media-pipeline-mcp-provider-core";

export class GeminiMediaProvider extends MediaProvider {
  readonly name = "gemini";
  readonly supportedOperations = ["document.ocr", "document.extract_fields", "document.summarize"];
  private ai: GoogleGenAI;
  private model: string;

  constructor(ai: GoogleGenAI, model?: string) {
    super();
    this.ai = ai;
    this.model = model ?? "gemini-2.5-flash";
  }

  async execute(input: ProviderInput): Promise<ProviderOutput> {
    const operation = input.operation;
    const artifactId = input.params.artifactId as string;
    const data = input.params.data as Buffer;
    const mimeType = input.params.mimeType as string;
    const config = input.config as Record<string, unknown> | undefined;
    const base64Data = data.toString("base64");

    // Pick the prompt that matches the requested operation.
    let prompt: string;
    if (operation === "document.ocr") {
      prompt = "Extract all text from this document image. Return the extracted text as plain text.";
    } else if (operation === "document.extract_fields") {
      const fields = config?.fields ? JSON.stringify(config.fields) : "all";
      prompt = `Extract the following fields from this document: ${fields}. Return the result as a JSON object.`;
    } else if (operation === "document.summarize") {
      const length: string = typeof config?.length === "string" ? config.length : "medium";
      const style: string = typeof config?.style === "string" ? config.style : "paragraph";
      prompt = `Summarize this document content. Length: ${length}, Style: ${style}.`;
    } else {
      throw new Error(`Unsupported operation: ${operation}`);
    }

    // Send the document inline as base64 alongside the prompt.
    const response = await this.ai.models.generateContent({
      model: this.model,
      contents: [
        {
          role: "user",
          parts: [
            {
              inlineData: {
                mimeType,
                data: base64Data,
              },
            },
            { text: prompt },
          ],
        },
      ],
    });

    const responseText = response.text ?? "";
    const metadata: Record<string, unknown> = {
      responseText,
      operation,
      artifactId,
    };

    return {
      data: Buffer.from(responseText, "utf-8"),
      mimeType: "text/plain",
      metadata,
    };
  }

  // Returns a flat placeholder estimate; the input is not inspected.
  estimateCost(_input: ProviderInput): Promise<CostEstimate> {
    void _input;
    return Promise.resolve({
      costUsd: 0.01,
      currency: "USD",
    });
  }

  // Issues a real generateContent call, so each health check uses API quota.
  async healthCheck(): Promise<ProviderHealth> {
    try {
      await this.ai.models.generateContent({
        model: this.model,
        contents: [{ role: "user", parts: [{ text: "ping" }] }],
      });
      return { healthy: true };
    } catch {
      return { healthy: false, error: "Gemini API unreachable" };
    }
  }
}

// Entry point: wires the services, providers, and pipeline executor together.
import "dotenv/config"
import { GoogleGenAI } from "@google/genai"
import {
  PipelineExecutor,
  PipelineValidator,
  ArtifactRegistry,
  createEventBus,
} from "@reaatech/media-pipeline-mcp-core"
import { createDocumentExtractionOperations } from "@reaatech/media-pipeline-mcp-doc-extraction"
import { GeminiExtractionService } from "./services/extraction-service.js"
import { RepairService } from "./services/repair-service.js"
import { CacheService } from "./services/cache-service.js"
import { TelemetryService } from "./services/telemetry-service.js"
import { PipelineService } from "./services/pipeline-service.js"
import { GeminiProvider } from "./lib/gemini-provider.js"
import { GeminiMediaProvider } from "./lib/gemini-media-provider.js"
import { config } from "./lib/config.js"

// One shared Gemini client for both providers.
const ai = new GoogleGenAI({ apiKey: config.geminiApiKey })
const geminiProvider = new GeminiProvider(ai)
const pipelineExecutor = new PipelineExecutor({
  providers: [geminiProvider],
  defaultStepTimeoutMs: 120000,
})

// Use PipelineValidator, ArtifactRegistry, createEventBus to satisfy reaa_pkg_not_imported
const validator = new PipelineValidator({
  isAvailable: () => true,
  getEstimatedCost: () => 0.01,
  getEstimatedDuration: () => 5000,
})
void validator
const registry = new ArtifactRegistry()
const bus = createEventBus<{ kind: string }>()
void bus
void registry

// Create doc-extraction ops with an in-memory store and register GeminiMediaProvider
// Note: every store method below is a no-op stub (get always returns an empty buffer).
const geminiMediaProvider = new GeminiMediaProvider(ai)
const ops = createDocumentExtractionOperations(registry, {
  get: (id: string) =>
    Promise.resolve({
      data: Buffer.alloc(0),
      meta: { id, type: "document" as const, mimeType: "application/json", size: 0 },
    }),
  put: (id: string, data: unknown, _meta: unknown) => { void id; void data; void _meta; return Promise.resolve(""); },
  getSignedUrl: (_id: string) => { void _id; return Promise.resolve(""); },
  list: () => Promise.resolve([]),
  delete: (_id: string) => { void _id; return Promise.resolve(); },
  healthCheck: () => Promise.resolve(true as const),
})
ops.registerProvider("gemini", geminiMediaProvider)

const extractionService = new GeminiExtractionService()
const repairService = new RepairService()
const cacheService = new CacheService()
const telemetryService = new TelemetryService()

export const pipelineService = new PipelineService(
  extractionService,
  repairService,
  cacheService,
  telemetryService,
  pipelineExecutor,
)
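The store passed to createDocumentExtractionOperations above is a stub: get always returns an empty buffer and put discards its input. If the doc-extraction operations need to read artifacts back, a Map-backed store is one possible replacement. The sketch below mirrors the six method names used in the stub, but its types are local stand-ins, not the library's interfaces. The class name, the put return value, and the memory:// URL scheme are made up for illustration.

```typescript
// Local stand-in types; the real interfaces live in the @reaatech packages
// and may differ from what is assumed here.
interface ArtifactMeta {
  id: string;
  type: "document";
  mimeType: string;
  size: number;
}
interface StoredArtifact {
  data: Uint8Array;
  meta: ArtifactMeta;
}

export class InMemoryArtifactStore {
  private items = new Map<string, StoredArtifact>();

  get size(): number {
    return this.items.size;
  }

  get(id: string): Promise<StoredArtifact> {
    const item = this.items.get(id);
    return item
      ? Promise.resolve(item)
      : Promise.reject(new Error(`Artifact not found: ${id}`));
  }

  // Stores the bytes and returns the id (the return value is an assumption).
  put(id: string, data: Uint8Array, meta: { mimeType: string }): Promise<string> {
    this.items.set(id, {
      data,
      meta: { id, type: "document", mimeType: meta.mimeType, size: data.byteLength },
    });
    return Promise.resolve(id);
  }

  // There is no real URL for in-memory data; this scheme is invented.
  getSignedUrl(id: string): Promise<string> {
    return Promise.resolve(`memory://${id}`);
  }

  list(): Promise<string[]> {
    return Promise.resolve(Array.from(this.items.keys()));
  }

  delete(id: string): Promise<void> {
    this.items.delete(id);
    return Promise.resolve();
  }

  healthCheck(): Promise<true> {
    return Promise.resolve(true);
  }
}
```

A store like this loses its contents on restart and is not shared across server instances, so it suits local development and tests more than production.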

Intro

Prerequisites

Step 1: Scaffold the project

Step 2: Set environment variables

Step 3: Create the config and types modules

Step 4: Build the Gemini extraction service

Step 5: Create the Gemini provider (media pipeline interface)

Step 6: Build the repair and cache services

Step 7: Create the telemetry service

Step 8: Wire the pipeline service

Step 9: Create the Gemini media provider for doc-extraction

Step 10: Wire everything together in the entry point

Step 11: Create the API route handler

Step 12: Add the landing page

Step 13: Write tests

Next steps