Azure AI Document Pipeline for QuickBooks Online Invoice Processing
Automatically extract, validate, and post vendor invoice data from PDFs or images directly into QuickBooks Online using Azure OpenAI.
The problem
SMBs waste hours manually keying invoice data into QuickBooks, risking errors, delayed payments, and poor cash-flow visibility.
Example artifact
A complete, working implementation of this recipe, downloadable as a zip or browsable file by file. Generated by our build pipeline; tested with full coverage before publishing.
76 tests · 100.0% coverage · vitest passing
© 2026 REAA Technologies Inc. MIT-licensed open-source packages. No vendor lock-in. No bullshit.
Intro
In this tutorial, you’ll build an invoice processing pipeline that reads a PDF or image file, sends it to Azure OpenAI for field extraction, validates the extracted data using confidence scoring, redacts any PII before posting, and creates a bill in QuickBooks Online — all within a Next.js app. By the end, you’ll have a working endpoint at POST /api/upload that accepts a multipart/form-data document and a GET /api/status/:jobId route for polling the result. The pipeline enforces a per-document cost budget so you don’t accidentally blow through your Azure quota on a batch of large files.
Prerequisites
Node.js 22 or higher
pnpm 10.x (npm install -g pnpm)
An Azure OpenAI deployment of a vision-capable chat model (the examples use a deployment named gpt-4), with its endpoint, API key, and deployment name
A QuickBooks Online OAuth2 app with client ID, client secret, realm ID, and refresh token
Step 1: Scaffold the Next.js project
Create a fresh Next.js project using the App Router. The app/ directory holds your route handlers — app/api/upload/route.ts for the upload endpoint and app/api/status/[jobId]/route.ts for the status endpoint.
pnpm create next@latest invoice-pipeline --typescript --no-eslint --no-tailwind --no-src-dir --app --import-alias "@/*"
Expected output: Next.js scaffolding completes and prints the project structure, ending with:
Created invoice-pipeline/.
Step 2: Install dependencies
The project uses Azure OpenAI for parsing, pdfjs-dist + sharp for document preprocessing, zod for schema validation, and several @reaatech packages for confidence routing, PII redaction, and budget enforcement. Install them all along with the dev dependencies for testing.
cd invoice-pipeline
pnpm add \
@azure/openai@2.0.0 \
@reaatech/agent-budget-engine@0.1.0 \
@reaatech/agent-budget-spend-tracker@0.1.0 \
@reaatech/agent-budget-types@0.1.0 \
@reaatech/confidence-router-classifiers@0.1.0 \
@reaatech/confidence-router-core@0.1.0 \
@reaatech/guardrail-chain@0.1.0 \
@reaatech/guardrail-chain-guardrails@0.1.0 \
jose@6.2.3 \
pdfjs-dist@5.7.284 \
sharp@0.34.5 \
zod@3.23.8
pnpm add -D \
@types/node@25.8.0 \
@vitest/coverage-v8@4.1.6 \
vitest@4.1.6 \
typescript@6.0.3
Expected output: pnpm prints a table of added packages and finishes with Done in Xs.
Step 3: Configure environment variables
Create a .env.local file in the project root with your Azure OpenAI credentials, QuickBooks OAuth2 app values, and the per-document budget cap. The app reads these via loadConfigFromEnv() in src/types/config.ts.
# Azure OpenAI
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_API_KEY=your-api-key
AZURE_OPENAI_DEPLOYMENT=gpt-4
AZURE_PRICE_INPUT=0.001
AZURE_PRICE_OUTPUT=0.003
# QuickBooks Online OAuth2
QUICKBOOKS_CLIENT_ID=your-client-id
QUICKBOOKS_CLIENT_SECRET=your-client-secret
QUICKBOOKS_REALM_ID=your-realm-id
QUICKBOOKS_REFRESH_TOKEN=your-refresh-token
# Budget enforcement — max USD spent per document
BUDGET_PER_DOCUMENT=0.05
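To sanity-check the budget cap against the prices above, here is a minimal sketch of the arithmetic. It assumes the two price variables are USD per 1,000 tokens, which is how the AzureOpenAIPricing class later in this tutorial treats them. The estimateCostUsd helper and the token counts are illustrative only.

```typescript
// Hypothetical helper mirroring the per-1K-token formula used in src/budget/index.ts.
function estimateCostUsd(
  inputTokens: number,
  outputTokens: number,
  priceInputPer1k: number,
  priceOutputPer1k: number,
): number {
  return (inputTokens / 1000) * priceInputPer1k + (outputTokens / 1000) * priceOutputPer1k;
}

// 500 input + 500 output tokens at the .env.local prices shown above.
const cost = estimateCostUsd(500, 500, 0.001, 0.003);
const budgetPerDocument = 0.05;
const withinBudget = cost <= budgetPerDocument;
console.log(cost.toFixed(4), withinBudget);
```

At these prices a 500/500-token call costs roughly 0.002 USD, so the 0.05 cap leaves room for about 25 such calls per document.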
Step 4: Define shared types
Create the type definitions that the entire pipeline uses. These live in src/types/.
Create src/types/invoice.ts:
export interface LineItem {
description : string ;
quantity : number ;
unitPrice : number ;
amount : number ;
}
export interface InvoiceData {
vendorName : string ;
invoiceNumber : string ;
invoiceDate : string ;
dueDate : string ;
totalAmount : number ;
currency : string ;
lineItems : LineItem [];
}
export interface ExtractedInvoiceData extends InvoiceData {
confidenceScores ?: Record < string , number >;
}
Create src/types/pipeline.ts:
import type { ExtractedInvoiceData } from "./invoice.js" ;
export enum PipelineStatus {
Processing = "Processing" ,
Completed = "Completed" ,
Failed = "Failed" ,
NeedsReview = "NeedsReview" ,
}
export interface ProcessingResult {
status : PipelineStatus ;
data ?: ExtractedInvoiceData ;
errors : string [];
needsReview : boolean ;
}
export interface JobRecord {
id : string ;
fileBuffer ?: Buffer ;
status : PipelineStatus ;
createdAt : Date ;
result ?: ProcessingResult ;
}
Create src/types/config.ts:
export interface AppConfig {
azure : {
endpoint : string ;
apiKey : string ;
deployment : string ;
priceInput : number ;
priceOutput : number ;
};
quickbooks : {
clientId : string ;
clientSecret : string ;
realmId : string ;
refreshToken : string ;
};
budget : {
perDocument : number ;
};
thresholds : {
route : number ;
fallback : number ;
};
repair : {
maxAttempts : number ;
};
}
export function loadConfigFromEnv () : AppConfig {
const missing : string [] = [];
const or = (key : string , fallback : string ) : string =>
process.env[key] ?? fallback;
const req = (key : string ) : string => {
const val = process.env[key];
if (val === undefined || val === "" ) {
missing. push (key);
return "<missing>" ;
}
return val;
};
const num = (key : string , fallback : number ) : number => {
const raw = process.env[key] ?? String (fallback);
const parsed = Number. parseFloat (raw);
if ( ! Number. isFinite (parsed)) {
return fallback;
}
return parsed;
};
const config : AppConfig = {
azure: {
endpoint: req ( "AZURE_OPENAI_ENDPOINT" ),
apiKey: req ( "AZURE_OPENAI_API_KEY" ),
deployment: req ( "AZURE_OPENAI_DEPLOYMENT" ),
priceInput: num ( "AZURE_PRICE_INPUT" , 0.001 ),
priceOutput: num ( "AZURE_PRICE_OUTPUT" , 0.003 ),
},
quickbooks: {
clientId: req ( "QUICKBOOKS_CLIENT_ID" ),
clientSecret: req ( "QUICKBOOKS_CLIENT_SECRET" ),
realmId: req ( "QUICKBOOKS_REALM_ID" ),
refreshToken: or ( "QUICKBOOKS_REFRESH_TOKEN" , "" ),
},
budget: {
perDocument: num ( "BUDGET_PER_DOCUMENT" , 0.05 ),
},
thresholds: {
route: 0.8 ,
fallback: 0.3 ,
},
repair: {
maxAttempts: 3 ,
},
};
if (missing.length > 0 ) {
throw new Error ( `Missing required env vars: ${ missing . join ( ", " ) }` );
}
return config;
}
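To illustrate how the num helper above behaves, this standalone sketch reimplements it against a plain object instead of process.env. The numFrom name and the env parameter are hypothetical, introduced only so the snippet runs in isolation.

```typescript
type Env = Record<string, string | undefined>;

// Standalone copy of the num() fallback logic, reading from an explicit object.
function numFrom(env: Env, key: string, fallback: number): number {
  const raw = env[key] ?? String(fallback);
  const parsed = Number.parseFloat(raw);
  return Number.isFinite(parsed) ? parsed : fallback;
}

const env: Env = { BUDGET_PER_DOCUMENT: "0.10", AZURE_PRICE_INPUT: "abc", AZURE_PRICE_OUTPUT: "" };
const budget = numFrom(env, "BUDGET_PER_DOCUMENT", 0.05);   // parsed from the string
const priceIn = numFrom(env, "AZURE_PRICE_INPUT", 0.001);   // not a number, falls back
const priceOut = numFrom(env, "AZURE_PRICE_OUTPUT", 0.003); // empty string, falls back
const unset = numFrom(env, "NOT_SET", 42);                  // undefined, falls back
console.log({ budget, priceIn, priceOut, unset });
```

One thing to keep in mind: Number.parseFloat accepts trailing characters, so a value such as "0.05usd" is read as 0.05 rather than rejected.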
Step 5: Build the pipeline modules
The pipeline has four stages: parse, repair, validate, and export. Each is a separate module.
Job store
Create src/lib/job-store.ts. The store is an in-memory Map, so job records are lost whenever the server process restarts:
import type { JobRecord } from "../types/pipeline.js" ;
export const jobStore = new Map < string , JobRecord >();
Document parser
Create src/ingestion/parser.ts. This module accepts a PDF or image buffer, converts images to base64 PNG via sharp, sends the document to Azure OpenAI’s chat completions endpoint with a vision prompt, and returns structured invoice fields with heuristic confidence scores (0.9 for a field that is present, 0.3 for one that is missing or empty).
import { getDocument } from "pdfjs-dist/legacy/build/pdf.mjs" ;
import sharp from "sharp" ;
import type { AppConfig } from "../types/config.js" ;
import type { ExtractedInvoiceData } from "../types/invoice.js" ;
const MAX_FILE_SIZE = 20 * 1024 * 1024 ; // 20 MB
const AI_TIMEOUT_MS = 15_000 ;
function computeConfidence (data : Record < string , unknown >, scores : Record < string , number >) : Record < string ,
Structured repair
Create src/repair/repair.ts. This module uses zod to parse and coerce the raw Azure output. It converts numbers that arrive as strings back into numbers, normalizes snake_case field names to camelCase, and falls back to alternative keys (vendor, supplier, from) when vendorName is missing.
import { z } from "zod" ;
import type { ExtractedInvoiceData } from "../types/invoice.js" ;
import { PipelineStatus } from "../types/pipeline.js" ;
const lineItemSchema = z. object ({
description: z. string (),
quantity: z. union ([z. number (), z. string ()]). transform ((v) => {
if ( typeof v === "number" ) return v;
const n = Number. parseFloat (v);
return Number. isNaN (n) ? 0 : n;
Confidence classification
Create src/classify/index.ts. This module registers a keyword-based classifier with the ClassifierRegistry, merges the route/fallback thresholds from config, and runs the DecisionEngine over each extracted field. Fields below the fallback threshold (0.3) are flagged for human review.
import {
DecisionEngine,
mergeConfig,
type Prediction,
type RoutingDecision,
} from "@reaatech/confidence-router-core" ;
import {
KeywordClassifier,
ClassifierRegistry,
} from "@reaatech/confidence-router-classifiers" ;
import type { ExtractedInvoiceData } from "../types/invoice.js" ;
import type { AppConfig } from "../types/config.js" ;
export async function validateInvoiceFields (
data : ExtractedInvoiceData ,
config : AppConfig ,
) : Promise <{
routing : Record < string , RoutingDecision [ "type" ]>;
needsReview : boolean ;
reviewFields : string [];
}> {
const registry = new ClassifierRegistry ();
const keywordClassifier = new KeywordClassifier (
[
{
label: "high_confidence_vendor" ,
keywords: [ "inc" , "llc" , "corp" , "ltd" , "gmbh" , "co." , "company" ],
weight: 1.0 ,
mode: "substring" ,
},
],
{ name: "vendor-quality" , caseSensitive: false },
);
registry. register (keywordClassifier);
const routerConfig = mergeConfig({
  routeThreshold: config.thresholds.route,
  fallbackThreshold: config.thresholds.fallback,
  clarificationEnabled: true,
});
const engine = new DecisionEngine(routerConfig);
const scores = data.confidenceScores ?? {};
const fields = [ "vendorName" , "invoiceNumber" , "invoiceDate" , "dueDate" , "totalAmount" , "currency" ] as const ;
const vendorClassification = await keywordClassifier. classify (data.vendorName);
const vendorBoost = vendorClassification.predictions?.[ 0 ]?.confidence ?? 0 ;
const predictions : Prediction [] = fields. map ((field) => {
const rawScore = scores[field];
let confidence = rawScore === undefined || rawScore === null ? 0 : rawScore;
if (field === "vendorName" && vendorBoost > 0 && confidence >= 0.7 ) {
confidence = Math. min ( 1 , confidence + 0.05 );
}
return { label: field, confidence };
});
const fieldRouting : Record < string , RoutingDecision [ "type" ]> = {};
const reviewFields : string [] = [];
let needsReview = false ;
for ( const pred of predictions) {
const decision = engine. decide ({
predictions: [pred],
});
fieldRouting[pred.label] = decision.type;
if (decision.type === "FALLBACK" ) {
needsReview = true ;
reviewFields. push (pred.label);
}
}
return { routing: fieldRouting, needsReview, reviewFields };
}
export function getReviewFields (data : ExtractedInvoiceData ) : string [] {
return Object. entries (data.confidenceScores ?? {})
. filter (([, score]) => score < 0.3 )
. map (([field]) => field);
}
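As a rough mental model of the two thresholds, the sketch below maps a confidence score to one of three outcomes. It is a simplified stand-in written for this tutorial, not the DecisionEngine implementation: the "ROUTE" and "CLARIFY" labels and the exact boundary comparisons are assumptions (only "FALLBACK" appears in the code above).

```typescript
type Outcome = "ROUTE" | "CLARIFY" | "FALLBACK";

// Hypothetical stand-in for threshold routing; boundary handling is an assumption.
function routeByConfidence(
  confidence: number,
  routeThreshold = 0.8,
  fallbackThreshold = 0.3,
): Outcome {
  if (confidence >= routeThreshold) return "ROUTE";
  if (confidence < fallbackThreshold) return "FALLBACK";
  return "CLARIFY";
}

const sampleScores: Record<string, number> = { vendorName: 0.9, dueDate: 0.5, totalAmount: 0.1 };
const sampleReviewFields = Object.entries(sampleScores)
  .filter(([, score]) => routeByConfidence(score) === "FALLBACK")
  .map(([field]) => field);
console.log(sampleReviewFields);
```

Under this reading, a score of exactly 0.3 (the value the parser assigns to a missing field) lands in the middle band rather than in review, which matches the strict score < 0.3 comparison in getReviewFields. It is worth checking how the library treats that boundary before relying on it.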
QuickBooks client and export
Create src/export/client.ts. This client manages OAuth2 token refresh using jose, caches the token in module-level variables, and retries on 401 and 429 responses.
export interface QBOBillLine {
Id : string ;
DetailType : "ItemBasedExpenseLineDetail" ;
Amount : number ;
Description : string ;
ItemBasedExpenseLineDetail : {
ItemRef : { value : string ; name : string };
Qty : number ;
UnitPrice : number ;
};
}
export interface QBOBillPayload {
VendorRef : { value : string ; name : string };
DocNumber
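Create src/export/quickbooks.ts. This module applies the PIIRedaction guardrail from @reaatech/guardrail-chain-guardrails to the vendor name, invoice number, and line-item descriptions before constructing the QBO Bill payload. If the guardrail does not return a usable output, it falls back to regex-based redaction of email addresses, phone numbers, SSN-like numbers, and 16-digit card numbers.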
import type { ExtractedInvoiceData } from "../types/invoice.js" ;
import { QuickBooksClient } from "./client.js" ;
import { PIIRedaction } from "@reaatech/guardrail-chain-guardrails" ;
import { createChainContext } from "@reaatech/guardrail-chain" ;
async function redactPII (text : string ) : Promise < string > {
const guardrail = new PIIRedaction ({ redactionStrategy: "mask" });
const context = createChainContext (text, { maxLatencyMs: 1000 , maxTokens: 4000 });
const result = await guardrail. execute (text, context);
if (result.passed && result.output !== undefined ) {
return result.output;
}
let output = text;
output = output.replace(/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g, "[REDACTED]");
output = output.replace(/(?:\+?1[-.\s]?)?\(?[0-9]{3}\)?[-.\s]?[0-9]{3}[-.\s]?[0-9]{4}/g, "[REDACTED]");
output = output.replace(/\b\d{3}[-.\s]?\d{2}[-.\s]?\d{4}\b/g, "[REDACTED]");
output = output.replace(/\b(?:\d{4}[-.\s]?){3}\d{4}\b/g, "[REDACTED]");
return output;
}
export async function postInvoiceToQBO (
data : ExtractedInvoiceData ,
client : Pick < QuickBooksClient , "postBill" >,
) : Promise <{ billId : string ; status : string }> {
const [redactedVendor, redactedInvoiceNumber, ... redactedDescriptions] = await Promise . all ([
redactPII (data.vendorName),
redactPII (data.invoiceNumber),
... data.lineItems. map ((item) => redactPII (item.description)),
]);
const lines = data.lineItems. map ((item, i) => ({
Id: String (i + 1 ),
DetailType: "ItemBasedExpenseLineDetail" as const ,
Amount: item.amount,
Description: redactedDescriptions[i] ?? item.description,
ItemBasedExpenseLineDetail: {
ItemRef: { value: "1" , name: "Services" },
Qty: item.quantity,
UnitPrice: item.unitPrice,
},
}));
const bill = {
VendorRef: { value: "1" , name: redactedVendor },
DocNumber: redactedInvoiceNumber,
TxnDate: data.invoiceDate,
DueDate: data.dueDate,
TotalAmt: data.totalAmount,
CurrencyRef: { value: data.currency },
Line: lines,
};
const result = await client. postBill (bill);
return { billId: result.Id, status: result.status };
}
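To see what the regex fallback in redactPII does on its own, the sketch below applies the same four patterns in the same order to a few sample strings. The regexRedact name is hypothetical, and the snippet does not exercise the PIIRedaction guardrail itself, whose behavior may differ.

```typescript
// Same four patterns as the fallback branch of redactPII, applied in the same order.
const PII_PATTERNS: RegExp[] = [
  /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g,               // email addresses
  /(?:\+?1[-.\s]?)?\(?[0-9]{3}\)?[-.\s]?[0-9]{3}[-.\s]?[0-9]{4}/g, // US-style phone numbers
  /\b\d{3}[-.\s]?\d{2}[-.\s]?\d{4}\b/g,                            // SSN-like numbers
  /\b(?:\d{4}[-.\s]?){3}\d{4}\b/g,                                 // 16-digit card numbers
];

function regexRedact(text: string): string {
  return PII_PATTERNS.reduce((out, re) => out.replace(re, "[REDACTED]"), text);
}

const contactLine = regexRedact("Contact billing@acme.com or 555-123-4567");
const plainLine = regexRedact("Consulting services");
const invoiceNo = regexRedact("INV-2024001234");
console.log(contactLine, plainLine, invoiceNo);
```

The third sample shows a side effect to watch for: the phone pattern has no word-boundary anchors, so any run of ten digits matches, and an invoice number such as INV-2024001234 comes out as INV-[REDACTED] on the fallback path.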
Budget enforcement
Create src/budget/index.ts. This module wraps the BudgetController from @reaatech/agent-budget-engine, implements an AzureOpenAIPricing class to compute token costs, and exposes defineDocumentBudget, checkBudget, and recordSpend for the pipeline.
import { BudgetController } from "@reaatech/agent-budget-engine" ;
import { SpendStore } from "@reaatech/agent-budget-spend-tracker" ;
import { BudgetScope, BudgetPolicy } from "@reaatech/agent-budget-types" ;
interface PricingProvider {
estimateCost (args : { model : string ; inputTokens : number ; outputTokens : number }) : number ;
}
export class AzureOpenAIPricing implements PricingProvider {
private readonly priceInput : number ;
private readonly priceOutput : number ;
constructor (priceInput : number , priceOutput : number ) {
this.priceInput = priceInput;
this.priceOutput = priceOutput;
}
estimateCost (args : { model : string ; inputTokens : number ; outputTokens : number }) : number {
void args.model;
const inputCost = (args.inputTokens / 1000 ) * this.priceInput;
const outputCost = (args.outputTokens / 1000 ) * this.priceOutput;
return inputCost + outputCost;
}
}
interface BudgetCheckResult {
allowed : boolean ;
remaining : number ;
spent : number ;
}
const store = new SpendStore ();
const controller = new BudgetController ({ spendTracker: store });
controller. on ( "hard-stop" , (event) => {
console. error ( `[Budget] Hard stop triggered for ${ event . scopeType }:${ event . scopeKey }, spent: ${ event . spent }, limit: ${ event . limit }` );
});
controller. on ( "threshold-breach" , (event) => {
console. warn ( `[Budget] Threshold breach at ${ ( event . threshold * 100 ). toFixed ( 0 ) }% for ${ event . scopeType }:${ event . scopeKey }` );
});
export function defineDocumentBudget (jobId : string , limit : number ) : void {
const policy : BudgetPolicy = {
softCap: 0.8 ,
hardCap: 1.0 ,
autoDowngrade: [],
disableTools: [],
};
try {
controller. defineBudget ({
scopeType: BudgetScope.Task,
scopeKey: jobId,
limit,
policy,
});
} catch {
void 0 ;
}
}
export function checkBudget (jobId : string , estimatedCost : number ) : BudgetCheckResult {
const result = controller. check ({
scopeType: BudgetScope.Task,
scopeKey: jobId,
estimatedCost,
modelId: "azure-openai" ,
tools: [],
});
return {
allowed: result.allowed,
remaining: result.remaining,
spent: 0 ,
};
}
export function recordSpend (
jobId : string ,
cost : number ,
inputTokens : number ,
outputTokens : number ,
) : void {
controller. record ({
requestId: `doc-${ jobId }` ,
scopeType: BudgetScope.Task,
scopeKey: jobId,
cost,
inputTokens,
outputTokens,
modelId: "azure-openai" ,
provider: "azure" ,
timestamp: new Date (),
});
}
export function getDocumentState (jobId : string ) : import ( "@reaatech/agent-budget-types" ). BudgetState {
return controller. getState (BudgetScope.Task, jobId) as import ( "@reaatech/agent-budget-types" ). BudgetState ;
}
export function resetDocumentBudget (jobId : string ) : void {
controller. reset (BudgetScope.Task, jobId);
}
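The policy above sets softCap to 0.8 and hardCap to 1.0, and the controller emits threshold-breach and hard-stop events. The sketch below shows one plausible reading of those numbers, assuming both caps are fractions of the budget limit. It is a stand-in for illustration, not the BudgetController logic, and classifySpend is a hypothetical name.

```typescript
type BudgetSignal = "ok" | "threshold-breach" | "hard-stop";

// Hypothetical interpretation: caps are fractions of the limit (an assumption).
function classifySpend(spent: number, limit: number, softCap = 0.8, hardCap = 1.0): BudgetSignal {
  const ratio = spent / limit;
  if (ratio >= hardCap) return "hard-stop";
  if (ratio >= softCap) return "threshold-breach";
  return "ok";
}

console.log(classifySpend(0.002, 0.05)); // one small call against a 0.05 USD cap
console.log(classifySpend(0.045, 0.05)); // 90% of the cap
console.log(classifySpend(0.05, 0.05));  // cap fully used
```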
Assemble the pipeline
Create src/pipeline.ts. This ties all the modules together: budget check, parse, repair, validate, and (on high confidence) post to QuickBooks.
import type { AppConfig } from "./types/config.js" ;
import type { ProcessingResult } from "./types/pipeline.js" ;
import { PipelineStatus } from "./types/pipeline.js" ;
import { parseDocument } from "./ingestion/parser.js" ;
import { repairInvoice } from "./repair/repair.js" ;
import { validateInvoiceFields } from "./classify/index.js" ;
import { postInvoiceToQBO } from "./export/quickbooks.js" ;
import { QuickBooksClient } from "./export/client.js" ;
import { AzureOpenAIPricing, defineDocumentBudget, checkBudget, recordSpend } from "./budget/index.js";
export async function runPipeline (
jobId : string ,
fileBuffer : Buffer ,
mimeType : string ,
config : AppConfig ,
) : Promise < ProcessingResult > {
try {
const budget = config.budget.perDocument;
const pricing = new AzureOpenAIPricing (config.azure.priceInput, config.azure.priceOutput);
const estimatedCost = pricing. estimateCost ({ model: config.azure.deployment, inputTokens: 500 , outputTokens: 500 });
await defineDocumentBudget (jobId, budget);
const budgetCheck = await checkBudget (jobId, estimatedCost);
if ( ! budgetCheck.allowed) {
return {
status: PipelineStatus.Failed,
errors: [ `Budget exceeded for job ${ jobId }: estimated ${ estimatedCost }, remaining ${ budgetCheck . remaining }` ],
needsReview: false ,
};
}
// Step 1: Parse document
const parsed = await parseDocument (fileBuffer, mimeType, config);
// Step 2: Repair
const repaired = repairInvoice (parsed, { maxAttempts: config.repair.maxAttempts });
// Step 3: Validate fields
const validation = await validateInvoiceFields (repaired.data, config);
// Record spend (estimate)
const inputTokens = 500 ;
const outputTokens = 300 ;
const actualCost = pricing. estimateCost ({ model: config.azure.deployment, inputTokens, outputTokens });
await recordSpend (jobId, actualCost, inputTokens, outputTokens);
if (repaired.status === PipelineStatus.NeedsReview || validation.needsReview) {
return {
status: PipelineStatus.NeedsReview,
data: repaired.data,
errors: [ ... repaired.errors, ... (validation.reviewFields. map ((f) => `Field "${ f }" needs review` ))],
needsReview: true ,
};
}
// Step 4: Post to QBO
const qboClient = new QuickBooksClient (config.quickbooks);
const result = await postInvoiceToQBO (repaired.data, qboClient);
void result.billId;
return {
status: PipelineStatus.Completed,
data: repaired.data,
errors: [],
needsReview: false ,
};
} catch (err) {
const msg = err instanceof Error ? err.message : String (err);
return {
status: PipelineStatus.Failed,
errors: [msg],
needsReview: false ,
};
}
}
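The pipeline above records spend from fixed token counts (500 in, 300 out). Since parseDocument already returns the raw Azure response as rawResponse, one option is to read the usage block from it instead. The sketch below assumes the response carries usage.prompt_tokens and usage.completion_tokens, as typed in the parser. tokensFromResponse and its fallback defaults are hypothetical, not part of the published artifact.

```typescript
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

// Hypothetical helper: pull token counts from a chat-completions response,
// falling back to fixed estimates when the usage block is absent or malformed.
function tokensFromResponse(
  rawResponse: unknown,
  fallback: TokenUsage = { inputTokens: 500, outputTokens: 300 },
): TokenUsage {
  if (typeof rawResponse !== "object" || rawResponse === null) return fallback;
  const usage = (rawResponse as { usage?: unknown }).usage;
  if (typeof usage !== "object" || usage === null) return fallback;
  const { prompt_tokens, completion_tokens } = usage as Record<string, unknown>;
  return {
    inputTokens: typeof prompt_tokens === "number" ? prompt_tokens : fallback.inputTokens,
    outputTokens: typeof completion_tokens === "number" ? completion_tokens : fallback.outputTokens,
  };
}

const measured = tokensFromResponse({ usage: { prompt_tokens: 1200, completion_tokens: 250 } });
const estimated = tokensFromResponse({ choices: [] });
console.log(measured, estimated);
```

In runPipeline, the result of tokensFromResponse(parsed.rawResponse) could replace the hard-coded inputTokens and outputTokens before the call to recordSpend.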
Step 6: Create the API routes
Create the two route handlers under the app/ directory.
Create app/api/upload/route.ts:
import { NextRequest, NextResponse } from "next/server" ;
import { loadConfigFromEnv } from "../../../src/types/config.js" ;
import { runPipeline } from "../../../src/pipeline.js" ;
import { jobStore } from "../../../src/lib/job-store.js" ;
import { PipelineStatus, type JobRecord } from "../../../src/types/pipeline.js" ;
export async function POST (req : NextRequest ) {
try {
const formData = await req. formData ();
const document = formData. get ( "document" );
if ( ! document || ! (document instanceof File )) {
return NextResponse. json ({ error: "Missing 'document' field" }, { status: 400 });
}
const mimeType = document.type;
const allowed = [ "application/pdf" , "image/png" , "image/jpeg" ];
if ( ! allowed. includes (mimeType)) {
return NextResponse. json (
{ error: `Unsupported MIME type: ${ mimeType }. Allowed: ${ allowed . join ( ", " ) }` },
{ status: 400 },
);
}
const buffer = Buffer. from ( await document. arrayBuffer ());
const MAX_SIZE = 20 * 1024 * 1024 ;
if (buffer.length > MAX_SIZE) {
return NextResponse. json ({ error: "File too large (max 20MB)" }, { status: 400 });
}
const jobId = crypto. randomUUID ();
const config = loadConfigFromEnv ();
const job : JobRecord = {
id: jobId,
fileBuffer: buffer,
status: PipelineStatus.Processing,
createdAt: new Date (),
};
jobStore. set (jobId, job);
const result = await runPipeline (jobId, buffer, mimeType, config);
const updated : JobRecord = {
... job,
status: result.status,
result,
};
jobStore. set (jobId, updated);
return NextResponse. json ({ jobId, status: updated.status }, { status: 202 });
} catch (err) {
const msg = err instanceof Error ? err.message : String (err);
return NextResponse. json ({ error: msg }, { status: 500 });
}
}
Create app/api/status/[jobId]/route.ts:
import { NextRequest, NextResponse } from "next/server" ;
import { jobStore } from "../../../../src/lib/job-store.js" ;
export async function GET (
_req : NextRequest ,
{ params } : { params : Promise <{ jobId : string }> },
) {
const { jobId } = await params;
const job = jobStore. get (jobId);
if ( ! job) {
return NextResponse. json ({ error: "Job not found" }, { status: 404 });
}
return NextResponse. json ({
jobId: job.id,
status: job.status,
result: job.result,
createdAt: job.createdAt. toISOString (),
});
}
Step 7: Run the tests
The test suite covers the parser, repair layer, classification logic, QuickBooks export with PII redaction, and the full pipeline integration. Run it with:
Expected output: vitest prints the test table, showing each test passing with a checkmark, and finishes with Test Files X passed and Tests X passed. A coverage summary prints below with statement, branch, function, and line percentages.
Step 8: Start the dev server
Start Next.js in development mode:
Expected output: The terminal prints Ready followed by http://localhost:3000. You can now upload a PDF or image invoice:
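curl -X POST http://localhost:3000/api/upload -F "document=@invoice.pdf"
The response is 202 Accepted with a jobId and a status. Because the upload handler awaits runPipeline before responding, that status is already the final one ("Completed", "NeedsReview", or "Failed") rather than "Processing". Fetch the full result from the status endpoint:
curl http://localhost:3000/api/status/<jobId>
If no field is routed to fallback, status is "Completed" and result.data contains the extracted invoice. If any field falls below the fallback threshold (0.3), or the repair step could not validate the data cleanly, status is "NeedsReview" and result.errors lists what to double-check.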
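For a client that wants to poll programmatically rather than with curl, a minimal sketch follows. The pollJob name, the FetchLike type, and the interval and attempt defaults are assumptions for illustration. The fetch function is passed in so the helper can be exercised without a running server. With the handler as written, the first poll already returns a final status; the loop only matters if the upload route is later changed to process in the background.

```typescript
type JobStatus = "Processing" | "Completed" | "Failed" | "NeedsReview";

interface StatusResponse {
  jobId: string;
  status: JobStatus;
  result?: unknown;
  createdAt: string;
}

// Minimal shape of the fetch function this helper needs (global fetch satisfies it).
type FetchLike = (url: string) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical helper: poll GET /api/status/:jobId until the job leaves "Processing".
async function pollJob(
  fetchFn: FetchLike,
  baseUrl: string,
  jobId: string,
  intervalMs = 1000,
  maxAttempts = 30,
): Promise<StatusResponse> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const resp = await fetchFn(`${baseUrl}/api/status/${encodeURIComponent(jobId)}`);
    if (!resp.ok) throw new Error(`Status request failed with HTTP ${resp.status}`);
    const body = (await resp.json()) as StatusResponse;
    if (body.status !== "Processing") return body;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Job ${jobId} still processing after ${maxAttempts} attempts`);
}
```

A call such as pollJob(fetch, "http://localhost:3000", jobId) would use the global fetch available in Node.js 22.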
Next steps
Add a web UI at /app/page.tsx with a drag-and-drop upload form that polls /api/status/:jobId and displays the parsed invoice fields for confirmation before the bill is posted.
Swap the keyword classifier in src/classify/index.ts for an LLM-based classifier from @reaatech/confidence-router-classifiers to handle more nuanced vendor-name validation across international formats.
Wire up MSW (mock service worker) in tests/ to add end-to-end integration tests that exercise the full pipeline with mocked Azure OpenAI and QuickBooks API responses.
Continuation of src/ingestion/parser.ts, resuming inside the return type of computeConfidence:
number> {
const fields = [ "vendorName" , "invoiceNumber" , "invoiceDate" , "dueDate" , "totalAmount" , "currency" , "lineItems" ];
for ( const field of fields) {
if (scores[field] === undefined ) {
const val = data[field];
scores[field] = val !== undefined && val !== null && val !== "" ? 0.9 : 0.3 ;
}
}
return scores;
}
async function pdfToBase64 (buffer : Buffer ) : Promise < string > {
try {
const data = await getDocument ({ data: new Uint8Array (buffer) }).promise;
if (data.numPages < 1 ) {
throw new Error ( "PDF has no pages" );
}
void data;
} catch {
void 0 ;
}
return buffer. toString ( "base64" );
}
export async function parseDocument (
buffer : Buffer ,
mimeType : string ,
config : AppConfig ,
) : Promise < ExtractedInvoiceData & { rawResponse : unknown }> {
if (buffer.length > MAX_FILE_SIZE) {
console. error ( `[Parser] File too large: ${ ( buffer .length / 1024 / 1024 ). toFixed ( 2 ) } MB` );
throw Object. assign ( new Error ( "File too large (max 20MB)" ), { code: "FILE_TOO_LARGE" });
}
const allowed = [ "application/pdf" , "image/png" , "image/jpeg" ];
if ( ! allowed. includes (mimeType)) {
console. error ( `[Parser] Unsupported MIME type: ${ mimeType }` );
throw Object. assign (
new Error ( `Unsupported MIME type: ${ mimeType }. Allowed: ${ allowed . join ( ", " ) }` ),
{ code: "UNSUPPORTED_MIME" },
);
}
let dataUri : string ;
if (mimeType === "application/pdf" ) {
const base64 = await pdfToBase64 (buffer);
dataUri = `data:application/pdf;base64,${ base64 }` ;
} else {
const processed = await sharp(buffer)
  .resize(1024, 1024, { fit: "inside", withoutEnlargement: true })
  .normalize()
  .sharpen()
  .removeAlpha()
  .png() // encode as PNG so the bytes match the image/png data URI below
  .toBuffer();
dataUri = `data:image/png;base64,${processed.toString("base64")}`;
}
const endpoint = config.azure.endpoint.replace(/\/$/, "");
const url = `${ endpoint }/openai/deployments/${ config . azure . deployment }/chat/completions?api-version=2024-02-15-preview` ;
const controller = new AbortController ();
const timeout = setTimeout (() => controller. abort (), AI_TIMEOUT_MS);
let resp : Response ;
try {
resp = await fetch (url, {
method: "POST" ,
signal: controller.signal,
headers: {
"Content-Type" : "application/json" ,
"api-key" : config.azure.apiKey,
},
body: JSON. stringify ({
messages: [
{
role: "user" ,
content: [
{ type: "text" , text: "Extract invoice data from this document. Return ONLY a JSON object with fields: vendorName, invoiceNumber, invoiceDate (YYYY-MM-DD), dueDate (YYYY-MM-DD), totalAmount (number), currency (string), lineItems (array of {description, quantity, unitPrice, amount})." },
{ type: mimeType === "application/pdf" ? "file" : "image_url" , [mimeType === "application/pdf" ? "file" : "image_url" ]: { url: dataUri } },
],
},
],
max_tokens: 2000 ,
temperature: 0.1 ,
}),
});
} finally {
clearTimeout (timeout);
}
if ( ! resp.ok) {
const text = await resp. text (). catch (() => "" );
throw new Error ( `Azure OpenAI error ${ resp . status }: ${ text }` );
}
const json = await resp. json () as {
choices ?: { message ?: { content ?: string } }[];
usage ?: { prompt_tokens ?: number ; completion_tokens ?: number };
};
const content = json.choices?.[ 0 ]?.message?.content ?? "{}" ;
let raw : Record < string , unknown >;
try {
raw = JSON. parse (content);
} catch {
throw Object. assign ( new Error ( `Azure returned non-JSON: ${ content . slice ( 0 , 200 ) }` ), { code: "AI_PARSE_ERROR" });
}
const scores : Record < string , number > = {};
computeConfidence (raw, scores);
return {
vendorName: (raw.vendorName as string | undefined ) ?? "" ,
invoiceNumber: (raw.invoiceNumber as string | undefined ) ?? "" ,
invoiceDate: (raw.invoiceDate as string | undefined ) ?? "" ,
dueDate: (raw.dueDate as string | undefined ) ?? "" ,
totalAmount: typeof raw.totalAmount === "number" ? raw.totalAmount : 0 ,
currency: (raw.currency as string | undefined ) ?? "USD" ,
lineItems: Array. isArray (raw.lineItems) ? raw.lineItems as import ( "../types/invoice.js" ). LineItem [] : [],
confidenceScores: scores,
rawResponse: json,
};
}
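The confidence scores returned by the parser come from the computeConfidence heuristic rather than from the model. The standalone sketch below reproduces that heuristic on a sample object so its edge cases are easier to see. heuristicScores is a hypothetical copy that returns a new object instead of mutating its argument.

```typescript
const FIELDS = [
  "vendorName", "invoiceNumber", "invoiceDate", "dueDate", "totalAmount", "currency", "lineItems",
] as const;

// Standalone copy of the heuristic: keep any score already present, otherwise
// assign 0.9 to a non-empty value and 0.3 to a missing or empty one.
function heuristicScores(
  data: Record<string, unknown>,
  scores: Record<string, number> = {},
): Record<string, number> {
  const out: Record<string, number> = { ...scores };
  for (const field of FIELDS) {
    if (out[field] === undefined) {
      const val = data[field];
      out[field] = val !== undefined && val !== null && val !== "" ? 0.9 : 0.3;
    }
  }
  return out;
}

const sample = heuristicScores({ vendorName: "Acme LLC", dueDate: "", totalAmount: 0, lineItems: [] });
console.log(sample);
```

Note that a totalAmount of 0 and an empty lineItems array both count as present and score 0.9, because only undefined, null, and the empty string are treated as missing.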
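Continuation of src/repair/repair.ts, resuming at the end of the quantity transform in lineItemSchema:
}),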
unitPrice: z. union ([z. number (), z. string ()]). transform ((v) => {
if ( typeof v === "number" ) return v;
const n = Number. parseFloat (v);
return Number. isNaN (n) ? 0 : n;
}),
amount: z. union ([z. number (), z. string ()]). transform ((v) => {
if ( typeof v === "number" ) return v;
const n = Number. parseFloat (v);
return Number. isNaN (n) ? 0 : n;
}),
});
const ExtractedInvoiceDataSchema = z. object ({
vendorName: z. union ([z. string (), z. number ()]). transform ((v) => String (v)),
invoiceNumber: z. union ([z. string (), z. number ()]). transform ((v) => String (v)),
invoiceDate: z. union ([z. string (), z. number ()]). transform ((v) => String (v)),
dueDate: z. union ([z. string (), z. number ()]). transform ((v) => String (v)),
totalAmount: z. union ([z. number (), z. string ()]). transform ((v) => {
if ( typeof v === "number" ) return v;
const n = Number. parseFloat (v);
return Number. isNaN (n) ? 0 : n;
}),
currency: z. union ([z. string (), z. number ()]). transform ((v) => String (v)),
lineItems: z. array (lineItemSchema). default ([]),
confidenceScores: z. record (z. number ()). default ({}),
});
const SNAKE_CASE_MAP : Record < string , string > = {
vendor_name: "vendorName" ,
invoice_number: "invoiceNumber" ,
invoice_date: "invoiceDate" ,
due_date: "dueDate" ,
total_amount: "totalAmount" ,
};
const FALLBACK_KEYS : Record < string , string []> = {
vendorName: [ "vendorName" , "vendor" , "supplier" , "from" ],
};
function logRepair (step : string , detail ?: string ) : void {
console. debug ( `[Repair] ${ step }${ detail ? `: ${ detail }` : ""}` );
}
function normalizeSnakeCase (raw : Record < string , unknown >) : Record < string , unknown > {
const out : Record < string , unknown > = {};
for ( const [key, val] of Object. entries (raw)) {
const camelKey = SNAKE_CASE_MAP[key] ?? key;
out[camelKey] = val;
}
return out;
}
function applyRepairs (raw : Record < string , unknown >) : Record < string , unknown > {
let repaired = { ... raw };
repaired = normalizeSnakeCase (repaired);
for ( const [primary, fallbacks] of Object. entries (FALLBACK_KEYS)) {
if ( ! repaired[primary] || repaired[primary] === "" ) {
for ( const fb of fallbacks) {
if (fb !== primary && repaired[fb] && repaired[fb] !== "" ) {
logRepair ( "Fallback" , `${ primary } <- ${ fb } = "${ repaired [ fb ] }"` );
repaired[primary] = repaired[fb];
break ;
}
}
}
}
return repaired;
}
function buildData (parsed : {
vendorName : string ;
invoiceNumber : string ;
invoiceDate : string ;
dueDate : string ;
totalAmount : number ;
currency : string ;
lineItems : unknown [];
confidenceScores ?: Record < string , number >;
}) : ExtractedInvoiceData {
return {
vendorName: parsed.vendorName,
invoiceNumber: parsed.invoiceNumber,
invoiceDate: parsed.invoiceDate,
dueDate: parsed.dueDate,
totalAmount: parsed.totalAmount,
currency: parsed.currency,
lineItems: parsed.lineItems as import ( "../types/invoice.js" ). LineItem [],
confidenceScores: parsed.confidenceScores ?? {},
};
}
export function repairInvoice (
rawData : unknown ,
config : { maxAttempts : number },
) : { data : ExtractedInvoiceData ; status : PipelineStatus ; errors : string [] } {
const errors : string [] = [];
if (rawData === null || rawData === undefined || typeof rawData !== "object" ) {
logRepair ( "Invalid input" , `type=${ typeof rawData }` );
return {
data: buildData ({ vendorName: "" , invoiceNumber: "" , invoiceDate: "" , dueDate: "" , totalAmount: 0 , currency: "USD" , lineItems: [] }),
status: PipelineStatus.NeedsReview,
errors: [ "Input is not a valid object" ],
};
}
const asRecord = rawData as Record < string , unknown >;
let working = normalizeSnakeCase (asRecord);
logRepair ( "Normalized snake_case fields" , Object. keys (asRecord). filter ((k) => SNAKE_CASE_MAP[k] !== undefined ). join ( ", " ) || "none" );
for ( let attempt = 0 ; attempt < config.maxAttempts; attempt ++ ) {
logRepair ( `Attempt ${ attempt + 1 }/${ config . maxAttempts }` );
const repaired = applyRepairs (working);
const result = ExtractedInvoiceDataSchema. safeParse (repaired);
if (result.success) {
logRepair ( "Parse succeeded" , `attempt=${ attempt + 1 }` );
const wasRepaired = attempt > 0 ;
return {
data: buildData (result.data),
status: wasRepaired ? PipelineStatus.NeedsReview : PipelineStatus.Completed,
errors: wasRepaired ? errors : [],
};
}
for ( const issue of result.error.issues) {
const path = issue.path.join(".");
const msg = `${path}: ${issue.message}`;
// Record each path once. Matching the "path:" prefix keeps "lineItems" from shadowing "lineItems.0.amount".
if (!errors.some((e) => e.startsWith(`${path}:`))) {
errors. push (msg);
logRepair ( "Validation issue" , msg);
}
}
if (attempt === 0 ) {
const defaults : Record < string , unknown > = {
vendorName: "" ,
invoiceNumber: "" ,
invoiceDate: "" ,
dueDate: "" ,
totalAmount: 0 ,
currency: "USD" ,
lineItems: [],
};
working = { ... defaults, ... repaired };
} else {
break ;
}
}
logRepair ( "Exhausted all attempts" , `status=${ PipelineStatus . NeedsReview }, errors=${ errors .length }` );
return {
data: buildData ({
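// A type assertion does not check the runtime value, so verify each field before falling back.
vendorName: typeof working.vendorName === "string" ? working.vendorName : "",
invoiceNumber: typeof working.invoiceNumber === "string" ? working.invoiceNumber : "",
invoiceDate: typeof working.invoiceDate === "string" ? working.invoiceDate : "",
dueDate: typeof working.dueDate === "string" ? working.dueDate : "",
totalAmount: typeof working.totalAmount === "number" ? working.totalAmount : 0,
currency: typeof working.currency === "string" ? working.currency : "USD",
lineItems: Array.isArray(working.lineItems) ? working.lineItems as import("../types/invoice.js").LineItem[] : [],
confidenceScores: typeof working.confidenceScores === "object" && working.confidenceScores !== null
  ? (working.confidenceScores as Record<string, number>)
  : {},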
}),
status: PipelineStatus.NeedsReview,
errors,
};
}
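The two-pass flow in repairInvoice (validate, merge defaults on the first failure, validate again) can be sketched in isolation. This is a minimal illustration, not the tutorial's implementation: `checkInvoice`, `repairWithDefaults`, the three-field `Invoice` type, and the `"completed"` / `"needs_review"` strings are hypothetical stand-ins for the Zod schema and the `PipelineStatus` enum.

```typescript
// Hypothetical, simplified stand-in for the schema check used by repairInvoice.
type Invoice = { vendorName: string; totalAmount: number; currency: string };
type Check = { ok: true; value: Invoice } | { ok: false; issues: string[] };

function checkInvoice(input: Record<string, unknown>): Check {
  const issues: string[] = [];
  if (typeof input.vendorName !== "string") issues.push("vendorName: expected string");
  if (typeof input.totalAmount !== "number") issues.push("totalAmount: expected number");
  if (typeof input.currency !== "string") issues.push("currency: expected string");
  if (issues.length > 0) return { ok: false, issues };
  return {
    ok: true,
    value: {
      vendorName: input.vendorName as string,
      totalAmount: input.totalAmount as number,
      currency: input.currency as string,
    },
  };
}

const DEFAULTS: Invoice = { vendorName: "", totalAmount: 0, currency: "USD" };

function repairWithDefaults(raw: Record<string, unknown>): {
  data: Invoice;
  status: "completed" | "needs_review"; // assumed status names
  errors: string[];
} {
  // First pass: clean input completes without any repair.
  const first = checkInvoice(raw);
  if (first.ok) return { data: first.value, status: "completed", errors: [] };

  // Second pass: defaults fill missing keys, extracted values win on conflict.
  const second = checkInvoice({ ...DEFAULTS, ...raw });
  if (second.ok) return { data: second.value, status: "needs_review", errors: first.issues };

  // Still invalid: return safe defaults and flag the document for a human.
  return { data: { ...DEFAULTS }, status: "needs_review", errors: first.issues };
}
```

Anything that needed a repair is routed to review rather than marked complete, so a defaulted field such as an empty vendor name is never posted without a human looking at it.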
?: string ;
TxnDate : string ;
DueDate : string ;
TotalAmt : number ;
CurrencyRef : { value : string };
Line : QBOBillLine [];
}
export interface QBOBillResponse {
Id : string ;
status : string ;
TxnDate : string ;
}
let currentToken : string | null = null ;
let tokenExpiry : Date = new Date ( 0 );
export function resetTokenCache () : void {
currentToken = null ;
tokenExpiry = new Date ( 0 );
}
async function doRefresh (clientId : string , clientSecret : string , refreshTokenValue : string ) : Promise < void > {
if ( ! refreshTokenValue) {
throw new Error ( "QuickBooks refresh token not configured" );
}
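// Intuit's token endpoint documents HTTP Basic auth built from the client ID and secret.
const basicAuth = Buffer.from(`${clientId}:${clientSecret}`).toString("base64");
const resp = await fetch("https://oauth.platform.intuit.com/oauth2/v1/tokens/bearer", {
  method: "POST",
  headers: {
    "Content-Type": "application/x-www-form-urlencoded",
    Accept: "application/json",
    Authorization: `Basic ${basicAuth}`,
  },
  body: new URLSearchParams({
    grant_type: "refresh_token",
    refresh_token: refreshTokenValue,
  }),
});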
if ( ! resp.ok) {
const text = await resp. text (). catch (() => "" );
throw new Error ( `QuickBooks token refresh failed ${ resp . status }: ${ text }` );
}
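// Note: the response may also carry a rotated refresh_token. It is not persisted here, so store it yourself for long-lived access.
const data = await resp.json() as { access_token: string; expires_in: number };
currentToken = data.access_token;
// Treat the token as expired 60 seconds early so it is never used right at the boundary.
tokenExpiry = new Date(Date.now() + (data.expires_in - 60) * 1000);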
}
export class QuickBooksClient {
private readonly clientId : string ;
private readonly clientSecret : string ;
private readonly realmId : string ;
private readonly refreshTokenValue : string ;
constructor (cfg : { clientId : string ; clientSecret : string ; realmId : string ; refreshToken : string }) {
this.clientId = cfg.clientId;
this.clientSecret = cfg.clientSecret;
this.realmId = cfg.realmId;
this.refreshTokenValue = cfg.refreshToken;
}
async getAccessToken () : Promise < string > {
if (currentToken && Date. now () < tokenExpiry. getTime ()) {
return currentToken;
}
await doRefresh (this.clientId, this.clientSecret, this.refreshTokenValue);
if ( ! currentToken) throw new Error ( "Failed to obtain QuickBooks access token" );
return currentToken;
}
  // retriesLeft bounds the 429 retries so a persistent rate limit cannot recurse forever (3 is an arbitrary default).
  async postBill(bill: QBOBillPayload, retriesLeft = 3): Promise<QBOBillResponse> {
    const url = `https://quickbooks.api.intuit.com/v3/company/${this.realmId}/bill`;
    // Sends the bill with the given bearer token.
    const send = (token: string): Promise<Response> =>
      fetch(url, {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          Authorization: `Bearer ${token}`,
          Accept: "application/json",
        },
        body: JSON.stringify(bill),
      });

    let resp = await send(await this.getAccessToken());
    if (resp.status === 401) {
      // The cached token was rejected: force one refresh and retry once.
      await doRefresh(this.clientId, this.clientSecret, this.refreshTokenValue);
      if (!currentToken) throw new Error("Failed to obtain QuickBooks access token");
      resp = await send(currentToken);
    }
    if (resp.status === 429 && retriesLeft > 0) {
      // Rate limited: wait one second, then retry with one fewer attempt remaining.
      await new Promise((r) => setTimeout(r, 1000));
      return this.postBill(bill, retriesLeft - 1);
    }
    if (!resp.ok) {
      const err = await resp.text().catch(() => "");
      throw Object.assign(new Error(`QuickBooks bill POST failed ${resp.status}: ${err}`), { status: resp.status });
    }
    return resp.json() as Promise<QBOBillResponse>;
  }
}
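The 429 handling in postBill waits a fixed one second before retrying. One alternative is capped exponential backoff, sketched below under stated assumptions: `backoffDelays` and `sendWithBackoff` are hypothetical helpers, not part of the tutorial's QuickBooksClient, and the delay values are illustrative rather than taken from Intuit's rate-limit guidance.

```typescript
// Hypothetical helper: returns the wait before each retry.
// The schedule is base, 2x base, 4x base, ... with every entry capped at capMs.
function backoffDelays(maxRetries: number, baseDelayMs: number, capMs: number): number[] {
  return Array.from({ length: maxRetries }, (_, retry) =>
    Math.min(baseDelayMs * 2 ** retry, capMs),
  );
}

// Hypothetical helper: calls send() until the response is not a 429
// or the delay schedule is used up.
async function sendWithBackoff<T extends { status: number }>(
  send: () => Promise<T>,
  delaysMs: number[],
): Promise<T> {
  let resp = await send();
  for (const delay of delaysMs) {
    if (resp.status !== 429) break;
    await new Promise((resolve) => setTimeout(resolve, delay));
    resp = await send();
  }
  // Still a 429 if every retry was rate limited; the caller decides what to do.
  return resp;
}
```

A call such as `sendWithBackoff(() => fetch(url, init), backoffDelays(3, 1000, 8000))` would wait 1, 2, and 4 seconds between attempts. Keeping the schedule as a plain array makes the timing easy to unit test without real timers.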