Mistral AI Document Pipeline for Shopify Tax Document Extraction

Extract tax IDs, totals, and line items from Shopify PDF invoices and receipts, ready for QuickBooks or Xero.

mistral document-pipeline shopify tax-extraction nextjs typescript pdf-extraction accounting-automation

The problem

E‑commerce merchants on Shopify manually re‑type tax details from PDF invoices and receipts into accounting software, leading to errors, delays, and compliance risks.

Intro

This tutorial walks you through building a document pipeline that extracts tax data from Shopify PDF invoices, DOCX receipts, and scanned images — then validates, repairs, and delivers the structured data. You’ll use Mistral AI for LLM-based extraction, the REAA package family for document processing and cost tracking, and Next.js App Router as the HTTP surface.
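To make the "validates" stage concrete, here is a minimal sketch of what an extracted record and a consistency check could look like. The interface and field names (`ExtractedTaxData`, `taxId`, `subtotal`, and so on) are assumptions made for illustration, not the tutorial's actual types or Zod schemas. Amounts are assumed to be integers in minor units (for example cents) so the arithmetic is exact.

```typescript
// Hypothetical shapes for illustration only; the tutorial's real types may differ.
interface LineItem {
  description: string;
  quantity: number;
  unitPrice: number; // minor units, e.g. cents
}

interface ExtractedTaxData {
  taxId: string;
  subtotal: number; // minor units
  taxAmount: number; // minor units
  total: number; // minor units
  lineItems: LineItem[];
}

// Returns a list of human-readable problems; an empty list means every check passed.
function validateTaxData(data: ExtractedTaxData): string[] {
  const problems: string[] = [];

  if (data.taxId.trim() === "") {
    problems.push("taxId is empty");
  }

  // The line items should add up to the subtotal.
  const lineSum = data.lineItems.reduce(
    (sum, item) => sum + item.quantity * item.unitPrice,
    0,
  );
  if (lineSum !== data.subtotal) {
    problems.push(`line items sum to ${lineSum}, but subtotal is ${data.subtotal}`);
  }

  // Subtotal plus tax should equal the total.
  if (data.subtotal + data.taxAmount !== data.total) {
    problems.push(
      `subtotal + tax is ${data.subtotal + data.taxAmount}, but total is ${data.total}`,
    );
  }

  return problems;
}
```

A record that fails these checks is a natural candidate for the repair stage, while a record with no reported problems can move on to delivery.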

By the end, you’ll have a working service with three REST endpoints, a Shopify webhook parser, and a full test suite with 80 passing tests. This is a copy-paste-along tutorial — every code block is the real file content.

Prerequisites

Node.js 22 or later (node --version)
pnpm 10 (npm install -g pnpm@10)
A Mistral AI API key from console.mistral.ai
A Shopify store with an app API key and secret (or placeholder values for the mock)
Basic familiarity with TypeScript, Next.js App Router, and REST APIs
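The prerequisites above can be turned into a quick startup check. The following is a sketch under stated assumptions: `checkPrerequisites` is a hypothetical helper, and `MISTRAL_API_KEY` is an assumed variable name. The three Shopify variable names are the ones the webhook handler in this recipe reads from `process.env`.

```typescript
// Environment variables the service is assumed to need.
// MISTRAL_API_KEY is an assumed name; the SHOPIFY_* names appear in the handler code.
const REQUIRED_ENV = [
  "MISTRAL_API_KEY",
  "SHOPIFY_API_KEY",
  "SHOPIFY_API_SECRET",
  "SHOPIFY_HOST_NAME",
] as const;

// Returns a list of problems; an empty list means the environment looks usable.
function checkPrerequisites(
  nodeVersion: string,
  env: Record<string, string | undefined>,
): string[] {
  const problems: string[] = [];

  // process.version looks like "v22.3.0"; strip the "v" and read the major part.
  const majorText = nodeVersion.replace(/^v/, "").split(".")[0] ?? "";
  const major = Number.parseInt(majorText, 10);
  if (Number.isNaN(major) || major < 22) {
    problems.push(`Node.js 22 or later is required, found ${nodeVersion}`);
  }

  for (const name of REQUIRED_ENV) {
    if (!env[name]) {
      problems.push(`missing environment variable ${name}`);
    }
  }

  return problems;
}
```

In a Node.js process you would call it as `checkPrerequisites(process.version, process.env)` and stop early if the returned list is not empty.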

Step 1: Scaffold the project

Start from the existing scaffold: a Next.js 16 App Router project with all config files in place. The package.json pins every dependency to exact versions and includes the four REAA packages you’ll use.
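If you want to confirm that a package.json really pins exact versions, a small check like the one below can help. This is an illustrative sketch, not part of the scaffold: `findUnpinned` is a hypothetical helper, and the sample dependency map in the usage note is made up.

```typescript
// Matches an exact semver version such as "1.2.3" or "1.2.3-beta.1",
// and rejects ranges and tags such as "^1.2.3", "~1.2.3", "1.x", or "latest".
const EXACT_VERSION = /^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$/;

// Returns the names of dependencies whose version specifier is not an exact version.
function findUnpinned(deps: Record<string, string>): string[] {
  return Object.entries(deps)
    .filter(([, specifier]) => !EXACT_VERSION.test(specifier))
    .map(([name]) => name);
}
```

For example, `findUnpinned({ "pkg-a": "1.0.0", "pkg-b": "^2.1.0" })` returns `["pkg-b"]`. Run it over both `dependencies` and `devDependencies` after parsing the file with `JSON.parse`.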

Example artifact

A complete, working implementation of this recipe — downloadable as a zip or browsable file by file. Generated by our build pipeline; tested with full coverage before publishing.

173 kB · 79 tests · 98.8% coverage · vitest passing

SHA-256: 65964b1e78e3bc9079acc4010d9f6b3993746cf6e69eda7c798d00f911fe4d4b


import "@shopify/shopify-api/adapters/node";
import { shopifyApi, ApiVersion } from "@shopify/shopify-api";
import type { TaxDocument, ExtractionResult } from "../lib/types.js";
import { DocumentExtractionService } from "../services/document-extraction.js";
import { CostTelemetryService } from "../services/cost-telemetry.js";

export const shopify = shopifyApi({
  apiKey: process.env.SHOPIFY_API_KEY ?? "",
  apiSecretKey: process.env.SHOPIFY_API_SECRET ?? "",
  scopes: ["read_orders"],
  hostName: process.env.SHOPIFY_HOST_NAME ?? "",
  apiVersion: ApiVersion.July25,
  isEmbeddedApp: false,
});

export function handleShopifyOrderWebhook(
  orderData: Record<string, unknown>,
): { orderId: string; documents: TaxDocument[] } {
  const rawId = orderData.id;
  const orderId = typeof rawId === "string" ? rawId : String(rawId);
  const documents: TaxDocument[] = [];
  const lineItems = orderData.line_items as Array<Record<string, unknown>> | undefined;
  if (lineItems) {
    for (const item of lineItems) {
      const attachments = item.attachments as Array<Record<string, unknown>> | undefined;
      if (attachments) {
        for (const att of attachments) {
          const url = typeof att.url === "string" ? att.url : undefined;
          if (url) {
            const docIndex = documents.length + 1;
            const fileName =
              typeof att.filename === "string" ? att.filename : `document-${String(docIndex)}`;
            documents.push({
              id: `doc-${String(docIndex)}`,
              shopifyOrderId: orderId,
              fileName,
              mimeType: typeof att.mime_type === "string" ? att.mime_type : "application/octet-stream",
              buffer: new Uint8Array(),
            });
          }
        }
      }
    }
  }
  return { orderId, documents };
}

export async function fetchDocumentFromUrl(url: string): Promise<TaxDocument> {
  const response = await fetch(url);
  if (!response.ok) {
    const statusStr = String(response.status);
    throw new Error(`Failed to fetch document from ${url}: ${statusStr} ${response.statusText}`);
  }
  const buffer = new Uint8Array(await response.arrayBuffer());
  const mimeType = response.headers.get("content-type") ?? "application/octet-stream";
  const fileName = url.split("/").pop() ?? "document";
  return {
    id: `fetched-${crypto.randomUUID()}`,
    shopifyOrderId: "",
    fileName,
    mimeType,
    buffer,
  };
}

export async function processDocument(document: TaxDocument): Promise<ExtractionResult> {
  const extractionService = new DocumentExtractionService();
  const costTelemetry = new CostTelemetryService();
  const extractionId = crypto.randomUUID();
  const result = await extractionService.runExtractionPipeline(document, extractionId);
  costTelemetry.recordTokenUsage(
    "mistral",
    "mistral-large-latest",
    500,
    200,
    document.shopifyOrderId,
    "tax-extraction",
  );
  return result;
}
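In the listing above, handleShopifyOrderWebhook creates each document with an empty buffer and does not keep the attachment URL, while fetchDocumentFromUrl needs a URL to download the bytes. One way to bridge the two is a helper that walks the same payload and returns the URLs in the same order as the `doc-1`, `doc-2`, ... ids. This is a sketch: `collectAttachmentUrls` is a hypothetical helper that is not part of the recipe, and the payload shape (`line_items[].attachments[].url`) is simply what the handler reads.

```typescript
// Walks the same structure as the webhook handler and returns attachment URLs
// in document order. Entries that are malformed or lack a string URL are skipped,
// matching the handler's behaviour of ignoring them.
function collectAttachmentUrls(orderData: Record<string, unknown>): string[] {
  const urls: string[] = [];
  const lineItems = orderData.line_items;
  if (!Array.isArray(lineItems)) return urls;

  for (const item of lineItems) {
    if (typeof item !== "object" || item === null) continue;
    const attachments = (item as Record<string, unknown>).attachments;
    if (!Array.isArray(attachments)) continue;

    for (const att of attachments) {
      if (typeof att !== "object" || att === null) continue;
      const url = (att as Record<string, unknown>).url;
      if (typeof url === "string" && url !== "") urls.push(url);
    }
  }
  return urls;
}

// Made-up payload for illustration; real webhook bodies carry many more fields.
const sampleOrder: Record<string, unknown> = {
  id: 1001,
  line_items: [
    {
      attachments: [
        { url: "https://example.com/invoice-1001.pdf", filename: "invoice-1001.pdf" },
        { filename: "missing-url.pdf" },
      ],
    },
    { title: "line item without attachments" },
  ],
};

const sampleUrls = collectAttachmentUrls(sampleOrder);
```

Here `sampleUrls` contains only the one entry that has a URL. Each URL could then be passed to fetchDocumentFromUrl, with the returned buffer copied onto the matching document before calling processDocument.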

Intro

Prerequisites

Step 1: Scaffold the project

Step 2: Define shared types and Zod schemas

Step 3: Create the cost telemetry service

Step 4: Build the confidence router

Step 5: Implement structured repair

Step 6: Build file processing utilities

Step 7: Create the Mistral AI client

Step 8: Build the document extraction service

Step 9: Build the validation service

Step 10: Create the Shopify webhook handler

Step 11: Set up API routes

Step 12: Add middleware, instrumentation, and the index

Step 13: Configure environment variables

Step 14: Run the tests

Step 15: Try the pipeline

Next steps