# Mistral AI Document Pipeline for Shopify Tax Document Extraction
 
> Extract tax IDs, totals, and line items from Shopify PDF invoices and receipts, ready for QuickBooks or Xero.
 
A tutorial-style reference solution from [reaatech.com](https://reaatech.com) that demonstrates how to build production-grade AI systems with the `@reaatech/*` package family.
 
## Problem
 
E-commerce merchants on Shopify manually re-type tax details from PDF invoices and receipts into accounting software, leading to errors, delays, and compliance risks.
 
## Architecture
 
```
Shopify webhook → file processor → REAA extraction → confidence routing → structured repair → cost telemetry → validation → accounting webhook
```
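
The confidence routing stage in the flow above can be illustrated with a minimal sketch. It assumes a cheap rule-based pass runs first and an LLM fallback is used only when confidence falls below a threshold. The types, regex patterns, the `0.8` threshold, and the function names `extractWithRules` and `routeExtraction` are all hypothetical; none of this is the API of `@reaatech/confidence-router`.

```typescript
// Hypothetical data model: the README does not specify the real one.
interface TaxFields {
  taxId: string | null;
  total: number | null;
}

interface Extraction {
  fields: TaxFields;
  confidence: number; // assumed to be on a 0..1 scale
  source: "rules" | "llm";
}

// Rule-based pass: illustrative regexes over already-extracted text.
function extractWithRules(text: string): Extraction {
  const taxId = /\bTax ID:\s*([A-Z0-9-]+)/i.exec(text)?.[1] ?? null;
  const totalMatch = /\bTotal:\s*\$?([0-9]+(?:\.[0-9]{2})?)/i.exec(text)?.[1];
  const total = totalMatch !== undefined ? Number(totalMatch) : null;
  // Naive confidence score: the fraction of fields that were found.
  const found = [taxId, total].filter((v) => v !== null).length;
  return { fields: { taxId, total }, confidence: found / 2, source: "rules" };
}

// Keep the rule-based result when it is confident enough,
// otherwise hand the text to the (injected) LLM fallback.
function routeExtraction(
  text: string,
  llmFallback: (text: string) => Extraction,
  threshold = 0.8, // assumed cut-off, tune per document type
): Extraction {
  const ruled = extractWithRules(text);
  return ruled.confidence >= threshold ? ruled : llmFallback(text);
}
```

Passing the fallback in as a function keeps the routing logic testable without a Mistral AI API key; a real fallback would call the model and would likely be asynchronous.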
 
## Prerequisites
 
- Node >= 22
- pnpm 10
- Mistral AI API key
- Shopify app credentials
 
## API Reference
 
- `POST /api/extract` — Trigger document extraction
- `GET /api/status/[id]` — Poll extraction status
- `POST /api/webhook/deliver` — Deliver results to accounting system
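
A client for the first two endpoints might look like the sketch below: trigger an extraction, then poll until it leaves the pending state. The request body (`documentUrl`), the response shapes (`id`, `status`, `result`), and the status values are assumptions, since only the routes are listed here. The `fetchFn` parameter is a hypothetical injection point so the function can be exercised without a running server.

```typescript
// Assumed payload shapes: not confirmed by this README.
interface ExtractResponse {
  id: string;
}
interface StatusResponse {
  status: "pending" | "done" | "failed";
  result?: unknown;
}

// Minimal subset of the Fetch API that this sketch relies on.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function extractAndWait(
  baseUrl: string,
  documentUrl: string,
  fetchFn: FetchLike,
  { intervalMs = 1000, maxAttempts = 30 } = {},
): Promise<StatusResponse> {
  // POST /api/extract: start the extraction and read back its ID.
  const started = await fetchFn(`${baseUrl}/api/extract`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ documentUrl }),
  });
  if (!started.ok) throw new Error(`extract failed: HTTP ${started.status}`);
  const { id } = (await started.json()) as ExtractResponse;

  // GET /api/status/[id]: poll until the job is no longer pending.
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetchFn(`${baseUrl}/api/status/${encodeURIComponent(id)}`);
    if (!res.ok) throw new Error(`status failed: HTTP ${res.status}`);
    const body = (await res.json()) as StatusResponse;
    if (body.status !== "pending") return body;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`extraction ${id} still pending after ${maxAttempts} polls`);
}
```

On Node 22 the built-in global `fetch` should be usable as `fetchFn`, for example `extractAndWait("http://localhost:3000", invoiceUrl, fetch)`, where the base URL and `invoiceUrl` are placeholders.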
 
## REAA Packages Used
 
| Package | Description |
|---|---|
| `@reaatech/media-pipeline-mcp-doc-extraction` | Extracts text from PDFs and images for downstream processing |
| `@reaatech/structured-repair-core` | Repairs malformed LLM outputs into valid structured data |
| `@reaatech/llm-cost-telemetry` | Tracks per-document LLM spend and usage metrics |
| `@reaatech/confidence-router` | Routes between rule-based and LLM extraction based on confidence scores |
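
To make the idea of structured repair concrete, here is a minimal hand-rolled sketch of the kind of cleanup such a step performs on LLM output before parsing. It is not the API of `@reaatech/structured-repair-core`; the function name `repairJson` and the three heuristics are assumptions chosen for illustration.

```typescript
// Three backticks, built at runtime so this example can live inside a Markdown fence.
const FENCE = "`".repeat(3);

// Best-effort repair of a JSON object embedded in free-form model output.
function repairJson(raw: string): unknown {
  // 1. Drop Markdown code fences (with an optional "json" tag) around the payload.
  let text = raw.replace(new RegExp(FENCE + "(?:json)?", "gi"), "").trim();

  // 2. Keep only the outermost {...} span, discarding any chatter around it.
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) throw new Error("no JSON object found");
  text = text.slice(start, end + 1);

  // 3. Remove trailing commas before a closing brace or bracket.
  //    Limitation: this also rewrites such sequences inside string values.
  text = text.replace(/,\s*([}\]])/g, "$1");

  return JSON.parse(text);
}
```

A production repair step would typically also validate the parsed value against a schema, since syntactically valid JSON can still be missing required fields such as the tax ID or total.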
 
## Testing
 
```bash
pnpm test        # vitest run with coverage
pnpm typecheck   # TypeScript type checking
pnpm lint        # ESLint checks
```
 
## License
 
MIT — see [LICENSE](./LICENSE).