# Mistral AI Document Pipeline for Shopify Tax Document Extraction
> Extract tax IDs, totals, and line items from Shopify PDF invoices and receipts, ready for QuickBooks or Xero.

A tutorial-style reference solution from [reaatech.com](https://reaatech.com) that demonstrates how to build production-grade AI systems with the `@reaatech/*` package family.
## Problem
E-commerce merchants on Shopify manually re-type tax details from PDF invoices and receipts into accounting software, leading to errors, delays, and compliance risks.
## Architecture
```
Shopify webhook → file processor → REAA extraction → confidence routing → structured repair → cost telemetry → validation → accounting webhook
```
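
The stages between the two webhooks can be read as a chain of functions that each receive and return a shared document record. The sketch below is a minimal, dependency-free illustration under stated assumptions: the `TaxDocument` and `LineItem` types, the stage names, and `runPipeline` are hypothetical, and the stub stages do not call the `@reaatech/*` packages, whose APIs are not documented here. It shows the stage ordering and a validation stage that checks whether line items plus tax reconcile with the stated total.

```typescript
// Hypothetical record shape; field names are illustrative assumptions.
interface LineItem {
  description: string;
  amountCents: number;
}

interface TaxDocument {
  sourceId: string;
  taxId?: string;
  lineItems: LineItem[];
  taxCents: number;
  totalCents: number;
  errors: string[];
  trace: string[]; // names of stages that have run, in order
}

type Stage = (doc: TaxDocument) => TaxDocument;

// Wrap a stage so its name is appended to the trace after it runs.
function named(name: string, fn: Stage): Stage {
  return (doc) => {
    const next = fn(doc);
    return { ...next, trace: [...next.trace, name] };
  };
}

// Validation stage: line items + tax must equal the stated total.
// Amounts are integer cents to avoid floating-point drift.
const validate: Stage = (doc) => {
  const itemsSum = doc.lineItems.reduce((sum, item) => sum + item.amountCents, 0);
  const errors = [...doc.errors];
  if (itemsSum + doc.taxCents !== doc.totalCents) {
    errors.push(`total mismatch: ${itemsSum} + ${doc.taxCents} != ${doc.totalCents}`);
  }
  if (!doc.taxId) {
    errors.push("missing tax ID");
  }
  return { ...doc, errors };
};

// Stub stand-in for the real components in the diagram.
const identity: Stage = (doc) => doc;

const pipeline: Stage[] = [
  named("fileProcessor", identity),
  named("extraction", identity),
  named("confidenceRouting", identity),
  named("structuredRepair", identity),
  named("costTelemetry", identity),
  named("validation", validate),
];

// Apply each stage in order, threading the record through.
function runPipeline(doc: TaxDocument, stages: Stage[]): TaxDocument {
  return stages.reduce((current, stage) => stage(current), doc);
}
```

In this sketch, a document whose `errors` array is non-empty after validation would be held back for review rather than delivered to the accounting webhook; how the actual project handles such documents is not specified here.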
## Prerequisites
- Node >= 22
- pnpm 10
- Mistral AI API key
- Shopify app credentials
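
The Shopify app credentials include the secret that Shopify uses to sign webhook payloads: it sends a base64-encoded HMAC-SHA256 of the raw request body in the `X-Shopify-Hmac-Sha256` header. A minimal sketch of checking that signature before the file processor runs is shown below; the function name `verifyShopifyWebhook` is hypothetical, and where the secret is loaded from (for example an environment variable) is left as an assumption.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper: verify a Shopify webhook signature.
// The HMAC must be computed over the raw body bytes exactly as received,
// not over re-serialized JSON.
function verifyShopifyWebhook(
  rawBody: Buffer | string,
  hmacHeader: string | undefined,
  appSecret: string,
): boolean {
  if (!hmacHeader) return false;
  const expected = createHmac("sha256", appSecret).update(rawBody).digest();
  const received = Buffer.from(hmacHeader, "base64");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(expected, received);
}
```

A constant-time comparison is used so that the check does not leak how many leading bytes of a forged signature were correct.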
## API Reference
- `POST /api/extract` — Trigger document extraction
- `GET /api/status/[id]` — Poll extraction status
- `POST /api/webhook/deliver` — Deliver results to accounting system
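
The first two routes suggest a submit-then-poll flow. The client below is a minimal sketch under loud assumptions: the README lists only the routes, so the request payload, the `id` field in the extract response, and the status values (`pending`, `processing`, `completed`, `failed`) are hypothetical, as are the names `extractAndWait` and `FetchLike`. The fetch function is injected so the client can be exercised without a running server.

```typescript
// Hypothetical response shapes; the real endpoints may differ.
type ExtractionStatus = "pending" | "processing" | "completed" | "failed";

interface StatusResponse {
  id: string;
  status: ExtractionStatus;
  result?: unknown;
}

// Minimal fetch-like signature; the global fetch in Node 22 satisfies it.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Start an extraction, then poll its status until it reaches a terminal state.
async function extractAndWait(
  baseUrl: string,
  payload: Record<string, unknown>,
  fetchFn: FetchLike,
  { intervalMs = 1000, maxAttempts = 30 } = {},
): Promise<StatusResponse> {
  const start = await fetchFn(`${baseUrl}/api/extract`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(payload),
  });
  if (!start.ok) throw new Error(`extract failed: HTTP ${start.status}`);
  const { id } = (await start.json()) as { id: string };

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetchFn(`${baseUrl}/api/status/${encodeURIComponent(id)}`);
    if (!res.ok) throw new Error(`status failed: HTTP ${res.status}`);
    const body = (await res.json()) as StatusResponse;
    if (body.status === "completed" || body.status === "failed") return body;
    await sleep(intervalMs);
  }
  throw new Error(`extraction ${id} did not finish after ${maxAttempts} polls`);
}
```

In production code one might call `extractAndWait("http://localhost:3000", { fileUrl }, fetch)`, where `fileUrl` is again a hypothetical payload field. The bounded `maxAttempts` keeps a stuck job from polling forever.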
## REAA Packages Used
| Package | Description |
|---|---|
| `@reaatech/media-pipeline-mcp-doc-extraction` | Extracts text from PDFs and images for downstream processing |
| `@reaatech/structured-repair-core` | Repairs malformed LLM outputs into valid structured data |
| `@reaatech/llm-cost-telemetry` | Tracks per-document LLM spend and usage metrics |
| `@reaatech/confidence-router` | Routes between rule-based and LLM extraction based on confidence scores |
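
The idea behind confidence routing can be illustrated without the package: try cheap rule-based extraction first and call the LLM only when confidence is low. The sketch below is a stand-in, not the `@reaatech/confidence-router` API; `ruleBasedExtract`, `routeExtraction`, the regex patterns, and the "fraction of required fields found" confidence score are all hypothetical choices made for illustration.

```typescript
// Hypothetical extraction result and scored wrapper.
interface Extraction {
  taxId: string | null;
  totalCents: number | null;
}

interface Scored<T> {
  value: T;
  confidence: number; // in [0, 1]
}

type Route = "rules" | "llm";

// Illustrative regex extractor; real invoices vary far more than this.
// Confidence is simply the fraction of required fields that were found.
function ruleBasedExtract(text: string): Scored<Extraction> {
  const taxMatch = text.match(/(?:VAT|GST|Tax ID)[:\s#]*([A-Z0-9-]{5,})/i);
  // \b keeps "Subtotal" from matching as "Total".
  const totalMatch = text.match(/\bTotal[:\s]*\$?([0-9]+(?:\.[0-9]{2})?)/i);
  const totalRaw = totalMatch?.[1];
  const value: Extraction = {
    taxId: taxMatch?.[1] ?? null,
    totalCents: totalRaw !== undefined ? Math.round(Number(totalRaw) * 100) : null,
  };
  const found = [value.taxId, value.totalCents].filter((v) => v !== null).length;
  return { value, confidence: found / 2 };
}

// Use the rule-based result when it clears the threshold; otherwise call
// the (lazily supplied) LLM extractor.
function routeExtraction(
  rules: Scored<Extraction>,
  threshold: number,
  llmExtract: () => Extraction,
): { route: Route; value: Extraction } {
  if (rules.confidence >= threshold) {
    return { route: "rules", value: rules.value };
  }
  return { route: "llm", value: llmExtract() };
}
```

Passing the LLM extractor as a callback means the model is invoked, and spend is incurred, only on the fallback path, which is where per-document cost tracking would matter most.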
## Testing
```bash
pnpm test # vitest run with coverage
pnpm typecheck # TypeScript type checking
pnpm lint # ESLint checks
```
## License
MIT — see [LICENSE](./LICENSE).