Files · Anthropic Document Pipeline for NetSuite SMB Invoice Processing
72 (1 binary, 556.1 kB total)attempt 1
README.md·3908 B·markdown
markdown
# Anthropic Document Pipeline for NetSuite SMB Invoice Processing
> Extract invoice fields from PDFs and emails using Claude, push to NetSuite, with per-invoice cost caps and semantic caching for recurring vendors.
A tutorialized reference solution from [reaatech.com](https://reaatech.com), demonstrating how to build production-grade AI systems with the `@reaatech/*` package family.
## Architecture
```
File Upload → Parse Raw Text → Complexity Classification → Claude Extraction
→ Structured Repair → Vendor Template Cache → Budget Enforcement → NetSuite REST API Push
```
1. **File ingestion** — accepts PDF, DOCX, and CSV uploads via multipart form-data
2. **Raw text extraction** — dispatches to `pdf-parse`, `mammoth`, or CSV passthrough
3. **Complexity classification** — a lightweight Claude call determines `"simple"` vs `"complex"`
4. **Claude extraction** — the classified text is sent to Anthropic with an invoice-extraction prompt
5. **Structured repair** — `@reaatech/structured-repair-core` fixes malformed JSON, strips markdown fences, coerces types, and fuzzy-matches keys
6. **Vendor template cache** — `@reaatech/llm-cache` with a Redis adapter caches per-vendor extractions; future identical invoices skip the LLM
7. **Budget enforcement** — `@reaatech/agent-budget-engine` enforces per-tenant monthly caps with soft/hard limits and automatic model downgrades
8. **NetSuite push** — the structured invoice is sent to NetSuite's REST API as a vendor bill via OAuth 1.0a
## Setup
```bash
pnpm install
pnpm dev # starts Next.js dev server on http://localhost:3000
```
### Required environment variables
| Variable | Description |
|---|---|
| `ANTHROPIC_API_KEY` | Anthropic API key for Claude |
| `OPENAI_API_KEY` | OpenAI API key (required by the cache engine's semantic embedder) |
| `REDIS_URL` | Redis connection string (default: `redis://localhost:6379`) |
| `NETSUITE_CONSUMER_KEY` | NetSuite OAuth consumer key |
| `NETSUITE_CONSUMER_SECRET` | NetSuite OAuth consumer secret |
| `NETSUITE_TOKEN_KEY` | NetSuite OAuth token key |
| `NETSUITE_TOKEN_SECRET` | NetSuite OAuth token secret |
| `NETSUITE_ACCOUNT_ID` | NetSuite account ID |
| `DEFAULT_TENANT_MONTHLY_BUDGET` | Monthly cost cap (default: `100.0`) |
| `LANGFUSE_PUBLIC_KEY` | Langfuse telemetry public key |
| `LANGFUSE_SECRET_KEY` | Langfuse telemetry secret key |
| `LANGFUSE_HOST` | Langfuse host (default: `https://cloud.langfuse.com`) |
## API
### `POST /api/ingest`
Upload an invoice document for processing.
- **Content-Type:** `multipart/form-data`
- **Fields:**
- `file` (required) — PDF, DOCX, or CSV file
- `vendorName` (optional) — vendor identifier for template cache lookup
- **Success (200):** `{ invoice: ExtractedInvoice, netsuiteId: string, cached: boolean, cost: number }`
- **Errors:** `400` (missing file / unsupported format), `402` (budget exhausted), `422` (extraction failed), `502` (NetSuite API error)
### `GET /api/health`
- **Success (200):** `{ status: "ok", timestamp: string }`
## Dependencies
| Package | Role |
|---|---|
| `@reaatech/llm-cache` | Semantic and exact-match caching for recurring vendor invoices |
| `@reaatech/llm-cache-adapters-redis` | Redis-backed storage adapter for the cache engine |
| `@reaatech/structured-repair-core` | Multi-strategy JSON repair: strip fences, fix syntax, coerce types, fuzzy-match keys |
| `@reaatech/llm-cost-telemetry` | Per-request cost calculation, ID generation, and telemetry context |
| `@reaatech/agent-budget-engine` | Monthly budget controller with soft/hard caps and auto-downgrade policies |
| `@reaatech/llm-router-core` | Model definition schemas and routing request types |
## How to test
```bash
pnpm test # vitest run --coverage
pnpm typecheck # tsc --noEmit
pnpm lint # eslint .
```
## License
MIT — see [LICENSE](./LICENSE).