# Anthropic Document Pipeline for SMB Lease Abstraction
 
Extract key lease terms from PDF and DOCX files with Claude, backed by a retrieval-augmented clause library and confidence-based human review.
 
## Problem
 
Property managers and small legal teams spend hours pulling critical dates, rent amounts, and clauses out of lease documents by hand, and a missed date or clause can be costly.
 
## Architecture
 
```
Upload → Parse (pdfjs-dist/mammoth) → Embed (VoyageAI) → Retrieve similar clauses (ChromaDB + hybrid-rag) → Extract via Claude → Confidence gate → Auto-approve or manual review
```
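
The final two stages of the flow above (confidence gate, then auto-approve or manual review) can be sketched as a small routing function. This is a minimal illustration, not the `@reaatech/confidence-router` API: the `ExtractedField` shape, the `confidenceGate` name, and the `0.85` threshold are all assumptions made for the example.

```typescript
// Hypothetical field shape: the real extraction schema may differ.
interface ExtractedField {
  name: string;
  value: string | null; // null means the model could not find the term
  confidence: number; // assumed to be a score in the range 0..1
}

type Route = "auto_approve" | "manual_review";

interface GateResult {
  route: Route;
  lowConfidenceFields: string[];
}

// Sends the extraction to manual review if any field is missing or scores
// below the threshold. An empty extraction is never auto-approved.
function confidenceGate(fields: ExtractedField[], threshold = 0.85): GateResult {
  const lowConfidenceFields = fields
    .filter((f) => f.value === null || f.confidence < threshold)
    .map((f) => f.name);
  const approved = fields.length > 0 && lowConfidenceFields.length === 0;
  return {
    route: approved ? "auto_approve" : "manual_review",
    lowConfidenceFields,
  };
}
```

Returning the names of the weak fields alongside the route lets a review screen highlight only the terms that need a second look.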
 
Every extraction is guarded by:
- **Budget engine** (`@reaatech/agent-budget-engine`) — limits Claude API spend per document
- **Circuit breaker** (`@reaatech/circuit-breaker-agents`) — prevents cascading failures during PDF parsing
- **Schema repair** — ensures extracted JSON always matches the target Zod schema
- **Observability** (Langfuse) — traces every pipeline step
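
To make the first two guards concrete, here is a dependency-free sketch of a per-document spend limit and a circuit breaker. It does not use or mirror the `@reaatech/agent-budget-engine` or `@reaatech/circuit-breaker-agents` APIs. The class names, the cents-based accounting, and the failure and cooldown parameters are hypothetical, and the breaker is synchronous for brevity even though real parsing calls would be asynchronous.

```typescript
// Hypothetical per-document budget, tracked in integer cents to avoid float drift.
class DocumentBudget {
  private spentCents = 0;
  private readonly limitCents: number;

  constructor(limitCents: number) {
    this.limitCents = limitCents;
  }

  // Records the charge if it fits; otherwise refuses and leaves the total unchanged.
  tryCharge(costCents: number): boolean {
    if (this.spentCents + costCents > this.limitCents) return false;
    this.spentCents += costCents;
    return true;
  }

  get remainingCents(): number {
    return this.limitCents - this.spentCents;
  }
}

// Hypothetical breaker: opens after maxFailures consecutive errors, rejects
// calls while open, and allows a single trial call once cooldownMs has passed.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly maxFailures: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(maxFailures: number, cooldownMs: number, now: () => number = () => Date.now()) {
    this.maxFailures = maxFailures;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  call<T>(fn: () => T): T {
    if (this.openedAt !== null) {
      if (this.now() - this.openedAt < this.cooldownMs) {
        throw new Error("circuit open: skipping call");
      }
      // Cooldown elapsed: half-open, so one more failure re-opens the circuit.
      this.openedAt = null;
      this.failures = this.maxFailures - 1;
    }
    try {
      const result = fn();
      this.failures = 0; // any success closes the circuit fully
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.maxFailures) this.openedAt = this.now();
      throw err;
    }
  }
}
```

The clock is injected through the `now` parameter so the cooldown behavior can be tested without waiting in real time.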
 
## Packages
 
| Package | Version | Role |
|---|---|---|
| `@anthropic-ai/sdk` | 0.98.0 | Claude API for lease extraction |
| `pdfjs-dist` | 5.7.284 | PDF document parsing |
| `mammoth` | 1.12.0 | DOCX document parsing |
| `voyageai` | 0.2.1 | Text embeddings |
| `chromadb` | 3.4.3 | Vector store for clause library |
| `@reaatech/hybrid-rag` | 0.1.0 | RAG type definitions |
| `@reaatech/confidence-router` | 0.1.0 | Extraction confidence routing |
| `@reaatech/agent-budget-engine` | 0.1.0 | Spend budget enforcement |
| `@reaatech/circuit-breaker-agents` | 0.1.0 | Failure isolation |
| `zod` | 4.4.3 | Schema validation |
| `langfuse` | 3.38.20 | LLM observability |
 
## Quick Start
 
1. Install: `pnpm install`
2. Set env vars (see `.env.example`):
   - `ANTHROPIC_API_KEY` — your Anthropic API key
   - `VOYAGE_API_KEY` — your Voyage AI API key (for embeddings)
   - `CHROMA_URL` — ChromaDB server URL (default: http://localhost:8000)
3. Run: `pnpm dev`
4. Open: http://localhost:3000
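
The environment variables in step 2 could be validated at startup along these lines. This is a sketch under assumptions, not the project's actual config code: `PipelineConfig` and `loadConfig` are hypothetical names, and treating the two API keys as required is inferred from the list above rather than confirmed.

```typescript
// Hypothetical config shape for the three variables listed in Quick Start.
interface PipelineConfig {
  anthropicApiKey: string;
  voyageApiKey: string;
  chromaUrl: string;
}

// Takes the environment as a plain object so it can be tested without
// touching the real process environment.
function loadConfig(env: Record<string, string | undefined>): PipelineConfig {
  const required = ["ANTHROPIC_API_KEY", "VOYAGE_API_KEY"];
  const missing = required.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing required env vars: ${missing.join(", ")}`);
  }
  return {
    anthropicApiKey: env["ANTHROPIC_API_KEY"] as string,
    voyageApiKey: env["VOYAGE_API_KEY"] as string,
    // Falls back to the documented default when CHROMA_URL is unset or empty.
    chromaUrl: env["CHROMA_URL"] || "http://localhost:8000",
  };
}
```

In a Node.js process this would typically be called once as `loadConfig(process.env)`, so a missing key fails fast instead of surfacing on the first upload.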
 
## API
 
| Endpoint | Method | Description |
|---|---|---|
| `/api/documents` | POST | Upload PDF/DOCX for extraction |
| `/api/documents/[id]` | GET | Get extraction detail |
| `/api/extractions` | GET | List extractions (filter by status) |
| `/api/extractions/[id]` | GET/PATCH | Get or update an extraction |
| `/api/reviews` | GET | List review tasks |
| `/api/reviews/[id]` | GET/PATCH | Get/claim/complete review task |
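
As a usage illustration for the table above, the helpers below build request URLs for the extraction endpoints. The paths come from the table. The `status` query parameter name, the helper names, and the status value in the example are assumptions, since the README only says the list can be filtered by status.

```typescript
// Builds the list URL, optionally filtered by status.
// The "status" query parameter name is an assumption.
function extractionsUrl(baseUrl: string, status?: string): string {
  const root = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const query = status ? `?status=${encodeURIComponent(status)}` : "";
  return `${root}/api/extractions${query}`;
}

// Builds the URL for a single extraction (used with GET or PATCH).
function extractionUrl(baseUrl: string, id: string): string {
  const root = baseUrl.replace(/\/+$/, "");
  return `${root}/api/extractions/${encodeURIComponent(id)}`;
}
```

For example, `extractionsUrl("http://localhost:3000", "manual_review")` yields a URL that can be passed to `fetch` against the local dev server, assuming that status value exists.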
 
## Commands
 
- `pnpm dev` — Start development server
- `pnpm build` — Build for production
- `pnpm test` — Run tests with coverage
- `pnpm typecheck` — TypeScript type checking
- `pnpm lint` — ESLint
 
## License
 
MIT