# Anthropic Salesforce Contract Extraction for SMB Sales
Automatically extracts key fields such as contract value, dates, and parties from contracts and proposals stored in Salesforce, eliminating manual data entry for SMB sales teams.
## Problem
Small sales teams keep contracts and proposals as PDFs or scanned documents inside Salesforce. Pulling out amounts, effective dates, and signatory details by hand is slow and error-prone, and the results are inconsistent from one rep to the next.
## Architecture
```
Salesforce (ContentVersion) → jsforce fetch → pdfjs-dist / AWS Textract → text
→ Anthropic Claude (claude-sonnet-4-6) → structured output
→ @reaatech/structured-repair-core → validated JSON
```
The pipeline:
1. Accepts a Salesforce ContentVersion ID (`documentId`) via POST /api/extract
2. Fetches the binary document via jsforce
3. Extracts text via pdfjs-dist (PDFs) or AWS Textract (images)
4. Maintains multi-page context through @reaatech/session-continuity
5. Sends cleaned text to Anthropic Claude with a Zod schema
6. Repairs malformed LLM output through @reaatech/structured-repair-core
7. Enforces per-document spend caps via @reaatech/agent-budget-engine
8. Returns structured contract data
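As a rough illustration of step 6, the sketch below strips a Markdown code fence from model output before parsing it as JSON. The `repairAndParse` helper and the `RepairResult` type are hypothetical and are not the API of @reaatech/structured-repair-core; the step name `strip-fences` is borrowed from the `repairSteps` field in the example response, and the real package may handle this differently.

```typescript
// Hypothetical illustration; not the @reaatech/structured-repair-core API.
interface RepairResult {
  value: unknown;        // parsed JSON, still unvalidated
  repairSteps: string[]; // names of the repairs that were applied
}

// Built at runtime so this snippet contains no literal fence sequence.
const FENCE = "`".repeat(3);

function repairAndParse(raw: string): RepairResult {
  const repairSteps: string[] = [];
  let text = raw.trim();

  // Models often wrap JSON in a Markdown code fence, with or without a language tag.
  const firstNewline = text.indexOf("\n");
  const isFenced =
    text.startsWith(FENCE) &&
    text.endsWith(FENCE) &&
    firstNewline !== -1 &&
    firstNewline < text.length - FENCE.length;

  if (isFenced) {
    // Drop the opening fence line and the closing fence.
    text = text.slice(firstNewline + 1, text.length - FENCE.length).trim();
    repairSteps.push("strip-fences");
  }

  // JSON.parse throws a SyntaxError if the text is still malformed.
  return { value: JSON.parse(text), repairSteps };
}
```

The parsed `value` is still untyped at this point, so it would presumably be checked against the Zod schema before being returned to the caller.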
## Packages
### REAA (vendored)
| Package | Version | Purpose |
|---|---|---|
| @reaatech/media-pipeline-mcp-core | 0.3.0 | Artifact registry & pipeline types |
| @reaatech/media-pipeline-mcp-doc-extraction | 0.3.0 | Document extraction operations |
| @reaatech/structured-repair-core | 1.0.0 | Malformed LLM JSON repair |
| @reaatech/session-continuity | 0.1.0 | Multi-page conversation context |
| @reaatech/agent-budget-engine | 0.1.1 | Per-document budget enforcement |
### Third-party
| Package | Version | Purpose |
|---|---|---|
| @anthropic-ai/sdk | 0.106.0 | Claude API client |
| @aws-sdk/client-textract | 3.1075.0 | AWS Textract OCR |
| pdfjs-dist | 6.0.227 | PDF text extraction |
| jsforce | 3.10.16 | Salesforce API client |
| zod | 4.4.3 | Schema validation & type inference |
| p-limit | 7.3.0 | Concurrency limiting |
| langfuse | 3.38.20 | LLM tracing & observability |
## Setup
```bash
pnpm install
cp .env.example .env
# Fill in your env vars (see .env.example)
pnpm dev
```
## API
### POST /api/extract
```json
{
  "documentId": "<salesforce-content-version-id>"
}
```
Response (200):
```json
{
  "success": true,
  "data": {
    "contract_value": 50000,
    "effective_date": "2024-01-01",
    "expiration_date": "2025-01-01",
    "parties": [{ "name": "Acme Corp", "role": "Client" }],
    "signatory_details": [{ "name": "John Doe", "title": "CEO", "signed_date": "2024-01-01" }],
    "contract_terms": "Net 30 payment terms...",
    "governing_law": "New York",
    "renewal_terms": "Auto-renew unless 30-day notice"
  },
  "documentId": "...",
  "repairSteps": ["strip-fences"],
  "cost_usd": 0.0023
}
```
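A minimal client sketch for calling this endpoint from TypeScript follows. The type names, `isExtractResponse`, and `extractContract` are hypothetical; the field types are inferred from the example response above, and which fields can be null or absent in practice is an assumption. The error response shape is not documented here, so the sketch only checks the HTTP status.

```typescript
// Shapes inferred from the example response; optionality is an assumption.
interface ContractData {
  contract_value: number;
  effective_date: string;
  expiration_date: string;
  parties: { name: string; role: string }[];
  signatory_details: { name: string; title: string; signed_date: string }[];
  contract_terms: string;
  governing_law: string;
  renewal_terms: string;
}

interface ExtractResponse {
  success: boolean;
  data: ContractData;
  documentId: string;
  repairSteps: string[];
  cost_usd: number;
}

// Minimal fetch-like signature so the global fetch or a test double can be passed in.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Shallow runtime check of the top-level envelope only; nested fields are not inspected.
function isExtractResponse(value: unknown): value is ExtractResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.success === "boolean" &&
    typeof v.data === "object" &&
    v.data !== null &&
    typeof v.documentId === "string" &&
    Array.isArray(v.repairSteps) &&
    typeof v.cost_usd === "number"
  );
}

async function extractContract(
  baseUrl: string,
  documentId: string,
  fetchImpl: FetchLike,
): Promise<ExtractResponse> {
  const res = await fetchImpl(`${baseUrl}/api/extract`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ documentId }),
  });
  if (!res.ok) throw new Error(`Extraction failed with HTTP ${res.status}`);

  const payload = await res.json();
  if (!isExtractResponse(payload)) throw new Error("Unexpected response shape");
  return payload;
}
```

Passing the global `fetch` as `fetchImpl` targets a running instance; injecting a stub instead keeps unit tests off the network.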
### GET /api/extract
Health check. Returns `{ "status": "ok" }`.
## Development
```bash
pnpm typecheck # TypeScript check
pnpm lint # ESLint
pnpm test # Run tests with coverage
```
## License
MIT