# Anthropic Salesforce Contract Extraction for SMB Sales
 
Automatically extracts key fields such as contract value, dates, and parties from Salesforce contracts and proposals, eliminating manual data entry for SMB sales teams.
 
## Problem
 
Small sales teams keep contracts and proposals as PDFs or scanned documents inside Salesforce, but pulling out amounts, effective dates, and signatory details manually is slow, error-prone, and inconsistent.
 
## Architecture
 
```
Salesforce (ContentVersion) → jsforce fetch → pdfjs-dist / AWS Textract → text
  → Anthropic Claude (claude-sonnet-4-6) → structured output
  → @reaatech/structured-repair-core → validated JSON
```
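
The diagram branches at text extraction. Here is a minimal sketch of that routing decision. It assumes routing keys off the `FileExtension` field that Salesforce stores on each ContentVersion; the README does not say how document type is actually detected, and `chooseExtractor` is a hypothetical helper.

```typescript
// Hypothetical labels for the two extraction back ends named in the diagram.
type Extractor = "pdfjs-dist" | "textract";

// Assumption: routing is driven by ContentVersion.FileExtension
// (e.g. "pdf", "png"). Unknown types are rejected rather than guessed.
const IMAGE_EXTENSIONS = new Set(["png", "jpg", "jpeg", "tif", "tiff"]);

function chooseExtractor(fileExtension: string): Extractor {
  // Normalize " .PDF " -> "pdf".
  const ext = fileExtension.trim().toLowerCase().replace(/^\./, "");
  if (ext === "pdf") {
    // PDFs: read the embedded text layer with pdfjs-dist.
    return "pdfjs-dist";
  }
  if (IMAGE_EXTENSIONS.has(ext)) {
    // Scans and photos have no text layer, so they need OCR.
    return "textract";
  }
  throw new Error(`Unsupported document type: ${fileExtension}`);
}

console.log(chooseExtractor("PDF")); // → pdfjs-dist
console.log(chooseExtractor(".jpg")); // → textract
```

Routing purely by extension has a caveat. A scanned PDF has no text layer, so pdfjs-dist would return little or no text for it, and an OCR fallback for empty results may be worth considering.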
 
The pipeline:
1. Accepts a Salesforce ContentVersion ID via `POST /api/extract`
2. Fetches the binary document with jsforce
3. Extracts text with pdfjs-dist (PDFs) or AWS Textract (scanned images)
4. Maintains context across multi-page documents with @reaatech/session-continuity
5. Sends the cleaned text to Anthropic Claude along with a Zod schema describing the expected fields
6. Repairs malformed LLM output with @reaatech/structured-repair-core
7. Enforces a per-document spend cap with @reaatech/agent-budget-engine
8. Returns the structured contract data, the repair steps applied, and the cost in USD
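
Step 6 handles output that is almost JSON. One common failure is a response wrapped in Markdown code fences, which is presumably what the `strip-fences` entry in the sample response's `repairSteps` refers to. Below is a hypothetical, dependency-free sketch of that single repair. It is not the `@reaatech/structured-repair-core` API, and `repairJson` and `RepairResult` are illustrative names.

```typescript
// Hypothetical result shape: the parsed value plus the names of the
// repairs that were applied, mirroring the API's repairSteps field.
interface RepairResult {
  value: unknown;
  repairSteps: string[];
}

// Matches a response that is entirely wrapped in a Markdown code fence with an
// optional language tag. `{3} means three backticks without writing them literally.
const FENCED = /^`{3}[\w-]*[ \t]*\r?\n([\s\S]*?)\r?\n?[ \t]*`{3}$/;

function repairJson(raw: string): RepairResult {
  const repairSteps: string[] = [];
  let text = raw.trim();

  const match = FENCED.exec(text);
  if (match) {
    text = match[1] ?? "";
    repairSteps.push("strip-fences");
  }

  // Anything still malformed makes JSON.parse throw; a fuller repair chain
  // would attempt further fixes here before giving up.
  return { value: JSON.parse(text), repairSteps };
}
```

Returning the list of applied steps, rather than fixing the text silently, lets callers log how often the model needs repair.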
 
## Packages
 
### REAA (vendored)
| Package | Version | Purpose |
|---|---|---|
| @reaatech/media-pipeline-mcp-core | 0.3.0 | Artifact registry & pipeline types |
| @reaatech/media-pipeline-mcp-doc-extraction | 0.3.0 | Document extraction operations |
| @reaatech/structured-repair-core | 1.0.0 | Malformed LLM JSON repair |
| @reaatech/session-continuity | 0.1.0 | Multi-page conversation context |
| @reaatech/agent-budget-engine | 0.1.1 | Per-document budget enforcement |
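
`@reaatech/agent-budget-engine` enforces the per-document cap, but its interface is not documented here. The sketch below illustrates the general idea with a hand-rolled tracker instead. `DocumentBudget` and `ModelPricing` are hypothetical names, and the per-token rates in the example are placeholders, not published prices. The Anthropic Messages API reports `usage.input_tokens` and `usage.output_tokens` on each response, which is the kind of data `record` expects.

```typescript
// Placeholder USD-per-million-token rates. These are assumptions to be
// replaced with the current published rates for the model you call.
interface ModelPricing {
  inputPerMTok: number;
  outputPerMTok: number;
}

// Hypothetical per-document spend tracker; not the agent-budget-engine API.
class DocumentBudget {
  private spentUsd = 0;
  private readonly capUsd: number;
  private readonly pricing: ModelPricing;

  constructor(capUsd: number, pricing: ModelPricing) {
    this.capUsd = capUsd;
    this.pricing = pricing;
  }

  // USD cost of one call from its token counts.
  costOf(inputTokens: number, outputTokens: number): number {
    return (
      (inputTokens * this.pricing.inputPerMTok +
        outputTokens * this.pricing.outputPerMTok) /
      1_000_000
    );
  }

  // Call before a request with an estimated cost; false means stop.
  canAfford(estimatedUsd: number): boolean {
    return this.spentUsd + estimatedUsd <= this.capUsd;
  }

  // Call after a request with the token usage the API reported.
  record(inputTokens: number, outputTokens: number): number {
    this.spentUsd += this.costOf(inputTokens, outputTokens);
    return this.spentUsd;
  }
}

// Example: a $0.01 cap with placeholder rates.
const budget = new DocumentBudget(0.01, { inputPerMTok: 3, outputPerMTok: 15 });
budget.record(500, 50); // spent so far ≈ $0.00225
console.log(budget.canAfford(0.005)); // → true
console.log(budget.canAfford(0.008)); // → false
```

Checking an estimate before each call and recording actual usage afterwards is one design choice. Because output length is only known once a call returns, the cap can still be exceeded by the gap between the estimate and the actual cost.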
 
### Third-party
| Package | Version | Purpose |
|---|---|---|
| @anthropic-ai/sdk | 0.106.0 | Claude API client |
| @aws-sdk/client-textract | 3.1075.0 | AWS Textract OCR |
| pdfjs-dist | 6.0.227 | PDF text extraction |
| jsforce | 3.10.16 | Salesforce API client |
| zod | 4.4.3 | Schema validation & type inference |
| p-limit | 7.3.0 | Concurrency limiting |
| langfuse | 3.38.20 | LLM tracing & observability |
 
## Setup
 
```bash
pnpm install
cp .env.example .env
# Fill in your env vars (see .env.example)
pnpm dev
```
 
## API
 
### POST /api/extract
 
Request body:
```json
{
  "documentId": "<salesforce-content-version-id>"
}
```
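
Because the endpoint expects a ContentVersion ID, a cheap pre-check can reject obviously wrong input before any Salesforce call. This is a minimal sketch, assuming validation happens in the route handler (the README does not say). It relies on two facts about Salesforce record IDs. ContentVersion IDs begin with the key prefix `068`, and record IDs come in a 15-character case-sensitive form or an 18-character form with a checksum suffix. `isContentVersionId` and `parseExtractRequest` are hypothetical helpers.

```typescript
// 15 case-sensitive alphanumerics, optionally followed by a 3-character
// checksum suffix (18 total). ContentVersion IDs start with "068"; a
// ContentDocument ID ("069") is a common mix-up and is rejected here.
// The checksum itself is not verified in this sketch.
const CONTENT_VERSION_ID = /^068[a-zA-Z0-9]{12}(?:[a-zA-Z0-9]{3})?$/;

function isContentVersionId(value: unknown): value is string {
  return typeof value === "string" && CONTENT_VERSION_ID.test(value);
}

// Hypothetical request-body check for POST /api/extract.
function parseExtractRequest(body: unknown): { documentId: string } {
  if (typeof body === "object" && body !== null && "documentId" in body) {
    const { documentId } = body as { documentId: unknown };
    if (isContentVersionId(documentId)) return { documentId };
  }
  throw new Error("documentId must be a Salesforce ContentVersion ID (prefix 068)");
}
```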
 
Response (200):
```json
{
  "success": true,
  "data": {
    "contract_value": 50000,
    "effective_date": "2024-01-01",
    "expiration_date": "2025-01-01",
    "parties": [{"name": "Acme Corp", "role": "Client"}],
    "signatory_details": [{"name": "John Doe", "title": "CEO", "signed_date": "2024-01-01"}],
    "contract_terms": "Net 30 payment terms...",
    "governing_law": "New York",
    "renewal_terms": "Auto-renew unless 30-day notice"
  },
  "documentId": "...",
  "repairSteps": ["strip-fences"],
  "cost_usd": 0.0023
}
```
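
The success response can be described with TypeScript types. The sketch below is derived only from the sample above. Field names and nesting follow the example, but required versus optional fields are an assumption because the README shows one fully populated response, and error responses are not documented at all. The project uses Zod for validation; `ExtractResponse` and `isExtractResponse` are hypothetical names for a dependency-free equivalent.

```typescript
interface Party {
  name: string;
  role: string;
}

interface Signatory {
  name: string;
  title: string;
  signed_date: string; // ISO 8601 date, e.g. "2024-01-01"
}

interface ContractData {
  contract_value: number;
  effective_date: string;
  expiration_date: string;
  parties: Party[];
  signatory_details: Signatory[];
  contract_terms: string;
  governing_law: string;
  renewal_terms: string;
}

interface ExtractResponse {
  success: boolean;
  data: ContractData;
  documentId: string;
  repairSteps: string[];
  cost_usd: number;
}

const isObject = (v: unknown): v is Record<string, unknown> =>
  typeof v === "object" && v !== null;

// Shallow runtime check of the envelope plus a few key data fields.
function isExtractResponse(v: unknown): v is ExtractResponse {
  if (!isObject(v)) return false;
  const data = v.data;
  if (!isObject(data)) return false;
  return (
    typeof v.success === "boolean" &&
    typeof v.documentId === "string" &&
    typeof v.cost_usd === "number" &&
    Array.isArray(v.repairSteps) &&
    v.repairSteps.every((s) => typeof s === "string") &&
    typeof data.contract_value === "number" &&
    Array.isArray(data.parties) &&
    Array.isArray(data.signatory_details)
  );
}
```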
 
### GET /api/extract
 
Health check. Returns `{ "status": "ok" }`.
 
## Development
 
```bash
pnpm typecheck    # TypeScript check
pnpm lint         # ESLint
pnpm test         # Run tests with coverage
```
 
## License
 
MIT