# OpenAI Lead Intake with Adaptive Routing
> Automatically capture, qualify, and route inbound leads using AI, so your sales team only works the opportunities that are ready to close.
A tutorial-style reference solution from [reaatech.com](https://reaatech.com) that demonstrates how to build production-grade AI systems with the `@reaatech/*` package family. This project implements an end-to-end lead intake pipeline that parses attachments, classifies intent using keyword and LLM classifiers, routes leads based on confidence thresholds, enforces per-user budgets, and syncs qualified leads to HubSpot CRM.
## Quick Start
```bash
pnpm install
cp .env.example .env
# Fill in your API keys in .env
pnpm run dev
```
The server starts on port 3000 by default (configurable via the `PORT` environment variable).
## Architecture
```
Inbound Lead ──▶ Parse ──▶ Classify ──▶ Route ──▶ Handoff ──▶ CRM Sync
                   │          │           │          │           │
                   ▼          ▼           ▼          ▼           ▼
              Attachments Keyword +  Confidence   Webhook     HubSpot
              (PDF/DOCX)     LLM     Thresholds   Dispatch    Contacts
```
1. **Parse** — Extract text from PDF and DOCX attachments using `unpdf` and `mammoth`
2. **Classify** — Run keyword matching first, then fall back to OpenAI LLM for ambiguous inputs
3. **Route** — Evaluate classification confidence: high confidence routes to sales, medium triggers clarification, low falls back
4. **Handoff** — Serialize lead context and POST to a configured webhook for downstream processing
5. **CRM Sync** — Create or deduplicate HubSpot contacts for routed leads with an email address
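
The Classify and Route steps above can be sketched roughly as follows. This is a minimal illustration, not the `@reaatech/confidence-router` API: the types, the `classifyByKeyword`, `decideRoute`, and `classifyAndRoute` helpers, the keyword table, and the 0.8 / 0.5 thresholds are all assumptions chosen for the example.

```typescript
// Hypothetical types and thresholds for illustration only; the real
// @reaatech/confidence-router package may expose a different API.
type Action = "route" | "clarify" | "fallback";

interface Classification {
  intent: string;
  confidence: number; // expected range 0..1
}

interface RoutingDecision {
  action: Action;
  target?: string;
  confidence: number;
}

// Assumed thresholds: at or above HIGH routes, at or above MEDIUM clarifies.
const HIGH = 0.8;
const MEDIUM = 0.5;

// Assumed keyword table mapping a phrase to an intent.
const KEYWORDS: Record<string, string> = {
  purchase: "sales",
  pricing: "sales",
  "enterprise plan": "sales",
};

// Keyword pass: returns a fixed high-confidence result on a hit, otherwise
// null so the caller can fall back to an LLM classifier.
function classifyByKeyword(text: string): Classification | null {
  const lower = text.toLowerCase();
  for (const [phrase, intent] of Object.entries(KEYWORDS)) {
    if (lower.includes(phrase)) return { intent, confidence: 0.9 };
  }
  return null;
}

// Maps a classification to one of the three routing actions.
function decideRoute(c: Classification): RoutingDecision {
  if (c.confidence >= HIGH) {
    return { action: "route", target: c.intent, confidence: c.confidence };
  }
  if (c.confidence >= MEDIUM) {
    return { action: "clarify", confidence: c.confidence };
  }
  return { action: "fallback", confidence: c.confidence };
}

// Keyword first; the LLM classifier is injected (so it can be stubbed in
// tests) and only runs when the keyword pass finds nothing.
async function classifyAndRoute(
  text: string,
  llmClassify: (text: string) => Promise<Classification>,
): Promise<RoutingDecision> {
  const result = classifyByKeyword(text) ?? (await llmClassify(text));
  return decideRoute(result);
}
```

Injecting the LLM classifier as a function keeps the routing logic testable without network access and makes the cheaper keyword pass the default path.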
## API Reference
### `POST /api/lead`
Submit a lead for processing.
**Headers:**
- `Content-Type: application/json` or `multipart/form-data`
- `x-user-id: <string>` (required)
**Request body (JSON):**
```json
{
"text": "I want to purchase your enterprise plan",
"metadata": { "source": "website" },
"email": "lead@example.com",
"firstName": "John",
"lastName": "Doe",
"company": "Acme Inc"
}
```
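
A client might assemble this request as shown below. This is a sketch under stated assumptions: the `LeadPayload` and `LeadRequest` types and the `buildLeadRequest` helper are hypothetical, the base URL assumes a local server on the default port, and which body fields are optional is a guess, since only an example body is documented here.

```typescript
// Hypothetical client-side types; which fields the server actually requires
// is an assumption, not something the API reference states.
interface LeadPayload {
  text: string;
  metadata?: Record<string, unknown>;
  email?: string;
  firstName?: string;
  lastName?: string;
  company?: string;
}

interface LeadRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Builds the URL and fetch-style options for POST /api/lead.
// baseUrl defaults to a local dev server on the default port.
function buildLeadRequest(
  userId: string,
  lead: LeadPayload,
  baseUrl = "http://localhost:3000",
): LeadRequest {
  if (!userId) throw new Error("x-user-id is required");
  return {
    url: `${baseUrl}/api/lead`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json", "x-user-id": userId },
      body: JSON.stringify(lead),
    },
  };
}

const exampleRequest = buildLeadRequest("user-1", {
  text: "I want to purchase your enterprise plan",
  metadata: { source: "website" },
  email: "lead@example.com",
});
// With a server running, send it with:
//   const res = await fetch(exampleRequest.url, exampleRequest.init);
```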
**Response (201 — Routed):**
```json
{
"id": "uuid",
"status": "routed",
"message": "Lead processed",
"routingDecision": { "action": "route", "target": "sales", "confidence": 0.92 },
"budgetState": { "scopeKey": "user-1", "spent": 0.0, "limit": 10.0, "remaining": 10.0, "state": "Active" }
}
```
**Status codes:** `201` routed, `200` clarification or fallback, `400` validation error, `401` missing `x-user-id` header, `500` server error
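
On the client side, the status codes listed here could be mapped to outcomes along these lines. The `LeadOutcome` names and the `interpretLeadStatus` helper are hypothetical. Because `200` covers both clarification and fallback, this sketch does not try to tell the two apart.

```typescript
// Hypothetical client-side mapping of the documented status codes.
type LeadOutcome =
  | "routed"
  | "needs-follow-up" // clarification or fallback
  | "invalid-request"
  | "missing-user-id"
  | "server-error"
  | "unexpected";

function interpretLeadStatus(status: number): LeadOutcome {
  switch (status) {
    case 201:
      return "routed";
    case 200:
      return "needs-follow-up";
    case 400:
      return "invalid-request";
    case 401:
      return "missing-user-id";
    case 500:
      return "server-error";
    default:
      // Any code not listed in the API reference.
      return "unexpected";
  }
}
```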
### `GET /api/health`
Health check endpoint.
```json
{ "status": "ok", "uptime": 123.45 }
```
## Environment Variables
| Variable | Required | Default | Description |
|---|---|---|---|
| `OPENAI_API_KEY` | Yes | — | OpenAI API key for LLM classification |
| `LANGFUSE_PUBLIC_KEY` | Yes | — | Langfuse public key for observability |
| `LANGFUSE_SECRET_KEY` | Yes | — | Langfuse secret key for observability |
| `LANGFUSE_BASE_URL` | No | `https://cloud.langfuse.com` | Langfuse API base URL |
| `HUBSPOT_ACCESS_TOKEN` | Yes | — | HubSpot private app access token |
| `LEAD_HANDOFF_WEBHOOK_URL` | Yes | — | Webhook URL for lead handoff |
| `PORT` | No | `3000` | Express server listen port |
| `ALLOWED_ORIGINS` | No | `*` | Comma-separated CORS allowed origins |
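
One way to enforce the table above at startup is a small loader that fails fast on missing required variables and applies the documented defaults. The `AppConfig` shape and `loadConfig` helper are hypothetical; the project may validate its environment differently.

```typescript
// Hypothetical config loader; names and shape are illustrative only.
interface AppConfig {
  openaiApiKey: string;
  langfusePublicKey: string;
  langfuseSecretKey: string;
  langfuseBaseUrl: string;
  hubspotAccessToken: string;
  leadHandoffWebhookUrl: string;
  port: number;
  allowedOrigins: string[];
}

type Env = Record<string, string | undefined>;

function loadConfig(env: Env): AppConfig {
  // Throws when a required variable is unset or empty.
  const required = (name: string): string => {
    const value = env[name];
    if (!value) throw new Error(`Missing required environment variable: ${name}`);
    return value;
  };

  const port = Number(env.PORT ?? "3000");
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }

  return {
    openaiApiKey: required("OPENAI_API_KEY"),
    langfusePublicKey: required("LANGFUSE_PUBLIC_KEY"),
    langfuseSecretKey: required("LANGFUSE_SECRET_KEY"),
    langfuseBaseUrl: env.LANGFUSE_BASE_URL ?? "https://cloud.langfuse.com",
    hubspotAccessToken: required("HUBSPOT_ACCESS_TOKEN"),
    leadHandoffWebhookUrl: required("LEAD_HANDOFF_WEBHOOK_URL"),
    port,
    // Comma-separated list; "*" (the default) stays a single entry.
    allowedOrigins: (env.ALLOWED_ORIGINS ?? "*").split(",").map((o) => o.trim()),
  };
}

// In a Node.js entry point: const config = loadConfig(process.env);
```

Taking the environment as a parameter rather than reading `process.env` directly makes the loader easy to unit test.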
## Testing
```bash
pnpm run test
```
Runs Vitest with coverage enabled, using the text, JSON, and JSON-summary reporters. All four coverage thresholds (statements, branches, functions, lines) must be at least 90%.
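
For reference, a Vitest configuration matching that description might look like the fragment below. It is a sketch rather than the project's actual `vitest.config.ts`: the `v8` provider and `enabled: true` are assumptions (coverage could instead be switched on with the `--coverage` flag in the `test` script), and the `thresholds` key assumes Vitest 1.0 or later.

```typescript
// vitest.config.ts (illustrative; the project's real config may differ)
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    coverage: {
      enabled: true, // assumption: may be enabled via the CLI flag instead
      provider: "v8", // assumption: the README does not name the provider
      reporter: ["text", "json", "json-summary"],
      // The run fails if any metric drops below 90%.
      thresholds: { statements: 90, branches: 90, functions: 90, lines: 90 },
    },
  },
});
```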
## Dependencies
- **`@reaatech/confidence-router`** — Confidence-based routing decisions (route/clarify/fallback)
- **`@reaatech/confidence-router-classifiers`** — Keyword and LLM classifier implementations
- **`@reaatech/agent-budget-engine`** — Per-user budget enforcement with auto-downgrade
- **`@reaatech/agent-handoff`** — Handoff payload serialization, typed events, retry logic
- **`openai`** — OpenAI API client for LLM classification
- **`express`** — HTTP server and routing (v5)
- **`unpdf`** — PDF text extraction
- **`mammoth`** — DOCX text extraction
- **`@hubspot/api-client`** — HubSpot CRM contact creation and search
- **`langfuse`** — Observability traces, spans, and events
- **`zod`** — Request validation schemas
- **`uuid`** — Lead ID generation
- **`cors`** — CORS middleware for Express
## License
MIT — see [LICENSE](./LICENSE).