# OpenAI Lead Intake Agent for SMB Real Estate
> Automated lead capture from forms and uploaded documents, with intelligent routing and duplicate prevention for real estate SMBs.
## Problem
Small real estate agencies lose leads in overflowing inboxes and spend hours manually entering data from buyer forms and pre-qualification documents into their CRM. This recipe automates the entire pipeline: form submission triggers extraction via OpenAI, classification via `@reaatech/confidence-router`, and persistence to HubSpot — all guarded by `@reaatech/idempotency-middleware` to prevent duplicate entries.
## Architecture
```
multipart form + files
        │
        ▼
Idempotency-Key check ──→ cache hit → return cached response
        │ cache miss
        ▼
File extraction (PDF / DOCX / TXT via pdf-parse + mammoth)
        │
        ▼
OpenAI Responses API (tool-call structured extraction)
        │
        ▼
@reaatech/confidence-router (intent + urgency classification)
        │
        ▼
HubSpot CRM (contact create/update + deal creation)
        │
        ▼
Response (RoutingResult with contactId, dealId)
```
Each step is traced via Langfuse. Tracing is optional: when Langfuse is unconfigured, the pipeline degrades gracefully and runs without it.
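
The flow above can be read as a chain of async stages wrapped by a tracer. The following is a minimal sketch, not the recipe's actual code: the `Stages` interface, the `Tracer` type, `noopTracer`, and `runPipeline` are all hypothetical names chosen for illustration, and the no-op tracer is only one plausible way to model tracing that is skipped when unconfigured.

```typescript
// Hypothetical types; the recipe's real shapes may differ.
interface LeadInput { email: string; message: string; fileText?: string }
interface Extracted { email: string; rawText: string }
interface Classified { extracted: Extracted; intent: string; confidence: number }
interface RoutingResult { classified: Classified; contactId: string; dealId?: string }

// One function per box in the diagram (file extraction is folded into `extract`).
interface Stages {
  extract(input: LeadInput): Promise<Extracted>;            // OpenAI step
  classify(extracted: Extracted): Promise<Classified>;      // confidence-router step
  persist(classified: Classified): Promise<RoutingResult>;  // HubSpot step
}

// A tracer wraps each step. The no-op variant simply runs the step,
// which is how this sketch models "tracing unconfigured".
type Tracer = <T>(step: string, run: () => Promise<T>) => Promise<T>;
const noopTracer: Tracer = (_step, run) => run();

async function runPipeline(
  input: LeadInput,
  stages: Stages,
  trace: Tracer = noopTracer,
): Promise<RoutingResult> {
  const extracted = await trace("extract", () => stages.extract(input));
  const classified = await trace("classify", () => stages.classify(extracted));
  return trace("persist", () => stages.persist(classified));
}
```

Injecting the stages keeps each external dependency replaceable in tests, which fits the mocked-HTTP testing approach described later in this README.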
## Prerequisites
- **OpenAI API key** — from [platform.openai.com/api-keys](https://platform.openai.com/api-keys)
- **HubSpot access token** — from a [private app](https://developers.hubspot.com/docs/api/private-apps) in your HubSpot account
- **Langfuse account** (optional) — for observability; see [langfuse.com](https://langfuse.com)
## Environment Variables
| Variable | Required | Description |
|----------|----------|-------------|
| `OPENAI_API_KEY` | Yes | OpenAI API key for structured extraction |
| `HUBSPOT_ACCESS_TOKEN` | Yes | HubSpot private-app access token |
| `LANGFUSE_PUBLIC_KEY` | No | Langfuse public key (tracing optional) |
| `LANGFUSE_SECRET_KEY` | No | Langfuse secret key (tracing optional) |
| `LANGFUSE_BASE_URL` | No | Langfuse base URL (defaults to cloud) |
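
As an illustration of how the table above might be enforced at startup, here is a small sketch of a config loader. `AppConfig` and `loadConfig` are hypothetical names, and the rule that tracing needs both Langfuse keys is an assumption rather than documented behavior.

```typescript
// Hypothetical config shape; the recipe's own loader may differ.
interface AppConfig {
  openaiApiKey: string;
  hubspotAccessToken: string;
  langfuse?: { publicKey: string; secretKey: string; baseUrl?: string };
}

type Env = Record<string, string | undefined>;

function loadConfig(env: Env): AppConfig {
  // Fail fast with one message that lists every missing required variable.
  const missing = ["OPENAI_API_KEY", "HUBSPOT_ACCESS_TOKEN"].filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  const publicKey = env.LANGFUSE_PUBLIC_KEY;
  const secretKey = env.LANGFUSE_SECRET_KEY;
  return {
    openaiApiKey: env.OPENAI_API_KEY as string,
    hubspotAccessToken: env.HUBSPOT_ACCESS_TOKEN as string,
    // Assumption: tracing is enabled only when both keys are present.
    langfuse: publicKey && secretKey
      ? { publicKey, secretKey, baseUrl: env.LANGFUSE_BASE_URL }
      : undefined,
  };
}
```

In a Node.js process this would typically be called once as `loadConfig(process.env)`.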
## API Reference
### `POST /api/lead`
Submit a lead with form fields and optional file attachments.
**Headers**
| Header | Required | Description |
|--------|----------|-------------|
| `Idempotency-Key` | Yes | Unique key for idempotency (prevents duplicates) |
**Request format**: `multipart/form-data`
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `email` | string | Yes | Lead email address |
| `message` | string | Yes | Free-text message from the lead |
| `name` | string | No | Lead full name |
| `phone` | string | No | Lead phone number |
| `source` | string | No | Lead source (`form`, `email`, `document_upload`, `api`; default `form`) |
| `file` | file | No | Attached file (PDF, DOCX, or TXT) |
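
The field rules above could be checked with a validator along these lines. This is a sketch under assumptions: `validateLeadFields` and its result shape are hypothetical, and the email check is deliberately loose rather than a copy of whatever the recipe uses.

```typescript
const LEAD_SOURCES = ["form", "email", "document_upload", "api"] as const;
type LeadSource = (typeof LEAD_SOURCES)[number];

interface LeadFields {
  email: string;
  message: string;
  name?: string;
  phone?: string;
  source: LeadSource;
}

type ValidationResult =
  | { ok: true; value: LeadFields }
  | { ok: false; errors: string[] };

function validateLeadFields(fields: Record<string, string | undefined>): ValidationResult {
  const errors: string[] = [];
  const email = fields.email?.trim() ?? "";
  const message = fields.message?.trim() ?? "";
  const source = fields.source ?? "form"; // documented default

  // Loose check: exactly one "@" with non-empty, whitespace-free parts on both sides.
  if (!/^[^\s@]+@[^\s@]+$/.test(email)) errors.push("email is required and must be a valid address");
  if (message.length === 0) errors.push("message is required");
  if (!LEAD_SOURCES.includes(source as LeadSource)) {
    errors.push(`source must be one of: ${LEAD_SOURCES.join(", ")}`);
  }

  if (errors.length > 0) return { ok: false, errors };
  return {
    ok: true,
    value: { email, message, name: fields.name, phone: fields.phone, source: source as LeadSource },
  };
}
```

A failed result would map naturally onto the `VALIDATION_ERROR` response listed under error responses.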
**Example request**
```bash
curl -X POST http://localhost:3000/api/lead \
  -H "Idempotency-Key: my-unique-key-123" \
  -F "email=jane@example.com" \
  -F "message=I want to buy a 3-bedroom house in the suburbs" \
  -F "file=@document.pdf"
```
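
For callers that prefer TypeScript over curl, a roughly equivalent client could look like the sketch below. It assumes a runtime with global `fetch`, `FormData`, and `Blob` (Node.js 18+ or a browser); `buildLeadForm` and `submitLead` are hypothetical helper names, not part of the recipe.

```typescript
interface LeadFormInput {
  email: string;
  message: string;
  file?: { name: string; blob: Blob };
}

// Build the multipart body using the field names from the request table.
function buildLeadForm(input: LeadFormInput): FormData {
  const form = new FormData();
  form.append("email", input.email);
  form.append("message", input.message);
  if (input.file) form.append("file", input.file.blob, input.file.name);
  return form;
}

async function submitLead(baseUrl: string, idempotencyKey: string, form: FormData): Promise<unknown> {
  // Content-Type is left unset on purpose: fetch adds the multipart boundary itself.
  const res = await fetch(`${baseUrl}/api/lead`, {
    method: "POST",
    headers: { "Idempotency-Key": idempotencyKey },
    body: form,
  });
  if (!res.ok) throw new Error(`Lead submission failed with status ${res.status}`);
  return res.json();
}
```

When retrying a failed submission, reuse the same `idempotencyKey` so the server can recognize the duplicate.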
**Example response (201)**
```json
{
  "idempotencyKey": "my-unique-key-123",
  "decision": "ROUTE",
  "target": "buyer",
  "classified": {
    "extracted": {
      "firstName": "Jane",
      "lastName": "Doe",
      "email": "jane@example.com",
      "phone": "555-0100",
      "preferredContactMethod": "phone",
      "propertyInterest": "3-bedroom house",
      "notes": "Looking in the suburbs",
      "source": "form",
      "rawText": "I want to buy a 3-bedroom house in the suburbs"
    },
    "intent": "buyer",
    "confidence": 0.92,
    "urgency": "medium",
    "decisionType": "ROUTE"
  },
  "contactId": "123456789",
  "dealId": "987654321"
}
```
**Error responses**
| Status | Code | Description |
|--------|------|-------------|
| 400 | `KEY_REQUIRED` | Missing `Idempotency-Key` header |
| 400 | `VALIDATION_ERROR` | Missing or invalid form fields |
| 409 | `CONFLICT` | Idempotency lock conflict |
| 415 | `UNSUPPORTED_FILE_TYPE` | Unsupported file MIME type |
| 422 | `EXTRACTION_FAILED` | OpenAI extraction error |
| 502 | `HUBSPOT_ERROR` | HubSpot CRM write failure |
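
One way to keep codes and statuses in sync is a single lookup table that a typed error class reads from. The sketch below is illustrative only: `LeadApiError`, `toErrorResponse`, the response body shape, and the `500 INTERNAL_ERROR` fallback are assumptions and do not appear in the table above.

```typescript
// Status per error code, copied from the table above.
const ERROR_STATUS = {
  KEY_REQUIRED: 400,
  VALIDATION_ERROR: 400,
  CONFLICT: 409,
  UNSUPPORTED_FILE_TYPE: 415,
  EXTRACTION_FAILED: 422,
  HUBSPOT_ERROR: 502,
} as const;

type ErrorCode = keyof typeof ERROR_STATUS;

class LeadApiError extends Error {
  readonly code: ErrorCode;
  constructor(code: ErrorCode, message: string) {
    super(message);
    this.name = "LeadApiError";
    this.code = code;
  }
}

interface ErrorResponse {
  status: number;
  body: { code: string; message: string }; // hypothetical body shape
}

function toErrorResponse(err: unknown): ErrorResponse {
  if (err instanceof LeadApiError) {
    return { status: ERROR_STATUS[err.code], body: { code: err.code, message: err.message } };
  }
  // Assumption: anything unrecognized is reported as a generic 500.
  return { status: 500, body: { code: "INTERNAL_ERROR", message: "Unexpected error" } };
}
```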
### `GET /api/lead`
Returns a status message indicating the API is operational.
## Supported File Types
| MIME Type | Extension | Library |
|-----------|-----------|---------|
| `application/pdf` | `.pdf` | pdf-parse |
| `application/vnd.openxmlformats-officedocument.wordprocessingml.document` | `.docx` | mammoth |
| `text/plain` | `.txt` | Built-in |
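
The table suggests a dispatch from MIME type to extractor. Below is a minimal sketch of that idea, assuming hypothetical names (`Extractor`, `createExtractorRegistry`, `extractText`). The PDF and DOCX extractors are passed in rather than implemented, so the sketch makes no claims about how the recipe calls pdf-parse or mammoth.

```typescript
type Extractor = (data: Uint8Array) => Promise<string>;

// Plain text needs no library: decode the bytes as UTF-8.
const extractPlainText: Extractor = async (data) => new TextDecoder("utf-8").decode(data);

// PDF and DOCX extractors are injected, e.g. thin wrappers around pdf-parse and mammoth.
function createExtractorRegistry(pdf: Extractor, docx: Extractor): Map<string, Extractor> {
  return new Map<string, Extractor>([
    ["application/pdf", pdf],
    ["application/vnd.openxmlformats-officedocument.wordprocessingml.document", docx],
    ["text/plain", extractPlainText],
  ]);
}

async function extractText(
  registry: Map<string, Extractor>,
  mimeType: string,
  data: Uint8Array,
): Promise<string> {
  // Drop parameters such as "; charset=utf-8" before the lookup.
  const baseType = (mimeType.split(";")[0] ?? "").trim().toLowerCase();
  const extractor = registry.get(baseType);
  if (!extractor) throw new Error(`UNSUPPORTED_FILE_TYPE: ${baseType}`);
  return extractor(data);
}
```

An unknown type fails before any parsing happens, which lines up with the `415 UNSUPPORTED_FILE_TYPE` response.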
## Idempotency
Every `POST` request **must** include an `Idempotency-Key` header with a unique value. The middleware caches the first successful response keyed by this header:
- **Cache hit**: duplicate submissions return the original response without re-processing.
- **Cache miss**: the handler executes and the result is cached (24-hour TTL).
- **Concurrent requests**: distributed locking ensures only one handler runs per key.
This prevents ghost leads when network retries or double-clicks submit the same form twice.
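
The three behaviors above can be sketched with an in-memory store. This is not the `@reaatech/idempotency-middleware` API: `InMemoryIdempotencyStore` is a hypothetical single-process stand-in, its `Set` of in-flight keys replaces a real distributed lock, and rejecting concurrent duplicates with `CONFLICT` is an assumption based on the error table.

```typescript
interface CacheEntry<T> { value: T; expiresAt: number }

class InMemoryIdempotencyStore<T> {
  private readonly cache = new Map<string, CacheEntry<T>>();
  private readonly inFlight = new Set<string>(); // stand-in for a distributed lock
  private readonly ttlMs: number;
  private readonly now: () => number;

  // Default TTL is 24 hours; the clock is injectable so tests can control time.
  constructor(ttlMs: number = 24 * 60 * 60 * 1000, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  async execute(key: string, handler: () => Promise<T>): Promise<T> {
    const hit = this.cache.get(key);
    if (hit && hit.expiresAt > this.now()) return hit.value; // cache hit: no re-processing

    if (this.inFlight.has(key)) throw new Error("CONFLICT"); // another request holds the key
    this.inFlight.add(key);
    try {
      const value = await handler();
      // Only successful results are cached; a thrown error leaves the key retryable.
      this.cache.set(key, { value, expiresAt: this.now() + this.ttlMs });
      return value;
    } finally {
      this.inFlight.delete(key);
    }
  }
}
```

A production setup would need a shared store so that the cache and lock hold across multiple server instances.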
## Local Development
```bash
pnpm install
pnpm dev # start Next.js dev server on localhost:3000
pnpm test # run vitest with coverage
pnpm typecheck # TypeScript type checking
pnpm lint # ESLint (strict type-checked rules)
```
## REAA Packages
This recipe demonstrates three `@reaatech/*` packages:
| Package | Role |
|---------|------|
| `@reaatech/confidence-router` | Classifies lead intent (buyer/seller/renter) and urgency via keyword-based routing with configurable confidence thresholds |
| `@reaatech/idempotency-middleware` | Prevents duplicate lead submissions through distributed locking and response caching |
| `@reaatech/idempotency-middleware-express` | Express adapter for the idempotency middleware (used for type reference; Next.js routes use the core API directly) |
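
To make "keyword-based routing with configurable confidence thresholds" concrete, here is a toy classifier. It does not use or mirror the `@reaatech/confidence-router` API: the keyword lists, the scoring rule (share of keyword hits), the default threshold of `0.6`, and the `FALLBACK` decision are all assumptions for illustration.

```typescript
type Intent = "buyer" | "seller" | "renter";

// Hypothetical keyword lists; a real deployment would tune these.
const KEYWORDS: Record<Intent, string[]> = {
  buyer: ["buy", "purchase", "pre-qualified", "mortgage"],
  seller: ["sell", "listing", "list my", "valuation"],
  renter: ["rent", "lease", "tenant"],
};

interface Decision {
  decision: "ROUTE" | "FALLBACK"; // "FALLBACK" is an assumed name for low-confidence cases
  target?: Intent;
  confidence: number;
}

function classifyIntent(text: string, routeThreshold = 0.6): Decision {
  const lower = text.toLowerCase();
  const scores = (Object.keys(KEYWORDS) as Intent[]).map((intent) => ({
    intent,
    hits: KEYWORDS[intent].filter((kw) => lower.includes(kw)).length,
  }));
  const total = scores.reduce((sum, s) => sum + s.hits, 0);
  if (total === 0) return { decision: "FALLBACK", confidence: 0 };

  // Confidence is the winning intent's share of all keyword hits; ties keep the earlier intent.
  const best = scores.reduce((a, b) => (b.hits > a.hits ? b : a));
  const confidence = best.hits / total;
  return confidence >= routeThreshold
    ? { decision: "ROUTE", target: best.intent, confidence }
    : { decision: "FALLBACK", target: best.intent, confidence };
}
```

Plain substring matching is crude (for example, "rent" also matches "current"), so a real router would tokenize the text first.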
## Testing
The test suite uses **vitest** with **MSW** for HTTP mocking. All external calls (OpenAI, HubSpot) are mocked.
Coverage thresholds (enforced by `vitest.config.ts`):
- Lines: ≥90%
- Branches: ≥90%
- Functions: ≥98%
- Statements: ≥90%
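
For reference, thresholds like these are usually declared in the coverage section of `vitest.config.ts`. The fragment below is a sketch that assumes Vitest 1.x or later, where thresholds sit under `coverage.thresholds`; the recipe's actual config file may contain other options.

```typescript
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    coverage: {
      // The run fails if any metric drops below its threshold (percentages).
      thresholds: {
        lines: 90,
        branches: 90,
        functions: 98,
        statements: 90,
      },
    },
  },
});
```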
## License
MIT — see [LICENSE](./LICENSE).