# OpenAI Lead Intake Agent for SMB Real Estate
 
> Automated lead capture from forms and uploaded documents, with intelligent routing and duplicate prevention for real estate SMBs.
 
## Problem
 
Small real estate agencies lose leads in overflowing inboxes and spend hours manually entering data from buyer forms and pre-qualification documents into their CRM. This recipe automates the pipeline end to end: a form submission triggers structured extraction via OpenAI, classification via `@reaatech/confidence-router`, and persistence to HubSpot. The whole request is guarded by `@reaatech/idempotency-middleware` to prevent duplicate entries.
 
## Architecture
 
```
multipart form + files
       │
       ▼
Idempotency-Key check ──→ cache hit → return cached response
       │ cache miss
       ▼
File extraction (PDF / DOCX / TXT via pdf-parse + mammoth)
       │
       ▼
OpenAI Responses API (tool-call structured extraction)
       │
       ▼
@reaatech/confidence-router (intent + urgency classification)
       │
       ▼
HubSpot CRM (contact create/update + deal creation)
       │
       ▼
Response (RoutingResult with contactId, dealId)
```
 
Each step is traced via Langfuse. Tracing is optional: without Langfuse credentials the pipeline still runs, just without traces.
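
The structured-extraction step in the diagram relies on a tool call. The sketch below shows what a function tool for the Responses API might look like, plus a parser for the JSON string that a function call returns in its `arguments` field. The tool name `record_lead`, the helper `parseLeadArguments`, and the exact schema are assumptions for illustration; the field names are borrowed from the example response later in this README.

```typescript
// Hypothetical tool definition; the recipe's real tool name and schema may differ.
const recordLeadTool = {
  type: "function",
  name: "record_lead", // hypothetical name
  description: "Record the contact details and property interest found in a lead submission.",
  strict: true,
  parameters: {
    type: "object",
    properties: {
      firstName: { type: ["string", "null"] },
      lastName: { type: ["string", "null"] },
      email: { type: "string" },
      phone: { type: ["string", "null"] },
      preferredContactMethod: { type: ["string", "null"] },
      propertyInterest: { type: ["string", "null"] },
      notes: { type: ["string", "null"] },
    },
    // Strict mode expects every property to be listed as required,
    // so optional values are modelled as nullable instead.
    required: [
      "firstName", "lastName", "email", "phone",
      "preferredContactMethod", "propertyInterest", "notes",
    ],
    additionalProperties: false,
  },
} as const;

interface ExtractedLead {
  firstName: string | null;
  lastName: string | null;
  email: string;
  phone: string | null;
  preferredContactMethod: string | null;
  propertyInterest: string | null;
  notes: string | null;
}

// Parse and minimally validate the JSON arguments of a function call.
function parseLeadArguments(argumentsJson: string): ExtractedLead {
  const parsed: unknown = JSON.parse(argumentsJson);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("tool arguments must be a JSON object");
  }
  const record = parsed as Record<string, unknown>;
  if (typeof record.email !== "string" || record.email.length === 0) {
    throw new Error("email is required");
  }
  // Anything that is not a string is treated as "not provided".
  const optional = (key: string): string | null => {
    const value = record[key];
    return typeof value === "string" ? value : null;
  };
  return {
    firstName: optional("firstName"),
    lastName: optional("lastName"),
    email: record.email,
    phone: optional("phone"),
    preferredContactMethod: optional("preferredContactMethod"),
    propertyInterest: optional("propertyInterest"),
    notes: optional("notes"),
  };
}
```

Validating the arguments before they reach the classifier keeps malformed model output from propagating to HubSpot; in this README's error table that failure would correspond to `EXTRACTION_FAILED`.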
 
## Prerequisites
 
- **OpenAI API key** — from [platform.openai.com/api-keys](https://platform.openai.com/api-keys)
- **HubSpot access token** — from a [private app](https://developers.hubspot.com/docs/api/private-apps) in your HubSpot account
- **Langfuse account** (optional) — for observability; see [langfuse.com](https://langfuse.com)
 
## Environment Variables
 
| Variable | Required | Description |
|----------|----------|-------------|
| `OPENAI_API_KEY` | Yes | OpenAI API key for structured extraction |
| `HUBSPOT_ACCESS_TOKEN` | Yes | HubSpot private-app access token |
| `LANGFUSE_PUBLIC_KEY` | No | Langfuse public key (tracing optional) |
| `LANGFUSE_SECRET_KEY` | No | Langfuse secret key (tracing optional) |
| `LANGFUSE_BASE_URL` | No | Langfuse base URL (defaults to cloud) |
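
As a rough illustration of how the table above could be enforced at startup, the sketch below reads the variables from a plain object, fails fast when a required one is missing, and turns tracing on only when both Langfuse keys are present. The `loadConfig` function, the `AppConfig` shape, and the default base URL are assumptions, not the recipe's actual code.

```typescript
interface AppConfig {
  openaiApiKey: string;
  hubspotAccessToken: string;
  // null means tracing is disabled.
  langfuse: { publicKey: string; secretKey: string; baseUrl: string } | null;
}

// Assumption: the Langfuse cloud endpoint; the recipe's default may differ.
const DEFAULT_LANGFUSE_BASE_URL = "https://cloud.langfuse.com";

function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const missing = ["OPENAI_API_KEY", "HUBSPOT_ACCESS_TOKEN"].filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  const publicKey = env.LANGFUSE_PUBLIC_KEY;
  const secretKey = env.LANGFUSE_SECRET_KEY;
  return {
    openaiApiKey: env.OPENAI_API_KEY as string,
    hubspotAccessToken: env.HUBSPOT_ACCESS_TOKEN as string,
    // Tracing needs both keys; a single key is treated as "unconfigured".
    langfuse:
      publicKey && secretKey
        ? { publicKey, secretKey, baseUrl: env.LANGFUSE_BASE_URL || DEFAULT_LANGFUSE_BASE_URL }
        : null,
  };
}
```

In a Node.js process this would typically be called once as `loadConfig(process.env)`.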
 
## API Reference
 
### `POST /api/lead`
 
Submit a lead with form fields and optional file attachments.
 
**Headers**
 
| Header | Required | Description |
|--------|----------|-------------|
| `Idempotency-Key` | Yes | Unique key for idempotency (prevents duplicates) |
 
**Request format**: `multipart/form-data`
 
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `email` | string | Yes | Lead email address |
| `message` | string | Yes | Free-text message from the lead |
| `name` | string | No | Lead full name |
| `phone` | string | No | Lead phone number |
| `source` | string | No | Lead source (`form`, `email`, `document_upload`, `api`; default `form`) |
| `file` | file | No | Attached file (PDF, DOCX, or TXT) |
 
**Example request**
 
```bash
curl -X POST http://localhost:3000/api/lead \
  -H "Idempotency-Key: my-unique-key-123" \
  -F "email=jane@example.com" \
  -F "message=I want to buy a 3-bedroom house in the suburbs" \
  -F "file=@document.pdf"
```
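
The same request can be sent from TypeScript. This is a minimal sketch using the standard `fetch`, `FormData`, and `Blob` globals; the `LeadSubmission` type and the helper names `buildLeadForm` and `submitLead` are hypothetical and only mirror the field table above.

```typescript
interface LeadSubmission {
  email: string;
  message: string;
  name?: string;
  phone?: string;
  source?: "form" | "email" | "document_upload" | "api";
  file?: { data: Blob; filename: string };
}

// Build the multipart body; optional fields are only added when present.
function buildLeadForm(lead: LeadSubmission): FormData {
  const form = new FormData();
  form.set("email", lead.email);
  form.set("message", lead.message);
  if (lead.name) form.set("name", lead.name);
  if (lead.phone) form.set("phone", lead.phone);
  if (lead.source) form.set("source", lead.source);
  if (lead.file) form.set("file", lead.file.data, lead.file.filename);
  return form;
}

// Generate one key per logical submission and reuse it on every retry.
async function submitLead(
  baseUrl: string,
  idempotencyKey: string,
  lead: LeadSubmission,
): Promise<Response> {
  return fetch(`${baseUrl}/api/lead`, {
    method: "POST",
    // Do not set Content-Type by hand: fetch adds the multipart boundary itself.
    headers: { "Idempotency-Key": idempotencyKey },
    body: buildLeadForm(lead),
  });
}
```

Passing the key in as an argument, rather than generating it inside `submitLead`, is deliberate: a retry wrapper can then resend the same key and get the cached response instead of creating a second lead.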
 
**Example response (201)**
 
```json
{
  "idempotencyKey": "my-unique-key-123",
  "decision": "ROUTE",
  "target": "buyer",
  "classified": {
    "extracted": {
      "firstName": "Jane",
      "lastName": "Doe",
      "email": "jane@example.com",
      "phone": "555-0100",
      "preferredContactMethod": "phone",
      "propertyInterest": "3-bedroom house",
      "notes": "Looking in the suburbs",
      "source": "form",
      "rawText": "I want to buy a 3-bedroom house in the suburbs"
    },
    "intent": "buyer",
    "confidence": 0.92,
    "urgency": "medium",
    "decisionType": "ROUTE"
  },
  "contactId": "123456789",
  "dealId": "987654321"
}
```
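
A client consuming this response may want a typed view of it. The interface and guard below are inferred from the example above only; which fields are always present, and which values `decision` can take besides `ROUTE`, are assumptions.

```typescript
// Shape inferred from the example response; not taken from the recipe's source.
interface RoutingResult {
  idempotencyKey: string;
  decision: string; // "ROUTE" in the example; other values are not listed here
  target: string;
  classified: {
    extracted: Record<string, unknown>;
    intent: string;
    confidence: number;
    urgency: string;
    decisionType: string;
  };
  contactId: string;
  dealId: string;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Runtime check so that untyped JSON can be narrowed to RoutingResult.
function isRoutingResult(value: unknown): value is RoutingResult {
  if (!isRecord(value) || !isRecord(value.classified)) return false;
  const classified = value.classified;
  return (
    typeof value.idempotencyKey === "string" &&
    typeof value.decision === "string" &&
    typeof value.target === "string" &&
    typeof value.contactId === "string" &&
    typeof value.dealId === "string" &&
    isRecord(classified.extracted) &&
    typeof classified.intent === "string" &&
    typeof classified.confidence === "number" &&
    typeof classified.urgency === "string" &&
    typeof classified.decisionType === "string"
  );
}
```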
 
**Error responses**
 
| Status | Code | Description |
|--------|------|-------------|
| 400 | `KEY_REQUIRED` | Missing `Idempotency-Key` header |
| 400 | `VALIDATION_ERROR` | Missing or invalid form fields |
| 409 | `CONFLICT` | Another request with the same `Idempotency-Key` is still being processed |
| 415 | `UNSUPPORTED_FILE_TYPE` | Unsupported file MIME type |
| 422 | `EXTRACTION_FAILED` | OpenAI extraction error |
| 502 | `HUBSPOT_ERROR` | HubSpot CRM write failure |
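
The table above can be kept in one place as a typed lookup so that handlers and tests agree on the status for each code. In this sketch only the codes and statuses come from the table; the `{ error: { code, message } }` body shape and the helper names are hypothetical, since this README does not document the error payload.

```typescript
const ERROR_STATUS = {
  KEY_REQUIRED: 400,
  VALIDATION_ERROR: 400,
  CONFLICT: 409,
  UNSUPPORTED_FILE_TYPE: 415,
  EXTRACTION_FAILED: 422,
  HUBSPOT_ERROR: 502,
} as const;

type LeadErrorCode = keyof typeof ERROR_STATUS;

// Narrow an arbitrary string (for example from a parsed response) to a known code.
function isLeadErrorCode(code: string): code is LeadErrorCode {
  return Object.prototype.hasOwnProperty.call(ERROR_STATUS, code);
}

// Hypothetical payload layout, used here only to show the lookup in action.
function errorResponse(
  code: LeadErrorCode,
  message: string,
): { status: number; body: { error: { code: LeadErrorCode; message: string } } } {
  return { status: ERROR_STATUS[code], body: { error: { code, message } } };
}
```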
 
### `GET /api/lead`
 
Returns a status message indicating the API is operational.
 
## Supported File Types
 
| MIME Type | Extension | Library |
|-----------|-----------|---------|
| `application/pdf` | `.pdf` | pdf-parse |
| `application/vnd.openxmlformats-officedocument.wordprocessingml.document` | `.docx` | mammoth |
| `text/plain` | `.txt` | Built-in |
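
One way to picture the dispatch behind this table is a lookup from MIME type to extractor, with anything unlisted rejected as `UNSUPPORTED_FILE_TYPE`. The sketch below is an assumption about structure, not the recipe's code: `resolveExtractor` and `extractPlainText` are hypothetical names, and the PDF and DOCX branches are represented only by their library labels.

```typescript
type Extractor = "pdf-parse" | "mammoth" | "built-in";

const EXTRACTOR_BY_MIME = new Map<string, Extractor>([
  ["application/pdf", "pdf-parse"],
  ["application/vnd.openxmlformats-officedocument.wordprocessingml.document", "mammoth"],
  ["text/plain", "built-in"],
]);

// Returns undefined for unsupported types, which a handler would map to HTTP 415.
function resolveExtractor(mimeType: string): Extractor | undefined {
  // Drop parameters such as "; charset=utf-8" and normalise case before the lookup.
  const essence = (mimeType.split(";")[0] ?? "").trim().toLowerCase();
  return EXTRACTOR_BY_MIME.get(essence);
}

// The built-in path: plain text only needs decoding.
function extractPlainText(bytes: Uint8Array): string {
  return new TextDecoder("utf-8").decode(bytes);
}
```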
 
## Idempotency
 
Every `POST` request **must** include an `Idempotency-Key` header with a unique value. The middleware caches the first successful response keyed by this header:
 
- **Cache hit**: duplicate submissions return the original response without re-processing.
- **Cache miss**: the handler executes and a successful result is cached for 24 hours.
- **Concurrent requests**: distributed locking ensures only one handler runs per key.
 
This prevents ghost leads when network retries or double-clicks submit the same form twice.
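
The three cases above can be modelled with a small in-memory store. This is a single-process sketch of the general technique, not the API of `@reaatech/idempotency-middleware`, which uses distributed locking; the class name, method names, and outcome labels are all hypothetical.

```typescript
type BeginOutcome<T> =
  | { kind: "hit"; response: T } // cached response exists: replay it
  | { kind: "locked" } // same key is in flight: maps to 409 CONFLICT
  | { kind: "acquired" }; // caller owns the key and should run the handler

const DAY_MS = 24 * 60 * 60 * 1000;

class InMemoryIdempotencyStore<T> {
  private readonly cache = new Map<string, { response: T; expiresAt: number }>();
  private readonly inFlight = new Set<string>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // The clock is injectable so that expiry can be tested without waiting.
  constructor(ttlMs: number = DAY_MS, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  begin(key: string): BeginOutcome<T> {
    const entry = this.cache.get(key);
    if (entry !== undefined) {
      if (entry.expiresAt > this.now()) return { kind: "hit", response: entry.response };
      this.cache.delete(key); // expired entry: treat as a miss
    }
    if (this.inFlight.has(key)) return { kind: "locked" };
    this.inFlight.add(key);
    return { kind: "acquired" };
  }

  // Cache a successful response and release the lock.
  complete(key: string, response: T): void {
    this.cache.set(key, { response, expiresAt: this.now() + this.ttlMs });
    this.inFlight.delete(key);
  }

  // Release the lock without caching, so a failed request can be retried.
  fail(key: string): void {
    this.inFlight.delete(key);
  }
}
```

A handler would call `begin` first, return the cached response on `hit`, answer with `CONFLICT` on `locked`, and otherwise run the pipeline and finish with `complete` or `fail`.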
 
## Local Development
 
```bash
pnpm install
pnpm dev        # start Next.js dev server on localhost:3000
pnpm test       # run vitest with coverage
pnpm typecheck  # TypeScript type checking
pnpm lint       # ESLint (strict type-checked rules)
```
 
## REAA Packages
 
This recipe demonstrates three `@reaatech/*` packages:
 
| Package | Role |
|---------|------|
| `@reaatech/confidence-router` | Classifies lead intent (buyer/seller/renter) and urgency via keyword-based routing with configurable confidence thresholds |
| `@reaatech/idempotency-middleware` | Prevents duplicate lead submissions through distributed locking and response caching |
| `@reaatech/idempotency-middleware-express` | Express adapter for the idempotency middleware (used for type reference; Next.js routes use the core API directly) |
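
To make "keyword-based routing with configurable confidence thresholds" concrete, here is a toy classifier for intent only. It is not the API of `@reaatech/confidence-router`: the keyword lists, the `FALLBACK` decision label, the default threshold of 0.7, and the scoring rule (share of keyword hits belonging to the winning intent) are all assumptions chosen for illustration.

```typescript
type Intent = "buyer" | "seller" | "renter";

// Hypothetical keyword lists; a real router would use a richer vocabulary.
const KEYWORDS: Record<Intent, string[]> = {
  buyer: ["buy", "purchase", "mortgage"],
  seller: ["sell", "listing", "valuation"],
  renter: ["rent", "lease", "tenant"],
};

interface Classification {
  intent: Intent | null;
  confidence: number;
  decision: "ROUTE" | "FALLBACK";
}

function classifyIntent(text: string, threshold: number = 0.7): Classification {
  // Lowercase and split on anything that is not a letter.
  const tokens = text.toLowerCase().split(/[^a-z]+/).filter(Boolean);
  const intents = Object.keys(KEYWORDS) as Intent[];
  const hits = intents.map((intent) => ({
    intent,
    count: tokens.filter((token) => KEYWORDS[intent].includes(token)).length,
  }));
  const total = hits.reduce((sum, hit) => sum + hit.count, 0);
  if (total === 0) return { intent: null, confidence: 0, decision: "FALLBACK" };
  // On a tie the earlier intent wins, but its confidence stays below the threshold.
  const best = hits.reduce((a, b) => (b.count > a.count ? b : a));
  const confidence = best.count / total;
  return {
    intent: best.intent,
    confidence,
    decision: confidence >= threshold ? "ROUTE" : "FALLBACK",
  };
}
```

With these lists, a message that mentions only buying routes to `buyer`, while one that mentions both buying and renting splits the score evenly and falls below the threshold.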
 
## Testing
 
The test suite uses **vitest** with **MSW** for HTTP mocking. All external calls (OpenAI, HubSpot) are mocked.
 
Coverage thresholds (enforced by `vitest.config.ts`):
- Lines: ≥90%
- Branches: ≥90%
- Functions: ≥98%
- Statements: ≥90%
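
For reference, thresholds like these are usually declared under `test.coverage.thresholds` in recent Vitest versions. The fragment below is a sketch of what that section of `vitest.config.ts` might look like; the `v8` provider is an assumption, and the recipe's actual file may contain more options.

```typescript
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    coverage: {
      // Assumption: the v8 provider; the recipe may use istanbul instead.
      provider: "v8",
      // Vitest fails the run when coverage drops below any of these percentages.
      thresholds: {
        lines: 90,
        branches: 90,
        functions: 98,
        statements: 90,
      },
    },
  },
});
```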
 
## License
 
MIT — see [LICENSE](./LICENSE).