Small real-estate agencies lose leads in overflowing inboxes and spend hours manually entering buyer form data and pre-qualification documents into their CRM. This tutorial walks you through building an OpenAI Lead Intake Agent — a Next.js API that accepts form submissions and PDF/DOCX attachments, extracts structured lead data using OpenAI’s Responses API, classifies intent (buyer/seller/renter) with @reaatech/confidence-router, prevents duplicate entries with @reaatech/idempotency-middleware, and writes everything to HubSpot. By the end you’ll have a working API you can drop into any SMB real-estate website.
Also update .env.example so the repo documents every variable. The four Langfuse env vars are optional — the glue code checks for their presence and falls back to no-ops if missing.
Expected output: Both .env.local (with real values) and .env.example (with placeholder values) exist.
Step 3: Define shared domain types
Start with the core types the entire pipeline uses. These live in src/lib/types.ts.
Expected output: 61 lines of type definitions covering every data shape the pipeline touches — from raw form submissions to classified leads to API error payloads.
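To give a feel for the shapes involved, here is a minimal sketch of a few such types. The ExtractedLead field names follow the list used in Step 7; the layout of ClassifiedLead, the lower-case spelling of DecisionType, and the describeLead helper are assumptions made for illustration, not the published file.

```typescript
// Sketch only: a subset of the shared domain types.
export type LeadIntent = "buyer" | "seller" | "renter";
export type LeadUrgency = "high" | "medium" | "low";
// Assumed spelling: the router reports ROUTE / CLARIFY / FALLBACK.
export type DecisionType = "route" | "clarify" | "fallback";

// Fields the model is asked to extract (see Step 7).
export interface ExtractedLead {
  firstName: string;
  lastName: string;
  email: string;
  phone: string;
  preferredContactMethod: string;
  propertyInterest: string;
  notes: string;
  budgetRange?: string;
  moveTimeline?: string;
}

// Assumed layout: the extracted lead plus the classifier's verdict.
export interface ClassifiedLead {
  lead: ExtractedLead;
  intent: LeadIntent;
  urgency: LeadUrgency;
  decision: DecisionType;
  confidence: number; // assumed to lie in [0, 1]
}

// Hypothetical helper, useful for log lines and test output.
export function describeLead(c: ClassifiedLead): string {
  return `${c.lead.firstName} ${c.lead.lastName}: ${c.intent} (${c.urgency})`;
}
```

Literal unions keep the labels checkable at compile time, so a typo such as "buyr" fails the build instead of reaching HubSpot.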
Step 4: Create typed custom errors
Every failure mode in the pipeline gets its own Error class with a string code property. This lets the API route handler map errors to HTTP status codes without fragile instanceof chains against generic errors.
Expected output: Nine error classes, each carrying a unique code string and domain-specific properties (mimeType, hsStatusCode, step, or detail as appropriate).
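The pattern can be sketched with two of the nine classes. The class names and the mimeType and hsStatusCode properties come from this tutorial; the code strings, the HTTP statuses, and the statusFor helper are hypothetical values chosen for illustration.

```typescript
// Sketch: each error carries a string code plus its own domain property.
export class UnsupportedFileTypeError extends Error {
  readonly code = "UNSUPPORTED_FILE_TYPE"; // hypothetical code string
  readonly mimeType: string;

  constructor(mimeType: string) {
    super(`Unsupported file type: ${mimeType}`);
    this.name = "UnsupportedFileTypeError";
    this.mimeType = mimeType;
  }
}

export class HubSpotError extends Error {
  readonly code = "HUBSPOT_ERROR"; // hypothetical code string
  readonly hsStatusCode: number;

  constructor(message: string, hsStatusCode: number) {
    super(message);
    this.name = "HubSpotError";
    this.hsStatusCode = hsStatusCode;
  }
}

// Hypothetical lookup table from error code to HTTP status.
const STATUS_BY_CODE = new Map<string, number>([
  ["UNSUPPORTED_FILE_TYPE", 415],
  ["HUBSPOT_ERROR", 502],
]);

// Maps any thrown value to a status by reading its code, not its class.
export function statusFor(error: unknown): number {
  if (typeof error === "object" && error !== null && "code" in error) {
    const code = (error as { code: unknown }).code;
    if (typeof code === "string") {
      return STATUS_BY_CODE.get(code) ?? 500;
    }
  }
  return 500; // anything unrecognised is an internal error
}
```

Because the lookup keys on code, adding a tenth error class only needs one new table row in the route handler.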
Step 5: Add Zod validation schemas
Zod validates incoming form payloads and the structured data extracted from OpenAI. Using z.email() enforces email format, and z.enum() constrains lead sources and intents.
Expected output: Four Zod schemas — FileAttachmentSchema, RawLeadSubmissionSchema, ExtractedLeadSchema, and RoutingResultSchema.
Step 6: Build the document extractor
The DocumentExtractor class converts uploaded PDF, DOCX, and plain-text files into raw text. It delegates to pdf-parse for PDFs and mammoth for DOCX files.
```ts
import mammoth from "mammoth";
import { PDFParse } from "pdf-parse";
import { EmptyFileError, FileParseError, UnsupportedFileTypeError } from "../lib/errors.js";

export class DocumentExtractor {
  static readonly SUPPORTED_MIME_TYPES: readonly string[] = [
    "application/pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "text/plain",
  ];

  // Dispatches on MIME type; rejects empty buffers before any parsing.
  async extractText(buffer: Buffer, mimeType: string): Promise<string> {
    if (buffer.length === 0) {
      throw new EmptyFileError();
    }
    switch (mimeType) {
      case "application/pdf":
        return this.extractPdfText(buffer);
      case "application/vnd.openxmlformats-officedocument.wordprocessingml.document":
        return this.extractDocxText(buffer);
      case "text/plain":
        return buffer.toString("utf-8");
      default:
        throw new UnsupportedFileTypeError(mimeType);
    }
  }

  private async extractPdfText(buffer: Buffer): Promise<string> {
    try {
      // pdf-parse v2 takes an options object; raw bytes go in `data`.
      const parser = new PDFParse({ data: new Uint8Array(buffer) });
      const result = await parser.getText();
      return result.text;
    } catch (error) {
      throw new FileParseError("Failed to parse PDF file", { cause: error });
    }
  }

  private async extractDocxText(buffer: Buffer): Promise<string> {
    try {
      const result = await mammoth.extractRawText({ buffer });
      return result.value;
    } catch (error) {
      throw new FileParseError("Failed to parse DOCX file", { cause: error });
    }
  }
}
```
Save as src/services/document-extractor.ts.
Expected output: A class with a static SUPPORTED_MIME_TYPES array and an extractText method that handles PDF (via PDFParse class instantiation), DOCX (via mammoth.extractRawText), and plain-text buffers, throwing typed errors for empty buffers, unsupported types, and parse failures.
Step 7: Build the OpenAI lead extractor
The LeadExtractor wraps the OpenAI Responses API with a structured tool-call definition. It defines an extract_lead_data function with a JSON Schema that asks the model for firstName, lastName, email, phone, preferredContactMethod, propertyInterest, and notes, plus optional budgetRange and moveTimeline.
```ts
import OpenAI from "openai";
import { ExtractedLeadSchema } from "../schemas/lead-schemas.js";
import { EmptyInputError, ExtractionError, ParsingError } from "../lib/errors.js";
import type { ExtractedLead, LeadSource } from "../lib/types.js";

const EXTRACTION_INSTRUCTIONS =
  "You are a real estate lead intake assistant. Extract structured lead data from the provided text. " +
  "Use the extract_lead_data function to return the information in a structured format. " +
  "If a field is missing from the text, leave it as an empty string. " +
  "Always extract firstName, lastName, email, phone, preferredContactMethod, propertyInterest, and notes.";

const EXTRACTION_TOOLS: OpenAI.Responses.ResponseCreateParams["tools"] = [
  {
    type:
```
Save as src/services/lead-extractor.ts.
Expected output: 141 lines. The class sends text to gpt-5.2 with a structured function-call tool definition, parses the tool arguments from the response, and validates them against ExtractedLeadSchema. It handles empty input, API errors, malformed JSON, and failed Zod validation with typed errors.
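The "parse the tool arguments" step can be sketched without the SDK. In a Responses API result, a function call appears in the output array as an item with type "function_call", a name, and an arguments field holding a JSON string. The OutputItem type and parseToolArguments helper below are hypothetical stand-ins; the real class would throw ParsingError or ExtractionError and then validate with ExtractedLeadSchema, whereas this sketch throws plain Error and stops at the parsed object.

```typescript
// Minimal structural view of a Responses API output item (assumption:
// only the fields this sketch reads are modelled).
export type OutputItem = { type: string; name?: string; arguments?: string };

// Finds the named function call and returns its arguments as an object.
export function parseToolArguments(
  output: readonly OutputItem[],
  toolName: string,
): Record<string, unknown> {
  const call = output.find(
    (item) => item.type === "function_call" && item.name === toolName,
  );
  if (call === undefined || typeof call.arguments !== "string") {
    throw new Error(`Model did not call ${toolName}`);
  }

  let parsed: unknown;
  try {
    parsed = JSON.parse(call.arguments); // arguments arrive as a JSON string
  } catch {
    throw new Error("Tool arguments were not valid JSON");
  }

  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("Tool arguments were not a JSON object");
  }
  return parsed as Record<string, unknown>;
}
```

Keeping this step separate from schema validation makes the two failure modes (malformed JSON versus well-formed JSON with the wrong fields) easy to tell apart in logs.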
Step 8: Build the lead classifier using ConfidenceRouter
The LeadClassifier uses @reaatech/confidence-router to classify both lead intent (buyer/seller/renter) and urgency (high/medium/low) based on keyword matching.
```ts
import {
  ConfidenceRouter,
  KeywordClassifier,
} from "@reaatech/confidence-router";
import { mergeConfig } from "@reaatech/confidence-router-core";
import { ClassificationError } from "../lib/errors.js";
import type {
  ClassifiedLead,
  DecisionType,
  ExtractedLead,
  LeadIntent,
  LeadUrgency,
} from "../lib/types.js";

const ROUTER_CONFIG = {
  routeThreshold: 0.1,
  fallbackThreshold: 0.05,
  clarificationEnabled: true,
};

const BUYER_KEYWORDS =
```
Save as src/services/lead-classifier.ts.
Expected output: 128 lines. Two ConfidenceRouter instances — one for intent (buyer/seller/renter), one for urgency (high/medium/low). Each registers a KeywordClassifier with domain-specific keyword lists. The classify method maps RoutingDecision.type (ROUTE/CLARIFY/FALLBACK) to DecisionType and extracts confidence, with graceful catch-all fallbacks.
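To build intuition for how two thresholds produce three outcomes, here is a library-free stand-in. It is not the @reaatech/confidence-router implementation: the scoring rule (fraction of keywords found) and the threshold semantics (at or above routeThreshold routes, at or above fallbackThreshold asks for clarification, anything lower falls back) are assumptions. Only the threshold values 0.1 and 0.05 come from ROUTER_CONFIG.

```typescript
export type Decision = "ROUTE" | "CLARIFY" | "FALLBACK";

export interface Thresholds {
  routeThreshold: number;
  fallbackThreshold: number;
}

// Hypothetical scoring rule: share of the label's keywords found in the text.
export function keywordScore(text: string, keywords: readonly string[]): number {
  if (keywords.length === 0) return 0;
  const haystack = text.toLowerCase();
  const hits = keywords.filter((k) => haystack.includes(k.toLowerCase())).length;
  return hits / keywords.length;
}

// Assumed semantics for turning a confidence into a routing decision.
export function decide(confidence: number, t: Thresholds): Decision {
  if (confidence >= t.routeThreshold) return "ROUTE";
  if (confidence >= t.fallbackThreshold) return "CLARIFY";
  return "FALLBACK";
}
```

Under this assumed scoring rule, a low routeThreshold such as 0.1 means a single hit in a ten-keyword list is already enough to route.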
Step 9: Build the HubSpot CRM client
The HubSpotCRM class wraps @hubspot/api-client to search for existing contacts by email, create or update contacts, create deals, and associate deals with contacts.
```ts
import { Client } from "@hubspot/api-client";
import { HubSpotError } from "../lib/errors.js";
import type { ClassifiedLead, ExtractedLead, LeadUrgency } from "../lib/types.js";

interface SearchResult {
  id: string;
}

function urgencyToDealStage(urgency: LeadUrgency): string {
  switch (urgency) {
    case "high":
      return "qualifiedtobuy";
    case "medium":
      return "appointmentscheduled";
    case "low":
      return "new";
```
Save as src/services/hubspot-client.ts.
Expected output: 169 lines. Three key operations: getContactByEmail (searches by email, returns ID or null), createOrUpdateContact (upserts with all ExtractedLead fields including preferred_contact_method, property_interest, budget_range, move_timeline), and createDeal (creates a deal and calls associateDealWithContact). Urgency maps to deal stages: high → qualifiedtobuy, medium → appointmentscheduled, low → new.
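The upsert needs the extracted lead translated into HubSpot property names. A minimal sketch of that mapping follows; firstname, lastname, email, and phone are standard HubSpot contact properties, while the four snake_case names are the custom properties listed above and would have to exist in the HubSpot portal first. The toContactProperties helper is hypothetical, and where notes ends up is not stated here, so the sketch leaves it out.

```typescript
// Same field list as Step 7, repeated so the sketch is self-contained.
interface ExtractedLead {
  firstName: string;
  lastName: string;
  email: string;
  phone: string;
  preferredContactMethod: string;
  propertyInterest: string;
  notes: string;
  budgetRange?: string;
  moveTimeline?: string;
}

// Hypothetical helper: camelCase lead fields to HubSpot property names.
export function toContactProperties(lead: ExtractedLead): Record<string, string> {
  const properties: Record<string, string> = {
    firstname: lead.firstName,
    lastname: lead.lastName,
    email: lead.email,
    phone: lead.phone,
    preferred_contact_method: lead.preferredContactMethod,
    property_interest: lead.propertyInterest,
  };
  // Optional fields are sent only when present, so an update does not
  // overwrite an existing value with an empty one.
  if (lead.budgetRange) properties["budget_range"] = lead.budgetRange;
  if (lead.moveTimeline) properties["move_timeline"] = lead.moveTimeline;
  return properties;
}
```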
Step 10: Add idempotency guard and observability
Two supporting services wrap cross-cutting concerns.
Idempotency guard (src/services/idempotency.ts):
```ts
import { MemoryAdapter, IdempotencyMiddleware } from "@reaatech/idempotency-middleware";
import { idempotentExpress } from "@reaatech/idempotency-middleware-express";

export interface IdempotencyDeps {
  adapter: MemoryAdapter;
  middleware: IdempotencyMiddleware;
}

/**
 * Creates the Express-compatible idempotency middleware function.
 * Useful when mounting in an Express app; in Next.js App Router
 * the core IdempotencyMiddleware.execute() API is used instead.
 */
export function buildExpressMiddleware(
  adapter: MemoryAdapter,
): ReturnType<typeof idempotentExpress> {
  // Return the middleware so an Express app can mount it with app.use().
  return idempotentExpress(adapter, { ttl: 86_400_000, methods: ["POST"] });
}

export async function createIdempotencyMiddleware(): Promise<IdempotencyDeps> {
  const adapter = new MemoryAdapter();
  await adapter.connect();
  const middleware = new IdempotencyMiddleware(adapter, {
    ttl: 86_400_000, // 24 hours
    lockTimeout: 30_000,
    lockTtl: 60_000,
  });
  return { adapter, middleware };
}

export function extractIdempotencyKey(headers: Headers): string | undefined {
  // Headers.get() is already case-insensitive, so one lookup covers every casing.
  return headers.get("idempotency-key") ?? undefined;
}
```
The factory creates an in-memory adapter with a 24-hour TTL and 30-second lock timeout. The extractIdempotencyKey helper reads the Idempotency-Key header case-insensitively. An additional buildExpressMiddleware helper wraps the adapter for use in Express apps if needed.
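What the guard buys you can be shown with a tiny stand-in. This is not the @reaatech/idempotency-middleware implementation: TinyIdempotencyStore is a hypothetical class that caches the first result for a key until a TTL expires, and it ignores the locking that the real lockTimeout and lockTtl options govern.

```typescript
interface Entry<T> {
  value: T;
  expiresAt: number; // epoch milliseconds
}

// Hypothetical in-memory store: first call per key runs, repeats are replayed.
export class TinyIdempotencyStore<T> {
  private readonly entries = new Map<string, Entry<T>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // `now` is injectable so expiry can be exercised without waiting.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  execute(key: string, handler: () => T): T {
    const current = this.now();
    const cached = this.entries.get(key);
    if (cached !== undefined && cached.expiresAt > current) {
      return cached.value; // replay: the handler is not run again
    }
    const value = handler();
    this.entries.set(key, { value, expiresAt: current + this.ttlMs });
    return value;
  }
}
```

If T is a Promise, the cached promise is shared, so a retried request that arrives while the first is still running waits for the same result instead of writing a second contact.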
Observability (src/services/observability.ts): when the Langfuse environment variables are missing, every method becomes a no-op, so the pipeline runs without observability overhead.
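The no-op fallback might look roughly like this. The Tracer interface, createTracer, and the logging stand-in are hypothetical and do not reproduce the real ObservationApi; LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY are Langfuse's usual variable names, but which four variables this project checks is an assumption. The environment is passed in as a plain object so the sketch stays free of Node-specific globals.

```typescript
export interface Tracer {
  readonly enabled: boolean;
  startTrace(name: string): { end(): void };
  score(name: string, value: number): void;
}

// Shared do-nothing implementation: safe to call, costs nothing.
const NOOP_TRACER: Tracer = {
  enabled: false,
  startTrace: () => ({ end: () => {} }),
  score: () => {},
};

export function createTracer(
  env: Record<string, string | undefined>,
  log: (line: string) => void,
): Tracer {
  const configured =
    Boolean(env["LANGFUSE_PUBLIC_KEY"]) && Boolean(env["LANGFUSE_SECRET_KEY"]);
  if (!configured) return NOOP_TRACER;

  // A real implementation would forward these calls to the Langfuse SDK;
  // this stand-in only logs them.
  return {
    enabled: true,
    startTrace: (name) => {
      log(`start ${name}`);
      return { end: () => { log(`end ${name}`); } };
    },
    score: (name, value) => { log(`score ${name}=${value}`); },
  };
}
```

Callers never branch on whether tracing is configured; they call the same methods either way.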
Expected output: Two files. idempotency.ts exports buildExpressMiddleware, createIdempotencyMiddleware, and extractIdempotencyKey. observability.ts exports createObservability with graceful fallback to no-ops when Langfuse keys are absent.
Step 11: Wire the orchestration layer
The LeadIntakeGlue class orchestrates the full pipeline: idempotency check, file extraction, OpenAI extraction, classification, HubSpot write, and Langfuse tracing — in that order.
```ts
import { IdempotencyError } from "@reaatech/idempotency-middleware";
import { DocumentExtractor } from "./document-extractor.js";
import { LeadExtractor } from "./lead-extractor.js";
import { LeadClassifier } from "./lead-classifier.js";
import { HubSpotCRM } from "./hubspot-client.js";
import { LeadIntakeError } from "../lib/errors.js";
import type { ObservationApi, TraceSpan } from "./observability.js";
import type { IdempotencyDeps } from "./idempotency.js";
import type { ExtractedLead, RawLeadSubmission, RoutingResult } from "../lib/types.js";

interface GlueServices {
  documentExtractor
```
Save as src/services/lead-intake-glue.ts.
Expected output: 130 lines. processLead wraps the handler in middleware.execute() for idempotency. The handler runs ordered steps: start trace → extract file text → call LeadExtractor → call LeadClassifier → create/update HubSpot contact → create HubSpot deal (with automatic association) → score trace success. Errors are wrapped in LeadIntakeError with error-type-based step detection, except IdempotencyError which propagates through for the route handler to map correctly.
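The "ordered steps, errors tagged with the failing step" idea can be sketched generically. StepError, wrapStepError, and runPipeline are hypothetical names; the real class wraps failures in LeadIntakeError and infers the step from the error's type, whereas this sketch swaps in a simpler approach and tags each failure with the name of the step that was running.

```typescript
// Hypothetical wrapper recording which step failed and why.
export class StepError extends Error {
  readonly step: string;
  readonly original: unknown;

  constructor(step: string, original: unknown) {
    const detail = original instanceof Error ? original.message : String(original);
    super(`Step "${step}" failed: ${detail}`);
    this.name = "StepError";
    this.step = step;
    this.original = original;
  }
}

export function wrapStepError(step: string, error: unknown): StepError {
  return new StepError(step, error);
}

export interface Step<C> {
  name: string;
  run: (context: C) => Promise<C>;
}

// Runs steps in order, threading the context through; stops at the first failure.
export async function runPipeline<C>(
  initial: C,
  steps: readonly Step<C>[],
): Promise<C> {
  let context = initial;
  for (const step of steps) {
    try {
      context = await step.run(context);
    } catch (error) {
      throw wrapStepError(step.name, error);
    }
  }
  return context;
}
```

A route handler can then log or report error.step without inspecting the underlying error.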
Step 12: Create the API route
The POST /api/lead route handler parses multipart form data, validates it with Zod, instantiates all services, and dispatches errors to appropriate HTTP status codes.
```ts
import { type NextRequest, NextResponse } from "next/server";
import OpenAI from "openai";
import { Client } from "@hubspot/api-client";
import { DocumentExtractor } from "../../../src/services/document-extractor.js";
import { LeadExtractor } from "../../../src/services/lead-extractor.js";
import { LeadClassifier } from "../../../src/services/lead-classifier.js";
import { HubSpotCRM } from "../../../src/services/hubspot-client.js";
import { LeadIntakeGlue } from "../../../src/services/lead-intake-glue.js";
import { createIdempotencyMiddleware, extractIdempotencyKey } from "../../../src/services/idempotency.js";
import { createObservability } from "../../../src/services/observability.js"
```
Save as app/api/lead/route.ts.
Expected output: 229 lines. The POST handler extracts the Idempotency-Key header, parses multipart formData(), reads text fields and file attachments, validates with RawLeadSubmissionSchema, instantiates a singleton LeadIntakeGlue (lazy-loaded), and dispatches every error type to the correct HTTP status. Also exports a GET handler returning a simple status check.
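The lazy singleton matters because the in-memory idempotency adapter can only recognise a repeated key if the same instance handles both requests. One way to express the pattern is a small memoising wrapper; the lazy helper below is a hypothetical illustration, not the code in route.ts.

```typescript
// Wraps an async factory so it runs at most once; every caller shares the result.
export function lazy<T>(factory: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      cached = factory().catch((error: unknown) => {
        cached = undefined; // allow a retry after a failed initialisation
        throw error;
      });
    }
    return cached;
  };
}
```

A route module could declare something like `const getGlue = lazy(buildGlue)` at module scope (buildGlue being whatever async function wires the services together) and await getGlue() inside the handler, so setup cost is paid on the first request only.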
Step 13: Write and run the tests
The test suite uses vitest with manual mocks for all external calls. There are 102 tests across 9 files covering every service and the API route.
Here is one representative test file — the document extractor tests (tests/services/document-extractor.test.ts):
The coverage thresholds are configured in vitest.config.ts at 90% for lines, branches, functions, and statements across runtime code only (src/**/*.ts and app/**/route.ts). UI files (*.tsx) and Next.js layout files are excluded from coverage.
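A configuration along these lines would express those thresholds. The include patterns and the 90% figures come from the paragraph above; the choice of the v8 provider and the exact exclude glob are assumptions, so treat this as a sketch of the shape rather than the published file.

```typescript
// vitest.config.ts (sketch)
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    coverage: {
      provider: "v8", // assumption: the tutorial does not name the provider
      // Runtime code only.
      include: ["src/**/*.ts", "app/**/route.ts"],
      // UI and layout files are .tsx, so one glob covers both (assumed pattern).
      exclude: ["**/*.tsx"],
      // The run fails if any metric drops below 90%.
      thresholds: {
        lines: 90,
        branches: 90,
        functions: 90,
        statements: 90,
      },
    },
  },
});
```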
Next steps
Add a pipeline selector — extend LeadClassifier with more intent labels (investor, commercial) using additional KeywordClassifier instances or a custom classifier adapter
Replace the memory adapter — swap MemoryAdapter for a Redis or PostgreSQL-backed adapter in createIdempotencyMiddleware for production deployments across multiple server instances
Plug into a real form — create a small React component at app/lead-form/page.tsx that submits multipart form data with fetch and the Idempotency-Key header, giving your agents a browser-based intake UI
Add webhook notifications — after associateDealWithContact, emit a webhook event so downstream systems (email automations, Slack bots) react to new leads in real time
```ts
    "function",
    name: "extract_lead_data",
    description: "Extract structured lead information from real estate intake text",
    strict: true,
    parameters: {
      type: "object",
      properties: {
        firstName: { type: "string", description: "First name of the lead" },
        lastName: { type: "string", description: "Last name of the lead" },
```