SMB sales teams lose revenue because inbound leads fall through the cracks or receive no timely follow-up. Manual qualification is slow, inconsistent, and often misses key buyer signals hidden in form submissions and attached documents.
This recipe builds an Express server that receives inbound leads, parses attached PDF and DOCX files, classifies intent through a keyword matcher and OpenAI LLM, routes each lead by confidence score, hands off high-confidence leads to a webhook, and syncs contacts to HubSpot. You will wire together six services and end with a running server you can POST lead data against.
Prerequisites
Node.js 22 or later
pnpm installed (npm install -g pnpm)
An OpenAI API key with access to gpt-5.2-mini
A Langfuse project (self-hosted or cloud.langfuse.com)
A HubSpot private app access token
A webhook URL to receive routed leads
Step 1: Install dependencies
Start from the project root. Copy the .env.example file, fill in your keys, then install all packages:
terminal
cp .env.example .env
pnpm install
Expected output: pnpm prints resolution tables and concludes with Done.
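The package.json pins all dependencies exactly, including the four REAA packages used in the later steps: @reaatech/agent-budget-engine, @reaatech/agent-budget-spend-tracker, @reaatech/confidence-router, and @reaatech/confidence-router-classifiers.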
The Express server reads process.env at startup. Every service that calls an external API reads its own key from this file — nothing is hardcoded.
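The fail-fast pattern behind this can be sketched as follows. The variable names (OPENAI_API_KEY, HUBSPOT_ACCESS_TOKEN, WEBHOOK_URL) and the loadConfig helper are assumptions for illustration, not names taken from the recipe; use whatever your .env.example defines.

```typescript
// Hypothetical variable names; match them to your own .env.example.
const REQUIRED_KEYS = ["OPENAI_API_KEY", "HUBSPOT_ACCESS_TOKEN", "WEBHOOK_URL"] as const;

type RequiredKey = (typeof REQUIRED_KEYS)[number];
type AppConfig = Record<RequiredKey, string>;

// Takes the environment as a plain record so it can be exercised without process.env.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  // Collect every missing key so the error reports all of them at once.
  const missing = REQUIRED_KEYS.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const config = {} as AppConfig;
  for (const key of REQUIRED_KEYS) {
    config[key] = env[key] as string;
  }
  return config;
}
```

At startup a server would call loadConfig(process.env) once and pass the resulting values to each service, so a missing key stops the process before any request is handled.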
Step 3: Define shared types
Create src/lib/types.ts. Every service in this recipe shares these interfaces, so TypeScript can verify the shape of data as it flows through the pipeline:
ts
import { z } from "zod";

export interface FileAttachment {
  filename: string;
  mimeType: string;
  buffer:
The LeadRequest Zod schema validates incoming API payloads: text is required and capped at 10,000 characters, while email, firstName, lastName, company, and metadata are all optional.
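To make those rules concrete, here is a dependency-free sketch of the same checks. It is a hand-written stand-in for the Zod schema, not the recipe's code: the validateLeadRequest helper, the ValidationResult type, and the assumption that the optional name fields are plain strings are all illustrative.

```typescript
// Field names follow the description above; their exact types are assumptions.
interface LeadRequest {
  text: string;
  email?: string;
  firstName?: string;
  lastName?: string;
  company?: string;
  metadata?: Record<string, unknown>;
}

const MAX_TEXT_LENGTH = 10_000;
const OPTIONAL_STRING_FIELDS = ["email", "firstName", "lastName", "company"] as const;

type ValidationResult =
  | { ok: true; value: LeadRequest }
  | { ok: false; error: string };

function validateLeadRequest(input: unknown): ValidationResult {
  if (typeof input !== "object" || input === null) {
    return { ok: false, error: "payload must be an object" };
  }
  const body = input as Record<string, unknown>;

  // text is the only required field.
  const text = body["text"];
  if (typeof text !== "string") {
    return { ok: false, error: "text is required" };
  }
  if (text.length > MAX_TEXT_LENGTH) {
    return { ok: false, error: `text exceeds ${MAX_TEXT_LENGTH} characters` };
  }

  const value: LeadRequest = { text };
  // Optional fields are copied only when present and of the expected type.
  for (const field of OPTIONAL_STRING_FIELDS) {
    const raw = body[field];
    if (raw === undefined) continue;
    if (typeof raw !== "string") {
      return { ok: false, error: `${field} must be a string` };
    }
    value[field] = raw;
  }
  const metadata = body["metadata"];
  if (metadata !== undefined) {
    if (typeof metadata !== "object" || metadata === null) {
      return { ok: false, error: "metadata must be an object" };
    }
    value.metadata = metadata as Record<string, unknown>;
  }
  return { ok: true, value };
}
```

Returning a tagged result instead of throwing keeps the HTTP handler simple: a failed check maps directly to a 400 response with the error string.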
Step 4: Build the document parser
Create src/lib/parser.ts. This module extracts text from PDF and DOCX attachments so their content feeds into the classifier alongside the form text:
ts
import { extractText, getDocumentProxy } from "unpdf";
import mammoth from "mammoth";
import type { FileAttachment, ParsedDocument } from "./types.js";

export
parseFile dispatches on MIME type: PDF uses unpdf’s getDocumentProxy + extractText, DOCX uses mammoth’s extractRawText. Any other MIME type throws a ParserError with code unsupported_mime_type. parseFiles wraps each parse in Promise.allSettled so one bad attachment does not crash the whole batch — only the successful parses are returned.
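The dispatch-and-settle pattern described here can be sketched without the real libraries. In this sketch the extractors are stubs standing in for unpdf and mammoth, and the ParsedDocument fields and the Uint8Array buffer type are assumptions, since the recipe's own type definitions are not fully shown.

```typescript
interface FileAttachment {
  filename: string;
  mimeType: string;
  buffer: Uint8Array; // assumed type for this sketch
}

interface ParsedDocument {
  filename: string;
  text: string;
}

class ParserError extends Error {
  code: string;
  constructor(code: string, message: string) {
    super(message);
    this.name = "ParserError";
    this.code = code;
  }
}

type Extractor = (buffer: Uint8Array) => Promise<string>;

// Stub extractors: placeholders for the unpdf and mammoth calls.
const extractors = new Map<string, Extractor>([
  ["application/pdf", async (buffer) => `pdf text (${buffer.length} bytes)`],
  [
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    async (buffer) => `docx text (${buffer.length} bytes)`,
  ],
]);

async function parseFile(file: FileAttachment): Promise<ParsedDocument> {
  const extract = extractors.get(file.mimeType);
  if (!extract) {
    throw new ParserError("unsupported_mime_type", `Cannot parse ${file.mimeType}`);
  }
  return { filename: file.filename, text: await extract(file.buffer) };
}

async function parseFiles(files: FileAttachment[]): Promise<ParsedDocument[]> {
  // allSettled never rejects, so one failed attachment cannot abort the batch.
  const settled = await Promise.allSettled(files.map((file) => parseFile(file)));
  const parsed: ParsedDocument[] = [];
  for (const result of settled) {
    if (result.status === "fulfilled") parsed.push(result.value);
  }
  return parsed;
}
```

A side effect of this design is that failures are silent to the caller. If you need to report which attachments were skipped, the rejected entries carry the ParserError in their reason field.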
Step 5: Create the in-memory spend store
Create src/services/spend-store.ts. The BudgetController from @reaatech/agent-budget-engine expects a SpendStore instance. This in-memory implementation satisfies that interface without external infrastructure:
ts
import { SpendStore } from "@reaatech/agent-budget-spend-tracker";

export function createInMemorySpendStore(): SpendStore {
  return new SpendStore();
}
The SpendStore class from @reaatech/agent-budget-spend-tracker already implements the full record, getSpend, getAllScopes, and reset API surface. The factory just returns a fresh instance; the server creates it once at startup (Step 9) and shares it across all requests.
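For readers curious what such a store amounts to, here is a minimal in-memory sketch. Only the four method names come from the text above; the class name InMemorySpendStoreSketch, the parameter types, and the per-scope reset behavior are assumptions and may not match the real package.

```typescript
// Hypothetical shape: method names from the recipe, signatures assumed.
class InMemorySpendStoreSketch {
  private readonly totals = new Map<string, number>();

  // Adds a non-negative amount to the running total for a scope (e.g. a user id).
  record(scope: string, amountUsd: number): void {
    if (!Number.isFinite(amountUsd) || amountUsd < 0) {
      throw new RangeError("amountUsd must be a non-negative finite number");
    }
    this.totals.set(scope, this.getSpend(scope) + amountUsd);
  }

  // Unknown scopes report zero spend rather than undefined.
  getSpend(scope: string): number {
    return this.totals.get(scope) ?? 0;
  }

  getAllScopes(): string[] {
    return Array.from(this.totals.keys());
  }

  // Clears one scope, or everything when no scope is given.
  reset(scope?: string): void {
    if (scope === undefined) {
      this.totals.clear();
    } else {
      this.totals.delete(scope);
    }
  }
}
```

Because the totals live in process memory, they vanish on restart and are not shared between server instances, which is acceptable for a single-process demo but not for a multi-node deployment.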
Step 6: Set up the confidence classifier
Create src/services/classifier.ts. This module wires the two classifiers from @reaatech/confidence-router-classifiers into a registry: the keyword classifier runs first (fast, no API cost), and ambiguous inputs fall back to the OpenAI LLM classifier:
ts
import { ClassifierRegistry, KeywordClassifier, LLMClassifier } from "@reaatech/confidence-router-classifiers";

interface ClassificationResult {
  predictions: Array<{ label: string; confidence: number }>;
  metadata?: Record<
getFallbackChain tries each enabled classifier in priority order until one succeeds. If all fail, the safe fallback returns "other" at confidence 1.0 so the pipeline always has a result.
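The fallback behavior can be sketched independently of the registry. The Classifier interface, the classifyWithFallback helper, and the convention that a lower priority number runs first are assumptions made for this illustration; only the ClassificationResult shape and the "other" at 1.0 fallback come from the recipe.

```typescript
interface ClassificationResult {
  predictions: Array<{ label: string; confidence: number }>;
  metadata?: Record<string, unknown>;
}

// Hypothetical classifier shape for this sketch.
interface Classifier {
  name: string;
  priority: number; // assumed: lower numbers run first
  enabled: boolean;
  classify(text: string): Promise<ClassificationResult>;
}

const SAFE_FALLBACK: ClassificationResult = {
  predictions: [{ label: "other", confidence: 1.0 }],
  metadata: { source: "safe_fallback" },
};

async function classifyWithFallback(
  classifiers: Classifier[],
  text: string,
): Promise<ClassificationResult> {
  // Work on a filtered copy so the caller's array order is left alone.
  const chain = classifiers
    .filter((classifier) => classifier.enabled)
    .sort((a, b) => a.priority - b.priority);

  for (const classifier of chain) {
    try {
      // The first classifier that resolves wins.
      return await classifier.classify(text);
    } catch {
      // A thrown error means "could not classify"; try the next one.
    }
  }
  // Every classifier failed or none was enabled.
  return SAFE_FALLBACK;
}
```

Note that a fallback at confidence 1.0 will pass any routing threshold, so downstream code that needs to tell a genuine "other" apart from a total failure should check the metadata rather than the score.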
Step 7: Configure the OpenAI pricing provider and budget controller
Create src/services/pricing.ts first, then src/services/budget.ts.
src/services/pricing.ts implements the PricingProvider interface expected by BudgetController:
src/services/budget.ts wraps the BudgetController with per-user budget operations:
registerBudgetEvents wires hard-stop and threshold-breach controller events into Langfuse so every budget event is traceable.
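As a rough illustration of that wiring, the sketch below forwards two event types from an event source to a tracing callback. Every name in it is hypothetical: the event names, the BudgetEvent fields, the BudgetEventSource interface, and the forwardBudgetEvents helper are placeholders, and the TraceSink callback stands in for whatever Langfuse client call the real module makes.

```typescript
// All names below are hypothetical; the real controller and tracing client may differ.
type BudgetEventName = "hard-stop" | "threshold-breach";

interface BudgetEvent {
  scope: string;
  spentUsd: number;
  limitUsd: number;
}

// Minimal view of an emitter: anything with a typed `on` method fits.
interface BudgetEventSource {
  on(name: BudgetEventName, listener: (event: BudgetEvent) => void): void;
}

// Stand-in for a tracing client: receives an event name and its payload.
type TraceSink = (name: string, payload: BudgetEvent) => void;

function forwardBudgetEvents(source: BudgetEventSource, sink: TraceSink): void {
  const names: BudgetEventName[] = ["hard-stop", "threshold-breach"];
  for (const name of names) {
    // Prefix the name so budget events are easy to filter in the trace viewer.
    source.on(name, (event) => sink(`budget.${name}`, event));
  }
}
```

Keeping the sink as a plain callback means the wiring can be tested with an array push and swapped for a real tracing call without touching the controller.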
Step 8: Set up lead routing
Create src/services/router.ts. This wraps the ConfidenceRouter from @reaatech/confidence-router and maps its internal decision types to the RoutingOutcome shape used throughout the pipeline:
ts
import { ConfidenceRouter } from "@reaatech/confidence-router";
import type { RoutingOutcome } from "../lib/types.js";

interface ClassificationResult {
  predictions: Array<{ label: string; confidence: number }>;
  metadata?: Record<string, unknown>;
}
The default thresholds are 0.7 for routing and 0.3 for falling back. Confidence between those bounds triggers a clarification prompt. Pass a custom config to createLeadRouter to adjust these values.
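The three-way split can be sketched as a small pure function. The RoutingDecision labels, the RouterConfig field names, and the boundary handling (exactly 0.7 routes, exactly 0.3 asks for clarification) are assumptions for this sketch; the real ConfidenceRouter may treat the edges differently.

```typescript
// Hypothetical decision labels and config field names.
type RoutingDecision = "route" | "clarify" | "fallback";

interface RouterConfig {
  routeThreshold: number;
  fallbackThreshold: number;
}

// Defaults taken from the text above.
const DEFAULT_CONFIG: RouterConfig = { routeThreshold: 0.7, fallbackThreshold: 0.3 };

function decide(confidence: number, config: Partial<RouterConfig> = {}): RoutingDecision {
  const { routeThreshold, fallbackThreshold } = { ...DEFAULT_CONFIG, ...config };
  if (fallbackThreshold > routeThreshold) {
    throw new RangeError("fallbackThreshold must not exceed routeThreshold");
  }
  // Assumed boundaries: >= routeThreshold routes, < fallbackThreshold falls back.
  if (confidence >= routeThreshold) return "route";
  if (confidence < fallbackThreshold) return "fallback";
  return "clarify";
}
```

Raising routeThreshold widens the clarification band and sends fewer leads straight to the webhook; lowering fallbackThreshold does the opposite at the low end.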
Step 9: Wire up the Express server
Create server.ts at the project root. This is the entry point — it creates all service singletons once, wires them into the LeadProcessor, and mounts the HTTP routes:
ts
import express from "express";
import cors from "cors";
import { createLeadRouter }
Start the server with:
terminal
npx tsx server.ts
Expected output: Lead intake server listening on port 3000.
Send a test request:
terminal
curl -X POST http://localhost:3000/api/lead \
  -H "Content-Type: application/json" \
  -H "x-user-id: user-1" \
  -d '{"text": "I want to buy a demo of your enterprise plan"}'
Next steps
Replace the hardcoded defineLeadBudget call with a per-request budget check so each user gets their own spend limit from a database or JWT claim.
Add OpenTelemetry exports alongside Langfuse so traces flow into your existing observability platform.
Extend the ClassifierRegistry with a third classifier (e.g., a vector-search embedding classifier) to improve accuracy on ambiguous inputs before falling back to the LLM.
Configure express.raw with a verify function to validate multipart body size and reject oversized uploads before they reach the handler.