SMBs using live chat or contact forms receive a flood of unqualified inquiries. Sales teams waste time on generic questions and miss real buying signals, while no follow-up history persists across sessions.
A conversational lead intake agent classifies every incoming chat message into intent categories (pricing, demo, support) using pluggable classifiers. When confidence is high, it routes directly to a sales handoff; under ambiguity it asks clarifying questions. Session continuity preserves the entire conversation with compression so returning leads pick up where they left off, and structured lead records are transferred to a downstream CRM.
This tutorial walks you through building this system with Next.js (App Router), OpenAI, Langfuse, and the @reaatech package family. You’ll create the intent classification tree, a multi-classifier routing engine, session management with token-budget compression, a sales handoff service with retry logic, and Langfuse telemetry wrapping every pipeline step.
Prerequisites
Node.js >= 22 and pnpm (install pnpm with corepack enable && corepack prepare pnpm@latest --activate)
An OpenAI API key with access to gpt-4o-mini and gpt-5.2 (set as OPENAI_API_KEY in .env)
A Langfuse account (cloud or self-hosted) with secret key, public key, and base URL (set as LANGFUSE_SECRET_KEY, LANGFUSE_PUBLIC_KEY, LANGFUSE_BASE_URL in .env)
Familiarity with Next.js App Router route handlers and TypeScript
Step 1: Create the Next.js project and install dependencies
Start by creating a new Next.js project with TypeScript and the App Router. Then install all the dependencies at their pinned versions.
Expected output: node_modules/ is populated and pnpm typecheck exits without errors.
Step 2: Configure environment variables
Copy the .env.example file and fill in your API keys. Every environment variable the system reads is listed here.
terminal
cp .env.example .env
Your .env.example should contain placeholder entries for every integration:
env
# Env vars used by openai-lead-intake-with-intent-routing-for-smbs.
# The builder adds entries here as it wires up each integration.
# Keep placeholders only; never commit real values.
NODE_ENV=development
OPENAI_API_KEY=<your-openai-key>
LANGFUSE_SECRET_KEY=<your-langfuse-secret>
LANGFUSE_PUBLIC_KEY=<your-langfuse-public-key>
LANGFUSE_BASE_URL=https://us.cloud.langfuse.com
AGENT_HANDOFF_ENDPOINT=<webhook-or-crm-endpoint>
Fill in your real values in .env (which is gitignored). AGENT_HANDOFF_ENDPOINT points to the webhook or CRM endpoint where qualified leads are delivered.
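One way to catch a forgotten variable early is to check the environment once at startup. The sketch below is an illustration and not part of the recipe's published code: checkEnv and the two name lists are hypothetical, and treating OPENAI_API_KEY and AGENT_HANDOFF_ENDPOINT as required is an assumption (the Langfuse variables are treated as optional, as Step 11 describes).

```typescript
// Hypothetical startup check; names and the required/optional split are assumptions.
const REQUIRED_ENV_VARS = ["OPENAI_API_KEY", "AGENT_HANDOFF_ENDPOINT"] as const;
const OPTIONAL_ENV_VARS = [
  "LANGFUSE_SECRET_KEY",
  "LANGFUSE_PUBLIC_KEY",
  "LANGFUSE_BASE_URL",
] as const;

type EnvSource = Record<string, string | undefined>;

interface EnvReport {
  missingRequired: string[];
  missingOptional: string[];
}

// A value counts as missing when it is unset, blank, or still an "<...>" placeholder.
function isMissing(value: string | undefined): boolean {
  if (value === undefined) return true;
  const trimmed = value.trim();
  return trimmed === "" || (trimmed.startsWith("<") && trimmed.endsWith(">"));
}

// Reports which variables still need a real value in the given environment object.
function checkEnv(env: EnvSource): EnvReport {
  return {
    missingRequired: REQUIRED_ENV_VARS.filter((name) => isMissing(env[name])),
    missingOptional: OPTIONAL_ENV_VARS.filter((name) => isMissing(env[name])),
  };
}
```

In a Node.js process you would pass process.env to checkEnv and fail fast when missingRequired is non-empty, while only logging a warning for missingOptional.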
Step 3: Define shared types with Zod
Create src/lib/types.ts to define the Zod schemas and TypeScript types that flow through the pipeline. This file covers the intent categories, the chat request schema, the lead record schema, and the response schema.
Step 4: Define the intent configurations
Create src/lib/intents.ts to define the static intent configurations with keywords and FAQ answers. This file is the data source for both the keyword classifier and the FAQ fallback handler.
ts
import type { IntentConfig, IntentCategory } from "./types.js";

export const INTENT_CONFIGS: IntentConfig[] = [
  {
    label: "pricing",
    description: "Questions about cost, plans, or billing",
    keywords: ["pricing", "cost", "price", "how much", "subscription", "plan", "billing", "trial"],
    faqAnswer:
      "Our plans start at $29/mo for the Starter tier, which includes up to 500 leads and basic analytics. The Growth tier at $79/mo adds CRM integrations and team accounts. Would you like me to walk through the details?",
  },
  {
    label: "demo",
    description: "Requests to see the product in action",
    keywords: ["demo", "walkthrough", "show me", "see it", "tour", "try"],
  },
  {
    label: "support",
    description: "Technical help or issue reports",
    keywords: ["help", "issue", "problem", "broken", "error", "not working", "bug", "support"],
  },
  {
    label: "general",
    description: "Hello messages or company inquiries",
    keywords: ["hello", "hi", "company", "about", "contact", "info"],
    faqAnswer:
      "Thanks for reaching out! We help SMBs capture and route qualified leads. How can I assist you today?",
  },
  {
    label: "unqualified",
    description: "Spam, job applications, unsolicited sales",
    keywords: ["spam", "job", "internship", "sell", "buy followers"],
  },
];

export function getIntentConfig(label: IntentCategory): IntentConfig | undefined {
  return INTENT_CONFIGS.find((c) => c.label === label);
}

export function getFaqAnswer(label: IntentCategory): string | undefined {
  const config = getIntentConfig(label);
  return config?.faqAnswer;
}

export function buildKeywordClassifierEntries(
  configs: IntentConfig[],
): Array<{ label: string; keywords: string[] }> {
  return configs.map((c) => ({ label: c.label, keywords: c.keywords }));
}
Expected output: getFaqAnswer("pricing") returns the pricing FAQ string. getFaqAnswer("demo") returns undefined since demo has no FAQ answer.
The buildKeywordClassifierEntries function transforms the intent configs into the shape KeywordClassifier from @reaatech/confidence-router-classifiers expects.
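To make that shape concrete, here is a rough sketch of how entries of the form { label, keywords } could be scored against a message. It is not the implementation inside @reaatech/confidence-router-classifiers, whose internals this tutorial does not show; scoreKeywords and its hit-count ranking are assumptions chosen purely for illustration.

```typescript
interface KeywordEntry {
  label: string;
  keywords: string[];
}

interface KeywordScore {
  label: string;
  hits: number;
}

// Hypothetical scorer: counts how many of each entry's keywords occur in the
// message (case-insensitive substring match) and ranks labels by hit count.
function scoreKeywords(message: string, entries: KeywordEntry[]): KeywordScore[] {
  const text = message.toLowerCase();
  return entries
    .map((entry) => ({
      label: entry.label,
      hits: entry.keywords.filter((kw) => text.includes(kw.toLowerCase())).length,
    }))
    .filter((score) => score.hits > 0)
    .sort((a, b) => b.hits - a.hits);
}

const sampleEntries: KeywordEntry[] = [
  { label: "pricing", keywords: ["pricing", "cost", "price", "how much"] },
  { label: "demo", keywords: ["demo", "walkthrough", "show me"] },
];

// "how much" and "cost" both match, so pricing ranks first with 2 hits.
const ranked = scoreKeywords("How much does the Growth plan cost?", sampleEntries);
```

Plain substring matching is deliberately naive: a short keyword such as "hi" would also match inside "this", so a real classifier would likely tokenise the message or match on word boundaries.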
Step 5: Create the OpenAI client with rate limiting
Create src/lib/openai-client.ts to wrap the OpenAI SDK with retry logic and concurrency limiting via p-limit.
ts
import OpenAI from "openai";
import { limitFunction } from "p-limit";

let _openaiClient: OpenAI | undefined;

// Lazily create the client so a missing key only fails on first use.
function getOpenAIClient(): OpenAI {
  if (!_openaiClient) {
    const apiKey = process.env.OPENAI_API_KEY;
    if (!apiKey) throw new Error("OPENAI_API_KEY environment variable is required");
    _openaiClient = new OpenAI({ apiKey });
  }
  return _openaiClient;
}

async function generateReply(
  messages: Array<{ role: "user" | "assistant"; content: string }>,
): Promise<string> {
  const instructions =
    "You are a helpful SMB sales intake agent. Respond conversationally and professionally.";
  const input = messages.map((m) => `${m.role}: ${m.content}`).join("\n");
  let lastError: Error | undefined;
  const maxRetries = 3;
  const baseDelayMs = 500;

  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      const response = await getOpenAIClient().responses.create({
        model: "gpt-5.2",
        instructions,
        input,
      });
      return response.output_text;
    } catch (err: unknown) {
      lastError = err instanceof Error ? err : new Error(String(err));
      if (err instanceof OpenAI.APIError) {
        // API errors are retried only on HTTP 429 (rate limit).
        if (err.status === 429 && attempt < maxRetries) {
          const delayMs = baseDelayMs * Math.pow(2, attempt);
          await new Promise((resolve) => setTimeout(resolve, delayMs));
          continue;
        }
      } else if (attempt < maxRetries) {
        // Non-API errors (for example network failures) are retried as well.
        const delayMs = baseDelayMs * Math.pow(2, attempt);
        await new Promise((resolve) => setTimeout(resolve, delayMs));
        continue;
      }
      throw lastError;
    }
  }
  throw lastError ?? new Error("Failed to generate reply");
}

// Cap the number of in-flight OpenAI calls at 5.
const generateReplyLimited = limitFunction(generateReply, { concurrency: 5 });

export { _openaiClient as openaiClient, generateReplyLimited };
Expected output: The client lazily initialises the OpenAI SDK on the first call and reuses the connection. generateReplyLimited ensures at most 5 concurrent API calls. On HTTP 429 responses, it retries with exponential backoff (500ms, 1s, 2s). The limitFunction wrapper is essential in production — without it, a burst of concurrent chat requests could swamp both your OpenAI rate limit and your network egress.
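The backoff schedule quoted above can be checked in isolation. The sketch below pulls the delay arithmetic out of generateReply into a hypothetical backoffDelays helper (it is not part of the tutorial's code) and uses the same base delay of 500 ms and three retries.

```typescript
// Hypothetical helper mirroring the delay arithmetic in generateReply:
// the wait before retry n (0-based) is baseDelayMs * 2^n.
function backoffDelays(baseDelayMs: number, maxRetries: number): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(baseDelayMs * Math.pow(2, attempt));
  }
  return delays;
}

const schedule = backoffDelays(500, 3); // [500, 1000, 2000]

// Total time spent sleeping if every retry is used: 3500 ms.
const worstCaseWaitMs = schedule.reduce((sum, delay) => sum + delay, 0);
```

The worst-case figure excludes the duration of the four API calls themselves, so treat 3.5 seconds as a lower bound when choosing a timeout for the chat endpoint.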
Step 6: Implement the classifier service
Create src/lib/classifier-service.ts to wire the ConfidenceRouter from @reaatech/confidence-router with a KeywordClassifier and an LLMClassifier from @reaatech/confidence-router-classifiers.
Expected output: The ConfidenceRouter uses the LLM classifier as the default ("intent-llm"). If the LLM fails or returns low-confidence predictions, the keyword classifier acts as a synchronous fallback. Two confidence thresholds (0.8 and 0.3) split every decision into one of three tiers:
Route (confidence >= 0.8) — the intent is clear, generate a reply and trigger a sales handoff
Clarify (confidence >= 0.3 and < 0.8) — the intent is ambiguous, ask a follow-up question
Fallback (confidence < 0.3) — low confidence, serve a static FAQ answer if one exists
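The three tiers above can be sketched as a small pure function. This only restates the thresholds from the list; how ConfidenceRouter evaluates them internally is not shown in this tutorial, and decideAction is a hypothetical name.

```typescript
type Action = "ROUTE" | "CLARIFY" | "FALLBACK";

// Thresholds taken from the tiers listed above.
const ROUTE_THRESHOLD = 0.8;
const CLARIFY_THRESHOLD = 0.3;

// Maps a classifier confidence in [0, 1] to one of the three actions.
function decideAction(confidence: number): Action {
  if (confidence >= ROUTE_THRESHOLD) return "ROUTE";
  if (confidence >= CLARIFY_THRESHOLD) return "CLARIFY";
  return "FALLBACK";
}
```

Both boundaries are inclusive on the upper tier, so a confidence of exactly 0.8 routes and exactly 0.3 asks a clarifying question.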
Step 7: Build the in-memory storage adapter
Create src/lib/storage-adapter.ts implementing the IStorageAdapter interface from @reaatech/session-continuity. This provides an in-memory backend for session and message storage.
ts
import { ConcurrencyError } from "@reaatech/session-continuity";
import type {
  IStorageAdapter,
  Session,
  Message,
  SessionId,
  MessageId,
  SessionFilters,
  MessageQueryOptions,
  UpdateSessionOptions,
  HealthStatus,
} from "@reaatech/session-continuity";
import { nanoid } from "nanoid";

export class InMemoryStorageAdapter implements IStorageAdapter {
  private sessions = new Map<string, Session>();
  private messages = new Map<string, Message[]>();
Expected output: Every method of the IStorageAdapter interface is implemented. Sessions are identified by nanoid-generated IDs, messages carry a monotonic sequence for deterministic ordering, and updateSession enforces optimistic concurrency via expectedVersion.
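The optimistic concurrency check is the least obvious part of the adapter, so here is a minimal sketch of the idea. It uses local stand-in types rather than the real Session and ConcurrencyError from @reaatech/session-continuity, whose exact shapes are not shown here; VersionedSession, VersionConflictError, and TinySessionStore are all hypothetical names.

```typescript
// Local stand-ins; the real types live in @reaatech/session-continuity
// and may differ in shape.
interface VersionedSession {
  id: string;
  version: number;
  status: string;
}

class VersionConflictError extends Error {
  constructor(id: string, expected: number, actual: number) {
    super(`Session ${id}: expected version ${expected}, found ${actual}`);
    this.name = "VersionConflictError";
  }
}

class TinySessionStore {
  private sessions = new Map<string, VersionedSession>();

  create(id: string): VersionedSession {
    const session: VersionedSession = { id, version: 1, status: "active" };
    this.sessions.set(id, session);
    return { ...session };
  }

  // Applies the patch only if the caller saw the latest version, then bumps it.
  update(id: string, patch: { status: string }, expectedVersion: number): VersionedSession {
    const current = this.sessions.get(id);
    if (!current) throw new Error(`Session ${id} not found`);
    if (current.version !== expectedVersion) {
      throw new VersionConflictError(id, expectedVersion, current.version);
    }
    const next: VersionedSession = { ...current, ...patch, version: current.version + 1 };
    this.sessions.set(id, next);
    return { ...next };
  }
}
```

A writer that read version 1 and updates successfully moves the session to version 2; a second writer still holding version 1 then gets a conflict error instead of silently overwriting the first change.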
Step 8: Create the token counter
Create src/lib/token-counter.ts implementing the TokenCounter interface. This simple counter approximates the token count as the character count divided by 4, rounded up (a reasonable heuristic for English text).
ts
import type { TokenCounter, Message } from "@reaatech/session-continuity";

export class SimpleTokenCounter implements TokenCounter {
  readonly model = "simple";
  readonly tokenizer = "char-div-4";

  count(text: string): number {
    if (text.length === 0) return 0;
    return Math.ceil(text.length / 4);
  }

  countMessages(messages: Message[]): number {
    let total = 0;
    for (const msg of messages) {
      const text =
        typeof msg.content === "string" ? msg.content : extractTextFromContent(msg.content);
      total += this.count(text) + 3;
    }
    return total;
  }
}

function extractTextFromContent(content: Message["content"]): string {
  if (typeof content === "string") return content;
  const parts: string[] = [];
  for (const block of content) {
    if (block.type === "text") {
      parts.push(block.text);
    }
  }
  return parts.join(" ");
}
Expected output: new SimpleTokenCounter().count("hello world") returns 3 (11 characters / 4, rounded up). The countMessages method adds 3 tokens per message as overhead for role and metadata tokens.
For production you’d replace this with a tokenizer that matches your model’s actual vocabulary (like tiktoken for OpenAI models), but this simple approximation is enough for the sliding-window compression to work correctly.
Step 9: Implement session continuity
Create src/lib/session-service.ts to manage the conversation lifecycle using @reaatech/session-continuity’s SessionManager. This service maps external session IDs to internal IDs and provides methods for loading, messaging, and reading conversation context.
ts
import { SessionManager } from "@reaatech/session-continuity";
import type { Session, Message } from "@reaatech/session-continuity";
import { InMemoryStorageAdapter } from "./storage-adapter.js";
import { SimpleTokenCounter } from "./token-counter.js";
import { nanoid } from "nanoid";

export class SessionService {
  private manager: SessionManager;
  private externalToInternal = new Map<string, string>();
  private internalToExternal = new Map<string, string>();

  constructor
Expected output: The SessionManager is configured with an 8192-token budget, 1024 reserved tokens for the response, and a sliding-window compression strategy targeting 6000 tokens. When a conversation exceeds the budget, older messages are evicted automatically. The session:created and message:added event handlers log lifecycle events — in production you’d wire these to telemetry.
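To see what the budget numbers mean in practice, here is a rough sketch of sliding-window eviction using the same characters-divided-by-4 estimate as SimpleTokenCounter. It is not the SessionManager's actual algorithm: slideWindow, WindowOptions, and the rule "compress once the history exceeds the budget minus the reserved tokens" are assumptions made for illustration.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Same heuristic as SimpleTokenCounter: ceil(chars / 4) plus 3 tokens per message.
function estimateTokens(messages: ChatMessage[]): number {
  return messages.reduce((sum, m) => sum + Math.ceil(m.content.length / 4) + 3, 0);
}

interface WindowOptions {
  maxTokens: number; // total budget, e.g. 8192
  reservedTokens: number; // held back for the model's response, e.g. 1024
  targetTokens: number; // size to shrink to once over budget, e.g. 6000
}

// Hypothetical sliding window: when the history no longer fits in
// (maxTokens - reservedTokens), drop the oldest messages until it fits targetTokens.
function slideWindow(messages: ChatMessage[], opts: WindowOptions): ChatMessage[] {
  const available = opts.maxTokens - opts.reservedTokens;
  if (estimateTokens(messages) <= available) return messages;
  const kept = [...messages];
  while (kept.length > 1 && estimateTokens(kept) > opts.targetTokens) {
    kept.shift(); // evict the oldest message first
  }
  return kept;
}
```

With ten messages of 4,000 characters each (1,003 estimated tokens apiece, 10,030 in total), the history exceeds the 7,168 tokens available, and the sketch keeps the five most recent messages at 5,015 tokens.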
Step 10: Build the sales handoff service
Create src/lib/sales-handoff.ts to transform a routing decision into a structured LeadRecord and send it to the configured CRM endpoint, retrying with exponential backoff via @reaatech/agent-handoff.
ts
import {
  createHandoffConfig,
  withRetry,
  TransportError,
  TimeoutError,
  RejectionError,
  RoutingError,
  HandoffError,
} from "@reaatech/agent-handoff";
import type { HandoffResult, AgentCapabilities } from "@reaatech/agent-handoff";
import type { LeadRecord, IntentCategory } from "./types.js";
import { LeadRecordSchema } from "./types.js";
import { nanoid } from "nanoid";
import type { RoutingDecision as CRRoutingDecision } from "@reaatech/confidence-router";
import type { Message } from "@reaatech/session-continuity";

export class SalesHandoffService {
  private config: ReturnType<typeof createHandoffConfig>;
  private
Expected output: When handleRouteDecision is called with a ROUTE decision, it builds a validated LeadRecord and POSTs it to AGENT_HANDOFF_ENDPOINT with up to 3 retries (exponential backoff: 500ms, 1s, 2s). If the handoff fails, it catches each HandoffError subclass (TransportError, TimeoutError, RejectionError, RoutingError) and returns a failure HandoffResult with the appropriate reason.
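The error-to-result mapping described above might look roughly like the following. This is a sketch under several assumptions: it identifies errors by their name property instead of importing the real classes from @reaatech/agent-handoff, the FailedHandoff shape is hypothetical (the library's HandoffResult type is not shown in this tutorial), and the retryable flag reflects a guess that transport and timeout failures are transient while rejections and routing errors are not.

```typescript
// Hypothetical failure shape; the real HandoffResult may differ.
interface FailedHandoff {
  success: false;
  reason: string;
  retryable: boolean;
}

// Assumption: each error subclass sets `name` to its class name.
function toFailedHandoff(err: unknown): FailedHandoff {
  const name = err instanceof Error ? err.name : "UnknownError";
  switch (name) {
    case "TransportError":
    case "TimeoutError":
      // Assumed transient: worth another attempt.
      return { success: false, reason: name, retryable: true };
    case "RejectionError":
    case "RoutingError":
      // Assumed permanent: retrying the same payload is unlikely to help.
      return { success: false, reason: name, retryable: false };
    default:
      return { success: false, reason: "HandoffError", retryable: false };
  }
}
```

Returning a failure value instead of throwing lets the chat route still answer the visitor when the CRM is unreachable, which matches the "route without lead" branch covered by the tests in Step 14.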
Step 11: Set up Langfuse telemetry
Create src/lib/telemetry.ts to wrap every pipeline step in a Langfuse trace with named spans. This gives SMB operators visibility into funnel drop-offs and timing.
Expected output: The Langfuse client initialises from environment variables (all made optional so the system works without Langfuse configured). The traceSpan helper creates a named span and runs the async function. On success it records the result; on failure it marks the span at ERROR level and re-throws the error.
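The wrap, record, and re-throw pattern behind traceSpan can be sketched without the Langfuse SDK. In the version below an in-memory array stands in for Langfuse, so SpanRecord, recordedSpans, and traceSpanSketch are hypothetical names and none of this is the Langfuse API; only the control flow is meant to carry over.

```typescript
// In-memory stand-in for a Langfuse span; not the Langfuse SDK API.
interface SpanRecord {
  name: string;
  level: "DEFAULT" | "ERROR";
  output?: unknown;
  statusMessage?: string;
  durationMs: number;
}

const recordedSpans: SpanRecord[] = [];

// Runs fn inside a named span, recording its output or its error.
async function traceSpanSketch<T>(name: string, fn: () => Promise<T>): Promise<T> {
  const startedAt = Date.now();
  try {
    const output = await fn();
    recordedSpans.push({ name, level: "DEFAULT", output, durationMs: Date.now() - startedAt });
    return output;
  } catch (err: unknown) {
    const statusMessage = err instanceof Error ? err.message : String(err);
    recordedSpans.push({ name, level: "ERROR", statusMessage, durationMs: Date.now() - startedAt });
    throw err; // re-throw so the caller still sees the failure
  }
}
```

Because the helper re-throws, wrapping a pipeline step in it does not change that step's error behaviour; the route handler's own catch block still decides what the client sees.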
Step 12: Configure instrumentation and Next.js for startup telemetry
Create src/instrumentation.ts to run a Langfuse connectivity check on server startup and flush telemetry on SIGTERM. Then enable it in next.config.ts.
The register() function uses a NEXT_RUNTIME === "nodejs" guard so it only runs in the Node.js server, not in the Edge runtime. Dynamic import() is required for the telemetry module because it imports Node-only APIs.
Enable this instrumentation hook in next.config.ts:
Expected output: On pnpm dev, the server console logs [instrumentation] Langfuse connectivity OK when Langfuse is reachable. On SIGTERM, pending telemetry is flushed before the process exits.
Step 13: Wire the API route handler
Create app/api/chat/route.ts to wire all the services together into a single POST endpoint. This is where the classification pipeline, session continuity, sales handoff, and telemetry converge.
ts
import { NextRequest, NextResponse } from "next/server";
import { ChatRequestSchema } from "../../../src/lib/types.js";
import type { ChatResponse, IntentCategory } from "../../../src/lib/types.js";
import { ClassifierService } from "../../../src/lib/classifier-service.js";
import type { RoutingDecision } from "../../../src/lib/classifier-service.js";
import { SessionService } from "../../../src/lib/session-service.js";
import { SalesHandoffService } from "../../../src/lib/sales-handoff.js";
import { generateReplyLimited } from "../../../src/lib/openai-client.js";
import { getFaqAnswer } from "../../../src/lib/intents.js";
import { telemetryTrace, traceSpan, finaliseTrace, langfuseFlush } from "../../../src/lib/telemetry.js";
Expected output: The route handler invokes the full pipeline on each request:
Validates the request body with Zod
Creates a Langfuse trace named chat-handle
Loads or creates a session (logged under session-load span)
Appends the user message to the session history (add-user-message span)
Classifies the intent and decides the action (classify-and-route span)
Based on the decision type, generates a reply with context:
ROUTE — passes the handoff context so the LLM knows the lead was already forwarded; triggers sales handoff (sales-handoff span)
CLARIFY — passes an ambiguity context with the suspected intent name
FALLBACK — returns a static FAQ answer if available, otherwise generates a general reply
Persists the assistant reply to session history
Finalises and flushes the Langfuse trace
Returns the JSON response with reply, sessionId, action, and optional lead
On any unhandled error, the catch block flushes Langfuse and returns a 500.
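The branching on the decision type can be sketched as a pure planning function. The Decision and ReplyPlan shapes below are assumptions (the real RoutingDecision type from classifier-service.ts is not shown here), planReply is a hypothetical name, and the system-context strings are adapted from the handoff and clarification contexts used by the route handler.

```typescript
// Assumed decision shape; the real RoutingDecision may carry more fields.
type Decision =
  | { type: "ROUTE"; target: string }
  | { type: "CLARIFY"; target: string }
  | { type: "FALLBACK" };

interface ReplyPlan {
  source: "llm" | "faq"; // who produces the reply text
  context?: string; // extra system context passed to the LLM
  staticReply?: string; // set only when an FAQ answer is served
  triggerHandoff: boolean;
}

// Decides how the reply is produced and whether a sales handoff is triggered.
function planReply(decision: Decision, faqAnswer?: string): ReplyPlan {
  if (decision.type === "ROUTE") {
    const context =
      "[SYSTEM: The lead has been forwarded to the sales team for follow-up. Your reply should confirm the handoff and set expectations.]";
    return { source: "llm", context, triggerHandoff: true };
  }
  if (decision.type === "CLARIFY") {
    const context = `[SYSTEM: The user's intent is ambiguous; it might be "${decision.target}". Ask a clarifying question to determine whether they need help with ${decision.target} or something else.]`;
    return { source: "llm", context, triggerHandoff: false };
  }
  // FALLBACK: prefer a static FAQ answer, otherwise let the LLM reply without extra context.
  if (faqAnswer) return { source: "faq", staticReply: faqAnswer, triggerHandoff: false };
  return { source: "llm", triggerHandoff: false };
}
```

Keeping this decision separate from the I/O (LLM call, handoff POST, session writes) makes each branch easy to unit-test without mocking the services.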
Step 14: Run the tests
The project comes with a complete Vitest test suite covering every module. Run it to verify that everything works end-to-end.
terminal
pnpm test
Expected output: All test files pass. The suite's 11 test files together exercise every layer of the system:
API chat route — mocks all services and tests every decision branch: route with lead, route without lead (null handoff), clarify, fallback with FAQ, fallback without FAQ, empty content (400), service errors (500), and long messages
Session service — session creation, message add/retrieve, session caching for repeat external IDs, stale mapping recovery, stats, and event subscription
OpenAI client — missing API key throws, successful reply generation, 429 retry behavior, empty messages, and non-API error retries
In-memory storage adapter — CRUD for sessions and messages, all filter combinations (userId, status, activeAgentId, tags, date ranges, limit, offset), concurrency error on version mismatch, expiry detection, health check, and close/cleanup
Telemetry — trace creation with sessionId, span success/failure recording, non-Error rejection handling, finaliseTrace flush, langfuseFlush, and env var passthrough to Langfuse constructor
Intent configs and types — Zod schema validation, intent config shape verification, FAQ answer lookup, and keyword classifier entry generation
Classifier service — ROUTE/CLARIFY/FALLBACK decision delegation and the exact 0.8 boundary
Next steps
Add a database-backed storage adapter — swap InMemoryStorageAdapter for a PostgreSQL or DynamoDB adapter (the IStorageAdapter interface makes this a drop-in replacement)
Integrate a real CRM — replace the webhook endpoint with Salesforce, HubSpot, or a custom CRM API, parsing the LeadRecord into the target schema
Add an embedding classifier — register an EmbeddingSimilarityClassifier from @reaatech/confidence-router-classifiers to catch semantically similar intents that don’t share keywords
Build a web chat UI — create a Next.js client component that POSTs to /api/chat, displays the conversation, and passes sessionId back on every request for continuity
Add an admin dashboard — query Langfuse traces by session to visualise the intent funnel, handoff rates, and clarification-to-route conversion
Multi-language support — configure the ConfidenceRouter with clarificationLanguages: ["en", "es", "fr"] and provide localised intent configs
private messageSequences = new Map<string, number>();
const handoffContext = "[SYSTEM: The lead has been forwarded to the sales team for follow-up. Your reply should confirm the handoff and set expectations.]";
const clarificationContext = `[SYSTEM: The user's intent is ambiguous — it might be "${target}". Ask a clarifying question to determine whether they need help with ${target} or something else.]`;