A complete, working implementation of this recipe — downloadable as a zip or browsable file by file. Generated by our build pipeline; tested with full coverage before publishing.
This tutorial walks you through building an AI-powered lead intake webhook for small law firms. You’ll create an Express server (co-hosted inside a Next.js project) that uses Claude to qualify incoming leads, extract structured client data, route inquiries to the right practice area, and create matters in Clio — all with cost tracking, idempotency, and session continuity via DynamoDB.
Prerequisites
Node.js 22+ and pnpm 10 installed
An Anthropic API key (for Claude Haiku and Sonnet)
A Clio Manage account with OAuth 2.0 client credentials (client ID, client secret, account ID)
An AWS account with DynamoDB (for session storage) — or a DynamoDB local instance for development
A Langfuse account (free tier works) for observability
Familiarity with TypeScript, Express, and Next.js basics
Step 1: Scaffold the project and install dependencies
Create a new directory and initialize your project:
Now create src/lib/config.ts to load and validate the required environment variables with Zod:
ts
import { loadConfig } from "@reaatech/llm-cost-telemetry";
import { z } from "zod";

const appConfigSchema = z.object({
  ANTHROPIC_API_KEY: z.string().min(1, "ANTHROPIC_API_KEY is required"),
  CLIO_CLIENT_ID: z.string().min(1, "CLIO_CLIENT_ID is required"),
  CLIO_CLIENT_SECRET: z.string().min(1, "CLIO_CLIENT_SECRET is required"),
  CLIO_ACCOUNT_ID: z.string().min(1, "CLIO_ACCOUNT_ID is required"),
  AWS_REGION: z.string().optional().default("us-east-1"),
  DYNAMODB_TABLE_NAME: z.string().optional().default("sessions"),
  LANGFUSE_SECRET_KEY: z.string().optional().default(""),
  LANGFUSE_PUBLIC_KEY: z.string().optional().default(""),
  LANGFUSE_HOST: z.string().optional().default("https://cloud.langfuse.com"),
  PORT: z.coerce.number().optional().default(3001),
  DEFAULT_DAILY_BUDGET: z.coerce.number().optional().default(5.0),
});

export function loadAppConfig() {
  const reaaConfig = loadConfig();
  const parsed = appConfigSchema.parse(process.env);
  return {
    ...reaaConfig,
    anthropicApiKey: parsed.ANTHROPIC_API_KEY,
    clioClientId: parsed.CLIO_CLIENT_ID,
    clioClientSecret: parsed.CLIO_CLIENT_SECRET,
    clioAccountId: parsed.CLIO_ACCOUNT_ID,
    awsRegion: parsed.AWS_REGION,
    dynamoDbTableName: parsed.DYNAMODB_TABLE_NAME,
    langfuseSecretKey: parsed.LANGFUSE_SECRET_KEY,
    langfusePublicKey: parsed.LANGFUSE_PUBLIC_KEY,
    langfuseHost: parsed.LANGFUSE_HOST,
    port: parsed.PORT,
    defaultDailyBudget: parsed.DEFAULT_DAILY_BUDGET,
  };
}

export type AppConfig = ReturnType<typeof loadAppConfig>;
Expected output: The module only exports loadAppConfig, so running npx tsx src/lib/config.ts on its own prints nothing. Calling loadAppConfig() either throws a ZodError that names each missing required variable or returns the config object.
Step 3: Define the core types
Create src/lib/types.ts with the data structures that flow through your system — contacts, matters, lead messages, classifications, and extracted client data:
Create src/lib/llm-service.ts — this wraps the Anthropic SDK to provide three capabilities: classifying intents (using Claude Haiku for speed and low cost), extracting structured client data (using Claude Sonnet for accuracy), and generating clarifying questions (using Haiku):
classifyIntent uses Haiku (cheap, fast) — every message gets classified first
extractClientData uses Sonnet (expensive, accurate) — only called when the lead is qualified
Both methods return usage metadata for cost tracking downstream
Raw LLM text is preserved via rawText so the repair pipeline can fix malformed JSON
Expected output: The service keeps no state of its own. Each method calls the Anthropic API and returns a typed result.
Step 5: Build the lead classifier with confidence routing
Create src/lib/lead-classifier.ts. This combines Claude’s intent classification with the @reaatech/confidence-router package to produce a routing decision (ROUTE / CLARIFY / FALLBACK):
Expected output: When you call classify() with a high-confidence intent like “I need a divorce lawyer”, the router returns a ROUTE decision. Mid-confidence messages return CLARIFY, and low-confidence messages return FALLBACK.
Step 6: Build the output repair module
When Claude returns JSON, it sometimes wraps it in markdown fences or includes trailing commas. Create src/lib/output-repair.ts to handle these edge cases using @reaatech/structured-repair-core:
ts
import { z } from "zod";
import { repair, repairOutput, isValid } from "@reaatech/structured-repair-core";
import { ExtractedClientData } from "./types.js";

const PRACTICE_AREAS = [
  "family-law",
  "personal-injury",
  "criminal-defense",
  "real-estate",
  "business-formation",
  "estate-planning",
  "general",
] as const;

const ExtractedClientDataSchema = z
  .object({
    name: z.string().optional(),
    email: z.email().optional(),
    phone: z.string().optional(),
    description: z.string(),
    practiceArea: z.enum(PRACTICE_AREAS).optional(),
    urgency: z.enum(["low", "medium", "high"]).optional(),
  })
  .strict();

export function repairClientData(raw: string): Promise<ExtractedClientData> {
  return repair(ExtractedClientDataSchema, raw);
}

export function repairWithDiagnostics(raw: string) {
  return repairOutput({ schema: ExtractedClientDataSchema, input: raw, debug: true });
}

export function validateExtractedData(raw: string): boolean {
  return isValid(ExtractedClientDataSchema, raw);
}
The isValid function is a quick gate — use it to pre-check before attempting a full repair. The repair function is the workhorse: it strips fences, fixes trailing commas, and parses the JSON against your schema.
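To make the repair steps concrete, here is a minimal, dependency-free sketch of fence stripping and trailing-comma removal. It is not the @reaatech/structured-repair-core implementation; the names stripFences, dropTrailingCommas, and looseParse are hypothetical, and schema validation is left out.

```typescript
// Hypothetical helpers for illustration only; the real package may work differently.
const FENCE = "`".repeat(3);
const FENCED = new RegExp("^" + FENCE + "[a-zA-Z]*\\s*([\\s\\S]*?)\\s*" + FENCE + "$");

/** Remove one surrounding markdown code fence (with optional language tag), if present. */
function stripFences(raw: string): string {
  const trimmed = raw.trim();
  const match = trimmed.match(FENCED);
  return match?.[1] ?? trimmed;
}

/**
 * Drop commas that directly precede a closing brace or bracket.
 * Limitation: this is a plain text substitution, so a string value that
 * itself contains ",}" or ",]" would also be altered.
 */
function dropTrailingCommas(json: string): string {
  return json.replace(/,\s*([}\]])/g, "$1");
}

/** Strip fences, fix trailing commas, then parse. Throws if the text is still not JSON. */
function looseParse(raw: string): unknown {
  return JSON.parse(dropTrailingCommas(stripFences(raw)));
}
```

A production repair step would validate the parsed value against the Zod schema afterwards, which is the part the package handles for you.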
Step 7: Build the practice area router
Create src/lib/agent-router.ts to route qualified leads to the right practice area specialist. This registers seven practice area agents and uses @reaatech/agent-handoff-routing for capability-based routing:
ts
import { createHandoffConfig, HandoffError, withRetry } from "@reaatech/agent-handoff";
import type { AgentCapabilities, HandoffPayload, RoutingDecision } from "@reaatech/agent-handoff";
import { CapabilityBasedRouter, AgentRegistry } from "@reaatech/agent-handoff-routing";
import { ExtractedClientData, PracticeArea } from "./types.js";

const AGENT_IDS: PracticeArea[] = [
  "family-law",
  "personal-injury",
  "criminal-defense",
  "real-estate",
  "business-formation",
  "estate-planning",
  "general",
];

const AGENT_SKILLS: Record<PracticeArea,
Key design insight: withRetry wraps the routing call with exponential backoff (200ms base, 5s max, 3 retries). If all retries fail with a HandoffError, it falls back to "general", so the lead still gets a response.
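The retry-then-fallback behavior can be sketched without the library. This is an illustration under assumptions, not the withRetry implementation from @reaatech/agent-handoff: the names retryWithFallback and backoffDelay are hypothetical, the delay is assumed to double on each retry, and the sketch falls back on any error rather than only on HandoffError.

```typescript
// Hypothetical sketch; the real withRetry signature and jitter policy may differ.
interface RetryOptions {
  retries: number; // retries allowed after the first attempt
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number; // cap applied to every delay
}

/** Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxDelayMs. */
function backoffDelay(attempt: number, opts: RetryOptions): number {
  return Math.min(opts.baseDelayMs * 2 ** attempt, opts.maxDelayMs);
}

/** Run fn, retrying with exponential backoff; return `fallback` once retries are exhausted. */
async function retryWithFallback<T>(
  fn: () => Promise<T>,
  fallback: T,
  opts: RetryOptions = { retries: 3, baseDelayMs: 200, maxDelayMs: 5_000 },
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; attempt <= opts.retries; attempt++) {
    try {
      return await fn();
    } catch {
      if (attempt === opts.retries) break; // no retries left
      await sleep(backoffDelay(attempt, opts));
    }
  }
  return fallback;
}
```

With the defaults shown, a call that always fails is attempted four times, waits 200ms, 400ms, and 800ms between attempts, and then resolves to the fallback value. The sleep parameter is injectable so tests can run without real delays.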
Step 8: Build the Clio client
Create src/lib/clio-client.ts to authenticate with Clio’s OAuth 2.0 (using JWT client assertions via the jose library) and manage contacts and matters via the Clio REST API:
ts
import * as jose from "jose";
import { ClioContact, ClioMatter, ClioApiError } from "./types.js";

interface ClioTokenResponse {
  access_token: string;
  expires_in?: number;
}

async function parseErrorBody(response: Response): Promise<Record<string, unknown>> {
  try {
    const text = await response.text();
    return JSON.parse(text) as Record<string,
Key design decisions:
getAccessToken() caches the token and refreshes it 60 seconds before expiry
findOrCreateContact searches by name first, then creates if not found — avoiding duplicate contacts
Helper functions parseErrorBody, parseJson, and safeString centralize error parsing so Clio’s sometimes non-JSON error responses don’t crash the handler
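The token-caching rule in the first bullet can be illustrated in isolation. The sketch below is not the tutorial's ClioClient: TokenCache, fetchToken, and the injectable now clock are hypothetical names, and the one-hour default lifetime used when expires_in is absent is an assumption.

```typescript
// Hypothetical sketch of "refresh 60 seconds before expiry"; fetchToken stands in
// for the real OAuth token request.
interface TokenResponse {
  access_token: string;
  expires_in?: number; // lifetime in seconds
}

const REFRESH_MARGIN_MS = 60_000;
const ASSUMED_DEFAULT_LIFETIME_S = 3_600; // assumption: used only when expires_in is missing

class TokenCache {
  private token: string | null = null;
  private expiresAt = 0;
  private readonly fetchToken: () => Promise<TokenResponse>;
  private readonly now: () => number;

  constructor(fetchToken: () => Promise<TokenResponse>, now: () => number = () => Date.now()) {
    this.fetchToken = fetchToken;
    this.now = now;
  }

  /** Return the cached token unless it is missing or within 60s of expiring. */
  async get(): Promise<string> {
    if (this.token !== null && this.now() < this.expiresAt - REFRESH_MARGIN_MS) {
      return this.token;
    }
    const res = await this.fetchToken();
    this.token = res.access_token;
    this.expiresAt = this.now() + (res.expires_in ?? ASSUMED_DEFAULT_LIFETIME_S) * 1000;
    return this.token;
  }
}
```

Refreshing early avoids sending a request with a token that expires while the request is in flight. Passing the clock in as a function keeps the expiry logic testable without waiting.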
Step 9: Build the telemetry and cost tracking service
Create src/lib/telemetry.ts to track LLM costs and export observability data to Langfuse. This uses @reaatech/llm-cost-telemetry for cost calculation:
Expected output: recordCostSpan with inputTokens: 500, outputTokens: 200, model: "claude-sonnet-4-6" produces a cost of approximately (500/1_000_000 * 3.00) + (200/1_000_000 * 15.00) = $0.0045.
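The arithmetic above can be written as a small function. This is a sketch of the formula only, not the @reaatech/llm-cost-telemetry API; the Pricing type and estimateCostUsd name are hypothetical, and the per-million-token prices are the ones used in the worked example.

```typescript
// Hypothetical helper; prices are USD per one million tokens.
interface Pricing {
  inputPerMTok: number;
  outputPerMTok: number;
}

/** Cost = input tokens at the input rate plus output tokens at the output rate. */
function estimateCostUsd(inputTokens: number, outputTokens: number, pricing: Pricing): number {
  return (
    (inputTokens / 1_000_000) * pricing.inputPerMTok +
    (outputTokens / 1_000_000) * pricing.outputPerMTok
  );
}

const exampleCost = estimateCostUsd(500, 200, { inputPerMTok: 3.0, outputPerMTok: 15.0 });
console.log(exampleCost.toFixed(4)); // prints "0.0045"
```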
Step 10: Build session manager and idempotency middleware
Create src/lib/session-manager.ts to maintain conversation continuity across webhook calls using DynamoDB:
ts
import { SessionManager, Session, SessionNotFoundError, type Message, type TokenCounter } from "@reaatech/session-continuity";
import { DynamoDBAdapter } from "@reaatech/session-continuity-storage-dynamodb";
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient } from "@aws-sdk/lib-dynamodb";

export class SimpleTokenCounter implements TokenCounter {
  readonly model = "claude-sonnet-4-20250514";
  readonly tokenizer = "simple-estimate";

  count(text: string): number {
    return Math.ceil(text.length / 4);
  }

  countMessages(messages: Message[]): number {
    let total = 0;
    for (const msg of messages) {
      let text = "";
      if (typeof msg.content === "string") {
        text = msg.content;
      } else {
        for (const block of msg.content) {
          if (block.type === "text") {
            text += block.text + " ";
          }
        }
      }
      total += this.count(text) + 8;
    }
    return total;
  }
}

let cachedSessionManager: SessionManager | null = null;

export function getSessionManager(region: string, tableName: string): SessionManager {
  if (cachedSessionManager) {
    return cachedSessionManager;
  }
  const ddbClient = new DynamoDBClient({ region });
  const ddbDocClient = DynamoDBDocumentClient.from(ddbClient);
  cachedSessionManager = new SessionManager({
    storage: new DynamoDBAdapter({ client: ddbDocClient, tableName }),
    tokenCounter: new SimpleTokenCounter(),
    tokenBudget: {
      maxTokens: 4096,
      reserveTokens: 500,
      overflowStrategy: "compress",
    },
    compression: {
      strategy: "sliding_window",
      targetTokens: 3500,
    },
  });
  return cachedSessionManager;
}

export async function ensureSession(
  sessionId: string,
  sessionManager: SessionManager,
): Promise<Session> {
  try {
    return await sessionManager.getSession(sessionId);
  } catch (error) {
    if (error instanceof SessionNotFoundError) {
      return await sessionManager.createSession({ userId: sessionId });
    }
    throw error;
  }
}
Now create src/lib/idempotency.ts. This ensures that if the same webhook event arrives twice (due to retries), the Clio API is only called once:
ts
interface IdempotencyEntry {
  status: "pending" | "completed";
  result?: unknown;
  expiresAt: number;
}

const idempotencyStore = new Map<string, IdempotencyEntry>();
const CLEANUP_INTERVAL_MS = 60_000;

export function cleanupExpiredEntries(): void {
  const now = Date.now();
  for (const [key, entry] of idempotencyStore) {
    if (entry.expiresAt < now) {
      idempotencyStore.delete(key);
    }
  }
}

setInterval(cleanupExpiredEntries, CLEANUP_INTERVAL_MS);

export async function generateIdempotencyKey(sessionId: string, lastMessageContent: string): Promise<string> {
  const combined = `${sessionId}:${lastMessageContent}`;
  const encoded = new TextEncoder().encode(combined);
  const hashBuffer = await crypto.subtle.digest("SHA-256", encoded);
  const hashArray = Array.from(new Uint8Array(hashBuffer));
  const hashHex = hashArray.map((b) => b.toString(16).padStart(2, "0")).join("");
  return `idem-${sessionId.slice(0, 8)}-${hashHex}`;
}

export async function withIdempotency<T>(
  key: string,
  ttlMs: number,
  fn: () => Promise<T>,
): Promise<T> {
  const existing = idempotencyStore.get(key);
  if (existing && existing.expiresAt > Date.now()) {
    if (existing.status === "completed") {
      return existing.result as T;
    }
    return await pollForResult<T>(key);
  }
  if (existing && existing.expiresAt <= Date.now()) {
    idempotencyStore.delete(key);
  }
  const entry: IdempotencyEntry = {
    status: "pending",
    expiresAt: Date.now() + ttlMs,
  };
  idempotencyStore.set(key, entry);
  try {
    const result = await fn();
    entry.status = "completed";
    entry.result = result;
    return result;
  } catch (error) {
    idempotencyStore.delete(key);
    throw error;
  }
}

async function pollForResult<T>(key: string, intervalMs = 100, timeoutMs = 30_000): Promise<T> {
  const start = Date.now();
  while (Date.now() - start < timeoutMs) {
    const entry = idempotencyStore.get(key);
    if (!entry) {
      throw new Error("Idempotency entry removed while polling");
    }
    if (entry.status === "completed") {
      return entry.result as T;
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("Idempotency polling timed out");
}
Step 11: Wire it all together in the webhook handler
Create src/webhook/clio-intake.ts. This is the main orchestration function — it receives a webhook event, coordinates every service, and returns the result:
ts
import type { Request, Response } from "express";
import { z } from "zod";
import { LeadClassifierService } from "../lib/lead-classifier.js";
import { PracticeAreaRouter } from "../lib/agent-router.js";
import { ClaudeService } from "../lib/llm-service.js";
import { ClioClient } from "../lib/clio-client.js";
import { repairClientData } from "../lib/output-repair.js";
import { TelemetryService } from "../lib/telemetry.js";
import { getSessionManager, ensureSession } from "../lib/session-manager.js";
import { generateIdempotencyKey, withIdempotency } from "../lib/idempotency.js";
import { loadAppConfig }
The handler implements a three-branch routing decision:
ROUTE (confidence >= 0.8): Extract data with Sonnet, repair JSON, route to practice area, create Clio matter
CLARIFY (confidence between 0.3 and 0.8): Generate a clarification question using Haiku
FALLBACK (confidence < 0.3): Return a polite message promising follow-up
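The three branches above reduce to two threshold comparisons. The following is a minimal sketch of that mapping, not the @reaatech/confidence-router API; the decide function and its parameter names are hypothetical, and it assumes a confidence of exactly 0.3 counts as CLARIFY.

```typescript
// Hypothetical sketch of the threshold logic described in the list above.
type Decision = "ROUTE" | "CLARIFY" | "FALLBACK";

/** Map a confidence score in [0, 1] to a routing decision. */
function decide(confidence: number, routeAt = 0.8, fallbackBelow = 0.3): Decision {
  if (confidence >= routeAt) return "ROUTE"; // extract data, route, create the matter
  if (confidence >= fallbackBelow) return "CLARIFY"; // ask a clarifying question
  return "FALLBACK"; // polite message promising follow-up
}

console.log(decide(0.92)); // prints "ROUTE"
console.log(decide(0.5)); // prints "CLARIFY"
console.log(decide(0.1)); // prints "FALLBACK"
```

Keeping the thresholds as parameters makes it easy to tune them per firm without touching the handler's branching code.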
Step 12: Create the Express server
Create src/server.ts to mount the webhook handler, initialize telemetry, and set up graceful shutdown:
The server exports createApp() so it can be tested without binding to a real port. When run directly (npx tsx src/server.ts), it starts listening on the configured PORT. On SIGTERM/SIGINT, it flushes Langfuse telemetry and closes the DynamoDB session manager before exiting.
All tests pass and coverage exceeds 90% across lines, branches, functions, and statements.
Next steps
Add a chat UI frontend — the project includes a Next.js App Router scaffold (app/page.tsx). Build an interactive chat component that posts messages to /api/webhook/clio-intake and renders the assistant responses as chat bubbles
Extend the practice area registry — add more specialized agents (immigration law, employment law, intellectual property) with their own skill sets
Replace in-memory idempotency — switch the idempotency store to Redis or DynamoDB so it survives server restarts across multiple instances
Add invoice tracking — after a matter is created, automatically generate and send a retainer agreement via Clio’s billing API
Connect to Twilio or Telegram — receive incoming SMS or Telegram messages, pipe them through the webhook handler, and reply as the law firm’s intake bot