A complete, working implementation of this recipe — downloadable as a zip or browsable file by file. Generated by our build pipeline; tested with full coverage before publishing.
This recipe builds a conversational lead intake system that qualifies inbound leads, classifies their intent, and automatically books meetings via Chili Piper when a hot lead comes in. You’ll use Google Gemini (Vertex AI) for language understanding, the @reaatech package family for confidence-based routing and session management, and Redis for conversation persistence. By the end, you’ll have a ready-to-deploy Next.js API that turns a chat message like “I want to book a demo” into a scheduled Chili Piper meeting with a personalized confirmation.
Prerequisites
Node.js >= 22 and pnpm 10 installed
A GCP project with the Vertex AI API enabled and a service account key downloaded
A Redis instance (local or remote) — redis://localhost:6379 works for development
A Chili Piper account with OAuth2 client credentials (client ID and secret)
Familiarity with TypeScript, Next.js App Router route handlers, and basic Redis concepts
Step 1: Set up environment variables
Copy the example env file and fill in your credentials. The project reads all configuration from environment variables at runtime.
terminal
cp .env.example .env.local
Open .env.local and replace every <...> placeholder with your real credentials. The file expects these values:
env
# --- Google Vertex AI (Gemini) ---
GOOGLE_CLOUD_PROJECT=<your-gcp-project-id>
GOOGLE_CLOUD_LOCATION=us-central1
GOOGLE_GENAI_USE_ENTERPRISE=true
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json

# --- Redis (session-continuity) ---
REDIS_URL=redis://localhost:6379

# --- Chili Piper (meeting booking) ---
CHILIPIPER_CLIENT_ID=<your-chilipiper-client-id>
CHILIPIPER_CLIENT_SECRET=<your-chilipiper-client-secret>
CHILIPIPER_BASE_URL=https://api.chilipiper.com

# --- Langfuse (observability) ---
LANGFUSE_PUBLIC_KEY=<your-langfuse-public-key>
LANGFUSE_SECRET_KEY=<your-langfuse-secret-key>

# --- Routing thresholds ---
ROUTE_THRESHOLD=0.8
FALLBACK_THRESHOLD=0.3

# --- Budget ---
MAX_SESSION_TOKENS=4096
DAILY_BUDGET_USD=5.0
The GOOGLE_GENAI_USE_ENTERPRISE=true flag instructs the @google/genai SDK to route through Vertex AI instead of the public Gemini API. The threshold values control when the router books a meeting (>= 0.8), asks a clarifying question (between 0.3 and 0.8), or falls back to an FAQ response (< 0.3). The Langfuse keys are optional — the config loader treats them as such, so the system runs without observability if you skip them.
Expected output: An .env.local file with all placeholders replaced by your real credentials. The config loader in src/lib/config.ts will validate these at runtime and throw a descriptive error if any required value is missing.
Step 2: Install dependencies
The project is already scaffolded with all dependencies pinned at exact versions. Install them with pnpm:
terminal
pnpm install
Your package.json lists every dependency at an exact version — no ^ or ~ prefixes. The key packages are:
@google/genai@2.10.0 — Google Gemini SDK for Vertex AI
Expected output: Three type modules under src/types/ defining 6 lead intents, the Chili Piper API surface (OAuth tokens, meeting booking, time slots), and the chat protocol.
Step 4: Create the configuration loader
Create src/lib/config.ts to load and validate all environment variables through a single Zod schema. The module reads from process.env, validates every field, and caches the result so loadAppConfig() can be called anywhere without repeated parsing:
Expected output: A typed, validated, cached config object. Calling loadAppConfig() the first time parses all env vars through Zod — if GOOGLE_CLOUD_PROJECT is missing, it throws with a clear message. Subsequent calls return the cached object instantly.
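The parse-once, fail-fast behavior described above can be sketched without any dependencies. This is only an illustration: the recipe validates with a Zod schema, whereas this sketch uses hand-written checks, and the AppConfig shape, the helper names (requireVar, parseThreshold, createConfigLoader), and the choice of required fields are all assumptions rather than the recipe's actual code.

```typescript
// Hypothetical, trimmed-down config shape; the real schema has more fields.
interface AppConfig {
  googleCloudProject: string;
  redisUrl: string;
  routeThreshold: number;
  fallbackThreshold: number;
}

type Env = Record<string, string | undefined>;

// Throws a descriptive error when a required variable is absent or empty.
function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Parses an optional threshold in [0, 1], falling back to a default.
function parseThreshold(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw === "") return fallback;
  const value = Number(raw);
  if (!Number.isFinite(value) || value < 0 || value > 1) {
    throw new Error(`${name} must be a number between 0 and 1, got "${raw}"`);
  }
  return value;
}

// Returns a loader that parses `env` on the first call and reuses the result.
function createConfigLoader(env: Env): () => AppConfig {
  let cached: AppConfig | undefined;
  return () => {
    if (cached === undefined) {
      cached = {
        googleCloudProject: requireVar(env, "GOOGLE_CLOUD_PROJECT"),
        redisUrl: requireVar(env, "REDIS_URL"),
        routeThreshold: parseThreshold(env, "ROUTE_THRESHOLD", 0.8),
        fallbackThreshold: parseThreshold(env, "FALLBACK_THRESHOLD", 0.3),
      };
    }
    return cached;
  };
}
```

In a Node.js process you would pass process.env as the env argument. Taking the environment as a parameter keeps the loader easy to test with plain objects.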
Step 5: Initialize the Gemini client
Create src/lib/llm.ts — the Gemini client wrapper that handles model calls, cost tracking, and error normalization. It creates a single GoogleGenAI instance at module scope (with enterprise: true to route through Vertex AI), then exposes three functions:
ts
import { GoogleGenAI, ApiError } from "@google/genai";
import { recordCost } from "./budget.js";

const ai = new GoogleGenAI({
  enterprise: true,
  project: process.env.GOOGLE_CLOUD_PROJECT,
  location: process.env.GOOGLE_CLOUD_LOCATION,
});

export class LLMError extends Error {
  status: number;
  provider: string;

  constructor(params: { status: number; message: string; provider: string }) {
    super(params.message);
    this.name = "LLMError";
    this.status = params.status;
    this.provider = params.provider;
  }
}

export async function generateResponse(prompt: string): Promise<string> {
  try {
    const response = await ai.models.generateContent({
      model: "gemini-2.5-flash",
      contents: prompt,
    });
    const usage = response.usageMetadata;
    if (usage) {
      recordCost({
        model: "gemini-2.5-flash",
        inputTokens: usage.promptTokenCount ?? 0,
        outputTokens: usage.candidatesTokenCount ?? 0,
        feature: "generateResponse",
      });
    }
    return response.text ?? "";
  } catch (error) {
    if (error instanceof ApiError) {
      throw new LLMError({
        status: error.status,
        message: error.message,
        provider: "google",
      });
    }
    throw error;
  }
}

export async function generateStreamedResponse(prompt: string) {
  try {
    const stream = await ai.models.generateContentStream({
      model: "gemini-2.5-flash",
      contents: prompt,
    });
    return stream;
  } catch (error) {
    if (error instanceof ApiError) {
      throw new LLMError({
        status: error.status,
        message: error.message,
        provider: "google",
      });
    }
    throw error;
  }
}

export async function classifyIntent(
  input: string,
  labels: string[],
): Promise<{ predictions: Array<{ label: string; confidence: number }> }> {
  const labelsStr = labels.map((l) => `"${l}"`).join(", ");
  const classificationPrompt = `Classify the following input into one or more of these labels: ${labelsStr}.\nInput: "${input}"\nRespond with a JSON object containing a "predictions" array of objects with "label" (string) and "confidence" (number between 0 and 1) keys.
Return ONLY valid JSON, no markdown fences or extra text.`;
  try {
    const response = await ai.models.generateContent({
      model: "gemini-2.5-flash",
      contents: classificationPrompt,
    });
    const usage = response.usageMetadata;
    if (usage) {
      recordCost({
        model: "gemini-2.5-flash",
        inputTokens: usage.promptTokenCount ?? 0,
        outputTokens: usage.candidatesTokenCount ?? 0,
        feature: "classifyIntent",
      });
    }
    const text = response.text ?? "{}";
    const parsed = JSON.parse(text) as {
      predictions: Array<{ label: string; confidence: number }>;
    };
    return parsed;
  } catch (error) {
    if (error instanceof ApiError) {
      throw new LLMError({
        status: error.status,
        message: error.message,
        provider: "google",
      });
    }
    throw error;
  }
}
Expected output: The enterprise: true flag in the constructor routes all requests through Vertex AI (not the public Gemini API). Each successful generateResponse and classifyIntent call records a cost span in the budget module; generateStreamedResponse returns the raw stream without recording cost. API errors are caught and rethrown as typed LLMError instances with status codes, so the caller can handle them gracefully.
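One caveat with classifyIntent as listed: it passes the model's text straight to JSON.parse, so a reply wrapped in a Markdown fence throws a SyntaxError (which is not an ApiError and is rethrown as-is), and an empty `{}` reply yields an object with no predictions array. A defensive parser along the following lines might help. The parsePredictions helper is hypothetical and not part of the recipe; it returns an empty list whenever the reply cannot be trusted.

```typescript
type Prediction = { label: string; confidence: number };

// Hypothetical helper: tolerant parsing of the classifier's JSON reply.
function parsePredictions(raw: string): Prediction[] {
  // Strip a leading and trailing Markdown fence if the model added one anyway.
  const text = raw
    .trim()
    .replace(/^`{3}(?:json)?\s*/i, "")
    .replace(/\s*`{3}$/, "");

  let parsed: unknown;
  try {
    parsed = JSON.parse(text);
  } catch {
    return []; // Not valid JSON: treat as "no predictions".
  }

  if (typeof parsed !== "object" || parsed === null) return [];
  const predictions = (parsed as { predictions?: unknown }).predictions;
  if (!Array.isArray(predictions)) return [];

  // Keep only entries that have the expected shape.
  return predictions.filter(
    (p): p is Prediction =>
      typeof p === "object" &&
      p !== null &&
      typeof (p as Prediction).label === "string" &&
      typeof (p as Prediction).confidence === "number",
  );
}
```

An empty result can then be handled the same way as a classification with zero predictions.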
Step 6: Add cost telemetry
Create src/lib/budget.ts to track per-request Gemini spend and enforce a daily budget:
Expected output: An in-memory cost store that records each Gemini call with token counts and a USD cost. The checkBudgetExceeded function computes today’s window boundaries via getWindowStart/getWindowEnd and sums all spans inside it. Gemini prices are $0.15 per million input tokens and $0.60 per million output tokens — the calculateCostFromTokens helper handles the math. The getCostSpans and resetCostSpans helpers are exported for testing.
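The pricing arithmetic and the daily window check can be sketched as follows. The function names come from the description above, but their signatures, the CostSpan shape, and the use of a UTC day boundary are assumptions; the recipe's module may differ on all three.

```typescript
// Prices quoted above, in USD per million tokens.
const INPUT_USD_PER_MILLION = 0.15;
const OUTPUT_USD_PER_MILLION = 0.6;
const DAY_MS = 24 * 60 * 60 * 1000;

// Assumed span shape: when the call happened and what it cost.
interface CostSpan {
  timestamp: number; // epoch milliseconds
  costUsd: number;
}

function calculateCostFromTokens(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens * INPUT_USD_PER_MILLION + outputTokens * OUTPUT_USD_PER_MILLION) /
    1_000_000
  );
}

// Start of the UTC day containing `now` (assumption: the real module may use local time).
function getWindowStart(now: Date): number {
  return Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate());
}

function getWindowEnd(now: Date): number {
  return getWindowStart(now) + DAY_MS;
}

// True once the spans recorded inside today's window add up to the budget.
function checkBudgetExceeded(spans: CostSpan[], dailyBudgetUsd: number, now: Date): boolean {
  const start = getWindowStart(now);
  const end = getWindowEnd(now);
  const spentToday = spans
    .filter((span) => span.timestamp >= start && span.timestamp < end)
    .reduce((sum, span) => sum + span.costUsd, 0);
  return spentToday >= dailyBudgetUsd;
}
```

As a worked example, a call with 1,000 input tokens and 500 output tokens costs (1,000 × 0.15 + 500 × 0.60) / 1,000,000, or about $0.00045, so the default DAILY_BUDGET_USD of 5.0 covers roughly eleven thousand calls of that size.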
Step 7: Build the intent classifier
Create src/services/classifier.ts — a hybrid classifier that tries fast keyword matching first and falls back to Gemini for ambiguous input:
Expected output: When a user sends “I want to book a demo,” the keyword classifier spots “book” and “demo” and returns schedule_demo with high confidence — no Gemini call needed. When the input has no keyword matches, it falls back to Gemini’s classifyIntent, which uses a structured prompt to produce confidence scores across all six labels. Empty input returns zero-confidence predictions without calling the LLM at all.
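A minimal sketch of the keyword-first flow described above, with the LLM call injected as a parameter so the example runs offline. The keyword table, the confidence values (0.7 for one hit, 0.9 for two or more), the pricing_question label, and the function names are all assumptions made for illustration; only schedule_demo is a label this recipe names.

```typescript
type Prediction = { label: string; confidence: number };
type Classification = { predictions: Prediction[] };
type LlmClassifier = (input: string, labels: string[]) => Promise<Classification>;

// Hypothetical keyword table. "pricing_question" is a placeholder label.
const KEYWORDS: Record<string, string[]> = {
  schedule_demo: ["book", "demo", "schedule"],
  pricing_question: ["price", "pricing", "cost"],
};

// Stage 1: cheap keyword matching on whole words.
function classifyByKeywords(input: string): Classification {
  const words = new Set(input.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));
  const predictions: Prediction[] = [];
  for (const [label, keywords] of Object.entries(KEYWORDS)) {
    const hits = keywords.filter((keyword) => words.has(keyword)).length;
    // Assumed scoring: one hit gives 0.7, two or more give 0.9.
    if (hits > 0) predictions.push({ label, confidence: hits >= 2 ? 0.9 : 0.7 });
  }
  predictions.sort((a, b) => b.confidence - a.confidence);
  return { predictions };
}

// Stage 2: fall back to the LLM only when no keyword matched.
async function classifyWithFallback(
  input: string,
  llmClassify: LlmClassifier,
): Promise<Classification> {
  const labels = Object.keys(KEYWORDS);
  const trimmed = input.trim();
  if (trimmed === "") {
    // Empty input: zero-confidence predictions, no LLM call.
    return { predictions: labels.map((label) => ({ label, confidence: 0 })) };
  }
  const byKeyword = classifyByKeywords(trimmed);
  if (byKeyword.predictions.length > 0) return byKeyword;
  return llmClassify(trimmed, labels);
}
```

In the recipe, the injected function would be classifyIntent from src/lib/llm.ts. Passing it in as a parameter is a design choice of this sketch, not necessarily how the recipe wires it.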
Step 8: Route classified intents
Create src/services/routing.ts — the confidence-based router that decides whether to book a meeting, ask a clarifying question, or give an FAQ response:
Expected output: The ConfidenceRouter uses the thresholds you set in .env.local: when top confidence is >= 0.8 (ROUTE), it routes to Chili Piper booking; between 0.3 and 0.8 (CLARIFY), it asks the user what they meant; below 0.3 (FALLBACK), it returns an FAQ-style response. If the classification has zero predictions at all, routeLead returns FALLBACK immediately. The processLeadInput helper chains classification + routing into a single call.
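The three-way threshold decision can be illustrated with a small standalone function. This is not the ConfidenceRouter API from the @reaatech packages; routeByConfidence, the RoutingDecision shape, and the choice to offer the top three labels as clarification options are assumptions for the sketch.

```typescript
type Prediction = { label: string; confidence: number };

type RoutingDecision =
  | { action: "ROUTE"; target: string; confidence: number }
  | { action: "CLARIFY"; options: string[]; confidence: number }
  | { action: "FALLBACK"; confidence: number };

// Hypothetical stand-in for the router: picks an action from the top confidence.
function routeByConfidence(
  predictions: Prediction[],
  routeThreshold = 0.8,
  fallbackThreshold = 0.3,
): RoutingDecision {
  // Rank a copy so the caller's array is left untouched.
  const ranked = [...predictions].sort((a, b) => b.confidence - a.confidence);
  const top = ranked[0];

  // No predictions at all: fall back immediately.
  if (top === undefined) return { action: "FALLBACK", confidence: 0 };

  if (top.confidence >= routeThreshold) {
    return { action: "ROUTE", target: top.label, confidence: top.confidence };
  }
  if (top.confidence >= fallbackThreshold) {
    // Offer the best few labels as options for the clarifying question.
    const options = ranked.slice(0, 3).map((p) => p.label);
    return { action: "CLARIFY", options, confidence: top.confidence };
  }
  return { action: "FALLBACK", confidence: top.confidence };
}
```

With the default thresholds, a top confidence of exactly 0.8 routes and exactly 0.3 clarifies, which matches the ">= 0.8" and "< 0.3" boundaries stated in Step 1.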
Step 9: Score leads
Create src/services/lead-scorer.ts to convert classification confidence into a numeric lead score and tier:
Expected output: A lead with 95% confidence scores 95 and gets "hot" tier — the system will book a meeting immediately. A lead at 55% scores 55 ("warm") and the router asks a clarifying question. When the daily budget is exhausted, scores are capped at 30 and tier is forced to "warm", preventing costly demo bookings outside budget.
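A sketch of how such a scorer might look. The text above fixes only three facts: 95% confidence maps to 95 and "hot", 55% maps to 55 and "warm", and an exhausted budget caps the score at 30 and forces "warm". The tier cut-offs (80 and 30), the "cold" tier, and the scoreFromConfidence name are assumptions chosen to be consistent with those facts.

```typescript
type Tier = "hot" | "warm" | "cold";

interface LeadScore {
  score: number; // 0 to 100
  tier: Tier;
}

// Hypothetical scorer: confidence in [0, 1] becomes a 0-100 score and a tier.
function scoreFromConfidence(confidence: number, budgetExceeded: boolean): LeadScore {
  const clamped = Math.min(1, Math.max(0, confidence));
  const score = Math.round(clamped * 100);

  if (budgetExceeded) {
    // Budget exhausted: cap the score and never report "hot".
    return { score: Math.min(score, 30), tier: "warm" };
  }

  // Assumed cut-offs: 80 and above is hot, 30 and above is warm.
  const tier: Tier = score >= 80 ? "hot" : score >= 30 ? "warm" : "cold";
  return { score, tier };
}
```

Capping the score under budget pressure means a downstream score gate on booking would reject the lead even when the router returned ROUTE, which is presumably how the recipe avoids further spend.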
Step 10: Build the Chili Piper client
Create src/lib/chilipiper.ts — an HTTP client that authenticates with Chili Piper via OAuth2 and books meetings. The client handles automatic token refresh on 401 responses and throws typed errors (rate limits, validation failures, booking rejections) that the orchestration layer can act on:
ts
import type {
  ChiliPiperConfig,
  AuthToken,
  MeetingBooking,
  BookingResult,
  TimeSlot,
} from "../types/chilipiper.js";

interface OAuthTokenResponse {
  access_token: string;
  refresh_token: string;
  expires_in: number;
}

export class ChiliPiperAuthError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ChiliPiperAuthError";
  }
}

export class ChiliPiperRateLimitError
Expected output: A fully typed Chili Piper HTTP client. The bookMeeting method validates lead name and email before making any HTTP call (throwing ValidationError on bad input). The authFetch internal method handles 401 → token refresh → retry, 429 → ChiliPiperRateLimitError with Retry-After, and non-ok responses as ChiliPiperAPIError. The goal is for callers in the orchestration layer to get structured errors they can match on.
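The 401, 429, and non-ok handling described above might look roughly like this. The sketch takes the HTTP function and the token callbacks as parameters so it runs without a network; the HttpResponse and HttpFetch types, the standalone authFetch signature, and the one-second default when Retry-After is missing or not numeric are assumptions. The two error classes mirror the shapes used in the recipe's client.

```typescript
// Minimal response surface the sketch relies on (the global fetch Response fits it).
interface HttpResponse {
  status: number;
  ok: boolean;
  headers: { get(name: string): string | null };
  json(): Promise<unknown>;
}
type HttpFetch = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<HttpResponse>;

class ChiliPiperRateLimitError extends Error {
  retryAfter: number;
  constructor(retryAfter: number) {
    super(`Rate limited, retry after ${String(retryAfter)}s`);
    this.name = "ChiliPiperRateLimitError";
    this.retryAfter = retryAfter;
  }
}

class ChiliPiperAPIError extends Error {
  status: number;
  body: unknown;
  constructor(status: number, body: unknown) {
    super(`Chili Piper API error: ${String(status)}`);
    this.name = "ChiliPiperAPIError";
    this.status = status;
    this.body = body;
  }
}

// Hypothetical standalone version of the client's internal authFetch.
async function authFetch(
  url: string,
  getToken: () => Promise<string>,
  refreshToken: () => Promise<string>,
  doFetch: HttpFetch,
): Promise<unknown> {
  const call = (token: string) =>
    doFetch(url, { headers: { Authorization: `Bearer ${token}` } });

  let response = await call(await getToken());
  if (response.status === 401) {
    // Token rejected: refresh once and retry the same request.
    response = await call(await refreshToken());
  }
  if (response.status === 429) {
    // Retry-After may also be an HTTP date; this sketch only handles seconds.
    const header = response.headers.get("Retry-After");
    const seconds = Number(header);
    throw new ChiliPiperRateLimitError(
      header !== null && Number.isFinite(seconds) ? seconds : 1,
    );
  }
  if (!response.ok) {
    const body = await response.json().catch(() => null);
    throw new ChiliPiperAPIError(response.status, body);
  }
  return response.json();
}
```

Retrying only once after a refresh avoids a loop when the refreshed token is also rejected; in that case the second 401 surfaces as a ChiliPiperAPIError.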
Step 11: Set up session management with Redis
Create src/lib/session.ts — a Redis-backed session store using @reaatech/session-continuity. The RedisStorageAdapter implements the IStorageAdapter interface, persisting sessions and messages as JSON in Redis. You also define a CharacterTokenCounter (approximating 1 token ≈ 4 characters) and a createSessionManager factory that wires everything together with sliding-window compression:
ts
import {
  SessionManager,
  SessionNotFoundError,
  ConcurrencyError,
  type IStorageAdapter,
  type TokenCounter,
  type Session,
  type SessionId,
  type Message,
  type MessageId,
  type Participant,
  type HealthStatus,
  type SessionFilters,
  type MessageQueryOptions,
  type UpdateSessionOptions,
} from "@reaatech/session-continuity";

type RedisClient = {
  set(key: string, value: string): Promise<unknown>;
  get(key: string): Promise<string | null>;
  del(key: string): Promise<unknown>;
  keys(pattern: string): Promise<string[]>;
  ping(): Promise<unknown>;
  quit(): Promise<unknown>;
};

export class RedisStorageAdapter implements IStorageAdapter {
  private client: RedisClient;

  constructor(client: RedisClient) {
    this.client = client;
  }

  // ... implements createSession, getSession, updateSession, deleteSession,
  // listSessions, addMessage, getMessages, updateMessage, deleteMessage,
  // deleteAllMessages, getExpiredSessions, health, close
  // (see the recipe source for the full 296-line implementation)
}

export class CharacterTokenCounter implements TokenCounter {
  readonly model = "character-based";
  readonly tokenizer = "character";

  count(text: string): number {
    return Math.ceil(text.length / 4);
  }

  countMessages(messages: Message[]): number {
    return messages.reduce((total, msg) => {
      if (typeof msg.content === "string") {
        return total + this.count(msg.content);
      }
      const parts = msg.content as Array<{ type: string; text?: string }>;
      return (
        total +
        parts.reduce((sum, part) => {
          if (part.type === "text" && part.text) {
            return sum + this.count(part.text);
          }
          return sum;
        }, 0)
      );
    }, 0);
  }
}

export function createSessionManager(
  redisClient: RedisClient,
): SessionManager {
  return new SessionManager({
    storage: new RedisStorageAdapter(redisClient),
    tokenCounter: new CharacterTokenCounter(),
    tokenBudget: {
      maxTokens: 4096,
      reserveTokens: 500,
      overflowStrategy: "compress",
    },
    compression: {
      strategy: "sliding_window",
      targetTokens: 3500,
    },
  });
}

export async function getOrCreateSession(
  manager: SessionManager,
  sessionId?: string,
): Promise<Session> {
  if (sessionId) {
    return await manager.getSession(sessionId);
  }
  return await manager.createSession();
}

export async function addUserMessage(
  manager: SessionManager,
  sessionId: string,
  content: string,
): Promise<Message> {
  return await manager.addMessage(sessionId, { role: "user", content });
}

export async function getContext(
  manager: SessionManager,
  sessionId: string,
): Promise<Message[]> {
  return await manager.getConversationContext(sessionId);
}
Expected output: The RedisStorageAdapter persists sessions under Redis keys sc:session:<id> and messages under sc:msg:<sessionId>:<messageId>. The CharacterTokenCounter approximates token counts for the sliding-window compression. When a session exceeds 4,096 tokens of history, the manager compresses it to 3,500 tokens using a sliding window that keeps the most recent messages. The thin wrappers getOrCreateSession, addUserMessage, and getContext simplify the calling code.
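To make the sliding-window idea concrete, here is a small standalone sketch that keeps the most recent messages fitting within a token target, using the same 4-characters-per-token estimate. It illustrates the general technique only; the actual compression logic lives inside @reaatech/session-continuity and may differ. The ChatMessage type and the slidingWindow function are hypothetical names.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Same approximation as CharacterTokenCounter: about 4 characters per token.
function countTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Walks backwards from the newest message and keeps as many as fit the target.
function slidingWindow(messages: ChatMessage[], targetTokens: number): ChatMessage[] {
  const kept: ChatMessage[] = [];
  let total = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const message = messages[i];
    if (message === undefined) continue;
    const cost = countTokens(message.content);
    if (total + cost > targetTokens) break; // Older messages are dropped.
    kept.unshift(message); // Preserve chronological order.
    total += cost;
  }
  return kept;
}
```

Because the walk stops at the first message that does not fit, the result is always a contiguous tail of the conversation, so the model never sees a reply without the message that preceded it inside the window.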
Step 12: Wire the orchestration handler
Create src/api/chat/handler.ts — the core pipeline that ties everything together. This is where classification, routing, lead scoring, budget enforcement, session management, and Gemini response generation all meet:
ts
import { createClient } from "redis";
import type { ChatResponse } from "../../types/chat.js";
import { generateResponse } from "../../lib/llm.js";
import { checkBudgetExceeded } from "../../lib/budget.js";
import { ChiliPiperClient } from "../../lib/chilipiper.js";
import {
  createSessionManager,
  getOrCreateSession,
  addUserMessage,
  getContext,
} from "../../lib/session.js";
import { loadAppConfig } from "../../lib/config.js";
import { classifyLeadIntent } from "../../services/classifier.js";
import { routeLead } from "../../services/routing.js";
import { scoreLead } from "../../services/lead-scorer.js";

export
Expected output: An 11-step pipeline: (1) validate input, (2) connect to Redis with error suppression, (3) create or resume session, (4) save user message, (5) classify intent with fallback to general_inquiry on error, (6) route the decision, (7) check budget, (8) score the lead, (9) for ROUTE decisions targeting schedule_demo, book via Chili Piper with score-gating and generate a confirmation message (with nested fallback error messages), (10) for CLARIFY decisions, build a clarifying question from conversation context and classification options, (11) for FALLBACK decisions, respond with an FAQ-style answer. Every Gemini call is wrapped in its own try/catch with graceful fallback text so the chat never fails silently.
Step 13: Create the API routes
Create app/api/chat/route.ts — the Next.js App Router route handler that validates request bodies with Zod and delegates to the orchestration handler:
Expected output: Two route handler files in the app/ directory, serving three endpoints. POST /api/chat validates the JSON body with Zod (returning 400 on bad input), then delegates to the full orchestration pipeline. GET /api/chat returns a simple { status: "ok" }. GET /api/health pings Redis and reports either connected, or degraded with a 503 status code. Both files use NextResponse and NextRequest, never bare Request/Response.
Step 14: Run the tests and try the recipe
The project includes test files under tests/ covering every module — the LLM client, budget tracker, classifier, router, lead scorer, Chili Piper client, session manager, the orchestration handler, plus integration tests for both API routes. The test suite mocks @google/genai (Gemini), redis, and Chili Piper HTTP endpoints via MSW so tests run entirely offline.
Run the full test suite with coverage:
terminal
pnpm test
You should see all tests pass with coverage above 90% across lines, branches, functions, and statements.
Try the chat endpoint yourself once the dev server is running:
terminal
pnpm dev
Then in another terminal:
terminal
curl -X POST http://localhost:3000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"message":"I want to book a demo","userEmail":"lead@example.com"}'
Add Langfuse tracing — the config.ts already reads LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY. Wire up Langfuse OpenTelemetry instrumentation to trace every classification and routing decision end-to-end.
Add a streaming endpoint — the generateStreamedResponse function in llm.ts is ready. Create a POST /api/chat/stream route that returns a Server-Sent Events stream for real-time token-by-token responses.
Replace the in-memory cost store — the budget.ts module stores cost spans in a local array. For production, persist them to Redis or a time-series database so spend tracking survives restarts.
extends Error {
  retryAfter: number;
  constructor(retryAfter: number) {
    super(`Rate limited, retry after ${String(retryAfter)}s`);
    this.name = "ChiliPiperRateLimitError";
    this.retryAfter = retryAfter;
  }
}

export class ChiliPiperAPIError extends Error {
  status: number;
  body: unknown;
  constructor(status: number, body: unknown) {
    super(`Chili Piper API error: ${String(status)}`);
`Confirm the following meeting booking: ${JSON.stringify(booking)}. Generate a friendly confirmation message for the lead. Include the meeting URL: ${booking.meetingUrl}`,
);
} catch {
reply = "Sorry, I'm having trouble right now. Please try again.";
}
} catch {
try {
reply = await generateResponse(
"The user wanted to book a demo but there was an error with the booking system. Apologize and ask them to try again later or contact support.",
);
} catch {
reply = "Sorry, I'm having trouble right now. Please try again.";
`The user's intent is ambiguous. Based on conversation context:\n${contextStr}\nGenerate a clarifying question to understand what they need help with. Options to clarify: ${decision.options?.join(", ") ?? "booking a demo, pricing, support, or partnership"}.`,
);
} catch {
reply = "Sorry, I'm having trouble right now. Please try again.";
`Answer the following as an FAQ-style response based on conversation context:\n${contextStr}\nIf you cannot answer, direct them to schedule a demo or contact support.`,
);
} catch {
reply = "Sorry, I'm having trouble right now. Please try again.";