A complete, working implementation of this recipe — downloadable as a zip or browsable file by file. Generated by our build pipeline; tested with full coverage before publishing.
This tutorial walks you through building an AI-powered voice agent that takes restaurant food orders over the phone. When a customer calls your restaurant, the agent answers, listens to their order using speech-to-text, interprets their intent with Anthropic’s Claude, manages the conversation across multiple utterances, and submits the finalized order directly to Toast POS. You’ll use the @reaatech/* package family for voice pipeline orchestration, session continuity, confidence-based routing, and human handoff, all wired into a Next.js 16 App Router project.
If you run a restaurant or build software for one, this recipe shows you how to turn a missed call after hours into captured revenue.
Prerequisites
Node.js 22+ and pnpm 10+ installed
An Anthropic API key — sign up at console.anthropic.com
A Twilio account with a phone number that has voice capabilities (free trial works)
A Toast POS API key and restaurant GUID (or use the mock for local development)
Redis running locally (install via brew install redis on macOS, or use a Docker container)
Deepgram and ElevenLabs API keys for STT/TTS (optional — mock providers are included for development)
Langfuse account for LLM observability (optional — the recipe runs without it)
Familiarity with TypeScript, Next.js App Router, and basic API design concepts
Step 1: Scaffold the project and install dependencies
Create a new Next.js project and install all the required packages. The @reaatech/* packages handle voice pipeline, session management, confidence routing, handoff, and cost telemetry. Third-party packages include the Anthropic SDK, Twilio, Redis, LiveKit Agents, Langfuse, and Zod.
Step 2: Configure environment variables
Replace each <your-...> placeholder in your .env file with your actual credentials. DEEPGRAM_API_KEY and ELEVENLABS_API_KEY can be left blank; the recipe falls back to mock providers when they are empty. REDIS_URL defaults to redis://localhost:6379, which works out of the box with a local Redis instance.
Expected output: A complete .env file with all your credentials. The application validates them at load time and throws a clear error if any required key is missing.
Step 3: Define the type system
Start with the types that flow through the system. You need types for menu items, orders, call context, and agent actions. Create these files under src/types/.
Expected output: Three type files that define the data contracts for menu items, orders, call sessions, and agent decisions. These types are importable across the rest of the codebase.
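As a rough illustration of these data contracts, the sketch below shows one possible shape. Only the ask_question and clarify action names appear elsewhere in this recipe; every other type and field name here (MenuItem, OrderItem, priceCents, add_item, and so on) is a hypothetical placeholder, not the recipe's actual definitions.

```typescript
// Hypothetical shapes for the contracts described above; all field names are assumptions.
interface MenuItem {
  id: string;
  name: string;
  category: string;
  priceCents: number;
}

interface OrderItem {
  menuItemId: string;
  quantity: number;
  modifiers: string[];
}

// A discriminated union lets the compiler check that every action type is handled.
type AgentAction =
  | { type: "ask_question"; message: string }
  | { type: "clarify"; message: string }
  | { type: "add_item"; item: OrderItem }
  | { type: "handoff"; reason: string }
  | { type: "finalize" };

// Sums the order in integer cents; unknown menu item ids contribute nothing.
function orderTotalCents(menu: MenuItem[], items: OrderItem[]): number {
  const priceById = new Map(menu.map((m) => [m.id, m.priceCents] as const));
  return items.reduce(
    (sum, item) => sum + (priceById.get(item.menuItemId) ?? 0) * item.quantity,
    0
  );
}

// Returns what the agent should say for an action, or null when there is nothing to say.
function spokenText(action: AgentAction): string | null {
  switch (action.type) {
    case "ask_question":
    case "clarify":
      return action.message;
    case "add_item":
      return `Added ${action.item.quantity} to your order.`;
    case "handoff":
      return "Connecting you to a team member.";
    case "finalize":
      return null;
  }
}
```

Storing prices as integer cents avoids floating-point rounding when totals are computed.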
Step 4: Build the environment configuration loader with Zod
Create a schema-validated config loader so your application fails fast at startup if any required environment variable is missing.
ts
// src/config/env.ts
import { z } from "zod";

// Required keys use min(1) so an empty string fails validation.
// Optional provider keys default to "" and trigger the mock providers.
const configSchema = z.object({
  ANTHROPIC_API_KEY: z.string().min(1),
  TWILIO_ACCOUNT_SID: z.string().min(1),
  TWILIO_AUTH_TOKEN: z.string().min(1),
  TWILIO_PHONE_NUMBER: z.string().min(1),
  REDIS_URL: z.string().default("redis://localhost:6379"),
  TOAST_API_BASE_URL: z.string().min(1),
  TOAST_API_KEY: z.string().min(1),
  TOAST_RESTAURANT_GUID: z.string().min(1),
  DEEPGRAM_API_KEY: z.string().default(""),
  ELEVENLABS_API_KEY: z.string().default(""),
  AGENT_BASE_URL: z.string().min(1),
});

export type AppConfig = z.infer<typeof configSchema>;

export function loadConfig(): AppConfig {
  const result = configSchema.safeParse(process.env);
  if (!result.success) {
    // Report every failing key, not only "invalid_type" issues, so that
    // empty strings rejected by min(1) ("too_small") are listed as well.
    const invalid = Array.from(
      new Set(result.error.issues.map((i) => i.path.join(".")))
    );
    throw new Error(
      `Missing or invalid environment variables: ${invalid.join(", ")}`
    );
  }
  return result.data;
}
Also create the sample menu data and the system prompt for your order agent:
// src/config/prompts.ts
export const ORDER_AGENT_SYSTEM_PROMPT = `You are an AI phone agent for a restaurant called "Toast Bistro". Your job is to take food orders over the phone.

Guidelines:
1. Greet the caller warmly and ask what they would like to order.
2. Listen for menu items and quantities. If an item is unclear, ask for clarification.
3. Always confirm each item before adding it to the order: repeat the item name, modifiers, and quantity.
4. After each item, ask "Would you like anything else?".
5. When the caller says "that's all", "that's it", or similar, summarize the entire order back to them.
6. Ask if they want any modifications (extra sauce, no onions, etc.).
7. Ask for the caller's name for the order.
8. If the caller asks to speak to a manager or says "speak to a human", "handoff", or similar, initiate a handoff.
9. Keep responses brief and conversational; this is a phone conversation.
10. Never read prices unless asked.`;
Expected output: A Zod-validated config loader that extracts and validates all environment variables at startup, plus a 13-item sample menu and the system prompt that will guide the order agent’s behavior.
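To show the fail-fast behavior in isolation, here is a dependency-free sketch of the same idea. It is not the recipe's loader (that is the Zod-based loadConfig()); the validateEnv helper, the EnvSource type, and the shortened key list are hypothetical and exist only for illustration.

```typescript
// Dependency-free sketch of fail-fast config validation (validateEnv is a hypothetical helper).
type EnvSource = Record<string, string | undefined>;

// A shortened key list for the example; the real schema requires more keys.
const REQUIRED_KEYS = ["ANTHROPIC_API_KEY", "TWILIO_ACCOUNT_SID", "TOAST_API_KEY"] as const;

function validateEnv(env: EnvSource): Record<string, string> {
  // Treat both undefined and empty strings as missing.
  const missing = REQUIRED_KEYS.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing or invalid environment variables: ${missing.join(", ")}`);
  }
  const config: Record<string, string> = {
    // Optional value with the same default the recipe uses.
    REDIS_URL: env["REDIS_URL"] ?? "redis://localhost:6379",
  };
  for (const key of REQUIRED_KEYS) {
    config[key] = env[key] as string;
  }
  return config;
}
```

Passing the environment in as a parameter, instead of reading process.env inside the function, keeps the check easy to unit test.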
Step 5: Create the Anthropic client wrapper
The voice agent needs to call Claude for natural language understanding. Create a typed wrapper around the Anthropic SDK that parses responses into a consistent shape and supports both one-shot and streaming calls.
The generateResponse() method sends a system prompt and message history to Claude. The streamResponse() method returns an async iterable of delta events for real-time streaming.
ts
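// src/lib/anthropic-client.ts
import Anthropic from "@anthropic-ai/sdk";
import type { ToolUnion } from "@anthropic-ai/sdk/resources/messages/messages.js";

// Input for a single call to Claude.
export interface GenerateResponseParams {
  system: string;
  messages: Array<{ role: "user" | "assistant"; content: string }>;
  tools?: Array<{
    name: string;
    description: string;
    input_schema: Record<string, unknown>;
  }>;
  maxTokens?: number;
}

// Normalized result returned to callers.
export interface GenerateResponseResult {
  text: string;
  stopReason: string;
  usage: { inputTokens: number; outputTokens: number };
}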
Expected output: A typed Anthropic client in src/lib/anthropic-client.ts with generateResponse() for non-streaming calls and streamResponse() for real-time streaming. It uses the claude-haiku-4-5-20251001 model and returns the parsed text, stop reason, and token usage counts.
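The "consistent shape" step can be sketched as a pure function that flattens a Messages API response into text, stop reason, and usage counts. The response fields used here (content blocks with type and text, stop_reason, usage.input_tokens, usage.output_tokens) follow the Anthropic Messages API; the parseMessage helper, the local interface names, and the "unknown" fallback are assumptions for illustration, not the recipe's code.

```typescript
// Minimal structural types for the parts of a Messages API response used here.
interface ContentBlock {
  type: string;
  text?: string;
}

interface RawMessage {
  content: ContentBlock[];
  stop_reason: string | null;
  usage: { input_tokens: number; output_tokens: number };
}

interface ParsedResponse {
  text: string;
  stopReason: string;
  usage: { inputTokens: number; outputTokens: number };
}

// Hypothetical helper: joins the text blocks and converts snake_case fields to camelCase.
function parseMessage(message: RawMessage): ParsedResponse {
  const text = message.content
    .filter((block) => block.type === "text" && typeof block.text === "string")
    .map((block) => block.text as string)
    .join("");
  return {
    text,
    // stop_reason can be null; "unknown" is an assumed placeholder for that case.
    stopReason: message.stop_reason ?? "unknown",
    usage: {
      inputTokens: message.usage.input_tokens,
      outputTokens: message.usage.output_tokens,
    },
  };
}
```

Non-text blocks such as tool_use are skipped, so callers always receive a plain string even when the model mixes block types.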
Step 6: Set up Redis and session continuity
Redis stores conversation sessions so the agent can maintain context across multiple utterances. The @reaatech/session-continuity package handles session lifecycle with token budgeting and automatic compression.
The token counter estimates token counts so the session manager knows when to compress. A character-based approximation (roughly four characters per token) is enough here: the estimate only decides when compression kicks in, so exact tokenization isn't critical for short phone orders.
ts
// src/lib/token-counter.ts
import type { TokenCounter, Message } from "@reaatech/session-continuity";

export class CharacterTokenCounter implements TokenCounter {
  readonly model = "character-count";
  readonly tokenizer = "character-4";

  // Approximate one token per four characters, rounding up.
  count(text: string): number {
    return Math.ceil(text.length / 4);
  }

  // Sum the estimate over all messages; non-string content is serialized first.
  countMessages(messages: Message[]): number {
    let total = 0;
    for (const msg of messages) {
      total += this.count(
        typeof msg.content === "string"
          ? msg.content
          : JSON.stringify(msg.content)
      );
    }
    return total;
  }
}

export const tokenCounter: TokenCounter = new CharacterTokenCounter();
Now wire the session manager with Redis-backed storage:
Expected output: A Redis client factory, a character-based token counter, and a session manager configured with a 4096-token budget that compresses via sliding window when it hits 3500 tokens. Sessions expire after the configured TTL and are cleaned up every 5 minutes.
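To make the sliding-window idea concrete, here is a minimal sketch that drops the oldest messages once an estimated token total passes a threshold. It is not the algorithm used inside @reaatech/session-continuity, whose internals are not shown in this recipe; the slidingWindow function and ChatMessage type are hypothetical, and the estimate reuses the four-characters-per-token rule from the counter above.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Same rough estimate as the character-based counter: one token per four characters.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Drops the oldest messages until the estimated total is at or below the threshold.
// The most recent message is always kept, even if it alone exceeds the threshold.
function slidingWindow(messages: ChatMessage[], thresholdTokens: number): ChatMessage[] {
  let total = messages.reduce((sum, m) => sum + estimateTokens(m.content), 0);
  let start = 0;
  while (total > thresholdTokens && start < messages.length - 1) {
    total -= estimateTokens(messages[start].content);
    start += 1;
  }
  return messages.slice(start);
}
```

With the values from this step, a call such as slidingWindow(history, 3500) would trim the history before it reaches the 4096-token budget.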
Step 7: Build the confidence router for menu disambiguation
When a caller says something ambiguous like “medium” (drink size? steak doneness? wing heat?), the confidence router uses keyword classification to resolve the intent.
Expected output: A ConfidenceRouter with keyword classifiers across all five menu categories. When confidence drops below 0.7 it returns a CLARIFY decision, prompting the agent to ask the caller for more detail.
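The routing decision can be sketched with a plain keyword counter. This is a simplified stand-in, not the @reaatech/confidence-router API: the classify function, the Decision type, the ROUTE label, the three example categories, and their keyword lists are all hypothetical. Only the 0.7 threshold and the CLARIFY outcome come from the text above.

```typescript
type Decision =
  | { kind: "ROUTE"; category: string; confidence: number }
  | { kind: "CLARIFY"; candidates: string[]; confidence: number };

// Hypothetical keyword lists; the recipe's real classifiers cover its own five menu categories.
const KEYWORDS: Record<string, string[]> = {
  drinks: ["cola", "lemonade", "small", "medium", "large"],
  entrees: ["steak", "burger", "rare", "medium", "well-done"],
  appetizers: ["wings", "mild", "medium", "hot"],
};

function classify(utterance: string, threshold = 0.7): Decision {
  // Lowercase, strip punctuation, and split into words.
  const words = utterance
    .toLowerCase()
    .replace(/[^a-z\s-]/g, "")
    .split(/\s+/)
    .filter(Boolean);

  // Count keyword hits per category and keep only categories that matched.
  const scores = Object.entries(KEYWORDS)
    .map(([category, keywords]) => ({
      category,
      hits: words.filter((w) => keywords.includes(w)).length,
    }))
    .filter((s) => s.hits > 0)
    .sort((a, b) => b.hits - a.hits);

  // Confidence is the top category's share of all keyword hits.
  const totalHits = scores.reduce((sum, s) => sum + s.hits, 0);
  const confidence = totalHits === 0 ? 0 : scores[0].hits / totalHits;

  if (scores.length === 0 || confidence < threshold) {
    return { kind: "CLARIFY", candidates: scores.map((s) => s.category), confidence };
  }
  return { kind: "ROUTE", category: scores[0].category, confidence };
}
```

In this sketch "medium" matches all three categories equally, so it falls below the threshold and yields CLARIFY, while "large cola" matches only drinks and is routed. A production classifier would likely weight unambiguous keywords such as "steak" more heavily than shared ones.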
Step 8: Create the Twilio client and Toast POS integration
The Twilio client initiates outbound calls and updates them with TwiML. The Toast client creates tabs, adds items, and submits orders.
Expected output: TwilioClient wraps the Twilio SDK for creating, updating, and checking calls. ToastApiClient wraps the Toast REST API for creating tabs (/orders/v2/tabs), adding items, and submitting tabs. Both include typed error classes.
Step 9: Implement LLM cost telemetry
Track every Anthropic API call with cost calculation. The @reaatech/llm-cost-telemetry package provides calculateCostFromTokens, generateId, and CostSpanSchema — use them to create validated cost spans.
Expected output: trackLLMCall() accepts the model name, token counts, and a tenant identifier (your session ID), calculates cost from per-million-token pricing, and returns a Zod-validated cost span object. This data can be exported to observability platforms.
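The per-million-token arithmetic looks roughly like the sketch below. It does not use the @reaatech/llm-cost-telemetry functions, whose signatures are not shown in this recipe; buildCostSpan, ModelPricing, and CostSpan are hypothetical names, and the rates are placeholders, not quoted prices.

```typescript
interface ModelPricing {
  inputPerMillionUsd: number;
  outputPerMillionUsd: number;
}

interface CostSpan {
  model: string;
  tenant: string;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

// Placeholder rates for illustration only; check your provider's current price list.
const EXAMPLE_PRICING: ModelPricing = { inputPerMillionUsd: 1, outputPerMillionUsd: 5 };

// Hypothetical helper: cost = (input * inputRate + output * outputRate) / 1,000,000.
function buildCostSpan(
  model: string,
  tenant: string,
  inputTokens: number,
  outputTokens: number,
  pricing: ModelPricing
): CostSpan {
  const costUsd =
    (inputTokens * pricing.inputPerMillionUsd + outputTokens * pricing.outputPerMillionUsd) /
    1_000_000;
  return { model, tenant, inputTokens, outputTokens, costUsd };
}
```

Multiplying before dividing keeps the intermediate values as whole numbers for integer rates, which reduces floating-point drift when many spans are summed.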
Step 10: Build the Order Agent
The OrderAgent is the brain of the system. It manages call initialization, processes each utterance through Claude with session context, handles handoff requests, and finalizes orders through the Toast API.
ts
// src/agents/order-agent.ts
import { SessionManager } from "@reaatech/session-continuity";
import { ConfidenceRouter } from "@reaatech/confidence-router";
import {
  createHandoffConfig,
  HandoffError,
  withRetry,
  TypedEventEmitter,
} from "@reaatech/agent-handoff";
import type {
  HandoffPayload,
  CompressedContext,
  HandoffTrigger,
  UserMetadata,
  ConversationState,
  Message as HandoffMessage,
} from "@reaatech/agent-handoff";
import {
  createPipeline,
  defineConfig,
  createCostTracker,
  createLatencyBudget,
  LatencyBudgetEnforcer,
} from "@reaatech/voice-agent-core";
import { trackLLMCall } from "../lib/llm-cost-telemetry.js";
import type { AnthropicClient } from "../lib/anthropic-client.js";
Expected output: OrderAgent with four public methods: initializeCall() creates a session; processUtterance() sends each transcript to Claude with full conversation context and returns a structured AgentAction; handoffToHuman() builds a HandoffPayload and emits it via a typed event emitter; and finalizeOrder() extracts items from the session and submits them to Toast with retry logic. The LLM prompt instructs Claude to respond in JSON so the agent can parse structured actions.
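Because the agent depends on Claude replying in JSON, parsing should tolerate malformed output. The sketch below shows one defensive approach. The parseAgentAction function, the ParsedAction shape, the handoff variant, and the fallback message are assumptions for illustration; only the ask_question and clarify action names come from this recipe.

```typescript
type ParsedAction =
  | { type: "ask_question" | "clarify"; message: string }
  | { type: "handoff"; reason: string };

// Assumed fallback: when the reply cannot be parsed, ask the caller to repeat.
const FALLBACK_ACTION: ParsedAction = {
  type: "clarify",
  message: "Sorry, could you repeat that?",
};

function parseAgentAction(raw: string): ParsedAction {
  // Models sometimes wrap JSON in prose or code fences, so take the outermost braces.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) return FALLBACK_ACTION;

  let data: unknown;
  try {
    data = JSON.parse(raw.slice(start, end + 1));
  } catch {
    return FALLBACK_ACTION;
  }
  if (typeof data !== "object" || data === null) return FALLBACK_ACTION;

  // Validate field by field instead of trusting the parsed object's shape.
  const fields = data as Record<string, unknown>;
  const type = fields["type"];
  const message = fields["message"];
  const reason = fields["reason"];
  if ((type === "ask_question" || type === "clarify") && typeof message === "string") {
    return { type, message };
  }
  if (type === "handoff" && typeof reason === "string") {
    return { type: "handoff", reason };
  }
  return FALLBACK_ACTION;
}
```

Falling back to a clarifying question keeps the call moving when the model output is unusable, instead of surfacing a parse error to the caller.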
Step 11: Wire the voice pipeline with STT and TTS providers
The voice pipeline connects speech-to-text (Deepgram), LLM reasoning (via the MCP client), and text-to-speech (ElevenLabs) into a real-time processing loop.
ts
// src/pipeline/stt-tts-providers.ts
import {
  createMockSTTProvider,
  createMockTTSProvider,
} from "@reaatech/voice-agent-core";
import * as livekit from "@livekit/agents";
import { TTS as DeepgramTTS } from "@livekit/agents-plugin-deepgram";
import { TTS as ElevenLabsTTS } from "@livekit/agents-plugin-elevenlabs";

// Reference the import so it is not flagged as unused.
void DeepgramTTS;

export function _ensureLiveKitImport() {
  return livekit;
}

// Fall back to the mock STT provider when no Deepgram key is configured.
export function createSTTProvider(apiKey: string) {
  if (!apiKey) {
    return createMockSTTProvider();
  }
  return {
    provider: "deepgram",
    options: { apiKey, model: "nova-2", language: "en", interimResults: true },
  };
}

// Fall back to the mock TTS provider when no ElevenLabs key is configured.
export function createTTSProvider(apiKey: string) {
  if (!apiKey) {
    return createMockTTSProvider();
  }
  const ttsInstance = new ElevenLabsTTS({ apiKey });
  return { provider: "elevenlabs", instance: ttsInstance };
}
Expected output: The voice pipeline receives audio chunks from Twilio, processes them through STT, MCP (your Order Agent), and TTS, and enforces an 800ms latency budget. It handles barge-in (caller interrupts the agent) and cleanly ends sessions on call completion. Mock providers are used by default so you can test without API keys.
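The 800ms latency budget can be pictured as a small accumulator that each pipeline stage reports into. This is an illustrative sketch only: the recipe uses createLatencyBudget and LatencyBudgetEnforcer from @reaatech/voice-agent-core, whose signatures are not shown here, so the LatencyBudget class and its methods below are hypothetical.

```typescript
// Hypothetical per-turn latency tracker; not the voice-agent-core implementation.
class LatencyBudget {
  private readonly budgetMs: number;
  private readonly spent = new Map<string, number>();

  constructor(budgetMs: number) {
    this.budgetMs = budgetMs;
  }

  // Adds elapsed time for a stage such as "stt", "llm", or "tts".
  record(stage: string, elapsedMs: number): void {
    this.spent.set(stage, (this.spent.get(stage) ?? 0) + elapsedMs);
  }

  totalMs(): number {
    let total = 0;
    for (const ms of this.spent.values()) total += ms;
    return total;
  }

  // Time left before the budget is used up, never negative.
  remainingMs(): number {
    return Math.max(0, this.budgetMs - this.totalMs());
  }

  isExceeded(): boolean {
    return this.totalMs() > this.budgetMs;
  }
}
```

A pipeline could check remainingMs() before starting TTS and, for example, play a short filler phrase when the budget is nearly spent.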
Step 12: Create the Twilio webhook and handoff API routes
The webhook receives incoming calls from Twilio and returns TwiML that connects the call to a WebSocket media stream. The handoff endpoint escalates a session to a human operator.
ts
// app/api/calls/webhook/route.ts
import { type NextRequest, NextResponse } from "next/server";

export async function POST(req: NextRequest) {
  // Twilio posts webhook parameters as form-encoded fields.
  const formData = await req.formData();
  const callSid = formData.get("CallSid") as string | null;
  const fromNumber = formData.get("From") as string | null;
  const callStatus = formData.get("CallStatus") as string | null;

  if (!callSid || !fromNumber) {
    return NextResponse.json(
      { error: "Missing required fields: CallSid, From" },
      { status: 400 }
    );
  }

  const agentBaseUrl = process.env["AGENT_BASE_URL"] ?? "http://localhost:3000";

  if (callStatus === "ringing" || callStatus === "in-progress") {
    // In production: const sessionId = await orderAgent.initializeCall(callSid, fromNumber);
    const host = new URL(agentBaseUrl).host;
    // Connect the call audio to the WebSocket media stream.
    const twiml = `<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <Stream url="wss://${host}/ws" />
  </Connect>
</Response>`;
    return new NextResponse(twiml, {
      status: 200,
      headers: { "Content-Type": "text/xml" },
    });
  }

  if (callStatus === "completed") {
    // In production: await orderAgent.endSession(sessionId);
    return new NextResponse(null, { status: 200 });
  }

  // Any other status: say goodbye and end the call.
  const twiml = `<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Say>Thank you for calling Toast Bistro. Goodbye.</Say>
</Response>`;
  return new NextResponse(twiml, {
    status: 200,
    headers: { "Content-Type": "text/xml" },
  });
}

export function GET() {
  return NextResponse.json({ status: "ok" });
}
ts
// app/api/handoff/route.ts
import { type NextRequest, NextResponse } from "next/server";
import { z } from "zod";

const handoffRequestSchema = z.object({
  sessionId: z.string().min(1),
  reason: z.string().min(1),
});

export async function POST(req: NextRequest) {
  try {
    const body = (await req.json()) as Record<string, unknown>;
    const parsed = handoffRequestSchema.parse(body);
    // In production: await orderAgent.handoffToHuman(parsed.sessionId, parsed.reason);
    return NextResponse.json({
      status: "escalated",
      agentId: "human-1",
      sessionId: parsed.sessionId,
    });
  } catch (err: unknown) {
    // Schema failures are the caller's fault; everything else is a server error.
    if (err instanceof z.ZodError) {
      return NextResponse.json(
        { error: "Validation failed", details: err.issues },
        { status: 400 }
      );
    }
    return NextResponse.json(
      { error: "Internal server error" },
      { status: 500 }
    );
  }
}
Expected output: Two API routes. POST /api/calls/webhook accepts Twilio’s form-encoded webhook, returns TwiML with a <Connect><Stream> element to establish a WebSocket media stream, and handles ringing, in-progress, and completed call statuses. POST /api/handoff validates a sessionId and reason with Zod, then returns an escalation confirmation. In-production comments show where session lifecycle calls should be wired in.
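If you later build the stream URL from values you do not fully control, it may be worth factoring the TwiML into a small helper that escapes the attribute value. The sketch below is one way to do that; buildStreamTwiml and escapeXmlAttr are hypothetical helpers, not part of the recipe, and the /ws path default mirrors the webhook route.

```typescript
// Escapes the characters that are unsafe inside an XML attribute value.
function escapeXmlAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: derives the media-stream URL from the public base URL.
function buildStreamTwiml(agentBaseUrl: string, path = "/ws"): string {
  // URL.host includes the port when one is present, for example "localhost:3000".
  const base = new URL(agentBaseUrl);
  const streamUrl = `wss://${base.host}${path}`;
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    "  <Connect>",
    `    <Stream url="${escapeXmlAttr(streamUrl)}" />`,
    "  </Connect>",
    "</Response>",
  ].join("\n");
}
```

The helper always emits a wss:// URL regardless of the base URL's scheme, matching what the webhook route does.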
Step 13: Add Next.js instrumentation for observability
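The instrumentation hook initializes observability when the Node.js runtime starts. On Next.js 14 and earlier you must enable it in next.config.ts or the register() function is dead code; since Next.js 15 the instrumentation file is stable and is loaded without the experimental flag.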
Expected output: next.config.ts sets experimental.instrumentationHook: true. The exact spelling matters; misspelling it as clientInstrumentationHook is a common pitfall. src/instrumentation.ts exports register(), which only runs in the Node.js runtime and calls initializeObservability from the voice-agent-core package.
Step 14: Run the type checker and tests
Verify everything compiles and all tests pass. The test suite mocks external services (Anthropic, Twilio, Redis, Toast) so you can run it without API keys.
terminal
pnpm typecheck
Expected output: tsc --noEmit exits 0 with no errors.
Now run the full test suite with coverage:
terminal
pnpm vitest run --coverage --reporter=json --outputFile=vitest-report.json
These tests cover happy paths (a valid ringing call returns TwiML, the order agent processes a greeting as ask_question), error paths (missing CallSid returns 400, empty session throws on finalize), and edge cases (ambiguous transcript triggers clarify, streaming client handles empty content).
Here’s a sample test for the webhook handler:
ts
// tests/api/calls/webhook.test.ts
import { describe, it, expect } from "vitest";
import { POST, GET } from "../../../app/api/calls/webhook/route.js";

describe("POST /api/calls/webhook", () => {
  it("returns 200 with TwiML for valid ringing call", async () => {
    process.env["AGENT_BASE_URL"] = "http://localhost:3000";
    const body = new URLSearchParams({
      CallSid: "CA123",
      From: "+1****23",
      CallStatus: "ringing",
    });
    const req = new Request("http://localhost/api/calls/webhook", {
      method: "POST",
      body,
      headers: { "content-type": "application/x-www-form-urlencoded" },
    });
    const res = await POST(req as never);
    expect(res.status).toBe(200);
    const text = await res.text();
    expect(text).toContain("<Connect>");
  });

  it("returns 400 when CallSid is missing", async () => {
    const body = new URLSearchParams({ CallStatus: "ringing" });
    const req = new Request("http://localhost/api/calls/webhook", {
      method: "POST",
      body,
      headers: { "content-type": "application/x-www-form-urlencoded" },
    });
    const res = await POST(req as never);
    expect(res.status).toBe(400);
  });
});

describe("GET /api/calls/webhook", () => {
  it("returns status ok", async () => {
    const res = GET();
    expect(res.status).toBe(200);
    const json = (await res.json()) as Record<string, string>;
    expect(json).toEqual({ status: "ok" });
  });
});
Step 15: Expose the library exports
Create a barrel export so consumers can import everything from a single path:
ts
// src/index.ts
export { AnthropicClient } from "./lib/anthropic-client.js";
export { TwilioClient, TwilioApiError } from "./lib/twilio-client.js";
export { createRedisClient } from "./lib/redis.js";
export { CharacterTokenCounter, tokenCounter } from "./lib/token-counter.js";
export { trackLLMCall } from "./lib/llm-cost-telemetry.js";
export { createLangfuseTracer } from "./lib/observability.js";
export { createSessionManager } from "./lib/session-continuity.js";
export { createMenuRouter } from "./lib/confidence-router.js";
export { ToastApiClient, ToastApiError } from "./integrations/toast-client.js";
export { OrderAgent } from "./agents/order-agent.js";
export { setupCallHandler } from "./pipeline/voice-pipeline.js";
export {
  createSTTProvider,
  createTTSProvider,
} from "./pipeline/stt-tts-providers.js";
export { loadConfig } from "./config/env.js";
export { ORDER_AGENT_SYSTEM_PROMPT } from "./config/prompts.js";
export { SAMPLE_MENU } from "./config/menu-data.js";
Expected output: A clean public API surface. External consumers can import { OrderAgent, setupCallHandler, loadConfig } from "./src/index.js" to reuse any component.
Next steps
Deploy to production — Run the app behind ngrok or deploy to Vercel, point your Twilio phone number’s voice webhook URL to https://your-domain.com/api/calls/webhook, and start taking orders.
Add real STT/TTS — Configure Deepgram and ElevenLabs API keys in your .env file to replace the mock providers with production-grade speech recognition and synthesis.
Extend the menu — Add seasonal items, combos, and specials to SAMPLE_MENU. The confidence router uses keyword matching, so new items need classifiers, but the order agent will recognize them automatically.
Implement the WebSocket handler — The app/api/calls/webhook/route.ts returns a <Connect><Stream> element that expects a WebSocket endpoint at /ws. Build that handler to accept media streams and feed them through the voice pipeline.
Add a dashboard — Use the cost telemetry data to build a real-time dashboard showing per-call costs, average handling time, and order volume, using Langfuse or Grafana.
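For the WebSocket handler mentioned in the list above, a useful first building block is a parser for the JSON messages Twilio sends over a media stream. The sketch below handles the start, media, and stop events and decodes the base64 audio payload. The parseStreamMessage function and StreamEvent type are hypothetical names, and the sketch assumes a runtime with a global atob (Node.js 16+ or a browser); wiring it into a WebSocket server and the voice pipeline is left out.

```typescript
type StreamEvent =
  | { kind: "start"; streamSid: string; callSid: string }
  | { kind: "media"; audio: Uint8Array }
  | { kind: "stop" }
  | { kind: "ignored" };

// Hypothetical helper: turns one raw media-stream message into a typed event.
function parseStreamMessage(raw: string): StreamEvent {
  const msg = JSON.parse(raw) as {
    event?: string;
    start?: { streamSid?: string; callSid?: string };
    media?: { payload?: string };
  };
  switch (msg.event) {
    case "start":
      return {
        kind: "start",
        streamSid: msg.start?.streamSid ?? "",
        callSid: msg.start?.callSid ?? "",
      };
    case "media": {
      // The payload is base64-encoded audio; decode it into raw bytes.
      const binary = atob(msg.media?.payload ?? "");
      const audio = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
      return { kind: "media", audio };
    }
    case "stop":
      return { kind: "stop" };
    default:
      // Other events (for example "connected" or "mark") are not needed here.
      return { kind: "ignored" };
  }
}
```

Keeping the parser free of any socket code makes it easy to unit test before the real handler exists.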