This tutorial walks you through building a voice agent that accepts inbound phone calls via Twilio, classifies caller intent and extracts order numbers using Cohere’s Command model, looks up order status in Shopify, and responds with speech. The call flow is protected by a circuit breaker, per-call budget controls, and Zendesk escalation. The Deepgram and ElevenLabs client modules are built but not yet wired into the call flow; they support the real-time audio streaming path described under Next steps. You’ll also add observability through Langfuse and finish with a test suite at 100% code coverage.
This tutorial is for TypeScript developers comfortable with Next.js App Router and REST APIs. You’ll use pnpm as the package manager.
Prerequisites
Node.js >= 22 and pnpm 10
A Twilio account with a purchased phone number
A Deepgram API key
A Cohere API key (Command model access)
An ElevenLabs API key
A Shopify store with Admin API access (read_orders scope)
A Zendesk account (for human escalation)
A Langfuse account (for LLM tracing)
ngrok (for exposing your dev server to Twilio webhooks)
Step 1: Scaffold the project and install dependencies
Scaffold a fresh Next.js App Router project, then replace the generated package.json with the recipe’s exact-pinned dependencies.
Expected output: pnpm install completes without errors and your .env file is populated with real API keys.
Step 2: Validate environment with Zod
Create a single source of truth for your environment variables. This file parses process.env through a Zod schema and fails fast at startup if anything is missing.
import type { HandoffPayload, RoutingDecision } from "@reaatech/agent-handoff";

export type { HandoffPayload, RoutingDecision };

export interface OrderStatus {
  orderNumber: string;
  status: string;
  fulfillmentStatus: string;
  estimatedDelivery?: string;
  trackingNumber?: string;
  items: Array<{ name: string; quantity: number }>;
}

export type OrderLookupErrorCode = "not_found" | "unauthorized" | "rate_limited" | "timeout";

export interface OrderLookupError {
  code: OrderLookupErrorCode;
  message: string;
}
Expected output: TypeScript compiles both files without errors. CallSession.status is typed as SessionStatus from @reaatech/agent-mesh; HandoffPayload and RoutingDecision are re-exported from @reaatech/agent-handoff.
Step 4: Build the Cohere LLM client
This client wraps Cohere’s chat API for three purposes: classifying caller intent, extracting order numbers, and generating natural responses.
Create src/services/cohere/client.ts:
ts
import pRetry from "p-retry";
import { CohereClientV2, type Cohere } from "cohere-ai";
import { env } from "../../config/env.js";
import type { TurnEntry, CallSession, IntentClassification } from "../../types/voice.js";

// Pass the key explicitly: the SDK's built-in fallback is the CO_API_KEY variable,
// not COHERE_API_KEY. This assumes the env schema exposes COHERE_API_KEY.
const cohere = new CohereClientV2({ token: env.COHERE_API_KEY });

// Convert the call transcript into Cohere chat messages, led by the system prompt.
function buildMessages(systemPrompt: string, transcript: TurnEntry[]): Cohere.ChatMessageV2[] {
  const messages: Cohere.ChatMessageV2[] = [{ role: "system" as const, content: systemPrompt }];
  for (const turn of transcript) {
    if (turn.role === "agent") {
      messages.push({ role: "assistant" as const, content: turn.content });
    } else {
      messages.push({ role: "user" as const, content: turn.content });
    }
  }
  return messages;
}

// Return the first text item of the assistant message, or "" if there is none.
function extractResponseText(response: Cohere.V2ChatResponse): string {
  const textItem = response.message.content?.find(
    (c): c is Cohere.AssistantMessageResponseContentItem & { type: "text"; text: string } =>
      c.type === "text"
  );
  return textItem?.text ?? "";
}

export async function classifyIntent(transcript: TurnEntry[]): Promise<IntentClassification> {
  return pRetry(async () => {
    const response = await cohere.chat({
      model: env.COHERE_MODEL,
      messages: buildMessages(
        "Classify the user intent into one of: check_order, escalate, goodbye, unknown. " +
          "Respond with a JSON object: {\"intent\": string, \"confidence\": number, \"entities\": {}}. " +
          "For check_order, extract order number into entities.order_number.",
        transcript,
      ),
    });
    const text = extractResponseText(response);
    // A malformed reply makes JSON.parse throw, which triggers a retry.
    return JSON.parse(text) as IntentClassification;
  }, { retries: 3 });
}

export async function generateResponse(session: CallSession): Promise<string> {
  if (session.transcript.length === 0) {
    return "Hello! How can I help you today?";
  }
  const response = await cohere.chat({
    model: env.COHERE_MODEL,
    messages: buildMessages(
      "You are a helpful voice assistant. Keep responses short and natural.",
      session.transcript,
    ),
  });
  return extractResponseText(response);
}

export async function extractOrderNumber(userInput: string): Promise<string | null> {
  return pRetry(async () => {
    const response = await cohere.chat({
      model: env.COHERE_MODEL,
      messages: [
        {
          role: "system",
          content:
            "Extract the order number from the user's message. Return only the number, or 'null' if no order number is found.",
        },
        { role: "user", content: userInput },
      ] as Cohere.ChatMessageV2[],
    });
    // Trim and lowercase so replies like " Null\n" are still treated as "no order number".
    const text = extractResponseText(response).trim();
    const lowered = text.toLowerCase();
    if (lowered === "null" || lowered === "none" || text === "") {
      return null;
    }
    return text;
  }, { retries: 3 });
}
Expected output: The CohereClientV2 is initialized with the API key from the validated env object. classifyIntent and extractOrderNumber wrap the Cohere chat() call in p-retry (up to 3 retries) for resilience against transient failures; generateResponse calls chat() once without retrying.
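classifyIntent hands the model’s raw text straight to JSON.parse, so a reply with extra prose around the JSON, or with an intent outside the four allowed values, either throws or passes through unchecked. A minimal sketch of a more defensive parser follows. The names parseIntentClassification and ParsedIntent, and the choice to fall back to the unknown intent, are assumptions for illustration and not part of the recipe.

```typescript
// Hypothetical helper: the names and the fallback behavior are assumptions.
const INTENTS = ["check_order", "escalate", "goodbye", "unknown"] as const;
type Intent = (typeof INTENTS)[number];

interface ParsedIntent {
  intent: Intent;
  confidence: number;
  entities: Record<string, string>;
}

function fallback(): ParsedIntent {
  return { intent: "unknown", confidence: 0, entities: {} };
}

function parseIntentClassification(raw: string): ParsedIntent {
  // Slice from the first "{" to the last "}" so surrounding prose is ignored.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) return fallback();

  let data: unknown;
  try {
    data = JSON.parse(raw.slice(start, end + 1));
  } catch {
    return fallback();
  }
  if (typeof data !== "object" || data === null) return fallback();

  const obj = data as Record<string, unknown>;
  const intent = INTENTS.find((candidate) => candidate === obj.intent);
  if (!intent) return fallback();

  // Clamp confidence into [0, 1]; anything that is not a finite number becomes 0.
  const confidence =
    typeof obj.confidence === "number" && Number.isFinite(obj.confidence)
      ? Math.min(1, Math.max(0, obj.confidence))
      : 0;

  // Keep only string-valued entities such as order_number.
  const entities: Record<string, string> = {};
  if (typeof obj.entities === "object" && obj.entities !== null) {
    for (const [key, value] of Object.entries(obj.entities)) {
      if (typeof value === "string") entities[key] = value;
    }
  }
  return { intent, confidence, entities };
}
```

With a parser like this, an unusable reply degrades to the unknown intent and flows into the normal reprompt path instead of surfacing as an exception.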
Step 5: Create the Deepgram and ElevenLabs audio clients
These clients provide speech-to-text transcription and text-to-speech synthesis. The orchestrator currently uses Twilio’s built-in Gather and Say for the main call flow; these clients are the building blocks for real-time audio streaming, which you can wire in as described under Next steps.
Create src/services/deepgram/client.ts:
ts
import { DeepgramClient } from "@deepgram/sdk";
import { env } from "../../config/env.js";

const client = new DeepgramClient({ apiKey: env.DEEPGRAM_API_KEY });

export async function transcribeFile(audioBuffer: Buffer): Promise<string> {
  const response = await client.listen.v1.media.transcribeFile(audioBuffer, {
    model: env.DEEPGRAM_MODEL,
  });
  if ("results" in response) {
    return response.results.channels[0]?.alternatives?.[0]?.transcript ?? "";
  }
  return "";
}

export async function createLiveConnection() {
  return client.listen.v1.connect({
    model: env.DEEPGRAM_MODEL,
    language: "en",
    punctuate: "true",
    interim_results: "true",
    Authorization: env.DEEPGRAM_API_KEY,
  });
}
Create src/services/elevenlabs/client.ts:
ts
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";
import { env } from "../../config/env.js";

const elevenlabs = new ElevenLabsClient({ apiKey: env.ELEVENLABS_API_KEY });

export async function synthesizeSpeech(text: string): Promise<ReadableStream> {
  if (text.length === 0) {
    throw new Error("Text cannot be empty");
  }
  return (await elevenlabs.textToSpeech.convert(env.ELEVENLABS_VOICE_ID, {
    text,
    modelId: env.ELEVENLABS_TTS_MODEL,
  })) as ReadableStream;
}

export async function streamSpeech(text: string): Promise<ReadableStream> {
  return (await elevenlabs.textToSpeech.stream(env.ELEVENLABS_VOICE_ID, {
    text,
    modelId: env.ELEVENLABS_TTS_MODEL,
  })) as ReadableStream;
}
Expected output: The Deepgram client provides both batch transcription (transcribeFile) and a live WebSocket connection factory. The ElevenLabs client supports both full-file synthesis and streaming TTS via streamSpeech.
Step 6: Build the Twilio client and webhook parsers
This module generates TwiML responses and parses incoming Twilio webhook payloads.
Create src/services/twilio/client.ts:
ts
import twilio from "twilio";
import { env } from "../../config/env.js";

export const twilioClient = twilio(env.TWILIO_ACCOUNT_SID, env.TWILIO_AUTH_TOKEN);

// Escape text before interpolating it into TwiML so characters such as & or <
// (for example in a product name) cannot produce invalid XML.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

export function generateTwimlSay(message: string): string {
  return `<Response><Say>${escapeXml(message)}</Say></Response>`;
}

export function generateTwimlGather(speechPrompt: string): string {
  return `<Response><Gather input="speech" action="/api/twilio/gather" method="POST"><Say>${escapeXml(speechPrompt)}</Say></Gather></Response>`;
}

export function generateTwimlHangup(): string {
  return `<Response><Hangup/></Response>`;
}

export function parseIncomingCall(body: Record<string, string>): { callSid: string; from: string } {
  if (!body.CallSid) {
    throw new Error("Missing CallSid in Twilio webhook body");
  }
  return { callSid: body.CallSid, from: body.From };
}

export function parseGatherResult(body: Record<string, string>): {
  callSid: string;
  speechResult?: string;
  digits?: string;
  confidence?: string;
} {
  return {
    callSid: body.CallSid,
    speechResult: body.SpeechResult,
    digits: body.Digits,
    confidence: body.Confidence,
  };
}

export async function createCall(to: string, webhookUrl: string): Promise<string> {
  const call = await twilioClient.calls.create({
    to,
    from: env.TWILIO_PHONE_NUMBER,
    url: webhookUrl,
  });
  return call.sid;
}
Expected output: parseIncomingCall throws a descriptive error when CallSid is missing. All TwiML generators return valid XML strings wrapped in Response.
Step 7: Build the Shopify order lookup with circuit breaker
The Shopify client queries the Admin REST API. It’s protected by a concurrency limiter (p-limit, max 3 concurrent calls) and p-retry for transient HTTP errors. The circuit breaker wraps the entire lookup to prevent cascading failures during Shopify outages.
Create src/services/shopify/client.ts:
ts
import "@shopify/shopify-api/adapters/node";
import { shopifyApi, ApiVersion, Session } from "@shopify/shopify-api";
import pRetry, { AbortError } from "p-retry";
import pLimit from "p-limit";
import { env } from "../../config/env.js";
import type { OrderStatus } from "../../types/shopify.js";

const shopify = shopifyApi({
  apiKey: env.SHOPIFY_API_KEY,
  apiSecretKey: env.SHOPIFY_API_SECRET,
  scopes: ["read_orders"],
  hostName: env.SHOPIFY_SHOP_DOMAIN,
  apiVersion: ApiVersion.July25,
  isEmbeddedApp: false,
});

const session = new Session({
  id: "offline",
  shop: env.SHOPIFY_SHOP_DOMAIN,
  state: "",
  isOnline: false,
  accessToken: env.SHOPIFY_ACCESS_TOKEN,
});

// Allow at most 3 Shopify requests in flight at once.
const limit = pLimit(3);

interface ShopifyOrder {
  id: number;
  name: string;
  fulfillment_status: string | null;
  fulfillments?: Array<{
    tracking_number?: string;
    estimated_delivery_at?: string;
  }>;
  line_items: Array<{ name: string; quantity: number }>;
  financial_status: string;
  processed_at?: string;
}

export async function lookupOrder(orderNumber: string): Promise<OrderStatus> {
  return limit(async () => {
    return pRetry(async () => {
      const client = new shopify.clients.Rest({ session });
      // Send the order name as a query parameter so the leading "#" is URL-encoded.
      // A raw "#" in the path starts a URL fragment and never reaches Shopify.
      // status "any" includes closed orders; the endpoint lists only open ones by default.
      const response = await client.get({
        path: "orders",
        query: { name: `#${orderNumber}`, status: "any" },
      });
      const body = response.body as { orders: ShopifyOrder[] };
      const orders = body.orders;
      if (orders.length === 0) {
        // AbortError stops p-retry immediately: a missing order will not appear on retry.
        throw new AbortError(`Order #${orderNumber} not found`);
      }
      const order = orders[0];
      const fulfillment = order.fulfillments?.[0];
      return {
        orderNumber: order.name.replace("#", ""),
        status: order.financial_status,
        fulfillmentStatus: order.fulfillment_status ?? "unfulfilled",
        estimatedDelivery: fulfillment?.estimated_delivery_at,
        trackingNumber: fulfillment?.tracking_number,
        items: order.line_items.map((item) => ({
          name: item.name,
          quantity: item.quantity,
        })),
      };
    }, { retries: 3 });
  });
}

export function formatOrderResponse(order: OrderStatus): string {
  if (order.items.length === 0) {
    return `Your order #${order.orderNumber} has no items listed. The current status is ${order.status}.`;
  }
  const itemsList = order.items
    .map((item) => `${String(item.quantity)}x ${item.name}`)
    .join(", ");
  let response = `Your order #${order.orderNumber} has ${String(order.items.length)} item(s): ${itemsList}. `;
  response += `The order status is ${order.status} and fulfillment status is ${order.fulfillmentStatus}.`;
  if (order.trackingNumber) {
    response += ` Your tracking number is ${order.trackingNumber}.`;
  }
  if (order.estimatedDelivery) {
    response += ` Estimated delivery is ${order.estimatedDelivery}.`;
  }
  return response;
}
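One detail of the order lookup deserves care: Shopify order names start with #, and an unencoded # in a URL begins the fragment, which HTTP clients do not send to the server. The sketch below uses only the standard URL and URLSearchParams classes to show the difference. The shop host and the buildOrderQuery helper are placeholders for illustration.

```typescript
// Hypothetical helper: builds the query string for an order-name lookup.
function buildOrderQuery(orderNumber: string): string {
  // URLSearchParams percent-encodes "#" as %23.
  return new URLSearchParams({ name: `#${orderNumber}` }).toString();
}

// Placeholder host, used only to demonstrate how the URL is parsed.
const base = "https://example.myshopify.com/admin/api/2025-07/orders.json";

const unencoded = new URL(`${base}?name=#1001`);
const encoded = new URL(`${base}?${buildOrderQuery("1001")}`);

console.log(unencoded.searchParams.get("name")); // "" (the value ended up in the fragment)
console.log(unencoded.hash); // "#1001"
console.log(encoded.searchParams.get("name")); // "#1001"
```

Passing the order name through a query-parameter API, or encoding it by hand, keeps the filter in the part of the URL that the server actually receives.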
Create src/services/circuit-breaker/shopify-breaker.ts:
ts
import {
  CircuitBreaker,
  InMemoryAdapter,
  CircuitOpenError,
} from "@reaatech/circuit-breaker-agents";

export const shopifyBreaker = new CircuitBreaker({
  name: "shopify-order-api",
  failureThreshold: 5,
  recoveryTimeoutMs: 30_000,
  persistence: new InMemoryAdapter(),
});

export { CircuitOpenError };

// Run any Shopify operation through the shared breaker.
export async function executeWithBreaker<T>(
  operation: () => Promise<T>,
): Promise<T> {
  return shopifyBreaker.execute(() => operation());
}
Expected output: After 5 consecutive failures the circuit opens, rejecting subsequent calls for 30 seconds. Once the recovery timeout elapses, a single trial request is allowed through (half-open state) and the circuit closes on success.
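To make the closed, open, and half-open transitions concrete, here is a small dependency-free state machine that mirrors the behavior described above. It is an illustrative sketch only and not the @reaatech/circuit-breaker-agents implementation; the SimpleBreaker class and its method names are invented for this example, and time is passed in explicitly so the transitions are easy to follow.

```typescript
// Illustrative only: not the @reaatech/circuit-breaker-agents implementation.
type BreakerState = "closed" | "open" | "half-open";

class SimpleBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly recoveryTimeoutMs: number;

  constructor(failureThreshold: number, recoveryTimeoutMs: number) {
    this.failureThreshold = failureThreshold;
    this.recoveryTimeoutMs = recoveryTimeoutMs;
  }

  // Returns true if a call may proceed at time `now` (milliseconds).
  canRequest(now: number): boolean {
    if (this.state === "open" && now - this.openedAt >= this.recoveryTimeoutMs) {
      this.state = "half-open"; // let exactly one trial request through
      return true;
    }
    return this.state === "closed";
  }

  // A success closes the circuit and clears the failure count.
  recordSuccess(): void {
    this.failures = 0;
    this.state = "closed";
  }

  // A failed trial, or reaching the threshold, opens the circuit.
  recordFailure(now: number): void {
    this.failures += 1;
    if (this.state === "half-open" || this.failures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = now;
    }
  }

  current(): BreakerState {
    return this.state;
  }
}

// Walk through the scenario from the text: 5 failures, then a 30-second wait.
const demo = new SimpleBreaker(5, 30_000);
for (let i = 0; i < 5; i++) demo.recordFailure(1_000);
console.log(demo.current()); // "open"
console.log(demo.canRequest(2_000)); // false (still inside the recovery window)
console.log(demo.canRequest(31_000)); // true (the single half-open trial)
demo.recordSuccess();
console.log(demo.current()); // "closed"
```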
Step 8: Build the Zendesk handoff handler
When the caller asks for a human or the intent confidence is too low, the orchestrator creates a Zendesk ticket with the full transcript.
Expected output: createHandoffConfig() is called at module load time to set the global confidence threshold. Tickets are retried up to 3 times with exponential backoff via @reaatech/agent-handoff’s withRetry.
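The escalation rule described in this step, hand off when the caller asks for a human or when classification confidence is too low, can be sketched as a small pure function. The needsHuman name and the 0.6 default threshold are assumptions for illustration; the recipe’s actual threshold is whatever createHandoffConfig() sets.

```typescript
// Hypothetical decision helper; the 0.6 default threshold is an assumed value.
interface Classification {
  intent: string;
  confidence: number;
}

function needsHuman(result: Classification, confidenceThreshold = 0.6): boolean {
  // An explicit request for a person always escalates.
  if (result.intent === "escalate") return true;
  // Otherwise escalate only when the classifier is not confident enough.
  return result.confidence < confidenceThreshold;
}

console.log(needsHuman({ intent: "escalate", confidence: 0.99 })); // true
console.log(needsHuman({ intent: "check_order", confidence: 0.9 })); // false
console.log(needsHuman({ intent: "check_order", confidence: 0.3 })); // true
```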
Step 9: Build the budget engine
Three files implement per-call cost tracking. The pricing provider converts token counts to USD for Cohere, Deepgram, and ElevenLabs models. The spend store accumulates costs per scope. The controller ties them together and enforces soft and hard caps.
import type { SpendEntry } from "@reaatech/agent-budget-types";

// Accumulated spend in USD, keyed by "scopeType:scopeKey".
const store = new Map<string, number>();

export function record(entry: SpendEntry): number {
  const key = `${entry.scopeType}:${entry.scopeKey}`;
  const current = store.get(key) ?? 0;
  const total = current + entry.cost;
  store.set(key, total);
  return total;
}

export function getCurrentSpend(scopeType: string, scopeKey: string): number {
  return store.get(`${scopeType}:${scopeKey}`) ?? 0;
}

export function resetSpend(scopeType: string, scopeKey: string): void {
  store.delete(`${scopeType}:${scopeKey}`);
}
Expected output: The budget controller defines a global per-call budget of $0.50 with a soft cap at 80% and a hard cap at 100%. Budget events (threshold-breach, hard-stop) are logged to Langfuse when it’s available.
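The cap arithmetic works out as follows: with a $0.50 budget, the soft cap sits at 0.50 × 0.8 = $0.40 and the hard cap at $0.50. The function below is a simplified sketch of that check, not the recipe’s controller. The evaluateBudget name, the decision labels, and the choice to compare the projected spend (current spend plus the next estimated cost) using >= are assumptions.

```typescript
// Simplified sketch; names and the >= comparison are assumptions.
type BudgetDecision = "ok" | "soft-cap" | "hard-stop";

function evaluateBudget(
  spentUsd: number,
  nextCostUsd: number,
  limitUsd = 0.5,
  softCapRatio = 0.8,
): BudgetDecision {
  const projected = spentUsd + nextCostUsd;
  if (projected >= limitUsd) return "hard-stop"; // 100% of the budget
  if (projected >= limitUsd * softCapRatio) return "soft-cap"; // 80% of the budget
  return "ok";
}

console.log(evaluateBudget(0.1, 0.01)); // "ok"
console.log(evaluateBudget(0.41, 0.01)); // "soft-cap"
console.log(evaluateBudget(0.49, 0.02)); // "hard-stop"
```

A caller would typically continue on ok, log a threshold-breach event on soft-cap, and end or escalate the call on hard-stop.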
Step 10: Build the session store and observability
The session store holds active call state in memory with automatic expiry. The Langfuse module traces every Cohere request, Shopify lookup, and handoff event.
Create src/services/voice/session-store.ts:
ts
import type { CallSession } from "../../types/voice.js";

const sessions = new Map<string, CallSession>();

// Maximum call lifetime in milliseconds (defaults to 600 seconds).
function getMaxDurationMs(): number {
  return (Number(process.env.MAX_CALL_DURATION_SECONDS) || 600) * 1000;
}

function isExpired(session: CallSession): boolean {
  return Date.now() - session.createdAt.getTime() > getMaxDurationMs();
}

export function getSession(callSid: string): CallSession | undefined {
  const session = sessions.get(callSid);
  if (!session) return undefined;
  if (isExpired(session)) {
    sessions.delete(callSid);
    return undefined;
  }
  return session;
}

export function createSession(callSid: string, from: string): CallSession {
  const session: CallSession = {
    callSid,
    from,
    status: "active",
    transcript: [],
    escalated: false,
    createdAt: new Date(),
  };
  sessions.set(callSid, session);
  return session;
}

export function updateSession(callSid: string, updates: Partial<CallSession>): CallSession | undefined {
  const session = sessions.get(callSid);
  if (!session) return undefined;
  if (isExpired(session)) {
    sessions.delete(callSid);
    return undefined;
  }
  const updated = { ...session, ...updates };
  sessions.set(callSid, updated);
  return updated;
}

export function deleteSession(callSid: string): boolean {
  return sessions.delete(callSid);
}
Expected output: The session store expires calls that exceed MAX_CALL_DURATION_SECONDS. Langfuse is gracefully optional — all span functions check for null before calling.
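Note that expiry in this store is lazy: a session is removed only when getSession or updateSession touches it after the deadline, so a call that is never looked up again stays in the Map. One possible complement is a periodic sweep. The sketch below is an assumption-laden illustration, not part of the recipe; sweepExpired and the ExpirableSession shape are invented names, and the current time is a parameter so the function is easy to test.

```typescript
// Hypothetical helper: removes every session older than maxDurationMs.
interface ExpirableSession {
  createdAt: Date;
}

function sweepExpired<T extends ExpirableSession>(
  sessions: Map<string, T>,
  maxDurationMs: number,
  now: number = Date.now(),
): number {
  let removed = 0;
  // Deleting entries while iterating a Map is safe in JavaScript.
  for (const [id, session] of sessions) {
    if (now - session.createdAt.getTime() > maxDurationMs) {
      sessions.delete(id);
      removed += 1;
    }
  }
  return removed;
}

// Example: two sessions, one of them 11 minutes old, with a 10-minute limit.
const demoNow = Date.parse("2025-01-01T00:11:00Z");
const demoSessions = new Map<string, ExpirableSession>([
  ["CA-old", { createdAt: new Date("2025-01-01T00:00:00Z") }],
  ["CA-new", { createdAt: new Date("2025-01-01T00:10:00Z") }],
]);
console.log(sweepExpired(demoSessions, 600_000, demoNow)); // 1
console.log([...demoSessions.keys()]); // ["CA-new"]
```

In a long-running server, such a sweep could run on a timer (for example once a minute via setInterval) alongside the existing lazy checks.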
Step 11: Build the voice orchestrator
This is the main conversation loop. It ties together every service: session management, intent classification, order lookup (via circuit breaker), Zendesk escalation, budget checks, and Langfuse tracing.
Create src/services/voice/orchestrator.ts:
ts
import { getSession, createSession, updateSession, deleteSession } from "./session-store.js";
import { classifyIntent, extractOrderNumber } from "../cohere/client.js";
import { lookupOrder, formatOrderResponse } from "../shopify/client.js";
import { executeWithBreaker } from "../circuit-breaker/shopify-breaker.js";
import { createZendeskTicket, evaluateEscalationNeeded } from "../handoff/zendesk-handler.js";
import { checkBudget, recordCallCost, initBudgetController } from "../budget/controller.js";
import { createCallTrace, finalizeTrace, initLangfuse } from "../observability/langfuse.js";
import { generateTwimlGather, generateTwimlSay, generateTwimlHangup } from "../twilio/client.js";
import type { SessionStatus } from "../../types/voice.js";
import { TurnEntrySchema } from "../../types/voice.js";
Expected output: The orchestrator maps Twilio CallStatus strings to the SessionStatus enum from @reaatech/agent-mesh. On unknown intent, it gives the caller 3 retry attempts before escalating to Zendesk. Budget is checked on every incoming call and recorded after every Cohere API call.
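The retry-then-escalate rule for unknown intents can be expressed as a small counter check. This is a sketch under one reading of the rule, in which the caller is reprompted three times and the fourth consecutive unknown intent escalates; the onUnknownIntent name, the NextAction shape, and that reading are all assumptions and not taken from the orchestrator source.

```typescript
// Hypothetical sketch: reprompt up to MAX_RETRIES times, then escalate.
const MAX_RETRIES = 3;

type NextAction =
  | { kind: "reprompt"; attempts: number }
  | { kind: "escalate" };

// previousAttempts is the number of unknown intents already seen on this call.
function onUnknownIntent(previousAttempts: number): NextAction {
  const attempts = previousAttempts + 1;
  return attempts > MAX_RETRIES
    ? { kind: "escalate" }
    : { kind: "reprompt", attempts };
}

console.log(onUnknownIntent(0)); // { kind: "reprompt", attempts: 1 }
console.log(onUnknownIntent(2)); // { kind: "reprompt", attempts: 3 }
console.log(onUnknownIntent(3)); // { kind: "escalate" }
```

The attempt count would live on the call session and be reset to zero whenever an intent is recognized.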
Step 12: Create the application entry point and API route handlers
The entry point bootstraps the budget controller and Langfuse. The route handlers receive Twilio webhooks and return TwiML XML.
Create src/index.ts:
ts
import "dotenv/config";
import { initBudgetController } from "./services/budget/controller.js";
import { initLangfuse } from "./services/observability/langfuse.js";

const budgetController = initBudgetController();
const langfuse = initLangfuse();

export { budgetController, langfuse };
export * from "./services/voice/orchestrator.js";
export * from "./services/voice/session-store.js";
export * from "./services/twilio/client.js";
Create app/api/twilio/incoming/route.ts:
ts
import { processIncomingCall } from "../../../../src/services/voice/orchestrator.js";
import { parseIncomingCall } from "../../../../src/services/twilio/client.js";

// Accept any object with formData(): works with both NextRequest and test mocks
export async function POST(request: { formData(): Promise<FormData> }): Promise<Response> {
  const data = await request.formData();
  const body: Record<string, string> = {};
  for (const [key, value] of data.entries()) {
    body[key] = typeof value === "string" ? value : "";
  }
  const { callSid, from } = parseIncomingCall(body);
  const twiml = await processIncomingCall(callSid, from);
  return new Response(twiml, {
    status: 200,
    headers: { "Content-Type": "text/xml" },
  });
}
Create app/api/twilio/gather/route.ts:
ts
import { handleUserInput } from "../../../../src/services/voice/orchestrator.js";
import { parseGatherResult } from "../../../../src/services/twilio/client.js";

export async function POST(request: { formData(): Promise<FormData> }): Promise<Response> {
  const data = await request.formData();
  const body: Record<string, string> = {};
  for (const [key, value] of data.entries()) {
    body[key] = typeof value === "string" ? value : "";
  }
  const { callSid, speechResult, digits } = parseGatherResult(body);
  // Prefer recognized speech; fall back to keypad digits, then to an empty string.
  const twiml = await handleUserInput(callSid, speechResult ?? digits ?? "");
  return new Response(twiml, {
    status: 200,
    headers: { "Content-Type": "text/xml" },
  });
}
Create app/api/twilio/status/route.ts:
ts
import { handleCallStatusUpdate, endCall } from "../../../../src/services/voice/orchestrator.js";

export async function POST(request: { formData(): Promise<FormData> }): Promise<Response> {
  const data = await request.formData();
  const callStatus = data.get("CallStatus") as string | null;
  const callSid = data.get("CallSid") as string | null;
  if (callSid && callStatus) {
    await handleCallStatusUpdate(callSid, callStatus);
    // Terminal statuses end the call and release its session.
    if (["completed", "failed", "busy", "no-answer"].includes(callStatus)) {
      await endCall(callSid);
    }
  }
  return new Response(null, { status: 200 });
}
Create app/api/health/route.ts:
ts
import { NextResponse } from "next/server";

export function GET(): Response {
  return NextResponse.json({ status: "ok" });
}
Expected output: The incoming and gather routes return TwiML with a text/xml content type, the status route returns an empty 200 response, and the health endpoint returns application/json. Each Twilio route accepts a duck-typed object with formData(), so it’s testable without a real HTTP server.
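Because the Twilio routes only need an object with formData(), a test can hand them a plain mock instead of a real request. The sketch below shows one way to build such a mock, and also factors the form-to-record loop that the incoming and gather routes share into a helper. Both formDataToRecord and mockTwilioRequest are hypothetical names introduced for illustration; they rely on the global FormData available in Node.js 18 and later.

```typescript
// Hypothetical helpers for illustration; not part of the recipe's source.
function formDataToRecord(data: FormData): Record<string, string> {
  const body: Record<string, string> = {};
  data.forEach((value, key) => {
    // Twilio posts plain form fields; any non-string entry becomes "".
    body[key] = typeof value === "string" ? value : "";
  });
  return body;
}

// Builds the minimal object the route handlers accept: anything with formData().
function mockTwilioRequest(fields: Record<string, string>): { formData(): Promise<FormData> } {
  const data = new FormData();
  for (const [key, value] of Object.entries(fields)) {
    data.set(key, value);
  }
  return { formData: () => Promise.resolve(data) };
}

// Example: the fields Twilio would send for a gathered speech result.
const sample = new FormData();
sample.set("CallSid", "CA123");
sample.set("SpeechResult", "order 1001");
console.log(formDataToRecord(sample)); // { CallSid: "CA123", SpeechResult: "order 1001" }
```

In a test, a call such as POST(mockTwilioRequest({ CallSid: "CA123", From: "+15550100" })) would exercise a route handler end to end with the orchestrator mocked.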
Step 13: Run the tests
The test suite covers every service, every route, and the end-to-end call flow with all external dependencies mocked. Run it with:
terminal
pnpm test
Here’s what the Cohere service test looks like as a sample:
Next steps
Add streaming speech: Wire the Deepgram live connection and ElevenLabs streamSpeech() into the orchestrator so the call uses real-time audio streaming in place of Twilio’s built-in Gather and Say.
Persist sessions to Redis: Replace the in-memory Map-based session store with Redis so the agent can scale across multiple server instances.
Add more intent types: Extend the Cohere classification prompt to handle order cancellations, returns, and shipping address changes — each routed to a different Shopify API or business logic path.
function twilioToSessionStatus(twilioStatus: string): SessionStatus {
  switch (twilioStatus) {
    case "ringing":
    case "in-progress":
      return "active";
    case "completed":
      return "completed";
    case "failed":
    case "busy":
    case "no-answer":
      return "error";
    default:
      return "active";
  }
}

const budgetController = initBudgetController();
const langfuse = await initLangfuse();

export function processIncomingCall(callSid: string, from: string): Promise<string> {
  createSession(callSid, from);
  createCallTrace(langfuse, callSid, from);
  checkBudget(budgetController, callSid, 0.01);
  return Promise.resolve(generateTwimlGather("Hi, I can help with your order status. What's your order number?"));
}

export async function handleUserInput(callSid: string, speechText: string): Promise<string> {
  const session = getSession(callSid);
  if (!session) {
    return generateTwimlSay("Your session has expired. Please call again.");