This recipe builds a conversational voice agent that lets callers book, reschedule, or cancel Cal.com appointments by speaking naturally over the phone. Incoming audio from Twilio Media Streams is transcribed with Deepgram, intent is classified through @reaatech/confidence-router, appointments are managed via the Cal.com REST API, and responses are spoken back through ElevenLabs. All LLM calls go through OpenRouter, giving you provider flexibility without vendor lock-in. By the end you’ll have a working Next.js 16 app with unit-tested services, mock-HTTP tests, and 90%+ code coverage.
Prerequisites
Node.js 22+ and pnpm 10 installed on your machine
A Cal.com account (cloud or self-hosted) with OAuth2 client credentials
OpenRouter API key (free tier available at openrouter.ai/keys)
Deepgram API key (sign up at deepgram.com)
ElevenLabs API key and a voice ID (elevenlabs.io)
Langfuse account (optional — for observability tracing)
Familiarity with TypeScript and Next.js App Router basics
Step 1: Scaffold the project and install dependencies
Create the project directory and initialize it with a Next.js 16 App Router shell, or use the scaffold already provided in this recipe. Every dependency in package.json is pinned to an exact version so builds are reproducible.
Expected output: pnpm resolves all dependencies from the lockfile. No warnings about missing peer deps.
Step 2: Configure environment variables
Create a .env file from the example template. Every service the voice agent talks to gets a placeholder here.
env
# Env vars used by openrouter-voice-agent-for-calcom-appointment-scheduling.
# Keep placeholders only; never commit real values.
NODE_ENV=development

# OpenRouter
OPENROUTER_API_KEY=<your-openrouter-key>
OPENROUTER_MODEL=openai/gpt-5.2

# Deepgram (STT)
DEEPGRAM_API_KEY=<your-deepgram-key>

# ElevenLabs (TTS)
ELEVENLABS_API_KEY=<your-elevenlabs-key>
ELEVENLABS_VOICE_ID=<your-elevenlabs-voice-id>

# Cal.com OAuth2
CALCOM_CLIENT_ID=<your-calcom-client-id>
CALCOM_CLIENT_SECRET=<your-calcom-client-secret>
CALCOM_API_URL=https://api.cal.com/v2
CALCOM_WEBHOOK_SECRET=<your-calcom-webhook-secret>

# Langfuse (observability, optional)
LANGFUSE_PUBLIC_KEY=<your-langfuse-public-key>
LANGFUSE_SECRET_KEY=<your-langfuse-secret-key>
LANGFUSE_HOST=<your-langfuse-host>

# Pipeline MCP endpoint
MCP_ENDPOINT=http://localhost:3000/api/mcp
MCP_TIMEOUT=400

# Session configuration
SESSION_TTL=3600
SESSION_MAX_TURNS=20
SESSION_MAX_TOKENS=4000

# Barge-in settings
BARGE_IN_ENABLED=true
BARGE_IN_MIN_SPEECH=300
BARGE_IN_CONFIDENCE=0.7
BARGE_IN_SILENCE=0.3

# Budget limits
BUDGET_LIMIT=10.0
BUDGET_SOFT_CAP=0.8
BUDGET_HARD_CAP=1.0
Expected output: A plain .env file with all the placeholder keys your agent needs.
Step 3: Validate configuration with a Zod schema
The src/lib/config.ts module uses Zod to validate every environment variable at startup. If a required key is missing, the app throws immediately instead of failing silently mid-call.
Expected output: loadConfig() runs once, caches the parsed config, and throws with a tree-formatted Zod error if any required variable is missing.
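The fail-fast, parse-once behavior can be sketched without any dependency. The recipe itself uses Zod; the hand-rolled check below is only a stand-in for the schema, and the names REQUIRED_KEYS, parseEnv, and loadConfigSketch are hypothetical, as is the choice of which variables count as required.

```typescript
// Hypothetical subset of required variables, taken from the .env template.
const REQUIRED_KEYS = [
  "OPENROUTER_API_KEY",
  "DEEPGRAM_API_KEY",
  "ELEVENLABS_API_KEY",
  "ELEVENLABS_VOICE_ID",
] as const;

type RequiredKey = (typeof REQUIRED_KEYS)[number];
type AppConfig = Record<RequiredKey, string> & { SESSION_MAX_TURNS: number };

// Validate a plain env object. All missing keys are collected first so the
// error message lists every problem at once instead of one per restart.
function parseEnv(env: Record<string, string | undefined>): AppConfig {
  const missing = REQUIRED_KEYS.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing required env vars: ${missing.join(", ")}`);
  }
  // Optional numeric setting with a default (assumed default: 20).
  const maxTurns = Number(env["SESSION_MAX_TURNS"] ?? "20");
  if (!Number.isInteger(maxTurns) || maxTurns <= 0) {
    throw new Error("SESSION_MAX_TURNS must be a positive integer");
  }
  const config = { SESSION_MAX_TURNS: maxTurns } as AppConfig;
  for (const key of REQUIRED_KEYS) {
    config[key] = env[key] as string;
  }
  return config;
}

let cachedConfig: AppConfig | null = null;

// Parse on the first call, then return the cached result on later calls.
function loadConfigSketch(env: Record<string, string | undefined>): AppConfig {
  if (!cachedConfig) {
    cachedConfig = parseEnv(env);
  }
  return cachedConfig;
}
```

Calling loadConfigSketch(process.env) once at startup means a missing key stops the process before the first call is answered, which is the property the Zod-based module is described as providing.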
Step 4: Define shared TypeScript types
The src/lib/types.ts file declares the shapes that flow through every stage of the pipeline — call sessions, caller intents, appointment details, booking requests and responses, and more.
Expected output: Types are exported and used across all services — the TypeScript compiler catches mismatches at build time.
Step 5: Build the Cal.com API client
The CalcomApiClient in src/lib/calcom-api.ts handles OAuth2 client-credentials authentication, request retries with p-retry, a common request helper for all HTTP calls, and Zod response validation. Every API response is checked against a schema so a malformed Cal.com reply never reaches the pipeline unmasked.
Expected output: The client acquires OAuth tokens lazily, caches them until 60 seconds before expiry, retries transient failures, and throws typed CalcomApiError instances on HTTP errors. A shared request<T> helper keeps all the retry and auth logic in one place.
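The lazy token caching described above can be illustrated with a small standalone sketch. This is not the CalcomApiClient source: LazyTokenCache, TokenFetcher, and isTokenFresh are hypothetical names, and the fetcher's return shape is an assumption chosen for the example.

```typescript
interface CachedToken {
  accessToken: string;
  expiresAtMs: number; // absolute expiry time in epoch milliseconds
}

const EXPIRY_SKEW_MS = 60_000; // treat the token as stale 60 seconds early

// A token is reusable only while it is more than the skew away from expiry.
function isTokenFresh(token: CachedToken | null, nowMs: number): boolean {
  return token !== null && nowMs < token.expiresAtMs - EXPIRY_SKEW_MS;
}

// Hypothetical fetcher: resolves to a token and its lifetime in seconds.
type TokenFetcher = () => Promise<{ accessToken: string; expiresInSec: number }>;

class LazyTokenCache {
  private token: CachedToken | null = null;
  private readonly fetchToken: TokenFetcher;
  private readonly now: () => number;

  constructor(fetchToken: TokenFetcher, now: () => number = Date.now) {
    this.fetchToken = fetchToken;
    this.now = now;
  }

  // Nothing is fetched until the first request needs a token; after that the
  // cached token is reused until it comes within the skew of its expiry.
  async getToken(): Promise<string> {
    if (this.token !== null && isTokenFresh(this.token, this.now())) {
      return this.token.accessToken;
    }
    const fresh = await this.fetchToken();
    this.token = {
      accessToken: fresh.accessToken,
      expiresAtMs: this.now() + fresh.expiresInSec * 1000,
    };
    return fresh.accessToken;
  }
}
```

Refreshing early avoids sending a token that expires while the request is in flight. Injecting the clock keeps the expiry logic testable without waiting in real time.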
Step 6: Create the OpenRouter LLM service
The src/services/openrouter-service.ts wraps OpenAI’s SDK pointed at OpenRouter’s base URL. It provides a non-streaming generateResponse, a streaming generateResponseStream for real-time TTS, and a cost estimator that knows per-model pricing.
ts
import OpenAI from "openai";
import { getConfig } from "../lib/config.js";

let _client: OpenAI | null = null;

function getClient(): OpenAI {
  if (!_client) {
    const config = getConfig();
    _client = new OpenAI({
      baseURL: "https://openrouter.ai/api/v1",
      apiKey: config.OPENROUTER_API_KEY,
      defaultHeaders: {
        "HTTP-Referer": "https://github.com/reaatech/openrouter-voice-agent",
        "X-OpenRouter-Title": "OpenRouter Voice Agent for Cal.com",
      },
Expected output: The service is a thin adapter — it uses the OpenAI SDK with a custom baseURL so OpenRouter handles the routing. The buildSystemPrompt tells the LLM to emit JSON for intent classification and appointment extraction.
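A per-model cost estimator of the kind mentioned above might look like the following sketch. The model names and prices are placeholders invented for illustration, not real OpenRouter prices, and estimateCostUsd is a hypothetical name rather than the recipe's actual function.

```typescript
// Prices in USD per million tokens. These numbers are placeholders; real
// prices differ per model and change over time.
interface ModelPricing {
  inputPerMTok: number;
  outputPerMTok: number;
}

const HYPOTHETICAL_PRICING = new Map<string, ModelPricing>([
  ["example/large-model", { inputPerMTok: 2.0, outputPerMTok: 10.0 }],
  ["example/small-model", { inputPerMTok: 0.2, outputPerMTok: 0.8 }],
]);

// Used when a model is not in the table, so unknown models still get a
// non-zero estimate and cannot bypass budget checks.
const FALLBACK_PRICING: ModelPricing = { inputPerMTok: 1.0, outputPerMTok: 3.0 };

function estimateCostUsd(
  model: string,
  promptTokens: number,
  completionTokens: number
): number {
  const pricing = HYPOTHETICAL_PRICING.get(model) ?? FALLBACK_PRICING;
  const inputCost = promptTokens * pricing.inputPerMTok;
  const outputCost = completionTokens * pricing.outputPerMTok;
  return (inputCost + outputCost) / 1_000_000;
}
```

With the placeholder table, a call with 1,000 prompt tokens and 500 completion tokens on example/large-model is estimated at (1000 × 2 + 500 × 10) / 1,000,000 = 0.007 USD.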
Step 7: Build the speech service (Deepgram STT + ElevenLabs TTS)
The src/services/speech-service.ts creates and connects the speech providers. The synthesizeSpeech function splits long TTS responses into sentences, synthesizes each one individually, and inserts a 500ms silence gap between utterances for natural pacing.
ts
import { DeepgramSTTProvider, STTProviderInterface, type DeepgramConfig } from "@reaatech/voice-agent-stt";
import { ElevenLabsTTSProvider as ElevenLabsProvider, type ElevenLabsConfig, TTSProviderInterface } from "@reaatech/voice-agent-tts";
import { DeepgramClient } from "@deepgram/sdk";
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";
import { getConfig } from "../lib/config.js";
import type { AudioChunk } from "@reaatech/voice-agent-core";

export type { ElevenLabsProvider };

export function createSTTProvider(): DeepgramSTTProvider {
  return new DeepgramSTTProvider();
}

export async function connectSTT(stt: DeepgramSTTProvider): Promise<void> {
  const config = getConfig();
  const deepgramConfig: DeepgramConfig = {
    provider: "deepgram",
    apiKey: config.DEEPGRAM_API_KEY,
    model: "nova-2",
    language: "en",
    sampleRate: 8000,
    encoding: "mulaw",
    smartFormat: true,
    interimResults: true,
    endpointing: 300,
  };
  await stt.connect(deepgramConfig);
}

export function createTTSProvider(): ElevenLabsProvider {
  return new ElevenLabsProvider();
}

export function convertAudioChunk(chunk: AudioChunk): AudioChunk {
  return STTProviderInterface.convertAudioFormat(chunk, 8000, "mulaw");
}

export async function* synthesizeSpeech(
  tts: ElevenLabsProvider,
  text: string
): AsyncIterable<AudioChunk> {
  const config = getConfig();
  const ttsConfig: ElevenLabsConfig = {
    provider: "elevenlabs",
    modelId: "eleven_flash_v2_5",
    voiceId: config.ELEVENLABS_VOICE_ID,
    outputFormat: "mulaw_8000",
  };
  const sentences = TTSProviderInterface.chunkTextForStreaming(text, 200);
  for (let i = 0; i < sentences.length; i++) {
    const sentence = sentences[i];
    const stream = tts.synthesize(sentence, ttsConfig);
    for await (const chunk of stream) {
      yield TTSProviderInterface.formatAudioForTwilio(chunk);
    }
    // Insert a 500 ms pause between sentences, but not after the last one.
    if (i < sentences.length - 1) {
      const silence = TTSProviderInterface.createSilenceChunk(500);
      yield silence;
    }
  }
}

export function cancelTTS(tts: ElevenLabsProvider): void {
  tts.cancel();
}

export async function closeTTS(tts: ElevenLabsProvider): Promise<void> {
  // close() is optional on the provider, so check for it before calling.
  if (typeof (tts as { close?: () => Promise<void> }).close === "function") {
    await (tts as { close: () => Promise<void> }).close();
  }
}

export function getAvailableSDKs(): { DeepgramClient: typeof DeepgramClient; ElevenLabsClient: typeof ElevenLabsClient } {
  return { DeepgramClient, ElevenLabsClient };
}
Expected output: The STT provider connects with Nova-2 model at 8kHz mulaw (Twilio’s required format). TTS chunks are formatted for Twilio and paced with silence gaps, keeping perceived latency low. The closeTTS function safely tears down the provider at call end.
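To make the sentence pacing concrete, here is a dependency-free sketch of the same idea. It does not reproduce TTSProviderInterface: splitIntoSentences, silenceChunk, and paceUtterances are hypothetical helpers, and the synthesize callback stands in for a real TTS call. The one fixed fact it relies on is that 8 kHz mu-law audio uses one byte per sample, with 0xFF encoding a zero-amplitude sample.

```typescript
const SAMPLE_RATE_HZ = 8000; // 8 kHz mu-law: one byte per sample
const MULAW_SILENCE = 0xff; // mu-law encoding of a zero-amplitude sample

// Naive sentence splitter: breaks after '.', '!' or '?'. A production
// splitter would also handle abbreviations, decimals, and length limits.
function splitIntoSentences(text: string): string[] {
  return (text.match(/[^.!?]+[.!?]*/g) ?? [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Build a buffer of mu-law silence lasting durationMs milliseconds.
function silenceChunk(durationMs: number): Uint8Array {
  const samples = Math.round((SAMPLE_RATE_HZ * durationMs) / 1000);
  return new Uint8Array(samples).fill(MULAW_SILENCE);
}

// Interleave synthesized audio with silence gaps: audio, gap, audio, ...
// No gap is added after the final sentence.
function paceUtterances(
  sentences: string[],
  synthesize: (sentence: string) => Uint8Array,
  gapMs = 500
): Uint8Array[] {
  const chunks: Uint8Array[] = [];
  sentences.forEach((sentence, i) => {
    chunks.push(synthesize(sentence));
    if (i < sentences.length - 1) {
      chunks.push(silenceChunk(gapMs));
    }
  });
  return chunks;
}
```

A 500 ms gap at 8 kHz is 4,000 bytes of 0xFF. Synthesizing sentence by sentence lets the first sentence start playing before the rest of the reply has been generated.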
Step 8: Classify caller intent with the Confidence Router
The src/services/intent-classifier.ts sends the caller’s transcript to OpenRouter, parses the JSON response, and passes the prediction to the ConfidenceRouter for routing decisions. Intent classification is split into a standalone classifyIntent function that classifyAndRoute wraps.
ts
import { ConfidenceRouter, type RoutingDecision } from "@reaatech/confidence-router";
import { generateResponse, buildSystemPrompt } from "./openrouter-service.js";
import type { AppointmentDetails } from "../lib/types.js";

let _router: ConfidenceRouter | null = null;

export function getRouter(): ConfidenceRouter {
  if (!_router) {
    _router = new ConfidenceRouter({
      routeThreshold: 0.8,
      fallbackThreshold: 0.3,
      clarificationEnabled: true,
    });
  }
  return _router;
}

export function resetRouter(): void {
  _router = null;
}

export async function classifyIntent(
  transcript: string
): Promise<{ intent: string; confidence: number }> {
  const systemPrompt = buildSystemPrompt();
  const result = await generateResponse([
    { role: "system", content: systemPrompt },
    {
      role: "user",
      content: `Classify this caller's intent: "${transcript}". Respond with JSON: {"intent": "<intent>", "confidence": <0-1>}`,
    },
  ]);
  // Parse as unknown-valued fields; the typeof checks below narrow each one.
  let parsed: Record<string, unknown>;
  try {
    parsed = JSON.parse(result.text) as Record<string, unknown>;
  } catch {
    // Non-JSON output is treated as an unclassifiable utterance.
    return { intent: "unknown", confidence: 0 };
  }
  return {
    intent: typeof parsed.intent === "string" ? parsed.intent : "unknown",
    confidence: typeof parsed.confidence === "number" ? parsed.confidence : 0,
  };
}

export async function classifyAndRoute(
  transcript: string,
  router?: ConfidenceRouter
): Promise<RoutingDecision> {
  const r = router ?? getRouter();
  const { intent, confidence } = await classifyIntent(transcript);
  const decision = r.decide({
    predictions: [{ label: intent, confidence }],
  });
  return decision;
}

export async function extractAppointmentDetails(
  transcript: string
): Promise<AppointmentDetails> {
  const systemPrompt = buildSystemPrompt();
  const result = await generateResponse([
    { role: "system", content: systemPrompt },
    {
      role: "user",
      content: `Extract appointment details from this request: "${transcript}". Respond with JSON: {"date": "...", "time": "...", "duration": 30, "description": "...", "attendeeName": "...", "attendeeEmail": "...", "attendeePhone": "..."}. Use null for missing fields.`,
    },
  ]);
  let parsed: Record<string, unknown>;
  try {
    parsed = JSON.parse(result.text) as Record<string, unknown>;
  } catch {
    return {};
  }
  // Copy only string-typed fields; null or malformed values are dropped.
  const details: AppointmentDetails = {};
  if (typeof parsed.date === "string") details.date = parsed.date;
  if (typeof parsed.time === "string") details.time = parsed.time;
  if (typeof parsed.attendeeName === "string") details.attendeeName = parsed.attendeeName;
  if (typeof parsed.attendeeEmail === "string") details.attendeeEmail = parsed.attendeeEmail;
  if (typeof parsed.attendeePhone === "string") details.attendeePhone = parsed.attendeePhone;
  return details;
}
Expected output: The router returns one of three decision types — ROUTE (confidence >= 0.8), CLARIFY (0.3 to 0.8), or FALLBACK (< 0.3) — each with an optional prompt for the generated TTS reply.
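The three-way decision can be pictured as a plain threshold comparison. The sketch below is not the ConfidenceRouter implementation; decideByConfidence is a hypothetical function, and the boundary handling (exactly 0.8 routes, exactly 0.3 clarifies) is inferred from the ranges stated above.

```typescript
type DecisionType = "ROUTE" | "CLARIFY" | "FALLBACK";

interface Thresholds {
  routeThreshold: number; // at or above this: act on the intent
  fallbackThreshold: number; // below this: give up and fall back
}

const DEFAULT_THRESHOLDS: Thresholds = {
  routeThreshold: 0.8,
  fallbackThreshold: 0.3,
};

// Map a classifier confidence in [0, 1] to one of three decisions.
function decideByConfidence(
  confidence: number,
  thresholds: Thresholds = DEFAULT_THRESHOLDS
): DecisionType {
  if (confidence >= thresholds.routeThreshold) return "ROUTE";
  if (confidence >= thresholds.fallbackThreshold) return "CLARIFY";
  return "FALLBACK";
}
```

For example, a transcript classified as "book" with confidence 0.55 falls in the middle band, so the agent would ask a clarifying question rather than create a booking.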
Step 9: Wire the Cal.com business logic
The src/services/calcom-service.ts contains the intent handlers invoked after routing. Each handler calls the CalcomApiClient, formats a TTS-friendly confirmation string, and handles error cases gracefully.
ts
import { CalcomApiClient, CalcomApiError } from "../lib/calcom-api.js";
import type { AppointmentDetails } from "../lib/types.js";

export async function handleBookIntent(
  details: AppointmentDetails,
  api: CalcomApiClient
): Promise<string> {
  if (!details.attendeeEmail || !details.attendeeName) {
    throw new CalcomApiError("Attendee name and email are required", 400);
  }
  const eventTypes = await api.getEventTypes();
  if (eventTypes.eventTypes.length === 0
Expected output: Each handler returns a spoken-friendly sentence. The handleBookIntent looks up event types, picks the first available one, creates the booking, and returns a confirmation with the date, time, email, and reference code. The handleCheckAvailability and findBookingByPhone utilities are available for extending the voice agent later.
Step 10: Build the budget service
The src/services/budget-service.ts wraps @reaatech/agent-budget-engine with per-call budget tracking. Before every LLM call you check whether the session has budget remaining; after every call you record the cost.
Expected output: The budget controller tracks per-session spend. When the soft cap is breached, a threshold-breach event fires. At the hard cap, hard-stop fires and further LLM calls are blocked. The getBudgetStatus helper lets you query the current spend at any point.
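The soft-cap and hard-cap behavior can be sketched as a small tracker. This is not the @reaatech/agent-budget-engine API: SessionBudgetSketch is a hypothetical class, and it assumes that BUDGET_SOFT_CAP and BUDGET_HARD_CAP are fractions of BUDGET_LIMIT. Only the event names threshold-breach and hard-stop come from the description above.

```typescript
type BudgetEvent = "threshold-breach" | "hard-stop";

class SessionBudgetSketch {
  private spentUsd = 0;
  private softCapFired = false;
  private readonly limitUsd: number;
  private readonly softCap: number;
  private readonly hardCap: number;
  private readonly onEvent: (event: BudgetEvent) => void;

  constructor(
    limitUsd: number,
    softCap: number, // fraction of the limit, e.g. 0.8 (assumed meaning)
    hardCap: number, // fraction of the limit, e.g. 1.0 (assumed meaning)
    onEvent: (event: BudgetEvent) => void
  ) {
    this.limitUsd = limitUsd;
    this.softCap = softCap;
    this.hardCap = hardCap;
    this.onEvent = onEvent;
  }

  // Check this before every LLM call.
  canSpend(): boolean {
    return this.spentUsd < this.limitUsd * this.hardCap;
  }

  // Call this after every LLM call with the estimated cost.
  record(costUsd: number): void {
    this.spentUsd += costUsd;
    if (this.spentUsd >= this.limitUsd * this.hardCap) {
      this.onEvent("hard-stop");
    } else if (!this.softCapFired && this.spentUsd >= this.limitUsd * this.softCap) {
      this.softCapFired = true; // fire the soft-cap warning only once
      this.onEvent("threshold-breach");
    }
  }

  getSpentUsd(): number {
    return this.spentUsd;
  }
}
```

With the template values (limit 10.0, soft cap 0.8, hard cap 1.0), the warning fires once spend reaches 8 USD and canSpend() returns false from 10 USD onward.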
Step 11: Create the pipeline service
The src/services/pipeline-service.ts ties together all the services. It creates a Pipeline instance from @reaatech/voice-agent-core with a session manager, latency enforcer, STT/TTS providers, and an MCP client adapter that routes utterances through intent classification and Cal.com handlers.
ts
import {
  createPipeline,
  createLatencyBudget,
  initializeSessionManager,
  defineConfig,
  LatencyBudgetEnforcer,
  type Pipeline,
  type SessionManager,
  type MCPClient,
  type AgentResponse,
} from "@reaatech/voice-agent-core";
import type { DeepgramSTTProvider } from "@reaatech/voice-agent-stt";
import type { ElevenLabsTTSProvider } from "@reaatech/voice-agent-tts";
import { getConfig } from "../lib/config.js";

let _sessionManager: SessionManager | null = null;

export function getSessionManager()
Expected output: The pipeline wireframe is ready. The createMCPClient adapter connects the pipeline’s MCP event system to your custom classifyAndRoute → handleBookIntent → synthesizeSpeech flow.
Step 12: Wire the voice route handler
The app/api/voice/route.ts is the entry point. It accepts Twilio Media Streams WebSocket connections, creates a per-call pipeline, and wires all handler events (audio received, barge-in detected, DTMF received, call start, call end, TTS events).
ts
import { NextResponse } from "next/server";
import type WebSocket from "ws";
export type { NextRequest } from "next/server";
import { createTwilioHandler, type TwilioMediaStreamHandler } from "@reaatech/voice-agent-telephony";
import { createSTTProvider, connectSTT, createTTSProvider, closeTTS, cancelTTS } from "../../../src/services/speech-service.js";
import { createVoicePipeline, createMCPClient } from "../../../src/services/pipeline-service.js";
import { classifyAndRoute, extractAppointmentDetails, getRouter } from "../../../src/services/intent-classifier.js";
import { handleBookIntent, handleRescheduleIntent, handleCancelIntent } from "../../../src/services/calcom-service.js";
import { CalcomApiClient } from "../../../src/lib/calcom-api.js";
import
Expected output: The route accepts WebSocket connections, initializes the STT provider once (reused across calls), and wires every Twilio event to the corresponding pipeline action. The DTMF handler maps digits 1-3 to book/reschedule/cancel intents for callers navigating by keypad.
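The keypad mapping can be sketched as a lookup table. The assignment of 1 to book, 2 to reschedule, and 3 to cancel is assumed from the order in which the intents are listed, and intentFromDigit is a hypothetical helper rather than the route's actual code.

```typescript
type KeypadIntent = "book" | "reschedule" | "cancel";

// A Map is used instead of a plain object so that unexpected keys such as
// "constructor" cannot resolve to inherited properties.
const DTMF_INTENTS = new Map<string, KeypadIntent>([
  ["1", "book"],
  ["2", "reschedule"],
  ["3", "cancel"],
]);

// Returns the intent for a DTMF digit, or null for unmapped keys (4-9, 0, *, #).
function intentFromDigit(digit: string): KeypadIntent | null {
  return DTMF_INTENTS.get(digit) ?? null;
}
```

A null result is the natural place to replay the menu prompt instead of guessing an intent.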
Step 13: Add the health check and Cal.com webhook routes
The health endpoint at app/api/health/route.ts returns uptime. The Cal.com webhook at app/api/calcom/webhook/route.ts accepts booking lifecycle events (created, rescheduled, cancelled).
ts
// app/api/health/route.ts
import { NextResponse } from "next/server";

export function GET(): NextResponse {
  return NextResponse.json({ status: "ok", uptime: process.uptime() });
}
Expected output: GET /api/health returns { status: "ok", uptime: <seconds> }. POST /api/calcom/webhook validates the trigger token, logs the event, and returns { ok: true }.
Step 14: Run the tests
The test suite uses Vitest with MSW to mock all external HTTP calls — OpenRouter, Cal.com, Deepgram, and ElevenLabs — so the suite runs without any live API keys. Coverage is configured at 90% minimum across lines, branches, functions, and statements.
terminal
pnpm typecheck
pnpm lint
pnpm test
Expected output:
pnpm typecheck exits 0 with no type errors
pnpm lint exits 0 with no ESLint violations
pnpm test exits 0 with numFailedTests: 0 and all four coverage metrics at 90% or higher
The test suite covers the CalcomApiClient (OAuth token fetching, caching, booking CRUD, retry logic), openrouter-service (response generation, streaming, cost estimation), speech-service (STT/TTS provider creation, audio conversion), intent-classifier (routing decisions at each confidence threshold), budget-service (spend tracking, state transitions), and calcom-service (intent handlers returning confirmation strings).
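The 90% minimum mentioned above is the kind of rule Vitest enforces through its coverage thresholds. The fragment below is a sketch of such a configuration, not the recipe's actual vitest.config.ts; the v8 provider and the node environment are assumptions.

```typescript
// vitest.config.ts (sketch)
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    environment: "node", // assumed: the services run server-side
    coverage: {
      provider: "v8", // assumed provider; istanbul is the alternative
      // The run fails if any metric drops below its threshold.
      thresholds: {
        lines: 90,
        branches: 90,
        functions: 90,
        statements: 90,
      },
    },
  },
});
```

With thresholds set this way, a coverage run exits non-zero when any of the four metrics falls below 90%, so a coverage regression fails CI the same way a failing test does.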
Next steps
Add a custom server with WebSocket upgrade — Next.js 16 App Router does not natively upgrade WebSocket connections on API routes. Create a server.ts at the repo root using node:http to handle the WebSocket upgrade for /api/voice, then forward handleConnection() for each new socket.
Wire Langfuse observability — The observability-service.ts module is ready to go. Call createSessionTrace() at call start and traceLLMCall() after each LLM completion to see full traces in the Langfuse dashboard.
Extend budget auto-downgrade — The budget policy already defines autoDowngrade rules. Expand the model list and add more granular rules — downgrade to deepseek/deepseek-v4-flash after 3 turns, or block expensive models entirely after the soft cap is breached.
Add DTMF fallback routes — The voice route already listens for dtmf:received events. Wire digits 1-3 to pre-defined book/reschedule/cancel flows so callers without reliable ASR can still navigate the system.
Deploy with a Twilio Media Streams URL — Point your Twilio phone number’s voice webhook to your deployed app’s /api/voice WebSocket endpoint. Set the MCP_ENDPOINT and all API keys, and your voice agent is live.
    `You're booked for ${startDate.toLocaleDateString()} at ${startDate.toLocaleTimeString()}. ` +
    `A confirmation has been sent to ${booking.attendee.email}. ` +
    `Your booking reference is ${booking.uid}.`
  );
}

export async function handleRescheduleIntent(
  bookingUid: string,
  newStartTime: string,
  api: CalcomApiClient
): Promise<string> {
  const result = await api.rescheduleBooking(bookingUid, newStartTime);
  const startDate = new Date(result.booking.startTime);
  return (
    `Your appointment has been rescheduled to ${startDate.toLocaleDateString()} at ${startDate.toLocaleTimeString()}. ` +
    `A confirmation has been sent to ${result.booking.attendee.email}.`
  );
}

export async function handleCancelIntent(
  bookingUid: string,
  api: CalcomApiClient
): Promise<string> {
  const booking = await api.getBooking(bookingUid);
  if (booking.status === "cancelled") {
    return "This appointment has already been cancelled.";
  }
  const result = await api.cancelBooking(bookingUid);
  return `Your appointment on ${new Date(result.booking.startTime).toLocaleDateString()} has been cancelled. A confirmation has been sent to ${result.booking.attendee.email}.`;
}

export async function handleCheckAvailability(
  date: string,
  api: CalcomApiClient
): Promise<string> {
  const eventTypes = await api.getEventTypes();
  if (eventTypes.eventTypes.length === 0) {
    return "There are no available appointment types at the moment.";
  }
  return "We have " + String(eventTypes.eventTypes.length) + " appointment type(s) available on " + (date || "your requested date") + ". Would you like to book one?";