A front-desk receptionist at an urgent care or mental health clinic manually calls or texts each no-show patient, often leaving voicemails that go unanswered. This process takes 10-15 minutes per patient and fails to rebook most appointments. With 10-20 no-shows daily, the clinic loses revenue and staff time. The receptionist needs an automated system that calls patients, understands their availability, and rebooks directly into the scheduling system.
In this tutorial, you’ll build an AI-powered voice agent that automatically calls patients who missed their appointments, has a natural conversation with them to understand their availability, and rebooks their appointment — all in under 30 seconds per call. You’ll wire together the REAA voice agent ecosystem (speech-to-text, text-to-speech, pipeline orchestration, session continuity, and handoff routing) with Next.js, Hono, Twilio Media Streams, and the Vercel AI SDK. By the end, you’ll have a working Twilio voice webhook that streams audio through a STT → LLM → TTS pipeline in real time.
Prerequisites
Node.js 22+ and pnpm 10 installed
A Twilio account with a phone number that has voice capabilities (trial accounts work)
A Deepgram API key (for speech-to-text and text-to-speech)
An OpenAI API key (for LLM-powered conversation and intent extraction)
A Langfuse account (for observability — free tier works)
Basic familiarity with TypeScript, Next.js App Router, and WebSocket concepts
Step 1: Scaffold the project and install dependencies
Start with a fresh Next.js 16+ project using the App Router, then install all the dependencies you’ll need.
Expected output: Two files — .env.local with real credentials and .env.example with placeholder values, including NODE_ENV=development.
Step 3: Create the config module with Zod validation
Create src/config.ts. This module validates all required environment variables at startup using Zod, and defines the voice agent pipeline configuration using the defineConfig helper from @reaatech/voice-agent-core.
Expected output: A typed config module. When any required env var is missing, loadAppConfig() throws a ValidationError with the names of the missing keys. The voiceAgentConfig object is ready to pass into the pipeline constructor.
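To make the fail-fast behavior concrete, here is a minimal, dependency-free sketch of the same idea. It is not the recipe's Zod schema: the key list, the local ValidationError class, and the AppConfig shape are assumptions made for illustration, and the real module would express these checks with Zod.

```typescript
// Hypothetical stand-in for the Zod-based loader in src/config.ts.
// The key list is an assumption based on the services this recipe uses.
const REQUIRED_KEYS = [
  'TWILIO_ACCOUNT_SID',
  'TWILIO_AUTH_TOKEN',
  'DEEPGRAM_API_KEY',
  'OPENAI_API_KEY',
] as const;

type RequiredKey = (typeof REQUIRED_KEYS)[number];

interface AppConfig {
  credentials: Record<RequiredKey, string>;
  honoWsPort: number;
}

// Local error type for this sketch; it carries the names of the missing keys.
class ValidationError extends Error {
  missingKeys: string[];

  constructor(missingKeys: string[]) {
    super(`Missing required environment variables: ${missingKeys.join(', ')}`);
    this.name = 'ValidationError';
    this.missingKeys = missingKeys;
  }
}

// Collect every missing key first so the error reports all of them at once.
function loadAppConfig(env: Record<string, string | undefined>): AppConfig {
  const missing = REQUIRED_KEYS.filter((key) => !env[key]?.trim());
  if (missing.length > 0) {
    throw new ValidationError([...missing]);
  }

  const credentials = {} as Record<RequiredKey, string>;
  for (const key of REQUIRED_KEYS) {
    credentials[key] = env[key] as string;
  }

  // 3001 mirrors the default WebSocket port used later in this tutorial.
  return { credentials, honoWsPort: Number(env['HONO_WS_PORT'] ?? '3001') };
}
```

Calling loadAppConfig(process.env) once at startup means a missing credential stops the server before any call is placed, rather than failing in the middle of a patient conversation.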
Step 4: Build the token counter and in-memory storage adapter
The @reaatech/session-continuity package requires two implementations: a TokenCounter for estimating LLM token usage, and an IStorageAdapter for persisting sessions and messages. For this recipe you’ll use in-memory storage — suitable for development and single-server deployments.
Create src/services/token-counter.ts:
```ts
import type { TokenCounter, Message } from '@reaatech/session-continuity';
export type { TokenCounter } from '@reaatech/session-continuity';

export class SimpleTokenCounter implements TokenCounter {
  readonly model = 'gpt-4o';
  readonly tokenizer = 'simple-char-based';

  count(text: string): number {
    return Math.ceil(text.length / 4);
  }

  countMessages(messages: Message[]): number {
    return messages.reduce((acc, msg) => {
      const text = typeof msg.content === 'string' ? msg.content : '';
      return acc + this.count(text);
    }, 0);
  }

  countTextTokens(text: string): number {
    return Math.ceil(text.length / 4);
  }

  countMessageTokens(msg: { content: string }): number {
    return this.countTextTokens(msg.content);
  }
}

export function createTokenCounter(): TokenCounter {
  return new SimpleTokenCounter();
}
```
The token counter uses a 4-characters-per-token heuristic. For English medical scheduling conversations, this is a close enough approximation for budgeting and compression decisions.
Create src/services/storage-adapter.ts:
```ts
import type {
  IStorageAdapter,
  Session,
  SessionId,
  Message,
  MessageId,
  SessionFilters,
  MessageQueryOptions,
  UpdateSessionOptions,
  HealthStatus,
} from '@reaatech/session-continuity';
export type { IStorageAdapter } from '@reaatech/session-continuity';
import { ConcurrencyError } from '@reaatech/session-continuity';

export class MemoryStorageAdapter implements IStorageAdapter {
  private sessions = new Map<string, Session>();
  private messages = new Map<string, Message[]>();
  private messageSequences = new Map<string, number>();
```
Expected output: SimpleTokenCounter estimates token counts at 4 characters per token, and MemoryStorageAdapter stores sessions and messages in Map instances, with monotonically increasing message sequence numbers and optimistic concurrency via expectedVersion.
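The optimistic-concurrency idea can be sketched in isolation. The store below is a simplified illustration, not the IStorageAdapter contract: TinySessionStore, StoredSession, and VersionConflictError are hypothetical names, and the real adapter signals conflicts with the package's ConcurrencyError.

```typescript
// Hypothetical, simplified session record; the real Session type has more fields.
interface StoredSession {
  id: string;
  version: number;
  data: Record<string, unknown>;
}

// Local stand-in for the package's ConcurrencyError.
class VersionConflictError extends Error {
  constructor(expected: number, actual: number) {
    super(`Expected version ${expected}, found ${actual}`);
    this.name = 'VersionConflictError';
  }
}

class TinySessionStore {
  private sessions = new Map<string, StoredSession>();
  private sequences = new Map<string, number>();

  create(id: string, data: Record<string, unknown>): StoredSession {
    const session: StoredSession = { id, version: 1, data };
    this.sessions.set(id, session);
    return session;
  }

  // A write succeeds only if the caller saw the latest version.
  update(id: string, patch: Record<string, unknown>, expectedVersion: number): StoredSession {
    const current = this.sessions.get(id);
    if (!current) {
      throw new Error(`Unknown session: ${id}`);
    }
    if (current.version !== expectedVersion) {
      throw new VersionConflictError(expectedVersion, current.version);
    }
    const next: StoredSession = {
      id,
      version: current.version + 1,
      data: { ...current.data, ...patch },
    };
    this.sessions.set(id, next);
    return next;
  }

  // Per-session counter that only ever increases: 1, 2, 3, ...
  nextSequence(sessionId: string): number {
    const next = (this.sequences.get(sessionId) ?? 0) + 1;
    this.sequences.set(sessionId, next);
    return next;
  }
}
```

A caller that loses the race re-reads the session and retries with the fresh version, which is the situation a retry-on-conflict helper is meant to handle.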
Step 5: Create the session continuity service
Now wire the token counter and storage adapter into a SessionManager from @reaatech/session-continuity. This service manages multi-turn conversation state with token budgeting and sliding-window compression.
Expected output: createSessionManager() returns a configured SessionManager with a 4,096-token budget, sliding-window compression once a session exceeds 3,500 tokens, and a one-hour session TTL. The withRetryOnConflict helper retries conflicting writes up to 3 times with a 100 ms base delay.
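As an illustration of what sliding-window compression does with those numbers, the sketch below drops the oldest messages until the estimated total is back at or under the 3,500-token threshold. This is an assumed reading of the strategy, not the SessionManager's actual algorithm; ChatMessage, estimateTokens, and applySlidingWindow are hypothetical names.

```typescript
interface ChatMessage {
  role: 'user' | 'assistant';
  content: string;
}

// Same 4-characters-per-token heuristic as the token counter in Step 4.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function totalTokens(messages: ChatMessage[]): number {
  return messages.reduce((sum, message) => sum + estimateTokens(message.content), 0);
}

// Drop the oldest messages until the window fits under the threshold.
// The most recent message is always kept, even if it alone exceeds the threshold.
function applySlidingWindow(messages: ChatMessage[], compressAbove = 3500): ChatMessage[] {
  const kept = [...messages];
  while (kept.length > 1 && totalTokens(kept) > compressAbove) {
    kept.shift();
  }
  return kept;
}
```

For example, ten messages of 2,000 characters each estimate to 5,000 tokens; the three oldest are dropped, leaving seven messages at exactly 3,500 tokens.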
Step 6: Wire Deepgram STT and TTS providers
The audio service creates providers for speech-to-text and text-to-speech using Deepgram, plus utility functions for Twilio audio format conversion.
Create src/services/audio-service.ts:
```ts
import { DeepgramSTTProvider } from '@reaatech/voice-agent-stt';
import { TTSProviderInterface, createTTSProvider } from '@reaatech/voice-agent-tts';
import type { TTSProvider, AudioChunk } from '@reaatech/voice-agent-core';

export async function createSttProvider(apiKey: string): Promise<DeepgramSTTProvider> {
  const provider = new DeepgramSTTProvider();
  await provider.connect({
    provider: 'deepgram',
    apiKey,
    model: 'nova-2',
    language: 'en',
    sampleRate: 8000,
    encoding: 'mulaw',
    smartFormat: true,
    interimResults: true,
    endpointing: 300,
  });
  return provider;
}

export function createTtsProvider(apiKey: string): TTSProvider {
  if (!apiKey) {
    throw new Error('DEEPGRAM_API_KEY is required');
  }
  return createTTSProvider({
    provider: 'deepgram',
    config: {
      provider: 'deepgram',
      apiKey,
      voice: 'asteria',
      model: 'aura',
      encoding: 'mulaw',
      sampleRate: 8000,
    },
  });
}

export function convertAudioForTwilio(chunk: AudioChunk): AudioChunk {
  return TTSProviderInterface.formatAudioForTwilio(chunk);
}

export function createSilence(durationMs: number): AudioChunk {
  return TTSProviderInterface.createSilenceChunk(durationMs, 8000);
}

export function chunkTextForTts(text: string, maxSize?: number): string[] {
  return TTSProviderInterface.chunkTextForStreaming(text, maxSize ?? 200);
}
```
Expected output: Two provider factories plus three utility functions. createSttProvider connects to Deepgram with the Nova-2 model at 8 kHz mu-law encoding, and createTtsProvider uses the Deepgram Aura voice “asteria”. The utility functions format TTS chunks for Twilio's streaming format, generate silence, and split long text into chunks of at most 200 characters by default.
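To show roughly what the silence and text-chunking helpers have to do, here is a standalone sketch. It is not the implementation inside @reaatech/voice-agent-tts; createMulawSilence and chunkText are hypothetical names, and the sentence-boundary splitting is an assumption. It relies on two facts about the format: at 8 kHz, 8-bit mu-law audio uses one byte per sample, and the mu-law byte for zero amplitude is 0xFF.

```typescript
// In G.711 mu-law, 0xFF encodes zero amplitude (silence).
const MULAW_SILENCE_BYTE = 0xff;

// One byte per sample, so 20 ms at 8 kHz is 160 bytes.
function createMulawSilence(durationMs: number, sampleRate = 8000): Uint8Array {
  const sampleCount = Math.round((durationMs / 1000) * sampleRate);
  return new Uint8Array(sampleCount).fill(MULAW_SILENCE_BYTE);
}

// Split text at sentence boundaries so each chunk stays within maxSize characters.
// Sentences longer than maxSize are hard-split as a last resort.
function chunkText(text: string, maxSize = 200): string[] {
  if (maxSize <= 0) {
    throw new RangeError('maxSize must be positive');
  }
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) ?? [];
  const chunks: string[] = [];
  let current = '';

  for (const sentence of sentences) {
    // Flush the current chunk if adding this sentence would overflow it.
    if (current && (current + sentence).trim().length > maxSize) {
      chunks.push(current.trim());
      current = '';
    }
    current += sentence;
    while (current.trim().length > maxSize) {
      chunks.push(current.slice(0, maxSize).trim());
      current = current.slice(maxSize);
    }
  }

  if (current.trim()) {
    chunks.push(current.trim());
  }
  return chunks;
}
```

Smaller chunks let the first audio reach the caller sooner, because synthesis of the first sentence can start before the rest of the reply is processed.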
Step 7: Build the voice pipeline
The pipeline orchestrates the full STT → MCP (LLM) → TTS lifecycle with latency enforcement and event-driven observability.
Expected output: A pipeline that enforces an 800 ms target latency with per-stage budgets (STT: 200 ms, MCP: 400 ms, TTS: 200 ms) and a 1,200 ms hard cap. Seven lifecycle events are wired for observability.
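The budget arithmetic can be checked with a small sketch: the three stage budgets sum to the 800 ms target, and the hard cap leaves 400 ms of headroom. The evaluateTurn function and LatencyReport shape below are hypothetical and only illustrate the comparison; the pipeline's own enforcement may differ.

```typescript
type Stage = 'stt' | 'mcp' | 'tts';

// Values taken from the pipeline configuration described above.
const STAGE_BUDGET_MS: Record<Stage, number> = { stt: 200, mcp: 400, tts: 200 };
const TARGET_MS = 800;
const HARD_CAP_MS = 1200;

interface LatencyReport {
  totalMs: number;
  overBudgetStages: Stage[];
  withinTarget: boolean;
  exceedsHardCap: boolean;
}

// Compare one conversational turn's measured stage latencies against the budgets.
function evaluateTurn(measured: Record<Stage, number>): LatencyReport {
  const stages = Object.keys(STAGE_BUDGET_MS) as Stage[];
  const totalMs = stages.reduce((sum, stage) => sum + measured[stage], 0);
  return {
    totalMs,
    overBudgetStages: stages.filter((stage) => measured[stage] > STAGE_BUDGET_MS[stage]),
    withinTarget: totalMs <= TARGET_MS,
    exceedsHardCap: totalMs > HARD_CAP_MS,
  };
}
```

A turn measured at 150 ms (STT), 520 ms (MCP), and 180 ms (TTS) totals 850 ms: the MCP stage is over budget and the turn misses the target, but it stays under the hard cap.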
Step 8: Build the no-show recovery agent
The agent uses the Vercel AI SDK to have a natural conversation with the patient and extract structured rebooking intent. It uses generateText for dialogue and generateText with Output.object for intent extraction.
Create src/services/agent-service.ts:
```ts
import { generateText, Output } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

export const RebookingDecisionSchema = z.object({
  action: z.enum(['reschedule', 'cancel', 'leave_voicemail', 'escalate']),
  preferredDate: z.string().optional(),
  preferredTime: z.string().optional(),
  confirmed: z.boolean(),
  summary: z.string(),
});

export type RebookingDecision = z.infer<typeof RebookingDecisionSchema>;

export type AgentResponse = {
  text: string;
  toolCalls?: unknown[];
  latency: number;
};

export class NoShowRecoveryAgent {
  // Stored for reference only: the default `openai` provider reads
  // OPENAI_API_KEY from the environment rather than from this config.
  private config: { apiKey: string };

  constructor(config: { apiKey: string }) {
    this.config = config;
  }

  // Generate the agent's next spoken reply for one patient utterance.
  async processUtterance(
    transcript: string,
    sessionContext: Array<{ role: 'user' | 'assistant'; content: string }>
  ): Promise<AgentResponse> {
    const start = Date.now();
    const result = await generateText({
      model: openai('gpt-4o'),
      system:
        'You are a courteous medical scheduling assistant for an urgent care clinic. A patient missed their appointment. Your goal: (1) greet warmly, (2) confirm their name, (3) explain you are calling about their missed appointment, (4) ask if they would like to reschedule, (5) collect their preferred date/time, (6) confirm the new appointment, (7) thank them and end the call. Be empathetic - they may be unwell. Keep responses under 2 sentences.',
      messages: [
        ...sessionContext,
        { role: 'user' as const, content: transcript },
      ],
    });
    return {
      text: result.text,
      toolCalls: result.toolCalls,
      latency: Date.now() - start,
    };
  }

  // Extract a structured, schema-validated rebooking decision from a transcript.
  async extractRebookingIntent(transcript: string): Promise<RebookingDecision> {
    const result = await generateText({
      model: openai('gpt-4o'),
      output: Output.object({ schema: RebookingDecisionSchema }),
      prompt: 'Extract the patient rebooking intent: ' + transcript,
    });
    return result.output;
  }
}

export function createAgent(apiKey: string): NoShowRecoveryAgent {
  return new NoShowRecoveryAgent({ apiKey });
}
```
Expected output: An agent that converses with patients using a medical-scheduling system prompt, then extracts structured rebooking decisions (reschedule, cancel, leave voicemail, or escalate) via Zod-validated structured output.
Step 9: Build the scheduling and handoff services
The scheduling service communicates with a backend appointment system via REST calls with exponential backoff retries. The handoff router decides whether to keep the call with the AI agent or escalate to a human based on confidence thresholds.
Expected output: SchedulingService fetches available slots and books appointments with exponential backoff. HandoffRouterService routes calls with confidence >= 0.7 to the AI agent, and routes low-confidence or unintelligible calls to human escalation or voicemail.
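Both behaviors are small enough to sketch on their own. The code below is an illustration under stated assumptions, not the recipe's SchedulingService or HandoffRouterService: routeCall, backoffDelayMs, and withRetries are hypothetical names, the mapping of an empty transcript to voicemail is a guess at how "unintelligible" is handled, and the retry defaults (3 attempts, 100 ms base, 5 s ceiling) are placeholders.

```typescript
type Route = 'agent' | 'human' | 'voicemail';

// Keep the call with the AI agent only when confidence meets the threshold.
function routeCall(confidence: number, transcript: string, threshold = 0.7): Route {
  if (transcript.trim().length === 0) {
    return 'voicemail'; // assumption: nothing intelligible was heard
  }
  return confidence >= threshold ? 'agent' : 'human';
}

// Exponential backoff: the delay doubles with each attempt, up to a ceiling.
function backoffDelayMs(attempt: number, baseMs = 100, maxMs = 5000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Retry an async operation, sleeping between failed attempts.
async function withRetries<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseMs = 100
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts - 1) {
        const delay = backoffDelayMs(attempt, baseMs);
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

With a 100 ms base, the waits after the first and second failures are 100 ms and 200 ms; a production version would typically add random jitter so that many calls failing at once do not retry in lockstep.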
Step 10: Create the call orchestrator
The orchestrator ties Twilio outbound calling, the voice pipeline, and the media stream handler together into a single coordinated workflow.
Expected output: VoiceCallOrchestrator initiates outbound Twilio calls with a webhook URL, creates a session for each call, and returns a media stream handler that wires audio chunks, barge-in detection, and call-end cleanup.
Step 11: Build the Hono WebSocket server for Twilio Media Streams
Twilio sends bidirectional audio through Media Streams over WebSocket. You’ll use a standalone Hono HTTP+WebSocket server to accept these connections and relay audio between Twilio and the pipeline.
Create src/services/hono-server.ts:
```ts
import { Hono } from 'hono';
import { serve, upgradeWebSocket } from '@hono/node-server';
import type { ServerType } from '@hono/node-server';
import { WebSocketServer } from 'ws';
import type { VoiceCallOrchestrator } from './call-orchestrator.js';
import { TwilioMediaStreamHandler } from '@reaatech/voice-agent-telephony';

interface WebSocketLike {
  send(data: string): void;
  readyState?: number;
}

let activeStreamSid: string | null =
```
Expected output: A Hono server listening on the configured port (default 3001) with a /media-stream WebSocket endpoint. It handles Twilio start, media, and stop events, decoding incoming base64 audio and encoding outbound TTS audio back to Twilio’s format.
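For orientation, the message handling at the heart of that endpoint can be sketched without any server code. Twilio Media Streams sends JSON text frames whose event field is connected, start, media, or stop, with audio carried as a base64 payload. The handleTwilioMessage and encodeOutboundMedia functions and the StreamState shape are hypothetical names for this sketch and are not the TwilioMediaStreamHandler API.

```typescript
// Subset of the inbound Twilio Media Streams messages used in this sketch.
type TwilioInbound =
  | { event: 'connected' }
  | { event: 'start'; start: { streamSid: string; callSid: string } }
  | { event: 'media'; media: { payload: string } }
  | { event: 'stop' };

interface StreamState {
  streamSid: string | null;
  receivedBytes: number;
}

// Parse one WebSocket text frame from Twilio and update the stream state.
function handleTwilioMessage(
  raw: string,
  state: StreamState,
  onAudio: (audio: Buffer) => void
): void {
  const message = JSON.parse(raw) as TwilioInbound;
  switch (message.event) {
    case 'start':
      // The streamSid is needed later to address outbound audio.
      state.streamSid = message.start.streamSid;
      break;
    case 'media': {
      // Payload is base64-encoded 8 kHz mu-law audio.
      const audio = Buffer.from(message.media.payload, 'base64');
      state.receivedBytes += audio.length;
      onAudio(audio);
      break;
    }
    case 'stop':
      state.streamSid = null;
      break;
    default:
      // 'connected' needs no handling here.
      break;
  }
}

// Wrap synthesized audio in the media message Twilio expects on the same socket.
function encodeOutboundMedia(streamSid: string, audio: Buffer): string {
  return JSON.stringify({
    event: 'media',
    streamSid,
    media: { payload: audio.toString('base64') },
  });
}
```

In the real server, onAudio would forward the decoded bytes to the STT provider, and encodeOutboundMedia would be applied to each TTS chunk before it is sent back over the WebSocket.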
Step 12: Create the API routes and instrumentation
Create the Twilio webhook route that receives inbound voice calls and returns TwiML to start a Media Stream. Also create a health-status endpoint.
Create app/api/twilio/route.ts:
```ts
import { NextRequest, NextResponse } from 'next/server';
import twilio from 'twilio';

let twilioClient: ReturnType<typeof twilio> | null = null;

// Lazily create a single shared Twilio REST client.
export function getTwilioClient(): ReturnType<typeof twilio> {
  if (!twilioClient) {
    twilioClient = twilio(process.env.TWILIO_ACCOUNT_SID, process.env.TWILIO_AUTH_TOKEN);
  }
  return twilioClient;
}

export async function POST(req: NextRequest): Promise<NextResponse> {
  const formData = await req.formData();
  const callSid = formData.get('CallSid') as string | null;
  if (!callSid) {
    return NextResponse.json({ error: 'Missing CallSid' }, { status: 400 });
  }

  const fromNumber = formData.get('From') as string | null;
  const toNumber = formData.get('To') as string | null;
  const callStatus = formData.get('CallStatus') as string | null;

  // While the call is ringing, answer with TwiML that opens a Media Stream.
  if (callStatus === 'ringing') {
    const baseUrl = process.env.VOICE_AGENT_BASE_URL ?? 'http://localhost:3000';
    const wsPort = process.env.HONO_WS_PORT ?? '3001';
    const host = new URL(baseUrl).hostname;
    const streamUrl = `wss://${host}:${wsPort}/media-stream`;
    const twiml = `<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Say voice="alice">Hello, this is the clinic calling about your missed appointment.</Say>
  <Connect>
    <Stream url="${streamUrl}"/>
  </Connect>
</Response>`;
    return new NextResponse(twiml, {
      headers: { 'Content-Type': 'text/xml' },
    });
  }

  // Terminal statuses are acknowledged; the `void` expressions mark the
  // values as intentionally unused.
  if (
    callStatus === 'completed' ||
    callStatus === 'no-answer' ||
    callStatus === 'busy' ||
    callStatus === 'failed'
  ) {
    void callSid;
    void fromNumber;
    void toNumber;
    return NextResponse.json({ ok: true, callStatus, callSid });
  }

  return NextResponse.json({ ok: true });
}
```
Create app/api/status/route.ts:
```ts
import { NextResponse } from 'next/server';

export function GET(): NextResponse {
  return NextResponse.json({
    status: 'ok',
    timestamp: new Date().toISOString(),
    version: '0.1.0',
    activeCalls: 0,
  });
}
```
Now create the instrumentation module that initializes Langfuse telemetry and starts the Hono server when the app boots in the Node.js runtime. Stub functions provide no-op implementations for providers and pipeline that get replaced at call time by the actual media stream handler.
Expected output: POST /api/twilio handles Twilio callbacks: it returns TwiML with a <Stream> URL for ringing calls and acknowledges status callbacks for completed, no-answer, busy, and failed calls. GET /api/status returns a health-check JSON response. The instrumentation hooks into Next.js server startup to initialize Langfuse telemetry and the Hono WebSocket server.
Step 13: Run the test suite
The project includes a comprehensive test suite with 17 test files covering every service, route, and boundary case. Run the tests with coverage:
```bash
pnpm vitest run --coverage --reporter=json --outputFile=vitest-report.json
```
Expected output: All 171 tests pass (numFailedTests: 0), and coverage reports show >= 90% on lines, branches, functions, and statements across the runtime code in src/ and app/api/.
To start the full system in development mode:
```bash
pnpm dev
```
This starts Next.js on port 3000 (serving the webhook routes) and the Hono WebSocket server on port 3001 (handling Twilio Media Streams). Configure your Twilio phone number’s voice webhook to point at https://your-domain/api/twilio, and the no-show recovery voice agent is ready to call.
Next steps
Add a human-escalation endpoint — When the handoff router decides type: "fallback", forward the call transcript to a WebSocket dashboard for a human receptionist to take over.
Replace the in-memory storage with PostgreSQL — Implement the IStorageAdapter interface using Prisma or Drizzle to persist sessions and messages across server restarts.
Add a daily batch scheduler — Write a cron job that queries the clinic’s EHR system for no-shows each morning and calls initiateRecoveryCall for each patient automatically.
Integrate DTMF input — Use @reaatech/voice-agent-core’s built-in DTMF input support to let patients press 1 to reschedule or 2 to speak to a human without speech recognition.
Deploy to production — Deploy the Next.js app to Vercel or a Node.js host, and the Hono server to a VM or container with a public WebSocket endpoint. Add Twilio Elastic SIP Trunking for carrier-grade voice quality.
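As a starting point for the daily batch idea above, the sketch below works through a list of no-shows with a small concurrency limit and records which calls failed to start. Everything here is hypothetical: the NoShow shape, the runDailyBatch function, and the assumed signature of initiateRecoveryCall (passed in as a parameter so the sketch does not depend on the orchestrator's real API).

```typescript
// Hypothetical record returned by an EHR query for missed appointments.
interface NoShow {
  patientId: string;
  phoneNumber: string;
}

// Assumed signature; the orchestrator's real method may differ.
type InitiateCall = (noShow: NoShow) => Promise<void>;

interface BatchResult {
  succeeded: string[];
  failed: string[];
}

// Place calls with at most `concurrency` in flight at a time.
async function runDailyBatch(
  noShows: NoShow[],
  initiateRecoveryCall: InitiateCall,
  concurrency = 2
): Promise<BatchResult> {
  const result: BatchResult = { succeeded: [], failed: [] };
  const queue = [...noShows];

  // Each worker pulls the next patient until the queue is empty.
  async function worker(): Promise<void> {
    for (let next = queue.shift(); next !== undefined; next = queue.shift()) {
      try {
        await initiateRecoveryCall(next);
        result.succeeded.push(next.patientId);
      } catch {
        // One failed call must not stop the rest of the batch.
        result.failed.push(next.patientId);
      }
    }
  }

  const workerCount = Math.max(1, concurrency);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return result;
}
```

A cron job could call runDailyBatch each morning with the day's no-shows and log the failed list for the front desk to follow up manually.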