This tutorial walks you through building a voice AI receptionist that answers after-hours phone calls, understands natural speech, responds intelligently using Mistral AI, and books appointments on Google Calendar. You’ll wire up Twilio telephony, Deepgram speech-to-text, ElevenLabs text-to-speech, session continuity with agent memory, and observability via Langfuse — orchestrated by a Fastify server with Next.js App Router webhook endpoints.
Prerequisites
Node.js 22+ and pnpm 10 installed
A Twilio account with a phone number that has Media Streams enabled
Deepgram API key (Nova-2 model)
ElevenLabs API key (Turbo v2.5 model)
Mistral AI API key
Google Cloud service account with Calendar API enabled (for appointment booking)
OpenAI API key (optional — used by the agent memory module for embeddings and fact extraction)
Langfuse account (optional — for observability)
Familiarity with TypeScript and async/await patterns
Step 1: Set up the project scaffold
Create the project directory and initialize a pnpm workspace. A Fastify server handles the voice agent runtime, while Next.js App Router routes serve as Twilio webhook endpoints.
Create next-env.d.ts so the TypeScript includes resolve:
ts
/// <reference types="next" />
/// <reference types="next/image-types/global" />
import "./.next/types/routes.d.ts";

// NOTE: This file should not be edited
// see https://nextjs.org/docs/app/api-reference/config/typescript for more information.
Expected output: Three config files plus a Next.js types reference that the TypeScript compiler and test runner can load without errors.
Step 3: Define core domain types
Create src/types.ts with interfaces for the call session, appointment requests, FAQs, business hours, and calendar slots:
Expected output: A types file that makes the rest of the code self-documenting. Every subsequent module imports from here.
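The recipe does not reproduce src/types.ts here. The sketch below shows what such a file could look like. TimeSlot, CalendarEvent, and AppointmentRequest are names the calendar module imports. Every field, plus the CallSession, Faq, and BusinessHours shapes and the isAppointmentRequest guard, is a hypothetical illustration.

```typescript
// Hypothetical sketch of src/types.ts. Only the names TimeSlot, CalendarEvent,
// and AppointmentRequest come from the recipe; all fields are assumptions.

export interface TimeSlot {
  start: Date;
  end: Date;
}

export interface CalendarEvent {
  id: string;
  summary: string;
  start: Date;
  end: Date;
}

export interface AppointmentRequest {
  callerName: string;
  callerPhone: string;
  start: Date;
  durationMinutes: number;
  reason?: string;
}

// Hypothetical shapes for the other concepts this step lists.
export interface CallSession {
  sessionId: string;
  callSid: string;
  callerNumber: string;
  startedAt: Date;
}

export interface Faq {
  question: string;
  answer: string;
}

export interface BusinessHours {
  dayOfWeek: number; // 0 = Sunday ... 6 = Saturday (Date#getDay convention)
  open: string; // "HH:MM", 24-hour clock
  close: string; // "HH:MM", 24-hour clock
}

// Runtime guard for requests that arrive as untyped data (e.g. parsed from an LLM reply).
export function isAppointmentRequest(value: unknown): value is AppointmentRequest {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.callerName === "string" &&
    typeof v.callerPhone === "string" &&
    v.start instanceof Date &&
    typeof v.durationMinutes === "number" &&
    v.durationMinutes > 0
  );
}
```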
Step 4: Create typed configuration from environment variables
Create src/config.ts — a single entry point that reads all process.env values and returns a typed AppConfig object. Missing critical variables throw immediately so the server fails fast at startup.
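A minimal sketch of the fail-fast pattern follows. It assumes a hypothetical loadConfig(env) that takes the environment as a parameter, so it can be tested without touching process.env. The required set mirrors the four variables the recipe lists as required. The field names (other than elevenlabsApiKey, fastifyPort, and googleCalendarCredentialsPath, which the recipe's code uses) and the 3000/3600 defaults are assumptions.

```typescript
// Hypothetical fail-fast config loader in the spirit of src/config.ts.
export interface AppConfig {
  twilioAccountSid: string;
  twilioAuthToken: string;
  deepgramApiKey: string;
  elevenlabsApiKey: string;
  mistralApiKey: string;
  googleCalendarCredentialsPath: string;
  fastifyPort: number;
  sessionTtlSeconds: number;
}

type Env = Record<string, string | undefined>;

// Missing or empty required variables throw immediately.
function required(env: Env, name: string): string {
  const value = env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Empty or non-numeric values fall back to the default.
function numberOr(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (!raw) return fallback;
  const parsed = Number(raw);
  return Number.isFinite(parsed) ? parsed : fallback;
}

export function loadConfig(env: Env): AppConfig {
  return {
    twilioAccountSid: required(env, "TWILIO_ACCOUNT_SID"),
    twilioAuthToken: env.TWILIO_AUTH_TOKEN ?? "",
    deepgramApiKey: required(env, "DEEPGRAM_API_KEY"),
    elevenlabsApiKey: required(env, "ELEVENLABS_API_KEY"),
    mistralApiKey: required(env, "MISTRAL_API_KEY"),
    googleCalendarCredentialsPath: env.GOOGLE_CALENDAR_CREDENTIALS ?? "",
    fastifyPort: numberOr(env, "FASTIFY_PORT", 3000),
    sessionTtlSeconds: numberOr(env, "SESSION_TTL_SECONDS", 3600),
  };
}
```

In the server, this would be called once at startup, for example loadConfig(process.env), so a misconfigured deployment fails before accepting any calls.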
Expected output: Three constructible error classes. new CalendarError("msg") instanceof VoiceAgentError evaluates to true.
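A sketch of an error hierarchy that satisfies that instanceof check is shown below. VoiceAgentError and CalendarError are names used in the recipe. TelephonyError is a hypothetical third class. The Object.setPrototypeOf call is a common safeguard so instanceof keeps working when TypeScript compiles subclasses of Error down to ES5.

```typescript
// Hypothetical error hierarchy; only VoiceAgentError and CalendarError come from the recipe.
export class VoiceAgentError extends Error {
  constructor(message: string) {
    super(message);
    // Report the concrete subclass name (e.g. "CalendarError") in logs and stack traces.
    this.name = new.target.name;
    // Restore the prototype chain so instanceof works even with an ES5 compile target.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

export class CalendarError extends VoiceAgentError {}

// Hypothetical third class for telephony failures.
export class TelephonyError extends VoiceAgentError {}
```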
Step 6: Set up session continuity
Create src/session-store.ts — this module wraps @reaatech/session-continuity’s SessionManager to track each phone call’s conversation state, apply token budgets, and automatically trigger agent memory extraction when new messages arrive.
Expected output: A sessionManager singleton that creates sessions, tracks messages up to a 4096-token budget using a sliding-window compression strategy, and triggers memory extraction after each pair of messages.
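To make "sliding-window compression under a token budget" concrete, here is an illustrative version of the idea. It is not @reaatech/session-continuity's implementation: the slidingWindow and estimateTokens names are hypothetical, and the 4-characters-per-token estimate is a rough assumption standing in for a real tokenizer.

```typescript
// Illustrative sliding-window trim; not the session-continuity library's code.
export interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Crude heuristic: roughly 4 characters per token for English text.
export function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Always keep system messages (moved to the front), then keep the newest
// turns that still fit within the budget.
export function slidingWindow(messages: ChatMessage[], budget = 4096): ChatMessage[] {
  const system = messages.filter((m) => m.role === "system");
  let used = system.reduce((sum, m) => sum + estimateTokens(m.content), 0);
  const kept: ChatMessage[] = [];
  // Walk from newest to oldest and stop at the first message that would overflow.
  for (let i = messages.length - 1; i >= 0; i--) {
    const m = messages[i];
    if (m.role === "system") continue;
    const cost = estimateTokens(m.content);
    if (used + cost > budget) break;
    used += cost;
    kept.unshift(m);
  }
  return [...system, ...kept];
}
```

Dropping from the oldest end keeps the most recent exchange intact, which is usually what a caller's next utterance refers to.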
Step 7: Wire up agent memory
Create src/agent-memory.ts — this module uses @reaatech/agent-memory to extract facts and preferences from conversations and retrieve them as context for future calls. The memory module uses OpenAI for both embeddings and extraction, so set OPENAI_API_KEY in your environment.
Expected output: An agentMemory singleton that can extract facts and preferences from conversations and retrieve relevant memories to inject into the Mistral system prompt.
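One way retrieved memories could be folded into the memoryContext string that buildSystemPrompt receives is sketched below. The Memory shape, the score threshold, and formatMemoryContext are assumptions for illustration, not @reaatech/agent-memory's API.

```typescript
// Hypothetical memory shape; the real library's result type may differ.
export interface Memory {
  kind: "fact" | "preference";
  text: string;
  score: number; // retrieval similarity, higher means more relevant
}

// Keep the most relevant memories and render them as a compact bullet list.
export function formatMemoryContext(memories: Memory[], minScore = 0.5, limit = 5): string {
  return memories
    .filter((m) => m.score >= minScore) // filter() copies, so sort() won't mutate the input
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((m) => `- (${m.kind}) ${m.text}`)
    .join("\n");
}
```

An empty result is an empty string, which buildSystemPrompt treats as "no context" and omits entirely.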
Step 8: Connect Deepgram speech-to-text
Create src/stt-provider.ts — wraps @reaatech/voice-agent-stt’s DeepgramSTTProvider with connectivity, streaming, and event wiring.
Expected output: After connectSTT(config), audio chunks streamed via streamAudio(chunk) produce final and interim transcripts through the Deepgram Nova-2 model at 8kHz mu-law encoding.
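Audio reaches streamAudio from Twilio as JSON "media" messages whose payload field is base64-encoded 8 kHz mu-law. The sketch below shows that unpacking step with a dependency-free base64 decoder (in Node, Buffer.from(payload, "base64") does the same). The helper names are hypothetical. The duration math follows from mu-law using one byte per sample: 8000 samples per second is 8 bytes per millisecond.

```typescript
// Sketch: unpack a Twilio Media Streams "media" message into raw mu-law bytes.
interface TwilioStreamMessage {
  event?: string;
  media?: { track?: string; payload?: string };
}

const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Minimal base64 decoder: accumulate 6 bits per character, emit a byte per 8 bits.
function base64ToBytes(b64: string): Uint8Array {
  const clean = b64.replace(/=+$/, "");
  const out = new Uint8Array(Math.floor((clean.length * 3) / 4));
  let buffer = 0;
  let bits = 0;
  let idx = 0;
  for (const ch of clean) {
    const v = B64.indexOf(ch);
    if (v < 0) throw new Error(`Invalid base64 character: ${ch}`);
    buffer = ((buffer << 6) | v) & 0xffffff; // keep only the low bits we still need
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out[idx++] = (buffer >> bits) & 0xff;
    }
  }
  return out;
}

export function decodeMediaPayload(raw: string): Uint8Array | null {
  const msg = JSON.parse(raw) as TwilioStreamMessage;
  // Only "media" events carry audio; "connected", "start", "stop", etc. do not.
  if (msg.event !== "media" || !msg.media?.payload) return null;
  return base64ToBytes(msg.media.payload);
}

// 8 kHz mu-law is one byte per sample, i.e. 8 bytes per millisecond of audio.
export function durationMs(bytes: Uint8Array): number {
  return bytes.length / 8;
}
```

For example, a 160-byte chunk holds 20 ms of audio.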
Step 9: Connect ElevenLabs text-to-speech
Create src/tts-provider.ts — wraps @reaatech/voice-agent-tts’s ElevenLabsTTSProvider to synthesize speech and stream it in a format Twilio can play.
ts
import { ElevenLabsTTSProvider, TTSProviderInterface } from "@reaatech/voice-agent-tts";
import type { AudioChunk } from "@reaatech/voice-agent-core";
import type { AppConfig } from "./config.js";

export const ttsProvider = new ElevenLabsTTSProvider();

export async function speak(
  text: string,
  sendAudio: (chunk: AudioChunk) => void,
  config: AppConfig,
): Promise<void> {
  // Split the reply into sentence-sized pieces so playback can start early.
  const sentences = TTSProviderInterface.chunkTextForStreaming(text, 200);
  for (let i = 0; i < sentences.length; i++) {
    // Short silence before the first sentence, a slightly longer pause before later ones.
    const silence = TTSProviderInterface.createSilenceChunk(i === 0 ? 200 : 300);
    sendAudio(silence);
    for await (const chunk of ttsProvider.synthesize(sentences[i], {
      provider: "elevenlabs",
      modelId: "eleven_turbo_v2_5",
      apiKey: config.elevenlabsApiKey,
    })) {
      // Convert each synthesized chunk into the format Twilio plays back.
      sendAudio(TTSProviderInterface.formatAudioForTwilio(chunk));
    }
  }
}

export function cancelSpeech(): void {
  ttsProvider.cancel();
}
Expected output: Calling speak("Hello", sendAudio, config) splits the text into sentence-sized chunks, sends a short silence chunk before each one (a slightly longer pause before every chunk after the first), synthesizes each chunk through ElevenLabs Turbo v2.5, and streams the Twilio-formatted audio through the provided callback.
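To show what the two helpers speak() relies on plausibly do, here is an illustrative re-implementation. It is an assumption about their behavior, not the code of @reaatech/voice-agent-tts. chunkText splits at sentence punctuation and breaks any over-long sentence at the last space that fits. muLawSilence relies on the fact that in G.711 mu-law the byte 0xFF encodes a zero sample, so 8 kHz silence is 8 bytes of 0xFF per millisecond.

```typescript
// Illustrative helpers; names and behavior are assumptions, not the library's API.

// One chunk per sentence; sentences longer than maxLen are split at word boundaries.
export function chunkText(text: string, maxLen: number): string[] {
  if (maxLen < 1) throw new Error("maxLen must be at least 1");
  const sentences = (text.match(/[^.!?]+[.!?]*/g) ?? [])
    .map((s) => s.trim())
    .filter(Boolean);
  const chunks: string[] = [];
  for (let s of sentences) {
    while (s.length > maxLen) {
      const cut = s.lastIndexOf(" ", maxLen);
      const at = cut > 0 ? cut : maxLen; // no space found: hard split
      chunks.push(s.slice(0, at).trim());
      s = s.slice(at).trim();
    }
    if (s) chunks.push(s);
  }
  return chunks;
}

// 8 kHz mu-law silence: 8 samples per millisecond, each encoded as 0xFF.
export function muLawSilence(ms: number): Uint8Array {
  return new Uint8Array(Math.round(ms * 8)).fill(0xff);
}
```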
Step 10: Integrate Mistral AI for conversation
Create src/mistral-chat.ts — this module talks to the Mistral API to generate responses, builds the system prompt from the current memory context, and classifies the caller’s intent.
ts
import { Mistral } from "@mistralai/mistralai";
import { sessionManager } from "./session-store.js";

export const mistral = new Mistral({
  apiKey: process.env.MISTRAL_API_KEY ?? "",
});

export async function generateResponse(
  conversationText: string,
  sessionId: string,
  memoryContext: string,
): Promise<string> {
  try {
    // Prior turns for this call, mapped into Mistral's chat message shape.
    const context = await sessionManager.getConversationContext(sessionId);
    const history = context.map((msg) => ({
      role: msg.role as "user" | "assistant" | "system",
      content: typeof msg.content === "string" ? msg.content : "",
    }));

    const result = await mistral.chat.complete({
      model: "mistral-large-latest",
      messages: [
        { role: "system", content: buildSystemPrompt(memoryContext) },
        ...history,
        { role: "user", content: conversationText },
      ],
    });

    // Guard against an empty choices array and non-string (chunked) content.
    const content = result.choices?.[0]?.message?.content;
    if (typeof content === "string" && content) {
      return content;
    }
    return "I'm sorry, I couldn't process that.";
  } catch (error) {
    console.error("Mistral error:", error);
    return "I'm having trouble connecting to our AI service. Please try again shortly.";
  }
}

export function buildSystemPrompt(memoryContext: string): string {
  // Each trailing backslash is a line continuation inside the template literal.
  return `You are a helpful after-hours receptionist for a business. \
Answer FAQs about business hours and services. \
Book appointments via the calendar tool. \
Escalate urgent requests to a human agent. \
Keep responses concise and friendly. \
${memoryContext ? `\nRelevant context: ${memoryContext}` : ""}`;
}

export function parseIntent(
  response: string,
): "faq" | "appointment" | "escalation" | "unknown" {
  const lower = response.toLowerCase();
  if (lower.includes("book") || lower.includes("appointment") || lower.includes("schedule")) {
    return "appointment";
  }
  if (
    lower.includes("hours") ||
    lower.includes("open") ||
    lower.includes("service") ||
    lower.includes("faq")
  ) {
    return "faq";
  }
  if (lower.includes("emergency") || lower.includes("urgent") || lower.includes("escalate")) {
    return "escalation";
  }
  return "unknown";
}
Expected output: generateResponse("What are your hours?", "sess-1", "") calls Mistral's chat completion API and returns the assistant's text. On API failure it returns a polite fallback instead of crashing.
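parseIntent checks its keyword groups in a fixed order with substring matching. As a result, "This is urgent, I need an appointment" classifies as "appointment", and "open" also matches "reopened". The hypothetical classifyIntent below is one possible variant, not part of the recipe. It reuses parseIntent's keywords but checks escalation first and matches whole words only.

```typescript
// Hypothetical variant of parseIntent: escalation wins, and matching is on whole words.
type Intent = "faq" | "appointment" | "escalation" | "unknown";

// Rules are checked in order; the first group with a matching keyword wins.
const RULES: Array<[Intent, string[]]> = [
  ["escalation", ["emergency", "urgent", "escalate"]],
  ["appointment", ["book", "booking", "appointment", "schedule"]],
  ["faq", ["hours", "open", "service", "services", "faq"]],
];

export function classifyIntent(text: string): Intent {
  // Lowercase and split into alphabetic words so "reopened" does not match "open".
  const words = new Set(text.toLowerCase().match(/[a-z]+/g) ?? []);
  for (const [intent, keywords] of RULES) {
    if (keywords.some((k) => words.has(k))) return intent;
  }
  return "unknown";
}
```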
Step 11: Build the Google Calendar integration
Create src/lib/calendar.ts — this module authenticates with Google Calendar and exposes three functions: availability checking, appointment booking, and listing upcoming events.
ts
import { google } from "googleapis";
import { CalendarError } from "../errors.js";
import type { AppConfig } from "../config.js";
import type { TimeSlot, CalendarEvent, AppointmentRequest } from "../types.js";

let cachedConfig: AppConfig | null = null;

export function createCalendarClient(config: AppConfig) {
  cachedConfig = config;
  // Authenticate with the service-account key file referenced in the config.
  const auth = new google.auth.GoogleAuth({
    keyFile: config.googleCalendarCredentialsPath,
    scopes: ["https://www.googleapis.com/auth/calendar"],
  });
  // ... (the rest of this module is not shown in this excerpt)
Expected output: checkAvailability("primary", new Date("2026-01-01T09:00:00Z"), new Date("2026-01-01T17:00:00Z")) returns an array of free time slots, excluding periods the Google Calendar API reports as busy.
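The interval arithmetic behind "free slots, excluding busy periods" can be sketched independently of the API call. The sketch assumes the busy list has the shape returned by the Calendar API's freebusy query (calendars[calendarId].busy, with ISO start/end strings). The freeSlots name and the 30-minute default slot length are hypothetical.

```typescript
// Sketch: subtract busy intervals from a window and emit fixed-length free slots.
export interface TimeSlot {
  start: Date;
  end: Date;
}

export function freeSlots(
  windowStart: Date,
  windowEnd: Date,
  busy: Array<{ start: string; end: string }>,
  slotMinutes = 30,
): TimeSlot[] {
  const slotMs = slotMinutes * 60_000;
  // Sort busy periods by start time so a single forward pass is enough.
  const sorted = busy
    .map((b) => ({ start: new Date(b.start).getTime(), end: new Date(b.end).getTime() }))
    .sort((a, b) => a.start - b.start);

  const slots: TimeSlot[] = [];
  const windowEndMs = windowEnd.getTime();
  let cursor = windowStart.getTime();

  // Emit whole slots from the cursor up to `until`.
  const fill = (until: number) => {
    while (cursor + slotMs <= until) {
      slots.push({ start: new Date(cursor), end: new Date(cursor + slotMs) });
      cursor += slotMs;
    }
  };

  for (const b of sorted) {
    if (b.end <= cursor) continue; // busy period already behind the cursor
    fill(Math.min(b.start, windowEndMs));
    cursor = Math.max(cursor, b.end); // jump past the busy period
    if (cursor >= windowEndMs) break;
  }
  fill(windowEndMs);
  return slots;
}
```

With a 09:00 to 12:00 window and one busy block from 10:00 to 10:45, this yields 09:00, 09:30, 10:45, and 11:15 as slot starts. The 11:45 slot is dropped because it would run past the window.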
Step 12: Set up observability with Langfuse
Create src/observability.ts — wraps Langfuse tracing so every call and Mistral interaction is logged for debugging and analytics.
Expected output: createObservability(config) creates a Langfuse client. createCallTrace("CA123") returns a trace object that can record Mistral calls and custom events. Failures are caught and logged rather than thrown, so observability issues never crash the call flow.
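The "log, don't throw" behavior can be captured in a small wrapper around each tracing call. safeTrace and safeTraceAsync are hypothetical helpers, not part of the Langfuse SDK. You would wrap whichever Langfuse call records the trace or event.

```typescript
// Sketch: run a tracing call; if it fails, log and carry on instead of throwing.
export function safeTrace<T>(label: string, fn: () => T): T | undefined {
  try {
    return fn();
  } catch (error) {
    // Swallow after logging so the live phone call keeps going.
    console.error(`[observability] ${label} failed:`, error);
    return undefined;
  }
}

// Same idea for promise-returning calls such as flushing events.
export async function safeTraceAsync<T>(
  label: string,
  fn: () => Promise<T>,
): Promise<T | undefined> {
  try {
    return await fn();
  } catch (error) {
    console.error(`[observability] ${label} failed:`, error);
    return undefined;
  }
}
```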
Step 13: Create the Twilio webhook handler
Create src/api/twilio-webhook.ts — this module handles incoming Twilio phone calls by creating a session, generating TwiML to greet the caller and connect a media stream, and creating the Twilio handler for bidirectional audio.
ts
import { createTwilioHandler } from "@reaatech/voice-agent-telephony";
import twilio from "twilio";
import { sessionManager } from "../session-store.js";
import type { AppConfig } from "../config.js";

export async function handleIncomingCall(
  body: Record<string, unknown>,
  config: AppConfig,
): Promise<{ status: number; body: string; headers?: Record<string, string> }> {
  try {
    const callSid = body.CallSid as string | undefined;
    const from = body.From as string | undefined;
    if (!callSid) {
      return { status: 400, body: "<Response><Say>Error: Missing CallSid</Say></Response>" };
    }
    await sessionManager.createSession({ userId: from ?? "unknown" });

    const response = new twilio.twiml.VoiceResponse();
    response.say("Welcome to after-hours support. How can I help?");
    // <Connect><Stream> opens a bidirectional stream and keeps the call on it.
    // <Start><Stream> only forks inbound audio, and the call would end once the TwiML runs out.
    response
      .connect()
      .stream({ url: `wss://localhost:${String(config.fastifyPort)}/twilio/media-stream` });
    return { status: 200, body: response.toString(), headers: { "Content-Type": "text/xml" } };
  } catch (error) {
    console.error("handleIncomingCall error:", error);
    return { status: 500, body: JSON.stringify({ error: "Failed to create session" }) };
  }
}

export function createMediaStreamHandler() {
  return createTwilioHandler({
    bargeInEnabled: true,
    minSpeechDuration: 300,
  });
}
Expected output: A POST to /twilio/incoming-call with {CallSid: "CA123", From: "+155****4567"} returns a TwiML <Response> that greets the caller and connects a WebSocket media stream.
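For reference, the greeting-plus-stream response has roughly the structure below. It is built by hand here only to make the XML visible; twilio's VoiceResponse produces equivalent markup. buildGreetingTwiml is a hypothetical helper. <Connect><Stream> is the bidirectional form, whereas <Start><Stream> forks audio one way.

```typescript
// Sketch: hand-built TwiML for a greeting followed by a bidirectional media stream.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

export function buildGreetingTwiml(greeting: string, streamUrl: string): string {
  return (
    `<?xml version="1.0" encoding="UTF-8"?>` +
    `<Response>` +
    `<Say>${escapeXml(greeting)}</Say>` +
    `<Connect><Stream url="${escapeXml(streamUrl)}"/></Connect>` +
    `</Response>`
  );
}
```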
Step 14: Orchestrate the voice pipeline
Create src/pipeline.ts — this module assembles the voice-agent-core pipeline: a latency-bounded loop that processes audio through STT, calls Mistral via an MCP client adapter, and returns the TTS response.
Expected output: createVoicePipeline(sttProvider, ttsProvider) assembles the full audio->text->LLM->speech pipeline with an 800ms latency target and event handlers for turn completion and errors.
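An 800 ms target is easier to reason about when each stage's share of a turn is measured. The sketch below is an illustrative per-turn latency tracker. TurnTimer and the stage names are hypothetical, and voice-agent-core's own latency handling may work differently.

```typescript
// Illustrative per-turn latency budget tracker (not voice-agent-core's API).
type Stage = "stt" | "llm" | "tts";

export class TurnTimer {
  private readonly durations = new Map<Stage, number>();
  private readonly budgetMs: number;

  constructor(budgetMs = 800) {
    this.budgetMs = budgetMs;
  }

  // Add time spent in a stage (e.g. a performance.now() delta).
  record(stage: Stage, ms: number): void {
    this.durations.set(stage, (this.durations.get(stage) ?? 0) + ms);
  }

  total(): number {
    let sum = 0;
    this.durations.forEach((ms) => {
      sum += ms;
    });
    return sum;
  }

  overBudget(): boolean {
    return this.total() > this.budgetMs;
  }

  // The stage that consumed the most time, or undefined if nothing was recorded.
  slowestStage(): Stage | undefined {
    let slowest: Stage | undefined;
    let max = -Infinity;
    this.durations.forEach((ms, stage) => {
      if (ms > max) {
        max = ms;
        slowest = stage;
      }
    });
    return slowest;
  }
}
```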
Step 15: Wire everything in the Fastify server entrypoint
Create src/index.ts — this is the Fastify server that ties every module together: registers the Twilio webhook routes, starts the WebSocket media stream handler, and handles graceful shutdown.
ts
import Fastify, { type FastifyRequest, type FastifyReply, type FastifyInstance } from "fastify";
import fastifyWebsocket from "@fastify/websocket";
import type { WebSocket } from "ws";
import { getConfig } from "./config.js";
import { handleIncomingCall, createMediaStreamHandler } from "./api/twilio-webhook.js";
import { sttProvider, connectSTT, streamAudio, closeSTT } from "./stt-provider.js";
import { ttsProvider, cancelSpeech } from "./tts-provider.js";
import { sessionManager } from "./session-store.js";
import { agentMemory } from "./agent-memory.js";
import { createVoicePipeline, endSession, destroyPipeline }
// ... (the rest of src/index.ts is truncated in this excerpt)
Expected output: Running pnpm dev:server boots the Fastify server on the configured port. A Twilio webhook to POST /twilio/incoming-call creates a session, returns greeting TwiML, and prepares a WebSocket media stream endpoint at /twilio/media-stream.
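Inside the /twilio/media-stream WebSocket handler, each incoming frame is a JSON message from Twilio. The sketch below shows one way to dispatch those messages and to build the two outbound frames a voice agent needs: "media" to play audio and "clear" to drop buffered audio on barge-in. The message and field names follow Twilio's Media Streams protocol. The StreamHandlers interface and the function names are hypothetical.

```typescript
// Sketch: route Twilio Media Streams messages and build outbound frames.
export interface StreamHandlers {
  onStart(streamSid: string, callSid: string): void;
  onMedia(base64Payload: string): void;
  onDtmf(digit: string): void;
  onStop(): void;
}

export function routeTwilioMessage(raw: string, h: StreamHandlers): void {
  const msg = JSON.parse(raw);
  switch (msg.event) {
    case "start":
      h.onStart(msg.start.streamSid, msg.start.callSid);
      break;
    case "media":
      h.onMedia(msg.media.payload); // base64 mu-law audio
      break;
    case "dtmf":
      h.onDtmf(msg.dtmf.digit);
      break;
    case "stop":
      h.onStop();
      break;
    default:
      // "connected", "mark", and unknown events are ignored in this sketch.
      break;
  }
}

// Outbound audio frame: base64 8 kHz mu-law addressed to the stream.
export function outboundMedia(streamSid: string, base64Payload: string): string {
  return JSON.stringify({ event: "media", streamSid, media: { payload: base64Payload } });
}

// Barge-in: ask Twilio to discard audio still queued for playback.
export function clearPlayback(streamSid: string): string {
  return JSON.stringify({ event: "clear", streamSid });
}
```

Pairing clearPlayback with cancelSpeech() is one way to make barge-in feel immediate: the agent stops synthesizing and Twilio stops playing what was already sent.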
While the Fastify server handles the real-time voice pipeline, the recipe also provides Next.js App Router routes as an alternative entry point for Twilio webhooks. These routes parse Twilio’s form-encoded POST body and delegate to the same handleIncomingCall function.
Create app/api/twilio/incoming-call/route.ts:
ts
import { NextRequest, NextResponse } from "next/server";
import { handleIncomingCall } from "../../../../src/api/twilio-webhook.js";
import { getConfig } from "../../../../src/config.js";

export async function POST(req: NextRequest) {
  const body: Record<string, unknown> = {};
  const formData = await req.formData();
  for (const [key, value] of formData.entries()) {
    body[key] = value;
  }
  const config = getConfig();
  const result = await handleIncomingCall(body, config);
  return new NextResponse(result.body, {
    status: result.status,
    headers: result.headers ?? { "Content-Type": "text/xml" },
  });
}
Create app/api/twilio/status-callback/route.ts:
ts
import { NextRequest, NextResponse } from "next/server";

export async function POST(req: NextRequest) {
  try {
    const formData = await req.formData();
    const body: Record<string, unknown> = {};
    for (const [key, value] of formData.entries()) {
      body[key] = value;
    }
    console.log("Call status update:", body);
    return NextResponse.json({ status: "ok" });
  } catch (error) {
    console.error("Status callback error:", error);
    return NextResponse.json(
      { error: "Failed to process status callback" },
      { status: 500 },
    );
  }
}
Expected output: A POST to /api/twilio/incoming-call with Twilio’s form-encoded data returns a TwiML XML response. A POST to /api/twilio/status-callback returns {"status":"ok"}.
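The status-callback route currently only logs. A natural extension is to end the session once Twilio reports that the call is over. The helper below is a hypothetical sketch of that check. The status strings are Twilio's terminal CallStatus values; the helper name and its use for session cleanup are assumptions.

```typescript
// Sketch: decide whether a status callback body describes a finished call.
const TERMINAL_STATUSES = new Set(["completed", "busy", "failed", "no-answer", "canceled"]);

export function isCallFinished(body: Record<string, unknown>): boolean {
  const status = body.CallStatus;
  // "queued", "ringing", and "in-progress" mean the call is still live.
  return typeof status === "string" && TERMINAL_STATUSES.has(status);
}
```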
Step 17: Configure environment variables
Create .env.example with placeholder entries for every secret the server reads:
env
# Env vars used by mistral-ai-voice-agent-for-after-hours-customer-support.
# The builder adds entries here as it wires up each integration.
# Keep placeholders only — never commit real values.
NODE_ENV=development
TWILIO_ACCOUNT_SID=<your-twilio-account-sid>
TWILIO_AUTH_TOKEN=<your-twilio-auth-token>
DEEPGRAM_API_KEY=<your-deepgram-api-key>
ELEVENLABS_API_KEY=<your-elevenlabs-api-key>
MISTRAL_API_KEY=<your-mistral-api-key>
GOOGLE_CALENDAR_CREDENTIALS=<path-to-service-account-key-json>
LANGFUSE_PUBLIC_KEY=<your-langfuse-public-key>
LANGFUSE_SECRET_KEY=<your-langfuse-secret-key>
LANGFUSE_HOST=https://cloud.langfuse.com
FASTIFY_PORT=3000
SESSION_TTL_SECONDS=3600
Copy this file to .env and fill in your real API keys:
terminal
cp .env.example .env
The agent memory module also reads OPENAI_API_KEY from your environment if you want memory extraction and retrieval to work. Without it, the memory features gracefully degrade but the voice agent still functions.
Expected output: The server reads these variables at startup. Any missing required variable (TWILIO_ACCOUNT_SID, DEEPGRAM_API_KEY, ELEVENLABS_API_KEY, MISTRAL_API_KEY) throws a clear error.
Step 18: Run the tests
The project ships with a comprehensive test suite covering every module. Run it with:
terminal
pnpm test
The test suite covers:
Configuration — missing env vars throw, empty strings fall back to defaults
Errors — error classes chain correctly via instanceof
Expected output: numTotalTests >= 30, numFailedTests === 0, and coverage above 90% for lines, branches, functions, and statements.
Next steps
Wire the WebSocket URL dynamically — replace the hardcoded wss://localhost in handleIncomingCall with the server’s public hostname, read from an env var or the Twilio Host header.
Add DTMF menu navigation — the media stream handler already logs DTMF digits from the dtmf:received event; extend it to route callers to specific departments or FAQs based on keypresses.
Connect a real database — replace the in-memory MemoryAdapter (session store) and storage: "memory" (agent memory) with PostgreSQL or Redis for persistence across server restarts.
Deploy behind a reverse proxy — add an nginx or Caddy layer for TLS termination and domain routing; Twilio requires HTTPS/WSS for production webhooks.
Add escalation routing — when parseIntent returns "escalation", the pipeline could forward a transcript and caller info to a Slack channel or email queue for a human agent to follow up.
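For the first item above, a minimal sketch of deriving the media-stream URL is shown below. It prefers an environment variable and falls back to the request's Host header. PUBLIC_HOSTNAME is a hypothetical variable name, and mediaStreamUrl is a hypothetical helper.

```typescript
// Sketch: build the wss:// media-stream URL from a public hostname.
export function mediaStreamUrl(
  env: Record<string, string | undefined>,
  hostHeader: string | undefined,
  path = "/twilio/media-stream",
): string {
  // Prefer an explicit setting (hypothetical PUBLIC_HOSTNAME), else the Host header.
  const host = env.PUBLIC_HOSTNAME || hostHeader;
  if (!host) {
    throw new Error("No public hostname available for the media stream URL");
  }
  // Drop any scheme and trailing slashes so only host[:port] remains.
  const bare = host.replace(/^[a-z]+:\/\//i, "").replace(/\/+$/, "");
  return `wss://${bare}${path}`;
}
```

In handleIncomingCall, the result would replace the hardcoded wss://localhost URL passed to the stream.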