Field service businesses miss after-hours booking calls, forcing dispatchers to spend mornings returning voicemails and manually entering job details into ServiceTitan, leading to scheduling delays and lost revenue.
After-hours calls are a lifeline for field service businesses, but dispatchers spend every morning returning voicemails, manually entering job details into ServiceTitan, and juggling schedules that are already out of date by 9 AM. This recipe builds a voice agent that answers those calls automatically, understands spoken requests via Deepgram STT, handles the conversation with Mistral AI’s LLM, and books jobs directly into ServiceTitan’s REST API — all within a Next.js 16 app using the App Router.
Prerequisites
Node.js 22+ and pnpm 10+ installed on your machine
A Twilio account with a phone number that has voice capabilities
A Deepgram API key (speech-to-text and text-to-speech)
A Mistral AI API key (LLM for intent classification and job detail extraction)
An OpenAI API key (for the semantic cache embedder)
A ServiceTitan developer account with OAuth 2.0 credentials (client ID, client secret, tenant ID)
An AWS account (optional — for exporting LLM cost metrics to CloudWatch)
A Langfuse account (optional — for LLM observability)
Familiarity with TypeScript and Next.js App Router — you’ll be reading and editing route handlers, services, and configuration files
Step 1: Create the project from the starter
The recipe ships as a Next.js 16 project using the App Router. Start by cloning it and installing dependencies.
terminal
git clone <repo-url> mistral-voice-agent
cd mistral-voice-agent
pnpm install
This installs all dependencies including the REAA voice-agent packages (@reaatech/voice-agent-core, @reaatech/voice-agent-telephony, @reaatech/voice-agent-stt, @reaatech/voice-agent-tts), the Mistral SDK, Twilio, Deepgram, Langfuse, and the AWS CloudWatch client.
Expected output: pnpm creates a node_modules/ directory and a pnpm-lock.yaml file. No errors.
Step 2: Configure environment variables
Copy the example environment file to .env and fill in your credentials. The example file lists every variable the agent reads at runtime, with placeholder values.
Expected output: A .env file with all placeholders replaced by real API keys. Never commit this file.
Step 3: Build the Zod-validated configuration module
The app reads all configuration from environment variables at startup. A Zod schema validates that required values are present and provides defaults for optional ones. Create src/lib/config.ts:
loadConfigFromEnv() is called during server startup in src/instrumentation.ts. If a required variable is missing, Zod throws a descriptive error immediately, before any calls are handled.
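A minimal sketch of the same fail-fast behavior follows. It hand-rolls the checks without Zod so it runs with no dependencies. Only TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN, and WS_PORT appear elsewhere in this recipe; the other variable names are assumptions, so match them to your .env. In Zod, the equivalent is a z.object({...}) of z.string().min(1) fields, with z.coerce.number().default(3001) for WS_PORT, parsed via .parse(process.env).

```typescript
// Dependency-free stand-in for the recipe's Zod schema.
// Names other than TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN and WS_PORT are assumptions.
const REQUIRED_KEYS = [
  "TWILIO_ACCOUNT_SID",
  "TWILIO_AUTH_TOKEN",
  "DEEPGRAM_API_KEY",
  "MISTRAL_API_KEY",
  "OPENAI_API_KEY",
  "SERVICETITAN_CLIENT_ID",
  "SERVICETITAN_CLIENT_SECRET",
  "SERVICETITAN_TENANT_ID",
] as const;

type RequiredKey = (typeof REQUIRED_KEYS)[number];

export type AppConfig = Record<RequiredKey, string> & {
  WS_PORT: number;
  AWS_REGION?: string; // optional: only needed for CloudWatch export
};

export function loadConfigFromEnv(
  env: Record<string, string | undefined> = process.env,
): AppConfig {
  // Collect every missing name so one error reports them all.
  const missing = REQUIRED_KEYS.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }

  // Optional value with a default (3001 is the port the WebSocket server uses in Step 11).
  const wsPort = Number(env.WS_PORT ?? "3001");
  if (!Number.isInteger(wsPort) || wsPort <= 0 || wsPort > 65535) {
    throw new Error(`WS_PORT must be a valid TCP port, got "${env.WS_PORT}"`);
  }

  const required = Object.fromEntries(
    REQUIRED_KEYS.map((key) => [key, env[key] as string]),
  ) as Record<RequiredKey, string>;

  return { ...required, WS_PORT: wsPort, AWS_REGION: env.AWS_REGION };
}
```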
Expected output: The file compiles without type errors. Run pnpm typecheck to confirm.
Step 4: Create the Mistral AI service
The voice agent uses Mistral AI’s chat completion API for three tasks: classifying the caller’s intent (book, inquire, cancel, reschedule), extracting structured job details from the conversation, and generating a natural-language confirmation. Create src/lib/mistral.ts:
handleBookingIntent classifies what the caller wants. If the intent is book, the orchestrator calls extractJobDetails to parse the customer name, address, phone, and issue description, then generateConfirmation to produce a spoken response.
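The intent-classification step can be sketched as two pure functions. One builds the chat messages from the system prompt this recipe uses. The other parses the model's JSON reply defensively. The network call is left out: pass the messages to the Mistral SDK's chat completion method, ideally with JSON output enabled. The function names are hypothetical. Falling back to "inquire" when the reply is malformed is an assumption of this sketch, chosen so that a garbled reply never triggers a booking.

```typescript
// System prompt as quoted in this recipe's Mistral service.
export const INTENT_SYSTEM_PROMPT =
  'Classify the user intent as one of: "book", "inquire", "cancel", "reschedule". ' +
  'Return JSON: {"intent": "...", "details": {...}}. ' +
  'Extract partial booking details only if intent is "book".';

export type Intent = "book" | "inquire" | "cancel" | "reschedule";
const INTENTS: readonly Intent[] = ["book", "inquire", "cancel", "reschedule"];

export interface IntentResult {
  intent: Intent;
  details: Record<string, unknown>;
}

// Hypothetical helper: the messages array for a chat completion request.
export function buildIntentMessages(transcript: string) {
  return [
    { role: "system" as const, content: INTENT_SYSTEM_PROMPT },
    { role: "user" as const, content: transcript },
  ];
}

// Hypothetical helper: validates the model's reply; anything unexpected becomes "inquire".
export function parseIntentResponse(raw: string): IntentResult {
  const fallback: IntentResult = { intent: "inquire", details: {} };
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return fallback;
  }
  if (typeof parsed !== "object" || parsed === null) return fallback;

  const { intent, details } = parsed as { intent?: unknown; details?: unknown };
  if (typeof intent !== "string" || !INTENTS.includes(intent as Intent)) return fallback;

  // Details are only meaningful for bookings, per the prompt.
  const bookingDetails =
    intent === "book" && typeof details === "object" && details !== null && !Array.isArray(details)
      ? (details as Record<string, unknown>)
      : {};
  return { intent: intent as Intent, details: bookingDetails };
}
```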
Expected output: The three methods compile. Run pnpm typecheck again.
Step 5: Build the ServiceTitan API client
The agent creates jobs and appointments in ServiceTitan using OAuth 2.0 client credentials. The client handles token caching and retries transient failures with p-retry. Create src/services/servicetitan.ts:
ts
import pRetry, { AbortError } from "p-retry";

// Reads the body as text and parses it as JSON, returning {} when the body
// is empty or not valid JSON (for example, an HTML error page).
async function safeJson(response: Response): Promise<Record<string, unknown>> {
  const text = await response.text();
  try {
    return JSON.parse(text) as Record<string, unknown>;
  } catch {
    return {};
  }
}

export interface ServiceTitanJob {
  id?: string;
  customerId
Three patterns to notice:
Token caching — ensureToken() reuses the access token until it expires, avoiding an OAuth handshake on every request.
Retry with abort — 4xx errors throw AbortError (no retry), 5xx errors retry up to 3 times.
Error wrapping — All API errors are wrapped in ServiceTitanError with the HTTP status and response body attached.
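The three patterns can be sketched without p-retry or a live ServiceTitan tenant. TokenCache, withRetry, and the 60-second refresh skew are hypothetical names and values, and fetchToken stands in for the OAuth client-credentials request. As with p-retry's retries option, retries counts extra attempts, so retries = 3 allows up to four calls in total.

```typescript
// Hypothetical sketch; not the recipe's actual exports.
export class ServiceTitanError extends Error {
  readonly status: number;
  readonly body: unknown;
  constructor(message: string, status: number, body: unknown) {
    super(message);
    this.name = "ServiceTitanError";
    this.status = status;
    this.body = body;
  }
}

export interface TokenResponse {
  accessToken: string;
  expiresInSeconds: number;
}

// Token caching: reuse the access token until shortly before it expires.
export class TokenCache {
  private token: string | null = null;
  private expiresAt = 0;
  private readonly fetchToken: () => Promise<TokenResponse>;
  private readonly now: () => number;
  private readonly skewMs: number;

  constructor(fetchToken: () => Promise<TokenResponse>, now: () => number = Date.now, skewMs = 60_000) {
    this.fetchToken = fetchToken;
    this.now = now;
    this.skewMs = skewMs;
  }

  async ensureToken(): Promise<string> {
    if (this.token !== null && this.now() < this.expiresAt - this.skewMs) {
      return this.token;
    }
    const { accessToken, expiresInSeconds } = await this.fetchToken();
    this.token = accessToken;
    this.expiresAt = this.now() + expiresInSeconds * 1000;
    return accessToken;
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry with abort: 4xx errors are rethrown at once; others retry with exponential backoff.
export async function withRetry<T>(fn: () => Promise<T>, retries = 3, baseDelayMs = 200): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const isClientError = err instanceof ServiceTitanError && err.status >= 400 && err.status < 500;
      if (isClientError || attempt >= retries) throw err;
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}
```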
Expected output: The file typechecks. The tests in tests/services/servicetitan.test.ts verify the retry and error-handling logic using MSW.
Step 6: Add LLM response caching
Every LLM call costs money and adds latency to the conversation. The @reaatech/llm-cache package provides semantic caching: it embeds the prompt with OpenAI’s text-embedding-3-small and returns a cached response if a semantically similar prompt was seen recently. Create src/services/llm-cache.ts:
createCacheEngine sets up in-memory storage, an OpenAI embedder, a cosine-similarity threshold of 0.85, and per-use-case TTLs. LruCacheService wraps this with get/set methods that default to the servicetitan-booking use case.
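The lookup logic behind semantic caching can be sketched in a few lines. This is not the @reaatech/llm-cache API. SemanticCache, the injected embed function (standing in for text-embedding-3-small), and the 60-second fallback TTL are assumptions. The sketch does keep the 0.85 cosine threshold, the per-use-case TTLs, and the servicetitan-booking default described above.

```typescript
export type Embedder = (text: string) => Promise<number[]>;

interface CacheEntry {
  embedding: number[];
  response: string;
  useCase: string;
  expiresAt: number;
}

// Cosine similarity dot(a, b) / (|a| * |b|); 0 when either vector is all zeros.
export function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const DEFAULT_USE_CASE = "servicetitan-booking";

// Hypothetical in-memory semantic cache.
export class SemanticCache {
  private entries: CacheEntry[] = [];
  private readonly embed: Embedder;
  private readonly ttlMsByUseCase: Record<string, number>;
  private readonly threshold: number;
  private readonly now: () => number;

  constructor(embed: Embedder, ttlMsByUseCase: Record<string, number>, threshold = 0.85, now: () => number = Date.now) {
    this.embed = embed;
    this.ttlMsByUseCase = ttlMsByUseCase;
    this.threshold = threshold;
    this.now = now;
  }

  // Returns the most similar unexpired response for this use case, if it clears the threshold.
  async get(prompt: string, useCase = DEFAULT_USE_CASE): Promise<string | null> {
    const t = this.now();
    this.entries = this.entries.filter((e) => e.expiresAt > t); // evict expired entries
    const query = await this.embed(prompt);
    let best: CacheEntry | null = null;
    let bestScore = -Infinity;
    for (const entry of this.entries) {
      if (entry.useCase !== useCase) continue;
      const score = cosineSimilarity(query, entry.embedding);
      if (score > bestScore) {
        best = entry;
        bestScore = score;
      }
    }
    return best !== null && bestScore >= this.threshold ? best.response : null;
  }

  async set(prompt: string, response: string, useCase = DEFAULT_USE_CASE): Promise<void> {
    const ttlMs = this.ttlMsByUseCase[useCase] ?? 60_000; // hypothetical fallback TTL
    const embedding = await this.embed(prompt);
    this.entries.push({ embedding, response, useCase, expiresAt: this.now() + ttlMs });
  }
}
```

The linear scan is fine for a single process with a small cache. A persistent or shared store would typically use an approximate nearest-neighbor index instead.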
Expected output: The file typechecks. The cache is wired into the orchestrator in the next step.
Step 7: Track LLM costs with CloudWatch export
Every LLM call contributes to your monthly spend. The cost tracker records each call by provider, model, and token usage, then flushes aggregated costs to CloudWatch as a custom metric. Create src/services/cost-tracker.ts:
getMistralPricing returns per-model prices in dollars per million tokens. recordLlmCall calculates the cost of each call and stores it as a CostSpan. flushToCloudWatch, which the orchestrator calls when a call ends, sends all accumulated spans to CloudWatch as metrics.
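The cost arithmetic and the CloudWatch payload can be sketched as pure functions. Cost is (input tokens / 10^6) × input price + (output tokens / 10^6) × output price. For example, 1,000 input tokens at a hypothetical $2 per million plus 500 output tokens at a hypothetical $6 per million cost $0.002 + $0.003 = $0.005. The prices, the LlmCostUsd metric name, and the Provider/Model dimensions are placeholders, not Mistral's actual rates or the recipe's names.

```typescript
export interface ModelPricing {
  inputPerMillion: number; // dollars per 1M input tokens
  outputPerMillion: number; // dollars per 1M output tokens
}

export interface CostSpan {
  provider: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

export function computeCostUsd(pricing: ModelPricing, inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1_000_000) * pricing.inputPerMillion + (outputTokens / 1_000_000) * pricing.outputPerMillion;
}

// Same fields as a CloudWatch MetricDatum, narrowed to what this sketch sends.
export interface MetricDatumSketch {
  MetricName: string;
  Dimensions: { Name: string; Value: string }[];
  Unit: "Count"; // StandardUnit has no currency unit, so dollars go out as a plain count
  Value: number;
}

// Sums spans per provider/model and emits one datum for each pair.
export function toMetricData(spans: CostSpan[], metricName = "LlmCostUsd"): MetricDatumSketch[] {
  const totals = new Map<string, { provider: string; model: string; cost: number }>();
  for (const span of spans) {
    const key = `${span.provider}|${span.model}`;
    const total = totals.get(key) ?? { provider: span.provider, model: span.model, cost: 0 };
    total.cost += span.costUsd;
    totals.set(key, total);
  }
  return Array.from(totals.values(), ({ provider, model, cost }): MetricDatumSketch => ({
    MetricName: metricName,
    Dimensions: [
      { Name: "Provider", Value: provider },
      { Name: "Model", Value: model },
    ],
    Unit: "Count",
    Value: cost,
  }));
}
```

The resulting array matches the shape that PutMetricDataCommand from @aws-sdk/client-cloudwatch expects in its MetricData field. You supply a Namespace of your choosing alongside it.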
Expected output: The file typechecks. CloudWatch uses Unit: "Count" because the StandardUnit union doesn’t include "Dollars".
Step 8: Wire Twilio webhooks and TwiML responses
When a call comes in, Twilio sends a POST request to your voice webhook URL. The response is TwiML (Twilio’s XML-based instruction format) that tells Twilio to connect the call to a WebSocket media stream. Create src/api/twilio-webhook.ts:
ts
import twilio from 'twilio';

// TwiML that connects the call to the WebSocket media stream server.
export function generateVoiceTwiML(wsHost: string, wsPort: number): string {
  return `<?xml version="1.0" encoding="UTF-8"?><Response><Connect><Stream url="wss://${wsHost}:${String(wsPort)}/media-stream"/></Connect></Response>`;
}

// Fallback TwiML: apologize and hang up.
export function generateErrorTwiML(): string {
  return '<?xml version="1.0" encoding="UTF-8"?><Response><Say>Sorry, an error occurred. Please try again later.</Say><Hangup/></Response>';
}

// REST client for the Twilio API, authenticated from environment variables.
export function createTwilioClient() {
  return twilio(process.env.TWILIO_ACCOUNT_SID ?? '', process.env.TWILIO_AUTH_TOKEN ?? '');
}

// Form fields Twilio posts to the voice webhook.
export interface VoiceWebhookEvent {
  CallSid: string;
  From: string;
  To: string;
  CallStatus: string;
}

// Form fields Twilio posts to the status callback.
export interface StatusCallbackEvent {
  CallSid: string;
  CallStatus: string;
  CallDuration?: string;
}
Now create the App Router route handler that receives Twilio’s POST and returns TwiML at app/api/twilio/voice/route.ts:
The route uses NextRequest and NextResponse, Next.js's extensions of the Web Request and Response APIs. A route handler could use the plain Web types just as well. The TwiML response tells Twilio to stream call audio to a WebSocket server running on WS_PORT.
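A minimal sketch of a voice route handler follows, written against the Web-standard Request and Response types so it also runs outside Next.js. WS_HOST is a hypothetical environment variable (the sketch falls back to the request's hostname). Rejecting requests without a CallSid is a choice made for this sketch.

```typescript
// Same TwiML shape as generateVoiceTwiML in src/api/twilio-webhook.ts.
function voiceTwiML(wsHost: string, wsPort: number): string {
  return (
    '<?xml version="1.0" encoding="UTF-8"?>' +
    `<Response><Connect><Stream url="wss://${wsHost}:${wsPort}/media-stream"/></Connect></Response>`
  );
}

export async function POST(request: Request): Promise<Response> {
  // Twilio posts application/x-www-form-urlencoded fields such as CallSid, From, To.
  const form = await request.formData();
  const callSid = form.get("CallSid");
  if (typeof callSid !== "string" || callSid === "") {
    return new Response("Missing CallSid", { status: 400 });
  }

  const wsHost = process.env.WS_HOST ?? new URL(request.url).hostname; // WS_HOST is hypothetical
  const wsPort = Number(process.env.WS_PORT ?? "3001");
  return new Response(voiceTwiML(wsHost, wsPort), {
    status: 200,
    headers: { "Content-Type": "text/xml" },
  });
}
```

In Next.js, this POST export can live in app/api/twilio/voice/route.ts unchanged. Switching to NextRequest and NextResponse only changes the types.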
Also create the status callback at app/api/twilio/status/route.ts:
This route receives Twilio’s status callbacks and triggers cleanup (closing the session, flushing cost metrics) when a call ends.
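A sketch of the status callback follows, again using plain Request and Response. Twilio reports completed, busy, failed, no-answer, and canceled as final call states. createStatusHandler and the onCallEnded hook, which stands in for orchestrator.endCall(), are hypothetical names.

```typescript
// Final call states in Twilio's CallStatus field.
const TERMINAL_STATUSES = new Set(["completed", "busy", "failed", "no-answer", "canceled"]);

export function createStatusHandler(onCallEnded: (callSid: string) => Promise<void>) {
  return async function POST(request: Request): Promise<Response> {
    const form = await request.formData();
    const callSid = String(form.get("CallSid") ?? "");
    const status = String(form.get("CallStatus") ?? "");

    // Only a final state triggers cleanup (closing the session, flushing cost metrics).
    if (callSid !== "" && TERMINAL_STATUSES.has(status)) {
      await onCallEnded(callSid);
    }
    return new Response(JSON.stringify({ ok: true }), {
      status: 200,
      headers: { "Content-Type": "application/json" },
    });
  };
}
```

In the route file, `export const POST = createStatusHandler(...)` satisfies the App Router's named-export convention.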
Expected output: Both route files typecheck. The voice route returns XML with a <Connect><Stream> element; the status route returns {"ok": true}.
Step 9: Build the WebSocket media stream handler
The media stream handler bridges the Twilio WebSocket to Deepgram STT and TTS. It receives audio chunks from the caller, streams them to Deepgram for transcription, passes the transcript to the orchestrator, and sends the synthesized response audio back through Twilio. Create src/api/twilio-media-stream.ts:
ts
import { createTwilioHandler, TwilioMediaStreamHandler } from '@reaatech/voice-agent-telephony';
import { DeepgramSTTProvider } from '@reaatech/voice-agent-stt';
import { DeepgramTTSProvider, TTSProviderInterface } from '@reaatech/voice-agent-tts';
import type { AudioChunk } from '@reaatech/voice-agent-core';
import {
  createPipeline,
  initializeSessionManager,
  createLatencyBudget,
  defineConfig,
  LatencyBudgetEnforcer,
  MockSTTProvider,
  MockTTSProvider,
  MockMCPClient,
} from '@reaatech/voice-agent-core';
import WebSocket, { WebSocketServer } from 'ws';
import type { IncomingMessage } from 'http';
import type
The audio bridge flow: createMediaStreamServer starts a WebSocket server on WS_PORT. On connection, a MediaStreamManager accepts the Twilio WebSocket and wires up event handlers for incoming audio, barge-in, and call end. initCall connects to Deepgram STT; interim transcripts feed the barge-in detector, final transcripts go to the orchestrator, and synthesized audio is streamed back through Twilio.
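Underneath the @reaatech handlers, Twilio Media Streams exchange JSON messages over the WebSocket. A start message carries the streamSid and callSid, media messages carry base64-encoded 8 kHz mu-law audio, and stop ends the stream. Sending a clear message makes Twilio discard audio it has buffered but not yet played, which is what barge-in needs. The sketch below decodes and encodes these messages. handleTwilioMessage and the handler names are hypothetical, and the WebSocket wiring is left to the caller.

```typescript
// Hypothetical callbacks for the inbound side of a Twilio media stream.
export interface MediaStreamHandlers {
  onStart(streamSid: string, callSid: string): void;
  onAudio(mulaw: Buffer): void; // 8 kHz mono mu-law audio from the caller
  onStop(): void;
}

export function handleTwilioMessage(raw: string, handlers: MediaStreamHandlers): void {
  const msg = JSON.parse(raw) as {
    event: string;
    start?: { streamSid: string; callSid: string };
    media?: { payload: string };
  };
  switch (msg.event) {
    case "start":
      if (msg.start) handlers.onStart(msg.start.streamSid, msg.start.callSid);
      break;
    case "media":
      if (msg.media) handlers.onAudio(Buffer.from(msg.media.payload, "base64"));
      break;
    case "stop":
      handlers.onStop();
      break;
    default:
      break; // "connected", "mark", and others are ignored in this sketch
  }
}

// Outbound audio to the caller must also be base64 mu-law at 8 kHz.
export function outboundMedia(streamSid: string, mulaw: Buffer): string {
  return JSON.stringify({ event: "media", streamSid, media: { payload: mulaw.toString("base64") } });
}

// Barge-in: ask Twilio to drop queued audio so the caller is not talked over.
export function clearMessage(streamSid: string): string {
  return JSON.stringify({ event: "clear", streamSid });
}
```

With the ws package, the wiring is roughly `socket.on("message", (data) => handleTwilioMessage(data.toString(), handlers))`. Because the audio is mu-law at 8 kHz, the Deepgram live stream is typically opened with encoding mulaw and a sample rate of 8000.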
Expected output: The file typechecks. The WebSocket server starts when createMediaStreamServer is called from instrumentation.
Step 10: Build the voice agent orchestrator
The orchestrator is the brain of the system. It manages sessions through @reaatech/voice-agent-core’s SessionManager, processes transcriptions through intent classification and job detail extraction, and coordinates caching and cost tracking. Create src/services/voice-agent.ts:
ts
import { createPipeline, createLatencyBudget, initializeSessionManager, LatencyBudgetEnforcer, defineConfig } from '@reaatech/voice-agent-core';
import type { ServiceTitanClient } from '../services/servicetitan.js';
import type { MistralService } from '../lib/mistral.js';
import type { LruCacheService } from '../services/llm-cache.js';
import type { CostTracker } from '../services/cost-tracker.js';

export class VoiceAgentOrchestrator {
  private sessionManager = initializeSessionManager({
    defaultTTL: 3600,
    maxTurns: 20,
    maxTokens: 4000,
  });
  private latencyEnforcer =
The processTranscription method is the core loop: check cache, classify intent, extract details for bookings, cache the response, record cost, and return the response text for TTS synthesis.
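That loop can be sketched as a single function over injected dependencies. The OrchestratorDeps interface and the follow-up reply for non-booking intents are simplifications assumed for this sketch, not the recipe's actual service signatures. The steps run in the order described above, and cost is recorded only when an LLM was actually called.

```typescript
// Hypothetical dependency shapes, simplified from the recipe's services.
export interface OrchestratorDeps {
  cache: {
    get(key: string): Promise<string | null>;
    set(key: string, value: string): Promise<void>;
  };
  classify(transcript: string): Promise<{ intent: string }>;
  extractJobDetails(transcript: string): Promise<Record<string, string>>;
  generateConfirmation(details: Record<string, string>): Promise<string>;
  recordCost(llmSteps: string[]): void;
}

// Hypothetical reply for intents other than "book".
const FOLLOW_UP = "I can help with that. Could you tell me a little more?";

export async function processTranscription(transcript: string, deps: OrchestratorDeps): Promise<string> {
  // 1. Check the cache: a hit skips every LLM call.
  const cached = await deps.cache.get(transcript);
  if (cached !== null) return cached;

  // 2. Classify the intent.
  const llmSteps: string[] = [];
  const { intent } = await deps.classify(transcript);
  llmSteps.push("classify");

  // 3. Only bookings need structured extraction and a spoken confirmation.
  let response = FOLLOW_UP;
  if (intent === "book") {
    const details = await deps.extractJobDetails(transcript);
    llmSteps.push("extract");
    response = await deps.generateConfirmation(details);
    llmSteps.push("confirm");
  }

  // 4. Cache the response, 5. record cost, 6. return the text for TTS.
  await deps.cache.set(transcript, response);
  deps.recordCost(llmSteps);
  return response;
}
```

One design caution: with a semantic cache keyed on the transcript, two callers whose requests differ only in name or address could match above the similarity threshold. Caching the intent classification rather than the final booking confirmation avoids returning one caller's details to another.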
Expected output: The file typechecks. The orchestrator is the central dependency for the media stream handler and instrumentation startup.
Step 11: Wire everything together with instrumentation
Next.js 16’s instrumentation.ts runs once at server startup. This is where you initialize observability, set up the Langfuse logger, instantiate all services, and start the WebSocket media stream server. Create src/instrumentation.ts:
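In Next.js 15 and later, instrumentation.ts is picked up automatically, so the experimental.instrumentationHook flag that older versions required is no longer needed. What register() does need is a process.env.NEXT_RUNTIME === 'nodejs' check around the Node-only startup work. register() is also called in the Edge runtime, where the WebSocket server cannot start.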
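A sketch of guarding the startup work inside register() follows. createRegister and the global started flag are hypothetical. startNodeServices stands in for instantiating the services and calling createMediaStreamServer. The flag prevents a second register() call in the same process from trying to bind the WebSocket port twice.

```typescript
type StartFn = () => Promise<void> | void;

// Symbol.for() keys live in the process-wide registry, so the flag
// survives the module being evaluated again in the same process.
const STARTED_KEY = Symbol.for("voice-agent.instrumentation.started");

export function createRegister(startNodeServices: StartFn): () => Promise<void> {
  return async function register(): Promise<void> {
    // register() also runs for the Edge runtime, where the ws server cannot run.
    if (process.env.NEXT_RUNTIME !== "nodejs") return;

    const store = globalThis as unknown as Record<symbol, boolean | undefined>;
    if (store[STARTED_KEY]) return;
    store[STARTED_KEY] = true; // set before awaiting so concurrent calls do not double-start

    await startNodeServices();
  };
}
```

In src/instrumentation.ts this would be used as `export const register = createRegister(async () => { /* start services */ });`, loading the Node-only modules with a dynamic import() inside the callback.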
Expected output: The server boots, the WebSocket server starts on port 3001, and the status callback route has a reference to orchestrator.endCall().
Step 12: Run the tests
The project includes a Vitest test suite that mocks all external services (Twilio, Deepgram, Mistral AI, ServiceTitan, CloudWatch) using vi.mock and MSW. Run it to verify everything works:
terminal
pnpm test
This runs vitest run --coverage --reporter=json --outputFile=vitest-report.json, so per-test results are written to vitest-report.json instead of going to the default console reporter. All tests should pass, with coverage at 90% or higher for lines, branches, functions, and statements.
Then check types and lint:
terminal
pnpm typecheck
pnpm lint
Both should exit with zero errors.
Expected output: All tests pass. Typecheck and lint are clean. The integration tests verify concurrent call handling, zero-transcript (silence) calls, ServiceTitan retry logic, token caching, and error handling.
Next steps
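Test locally with ngrok — Run ngrok http 3000 and point your Twilio phone number’s voice webhook to https://<your-ngrok-url>/api/twilio/voice to test with a real phone call. The media stream WebSocket listens on WS_PORT (3001) rather than on the Next.js port, so it also needs its own publicly reachable wss:// address that matches the URL in the generated TwiML.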
Add a web dashboard — Build a real-time dashboard showing active calls, recent transcriptions, and per-call cost breakdowns using the session data from sessionManager.getAllSessions().
Persist the cache — Replace InMemoryAdapter with a Redis or SQLite adapter from @reaatech/llm-cache so cached responses survive server restarts and scale across instances.
Add appointment rescheduling — Extend the intent classifier to handle rescheduling flows, including looking up existing appointments in ServiceTitan and proposing alternate time slots.
Monitor with Langfuse — Connect Langfuse traces to your observability pipeline for full LLM call tracing, prompt debugging, and cost analytics across all provider interactions.