Service advisors at independent auto-repair shops are constantly interrupted by phone calls from customers asking for price estimates on common repairs. Each call pulls them away from the shop floor, slowing down bay turnover and frustrating mechanics. Missed or delayed callbacks lead to lost jobs and lower conversion rates. Advisors need a way to handle initial triage without leaving their current task.
Service advisors at independent auto-repair shops spend a significant portion of their day on the phone answering “how much to fix X” calls. Each call pulls them off the shop floor, slows down bay turnover, and leads to missed or delayed callbacks that cost jobs. This recipe builds a voice agent that handles the initial triage — listening to the customer’s description, identifying the repair type, providing a ballpark estimate, and escalating to a human advisor when the request is too complex. You’ll wire six REAA packages into a Next.js dashboard backed by a Fastify WebSocket server, with Deepgram for speech-to-text, ElevenLabs for text-to-speech, OpenAI for LLM-based repair assessment, and Langfuse for observability and cost tracking.
Prerequisites
Node.js >= 22 and pnpm 10.x installed
Twilio account — a phone number with voice capabilities, your Account SID, and Auth Token
Deepgram API key for speech-to-text
ElevenLabs API key for text-to-speech (note the default voice ID used in this recipe)
OpenAI API key for LLM-based repair assessment
Langfuse account — host URL plus public and secret keys for observability tracing
Familiarity with TypeScript, Next.js App Router patterns, and basic WebSocket concepts
Step 1: Set up environment variables
The project scaffold is already on disk — package.json, tsconfig.json, vitest.config.ts, next.config.ts, and the app/ shell are in place. Start by inspecting the environment file and adding your real credentials.
Open .env.example — it lists every variable the system reads:
env
# Env vars used by agnostic-service-advisor-call-agent.
# Keep placeholders only — never commit real values.
NODE_ENV=development

# Twilio credentials for PSTN telephony
TWILIO_ACCOUNT_SID=<your-twilio-account-sid>
TWILIO_AUTH_TOKEN=<your-twilio-auth-token>
TWILIO_PHONE_NUMBER=<your-twilio-phone-number>

# Deepgram API key for speech-to-text (STT)
DEEPGRAM_API_KEY=<your-deepgram-api-key>

# ElevenLabs API key for text-to-speech (TTS)
ELEVENLABS_API_KEY=<your-elevenlabs-api-key>

# OpenAI API key for LLM (provider-agnostic LLM via @ai-sdk/openai)
OPENAI_API_KEY=<your-openai-api-key>

# Langfuse credentials for LLM observability and tracing
LANGFUSE_SECRET_KEY=<your-langfuse-secret-key>
LANGFUSE_PUBLIC_KEY=<your-langfuse-public-key>
LANGFUSE_BASE_URL=<your-langfuse-base-url>

# Fastify WebSocket server port
SERVER_PORT=3001

# WebSocket URL exposed to clients (NEXT_PUBLIC_ prefix for client access)
NEXT_PUBLIC_WS_URL=ws://localhost:3001

# Session and latency configuration
SESSION_TTL_SECONDS=3600
LATENCY_TARGET_MS=800
LATENCY_HARD_CAP_MS=1200
Copy this file to .env.local and fill in every <...> placeholder with your real credentials. The Zod-based config loader will throw a ConfigurationError at startup if any required field is missing.
Run the install command to confirm everything resolves:
terminal
pnpm install
Expected output: No errors. The lockfile is already present so this should be fast.
Step 2: Define types and configuration
Shared TypeScript types live in src/types.ts. They define the domain model for the entire system: repair intents, customer info, estimates, session records, and latency metrics.
Configuration validation lives in src/config.ts. It uses Zod to parse process.env and throws a ConfigurationError (imported from @reaatech/agent-handoff) when a required variable is missing.
The config singleton is evaluated at module-import time so every service gets a frozen validated config object.
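As a rough illustration of that fail-fast behaviour, the sketch below validates an environment object with plain TypeScript instead of Zod so that it runs without dependencies. The loadConfig helper, the local ConfigurationError class, the choice of required keys, and the shape of the returned object are all assumptions for illustration; the recipe's own loader uses Zod and the error class from @reaatech/agent-handoff.

```typescript
// Hypothetical stand-in for the ConfigurationError exported by @reaatech/agent-handoff.
class ConfigurationError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ConfigurationError";
  }
}

type Env = Record<string, string | undefined>;

// Names come from .env.example; treating exactly these as required is an assumption.
const REQUIRED_KEYS = [
  "TWILIO_ACCOUNT_SID",
  "TWILIO_AUTH_TOKEN",
  "DEEPGRAM_API_KEY",
  "ELEVENLABS_API_KEY",
  "OPENAI_API_KEY",
];

// Parse a numeric variable, falling back when it is absent, empty, or not a number.
function toNumber(value: string | undefined, fallback: number): number {
  if (value === undefined || value === "") return fallback;
  const parsed = Number(value);
  return Number.isFinite(parsed) ? parsed : fallback;
}

function loadConfig(env: Env) {
  const missing = REQUIRED_KEYS.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new ConfigurationError(`Missing required env vars: ${missing.join(", ")}`);
  }
  // Freeze the result so no service can mutate shared configuration.
  return Object.freeze({
    elevenlabs: Object.freeze({ apiKey: env["ELEVENLABS_API_KEY"] as string }),
    serverPort: toNumber(env["SERVER_PORT"], 3001),
    latencyTargetMs: toNumber(env["LATENCY_TARGET_MS"], 800),
    latencyHardCapMs: toNumber(env["LATENCY_HARD_CAP_MS"], 1200),
  });
}
```

Passing the environment in as an argument, rather than reading process.env inside the function, keeps the sketch easy to exercise in tests.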
Expected output: Running pnpm typecheck exits 0 with no errors.
Step 3: Build the pricing database
The pricing database in src/lib/pricing-db.ts provides hardcoded pricing ranges for each repair intent and applies a 1.75x multiplier for luxury vehicle makes.
Notice the unknown intent error — this throw prevents silent fallback to general when an unrecognized intent is passed.
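A minimal sketch of that lookup might look like the following. The base ranges are copied from the bands quoted in the advisor's system prompt in Step 6, and the 1.75x multiplier comes from the description above; the getPriceRange name, the PriceRange shape, and the list of luxury makes are assumptions, and src/lib/pricing-db.ts may differ on all three.

```typescript
type RepairIntent =
  | "oil_change" | "brake_service" | "tire_service" | "engine_repair"
  | "transmission" | "ac_service" | "battery" | "general";

interface PriceRange {
  low: number;
  high: number;
}

// Bands taken from the Step 6 system prompt (USD, parts + labor).
const BASE_PRICING: Record<RepairIntent, PriceRange> = {
  oil_change: { low: 30, high: 80 },
  brake_service: { low: 150, high: 300 },
  tire_service: { low: 100, high: 300 },
  engine_repair: { low: 500, high: 2000 },
  transmission: { low: 1500, high: 4000 },
  ac_service: { low: 150, high: 500 },
  battery: { low: 100, high: 250 },
  general: { low: 50, high: 500 },
};

// Assumed list; the recipe does not say which makes count as luxury.
const LUXURY_MAKES = ["bmw", "mercedes-benz", "audi", "lexus", "porsche"];
const LUXURY_MULTIPLIER = 1.75;

function getPriceRange(intent: string, vehicleMake?: string): PriceRange {
  // Throw instead of silently falling back to "general".
  if (!Object.prototype.hasOwnProperty.call(BASE_PRICING, intent)) {
    throw new Error(`Unknown repair intent: ${intent}`);
  }
  const base = BASE_PRICING[intent as RepairIntent];
  const make = (vehicleMake ?? "").trim().toLowerCase();
  const factor = LUXURY_MAKES.indexOf(make) !== -1 ? LUXURY_MULTIPLIER : 1;
  return { low: base.low * factor, high: base.high * factor };
}

// getPriceRange("oil_change", "BMW") returns { low: 52.5, high: 140 }
```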
Expected output: TypeScript compiles cleanly. pnpm lint produces no warnings.
Step 4: Create the STT and TTS adapters
The STT adapter wraps @reaatech/voice-agent-stt with Deepgram’s Nova-2 model configured for Twilio’s mu-law 8kHz audio. It includes retry logic via withRetry from @reaatech/agent-handoff.
The TTS adapter wraps @reaatech/voice-agent-tts with ElevenLabs’ eleven_flash_v2_5 model, splitting long responses into sentences, adding 300ms silence gaps between chunks, and falling back to an apology audio when synthesis fails.
ts
import { createTTSProvider, TTSProviderInterface } from "@reaatech/voice-agent-tts";
import type { AudioChunk } from "@reaatech/voice-agent-core";
import type { AppConfig } from "../types.js";

const ELEVENLABS_CONFIG = {
  provider: 'elevenlabs' as const,
  modelId: 'eleven_flash_v2_5',
  voiceId: 'JBFqnCBsd6RMkjVDRZzb',
  outputFormat: 'mulaw_8000',
};

export class TextToSpeechService {
  private provider: ReturnType<typeof createTTSProvider>;
  readonly name = 'elevenlabs';
  readonly supportsStreaming = true;
  readonly firstByteLatencyMs: number | null = null;

  constructor(config: AppConfig) {
    this.provider = createTTSProvider({
      provider: 'elevenlabs',
      config: {
        ...ELEVENLABS_CONFIG,
        apiKey: config.elevenlabs.apiKey,
      },
    });
  }

  async *speak(text: string): AsyncIterable<AudioChunk> {
    if (!text) return;
    const sentences = TTSProviderInterface.chunkTextForStreaming(text, 200);
    for (let i = 0; i < sentences.length; i++) {
      const sentence = sentences[i];
      if (!sentence) continue;
      try {
        for await (const rawChunk of this.provider.synthesize(sentence, {
          ...ELEVENLABS_CONFIG,
          apiKey: '',
        })) {
          const formatted = TTSProviderInterface.formatAudioForTwilio(rawChunk);
          yield formatted;
        }
      } catch (err) {
        console.error('TTS synthesis error:', err);
        try {
          for await (const fallbackChunk of this.provider.synthesize(
            `I'm sorry, I didn't quite catch that.`,
            { ...ELEVENLABS_CONFIG, apiKey: '' },
          )) {
            const formatted = TTSProviderInterface.formatAudioForTwilio(fallbackChunk);
            yield formatted;
          }
        } catch {
          const silence = TTSProviderInterface.createSilenceChunk(300);
          yield silence;
        }
      }
      if (i < sentences.length - 1) {
        const gap = TTSProviderInterface.createSilenceChunk(300);
        yield gap;
      }
    }
  }

  cancel(): void {
    this.provider.cancel();
  }
}
Expected output: pnpm typecheck passes. In the test suite these classes are mocked — you never need real Deepgram or ElevenLabs keys to run tests.
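To make the sentence splitting and the 300ms gaps more concrete, here is a dependency-free sketch of the two ideas. It is not the implementation behind chunkTextForStreaming or createSilenceChunk, whose internals the recipe does not show; splitIntoSentences and createSilence are hypothetical names. The sketch assumes 8 kHz mono mu-law audio (one byte per sample) and uses 0xFF, the mu-law encoding of a zero-amplitude sample, for silence.

```typescript
const SAMPLE_RATE_HZ = 8000; // 8 kHz mono mu-law, one byte per sample
const MULAW_SILENCE = 0xff;  // mu-law encoding of a zero-amplitude sample

// Split text on sentence-ending punctuation, then break any sentence longer
// than maxLen at the last space that fits. Naive: a decimal point such as
// "$52.50" would also be treated as a sentence end.
function splitIntoSentences(text: string, maxLen: number): string[] {
  const sentences = text.match(/[^.!?]+[.!?]*/g) ?? [];
  const chunks: string[] = [];
  for (const raw of sentences) {
    let rest = raw.trim();
    while (rest.length > maxLen) {
      const cut = rest.lastIndexOf(" ", maxLen);
      const end = cut > 0 ? cut : maxLen;
      chunks.push(rest.slice(0, end).trim());
      rest = rest.slice(end).trim();
    }
    if (rest) chunks.push(rest);
  }
  return chunks;
}

// Build a buffer of mu-law silence lasting durationMs milliseconds.
function createSilence(durationMs: number): Uint8Array {
  const samples = Math.round((SAMPLE_RATE_HZ * durationMs) / 1000);
  return new Uint8Array(samples).fill(MULAW_SILENCE);
}

// createSilence(300) holds 2400 bytes: 8000 samples per second * 0.3 s
```

Synthesizing sentence by sentence means the first audio can start playing before the whole reply has been generated, which matters for a tight latency budget.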
Step 5: Wire Twilio telephony
The Twilio call manager in src/telephony/twilio-handler.ts wraps createTwilioHandler from @reaatech/voice-agent-telephony. It manages a WebSocket connection from Twilio’s <Stream> verb and exposes typed events.
This class delegates every audio and control operation to the REAA handler, so the pipeline and server code never touch the Twilio protocol directly.
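For a sense of what the handler hides, the sketch below parses the JSON messages that Twilio Media Streams sends over the WebSocket (connected, start, media, stop). The message fields follow Twilio's documented format as I understand it; the parseTwilioMessage function and the TwilioStreamEvent union are hypothetical and are not part of @reaatech/voice-agent-telephony.

```typescript
type TwilioStreamEvent =
  | { kind: "connected" }
  | { kind: "start"; streamSid: string; callSid: string }
  | { kind: "media"; streamSid: string; payloadBase64: string; byteLength: number }
  | { kind: "stop"; streamSid: string }
  | { kind: "ignored"; event: string };

// Size of the decoded audio without actually decoding it.
function base64ByteLength(b64: string): number {
  const padding = b64.endsWith("==") ? 2 : b64.endsWith("=") ? 1 : 0;
  return Math.floor((b64.length * 3) / 4) - padding;
}

function parseTwilioMessage(raw: string): TwilioStreamEvent {
  const msg = JSON.parse(raw);
  switch (msg.event) {
    case "connected":
      return { kind: "connected" };
    case "start":
      // The start message carries the call SID that identifies the phone call.
      return { kind: "start", streamSid: msg.streamSid, callSid: msg.start.callSid };
    case "media":
      // media.payload is base64-encoded mu-law audio.
      return {
        kind: "media",
        streamSid: msg.streamSid,
        payloadBase64: msg.media.payload,
        byteLength: base64ByteLength(msg.media.payload),
      };
    case "stop":
      return { kind: "stop", streamSid: msg.streamSid };
    default:
      // Other events (for example mark or dtmf) are not handled in this sketch.
      return { kind: "ignored", event: String(msg.event) };
  }
}
```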
Expected output: TypeScript compiles. The bargeInEnabled: true setting lets callers interrupt the TTS playback at any time.
Step 6: Build the LLM repair advisor
The repair advisor in src/llm/repair-advisor.ts uses the Vercel AI SDK with OpenAI to turn a customer’s spoken transcript into a structured RepairAssessment. It also calls the escalation router whenever the assessment sets needsHumanHandoff to true.
ts
import { generateText, Output } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";
import type { CustomerInfo, RepairAssessment } from "../types.js";
import { EscalationRouter } from "../handoff/escalation-router.js";

const escalationRouter = new EscalationRouter();

const AUTO_REPAIR_INTENTS = [
  'oil_change',
  'brake_service',
  'tire_service',
  'engine_repair',
  'transmission',
  'ac_service',
  'battery',
  'general',
] as const;

export const RepairAssessmentSchema = z.object({
  intent: z.enum(AUTO_REPAIR_INTENTS),
  confidence: z.number().min(0).max(1),
  estimatedCostLow: z.number(),
  estimatedCostHigh: z.number(),
  partsCostEstimate: z.number(),
  laborCostEstimate: z.number(),
  estimatedTimeMinutes: z.number(),
  followUpQuestions: z.array(z.string()),
  needsHumanHandoff: z.boolean(),
});

export const AUTO_REPAIR_SYSTEM_PROMPT = `You are an expert auto repair service advisor. Your job is to assess customer repair needs and provide estimates.

Return a JSON object matching the schema. Set needsHumanHandoff to true if the issue is complex or you're unsure.

Pricing ranges (parts + labor):
- Oil change: $30-80
- Brake service: $150-300 per axle
- Tire service: $100-300 each
- Engine repair: $500-2000
- Transmission: $1500-4000
- AC service: $150-500
- Battery: $100-250
- General service: $50-500

Estimate within these bands based on the customer's description. If the customer's issue is complex, set needsHumanHandoff to true.`;

export const FOLLOW_UP_SYSTEM_PROMPT = `You are a friendly auto repair service advisor. Respond to the customer's question based on the repair assessment. Be concise, helpful, and professional. 
Keep responses under 2 sentences when possible.`;

export async function generateRepairAssessment(
  transcript: string,
  customerInfo: CustomerInfo,
): Promise<RepairAssessment> {
  const result = await generateText({
    model: openai("gpt-4o"),
    output: Output.object({ schema: RepairAssessmentSchema }),
    system: AUTO_REPAIR_SYSTEM_PROMPT,
    prompt: `Customer vehicle: ${customerInfo.vehicleMake ?? 'unknown'} ${customerInfo.vehicleModel ?? ''} ${customerInfo.vehicleYear !== undefined ? String(customerInfo.vehicleYear) : ''}
Customer concern: ${transcript}`,
  });
  if (result.output.needsHumanHandoff) {
    await escalationRouter.evaluateEscalation(result.output, transcript);
  }
  return result.output;
}

export async function generateFollowUpResponse(
  history: Array<{ role: 'user' | 'assistant'; content: string }>,
  assessment: RepairAssessment,
): Promise<string> {
  const result = await generateText({
    model: openai("gpt-4o"),
    system: FOLLOW_UP_SYSTEM_PROMPT,
    messages: [
      {
        role: 'system',
        content: `Current assessment: ${assessment.intent} repair, estimated $${String(assessment.estimatedCostLow)}-$${String(assessment.estimatedCostHigh)}`,
      },
      ...history,
    ],
  });
  return result.text;
}
The system prompt embeds pricing ranges per intent type so the LLM stays within realistic bands. When needsHumanHandoff is true, the advisor routes through the escalation router before returning.
Expected output: TypeScript compiles. At test time, both ai and @ai-sdk/openai are mocked so no real LLM calls are made.
Step 7: Create the escalation router
The escalation router in src/handoff/escalation-router.ts uses CapabilityBasedRouter and AgentRegistry from @reaatech/agent-handoff-routing. It registers two agents — an automated service-advisor (skilled in triage and pricing) and a human-advisor — and evaluates whether to escalate.
ts
import { CapabilityBasedRouter, AgentRegistry } from "@reaatech/agent-handoff-routing";
import { HandoffError } from "@reaatech/agent-handoff";
import type { RoutingDecision, HandoffPayload } from "@reaatech/agent-handoff";
import type { CallSessionRecord, RepairAssessment } from "../types.js";

export class EscalationRouter {
  private registry: AgentRegistry;
  private router: CapabilityBasedRouter;

  constructor() {
    this.registry = new AgentRegistry();
    this.registry.register({
      agentId: 'service-advisor',
      agentName: 'Service Advisor'
The router’s evaluateEscalation wraps router.route() in a try/catch — if the routing layer itself throws, it returns a fallback decision so the call doesn’t hang.
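The fallback pattern can be sketched without the routing package as shown below. The RoutingDecisionSketch shape, the RouteFn type, and the choice to fall back to the human-advisor agent are assumptions; the real RoutingDecision type comes from @reaatech/agent-handoff and may look quite different. The sketch is synchronous for brevity, whereas the recipe awaits evaluateEscalation in Step 6.

```typescript
// Hypothetical, simplified stand-in for the package's RoutingDecision type.
interface RoutingDecisionSketch {
  targetAgentId: string;
  reason: string;
  fallback: boolean;
}

// Stand-in for router.route(); anything that can throw.
type RouteFn = (transcript: string) => RoutingDecisionSketch;

function evaluateEscalation(route: RouteFn, transcript: string): RoutingDecisionSketch {
  try {
    return route(transcript);
  } catch (err) {
    // If routing itself fails, return a usable decision rather than
    // propagating the error and leaving the caller waiting.
    const detail = err instanceof Error ? err.message : String(err);
    return {
      targetAgentId: "human-advisor",
      reason: `routing failed: ${detail}`,
      fallback: true,
    };
  }
}
```

Marking the decision with a fallback flag lets downstream code distinguish a deliberate escalation from one forced by a routing failure.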
Expected output: pnpm typecheck exits 0. The agent registry is populated with two agents during construction.
Step 8: Implement the session store
The session store in src/session/session-store.ts wraps initializeSessionManager from @reaatech/voice-agent-core, pairing the REAA session with a business-level CallSessionRecord that tracks intent, estimate, and customer info.
The callSidIndex map lets you look up a session by the Twilio call SID — essential for the telephony handler’s onCallEnd event which only has the call SID, not the internal session ID.
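The secondary-index idea can be shown with two plain Maps. This is a simplified sketch, not the recipe's store: InMemorySessionStore, CallSessionSketch, and the generated session IDs are hypothetical, and the real store also wraps the REAA session manager.

```typescript
interface CallSessionSketch {
  sessionId: string;
  callSid: string;
  intent?: string;
}

class InMemorySessionStore {
  // Primary storage, keyed by internal session ID.
  private sessions = new Map<string, CallSessionSketch>();
  // Secondary index: Twilio call SID -> internal session ID.
  private callSidIndex = new Map<string, string>();
  private nextId = 1;

  create(callSid: string): CallSessionSketch {
    const session: CallSessionSketch = { sessionId: `session-${this.nextId++}`, callSid };
    this.sessions.set(session.sessionId, session);
    this.callSidIndex.set(callSid, session.sessionId);
    return session;
  }

  getByCallSid(callSid: string): CallSessionSketch | undefined {
    const sessionId = this.callSidIndex.get(callSid);
    return sessionId === undefined ? undefined : this.sessions.get(sessionId);
  }

  // Remove both the session and its index entry; returns false if unknown.
  endByCallSid(callSid: string): boolean {
    const sessionId = this.callSidIndex.get(callSid);
    if (sessionId === undefined) return false;
    this.callSidIndex.delete(callSid);
    return this.sessions.delete(sessionId);
  }

  count(): number {
    return this.sessions.size;
  }
}
```

Deleting from both maps in endByCallSid keeps the index from accumulating stale call SIDs over a long-running process.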
Expected output: TypeScript compiles. The store is purely in-memory — sessions are ephemeral and lost on process restart.
Step 9: Set up observability and cost tracking
The observability service in src/observability/telemetry.ts wraps Langfuse for distributed tracing and uses createCostTracker from @reaatech/voice-agent-core to meter usage per provider.
The initObservability / getObservabilityService pair provides a singleton pattern used by the instrumentation hook in the Next.js bootstrap. Cost rates for Deepgram ($0.0059/min), ElevenLabs ($0.000015/char), and OpenAI ($0.00001/$0.00003 per token) are embedded so every call produces a running cost estimate.
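To show how those rates combine into a per-call figure, here is a small worked sketch. It assumes the two OpenAI numbers are the input-token and output-token rates respectively, which the text does not state explicitly; estimateCallCost and CallUsage are hypothetical names, not the createCostTracker API.

```typescript
// Rates quoted above, in USD.
const RATES = {
  deepgramPerMinute: 0.0059,
  elevenLabsPerChar: 0.000015,
  openAiPerInputToken: 0.00001,  // assumption: first figure is the input rate
  openAiPerOutputToken: 0.00003, // assumption: second figure is the output rate
};

interface CallUsage {
  sttSeconds: number;
  ttsCharacters: number;
  llmInputTokens: number;
  llmOutputTokens: number;
}

function estimateCallCost(usage: CallUsage) {
  const stt = (usage.sttSeconds / 60) * RATES.deepgramPerMinute;
  const tts = usage.ttsCharacters * RATES.elevenLabsPerChar;
  const llm =
    usage.llmInputTokens * RATES.openAiPerInputToken +
    usage.llmOutputTokens * RATES.openAiPerOutputToken;
  return { stt, tts, llm, total: stt + tts + llm };
}

// A two-minute call with 1000 synthesized characters, 500 input tokens, and
// 200 output tokens: 0.0118 + 0.015 + (0.005 + 0.006), about $0.0378 in total.
const example = estimateCallCost({
  sttSeconds: 120,
  ttsCharacters: 1000,
  llmInputTokens: 500,
  llmOutputTokens: 200,
});
```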
Expected output: Module compiles cleanly. The cost tracker only runs when enabled: true.
Step 10: Wire the voice pipeline
The pipeline in src/pipeline/voice-pipeline.ts is the orchestration core. It wraps everything into a single createPipeline() call from @reaatech/voice-agent-core, wiring the STT provider, TTS provider, MCP client (the repair advisor), latency budget, and session manager.
ts
import {
  createPipeline,
  createLatencyBudget,
  initializeSessionManager,
  defineConfig,
  LatencyBudgetEnforcer,
  Pipeline,
  AudioChunk,
} from "@reaatech/voice-agent-core";
import type { STTProvider, TTSProvider } from "@reaatech/voice-agent-core";
import type { SpeechToTextService } from "../stt/stt-adapter.js";
import type { TextToSpeechService } from "../tts/tts-adapter.js";
import type { TwilioCallManager } from "../telephony/twilio-handler.js";
import type { ObservabilityService } from "../observability/telemetry.js";
import type { AppConfig } from "../types.js";
The pipeline handles every event in the STT to MCP to TTS lifecycle: it feeds interim transcripts to the Twilio handler for barge-in detection, forwards TTS chunks to the audio stream, and on error it plays a fallback apology and ends the session. The pipeline:turn:end handler logs stage-level latency and calls trackSTTUsage / trackTTSUsage on the observability service.
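The stage-level latency logging can be pictured as a simple budget check against the LATENCY_TARGET_MS (800) and LATENCY_HARD_CAP_MS (1200) values from .env.example. The sketch below is an assumption about how such a check could work and does not reproduce the LatencyBudgetEnforcer API; the stage names and checkLatencyBudget are hypothetical.

```typescript
interface StageLatencies {
  sttMs: number;          // time to a final transcript
  llmMs: number;          // time for the repair advisor to respond
  ttsFirstByteMs: number; // time to the first synthesized audio byte
}

type BudgetStatus = "within_target" | "over_target" | "over_hard_cap";

function checkLatencyBudget(
  stages: StageLatencies,
  targetMs: number = 800,
  hardCapMs: number = 1200,
): { totalMs: number; status: BudgetStatus } {
  const totalMs = stages.sttMs + stages.llmMs + stages.ttsFirstByteMs;
  let status: BudgetStatus = "within_target";
  if (totalMs > hardCapMs) {
    status = "over_hard_cap";
  } else if (totalMs > targetMs) {
    status = "over_target";
  }
  return { totalMs, status };
}

// { sttMs: 200, llmMs: 400, ttsFirstByteMs: 150 } totals 750 ms: within_target
// { sttMs: 300, llmMs: 500, ttsFirstByteMs: 100 } totals 900 ms: over_target
```

Keeping a soft target separate from a hard cap lets a turn be logged as slow without being treated as a failure.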
Expected output: pnpm typecheck passes. All six pipeline events (stt:interim, tts:chunk, tts:complete, mcp:response, turn:end, error) have handlers.
Step 11: Create the Next.js API routes and instrumentation
Three API routes expose the system status. The status route at app/api/status/route.ts returns a simple health-check response.
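A health-check handler of that kind could look roughly like the sketch below, which uses the standard Response object that App Router route handlers can return. The payload fields match the expected output for this step; buildStatusPayload, the hardcoded call count, and uptime measured in whole seconds since module load are assumptions, and the real route presumably reads the active-call count from the session store.

```typescript
interface StatusPayload {
  status: "ok";
  activeCalls: number;
  uptime: number;
}

// Captured once when the module is first loaded.
const startedAtMs = Date.now();

// Pure helper so the payload can be tested without an HTTP layer.
function buildStatusPayload(activeCalls: number, nowMs: number = Date.now()): StatusPayload {
  return {
    status: "ok",
    activeCalls,
    uptime: Math.floor((nowMs - startedAtMs) / 1000), // assumption: whole seconds
  };
}

// Route handler for GET /api/status.
export function GET(): Response {
  // Assumption: 0 is a placeholder for a lookup against the session store.
  const body = JSON.stringify(buildStatusPayload(0));
  return new Response(body, {
    status: 200,
    headers: { "content-type": "application/json" },
  });
}
```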
Expected output: GET /api/status returns {"status":"ok","activeCalls":0,"uptime":<n>}. GET /api/calls returns []. GET /api/calls/nonexistent returns 404.
Step 12: Bootstrap the Fastify WebSocket server
The server at src/server.ts creates a Fastify instance, registers @fastify/websocket, and wires the Twilio incoming-call webhook to the media-stream WebSocket.
ts
import Fastify from "fastify";
import type { FastifyInstance } from "fastify";
import type { WebSocket } from "ws";
import type { AppConfig } from "./types.js";
import { config } from "./config.js";
import { SpeechToTextService } from "./stt/stt-adapter.js";
import { TextToSpeechService } from "./tts/tts-adapter.js";
import { TwilioCallManager } from "./telephony/twilio-handler.js";
import { buildVoicePipeline } from "./pipeline/voice-pipeline.js";
import { ObservabilityService } from "./observability/telemetry.js";
import { CallSessionStore }
The startServer function is the application entry point. It initializes observability, creates the session store, registers the Twilio webhook route (which returns TwiML instructing Twilio to open a <Stream> WebSocket), and creates a fresh pipeline per WebSocket connection. Each connection gets its own STT service, TTS service, and Twilio call manager — so calls are fully isolated.
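The TwiML returned by the webhook can be sketched as a small string builder. The Response, Connect, and Stream elements are standard TwiML for a bidirectional media stream; buildStreamTwiml, escapeXml, and the /media-stream path in the usage comment are hypothetical, since the recipe does not show the actual WebSocket route.

```typescript
// Escape characters that are unsafe inside an XML attribute value.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build TwiML telling Twilio to open a media-stream WebSocket to streamUrl.
function buildStreamTwiml(streamUrl: string): string {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    "  <Connect>",
    `    <Stream url="${escapeXml(streamUrl)}" />`,
    "  </Connect>",
    "</Response>",
  ].join("\n");
}

// Example (hypothetical path): buildStreamTwiml("wss://example.com/media-stream")
```

The webhook would send this string back with a text/xml content type so Twilio treats it as call instructions.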
Expected output: pnpm typecheck and pnpm lint both pass. The server can be started with startServer() and gracefully shut down with shutdownServer().
Step 13: Run the tests
The test suite covers every module with mocked external dependencies. The setup file provides placeholder env vars, so no real credentials are needed.
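A setup file along those lines might fill in defaults as sketched here. The variable names come from .env.example, but the placeholder values and the applyTestEnv helper are assumptions rather than the recipe's actual setup code.

```typescript
// Placeholder values; none of these are real credentials.
const TEST_ENV_DEFAULTS: Record<string, string> = {
  NODE_ENV: "test",
  TWILIO_ACCOUNT_SID: "test-twilio-sid",
  TWILIO_AUTH_TOKEN: "test-twilio-token",
  TWILIO_PHONE_NUMBER: "+15550000000",
  DEEPGRAM_API_KEY: "test-deepgram-key",
  ELEVENLABS_API_KEY: "test-elevenlabs-key",
  OPENAI_API_KEY: "test-openai-key",
  LANGFUSE_SECRET_KEY: "test-langfuse-secret",
  LANGFUSE_PUBLIC_KEY: "test-langfuse-public",
  LANGFUSE_BASE_URL: "http://localhost:3000",
};

// Fill in any variable that is not already set, leaving existing values alone.
function applyTestEnv(env: Record<string, string | undefined>): void {
  for (const key of Object.keys(TEST_ENV_DEFAULTS)) {
    if (env[key] === undefined) {
      env[key] = TEST_ENV_DEFAULTS[key];
    }
  }
}

// In a Vitest setup file this would be called as applyTestEnv(process.env),
// before any module that validates configuration at import time is loaded.
```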
Expected output: All tests pass with coverage thresholds (lines, branches, functions, statements) at 90% or above. The test report is written to vitest-report.json.
Run type checking and linting too:
terminal
pnpm typecheck
pnpm lint
Both should exit 0.
Next steps
Deploy to production — wrap the Fastify server behind a TLS-terminating reverse proxy (nginx or Caddy) and configure Twilio’s voice webhook URL to point at your /twilio/incoming-call endpoint over HTTPS.
Persist sessions — replace the in-memory CallSessionStore with a database-backed implementation (Redis or Postgres) so sessions survive server restarts and you can review call history.
Add SMS follow-up — after a call ends, send an SMS with the estimate summary using Twilio’s Messaging API so the customer has a written record.
Expand pricing data — replace the hardcoded pricing map with a database or API lookup so each shop can set their own labor rates and parts markups.
Monitor latency — wire the LatencyBudgetEnforcer metrics into a real-time dashboard so you can spot slow STT or LLM stages before callers notice.