Small auto-repair shops miss revenue from after-hours callers who need estimates but can’t reach anyone. Manual intake by voicemail is slow, error-prone, and rarely converts to a booked job.
This recipe builds a telephone-based voice agent that answers after-hours calls for small auto-repair shops. When a customer calls, the agent transcribes their speech with OpenAI Whisper (STT), runs a finite-state machine to collect vehicle details (make, model, year, symptom), queries OpenAI’s gpt-4o-mini for a repair cost estimate, and reads the result back with Deepgram Aura (TTS). A confidence-based router classifies caller intent (get an estimate, schedule an appointment, speak to a human, or end the call) and routes the conversation accordingly.
The project is scaffolded as a Next.js application. It uses Express internally for the HTTP server and WebSocket endpoint that integrates with Twilio Media Streams. Session state is kept in an in-memory session manager from the @reaatech/voice-agent-core package — no external database required.
Prerequisites
Node.js >= 22 with pnpm@10
A Twilio phone number with Voice and Media Streams enabled
An OpenAI API key (for Whisper STT + gpt-4o-mini estimate generation)
A Deepgram API key (for Deepgram Aura TTS)
A Langfuse account (for observability — public key, secret key, and host URL)
Assumed knowledge
You should be comfortable with TypeScript, Next.js route handlers, and basic WebSocket concepts. No deep Twilio experience is needed — the Twilio wiring is handled by the @reaatech/voice-agent-telephony package.
Step 1: Scaffold the project and install dependencies
Create a new directory and initialize the project. The package.json pins every dependency to an exact version — no ^ or ~ ranges.
Expected output: After pnpm install, your node_modules/ directory is populated and pnpm-lock.yaml exists.
Step 2: Configure environment variables with Zod
Create .env.example with placeholder values for every required environment variable. These are read at server startup and validated through a Zod schema.
env
# Env vars used by openai-voice-agent-for-auto-repair-estimates.
# The builder adds entries here as it wires up each integration.
# Keep placeholders only; never commit real values.
NODE_ENV=development
OPENAI_API_KEY=<your-openai-api-key>
DEEPGRAM_API_KEY=<your-deepgram-api-key>
TWILIO_ACCOUNT_SID=<your-twilio-account-sid>
TWILIO_AUTH_TOKEN=<your-twilio-auth-token>
TWILIO_PHONE_NUMBER=<your-twilio-phone-number>
PORT=3000
LANGFUSE_PUBLIC_KEY=<your-langfuse-public-key>
LANGFUSE_SECRET_KEY=<your-langfuse-secret-key>
LANGFUSE_HOST=<your-langfuse-host>
Now create src/config.ts with a Zod schema that validates every variable and infers a typed Config:
ts
import { z } from "zod";

const envSchema = z.object({
  OPENAI_API_KEY: z.string().min(1, "OPENAI_API_KEY is required"),
  DEEPGRAM_API_KEY: z.string().min(1, "DEEPGRAM_API_KEY is required"),
  TWILIO_ACCOUNT_SID: z.string().min(1, "TWILIO_ACCOUNT_SID is required"),
  TWILIO_AUTH_TOKEN: z.string().min(1, "TWILIO_AUTH_TOKEN is required"),
  TWILIO_PHONE_NUMBER: z.string().min(1, "TWILIO_PHONE_NUMBER is required"),
  PORT: z.coerce.number().int().positive().default(3000),
  LANGFUSE_PUBLIC_KEY: z.string().min(1, "LANGFUSE_PUBLIC_KEY is required"),
  LANGFUSE_SECRET_KEY: z.string().min(1, "LANGFUSE_SECRET_KEY is required"),
  LANGFUSE_HOST: z.string().min(1, "LANGFUSE_HOST is required"),
});

export type Config = z.infer<typeof envSchema>;

export function parseConfig(): Config {
  return envSchema.parse(process.env);
}
Expected output: parseConfig() returns a typed Config object when all env vars are present, or throws a Zod error listing every missing or invalid variable.
Step 3: Define shared types
Create src/types.ts with the types used across all modules — the vehicle info structure, the estimate result shape, the finite-state machine enum, and the intent label union.
Expected output: These types compile cleanly. The EstimateState enum drives the conversation flow: each user utterance advances the state machine by one step until the estimate is generated.
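As a rough illustration, src/types.ts could look something like the sketch below, pieced together from names used elsewhere in this recipe. The VehicleInfo and EstimateResult fields follow the estimate composer's example call and system prompt, and the four intent labels follow the intent router. Only IDLE and AWAITING_MAKE are named in the text; the other state names are assumptions. The states are written as a const object instead of a TypeScript enum, and the declarations are left unexported, so the snippet runs standalone; in the real file each declaration would be exported.

```typescript
// Vehicle details collected during the call (fields match the generateEstimate example).
interface VehicleInfo {
  make: string;
  model: string;
  year: number;
  symptom: string;
}

// Shape requested from the model in the estimate composer's system prompt.
interface EstimateResult {
  laborHours: number;
  laborRate: number; // USD per hour
  partsEstimate: number; // USD
  totalLow: number; // optimistic total, USD
  totalHigh: number; // pessimistic total, USD
  description: string;
  estimatedDuration: string; // e.g. "2-3 hours"
}

// Conversation states. Only IDLE and AWAITING_MAKE appear in the recipe text;
// the remaining names are assumptions made for illustration.
const EstimateState = {
  IDLE: "IDLE",
  AWAITING_MAKE: "AWAITING_MAKE",
  AWAITING_MODEL: "AWAITING_MODEL",
  AWAITING_YEAR: "AWAITING_YEAR",
  AWAITING_SYMPTOM: "AWAITING_SYMPTOM",
  COMPLETE: "COMPLETE",
} as const;
type EstimateState = (typeof EstimateState)[keyof typeof EstimateState];

// The four intents defined by the intent router.
type IntentLabel =
  | "get_estimate"
  | "schedule_appointment"
  | "speak_to_human"
  | "end_call";

// Example values that type-check against the shapes above.
const vehicle: VehicleInfo = { make: "Toyota", model: "Camry", year: 2020, symptom: "brakes" };
const sampleEstimate: EstimateResult = {
  laborHours: 2,
  laborRate: 120,
  partsEstimate: 150,
  totalLow: 350,
  totalHigh: 450,
  description: "Illustrative values only.",
  estimatedDuration: "2-3 hours",
};
const initialState: EstimateState = EstimateState.IDLE;
const intent: IntentLabel = "get_estimate";
```

The sample values at the bottom exist only to show that the shapes fit together; the numbers in sampleEstimate are made up.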
Step 4: Initialize the Langfuse client
Create src/langfuse.ts as a singleton that all services import for observability. Langfuse traces every estimate generation and session cost.
ts
import { Langfuse } from "langfuse";
import { parseConfig } from "./config.js";

const config = parseConfig();

export const langfuse = new Langfuse({
  publicKey: config.LANGFUSE_PUBLIC_KEY,
  secretKey: config.LANGFUSE_SECRET_KEY,
  baseUrl: config.LANGFUSE_HOST,
});
Expected output: A single langfuse instance is ready. The parseConfig() call here means the module throws immediately at import time if env vars are missing — a fast-fail pattern.
Step 5: Build the voice engine
The voice engine is the core audio layer. It creates a session manager, an STT provider (OpenAI Whisper), a TTS provider (Deepgram Aura), a cost tracker, a recording manager, and a latency budget enforcer.
Create src/services/voice-engine.ts:
ts
import {
  initializeSessionManager,
  createLatencyBudget,
  createCostTracker,
  createRecordingManager,
  LatencyBudgetEnforcer,
  type Session,
  type Turn,
  type AudioChunk,
} from "@reaatech/voice-agent-core";
import { createSTTProvider, type STTProvider } from "@reaatech/voice-agent-stt";
import { createTTSProvider, type TTSProvider } from "@reaatech/voice-agent-tts";
import { langfuse } from "../langfuse.js";

export interface MinimalSessionManager {
  createSession(params: {
    callSid: string;
    mcpEndpoint: string;
    sttProvider
Expected output: createVoiceEngine() returns an object with startSession, processAudioChunk, addTurn, closeSession, and destroy methods. Session metadata stores the estimate FSM state and vehicle info across turns.
Step 6: Wire the telephony handler
The telephony handler connects a Twilio WebSocket to the voice engine. It converts incoming mu-law audio to linear16 for Whisper and handles call lifecycle events.
Expected output: When a Twilio call arrives, the handler creates a session, converts audio chunks, and forwards them to the voice engine. When the caller speaks over the TTS (barge-in), the handler cancels playback and clears the audio buffer.
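The mu-law to linear16 conversion mentioned above can be sketched as a standalone G.711 decoder. This is a minimal illustration, not the recipe's handler: the function names are hypothetical, and the @reaatech/voice-agent-telephony package may already perform this step internally.

```typescript
// Decode one G.711 mu-law byte into a signed 16-bit linear PCM sample.
function muLawByteToLinear(byte: number): number {
  const BIAS = 0x84; // 132, the offset added during mu-law encoding
  const u = ~byte & 0xff; // mu-law bytes are transmitted bit-inverted
  const mantissa = u & 0x0f; // low 4 bits
  const exponent = (u & 0x70) >> 4; // 3-bit segment number
  const magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS;
  return (u & 0x80) !== 0 ? -magnitude : magnitude; // high bit carries the sign
}

// Convert a whole mu-law payload (one byte per sample) to 16-bit PCM.
function muLawToLinear16(payload: Uint8Array): Int16Array {
  return Int16Array.from(payload, (b) => muLawByteToLinear(b));
}

// Silence (0xff) decodes to 0; 0x00 and 0x80 are the two full-scale extremes.
const decoded = muLawToLinear16(new Uint8Array([0xff, 0x00, 0x80]));
console.log(Array.from(decoded)); // [ 0, -32124, 32124 ]
```

Twilio Media Streams delivers audio as base64-encoded 8 kHz mu-law in each media message, so in Node the payload would first be decoded with something like Buffer.from(payload, "base64") before being passed to muLawToLinear16. Any resampling the STT provider might need is outside this sketch.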
Step 7: Create the intent router
The intent router uses @reaatech/confidence-router with a KeywordClassifier to classify caller intent. Four intents are defined: get_estimate, schedule_appointment, speak_to_human, and end_call.
Expected output: routeIntent("how much to fix my brakes") returns a ROUTE decision with target: "get_estimate". Empty or ambiguous input returns CLARIFY or FALLBACK, which the orchestrator handles gracefully.
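To make the ROUTE / CLARIFY / FALLBACK split concrete, here is a self-contained sketch of keyword-based routing with a confidence threshold. It does not use the @reaatech/confidence-router API, whose exact signatures are not shown in this recipe; the keyword lists, the 0.6 threshold, the scoring rule, and the name routeIntentSketch are all assumptions.

```typescript
type IntentLabel = "get_estimate" | "schedule_appointment" | "speak_to_human" | "end_call";

type Decision =
  | { type: "ROUTE"; target: IntentLabel; confidence: number }
  | { type: "CLARIFY"; candidates: IntentLabel[] }
  | { type: "FALLBACK" };

// Hypothetical keyword lists; the recipe's real lists are not shown.
const KEYWORDS: Record<IntentLabel, string[]> = {
  get_estimate: ["estimate", "quote", "cost", "price", "much", "fix", "repair"],
  schedule_appointment: ["schedule", "appointment", "book", "tomorrow"],
  speak_to_human: ["human", "person", "representative", "someone"],
  end_call: ["goodbye", "bye", "nothing"],
};

const ROUTE_THRESHOLD = 0.6; // assumed cut-off for a confident route

function routeIntentSketch(transcript: string): Decision {
  // Distinct lowercase words in the utterance.
  const words = new Set(transcript.toLowerCase().match(/[a-z]+/g) ?? []);

  // Count keyword hits per intent and keep only intents that matched.
  const scored = (Object.keys(KEYWORDS) as IntentLabel[])
    .map((label) => ({ label, hits: KEYWORDS[label].filter((k) => words.has(k)).length }))
    .filter((s) => s.hits > 0)
    .sort((a, b) => b.hits - a.hits);

  const top = scored[0];
  if (top === undefined) return { type: "FALLBACK" }; // nothing matched at all

  // Confidence = the top intent's share of all keyword hits.
  const totalHits = scored.reduce((sum, s) => sum + s.hits, 0);
  const confidence = top.hits / totalHits;
  if (confidence >= ROUTE_THRESHOLD) {
    return { type: "ROUTE", target: top.label, confidence };
  }
  // Several intents compete: ask the caller which one they meant.
  return { type: "CLARIFY", candidates: scored.slice(0, 2).map((s) => s.label) };
}
```

With these assumed lists, "how much to fix my brakes" matches only get_estimate keywords and is routed there, "can I book a repair" splits evenly between two intents and yields CLARIFY, and an empty transcript yields FALLBACK.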
Step 8: Build the estimate collection FSM
The estimate collector is a pure finite-state machine that processes one turn at a time. Each state extracts the needed information from the transcript and returns the next state plus a spoken response.
Create src/services/estimate-collector.ts:
ts
import { EstimateState, type VehicleInfo } from "../types.js";

const MAKE_PATTERN = /\b(toyota|honda|ford|chevy|chevrolet|bmw|mercedes|audi|nissan|subaru|dodge|jeep|hyundai|kia|vw|volkswagen)\b/i;
const YEAR_PATTERN = /\b(19[89]\d|20[0-2]\d)\b/;
const YES_PATTERN = /\b(yes|
Expected output: Calling processTurn(EstimateState.IDLE, "hello", null) returns { nextState: AWAITING_MAKE, responseText: "Welcome!..." }. Each call advances the FSM by one step, accumulating vehicle info along the way.
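The one-step-per-turn behaviour described above can be sketched as a small pure function. This is an illustration under stated assumptions, not the recipe's estimate-collector.ts: the state names after AWAITING_MAKE, the prompt wording, and the vehicle field on the result are invented here, while MAKE_PATTERN and YEAR_PATTERN are copied from the listing. The states are a const object so the snippet runs without the shared types file.

```typescript
const EstimateState = {
  IDLE: "IDLE",
  AWAITING_MAKE: "AWAITING_MAKE",
  AWAITING_MODEL: "AWAITING_MODEL", // assumed name
  AWAITING_YEAR: "AWAITING_YEAR", // assumed name
  AWAITING_SYMPTOM: "AWAITING_SYMPTOM", // assumed name
  COMPLETE: "COMPLETE", // assumed name
} as const;
type EstimateState = (typeof EstimateState)[keyof typeof EstimateState];

type PartialVehicle = { make?: string; model?: string; year?: number; symptom?: string };

interface TurnResult {
  nextState: EstimateState;
  responseText: string;
  vehicle: PartialVehicle; // assumption: the sketch hands back the accumulated info
}

const MAKE_PATTERN = /\b(toyota|honda|ford|chevy|chevrolet|bmw|mercedes|audi|nissan|subaru|dodge|jeep|hyundai|kia|vw|volkswagen)\b/i;
const YEAR_PATTERN = /\b(19[89]\d|20[0-2]\d)\b/;

function processTurn(
  state: EstimateState,
  transcript: string,
  vehicle: PartialVehicle | null,
): TurnResult {
  const info: PartialVehicle = { ...(vehicle ?? {}) }; // never mutate the caller's object
  const text = transcript.trim();
  const stay = (responseText: string): TurnResult => ({ nextState: state, responseText, vehicle: info });
  const go = (nextState: EstimateState, responseText: string): TurnResult => ({ nextState, responseText, vehicle: info });

  switch (state) {
    case EstimateState.IDLE:
      return go(EstimateState.AWAITING_MAKE, "Welcome! What make is your vehicle?");
    case EstimateState.AWAITING_MAKE: {
      const match = MAKE_PATTERN.exec(text);
      if (!match) return stay("Sorry, I did not catch the make. Could you repeat it?");
      info.make = match[0].toLowerCase();
      return go(EstimateState.AWAITING_MODEL, "Got it. What model is it?");
    }
    case EstimateState.AWAITING_MODEL:
      if (text === "") return stay("What model is your vehicle?");
      info.model = text;
      return go(EstimateState.AWAITING_YEAR, "And what year is it?");
    case EstimateState.AWAITING_YEAR: {
      const match = YEAR_PATTERN.exec(text);
      if (!match) return stay("Sorry, what year is the vehicle?");
      info.year = Number(match[0]);
      return go(EstimateState.AWAITING_SYMPTOM, "What problem are you noticing?");
    }
    case EstimateState.AWAITING_SYMPTOM:
      if (text === "") return stay("Could you describe the problem?");
      info.symptom = text;
      return go(EstimateState.COMPLETE, "Thanks. Let me put together an estimate.");
    case EstimateState.COMPLETE:
    default:
      return stay("Your estimate is being prepared.");
  }
}
```

Because the function only reads its arguments and returns a new result, each state can be unit-tested without a call in progress, and an unrecognised answer simply keeps the caller in the same state with a re-prompt.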
Step 9: Create the estimate composer with OpenAI
The estimate composer sends the collected vehicle details to gpt-4o-mini and parses the JSON response. It includes a retry loop with exponential backoff for rate limits and server errors.
Create src/services/estimate-composer.ts:
ts
import OpenAI from "openai";
import { z } from "zod";
import type { VehicleInfo, EstimateResult } from "../types.js";
import type { Config } from "../config.js";
import { langfuse } from "../langfuse.js";

const SYSTEM_PROMPT = `You are an experienced auto repair estimator. Given a vehicle's make, model, year, and symptom, return a JSON object estimating repair costs. Include: laborHours (number of hours), laborRate (hourly rate in USD), partsEstimate (parts cost in USD), totalLow (optimistic total), totalHigh (pessimistic total), description (2-3 sentence explanation), estimatedDuration (e.g. '2-3 hours'). Base labor rate on $100-150/hr for independent shops. Be realistic — use standard shop rates and parts pricing for common repairs. Return ONLY valid JSON, no markdown fences, no preamble.`;

const estimateResultSchema = z.object({
  laborHours: z.number(),
  laborRate: z.number(),
  partsEstimate: z.
Expected output: generateEstimate({ make: "Toyota", model: "Camry", year: 2020, symptom: "brakes" }) calls OpenAI with the system prompt, parses the JSON response, and returns a typed EstimateResult. On rate limit errors (429), it sleeps for 1 second and retries. On auth errors (401/403), it re-throws immediately.
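The retry policy described here (back off on 429 and server errors, fail fast on auth errors) might be factored into a small helper like the one below. This is a sketch, not the recipe's implementation: withRetry, RetryOptions, and the injectable sleep are hypothetical names, and it assumes the thrown error exposes a numeric status property.

```typescript
interface RetryOptions {
  maxAttempts: number; // total tries, including the first
  baseDelayMs: number; // delay before the first retry
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

// Retry only on rate limits (429) and server errors (5xx).
function isRetryable(err: unknown): boolean {
  const status = (err as { status?: unknown } | null)?.status;
  return typeof status === "number" && (status === 429 || status >= 500);
}

async function withRetry<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));

  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Auth errors (401/403) and other non-retryable failures surface immediately,
      // as does the last failure once the attempt budget is spent.
      if (!isRetryable(err) || attempt >= opts.maxAttempts) throw err;
      // Exponential backoff: base, 2x base, 4x base, ...
      await sleep(opts.baseDelayMs * 2 ** (attempt - 1));
    }
  }
}
```

With baseDelayMs set to 1000, the first retry waits 1 second, the second 2 seconds, and so on. Wrapping the OpenAI call as withRetry(() => client.chat.completions.create(...), { maxAttempts: 3, baseDelayMs: 1000 }) would be one way to apply it, assuming client is an OpenAI SDK instance.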
Step 10: Build the call orchestrator
The call orchestrator ties everything together. It listens for transcribed utterances, routes them through the intent router, advances the estimate FSM, and plays TTS responses over the Twilio audio stream.
Create src/services/call-orchestrator.ts:
ts
import { TTSProviderInterface, type TTSProvider } from "@reaatech/voice-agent-tts";
import { type RoutingDecision } from "@reaatech/confidence-router";
import { EstimateState, type VehicleInfo, type EstimateResult } from "../types.js";
import { processTurn } from "./estimate-collector.js";
import { routeIntent, validateTranscript } from "./intent-router.js";
import type { VoiceEngine } from "./voice-engine.js";

function formatEstimateForSpeech(result: EstimateResult): string {
  return `Here is your estimate. The total could range from $${String(result.totalLow)
Expected output: The orchestrator routes every spoken utterance through the confidence router, then dispatches to the right handler: advance the estimate FSM, schedule an appointment, transfer to a human, or end the call.
Step 11: Set up the Express + WebSocket server
The Express server — running inside a Next.js application — exposes a TwiML endpoint at POST /twilio-voice, a WebSocket server at /media-stream for bidirectional audio, and a health check endpoint at GET /health.
Create src/app.ts:
ts
import express from "express";
import http from "http";
import { WebSocketServer } from "ws";
import { parseConfig } from "./config.js";
import { createVoiceEngine } from "./services/voice-engine.js";
import { createTelephonyHandler } from "./services/telephony-handler.js";
import { createEstimateComposer } from "./services/estimate-composer.js";
import { createCallOrchestrator } from "./services/call-orchestrator.js";

export async function createApp() {
  await Promise.resolve();
  const config = parseConfig();
Expected output: The Twilio webhook returns valid TwiML connecting the call to the WebSocket media stream. The health endpoint returns { status: "ok", uptime: ..., sessions: N }.
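For reference, the TwiML that connects a call to a bidirectional media stream is a Connect verb wrapping a Stream noun. The helper below is a hypothetical sketch of how the POST /twilio-voice handler could build that response; buildStreamTwiml and escapeXml are illustrative names, and the host would typically come from the incoming request's Host header.

```typescript
// Escape characters that are not allowed inside an XML attribute value.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build TwiML that hands the call's audio to the WebSocket endpoint.
function buildStreamTwiml(host: string, path: string = "/media-stream"): string {
  const url = `wss://${host}${path}`;
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    "  <Connect>",
    `    <Stream url="${escapeXml(url)}" />`,
    "  </Connect>",
    "</Response>",
  ].join("\n");
}

console.log(buildStreamTwiml("example.ngrok.app"));
```

An Express handler would send this string with a text/xml content type. Twilio requires a wss:// URL for the stream, so a local server needs a TLS-terminating tunnel or proxy in front of it.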
Step 12: Create the entry point with graceful shutdown
The entry point boots the server, listens on the configured port, and handles cleanup on SIGTERM, SIGINT, and uncaught errors.
Expected output: Running pnpm dev starts the server on port 3000. Sending a SIGTERM signal triggers graceful shutdown: WebSocket connections are closed, the HTTP server stops, and Langfuse flushes pending events.
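The shutdown sequence can be sketched as an ordered list of cleanup steps that runs at most once, so that a second signal arriving mid-shutdown does not repeat the work. The names createShutdown and CleanupStep are hypothetical, and the step bodies are left to the caller because the recipe's entry point is not shown.

```typescript
type CleanupStep = { name: string; run: () => Promise<void> | void };

// Returns a shutdown function that executes the steps in order, exactly once.
function createShutdown(
  steps: CleanupStep[],
  log: (message: string) => void = (message) => console.error(message),
) {
  let started = false;

  return async function shutdown(reason: string): Promise<boolean> {
    if (started) return false; // a repeated signal must not re-run cleanup
    started = true;
    log(`shutting down (${reason})`);

    for (const step of steps) {
      try {
        await step.run();
      } catch (err) {
        // Keep going: one failed step should not block the remaining cleanup.
        log(`cleanup step "${step.name}" failed: ${String(err)}`);
      }
    }
    return true;
  };
}
```

In the entry point, the steps would follow the order given above (close WebSocket connections, stop the HTTP server, flush Langfuse), and the returned function would be registered for SIGTERM and SIGINT with process.on, calling process.exit once it resolves.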
Step 13: Write and run the tests
The test suite covers every module with isolated mocks for external dependencies. Below is the intent router test as an example — it verifies all four routing targets, the clarify boundary, the fallback for minimal keyword matches, and the empty-transcript validation.
Create tests/services/intent-router.test.ts:
ts
import { describe, it, expect } from "vitest";
import { routeIntent, validateTranscript } from "../../src/services/intent-router.js";

describe("intent-router", () => {
  describe("routeIntent", () => {
    it('returns ROUTE with target "get_estimate" for repair query', async () => {
      const decision = await routeIntent(
        "estimate quote how much cost repair fix price brakes engine transmission oil ac battery",
      );
      expect(decision.type).toBe("ROUTE");
      expect(decision.target).toBe("get_estimate");
    });
The full test suite includes test files for the config, voice engine, telephony handler, estimate collector, estimate composer, call orchestrator, and the app itself. Run them all with:
terminal
pnpm test
Expected output: All tests pass, numFailedTests=0, numTotalTests >= 3, and coverage for lines, branches, functions, and statements is at or above 90%.
Next steps
Add a CRM integration — when a caller confirms a callback or appointment, push the vehicle info and transcript to a webhook or database instead of just storing it in session metadata.
Add multi-language support — extend the STT configuration to detect the caller’s language and route to the appropriate Whisper model. Adjust the FSM prompts for non-English callers.
Deploy with a Twilio SIP trunk — instead of a Twilio phone number, connect the agent to an existing office phone system via SIP so it handles overflow calls when human receptionists are busy.