Small businesses lose potential customers when calls go unanswered after hours. Hiring 24/7 staff is cost-prohibitive, and basic voicemail often fails to capture and qualify leads in real time.
This tutorial walks you through building an AI-powered voice receptionist that answers after-hours calls using xAI Grok, LiveKit, Deepgram, and Cartesia. In about 30 minutes, you will build a backend that receives LiveKit agent-dispatch webhooks, classifies caller intent with keyword routing, enforces per-call AI spend budgets, caches common responses to cut costs, and escalates urgent issues via Twilio SMS — all instrumented with Langfuse tracing.
Now install the REAA packages and third-party dependencies. These are vendored packages that provide the intent router, budget controller, semantic cache, agent handoff protocol, and media pipeline.
Open package.json and verify every dependency is pinned to an exact semver — no ^, ~, or *. The "type": "module" field must also be present.
Expected output: A Next.js 16 project with app/, src/, and all dependencies installed and exact-pinned.
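If you would rather check the pinning automatically than read package.json by eye, a small helper along these lines could do it. This is a sketch only: `findUnpinned` is a hypothetical helper, not part of the recipe, and the package versions in the sample are illustrative.

```typescript
// Hypothetical helper: lists dependencies whose version is not an exact semver.
type DependencyMap = Record<string, string>;

// Accepts "1.2.3", "1.2.3-rc.1", "1.2.3+build.5"; rejects ^, ~, *, ranges, and tags.
const EXACT_SEMVER = /^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$/;

function findUnpinned(deps: DependencyMap): string[] {
  return Object.entries(deps)
    .filter(([, version]) => !EXACT_SEMVER.test(version))
    .map(([name, version]) => `${name}@${version}`);
}

// Illustrative versions only; substitute the "dependencies" object from your package.json.
const sampleDeps: DependencyMap = {
  next: "16.0.0",
  zod: "^3.23.8",
  twilio: "~5.3.0",
  ai: "*",
};

console.log(findUnpinned(sampleDeps)); // [ 'zod@^3.23.8', 'twilio@~5.3.0', 'ai@*' ]
```

An empty array means every entry is exact-pinned.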
Step 2: Configure environment variables
Replace your .env.example with the full set of environment variables the agent needs. Create a .env file by copying this example and filling in your real API keys.
env
# Env vars used by xai-grok-voice-agent-for-after-hours-customer-support.
# The builder adds entries here as it wires up each integration.
# Keep placeholders only; never commit real values.
NODE_ENV=development

# xAI Grok
XAI_API_KEY=<your-xai-api-key>
GROK_MODEL=grok-3

# LiveKit
LIVEKIT_API_KEY=<your-livekit-api-key>
LIVEKIT_API_SECRET=<your-livekit-api-secret>
LIVEKIT_HOST=<your-livekit-host-url>

# Deepgram (STT)
DEEPGRAM_API_KEY=<your-deepgram-api-key>

# Cartesia (TTS)
CARTESIA_API_KEY=<your-cartesia-api-key>

# Twilio (SMS/Callbacks)
TWILIO_ACCOUNT_SID=<your-twilio-account-sid>
TWILIO_AUTH_TOKEN=<your-twilio-auth-token>
TWILIO_PHONE_NUMBER=<your-twilio-phone-number>

# Langfuse (Observability)
LANGFUSE_PUBLIC_KEY=<your-langfuse-public-key>
LANGFUSE_SECRET_KEY=<your-langfuse-secret-key>
LANGFUSE_BASE_URL=https://cloud.langfuse.com

# OpenAI (required by OpenAIEmbedder from @reaatech/llm-cache for semantic cache)
OPENAI_API_KEY=<your-openai-key>

# Budget
AGENT_BUDGET_CENTS=50

# Confidence Router
CONFIDENCE_ROUTE_THRESHOLD=0.8
CONFIDENCE_FALLBACK_THRESHOLD=0.3
terminal
cp .env.example .env
# Now fill .env with your real API keys
Expected output: .env.example has all 18 env variables documented with placeholder values and group comments. Your .env has real values.
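A startup check can catch a key that was never filled in before the first call arrives. The sketch below is one way to do that. `findMissingEnv` is a hypothetical helper, and treating a leftover `<your-...>` placeholder as missing is an assumption of this example; the variable names come from the .env.example above.

```typescript
// Secrets from .env.example that have no usable default.
const REQUIRED_ENV = [
  "XAI_API_KEY", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET", "LIVEKIT_HOST",
  "DEEPGRAM_API_KEY", "CARTESIA_API_KEY", "TWILIO_ACCOUNT_SID", "TWILIO_AUTH_TOKEN",
  "TWILIO_PHONE_NUMBER", "LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY", "OPENAI_API_KEY",
] as const;

type Env = Record<string, string | undefined>;

// Hypothetical helper: a value counts as missing when it is unset, blank,
// or still an unfilled "<your-...>" placeholder copied from .env.example.
function findMissingEnv(env: Env, required: readonly string[] = REQUIRED_ENV): string[] {
  return required.filter((name) => {
    const value = env[name]?.trim();
    return !value || /^<.*>$/.test(value);
  });
}

// At startup you might call: findMissingEnv(process.env)
console.log(findMissingEnv({ XAI_API_KEY: "<your-xai-api-key>" }, ["XAI_API_KEY"])); // [ 'XAI_API_KEY' ]
```

Failing fast on a non-empty result is usually easier to debug than an authentication error in the middle of a call.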
Step 3: Define shared types with Zod
Create the shared type definitions that every service module imports. This file defines interfaces for the call session, intent classification, handoff requests, budget records, and the agent configuration — plus Zod schemas for each for runtime validation.
Expected output: Six TypeScript interfaces and three Zod schemas exported from src/lib/types.ts.
Step 4: Create the xAI Grok client
This module wraps the Vercel AI SDK’s xai() provider and exposes a GrokClient with generateResponse and streamResponse methods. Extracting token counts from the usage object lets the budget engine track per-call spend.
Expected output: A GrokClient type and createGrokClient() factory at src/lib/grok.ts.
Step 5: Build the pricing provider
The PricingProvider interface from @reaatech/agent-budget-engine allows the budget controller to estimate the cost of a Grok API call before sending it. Hard-code the xAI pricing rates using the per-million-token rates published as of May 2026.
typescript
// src/lib/pricing-provider.ts
import type { PricingProvider } from "@reaatech/agent-budget-engine";

const GROK_PRICING: Map<string, { inputPerM: number; outputPerM: number }> = new Map([
  ["grok-3", { inputPerM: 2.0, outputPerM: 8.0 }],
  ["grok-3-mini", { inputPerM: 0.5, outputPerM: 2.0 }],
]);

export class GrokPricingProvider implements PricingProvider {
  estimateCost(modelId: string, estimatedInputTokens: number): number {
    const pricing = GROK_PRICING.get(modelId);
    if (!pricing) return 0;
    const outputEstimate = Math.round(estimatedInputTokens * 0.5);
    const inputCost = (estimatedInputTokens / 1_000_000) * pricing.inputPerM;
    const outputCost = (outputEstimate / 1_000_000) * pricing.outputPerM;
    return inputCost + outputCost;
  }
}

let _instance: GrokPricingProvider | undefined;

export function createPricingProvider(): GrokPricingProvider {
  if (_instance === undefined) {
    _instance = new GrokPricingProvider();
  }
  return _instance;
}
Expected output: GrokPricingProvider class implementing PricingProvider and a singleton createPricingProvider() factory at src/lib/pricing-provider.ts.
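To see what estimateCost produces, it helps to work one case by hand. The standalone function below mirrors the formula above for grok-3 only, so it runs without the @reaatech packages. `estimateGrok3Cost` is a name invented for this example, and the result is read as US dollars on the assumption that the hard-coded rates are dollars per million tokens.

```typescript
// Rates copied from GROK_PRICING above (assumed to be USD per million tokens).
const GROK_3_RATES = { inputPerM: 2.0, outputPerM: 8.0 };

// Same arithmetic as estimateCost: output is estimated at half the input token count.
function estimateGrok3Cost(estimatedInputTokens: number): number {
  const outputEstimate = Math.round(estimatedInputTokens * 0.5);
  const inputCost = (estimatedInputTokens / 1_000_000) * GROK_3_RATES.inputPerM;
  const outputCost = (outputEstimate / 1_000_000) * GROK_3_RATES.outputPerM;
  return inputCost + outputCost;
}

// 1,000 input tokens: 0.002 for input + 0.004 for 500 estimated output tokens.
console.log(estimateGrok3Cost(1_000).toFixed(4)); // "0.0060"
```

Under that assumption a 1,000-token prompt is estimated at 0.6 cents, so convert units consistently before comparing against AGENT_BUDGET_CENTS.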
Step 6: Set up Langfuse observability
Langfuse traces every interaction in the voice agent pipeline. The initLangfuse function creates a singleton Langfuse client from environment variables, and the traceObservable helper wraps an async function in a Langfuse span — completing the span with the result on success or recording the error on failure.
Expected output: initLangfuse() and traceObservable() exported from src/lib/langfuse.ts.
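The span-wrapping pattern this step describes can be sketched without the Langfuse SDK. In the example below, `SpanLike`, `TracerLike`, `traceWith`, and `createMemoryTracer` are hypothetical stand-ins for illustration; the real Langfuse client has its own span API, which this sketch does not reproduce.

```typescript
// Hypothetical minimal interfaces; not the Langfuse SDK types.
interface SpanLike {
  end(data: { output?: unknown; error?: string }): void;
}
interface TracerLike {
  startSpan(name: string): SpanLike;
}

// Wraps an async function in a span: ends it with the result on success,
// records the error message on failure, and rethrows so callers still see it.
async function traceWith<T>(tracer: TracerLike, name: string, fn: () => Promise<T>): Promise<T> {
  const span = tracer.startSpan(name);
  try {
    const result = await fn();
    span.end({ output: result });
    return result;
  } catch (err) {
    span.end({ error: err instanceof Error ? err.message : String(err) });
    throw err;
  }
}

// In-memory tracer for local experiments and tests.
function createMemoryTracer(): { tracer: TracerLike; events: string[] } {
  const events: string[] = [];
  const tracer: TracerLike = {
    startSpan: (name) => ({
      end: (data) => {
        events.push(data.error ? `${name}:error:${data.error}` : `${name}:ok`);
      },
    }),
  };
  return { tracer, events };
}
```

Rethrowing after recording the error matters: the trace captures the failure, but the orchestrator still gets the chance to handle it.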
Step 7: Wire the confidence router service
This service wraps @reaatech/confidence-router’s ConfidenceRouter and KeywordClassifier. It registers three keyword classifiers — booking, inquiry, and escalation — and exposes a classifyCallIntent method that returns a RoutingDecision indicating whether the router can confidently route the transcript or needs to ask clarifying questions.
Expected output: createIntentRouter() factory and IntentRouter type at src/services/confidence-router.service.ts.
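To make the routing idea concrete, here is a dependency-free sketch of keyword classification with two thresholds. It does not use the @reaatech/confidence-router API. The keyword lists, the scoring rule (the best intent's share of all keyword hits), and the three-way route/clarify/fallback split are assumptions made for this example. Only the three intent names and the 0.8 and 0.3 thresholds come from the tutorial.

```typescript
type SketchIntent = "booking" | "inquiry" | "escalation";

type SketchDecision =
  | { action: "route" | "clarify"; intent: SketchIntent; confidence: number }
  | { action: "fallback"; confidence: number };

// Illustrative keyword lists; tune these for your own business.
const KEYWORDS: Record<SketchIntent, string[]> = {
  booking: ["book", "appointment", "schedule"],
  inquiry: ["hours", "price", "question"],
  escalation: ["urgent", "emergency", "manager"],
};

function classifyTranscript(
  transcript: string,
  routeThreshold = 0.8, // CONFIDENCE_ROUTE_THRESHOLD
  fallbackThreshold = 0.3, // CONFIDENCE_FALLBACK_THRESHOLD
): SketchDecision {
  const text = transcript.toLowerCase();
  const hits = (Object.keys(KEYWORDS) as SketchIntent[]).map((intent) => ({
    intent,
    count: KEYWORDS[intent].filter((keyword) => text.includes(keyword)).length,
  }));
  const total = hits.reduce((sum, hit) => sum + hit.count, 0);
  const best = hits.reduce((a, b) => (b.count > a.count ? b : a));
  // Assumed scoring: the winning intent's share of all keyword hits.
  const confidence = total === 0 ? 0 : best.count / total;

  if (confidence >= routeThreshold) return { action: "route", intent: best.intent, confidence };
  if (confidence >= fallbackThreshold) return { action: "clarify", intent: best.intent, confidence };
  return { action: "fallback", confidence };
}

console.log(classifyTranscript("I'd like to book an appointment").action); // "route"
console.log(classifyTranscript("Quick question, it's urgent").action); // "clarify"
```

A transcript that matches several intents equally lands between the two thresholds, which is the case where asking a clarifying question is more useful than guessing.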
Step 8: Wire the budget engine service
The budget engine wraps @reaatech/agent-budget-engine’s BudgetController. Every call session gets a defined budget (defaulting to 50 cents). Before an LLM call, preFlightCheck checks whether the estimated cost is within the remaining budget. After the call, recordSpend logs the actual tokens consumed. The controller fires "hard-stop" and "threshold-breach" events when spending approaches or exceeds the limit.
Expected output: createBudgetController() factory and CallBudgetManager type at src/services/budget-engine.service.ts.
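The pre-flight and record-spend cycle is easier to reason about with a toy model. The class below is a simplified sketch, not the BudgetController from @reaatech/agent-budget-engine. `SketchCallBudget` is a hypothetical name, and the 80% warning ratio is an assumption; the tutorial only says the events fire when spending approaches or exceeds the limit.

```typescript
type BudgetEvent = "threshold-breach" | "hard-stop";

class SketchCallBudget {
  private spentCents = 0;
  private readonly limitCents: number;
  private readonly warnRatio: number;
  private readonly onEvent: (event: BudgetEvent, spentCents: number) => void;

  constructor(
    limitCents: number,
    onEvent: (event: BudgetEvent, spentCents: number) => void,
    warnRatio = 0.8, // assumed warning level, not taken from the library
  ) {
    this.limitCents = limitCents;
    this.onEvent = onEvent;
    this.warnRatio = warnRatio;
  }

  // Before an LLM call: would the estimated cost still fit in the budget?
  preFlightCheck(estimatedCents: number): boolean {
    return this.spentCents + estimatedCents <= this.limitCents;
  }

  // After an LLM call: log actual spend and emit at most one event per call.
  recordSpend(actualCents: number): void {
    const before = this.spentCents;
    this.spentCents += actualCents;
    const warnAt = this.limitCents * this.warnRatio;
    if (this.spentCents >= this.limitCents) {
      this.onEvent("hard-stop", this.spentCents);
    } else if (before < warnAt && this.spentCents >= warnAt) {
      this.onEvent("threshold-breach", this.spentCents);
    }
  }

  remainingCents(): number {
    return Math.max(0, this.limitCents - this.spentCents);
  }
}
```

With the default 50-cent budget, spending 30 and then 12 cents crosses the assumed 40-cent warning level, and a further 10-cent request fails the pre-flight check.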
Step 9: Wire the LLM cache service
The LLM cache wraps @reaatech/llm-cache’s CacheEngine with in-memory storage, in-memory vector storage, and OpenAI embeddings for semantic similarity. It reduces Grok API costs by serving a cached response when the same (or a semantically similar) prompt is repeated, for example with common questions about business hours or pricing.
Expected output: createLlmCache() factory and LlmCacheManager type at src/services/llm-cache.service.ts.
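Semantic caching comes down to comparing an embedding of the new prompt against embeddings of stored prompts. The sketch below shows that lookup with a toy word-count embedder standing in for OpenAI embeddings. It is not the CacheEngine API: `SketchSemanticCache`, `bagOfWords`, and the 0.8 similarity threshold are all assumptions made for this example.

```typescript
type Vector = Map<string, number>;
type Embedder = (text: string) => Vector;

// Toy embedder (word counts). A real setup would call an embedding model instead.
const bagOfWords: Embedder = (text) => {
  const counts: Vector = new Map();
  for (const word of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return counts;
};

function cosineSimilarity(a: Vector, b: Vector): number {
  let dot = 0;
  a.forEach((count, word) => {
    dot += count * (b.get(word) ?? 0);
  });
  const norm = (v: Vector) => Math.sqrt(Array.from(v.values()).reduce((sum, x) => sum + x * x, 0));
  const denominator = norm(a) * norm(b);
  return denominator === 0 ? 0 : dot / denominator;
}

class SketchSemanticCache {
  private entries: { vector: Vector; response: string }[] = [];
  private readonly embed: Embedder;
  private readonly threshold: number;

  constructor(embed: Embedder = bagOfWords, threshold = 0.8) {
    this.embed = embed;
    this.threshold = threshold;
  }

  // Returns the most similar stored response at or above the threshold, if any.
  get(prompt: string): string | undefined {
    const query = this.embed(prompt);
    let best: { score: number; response: string } | undefined;
    for (const entry of this.entries) {
      const score = cosineSimilarity(query, entry.vector);
      if (score >= this.threshold && (!best || score > best.score)) {
        best = { score, response: entry.response };
      }
    }
    return best?.response;
  }

  set(prompt: string, response: string): void {
    this.entries.push({ vector: this.embed(prompt), response });
  }
}
```

After storing an answer for "What are your business hours?", a lookup for "What are your hours?" scores about 0.89 with this toy embedder and is served from the cache, while an unrelated question misses and goes to the model.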
Step 10: Wire the agent handoff service with Twilio
When the voice agent determines a caller needs human attention, AgentHandoffService sends an SMS with a transcript summary via Twilio. It uses @reaatech/agent-handoff’s withRetry for resilient API calls and TypedEventEmitter for typed event observability (handoff:started, handoff:completed, handoff:failed).
Expected output: createAgentHandoffService() factory and AgentHandoffService type at src/services/agent-handoff.service.ts.
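The retry-plus-events behavior can be illustrated with a small standalone function. This is not the withRetry helper from @reaatech/agent-handoff, whose signature is not shown in this tutorial. `sendWithRetry`, `backoffDelays`, and the exponential backoff schedule are assumptions made for this sketch; only the three event names come from the text above.

```typescript
type HandoffEvent = "handoff:started" | "handoff:completed" | "handoff:failed";

// Delay before each retry: baseMs, 2 * baseMs, 4 * baseMs, ...
function backoffDelays(maxRetries: number, baseMs: number): number[] {
  return Array.from({ length: maxRetries }, (_, i) => baseMs * 2 ** i);
}

// Calls `send` until it succeeds or the retries are exhausted, emitting lifecycle events.
async function sendWithRetry<T>(
  send: () => Promise<T>,
  emit: (event: HandoffEvent) => void,
  maxRetries = 3,
  baseMs = 200,
): Promise<T> {
  emit("handoff:started");
  const delays = backoffDelays(maxRetries, baseMs);
  for (let attempt = 0; ; attempt++) {
    try {
      const result = await send();
      emit("handoff:completed");
      return result;
    } catch (err) {
      if (attempt >= delays.length) {
        emit("handoff:failed");
        throw err; // out of retries: surface the last error
      }
      await new Promise((resolve) => setTimeout(resolve, delays[attempt]));
    }
  }
}

console.log(backoffDelays(3, 200)); // [ 200, 400, 800 ]
```

In the handoff service, `send` would be the Twilio SMS call, and the emitted events are what an observer would subscribe to.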
Step 11: Build the VoiceAgentOrchestrator
The orchestrator ties together every service module. It manages the call lifecycle: handleIncomingCall() initializes a session and defines a budget; processTranscript() classifies caller intent; generateAgentResponse() checks cache first, then budget, then calls Grok — storing the response in cache afterward; evaluateEscalation() decides whether the caller needs a human; and endSession() resets the budget.
typescript
// src/services/voice-agent.service.ts
import { BudgetScope } from "@reaatech/agent-budget-types";
import { initLangfuse, traceObservable } from "../lib/langfuse.js";
import type { GrokClient } from "../lib/grok.js";
import type { IntentRouter } from "./confidence-router.service.js";
import type { CallBudgetManager } from "./budget-engine.service.js";
import type { LlmCacheManager } from "./llm-cache.service.js";
import type { AgentHandoffService } from "./agent-handoff.service.js";
import type { CallSession, CallIntent, BudgetRecord } from "../lib/types.js";

export class VoiceAgentOrchestrator {
  private
Expected output: VoiceAgentOrchestrator class at src/services/voice-agent.service.ts.
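The ordering inside generateAgentResponse() (cache, then budget, then Grok, then store) is the part most worth getting right. The function below sketches that ordering against hypothetical interfaces. `CacheLike`, `BudgetLike`, `LlmCall`, `generateReply`, and the callback message are illustrative names and values, not the orchestrator's actual members.

```typescript
// Hypothetical minimal interfaces standing in for the real service modules.
interface CacheLike {
  get(prompt: string): string | undefined;
  set(prompt: string, response: string): void;
}
interface BudgetLike {
  preFlightCheck(estimatedCents: number): boolean;
  recordSpend(actualCents: number): void;
}
type LlmCall = (prompt: string) => Promise<{ text: string; costCents: number }>;

interface AgentReply {
  source: "cache" | "llm" | "budget-exceeded";
  text: string;
}

async function generateReply(
  prompt: string,
  deps: { cache: CacheLike; budget: BudgetLike; callLlm: LlmCall },
  estimatedCents: number,
): Promise<AgentReply> {
  // 1. Cache first: a hit costs nothing and never touches the budget.
  const cached = deps.cache.get(prompt);
  if (cached !== undefined) return { source: "cache", text: cached };

  // 2. Budget next: refuse before spending, not after.
  if (!deps.budget.preFlightCheck(estimatedCents)) {
    return { source: "budget-exceeded", text: "Let me have someone call you back." };
  }

  // 3. Call the model, record what it actually cost, then cache the answer.
  const { text, costCents } = await deps.callLlm(prompt);
  deps.budget.recordSpend(costCents);
  deps.cache.set(prompt, text);
  return { source: "llm", text };
}
```

Checking the cache before the budget means a caller who has exhausted the per-call budget can still get answers to common questions.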
Step 12: Create the LiveKit webhook route
When a call arrives, LiveKit dispatches an agent_dispatch event to this route. The handler authenticates the webhook using livekit-server-sdk’s WebhookReceiver, extracts the caller phone from metadata, wires up all six service modules, and boots a VoiceAgentOrchestrator instance.
Expected output: POST handler at app/api/webhook/voice/route.ts accepting LiveKit dispatch events.
Step 13: Create the Twilio status callback route
When Twilio delivers your escalation SMS, it sends a delivery-status webhook to this route. The handler validates the request signature using twilio.validateRequest() to prevent forgery.
Expected output: POST handler at app/api/twilio/callback/route.ts validating Twilio signatures.
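It is worth knowing what the signature check computes. Per Twilio's documented scheme for form-encoded POST webhooks, the signature is an HMAC-SHA1 over the full request URL followed by each POST parameter name and value (sorted by name, no delimiters), keyed with your auth token and Base64-encoded. The sketch below reimplements that with node:crypto for illustration only. The function names are hypothetical, and the route itself should keep using twilio.validateRequest().

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Illustrative reimplementation of the signing scheme; names are hypothetical.
function computeTwilioSignature(authToken: string, url: string, params: Record<string, string>): string {
  // URL + key1 + value1 + key2 + value2 ... with keys in sorted order.
  const data = Object.keys(params)
    .sort()
    .reduce((acc, key) => acc + key + params[key], url);
  return createHmac("sha1", authToken).update(data, "utf8").digest("base64");
}

function isValidTwilioSignature(
  authToken: string,
  signature: string, // value of the X-Twilio-Signature header
  url: string,
  params: Record<string, string>,
): boolean {
  const expected = Buffer.from(computeTwilioSignature(authToken, url, params));
  const received = Buffer.from(signature);
  // Constant-time comparison; lengths must match before timingSafeEqual is called.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

Because the URL is part of the signed data, the URL you validate against must match the one Twilio called exactly, including scheme and query string. This is a common source of false rejections behind proxies.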
Step 14: Create the barrel export
A central src/index.ts re-exports every service so consumers (or your test files) can import from a single location.
typescript
// src/index.ts
export { createGrokClient } from "./lib/grok.js";
export type { GrokClient } from "./lib/grok.js";
export { GrokPricingProvider, createPricingProvider } from "./lib/pricing-provider.js";
export { initLangfuse, traceObservable } from "./lib/langfuse.js";
export { createIntentRouter } from "./services/confidence-router.service.js";
export type { IntentRouter } from "./services/confidence-router.service.js";
export { createBudgetController } from "./services/budget-engine.service.js";
export type { CallBudgetManager } from "./services/budget-engine.service.js";
export { createLlmCache } from "./services/llm-cache.service.js";
export type { LlmCacheManager } from "./services/llm-cache.service.js";
export { createAgentHandoffService } from "./services/agent-handoff.service.js";
export type { AgentHandoffService } from "./services/agent-handoff.service.js";
export { VoiceAgentOrchestrator } from "./services/voice-agent.service.js";
export type {
  CallSession,
  CallIntent,
  IntentLabel,
  HandoffRequest,
  VoiceAgentConfig,
  BudgetRecord,
} from "./lib/types.js";
Expected output: src/index.ts re-exporting all 6 service/layer modules and their types.
Step 15: Run the tests
The test suite uses Vitest with MSW for mocking external HTTP endpoints. It covers every service module and route handler with happy-path, error, and boundary tests, targeting 90%+ coverage across lines, branches, functions, and statements.
terminal
pnpm test
Expected output: All 99 tests pass with zero failures.
The test suite exercises: Grok client response parsing and error handling, pricing provider cost estimation, Langfuse singleton creation and span tracing, confidence router keyword classification and config updates, budget controller spend tracking and hard-stop events, LLM cache exact-match and segmentation, agent handoff SMS sending via MSW-mocked Twilio, voice agent orchestrator full flow (cache hit, budget block, escalation), and both webhook routes (valid dispatch, unauthorized, signature validation).
Next steps
Add a database adapter: Replace the in-memory cache and spend tracker with Redis or SQLite so state survives server restarts. @reaatech/llm-cache supports Redis and PostgreSQL adapters out of the box.
Extend intent classification: Add more keyword classifiers for industry-specific intents (return, quote, support ticket) or plug in an LLM-based classifier for higher accuracy on ambiguous transcripts.
Build a dashboard: Create a Next.js page that displays real-time call sessions, budget consumption, cache hit rates, and escalation history by querying Langfuse’s public API.
Add a webhook retry queue: Wrap the LiveKit dispatch handler in a Bull/BullMQ job queue so transient failures don’t drop incoming calls.
"You are a friendly AI receptionist for a small business. Greet the caller, understand their needs, answer questions about products/services, and help book appointments. Keep responses conversational and under 3 sentences.";