In this tutorial, you’ll build a voice agent that answers phone calls, understands natural-language booking requests with xAI Grok, and creates jobs in Jobber — all in under 300 lines of TypeScript. You’ll wire together Deepgram for speech-to-text, ElevenLabs for text-to-speech, a confidence-based routing engine to handle ambiguity, and structured repair to recover from imperfect LLM output. By the end, a customer can call your Twilio number, say “Fix my AC next Tuesday,” and have a job appear in Jobber without anyone touching a keyboard.
Prerequisites
Node.js >= 22 and pnpm 10 installed
A Twilio account with a phone number that has Voice webhooks enabled
Deepgram API key (for STT)
ElevenLabs API key (for TTS)
xAI API key (for Grok)
A Jobber account with an OAuth app (client ID + client secret)
(Optional) A Langfuse account for observability
Familiarity with TypeScript, Next.js App Router, Express, and WebSocket basics
Step 1: Scaffold the project and configure environment variables
Start by creating package.json with every dependency pinned to an exact version, then run pnpm install. The project uses Next.js 16 App Router, Express for the voice webhook server, and nine @reaatech/* packages for the voice agent pipeline.
Expected output: pnpm creates node_modules/ and a pnpm-lock.yaml with every dependency pinned to the exact versions above.
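The environment variables for this step are not listed above. As a minimal sketch, the check below fails fast at startup when a key is missing. Most of the variable names are assumptions: DEEPGRAM_API_KEY, ELEVENLABS_API_KEY, JOBBER_CLIENT_ID and JOBBER_CLIENT_SECRET. Three are grounded: XAI_API_KEY is the variable @ai-sdk/xai reads by default, and WS_PORT and LANGFUSE_SECRET_KEY are read by src/instrumentation.ts in Step 11. loadConfig is a hypothetical helper, not part of the recipe.

```typescript
// Hypothetical variable names, except XAI_API_KEY (the default key @ai-sdk/xai reads)
// and WS_PORT / LANGFUSE_SECRET_KEY (both read in src/instrumentation.ts).
const REQUIRED_KEYS = [
  "DEEPGRAM_API_KEY",
  "ELEVENLABS_API_KEY",
  "XAI_API_KEY",
  "JOBBER_CLIENT_ID",
  "JOBBER_CLIENT_SECRET",
] as const;

type RequiredKey = (typeof REQUIRED_KEYS)[number];

interface AppConfig {
  secrets: Record<RequiredKey, string>;
  wsPort: number;
  langfuseEnabled: boolean;
}

// Report every missing key in one error, then validate WS_PORT (default 3001).
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const missing = REQUIRED_KEYS.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const wsPort = Number.parseInt(env.WS_PORT ?? "3001", 10);
  if (!Number.isInteger(wsPort) || wsPort < 1 || wsPort > 65535) {
    throw new Error(`WS_PORT must be a TCP port between 1 and 65535, got "${env.WS_PORT ?? ""}"`);
  }
  const secrets = Object.fromEntries(
    REQUIRED_KEYS.map((key) => [key, env[key] ?? ""]),
  ) as Record<RequiredKey, string>;
  return { secrets, wsPort, langfuseEnabled: Boolean(env.LANGFUSE_SECRET_KEY) };
}
```

In the server, call loadConfig(process.env) once during boot and pass the result to the factories. A missing key then surfaces as one clear error at startup instead of a failed call mid-conversation.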
Step 2: Enable the Next.js instrumentation hook
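Create next.config.ts. The recipe starts an Express server at boot time through Next.js instrumentation, that is, the register() function exported from src/instrumentation.ts. On Next.js 14 and earlier, instrumentation had to be switched on with the experimental.instrumentationHook flag. A misspelling such as clientInstrumentationHook or instrumentation silently left register() as dead code.
Since Next.js 15, instrumentation is stable and enabled by default. With the Next.js 16 setup used here, register() therefore runs without any flag. If experimental.instrumentationHook is still present in the config, Next.js may warn that it is no longer needed.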
Step 3: Define the shared types
Create src/lib/types.ts with the enums and interfaces that flow through every layer of the voice agent:
Expected output: These types are imported by every downstream module. JobberCredentials tracks OAuth 2.0 token expiry, JobberJobInput is the shape you’ll POST to the Jobber API, and VoiceSessionContext carries the state of a single phone call’s booking flow.
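The contents of types.ts are not reproduced here. One plausible shape, inferred from how later steps use these types, is sketched below. Two parts are grounded: JobberJobInput mirrors the object GrokMCPClient passes to createJob() in Step 7, and the 60-second refresh margin comes from Step 5. Everything else is an assumption: the BookingIntent members, the JobberCredentials field names, the VoiceSessionContext fields, and the needsRefresh helper.

```typescript
// Hypothetical members: only "book_repair" appears elsewhere in the recipe.
enum BookingIntent {
  BookRepair = "book_repair",
  Unknown = "unknown",
}

// Assumed shape: an access token plus its absolute expiry in epoch milliseconds.
interface JobberCredentials {
  accessToken: string;
  expiresAt: number;
}

// Mirrors the fields GrokMCPClient passes to createJob() in Step 7.
interface JobberJobInput {
  title: string;
  description: string;
  customerName: string;
  customerPhone: string;
  customerEmail?: string;
  address?: string;
  scheduledDate?: string;
}

// Hypothetical per-call state: which call this is and the booking assembled so far.
interface VoiceSessionContext {
  callSid: string;
  intent: BookingIntent;
  draft: Partial<JobberJobInput>;
}

// Refresh when the token expires within 60 seconds (the margin described in Step 5).
const REFRESH_MARGIN_MS = 60_000;

function needsRefresh(creds: JobberCredentials, nowMs: number = Date.now()): boolean {
  return creds.expiresAt - nowMs <= REFRESH_MARGIN_MS;
}
```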
Step 4: Create the conversation session manager
Create src/lib/session.ts. The session manager maintains multi-turn conversation history — crucial because a caller might say “I need my AC fixed” on turn 1 and “Next Tuesday afternoon” on turn 2. You’ll use @reaatech/session-continuity with an in-memory storage adapter and a Tiktoken tokenizer.
ts
import { SessionManager, type Message, SessionNotFoundError } from "@reaatech/session-continuity";
import { MemoryAdapter } from "@reaatech/session-continuity-storage-memory";
import { TiktokenTokenizer } from "@reaatech/session-continuity-tokenizers";
export function createConversationSessionManager(): SessionManager {
  return new SessionManager({
    // In-memory storage: sessions do not survive a server restart.
    storage: new MemoryAdapter(),
    // Count tokens with the GPT-4 Tiktoken encoding.
    tokenCounter: new TiktokenTokenizer("gpt-4"),
    // 4096-token budget; compress history instead of rejecting on overflow.
    tokenBudget: {
      maxTokens: 4096,
      reserveTokens: 500,
      overflowStrategy: "compress",
    },
    compression: {
      strategy: "sliding_window",
      targetTokens: 3500,
    },
    // Expire sessions after 3600 seconds (one hour).
    sessionTTL: 3600,
  });
}
export async function addTurn(
  manager: SessionManager,
  sessionId: string,
  role: "user" | "assistant" | "system" | "tool",
  content: string
): Promise<Message> {
  return manager.addMessage(sessionId, { role, content });
}
export async function getContext(manager: SessionManager, sessionId: string): Promise<Message[]> {
  return manager.getConversationContext(sessionId);
}
export async function endConversation(manager: SessionManager, sessionId: string): Promise<void> {
  await manager.endSession(sessionId);
}
export { SessionNotFoundError };
Expected output: The manager enforces a 4096-token budget per session, with 500 tokens held in reserve. When that budget overflows, it compresses history with a sliding window down to about 3500 tokens. Sessions expire after one hour (sessionTTL: 3600) of inactivity.
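The compression internals of @reaatech/session-continuity are not documented here. The sketch below illustrates the general sliding-window idea, not the library's actual algorithm: it keeps system turns, then as many of the newest turns as fit a target budget. The 4-characters-per-token estimate is a crude stand-in for a real tokenizer such as Tiktoken, and slidingWindow is a hypothetical name.

```typescript
interface Turn {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// Rough estimate of about 4 characters per token; a real tokenizer such as Tiktoken is more accurate.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep all system turns, then the newest other turns that still fit in targetTokens.
function slidingWindow(
  turns: Turn[],
  targetTokens: number,
  countTokens: (text: string) => number = estimateTokens,
): Turn[] {
  const system = turns.filter((t) => t.role === "system");
  const others = turns.filter((t) => t.role !== "system");
  let budget = targetTokens - system.reduce((sum, t) => sum + countTokens(t.content), 0);
  const kept: Turn[] = [];
  // Walk from the most recent turn backwards so the caller's latest details survive.
  for (const turn of others.slice().reverse()) {
    const cost = countTokens(turn.content);
    if (cost > budget) break;
    budget -= cost;
    kept.unshift(turn);
  }
  return [...system, ...kept];
}
```

The loop stops at the first turn that does not fit rather than skipping it and continuing. This keeps the retained history contiguous, so the model never sees a later turn without the one before it.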
Step 5: Build the Jobber API client
Create src/lib/jobber-client.ts. The JobberClient handles OAuth 2.0 client-credentials authentication, automatic token refresh when the access token is within 60 seconds of expiry, and job creation with exponential-backoff retry on 429 and 5xx errors.
ts
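import type { JobberCredentials, JobberJobInput, JobberJobResponse } from "./types.js";
// Raised for authentication (OAuth token) failures.
export class JobberAuthError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "JobberAuthError";
  }
}
// Carries the HTTP status code and raw response body of a failed Jobber API call.
export class JobberApiError extends Error {
  statusCode: number;
  responseBody: string;
  constructor(statusCode: number, responseBody: string, message?: string) {
    super(message ?? `Jobber API error ${String(statusCode)}`);
    this.name = "JobberApiError";
    this.statusCode = statusCode;
    this.responseBody = responseBody;
  }
}
// Promise-based delay used between retry attempts.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
// The rest of the file (createJobberClient, isTokenExpired, and the JobberClient type re-exported in Step 12) is not shown.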
Expected output: The client retries failed job creation requests on 429 and 5xx status codes with exponential backoff (1s, 2s, 4s) up to three attempts. Non-retriable errors (4xx except 429) throw immediately.
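Here is a minimal sketch of the retry policy described above. It assumes a hypothetical withRetry helper and HttpError type; the recipe's client uses JobberApiError and its own method names. With the defaults, a request is tried once and then retried up to three times, waiting 1s, 2s, then 4s. Set maxRetries to 2 for three attempts in total. The sleep function is injectable so tests run instantly.

```typescript
// Hypothetical error type carrying the HTTP status of a failed request.
class HttpError extends Error {
  readonly status: number;
  constructor(status: number, body: string) {
    super(`HTTP ${String(status)}: ${body}`);
    this.name = "HttpError";
    this.status = status;
  }
}

// 429 (rate limited) and 5xx (server errors) are retried; other 4xx errors are not.
function isRetriable(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}

const realSleep = (ms: number): Promise<void> => new Promise((resolve) => setTimeout(resolve, ms));

interface RetryOptions {
  maxRetries?: number; // retries after the first attempt
  baseDelayMs?: number;
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

async function withRetry<T>(request: () => Promise<T>, options: RetryOptions = {}): Promise<T> {
  const { maxRetries = 3, baseDelayMs = 1000, sleep = realSleep } = options;
  for (let attempt = 0; ; attempt++) {
    try {
      return await request();
    } catch (err) {
      const retriable = err instanceof HttpError && isRetriable(err.status);
      // Non-retriable errors, and the last failure once retries run out, are rethrown unchanged.
      if (!retriable || attempt >= maxRetries) throw err;
      // Exponential backoff: baseDelayMs * 2^attempt, i.e. 1s, 2s, 4s by default.
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}
```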
Step 6: Implement the confidence router for booking intent
Create src/lib/router.ts. The confidence router classifies the caller’s speech into booking-related intents using keyword matching, then routes based on confidence thresholds.
Expected output: When a caller says “fix my AC,” the router returns { type: "ROUTE", target: "book_repair" }. If they say “hello,” it returns { type: "CLARIFY", target: "book_repair" } (the configured fallback). Empty transcripts get { type: "FALLBACK", target: "empty_transcript" }.
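The router's code is not shown in this step. The self-contained sketch below illustrates keyword matching plus a confidence threshold, and it reproduces the three documented outcomes. The keyword list, the hits-to-confidence formula, and the 0.7 threshold are assumptions. routeUtterance is a hypothetical stand-in, not the @reaatech/confidence-router API.

```typescript
interface RouteDecision {
  type: "ROUTE" | "CLARIFY" | "FALLBACK";
  target: string;
  confidence: number;
}

// Hypothetical keyword list; add more labels to support more intents.
const KEYWORDS: Record<string, string[]> = {
  book_repair: ["fix", "repair", "broken", "ac", "heater", "furnace", "leak"],
};
const ROUTE_THRESHOLD = 0.7; // assumed cut-off for routing without asking
const FALLBACK_TARGET = "book_repair";

function routeUtterance(transcript: string): RouteDecision {
  const words = transcript.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  if (words.length === 0) {
    return { type: "FALLBACK", target: "empty_transcript", confidence: 0 };
  }
  // Score each label by keyword hits; two or more hits count as full confidence.
  let best = { label: FALLBACK_TARGET, confidence: 0 };
  for (const [label, keywords] of Object.entries(KEYWORDS)) {
    const hits = words.filter((w) => keywords.includes(w)).length;
    const confidence = Math.min(1, hits / 2);
    if (confidence > best.confidence) best = { label, confidence };
  }
  if (best.confidence >= ROUTE_THRESHOLD) {
    return { type: "ROUTE", target: best.label, confidence: best.confidence };
  }
  // Too uncertain to act: ask about the best guess, or the configured fallback if nothing matched.
  return { type: "CLARIFY", target: best.label, confidence: best.confidence };
}
```

With this scoring, "Fix my AC next Tuesday" matches two keywords and routes directly. "my heater" matches one, scores 0.5, and gets a clarifying question instead.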
Step 7: Wire up the voice agent core — STT, TTS, Grok, and the pipeline
Create src/lib/agent.ts. This is the heart of the recipe. It configures Deepgram for speech-to-text, ElevenLabs for text-to-speech, and xAI Grok for conversational intelligence. It also sets a latency budget that targets 800ms for the STT→MCP→TTS loop (200ms + 400ms + 200ms), with a 1200ms hard cap.
First, the imports, schema, and system prompt:
ts
import {
  createPipeline,
  createLatencyBudget,
  initializeSessionManager,
  defineConfig,
  LatencyBudgetEnforcer,
  type MCPClient,
  type AgentResponse,
  type PipelineEvent,
} from "@reaatech/voice-agent-core";
import { DeepgramSTTProvider } from "@reaatech/voice-agent-stt";
import { ElevenLabsTTSProvider } from "@reaatech/voice-agent-tts";
import { xai } from "@ai-sdk/xai";
import { generateText } from "ai";
import { repair, isValid } from "@reaatech/structured-repair-core";
import { z } from "zod";
import type { ConfidenceRouter } from "@reaatech/confidence-router";
import type { JobberClient } from "./jobber-client.js";
import type { SessionManager } from "@reaatech/session-continuity";
import { addTurn, getContext } from "./session.js";
// Booking details Grok must return; z.email() is the Zod 4 top-level email validator.
export const JobberPayloadSchema = z.object({
  title: z.string(),
  description: z.string(),
  customerName: z.string(),
  customerPhone: z.string().optional(),
  customerEmail: z.email().optional(),
  address: z.string().optional(),
  scheduledDate: z.string().optional(),
  confidence: z.number().min(0).max(1),
});
export type JobberPayload = z.infer<typeof JobberPayloadSchema>;
export { isValid };
const systemPromptText = `You are a voice agent for a field service booking system. Extract the customer's request as structured booking details.
Respond ONLY with valid JSON matching this schema:
{
  "title": "short job title",
  "description": "detailed description of the issue",
  "customerName": "customer name if mentioned",
  "customerPhone": "customer phone if mentioned",
  "customerEmail": "customer email if mentioned",
  "address": "service address if mentioned",
  "scheduledDate": "preferred date if mentioned (ISO date)",
  "confidence": 0.0-1.0
}`;
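Even when told to respond ONLY with valid JSON, models sometimes wrap the object in prose or Markdown code fences, or return confidence as a string. The library's repair strategy is not described here. The sketch below is a simplified stand-in using hypothetical helpers, not @reaatech/structured-repair-core. It parses the span from the first { to the last } and clamps confidence into [0, 1]. Schema validation would still be Zod's job.

```typescript
type JsonObject = Record<string, unknown>;

// Parse the span from the first "{" to the last "}", ignoring surrounding prose or Markdown fences.
function extractJsonObject(text: string): JsonObject | null {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) return null;
  try {
    const parsed: unknown = JSON.parse(text.slice(start, end + 1));
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) return null;
    return parsed as JsonObject;
  } catch {
    return null;
  }
}

// Coerce confidence to a number in [0, 1]; missing or non-numeric values become 0.
function normalizeConfidence(obj: JsonObject): JsonObject {
  const raw = obj.confidence;
  const num = typeof raw === "string" ? Number.parseFloat(raw) : raw;
  const confidence = typeof num === "number" && Number.isFinite(num) ? Math.min(1, Math.max(0, num)) : 0;
  return { ...obj, confidence };
}
```

Returning null instead of throwing lets a caller take the same low-confidence fallback path that GrokMCPClient uses when repair() fails.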
The GrokMCPClient class implements the MCPClient interface from @reaatech/voice-agent-core. Its sendRequest method works in four stages:
- It loads the stored conversation history and asks xAI Grok for structured JSON.
- It repairs malformed responses with @reaatech/structured-repair-core, falling back to a low-confidence payload if repair fails.
- It runs the caller's utterance through the confidence router.
- It creates a Jobber job only when the router returns ROUTE and Grok's self-reported confidence is at least 0.8.
ts
export class GrokMCPClient implements MCPClient {
  constructor(
    private convSessionManager: SessionManager,
    private confidenceRouter: ConfidenceRouter,
    private jobberClient: JobberClient,
    private jobberClientId: string,
    private jobberClientSecret: string,
  ) {}
  async connect(): Promise<void> {}
  async sendRequest(params: {
    sessionId: string;
    turnId: string;
    utterance: string;
    history: Array<{ role: string; content: string }>;
  }): Promise<AgentResponse> {
    const startMs = Date.now();
    // Rebuild the prompt from the stored multi-turn history (params.history is not used).
    const history = await getContext(this.convSessionManager, params.sessionId);
    const messages: Array<{ role: "user" | "assistant" | "system"; content: string }> = [];
    messages.push({ role: "system", content: systemPromptText });
    for (const msg of history) {
      // Anything that is not a user turn (assistant, system, tool) is replayed as assistant context.
      const role = msg.role === "user" ? "user" : "assistant";
      messages.push({ role, content: typeof msg.content === "string" ? msg.content : JSON.stringify(msg.content) });
    }
    messages.push({ role: "user", content: params.utterance });
    const { text } = await generateText({
      model: xai("grok-3"),
      messages,
    });
    // Repair malformed JSON; if repair fails, fall back to a low-confidence payload.
    let payload: JobberPayload;
    try {
      payload = await repair(JobberPayloadSchema, text);
    } catch {
      payload = { title: "Field Service Request", description: params.utterance, customerName: "Unknown", confidence: 0.3 };
    }
    await addTurn(this.convSessionManager, params.sessionId, "user", params.utterance);
    await addTurn(this.convSessionManager, params.sessionId, "assistant", JSON.stringify(payload));
    const decision = await this.confidenceRouter.process(params.utterance);
    // Create the job only when the router routes AND Grok's extraction is confident.
    if (decision.type === "ROUTE" && payload.confidence >= 0.8) {
      try {
        const creds = await this.jobberClient.getAccessToken(this.jobberClientId, this.jobberClientSecret);
        const job = await this.jobberClient.createJob(creds, {
          title: payload.title,
          description: payload.description,
          customerName: payload.customerName,
          customerPhone: payload.customerPhone ?? "",
          customerEmail: payload.customerEmail,
          address: payload.address,
          scheduledDate: payload.scheduledDate,
        });
        return { text: `Job created: ${job.id}. ${payload.title} scheduled.`, toolCalls: [], latencyMs: Date.now() - startMs };
      } catch {
        return {
          text: "I'm sorry, I couldn't create the job in Jobber right now. Please try again later.",
          toolCalls: [],
          latencyMs: Date.now() - startMs,
        };
      }
    }
    if (decision.type === "CLARIFY") {
      // Turn a label like "book_repair" into speakable words ("book repair") for TTS.
      const target = (decision.target ?? "something").replace(/_/g, " ");
      return {
        text: `I'd like to clarify: are you looking to ${target}? Could you provide more details?`,
        toolCalls: [],
        latencyMs: Date.now() - startMs,
      };
    }
    return { text: "I'm not sure I understood. Let me transfer you to a human operator.", toolCalls: [], latencyMs: Date.now() - startMs };
  }
  async close(): Promise<void> {}
}
The factory functions wire up the STT/TTS providers, call-session manager, latency enforcer, and the full voice pipeline:
Expected output: The pipeline orchestrates the full STT→MCP→TTS lifecycle. Each stage has its own latency budget (STT: 200ms, MCP: 400ms, TTS: 200ms) with a total hard cap of 1200ms. Barge-in is enabled so the caller can interrupt the agent mid-speech.
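How LatencyBudgetEnforcer reacts to an overrun is not described here. The sketch below shows one plausible model using hypothetical names (TurnBudget, LatencyCapExceeded). The 200/400/200ms stage budgets are treated as soft targets whose overruns are recorded. The 1200ms hard cap is enforced by racing each stage against a timer that covers whatever time the turn has left.

```typescript
// Per-stage targets from the recipe (800ms total) and the 1200ms hard cap for a whole turn.
const STAGE_BUDGET_MS = { stt: 200, mcp: 400, tts: 200 } as const;
const HARD_CAP_MS = 1200;

type Stage = keyof typeof STAGE_BUDGET_MS;

class LatencyCapExceeded extends Error {
  constructor(stage: Stage) {
    super(`Turn exceeded the ${String(HARD_CAP_MS)}ms hard cap during ${stage}`);
    this.name = "LatencyCapExceeded";
  }
}

// Tracks one STT -> MCP -> TTS turn: stage budgets are soft (overruns are recorded),
// while the hard cap is enforced by racing each stage against a timer.
class TurnBudget {
  private spentMs = 0;
  readonly overruns: Stage[] = [];

  async runStage<T>(stage: Stage, work: () => Promise<T>): Promise<T> {
    const remainingMs = Math.max(0, HARD_CAP_MS - this.spentMs);
    const startedAt = Date.now();
    let timer: ReturnType<typeof setTimeout> | undefined;
    const cap = new Promise<never>((_, reject) => {
      timer = setTimeout(() => reject(new LatencyCapExceeded(stage)), remainingMs);
    });
    try {
      // Losing the race does not cancel `work`; production code would also pass an AbortSignal.
      return await Promise.race([work(), cap]);
    } finally {
      clearTimeout(timer); // never leave a pending timer behind
      const tookMs = Date.now() - startedAt;
      this.spentMs += tookMs;
      if (tookMs > STAGE_BUDGET_MS[stage]) this.overruns.push(stage);
    }
  }

  get elapsedMs(): number {
    return this.spentMs;
  }
}
```

Because the cap timer only covers the remaining time, a slow STT stage leaves less room for MCP and TTS. That is what distinguishes a whole-turn cap from independent per-stage timeouts.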
Step 8: Create the Twilio voice webhook route
Create src/api/voice/route.ts. When a call arrives, Twilio POSTs form-encoded data to your webhook URL. This route handler creates a conversation session and returns TwiML that instructs Twilio to open a WebSocket Media Stream.
Expected output: A POST to the voice webhook endpoint with a valid CallSid returns TwiML that tells Twilio to connect to wss://<host>:3001/media. Missing CallSid returns a 400 JSON error. The handleVoiceWebhook export is used by the Express server (Step 10); the POST export is available as a Next.js route handler.
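The route's source is not reproduced here. Below is a framework-free sketch of the same behavior, assuming a hypothetical handleIncomingCall that receives the raw form-encoded request body and the public host name. It reads CallSid with URLSearchParams and returns the 400 JSON error when it is missing. Otherwise it returns TwiML using <Connect><Stream>, the Twilio verb for a bidirectional Media Stream. Session creation is omitted.

```typescript
interface WebhookResult {
  status: number;
  contentType: string;
  body: string;
}

// Escape characters that are not allowed verbatim inside an XML attribute value.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// <Connect><Stream> asks Twilio to open a bidirectional Media Stream to streamUrl.
function buildStreamTwiml(streamUrl: string): string {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    "  <Connect>",
    `    <Stream url="${escapeXml(streamUrl)}" />`,
    "  </Connect>",
    "</Response>",
  ].join("\n");
}

// Hypothetical handler: Twilio POSTs application/x-www-form-urlencoded fields such as CallSid.
function handleIncomingCall(rawBody: string, host: string, wsPort = 3001): WebhookResult {
  const callSid = new URLSearchParams(rawBody).get("CallSid");
  if (!callSid) {
    // The exact error message is an assumption; the recipe only specifies a 400 JSON error.
    return { status: 400, contentType: "application/json", body: JSON.stringify({ error: "Missing CallSid" }) };
  }
  // The real route would create the conversation session for callSid here.
  return { status: 200, contentType: "text/xml", body: buildStreamTwiml(`wss://${host}:${String(wsPort)}/media`) };
}
```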
Step 9: Build the WebSocket server for media streams
Create src/server/websocket.ts. Twilio's Media Streams protocol carries the call audio over a WebSocket as JSON messages with base64-encoded 8 kHz mu-law payloads. The server creates a TwilioMediaStreamHandler from @reaatech/voice-agent-telephony, wires it to the voice pipeline, and sends TTS audio back to the caller.
Expected output: The WebSocket server listens on the port from WS_PORT (default 3001). When a call starts, it creates a session, pipes audio through the pipeline for speech recognition → Grok reasoning → TTS synthesis, and streams the synthesized speech back to the caller. Barge-in detection cancels any in-progress TTS and clears the audio buffer when the caller speaks over the agent.
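TwilioMediaStreamHandler hides the wire format. For reference, Twilio sends JSON text frames whose event field is connected, start, media, mark, or stop. Caller audio arrives in media.payload as base64-encoded 8 kHz mu-law. The same socket accepts outbound media frames, plus a clear frame that discards audio Twilio has buffered but not yet played; barge-in relies on that frame. The helpers below are hypothetical and independent of any WebSocket library.

```typescript
type Inbound =
  | { kind: "start"; streamSid: string; callSid: string }
  | { kind: "audio"; streamSid: string; audio: Buffer }
  | { kind: "stop"; streamSid: string }
  | { kind: "ignored"; event: string };

interface TwilioFrame {
  event: string;
  streamSid?: string;
  start?: { streamSid: string; callSid: string };
  media?: { payload: string };
}

// Parse one text frame from Twilio and decode inbound audio (base64 mu-law, 8 kHz, mono).
function parseTwilioFrame(raw: string): Inbound {
  const frame = JSON.parse(raw) as TwilioFrame;
  switch (frame.event) {
    case "start":
      if (!frame.start) throw new Error("start frame without start block");
      return { kind: "start", streamSid: frame.start.streamSid, callSid: frame.start.callSid };
    case "media":
      if (!frame.media || !frame.streamSid) throw new Error("malformed media frame");
      return { kind: "audio", streamSid: frame.streamSid, audio: Buffer.from(frame.media.payload, "base64") };
    case "stop":
      return { kind: "stop", streamSid: frame.streamSid ?? "" };
    default:
      // "connected", "mark", and any other events are not needed for this pipeline.
      return { kind: "ignored", event: frame.event };
  }
}

// Outbound audio must also be base64-encoded mu-law at 8 kHz.
function buildMediaMessage(streamSid: string, mulawAudio: Buffer): string {
  return JSON.stringify({ event: "media", streamSid, media: { payload: mulawAudio.toString("base64") } });
}

// On barge-in, "clear" tells Twilio to drop audio it has buffered but not yet played.
function buildClearMessage(streamSid: string): string {
  return JSON.stringify({ event: "clear", streamSid });
}
```

In a connection handler, you would pass each incoming message string to parseTwilioFrame and remember the streamSid from the start frame. As soon as the caller starts speaking over the agent, send buildClearMessage(streamSid).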
Step 10: Assemble the Express server
Create src/server/jobber-glue.ts. The Express glue server mounts the voice webhook route, the health check, and the WebSocket server on a single HTTP server.
Expected output: The /api/voice/incoming POST route delegates to the handleVoiceWebhook function from Step 8, and WebSocket upgrades at /media are forwarded to the WebSocketServer. buildApplication() returns { app, server, wss, pipeline, sessionManager } so tests can reach each component.
Step 11: Set up instrumentation with Langfuse and server startup
Create src/instrumentation.ts. Next.js instrumentation runs once at server startup. Here it initializes Langfuse observability (if configured) and starts the Express glue server.
ts
export async function register() {
  if (process.env.NEXT_RUNTIME === "nodejs") {
    if (process.env.LANGFUSE_SECRET_KEY) {
      const { default: Langfuse } = await import("langfuse");
      const langfuse = new Langfuse({
        publicKey: process.env.LANGFUSE_PUBLIC_KEY ?? "",
        secretKey: process.env.LANGFUSE_SECRET_KEY,
        baseUrl: process.env.LANGFUSE_BASE_URL ?? "https://cloud.langfuse.com",
      });
      langfuse.trace({ name: "voice-agent-register" });
    }
    const { buildApplication } = await import("./server/jobber-glue.js");
    const { server } = buildApplication();
    const port = parseInt(process.env.WS_PORT ?? "3001", 10);
    const portStr = String(port);
    server.listen(port, () => {
      console.log(`Jobber glue server listening on port ${portStr}`);
    });
  }
}
The register() function guards itself with process.env.NEXT_RUNTIME === "nodejs" because Next.js runs instrumentation in both Node and Edge runtimes. Dynamic import() is used for modules that import Node-only APIs.
Expected output: When you run pnpm dev, the instrumentation starts the Express server on port 3001. If LANGFUSE_SECRET_KEY is set, it also initializes a Langfuse trace.
Step 12: Export the public API
Create src/index.ts so consumers can import everything from a single entry point:
ts
// Step 3: types
export { BookingIntent, type JobberCredentials, type JobberJobInput, type JobberJobResponse, type VoiceSessionContext } from "./lib/types.js";
// Step 5: Jobber client
export { createJobberClient, JobberAuthError, JobberApiError, isTokenExpired, type JobberClient } from "./lib/jobber-client.js";
// Step 6: router
export { createBookingRouter, routeBookingIntent } from "./lib/router.js";
// Step 4: session
export { createConversationSessionManager, addTurn, getContext, endConversation, SessionNotFoundError } from "./lib/session.js";
// Step 7: agent pipeline
export { createCallSessionManager, createDeepgramSTT, createElevenLabsTTS, createLatencyEnforcer, GrokMCPClient, createVoicePipeline, JobberPayloadSchema, isValid, type JobberPayload } from "./lib/agent.js";
// Step 9: WebSocket
export { createMediaStreamWSS } from "./server/websocket.js";
// Step 10: glue
export { buildApplication } from "./server/jobber-glue.js";
Expected output: Importing from the package gives you access to all factory functions (createJobberClient, createBookingRouter, createConversationSessionManager, createVoicePipeline, buildApplication), error classes (JobberAuthError, JobberApiError, SessionNotFoundError), and types (JobberPayload, JobberCredentials, etc.).
Step 13: Add the App Router health check and verify the build
Create app/api/health/route.ts as a simple App Router route that confirms the Next.js side is alive:
ts
import { NextResponse } from "next/server";
export function GET(): NextResponse {
  return NextResponse.json({ status: "ok" });
}
Now run the type checker and linter:
terminal
pnpm typecheck
pnpm lint
Expected output: Both commands exit 0. The type checker confirms that all imports from the @reaatech/* packages resolve correctly and that GrokMCPClient satisfies the MCPClient interface from @reaatech/voice-agent-core.
Step 14: Run the tests
The recipe includes a comprehensive test suite with mocked external services. Run it using the script defined in package.json:
terminal
pnpm test
Expected output: All tests pass.
Next steps
Persist sessions to a database — Swap MemoryAdapter for a RedisAdapter or PostgresAdapter from @reaatech/session-continuity so sessions survive server restarts.
Add a web dashboard — Build a Next.js page under app/ that lists active calls and recently created jobs using getActiveSessionCount() and the Jobber API.
Expand the intent classifier — Add more keyword labels to the KeywordClassifier or replace it with an LLM-powered classifier from @reaatech/confidence-router for more nuanced routing.
Deploy to production — Configure a Twilio Voice webhook URL pointing to your deployed server, set the environment variables, and point a real phone number at it.