Google Gemini Voice Agent for Twilio Call Handling

Handle inbound Twilio phone calls with a Gemini‑powered voice agent that understands speech and performs tasks like appointment booking or FAQ lookup.

typescript google-gemini

The problem

Small businesses miss after‑hours calls and can’t afford a 24/7 receptionist. They need an automated phone system that understands natural language and completes tasks without costly human staffing.

Intro

This recipe builds a Google Gemini voice agent that handles inbound Twilio phone calls in real time. When a caller speaks, Deepgram transcribes their words, Gemini classifies the intent, a confidence router decides what to do, and ElevenLabs speaks the response back — all over a bidirectional WebSocket media stream. The agent handles two primary intents: appointment booking (checking calendar availability and confirming a slot) and FAQ lookup (grounding responses in conversation history). All calls are traced through Langfuse for observability, and repeated phrases are served from an LLM cache to reduce latency and cost.
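The routing step in this pipeline can be pictured as a small pure function over the classifier's output. The following is a minimal sketch, not the recipe's actual router: the thresholds (0.75 and 0.4), the `RouteDecision` shape, and the intent labels used in the usage note are assumptions chosen for illustration.

```typescript
// Hypothetical shapes for illustration; the recipe's real types live in its own modules.
type Intent = { label: string; confidence: number };

type RouteDecision =
  | { action: "handle"; intent: string }
  | { action: "clarify"; candidates: string[] }
  | { action: "fallback" };

// Assumed thresholds; tune them against real call transcripts.
const HANDLE_THRESHOLD = 0.75;
const CLARIFY_THRESHOLD = 0.4;

function routeByConfidence(intents: Intent[]): RouteDecision {
  // Sort a copy, highest confidence first, so the caller's array is untouched.
  const ranked = [...intents].sort((a, b) => b.confidence - a.confidence);
  const top = ranked[0];

  // Nothing usable: hand off to a generic fallback response.
  if (!top || top.confidence < CLARIFY_THRESHOLD) {
    return { action: "fallback" };
  }

  // Confident enough to act on the top intent directly.
  if (top.confidence >= HANDLE_THRESHOLD) {
    return { action: "handle", intent: top.label };
  }

  // Middle band: ask the caller to confirm among the plausible intents.
  const candidates = ranked
    .filter((i) => i.confidence >= CLARIFY_THRESHOLD)
    .map((i) => i.label);
  return { action: "clarify", candidates };
}
```

With these assumed thresholds, a single intent such as `{ label: "faq_lookup", confidence: 0.9 }` is handled directly, while two intents at 0.5 and 0.45 would lead the agent to ask a clarifying question.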

Prerequisites

Node.js 22 or later
pnpm installed
A Twilio account with a phone number configured for webhook callbacks
A Google AI Studio API key for Gemini
A Deepgram API key for speech-to-text
An ElevenLabs API key for text-to-speech
An OpenAI API key for embeddings (used by agent-memory and llm-cache)
Optional: a Langfuse account for observability tracing

Step 1: Configure environment variables

Copy the .env.example file to .env.local and fill in your credentials.

terminal

cp .env.example .env.local

Edit .env.local with your real values. These are the environment variables the config module reads at startup:
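As a rough sketch of how a config module might read and validate such values at startup: the `Config` field names below mirror the properties the service classes in this recipe read (`googleApiKey`, `geminiModel`, and so on), but the environment variable names are assumptions made for illustration. The names in `.env.example` are authoritative.

```typescript
// Field names mirror the Config properties used by the service classes in this recipe.
// The environment variable NAMES are assumptions; check .env.example for the real ones.
type Config = {
  googleApiKey: string;
  geminiModel: string;
  openaiApiKey: string;
  twilioPhoneNumber: string;
  deepgramApiKey: string;
  deepgramModel: string;
  elevenlabsApiKey: string;
  elevenlabsVoiceId: string;
  elevenlabsModelId: string;
};

type Env = Record<string, string | undefined>;

// Fail fast with a clear message instead of surfacing an auth error mid-call.
function required(env: Env, name: string): string {
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

function loadConfig(env: Env): Config {
  return {
    googleApiKey: required(env, "GOOGLE_API_KEY"),
    geminiModel: required(env, "GEMINI_MODEL"),
    openaiApiKey: required(env, "OPENAI_API_KEY"),
    twilioPhoneNumber: required(env, "TWILIO_PHONE_NUMBER"),
    deepgramApiKey: required(env, "DEEPGRAM_API_KEY"),
    deepgramModel: required(env, "DEEPGRAM_MODEL"),
    elevenlabsApiKey: required(env, "ELEVENLABS_API_KEY"),
    elevenlabsVoiceId: required(env, "ELEVENLABS_VOICE_ID"),
    elevenlabsModelId: required(env, "ELEVENLABS_MODEL_ID"),
  };
}
```

In a Node.js process this would typically be called once as `loadConfig(process.env)`. Taking the environment as a parameter keeps the function easy to test.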

Example artifact

A complete, working implementation of this recipe — downloadable as a zip or browsable file by file. Generated by our build pipeline; tested with full coverage before publishing.

172 kB · 91 tests · 99.0% coverage · vitest passing

SHA-256: e98c91143282d919d8080117e2441ef60ae8a28f063344070ff6c2f6ccc84c07

// Gemini service: wraps @google/genai for text generation and
// function-call-based intent classification.
import { GoogleGenAI } from "@google/genai"
import type { ClassifierOutput } from "@reaatech/agent-mesh"
import type { Config } from "../lib/config.js"

export class GeminiServiceError extends Error {
  statusCode: number
  declare cause: unknown

  constructor(message: string, statusCode: number = 500, cause?: unknown) {
    super(message)
    this.name = "GeminiServiceError"
    this.statusCode = statusCode
    this.cause = cause
  }
}

export class GeminiService {
  private ai: GoogleGenAI
  private model: string

  constructor(config: Config) {
    this.ai = new GoogleGenAI({ apiKey: config.googleApiKey })
    this.model = config.geminiModel
  }

  async generateResponse(prompt: string): Promise<string> {
    try {
      const response = await this.ai.models.generateContent({
        model: this.model,
        contents: prompt,
      })
      if (!response.text) {
        throw new GeminiServiceError("Empty response from Gemini", 500)
      }
      return response.text
    } catch (err: unknown) {
      // Pass our own errors through unchanged; wrap everything else.
      if (err instanceof GeminiServiceError) throw err
      const e = err as { name?: string; message?: string; status?: number }
      throw new GeminiServiceError(e.message ?? "Gemini API error", e.status ?? 500, err)
    }
  }

  async classifyIntent(transcript: string, intentLabels: string[]): Promise<ClassifierOutput[]> {
    try {
      // Function declaration the model is asked to call with its classification.
      const declaration = {
        name: "classify_intent",
        parametersJsonSchema: {
          type: "object" as const,
          properties: {
            label: { type: "string" as const },
            confidence: { type: "number" as const },
          },
          required: ["label", "confidence"],
        },
      }
      const contents = `Classify this transcript into one of: ${intentLabels.join(", ")}.\nTranscript: ${transcript}`
      const response = await this.ai.models.generateContent({
        model: this.model,
        contents,
        config: {
          tools: [{ functionDeclarations: [declaration] }],
        },
      })

      // No function call came back: fall back to a zero-confidence general query.
      if (!response.functionCalls || response.functionCalls.length === 0) {
        return [
          {
            agent_id: "general_query",
            confidence: 0,
            ambiguous: false,
            detected_language: "en",
            intent_summary: "",
            entities: {},
          },
        ]
      }

      return response.functionCalls.map((fc) => ({
        agent_id: (fc.args as { label?: string }).label ?? "general_query",
        confidence: (fc.args as { confidence?: number }).confidence ?? 0,
        ambiguous: false,
        detected_language: "en",
        intent_summary: "",
        entities: {},
      }))
    } catch (err: unknown) {
      const e = err as { name?: string; message?: string; status?: number }
      throw new GeminiServiceError(e.message ?? "Gemini classification error", e.status ?? 500, err)
    }
  }

  // Rough heuristic: about four characters per token.
  estimateTokens(text: string): number {
    return Math.ceil(text.length / 4)
  }
}
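`classifyIntent` above casts the function-call arguments without checking them, so a label outside the requested list or a confidence outside the 0 to 1 range would pass straight through to the router. One way to harden this is a small validation helper. This is a sketch only: `parseIntentArgs` and the simplified `IntentResult` type are hypothetical and not part of `@google/genai` or `@reaatech/agent-mesh`.

```typescript
// Simplified stand-in for the ClassifierOutput fields that matter for routing.
type IntentResult = { agent_id: string; confidence: number };

// Same fallback the service uses when no function call is returned.
const FALLBACK: IntentResult = { agent_id: "general_query", confidence: 0 };

function parseIntentArgs(args: unknown, allowedLabels: string[]): IntentResult {
  if (typeof args !== "object" || args === null) {
    return { ...FALLBACK };
  }
  const { label, confidence } = args as { label?: unknown; confidence?: unknown };

  // Reject labels the model invented outside the requested set.
  if (typeof label !== "string" || !allowedLabels.includes(label)) {
    return { ...FALLBACK };
  }

  // Treat a missing or non-finite confidence as 0, then clamp into [0, 1].
  const score =
    typeof confidence === "number" && Number.isFinite(confidence) ? confidence : 0;
  return { agent_id: label, confidence: Math.min(1, Math.max(0, score)) };
}
```

Inside the `map` over `response.functionCalls`, `fc.args` could be passed through a helper like this before the remaining `ClassifierOutput` fields are filled in.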

// Memory service: stores conversation turns in @reaatech/agent-memory and
// retrieves them as call history or as context for FAQ answers.
import { AgentMemory } from "@reaatech/agent-memory"
import type { TurnEntry } from "@reaatech/agent-mesh"
import type { Config } from "../lib/config.js"

export class MemoryService {
  private memory: AgentMemory

  constructor(config: Config) {
    // The original wrapped this in a try/catch whose fallback repeated the
    // same construction verbatim; a single construction behaves the same.
    this.memory = new AgentMemory({
      storage: { provider: "memory" },
      embedding: {
        provider: "openai",
        model: "text-embedding-3-small",
        apiKey: config.openaiApiKey,
      },
      // LLM-based extraction is effectively disabled: no provider, no enabled types.
      extraction: {
        llmProvider: {} as never,
        enabledTypes: [],
        batchSize: 0,
        confidenceThreshold: 0,
      },
      tenantId: "voice-agent",
      ownerId: config.twilioPhoneNumber,
    })
  }

  // Note: _callSid is not used here, so turns are not partitioned per call.
  async storeCallTurn(_callSid: string, entry: TurnEntry): Promise<void> {
    await this.memory.extractAndStore([
      {
        speaker: entry.role === "agent" ? "agent" : "user",
        content: entry.content,
        timestamp: new Date(entry.timestamp),
      },
    ])
  }

  // Note: the call SID is passed to retrieve() as the query string.
  async getCallHistory(_callSid: string, limit?: number): Promise<TurnEntry[]> {
    const memories = await this.memory.retrieve(_callSid, { limit: limit ?? 10 })
    return memories.map((m) => ({
      role: (m.source as string) === "user_statement" ? ("user" as const) : ("agent" as const),
      content: m.content,
      timestamp: m.createdAt.toISOString(),
    }))
  }

  async getRelevantContext(query: string): Promise<string> {
    const memories = await this.memory.retrieve(query, { limit: 5 })
    return memories.map((m) => m.content).join("\n")
  }

  // Note: this closes the whole memory instance, not just one call's entries.
  async clearCallMemory(callSid: string): Promise<void> {
    void callSid
    await this.memory.close()
  }
}
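Because `storeCallTurn` above does not use its `_callSid` argument, turns from concurrent calls share one memory space. If strict per-call ordering is needed, one option is a small in-process buffer keyed by call SID alongside the semantic memory. The sketch below is an illustration under that assumption: `CallHistoryBuffer` and its `Turn` type are hypothetical and not part of `@reaatech/agent-memory`.

```typescript
// Hypothetical turn shape, mirroring the role/content/timestamp fields used above.
type Turn = { role: "user" | "agent"; content: string; timestamp: string };

class CallHistoryBuffer {
  private turns = new Map<string, Turn[]>();
  private maxTurnsPerCall: number;

  constructor(maxTurnsPerCall = 50) {
    this.maxTurnsPerCall = maxTurnsPerCall;
  }

  append(callSid: string, turn: Turn): void {
    const list = this.turns.get(callSid) ?? [];
    list.push(turn);
    // Drop the oldest turns once the per-call cap is exceeded.
    if (list.length > this.maxTurnsPerCall) {
      list.splice(0, list.length - this.maxTurnsPerCall);
    }
    this.turns.set(callSid, list);
  }

  // Returns up to `limit` most recent turns, oldest first.
  recent(callSid: string, limit = 10): Turn[] {
    if (limit <= 0) return [];
    return (this.turns.get(callSid) ?? []).slice(-limit);
  }

  // Call when the phone call ends so memory does not grow without bound.
  clear(callSid: string): void {
    this.turns.delete(callSid);
  }
}
```

A buffer like this keeps exact turn order for the active call, while the embedding-backed memory remains available for semantic lookups such as `getRelevantContext`.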

// Cache service: semantic response cache backed by @reaatech/llm-cache,
// using in-memory storage and OpenAI embeddings.
import { CacheEngine, InMemoryAdapter, OpenAIEmbedder } from "@reaatech/llm-cache"
import type { Config } from "../lib/config.js"

export class CacheService {
  private cache: CacheEngine

  constructor(config: Config) {
    this.cache = new CacheEngine({
      storage: new InMemoryAdapter(),
      vectorStorage: new InMemoryAdapter(),
      embedder: new OpenAIEmbedder({
        provider: "openai",
        model: "text-embedding-3-small",
        dimensions: 1536,
        apiKey: config.openaiApiKey,
      }),
      config: {
        storage: { adapter: "memory" },
        vectorStorage: { adapter: "memory" },
        embedding: {
          provider: "openai",
          model: "text-embedding-3-small",
          dimensions: 1536,
          batchSize: 100,
          maxRetries: 3,
        },
        // A cached entry counts as a match at cosine similarity 0.85 or higher.
        similarity: { threshold: 0.85, metric: "cosine", maxResults: 5 },
        ttl: {
          default: 3600,
          factual: 1800,
          creative: 7200,
          analytical: 3600,
          sensitive: 600,
          byUseCase: {},
        },
        segmentation: { enabled: true, defaultUseCase: "voice-agent" },
        cost: { enabled: true, currency: "USD" },
        observability: { metrics: true, tracing: false, logging: "info" },
      },
    })
  }

  async getCachedResponse(
    prompt: string,
    model: string,
  ): Promise<{ hit: boolean; type?: string; entry?: { response: { answer: string } }; reason?: string }> {
    const result = await this.cache.get(prompt, { model, modelVersion: model, useCase: "voice-agent" })
    if (result.hit) {
      const hitResult: { hit: true; type?: string; entry?: { response: { answer: string } }; reason?: string } = {
        hit: true,
        type: result.type,
        entry: result.entry as { response: { answer: string } },
      }
      return hitResult
    }
    return { hit: false, reason: result.reason }
  }

  async setCachedResponse(prompt: string, response: string, model: string): Promise<void> {
    await this.cache.set(
      prompt,
      { answer: response },
      { model, modelVersion: model, useCase: "voice-agent" },
      { queryType: "factual" },
    )
  }

  async invalidateCache(): Promise<void> {
    await this.cache.invalidate({ useCase: "voice-agent" })
  }
}
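The cache configuration above sets `similarity: { threshold: 0.85, metric: "cosine" }`. To make that number concrete, here is a generic illustration of a cosine-similarity threshold check between two embedding vectors. How `@reaatech/llm-cache` applies the threshold internally is not shown in this recipe, so treat the helper names below as hypothetical.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1 to 1.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("Vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as dissimilar to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Same value as similarity.threshold in the cache config above.
const SIMILARITY_THRESHOLD = 0.85;

function isSemanticHit(
  queryEmbedding: number[],
  cachedEmbedding: number[],
  threshold = SIMILARITY_THRESHOLD,
): boolean {
  return cosineSimilarity(queryEmbedding, cachedEmbedding) >= threshold;
}
```

A higher threshold means fewer but safer cache hits. For a phone agent, serving a wrong cached answer is usually worse than paying for one more model call, which argues against lowering the value much.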

// Audio service: Deepgram for streaming speech-to-text, ElevenLabs for text-to-speech.
import { DeepgramClient } from "@deepgram/sdk"
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js"
import type { Config } from "../lib/config.js"

export class AudioServiceError extends Error {
  provider: string
  statusCode?: number

  constructor(message: string, provider: string, statusCode?: number) {
    super(message)
    this.name = "AudioServiceError"
    this.provider = provider
    this.statusCode = statusCode
  }
}

export class AudioService {
  private deepgram: DeepgramClient
  private elevenlabs: ElevenLabsClient
  private config: Config

  constructor(config: Config) {
    this.config = config
    this.deepgram = new DeepgramClient({ apiKey: config.deepgramApiKey })
    this.elevenlabs = new ElevenLabsClient({ apiKey: config.elevenlabsApiKey })
  }

  // Opens a live transcription connection; only final results are requested.
  connectDeepgramStream() {
    return this.deepgram.listen.v1.connect({
      model: this.config.deepgramModel,
      language: "en",
      punctuate: "true",
      interim_results: "false",
      Authorization: `Token ${this.config.deepgramApiKey}`,
    })
  }

  // Synthesizes the full utterance and returns it as a single Buffer.
  async synthesizeSpeech(text: string): Promise<Buffer> {
    try {
      const stream = await this.elevenlabs.textToSpeech.convert(this.config.elevenlabsVoiceId, {
        text,
        modelId: this.config.elevenlabsModelId,
      })
      const reader = stream.getReader()
      const chunks: Uint8Array[] = []
      for (;;) {
        const { done, value } = await reader.read()
        if (done) break
        chunks.push(value)
      }
      return Buffer.concat(chunks.map((c) => Buffer.from(c)))
    } catch (err: unknown) {
      if (err instanceof AudioServiceError) throw err
      throw new AudioServiceError(
        err instanceof Error ? err.message : "ElevenLabs TTS error",
        "elevenlabs",
      )
    }
  }

  // Returns the audio stream directly so playback can start before synthesis finishes.
  async synthesizeSpeechStream(text: string) {
    try {
      return await this.elevenlabs.textToSpeech.stream(this.config.elevenlabsVoiceId, {
        text,
        modelId: this.config.elevenlabsModelId,
      })
    } catch (err: unknown) {
      if (err instanceof AudioServiceError) throw err
      throw new AudioServiceError(
        err instanceof Error ? err.message : "ElevenLabs TTS stream error",
        "elevenlabs",
      )
    }
  }
}
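To play synthesized audio back to the caller, a bidirectional Twilio Media Stream expects JSON `media` messages whose payload is base64-encoded 8 kHz mu-law audio. The sketch below frames a buffer into such messages. It assumes the audio is already in that format (for example, because the TTS provider was asked for a mu-law 8 kHz output, or a conversion step ran first). The 160-byte frame size (20 ms of audio) and the function name are illustrative choices, not requirements taken from this recipe.

```typescript
// Shape of an outbound media message on a Twilio bidirectional stream.
type TwilioMediaMessage = {
  event: "media";
  streamSid: string;
  media: { payload: string }; // base64-encoded mu-law, 8000 Hz, mono
};

// 8000 samples/s * 1 byte/sample * 0.02 s = 160 bytes per 20 ms frame.
const FRAME_BYTES = 160;

function toMediaMessages(
  streamSid: string,
  mulawAudio: Uint8Array,
  frameBytes = FRAME_BYTES,
): TwilioMediaMessage[] {
  if (frameBytes <= 0) {
    throw new Error("frameBytes must be positive");
  }
  const messages: TwilioMediaMessage[] = [];
  for (let offset = 0; offset < mulawAudio.length; offset += frameBytes) {
    // subarray is a view; Buffer.from copies just that view's bytes.
    const frame = mulawAudio.subarray(offset, offset + frameBytes);
    messages.push({
      event: "media",
      streamSid,
      media: { payload: Buffer.from(frame).toString("base64") },
    });
  }
  return messages;
}
```

Each message would then be sent over the call's WebSocket as `JSON.stringify(message)`. The `streamSid` comes from the `start` event Twilio sends when the stream opens.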

Intro

Prerequisites

Step 1: Configure environment variables

Step 2: Understand the types and shared interfaces

Step 3: Build the config validation module

Step 4: Build the Gemini LLM service

Step 5: Build the memory, router, and cache services

Step 6: Build the Twilio, audio, and observability services

Step 7: Build the calendar integration

Step 8: Build the orchestrator

Step 9: Build the call handler and WebSocket server

Step 10: Wire the API route handlers

Step 11: Enable the instrumentation hook and start the server

Step 12: Run the tests

Next steps