Google Gemini Voice Agent for Clinic Appointment Scheduling

Answer calls, book appointments, and send SMS reminders for medical and dental clinics using a voice AI agent powered by Google Gemini.

google-gemini voice-agent twilio nextjs clinic-scheduling appointment-booking deepgram cartesia mcp

The problem

Small clinics miss after-hours calls and get overwhelmed at peak times, which leads to lost appointments and frustrated patients. Staff spend too much time on the phone instead of on in-clinic care.

Intro

In this tutorial you’ll build a voice AI receptionist for medical and dental clinics using Google Gemini. The system answers phone calls in real time via Twilio, transcribes speech with Deepgram Nova-3, runs conversation logic through Gemini 2.5 Flash with tool calling, and speaks back through Cartesia Sonic-3.5 TTS. It connects to an EHR system to check availability and book appointments, and it sends SMS confirmations and reminders through Twilio. By the end you’ll have a fully tested Next.js project that wires together the entire STT → LLM → TTS pipeline.
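
As a rough mental model, one conversational turn in the STT → LLM → TTS pipeline can be sketched as three awaited stages. The interfaces, the `runTurn` function, the stub providers, and the `check_availability` tool name below are all hypothetical and exist only for illustration; they are not the APIs of Deepgram, Gemini, Cartesia, or `@reaatech/voice-agent-core`.

```typescript
// Hypothetical stage interfaces for illustration only. These are NOT the
// signatures of any real SDK or of @reaatech/voice-agent-core.
interface SpeechToText {
  transcribe(audio: Uint8Array): Promise<string>;
}
interface ToolCall {
  name: string;
  args: Record<string, string>;
}
interface LlmReply {
  text: string;
  toolCalls: ToolCall[];
}
interface Llm {
  respond(transcript: string): Promise<LlmReply>;
}
interface TextToSpeech {
  synthesize(text: string): Promise<Uint8Array>;
}
interface TurnResult {
  transcript: string;
  reply: LlmReply;
  audio: Uint8Array;
}

// One conversational turn: caller audio in, agent audio out.
async function runTurn(
  audio: Uint8Array,
  stt: SpeechToText,
  llm: Llm,
  tts: TextToSpeech,
): Promise<TurnResult> {
  const transcript = await stt.transcribe(audio); // speech -> text
  const reply = await llm.respond(transcript); // text -> reply + tool calls
  const speech = await tts.synthesize(reply.text); // reply text -> audio
  return { transcript, reply, audio: speech };
}

// Stub providers so the sketch runs without network access or API keys.
const stubStt: SpeechToText = {
  transcribe: async () => 'I need a cleaning next Tuesday',
};
const stubLlm: Llm = {
  respond: async (transcript) => ({
    text: `Let me check availability. You said: ${transcript}`,
    toolCalls: [{ name: 'check_availability', args: { service: 'cleaning' } }],
  }),
};
const stubTts: TextToSpeech = {
  // Fake audio: one zero byte per character of the reply text.
  synthesize: async (text) => new Uint8Array(text.length),
};
```

Calling `runTurn(new Uint8Array(320), stubStt, stubLlm, stubTts)` resolves to the transcript, the reply with its tool call, and the fake audio buffer. A production pipeline streams audio chunks through each stage rather than awaiting whole utterances, which this sketch leaves out.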

Prerequisites

Node.js 22+ and pnpm 10 installed on your machine
A Twilio account with a voice-enabled phone number (for telephony and SMS)
API keys from: Google Gemini, Deepgram, Cartesia, and Langfuse (optional)
Familiarity with TypeScript, Next.js App Router, and basic WebSocket concepts
An MCP-compatible EHR/calendar server endpoint (or you can mock one)

Step 1: Scaffold the project and pin dependencies

Start by creating a new Next.js project with App Router and TypeScript. Then install all the dependencies the voice pipeline needs.

terminal

pnpm create next-app@latest clinic-voice-agent --typescript --app --src-dir --eslint --import-alias "@/*"
cd clinic-voice-agent

Now open package.json and replace its contents with the exact-pinned dependencies shown below. Every version is pinned precisely (no ^ or ~ range prefixes) so builds are reproducible.
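
One way to confirm that a dependency map contains only exact versions is a small check like the sketch below. The `findUnpinned` helper, the regular expression, and the package names and versions in `sample` are illustrative placeholders, not the tutorial's actual manifest.

```typescript
// Matches an exact version such as "15.1.0" or "2.0.0-rc.1".
// Ranges (^1.2.3, ~1.2.3, >=1.0.0, 1.x, *) and dist-tags (latest) do not match.
const EXACT_VERSION = /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$/;

type DependencyMap = Record<string, string>;

// Returns "name@range" for every dependency that is not pinned exactly.
function findUnpinned(deps: DependencyMap): string[] {
  return Object.entries(deps)
    .filter(([, range]) => !EXACT_VERSION.test(range))
    .map(([name, range]) => `${name}@${range}`);
}

// Placeholder packages and versions, used only to exercise the check.
const sample: DependencyMap = {
  'example-exact': '1.2.3',
  'example-caret': '^1.2.3',
  'example-tilde': '~4.5.6',
  'example-prerelease': '2.0.0-rc.1',
};

const unpinned = findUnpinned(sample);
// unpinned is ['example-caret@^1.2.3', 'example-tilde@~4.5.6']
```

In practice you would pass in the `dependencies` and `devDependencies` objects parsed from package.json. When adding packages later, `pnpm add --save-exact` writes an exact version instead of a caret range.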

Example artifact

A complete, working implementation of this recipe, downloadable as a zip or browsable file by file. It is generated by our build pipeline and tested before publishing.

186 kB · 155 tests · 97.7% coverage · vitest passing

SHA-256: 5c215d2c8f7fc93fcd6e4a8ad66227d9d7e2acbcb200429df5aac0f16dc407ce

src/services/pipeline-service.ts

import {
  createPipeline,
  initializeSessionManager,
  createLatencyBudget,
  LatencyBudgetEnforcer,
  type Pipeline,
  type PipelineEvent,
  type STTProvider,
  type TTSProvider,
  type MCPClient as CoreMCPClient,
  type PipelineDependencies,
  type AudioChunk,
} from '@reaatech/voice-agent-core';
import { clinicConfig } from '../lib/config.js';
import { traceTurn, traceLLMCall } from '../lib/observability.js';

// Builds the session manager and latency enforcer, then wires them
// together with the injected STT, TTS, and MCP providers.
export function createVoicePipeline(
  sttProvider: STTProvider,
  ttsProvider: TTSProvider,
  mcpClient: CoreMCPClient,
): ReturnType<typeof createPipelineBackend> {
  const sessionManager = initializeSessionManager({
    defaultTTL: 1800,
    maxTurns: 20,
    maxTokens: 4096,
  });

  // The per-stage values (stt + mcp + tts) add up to the overall target.
  const budget = createLatencyBudget({
    target: 800,
    hardCap: 1200,
    stt: 200,
    mcp: 400,
    tts: 200,
  });
  const latencyEnforcer = new LatencyBudgetEnforcer(budget);

  return createPipelineBackend({
    sessionManager,
    latencyEnforcer,
    sttProvider,
    ttsProvider,
    mcpClient,
    config: clinicConfig,
  });
}

function createPipelineBackend(deps: PipelineDependencies) {
  const pipeline: Pipeline = createPipeline(deps);

  // Trace every final transcript.
  pipeline.on('pipeline:stt:final', (event: PipelineEvent) => {
    traceTurn(event.sessionId, event.turnId ?? '', { stage: 'stt', payload: event.data });
  });

  // Records the model name with zeroed numeric arguments.
  pipeline.on('pipeline:mcp:response', () => {
    traceLLMCall('gemini-2.5-flash', 0, 0, 0);
  });

  // Trace per-turn metrics when the event carries them.
  pipeline.on('pipeline:turn:end', (event: PipelineEvent) => {
    const eventData = event.data as Record<string, unknown>;
    const metrics = eventData.metrics as Record<string, unknown> | undefined;
    if (metrics) {
      traceTurn(event.sessionId, event.turnId ?? '', metrics);
    }
  });

  pipeline.on('pipeline:error', (event: PipelineEvent) => {
    console.error('[pipeline:error]', event);
  });

  async function startCallSession(sessionId: string) {
    await pipeline.startSession({ sessionId, status: 'active' });
  }

  async function processInboundAudio(sessionId: string, chunk: AudioChunk) {
    await pipeline.processAudioChunk(sessionId, chunk);
  }

  async function endCallSession(sessionId: string) {
    await pipeline.endSession(sessionId);
  }

  function handleBargeIn(sessionId: string) {
    pipeline.bargeIn(sessionId);
  }

  function destroy() {
    pipeline.destroy();
  }

  return { pipeline, startCallSession, processInboundAudio, endCallSession, handleBargeIn, destroy };
}
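
The values passed to `createLatencyBudget` in the listing above split an 800 target into 200 for STT, 400 for MCP, and 200 for TTS, with a hard cap of 1200. The source does not show how `LatencyBudgetEnforcer` applies them, so the following is only a sketch of what such a check could look like. It assumes all values are milliseconds, and `checkTurn`, `BudgetReport`, and the sample timings are hypothetical names and data, not part of the library.

```typescript
// Assumption: all values are milliseconds. This mirrors the numbers used in
// the pipeline service but is NOT the library's implementation.
interface Budget {
  target: number;
  hardCap: number;
  stt: number;
  mcp: number;
  tts: number;
}
type Stage = 'stt' | 'mcp' | 'tts';
type StageTimings = Record<Stage, number>;

interface BudgetReport {
  total: number; // sum of the measured stage timings
  overTarget: boolean; // total exceeded the soft target
  overHardCap: boolean; // total exceeded the hard cap
  slowStages: Stage[]; // stages that exceeded their own allowance
}

const STAGES: Stage[] = ['stt', 'mcp', 'tts'];

// Compares measured stage timings for one turn against the budget.
function checkTurn(budget: Budget, timings: StageTimings): BudgetReport {
  const total = STAGES.reduce((sum, stage) => sum + timings[stage], 0);
  return {
    total,
    overTarget: total > budget.target,
    overHardCap: total > budget.hardCap,
    slowStages: STAGES.filter((stage) => timings[stage] > budget[stage]),
  };
}

const budget: Budget = { target: 800, hardCap: 1200, stt: 200, mcp: 400, tts: 200 };

// Made-up timings for one turn in which the MCP stage runs long.
const report = checkTurn(budget, { stt: 180, mcp: 520, tts: 190 });
// report is { total: 890, overTarget: true, overHardCap: false, slowStages: ['mcp'] }
```

Separating a soft target from a hard cap lets a caller log or trace turns that run slightly long while reserving stronger action, such as cutting a response short, for turns that breach the cap.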

import { describe, it, expect, vi } from 'vitest';
import { EventEmitter } from 'events';

// Shared fake pipeline returned by the mocked createPipeline.
const mockPipeline = Object.assign(new EventEmitter(), {
  startSession: vi.fn().mockResolvedValue(undefined),
  processAudioChunk: vi.fn().mockResolvedValue(undefined),
  bargeIn: vi.fn(),
  endSession: vi.fn().mockResolvedValue(undefined),
  destroy: vi.fn(),
});

vi.mock('@reaatech/voice-agent-core', () => ({
  createPipeline: vi.fn().mockReturnValue(mockPipeline),
  initializeSessionManager: vi.fn().mockReturnValue({}),
  createLatencyBudget: vi.fn().mockReturnValue({}),
  LatencyBudgetEnforcer: vi.fn().mockImplementation(function () {
    return {};
  }),
  defineConfig: vi.fn(),
  initializeObservability: vi.fn(),
  createCostTracker: vi.fn(),
}));

vi.mock('../../src/lib/config.js', () => ({ clinicConfig: {} }));

vi.mock('../../src/lib/observability.js', () => ({
  traceTurn: vi.fn(),
  traceLLMCall: vi.fn(),
  traceToolCall: vi.fn(),
  initializeObservability: vi.fn(),
}));

// Dynamic import: the module under test loads only after mockPipeline is initialized.
const { createVoicePipeline } = await import('../../src/services/pipeline-service.js');

describe('pipeline-service', () => {
  const sttProvider = {
    name: 'test-stt',
    connect: vi.fn(),
    streamAudio: vi.fn(),
    onUtterance: vi.fn(),
    onEndOfSpeech: vi.fn(),
    close: vi.fn(),
  };
  const ttsProvider = {
    name: 'test-tts',
    supportsStreaming: true,
    firstByteLatencyMs: null,
    synthesize: vi.fn(),
    connect: vi.fn(),
    cancel: vi.fn(),
    close: vi.fn(),
  };
  const mcpClient = {
    connect: vi.fn(),
    sendRequest: vi.fn(),
    close: vi.fn(),
    isConnected: vi.fn(),
  };

  it('createVoicePipeline returns pipeline object', () => {
    const result = createVoicePipeline(sttProvider, ttsProvider, mcpClient);
    expect(result).toBeDefined();
    expect(result.startCallSession).toBeInstanceOf(Function);
    expect(result.endCallSession).toBeInstanceOf(Function);
    expect(result.handleBargeIn).toBeInstanceOf(Function);
    expect(result.destroy).toBeInstanceOf(Function);
  });

  it('startCallSession calls pipeline.startSession', async () => {
    const { startCallSession } = createVoicePipeline(sttProvider, ttsProvider, mcpClient);
    await startCallSession('session-1');
    expect(mockPipeline.startSession).toHaveBeenCalledWith({ sessionId: 'session-1', status: 'active' });
  });

  it('endCallSession on never-started session does not throw', async () => {
    mockPipeline.endSession.mockResolvedValueOnce(undefined);
    const { endCallSession } = createVoicePipeline(sttProvider, ttsProvider, mcpClient);
    await expect(endCallSession('unknown')).resolves.toBeUndefined();
  });

  it('destroy calls pipeline.destroy', () => {
    const { destroy } = createVoicePipeline(sttProvider, ttsProvider, mcpClient);
    destroy();
    expect(mockPipeline.destroy).toHaveBeenCalled();
  });
});

Tutorial outline

Intro
Prerequisites
Step 1: Scaffold the project and pin dependencies
Step 2: Set up the Next.js configuration
Step 3: Create the Zod-validated environment configuration
Step 4: Build the EHR adapter for clinic operations
Step 5: Build the Twilio SMS reminder service
Step 6: Build the Gemini-powered LLM service
Step 7: Create the speech-to-text and text-to-speech providers
Step 8: Build session management with memory storage
Step 9: Create the voice pipeline orchestrator
Step 10: Build the telephony and MCP client services
Step 11: Wire the WebSocket server and route handler
Step 12: Write and run the tests
Next steps