Vertex AI Voice Agent for Cal.com Appointment Scheduling
Let customers book appointments on Cal.com over the phone with a voice agent that understands natural language, verifies availability, and confirms bookings.
Service businesses miss after-hours calls and lose revenue because clients can't schedule appointments when staff are unavailable. Existing IVR systems feel robotic and fail to handle complex scheduling requests.
In this tutorial, you’ll build a voice-powered appointment booking agent that connects a phone call to Cal.com through a series of AI services. When a caller says something like “I’d like to book an appointment for tomorrow at 2pm,” the agent transcribes their speech, classifies their intent, extracts the booking details, validates the payload, and creates the event in Cal.com over the phone. By the end, you’ll have an Express server that handles Twilio PSTN calls, streams audio to Deepgram for speech-to-text, passes transcripts through a PII guardrail, classifies intent with a confidence router, calls Gemini on Vertex AI for conversational reasoning, and books appointments via Cal.com’s REST API.
Prerequisites
Node.js 22 or later (check with node --version)
pnpm 10.x (npm install -g pnpm@10)
A Google Cloud project with Vertex AI enabled and a service account JSON key
A Twilio account with a voice-capable phone number
A Deepgram account (STT)
A Cartesia account (TTS)
A Cal.com account with OAuth2 developer credentials (client ID, client secret, private key)
Familiarity with TypeScript, Express, and REST APIs
Step 1: Initialize the project
Start with an empty directory and scaffold the project structure. The recipe uses pnpm workspaces and ESM modules.
terminal
mkdir vertex-ai-voice-agent && cd vertex-ai-voice-agent
pnpm init
Add the required fields to package.json so Node 22 and ESM are active.
Install the full dependency list in one command. The REAA packages handle intent routing, guardrails, and budget enforcement; Twilio connects to PSTN; Deepgram and Cartesia handle speech; @google/genai drives Vertex AI; jose signs Cal.com JWTs.
Create src/calcom/client.ts. This module handles the OAuth2 client credentials flow using jose to sign a JWT assertion, then calls Cal.com’s REST API to get availability, create bookings, reschedule, and cancel.
ts
import * as jose from "jose";

const CALCOM_TOKEN_URL = "https://api.cal.com/oauth/token";

interface AccessToken {
  token: string;
  expiresAt: number;
}

export class CalComService {
  private readonly clientId: string;
  private readonly clientSecret: string;
  private readonly apiBase: string;
  private cachedToken: AccessToken | null = null;
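The cachedToken field suggests the client reuses an access token until it is close to expiring. Here is a minimal sketch of that caching logic on its own, without jose or any network calls. The TokenCache class, its method names, and the 30-second safety margin are assumptions for illustration, not part of the recipe's client.

```typescript
// Hypothetical helper: keeps an OAuth2 access token until shortly before it expires.
interface AccessToken {
  token: string;
  expiresAt: number; // epoch milliseconds
}

class TokenCache {
  private cached: AccessToken | null = null;
  private readonly safetyMarginMs: number;

  // safetyMarginMs is an assumed buffer so a token is never used right at its expiry.
  constructor(safetyMarginMs: number = 30_000) {
    this.safetyMarginMs = safetyMarginMs;
  }

  // Store a token using the lifetime in seconds reported by the token response.
  set(token: string, expiresInSeconds: number, nowMs: number = Date.now()): void {
    this.cached = { token, expiresAt: nowMs + expiresInSeconds * 1000 };
  }

  // Return the cached token, or null when it is missing or about to expire.
  get(nowMs: number = Date.now()): string | null {
    if (this.cached === null) return null;
    if (nowMs >= this.cached.expiresAt - this.safetyMarginMs) return null;
    return this.cached.token;
  }
}
```

A client built this way would call get() before each API request and only request a new token when it returns null.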
Step 7: Write the calendar repair service
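Create src/repair/calendar.ts. This module wraps a Zod schema that validates Cal.com booking payloads before they reach the API. If Gemini produces a malformed payload (wrong date format, missing leading zeros on the time, a phone number that is not in E.164 format), repairPayload throws an error naming the first failing field, and getValidationErrors returns one message per failing field.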
ts
import { z } from "zod";

export const CalendarPayloadSchema = z.object({
  date: z.string().regex(/^\d{4}-\d{2}-\d{2}$/, "Date must be YYYY-MM-DD"),
  time: z.string().regex(/^\d{2}:\d{2}$/, "Time must be HH:mm"),
  serviceId: z.string().min(1, "Service ID is required"),
  attendeeName: z.string().min(1, "Attendee name is required"),
  attendeeEmail: z.string().email("Invalid email address"),
  attendeePhone: z
    .string()
    .regex(/^\+[1-9]\d{1,14}$/, "Phone must be E.164 format"),
});

export type CalendarPayload = z.infer<typeof CalendarPayloadSchema>;

export class CalendarRepairService {
  repairPayload(raw: unknown): CalendarPayload {
    if (raw === null || raw === undefined) {
      throw new Error("Payload is null or undefined");
    }
    const result = CalendarPayloadSchema.safeParse(raw);
    if (!result.success) {
      const firstError = result.error.issues[0];
      throw new Error(
        firstError
          ? `Calendar payload validation failed: ${firstError.message} at ${firstError.path.join(".")}`
          : "Calendar payload validation failed"
      );
    }
    console.log("repairPayload: payload repaired successfully");
    return result.data;
  }

  isValidPayload(raw: unknown): raw is CalendarPayload {
    return CalendarPayloadSchema.safeParse(raw).success;
  }

  getValidationErrors(raw: unknown): string[] {
    const result = CalendarPayloadSchema.safeParse(raw);
    if (result.success) return [];
    return result.error.issues.map(
      (issue) => `${issue.path.join(".")}: ${issue.message}`
    );
  }
}
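As written, the schema is strict: a time such as 9:5 fails the HH:mm regex. If you want the leading-zero cases handled before validation, one option is a small normalization pass that runs first. This is a minimal sketch; normalizeTime and normalizeDate are hypothetical helpers, not part of the recipe.

```typescript
// Hypothetical pre-validation helpers: pad single-digit time and date parts.
// Input that does not match the expected shape is returned trimmed and otherwise
// unchanged, so the schema can still reject it.

function normalizeTime(raw: string): string {
  return raw.trim().replace(
    /^(\d{1,2}):(\d{1,2})$/,
    (_whole, hours: string, minutes: string) =>
      `${hours.padStart(2, "0")}:${minutes.padStart(2, "0")}`,
  );
}

function normalizeDate(raw: string): string {
  // Accepts "-" or "/" as the separator and always emits YYYY-MM-DD.
  return raw.trim().replace(
    /^(\d{4})[-/](\d{1,2})[-/](\d{1,2})$/,
    (_whole, year: string, month: string, day: string) =>
      `${year}-${month.padStart(2, "0")}-${day.padStart(2, "0")}`,
  );
}
```

These helpers only fix padding and separators. They do not check that the values are real dates or times, so the schema, and ultimately Cal.com, still has the final say.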
Step 8: Write the intent router
Create src/routing/intent.ts. This module uses @reaatech/confidence-router-core and @reaatech/confidence-router-classifiers to classify caller transcripts into create_appointment, reschedule_appointment, cancel_appointment, or unknown. Keyword matching runs first; the DecisionEngine enforces a routing threshold of 0.8.
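To make the routing idea concrete, here is a dependency-free sketch of keyword classification followed by a 0.8 threshold check. It does not use the @reaatech packages, and the keyword lists, the confidence values (0.9 for one matching intent, 0.5 for several), and the function names are all assumptions for illustration.

```typescript
type BookingIntent = "create_appointment" | "reschedule_appointment" | "cancel_appointment";
type Intent = BookingIntent | "unknown";

interface Classification {
  intent: Intent;
  confidence: number; // between 0 and 1
}

// Assumed keyword lists; the recipe's actual lists are not shown in the text.
const KEYWORDS: Record<BookingIntent, string[]> = {
  create_appointment: ["book", "schedule"],
  reschedule_appointment: ["reschedule", "move"],
  cancel_appointment: ["cancel"],
};

const BOOKING_INTENTS: BookingIntent[] = [
  "create_appointment",
  "reschedule_appointment",
  "cancel_appointment",
];

const ROUTING_THRESHOLD = 0.8; // the threshold named in the text

function classify(transcript: string): Classification {
  const text = transcript.toLowerCase();
  let firstMatch: Intent = "unknown";
  let matchCount = 0;
  for (const intent of BOOKING_INTENTS) {
    // Word boundaries keep "schedule" from matching inside "reschedule".
    const hit = KEYWORDS[intent].some((keyword) =>
      new RegExp(`\\b${keyword}\\b`).test(text),
    );
    if (!hit) continue;
    if (matchCount === 0) firstMatch = intent;
    matchCount += 1;
  }
  if (matchCount === 0) return { intent: "unknown", confidence: 0 };
  // Assumed scoring: one matching intent is confident, several are ambiguous.
  return { intent: firstMatch, confidence: matchCount === 1 ? 0.9 : 0.5 };
}

// Anything below the threshold is routed to "unknown".
function route(transcript: string): Intent {
  const { intent, confidence } = classify(transcript);
  return confidence >= ROUTING_THRESHOLD ? intent : "unknown";
}
```

With this scoring, a transcript that mentions both cancelling and booking falls below the threshold and is routed to unknown, which gives the conversational model a chance to ask a clarifying question.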
Create src/guardrails/pii.ts. This module uses @reaatech/guardrail-chain and @reaatech/guardrail-chain-guardrails to redact PII (emails, phone numbers, credit cards) and detect prompt injection from caller transcripts before they reach Gemini.
ts
import {
  GuardrailChain,
  generateCorrelationId,
} from "@reaatech/guardrail-chain";
import {
  PIIRedaction,
  PromptInjection,
  CachedGuardrail,
} from "@reaatech/guardrail-chain-guardrails";

export class GuardrailService {
  private readonly piiChain: GuardrailChain;
  private readonly injectionChain: GuardrailChain;

  constructor() {
    const piiRedaction = new PIIRedaction({ redactionStrategy: "mask" });
    const cachedPII = new CachedGuardrail(piiRedaction, { ttlMs: 60_000 });
    this.piiChain = new GuardrailChain({
      budget: { maxLatencyMs: 100, maxTokens: 4000 },
    });
    this.piiChain.addGuardrail(cachedPII);
    this.injectionChain = new GuardrailChain({
      budget: { maxLatencyMs: 50, maxTokens: 4000 },
    });
    this.injectionChain.addGuardrail(new PromptInjection());
  }

  async redactPII(input: string): Promise<string> {
    if (!input || input.length === 0) {
      return input;
    }
    const opts = {
      sessionId: "default",
      correlationId: generateCorrelationId(),
    };
    const piiResult = await this.piiChain.executeInput(input, opts);
    if (!piiResult.success) {
      return input
        .replace(/[\w.+-]+@[\w-]+\.[\w.-]+/g, "[REDACTED]")
        .replace(/\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g, "[REDACTED]");
    }
    const redacted = piiResult.output as string;
    console.log("redactPII: PII redacted from input");
    const injectionResult = await this.injectionChain.executeInput(redacted, opts);
    if (!injectionResult.success) {
      return "[Input not processed due to policy]";
    }
    console.log("redactPII: method complete");
    return injectionResult.output as string;
  }

  async isSafe(input: string): Promise<boolean> {
    const result = await this.injectionChain.executeInput(input, {
      sessionId: "default",
      correlationId: generateCorrelationId(),
    });
    return result.success;
  }
}
Step 10: Write the budget controller
Create src/budget.ts. This module uses @reaatech/agent-budget-engine to cap per-call spending. Before each Gemini call, checkBudget throws BudgetExceededError if the call has exceeded its budget. After each call, recordSpend updates the running total.
ts
import { BudgetController } from "@reaatech/agent-budget-engine";
import { SpendStore } from "@reaatech/agent-budget-spend-tracker";
import { BudgetScope } from "@reaatech/agent-budget-types";

export const MAX_CALL_COST_CENTS = Number(process.env.MAX_CALL_COST_CENTS ?? 50) / 100;

// gemini-2.5-flash on Vertex AI: input ~$0.00025/1k, output ~$0.0005/1k
export class VertexPricingProvider {
  estimateCost(
    _modelId: string,
    estimatedInputTokens: number,
  ): number {
    const inputCost = (estimatedInputTokens / 1000)
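The check-then-record flow can be sketched without the @reaatech packages. The CallBudget class below is a hypothetical in-memory stand-in, not the BudgetController API, and it reuses the approximate per-1k-token rates from the comment in the recipe. The limit is expressed in dollars, matching the recipe's division of the cents value by 100.

```typescript
// Hypothetical stand-in for the per-call budget flow; not the @reaatech API.
class BudgetExceededError extends Error {
  constructor(spentDollars: number, limitDollars: number) {
    super(`Call budget exceeded: spent $${spentDollars.toFixed(4)} of $${limitDollars.toFixed(2)}`);
    this.name = "BudgetExceededError";
  }
}

// Approximate rates in dollars per 1,000 tokens, taken from the recipe's comment.
const INPUT_RATE_PER_1K = 0.00025;
const OUTPUT_RATE_PER_1K = 0.0005;

class CallBudget {
  private readonly limitDollars: number;
  private spentDollars = 0;

  constructor(limitDollars: number) {
    this.limitDollars = limitDollars;
  }

  // Call before each model request; throws once the running total reaches the limit.
  checkBudget(): void {
    if (this.spentDollars >= this.limitDollars) {
      throw new BudgetExceededError(this.spentDollars, this.limitDollars);
    }
  }

  // Call after each model request; returns the cost that was added.
  recordSpend(inputTokens: number, outputTokens: number): number {
    const cost =
      (inputTokens / 1000) * INPUT_RATE_PER_1K +
      (outputTokens / 1000) * OUTPUT_RATE_PER_1K;
    this.spentDollars += cost;
    return cost;
  }
}
```

Because the check runs before the request and the spend is recorded after it, a single request can push the total past the limit; the next checkBudget call is what stops the conversation.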
Step 11: Write the Gemini service
Create src/ai/gemini.ts. This module wraps @google/genai’s GoogleGenAI class to call Gemini on Vertex AI with conversation history, function calling tools, and streaming. The generateContent method accepts optional tool definitions and returns both text and a functionCall object when Gemini invokes a tool.
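Once generateContent returns, the caller has to decide whether to speak the text or act on the function call. Here is a sketch of that dispatch step using local stand-in types. The name and args fields mirror the shape of a Gemini function call, while the GeminiResult type, the tool names create_booking and cancel_booking, and the handler bodies are assumptions for illustration.

```typescript
// Local stand-in types; only the name/args shape is taken from Gemini function calling.
interface FunctionCallLike {
  name?: string;
  args?: Record<string, unknown>;
}

interface GeminiResult {
  text: string;
  functionCall?: FunctionCallLike;
}

type ToolHandler = (args: Record<string, unknown>) => string;

// Hypothetical tool names; the recipe's actual tool definitions are not shown.
const handlers = new Map<string, ToolHandler>([
  [
    "create_booking",
    (args) => `Booking requested for ${String(args["date"])} at ${String(args["time"])}`,
  ],
  [
    "cancel_booking",
    (args) => `Cancellation requested for booking ${String(args["bookingId"])}`,
  ],
]);

// Returns the reply to speak: the tool result when a tool was invoked, else the model text.
function handleResult(result: GeminiResult): string {
  const call = result.functionCall;
  if (call === undefined || call.name === undefined) return result.text;
  const handler = handlers.get(call.name);
  if (handler === undefined) return `Unsupported tool: ${call.name}`;
  return handler(call.args ?? {});
}
```

In the full agent, the create_booking handler is where the arguments would pass through the calendar repair service before the Cal.com client is called.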
Create src/telephony/twilio.ts. This is the core service that wires together all the pieces. It receives incoming Twilio voice calls, opens a WebSocket to stream audio, sends audio to Deepgram for transcription, passes transcripts through the guardrail and intent router, calls Gemini for conversational responses, uses the calendar repair service and Cal.com client for bookings, and streams TTS audio back to the caller via Cartesia.
ts
import type WebSocket from "ws";
import twilioModule from "twilio";
import VoiceResponse from "twilio/lib/twiml/VoiceResponse.js";
import { DeepgramClient } from "@deepgram/sdk";
import { Cartesia } from "@cartesia/cartesia-js";
import type { GenerationRequest } from "@cartesia/cartesia-js/resources/tts/tts.js";
import { GeminiService } from "../ai/gemini.js";
import { IntentRouter } from "../routing/intent.js";
import { GuardrailService } from "../guardrails/pii.js";
import { CalendarRepairService } from "../repair/calendar.js";
import { CalComService }
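Twilio Media Streams deliver JSON text frames over the WebSocket, each with an event field such as start, media, or stop; media frames carry base64-encoded audio in media.payload. Here is a sketch of how the service might classify incoming frames before forwarding audio to Deepgram. The StreamEvent type and parseTwilioMessage function are hypothetical names, not taken from the recipe.

```typescript
// Hypothetical parsed representation of a Twilio media stream frame.
type StreamEvent =
  | { kind: "start"; streamSid: string }
  | { kind: "media"; payloadBase64: string }
  | { kind: "stop" }
  | { kind: "ignored" };

interface RawFrame {
  event?: unknown;
  start?: { streamSid?: unknown };
  media?: { payload?: unknown };
}

function parseTwilioMessage(raw: string): StreamEvent {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { kind: "ignored" }; // not JSON, nothing to do
  }
  if (typeof parsed !== "object" || parsed === null) return { kind: "ignored" };

  const frame = parsed as RawFrame;
  const streamSid = frame.start?.streamSid;
  const payload = frame.media?.payload;

  if (frame.event === "start" && typeof streamSid === "string") {
    return { kind: "start", streamSid };
  }
  if (frame.event === "media" && typeof payload === "string") {
    // The payload stays base64 here; decode it where the audio is consumed.
    return { kind: "media", payloadBase64: payload };
  }
  if (frame.event === "stop") return { kind: "stop" };
  return { kind: "ignored" }; // connected, mark, dtmf, and anything unexpected
}
```

Keeping the parser free of side effects makes it easy to unit test with recorded frames, which fits the mocked test suite described later.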
Step 13: Write the Express server
Create src/server.ts. This Express server registers the Twilio voice webhook at POST /voice/incoming, a call status webhook at POST /voice/status, a health check at GET /health, and a WebSocket server at /media-stream. The Twilio media stream WebSocket is where the TwilioService handles audio in real time.
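For the media stream to start, the POST /voice/incoming handler has to answer with TwiML that tells Twilio to connect the call to the WebSocket endpoint. The recipe's Twilio service imports the VoiceResponse helper for this; the sketch below builds the equivalent XML by hand so it runs without dependencies. The function names and the host value are placeholders.

```typescript
// Escapes characters that are not allowed inside an XML attribute value.
function escapeXmlAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Builds TwiML that connects the call to a bidirectional media stream.
// `host` is a placeholder for the public hostname of your Express server.
function buildStreamTwiml(host: string): string {
  const url = `wss://${host}/media-stream`;
  return (
    '<?xml version="1.0" encoding="UTF-8"?>' +
    `<Response><Connect><Stream url="${escapeXmlAttr(url)}" /></Connect></Response>`
  );
}
```

The handler would send this string with a Content-Type of text/xml. Twilio then opens the WebSocket to /media-stream and begins sending audio frames.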
Create app/api/health/route.ts. This route proxies the Express health endpoint, so the Next.js frontend can check whether the voice agent server is running.
ts
import { NextResponse } from "next/server";

const EXPRESS_PORT = process.env.EXPRESS_PORT ?? 3001;

export async function GET(): Promise<NextResponse> {
  try {
    const response = await fetch(`http://localhost:${EXPRESS_PORT}/health`);
    const data = (await response.json()) as { status: string; timestamp: string };
    if (!response.ok || data.status === "error") {
      return NextResponse.json({
        status: "degraded",
        message: "Express server error",
        timestamp: data.timestamp ?? new Date().toISOString(),
      });
    }
    return NextResponse.json(data);
  } catch {
    return NextResponse.json({
      status: "degraded",
      message: "Express server not reachable",
      timestamp: new Date().toISOString(),
    });
  }
}
Step 15: Run the tests
All external services are mocked in the test suite, so you can run it with coverage without configuring any real API keys.
terminal
pnpm test
Expected output includes passing tests for the server routes (health, status, incoming call), the intent router, calendar repair, PII guardrails, budget enforcement, and the Twilio service. The coverage summary should show 90% or above on lines, branches, functions, and statements.
Step 16: Start the server
Start the Express server. It hosts the voice agent and runs independently of the Next.js dev server.
terminal
pnpm start
The terminal logs [Server] Express listening on port 3001 and all four endpoints are active. In production you would configure Twilio to point your phone number’s voice webhook at https://your-domain.com/voice/incoming.
Next steps
Connect your Twilio phone number to the /voice/incoming webhook URL in the Twilio console and call it from a phone to hear the full voice loop in action.
Add Langfuse tracing by setting the LANGFUSE_* environment variables; the langfuse package is already installed and the Langfuse SDK can be initialized in the server to capture Gemini calls and guardrail results.
Replace the keyword-only intent classifier with the LLMClassifier from @reaatech/confidence-router-classifiers to handle natural phrasing that does not match the hardcoded keyword list.