# Vertex AI Voice Agent for Cal.com Appointment Scheduling
 
> Let customers book appointments on Cal.com over the phone with a voice agent that understands natural language, verifies availability, and confirms bookings.
 
A tutorial-style reference solution from [reaatech.com](https://reaatech.com) that demonstrates how to build production-grade AI systems with the `@reaatech/*` package family.
 
## Tech Stack
 
- **Runtime**: Node.js 22+, TypeScript 6
- **Framework**: Next.js 16 (App Router) + Express 5 (custom server)
- **Voice/Telephony**: Twilio (PSTN), Deepgram (STT), Cartesia (TTS)
- **AI**: Google Gemini via `@google/genai` (Vertex AI)
- **Intent Routing**: `@reaatech/confidence-router-core` + `@reaatech/confidence-router-classifiers`
- **Guardrails**: `@reaatech/guardrail-chain-guardrails` (PII redaction, prompt injection)
- **Budget**: `@reaatech/agent-budget-engine` + `@reaatech/agent-budget-spend-tracker`
- **Calendar**: Cal.com REST API + OAuth2 (JWT via `jose`)
- **Testing**: Vitest (v8 coverage, MSW), ESLint (typescript-eslint strict)
 
## Architecture
 
```
PSTN Call -> Twilio -> Express (/voice/incoming, /media-stream WebSocket)
                       |
                 Deepgram STT (WebSocket)
                       |
                 GuardrailChain (PII redaction)
                       |
                 IntentRouter (confidence-router -> DecisionEngine)
                       |
                 Gemini (Vertex AI, function calling)
                       |                |
              CalendarService    Cartesia TTS (WebSocket)
              (Cal.com API)           |
                                Twilio stream -> Caller
```
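
To make the ordering in the diagram concrete, here is a minimal sketch of one caller turn passing through redaction, routing, and response generation. Every name in it (`TurnState`, `Stage`, `runPipeline`, and the three stand-in stages) is hypothetical and is not taken from the `@reaatech/*` packages. The real stages would be asynchronous, since they talk to WebSockets and HTTP APIs; they are synchronous here only to keep the sketch short.

```typescript
// State carried through one caller turn. All names here are hypothetical.
interface TurnState {
  transcript: string; // text produced by speech-to-text
  intent?: string;    // filled in by the routing stage
  reply?: string;     // text that would be sent to text-to-speech
}

// A stage takes the current state and returns an updated copy.
type Stage = (state: TurnState) => TurnState;

// Run the stages in order, feeding each one the previous stage's output.
function runPipeline(stages: Stage[], initial: TurnState): TurnState {
  return stages.reduce<TurnState>((state, stage) => stage(state), initial);
}

// Stand-in guardrail: mask email addresses and phone-like digit runs.
const redactPii: Stage = (state) => ({
  ...state,
  transcript: state.transcript
    .replace(/[\w.+-]+@[\w-]+\.[\w.-]+/g, "[EMAIL]")
    .replace(/\+?\d[\d\s().-]{7,}\d/g, "[PHONE]"),
});

// Stand-in router: a single keyword test instead of a real classifier.
const detectIntent: Stage = (state) => ({
  ...state,
  intent: /\b(book|schedule)\b/i.test(state.transcript) ? "book" : "unknown",
});

// Stand-in for the LLM step: choose a canned reply from the intent.
const respond: Stage = (state) => ({
  ...state,
  reply:
    state.intent === "book"
      ? "Sure, what day works for you?"
      : "Sorry, could you say that again?",
});

const result = runPipeline([redactPii, detectIntent, respond], {
  transcript: "I'd like to book a visit, my number is +1 555 123 4567",
});
console.log(result);
```

Redaction runs first so that later stages, including the LLM, only ever see the masked transcript.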
 
## Environment Variables
 
| Variable | Description | Example |
|----------|-------------|---------|
| `GOOGLE_CLOUD_PROJECT` | GCP project ID | `my-project-123` |
| `GOOGLE_CLOUD_LOCATION` | GCP region | `us-central1` |
| `GOOGLE_APPLICATION_CREDENTIALS` | Path to service account key | `/path/to/key.json` |
| `TWILIO_ACCOUNT_SID` | Twilio account SID | `ACxxxxxxxxxx` |
| `TWILIO_AUTH_TOKEN` | Twilio auth token | `xxxxxxxxxx` |
| `TWILIO_PHONE_NUMBER` | Twilio phone number (E.164) | `+15551234567` |
| `DEEPGRAM_API_KEY` | Deepgram API key | `xxxxxxxxxx` |
| `CARTESIA_API_KEY` | Cartesia API key | `xxxxxxxxxx` |
| `CALCOM_CLIENT_ID` | Cal.com OAuth2 client ID | `xxxxxxxxxx` |
| `CALCOM_CLIENT_SECRET` | Cal.com OAuth2 client secret | `xxxxxxxxxx` |
| `CALCOM_PRIVATE_KEY` | Cal.com JWT signing private key | `-----BEGIN PRIVATE KEY-----...` |
| `CALCOM_API_URL` | Cal.com API base URL | `https://api.cal.com` |
| `ANTHROPIC_API_KEY` | (optional) For LLMClassifier fallback | `sk-ant-...` |
| `OPENAI_API_KEY` | (optional) For LLMClassifier fallback | `sk-...` |
| `LANGFUSE_SECRET_KEY` | Langfuse secret key | `sk-...` |
| `LANGFUSE_PUBLIC_KEY` | Langfuse public key | `pk-...` |
| `LANGFUSE_HOST` | Langfuse host URL | `https://cloud.langfuse.com` |
| `EXPRESS_PORT` | Express server port | `3001` |
| `MAX_CALL_COST_CENTS` | Per-call budget in cents | `50` |
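
A fail-fast check at startup can catch a missing or malformed variable before the first call arrives. The sketch below is one way to do that in plain TypeScript. The `loadConfig` function and `ParsedConfig` shape are hypothetical, and treating every variable not marked "(optional)" as required is an assumption; the project may supply defaults for some of them.

```typescript
type Env = Record<string, string | undefined>;

// Variables from the table above that are not marked "(optional)".
// Treating all of them as required is an assumption of this sketch.
const REQUIRED_VARS = [
  "GOOGLE_CLOUD_PROJECT",
  "GOOGLE_CLOUD_LOCATION",
  "GOOGLE_APPLICATION_CREDENTIALS",
  "TWILIO_ACCOUNT_SID",
  "TWILIO_AUTH_TOKEN",
  "TWILIO_PHONE_NUMBER",
  "DEEPGRAM_API_KEY",
  "CARTESIA_API_KEY",
  "CALCOM_CLIENT_ID",
  "CALCOM_CLIENT_SECRET",
  "CALCOM_PRIVATE_KEY",
  "CALCOM_API_URL",
  "LANGFUSE_SECRET_KEY",
  "LANGFUSE_PUBLIC_KEY",
  "LANGFUSE_HOST",
  "EXPRESS_PORT",
  "MAX_CALL_COST_CENTS",
] as const;

// Hypothetical config shape holding only the values that need parsing.
interface ParsedConfig {
  twilioPhoneNumber: string;
  expressPort: number;
  maxCallCostCents: number;
}

// Parse a string of decimal digits into a positive integer, or throw.
function parsePositiveInt(name: string, raw: string): number {
  if (!/^\d+$/.test(raw) || Number(raw) <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return Number(raw);
}

function loadConfig(env: Env): ParsedConfig {
  // Report every missing variable at once rather than one per restart.
  const missing = REQUIRED_VARS.filter((name) => !env[name]?.trim());
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }

  const phone = env["TWILIO_PHONE_NUMBER"] as string;
  // E.164: a leading "+", then up to 15 digits, the first of which is not 0.
  if (!/^\+[1-9]\d{1,14}$/.test(phone)) {
    throw new Error(`TWILIO_PHONE_NUMBER is not in E.164 format: "${phone}"`);
  }

  return {
    twilioPhoneNumber: phone,
    expressPort: parsePositiveInt("EXPRESS_PORT", env["EXPRESS_PORT"] as string),
    maxCallCostCents: parsePositiveInt(
      "MAX_CALL_COST_CENTS",
      env["MAX_CALL_COST_CENTS"] as string,
    ),
  };
}
```

In a Node.js process this would typically be called once as `loadConfig(process.env)` before the server starts listening.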
 
## Getting API Keys
 
- **Twilio**: Sign up at [twilio.com](https://twilio.com) and buy a voice-capable phone number
- **Deepgram**: Register at [deepgram.com](https://deepgram.com) and create an API key with STT access
- **Cartesia**: Create an account at [cartesia.ai](https://cartesia.ai) and generate an API key
- **Google Cloud Vertex AI**: Enable the Vertex AI API, create a service account, and download its JSON key
- **Cal.com OAuth2**: Create a developer client in Cal.com settings, then generate a client ID/secret and a private key for JWT signing
 
## Conversation Flow
 
1. **Greeting**: When the caller connects, the agent plays a welcome message via Cartesia TTS
2. **Intent Detection**: The caller's audio is streamed to Deepgram STT, and the transcript is classified by IntentRouter (keyword match first, with an optional LLM fallback)
3. **Slot Filling**: Gemini extracts the appointment details (date, time, service) via function calling
4. **Booking Confirmation**: The Cal.com API creates, reschedules, or cancels the booking
5. **Summary**: The agent reads back the confirmation and the call ends
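
The "keyword first, optional LLM fallback" idea in step 2 can be sketched as a two-stage classifier with a confidence threshold. This is an illustration only: the types, the keyword rules, the confidence values, and the `0.7` threshold are all assumptions, and none of it reflects the actual API of `@reaatech/confidence-router-core`. A real LLM fallback would also be asynchronous.

```typescript
type Intent = "book" | "reschedule" | "cancel" | "unknown";

interface Classification {
  intent: Intent;
  confidence: number; // 0 (no idea) to 1 (certain)
}

type Classifier = (transcript: string) => Classification;

// Keyword rules, checked in order. "reschedule" and "cancel" come first so
// that "cancel my appointment" is not claimed by the "appointment" keyword.
const KEYWORD_RULES: Array<[Intent, RegExp]> = [
  ["reschedule", /\b(reschedule|move)\b/i],
  ["cancel", /\bcancel\b/i],
  ["book", /\b(book|schedule|appointment)\b/i],
];

// First stage: cheap keyword match with a fixed, made-up confidence.
const keywordClassifier: Classifier = (transcript) => {
  for (const [intent, pattern] of KEYWORD_RULES) {
    if (pattern.test(transcript)) return { intent, confidence: 0.9 };
  }
  return { intent: "unknown", confidence: 0 };
};

// Use the keyword result when it clears the threshold; otherwise consult the
// optional fallback and keep whichever answer is more confident.
function routeIntent(
  transcript: string,
  fallback?: Classifier,
  threshold = 0.7,
): Classification {
  const first = keywordClassifier(transcript);
  if (first.confidence >= threshold || !fallback) return first;
  const second = fallback(transcript);
  return second.confidence > first.confidence ? second : first;
}

// Stand-in for an LLM-backed classifier.
const fakeLlm: Classifier = () => ({ intent: "book", confidence: 0.8 });

console.log(routeIntent("I need to cancel my appointment")); // keyword path
console.log(routeIntent("Could I come in on Tuesday?", fakeLlm)); // fallback path
```

Because the fallback is only consulted when the keyword stage is unsure, most turns avoid the extra LLM round trip.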
 
## Running Locally
 
```bash
pnpm install
cp .env.example .env    # fill in your credentials
pnpm test               # vitest run with coverage (all external services mocked)
pnpm typecheck          # TypeScript type checking
pnpm lint               # ESLint
pnpm dev                # next dev (port 3000) + Express (port 3001) concurrently
pnpm start              # Express production server only
```
 
> **Note**: All external services (Twilio, Deepgram, Cartesia, Gemini, Cal.com) are mocked in tests, so no real credentials are needed to run the test suite.
 
## Project Layout
 
```
app/                  Next.js App Router pages + API routes
src/                  services, lib, adapters
  ai/                 Gemini integration
  calcom/             Cal.com REST API client
  guardrails/         PII redaction + prompt injection
  repair/             Calendar payload validation (Zod)
  routing/            Intent classification
  telephony/          Twilio + Deepgram + Cartesia integration
  budget.ts           Per-call budget enforcement
  server.ts           Express server
  types.ts            Shared type definitions
tests/                vitest suite (mirrors src/)
packages/             API references for every dependency (read these first)
DEV_PLAN.md           build plan for this recipe
```
 
## License
 
MIT — see [LICENSE](./LICENSE).