# Google Gemini Voice Agent for Twilio Call Handling
 
Handle inbound Twilio phone calls with a Gemini‑powered voice agent that understands speech and performs tasks like appointment booking or FAQ lookup.
 
## Prerequisites
 
- Node.js >= 22
- pnpm
- A Twilio account with a phone number
- A Deepgram account and API key
- An ElevenLabs account and API key
- A Google AI Studio project with Gemini API enabled
- ngrok (used in Setup to expose the local server to Twilio)
 
## Setup
 
1. Install dependencies: `pnpm install`
2. Copy `.env.example` to `.env`
3. Fill in all API keys and configuration values in `.env`
4. Start the dev server: `pnpm dev`
5. Expose the server with ngrok: `ngrok http 3000`
6. Configure your Twilio phone number's Voice webhook to send a POST request to `https://<ngrok>/api/call`, where `<ngrok>` is the forwarding hostname ngrok prints
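The webhook configured in step 6 is described under Architecture as returning TwiML with a WebSocket URL. As a rough sketch of what such a response could look like, assuming Twilio's `<Connect><Stream>` verb is used: the helper name `buildStreamTwiml` and the `/api/media` stream path are hypothetical and not taken from this repository.

```typescript
// Hypothetical helper: builds TwiML that asks Twilio to open a
// bidirectional Media Stream to a WebSocket endpoint.
// The function name and the default stream path are assumptions.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function buildStreamTwiml(publicHost: string, streamPath = "/api/media"): string {
  // Twilio requires a secure WebSocket (wss://) URL for <Stream>.
  const url = `wss://${publicHost}${streamPath}`;
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    "  <Connect>",
    `    <Stream url="${escapeXml(url)}" />`,
    "  </Connect>",
    "</Response>",
  ].join("\n");
}
```

A route handler would typically return this string with a `Content-Type: text/xml` header, passing the ngrok forwarding hostname as `publicHost` during local development.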
 
## Architecture
 
```
Incoming Twilio Call → Webhook (POST /api/call) → TwiML with WebSocket URL
    → WebSocket established for Media Stream
    → Caller audio streamed to Deepgram STT (nova-3) for transcription
    → Transcript classified by Gemini (gemini-2.5-flash)
    → ConfidenceRouter decides: route / clarify / fallback
    → Intent handler executes (appointment booking or FAQ lookup)
    → Gemini generates response
    → ElevenLabs TTS (eleven_flash_v2_5) synthesizes audio
    → Audio streamed back to caller via Twilio
```
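The route / clarify / fallback step can be read as a two-threshold decision. The sketch below is one plausible interpretation, using the default thresholds listed under Environment Variables (0.8 and 0.3). It is not the API of `@reaatech/confidence-router`; the names `decide`, `Thresholds`, and `RoutingDecision` are hypothetical, and treating a non-finite confidence as a fallback is an assumption.

```typescript
// Illustrative only: a two-threshold decision, not the real
// @reaatech/confidence-router API.
type RoutingDecision = "route" | "clarify" | "fallback";

interface Thresholds {
  route: number;    // at or above this, route directly to the intent handler
  fallback: number; // below this, trigger the fallback
}

// Defaults mirror CONFIDENCE_ROUTE_THRESHOLD and CONFIDENCE_FALLBACK_THRESHOLD.
const DEFAULT_THRESHOLDS: Thresholds = { route: 0.8, fallback: 0.3 };

function decide(
  confidence: number,
  thresholds: Thresholds = DEFAULT_THRESHOLDS,
): RoutingDecision {
  // Assumption: an unusable score (NaN, Infinity) is handled as a fallback.
  if (!Number.isFinite(confidence)) return "fallback";
  if (confidence >= thresholds.route) return "route";
  if (confidence < thresholds.fallback) return "fallback";
  // Anything between the two thresholds asks the caller to clarify.
  return "clarify";
}
```

Under this reading, a transcript classified with confidence 0.5 would lead the agent to ask a clarifying question rather than book an appointment or give up.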
 
## Project Structure
 
- `src/lib/config.ts` — Zod-validated configuration
- `src/lib/types.ts` — Shared types from `@reaatech/agent-mesh` + local types
- `src/services/gemini.ts` — Gemini LLM wrapper with intent classification
- `src/services/memory.ts` — Agent memory via `@reaatech/agent-memory`
- `src/services/router.ts` — Intent routing via `@reaatech/confidence-router`
- `src/services/cache.ts` — LLM response caching via `@reaatech/llm-cache`
- `src/services/twilio-call.ts` — Twilio telephony integration
- `src/services/audio.ts` — Deepgram STT + ElevenLabs TTS
- `src/services/observability.ts` — Langfuse tracing
- `src/integrations/calendar.ts` — Calendar appointment booking
- `src/agent/orchestrator.ts` — Central coordinator
- `src/api/call.ts` — Twilio webhook + media stream handler
- `app/api/call/route.ts` — Next.js API route for Twilio webhook
- `app/api/health/route.ts` — Health check endpoint
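The media stream handler has to interpret the JSON frames Twilio sends over the WebSocket and wrap synthesized audio in the same envelope on the way back. The following is a minimal sketch of that framing based on Twilio's Media Streams message format (`start`, `media`, and `stop` events carrying a `streamSid`). The names `parseTwilioFrame`, `buildOutboundMedia`, and `InboundFrame` are hypothetical and do not describe the actual contents of `src/api/call.ts`.

```typescript
// Hypothetical framing helpers for Twilio Media Streams.
// Only the fields needed here are modelled.
type InboundFrame =
  | { kind: "start"; streamSid: string }
  | { kind: "audio"; streamSid: string; payloadBase64: string }
  | { kind: "stop"; streamSid: string }
  | { kind: "ignored" };

function parseTwilioFrame(raw: string): InboundFrame {
  let msg: unknown;
  try {
    msg = JSON.parse(raw);
  } catch {
    return { kind: "ignored" }; // not JSON
  }
  if (typeof msg !== "object" || msg === null) return { kind: "ignored" };

  const m = msg as {
    event?: unknown;
    streamSid?: unknown;
    media?: { payload?: unknown } | null;
  };
  const streamSid = m.streamSid;
  // "connected" frames carry no streamSid and are ignored here.
  if (typeof streamSid !== "string") return { kind: "ignored" };

  switch (m.event) {
    case "start":
      return { kind: "start", streamSid };
    case "media": {
      // Payload is base64-encoded caller audio; decoding is left to the STT step.
      const payload = m.media?.payload;
      return typeof payload === "string"
        ? { kind: "audio", streamSid, payloadBase64: payload }
        : { kind: "ignored" };
    }
    case "stop":
      return { kind: "stop", streamSid };
    default:
      return { kind: "ignored" }; // e.g. mark or dtmf events
  }
}

// Wraps base64 audio in the envelope Twilio expects for playback.
// Twilio plays back base64-encoded mu-law audio at 8 kHz.
function buildOutboundMedia(streamSid: string, payloadBase64: string): string {
  return JSON.stringify({
    event: "media",
    streamSid,
    media: { payload: payloadBase64 },
  });
}
```

Keeping the parser free of any WebSocket or SDK dependency makes it easy to unit-test with recorded frames.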
 
## Environment Variables
 
| Variable | Description |
|---|---|
| `NODE_ENV` | Application environment (development/production) |
| `GOOGLE_API_KEY` | Gemini API key from Google AI Studio |
| `OPENAI_API_KEY` | OpenAI API key (fallback LLM) |
| `TWILIO_ACCOUNT_SID` | Twilio account SID |
| `TWILIO_AUTH_TOKEN` | Twilio auth token |
| `TWILIO_PHONE_NUMBER` | Twilio phone number for inbound calls |
| `DEEPGRAM_API_KEY` | Deepgram speech-to-text API key |
| `ELEVENLABS_API_KEY` | ElevenLabs text-to-speech API key |
| `LANGFUSE_PUBLIC_KEY` | Langfuse observability public key |
| `LANGFUSE_SECRET_KEY` | Langfuse observability secret key |
| `LANGFUSE_HOST` | Langfuse API host URL |
| `DEEPGRAM_MODEL` | Deepgram STT model (default: nova-3) |
| `ELEVENLABS_VOICE_ID` | ElevenLabs voice ID for TTS |
| `ELEVENLABS_MODEL_ID` | ElevenLabs TTS model (default: eleven_flash_v2_5) |
| `GEMINI_MODEL` | Gemini model (default: gemini-2.5-flash) |
| `CONFIDENCE_ROUTE_THRESHOLD` | Minimum confidence to route directly (default: 0.8) |
| `CONFIDENCE_FALLBACK_THRESHOLD` | Below this confidence, trigger fallback (default: 0.3) |
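
To illustrate how the defaults in the table might be applied at startup, here is a dependency-free sketch. The project validates its configuration with Zod in `src/lib/config.ts`; this hand-rolled version is a stand-in for illustration, not that implementation. The names `loadConfig`, `AppConfig`, and `parseThreshold` are hypothetical, and the rules that thresholds must lie in [0, 1] and that the fallback threshold must not exceed the route threshold are assumptions.

```typescript
// Stand-in for the Zod-validated config: plain TypeScript, no dependencies.
type Env = Record<string, string | undefined>;

interface AppConfig {
  geminiModel: string;
  deepgramModel: string;
  elevenLabsModelId: string;
  routeThreshold: number;
  fallbackThreshold: number;
}

function parseThreshold(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  // Assumption: confidence thresholds are probabilities in [0, 1].
  if (!Number.isFinite(value) || value < 0 || value > 1) {
    throw new Error(`${name} must be a number between 0 and 1, got "${raw}"`);
  }
  return value;
}

function loadConfig(env: Env): AppConfig {
  const routeThreshold = parseThreshold(env, "CONFIDENCE_ROUTE_THRESHOLD", 0.8);
  const fallbackThreshold = parseThreshold(env, "CONFIDENCE_FALLBACK_THRESHOLD", 0.3);
  // Assumption: the fallback threshold should not sit above the route threshold.
  if (fallbackThreshold > routeThreshold) {
    throw new Error("CONFIDENCE_FALLBACK_THRESHOLD must not exceed CONFIDENCE_ROUTE_THRESHOLD");
  }
  return {
    // Empty strings fall back to the documented defaults.
    geminiModel: env["GEMINI_MODEL"] || "gemini-2.5-flash",
    deepgramModel: env["DEEPGRAM_MODEL"] || "nova-3",
    elevenLabsModelId: env["ELEVENLABS_MODEL_ID"] || "eleven_flash_v2_5",
    routeThreshold,
    fallbackThreshold,
  };
}
```

In a Node.js process this would be called as `loadConfig(process.env)`. Failing fast on a malformed threshold surfaces a typo in `.env` at startup instead of during a live call.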