# xAI Grok Voice Agent for After-Hours Customer Support
> Deploy an AI receptionist powered by xAI Grok that answers calls, qualifies leads, and routes urgent issues without 24/7 staffing.

Small businesses lose potential customers when calls go unanswered after hours. This recipe shows how to build an AI-powered voice agent that uses xAI Grok for natural conversation, LiveKit for real-time media, Deepgram for speech recognition, and Cartesia for speech synthesis.
## Architecture
```
Caller → Twilio → LiveKit Room → Webhook → VoiceAgentOrchestrator
                                             ├── ConfidenceRouter (intent classification)
                                             ├── BudgetController (per-call spend limits)
                                             ├── LlmCache (semantic response cache)
                                             └── AgentHandoff (Twilio SMS / callback)
```
The pipeline:
1. **LiveKit** dispatches an agent when a call comes in
2. **Deepgram** transcribes caller speech (via LiveKit plugin)
3. **xAI Grok** (via `@ai-sdk/xai`) generates natural responses
4. **Cartesia** synthesises speech output (via LiveKit plugin)
5. **ConfidenceRouter** classifies intent: inquiry, booking, or escalation
6. **BudgetController** enforces per-call spending limits
7. **LlmCache** caches common prompt-response pairs to reduce API cost
8. **AgentHandoff** triggers Twilio SMS/callbacks when human escalation is needed
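
Steps 5 to 8 can be sketched in plain TypeScript to show how routing, budgeting, caching, and handoff might interact on a single caller turn. This is an illustration under stated assumptions, not the API of the `@reaatech/*` packages: `classifyIntent`, `CallBudget`, and `handleTurn` are hypothetical names, keyword rules stand in for real intent classification, an exact-match `Map` stands in for the semantic cache, and handing off when the budget runs out is an assumed policy. The `generate` callback stands in for the Grok call and is synchronous only to keep the sketch short.

```typescript
// Hypothetical sketch: none of these names come from the @reaatech packages.
type Intent = "inquiry" | "booking" | "escalation";

type TurnResult =
  | { kind: "reply"; text: string; cached: boolean }
  | { kind: "handoff"; reason: string };

// Keyword rules stand in for the real intent classifier.
function classifyIntent(utterance: string): Intent {
  const text = utterance.toLowerCase();
  if (/\b(emergency|urgent|leak|outage)\b/.test(text)) return "escalation";
  if (/\b(book|appointment|schedule|reschedule)\b/.test(text)) return "booking";
  return "inquiry";
}

// Tracks spend for one call against a fixed cap, in whole cents.
class CallBudget {
  private spentCents = 0;
  private limitCents: number;

  constructor(limitCents: number) {
    this.limitCents = limitCents;
  }

  canSpend(costCents: number): boolean {
    return this.spentCents + costCents <= this.limitCents;
  }

  record(costCents: number): void {
    this.spentCents += costCents;
  }
}

// One caller turn: escalate urgent intents, serve cache hits for free,
// and only call the model when the per-call budget allows it.
function handleTurn(
  utterance: string,
  budget: CallBudget,
  cache: Map<string, string>,
  generate: (prompt: string) => string, // stand-in for the Grok call
  costCents: number, // assumed flat cost per model call
): TurnResult {
  if (classifyIntent(utterance) === "escalation") {
    return { kind: "handoff", reason: "urgent intent" };
  }

  // Normalised text as the cache key (exact match, not semantic).
  const key = utterance.trim().toLowerCase();
  const hit = cache.get(key);
  if (hit !== undefined) {
    return { kind: "reply", text: hit, cached: true };
  }

  if (!budget.canSpend(costCents)) {
    return { kind: "handoff", reason: "budget exhausted" };
  }

  const text = generate(utterance);
  budget.record(costCents);
  cache.set(key, text);
  return { kind: "reply", text, cached: false };
}
```

Checking the cache before the budget means a repeated question can still be answered after the cap is reached, since a cache hit costs nothing in this sketch.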
## Prerequisites
- [xAI API key](https://console.x.ai) — `XAI_API_KEY`
- [LiveKit Cloud](https://cloud.livekit.io) account — `LIVEKIT_API_KEY`, `LIVEKIT_API_SECRET`
- [Deepgram API key](https://console.deepgram.com) — `DEEPGRAM_API_KEY`
- [Cartesia API key](https://cartesia.ai) — `CARTESIA_API_KEY`
- [Twilio account](https://console.twilio.com) — `TWILIO_ACCOUNT_SID`, `TWILIO_AUTH_TOKEN`, `TWILIO_PHONE_NUMBER`
- [Langfuse account](https://langfuse.com) — `LANGFUSE_PUBLIC_KEY`, `LANGFUSE_SECRET_KEY`
- [OpenAI API key](https://platform.openai.com/api-keys) — `OPENAI_API_KEY` (for semantic cache embeddings)
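
With this many credentials, a startup check that reports every missing variable at once can save debugging time. The sketch below is an assumption about how such a check could look, not code from this project: `REQUIRED_ENV_VARS` and `missingEnvVars` are hypothetical names, and only the variable names themselves come from the list above.

```typescript
// Variable names are taken from the Prerequisites list;
// the helper itself is a hypothetical illustration.
const REQUIRED_ENV_VARS = [
  "XAI_API_KEY",
  "LIVEKIT_API_KEY",
  "LIVEKIT_API_SECRET",
  "DEEPGRAM_API_KEY",
  "CARTESIA_API_KEY",
  "TWILIO_ACCOUNT_SID",
  "TWILIO_AUTH_TOKEN",
  "TWILIO_PHONE_NUMBER",
  "LANGFUSE_PUBLIC_KEY",
  "LANGFUSE_SECRET_KEY",
  "OPENAI_API_KEY",
] as const;

// Returns the names of required variables that are unset or blank,
// in the same order as REQUIRED_ENV_VARS.
function missingEnvVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV_VARS.filter((name) => !env[name]?.trim());
}
```

In a Node.js process this could be called as `missingEnvVars(process.env)` before the server starts, failing fast if the returned list is non-empty.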
## Quick Start
```bash
cp .env.example .env
# Fill in your API keys in .env
pnpm install
pnpm dev
```
Then configure your LiveKit webhook to send `agent_dispatch` events to:
```
POST https://your-host/api/webhook/voice
```
## API Endpoints
| Method | Path | Description |
|--------|-------------------------|----------------------------------------------|
| POST | `/api/webhook/voice` | Receives LiveKit `agent_dispatch` webhooks |
| POST | `/api/twilio/callback` | Receives Twilio SMS delivery status callbacks |
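
As a rough illustration of what `/api/webhook/voice` might do with an incoming body, the sketch below parses the JSON and accepts only `agent_dispatch` events that name a room. The payload shape (`event` plus `room.name`) is an assumption made for this example; the project's actual schema lives in its Zod types, and `parseDispatchEvent` is a hypothetical helper. Signature verification is out of scope here and would need to happen first (`livekit-server-sdk` exports a `WebhookReceiver` for that purpose).

```typescript
// Assumed payload shape, for illustration only.
interface DispatchEvent {
  event: "agent_dispatch";
  room: { name: string };
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Returns the validated event, or null if the body is not valid JSON,
// is a different event type, or lacks a non-empty room name.
function parseDispatchEvent(rawBody: string): DispatchEvent | null {
  let data: unknown;
  try {
    data = JSON.parse(rawBody);
  } catch {
    return null;
  }

  if (!isRecord(data) || data["event"] !== "agent_dispatch") return null;

  const room = data["room"];
  if (!isRecord(room)) return null;

  const name = room["name"];
  if (typeof name !== "string" || name === "") return null;

  return { event: "agent_dispatch", room: { name } };
}
```

A route handler could respond with a 4xx status when this returns `null` and pass the validated event to the orchestrator otherwise.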
## Project Structure
```
app/
  api/
    webhook/voice/route.ts          LiveKit agent dispatch webhook
    twilio/callback/route.ts        Twilio status callback
src/
  lib/
    types.ts                        Shared types and Zod schemas
    grok.ts                         xAI Grok client (Vercel AI SDK)
    pricing-provider.ts             PricingProvider for BudgetController
    langfuse.ts                     Langfuse observability helpers
  services/
    voice-agent.service.ts          Main orchestrator
    confidence-router.service.ts    Intent classification wrapper
    budget-engine.service.ts        Budget enforcement wrapper
    llm-cache.service.ts            Semantic cache wrapper
    agent-handoff.service.ts        Escalation / Twilio SMS wrapper
tests/                              Vitest suite with MSW mocks
```
## Packages Used
| Package | Purpose |
|----------------------------------|--------------------------------|
| `@reaatech/confidence-router` | Intent classification routing |
| `@reaatech/agent-handoff` | Handoff protocol types/events |
| `@reaatech/agent-budget-engine` | LLM spending budget controller |
| `@reaatech/llm-cache` | Semantic + exact-match caching |
| `@livekit/agents` | Real-time voice agent pipeline |
| `@deepgram/sdk` | Speech-to-text |
| `@cartesia/cartesia-js` | Speech synthesis |
| `@ai-sdk/xai` | xAI Grok provider for AI SDK |
| `ai` | Vercel AI SDK (generateText) |
| `twilio` | SMS and callback notifications |
| `langfuse` | LLM observability / tracing |
| `livekit-server-sdk` | LiveKit server API + webhooks |
| `zod` | Schema validation |
## License
MIT — see [LICENSE](./LICENSE).