# vLLM Voice Agent for After-Hours Small Business Support
> A self-hosted voice agent that answers after-hours calls using your own vLLM inference, with customizable workflows for appointment booking and FAQs.
A tutorial-style reference solution from [reaatech.com](https://reaatech.com) that demonstrates how to build production-grade voice AI systems with the `@reaatech/*` package family.
## Overview
Small service businesses lose customers when after-hours calls go unanswered, and most can't afford a 24/7 receptionist. Existing AI voice solutions typically depend on expensive cloud LLM APIs and send sensitive call data off-site. This recipe addresses that by serving the LLM from your own vLLM instance on-premises, so conversation content stays out of third-party LLM APIs. Transcription and speech synthesis still go through Deepgram and Cartesia.
## Architecture
```
Twilio PSTN → Express Webhook (TwiML) → WebSocket Media Stream → Deepgram STT
                                                                      ↓
                                                              vLLM (OpenAI SDK)
                                                                      ↓
Twilio PSTN ← Express Webhook (TwiML) ← WebSocket Media Stream ← Cartesia TTS
```
- **Voice Agent Core** (`@reaatech/voice-agent-core`) orchestrates the STT → MCP → TTS pipeline
- **Telephony** (`@reaatech/voice-agent-telephony`) handles Twilio Media Stream WebSocket protocol
- **STT** (`@reaatech/voice-agent-stt`) transcribes audio via Deepgram
- **TTS** (`@reaatech/voice-agent-tts`) synthesizes speech via Cartesia
- **Session Continuity** (`@reaatech/session-continuity`) manages conversation context with Redis storage
- **LLM** runs on vLLM via the OpenAI SDK chat completions endpoint
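The LLM step of the pipeline can be sketched without any SDK, since vLLM exposes the OpenAI-style `POST /chat/completions` route under its base URL. The sketch below only builds the request for one conversational turn; `buildChatRequest`, `ChatMessage`, the system prompt, and the `max_tokens` value are illustrative assumptions, not names or settings taken from this repository, which wraps the OpenAI SDK in `vllm-client.ts` instead.

```typescript
// Hypothetical helper: builds the HTTP request for one LLM turn against
// vLLM's OpenAI-compatible chat completions route. Not the repo's client.

type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

interface ChatRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildChatRequest(
  endpoint: string, // e.g. the value of VLLM_ENDPOINT
  model: string, // e.g. the value of VLLM_MODEL
  history: ChatMessage[],
  transcript: string, // latest caller utterance from STT
  apiKey?: string, // VLLM_API_KEY, often empty for self-hosted servers
): ChatRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  // Only send an Authorization header when a key is actually configured.
  if (apiKey) headers["Authorization"] = `Bearer ${apiKey}`;

  // Append the new utterance to the prior conversation context.
  const messages: ChatMessage[] = [...history, { role: "user", content: transcript }];

  return {
    // Strip trailing slashes so the path is joined exactly once.
    url: `${endpoint.replace(/\/+$/, "")}/chat/completions`,
    method: "POST",
    headers,
    // max_tokens is an assumed cap to keep spoken replies short.
    body: JSON.stringify({ model, messages, max_tokens: 150 }),
  };
}

const request = buildChatRequest(
  "http://localhost:8000/v1/",
  "my-model",
  [{ role: "system", content: "You are an after-hours receptionist." }],
  "Can I book an appointment for Tuesday?",
);
console.log(request.url);
```

The resulting object can be handed to any HTTP client. The reply text from the response's first choice is what would then be passed on to TTS.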
## Prerequisites
- Twilio account with a voice-capable phone number that supports Media Streams
- Deepgram API key
- Cartesia API key
- Redis instance
- Langfuse account (for observability)
- A running vLLM server serving an OpenAI-compatible endpoint
## Environment Variables
| Variable | Description |
|---|---|
| `VLLM_ENDPOINT` | vLLM server URL (e.g. `http://localhost:8000/v1`) |
| `VLLM_API_KEY` | API key for vLLM (often empty for self-hosted) |
| `VLLM_MODEL` | Model name on vLLM |
| `DEEPGRAM_API_KEY` | Deepgram API key |
| `CARTESIA_API_KEY` | Cartesia API key |
| `TWILIO_ACCOUNT_SID` | Twilio account SID |
| `TWILIO_AUTH_TOKEN` | Twilio auth token |
| `TWILIO_PHONE_NUMBER` | Twilio phone number |
| `REDIS_URL` | Redis connection URL |
| `LANGFUSE_PUBLIC_KEY` | Langfuse public key |
| `LANGFUSE_SECRET_KEY` | Langfuse secret key |
| `LANGFUSE_BASE_URL` | Langfuse base URL |
| `PORT` | Express server port (default 8080) |
| `WS_URL` | WebSocket URL for Twilio media streams |
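A minimal sketch of how a few of these variables might be validated at startup. The repository's `config.ts` is described as Zod-validated. This dependency-free version is an assumption for illustration only, and `loadConfig`, `requireVar`, and `AppConfig` are hypothetical names. It covers a subset of the table: required values fail fast, `VLLM_API_KEY` may be empty, and `PORT` falls back to 8080.

```typescript
// Hypothetical, dependency-free config loader (the repo itself uses Zod).

interface AppConfig {
  vllmEndpoint: string;
  vllmApiKey: string;
  vllmModel: string;
  port: number;
  wsUrl: string;
}

type Env = Record<string, string | undefined>;

// Returns the variable's value or throws if it is missing or blank.
function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

function loadConfig(env: Env): AppConfig {
  // PORT is optional and defaults to 8080, as documented in the table.
  const port = Number(env["PORT"] ?? "8080");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`PORT must be an integer between 1 and 65535, got: ${env["PORT"]}`);
  }
  return {
    vllmEndpoint: requireVar(env, "VLLM_ENDPOINT"),
    // Self-hosted vLLM often runs without an API key, so empty is allowed.
    vllmApiKey: env["VLLM_API_KEY"] ?? "",
    vllmModel: requireVar(env, "VLLM_MODEL"),
    port,
    wsUrl: requireVar(env, "WS_URL"),
  };
}

const config = loadConfig({
  VLLM_ENDPOINT: "http://localhost:8000/v1",
  VLLM_MODEL: "my-model",
  WS_URL: "wss://your-server.com/api/twilio/stream",
});
console.log(config.port);
```

In a Node process the same function would be called with `process.env`. Taking the environment as a parameter keeps the loader easy to test.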
## Quick Start
```bash
# Install dependencies
pnpm install
# Copy and fill in environment variables
cp .env.example .env
# Run the Express server
pnpm run dev
```
Configure your Twilio phone number's voice webhook URL to point to `https://your-server.com/api/twilio/voice`.
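When Twilio calls that webhook, the server is expected to answer with TwiML that connects the call to the media stream. The sketch below shows roughly what such a response could look like. `buildStreamTwiml` and `escapeXml` are hypothetical helpers, not the contents of `twiml-handler.ts`, and the stream URL is a placeholder standing in for `WS_URL`.

```typescript
// Hypothetical TwiML builder for the voice webhook response.

// Escape characters that are not allowed verbatim in an XML attribute.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Builds a <Connect><Stream> document pointing at the WebSocket endpoint.
function buildStreamTwiml(wsUrl: string): string {
  // Twilio's <Stream> noun expects a secure WebSocket URL.
  if (!/^wss:\/\//.test(wsUrl)) {
    throw new Error("Stream URL must start with wss://");
  }
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    "  <Connect>",
    `    <Stream url="${escapeXml(wsUrl)}" />`,
    "  </Connect>",
    "</Response>",
  ].join("\n");
}

const twiml = buildStreamTwiml("wss://your-server.com/api/twilio/stream");
console.log(twiml);
```

The webhook handler would send this string with a `text/xml` content type.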
## API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| `/api/twilio/voice` | POST | Twilio voice webhook — returns TwiML with Connect/Stream |
| `/api/twilio/stream` | WebSocket | Twilio Media Stream — real-time audio processing |
| `/health` | GET | Health check |
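On the stream endpoint, Twilio sends JSON text frames whose `event` field is `connected`, `start`, `media`, or `stop`, with audio carried as a base64 payload. A rough sketch of how those frames might be classified, and how synthesized audio could be framed for the return path, follows. `parseStreamMessage`, `buildOutboundMedia`, and the `StreamEvent` type are hypothetical and not taken from `@reaatech/voice-agent-telephony`.

```typescript
// Hypothetical parser for Twilio Media Stream frames (not the package's API).

type StreamEvent =
  | { kind: "start"; streamSid: string; callSid: string }
  | { kind: "media"; streamSid: string; payloadBase64: string }
  | { kind: "stop"; streamSid: string }
  | { kind: "ignored"; event: string };

// Classifies one inbound text frame from the WebSocket.
function parseStreamMessage(raw: string): StreamEvent {
  const msg = JSON.parse(raw);
  switch (msg.event) {
    case "start":
      // Stream metadata arrives nested under the "start" key.
      return { kind: "start", streamSid: msg.start.streamSid, callSid: msg.start.callSid };
    case "media":
      // Caller audio, still base64-encoded; decode before handing to STT.
      return { kind: "media", streamSid: msg.streamSid, payloadBase64: msg.media.payload };
    case "stop":
      return { kind: "stop", streamSid: msg.streamSid };
    default:
      // "connected" and any other events need no action in this sketch.
      return { kind: "ignored", event: String(msg.event) };
  }
}

// Frames synthesized audio (base64) for sending back over the same socket.
function buildOutboundMedia(streamSid: string, payloadBase64: string): string {
  return JSON.stringify({ event: "media", streamSid, media: { payload: payloadBase64 } });
}

const sampleFrame = JSON.stringify({
  event: "media",
  streamSid: "MZ123",
  media: { track: "inbound", payload: "AAAA" },
});
const parsedEvent = parseStreamMessage(sampleFrame);
console.log(parsedEvent.kind);
```

Using a discriminated union lets the call handler switch on `kind` and have the compiler check that each event type is handled.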
## Testing
```bash
pnpm test # vitest run with coverage
pnpm typecheck # TypeScript type checking
pnpm lint # ESLint
```
## Project layout
```
app/                            Next.js App Router pages + API routes
src/
  config.ts                     Zod-validated configuration
  types.ts                      Shared TypeScript types
  services/                     Voice agent services
    simple-token-counter.ts     Token counter for SessionManager
    vllm-client.ts              OpenAI SDK wrapper for vLLM
    redis-storage-adapter.ts    IStorageAdapter over ioredis
    intent-router.ts            Intent classification service
    appointment-scheduler.ts    Appointment booking via Redis
    faq-service.ts              FAQ answering service
    twiml-handler.ts            TwiML generation and validation
    voice-call-handler.ts       Main pipeline orchestrator
  server.ts                     Express server bootstrap
tests/                          vitest suite (mirrors src/)
packages/                       API references for every dependency
DEV_PLAN.md                     build plan for this recipe
```
## License
MIT — see [LICENSE](./LICENSE).