# OpenAI Voice Agent for Aircall Small Business Support
> An AI-powered voice receptionist that answers calls on Aircall numbers, handles common inquiries, and escalates to human agents when needed, reducing hold times for SMBs.
## Problem
Small businesses miss after-hours calls and struggle to handle peak-time call volumes, leading to lost revenue and customer frustration. Hiring a human receptionist around the clock is cost-prohibitive for most SMBs.
## Solution
This recipe builds an AI-powered voice receptionist that:
1. Receives Aircall webhook events (incoming, answered, ended)
2. Escalates to human agents via Aircall's transfer API when the AI can't resolve the caller's issue
3. Tracks per-call costs (STT, TTS, LLM) with detailed telemetry
4. Logs traces to Langfuse for observability
## Architecture
```
Aircall Phone → Aircall Webhook → Next.js API Route → voice-agent-core Pipeline
                                                      ├── Deepgram STT (speech-to-text)
                                                      ├── OpenAI NLU (decision/intent)
                                                      └── Deepgram TTS (text-to-speech)
                                                    → @reaatech/agent-handoff (escalation protocol)
                                                    → Aircall Transfer API (human agent handoff)
                                                    → @reaatech/llm-cost-telemetry (per-call cost tracking)
```
## Quick Start
```bash
pnpm install
cp .env.example .env # fill in your API keys
pnpm dev # start Next.js dev server
pnpm test # run tests with coverage
pnpm typecheck # TypeScript type-checking
pnpm lint # ESLint
```
## API Endpoints
| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/api/aircall/webhook` | Receive Aircall call events (`call.created`, `call.answered`, `call.ended`) |
| `POST` | `/api/aircall/transfer` | Transfer an active call to a human agent |
| `GET` | `/api/health` | Health check |
### Webhook Payload
```json
{
  "event": "call.created",
  "data": {
    "call_id": "12345",
    "number_id": "n-1",
    "direction": "incoming",
    "status": "ringing",
    "caller_number": "+15551234567",
    "started_at": "2026-06-05T12:00:00Z"
  },
  "token": "webhook-token"
}
```
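
A receiver typically narrows the request body to this shape before passing it to the pipeline. The sketch below is one way to do that, assuming every field in the sample payload is a required string. The names `AircallWebhookPayload` and `parseWebhookPayload` are illustrative and not taken from the repository, which lists `zod` for this job; a hand-written guard is used here only to keep the example dependency-free.

```typescript
// Illustrative types mirroring the sample payload above; names are assumptions.
type CallEvent = "call.created" | "call.answered" | "call.ended";

interface AircallWebhookPayload {
  event: CallEvent;
  data: {
    call_id: string;
    number_id: string;
    direction: string;
    status: string;
    caller_number: string;
    started_at: string;
  };
  token: string;
}

const CALL_EVENTS: readonly string[] = ["call.created", "call.answered", "call.ended"];
const DATA_FIELDS: readonly string[] = [
  "call_id", "number_id", "direction", "status", "caller_number", "started_at",
];

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns the typed payload, or null when the body does not match the expected shape.
function parseWebhookPayload(body: unknown): AircallWebhookPayload | null {
  if (!isRecord(body)) return null;
  if (typeof body.event !== "string" || !CALL_EVENTS.includes(body.event)) return null;
  if (typeof body.token !== "string") return null;
  const data = body.data;
  if (!isRecord(data)) return null;
  for (const field of DATA_FIELDS) {
    // Assumption: every field shown in the sample is required and is a string.
    if (typeof data[field] !== "string") return null;
  }
  return body as unknown as AircallWebhookPayload;
}
```

Returning `null` instead of throwing lets the route answer malformed requests with a 4xx response without a `try`/`catch`.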
### Transfer Payload
```json
{
  "callId": "12345",
  "targetAgentId": "agent-uuid",
  "reason": "Customer billing inquiry — unable to resolve"
}
```
## Configuration
All configuration is via environment variables (see `.env.example`):
| Variable | Required | Description |
|----------|----------|-------------|
| `OPENAI_API_KEY` | Yes | OpenAI API key for voice NLU |
| `DEEPGRAM_API_KEY` | Yes | Deepgram API key for STT/TTS |
| `AIRCALL_API_KEY` | Yes | Aircall REST API key (basic auth) |
| `AIRCALL_API_SECRET` | Yes | Aircall REST API secret (basic auth) |
| `AIRCALL_BASE_URL` | No | Aircall API base URL (default: `https://api.aircall.io/v1`) |
| `LANGFUSE_PUBLIC_KEY` | No | Langfuse public key for tracing |
| `LANGFUSE_SECRET_KEY` | No | Langfuse secret key for tracing |
| `OTEL_SERVICE_NAME` | No | OpenTelemetry service name |
| `AIRCALL_WEBHOOK_SECRET` | No | HMAC secret for webhook signature verification |
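
The required/optional split in the table can be enforced at startup. The following is a minimal sketch of that idea, not the contents of `src/config.ts`; `requireEnv` and `optionalEnv` are hypothetical helper names introduced for illustration.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper: collects required variables and reports every missing one at once.
function requireEnv(env: Env, names: readonly string[]): Record<string, string> {
  const values: Record<string, string> = {};
  const missing: string[] = [];
  for (const name of names) {
    const value = env[name];
    if (value === undefined || value.trim() === "") {
      missing.push(name);
    } else {
      values[name] = value;
    }
  }
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return values;
}

// Hypothetical helper: optional variables fall back to a caller-supplied default.
function optionalEnv(env: Env, name: string, fallback: string): string {
  const value = env[name];
  return value === undefined || value.trim() === "" ? fallback : value;
}
```

In a Node process this might be called as `requireEnv(process.env, ["OPENAI_API_KEY", "DEEPGRAM_API_KEY"])`. Reporting all missing names in one error saves a restart per missing variable.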
## Voice Pipeline
1. **STT (Speech-to-Text)**: Deepgram Nova-2 via `@reaatech/voice-agent-stt` — transcribes caller audio in real-time (8kHz mulaw, interim results enabled)
2. **NLU (Natural Language Understanding)**: OpenAI `gpt-5.2-mini` via `@reaatech/voice-agent-core` pipeline — classifies intent as `greet`, `answer`, `escalate`, or `goodbye`
3. **TTS (Text-to-Speech)**: Deepgram Aura via `@reaatech/voice-agent-tts` — generates spoken responses with configurable voice and latency budgets
4. **Latency Budget**: End-to-end target of 800ms (hard cap 1200ms), split into per-stage budgets that sum to the target (STT: 200ms, MCP adapter wrapping the NLU call: 400ms, TTS: 200ms)
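
The budget figures above can be checked per turn with a few lines of arithmetic. This is a sketch under the assumption that the three stages run sequentially, so the end-to-end latency is their sum. The names (`checkLatency`, `LatencyReport`) are illustrative and are not the `@reaatech/voice-agent-core` API.

```typescript
// Budgets copied from the list above; the type and function names are illustrative.
type Stage = "stt" | "mcp" | "tts";

const STAGE_BUDGET_MS: Record<Stage, number> = { stt: 200, mcp: 400, tts: 200 };
const TARGET_MS = 800;
const HARD_CAP_MS = 1200;

interface LatencyReport {
  totalMs: number;
  overBudgetStages: Stage[];
  withinTarget: boolean;
  withinHardCap: boolean;
}

// Compares measured per-stage latencies with their budgets and with the end-to-end limits.
function checkLatency(measuredMs: Record<Stage, number>): LatencyReport {
  const stages = Object.keys(STAGE_BUDGET_MS) as Stage[];
  // Assumption: stages run one after another, so the total is a plain sum.
  const totalMs = stages.reduce((sum, stage) => sum + measuredMs[stage], 0);
  return {
    totalMs,
    overBudgetStages: stages.filter((stage) => measuredMs[stage] > STAGE_BUDGET_MS[stage]),
    withinTarget: totalMs <= TARGET_MS,
    withinHardCap: totalMs <= HARD_CAP_MS,
  };
}
```

For example, a turn measured at 180ms STT, 520ms MCP, and 150ms TTS totals 850ms: it misses the 800ms target because the MCP stage overran its budget, but stays under the 1200ms hard cap.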
## Escalation Flow
When the AI determines it cannot resolve the caller's issue:
1. The NLU returns `action: "escalate"` with a reason
2. `@reaatech/agent-handoff` protocol creates a handoff payload with conversation context
3. `AircallTransferHandler` calls `POST /v1/calls/{id}/transfer` to route to a human agent
4. Retry logic (`withRetry`, exponential backoff, max 3 attempts) handles transient failures
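
The retry step can be sketched as follows. This is not the `withRetry` helper from `@reaatech/agent-handoff`, whose signature is not shown here. `retryWithBackoff`, `backoffDelayMs`, `buildHandoff`, and the 250ms base delay are assumptions made for illustration; only the exponential backoff and the limit of 3 attempts come from the list above.

```typescript
// All names below are illustrative; they are not the @reaatech/agent-handoff API.
interface HandoffPayload {
  callId: string;
  reason: string;
  transcript: string[];
}

// Bundles the escalation reason with the conversation so far.
function buildHandoff(callId: string, reason: string, transcript: string[]): HandoffPayload {
  return { callId, reason, transcript: [...transcript] };
}

// Delay after failed attempt `attempt` (1-based): base, 2x base, 4x base, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 250, maxMs = 4000): number {
  return Math.min(maxMs, baseMs * 2 ** (attempt - 1));
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Runs `operation` up to `maxAttempts` times, waiting between failures.
// `wait` is injectable so callers (and tests) can skip real delays.
async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) await wait(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```

A transfer call would be wrapped as `retryWithBackoff(() => transfer(handoff))`, where `transfer` stands in for whatever function issues the HTTP request. A production version would usually retry only on transient errors (timeouts, 5xx responses) rather than on every failure.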
## Cost Tracking
Per-call costs are tracked via `@reaatech/llm-cost-telemetry` with the following rates:
| Service | Model | Rate |
|---------|-------|------|
| Deepgram STT | Nova-2 | $0.0059/min |
| Deepgram TTS | Aura | $0.000015/char |
| OpenAI NLU | gpt-5.2-mini | Per-token pricing |
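
To show how these rates combine into a per-call figure, here is a small worked sketch. The STT and TTS constants come from the table; the LLM rates are passed in by the caller because the table gives no numbers for them. The interface and function names are illustrative and are not the `@reaatech/llm-cost-telemetry` API.

```typescript
// STT and TTS rates come from the table above.
const STT_USD_PER_MINUTE = 0.0059;
const TTS_USD_PER_CHAR = 0.000015;

interface CallUsage {
  sttSeconds: number;
  ttsChars: number;
  llmInputTokens: number;
  llmOutputTokens: number;
}

// Caller-supplied, since the table only says "per-token pricing".
interface LlmRates {
  inputUsdPerMillionTokens: number;
  outputUsdPerMillionTokens: number;
}

interface CallCost {
  sttUsd: number;
  ttsUsd: number;
  llmUsd: number;
  totalUsd: number;
}

// Sums the three cost components for a single call.
function estimateCallCost(usage: CallUsage, llmRates: LlmRates): CallCost {
  const sttUsd = (usage.sttSeconds / 60) * STT_USD_PER_MINUTE;
  const ttsUsd = usage.ttsChars * TTS_USD_PER_CHAR;
  const llmUsd =
    (usage.llmInputTokens * llmRates.inputUsdPerMillionTokens +
      usage.llmOutputTokens * llmRates.outputUsdPerMillionTokens) / 1e6;
  return { sttUsd, ttsUsd, llmUsd, totalUsd: sttUsd + ttsUsd + llmUsd };
}
```

As a worked example, a 3-minute call with 1,200 characters of synthesized speech costs about $0.0177 for STT (3 × 0.0059) and $0.018 for TTS (1,200 × 0.000015), before the LLM component is added.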
## Project Layout
```
app/api/aircall/webhook/route.ts — Aircall webhook receiver
app/api/aircall/transfer/route.ts — Human agent transfer endpoint
app/api/health/route.ts — Health check
src/types.ts — Shared domain types
src/config.ts — Configuration loader (dotenv + validation)
src/openai.ts — OpenAI VoiceNLUClient + SMB prompt
src/repair.ts — LLM output repair via structured-repair-core
src/voice/agent.ts — Voice agent pipeline assembly
src/voice/openai-mcp-adapter.ts — MCP adapter wrapping VoiceNLUClient
src/voice/session-service.ts — Call session management helpers
src/aircall/webhook-handler.ts — Aircall event → pipeline bridge
src/aircall/transfer.ts — Aircall transfer API + agent-handoff
src/cost/pricing.ts — Per-call cost telemetry
src/langfuse.ts — Langfuse observability
src/instrumentation.ts — Next.js instrumentation hook
tests/ — Test suite (70 tests, ≥90% coverage)
```
## Packages Used
| Package | Version | Role |
|---------|---------|------|
| `@reaatech/voice-agent-core` | 0.1.0 | Pipeline orchestration, session management, latency enforcement |
| `@reaatech/voice-agent-stt` | 0.1.0 | Deepgram speech-to-text provider |
| `@reaatech/voice-agent-tts` | 0.1.0 | Deepgram text-to-speech provider |
| `@reaatech/agent-handoff` | 0.1.0 | Agent handoff protocol (types, config, retry) |
| `@reaatech/llm-cost-telemetry` | 0.1.0 | Cost span generation and telemetry types |
| `@reaatech/structured-repair-core` | 1.0.0 | LLM output repair for structured responses |
| `openai` | 6.42.0 | OpenAI SDK for voice NLU |
| `@deepgram/sdk` | 5.4.0 | Deepgram SDK (listed as a dependency, but the `@reaatech` packages handle Deepgram natively) |
| `pino` | 10.3.1 | Structured JSON logging |
| `dotenv` | 17.4.2 | Environment variable loading |
| `zod` | 4.4.3 | Schema validation for webhooks and LLM output |
| `langfuse` | 3.38.20 | LLM observability and tracing |
## Testing
```bash
pnpm test # 70 tests, ≥90% coverage on runtime code
pnpm typecheck # TypeScript strict mode (NodeNext)
pnpm lint # ESLint strictTypeChecked
```
Run preflight validation:
```bash
npx /home/rick/solutions-worker/bin/preflight.js
```
## License
MIT — see [LICENSE](./LICENSE).