# OpenAI Voice Agent for Aircall Small Business Support
 
> An AI-powered voice receptionist that answers calls on Aircall numbers, handles common inquiries, and escalates to human agents when needed, reducing hold times for SMBs.
 
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](./LICENSE) ![Node](https://img.shields.io/badge/node-%3E%3D22-brightgreen) ![pnpm](https://img.shields.io/badge/pnpm-%3E%3D10-blue)
 
## Problem
 
Small businesses miss after-hours calls and struggle to handle peak-time call volumes, leading to lost revenue and customer frustration. Hiring a human receptionist around the clock is cost-prohibitive for most SMBs.
 
## Solution
 
This recipe builds an AI-powered voice receptionist that:
 
1. Receives Aircall webhook events (incoming, answered, ended)
2. Escalates to human agents via Aircall's transfer API when the AI can't resolve the caller's issue
3. Tracks per-call costs (STT, TTS, LLM) with detailed telemetry
4. Logs traces to Langfuse for observability
 
## Architecture
 
```
Aircall Phone → Aircall Webhook → Next.js API Route → voice-agent-core Pipeline
                                                          ├── Deepgram STT (speech-to-text)
                                                          ├── OpenAI NLU (decision/intent)
                                                          └── Deepgram TTS (text-to-speech)
                                → @reaatech/agent-handoff (escalation protocol)
                                → Aircall Transfer API (human agent handoff)
                                → @reaatech/llm-cost-telemetry (per-call cost tracking)
```
 
## Quick Start
 
```bash
pnpm install
cp .env.example .env     # fill in your API keys
pnpm dev                 # start Next.js dev server
pnpm test                # run tests with coverage
pnpm typecheck           # TypeScript type-checking
pnpm lint                # ESLint
```
 
## API Endpoints
 
| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/api/aircall/webhook` | Receive Aircall call events (`call.created`, `call.answered`, `call.ended`) |
| `POST` | `/api/aircall/transfer` | Transfer an active call to a human agent |
| `GET` | `/api/health` | Health check |
 
### Webhook Payload
 
```json
{
  "event": "call.created",
  "data": {
    "call_id": "12345",
    "number_id": "n-1",
    "direction": "incoming",
    "status": "ringing",
    "caller_number": "+15551234567",
    "started_at": "2026-06-05T12:00:00Z"
  },
  "token": "webhook-token"
}
```
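
The repository validates webhooks with `zod`. As a dependency-free illustration of the same idea, the sketch below checks that a request body has the shape of the sample payload. The type and function names are hypothetical, and treating every field of `data` as a required string is an assumption drawn only from the sample above.

```typescript
// Hypothetical types mirroring the sample payload; not the repo's src/types.ts.
type AircallEventName = "call.created" | "call.answered" | "call.ended";

interface AircallWebhookPayload {
  event: AircallEventName;
  data: {
    call_id: string;
    number_id: string;
    direction: string;
    status: string;
    caller_number: string;
    started_at: string;
  };
  token: string;
}

const KNOWN_EVENTS: readonly string[] = ["call.created", "call.answered", "call.ended"];
const REQUIRED_DATA_FIELDS = [
  "call_id",
  "number_id",
  "direction",
  "status",
  "caller_number",
  "started_at",
] as const;

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns true only when the body matches the shape of the sample payload.
function isAircallWebhookPayload(body: unknown): body is AircallWebhookPayload {
  if (!isRecord(body)) return false;
  const event = body["event"];
  if (typeof event !== "string" || !KNOWN_EVENTS.includes(event)) return false;
  if (typeof body["token"] !== "string") return false;
  const data = body["data"];
  if (!isRecord(data)) return false;
  return REQUIRED_DATA_FIELDS.every((field) => typeof data[field] === "string");
}
```

A route handler could reject any body for which `isAircallWebhookPayload` returns `false` with a 400 response before handing the event to the pipeline.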
 
### Transfer Payload
 
```json
{
  "callId": "12345",
  "targetAgentId": "agent-uuid",
  "reason": "Customer billing inquiry — unable to resolve"
}
```
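
A minimal sketch of how a client might build this request. `buildTransferInit` and its types are hypothetical helpers, and the non-empty-string check is an assumption about what the endpoint expects, not its documented validation.

```typescript
// Hypothetical request shape, copied from the sample payload above.
interface TransferRequest {
  callId: string;
  targetAgentId: string;
  reason: string;
}

interface JsonPostInit {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the options object for a JSON POST to /api/aircall/transfer.
function buildTransferInit(request: TransferRequest): JsonPostInit {
  for (const [key, value] of Object.entries(request)) {
    if (typeof value !== "string" || value.trim() === "") {
      throw new Error(`Transfer field "${key}" must be a non-empty string`);
    }
  }
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(request),
  };
}

const init = buildTransferInit({
  callId: "12345",
  targetAgentId: "agent-uuid",
  reason: "Customer billing inquiry, unable to resolve",
});

// Usage (assumes the dev server is listening on Next.js's default port 3000):
// const response = await fetch("http://localhost:3000/api/aircall/transfer", init);
```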
 
## Configuration
 
All configuration is via environment variables (see `.env.example`):
 
| Variable | Required | Description |
|----------|----------|-------------|
| `OPENAI_API_KEY` | Yes | OpenAI API key for voice NLU |
| `DEEPGRAM_API_KEY` | Yes | Deepgram API key for STT/TTS |
| `AIRACALL_API_KEY` | Yes | Aircall REST API key (basic auth) |
| `AIRACALL_API_SECRET` | Yes | Aircall REST API secret (basic auth) |
| `AIRACALL_BASE_URL` | No | Aircall API base URL (default: `https://api.aircall.io/v1`) |
| `LANGKFUSE_PUBLIC_KEY` | No | Langfuse public key for tracing |
| `LANGKFUSE_SECRET_KEY` | No | Langfuse secret key for tracing |
| `OTEL_SERVICE_NAME` | No | OpenTelemetry service name |
| `AIRACALL_WEBHOOK_SECRET` | No | HMAC secret for webhook signature verification |
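
The table can be read as a small validation contract: fail fast when a required variable is missing, and fall back to the documented default for the base URL. The sketch below illustrates that idea only. `loadConfig` and `AppConfig` are hypothetical names rather than the contents of `src/config.ts`, and the variable names are copied verbatim from the table.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical config shape covering the Aircall, OpenAI, and Deepgram settings.
interface AppConfig {
  openaiApiKey: string;
  deepgramApiKey: string;
  aircallApiKey: string;
  aircallApiSecret: string;
  aircallBaseUrl: string;
  webhookSecret: string | undefined;
}

// Reads one required variable, throwing if it is unset or blank.
function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

function loadConfig(env: Env): AppConfig {
  return {
    openaiApiKey: requireVar(env, "OPENAI_API_KEY"),
    deepgramApiKey: requireVar(env, "DEEPGRAM_API_KEY"),
    aircallApiKey: requireVar(env, "AIRACALL_API_KEY"),
    aircallApiSecret: requireVar(env, "AIRACALL_API_SECRET"),
    // Optional: falls back to the default listed in the table.
    aircallBaseUrl: env["AIRACALL_BASE_URL"] || "https://api.aircall.io/v1",
    // Optional: undefined means signature verification is not configured.
    webhookSecret: env["AIRACALL_WEBHOOK_SECRET"],
  };
}
```

In a Node.js process this would typically be called once at startup as `loadConfig(process.env)`, so a missing key surfaces before the first call arrives.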
 
## Voice Pipeline
 
1. **STT (Speech-to-Text)**: Deepgram Nova-2 via `@reaatech/voice-agent-stt` — transcribes caller audio in real-time (8kHz mulaw, interim results enabled)
2. **NLU (Natural Language Understanding)**: OpenAI `gpt-5.2-mini` via `@reaatech/voice-agent-core` pipeline — classifies intent as `greet`, `answer`, `escalate`, or `goodbye`
3. **TTS (Text-to-Speech)**: Deepgram Aura via `@reaatech/voice-agent-tts` — generates spoken responses with configurable voice and latency budgets
4. **Latency Budget**: End-to-end target of 800ms (hard cap 1200ms), split into per-stage budgets that sum to the target (STT: 200ms, MCP: 400ms, TTS: 200ms). The MCP stage is the NLU call made through the MCP adapter.
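
To make the budget arithmetic concrete, the sketch below compares measured stage timings against the figures quoted above. It only reports violations; how `@reaatech/voice-agent-core` actually enforces its budgets is not shown here, and `checkLatency` is a hypothetical helper.

```typescript
type Stage = "stt" | "mcp" | "tts";

// Budgets quoted in the Voice Pipeline section, in milliseconds.
const STAGE_BUDGET_MS: Record<Stage, number> = { stt: 200, mcp: 400, tts: 200 };
const TARGET_MS = 800;
const HARD_CAP_MS = 1200;

interface LatencyReport {
  totalMs: number;
  overBudgetStages: Stage[];
  withinTarget: boolean;
  withinHardCap: boolean;
}

// Sums the measured stage timings and flags any stage over its own budget.
function checkLatency(measuredMs: Record<Stage, number>): LatencyReport {
  const stages = Object.keys(STAGE_BUDGET_MS) as Stage[];
  const totalMs = stages.reduce((sum, stage) => sum + measuredMs[stage], 0);
  return {
    totalMs,
    overBudgetStages: stages.filter((stage) => measuredMs[stage] > STAGE_BUDGET_MS[stage]),
    withinTarget: totalMs <= TARGET_MS,
    withinHardCap: totalMs <= HARD_CAP_MS,
  };
}

// A slow NLU turn: 180 + 520 + 150 = 850 ms in total.
const report = checkLatency({ stt: 180, mcp: 520, tts: 150 });
// report.overBudgetStages is ["mcp"]; the turn misses the 800 ms target
// but stays under the 1200 ms hard cap.
```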
 
## Escalation Flow
 
When the AI determines it cannot resolve the caller's issue:
 
1. The NLU returns `action: "escalate"` with a reason
2. `@reaatech/agent-handoff` protocol creates a handoff payload with conversation context
3. `AircallTransferHandler` calls `POST /v1/calls/{id}/transfers` to route to a human agent
4. Retry logic (`withRetry`, exponential backoff, max 3 attempts) handles transient failures
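
The retry step can be sketched as follows. This illustrates exponential backoff with a capped attempt count in general terms; it is not the `withRetry` implementation from `@reaatech/agent-handoff`. The names, the base delay, and the choice to retry on every error (rather than only transient ones) are all assumptions.

```typescript
interface RetryOptions {
  maxAttempts: number; // total attempts, including the first call
  baseDelayMs: number; // delay before the first retry
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

// Delay after failed attempt `attempt` (1-based): base, 2x base, 4x base, ...
function backoffDelayMs(attempt: number, baseDelayMs: number): number {
  return baseDelayMs * 2 ** (attempt - 1);
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Runs `operation` until it succeeds or `maxAttempts` is exhausted,
// then rethrows the last error.
async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  options: RetryOptions,
): Promise<T> {
  if (options.maxAttempts < 1) {
    throw new RangeError("maxAttempts must be at least 1");
  }
  const sleep = options.sleep ?? defaultSleep;
  let lastError: unknown;
  for (let attempt = 1; attempt <= options.maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // No sleep after the final attempt; fall through and rethrow.
      if (attempt < options.maxAttempts) {
        await sleep(backoffDelayMs(attempt, options.baseDelayMs));
      }
    }
  }
  throw lastError;
}
```

With `maxAttempts: 3` and `baseDelayMs: 100`, a transfer that fails twice and then succeeds would wait 100 ms and then 200 ms between attempts. A production version would likely retry only on network errors and 5xx responses, since retrying a 4xx response cannot succeed.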
 
## Cost Tracking
 
Per-call costs are tracked via `@reaatech/llm-cost-telemetry` with the following rates:
 
| Service | Model | Rate |
|---------|-------|------|
| Deepgram STT | Nova-2 | $0.0059/min |
| Deepgram TTS | Aura | $0.000015/char |
| OpenAI NLU | gpt-5.2-mini | Per-token pricing |
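
As a worked example of the rates above, the sketch below estimates the cost of a single call. `estimateCallCost` is a hypothetical helper, not the `@reaatech/llm-cost-telemetry` API. It assumes STT time is prorated by the second and takes the LLM cost as an input, because the table gives no per-token rate.

```typescript
// Rates from the Cost Tracking table.
const DEEPGRAM_STT_USD_PER_MINUTE = 0.0059;
const DEEPGRAM_TTS_USD_PER_CHAR = 0.000015;

interface CallUsage {
  sttSeconds: number; // seconds of caller audio transcribed
  ttsCharacters: number; // characters synthesized into speech
  llmCostUsd: number; // computed elsewhere from token counts
}

interface CallCost {
  sttUsd: number;
  ttsUsd: number;
  llmUsd: number;
  totalUsd: number;
}

// Converts raw usage into a per-service cost breakdown in USD.
function estimateCallCost(usage: CallUsage): CallCost {
  const sttUsd = (usage.sttSeconds / 60) * DEEPGRAM_STT_USD_PER_MINUTE;
  const ttsUsd = usage.ttsCharacters * DEEPGRAM_TTS_USD_PER_CHAR;
  const llmUsd = usage.llmCostUsd;
  return { sttUsd, ttsUsd, llmUsd, totalUsd: sttUsd + ttsUsd + llmUsd };
}

// A 3-minute call with 500 synthesized characters, LLM cost left at 0:
// STT is 3 x 0.0059 (about $0.0177) and TTS is 500 x 0.000015 (about $0.0075).
const cost = estimateCallCost({ sttSeconds: 180, ttsCharacters: 500, llmCostUsd: 0 });
```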
 
## Project Layout
 
```
app/api/aircall/webhook/route.ts   — Aircall webhook receiver
app/api/aircall/transfer/route.ts  — Human agent transfer endpoint
app/api/health/route.ts            — Health check
src/types.ts                       — Shared domain types
src/config.ts                      — Configuration loader (dotenv + validation)
src/openai.ts                      — OpenAI VoiceNLUClient + SMB prompt
src/repair.ts                      — LLM output repair via structured-repair-core
src/voice/agent.ts                 — Voice agent pipeline assembly
src/voice/openai-mcp-adapter.ts    — MCP adapter wrapping VoiceNLUClient
src/voice/session-service.ts       — Call session management helpers
src/aircall/webhook-handler.ts     — Aircall event → pipeline bridge
src/aircall/transfer.ts            — Aircall transfer API + agent-handoff
src/cost/pricing.ts                — Per-call cost telemetry
src/langfuse.ts                    — Langfuse observability
src/instrumentation.ts             — Next.js instrumentation hook
tests/                             — Test suite (70 tests, ≥90% coverage)
```
 
## Packages Used
 
| Package | Version | Role |
|---------|---------|------|
| `@reaatech/voice-agent-core` | 0.1.0 | Pipeline orchestration, session management, latency enforcement |
| `@reaatech/voice-agent-stt` | 0.1.0 | Deepgram speech-to-text provider |
| `@reaatech/voice-agent-tts` | 0.1.0 | Deepgram text-to-speech provider |
| `@reaatech/agent-handoff` | 0.1.0 | Agent handoff protocol (types, config, retry) |
| `@reaatech/llm-cost-telemetry` | 0.1.0 | Cost span generation and telemetry types |
| `@reaatech/structured-repair-core` | 1.0.0 | LLM output repair for structured responses |
| `openai` | 6.42.0 | OpenAI SDK for voice NLU |
| `@deepgram/sdk` | 5.4.0 | Deepgram SDK (listed as a dependency, but the `@reaatech` packages handle Deepgram natively) |
| `pino` | 10.3.1 | Structured JSON logging |
| `dotenv` | 17.4.2 | Environment variable loading |
| `zod` | 4.4.3 | Schema validation for webhooks and LLM output |
| `langfuse` | 3.38.20 | LLM observability and tracing |
 
## Testing
 
```bash
pnpm test              # 70 tests, ≥90% coverage on runtime code
pnpm typecheck         # TypeScript strict mode (NodeNext)
pnpm lint              # ESLint strictTypeChecked
```
 
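Run preflight validation (the script path below is an absolute path on the original development machine; adjust it for your environment):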
 
```bash
npx /home/rick/solutions-worker/bin/preflight.js
```
 
## License
 
MIT — see [LICENSE](./LICENSE).