OpenAI Voice Agent for Aircall Small Business Support
An AI-powered voice receptionist that answers calls on Aircall numbers, handles common inquiries, and escalates to human agents when needed, reducing hold times for SMBs.
This recipe builds an AI-powered voice receptionist for Aircall phone numbers using Next.js, Deepgram for speech transcription and synthesis, and OpenAI for natural language understanding. When a customer calls, the agent greets them, answers FAQs about hours, pricing, and services, and escalates to a human agent when it can’t help — reducing hold times and catching after-hours calls for small businesses.
Prerequisites
Node.js >= 22 with pnpm 10.x
An OpenAI API key (for voice NLU via gpt-5.2-mini)
A Deepgram API key (for STT and TTS)
An Aircall account with API credentials (key and secret) and a phone number
(Optional) A Langfuse account for tracing and cost observability
Familiarity with TypeScript and Next.js App Router basics
Step 1: Create the project scaffold
Create a new Next.js project and install all dependencies at exact pinned versions.
create-next-app generates package.json with next, react, and react-dom already listed. Add these recipe-specific dependencies at the exact versions shown (merge them into the existing dependencies and devDependencies sections):
Expected output: pnpm resolves all packages and writes a pnpm-lock.yaml with no version ranges.
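One way to confirm that no version ranges slipped into package.json is to scan the dependency maps for anything that is not an exact version. The sketch below is only an illustration: `findUnpinned` is a hypothetical helper, not part of the recipe, and the package names in the sample are placeholders.

```typescript
// Hypothetical helper: list dependencies whose version is not an exact x.y.z pin.
type DependencyMap = Record<string, string>;

// Accepts "1.2.3", "1.2.3-beta.1", "1.2.3+build.5"; rejects ^, ~, >=, *, dist-tags, and URLs.
const EXACT_VERSION = /^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$/;

export function findUnpinned(deps: DependencyMap): string[] {
  return Object.entries(deps)
    .filter(([, version]) => !EXACT_VERSION.test(version))
    .map(([name, version]) => `${name}@${version}`);
}

// Placeholder package names and versions, for illustration only.
const sample: DependencyMap = {
  "example-a": "1.4.0",
  "example-b": "^2.0.0",
  "example-c": "~3.1.2",
};

console.log(findUnpinned(sample)); // [ 'example-b@^2.0.0', 'example-c@~3.1.2' ]
```

Running a check like this against both `dependencies` and `devDependencies` before `pnpm install` catches a stray `^` early.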
Step 2: Configure environment variables
Create .env by copying .env.example and filling in your credentials:
env
# OpenAI SDK auth for voice NLU
OPENAI_API_KEY=<your-openai-key>

# Deepgram auth for STT (speech-to-text) and TTS (text-to-speech) via REAA packages
DEEPGRAM_API_KEY=<your-deepgram-key>

# Aircall REST API basic-auth credentials
AIRACALL_API_KEY=<your-aircall-api-key>
AIRACALL_API_SECRET=<your-aircall-api-secret>
AIRACALL_BASE_URL=https://api.aircall.io/v1

# Langfuse tracing (optional; omit to disable)
LANGKFUSE_PUBLIC_KEY=<your-langfuse-public-key>
LANGKFUSE_SECRET_KEY=<your-langfuse-secret-key>

# OpenTelemetry service name (used by @reaatech/voice-agent-core)
OTEL_SERVICE_NAME=openai-voice-agent-aircall

# Optional HMAC secret for Aircall webhook signature verification (set to "" to skip)
AIRACALL_WEBHOOK_SECRET=<your-webhook-secret>
Step 3: Create shared domain types
Create src/types.ts with all the interfaces your system needs:
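A minimal sketch of these types, inferred from how the config loader, repair utility, and webhook handler in later steps use them. The field lists are assumptions based on those steps; the recipe's actual file may define more fields and more interfaces. `NLU_ACTIONS` and `isNLUAction` are hypothetical additions included only to make the sketch usable at runtime.

```typescript
// Sketch of src/types.ts, inferred from later steps (assumption, not the original file).

/** Shape returned by loadConfig() in src/config.ts. */
export interface VoiceAgentConfig {
  openaiApiKey: string;
  deepgramApiKey: string;
  aircallApiKey: string;
  aircallApiSecret: string;
  aircallBaseUrl: string;
  langfusePublicKey: string; // empty string when tracing is disabled
  langfuseSecretKey: string;
  aircallWebhookSecret: string; // empty string skips signature verification
}

// Hypothetical runtime list of the four actions the NLU prompt allows.
export const NLU_ACTIONS = ["greet", "answer", "escalate", "goodbye"] as const;
export type NLUAction = (typeof NLU_ACTIONS)[number];

/** Structured decision returned by the NLU client. */
export interface NLUResponse {
  action: NLUAction;
  message: string;
  escalationReason?: string;
}

export type AircallEventName = "call.created" | "call.answered" | "call.ended";

/** Only the fields the webhook handler visibly validates; other fields are left open. */
export interface AircallWebhookPayload {
  event: AircallEventName;
  data: { call_id: string; number_id: string; [key: string]: unknown };
}

// Hypothetical type guard for narrowing untrusted strings to NLUAction.
export function isNLUAction(value: unknown): value is NLUAction {
  return typeof value === "string" && (NLU_ACTIONS as readonly string[]).includes(value);
}
```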
Step 4: Create the configuration loader
Create src/config.ts. It reads environment variables, validates the required ones, and returns a typed config object.
ts
import "dotenv/config";
import pino from "pino";
import type { VoiceAgentConfig } from "./types.js";

export class ConfigurationError extends Error {
  name = "ConfigurationError";
}

export function loadConfig(): VoiceAgentConfig {
  const openaiApiKey = process.env.OPENAI_API_KEY;
  const deepgramApiKey = process.env.DEEPGRAM_API_KEY;
  const aircallApiKey = process.env.AIRACALL_API_KEY;
  const aircallApiSecret = process.env.AIRACALL_API_SECRET;

  // Fail fast on the first missing required credential.
  if (!openaiApiKey) {
    throw new ConfigurationError("OPENAI_API_KEY is required");
  }
  if (!deepgramApiKey) {
    throw new ConfigurationError("DEEPGRAM_API_KEY is required");
  }
  if (!aircallApiKey) {
    throw new ConfigurationError("AIRACALL_API_KEY is required");
  }
  if (!aircallApiSecret) {
    throw new ConfigurationError("AIRACALL_API_SECRET is required");
  }

  const aircallBaseUrl = process.env.AIRACALL_BASE_URL || "https://api.aircall.io/v1";

  // Optional values default to empty strings, which downstream code treats as "disabled".
  return {
    openaiApiKey,
    deepgramApiKey,
    aircallApiKey,
    aircallApiSecret,
    aircallBaseUrl,
    langfusePublicKey: process.env.LANGKFUSE_PUBLIC_KEY || "",
    langfuseSecretKey: process.env.LANGKFUSE_SECRET_KEY || "",
    aircallWebhookSecret: process.env.AIRACALL_WEBHOOK_SECRET || "",
  };
}

export function createLogger(name: string) {
  return pino({ level: process.env.LOG_LEVEL ?? "info", name });
}
Expected output: When you call loadConfig() with all env vars set, you get a VoiceAgentConfig object. Missing a required key throws ConfigurationError.
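Because loadConfig() throws on the first missing variable, fixing a fresh .env can take several runs. If you would rather see every missing key in one pass, a small pre-check along these lines could help. `findMissingEnv` is a hypothetical helper, not part of the recipe; the variable names are copied from the .env file in Step 2.

```typescript
// Hypothetical helper: collect every missing required variable instead of stopping at the first.
const REQUIRED_ENV = [
  "OPENAI_API_KEY",
  "DEEPGRAM_API_KEY",
  "AIRACALL_API_KEY",
  "AIRACALL_API_SECRET",
] as const;

export function findMissingEnv(env: Record<string, string | undefined>): string[] {
  // Unset variables and empty strings both count as missing, matching the `!value` checks in loadConfig().
  return REQUIRED_ENV.filter((key) => !env[key]);
}

// Placeholder values, for illustration only.
const missing = findMissingEnv({ OPENAI_API_KEY: "placeholder", DEEPGRAM_API_KEY: "" });
console.log(missing); // [ 'DEEPGRAM_API_KEY', 'AIRACALL_API_KEY', 'AIRACALL_API_SECRET' ]
```

In practice you would call `findMissingEnv(process.env)` at startup and log the result before calling loadConfig().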
Step 5: Create the OpenAI NLU client
Create src/openai.ts. This client sends transcripts to OpenAI and decides what action the voice agent should take.
ts
import OpenAI from "openai";
import pino from "pino";
import type { NLUResponse } from "./types.js";
import { repairLlmResponse } from "./repair.js";

const SMB_SUPPORT_PROMPT = `You are an AI receptionist for a small business. Your job is to:
1. Greet callers warmly and identify their needs
2. Answer frequently asked questions about hours, pricing, and services
3. Escalate to a human agent when you are unsure or the caller asks for something beyond your scope

You must ALWAYS respond with valid JSON matching this schema:
{ "action": "greet" | "answer" | "escalate" | "goodbye", "message": string, "escalationReason"?: string }

- Use "greet" for initial greetings
- Use "answer" when you can answer the question
- Use "escalate" when you need to transfer to a human
- Use "goodbye" to end the conversation`;

export function createOpenAIClient(apiKey: string): OpenAI {
  return new OpenAI({ apiKey });
}

export class VoiceNLUClient {
  private client: OpenAI;
  private logger: pino.Logger;

  constructor(client: OpenAI, logger: pino.Logger) {
    this.client = client;
    this.logger = logger;
  }

  // Ask the model which action to take for the latest caller utterance.
  async decideAction(
    transcript: string,
    context: { callerName?: string; history: string[] }
  ): Promise<NLUResponse> {
    const messages: Array<{ role: "developer" | "user" | "assistant"; content: string }> = [
      { role: "developer", content: SMB_SUPPORT_PROMPT },
    ];
    for (const entry of context.history) {
      messages.push({ role: "user", content: entry });
    }
    messages.push({ role: "user", content: transcript });

    try {
      const completion = await this.client.chat.completions.create({
        model: "gpt-5.2-mini",
        messages,
        max_tokens: 256,
        temperature: 0.3,
      });
      const rawContent = completion.choices[0]?.message?.content || "";
      this.logger.info({ rawContent }, "decideAction raw response");
      return await repairLlmResponse(rawContent);
    } catch (error) {
      // Any API failure falls back to a human handoff rather than silence.
      this.logger.error({ error, transcript }, "decideAction failed");
      return {
        action: "escalate",
        message: "Let me connect you with someone who can help.",
        escalationReason: "NLU service error",
      };
    }
  }

  // Produce a short free-text reply suitable for speech synthesis.
  async generateReply(
    transcript: string,
    context: { callerName?: string; businessName: string }
  ): Promise<string> {
    const messages: Array<{ role: "developer" | "user"; content: string }> = [
      {
        role: "developer",
        content: `You are a receptionist at ${context.businessName}. Provide a friendly, concise spoken reply.`,
      },
      { role: "user", content: transcript },
    ];

    try {
      const completion = await this.client.chat.completions.create({
        model: "gpt-5.2-mini",
        messages,
        max_tokens: 150,
        temperature: 0.5,
      });
      return completion.choices[0]?.message?.content || "";
    } catch (error) {
      this.logger.error({ error, transcript }, "generateReply failed");
      return "I'm sorry, I'm having trouble responding right now.";
    }
  }
}
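Once decideAction() returns, the caller has to turn the four possible actions into pipeline behavior. The following sketch shows one way that mapping might look. `NextStep` and `planNextStep` are hypothetical names introduced for illustration, and the local type declarations stand in for the ones in src/types.ts.

```typescript
// Local stand-ins for the types in src/types.ts.
type NLUAction = "greet" | "answer" | "escalate" | "goodbye";
interface NLUResponse {
  action: NLUAction;
  message: string;
  escalationReason?: string;
}

// Hypothetical description of what the voice pipeline should do next.
type NextStep =
  | { kind: "speak"; text: string; endCall: boolean }
  | { kind: "transfer"; text: string; reason: string };

export function planNextStep(response: NLUResponse): NextStep {
  switch (response.action) {
    case "greet":
    case "answer":
      // Speak the message and keep listening.
      return { kind: "speak", text: response.message, endCall: false };
    case "goodbye":
      // Speak the message, then hang up.
      return { kind: "speak", text: response.message, endCall: true };
    case "escalate":
      // Tell the caller what is happening, then hand off to a human.
      return {
        kind: "transfer",
        text: response.message,
        reason: response.escalationReason ?? "unspecified",
      };
  }
}

console.log(planNextStep({ action: "goodbye", message: "Thanks for calling!" }));
```

Keeping this mapping in a pure function makes it easy to unit test without mocking the OpenAI client.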
Step 6: Create the structured output repair utility
Create src/repair.ts. LLMs sometimes wrap JSON in markdown fences or produce slightly malformed output — this utility fixes that using @reaatech/structured-repair-core.
ts
import { repair, isValid, UnrepairableError } from "@reaatech/structured-repair-core";
import { z } from "zod";
import type { NLUResponse } from "./types.js";

export const nluResponseSchema = z.object({
  action: z.enum(["greet", "answer", "escalate", "goodbye"]),
  message: z.string(),
  escalationReason: z.string().optional(),
});

export async function repairLlmResponse(raw: string): Promise<NLUResponse> {
  try {
    const data = await repair(nluResponseSchema, raw);
    return data;
  } catch (error) {
    // Output that cannot be repaired is treated as a reason to hand off to a human.
    if (error instanceof UnrepairableError) {
      return {
        action: "escalate",
        message: "Let me connect you with someone who can help.",
        escalationReason: "NLU parse failure",
      };
    }
    throw error;
  }
}

export function validateLlmResponse(raw: string): boolean {
  return isValid(nluResponseSchema, raw);
}
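To make the idea concrete, here is a deliberately simplified, dependency-free illustration of the same pattern: strip a markdown fence, parse, validate, and fall back to escalation. It is not how @reaatech/structured-repair-core works internally, and it handles far fewer cases. `stripCodeFences` and `parseNluResponse` are hypothetical names.

```typescript
// Simplified illustration of the repair idea; NOT the @reaatech/structured-repair-core implementation.
type NLUAction = "greet" | "answer" | "escalate" | "goodbye";
interface NLUResponse {
  action: NLUAction;
  message: string;
  escalationReason?: string;
}

const ACTIONS: readonly string[] = ["greet", "answer", "escalate", "goodbye"];
const FENCE = "`".repeat(3); // a markdown code fence marker

export function stripCodeFences(raw: string): string {
  let text = raw.trim();
  if (text.length >= FENCE.length * 2 && text.startsWith(FENCE) && text.endsWith(FENCE)) {
    text = text.slice(FENCE.length, -FENCE.length);
    // Drop an optional language tag such as "json" on the opening fence line.
    text = text.replace(/^[A-Za-z]*\s*\n/, "");
  }
  return text.trim();
}

export function parseNluResponse(raw: string): NLUResponse {
  try {
    const parsed: unknown = JSON.parse(stripCodeFences(raw));
    if (typeof parsed === "object" && parsed !== null) {
      const candidate = parsed as Record<string, unknown>;
      const action = candidate["action"];
      const message = candidate["message"];
      const reason = candidate["escalationReason"];
      if (typeof action === "string" && ACTIONS.includes(action) && typeof message === "string") {
        const result: NLUResponse = { action: action as NLUAction, message };
        if (typeof reason === "string") result.escalationReason = reason;
        return result;
      }
    }
  } catch {
    // Invalid JSON falls through to the escalation fallback below.
  }
  return {
    action: "escalate",
    message: "Let me connect you with someone who can help.",
    escalationReason: "NLU parse failure",
  };
}

const fenced = `${FENCE}json\n{"action":"answer","message":"We open at 9am."}\n${FENCE}`;
console.log(parseNluResponse(fenced).action); // answer
console.log(parseNluResponse("not json at all").action); // escalate
```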
Step 7: Create the MCP adapter
Create src/voice/openai-mcp-adapter.ts. This bridges the voice pipeline with the OpenAI NLU client by implementing the MCPClient interface from @reaatech/voice-agent-core.
Create src/voice/agent.ts. This is the core wiring — it connects STT, TTS, session management, latency enforcement, and the OpenAI adapter into a single pipeline.
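Latency enforcement in a voice pipeline generally means tracking how much of a per-turn time budget each stage (STT, NLU, TTS) has consumed. The sketch below illustrates that bookkeeping in isolation. `LatencyBudget` is a hypothetical class, not the @reaatech/voice-agent-core API, and the 800 ms figure is an arbitrary example rather than a value from the recipe.

```typescript
// Hypothetical per-turn latency budget; not the @reaatech/voice-agent-core API.
type Stage = "stt" | "nlu" | "tts";

export class LatencyBudget {
  private readonly budgetMs: number;
  private readonly spentMs: Record<Stage, number> = { stt: 0, nlu: 0, tts: 0 };

  constructor(budgetMs: number) {
    this.budgetMs = budgetMs;
  }

  /** Add a measured duration to the given stage. */
  record(stage: Stage, durationMs: number): void {
    this.spentMs[stage] += durationMs;
  }

  /** Total time spent across all stages in this turn. */
  totalMs(): number {
    return this.spentMs.stt + this.spentMs.nlu + this.spentMs.tts;
  }

  /** Time left before the budget is exhausted, never negative. */
  remainingMs(): number {
    return Math.max(0, this.budgetMs - this.totalMs());
  }

  exceeded(): boolean {
    return this.totalMs() > this.budgetMs;
  }
}

const budget = new LatencyBudget(800); // illustrative target only
budget.record("stt", 250);
budget.record("nlu", 400);
console.log(budget.remainingMs()); // 150
budget.record("tts", 300);
console.log(budget.exceeded()); // true
```

A pipeline could consult `remainingMs()` before the TTS stage and switch to a shorter canned reply when little budget is left.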
Create src/aircall/webhook-handler.ts. This class receives Aircall call events (created, answered, ended) and bridges them into the voice pipeline.
ts
import pino from "pino";
import crypto from "crypto";
import { z } from "zod";
import type { AudioChunk } from "@reaatech/voice-agent-core";
import type { AircallWebhookPayload, VoiceAgentConfig } from "../types.js";
import { CallSessionService } from "../voice/session-service.js";

const aircallEventSchema = z.object({
  event: z.enum(["call.created", "call.answered", "call.ended"]),
  data: z.object({
    call_id: z.string(),
    number_id: z.string(),
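The handler imports `crypto`, and Step 2 describes AIRACALL_WEBHOOK_SECRET as an HMAC secret, so signature verification presumably compares an HMAC of the request body against a value sent with the webhook. The sketch below shows that check under stated assumptions: the signature is taken to be a hex-encoded HMAC-SHA256 of the raw body, which is a guess and should be confirmed against Aircall's webhook documentation. `computeSignature` and `verifySignature` are hypothetical names.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumption: the signature is a hex-encoded HMAC-SHA256 of the raw request body.
export function computeSignature(rawBody: string, secret: string): string {
  return createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

export function verifySignature(rawBody: string, signature: string, secret: string): boolean {
  // An empty secret disables verification, mirroring the 'set to "" to skip' note in Step 2.
  if (secret === "") return true;

  const expected = Buffer.from(computeSignature(rawBody, secret), "hex");
  const received = Buffer.from(signature, "hex");

  // timingSafeEqual throws when lengths differ, so check lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}

// Placeholder body and secret, for illustration only.
const body = JSON.stringify({ event: "call.created", data: { call_id: "123", number_id: "456" } });
const signature = computeSignature(body, "example-secret");
console.log(verifySignature(body, signature, "example-secret")); // true
console.log(verifySignature(body, signature, "wrong-secret")); // false
```

Verify against the raw request text rather than a re-serialized object, since re-serialization can change byte order and whitespace.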
Step 11: Build the transfer handler
Create src/aircall/transfer.ts. When the AI decides to escalate, this class calls Aircall’s transfer API to hand the call to a human agent, retrying failed attempts with exponential backoff via @reaatech/agent-handoff.
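Exponential backoff means each retry waits roughly twice as long as the previous one, up to a cap. The sketch below shows the general technique with no dependencies. It is not the @reaatech/agent-handoff API; `backoffDelays` and `retryWithBackoff` are hypothetical helpers, and the timing values are examples.

```typescript
// Hypothetical retry helpers; @reaatech/agent-handoff's own API is not shown here.

/** Delay before each retry: baseMs, 2*baseMs, 4*baseMs, ... capped at maxMs. */
export function backoffDelays(retries: number, baseMs: number, maxMs: number): number[] {
  return Array.from({ length: retries }, (_, i) => Math.min(maxMs, baseMs * 2 ** i));
}

/** Run `operation` once, then retry after each delay until it succeeds or delays run out. */
export async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  delaysMs: number[],
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= delaysMs.length; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Wait before the next attempt; skip the wait after the final failure.
      if (attempt < delaysMs.length) await sleep(delaysMs[attempt] ?? 0);
    }
  }
  throw lastError;
}

console.log(backoffDelays(4, 200, 1000)); // [ 200, 400, 800, 1000 ]
```

A transfer call would be wrapped as `retryWithBackoff(() => transferCall(callId), backoffDelays(3, 200, 2000))`, where `transferCall` stands for whatever function issues the Aircall request. The injectable `sleep` parameter lets tests run without real delays.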
First, create src/instrumentation.ts. This runs when the Next.js server starts, loading config and initializing Langfuse only in the Node.js runtime (not Edge).
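Next.js calls an exported `register()` function from the instrumentation file when the server starts, and sets `NEXT_RUNTIME` to `"nodejs"` or `"edge"` so code can branch on the runtime. The sketch below shows the guard only. `shouldInitTracing` is a hypothetical helper, gating on the Langfuse keys is an assumption, and the variable names are copied from the .env file in Step 2.

```typescript
// Sketch of the runtime guard for src/instrumentation.ts (assumption, not the original file).
type Env = Record<string, string | undefined>;

// Hypothetical helper: tracing starts only in the Node.js runtime and only when both keys are set.
export function shouldInitTracing(env: Env): boolean {
  const isNode = env["NEXT_RUNTIME"] === "nodejs";
  const hasKeys = Boolean(env["LANGKFUSE_PUBLIC_KEY"] && env["LANGKFUSE_SECRET_KEY"]);
  return isNode && hasKeys;
}

// Next.js invokes register() once when a server instance starts.
export async function register(): Promise<void> {
  if (!shouldInitTracing(process.env)) return;
  // Load config and initialize Langfuse here, ideally through a dynamic import
  // so the Edge bundle never pulls in Node-only dependencies.
}

console.log(shouldInitTracing({ NEXT_RUNTIME: "edge", LANGKFUSE_PUBLIC_KEY: "pk", LANGKFUSE_SECRET_KEY: "sk" })); // false
```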
Add a web dashboard — build a real-time call dashboard showing active sessions, transfer requests, and per-call cost charts using the CallCostTracker data.
Extend the NLU prompt — add more business-specific FAQs (return policy, appointment booking, product catalog) to the SMB_SUPPORT_PROMPT constant in src/openai.ts.
Wire up audio streaming — implement real-time audio piping from Aircall through Deepgram STT in the webhook handler instead of just handling lifecycle events.
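For the prompt extension suggested above, one low-risk approach is to keep FAQs as data and append them to the base prompt, rather than editing the prompt text by hand. This is a sketch only: `Faq` and `appendFaqs` are hypothetical names, and the sample questions and answers are placeholders.

```typescript
// Hypothetical helper for extending the NLU prompt with business-specific FAQs.
interface Faq {
  question: string;
  answer: string;
}

export function appendFaqs(basePrompt: string, faqs: Faq[]): string {
  if (faqs.length === 0) return basePrompt;
  const lines = faqs.map((faq, i) => `${i + 1}. Q: ${faq.question}\n   A: ${faq.answer}`);
  return `${basePrompt}\n\nKnown answers for this business:\n${lines.join("\n")}`;
}

// Placeholder FAQ content, for illustration only.
const prompt = appendFaqs("You are an AI receptionist for a small business.", [
  { question: "What is the return policy?", answer: "Returns are accepted within 30 days." },
  { question: "Can I book an appointment by phone?", answer: "Yes, ask for the next available slot." },
]);
console.log(prompt);
```

Storing FAQs as structured data also makes it straightforward to load them from a file or database per business later on.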