Building Voice Agents for Small Business
A practical walkthrough of building, deploying, and monitoring a production AI voice agent — from first call to production, with real cost numbers and failure modes.
Why voice agents?
Most small businesses lose 20-40% of inbound calls. Staff are busy, it’s after hours, or the phone just rings. A virtual receptionist that answers every call, books appointments, and routes urgent issues pays for itself the first time it captures a lead that would have been lost.
This guide walks through building one with reaatech’s voice-agent platform.
Before you start
You’ll need:
- A Twilio account with a phone number
- An LLM API key (OpenAI, Anthropic, or any provider)
- Node.js 20+ and pnpm
No AI expertise required. If you can configure a webhook and edit a TypeScript config file, you can run this.
Step 1: Clone the repo
$ git clone https://github.com/reaatech/voice-agent
$ cd voice-agent
$ pnpm install
$ cp .env.example .env
Step 2: Configure your environment
Edit .env with your Twilio credentials and LLM provider:
TWILIO_ACCOUNT_SID=ACxxxx
TWILIO_AUTH_TOKEN=xxxx
TWILIO_PHONE_NUMBER=+15555550123
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-xxxx
You can use any OpenAI-compatible endpoint. The platform supports Anthropic, Groq, and local models too.
Step 3: Define your agent’s behavior
The agent’s behavior is defined in a simple TypeScript config. Here’s the default receptionist:
export const receptionistConfig = {
name: "receptionist",
greeting: "Thanks for calling. How can I help you today?",
systemPrompt: `You are a professional receptionist for a small business.
- Answer questions about hours, location, and services
- Book appointments using the calendar tool
- Route urgent calls to the right person
- Always be courteous and concise`,
tools: ["calendar", "transfer", "faq"],
maxTurns: 12,
idleTimeoutMs: 30_000,
};
Step 4: Start the agent
$ pnpm dev
Voice agent running on http://localhost:3000
Twilio webhook configured at /api/incoming-call
Step 5: Point Twilio at your agent
In the Twilio console, set your phone number’s voice webhook to:
https://your-domain.com/api/incoming-call
Call your number. The agent picks up on the first ring.
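Under the hood, the webhook answers Twilio with TwiML that opens a media stream to the agent. Here's a minimal sketch of what that response looks like; the websocket path is an assumption for illustration, not necessarily the repo's actual route:

```typescript
// Hypothetical sketch: build the TwiML that points Twilio's Media Stream
// at the agent. The /api/media-stream path is an assumed example route.
function buildIncomingCallTwiml(wsUrl: string): string {
  // <Connect><Stream> tells Twilio to open a bidirectional audio stream
  // to the given websocket instead of playing static audio.
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    `  <Connect><Stream url="${wsUrl}" /></Connect>`,
    "</Response>",
  ].join("\n");
}

const twiml = buildIncomingCallTwiml("wss://your-domain.com/api/media-stream");
```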
What you’ll notice immediately
The agent handles the basics well: hours, location, services. It books appointments with surprisingly natural conversation flow. Callers who don’t realize they’re talking to AI say “thank you” and hang up satisfied.
What it won’t handle perfectly out of the box:
- Niche industry terminology (you’ll want to add to the FAQ tool)
- Multi-step edge cases (“I need to reschedule, then also ask about billing”)
- Callers who ramble for 90 seconds without stating their need
These improve as you add to the FAQ and tune the prompt.
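For the niche-terminology gap, the fix is usually a handful of FAQ entries. As a sketch (the entry shape and matching logic here are assumptions, not the FAQ tool's actual schema):

```typescript
// Hypothetical FAQ entry shape: keyword triggers mapped to a canned answer.
type FaqEntry = { triggers: string[]; answer: string };

const faqEntries: FaqEntry[] = [
  {
    triggers: ["hydro jetting", "jetting"],
    answer:
      "Yes, we offer hydro jetting for main line clogs. Would you like to book an estimate?",
  },
];

// Naive lookup: return the first entry whose trigger appears in the utterance.
function lookupFaq(utterance: string): string | undefined {
  const text = utterance.toLowerCase();
  return faqEntries.find((e) => e.triggers.some((t) => text.includes(t)))
    ?.answer;
}
```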
Production considerations
Cost
At 500 calls/month with average 3-minute conversations, you’re looking at roughly $40-60/month in LLM costs plus Twilio’s per-minute rates. The DIY tier uses your own API keys — no platform tax.
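A back-of-envelope model shows where that number comes from. The token rate and price below are rough assumptions, not measured values; the big driver is that the system prompt and conversation history are re-sent on every turn:

```typescript
// Rough cost model. tokensPerMinute assumes a ~1,500-token system prompt
// re-sent on each of roughly 4 turns per minute, plus completions.
// dollarsPerMillionTokens is an assumed blended input/output price.
function estimateMonthlyLlmCost(
  callsPerMonth: number,
  minutesPerCall: number,
  tokensPerMinute = 6_000,
  dollarsPerMillionTokens = 5,
): number {
  const tokens = callsPerMonth * minutesPerCall * tokensPerMinute;
  return (tokens / 1_000_000) * dollarsPerMillionTokens;
}
```

At 500 calls and 3 minutes each, these assumptions land around $45/month, in the middle of the quoted range; a longer prompt or a pricier model moves it up.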
Prompt injection
The platform includes prompt injection guards at the tool-use layer. If a caller says "ignore your previous instructions," the guardrail flags the attempt and blocks it from influencing any tool call, so the agent can't be talked into booking, transferring, or disclosing anything off-script. This is on by default.
Monitoring
Every call generates structured telemetry: duration, turns, tools used, confidence scores, cost breakdown. These flow to your observability stack via OTel. You’ll see if call quality degrades when you update the prompt or switch models.
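As a sketch of what you might do with that telemetry, here's an assumed record shape (field names are illustrative, not the platform's actual schema) and a simple degradation check to run after a prompt or model change:

```typescript
// Hypothetical per-call telemetry record, based on the metrics listed above.
interface CallTelemetry {
  callId: string;
  durationMs: number;
  turns: number;
  toolsUsed: string[];
  avgConfidence: number; // 0..1, from the STT layer
  costUsd: number;
}

// Flag calls that ran long, hit the turn ceiling, or transcribed poorly --
// a spike in flagged calls after a change is your regression signal.
function isDegraded(t: CallTelemetry): boolean {
  return t.turns >= 12 || t.avgConfidence < 0.7 || t.durationMs > 5 * 60_000;
}
```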
Scaling
The DIY setup handles 5-10 concurrent calls on a $20/month VPS. Going beyond that is about horizontal scaling — the architecture is stateless at the application layer. State lives in the call context object, not in the process.
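The statelessness claim is worth making concrete. A minimal sketch, with an assumed CallContext shape (the repo's actual type will differ): each turn takes the full context in and returns an updated one, so any instance can handle any turn of any call.

```typescript
// Assumed shape of the per-call context that travels with each turn.
interface CallContext {
  callSid: string;
  transcript: { role: "caller" | "agent"; text: string }[];
  turn: number;
}

// Stateless turn handler: no process-local state, so it scales horizontally.
function handleTurn(ctx: CallContext, callerText: string): CallContext {
  const transcript = [
    ...ctx.transcript,
    { role: "caller" as const, text: callerText },
  ];
  return { ...ctx, transcript, turn: ctx.turn + 1 };
}
```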
When to go from DIY to built-for-you
The DIY tier works well when:
- You have someone comfortable editing a TypeScript config file
- Your call volume is under 200/month
- Your use case fits the standard receptionist template
Consider having us build it when:
- You need custom tool integrations (your CRM, your calendar system)
- Your call volume is 500+/month and you need reliability guarantees
- You want someone else to own the initial setup and training
- You need HIPAA review or other compliance work
Ready to get this running?
Book a free first conversation. We'll figure out if there's a fit.
The architecture
The voice agent platform follows a pipeline architecture:
- Twilio Media Streams capture audio from the caller
- Speech-to-text (Deepgram or Whisper) converts audio to text
- LLM orchestration (the agent core) decides what to say or do
- Tool execution (calendar, transfer, FAQ lookup) when needed
- Text-to-speech (ElevenLabs or Deepgram) converts response to audio
- Response sent back through the media stream
Every step is instrumented. You can swap any component: different STT, different LLM, different TTS. The platform is a pipeline, not a monolith.
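The swap-anything claim falls out of the pipeline shape. Sketched as interfaces (names are illustrative, not the repo's actual types), one caller turn is just a chain of three calls, and any stage can be replaced by passing a different implementation:

```typescript
// Illustrative pipeline interfaces: one per swappable stage.
interface SpeechToText {
  transcribe(audio: Uint8Array): Promise<string>;
}
interface AgentCore {
  respond(text: string): Promise<string>;
}
interface TextToSpeech {
  synthesize(text: string): Promise<Uint8Array>;
}

// One caller turn: audio in, audio out. Swap Deepgram for Whisper, or
// ElevenLabs for Deepgram TTS, by passing a different implementation.
async function runTurn(
  audioIn: Uint8Array,
  stt: SpeechToText,
  core: AgentCore,
  tts: TextToSpeech,
): Promise<Uint8Array> {
  const text = await stt.transcribe(audioIn);
  const reply = await core.respond(text);
  return tts.synthesize(reply);
}
```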
Common failure modes and how we handle them
These are real issues we’ve seen in production. The platform handles them out of the box.
Hallucinated appointments. The calendar tool validates dates and times at the API level before confirming with the caller. If the LLM generates a date that doesn’t exist, the tool returns an error and the agent gracefully recovers.
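The date check matters because JavaScript's Date silently rolls invalid dates forward ("2024-02-30" becomes March 1). A minimal sketch of the gate, assuming ISO date strings; the real calendar tool's validation is richer:

```typescript
// Validate that an ISO date string names a real calendar day.
// Round-trip through Date and compare, since Date rolls "2024-02-30"
// over to March 1 instead of failing.
function isRealDate(iso: string): boolean {
  const m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(iso);
  if (!m) return false;
  const d = new Date(Date.UTC(Number(m[1]), Number(m[2]) - 1, Number(m[3])));
  return (
    d.getUTCFullYear() === Number(m[1]) &&
    d.getUTCMonth() === Number(m[2]) - 1 &&
    d.getUTCDate() === Number(m[3])
  );
}
```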
Infinite loops. The maxTurns config (default 12) prevents conversations from running forever. After the limit, the agent politely wraps up and offers to transfer to a human.
Silence handling. After 30 seconds of silence (idleTimeoutMs), the agent checks in: “Are you still there?” After 60 seconds, it offers to take a message and hangs up.
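The silence ladder reduces to a simple threshold function. A sketch, with thresholds mirroring the defaults above (the function name is illustrative):

```typescript
// What the agent should do after a given stretch of caller silence.
type IdleAction = "none" | "check-in" | "take-message-and-hang-up";

function idleAction(silenceMs: number): IdleAction {
  if (silenceMs >= 60_000) return "take-message-and-hang-up"; // 60s: wrap up
  if (silenceMs >= 30_000) return "check-in"; // 30s: "Are you still there?"
  return "none";
}
```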
Accent and dialect. The STT pipeline uses the default Deepgram model which handles most accents well. If you serve a specific dialect community, you can swap to a tuned model with one config change.
Ready to try it?
The repo is at reaatech/voice-agent. It’s MIT-licensed. Clone it, configure it, and take your first call this afternoon.
If you’d rather have us build and deploy it, book a conversation. First one’s free.
