# Azure AI Voice Agent for Xero Small Business Financial Queries
> Call a number and ask about your Xero finances — get spoken P&L, invoices, and cash flow in natural language.
A tutorialized reference solution from [reaatech.com](https://reaatech.com), demonstrating how to build a voice-enabled AI agent that connects Twilio telephony, Deepgram STT, Azure OpenAI, Xero accounting API, and Cartesia TTS into a single pipeline using the `@reaatech/*` package family.
## Architecture
```
Incoming Call → Twilio → WebSocket (Media Streams)
→ @reaatech/voice-agent-telephony (TwilioMediaStreamHandler)
→ @reaatech/voice-agent-core (Pipeline: STT → MCP → TTS)
→ Deepgram STT (speech-to-text)
→ Confidence Router (intent classification: P&L, invoices, cash flow)
→ Xero SDK (real-time financial data fetch)
→ Azure OpenAI (natural-language answer generation)
→ Cartesia TTS (text-to-speech)
→ Spoken response back through Twilio
```
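The first hop in the diagram is Twilio sending JSON frames over the Media Streams WebSocket. The sketch below shows one way those frames could be normalized before they reach the pipeline. It is not the `TwilioMediaStreamHandler` API from `@reaatech/voice-agent-telephony`; `TwilioFrame` and `parseTwilioFrame` are hypothetical names used only for illustration.

```typescript
// Hypothetical sketch: these names are illustrative, not the @reaatech API.
type TwilioFrame =
  | { event: "start"; streamSid: string; callSid: string }
  | { event: "media"; streamSid: string; audio: Buffer }
  | { event: "dtmf"; streamSid: string; digit: string }
  | { event: "stop"; streamSid: string }
  | { event: "ignored"; name: string };

function parseTwilioFrame(raw: string): TwilioFrame {
  const msg = JSON.parse(raw);
  switch (msg.event) {
    case "start":
      // Stream metadata arrives once, nested under the "start" key.
      return { event: "start", streamSid: msg.start.streamSid, callSid: msg.start.callSid };
    case "media":
      // Audio chunks are base64-encoded in media.payload.
      return { event: "media", streamSid: msg.streamSid, audio: Buffer.from(msg.media.payload, "base64") };
    case "dtmf":
      return { event: "dtmf", streamSid: msg.streamSid, digit: msg.dtmf.digit };
    case "stop":
      return { event: "stop", streamSid: msg.streamSid };
    default:
      // "connected", "mark", and anything unknown are passed over here.
      return { event: "ignored", name: String(msg.event) };
  }
}
```

A discriminated union keeps the downstream handler exhaustive: a `switch` on `event` gives the STT adapter only `media` frames and lets session setup and teardown hang off `start` and `stop`.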
## REAA Packages
| Package | Role |
|---------|------|
| `@reaatech/voice-agent-core` | Pipeline orchestration, latency enforcement, observability |
| `@reaatech/voice-agent-telephony` | Twilio Media Stream WebSocket handler (start/media/stop/DTMF) |
| `@reaatech/session-continuity` | Multi-turn conversation context with token budget enforcement |
| `@reaatech/llm-cost-telemetry` | Per-tenant LLM spend tracking and budget caps |
| `@reaatech/confidence-router` | Threshold-based intent classification (ROUTE / CLARIFY / FALLBACK) |
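To make the ROUTE / CLARIFY / FALLBACK behaviour concrete, here is a minimal sketch of threshold-based routing. It does not use the `@reaatech/confidence-router` API (which is not documented in this README); the `decide` function, the intent names, and the 0.8 / 0.5 thresholds are all assumptions chosen for illustration.

```typescript
// Hypothetical sketch of threshold routing; not the @reaatech/confidence-router API.
type Intent = "profit_and_loss" | "invoices" | "cash_flow";

type Decision =
  | { action: "ROUTE"; intent: Intent }
  | { action: "CLARIFY"; candidates: Intent[] }
  | { action: "FALLBACK" };

interface ScoredIntent {
  intent: Intent;
  confidence: number; // expected range: 0..1
}

// Thresholds are assumed values; tune them against real call transcripts.
function decide(scores: ScoredIntent[], routeAt = 0.8, clarifyAt = 0.5): Decision {
  const ranked = [...scores].sort((a, b) => b.confidence - a.confidence);
  const top = ranked[0];
  // Nothing plausible: hand off to a generic fallback prompt.
  if (!top || top.confidence < clarifyAt) return { action: "FALLBACK" };
  // Confident enough to fetch data for a single intent.
  if (top.confidence >= routeAt) return { action: "ROUTE", intent: top.intent };
  // In between: ask the caller to choose among the plausible intents.
  const candidates = ranked.filter((s) => s.confidence >= clarifyAt).map((s) => s.intent);
  return { action: "CLARIFY", candidates };
}
```

With these assumed thresholds, a score of 0.9 for `invoices` routes directly, while two intents scoring 0.6 and 0.55 produce a clarifying question rather than a guess.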
## Third-Party Integrations
- **Xero** — Financial data via `xero-node` SDK (client_credentials OAuth)
- **Twilio** — Phone call webhook + media stream WebSocket
- **Azure OpenAI** — NLU intent classification + spoken answer generation
- **Deepgram** — Real-time speech-to-text (nova-3 model)
- **Cartesia** — Text-to-speech (sonic-3.5 model)
- **Langfuse** — LLM tracing and observability
## Environment Variables
See `.env.example` for all required variables. The key ones are:
- `CARTESIA_API_KEY`, `DEEPGRAM_API_KEY` — speech service credentials
- `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_DEPLOYMENT_NAME` — Azure OpenAI
- `TWILIO_ACCOUNT_SID`, `TWILIO_AUTH_TOKEN` — Twilio telephony
- `XERO_CLIENT_ID`, `XERO_CLIENT_SECRET`, `XERO_TENANT_ID` — Xero API
- `LANGFUSE_PUBLIC_KEY`, `LANGFUSE_SECRET_KEY` — LLM tracing
- `DAILY_BUDGET_USD` — per-tenant daily Azure spend cap
- `WS_PORT` — WebSocket server port for Twilio media streams
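A startup check can catch a missing variable before the first call arrives. The sketch below covers only the variables listed above (`.env.example` may define more), and `missingEnv` is a hypothetical helper rather than part of the project.

```typescript
// Variable names come from the list above; the helper itself is hypothetical.
const REQUIRED_ENV = [
  "CARTESIA_API_KEY",
  "DEEPGRAM_API_KEY",
  "AZURE_OPENAI_API_KEY",
  "AZURE_OPENAI_ENDPOINT",
  "AZURE_OPENAI_DEPLOYMENT_NAME",
  "TWILIO_ACCOUNT_SID",
  "TWILIO_AUTH_TOKEN",
  "XERO_CLIENT_ID",
  "XERO_CLIENT_SECRET",
  "XERO_TENANT_ID",
  "LANGFUSE_PUBLIC_KEY",
  "LANGFUSE_SECRET_KEY",
  "DAILY_BUDGET_USD",
  "WS_PORT",
] as const;

// Returns the names that are unset or blank in the given environment.
function missingEnv(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV.filter((name) => !env[name]?.trim());
}

// Example usage at service start:
//   const missing = missingEnv(process.env);
//   if (missing.length > 0) throw new Error(`Missing env vars: ${missing.join(", ")}`);
```

Returning the full list of missing names, instead of failing on the first one, means a fresh checkout reports everything that needs filling in at once.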
## Setup Guide
### 1. Xero App
1. Create a free [Xero user account](https://www.xero.com/us/signup/api/)
2. Log into the [Xero Developer Dashboard](https://developer.xero.com/app/manage) and create an API application
3. Choose "Custom Connection" for machine-to-machine (M2M) integration and set `grantType: "client_credentials"`
4. Copy the `Client Id` and `Client Secret` to your `.env` file
5. Note your `Xero Tenant Id` (found in organisation settings)
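For orientation, this is roughly what a `client_credentials` token request to Xero's identity endpoint looks like at the HTTP level. The project uses the `xero-node` SDK, which handles this exchange itself, so treat the sketch as background only. `buildXeroTokenRequest` is a hypothetical helper, and the scopes in the usage comment are an assumption; use the ones enabled on your Custom Connection.

```typescript
// Hypothetical helper: builds the request but does not send it.
interface TokenRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildXeroTokenRequest(clientId: string, clientSecret: string, scopes: string[]): TokenRequest {
  // Client credentials travel as HTTP Basic auth: base64("id:secret").
  const basic = Buffer.from(`${clientId}:${clientSecret}`).toString("base64");
  const body = new URLSearchParams({
    grant_type: "client_credentials",
    scope: scopes.join(" "),
  }).toString();
  return {
    url: "https://identity.xero.com/connect/token",
    init: {
      method: "POST",
      headers: {
        Authorization: `Basic ${basic}`,
        "Content-Type": "application/x-www-form-urlencoded",
      },
      body,
    },
  };
}

// Example usage (scopes are assumed, not taken from this project):
//   const { url, init } = buildXeroTokenRequest(id, secret, ["accounting.reports.read"]);
//   const token = await (await fetch(url, init)).json();
```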
### 2. Twilio Phone Number
1. Sign up for [Twilio](https://www.twilio.com/try-twilio)
2. Buy a phone number with voice capabilities
3. Find your `Account SID` and `Auth Token` in the Twilio Console
4. Configure the voice webhook URL to `https://your-domain.com/api/voice/webhook`
### 3. Deepgram API Key
1. Sign up at [Deepgram Console](https://console.deepgram.com)
2. Generate an API key with STT access
3. Set `DEEPGRAM_API_KEY` in your `.env`
### 4. Cartesia API Key
1. Sign up at [Cartesia](https://cartesia.ai)
2. Generate an API key for TTS access
3. Set `CARTESIA_API_KEY` in your `.env`
### 5. Azure OpenAI Deployment
1. Create an [Azure OpenAI resource](https://portal.azure.com) in your Azure subscription
2. Deploy a model (e.g., `gpt-4` or `gpt-4o-mini`)
3. Note the endpoint URL and deployment name
4. Set `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_ENDPOINT`, and `AZURE_OPENAI_DEPLOYMENT_NAME` in your `.env`
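The three Azure variables map onto a chat completions request as sketched below. This illustrates the REST shape only and is not the code in `src/lib/azure-openai.ts`; `buildChatRequest`, the system prompt, the `max_tokens` value, and the default `api-version` are assumptions.

```typescript
// Hypothetical helper: assembles the request without sending it.
interface ChatRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

function buildChatRequest(
  endpoint: string,   // AZURE_OPENAI_ENDPOINT
  deployment: string, // AZURE_OPENAI_DEPLOYMENT_NAME
  apiKey: string,     // AZURE_OPENAI_API_KEY
  question: string,
  apiVersion = "2024-06-01", // assumed; pick the version your resource supports
): ChatRequest {
  const base = endpoint.replace(/\/+$/, ""); // tolerate a trailing slash
  return {
    url: `${base}/openai/deployments/${encodeURIComponent(deployment)}/chat/completions?api-version=${apiVersion}`,
    headers: { "api-key": apiKey, "Content-Type": "application/json" },
    body: JSON.stringify({
      messages: [
        // Illustrative prompt: short answers suit spoken output.
        { role: "system", content: "Answer in one or two short spoken sentences." },
        { role: "user", content: question },
      ],
      max_tokens: 150,
    }),
  };
}
```

Note that Azure routes by deployment name in the URL path, not by a `model` field in the body, which is why the deployment name is a separate variable.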
### 6. Langfuse Project
1. Sign up at [Langfuse](https://langfuse.com)
2. Create a new project
3. Copy the Public Key and Secret Key
4. Set `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY` in your `.env`
## Running Locally
```bash
pnpm install
pnpm typecheck # TypeScript type checking
pnpm lint # ESLint
pnpm test # vitest run with coverage
pnpm dev # next dev
```
## API Routes
| Route | Method | Description |
|-------|--------|-------------|
| `/api/voice/webhook` | POST | Twilio voice webhook; returns TwiML with `<Connect><Stream>` |
| `/api/voice/webhook` | GET | Health check (empty 200) |
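The TwiML returned by the POST route tells Twilio to open a bidirectional media stream to the WebSocket server. A minimal sketch of building that document follows; `buildStreamTwiml` and `escapeXml` are hypothetical helpers, and the WebSocket URL is a placeholder.

```typescript
// Escape characters that are unsafe inside an XML attribute value.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: returns TwiML that connects the call to a media stream.
function buildStreamTwiml(wsUrl: string): string {
  return (
    '<?xml version="1.0" encoding="UTF-8"?>' +
    `<Response><Connect><Stream url="${escapeXml(wsUrl)}" /></Connect></Response>`
  );
}

// Example usage inside a route handler (URL is a placeholder):
//   return new Response(buildStreamTwiml("wss://your-domain.com/media"), {
//     headers: { "Content-Type": "text/xml" },
//   });
```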
## Project Layout
```
app/api/voice/webhook/route.ts   Twilio webhook route handler
src/lib/
  xero.ts                        Xero client wrapper
  azure-openai.ts                Azure OpenAI NLU + answer generation
  twilio-validate.ts             Twilio webhook signature validation
src/services/
  session.ts                     Multi-turn session management
  intent-classifier.ts           Intent classification with ConfidenceRouter
  answer-formatter.ts            Financial data → spoken text
  cartesia-tts.ts                Cartesia TTS adapter
  deepgram-stt.ts                Deepgram streaming STT adapter
  pipeline.ts                    VoicePipelineOrchestrator
  websocket-server.ts            Twilio media stream WebSocket server
src/middleware/
  cost.ts                        Per-tenant LLM cost tracking
src/instrumentation.ts           Next.js instrumentation (service init)
tests/                           86 tests across 17 test files
```
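The layout lists `twilio-validate.ts` for webhook signature validation. Its contents are not shown here, so the following is only a sketch of Twilio's documented scheme for form-encoded webhooks: HMAC-SHA1 over the full URL followed by the POST parameters sorted by name, keyed with the Auth Token, base64-encoded, and compared with the `X-Twilio-Signature` header. The function names are hypothetical; in production the `validateRequest` helper from the official `twilio` package is usually the safer choice.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical sketch; not the contents of src/lib/twilio-validate.ts.
function computeTwilioSignature(
  authToken: string,
  url: string,
  params: Record<string, string>,
): string {
  // Append each parameter as name + value, in name-sorted order, to the URL.
  const data = Object.keys(params)
    .sort()
    .reduce((acc, key) => acc + key + params[key], url);
  return createHmac("sha1", authToken).update(data, "utf8").digest("base64");
}

function isValidTwilioRequest(
  authToken: string,
  url: string,
  params: Record<string, string>,
  signatureHeader: string | null, // value of X-Twilio-Signature
): boolean {
  if (!signatureHeader) return false;
  const expected = Buffer.from(computeTwilioSignature(authToken, url, params));
  const actual = Buffer.from(signatureHeader);
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}
```

The URL passed in must match exactly what Twilio called, including scheme and query string, which is a common source of false rejections behind proxies or tunnels.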
## License
MIT — see [LICENSE](./LICENSE).