# Azure AI Voice Agent for Xero Small Business Financial Queries
 
> Call a number and ask about your Xero finances — get spoken P&L, invoices, and cash flow in natural language.
 
A tutorial-style reference solution from [reaatech.com](https://reaatech.com) that demonstrates how to build a voice-enabled AI agent connecting Twilio telephony, Deepgram STT, Azure OpenAI, the Xero accounting API, and Cartesia TTS into a single pipeline using the `@reaatech/*` package family.
 
## Architecture
 
```
Incoming Call → Twilio → WebSocket (Media Streams)
  → @reaatech/voice-agent-telephony (TwilioMediaStreamHandler)
    → @reaatech/voice-agent-core (Pipeline: STT → MCP → TTS)
      → Deepgram STT (speech-to-text)
        → Confidence Router (intent classification: P&L, invoices, cash flow)
          → Xero SDK (real-time financial data fetch)
            → Azure OpenAI (natural-language answer generation)
              → Cartesia TTS (text-to-speech)
                → Spoken response back through Twilio
```
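
The flow above can be read as a chain of asynchronous stages for a single caller turn. The following is a minimal sketch of that idea only; the `PipelineStages` interface, `handleTurn`, and the intent names are hypothetical and are not the API of `@reaatech/voice-agent-core`. A real voice pipeline would likely stream audio rather than process whole buffers.

```typescript
// Hypothetical intent labels matching the three query types named above.
type Intent = "profit_and_loss" | "invoices" | "cash_flow";

// Each stage is injected so it can be swapped for a stub in tests.
interface PipelineStages {
  transcribe(audio: Uint8Array): Promise<string>;        // e.g. Deepgram STT
  classify(transcript: string): Promise<Intent>;         // e.g. confidence router
  fetchFinancials(intent: Intent): Promise<unknown>;     // e.g. Xero SDK call
  generateAnswer(transcript: string, data: unknown): Promise<string>; // e.g. Azure OpenAI
  synthesize(text: string): Promise<Uint8Array>;         // e.g. Cartesia TTS
}

// Runs one caller turn through the stages in the order shown in the diagram.
async function handleTurn(stages: PipelineStages, audio: Uint8Array): Promise<Uint8Array> {
  const transcript = await stages.transcribe(audio);
  const intent = await stages.classify(transcript);
  const data = await stages.fetchFinancials(intent);
  const answer = await stages.generateAnswer(transcript, data);
  return stages.synthesize(answer);
}
```

Injecting the stages keeps each vendor integration behind a narrow function signature, which makes the ordering easy to test without network access.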
 
## REAA Packages
 
| Package | Role |
|---------|------|
| `@reaatech/voice-agent-core` | Pipeline orchestration, latency enforcement, observability |
| `@reaatech/voice-agent-telephony` | Twilio Media Stream WebSocket handler (start/media/stop/DTMF) |
| `@reaatech/session-continuity` | Multi-turn conversation context with token budget enforcement |
| `@reaatech/llm-cost-telemetry` | Per-tenant LLM spend tracking and budget caps |
| `@reaatech/confidence-router` | Threshold-based intent classification (ROUTE / CLARIFY / FALLBACK) |
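
To illustrate what threshold-based classification with ROUTE / CLARIFY / FALLBACK outcomes can look like, here is a minimal sketch. The function name `decide`, the `IntentScore` shape, and the threshold values 0.8 and 0.5 are assumptions made for illustration; they are not taken from `@reaatech/confidence-router`.

```typescript
interface IntentScore {
  intent: string;
  confidence: number; // assumed to be in the range 0..1
}

type RouteDecision =
  | { action: "ROUTE"; intent: string }
  | { action: "CLARIFY"; candidates: string[] }
  | { action: "FALLBACK" };

// Picks an action from the highest-scoring intent.
// Thresholds are illustrative defaults, not values from the package.
function decide(
  scores: IntentScore[],
  routeThreshold = 0.8,
  clarifyThreshold = 0.5,
): RouteDecision {
  const ranked = [...scores].sort((a, b) => b.confidence - a.confidence);
  const top = ranked[0];
  if (!top || top.confidence < clarifyThreshold) {
    return { action: "FALLBACK" };
  }
  if (top.confidence >= routeThreshold) {
    return { action: "ROUTE", intent: top.intent };
  }
  // Mid-range confidence: ask the caller to choose among plausible intents.
  const candidates = ranked
    .filter((s) => s.confidence >= clarifyThreshold)
    .map((s) => s.intent);
  return { action: "CLARIFY", candidates };
}
```

In a voice setting, CLARIFY would typically become a spoken follow-up question, while FALLBACK would produce a generic "I can help with P&L, invoices, or cash flow" prompt.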
 
## Third-Party Integrations
 
- **Xero** — Financial data via `xero-node` SDK (client_credentials OAuth)
- **Twilio** — Phone call webhook + media stream WebSocket
- **Azure OpenAI** — NLU intent classification + spoken answer generation
- **Deepgram** — Real-time speech-to-text (nova-3 model)
- **Cartesia** — Text-to-speech (sonic-3.5 model)
- **Langfuse** — LLM tracing and observability
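
For the Twilio media stream WebSocket, each message arrives as a JSON text frame with an `event` field, and `media` events carry base64-encoded audio. The sketch below parses those frames into a small tagged union. The `StreamEvent` type and `parseStreamMessage` function are hypothetical names for illustration and do not reflect how `TwilioMediaStreamHandler` is implemented.

```typescript
type StreamEvent =
  | { kind: "start"; streamSid: string; callSid: string }
  | { kind: "media"; audio: Uint8Array }
  | { kind: "dtmf"; digit: string }
  | { kind: "stop" }
  | { kind: "ignored"; event: string };

// Converts one raw WebSocket text frame from Twilio into a typed event.
function parseStreamMessage(raw: string): StreamEvent {
  const msg = JSON.parse(raw);
  switch (msg.event) {
    case "start":
      return { kind: "start", streamSid: msg.start.streamSid, callSid: msg.start.callSid };
    case "media": {
      // The payload is base64 audio; decode it to raw bytes.
      const binary = atob(msg.media.payload);
      return { kind: "media", audio: Uint8Array.from(binary, (ch) => ch.charCodeAt(0)) };
    }
    case "dtmf":
      return { kind: "dtmf", digit: msg.dtmf.digit };
    case "stop":
      return { kind: "stop" };
    default:
      // "connected", "mark", and any unknown events are passed over here.
      return { kind: "ignored", event: String(msg.event) };
  }
}
```

The decoded bytes from `media` events are what would be forwarded to the speech-to-text stage.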
 
## Environment Variables
 
See `.env.example` for all required variables. Key ones:
 
- `CARTESIA_API_KEY`, `DEEPGRAM_API_KEY` — speech service credentials
- `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_DEPLOYMENT_NAME` — Azure OpenAI
- `TWILIO_ACCOUNT_SID`, `TWILIO_AUTH_TOKEN` — Twilio telephony
- `XERO_CLIENT_ID`, `XERO_CLIENT_SECRET`, `XERO_TENANT_ID` — Xero API
- `LANGFUSE_PUBLIC_KEY`, `LANGFUSE_SECRET_KEY` — LLM tracing
- `DAILY_BUDGET_USD` — per-tenant daily Azure spend cap
- `WS_PORT` — WebSocket server port for Twilio media streams
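
Because the agent depends on many credentials, it can help to fail fast at startup when any are missing. The following sketch checks the variables listed above; `loadEnv` and `toPositiveNumber` are hypothetical helpers, not part of this repository, and treating every listed variable as mandatory is an assumption.

```typescript
// Variable names copied from the list above.
const REQUIRED_ENV = [
  "CARTESIA_API_KEY", "DEEPGRAM_API_KEY",
  "AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_DEPLOYMENT_NAME",
  "TWILIO_ACCOUNT_SID", "TWILIO_AUTH_TOKEN",
  "XERO_CLIENT_ID", "XERO_CLIENT_SECRET", "XERO_TENANT_ID",
  "LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY",
  "DAILY_BUDGET_USD", "WS_PORT",
] as const;

type EnvName = (typeof REQUIRED_ENV)[number];

// Returns all required values, or throws one error naming every missing variable.
function loadEnv(env: Record<string, string | undefined>): Record<EnvName, string> {
  const missing = REQUIRED_ENV.filter((name) => !env[name]?.trim());
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const values = {} as Record<EnvName, string>;
  for (const name of REQUIRED_ENV) {
    values[name] = env[name] as string;
  }
  return values;
}

// Parses numeric settings such as DAILY_BUDGET_USD and WS_PORT.
function toPositiveNumber(name: string, raw: string): number {
  const value = Number(raw);
  if (!Number.isFinite(value) || value <= 0) {
    throw new Error(`${name} must be a positive number, got "${raw}"`);
  }
  return value;
}
```

A typical call would be `loadEnv(process.env)` during service initialization, followed by `toPositiveNumber("WS_PORT", config.WS_PORT)` for the numeric settings.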
 
## Setup Guide
 
### 1. Xero App
1. Create a free [Xero user account](https://www.xero.com/us/signup/api/)
2. Log into the [Xero Developer Dashboard](https://developer.xero.com/app/manage) and create an API application
3. Choose "Custom Connection" for M2M integration — set `grantType: "client_credentials"`
4. Copy the `Client Id` and `Client Secret` to your `.env` file
5. Note your `Xero Tenant Id` (found in organisation settings)
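
For reference, a `client_credentials` grant amounts to one form-encoded POST to Xero's identity token endpoint with HTTP Basic authentication. The sketch below only builds that request so the shape is visible; in this project the `xero-node` SDK is expected to handle the exchange. The helper name `buildXeroTokenRequest` is hypothetical, and the scopes in the usage note are assumptions that must match what your Custom Connection was granted.

```typescript
interface TokenRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Builds (but does not send) the OAuth 2.0 client_credentials token request.
function buildXeroTokenRequest(
  clientId: string,
  clientSecret: string,
  scopes: string[],
): TokenRequest {
  // Basic auth: base64 of "client_id:client_secret".
  const basic = btoa(`${clientId}:${clientSecret}`);
  const form = new URLSearchParams({
    grant_type: "client_credentials",
    scope: scopes.join(" "),
  });
  return {
    url: "https://identity.xero.com/connect/token",
    headers: {
      Authorization: `Basic ${basic}`,
      "Content-Type": "application/x-www-form-urlencoded",
    },
    body: form.toString(),
  };
}
```

Usage might look like `const req = buildXeroTokenRequest(id, secret, ["accounting.reports.read"])` followed by `fetch(req.url, { method: "POST", headers: req.headers, body: req.body })`. The JSON response should include an `access_token` to send as a Bearer token on API calls.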
 
### 2. Twilio Phone Number
1. Sign up for [Twilio](https://www.twilio.com/try-twilio)
2. Buy a phone number with voice capabilities
3. Find your `Account SID` and `Auth Token` in the Twilio Console
4. Configure the voice webhook URL to `https://your-domain.com/api/voice/webhook`
 
### 3. Deepgram API Key
1. Sign up at [Deepgram Console](https://console.deepgram.com)
2. Generate an API key with STT access
3. Set `DEEPGRAM_API_KEY` in your `.env`
 
### 4. Cartesia API Key
1. Sign up at [Cartesia](https://cartesia.ai)
2. Generate an API key for TTS access
3. Set `CARTESIA_API_KEY` in your `.env`
 
### 5. Azure OpenAI Deployment
1. Create an [Azure OpenAI resource](https://portal.azure.com) in your Azure subscription
2. Deploy a model (e.g., `gpt-4` or `gpt-4o-mini`)
3. Note the endpoint URL and deployment name
4. Set `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_ENDPOINT`, and `AZURE_OPENAI_DEPLOYMENT_NAME` in your `.env`
 
### 6. Langfuse Project
1. Sign up at [Langfuse](https://langfuse.com)
2. Create a new project
3. Copy the Public Key and Secret Key
4. Set `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY` in your `.env`
 
## Running Locally
 
```bash
pnpm install
pnpm typecheck      # TypeScript type checking
pnpm lint           # ESLint
pnpm test           # vitest run with coverage
pnpm dev            # next dev
```
 
## API Routes
 
| Route | Method | Description |
|-------|--------|-------------|
| `/api/voice/webhook` | POST | Twilio voice webhook that returns TwiML with `<Connect><Stream>` |
| `/api/voice/webhook` | GET | Health check (empty 200) |
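
The TwiML returned by the POST route tells Twilio to open a bidirectional media stream to a WebSocket URL. A minimal sketch of producing that document follows; `buildConnectStreamTwiml` and `escapeXml` are hypothetical helpers, and the stream URL in the usage note is a placeholder rather than this project's actual endpoint.

```typescript
// Escapes characters that are not allowed verbatim inside an XML attribute value.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Returns TwiML that connects the call audio to the given wss:// URL.
function buildConnectStreamTwiml(streamUrl: string): string {
  return (
    '<?xml version="1.0" encoding="UTF-8"?>' +
    "<Response><Connect>" +
    `<Stream url="${escapeXml(streamUrl)}" />` +
    "</Connect></Response>"
  );
}
```

In a Next.js route handler this string could be returned as `new Response(buildConnectStreamTwiml("wss://your-domain.com/media"), { headers: { "Content-Type": "text/xml" } })`.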
 
## Project Layout
 
```
app/api/voice/webhook/route.ts    Twilio webhook route handler
src/lib/
  xero.ts                          Xero client wrapper
  azure-openai.ts                  Azure OpenAI NLU + answer generation
  twilio-validate.ts               Twilio webhook signature validation
src/services/
  session.ts                       Multi-turn session management
  intent-classifier.ts             Intent classification with ConfidenceRouter
  answer-formatter.ts              Financial data → spoken text
  cartesia-tts.ts                  Cartesia TTS adapter
  deepgram-stt.ts                  Deepgram streaming STT adapter
  pipeline.ts                      VoicePipelineOrchestrator
  websocket-server.ts              Twilio media stream WebSocket server
src/middleware/
  cost.ts                          Per-tenant LLM cost tracking
src/instrumentation.ts             Next.js instrumentation (service init)
tests/                             86 tests across 17 test files
```
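
As background for `twilio-validate.ts`: Twilio signs each webhook request by appending the POST parameters, sorted by name, to the full request URL, then computing an HMAC-SHA1 with the account's auth token and base64-encoding the result into the `X-Twilio-Signature` header. The sketch below shows that scheme under those assumptions. It is not the code in this repository, and the function names are hypothetical; the official `twilio` package also ships a `validateRequest` helper for the same purpose.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Computes the signature Twilio would send for this URL and form body.
function computeTwilioSignature(
  authToken: string,
  url: string,
  params: Record<string, string>,
): string {
  // Append each key immediately followed by its value, in sorted key order.
  const data = Object.keys(params)
    .sort()
    .reduce((acc, key) => acc + key + params[key], url);
  return createHmac("sha1", authToken).update(data, "utf8").digest("base64");
}

// Compares the header value with the expected signature in constant time.
function isValidTwilioSignature(
  authToken: string,
  url: string,
  params: Record<string, string>,
  headerSignature: string,
): boolean {
  const expected = Buffer.from(computeTwilioSignature(authToken, url, params));
  const received = Buffer.from(headerSignature);
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

The `url` argument must be the exact public URL Twilio called, including scheme and any query string. Behind a proxy or tunnel this can differ from the URL the server sees, which is a common cause of failed validation.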
 
## License
 
MIT — see [LICENSE](./LICENSE).