Small business owners often need financial answers on the go, when logging into Xero isn't practical. Calling an AI agent that reads their data aloud saves time and reduces errors.
A complete, working implementation of this recipe — downloadable as a zip or browsable file by file. Generated by our build pipeline; tested with full coverage before publishing.
This recipe builds a voice-enabled AI agent that small business owners can call to ask questions about their Xero financial data. After a Twilio phone number connects the call, audio streams through a WebSocket into a pipeline: Deepgram transcribes speech to text, a confidence router classifies the intent (profit-and-loss, invoices, cash flow, or balance sheet), the Xero SDK fetches the real data, Azure OpenAI formats it into natural spoken prose, and Cartesia reads it back through TTS. Every call is tracked with session continuity for follow-up questions, cost telemetry enforces daily budgets per tenant, and Langfuse provides observability.
Prerequisites
Node.js 22+ and pnpm 10+ installed
A Twilio phone number with Voice and Media Streams enabled
A Xero app (M2M client credentials grant type) in a Xero organisation
Deepgram API key (Nova-3 model)
Cartesia API key (Sonic 3.5 voice)
An Azure OpenAI resource with a chat deployment
A Langfuse project for tracing
Familiarity with TypeScript, Next.js App Router, and basic telephony concepts
Step 1: Scaffold the Next.js project
Scaffold a Next.js 16+ project with the App Router, using pnpm as the package manager. Add the dependencies you need and pin each one to an exact version.
Create next.config.ts with an empty config — Next.js 16 picks up sensible defaults automatically:
ts
import type { NextConfig } from "next";

const nextConfig: NextConfig = {};

export default nextConfig;
Expected output: pnpm install completes without errors and pnpm typecheck reports zero TypeScript errors.
Step 2: Create the environment file and shared types
Create .env.example with placeholders for every variable the agent will read:
env
# Env vars used by azure-ai-voice-agent-for-xero-small-business-financial-queries.
# Keep placeholders only; never commit real values.
NODE_ENV=development

# Cartesia TTS
CARTESIA_API_KEY=<your-cartesia-api-key>

# Deepgram STT
DEEPGRAM_API_KEY=<your-deepgram-api-key>

# Azure OpenAI
AZURE_OPENAI_API_KEY=<your-azure-openai-key>
AZURE_OPENAI_ENDPOINT=https://<your-resource>.openai.azure.com/
AZURE_OPENAI_DEPLOYMENT_NAME=<your-deployment-name>

# Twilio
TWILIO_ACCOUNT_SID=<your-twilio-account-sid>
TWILIO_AUTH_TOKEN=<your-twilio-auth-token>

# Xero
XERO_CLIENT_ID=<your-xero-client-id>
XERO_CLIENT_SECRET=<your-xero-client-secret>
XERO_TENANT_ID=<your-xero-tenant-id>

# Langfuse tracing
LANGFUSE_PUBLIC_KEY=<your-langfuse-public-key>
LANGFUSE_SECRET_KEY=<your-langfuse-secret-key>
LANGFUSE_HOST=https://cloud.langfuse.com

# Session defaults
SESSION_TTL_SECONDS=3600
MAX_TURNS_PER_SESSION=20

# Cost budget (0 = unlimited)
DAILY_BUDGET_USD=0

# WebSocket server port
WS_PORT=8080
Now create src/types.ts — this defines the shared domain types:
Expected output: pnpm typecheck exits cleanly. The Zod schema catches missing required env vars at runtime with a descriptive error message.
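To illustrate the fail-fast behaviour described above without depending on Zod, the sketch below validates a plain environment object by hand. The names `loadEnv` and `REQUIRED_VARS`, and the choice of which variables count as required, are assumptions made for illustration; the recipe's own Zod schema is not shown in this excerpt and may differ.

```typescript
// Which variables are required is an assumption for illustration only.
const REQUIRED_VARS = [
  "DEEPGRAM_API_KEY",
  "CARTESIA_API_KEY",
  "AZURE_OPENAI_API_KEY",
  "AZURE_OPENAI_ENDPOINT",
  "AZURE_OPENAI_DEPLOYMENT_NAME",
] as const;

type RequiredVar = (typeof REQUIRED_VARS)[number];

interface EnvConfig {
  secrets: Record<RequiredVar, string>;
  wsPort: number;
  dailyBudgetUsd: number;
}

// Hypothetical loader: throws one descriptive error listing every missing variable.
function loadEnv(env: Record<string, string | undefined>): EnvConfig {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required env vars: ${missing.join(", ")}`);
  }

  const secrets = {} as Record<RequiredVar, string>;
  for (const name of REQUIRED_VARS) {
    secrets[name] = env[name] as string;
  }

  // Defaults mirror the placeholders in .env.example.
  const wsPort = Number(env.WS_PORT ?? "8080");
  const dailyBudgetUsd = Number(env.DAILY_BUDGET_USD ?? "0");
  if (!Number.isInteger(wsPort) || wsPort <= 0) {
    throw new Error(`WS_PORT must be a positive integer, got "${env.WS_PORT}"`);
  }
  if (!Number.isFinite(dailyBudgetUsd) || dailyBudgetUsd < 0) {
    throw new Error(`DAILY_BUDGET_USD must be zero or positive, got "${env.DAILY_BUDGET_USD}"`);
  }
  return { secrets, wsPort, dailyBudgetUsd };
}
```

Collecting every missing name before throwing means a fresh checkout reports all gaps in one run rather than one variable at a time.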
Step 4: Build the Xero financial data service
Create src/lib/xero.ts — a service class that authenticates with OAuth 2.0 client credentials, fetches invoices, profit-and-loss, bank summary, and balance sheet from the Xero Accounting API, and wraps errors with a custom error class:
ts
import { XeroClient } from "xero-node";
import { retryWithBackoff } from "@reaatech/llm-cost-telemetry";
import type { XeroConfig, FinancialData, XeroFinancialQuery } from "../types.js";

export class XeroServiceError extends Error {
  constructor(message: string, public readonly cause?: unknown) {
    super(message);
    this.name = "XeroServiceError";
  }
}

export class XeroService {
  private client: XeroClient;
  private config: XeroConfig;
  private
Expected output: pnpm typecheck passes. The class handles 401 re-authentication, 429 retry via retryWithBackoff, and empty result sets.
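The retry behaviour mentioned above can be pictured with a small exponential-backoff helper. This is a hedged sketch only: `retryOnRateLimit` and `isRateLimited` are hypothetical names, and the code is not the `retryWithBackoff` exported by @reaatech/llm-cost-telemetry, whose signature is not shown in this excerpt. It assumes a rate-limited call rejects with an object carrying `status: 429`.

```typescript
// Assumption: a rate-limited call rejects with an object that has status === 429.
function isRateLimited(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { status?: unknown }).status === 429
  );
}

// Hypothetical helper: retries only on 429, doubling the delay after each attempt.
async function retryOnRateLimit<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 200,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Non-retryable errors and the final attempt propagate to the caller.
      if (!isRateLimited(err) || attempt >= maxAttempts) throw err;
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

A caller would wrap a single Xero request, for example `retryOnRateLimit(() => fetchInvoices())`, where `fetchInvoices` stands in for whatever method issues the request.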
Step 5: Build the Azure OpenAI intent classifier and answer generator
Create src/lib/azure-openai.ts — this uses the OpenAI SDK pointed at your Azure OpenAI endpoint to classify financial queries and generate natural spoken answers:
ts
import OpenAI from "openai";
import type { XeroFinancialQuery } from "../types.js";
import { loadAppConfig } from "../config.js";

export class AzureOpenAIError extends Error {
  constructor(message: string, public readonly cause?: unknown) {
    super(message);
    this.name = "AzureOpenAIError";
  }
}

export class AzureOpenAIService {
  private client: OpenAI;
  private deploymentName: string;

  constructor
Expected output: pnpm typecheck passes. The service never throws — it returns fallback responses on any error.
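One way to get the "never throws" property is to parse the model's JSON reply defensively and fall back to a neutral result on any problem. The sketch below assumes the reply carries the keys `queryType` and `confidence` and uses the intent names listed in the introduction; `parseClassification` and `FALLBACK_CLASSIFICATION` are hypothetical names, not taken from the recipe's source.

```typescript
type QueryType = "profit_and_loss" | "invoices" | "cash_flow" | "balance_sheet" | "unknown";

interface Classification {
  queryType: QueryType;
  confidence: number;
}

const QUERY_TYPES: readonly string[] = [
  "profit_and_loss",
  "invoices",
  "cash_flow",
  "balance_sheet",
  "unknown",
];

const FALLBACK_CLASSIFICATION: Classification = { queryType: "unknown", confidence: 0 };

// Hypothetical parser: any malformed or unexpected reply yields the fallback instead of an exception.
function parseClassification(raw: string | null | undefined): Classification {
  if (!raw) return FALLBACK_CLASSIFICATION;
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null) return FALLBACK_CLASSIFICATION;
    const { queryType, confidence } = parsed as Record<string, unknown>;
    if (typeof queryType !== "string" || !QUERY_TYPES.includes(queryType)) {
      return FALLBACK_CLASSIFICATION;
    }
    // Clamp confidence into [0, 1]; treat anything non-numeric as 0.
    const score =
      typeof confidence === "number" && Number.isFinite(confidence)
        ? Math.min(1, Math.max(0, confidence))
        : 0;
    return { queryType: queryType as QueryType, confidence: score };
  } catch {
    return FALLBACK_CLASSIFICATION;
  }
}
```

Returning `unknown` with zero confidence lets the caller answer with a polite re-prompt rather than failing mid-call.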
Step 6: Build the voice session manager
Create src/services/session.ts — this wraps @reaatech/session-continuity’s SessionManager with an in-memory storage adapter and a simple character-based token counter:
ts
import {
  SessionManager,
  type HealthStatus,
  type IStorageAdapter,
  type Message,
  type Session,
  type TokenBudgetConfig,
  type TokenCounter,
  type CompressionConfig,
  SessionNotFoundError,
  TokenBudgetExceededError,
} from "@reaatech/session-continuity";

class InMemoryStorageAdapter implements IStorageAdapter {
  private sessions = new Map<string, Session>();
  private messages = new Map<string, Message[]>();
  private sessionIdCounter = 0;
Expected output: pnpm typecheck passes. The session manager compresses context automatically when token budget is exceeded.
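A character-based token counter is usually a rough heuristic such as one token per four characters. The sketch below shows that idea together with a deliberately simple trimming strategy that drops the oldest messages first. This is an assumption for illustration: the compression performed by @reaatech/session-continuity is not shown in this excerpt and may work differently. `estimateTokens`, `trimToBudget`, and `ChatMessage` are hypothetical names.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Rough heuristic: about four characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keeps the most recent messages that fit in the budget, preserving their original order.
function trimToBudget(messages: ChatMessage[], maxTokens: number): ChatMessage[] {
  const kept: ChatMessage[] = [];
  let used = 0;
  // Walk from newest to oldest and stop at the first message that does not fit.
  for (const message of [...messages].reverse()) {
    const cost = estimateTokens(message.content);
    if (used + cost > maxTokens) break;
    kept.unshift(message);
    used += cost;
  }
  return kept;
}
```

Dropping whole messages keeps follow-up questions coherent at the cost of forgetting early turns; a summarising compressor would keep more context in fewer tokens.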
Step 7: Build the intent classifier
Create src/services/intent-classifier.ts — this combines a keyword-based ConfidenceRouter from @reaatech/confidence-router with an Azure OpenAI fallback for ambiguous queries:
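The listing for this file is not included here, so the following is only a rough sketch of the general idea: score an utterance against keyword lists, and defer to the LLM when no intent clearly wins. It does not use the actual @reaatech/confidence-router API. The keyword lists, the 0.6 threshold, and the names `routeByKeywords` and `RouteResult` are all assumptions made for illustration.

```typescript
type Intent = "profit_and_loss" | "invoices" | "cash_flow" | "balance_sheet";

// Illustrative keyword lists; a real deployment would tune these.
const KEYWORDS: Record<Intent, string[]> = {
  profit_and_loss: ["profit", "loss", "revenue", "expenses"],
  invoices: ["invoice", "overdue", "owed"],
  cash_flow: ["cash", "bank"],
  balance_sheet: ["balance sheet", "assets", "liabilities", "equity"],
};

interface RouteResult {
  intent: Intent | "unknown";
  confidence: number;
  needsLlm: boolean;
}

// Confidence is the winning intent's share of all keyword hits in the utterance.
function routeByKeywords(utterance: string, threshold = 0.6): RouteResult {
  const text = utterance.toLowerCase();
  let best: Intent | "unknown" = "unknown";
  let bestHits = 0;
  let totalHits = 0;
  for (const intent of Object.keys(KEYWORDS) as Intent[]) {
    const hits = KEYWORDS[intent].filter((keyword) => text.includes(keyword)).length;
    totalHits += hits;
    if (hits > bestHits) {
      best = intent;
      bestHits = hits;
    }
  }
  const confidence = totalHits === 0 ? 0 : bestHits / totalHits;
  const confident = confidence >= threshold;
  // Below the threshold the caller should fall back to the LLM classifier.
  return { intent: confident ? best : "unknown", confidence, needsLlm: !confident };
}
```

Handling clear-cut queries with keywords avoids an LLM round trip on most calls, which matters for both latency and per-tenant cost.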
Expected output: pnpm typecheck passes. When DAILY_BUDGET_USD=0, all budget checks are skipped.
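The budget rule above, including the "zero means unlimited" convention, can be sketched as a small in-memory tracker keyed by tenant and UTC day. `DailyBudget` and its methods are hypothetical names for illustration; the recipe's CostTelemetryMiddleware is not shown in this excerpt and may be structured differently.

```typescript
// Hypothetical in-memory tracker: spend is bucketed per tenant per UTC calendar day.
class DailyBudget {
  private spent = new Map<string, number>();
  private limitUsd: number;

  constructor(limitUsd: number) {
    this.limitUsd = limitUsd;
  }

  private key(tenantId: string, now: Date): string {
    return `${tenantId}:${now.toISOString().slice(0, 10)}`;
  }

  // Adds an actual cost to the tenant's running total for the day.
  record(tenantId: string, costUsd: number, now: Date = new Date()): void {
    const key = this.key(tenantId, now);
    this.spent.set(key, (this.spent.get(key) ?? 0) + costUsd);
  }

  // Returns true if the estimated cost still fits; a limit of 0 disables the check.
  canSpend(tenantId: string, estimatedUsd: number, now: Date = new Date()): boolean {
    if (this.limitUsd === 0) return true;
    const used = this.spent.get(this.key(tenantId, now)) ?? 0;
    return used + estimatedUsd <= this.limitUsd;
  }
}
```

Because the day is part of the key, totals reset naturally at midnight UTC without a scheduled job. A production version would need shared storage so that multiple processes see the same totals.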
Step 11: Build the pipeline orchestrator
Create src/services/pipeline.ts — this wires STT, TTS, intent classification, Xero queries, and cost telemetry into a single createPipeline from @reaatech/voice-agent-core:
ts
import {
  createPipeline,
  createLatencyBudget,
  LatencyBudgetEnforcer,
  getDefaultSessionManager,
} from "@reaatech/voice-agent-core";
import type {
  STTProvider,
  TTSProvider,
  MCPClient,
  AgentResponse,
  AudioChunk,
  Utterance,
} from "@reaatech/voice-agent-core";
import { loadAppConfig, getXeroConfig, pipelineConfig } from "../config.js";
import { VoiceSessionManager } from "./session.js";
import { DeepgramSTTProvider } from "./deepgram-stt.js";
import { CartesiaTTSProvider } from "./cartesia-tts.js";
import { IntentClassifierService } from "./intent-classifier.js";
import { XeroService } from "../lib/xero.js";
import { AzureOpenAIService } from "../lib/azure-openai.js";
import { CostTelemetryMiddleware } from "../middleware/cost.js";
Expected output: pnpm typecheck passes. The orchestrator coordinates the full classify-query-format-speak flow.
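The classify-query-format-speak flow can be sketched independently of any provider by injecting each stage as a function. This is an illustrative outline under stated assumptions, not the recipe's orchestrator: `PipelineDeps`, `handleUtterance`, and `FALLBACK_REPLY` are hypothetical names, and the real class wires these stages through `createPipeline` from @reaatech/voice-agent-core.

```typescript
// Each stage is injected so the flow can be exercised without real providers.
interface PipelineDeps {
  classify: (utterance: string) => Promise<string>;
  fetchData: (queryType: string) => Promise<string>;
  format: (utterance: string, data: string) => Promise<string>;
  speak: (text: string) => Promise<void>;
}

const FALLBACK_REPLY = "Sorry, I couldn't find that. Could you ask again?";

// Runs one turn: classify the utterance, fetch data, format a spoken reply, then speak it.
async function handleUtterance(utterance: string, deps: PipelineDeps): Promise<string> {
  let reply = FALLBACK_REPLY;
  try {
    const queryType = await deps.classify(utterance);
    if (queryType !== "unknown") {
      const data = await deps.fetchData(queryType);
      reply = await deps.format(utterance, data);
    }
  } catch {
    // Any stage failure degrades to the fallback so the caller always hears something.
    reply = FALLBACK_REPLY;
  }
  await deps.speak(reply);
  return reply;
}
```

Keeping the `speak` call outside the `try` block means a text-to-speech failure surfaces to the caller instead of being silently retried with the fallback text.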
Step 12: Build the Twilio webhook validation, WebSocket server, and webhook route
Create src/lib/twilio-validate.ts:
ts
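import twilio from "twilio";
import type { NextRequest } from "next/server";

// Returns true when the request carries a valid X-Twilio-Signature for its URL and form params.
export function validateTwilioRequest(req: NextRequest, params?: Record<string, string>): boolean {
  const authToken = process.env.TWILIO_AUTH_TOKEN;
  if (!authToken || authToken.length === 0) {
    // Fails open when no token is configured: convenient for local development, unsafe in production.
    console.warn("[twilio-validate] TWILIO_AUTH_TOKEN not set; skipping validation");
    return true;
  }
  const signature = req.headers.get("X-Twilio-Signature");
  if (!signature) return false;
  // The URL must match the one Twilio requested, including scheme and query string.
  const url = req.url;
  return twilio.validateRequest(authToken, signature, url, params ?? {});
}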
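For readers curious what the signature check involves, the sketch below reproduces the scheme Twilio documents for form-encoded webhooks: append each POST parameter's name and value to the full URL in alphabetical key order, sign the result with HMAC-SHA1 using the auth token, and Base64-encode the digest. `computeTwilioSignature` and `signaturesMatch` are hypothetical helper names; in the recipe itself, `twilio.validateRequest` does this work.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Builds the signature for a form-encoded webhook: URL + sorted key/value pairs, HMAC-SHA1, Base64.
function computeTwilioSignature(
  authToken: string,
  url: string,
  params: Record<string, string>,
): string {
  const data = Object.keys(params)
    .sort()
    .reduce((acc, key) => acc + key + params[key], url);
  return createHmac("sha1", authToken).update(Buffer.from(data, "utf-8")).digest("base64");
}

// Constant-time comparison to avoid leaking how many leading characters matched.
function signaturesMatch(expected: string, actual: string): boolean {
  const a = Buffer.from(expected);
  const b = Buffer.from(actual);
  return a.length === b.length && timingSafeEqual(a, b);
}
```

Because the URL is part of the signed data, a reverse proxy that rewrites the scheme or host can make otherwise valid requests fail validation. In that situation the URL passed to the validator likely needs to be rebuilt from the public address.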
Create src/services/websocket-server.ts:
ts
import WebSocket, { WebSocketServer } from "ws";
import { createTwilioHandler } from "@reaatech/voice-agent-telephony";
import type { TwilioMediaStreamHandler } from "@reaatech/voice-agent-telephony";
import type { AudioChunk } from "@reaatech/voice-agent-core";
import type { VoicePipelineOrchestrator } from "./pipeline.js";
import { loadAppConfig } from "../config.js";

export class TwilioMediaStreamServer {
  private wss: WebSocketServer | null = null;
  private orchestrator: VoicePipelineOrchestrator;
  private sessionToHandler = new
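The listing above is cut off, so as a rough guide to what such a server has to handle, here is a sketch that parses the JSON messages Twilio sends over a Media Streams WebSocket. It assumes the documented message shapes: a `start` event carrying `start.streamSid` and `start.callSid`, `media` events carrying Base64 audio in `media.payload`, and a `stop` event. `parseStreamMessage` and `StreamEvent` are hypothetical names; the recipe delegates this work to `createTwilioHandler` from @reaatech/voice-agent-telephony.

```typescript
type StreamEvent =
  | { kind: "start"; streamSid: string; callSid: string }
  | { kind: "media"; streamSid: string; audio: Buffer }
  | { kind: "stop"; streamSid: string };

// Returns a typed event, or null for malformed input and event types this sketch ignores.
function parseStreamMessage(raw: string): StreamEvent | null {
  try {
    const msg = JSON.parse(raw);
    switch (msg.event) {
      case "start":
        return {
          kind: "start",
          streamSid: String(msg.start.streamSid),
          callSid: String(msg.start.callSid),
        };
      case "media":
        // The payload is Base64-encoded audio; decode it to raw bytes for the STT stage.
        return {
          kind: "media",
          streamSid: String(msg.streamSid),
          audio: Buffer.from(String(msg.media.payload), "base64"),
        };
      case "stop":
        return { kind: "stop", streamSid: String(msg.streamSid) };
      default:
        return null;
    }
  } catch {
    return null;
  }
}
```

Inside a `ws` connection handler this would typically be called as `parseStreamMessage(data.toString())` for each incoming message.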
Create app/api/voice/webhook/route.ts — the Twilio incoming-call webhook that returns TwiML with a <Connect><Stream> block:
ts
import { type NextRequest, NextResponse } from "next/server";
import { validateTwilioRequest } from "../../../../src/lib/twilio-validate.js";

export async function POST(req: NextRequest): Promise<NextResponse> {
  try {
    const formData = await req.formData();
    const params: Record<string, string> = {};
    for (const [key, value] of formData.entries()) {
      if (typeof value === "string") params[key] = value;
    }

    if (!validateTwilioRequest(req, params)) {
      return NextResponse.json({ error: "invalid signature" }, { status: 403 });
    }

    const callSid = formData.get("CallSid");
    if (!callSid || typeof callSid !== "string") {
      const twiml = `<Response><Say>I'm sorry, there was a technical error. Goodbye.</Say><Hangup/></Response>`;
      return new NextResponse(twiml, {
        status: 200,
        headers: { "Content-Type": "text/xml" },
      });
    }

    const host = req.headers.get("host") ?? "localhost:8080";
    const wsUrl = `wss://${host}/voice/media-stream`;
    const twiml = `<Response><Connect><Stream url="${wsUrl}"/></Connect></Response>`;
    return new NextResponse(twiml, {
      status: 200,
      headers: { "Content-Type": "text/xml" },
    });
  } catch {
    return new NextResponse(null, { status: 200 });
  }
}

export function GET(): NextResponse {
  return new NextResponse(null, { status: 200 });
}
Expected output: pnpm typecheck passes. The route uses NextRequest/NextResponse, not bare Request/Response.
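The webhook interpolates the Host header directly into an XML attribute. As a possible hardening step, the sketch below escapes the URL before embedding it in the TwiML. `escapeXmlAttr` and `buildStreamTwiml` are hypothetical helpers, not part of the recipe, and the default path is simply the one used in the route above.

```typescript
// Escapes the five characters that are special inside XML attribute values.
function escapeXmlAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Builds the <Connect><Stream> TwiML that tells Twilio where to open the media WebSocket.
function buildStreamTwiml(host: string, path = "/voice/media-stream"): string {
  const wsUrl = `wss://${host}${path}`;
  return `<Response><Connect><Stream url="${escapeXmlAttr(wsUrl)}"/></Connect></Response>`;
}
```

Ampersands are replaced first so that the entities produced by the later replacements are not escaped a second time.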
Step 13: Create instrumentation, main entry point, and run the test suite
Create src/instrumentation.ts — Next.js calls register() at startup:
ts
import { initializeObservability } from "@reaatech/voice-agent-core";
import { Langfuse } from "langfuse";
import { loadAppConfig } from "./config.js";
import { VoicePipelineOrchestrator } from "./services/pipeline.js";
import { TwilioMediaStreamServer } from "./services/websocket-server.js";

let orchestrator: VoicePipelineOrchestrator | undefined;
let langfuse: Langfuse | undefined;
let wsServer: TwilioMediaStreamServer | undefined;

export async function register(): Promise<void> {
  if (process.env.NEXT_RUNTIME !== "nodejs") return;
  try {
    const config = loadAppConfig();
    await initializeObservability({
      serviceName: "xero-voice-agent",
      enabled: true,
      serviceVersion: "0.1.0",
      otlpEndpoint: undefined,
    });
    langfuse = new Langfuse({
      publicKey: config.langfusePublicKey,
      secretKey: config.langfuseSecretKey,
      baseUrl: config.langfuseHost,
    });
    orchestrator = new VoicePipelineOrchestrator();
    wsServer = new TwilioMediaStreamServer(orchestrator);
    wsServer.start(config.wsPort);
    console.log("[instrumentation] Voice agent services initialized");
  } catch (err) {
    console.warn("[instrumentation] Degraded startup; some services may be unavailable:", err);
  }
}

export function getOrchestrator(): VoicePipelineOrchestrator | undefined {
  return orchestrator;
}

export function getLangfuse(): Langfuse | undefined {
  return langfuse;
}
Replace src/index.ts with re-exports:
ts
export { VoicePipelineOrchestrator } from "./services/pipeline.js";
export { XeroService } from "./lib/xero.js";
export { AzureOpenAIService } from "./lib/azure-openai.js";
export { IntentClassifierService } from "./services/intent-classifier.js";
export { CostTelemetryMiddleware } from "./middleware/cost.js";
export { CartesiaTTSProvider } from "./services/cartesia-tts.js";
export { DeepgramSTTProvider } from "./services/deepgram-stt.js";
export { TwilioMediaStreamServer } from "./services/websocket-server.js";
export { VoiceSessionManager } from "./services/session.js";
export type { XeroFinancialQuery, FinancialData, CallContext, AppConfig } from "./types.js";
Now create the test infrastructure. Start with the MSW server setup in tests/setup.ts:
Then create tests/helpers.ts for typed test factories:
ts
import type { FinancialData, CallContext } from "../src/types.js";
import type { AudioChunk } from "@reaatech/voice-agent-core";
import type { Message } from "@reaatech/session-continuity";

export function makeCallContext(overrides?: Partial<CallContext>): CallContext {
  return {
    callSid: "CA-test-call-123",
    tenantId: "tenant-abc",
    sessionId: "session-xyz",
    ...overrides,
  };
}

export function makeFinancialData(overrides?: Partial<FinancialData>): FinancialData {
  return {
    queryType: "profit_and_loss",
    summary: "Revenue: 50000, Expenses: 32000, Net Profit: 18000",
    period: "current",
    ...overrides,
  };
}

export function makeAudioChunk(overrides?: Partial<AudioChunk>): AudioChunk {
  return {
    buffer: Buffer.alloc(160),
    sampleRate: 8000,
    encoding: "pcm",
    channels: 1,
    timestamp: Date.now(),
    ...overrides,
  };
}

export function makeMessage(overrides?: Partial<Message>): Message {
  return {
    id: "msg-test-1",
    sessionId: "session-xyz",
    role: "user",
    content: "What is my profit?",
    createdAt: new Date(),
    ...overrides,
  };
}
Now run the full quality gate:
terminal
pnpm typecheck
pnpm lint
pnpm vitest run --coverage --reporter=json --outputFile=vitest-report.json
Expected output: pnpm typecheck reports zero errors, pnpm lint passes ESLint, and pnpm vitest run --coverage passes all tests (60+ tests, zero failures) with line, branch, function, and statement coverage all at 90% or higher.
Next steps
Add a real database adapter — swap the InMemoryStorageAdapter in the session manager for a Postgres or Redis-backed IStorageAdapter to persist conversations across restarts.
Deploy with load balancing — run multiple WebSocket server instances behind an NGINX or Envoy proxy, using sticky sessions keyed on CallSid to keep a call pinned to one backend.
Add fraud detection — pipe DTMF digits and voice sentiment through a scoring model that auto-escalates high-risk calls to human agents via Twilio’s enqueue behaviour.
Support multi-tenant isolation — derive tenantId from the called Twilio number or a sub-account SID, and give each tenant its own budget cap in CostTelemetryMiddleware.
Replace mock MCP with a remote model context protocol server — the MCPClient in pipeline.ts currently runs inline; point it at a remote MCP server to offload LLM calls from the Next.js process.
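As one possible starting point for the multi-tenant item above, the sketch below maps the called number from Twilio's `To` webhook parameter to per-tenant settings. The phone numbers, tenant IDs, and the names `resolveTenant`, `TenantSettings`, and `TENANTS_BY_NUMBER` are placeholders invented for illustration; a real deployment would load this mapping from a database.

```typescript
interface TenantSettings {
  tenantId: string;
  dailyBudgetUsd: number;
}

// Placeholder numbers and tenants for illustration only.
const TENANTS_BY_NUMBER = new Map<string, TenantSettings>([
  ["+15550100", { tenantId: "tenant-abc", dailyBudgetUsd: 5 }],
  ["+15550101", { tenantId: "tenant-def", dailyBudgetUsd: 0 }],
]);

// Looks up the tenant for an incoming call, or returns null for an unknown number.
function resolveTenant(webhookParams: Record<string, string>): TenantSettings | null {
  // Strip spaces, parentheses, and hyphens so formatted numbers still match.
  const called = (webhookParams.To ?? "").replace(/[\s()-]/g, "");
  return TENANTS_BY_NUMBER.get(called) ?? null;
}
```

Rejecting calls to unmapped numbers, rather than defaulting to a shared tenant, keeps one customer's data from ever being read to another's caller.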
Continuation of the truncated XeroService class from Step 4 (src/lib/xero.ts):
ts
  initialized = false;

  constructor(config: XeroConfig) {
    this.config = config;
    this.client = new XeroClient({
      clientId: config.clientId,
      clientSecret: config.clientSecret,
      grantType: "client_credentials",
    });
  }

  async initialize(): Promise<void> {
    try {
      await this.client.getClientCredentialsToken();
      this.initialized = true;
    } catch (err) {
      throw new XeroServiceError("Failed to initialize Xero client", err);
System prompt fragments that appear to belong to the Azure OpenAI service from Step 5 (src/lib/azure-openai.ts):
ts
{ role: "system", content: "Classify the user's financial query into one of: profit_and_loss, invoices, cash_flow, balance_sheet, unknown. Respond in JSON with keys 'queryType' and 'confidence' (0-1)." },
{ role: "system", content: "You are a helpful financial assistant. Convert the financial data into natural, conversational speech. Use short sentences, pronounce numbers naturally (e.g., 'twelve thousand' instead of '12000'), and avoid markdown, JSON, or tables. This will be read aloud." },