Small businesses using Azure AI services often lose control of per-model, per-session costs, especially when orchestrating multiple models for complex workflows. Unexpected overages hurt margins.
In this tutorial, you’ll build a budget-controlled Azure AI chat server that prevents runaway spend across multiple models. You’ll use the REAA budget engine to check every chat request against per-scope spending limits, auto-downgrade to cheaper models when budgets tighten, and export cost telemetry to Langfuse via OpenTelemetry. By the end, you’ll have a working Next.js app with two API routes — one for budget-controlled chat and one for admin spend summaries — plus a sidecar health server for orchestration.
Prerequisites
Node.js >= 22 (the engines field in package.json enforces this)
pnpm 10.x (the project uses pnpm@10.0.0 as its package manager)
Azure OpenAI deployment — you need an endpoint URL, an API key, and a deployment name (default: gpt-4o)
A Langfuse account — for OpenTelemetry-based cost dashboards (optional but recommended)
Familiarity with TypeScript and Next.js App Router conventions
Step 1: Scaffold the project and install dependencies
Create a new Next.js project and install all the required dependencies. This project uses the Next.js App Router with the src/ directory layout and vitest for testing.
Next, install the runtime dependencies — the REAA budget packages, the Azure OpenAI SDK (via the openai npm package), Express for the health server, Zod for request validation, and Langfuse for observability:
Expected output: You should have a .env.local file with all variables filled in. The server reads AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, AZURE_OPENAI_DEPLOYMENT, and the Langfuse variables from the runtime environment.
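To make the environment contract concrete, here is a minimal sketch of how the three Azure variables could be read and validated at startup. The `AzureEnvConfig` shape and the `readAzureEnv` helper are illustrative assumptions, not part of the project; only the variable names and the gpt-4o default come from this tutorial.

```typescript
// Hypothetical config shape; the project's own AzureOpenAiConfig type may differ.
interface AzureEnvConfig {
  endpoint: string;
  apiKey: string;
  deployment: string;
}

type Env = Record<string, string | undefined>;

// Reads the Azure variables named in this step and fails fast with a
// single message that lists every missing required variable.
function readAzureEnv(env: Env): AzureEnvConfig {
  const endpoint = env.AZURE_OPENAI_ENDPOINT;
  const apiKey = env.AZURE_OPENAI_API_KEY;

  const missing: string[] = [];
  if (!endpoint) missing.push("AZURE_OPENAI_ENDPOINT");
  if (!apiKey) missing.push("AZURE_OPENAI_API_KEY");
  if (!endpoint || !apiKey) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }

  return {
    endpoint,
    apiKey,
    // The prerequisites name gpt-4o as the default deployment.
    deployment: env.AZURE_OPENAI_DEPLOYMENT ?? "gpt-4o",
  };
}
```

In a Node.js process you would typically call this once as `readAzureEnv(process.env)` so that a misconfigured deployment fails at startup rather than on the first chat request.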
Step 3: Define the TypeScript types
Create src/types.ts to hold the shared type definitions used across the project. These describe pricing, budget state, scope identifiers, and the Azure OpenAI configuration.
Expected output: src/types.ts exports interfaces for PricingProvider, AzureModelPricing, SpendSummary, AzureOpenAiConfig, BudgetPolicy, BudgetDefinition, ScopeIdentifier, and the BudgetScope constant map.
Step 4: Create the Azure pricing provider
The pricing provider estimates how much a model call will cost based on token counts and published Azure pricing. Create src/services/azure-pricing.ts:
ts
import type { PricingProvider } from "@reaatech/agent-budget-engine";
import type { AzureModelPricing } from "../types.js";

const DEFAULT_PRICING: AzureModelPricing[] = [
  { modelId: "gpt-4o", inputPricePer1M: 2.50, outputPricePer1M: 10.00 },
  { modelId: "gpt-4o-mini", inputPricePer1M: 0.15, outputPricePer1M: 0.60 },
  { modelId: "gpt-4", inputPricePer1M: 30.00, outputPricePer1M: 60.00 },
  { modelId: "gpt-4-turbo", inputPricePer1M: 10.00, outputPricePer1M: 30.00 },
];

export class AzurePricingProvider implements PricingProvider {
  private pricing: Map<string, AzureModelPricing>;

  constructor(overrides?: AzureModelPricing[]) {
    this.pricing = new Map();
    for (const p of DEFAULT_PRICING) {
      this.pricing.set(p.modelId, p);
    }
    if (overrides) {
      for (const p of overrides) {
        this.pricing.set(p.modelId, p);
      }
    }
  }

  estimateCost(modelId: string, estimatedInputTokens: number, _provider?: string): number {
    void _provider;
    const pricing = this.pricing.get(modelId);
    if (!pricing) {
      const fallback = this.pricing.get("gpt-4o");
      console.warn(`AzurePricingProvider: unknown model "${modelId}", falling back to gpt-4o pricing`);
      const fp = fallback ?? { inputPricePer1M: 2.50, outputPricePer1M: 10.00 };
      const estimatedOutputTokens = Math.round(estimatedInputTokens * 0.5);
      return (
        (estimatedInputTokens / 1_000_000) * fp.inputPricePer1M +
        (estimatedOutputTokens / 1_000_000) * fp.outputPricePer1M
      );
    }
    const estimatedOutputTokens = Math.round(estimatedInputTokens * 0.5);
    const inputCost = (estimatedInputTokens / 1_000_000) * pricing.inputPricePer1M;
    const outputCost = (estimatedOutputTokens / 1_000_000) * pricing.outputPricePer1M;
    return inputCost + outputCost;
  }

  getModelPricing(modelId: string): AzureModelPricing | undefined {
    return this.pricing.get(modelId);
  }
}

export function createDefaultPricingProvider(): AzurePricingProvider {
  return new AzurePricingProvider();
}
This class implements the PricingProvider interface expected by @reaatech/agent-budget-engine. It stores a static pricing table for common Azure OpenAI models. For unknown models it falls back to gpt-4o pricing and logs a warning. The estimateCost method estimates output tokens at 50% of input tokens — a reasonable approximation for chat workloads.
Expected output: You can now import AzurePricingProvider and call estimateCost("gpt-4o", 1_000_000), which returns 7.5 (USD): $2.50 for the input tokens plus $5.00 for the estimated 500,000 output tokens.
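The arithmetic behind that estimate can be checked in isolation. The sketch below restates the formula as a standalone function, using the gpt-4o and gpt-4o-mini prices from the table above; the `ModelPrice` and `estimate` names are illustrative and not part of the project.

```typescript
interface ModelPrice {
  inputPricePer1M: number;  // USD per one million input tokens
  outputPricePer1M: number; // USD per one million output tokens
}

// Ratio taken from the text: output tokens are estimated at 50% of input.
const OUTPUT_RATIO = 0.5;

function estimate(price: ModelPrice, inputTokens: number): number {
  const outputTokens = Math.round(inputTokens * OUTPUT_RATIO);
  return (
    (inputTokens / 1_000_000) * price.inputPricePer1M +
    (outputTokens / 1_000_000) * price.outputPricePer1M
  );
}

const gpt4o: ModelPrice = { inputPricePer1M: 2.5, outputPricePer1M: 10 };
const gpt4oMini: ModelPrice = { inputPricePer1M: 0.15, outputPricePer1M: 0.6 };

// 1M input tokens on gpt-4o: 2.50 (input) + 5.00 (500K output) = 7.5
const fullCost = estimate(gpt4o, 1_000_000);
// Same request on gpt-4o-mini: 0.15 + 0.30, roughly 0.45
const miniCost = estimate(gpt4oMini, 1_000_000);
```

With these published prices, the same request is roughly 16 to 17 times cheaper on gpt-4o-mini, which is the gap that auto-downgrade exploits when a budget tightens.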
Step 5: Initialize the budget engine
Create src/middleware/budget.ts to set up the singleton budget engine instances. This is the central wiring point that connects the pricing provider, the spend tracker, the budget controller, and the router plugin.
The three exported singletons — spendStore, budgetController, and budgetStrategy — are used by the rest of the application. The SpendStore uses an in-memory circular buffer for O(1) spend lookups with no external database. The BudgetController performs pre-flight checks and state transitions (Active → Warned → Degraded → Stopped). The BudgetAwareStrategy filters model candidates by remaining budget.
Expected output: Importing from src/middleware/budget.js gives you ready-to-use instances of SpendStore, BudgetController, and BudgetAwareStrategy.
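To illustrate the Active → Warned → Degraded → Stopped progression, here is a simplified sketch of how a budget state could be derived from spend. It is not the BudgetController implementation: the threshold names and the idea that each state maps to a fixed fraction of the limit are assumptions made for the example.

```typescript
type BudgetState = "Active" | "Warned" | "Degraded" | "Stopped";

// Hypothetical thresholds, expressed as fractions of the limit.
interface Thresholds {
  limit: number;     // total budget in USD
  warnAt: number;    // e.g. 0.5
  degradeAt: number; // e.g. 0.8 (soft cap)
  stopAt: number;    // e.g. 1.0 (hard cap)
}

function deriveState(spent: number, t: Thresholds): BudgetState {
  // A non-positive limit is treated as fully used.
  const used = t.limit > 0 ? spent / t.limit : 1;
  if (used >= t.stopAt) return "Stopped";
  if (used >= t.degradeAt) return "Degraded";
  if (used >= t.warnAt) return "Warned";
  return "Active";
}

// Pre-flight check: evaluates the state the scope would be in
// if the estimated cost of the next call were added to current spend.
function preflight(
  spent: number,
  estimatedCost: number,
  t: Thresholds,
): { allowed: boolean; state: BudgetState } {
  const state = deriveState(spent + estimatedCost, t);
  return { allowed: state !== "Stopped", state };
}
```

Checking the projected spend rather than the current spend means a single expensive call cannot carry a scope past its hard cap.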
Step 6: Create the Express budget middleware bridge
Create src/middleware/budget-express.ts to expose the budget controller via Express-compatible middleware. The createBudgetMiddleware factory returns a middleware function that, when called with request headers, produces beforeStep / afterStep handler pairs. You can use this if you want to wire budget checks into a standalone Express server. For the Next.js route handler (next step), the project uses getBudgetContext from budget.ts directly.
extractScope reads budget scope from request headers with fallback defaults — it mirrors the logic in getBudgetContext from budget.ts but accepts an Express-style headers object.
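As a rough sketch of that header-to-scope mapping, the function below reads the two scope headers used in this tutorial and falls back to defaults when they are missing. The default values ("user" and "anonymous") and the `HeaderBag` type are assumptions; the project's actual fallbacks are not shown here.

```typescript
// Express exposes request headers with lowercase keys; a value may be
// a string, an array of strings, or undefined.
type HeaderBag = Record<string, string | string[] | undefined>;

interface ScopeIdentifier {
  scopeType: string;
  scopeKey: string;
}

// Hypothetical fallbacks, chosen only for this example.
const DEFAULT_SCOPE: ScopeIdentifier = { scopeType: "user", scopeKey: "anonymous" };

// Returns the first non-empty value of a header, or undefined.
function firstValue(value: string | string[] | undefined): string | undefined {
  const raw = Array.isArray(value) ? value[0] : value;
  const trimmed = raw?.trim();
  return trimmed ? trimmed : undefined;
}

function extractScope(headers: HeaderBag): ScopeIdentifier {
  return {
    scopeType: firstValue(headers["x-budget-scope-type"]) ?? DEFAULT_SCOPE.scopeType,
    scopeKey: firstValue(headers["x-budget-scope-key"]) ?? DEFAULT_SCOPE.scopeKey,
  };
}
```

Treating blank header values as missing avoids creating a budget scope with an empty key.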
Step 7: Create the Azure OpenAI service
Create src/services/azure-openai.ts to wrap the Azure OpenAI SDK behind a clean interface that returns text and token counts:
The service uses the openai SDK pointed at your Azure OpenAI endpoint. The chatCompletion method returns the response text plus token usage. Errors from the Azure API are wrapped in the typed AzureOpenAiError class that preserves the HTTP status code.
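The error-wrapping idea can be sketched without the SDK. The class below is a hypothetical reconstruction of a status-preserving error type; the real AzureOpenAiError in src/services/azure-openai.ts may carry different fields. The `toAzureError` and `withAzureErrors` helpers, and the choice of 500 for errors without a numeric status, are assumptions for illustration.

```typescript
class AzureOpenAiError extends Error {
  readonly status: number;
  readonly original: unknown;

  constructor(message: string, status: number, original?: unknown) {
    super(message);
    this.name = "AzureOpenAiError";
    this.status = status;
    this.original = original;
    // Keeps instanceof working even when compiled to older JS targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Converts any thrown value into an AzureOpenAiError, keeping a numeric
// `status` property when the original error has one.
function toAzureError(err: unknown): AzureOpenAiError {
  if (err instanceof AzureOpenAiError) return err;
  let status = 500; // assumed default when no status is available
  if (typeof err === "object" && err !== null && "status" in err) {
    const candidate = (err as { status: unknown }).status;
    if (typeof candidate === "number") status = candidate;
  }
  const message = err instanceof Error ? err.message : String(err);
  return new AzureOpenAiError(message, status, err);
}

// Wraps an async call so callers only ever see AzureOpenAiError.
async function withAzureErrors<T>(call: () => Promise<T>): Promise<T> {
  try {
    return await call();
  } catch (err) {
    throw toAzureError(err);
  }
}
```

A route handler can then branch on a single error type and read `status` to tell, for example, a 429 rate limit from a 401 credential problem.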
Step 8: Set up the Langfuse OTel bridge
Create src/services/langfuse-otel.ts to route GenAI span data to Langfuse via OpenTelemetry. A custom span processor forwards completed spans to the REAA SpanListener, which extracts budget-relevant attributes and records spend.
ts
import { SpanListener } from "@reaatech/agent-budget-otel-bridge";
import { NodeTracerProvider, BatchSpanProcessor } from "@opentelemetry/sdk-trace-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";
import type { BudgetController } from "@reaatech/agent-budget-engine";
import type { ReadableSpan, SpanProcessor } from "@opentelemetry/sdk-trace-node";
import { resourceFromAttributes } from "@opentelemetry/resources";

export type ControllerLike = Pick<
  BudgetController,
  "check" | "record" | "getState" | "defineBudget" | "listAll" | "on"
>;

const ATTR_SERVICE_NAME = "service.name";

let provider: NodeTracerProvider | undefined;
let listener: SpanListener | undefined;
let initialized = false;

export const budgetProcessor: SpanProcessor = {
  onEnd(span: ReadableSpan): void {
    listener?.onSpanEnd(span.attributes);
  },
  forceFlush: () => Promise.resolve(),
  shutdown: () => Promise.resolve(),
  onStart: () => {
    return;
  },
};

export function initLangfuseOtel(controller: ControllerLike): SpanListener {
  if (initialized && listener) return listener;
  initialized = true;

  const publicKey = process.env.LANGFUSE_PUBLIC_KEY ?? "";
  const secretKey = process.env.LANGFUSE_SECRET_KEY ?? "";
  const baseUrl = process.env.LANGFUSE_BASE_URL ?? "https://cloud.langfuse.com";
  const serviceName = process.env.OTEL_SERVICE_NAME ?? "azure-ai-spend-control";
  const controllerForBridge: unknown = controller;

  if (!publicKey || !secretKey) {
    console.warn("Langfuse credentials not configured — skipping OTel export");
    listener = new SpanListener({ controller: controllerForBridge as BudgetController });
    return listener;
  }

  const exporter = new OTLPTraceExporter({
    url: `${baseUrl}/api/public/otel/v1/traces`,
    headers: {
      Authorization: `Basic ${Buffer.from(`${publicKey}:${secretKey}`).toString("base64")}`,
    },
  });

  provider = new NodeTracerProvider({
    resource: resourceFromAttributes({
      [ATTR_SERVICE_NAME]: serviceName,
    }),
    spanProcessors: [
      new BatchSpanProcessor(exporter),
      budgetProcessor,
    ],
  });
  provider.register();

  listener = new SpanListener({ controller: controllerForBridge as BudgetController });
  return listener;
}

export async function shutdownOtel(): Promise<void> {
  if (provider) {
    await provider.shutdown();
    provider = undefined;
    initialized = false;
  }
}
When Langfuse credentials are present in the environment, this module creates an OTLP exporter that sends traces to Langfuse’s public OTel ingestion endpoint. The SpanListener from @reaatech/agent-budget-otel-bridge monitors span attributes for budget-related fields (budget.scope_type, budget.scope_key, gen_ai.usage.input_tokens, etc.) and automatically records spend entries against the correct budget scope. When credentials are absent, the module logs a warning, skips the exporter and tracer provider, and still returns a SpanListener. Because budgetProcessor is only registered on the tracer provider, that listener receives no spans in this mode, which lets you develop and test locally without a Langfuse account.
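To give a feel for what "extracting budget-relevant attributes" involves, here is a simplified sketch that turns a span's attribute bag into a spend entry. It is not the SpanListener implementation. The `SpendEntry` shape and the `spendFromSpan` name are hypothetical, and reading `gen_ai.usage.output_tokens` and `gen_ai.request.model` alongside the attributes named above is an assumption based on the OpenTelemetry GenAI naming style.

```typescript
type SpanAttributes = Record<string, unknown>;

// Hypothetical record of what one span contributes to a budget scope.
interface SpendEntry {
  scopeType: string;
  scopeKey: string;
  modelId: string;
  inputTokens: number;
  outputTokens: number;
}

// Returns undefined for spans that carry no budget scope, so spans
// unrelated to budgeting are ignored.
function spendFromSpan(attrs: SpanAttributes): SpendEntry | undefined {
  const scopeType = attrs["budget.scope_type"];
  const scopeKey = attrs["budget.scope_key"];
  if (typeof scopeType !== "string" || typeof scopeKey !== "string") {
    return undefined;
  }

  // Missing or non-numeric token counts are counted as 0.
  const tokenCount = (key: string): number => {
    const value = attrs[key];
    return typeof value === "number" && Number.isFinite(value) ? value : 0;
  };

  const model = attrs["gen_ai.request.model"];
  return {
    scopeType,
    scopeKey,
    modelId: typeof model === "string" ? model : "unknown",
    inputTokens: tokenCount("gen_ai.usage.input_tokens"),
    outputTokens: tokenCount("gen_ai.usage.output_tokens"),
  };
}
```

An entry like this, combined with a pricing table, is enough to compute the cost to record against the scope.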
Step 9: Build the chat API route
Create app/api/chat/route.ts — the main budget-controlled chat endpoint. Every request passes through the BudgetInterceptor from @reaatech/agent-budget-middleware for a pre-flight check before reaching Azure OpenAI.
ts
import { NextRequest, NextResponse } from "next/server";
import { BudgetInterceptor } from "@reaatech/agent-budget-middleware";
import { BudgetScope } from "@reaatech/agent-budget-types";
import { budgetController, getBudgetContext } from "../../../src/middleware/budget.js";
import { AzureOpenAiError, createAzureOpenAiService } from "../../../src/services/azure-openai.js";
import { initLangfuseOtel } from "../../../src/services/langfuse-otel.js";
import { z } from "zod";

const interceptor = new BudgetInterceptor({ controller: budgetController });
let otelInitialized = false;
The route handler performs these steps in order:
OTel initialization — lazily starts the Langfuse bridge on first request
Body parsing — requires valid JSON; returns 400 if malformed
Zod validation — requires prompt to be a non-empty string; returns 400 with issue messages on failure
Scope resolution — combines request body fields (scopeType, scopeKey) with x-budget-scope-type / x-budget-scope-key headers
Budget check — calls interceptor.beforeStep() which checks remaining budget, applies auto-downgrade rules, and filters expensive tools
LLM call — invokes Azure OpenAI with the model suggested by the interceptor
Cost recording — records actual spend via both interceptor.afterStep() and budgetController.record()
Response headers — attaches X-Budget-Remaining and X-Budget-Status to every success response
Error cases return HTTP 402 (budget exceeded), 503 (Azure OpenAI error), or 500 (generic error). Failed requests are still recorded (with cost 0) so you can audit blocked attempts.
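The status-code mapping and response headers described above can be sketched independently of Next.js. In the example below, `BudgetExceededError`, the `RouteResult` shape, and the header value formatting are assumptions; the real route may detect a blocked request from the interceptor's return value rather than from a thrown error.

```typescript
// Hypothetical error types standing in for whatever the route catches.
class BudgetExceededError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "BudgetExceededError";
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class AzureOpenAiError extends Error {
  readonly status: number;
  constructor(message: string, status: number) {
    super(message);
    this.name = "AzureOpenAiError";
    this.status = status;
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

interface RouteResult {
  status: number;
  headers: Record<string, string>;
  body: Record<string, unknown>;
}

// 402 for an exhausted budget, 503 for upstream failures, 500 otherwise.
function errorResult(err: unknown): RouteResult {
  if (err instanceof BudgetExceededError) {
    return { status: 402, headers: {}, body: { error: err.message } };
  }
  if (err instanceof AzureOpenAiError) {
    return {
      status: 503,
      headers: {},
      body: { error: err.message, upstreamStatus: err.status },
    };
  }
  return { status: 500, headers: {}, body: { error: "Internal server error" } };
}

// Success responses carry the two budget headers named in the steps above.
function successResult(reply: string, remaining: number, budgetStatus: string): RouteResult {
  return {
    status: 200,
    headers: {
      "X-Budget-Remaining": remaining.toFixed(6),
      "X-Budget-Status": budgetStatus,
    },
    body: { reply },
  };
}
```

Keeping this mapping in one function makes it easy to unit test each error path without starting a server.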
Step 10: Build the admin spend API routes
Create app/api/admin/spend/route.ts for listing all budget scopes:
These routes give you visibility into every budget scope’s spending. The list endpoint returns all budgets; the detail endpoint adds per-minute spend rate and a projected hourly total from the SpendStore.
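The per-minute rate and hourly projection are simple to derive from timestamped spend records. The sketch below shows one way to do it over a sliding window; the `SpendRecord` shape, the five-minute default window, and the linear extrapolation are assumptions, and the SpendStore may compute these figures differently.

```typescript
interface SpendRecord {
  timestampMs: number; // when the spend was recorded
  cost: number;        // USD
}

interface SpendRate {
  perMinute: number;
  projectedHourly: number;
}

// Averages spend over the most recent window and extrapolates
// that average linearly to one hour.
function spendRate(
  records: SpendRecord[],
  nowMs: number,
  windowMs: number = 5 * 60_000,
): SpendRate {
  const cutoff = nowMs - windowMs;
  const total = records
    .filter((r) => r.timestampMs > cutoff && r.timestampMs <= nowMs)
    .reduce((sum, r) => sum + r.cost, 0);
  const perMinute = total / (windowMs / 60_000);
  return { perMinute, projectedHourly: perMinute * 60 };
}
```

For example, $1.00 spent during the last five minutes gives a rate of $0.20 per minute and a projected $12.00 per hour. A short window reacts quickly to bursts but produces noisier projections.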
Note that Next.js 16 (this project uses 16.2.6) uses async params — the route handler destructures them with await params inside the function body.
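The async params pattern looks roughly like the sketch below. To keep the example self-contained it uses plain objects instead of NextRequest and NextResponse, and the dynamic segment names (`scopeType`, `scopeKey`) are assumptions about how the detail route is laid out.

```typescript
// Hypothetical context for a detail route with two dynamic segments.
// The key point is that `params` is a Promise, not a plain object.
interface SpendRouteContext {
  params: Promise<{ scopeType: string; scopeKey: string }>;
}

// Stand-in for a framework response object.
interface PlainResponse {
  status: number;
  body: Record<string, unknown>;
}

async function GET(_req: unknown, context: SpendRouteContext): Promise<PlainResponse> {
  // Params must be awaited before destructuring.
  const { scopeType, scopeKey } = await context.params;
  if (!scopeType || !scopeKey) {
    return { status: 400, body: { error: "scopeType and scopeKey are required" } };
  }
  return { status: 200, body: { scopeType, scopeKey } };
}
```

Destructuring `context.params` without `await` would yield undefined fields, since the value is a Promise.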
Step 11: Create the health server
Create src/server.ts — a lightweight Express server with a /health endpoint and graceful OpenTelemetry shutdown:
ts
import express from "express";
import { shutdownOtel } from "./services/langfuse-otel.js";

export function createHealthApp(): express.Express {
  const app = express();
  app.get("/health", (_req, res) => {
    res.json({ status: "ok" });
  });
  return app;
}

export function startHealthServer(port?: number): Promise<void> {
  return new Promise((resolve) => {
    const app = createHealthApp();
    const serverPort = port ?? Number(process.env.HEALTH_PORT ?? 3001);
    const server = app.listen(serverPort, () => {
      console.log(`Health server listening on port ${String(serverPort)}`);
      resolve();
    });

    const onSignal = () => {
      console.log("Shutting down health server...");
      shutdownOtel()
        .then(() => {
          server.close(() => process.exit(0));
        })
        .catch((err: unknown) => {
          console.error("Error during OTel shutdown:", err);
          server.close(() => process.exit(1));
        });
    };

    process.on("SIGTERM", onSignal);
    process.on("SIGINT", onSignal);
  });
}
This sidecar server is used for orchestration — container orchestrators (Kubernetes, etc.) can ping /health on port 3001 without triggering the budget engine.
Step 12: Write and run the tests
The project includes a comprehensive test suite covering the pricing provider, budget controller, Azure OpenAI service, Langfuse OTel bridge, route handlers, and integration flows. Here is one representative test — tests/services/azure-pricing.test.ts — that validates pricing calculations:
ts
import { describe, it, expect, vi, beforeEach, afterEach, type MockInstance } from "vitest";
import { AzurePricingProvider } from "../../src/services/azure-pricing.js";

describe("azure-pricing", () => {
  let warnSpy: MockInstance;

  beforeEach(() => {
    warnSpy = vi.spyOn(console, "warn").mockImplementation(() => {});
  });

  afterEach(() => {
    warnSpy.mockRestore();
  });

  it("estimateCost for gpt-4o: 1M input → $2.50 input + $5.00 estimated output = $7.50", () => {
    const provider = new AzurePricingProvider();
    const cost = provider.estimateCost("gpt-4o", 1_000_000);
    expect(cost).toBe(7.5);
  });

  it("estimateCost for gpt-4o-mini: 2K input estimates small output", () => {
    const provider = new AzurePricingProvider();
    const cost = provider.estimateCost("gpt-4o-mini", 2_000);
    expect(cost).toBeCloseTo(0.0009, 4);
  });

  it("unknown model falls back to gpt-4o pricing and logs warning", () => {
    const provider = new AzurePricingProvider();
    const cost = provider.estimateCost("unknown-model", 1_000_000);
    expect(cost).toBe(7.5);
    expect(warnSpy).toHaveBeenCalledWith(
      expect.stringContaining("unknown-model"),
    );
  });

  it("zero tokens returns 0", () => {
    const provider = new AzurePricingProvider();
    const cost = provider.estimateCost("gpt-4o", 0);
    expect(cost).toBe(0);
  });

  // ... full suite has 7 tests covering overrides, overflow, getModelPricing
});
Run the full test suite with coverage:
terminal
pnpm test
Expected output: All tests pass with coverage meeting the 90% threshold (lines, branches, functions, and statements) on runtime code under src/ and app/**/route.ts. Test files are located under tests/.
UI files (.tsx) and Next.js boilerplate files (layout.ts, error.ts, loading.ts, not-found.ts) are excluded from coverage.
Step 13: Start the server and make a test request
Run the Next.js development server:
terminal
pnpm dev
The server starts on the port configured in PORT (default 3000). In a separate terminal, make a test request:
terminal
curl -X POST http://localhost:3000/api/chat \
  -H "Content-Type: application/json" \
  -H "x-budget-scope-type: user" \
  -H "x-budget-scope-key: team-alpha" \
  -d '{"prompt": "Hello, how do budgets work?"}'
If Azure OpenAI is configured, you’ll get a response like:
json
{
  "reply": "Budgets help you control costs across models...",
  "cost": 0.000175,
  "modelId": "gpt-4o",
  "usage": { "inputTokens": 10, "outputTokens": 55 }
}
Define budget limits programmatically — call defineBudget() at startup with scopeType, scopeKey, limit, and a policy object that sets softCap, hardCap, and optional autoDowngrade rules to downgrade expensive models when budgets approach exhaustion
Connect Langfuse dashboards — once LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY are set, every chat request generates OpenTelemetry spans visible in Langfuse’s cost dashboards, showing spend per model, per scope, and per deployment over time
Custom scope extractors for OTel — pass a scopeExtractor function to the SpanListener constructor to map custom span attributes (like myapp.user_id) to budget scopes, enabling cost attribution from your existing telemetry
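As an illustration of the first item above, here is a sketch of what a budget definition with a soft cap, a hard cap, and an auto-downgrade rule might look like, together with a function that applies it. The field names follow the text, but the exact shapes are assumptions: in particular, treating softCap and hardCap as fractions of the limit and modelling autoDowngrade as a list of from/to pairs are choices made for this example, not the documented defineBudget() contract.

```typescript
// Hypothetical shapes; check the package types before relying on them.
interface AutoDowngradeRule {
  from: string; // model requested by the caller
  to: string;   // cheaper model to use once the soft cap is reached
}

interface BudgetPolicy {
  softCap: number; // assumed: fraction of limit at which downgrades begin
  hardCap: number; // assumed: fraction of limit at which requests are blocked
  autoDowngrade?: AutoDowngradeRule[];
}

interface BudgetDefinition {
  scopeType: string;
  scopeKey: string;
  limit: number; // USD
  policy: BudgetPolicy;
}

const teamAlpha: BudgetDefinition = {
  scopeType: "user",
  scopeKey: "team-alpha",
  limit: 50,
  policy: {
    softCap: 0.8,
    hardCap: 1.0,
    autoDowngrade: [{ from: "gpt-4o", to: "gpt-4o-mini" }],
  },
};

// Returns the model to use, or undefined when the request should be blocked.
function pickModel(requested: string, spent: number, def: BudgetDefinition): string | undefined {
  const used = spent / def.limit;
  if (used >= def.policy.hardCap) return undefined;
  if (used < def.policy.softCap) return requested;
  const rule = def.policy.autoDowngrade?.find((r) => r.from === requested);
  return rule ? rule.to : requested;
}
```

With this definition, a gpt-4o request is served as requested until $40 has been spent, is rerouted to gpt-4o-mini between $40 and $50, and is blocked at $50.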
function ensureOtel(): void {
  if (!otelInitialized) {
    initLangfuseOtel(budgetController);
    otelInitialized = true;
  }
}

const RequestSchema = z.object({
  prompt: z.string().min(1, "prompt is required"),
  scopeType: z.string().optional(),
  scopeKey: z.string().optional(),
  modelId: z.string().optional(),
  tools: z.array(z.string()).optional(),
});

export async function POST(req: NextRequest): Promise<NextResponse> {
  ensureOtel();
  let body: unknown;
  try {
    body = (await req.json()) as Record<string, unknown>;