Azure AI Spend Control for Multi-Model SMB Workflows

Real-time budget enforcement and cost telemetry for Azure AI deployments across multiple models, preventing runaway spend.

azure-ai cost-control budget-enforcement multi-model express langfuse open-telemetry

The problem

Small businesses using Azure AI services often lose control of per-model, per-session costs, especially when orchestrating multiple models for complex workflows. Unexpected overages hurt margins.

Intro

In this tutorial, you’ll build a budget-controlled Azure AI chat server that prevents runaway spend across multiple models. You’ll use the REAA budget engine to check every chat request against per-scope spending limits, auto-downgrade to cheaper models when budgets tighten, and export cost telemetry to Langfuse via OpenTelemetry. By the end, you’ll have a working Next.js app with two API routes — one for budget-controlled chat and one for admin spend summaries — plus a sidecar health server for orchestration.
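The check-then-downgrade flow described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the REAA budget engine's API: the names ScopeBudget, Decision, and chooseModel, and the 80% soft threshold, are all assumptions made for this example.

```typescript
// Hypothetical sketch: none of these names come from @reaatech/agent-budget-engine.
type Decision =
  | { action: "allow"; model: string }
  | { action: "downgrade"; model: string }
  | { action: "block" };

interface ScopeBudget {
  limit: number; // spending limit for the scope
  spent: number; // spend recorded so far in the same unit
}

interface Candidate {
  model: string;
  estimatedCost: number;
}

// Assumption: start downgrading once projected spend passes 80% of the limit.
const SOFT_THRESHOLD = 0.8;

function chooseModel(budget: ScopeBudget, requested: Candidate, cheaper: Candidate): Decision {
  const projected = budget.spent + requested.estimatedCost;
  if (projected <= budget.limit * SOFT_THRESHOLD) {
    return { action: "allow", model: requested.model };
  }
  // Budget is tight: use the cheaper model if it still fits under the hard limit.
  if (budget.spent + cheaper.estimatedCost <= budget.limit) {
    return { action: "downgrade", model: cheaper.model };
  }
  return { action: "block" };
}
```

In this sketch a request is blocked only when even the cheaper model would exceed the limit, so a tightening budget degrades quality before it denies service.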

Prerequisites

Node.js >= 22 (the engines field in package.json enforces this)
pnpm 10.x (the project uses pnpm@10.0.0 as its package manager)
Azure OpenAI deployment — you need an endpoint URL, an API key, and a deployment name (default: gpt-4o)
A Langfuse account — for OpenTelemetry-based cost dashboards (optional but recommended)
Familiarity with TypeScript and Next.js App Router conventions
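The Azure and Langfuse credentials listed above are read from environment variables by the service and OTel code in this tutorial. A startup check along the following lines could surface missing values early. The variable names are the ones that code reads; the checkEnv helper itself is a hypothetical addition, not part of the project.

```typescript
// Required: the Azure OpenAI service throws if either of these is empty.
const REQUIRED = ["AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY"] as const;

// Optional: the code falls back to defaults, or skips Langfuse export, when unset.
const OPTIONAL = [
  "AZURE_OPENAI_DEPLOYMENT",
  "AZURE_OPENAI_API_VERSION",
  "LANGFUSE_PUBLIC_KEY",
  "LANGFUSE_SECRET_KEY",
  "LANGFUSE_BASE_URL",
  "OTEL_SERVICE_NAME",
] as const;

// Hypothetical helper: reports which variables are absent or empty.
function checkEnv(env: Record<string, string | undefined>): { missing: string[]; unset: string[] } {
  const missing = REQUIRED.filter((name) => !env[name]);
  const unset = OPTIONAL.filter((name) => !env[name]);
  return { missing: [...missing], unset: [...unset] };
}
```

Calling checkEnv(process.env) at startup and failing when missing is non-empty would be one way to use it; entries in unset are informational only.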

Step 1: Scaffold the project and install dependencies

Create a new Next.js project and install all the required dependencies. This project uses the Next.js App Router with the src/ directory layout and vitest for testing.

terminal

npx create-next-app@latest azure-ai-spend-control --typescript --app --src-dir --no-tailwind --import-alias "@/*"

Example artifact

A complete, working implementation of this recipe — downloadable as a zip or browsable file by file. Generated by our build pipeline; tested with full coverage before publishing.

156 kB·79 tests·98.2% coverage·vitest passing

SHA-256: bc4de0d685099cb5f47ab6447bd86a559dd92597c42dbcb8fdcf74bf02047c15

Step 4: Create the Azure pricing provider

import type { PricingProvider } from "@reaatech/agent-budget-engine";
import type { AzureModelPricing } from "../types.js";

// Built-in prices per 1M tokens; constructor overrides replace entries by modelId.
const DEFAULT_PRICING: AzureModelPricing[] = [
  { modelId: "gpt-4o", inputPricePer1M: 2.50, outputPricePer1M: 10.00 },
  { modelId: "gpt-4o-mini", inputPricePer1M: 0.15, outputPricePer1M: 0.60 },
  { modelId: "gpt-4", inputPricePer1M: 30.00, outputPricePer1M: 60.00 },
  { modelId: "gpt-4-turbo", inputPricePer1M: 10.00, outputPricePer1M: 30.00 },
];

export class AzurePricingProvider implements PricingProvider {
  private pricing: Map<string, AzureModelPricing>;

  constructor(overrides?: AzureModelPricing[]) {
    this.pricing = new Map();
    for (const p of DEFAULT_PRICING) {
      this.pricing.set(p.modelId, p);
    }
    if (overrides) {
      for (const p of overrides) {
        this.pricing.set(p.modelId, p);
      }
    }
  }

  // Output size is unknown before the call, so the estimate assumes
  // half as many output tokens as input tokens.
  estimateCost(modelId: string, estimatedInputTokens: number, _provider?: string): number {
    void _provider;
    const pricing = this.pricing.get(modelId);
    if (!pricing) {
      // Unknown model: warn and price the request as gpt-4o.
      const fallback = this.pricing.get("gpt-4o");
      console.warn(`AzurePricingProvider: unknown model "${modelId}", falling back to gpt-4o pricing`);
      const fp = fallback ?? { inputPricePer1M: 2.50, outputPricePer1M: 10.00 };
      const estimatedOutputTokens = Math.round(estimatedInputTokens * 0.5);
      return (estimatedInputTokens / 1_000_000) * fp.inputPricePer1M
        + (estimatedOutputTokens / 1_000_000) * fp.outputPricePer1M;
    }
    const estimatedOutputTokens = Math.round(estimatedInputTokens * 0.5);
    const inputCost = (estimatedInputTokens / 1_000_000) * pricing.inputPricePer1M;
    const outputCost = (estimatedOutputTokens / 1_000_000) * pricing.outputPricePer1M;
    return inputCost + outputCost;
  }

  getModelPricing(modelId: string): AzureModelPricing | undefined {
    return this.pricing.get(modelId);
  }
}

export function createDefaultPricingProvider(): AzurePricingProvider {
  return new AzurePricingProvider();
}
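To make the estimateCost arithmetic concrete, here is a small worked example that mirrors the same formula outside the class. The standalone estimate function is a hypothetical stand-in for illustration; the prices are the gpt-4o and gpt-4o-mini defaults from the pricing table.

```typescript
// Mirrors the estimateCost formula: output tokens are assumed to be half the input tokens.
function estimate(inputTokens: number, inputPricePer1M: number, outputPricePer1M: number): number {
  const outputTokens = Math.round(inputTokens * 0.5);
  return (inputTokens / 1_000_000) * inputPricePer1M
    + (outputTokens / 1_000_000) * outputPricePer1M;
}

// 10,000 input tokens imply 5,000 estimated output tokens.
const gpt4o = estimate(10_000, 2.5, 10.0);      // 0.025 input + 0.05 output
const gpt4oMini = estimate(10_000, 0.15, 0.6);  // 0.0015 input + 0.003 output

console.log(gpt4o.toFixed(4), gpt4oMini.toFixed(4)); // prints "0.0750 0.0045"
```

At these default prices the same request is estimated at well under a tenth of the cost on gpt-4o-mini, which is the gap an auto-downgrade relies on.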

Step 7: Create the Azure OpenAI service

import OpenAI from "openai";
import type { AzureOpenAiConfig } from "../types.js";

// Error type that carries the HTTP status (0 when no status is available).
export class AzureOpenAiError extends Error {
  constructor(
    message: string,
    public readonly statusCode: number,
  ) {
    super(message);
    this.name = "AzureOpenAiError";
  }
}

type ChatMessage = OpenAI.Chat.ChatCompletionMessageParam;

export class AzureOpenAiService {
  private client: OpenAI;
  private deployment: string;

  constructor(config: AzureOpenAiConfig) {
    if (!config.endpoint || !config.apiKey) {
      throw new AzureOpenAiError("AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_API_KEY are required", 0);
    }
    this.deployment = config.deployment;
    // Note: the base URL is built from the configured deployment; a per-call
    // deployment is only sent as the "model" field of the request body.
    this.client = new OpenAI({
      apiKey: config.apiKey,
      baseURL: `${config.endpoint}/openai/deployments/${config.deployment}`,
      defaultQuery: { "api-version": config.apiVersion },
      defaultHeaders: { "api-key": config.apiKey },
    });
  }

  async chatCompletion(args: {
    deployment?: string;
    messages: Array<{ role: string; content: string }>;
    tools?: string[];
  }): Promise<{ text: string; usage: { promptTokens: number; completionTokens: number } }> {
    const deployment = args.deployment ?? this.deployment;
    try {
      const response = await this.client.chat.completions.create({
        model: deployment,
        messages: args.messages as ChatMessage[],
      });
      const choice = response.choices[0] as { message: { content: string | null } } | undefined;
      const text = choice ? choice.message.content ?? "" : "";
      return {
        text,
        usage: {
          promptTokens: response.usage?.prompt_tokens ?? 0,
          completionTokens: response.usage?.completion_tokens ?? 0,
        },
      };
    } catch (err: unknown) {
      // Normalize SDK errors and anything else into AzureOpenAiError.
      if (err instanceof OpenAI.APIError) {
        throw new AzureOpenAiError(err.message, err.status as number);
      }
      throw new AzureOpenAiError(
        err instanceof Error ? err.message : "Unknown Azure OpenAI error",
        0,
      );
    }
  }
}

// Builds the service from environment variables, applying the defaults shown.
export function createAzureOpenAiService(): AzureOpenAiService {
  const endpoint = process.env.AZURE_OPENAI_ENDPOINT ?? "";
  const apiKey = process.env.AZURE_OPENAI_API_KEY ?? "";
  const deployment = process.env.AZURE_OPENAI_DEPLOYMENT ?? "gpt-4o";
  const apiVersion = process.env.AZURE_OPENAI_API_VERSION ?? "2025-01-01-preview";
  return new AzureOpenAiService({ endpoint, apiKey, deployment, apiVersion });
}
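Both the pricing provider and the Azure OpenAI service import their types from ../types.js, which is not reproduced in this section. Judging only from how the fields are used in the code above, the two types would look roughly like this. Treat it as an inferred sketch: the real module may declare more fields or different optionality, and the endpoint value is a placeholder.

```typescript
// Inferred from usage in the pricing provider; not copied from the project's types module.
interface AzureModelPricing {
  modelId: string;
  inputPricePer1M: number;
  outputPricePer1M: number;
}

// Inferred from the AzureOpenAiService constructor and createAzureOpenAiService.
interface AzureOpenAiConfig {
  endpoint: string;
  apiKey: string;
  deployment: string;
  apiVersion: string;
}

// Example values: the deployment and API version are the defaults used by
// createAzureOpenAiService; the endpoint and key are placeholders.
const exampleConfig: AzureOpenAiConfig = {
  endpoint: "https://example.invalid",
  apiKey: "<your-api-key>",
  deployment: "gpt-4o",
  apiVersion: "2025-01-01-preview",
};

const examplePricing: AzureModelPricing = {
  modelId: "gpt-4o-mini",
  inputPricePer1M: 0.15,
  outputPricePer1M: 0.6,
};
```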

Step 8: Set up the Langfuse OTel bridge

import { SpanListener } from "@reaatech/agent-budget-otel-bridge";
import { NodeTracerProvider, BatchSpanProcessor } from "@opentelemetry/sdk-trace-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";
import type { BudgetController } from "@reaatech/agent-budget-engine";
import type { ReadableSpan, SpanProcessor } from "@opentelemetry/sdk-trace-node";
import { resourceFromAttributes } from "@opentelemetry/resources";

export type ControllerLike = Pick<BudgetController, "check" | "record" | "getState" | "defineBudget" | "listAll" | "on">;

const ATTR_SERVICE_NAME = "service.name";

let provider: NodeTracerProvider | undefined;
let listener: SpanListener | undefined;
let initialized = false;

// Forwards the attributes of every finished span to the budget listener.
export const budgetProcessor: SpanProcessor = {
  onEnd(span: ReadableSpan): void {
    listener?.onSpanEnd(span.attributes);
  },
  forceFlush: () => Promise.resolve(),
  shutdown: () => Promise.resolve(),
  onStart: () => {
    return;
  },
};

// Idempotent: repeated calls return the listener created by the first call.
export function initLangfuseOtel(controller: ControllerLike): SpanListener {
  if (initialized && listener) return listener;
  initialized = true;

  const publicKey = process.env.LANGFUSE_PUBLIC_KEY ?? "";
  const secretKey = process.env.LANGFUSE_SECRET_KEY ?? "";
  const baseUrl = process.env.LANGFUSE_BASE_URL ?? "https://cloud.langfuse.com";
  const serviceName = process.env.OTEL_SERVICE_NAME ?? "azure-ai-spend-control";
  const controllerForBridge: unknown = controller;

  // Without credentials, keep the listener but register no exporter.
  if (!publicKey || !secretKey) {
    console.warn("Langfuse credentials not configured — skipping OTel export");
    listener = new SpanListener({ controller: controllerForBridge as BudgetController });
    return listener;
  }

  // Langfuse accepts OTLP over HTTP with Basic auth built from the key pair.
  const exporter = new OTLPTraceExporter({
    url: `${baseUrl}/api/public/otel/v1/traces`,
    headers: {
      Authorization: `Basic ${Buffer.from(`${publicKey}:${secretKey}`).toString("base64")}`,
    },
  });

  provider = new NodeTracerProvider({
    resource: resourceFromAttributes({
      [ATTR_SERVICE_NAME]: serviceName,
    }),
    spanProcessors: [
      new BatchSpanProcessor(exporter),
      budgetProcessor,
    ],
  });
  provider.register();

  listener = new SpanListener({ controller: controllerForBridge as BudgetController });
  return listener;
}

export async function shutdownOtel(): Promise<void> {
  if (provider) {
    await provider.shutdown();
    provider = undefined;
    initialized = false;
  }
}
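The exporter settings are the part of the bridge most likely to be misconfigured, so it can help to isolate them. The helper below rebuilds the same URL and Basic auth header that initLangfuseOtel constructs inline. The function name langfuseExporterConfig is hypothetical; it assumes a Node.js runtime, where Buffer is a global.

```typescript
// Hypothetical helper: same URL and header shape as the inline exporter setup.
function langfuseExporterConfig(
  baseUrl: string,
  publicKey: string,
  secretKey: string,
): { url: string; headers: { Authorization: string } } {
  // Basic auth: base64 of "publicKey:secretKey".
  const token = Buffer.from(`${publicKey}:${secretKey}`).toString("base64");
  return {
    url: `${baseUrl}/api/public/otel/v1/traces`,
    headers: { Authorization: `Basic ${token}` },
  };
}

// Placeholder keys, for illustration only.
const exporterConfig = langfuseExporterConfig("https://cloud.langfuse.com", "pk-example", "sk-example");
console.log(exporterConfig.url);
```

The returned object has the shape of the options the code above passes to OTLPTraceExporter, so a unit test can check the header without opening a network connection.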
