# Azure AI Spend Control for Multi-Model SMB Workflows
Real-time budget enforcement and cost telemetry for Azure AI deployments across multiple models, preventing runaway spend.
## Problem
Small businesses using Azure AI services often lose control of per-model, per-session costs, especially when orchestrating multiple models for complex workflows. Unexpected overages hurt margins.
## How It Works
Next.js API server → `/api/chat` → BudgetController checks spend → Azure OpenAI call → spend recorded → OTel spans exported to Langfuse
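The flow above can be sketched roughly as follows. This is a minimal in-memory illustration, not the project's implementation: only the `BudgetController` name comes from this README. Its methods (`remaining`, `record`), the `handleChat` function, and the injected `callModel` stand-in for the Azure OpenAI call are hypothetical, and the OTel export step is left out.

```typescript
// Hypothetical sketch: method names and shapes are assumptions, not the project's API.
type ScopeType = "user" | "org" | "session" | "task";

class BudgetController {
  private readonly spent = new Map<string, number>();
  private readonly limitUsd: number;

  constructor(limitUsd: number) {
    this.limitUsd = limitUsd;
  }

  // Remaining budget for a scope, never below zero.
  remaining(scopeType: ScopeType, scopeKey: string): number {
    const used = this.spent.get(`${scopeType}:${scopeKey}`) ?? 0;
    return Math.max(0, this.limitUsd - used);
  }

  // Add the cost of a completed call to the scope's running total.
  record(scopeType: ScopeType, scopeKey: string, costUsd: number): void {
    const id = `${scopeType}:${scopeKey}`;
    this.spent.set(id, (this.spent.get(id) ?? 0) + costUsd);
  }
}

interface ChatResult {
  reply: string;
  cost: number;
}

type ChatResponse =
  | { status: 200; body: ChatResult }
  | { status: 402; body: { error: string; remaining: number } };

// Check the budget, call the model, then record what the call cost.
async function handleChat(
  budget: BudgetController,
  scopeType: ScopeType,
  scopeKey: string,
  prompt: string,
  callModel: (prompt: string) => Promise<ChatResult>,
): Promise<ChatResponse> {
  const remaining = budget.remaining(scopeType, scopeKey);
  if (remaining <= 0) {
    return { status: 402, body: { error: "Budget exceeded", remaining } };
  }
  const result = await callModel(prompt);
  budget.record(scopeType, scopeKey, result.cost);
  return { status: 200, body: result };
}
```

Injecting `callModel` keeps the budget logic testable without a live Azure OpenAI deployment.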
## Features
- Real-time budget checks per scope (user/org/session/task)
- Auto-downgrade to cheaper models when budgets tighten
- Expensive tool filtering
- OpenTelemetry spend export to Langfuse
- Admin dashboard for per-team spend summaries
- Azure OpenAI integration with multiple deployment support
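As a rough illustration of the auto-downgrade and tool-filtering features listed above, the sketch below picks a model and trims the tool list based on the fraction of budget left. The 20% threshold, the `gpt-4o` to `gpt-4o-mini` mapping, the tool names, and the function names are all assumptions for illustration. The project's actual rules may differ.

```typescript
// Hypothetical threshold and mappings; adjust to your own deployments and pricing.
const DOWNGRADE_BELOW_FRACTION = 0.2;
const CHEAPER_MODEL: Record<string, string> = { "gpt-4o": "gpt-4o-mini" };
const EXPENSIVE_TOOLS: ReadonlySet<string> = new Set(["web_search", "code_interpreter"]);

// Keep the requested model while budget is healthy; otherwise fall back to a
// cheaper model if one is mapped, or leave the request unchanged.
function pickModel(requestedModelId: string, remainingFraction: number): string {
  if (remainingFraction >= DOWNGRADE_BELOW_FRACTION) return requestedModelId;
  return CHEAPER_MODEL[requestedModelId] ?? requestedModelId;
}

// Drop tools flagged as expensive once the budget is tight.
function filterTools(tools: readonly string[], remainingFraction: number): string[] {
  if (remainingFraction >= DOWNGRADE_BELOW_FRACTION) return [...tools];
  return tools.filter((tool) => !EXPENSIVE_TOOLS.has(tool));
}
```

With 10% of the budget left, `pickModel("gpt-4o", 0.1)` returns `"gpt-4o-mini"` and `filterTools(["web_search", "calculator"], 0.1)` returns `["calculator"]`.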
## Prerequisites
- Node.js >=22
- pnpm
- Azure OpenAI deployment (with API key and endpoint)
- Langfuse account (for OTel observability)
## Getting Started
1. Copy `.env.example` to `.env` and fill in your values
2. `pnpm install`
3. `pnpm dev`
## API Reference
### POST /api/chat
Send a chat prompt through the budget-controlled Azure OpenAI pipeline.
**Request headers:**
- `x-budget-scope-type`: One of `user`, `org`, `session`, `task` (default: `user`)
- `x-budget-scope-key`: Scope identifier (default: `default`)
**Request body:**
```json
{
  "prompt": "string (required)",
  "modelId": "string (optional, default: gpt-4o)",
  "tools": ["string array (optional)"]
}
```
**Response (200):**
```json
{
  "reply": "string",
  "cost": 0.0,
  "modelId": "string",
  "usage": { "inputTokens": 0, "outputTokens": 0 }
}
```
**Response (402):** Budget exceeded. Body: `{ "error": "Budget exceeded", "remaining": 0.0 }`
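A client call to this endpoint might be assembled as in the sketch below. The header names and body fields come from the reference above. The `buildChatRequest` helper, its return shape, and the `content-type` header are assumptions for illustration.

```typescript
type ScopeType = "user" | "org" | "session" | "task";

interface ChatRequestBody {
  prompt: string;
  modelId?: string;
  tools?: string[];
}

interface ChatRequest {
  path: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Hypothetical helper: builds the path and fetch options for POST /api/chat.
function buildChatRequest(
  scopeType: ScopeType,
  scopeKey: string,
  body: ChatRequestBody,
): ChatRequest {
  return {
    path: "/api/chat",
    init: {
      method: "POST",
      headers: {
        "content-type": "application/json", // assumed; the body is documented as JSON
        "x-budget-scope-type": scopeType,
        "x-budget-scope-key": scopeKey,
      },
      body: JSON.stringify(body),
    },
  };
}
```

Assuming the dev server listens on `http://localhost:3000` (the Next.js default), the result can be passed to `fetch` as `fetch("http://localhost:3000" + path, init)`. A `402` status then signals that the scope's budget is exhausted.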
### GET /api/admin/spend
List all budget scopes with spend summaries.
**Response (200):**
```json
{
  "budgets": [
    { "scopeType": "user", "scopeKey": "team-alpha", "spent": 0.0, "limit": 10.0, "remaining": 10.0, "state": "Active" }
  ]
}
```
### GET /api/admin/spend/:scopeType/:scopeKey
Get detailed spend for a specific scope including rate and projection.
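The response shape for this endpoint is not documented here. As one possible reading of "rate and projection", the sketch below computes a spend rate over a trailing window and projects the time until the limit is reached. `SpendEvent`, `projectSpend`, the field names, and the one-hour default window are all hypothetical.

```typescript
interface SpendEvent {
  atMs: number; // timestamp of the recorded call, in epoch milliseconds
  costUsd: number;
}

interface SpendProjection {
  spent: number;
  remaining: number;
  ratePerHour: number;
  hoursToExhaustion: number; // Infinity when nothing was spent in the window
}

const HOUR_MS = 60 * 60 * 1000;

// Hypothetical calculation: rate is the spend inside the trailing window,
// scaled to one hour; the projection is a linear extrapolation of that rate.
function projectSpend(
  events: readonly SpendEvent[],
  limitUsd: number,
  nowMs: number,
  windowMs: number = HOUR_MS,
): SpendProjection {
  const spent = events.reduce((sum, e) => sum + e.costUsd, 0);
  const windowSpend = events
    .filter((e) => e.atMs > nowMs - windowMs && e.atMs <= nowMs)
    .reduce((sum, e) => sum + e.costUsd, 0);
  const ratePerHour = windowSpend / (windowMs / HOUR_MS);
  const remaining = Math.max(0, limitUsd - spent);
  const hoursToExhaustion = ratePerHour > 0 ? remaining / ratePerHour : Infinity;
  return { spent, remaining, ratePerHour, hoursToExhaustion };
}
```

A linear extrapolation is the simplest choice. Bursty workloads may call for a smoothed rate instead.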
## Budget Scopes & Headers
Budget enforcement uses a per-scope state machine:
`Active → Warned → Degraded → Stopped`
Send `x-budget-scope-type` and `x-budget-scope-key` headers to associate each request with a budget scope.
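The state progression can be illustrated with a small sketch. The state names come from this README. The 80% and 95% thresholds, the function names, and the forward-only rule are assumptions, and the project may define transitions differently (for example, resetting at the start of a budget period).

```typescript
type BudgetState = "Active" | "Warned" | "Degraded" | "Stopped";

const ORDER: readonly BudgetState[] = ["Active", "Warned", "Degraded", "Stopped"];

// Hypothetical thresholds, expressed as the fraction of the limit already spent.
const WARN_AT = 0.8;
const DEGRADE_AT = 0.95;

// Map current spend to the state it implies on its own.
function stateForSpend(spent: number, limit: number): BudgetState {
  const used = limit > 0 ? spent / limit : 1; // a zero limit counts as exhausted
  if (used >= 1) return "Stopped";
  if (used >= DEGRADE_AT) return "Degraded";
  if (used >= WARN_AT) return "Warned";
  return "Active";
}

// Advance a scope's state; in this sketch a scope never moves backwards.
function advance(current: BudgetState, spent: number, limit: number): BudgetState {
  const target = stateForSpend(spent, limit);
  return ORDER.indexOf(target) > ORDER.indexOf(current) ? target : current;
}
```

Under these assumptions, a scope with a limit of 10 moves to `Warned` at 8.5 spent, `Degraded` at 9.7, and `Stopped` at 10.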
## Observability
Spans are routed to Langfuse via the OpenTelemetry bridge. View cost dashboards and traces at https://cloud.langfuse.com
## Testing
```bash
pnpm test # runs vitest with coverage
pnpm typecheck # TypeScript type checking
pnpm lint # ESLint
```
Coverage target: >=90% on runtime code (services, middleware, route handlers).