Files · Azure AI Spend Control for Multi-Model SMB Workflows

66 (1 binary, 517.0 kB total)attempt 1

README.md·2747 B·markdown

markdown

# Azure AI Spend Control for Multi-Model SMB Workflows
 
Real-time budget enforcement and cost telemetry for Azure AI deployments across multiple models, preventing runaway spend.
 
## Problem
 
Small businesses using Azure AI services often lose control of per-model, per-session costs, especially when orchestrating multiple models for complex workflows. Unexpected overages hurt margins.
 
## How It Works
 
Next.js API server → `/api/chat` → BudgetController checks spend → Azure OpenAI call → spend recorded → OTel spans exported to Langfuse
 
## Features
 
- Real-time budget checks per scope (user/org/session/task)
- Auto-downgrade to cheaper models when budgets tighten
- Expensive tool filtering
- OpenTelemetry spend export to Langfuse
- Admin dashboard for per-team spend summaries
- Azure OpenAI integration with multiple deployment support
 
## Prerequisites
 
- Node.js >=22
- pnpm
- Azure OpenAI deployment (with API key and endpoint)
- Langfuse account (for OTel observability)
 
## Getting Started
 
1. Copy `.env.example` to `.env` and fill in your values
2. `pnpm install`
3. `pnpm dev`
 
## API Reference
 
### POST /api/chat
 
Send a chat prompt through the budget-controlled Azure OpenAI pipeline.
 
**Request headers:**
- `x-budget-scope-type`: One of `user`, `org`, `session`, `task` (default: `user`)
- `x-budget-scope-key`: Scope identifier (default: `default`)
 
**Request body:**
```json
{
  "prompt": "string (required)",
  "modelId": "string (optional, default: gpt-4o)",
  "tools": ["string array (optional)"]
}
```
 
**Response (200):**
```json
{
  "reply": "string",
  "cost": 0.0,
  "modelId": "string",
  "usage": { "inputTokens": 0, "outputTokens": 0 }
}
```
 
**Response (402):** Budget exceeded — `{ "error": "Budget exceeded", "remaining": 0.0 }`
 
### GET /api/admin/spend
 
List all budget scopes with spend summaries.
 
**Response (200):**
```json
{
  "budgets": [
    { "scopeType": "user", "scopeKey": "team-alpha", "spent": 0.0, "limit": 10.0, "remaining": 10.0, "state": "Active" }
  ]
}
```
 
### GET /api/admin/spend/:scopeType/:scopeKey
 
Get detailed spend for a specific scope including rate and projection.
 
## Budget Scopes & Headers
 
The budget enforcement uses a state machine per scope:
 
`Active → Warned → Degraded → Stopped`
 
Send `x-budget-scope-type` and `x-budget-scope-key` headers to associate each request with a budget scope.
 
## Observability
 
Spans are routed to Langfuse via the OpenTelemetry bridge. View cost dashboards and traces at https://cloud.langfuse.com
 
## Testing
 
```bash
pnpm test        # runs vitest with coverage
pnpm typecheck   # TypeScript type checking
pnpm lint        # ESLint
```
 
Coverage target: >=90% on runtime code (services, middleware, route handlers).