# Databricks AI Spend Control for Budget-Conscious SMBs
> Enforce per-agent LLM budgets and automatically downgrade models when costs exceed thresholds to keep SMB AI operations within budget.
A tutorial-style reference solution from [reaatech.com](https://reaatech.com) that demonstrates how to build production-grade AI systems with the `@reaatech/*` package family.
## Problem description
SMBs deploying Databricks-powered AI agents face unpredictable LLM costs. Without automated spend controls, they risk overspending or service disruption.
## Architecture
```
┌─────────────────────────────────────────────────────────────┐
│                     Express Middleware                      │
│             (@reaatech/agent-budget-middleware)             │
│                                                             │
│  POST /agent/chat ─► budget check ─► route to LLM           │
│            │                                                │
│            ├──► @reaatech/agent-budget-engine               │
│            │    └── per-agent budgets from EDGE_CONFIG      │
│            │                                                │
│            ├──► @reaatech/llm-router-engine                 │
│            │    └── DBRX (primary) → Mixtral (fallback)     │
│            │                                                │
│            ├──► @reaatech/llm-cost-telemetry-calculator     │
│            │    └── real-time cost tracking per request     │
│            │                                                │
│            └──► @reaatech/agent-budget-spend-tracker        │
│                 └── multi-tenant spend aggregator (Redis)   │
│                                                             │
│  POST /webhook/helicone ─► spend-tracker update             │
└──────────────────────────┬──────────────────────────────────┘
                           │
                    ┌──────▼──────┐
                    │    Redis    │
                    │   (spend)   │
                    └──────┬──────┘
                           │
                  ┌────────▼────────┐
                  │   Next.js App   │
                  │    Dashboard    │
                  │  (reads Redis)  │
                  └─────────────────┘
Helicone Async Tracing ───────────────────► all request/response data
```
Key flow:
1. Client sends `POST /agent/chat` to the Express middleware
2. `@reaatech/agent-budget-middleware` checks the agent's current spend via `@reaatech/agent-budget-engine` (budget limits from Vercel Edge Config)
3. If within budget, `@reaatech/llm-router-engine` routes the request — DBRX (primary, higher cost) or Mixtral (fallback, lower cost) based on budget state
4. `@reaatech/llm-cost-telemetry-calculator` computes real-time costs for each request
5. `@reaatech/agent-budget-spend-tracker` aggregates multi-tenant spend in Redis
6. Helicone asynchronously traces all requests; cost events arrive via `POST /webhook/helicone`
7. Next.js dashboard reads spend state directly from Redis
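Steps 4 and 5 of the flow can be sketched as follows. This is a minimal illustration, not the API of the `@reaatech/*` packages: `requestCost`, `SpendLedger`, and the per-million-token rates are hypothetical names and placeholder values, and an in-memory `Map` stands in for Redis.

```typescript
// Hypothetical per-million-token rates. These are placeholders, not real prices.
interface ModelRate {
  inputPerMillion: number;
  outputPerMillion: number;
}

const PLACEHOLDER_RATES: Record<string, ModelRate> = {
  dbrx: { inputPerMillion: 2, outputPerMillion: 6 },
  mixtral: { inputPerMillion: 0.5, outputPerMillion: 1 },
};

// Step 4: compute the cost of one request from its token counts.
function requestCost(modelId: string, inputTokens: number, outputTokens: number): number {
  const rate = PLACEHOLDER_RATES[modelId];
  if (!rate) {
    throw new Error(`No rate registered for model "${modelId}"`);
  }
  return (inputTokens * rate.inputPerMillion + outputTokens * rate.outputPerMillion) / 1e6;
}

// Step 5: aggregate spend per scope. An in-memory stand-in for the Redis store.
class SpendLedger {
  private totals = new Map<string, number>();

  // Adds a cost to the scope's running total and returns the new total.
  add(scopeType: string, scopeKey: string, cost: number): number {
    const key = `${scopeType}:${scopeKey}`;
    const next = (this.totals.get(key) ?? 0) + cost;
    this.totals.set(key, next);
    return next;
  }

  // Returns the scope's running total, or 0 if nothing has been recorded.
  get(scopeType: string, scopeKey: string): number {
    return this.totals.get(`${scopeType}:${scopeKey}`) ?? 0;
  }
}
```

Keying the ledger by `scopeType:scopeKey` mirrors the two scope fields that both endpoints accept, which keeps tenants from sharing a counter.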
## REAA package list
| Package | Version | Role |
|---|---|---|
| @reaatech/agent-budget-engine | 0.1.1 | foundation |
| @reaatech/agent-budget-middleware | 0.1.1 | supporting |
| @reaatech/llm-cost-telemetry-calculator | 0.1.1 | supporting |
| @reaatech/llm-router-engine | 1.0.1 | supporting |
| @reaatech/agent-budget-types | 0.1.1 | supporting |
| @reaatech/agent-budget-spend-tracker | 0.1.1 | supporting |
## Environment variables reference
| Variable | Description |
|---|---|
| DATABRICKS_API_KEY | Databricks LLM endpoint auth |
| DATABRICKS_BASE_URL | Databricks serving endpoint base URL |
| REDIS_URL | Redis connection string |
| EDGE_CONFIG | Vercel Edge Config endpoint URL |
| HELICONE_API_KEY | Helicone observability API key |
| LANGFUSE_PUBLIC_KEY | Langfuse public key |
| LANGFUSE_SECRET_KEY | Langfuse secret key |
| LANGFUSE_BASE_URL | Langfuse base URL |
| BUDGET_DEFAULT_LIMIT | Default per-agent daily budget limit |
| BUDGET_SOFT_CAP | Soft cap fraction for warning/downgrade |
| BUDGET_HARD_CAP | Hard cap fraction for stop |
| PORT | Express middleware server port |
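A small loader can validate the numeric variables at startup so that a malformed value fails fast instead of silently disabling enforcement. The sketch below is an assumption about how this might be done: `loadBudgetConfig`, `readNumber`, and every fallback value are hypothetical and are not defaults documented by this project.

```typescript
interface BudgetConfig {
  defaultLimit: number;
  softCap: number;
  hardCap: number;
  port: number;
}

type Env = Record<string, string | undefined>;

// Reads one numeric variable, using the fallback when it is unset or blank.
function readNumber(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isFinite(value)) {
    throw new Error(`${name} must be a finite number, got "${raw}"`);
  }
  return value;
}

// All fallback values here are illustrative assumptions.
function loadBudgetConfig(env: Env): BudgetConfig {
  const config: BudgetConfig = {
    defaultLimit: readNumber(env, "BUDGET_DEFAULT_LIMIT", 10),
    softCap: readNumber(env, "BUDGET_SOFT_CAP", 0.8),
    hardCap: readNumber(env, "BUDGET_HARD_CAP", 1.2),
    port: readNumber(env, "PORT", 3000),
  };
  if (config.defaultLimit <= 0) {
    throw new Error("BUDGET_DEFAULT_LIMIT must be positive");
  }
  if (config.softCap >= config.hardCap) {
    throw new Error("BUDGET_SOFT_CAP must be lower than BUDGET_HARD_CAP");
  }
  return config;
}
```

In a Node.js process this would typically be called once as `loadBudgetConfig(process.env)`.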
## API reference
### `POST /agent/chat`
Make an LLM request with budget enforcement.
**Request:**
```json
{
"prompt": "string",
"scopeType": "string",
"scopeKey": "string",
"modelId": "string (optional)",
"tools": "array (optional)"
}
```
**Response (200):**
```json
{
"content": "string",
"model": "string",
"cost": "number",
"latencyMs": "number"
}
```
**Error responses:**
- `402` — Budget exceeded
- `503` — Router not initialized
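A typed client for this endpoint might look like the sketch below. The request and response shapes follow the JSON above; `sendChat`, `classifyFailure`, `FetchLike`, and the failure labels are hypothetical names introduced for illustration. The fetch implementation is passed in so the function can be exercised without a running server.

```typescript
interface ChatRequest {
  prompt: string;
  scopeType: string;
  scopeKey: string;
  modelId?: string;
  tools?: unknown[];
}

interface ChatResponse {
  content: string;
  model: string;
  cost: number;
  latencyMs: number;
}

type ChatFailure = "budget_exceeded" | "router_not_initialized" | "unexpected";

type ChatResult =
  | { ok: true; data: ChatResponse }
  | { ok: false; status: number; reason: ChatFailure };

// Maps the documented error statuses to a label the caller can branch on.
function classifyFailure(status: number): ChatFailure {
  if (status === 402) return "budget_exceeded";
  if (status === 503) return "router_not_initialized";
  return "unexpected";
}

// Minimal subset of the fetch contract that this sketch relies on.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ status: number; json(): Promise<unknown> }>;

async function sendChat(
  fetchImpl: FetchLike,
  baseUrl: string,
  request: ChatRequest,
): Promise<ChatResult> {
  const res = await fetchImpl(`${baseUrl}/agent/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(request),
  });
  if (res.status === 200) {
    // The body is trusted to match the documented shape; validate it if you cannot trust the server.
    return { ok: true, data: (await res.json()) as ChatResponse };
  }
  return { ok: false, status: res.status, reason: classifyFailure(res.status) };
}
```

Returning a tagged result instead of throwing lets callers treat a `402` as an expected business outcome, for example by showing a "budget exhausted" message, rather than as an exception.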
### `POST /webhook/helicone`
Receive cost events from Helicone.
**Request:**
```json
{
"scopeType": "string",
"scopeKey": "string",
"cost": "number",
"requestId": "string",
"provider": "string",
"modelId": "string",
"inputTokens": "number",
"outputTokens": "number"
}
```
**Response:** `204 No Content`
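Because the webhook body updates spend totals, it is worth validating before it reaches the tracker. The type guard below is a sketch under the assumption that all eight documented fields are required and that numeric fields must be non-negative; `isCostEvent` and `CostEvent` are hypothetical names.

```typescript
interface CostEvent {
  scopeType: string;
  scopeKey: string;
  cost: number;
  requestId: string;
  provider: string;
  modelId: string;
  inputTokens: number;
  outputTokens: number;
}

const STRING_FIELDS = ["scopeType", "scopeKey", "requestId", "provider", "modelId"];
const NUMBER_FIELDS = ["cost", "inputTokens", "outputTokens"];

// Returns true only when every documented field is present with the expected type.
function isCostEvent(body: unknown): body is CostEvent {
  if (typeof body !== "object" || body === null) return false;
  const record = body as Record<string, unknown>;
  const stringsOk = STRING_FIELDS.every((key) => typeof record[key] === "string");
  const numbersOk = NUMBER_FIELDS.every((key) => {
    const value = record[key];
    // Assumption: negative or non-finite amounts are never valid.
    return typeof value === "number" && Number.isFinite(value) && value >= 0;
  });
  return stringsOk && numbersOk;
}
```

A handler could respond with `400` when the guard fails and only then fall through to the `204` path.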
### `GET /api/dashboard/spend`
List all scope spend data.
### `GET /api/dashboard/spend/:scopeKey`
Get per-scope spend plus remaining budget.
### `GET /api/models`
List registered models with their costs.
## Budget state machine
```
┌──────────┐
│  Active  │
└────┬─────┘
     │ spend > softCap * defaultLimit
     ▼
┌──────────┐
│  Warned  │
└────┬─────┘
     │ spend > defaultLimit
     ▼
┌───────────┐
│ Degraded  │
└────┬──────┘
     │ spend > hardCap * defaultLimit
     ▼
┌──────────┐
│ Stopped  │
└──────────┘
```
- **Active** — request routed to primary model (DBRX)
- **Warned** — warning logged, still using primary model
- **Degraded** — downgraded to fallback model (Mixtral)
- **Stopped** — all requests rejected with 402
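The transitions in the diagram reduce to three threshold comparisons, checked from the most severe state down. The function below is a sketch of that logic only; `budgetState` and `BudgetThresholds` are hypothetical names, and the real engine may track state differently (for example, by persisting transitions rather than recomputing them).

```typescript
type BudgetState = "Active" | "Warned" | "Degraded" | "Stopped";

interface BudgetThresholds {
  defaultLimit: number; // budget limit for the scope
  softCap: number;      // fraction of the limit that triggers Warned
  hardCap: number;      // fraction of the limit that triggers Stopped
}

// Derives the state from current spend using the strict ">" comparisons in the diagram.
function budgetState(spend: number, t: BudgetThresholds): BudgetState {
  if (spend > t.hardCap * t.defaultLimit) return "Stopped";
  if (spend > t.defaultLimit) return "Degraded";
  if (spend > t.softCap * t.defaultLimit) return "Warned";
  return "Active";
}
```

For example, with an assumed `defaultLimit` of 10, `softCap` of 0.8, and `hardCap` of 1.2, a spend of 5 is Active, 9 is Warned, 11 is Degraded, and 13 is Stopped.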
## Model hierarchy
| Priority | Model | Role | Cost |
|---|---|---|---|
| 1 | DBRX | Primary (default) | Higher |
| 2 | Mixtral | Fallback (degraded) | Lower |
When the budget state transitions to **Degraded**, the `@reaatech/llm-router-engine` automatically falls back from DBRX to Mixtral, reducing per-request cost while still serving requests.
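The mapping from budget state to routing decision can be written as an exhaustive switch. This is a sketch of the behavior described above, not the `@reaatech/llm-router-engine` API: `routeFor` and `RouteDecision` are hypothetical, and the `"DBRX"` and `"Mixtral"` strings are display names rather than actual serving endpoint identifiers.

```typescript
type BudgetState = "Active" | "Warned" | "Degraded" | "Stopped";

type RouteDecision =
  | { kind: "route"; model: "DBRX" | "Mixtral"; logWarning: boolean }
  | { kind: "reject"; status: 402 };

// Chooses a model (or a rejection) for a request based on the scope's budget state.
function routeFor(state: BudgetState): RouteDecision {
  switch (state) {
    case "Active":
      return { kind: "route", model: "DBRX", logWarning: false };
    case "Warned":
      // Still on the primary model, but the caller should log a warning.
      return { kind: "route", model: "DBRX", logWarning: true };
    case "Degraded":
      return { kind: "route", model: "Mixtral", logWarning: false };
    case "Stopped":
      return { kind: "reject", status: 402 };
  }
}
```

Listing every state explicitly means the compiler flags any state added later that has no routing rule.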
## Setup
```bash
pnpm install
pnpm test # vitest run with coverage
pnpm dev # next dev
```
## License
MIT — see [LICENSE](./LICENSE).