# Databricks AI Spend Control for Budget-Conscious SMBs
 
> Enforce per-agent LLM budgets and automatically downgrade models when costs exceed thresholds to keep SMB AI operations within budget.
 
A tutorial-style reference solution from [reaatech.com](https://reaatech.com) that demonstrates how to build production-grade AI systems with the `@reaatech/*` package family.
 
## Problem description
 
SMBs deploying Databricks-powered AI agents face unpredictable LLM costs. Without automated spend controls, they risk overspending or service disruption.
 
## Architecture
 
```
┌─────────────────────────────────────────────────────────────┐
│                     Express Middleware                        │
│           (@reaatech/agent-budget-middleware)                 │
│                                                              │
│  POST /agent/chat ─► budget check ─► route to LLM            │
│       │                                                      │
│       ├──► @reaatech/agent-budget-engine                     │
│       │       └── per-agent budgets from EDGE_CONFIG         │
│       │                                                      │
│       ├──► @reaatech/llm-router-engine                       │
│       │       └── DBRX (primary) → Mixtral (fallback)        │
│       │                                                      │
│       ├──► @reaatech/llm-cost-telemetry-calculator           │
│       │       └── real-time cost tracking per request        │
│       │                                                      │
│       └──► @reaatech/agent-budget-spend-tracker              │
│               └── multi-tenant spend aggregator (Redis)      │
│                                                              │
│  POST /webhook/helicone ─► spend-tracker update              │
└──────────────────────────┬──────────────────────────────────┘
                           │
                    ┌──────▼──────┐
                    │    Redis    │
                    │  (spend)    │
                    └──────┬──────┘
                           │
                    ┌──────▼──────────┐
                    │  Next.js App    │
                    │  Dashboard      │
                    │  (reads Redis)  │
                    └─────────────────┘
 
Helicone Async Tracing ───────────────────► all request/response data
```
 
Key flow:
1. Client sends `POST /agent/chat` to the Express middleware
2. `@reaatech/agent-budget-middleware` checks the agent's current spend via `@reaatech/agent-budget-engine` (budget limits from Vercel Edge Config)
3. Unless the agent is in the Stopped state, `@reaatech/llm-router-engine` routes the request to DBRX (primary, higher cost) or, once the budget state is Degraded, to Mixtral (fallback, lower cost)
4. `@reaatech/llm-cost-telemetry-calculator` computes real-time costs for each request
5. `@reaatech/agent-budget-spend-tracker` aggregates multi-tenant spend in Redis
6. Helicone asynchronously traces all requests; cost events arrive via `POST /webhook/helicone`
7. Next.js dashboard reads spend state directly from Redis
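
The ordering of steps 2 to 5 can be sketched as a single handler with one injected function per step. This is a minimal illustration, not the middleware's actual code: `ChatDeps`, `handleChat`, and the `"dbrx"` / `"mixtral"` model ids are hypothetical names, and the real `@reaatech/*` packages may expose different APIs.

```typescript
// Hypothetical shapes for illustration; the real @reaatech/* packages may differ.
type BudgetState = "Active" | "Warned" | "Degraded" | "Stopped";

interface Completion {
  content: string;
  inputTokens: number;
  outputTokens: number;
}

// One injected function per step of the key flow.
interface ChatDeps {
  getState(scopeKey: string): BudgetState; // step 2: budget check
  complete(modelId: string, prompt: string): Completion; // step 3: LLM call
  costOf(modelId: string, usage: Completion): number; // step 4: cost telemetry
  recordSpend(scopeKey: string, cost: number): void; // step 5: spend aggregation
}

type ChatResult =
  | { status: 200; content: string; model: string; cost: number }
  | { status: 402; error: string };

// Kept synchronous for brevity; real router and Redis calls would be asynchronous.
function handleChat(scopeKey: string, prompt: string, deps: ChatDeps): ChatResult {
  const state = deps.getState(scopeKey);
  if (state === "Stopped") {
    // Nothing is called or recorded once the budget is exhausted.
    return { status: 402, error: "Budget exceeded" };
  }
  // "dbrx" and "mixtral" are placeholder model ids.
  const model = state === "Degraded" ? "mixtral" : "dbrx";
  const reply = deps.complete(model, prompt);
  const cost = deps.costOf(model, reply);
  deps.recordSpend(scopeKey, cost);
  return { status: 200, content: reply.content, model, cost };
}
```

Injecting the dependencies keeps the ordering testable without Redis or a live Databricks endpoint.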
 
## REAA package list
 
| Package | Version | Role |
|---|---|---|
| @reaatech/agent-budget-engine | 0.1.1 | foundation |
| @reaatech/agent-budget-middleware | 0.1.1 | supporting |
| @reaatech/llm-cost-telemetry-calculator | 0.1.1 | supporting |
| @reaatech/llm-router-engine | 1.0.1 | supporting |
| @reaatech/agent-budget-types | 0.1.1 | supporting |
| @reaatech/agent-budget-spend-tracker | 0.1.1 | supporting |
 
## Environment variables reference
 
| Variable | Description |
|---|---|
| DATABRICKS_API_KEY | Databricks LLM endpoint auth |
| DATABRICKS_BASE_URL | Databricks serving endpoint base URL |
| REDIS_URL | Redis connection string |
| EDGE_CONFIG | Vercel Edge Config endpoint URL |
| HELICONE_API_KEY | Helicone observability API key |
| LANGFUSE_PUBLIC_KEY | Langfuse public key |
| LANGFUSE_SECRET_KEY | Langfuse secret key |
| LANGFUSE_BASE_URL | Langfuse base URL |
| BUDGET_DEFAULT_LIMIT | Default per-agent daily budget limit |
| BUDGET_SOFT_CAP | Soft cap fraction for warning/downgrade |
| BUDGET_HARD_CAP | Hard cap fraction for stop |
| PORT | Express middleware server port |
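
A minimal sketch of validating these variables at startup follows. It assumes, without confirmation from this README, that the listed variables are required and that the soft cap should not exceed the hard cap; `SpendConfig`, `loadConfig`, and the helper names are hypothetical.

```typescript
// Hypothetical config shape; only a subset of the variables is shown.
interface SpendConfig {
  databricksApiKey: string;
  databricksBaseUrl: string;
  redisUrl: string;
  defaultLimit: number;
  softCap: number;
  hardCap: number;
  port: number;
}

type Env = Record<string, string | undefined>;

function requireString(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error("Missing required environment variable: " + name);
  }
  return value;
}

function requireNumber(env: Env, name: string): number {
  const parsed = Number(requireString(env, name));
  if (!isFinite(parsed) || parsed < 0) {
    throw new Error(name + " must be a non-negative number");
  }
  return parsed;
}

function loadConfig(env: Env): SpendConfig {
  const config: SpendConfig = {
    databricksApiKey: requireString(env, "DATABRICKS_API_KEY"),
    databricksBaseUrl: requireString(env, "DATABRICKS_BASE_URL"),
    redisUrl: requireString(env, "REDIS_URL"),
    defaultLimit: requireNumber(env, "BUDGET_DEFAULT_LIMIT"),
    softCap: requireNumber(env, "BUDGET_SOFT_CAP"),
    hardCap: requireNumber(env, "BUDGET_HARD_CAP"),
    port: requireNumber(env, "PORT"),
  };
  // Assumption: the soft cap is crossed before the hard cap.
  if (config.softCap > config.hardCap) {
    throw new Error("BUDGET_SOFT_CAP must not exceed BUDGET_HARD_CAP");
  }
  return config;
}
```

In a Node.js process this would typically be called once as `loadConfig(process.env)`, so a misconfigured deployment fails at boot rather than on the first request.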
 
## API reference
 
### `POST /agent/chat`
 
Make an LLM request with budget enforcement.
 
**Request:**
```json
{
  "prompt": "string",
  "scopeType": "string",
  "scopeKey": "string",
  "modelId": "string (optional)",
  "tools": "array (optional)"
}
```
 
**Response (200):**
```json
{
  "content": "string",
  "model": "string",
  "cost": "number",
  "latencyMs": "number"
}
```
 
**Error responses:**
- `402` — Budget exceeded
- `503` — Router not initialized
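
The request shape above can be checked before any budget logic runs. The sketch below is one possible validator, assuming the three non-optional fields must be non-empty strings; `parseChatRequest` and `ParseResult` are hypothetical names, and how the real handler reports a malformed body is not specified in this README.

```typescript
// Mirrors the documented request body for POST /agent/chat.
interface ChatRequest {
  prompt: string;
  scopeType: string;
  scopeKey: string;
  modelId?: string;
  tools?: unknown[];
}

type ParseResult =
  | { ok: true; value: ChatRequest }
  | { ok: false; errors: string[] };

function parseChatRequest(body: unknown): ParseResult {
  if (typeof body !== "object" || body === null) {
    return { ok: false, errors: ["body must be a JSON object"] };
  }
  const record = body as Record<string, unknown>;
  const errors: string[] = [];

  // Required fields: assumed to be non-empty strings.
  for (const field of ["prompt", "scopeType", "scopeKey"]) {
    const value = record[field];
    if (typeof value !== "string" || value.length === 0) {
      errors.push(field + " must be a non-empty string");
    }
  }
  // Optional fields are only checked when present.
  if (record.modelId !== undefined && typeof record.modelId !== "string") {
    errors.push("modelId must be a string");
  }
  if (record.tools !== undefined && !Array.isArray(record.tools)) {
    errors.push("tools must be an array");
  }
  if (errors.length > 0) {
    return { ok: false, errors };
  }
  return {
    ok: true,
    value: {
      prompt: record.prompt as string,
      scopeType: record.scopeType as string,
      scopeKey: record.scopeKey as string,
      modelId: record.modelId as string | undefined,
      tools: record.tools as unknown[] | undefined,
    },
  };
}
```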
 
### `POST /webhook/helicone`
 
Receive cost events from Helicone.
 
**Request:**
```json
{
  "scopeType": "string",
  "scopeKey": "string",
  "cost": "number",
  "requestId": "string",
  "provider": "string",
  "modelId": "string",
  "inputTokens": "number",
  "outputTokens": "number"
}
```
 
**Response:** `204 No Content`
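
As a rough illustration of what the spend-tracker update might do with such an event, the sketch below aggregates cost per scope in memory. `InMemorySpendTracker` is a hypothetical stand-in for the Redis-backed tracker, and skipping repeated `requestId` values is an assumption (webhook senders commonly retry deliveries), not documented behaviour.

```typescript
// Field names mirror the webhook request body documented above.
interface CostEvent {
  scopeType: string;
  scopeKey: string;
  cost: number;
  requestId: string;
  provider: string;
  modelId: string;
  inputTokens: number;
  outputTokens: number;
}

// Hypothetical in-memory stand-in for the Redis-backed spend tracker.
class InMemorySpendTracker {
  private totals = new Map<string, number>();
  private seenRequests = new Set<string>();

  // Returns false when the event was already applied (assumed retry).
  apply(event: CostEvent): boolean {
    if (this.seenRequests.has(event.requestId)) {
      return false;
    }
    this.seenRequests.add(event.requestId);
    const key = event.scopeType + ":" + event.scopeKey;
    this.totals.set(key, (this.totals.get(key) ?? 0) + event.cost);
    return true;
  }

  // Accumulated spend for one scope, or 0 if nothing was recorded.
  total(scopeType: string, scopeKey: string): number {
    return this.totals.get(scopeType + ":" + scopeKey) ?? 0;
  }
}
```

Keying totals by both `scopeType` and `scopeKey` keeps scopes of different types from colliding when they share a key.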
 
### `GET /api/dashboard/spend`
 
List spend data for all scopes.
 
### `GET /api/dashboard/spend/:scopeKey`
 
Get the spend and remaining budget for a single scope.
 
### `GET /api/models`
 
List registered models with their costs.
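
This README does not show how model costs are represented. One common convention is a price per million input and output tokens, which is enough to derive a per-request cost from the `inputTokens` and `outputTokens` fields. The sketch below assumes that convention; `ModelPricing`, `requestCost`, and every price in it are made-up placeholders, not real Databricks rates.

```typescript
// Hypothetical pricing shape: price per one million tokens.
interface ModelPricing {
  modelId: string;
  inputPerMillionTokens: number;
  outputPerMillionTokens: number;
}

// Cost of one request given its token counts.
function requestCost(
  pricing: ModelPricing,
  inputTokens: number,
  outputTokens: number
): number {
  return (
    (inputTokens / 1000000) * pricing.inputPerMillionTokens +
    (outputTokens / 1000000) * pricing.outputPerMillionTokens
  );
}

// Placeholder prices for illustration only.
const dbrxPricing: ModelPricing = {
  modelId: "dbrx",
  inputPerMillionTokens: 2,
  outputPerMillionTokens: 4,
};
const mixtralPricing: ModelPricing = {
  modelId: "mixtral",
  inputPerMillionTokens: 1,
  outputPerMillionTokens: 2,
};
```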
 
## Budget state machine
 
```
                  ┌──────────┐
                  │  Active  │
                  └────┬─────┘
                       │ spend > softCap * defaultLimit
                       ▼
                  ┌──────────┐
                  │  Warned  │
                  └────┬─────┘
                       │ spend > defaultLimit
                       ▼
                  ┌──────────┐
                  │ Degraded │
                  └────┬─────┘
                       │ spend > hardCap * defaultLimit
                       ▼
                  ┌──────────┐
                  │ Stopped  │
                  └──────────┘
```
 
- **Active** — request routed to primary model (DBRX)
- **Warned** — warning logged, still using primary model
- **Degraded** — downgraded to fallback model (Mixtral)
- **Stopped** — all requests rejected with 402
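
The thresholds in the diagram can be read as a pure function of current spend. The sketch below assumes `softCap < 1 < hardCap`, which the ordering of the transitions implies but this README does not state; `budgetState` and `BudgetThresholds` are hypothetical names.

```typescript
type BudgetState = "Active" | "Warned" | "Degraded" | "Stopped";

interface BudgetThresholds {
  defaultLimit: number; // BUDGET_DEFAULT_LIMIT
  softCap: number; // BUDGET_SOFT_CAP, a fraction of defaultLimit
  hardCap: number; // BUDGET_HARD_CAP, a fraction of defaultLimit
}

// Checks the most severe threshold first so the highest state reached wins.
function budgetState(spend: number, t: BudgetThresholds): BudgetState {
  if (spend > t.hardCap * t.defaultLimit) return "Stopped";
  if (spend > t.defaultLimit) return "Degraded";
  if (spend > t.softCap * t.defaultLimit) return "Warned";
  return "Active";
}
```

Because the state is derived from spend alone, resetting the spend counter (for example at the start of a new budget period) would return an agent to Active. Whether the real engine resets this way is an assumption.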
 
## Model hierarchy
 
| Priority | Model | Role | Cost |
|---|---|---|---|
| 1 | DBRX | Primary (default) | Higher |
| 2 | Mixtral | Fallback (degraded) | Lower |
 
When the budget state transitions to **Degraded**, the `@reaatech/llm-router-engine` automatically falls back from DBRX to Mixtral, reducing per-request cost while still serving requests.
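
A minimal sketch of this fallback rule, assuming the hierarchy is an ordered list of tiers: Active and Warned use the lowest priority number, Degraded uses the last tier, and Stopped yields no model. `selectModel`, `ModelTier`, and the model ids are hypothetical and do not reflect the router engine's actual API.

```typescript
type BudgetState = "Active" | "Warned" | "Degraded" | "Stopped";

interface ModelTier {
  priority: number; // 1 = primary; larger numbers are cheaper fallbacks
  modelId: string; // placeholder ids, not confirmed endpoint names
}

const MODEL_TIERS: ModelTier[] = [
  { priority: 1, modelId: "dbrx" },
  { priority: 2, modelId: "mixtral" },
];

// Returns the model id to call, or null when requests must be rejected (402).
function selectModel(state: BudgetState, tiers: ModelTier[]): string | null {
  if (state === "Stopped") return null;
  if (tiers.length === 0) throw new Error("no models registered");
  // Sort a copy so the caller's array is left untouched.
  const ordered = tiers.slice().sort((a, b) => a.priority - b.priority);
  const tier = state === "Degraded" ? ordered[ordered.length - 1] : ordered[0];
  return tier.modelId;
}
```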
 
## Setup
 
```bash
pnpm install
pnpm test            # vitest run with coverage
pnpm dev             # next dev
```
 
## License
 
MIT — see [LICENSE](./LICENSE).