SMBs deploying Databricks-powered AI agents face unpredictable LLM costs that can spiral, especially when agents handle fluctuating request volumes. Without automated spend controls, they risk overspending or service disruption.
This tutorial walks you through building a Databricks AI Spend Control system — a full-stack solution that enforces per-agent LLM budgets and automatically downgrades models when costs exceed thresholds. You’ll wire together REAA budget engines, a cost-telemetry calculator, a config-driven LLM router, an Express middleware layer for real-time budget enforcement, and a Next.js dashboard that displays per-scope spend data. Each step is copy-paste ready, and by the end you’ll have a running system that keeps SMB AI operations within budget without manual intervention.
Prerequisites
Node.js 22+ and pnpm 10 installed on your machine
A Databricks workspace with a serving endpoint for LLM inference (or a compatible OpenAI-compatible API)
A Redis instance (local or remote) for shared spend state
A Vercel Edge Config store for per-agent budget profiles (or a local fallback)
Basic familiarity with TypeScript, Express, Next.js App Router, and async/await patterns
Step 1: Scaffold the project
Create a new directory and initialize the project with the exact dependencies needed. The package manager is pnpm, and every dependency is pinned to a precise version to avoid surprises.
terminal
mkdir databricks-ai-spend-control && cd databricks-ai-spend-control
pnpm init
Expected output: A package.json with all dependencies pinned to exact versions (no ^ or ~ prefixes). Your node_modules/ directory is populated and pnpm-lock.yaml exists.
Step 2: Configure TypeScript, ESLint, Vitest, and Next.js
Create the root configuration files. These are the scaffolding your project needs to compile, test, and lint correctly.
The experimental.instrumentationHook: true flag is required because you will add a src/instrumentation.ts file later. Without it, the instrumentation code is dead code that never executes.
Expected output: Running pnpm typecheck should exit 0 with no errors.
Step 3: Set up environment variables
Create .env.example with placeholders for every configuration value the system reads at runtime. Never commit real secrets — only names and placeholders.
env
# Databricks AI Spend Control: environment variables
# Keep placeholders only. Never commit real values.
NODE_ENV=development

# Databricks LLM endpoint
DATABRICKS_API_KEY=<your-databricks-api-key>
DATABRICKS_BASE_URL=<your-databricks-workspace-url>

# Redis for shared spend state
REDIS_URL=redis://localhost:6379

# Vercel Edge Config for per-agent budget profiles
EDGE_CONFIG=<your-edge-config-endpoint>

# Helicone observability
HELICONE_API_KEY=<your-helicone-api-key>

# Langfuse tracing
LANGFUSE_PUBLIC_KEY=<your-langfuse-public-key>
LANGFUSE_SECRET_KEY=<your-langfuse-secret-key>
LANGFUSE_BASE_URL=https://cloud.langfuse.com

# Budget defaults
BUDGET_DEFAULT_LIMIT=10.0
BUDGET_SOFT_CAP=0.8
BUDGET_HARD_CAP=1.0

# Express middleware server port
PORT=3001
Copy it to .env and fill in your real values:
terminal
cp .env.example .env
Expected output: An .env.example file with placeholders for Databricks credentials, Redis, Edge Config, Helicone, and Langfuse.
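The budget defaults arrive from the environment as strings, so they need parsing and a sanity check before anything uses them. The following is a minimal sketch, not part of the project source: `parseBudgetDefaults` and `readNumber` are hypothetical helpers, and the sketch assumes the soft and hard caps are fractions of the limit, as the example values 0.8 and 1.0 suggest.

```typescript
// Hypothetical helpers for illustration; they are not part of the tutorial's source tree.
type BudgetDefaults = { limit: number; softCap: number; hardCap: number };
type Env = Record<string, string | undefined>;

// Read one numeric variable, falling back to a default when it is unset or blank.
function readNumber(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isFinite(value)) {
    throw new Error(`${name} must be a number, got "${raw}"`);
  }
  return value;
}

// Fallbacks mirror the values in .env.example.
// Assumption: softCap and hardCap are fractions of the limit.
function parseBudgetDefaults(env: Env): BudgetDefaults {
  const limit = readNumber(env, "BUDGET_DEFAULT_LIMIT", 10.0);
  const softCap = readNumber(env, "BUDGET_SOFT_CAP", 0.8);
  const hardCap = readNumber(env, "BUDGET_HARD_CAP", 1.0);
  if (limit <= 0) {
    throw new Error("BUDGET_DEFAULT_LIMIT must be positive");
  }
  if (softCap <= 0 || softCap > hardCap) {
    throw new Error("BUDGET_SOFT_CAP must be greater than 0 and at most BUDGET_HARD_CAP");
  }
  return { limit, softCap, hardCap };
}
```

In the running app you would call `parseBudgetDefaults(process.env)` once at startup, so a malformed value fails fast instead of surfacing later as a NaN budget.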
Step 4: Create shared type schemas
Before writing the budget engine and route handlers, you need Zod-backed schemas that validate every boundary in the system. These schemas define the shape of budget profiles, API requests, and webhook payloads.
Create the source directory structure:
terminal
mkdir -p src/budget src/lib
Create src/budget/types.ts — the budget profile schema:
ts
import { z } from "zod"

export const BudgetProfileSchema = z.object({
  scopeType: z.string(),
  scopeKey: z.string(),
  limit: z.number().positive(),
  softCap: z.number(),
  hardCap: z.number(),
  autoDowngrade: z.array(z.object({ from: z.array(z.string()), to: z.string() })),
  disableTools: z.array(z.string()),
})

export type BudgetProfile = z.infer<typeof BudgetProfileSchema>
Create src/lib/validation.ts — schemas for agent chat requests and Helicone webhook payloads:
ts
import { z } from "zod"

export const AgentChatRequestSchema = z.object({
  prompt: z.string().min(1),
  scopeType: z.string(),
  scopeKey: z.string(),
  modelId: z.string().optional(),
  tools: z.array(z.string()).optional(),
})

export type AgentChatRequest = z.infer<typeof AgentChatRequestSchema>

export const WebhookPayloadSchema = z.object({
  scopeType: z.string(),
  scopeKey: z.string(),
  cost: z.number(),
  requestId: z.string(),
  provider: z.string(),
  modelId: z.string(),
  inputTokens: z.number(),
  outputTokens: z.number(),
})

export type WebhookPayload = z.infer<typeof WebhookPayloadSchema>

export { BudgetProfileSchema } from "../budget/types.js"
export type { BudgetProfile } from "../budget/types.js"
Expected output: pnpm typecheck exits 0. Zod can now validate every HTTP request and webhook payload before it touches the budget engine.
Step 5: Register Databricks pricing tiers
The @reaatech/llm-cost-telemetry-calculator package ships with pricing for public providers but not for Databricks-hosted models. You’ll register custom tiers so the calculator can estimate and track Databricks inference costs.
Expected output: Calling registerDatabricksPricing() registers two pricing tiers. DBRX costs 50¢ per million input tokens and $1.00 per million output tokens; Mixtral costs 20¢ and 40¢ respectively.
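To make the per-million-token prices concrete, here is the arithmetic they imply, written as a standalone sketch. It does not use the calculator package: the `DATABRICKS_TIERS` table, the `costUsd` function, and the model keys `dbrx` and `mixtral` are placeholders chosen for illustration, and only the prices come from this step.

```typescript
// Prices in USD per one million tokens, taken from the figures quoted in this step.
// The model keys are hypothetical placeholders, not confirmed endpoint names.
type PricingTier = { inputPerMillionUsd: number; outputPerMillionUsd: number };

const DATABRICKS_TIERS: Record<string, PricingTier> = {
  dbrx: { inputPerMillionUsd: 0.5, outputPerMillionUsd: 1.0 },
  mixtral: { inputPerMillionUsd: 0.2, outputPerMillionUsd: 0.4 },
};

// cost = inputTokens / 1e6 * inputPrice + outputTokens / 1e6 * outputPrice
function costUsd(model: string, inputTokens: number, outputTokens: number): number {
  const tier: PricingTier | undefined = DATABRICKS_TIERS[model];
  if (!tier) {
    throw new Error(`No pricing tier registered for "${model}"`);
  }
  return (
    (inputTokens / 1_000_000) * tier.inputPerMillionUsd +
    (outputTokens / 1_000_000) * tier.outputPerMillionUsd
  );
}

// A request with 2,000 input tokens and 500 output tokens:
console.log(costUsd("dbrx", 2_000, 500).toFixed(6));    // 0.001500
console.log(costUsd("mixtral", 2_000, 500).toFixed(6)); // 0.000600
```

At these prices the same request costs 2.5 times as much on DBRX as on Mixtral, which is the gap an automatic downgrade exploits.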
Step 6: Create the Redis client
Redis stores per-scope spend state that the dashboard reads. The client uses a singleton pattern so all modules share one connection.
Create src/redis/client.ts:
ts
import { createClient } from "redis"
import type { RedisClientType } from "redis"

export let redis: RedisClientType | undefined

export async function connectRedis(): Promise<void> {
  const client = createClient({ url: process.env.REDIS_URL ?? "redis://localhost:6379" })
  client.on("error", (err) => { console.error("Redis Client Error", err); })
  await client.connect()
  redis = client
}

export function closeRedis(): void {
  if (redis) {
    redis.destroy()
    redis = undefined
  }
}
The client.on("error", …) listener is mandatory: the client is an EventEmitter, and an "error" event with no listener is thrown as an uncaught exception that crashes the process. The closeRedis() function tears down the connection and clears the singleton.
Expected output: pnpm typecheck exits 0. The redis export starts as undefined and gets assigned after connectRedis() is called at startup.
Step 7: Build the budget engine controller
@reaatech/agent-budget-engine is the heart of spend enforcement. You’ll wire it with a SpendStore (from @reaatech/agent-budget-spend-tracker), load budget profiles from Vercel Edge Config, and subscribe to threshold events.
Create src/budget/config.ts — loads budget profiles from Edge Config and validates them with Zod:
ts
import { get } from "@vercel/edge-config"
import { BudgetProfileSchema, type BudgetProfile } from "./types.js"

export async function loadBudgetProfiles(): Promise<BudgetProfile[]> {
  const raw = await get("budgetProfiles")
  if (!Array.isArray(raw)) return []
  return raw.map((r: unknown) => BudgetProfileSchema.parse(r))
}

export async function getBudgetProfile(scopeKey: string): Promise<BudgetProfile | undefined> {
  const profiles = await loadBudgetProfiles()
  return profiles.find(p => p.scopeKey === scopeKey)
}
Create src/budget/controller.ts — the controller singleton that enforces real-time budgets and persists spend deltas to Redis:
ts
import { BudgetController } from "@reaatech/agent-budget-engine"
import { SpendStore } from "@reaatech/agent-budget-spend-tracker"
import { BudgetScope } from "@reaatech/agent-budget-types"
import { loadBudgetProfiles } from "./config.js"
import { redis } from "../redis/client.js"

let controller: BudgetController | undefined

export async function getBudgetController(): Promise<BudgetController> {
  if (controller) return controller
  const store = new SpendStore({ maxEntries: 500_000 })
  controller = new BudgetController({ spendTracker: store })
  const profiles = await loadBudgetProfiles()
  for (const p of profiles) {
    controller.defineBudget({
      scopeType: parseScopeType(p.scopeType),
      scopeKey: p.scopeKey,
      limit: p.limit,
      policy: {
        softCap: p.softCap,
        hardCap: p.hardCap,
        autoDowngrade: p.autoDowngrade,
        disableTools: p.disableTools
      },
    })
  }
  controller.on("threshold-breach", (event) => { console.warn("threshold-breach", event) })
  controller.on("hard-stop", (event) => { console.warn("hard-stop", event) })
  return controller
}

function parseScopeType(s: string): BudgetScope {
  switch (s.toLowerCase()) {
    case "user": return BudgetScope.User
    case "org": return BudgetScope.Org
    case "session": return BudgetScope.Session
    case "task": return BudgetScope.Task
    default: return BudgetScope.User
  }
}

export async function recordSpendToRedis(scopeKey: string, cost: number): Promise<void> {
  if (redis?.isReady) {
    await redis.hIncrByFloat(`scope:${scopeKey}`, "spent", cost)
  }
}
Create src/budget/index.ts — the barrel export:
ts
export * from "./types.js"
export * from "./config.js"
export { getBudgetController, recordSpendToRedis } from "./controller.js"
Expected output: pnpm typecheck exits 0. The controller subscribes to threshold-breach and hard-stop events, and every controller.record() call is paired with a Redis HINCRBYFLOAT to persist the spend delta.
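The two events the controller emits correspond to the soft and hard caps in each profile. The sketch below shows one plausible reading of those caps, assuming both are fractions of the limit. It illustrates the idea only and does not reproduce the internals of @reaatech/agent-budget-engine; `classifySpend` and `CapPolicy` are hypothetical names.

```typescript
// Illustration only: a guess at how soft and hard caps partition spend.
// Assumption: softCap and hardCap are fractions of the limit (for example 0.8 and 1.0).
type BudgetState = "ok" | "threshold-breach" | "hard-stop";

interface CapPolicy {
  limit: number;   // budget in USD
  softCap: number; // fraction of the limit that triggers a warning
  hardCap: number; // fraction of the limit that stops further spend
}

function classifySpend(spentUsd: number, policy: CapPolicy): BudgetState {
  const used = spentUsd / policy.limit;
  if (used >= policy.hardCap) return "hard-stop";
  if (used >= policy.softCap) return "threshold-breach";
  return "ok";
}

const examplePolicy: CapPolicy = { limit: 10, softCap: 0.8, hardCap: 1.0 };
console.log(classifySpend(7.5, examplePolicy)); // ok
console.log(classifySpend(8.5, examplePolicy)); // threshold-breach
console.log(classifySpend(10, examplePolicy));  // hard-stop
```

Under this reading, a $10 budget with the default caps warns once $8 has been spent and stops at $10.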
Step 8: Configure the LLM router
The router uses a YAML string to define models, strategies, and budgets. You’ll configure DBRX as the primary workhorse and Mixtral as the cost-effective fallback.
Expected output: parseRouterConfig() returns a RouterConfig object with two models, one cost-optimized strategy, and a daily budget with alert thresholds at 50%, 75%, and 90%.
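Alert thresholds are most useful when each one fires once, at the moment spend crosses it. The following sketch shows that crossing check for the 50%, 75%, and 90% levels mentioned above. It is an assumption about how such alerts can be computed, not the router's own logic, and `newlyCrossedThresholds` is a hypothetical helper.

```typescript
// Hypothetical helper: returns the thresholds crossed by the latest spend update.
// A threshold counts as crossed when spend was below it before and is at or above it now.
function newlyCrossedThresholds(
  previousSpendUsd: number,
  currentSpendUsd: number,
  dailyBudgetUsd: number,
  thresholds: readonly number[] = [0.5, 0.75, 0.9],
): number[] {
  return thresholds.filter((t) => {
    const level = t * dailyBudgetUsd;
    return previousSpendUsd < level && currentSpendUsd >= level;
  });
}

// Spend jumps from $4 to $8 on a $10 daily budget: the 50% and 75% alerts fire.
console.log(newlyCrossedThresholds(4, 8, 10)); // [ 0.5, 0.75 ]
```

Comparing the previous and current totals, instead of only the current one, avoids re-sending the same alert on every request after a threshold has been passed.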
Step 9: Create the model execution layer
When the router decides which model to call, it needs an executeModel callback that actually dispatches the LLM request. This uses the Vercel AI SDK’s generateText with the OpenAI-compatible Databricks endpoint and logs telemetry to Helicone.
Create src/router/instance.ts — the router singleton:
ts
import { LLMRouter } from "@reaatech/llm-router-engine"
import { buildRouterConfig } from "./config.js"
import { executeModel } from "./execute-model.js"

export let router: LLMRouter | undefined

export function initRouter(): Promise<LLMRouter> {
  const config = buildRouterConfig()
  router = LLMRouter.fromConfig(config, { executeModel })
  return Promise.resolve(router)
}
Create src/router/cost-tracker.ts — cost calculation helpers that wrap the telemetry calculator:
ts
import { calculateCost, estimateCost } from "@reaatech/llm-cost-telemetry-calculator"

type CostBreakdown = {
  inputCostUsd: number
  outputCostUsd: number
  cacheReadCostUsd?: number
  cacheCreationCostUsd?: number
}

function asProvider(s: string): Parameters<typeof calculateCost>[0]["provider"] {
  return s as Parameters<typeof calculateCost>[0]["provider"]
}

export function calculateRequestCost(model: string, inputTokens: number, outputTokens: number): { costUsd: number; breakdown: CostBreakdown } {
  return calculateCost({ provider: asProvider("databricks"), model, inputTokens, outputTokens })
}

export async function estimateRequestCost(model: string, estimatedInputTokens: number, estimatedOutputTokens: number): Promise<number> {
  const est = await estimateCost({ provider: asProvider("databricks"), model, inputTokens: estimatedInputTokens, outputTokens: estimatedOutputTokens })
  return est.usd
}
Expected output: pnpm typecheck exits 0. The executeModel function wraps the Databricks call with Helicone async logging and returns content plus token counts.
Step 10: Create the budget middleware
The middleware sits between HTTP requests and your LLM call. The beforeStep handler runs a pre-flight budget check and injects scope context; the afterStep handler records actual spend after the LLM responds.
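The pre-flight check described above boils down to a three-way decision: allow the request, downgrade the model, or block it. Here is a minimal sketch of that decision under stated assumptions. It is not the middleware package's code: `preflight`, `PreflightInput`, and `PreflightDecision` are hypothetical names, the caps are assumed to be fractions of the limit, and only the rule shape (`from` as a list of models, `to` as a single model) is taken from the budget profile schema in Step 4.

```typescript
// Illustration only; not the budget middleware's actual implementation.
type DowngradeRule = { from: string[]; to: string };

type PreflightInput = {
  spentUsd: number;
  limitUsd: number;
  softCap: number;          // assumed to be a fraction of the limit
  hardCap: number;          // assumed to be a fraction of the limit
  estimatedCostUsd: number; // estimate for the request about to run
  requestedModel: string;
  autoDowngrade: DowngradeRule[];
};

type PreflightDecision =
  | { action: "allow"; model: string }
  | { action: "downgrade"; model: string }
  | { action: "block"; reason: string };

function preflight(input: PreflightInput): PreflightDecision {
  const projected = (input.spentUsd + input.estimatedCostUsd) / input.limitUsd;

  // A request that would push spend past the hard cap is rejected outright.
  if (projected > input.hardCap) {
    return { action: "block", reason: "projected spend exceeds the hard cap" };
  }

  // Past the soft cap, switch to a cheaper model when a rule covers the requested one.
  if (projected >= input.softCap) {
    const rule = input.autoDowngrade.find((r) => r.from.includes(input.requestedModel));
    if (rule) {
      return { action: "downgrade", model: rule.to };
    }
  }

  return { action: "allow", model: input.requestedModel };
}
```

The matching after-step would then record the actual cost reported by the model call, since the pre-flight figure is only an estimate.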
ts
import express from "express"
import { configureBudgetMiddleware } from "../budget/middleware.js"
import { agentChatHandler } from "./routes/agent.js"
import { webhookHandler } from "./routes/webhook.js"

export function createApp() {
  const app = express()
  app.use(express.json())
  return app
}

export async function setupMiddleware(app: ReturnType<typeof express>) {
  const mw = await configureBudgetMiddleware()
  app.use("/agent", (req, res, next) => { mw.beforeStep(req, res, next) })
  app.use("/agent", (req, res, next) => { mw.afterStep(req, res, next) })
  app.post("/agent/chat", agentChatHandler)
  app.post("/webhook/helicone", webhookHandler)
}

export async function startServer(): Promise<void> {
  const app = createApp()
  await setupMiddleware(app)
  const port = process.env.PORT || "3001"
  app.listen(parseInt(port, 10), () => { console.log(`Express server on port ${port}`) })
}
Expected output: pnpm typecheck exits 0. The Express app mounts the budget middleware on /agent, then attaches the chat and webhook routes.
Step 12: Create the Redis spend store
The dashboard needs aggregated spend data from Redis. These helper functions read per-scope metrics that the route handlers have been persisting.
Create src/redis/spend-store.ts:
ts
import { redis } from "./client.js"

export async function getAgentSpend(scopeKey: string): Promise<number> {
  if (!redis?.isReady) return 0
  const val = await redis.hGet(`scope:${scopeKey}`, "spent")
  return val ? parseFloat(val) : 0
}

export async function getAgentRemaining(scopeKey: string): Promise<number> {
  if (!redis?.isReady) return 0
  const val = await redis.hGet(`scope:${scopeKey}`, "remaining")
  return val ? parseFloat(val) : 0
}

export async function getAllScopes(): Promise<Array<{ scopeKey: string; spent: number; remaining: number }>> {
  if (!redis?.isReady) return []
  const keys = await redis.keys("scope:*")
  const results: Array<{ scopeKey: string; spent: number; remaining: number }> = []
  for (const key of keys) {
    const data = await redis.hGetAll(key)
    const scopeKey = key.replace("scope:", "")
    const spent = parseFloat(data.spent || "0")
    const remaining = parseFloat(data.remaining || "0")
    results.push({ scopeKey, spent, remaining })
  }
  return results
}

export async function persistSpendEntry(scopeKey: string, cost: number): Promise<void> {
  if (redis?.isReady) {
    await redis.hIncrByFloat(`scope:${scopeKey}`, "spent", cost)
    await redis.hIncrByFloat(`scope:${scopeKey}`, "remaining", -cost)
  }
}
Expected output: pnpm typecheck exits 0. All four functions guard against an unready Redis connection and return sensible defaults.
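The helpers above call parseFloat on raw hash fields, which yields NaN if a field ever holds a malformed value. One way to harden the read path is to isolate the conversion in a pure function that can be tested without Redis. This is a sketch of that idea; `parseScopeHash` and `toFiniteNumber` are hypothetical helpers, not part of the project source.

```typescript
// Hypothetical helpers for illustration; they operate on the plain object that a
// Redis hash read returns, so they need no Redis connection.
type ScopeSpend = { scopeKey: string; spent: number; remaining: number };

// Convert a hash field to a number, treating missing or malformed values as 0.
function toFiniteNumber(raw: string | undefined): number {
  const n = Number.parseFloat(raw ?? "");
  return Number.isFinite(n) ? n : 0;
}

function parseScopeHash(redisKey: string, data: Record<string, string>): ScopeSpend {
  const prefix = "scope:";
  // Strip the prefix only when it is actually at the start of the key.
  const scopeKey = redisKey.startsWith(prefix) ? redisKey.slice(prefix.length) : redisKey;
  return {
    scopeKey,
    spent: toFiniteNumber(data["spent"]),
    remaining: toFiniteNumber(data["remaining"]),
  };
}

console.log(parseScopeHash("scope:agent-1", { spent: "1.25" }));
// { scopeKey: 'agent-1', spent: 1.25, remaining: 0 }
```

With this in place, the loop in getAllScopes could reduce to one call per key, and the dashboard would never receive NaN in its JSON.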
Step 13: Create the Next.js dashboard
The dashboard provides a real-time view of per-agent spend. It reads from Redis and renders budget gauge bars that change color as spend approaches the limit.
Create app/api/dashboard/spend/route.ts — returns all scope spend as JSON:
ts
import { NextRequest, NextResponse } from "next/server"
import { getAllScopes } from "../../../../src/redis/spend-store.js"

export async function GET(_req: NextRequest) {
  const scopes = await getAllScopes()
  void _req
  return NextResponse.json(scopes)
}
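The gauge bars mentioned at the start of this step need a fill percentage and a color for each scope. The sketch below shows one way to derive both. The color names and the default cut-offs at 80% and 100% of the limit are assumptions made for illustration, and `gaugeFor` is a hypothetical helper, not a component from the project.

```typescript
// Hypothetical helper: maps spend against a limit to a gauge fill and color.
// Assumption: the bar turns amber at the soft cap and red at the hard cap.
type GaugeColor = "green" | "amber" | "red";

function gaugeFor(
  spentUsd: number,
  limitUsd: number,
  softCap = 0.8,
  hardCap = 1.0,
): { percent: number; color: GaugeColor } {
  // Treat a missing or non-positive limit as an empty gauge.
  const ratio = limitUsd > 0 ? spentUsd / limitUsd : 0;
  // Clamp the fill to the 0-100 range so overspend does not overflow the bar.
  const percent = Math.min(100, Math.max(0, Math.round(ratio * 100)));
  const color: GaugeColor = ratio >= hardCap ? "red" : ratio >= softCap ? "amber" : "green";
  return { percent, color };
}

console.log(gaugeFor(2, 10));  // { percent: 20, color: 'green' }
console.log(gaugeFor(12, 10)); // { percent: 100, color: 'red' }
```

Note that the API route above returns spent and remaining per scope, so a component using this helper would need the limit from the budget profile or would have to derive it from those two values.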
Because src/instrumentation.ts exports a register() function, the next.config.ts you created in Step 2 must have experimental.instrumentationHook: true. Verify that it's there. Without this flag, the instrumentation is dead code.
Create src/index.ts — a standalone entry point for non-Next.js usage:
Expected output: pnpm typecheck exits 0. The register guard checks process.env.NEXT_RUNTIME === "nodejs" so the Edge runtime doesn’t hit Node-only imports.
Step 16: Run the tests
Create the test infrastructure — an MSW server for HTTP mocking and mock modules for external dependencies.
Now verify everything compiles and the test runner is set up correctly:
terminal
pnpm typecheck
pnpm lint
pnpm test
Expected output:
pnpm typecheck exits 0 with no TypeScript errors.
pnpm lint exits 0 with no ESLint violations.
pnpm test exits 0 with numFailedTests === 0, numTotalTests >= 3, and all four coverage thresholds (lines, branches, functions, statements) at 90% or above.
Next steps
Add more model tiers — extend the router config with additional models (Llama, Claude, GPT) and define fallback chains that step down through progressively cheaper options as budgets tighten.
Implement chargeback webhooks — extend the /webhook/helicone endpoint to push aggregated cost events to an internal billing system for per-tenant chargebacks.
Add alert notifications — wire the threshold-breach and hard-stop events to Slack, email, or PagerDuty so ops teams get notified when budgets approach their limits.