xAI Grok Cost Control for SMB Customer Support Agents

Prevent runaway AI spending by enforcing per-tenant daily budgets and fallback routing for xAI Grok-powered support agents.

xai grok cost-control express nextjs openai helicone budget-guardrails smb customer-support

The problem

SMBs deploying AI customer support agents often face unpredictable monthly bills as chat volume spikes, with no built-in controls to limit spending per customer or automatically switch to cheaper models when budgets are exhausted.


Intro

This tutorial walks you through building a layered budget guard for xAI Grok-powered customer support agents. You’ll build an Express API that enforces per-tenant daily spending limits, automatically falls back to a cheaper OpenAI model when budgets tighten, records cost telemetry, and renders a spend dashboard in Next.js. By the end, you’ll have a reference implementation that prevents runaway AI costs without interrupting your users’ conversations.
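At its core, the layered guard compares a tenant's projected spend against two thresholds. The sketch below is a hypothetical illustration of that idea, not the actual logic of @reaatech/agent-budget-engine. The names decide, TenantBudget, and BudgetDecision are made up for this example. It assumes the caps are fractions of the daily limit, matching the 0.8/1.0 defaults that BudgetService.defineTenantBudget uses.

```typescript
// Hypothetical sketch: classify a request against a tenant's daily budget.
// softCap/hardCap are fractions of dailyLimit (0.8 and 1.0 mirror the
// defaults in BudgetService.defineTenantBudget). Not the library's logic.
type BudgetDecision = "allow" | "downgrade" | "block";

interface TenantBudget {
  dailyLimit: number; // USD per day
  softCap: number; // fraction of dailyLimit at which to switch to a cheaper model
  hardCap: number; // fraction of dailyLimit above which requests are refused
}

function decide(
  budget: TenantBudget,
  spentToday: number,
  estimatedCost: number,
): BudgetDecision {
  // Judge the request by where spend would land if it went through.
  const projected = spentToday + estimatedCost;
  if (projected > budget.dailyLimit * budget.hardCap) return "block";
  if (projected >= budget.dailyLimit * budget.softCap) return "downgrade";
  return "allow";
}

const budget: TenantBudget = { dailyLimit: 10, softCap: 0.8, hardCap: 1.0 };
console.log(decide(budget, 2, 0.5)); // → allow (2.5 < 8)
console.log(decide(budget, 7.9, 0.5)); // → downgrade (8.4 >= 8)
console.log(decide(budget, 9.8, 0.5)); // → block (10.3 > 10)
```

Here "downgrade" is where a router would switch from Grok to the cheaper OpenAI model instead of failing the conversation outright.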

Prerequisites

Node.js >= 22 and pnpm 10
An xAI API key (for Grok)
An OpenAI API key (for the fallback model)
A Helicone API key (for cost observability)
Familiarity with TypeScript, Express, and Next.js App Router basics

Step 1: Inspect the scaffold and configure environment

The project scaffold already exists with Next.js 16 (App Router), Vitest, ESLint, and TypeScript configured. Start by inspecting what’s on disk.

```shell
ls -la
cat .env.example
```

You’ll see the .env.example already has placeholder entries for all the environment variables you need:

Example artifact

A complete, working implementation of this recipe — downloadable as a zip or browsable file by file. Generated by our build pipeline; tested with full coverage before publishing.


167 kB · 100 tests · 99.5% coverage · Vitest passing

SHA-256: 633689c354dbdd9a8291aaac5739496e6a0004659b5cf06c4d343d581646fb16



```typescript
// services/spend-store.ts
// In-memory spend store: keeps one running total per (scopeType, scopeKey).
// Time-window and history methods are simplified stubs.
import { SpendStore as VendorSpendStore } from "@reaatech/agent-budget-spend-tracker";
import type { SpendEntry } from "@reaatech/agent-budget-types";
import { BudgetScope } from "@reaatech/agent-budget-types";

// Marks parameters as intentionally unused (keeps the linter quiet).
function unused(...args: unknown[]): void {
  void args;
}

export class SpendStore extends VendorSpendStore {
  // Running totals keyed by "<scopeType>:<scopeKey>".
  private store = new Map<string, number>();

  constructor(options?: { maxEntries?: number }) {
    super(options ?? {});
  }

  record(entry: SpendEntry): number {
    const k = `${entry.scopeType}:${entry.scopeKey}`;
    this.store.set(k, (this.store.get(k) ?? 0) + entry.cost);
    return 0;
  }

  getSpend(scopeType: BudgetScope, scopeKey: string): number {
    return this.store.get(`${scopeType}:${scopeKey}`) ?? 0;
  }

  getAllScopes(scopeType: BudgetScope): Array<{ scopeKey: string; spend: number }> {
    const results: Array<{ scopeKey: string; spend: number }> = [];
    for (const [k, v] of this.store.entries()) {
      if (k.startsWith(`${scopeType}:`)) {
        results.push({ scopeKey: k.slice(scopeType.length + 1), spend: v });
      }
    }
    return results;
  }

  // Simplified: returns the cumulative total and ignores windowMinutes.
  getRate(scopeType: BudgetScope, scopeKey: string, windowMinutes = 60): number {
    unused(windowMinutes);
    return this.store.get(`${scopeType}:${scopeKey}`) ?? 0;
  }

  // Simplified: returns the cumulative total and ignores windowHours.
  projectTotal(scopeType: BudgetScope, scopeKey: string, windowHours = 24): number {
    unused(windowHours);
    return this.store.get(`${scopeType}:${scopeKey}`) ?? 0;
  }

  // Stubs: per-entry history is not retained, so these always return [].
  detectSpikes(
    scopeType: BudgetScope,
    scopeKey: string,
    windowSize = 5,
    thresholdStdDev = 2,
  ): Array<{ entryId: number; cost: number; expectedCost: number; deviation: number; timestamp: Date }> {
    unused(scopeType, scopeKey, windowSize, thresholdStdDev);
    return [];
  }

  getEntriesInRange(
    startTime: Date,
    endTime: Date,
    scopeType?: BudgetScope,
    scopeKey?: string,
  ): SpendEntry[] {
    unused(startTime, endTime, scopeType, scopeKey);
    return [];
  }

  getRecentEntries(count: number): SpendEntry[] {
    unused(count);
    return [];
  }

  getEntriesByModel(modelId: string, startTime?: Date, endTime?: Date): SpendEntry[] {
    unused(modelId, startTime, endTime);
    return [];
  }
}
```
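This SpendStore keeps only running totals, so getRate and projectTotal ignore their window arguments and return cumulative spend. If you need true windowed rates, one option is to keep timestamped entries. The class below is a hypothetical, standalone sketch (WindowedSpendTracker is not part of @reaatech/agent-budget-spend-tracker) with an injectable clock so it can be tested deterministically.

```typescript
// Hypothetical sketch: a time-windowed spend tracker. Unlike the SpendStore
// above, it keeps timestamped entries so spend can be summed over the
// trailing N minutes.
interface TimedSpend {
  scopeKey: string;
  cost: number;
  timestamp: number; // epoch milliseconds
}

class WindowedSpendTracker {
  private entries: TimedSpend[] = [];
  private readonly now: () => number;

  // The clock is injectable so tests can control time.
  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  record(scopeKey: string, cost: number): void {
    this.entries.push({ scopeKey, cost, timestamp: this.now() });
  }

  // Sum of spend for scopeKey within the trailing window.
  spendInWindow(scopeKey: string, windowMinutes: number): number {
    const cutoff = this.now() - windowMinutes * 60_000;
    let total = 0;
    for (const e of this.entries) {
      if (e.scopeKey === scopeKey && e.timestamp >= cutoff) total += e.cost;
    }
    return total;
  }

  // Drop entries older than maxAgeMinutes to bound memory use.
  prune(maxAgeMinutes: number): void {
    const cutoff = this.now() - maxAgeMinutes * 60_000;
    this.entries = this.entries.filter((e) => e.timestamp >= cutoff);
  }
}

// Usage with a controllable clock.
let clock = 0;
const tracker = new WindowedSpendTracker(() => clock);
tracker.record("acme", 0.25);
clock = 30 * 60_000; // 30 minutes later
tracker.record("acme", 0.5);
clock = 70 * 60_000; // 70 minutes after the first entry
console.log(tracker.spendInWindow("acme", 60)); // → 0.5 (first entry is outside the hour)
```

The trade-off is memory: totals are O(1) per tenant, while entries grow with traffic, which is why a prune step (or a fixed-size bucket scheme) is usually paired with this approach.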

```typescript
// services/budget-service.ts
// Wraps BudgetController. Each tenant is tracked as a BudgetScope.User scope
// keyed by tenant ID, with a daily dollar limit and a soft/hard cap policy.
import { BudgetController } from "@reaatech/agent-budget-engine";
import { BudgetScope, type BudgetPolicy, type SpendEntry } from "@reaatech/agent-budget-types";
import { SpendStore } from "./spend-store.js";
import type { PricingProvider } from "./pricing-service.js";

export class BudgetService {
  controller: BudgetController;
  pricing: PricingProvider;
  spendStore: SpendStore;

  constructor(pricing: PricingProvider, spendStore?: SpendStore) {
    this.pricing = pricing;
    this.spendStore = spendStore ?? new SpendStore();
    this.controller = new BudgetController({
      spendTracker: this.spendStore,
      pricing: this.pricing,
    });
  }

  // Caps default to 0.8 (soft) and 1.0 (hard) when not supplied.
  defineTenantBudget(
    tenantId: string,
    dailyLimit: number,
    policy?: Partial<BudgetPolicy>,
  ): void {
    const budgetPolicy: BudgetPolicy = {
      softCap: policy?.softCap ?? 0.8,
      hardCap: policy?.hardCap ?? 1.0,
      autoDowngrade: policy?.autoDowngrade ?? [],
      disableTools: policy?.disableTools ?? [],
    };
    this.controller.defineBudget({
      scopeType: BudgetScope.User,
      scopeKey: tenantId,
      limit: dailyLimit,
      policy: budgetPolicy,
    });
  }

  // Pre-flight check before calling a model with an estimated cost.
  checkBudget(
    tenantId: string,
    estimatedCost: number,
    modelId: string,
  ): { allowed: boolean; suggestedModel?: string; action: string } {
    const result = this.controller.check({
      scopeType: BudgetScope.User,
      scopeKey: tenantId,
      estimatedCost,
      modelId,
      tools: [],
    });
    return {
      allowed: result.allowed,
      suggestedModel: result.suggestedModel,
      action: result.action,
    };
  }

  // Records the actual cost and token usage after a model call completes.
  recordSpend(
    tenantId: string,
    requestId: string,
    cost: number,
    inputTokens: number,
    outputTokens: number,
    modelId: string,
    provider: string,
  ): void {
    const entry: SpendEntry = {
      requestId,
      scopeType: BudgetScope.User,
      scopeKey: tenantId,
      cost,
      inputTokens,
      outputTokens,
      modelId,
      provider,
      timestamp: new Date(),
    };
    this.controller.record(entry);
  }

  // Falls back to zero spend/remaining and "Active" when no state exists.
  getBudgetState(tenantId: string): { spent: number; remaining: number; state: string } {
    const state = this.controller.getState(BudgetScope.User, tenantId);
    return {
      spent: state?.spent ?? 0,
      remaining: state?.remaining ?? 0,
      state: state?.state ?? "Active",
    };
  }

  resetBudget(tenantId: string): void {
    this.controller.reset(BudgetScope.User, tenantId);
  }

  // Subscribes to the controller's "hard-stop" event.
  onHardStop(
    handler: (event: { scopeType: string; scopeKey: string; spent: number; limit: number }) => void,
  ): void {
    this.controller.on("hard-stop", handler);
  }
}
```
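BudgetService.checkBudget needs an estimatedCost before the model is called, and recordSpend needs the actual cost afterwards. The PricingService that produces these numbers is not shown, so here is a hypothetical sketch of the arithmetic: cost = inputTokens / 1,000,000 × input price + outputTokens / 1,000,000 × output price. The estimateCost and PRICES names are made up; the per-million prices are copied from RouterService's MODEL_DEFINITIONS, so check current xAI and OpenAI price lists before relying on them.

```typescript
// Hypothetical sketch of per-request cost estimation from token counts.
// Prices are USD per million tokens, copied from MODEL_DEFINITIONS.
interface ModelPrice {
  costPerMillionInput: number;
  costPerMillionOutput: number;
}

const PRICES: Record<string, ModelPrice> = {
  "grok-3": { costPerMillionInput: 2.0, costPerMillionOutput: 8.0 },
  "gpt-5.2-mini": { costPerMillionInput: 0.15, costPerMillionOutput: 0.6 },
};

function estimateCost(modelId: string, inputTokens: number, outputTokens: number): number {
  const price = PRICES[modelId];
  if (!price) throw new Error(`No price for model ${modelId}`);
  return (
    (inputTokens / 1_000_000) * price.costPerMillionInput +
    (outputTokens / 1_000_000) * price.costPerMillionOutput
  );
}

// Before the call, outputTokens could be the request's maxTokens ceiling
// (a conservative upper bound); after the call, use the reported usage.
console.log(estimateCost("grok-3", 1_000, 500).toFixed(4)); // → 0.0060
```

Estimating with the maxTokens ceiling overstates cost, which errs on the side of tripping the soft cap early rather than overspending.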

```typescript
// services/router-service.ts
// Tries Grok first and falls back to the cheaper OpenAI model on failure.
import { createFallbackChain } from "@reaatech/llm-router-fallback";
import { ModelDefinitionSchema, type ModelDefinition } from "@reaatech/llm-router-core";
import { createGrokClient, chatWithGrok } from "../lib/grok-client.js";
import { createFallbackClient, chatWithFallback } from "../lib/fallback-client.js";
import type { ChatMessage } from "../lib/types.js";

// Prices are USD per million tokens; verify against current provider pricing.
const MODEL_DEFINITIONS: ModelDefinition[] = [
  ModelDefinitionSchema.parse({
    id: "grok-3",
    provider: "xai",
    costPerMillionInput: 2.0,
    costPerMillionOutput: 8.0,
    maxTokens: 131072,
    capabilities: ["reasoning", "general"],
  }),
  ModelDefinitionSchema.parse({
    id: "gpt-5.2-mini",
    provider: "openai",
    costPerMillionInput: 0.15,
    costPerMillionOutput: 0.6,
    maxTokens: 128000,
    capabilities: ["general"],
  }),
];

export class RouterService {
  private grokClient = createGrokClient();
  private fallbackClient = createFallbackClient();

  // Build the chain once so circuit-breaker state can persist across requests,
  // assuming that state lives on the chain instance. A chain created inside
  // executeWithFallback would start every request with a fresh breaker.
  private chain = createFallbackChain({
    name: "cost-control",
    models: ["grok-3", "gpt-5.2-mini"],
    circuitBreaker: { failureThreshold: 5, resetTimeoutMs: 60000, halfOpenMaxCalls: 3 },
  });

  constructor() {
    this.chain.registerModels(MODEL_DEFINITIONS);
  }

  async executeWithFallback(
    messages: ChatMessage[],
    maxTokens?: number,
  ): Promise<{ content: string; modelUsed: string; isFallback: boolean; errors: Error[] }> {
    // Set by whichever model call resolves successfully.
    let captured:
      | { content: string; model: string; usage: { inputTokens: number; outputTokens: number } }
      | undefined;

    const chainResult = await this.chain.executeFrom(
      "grok-3",
      async (model: ModelDefinition) => {
        if (model.id === "grok-3") {
          captured = await chatWithGrok(this.grokClient, messages, maxTokens);
          return captured;
        }
        if (model.id === "gpt-5.2-mini") {
          captured = await chatWithFallback(this.fallbackClient, messages, maxTokens);
          return captured;
        }
        throw new Error(`Unknown model: ${model.id}`);
      },
      MODEL_DEFINITIONS,
    );

    return {
      content: captured?.content ?? "",
      modelUsed: chainResult.selectedModel.id,
      isFallback: chainResult.isFallback,
      // Normalize whatever the chain collected into Error instances.
      errors: chainResult.errors.map((e: unknown) => {
        const err = e as { message: string } | undefined;
        return new Error(err?.message ?? String(e));
      }),
    };
  }
}
```

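The circuitBreaker options passed to createFallbackChain follow the usual circuit-breaker pattern: after failureThreshold failures the breaker opens and calls skip that model; once resetTimeoutMs has elapsed it lets up to halfOpenMaxCalls trial calls through; a success closes it again. The sketch below is a hypothetical, minimal illustration of that pattern, not @reaatech/llm-router-fallback's implementation, and details such as whether failures must be consecutive may differ from the library. Because the state lives in memory, a breaker only helps if the object holding it outlives a single request.

```typescript
// Hypothetical minimal circuit breaker using the same three settings as the
// chain's circuitBreaker config. Not the library's implementation.
type BreakerState = "closed" | "open" | "half-open";

interface BreakerOptions {
  failureThreshold: number;
  resetTimeoutMs: number;
  halfOpenMaxCalls: number;
}

class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private halfOpenCalls = 0;
  private readonly opts: BreakerOptions;
  private readonly now: () => number;

  constructor(opts: BreakerOptions, now: () => number = Date.now) {
    this.opts = opts;
    this.now = now;
  }

  // Returns true if a call to the protected model may proceed.
  canRequest(): boolean {
    if (this.state === "open") {
      if (this.now() - this.openedAt < this.opts.resetTimeoutMs) return false;
      this.state = "half-open"; // timeout elapsed: allow a few trial calls
      this.halfOpenCalls = 0;
    }
    if (this.state === "half-open") {
      if (this.halfOpenCalls >= this.opts.halfOpenMaxCalls) return false;
      this.halfOpenCalls++;
    }
    return true;
  }

  recordSuccess(): void {
    this.state = "closed";
    this.failures = 0; // a success resets the failure count
  }

  recordFailure(): void {
    this.failures++;
    // Any failure during a trial, or too many failures overall, opens the breaker.
    if (this.state === "half-open" || this.failures >= this.opts.failureThreshold) {
      this.state = "open";
      this.openedAt = this.now();
    }
  }

  getState(): BreakerState {
    return this.state;
  }
}

let t = 0;
const breaker = new CircuitBreaker(
  { failureThreshold: 5, resetTimeoutMs: 60_000, halfOpenMaxCalls: 3 },
  () => t,
);
for (let i = 0; i < 5; i++) breaker.recordFailure();
console.log(breaker.getState(), breaker.canRequest()); // → open false
t = 60_000; // reset timeout has elapsed
console.log(breaker.canRequest(), breaker.getState()); // → true half-open
```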
```typescript
import express from "express";
import cors from "cors";
import { loadAppConfig } from "./config.js";
import { SpendStore } from "./services/spend-store.js";
import { PricingService } from "./services/pricing-service.js";
import { BudgetService } from "./services/budget-service.js";
import { TelemetryService } from "./services/telemetry-service.js";
import { RouterService } from "./services/router-service.js";
import { createChatRouter } from "./api/chat.js";

export function createApp() {
  const app = express();
  app.use(cors());
  app.use(express.json());

  // Wire the services together; BudgetService shares the SpendStore instance.
  const config = loadAppConfig();
  const spendStore = new SpendStore();
  const pricingService = new PricingService();
  const budgetService = new BudgetService(pricingService, spendStore);
  const telemetryService = new TelemetryService();
  const routerService = new RouterService();

  // Register each configured tenant's daily budget.
  for (const [tenantId, budgetDef] of Object.entries(config.tenantBudgets)) {
    budgetService.defineTenantBudget(tenantId, budgetDef.dailyLimit, {
      softCap: budgetDef.softCap,
      hardCap: budgetDef.hardCap,
    });
  }

  const chatRouter = createChatRouter(
    budgetService,
    telemetryService,
    routerService,
    pricingService,
  );
  app.use("/api/chat", chatRouter);

  app.get("/api/health", (_req, res) => {
    res.json({ status: "ok", uptime: process.uptime() });
  });

  // Spend summary for one tenant (?tenantId=...) or across all tenants.
  app.get("/api/spend", (req, res) => {
    // Ignore non-string values (a repeated ?tenantId= parses as an array).
    const tenantId = typeof req.query.tenantId === "string" ? req.query.tenantId : undefined;
    const spans = tenantId
      ? telemetryService.getSpans(tenantId)
      : telemetryService.getAllSpans();
    const totalCost = spans.reduce((sum, s) => sum + s.costUsd, 0);
    const totalInputTokens = spans.reduce((sum, s) => sum + s.inputTokens, 0);
    const totalOutputTokens = spans.reduce((sum, s) => sum + s.outputTokens, 0);
    const totalCalls = spans.length;
    const modelsUsed: Record<string, number> = {};
    for (const s of spans) {
      modelsUsed[s.model] = (modelsUsed[s.model] ?? 0) + 1;
    }
    res.json({ totalCost, totalInputTokens, totalOutputTokens, totalCalls, modelsUsed });
  });

  return app;
}

// Note: importing this module also starts the server.
const PORT = parseInt(process.env["PORT"] ?? "3001", 10);
const app = createApp();
const server = app.listen(PORT, () => {
  console.log(`Server listening on port ${String(PORT)}`);
});

// Graceful shutdown: stop accepting new connections, exit once open ones close.
const shutdown = () => {
  server.close(() => process.exit(0));
};
process.on("SIGTERM", shutdown);
process.on("SIGINT", shutdown);
```

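The server reads tenant budgets through loadAppConfig(), which is not shown here. One hypothetical way to supply config.tenantBudgets is a JSON object in an environment variable; the variable name TENANT_BUDGETS, the parseTenantBudgets function, and the shape below are all assumptions. Leaving softCap or hardCap undefined lets BudgetService.defineTenantBudget fall back to its 0.8 and 1.0 defaults.

```typescript
// Hypothetical sketch: parse per-tenant budgets from a JSON environment
// variable. TENANT_BUDGETS and this shape are assumptions, not the
// tutorial's actual loadAppConfig().
interface TenantBudgetConfig {
  dailyLimit: number; // USD per day
  softCap?: number | undefined; // fraction of dailyLimit
  hardCap?: number | undefined; // fraction of dailyLimit
}

function parseTenantBudgets(raw: string | undefined): Record<string, TenantBudgetConfig> {
  if (!raw) return {};
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("TENANT_BUDGETS must be a JSON object keyed by tenant ID");
  }
  const result: Record<string, TenantBudgetConfig> = {};
  for (const [tenantId, value] of Object.entries(parsed)) {
    // Treat null or non-object entries as empty so the dailyLimit check rejects them.
    const v = (typeof value === "object" && value !== null ? value : {}) as Partial<TenantBudgetConfig>;
    if (typeof v.dailyLimit !== "number" || v.dailyLimit <= 0) {
      throw new Error(`Tenant ${tenantId}: dailyLimit must be a positive number`);
    }
    result[tenantId] = { dailyLimit: v.dailyLimit, softCap: v.softCap, hardCap: v.hardCap };
  }
  return result;
}

// In the server this might be called as parseTenantBudgets(process.env["TENANT_BUDGETS"]).
const budgets = parseTenantBudgets('{"acme":{"dailyLimit":5,"softCap":0.8}}');
console.log(budgets["acme"]?.dailyLimit); // → 5
```

Failing fast on a malformed budget at startup is deliberate: a tenant silently left without a limit is exactly the runaway-spend case this tutorial is trying to prevent.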
```typescript
// Non-critical Helicone logging: a no-op when HELICONE_API_KEY is unset,
// and logging errors are swallowed so chat requests are never affected.
import { HeliconeAsyncLogger, Provider } from "helicone";
import { HeliconeAsyncConfiguration } from "helicone/core/HeliconeAsyncConfiguration.js";

let heliconeAsyncLogger: HeliconeAsyncLogger | null = null;

// Lazily create a single logger instance from environment variables.
function getLogger(): HeliconeAsyncLogger | null {
  if (heliconeAsyncLogger !== null) return heliconeAsyncLogger;
  const apiKey = process.env["HELICONE_API_KEY"];
  const baseUrl = process.env["HELICONE_BASE_URL"] ?? "https://api.hconeai.com";
  if (!apiKey) return null;
  const config = new HeliconeAsyncConfiguration({
    heliconeMeta: { apiKey, baseUrl },
  });
  heliconeAsyncLogger = new HeliconeAsyncLogger(config);
  return heliconeAsyncLogger;
}

export async function logToHelicone(params: {
  tenantId: string;
  request: { role: string; content: string }[];
  response: { content: string; model: string };
  usage: { inputTokens: number; outputTokens: number };
  cost: number;
  startTime?: number; // epoch ms when the model call began; defaults to now
}): Promise<void> {
  const logger = getLogger();
  if (!logger) return;
  try {
    // Without a caller-supplied startTime, the logged latency is ~0 ms.
    const startTime = params.startTime ?? Date.now();
    const endTime = Date.now();
    // Log against the endpoint of the model that actually served the request.
    const url = params.response.model.startsWith("grok")
      ? "https://api.x.ai/v1/chat/completions"
      : "https://api.openai.com/v1/chat/completions";
    await logger.log(
      {
        providerRequest: {
          url,
          json: {
            model: params.response.model,
            messages: params.request,
            max_tokens: 1024,
          },
          meta: {
            "Helicone-User-Id": params.tenantId,
            "Helicone-Property-Cost": String(params.cost),
          },
        },
        providerResponse: {
          json: {
            choices: [{ message: { content: params.response.content, role: "assistant" } }],
            usage: {
              prompt_tokens: params.usage.inputTokens,
              completion_tokens: params.usage.outputTokens,
              total_tokens: params.usage.inputTokens + params.usage.outputTokens,
            },
            model: params.response.model,
          },
          status: 200,
          headers: {},
        },
        timing: HeliconeAsyncLogger.createTiming(startTime, endTime),
      },
      Provider.CUSTOM_MODEL,
    );
  } catch {
    // Helicone logging is non-critical; swallow errors.
  }
}
```
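Helicone's timing object is built from a start and an end timestamp, and the resulting latency is only meaningful when both are captured around the real model call. Here is a minimal sketch of a generic helper for that, assuming nothing beyond Date.now(); the name timed is hypothetical, and the captured startTime and endTime would then feed whatever timing field your logger expects.

```typescript
// Hypothetical helper: run an async call and record wall-clock start and end
// times around it, so logged latency reflects the call itself.
async function timed<T>(
  fn: () => Promise<T>,
): Promise<{ result: T; startTime: number; endTime: number }> {
  const startTime = Date.now();
  const result = await fn();
  const endTime = Date.now();
  return { result, startTime, endTime };
}

// Usage with a stand-in for a model call that takes about 50 ms.
const fakeModelCall = () =>
  new Promise<string>((resolve) => setTimeout(() => resolve("hello"), 50));

void timed(fakeModelCall).then(({ result, startTime, endTime }) => {
  console.log(result, `${endTime - startTime} ms`);
});
```

If fn throws, timed rethrows without timestamps; a caller that also wants to log failed calls would need a try/finally variant.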

Intro

Prerequisites

Step 1: Inspect the scaffold and configure environment

Step 2: Define shared types and validation schemas

Step 3: Create LLM client wrappers for Grok and OpenAI fallback

Step 4: Build the pricing service and in-memory spend store

Step 5: Wire up budget enforcement and cost telemetry

Step 6: Build the fallback router

Step 7: Create the chat API handler

Step 8: Boot the Express server with graceful shutdown

Step 9: Add Helicone logging (non-critical observability)

Step 10: Set up test infrastructure with MSW

Step 11: Run the tests and verify coverage

Step 12: Build the Next.js dashboard and spend API route

Next steps