BigCommerce merchants running AI-powered customer support chatbots face unpredictable LLM costs. Without real-time spend governance, a traffic spike from a flash sale or holiday rush can produce a surprise bill that wipes out margins. This tutorial walks you through building a self-policing cost-control layer that enforces per-tenant budget limits, dynamically routes to cheaper models when budgets tighten, tracks every penny spent, and caches frequently asked questions to avoid redundant API calls. You’ll wire up six @reaatech/* packages inside a Next.js 16 App Router project, orchestrated by a single API route that handles the full lifecycle: cache lookup, budget check, model routing, spend recording, and cache write.
Prerequisites
Node.js 22+ and pnpm 10 installed on your machine
An OpenRouter API key — sign up at openrouter.ai/keys and add a credit balance
A Helicone API key — sign up at helicone.ai for cost telemetry (free tier works)
Basic familiarity with TypeScript and Next.js App Router route handlers
Step 1: Create the project and install dependencies
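Start by scaffolding a Next.js project and installing the dependencies this recipe needs. The code in later steps imports next, openai, zod, and @reaatech/* packages including @reaatech/llm-router-engine, @reaatech/agent-budget-types, and @reaatech/llm-cost-telemetry; the test suite additionally uses vitest, @vitest/coverage-v8, and msw.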
Pin every dependency to an exact semver in package.json — no ^ or ~ prefixes. The scaffold may already do this, but verify with:
terminal
grep -n '"[~^>]' package.json
Expected output: no lines shown (all versions are bare X.Y.Z).
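If you want the same rule as a CI check instead of a grep, it can be expressed in TypeScript. This is a minimal sketch, not part of the recipe: the findUnpinned helper and the exact-semver regex (which also accepts a prerelease or build suffix) are assumptions.

```typescript
// Hypothetical helper: lists dependencies whose version is not an exact X.Y.Z semver.
type PackageJson = {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
};

// Exact semver: three numeric parts, optional prerelease/build suffix, no range operators.
const EXACT_SEMVER = /^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$/;

export function findUnpinned(pkg: PackageJson): string[] {
  const all: Record<string, string> = {
    ...(pkg.dependencies ?? {}),
    ...(pkg.devDependencies ?? {}),
  };
  return Object.entries(all)
    .filter(([, version]) => !EXACT_SEMVER.test(version))
    .map(([name, version]) => `${name}@${version}`);
}
```

Feed it JSON.parse(readFileSync("package.json", "utf8")) from node:fs and fail the build when the returned array is non-empty; unlike the grep, it also flags tags such as "latest" or "*".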
Step 2: Configure environment variables
This recipe reads configuration from environment variables. Create a .env.example file at the project root:
env
# Env vars used by openrouter-budget-guardrails-for-bigcommerce-smb-customer-support.
# Keep placeholders only; never commit real values.
NODE_ENV=development
OPENROUTER_API_KEY=<your-openrouter-key>
HELICONE_API_KEY=<your-helicone-key>
DEFAULT_DAILY_BUDGET=10.0
DEFAULT_MONTHLY_BUDGET=100.0
DEFAULT_MODEL=openai/gpt-5.2-mini
OTEL_SERVICE_NAME=openrouter-budget-guardrails
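Copy it to .env.local and replace the placeholders with real values for local development. Next.js loads .env.local automatically; keep it out of version control.
Step 3: Load configuration
Create src/config.ts exporting a loadAppConfig function and an AppConfig type; src/index.ts imports both later in this recipe.
Expected output: the file compiles with pnpm typecheck, and the config loader returns defaults when the env vars are unset, or real values when they’re present.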
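A minimal sketch of such a loader, assuming the variable names and defaults from the .env.example file above. The names loadAppConfig and AppConfig match what src/index.ts imports; the field names, the parseBudget helper, and the decision to throw on malformed numbers are assumptions, and the recipe's own file may validate with Zod instead.

```typescript
// Sketch of src/config.ts: reads env vars and falls back to the .env.example defaults.
export interface AppConfig {
  nodeEnv: string;
  openRouterApiKey: string | undefined;
  heliconeApiKey: string | undefined;
  defaultDailyBudget: number; // USD
  defaultMonthlyBudget: number; // USD
  defaultModel: string;
  otelServiceName: string;
}

// Hypothetical helper: parses a positive dollar amount, rejecting garbage instead of yielding NaN.
function parseBudget(raw: string | undefined, fallback: number, name: string): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isFinite(value) || value <= 0) {
    throw new Error(`${name} must be a positive number, got "${raw}"`);
  }
  return value;
}

export function loadAppConfig(env: Record<string, string | undefined> = process.env): AppConfig {
  return {
    nodeEnv: env.NODE_ENV ?? "development",
    openRouterApiKey: env.OPENROUTER_API_KEY,
    heliconeApiKey: env.HELICONE_API_KEY,
    defaultDailyBudget: parseBudget(env.DEFAULT_DAILY_BUDGET, 10.0, "DEFAULT_DAILY_BUDGET"),
    defaultMonthlyBudget: parseBudget(env.DEFAULT_MONTHLY_BUDGET, 100.0, "DEFAULT_MONTHLY_BUDGET"),
    defaultModel: env.DEFAULT_MODEL ?? "openai/gpt-5.2-mini",
    otelServiceName: env.OTEL_SERVICE_NAME ?? "openrouter-budget-guardrails",
  };
}
```

Taking env as a parameter (defaulting to process.env) lets tests pass a plain object instead of mutating the real environment.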
Step 4: Define shared types with Zod
The recipe needs a Zod schema for the chat request payload and TypeScript interfaces for responses and budget state. Create src/lib/types.ts:
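A dependency-free sketch of the shapes involved, inferred from the curl request and JSON response shown later in this recipe and from the four budget states the model router handles. The field names come from those examples; encoding BudgetState as a string union and the hand-written parseChatRequest (a stand-in for ChatRequestSchema.parse) are assumptions.

```typescript
// Budget lifecycle states named in the model-router step.
export type BudgetState = "Active" | "Warned" | "Degraded" | "Stopped";

// Request body accepted by POST /api/chat (matches the curl example).
export interface ChatRequest {
  prompt: string;
  customerId: string;
  tenantId: string;
}

// Response body returned by POST /api/chat.
export interface ChatResponse {
  reply: string;
  model: string;
  costUsd: number;
  cached: boolean;
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
}

// Hypothetical stand-in for ChatRequestSchema.parse: throws on anything that is not a valid ChatRequest.
export function parseChatRequest(input: unknown): ChatRequest {
  if (typeof input !== "object" || input === null) {
    throw new TypeError("request body must be a JSON object");
  }
  const body = input as Record<string, unknown>;
  for (const key of ["prompt", "customerId", "tenantId"] as const) {
    const value = body[key];
    if (typeof value !== "string" || value.trim() === "") {
      throw new TypeError(`${key} must be a non-empty string`);
    }
  }
  return {
    prompt: body.prompt as string,
    customerId: body.customerId as string,
    tenantId: body.tenantId as string,
  };
}
```

With Zod, the same request contract is z.object({ prompt: z.string().min(1), customerId: z.string().min(1), tenantId: z.string().min(1) }), and z.infer<typeof ChatRequestSchema> derives the ChatRequest type from the schema so the two cannot drift apart.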
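Step 5: Build the budget middleware
Create src/lib/budgetMiddleware.ts. It exports createBudgetController, defineTenantBudget, withBudgetCheck, recordSpend, getBudgetState, and BudgetExceededError, all of which the route handler imports. The module follows a clear lifecycle: create the controller (with event listeners for threshold breaches and hard stops), define per-tenant budgets with soft and hard caps and auto-downgrade rules, check each request against the budget before the LLM call, record spend after the call completes, and query the current budget state.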
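The budget controller's own API isn't shown here, so the following self-contained sketch models the same lifecycle to make the state transitions concrete. Everything in it is hypothetical: the SimpleBudget class, the single spend window, and the example 80% soft-cap and 95% degrade ratios. It checks before the call, records after, and fires a listener whenever the state changes.

```typescript
type BudgetState = "Active" | "Warned" | "Degraded" | "Stopped";

// Hypothetical per-tenant budget definition (one spend window, in USD).
interface TenantBudget {
  hardCapUsd: number; // spend at which requests are refused
  softCapRatio: number; // fraction of the hard cap that raises a warning, e.g. 0.8
  degradeRatio: number; // fraction that forces the cheapest model, e.g. 0.95
}

class SimpleBudget {
  private spent = 0;
  private readonly budget: TenantBudget;
  private readonly onStateChange?: (state: BudgetState) => void;

  constructor(budget: TenantBudget, onStateChange?: (state: BudgetState) => void) {
    this.budget = budget;
    this.onStateChange = onStateChange;
  }

  // Derives the state from spend so far, checking thresholds from most to least severe.
  state(): BudgetState {
    const ratio = this.spent / this.budget.hardCapUsd;
    if (ratio >= 1) return "Stopped";
    if (ratio >= this.budget.degradeRatio) return "Degraded";
    if (ratio >= this.budget.softCapRatio) return "Warned";
    return "Active";
  }

  // Before the LLM call: refuse if already stopped or if the estimate would cross the hard cap.
  check(estimatedCostUsd: number): BudgetState {
    if (this.state() === "Stopped" || this.spent + estimatedCostUsd > this.budget.hardCapUsd) {
      throw new Error(`budget exceeded: hard cap $${this.budget.hardCapUsd}`);
    }
    return this.state();
  }

  // After the LLM call: add the actual cost and notify the listener on a state transition.
  record(actualCostUsd: number): BudgetState {
    const before = this.state();
    this.spent += actualCostUsd;
    const after = this.state();
    if (after !== before) this.onStateChange?.(after);
    return after;
  }
}
```

Deriving the state from cumulative spend, rather than storing it separately, means the state can never disagree with the ledger; the listener is where a real implementation would emit the threshold-breach and hard-stop events.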
Step 6: Build the cost tracker
The cost tracker records token usage and cost per LLM call, then logs telemetry to Helicone. Create src/lib/costTracker.ts:
The startSpan method creates a fresh CostSpan with a generated id and zeroed costs. completeSpan uses calculateCostFromTokens from @reaatech/llm-cost-telemetry to compute the dollar cost, validates the shape with CostSpanSchema.parse, and returns the completed span. logToHelicone sends the span asynchronously — it wraps the entire send in a try/catch so that a telemetry failure never breaks the chat flow.
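The dollar cost of a call is linear in tokens: input tokens times the input price plus output tokens times the output price, with prices quoted per million tokens. The sketch below shows that arithmetic and the fire-and-forget telemetry pattern. costFromTokens, logSafely, and the Span shape are hypothetical stand-ins, not the signatures of calculateCostFromTokens or CostTracker.

```typescript
// Prices are USD per one million tokens, as in the router config.
interface Pricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

// Hypothetical helper: linear token pricing.
function costFromTokens(inputTokens: number, outputTokens: number, p: Pricing): number {
  return (inputTokens * p.inputPerMillion + outputTokens * p.outputPerMillion) / 1_000_000;
}

interface Span {
  id: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

// Hypothetical fire-and-forget wrapper: synchronous throws and async rejections
// from the sink are both caught, so telemetry can never break the chat response.
function logSafely(span: Span, send: (s: Span) => Promise<void>): void {
  Promise.resolve()
    .then(() => send(span))
    .catch((err: unknown) => {
      console.warn(`telemetry failed for span ${span.id}:`, err);
    });
}
```

For example, 1,200 input and 300 output tokens on openai/gpt-5.2-mini at the configured $0.30 and $0.60 per million cost (1,200 × 0.30 + 300 × 0.60) / 1,000,000 = $0.00054. Starting the send inside Promise.resolve().then(...) is what turns a synchronous throw into a rejection the catch can handle.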
Step 7: Create the model router
The model router configures a pool of models available through OpenRouter and selects the best one based on the current budget state. Create src/lib/modelRouter.ts:
ts
import OpenAI from "openai";
import { LLMRouter, parseRouterConfig } from "@reaatech/llm-router-engine";
import { BudgetScope, BudgetExceededError } from "@reaatech/agent-budget-types";
import type { BudgetState } from "./types.js";

const ROUTER_CONFIG_YAML = `models:
  workhorses:
    - id: openai/gpt-5.2-mini
      provider: openai
      cost_per_million_input: 0.30
      cost_per_million_output: 0.60
      max_tokens: 128000
      capabilities: [general]
    - id: mistralai/mistral-small-latest
      provider: mistral
      cost_per_million_input: 0.20
      cost_per_million_output: 0.40
      max_tokens: 128000
      capabilities: [general]
    - id: meta-llama/llama-3.3-70b-instruct
      provider: meta
The YAML router config defines three workhorse models with their OpenRouter pricing, one judge model for quality evaluations, and two routing strategies. The selectStrategy function maps budget state to strategy — Active uses the default cost-optimized strategy, Warned switches to strict cost-optimized, Degraded forces the absolute cheapest model, and Stopped throws a BudgetExceededError.
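The mapping from budget state to strategy is a small pure function. This sketch mirrors the four cases described above; the strategyFor name, the strategy identifiers, and the BudgetStoppedError class are hypothetical, since the real strategy names live in the part of the YAML not shown here.

```typescript
type BudgetState = "Active" | "Warned" | "Degraded" | "Stopped";

// Hypothetical strategy identifiers; the recipe's router YAML defines its own.
type Strategy = "cost-optimized" | "cost-optimized-strict" | "cheapest";

class BudgetStoppedError extends Error {
  readonly tenantId: string;
  constructor(tenantId: string) {
    super(`budget exhausted for tenant ${tenantId}`);
    this.name = "BudgetStoppedError";
    this.tenantId = tenantId;
  }
}

function strategyFor(state: BudgetState, tenantId: string): Strategy {
  switch (state) {
    case "Active":
      return "cost-optimized"; // normal routing
    case "Warned":
      return "cost-optimized-strict"; // soft cap crossed: tighten the cost preference
    case "Degraded":
      return "cheapest"; // force the lowest-priced model
    case "Stopped":
      throw new BudgetStoppedError(tenantId); // hard cap: make no LLM call at all
    default: {
      // Compile-time exhaustiveness check: adding a state without a case fails type-checking.
      const unreachable: never = state;
      throw new Error(`unknown budget state: ${String(unreachable)}`);
    }
  }
}
```

Throwing for Stopped, instead of returning a sentinel strategy, guarantees that no code path can reach the model call once the hard cap is hit.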
Step 8: Implement the semantic cache
The semantic cache stores frequently asked product questions and returns cached answers when a similar question comes in. Create src/lib/cache.ts:
The CacheEngine uses an in-memory storage adapter and an OpenAIEmbedder pointed at OpenRouter for the text-embedding-3-small model. The similarity threshold is 0.85 cosine similarity: a new question whose embedding reaches that similarity to a cached question's embedding returns the cached answer. Use-case segmentation is enabled with "bigcommerce-support" as the default use case, so these cache entries stay isolated from other recipes sharing the cache.
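Under the hood, a semantic cache compares the embedding of the incoming question with the stored embeddings and returns the best match only if its cosine similarity clears the threshold. A self-contained sketch of that lookup follows. The TinySemanticCache class is hypothetical, embeddings are passed in directly instead of coming from text-embedding-3-small, and entries are partitioned by a use-case key in the spirit of the recipe's segmentation.

```typescript
// Cosine similarity: dot product divided by the product of the vector lengths.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

interface Entry {
  embedding: number[];
  answer: string;
}

// Hypothetical in-memory semantic cache, segmented by use case.
class TinySemanticCache {
  private readonly byUseCase = new Map<string, Entry[]>();
  private readonly threshold: number;

  constructor(threshold = 0.85) {
    this.threshold = threshold;
  }

  set(useCase: string, embedding: number[], answer: string): void {
    const list = this.byUseCase.get(useCase) ?? [];
    list.push({ embedding, answer });
    this.byUseCase.set(useCase, list);
  }

  // Returns the most similar entry's answer, or undefined if none reaches the threshold.
  get(useCase: string, embedding: number[]): string | undefined {
    let best: Entry | undefined;
    let bestScore = -Infinity;
    for (const entry of this.byUseCase.get(useCase) ?? []) {
      const score = cosine(entry.embedding, embedding);
      if (score > bestScore) {
        best = entry;
        bestScore = score;
      }
    }
    return best !== undefined && bestScore >= this.threshold ? best.answer : undefined;
  }
}
```

The linear scan is fine for a few hundred FAQ entries; a larger cache would need an approximate nearest-neighbour index. Raising the threshold trades fewer hits for less risk of answering a different question.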
Step 9: Create the API route handler
Now wire the modules together in a Next.js App Router route handler. Create app/api/chat/route.ts:
ts
import { type NextRequest, NextResponse } from "next/server";
import { ZodError } from "zod";
import { generateId, now } from "@reaatech/llm-cost-telemetry";
import { ChatRequestSchema } from "../../../src/lib/types.js";
import { createOpenRouterClient, createModelRouter, routeQuery, selectStrategy } from "../../../src/lib/modelRouter.js";
import { createBudgetController, defineTenantBudget, withBudgetCheck, recordSpend, getBudgetState, BudgetExceededError } from "../../../src/lib/budgetMiddleware.js";
import { CostTracker } from "../../../src/lib/costTracker.js";
import { createCacheEngine, getCached, setCache } from "../../../src/lib/cache.js";

let cachedBudgetCtrl: ReturnType<typeof createBudgetController> | null
The POST handler runs a six-step pipeline: check the semantic cache for a fast hit, verify that the request fits within the tenant’s budget, route through the model router using the strategy that matches the budget state, record the spend so the budget state machine updates, log telemetry to Helicone, and cache the response for future similar questions. Error handling maps BudgetExceededError to 429, ZodError to 400 with validation details, and everything else to 500.
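Because each stage lives in its own module, the pipeline itself can be written as one function over injected dependencies, which is also what makes it easy to unit-test with mocks. This is a sketch under stated assumptions: the Deps interface, its method names, handleChat, statusFor, and the BudgetError and ValidationError classes are hypothetical. They only mirror the step order and status codes described above; the real handler checks instanceof BudgetExceededError and ZodError.

```typescript
interface ChatRequest { prompt: string; customerId: string; tenantId: string; }
interface Cached { reply: string; model: string; }
interface ChatResult extends Cached { costUsd: number; cached: boolean; }

// Hypothetical dependency surface; the recipe's modules expose their own functions.
interface Deps {
  cacheGet(tenantId: string, prompt: string): Promise<Cached | undefined>;
  checkBudget(tenantId: string): Promise<void>; // rejects with BudgetError when the tenant is stopped
  route(req: ChatRequest): Promise<Cached & { costUsd: number }>;
  recordSpend(tenantId: string, costUsd: number): Promise<void>;
  logTelemetry(req: ChatRequest, costUsd: number): void; // fire-and-forget
  cacheSet(tenantId: string, prompt: string, value: Cached): Promise<void>;
}

class BudgetError extends Error {
  constructor(message: string) { super(message); this.name = "BudgetError"; }
}
class ValidationError extends Error {
  constructor(message: string) { super(message); this.name = "ValidationError"; }
}

// Maps failures to HTTP status codes (429 budget, 400 validation, 500 otherwise).
function statusFor(err: unknown): number {
  if (err instanceof Error && err.name === "BudgetError") return 429;
  if (err instanceof Error && err.name === "ValidationError") return 400;
  return 500;
}

async function handleChat(req: ChatRequest, deps: Deps): Promise<ChatResult> {
  const hit = await deps.cacheGet(req.tenantId, req.prompt); // 1. semantic cache
  if (hit !== undefined) return { ...hit, costUsd: 0, cached: true };
  await deps.checkBudget(req.tenantId); // 2. budget gate
  const out = await deps.route(req); // 3. model call
  await deps.recordSpend(req.tenantId, out.costUsd); // 4. update budget state
  deps.logTelemetry(req, out.costUsd); // 5. telemetry, deliberately not awaited
  await deps.cacheSet(req.tenantId, req.prompt, { reply: out.reply, model: out.model }); // 6. cache
  return { ...out, cached: false };
}
```

Note that a cache hit returns before the budget check, which is why the second identical request reports costUsd: 0 and still succeeds for a tenant whose budget is nearly exhausted.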
Step 10: Set up instrumentation and Next.js config
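Next.js 16 supports an instrumentation.ts file whose register() function runs once when a server instance starts. Use it to initialize the budget, router, cache, and cost tracker services. Next.js looks for this file next to your app directory: inside src/ only when the app lives in src/app, otherwise at the project root. Since this recipe’s route sits at app/api/chat/route.ts, either place the file at the project root as instrumentation.ts or move app/ under src/ before creating src/instrumentation.ts: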
The NEXT_RUNTIME === "nodejs" guard ensures this only runs in the Node.js runtime, not in Edge runtime where Node-only APIs would fail. All imports use dynamic await import() so the Edge bundler never sees them.
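A minimal sketch of that pattern, built around the register() export Next.js calls from the instrumentation file. The services object and its fields are hypothetical; in the recipe the dynamic imports would load the budget, router, cache, and cost-tracker modules, while here node:crypto stands in as the Node-only dependency.

```typescript
// Hypothetical registry that route handlers could read from.
export const services: { ready: boolean; instanceId?: string } = { ready: false };

// Next.js calls register() once when a server instance starts.
export async function register(): Promise<void> {
  // Skip the Edge runtime, where Node-only modules are unavailable.
  if (process.env.NEXT_RUNTIME !== "nodejs") return;
  if (services.ready) return; // idempotent if called again, e.g. from tests

  // Dynamic import keeps Node-only code out of the Edge bundle. The recipe would
  // load its own modules here instead, e.g. its budget middleware and cache.
  const { randomUUID } = await import("node:crypto");
  services.instanceId = randomUUID();
  services.ready = true;
}
```

The route handler in Step 9 also keeps lazily created singletons, so requests still work if register() has not run; instrumentation simply moves that setup cost to startup.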
Instrumentation has been stable since Next.js 15, so Next.js 16 loads the instrumentation file automatically and no config flag is needed. A plain next.config.ts is enough:
ts
import type { NextConfig } from "next";

const nextConfig: NextConfig = {};

export default nextConfig;
Older guides enable experimental.instrumentationHook: true, often with an as never cast to get past the type checker. That flag was only required before Next.js 15, so leave it out.
Step 11: Create the programmatic entry point
For programmatic reuse outside the HTTP handler, create src/index.ts with an OpenRouterGuardrails class that duplicates the orchestration:
ts
import { generateId, now } from "@reaatech/llm-cost-telemetry";
import { createBudgetController, defineTenantBudget, withBudgetCheck, recordSpend, getBudgetState } from "./lib/budgetMiddleware.js";
import { CostTracker } from "./lib/costTracker.js";
import { createModelRouter, createOpenRouterClient, routeQuery, selectStrategy } from "./lib/modelRouter.js";
import { createCacheEngine, getCached, setCache } from "./lib/cache.js";
import type { ChatRequest, ChatResponse, BudgetState } from "./lib/types.js";
import { loadAppConfig, type AppConfig } from "./config.js";

export type { ChatRequest, ChatResponse, BudgetState, AppConfig };

const config = loadAppConfig();
const openaiClient = createOpenRouterClient
This class is useful when you want to call the guardrails logic from another server-side context — a cron job, a webhook handler, or a different route.
Step 12: Set up the vitest configuration
Before running the tests, create a vitest.config.ts at the project root. This configures vitest with single-threaded execution, v8 coverage with 90% thresholds on all metrics, and scopes coverage tracking to src/** and app/** route files:
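A sketch of such a config, assuming Vitest 1 through 3 option names; pool options have changed across Vitest major releases, so check the documentation for the version you pinned. The coverage include globs are an assumption about what "src/** and app/** route files" means, and the v8 provider needs the @vitest/coverage-v8 package.

```typescript
// Sketch of vitest.config.ts: option names assume Vitest 1-3; include globs are assumptions.
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    environment: "node",
    // Run all test files sequentially in a single worker thread.
    pool: "threads",
    poolOptions: { threads: { singleThread: true } },
    coverage: {
      provider: "v8",
      include: ["src/**/*.ts", "app/**/route.ts"],
      // The run fails if any metric drops below 90%.
      thresholds: { lines: 90, branches: 90, functions: 90, statements: 90 },
    },
  },
});
```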
Expected output: the config file compiles and vitest picks it up automatically.
Step 13: Run the tests
The test suite uses vitest with vi.mock to mock all external packages and MSW to intercept OpenRouter HTTP calls. Run the full suite:
terminal
pnpm typecheck
pnpm lint
pnpm test
Expected output: typecheck passes with zero errors, lint passes with zero warnings, and vitest reports numFailedTests=0 with coverage metrics (lines, branches, functions, statements) all at 90% or above.
Try the API with curl:
terminal
pnpm dev
In another terminal:
terminal
curl -X POST http://localhost:3000/api/chat \
  -H 'Content-Type: application/json' \
  -d '{"prompt":"How do I track my order?","customerId":"cust-1","tenantId":"store-1"}'
Expected output: a JSON response with reply, model, costUsd, cached, inputTokens, outputTokens, and latencyMs. The first call returns cached: false; the second call with the same prompt returns cached: true with costUsd: 0.
Next steps
Add persistent storage — Replace InMemoryAdapter and SpendStore with Redis-backed adapters so budget state and cache survive server restarts.
Extend the model pool — Add more models to the router YAML config (Mistral, Llama 4, DeepSeek) and define custom routing strategies per use case.
Add per-tenant budget dashboards — Expose a GET /api/admin/budgets/:tenantId endpoint that returns the current budget state, spend history, and auto-downgrade events.
Wire up proper alerting — Replace console.warn in event handlers with a notification service (email, Slack) when a tenant hits soft cap or hard stop.