Small businesses using OpenRouter often see unpredictable LLM bills because one expensive model call can blow their monthly budget. Without granular cost tracking and automatic throttling, spend control is reactive at best.
This tutorial builds a cost-aware proxy that sits between your application and OpenRouter. Every chat completion passes through a budget check — if a tenant’s daily cap is at risk, the proxy downgrades to a cheaper fallback model. Each call is recorded as a cost span, aggregated across tenants and models, and pushed to observability backends.
By the end you’ll have a working Next.js App Router project that enforces per-tenant daily/monthly budgets, routes through a fallback chain with circuit breakers, and exports telemetry. The @reaatech/* package family does the heavy lifting — you wire the pieces together.
Prerequisites
Node.js >= 22 — the project uses modern JavaScript features
pnpm 10 — the package manager is pinned in package.json
Open package.json and check that every dependency is pinned to an exact version — no ^ or ~ prefixes.
Expected output: pnpm install resolves all packages and creates pnpm-lock.yaml with no warnings.
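The exact-version check can also be scripted rather than done by eye. The following is a minimal sketch, not part of the project: findUnpinned, PackageManifest, and the sample package names and versions are all hypothetical and exist only for illustration.

```typescript
// Hypothetical helper: report dependencies whose version is not an exact pin.
type DependencyMap = Record<string, string>;

interface PackageManifest {
  dependencies?: DependencyMap;
  devDependencies?: DependencyMap;
}

// An exact pin here means a plain semver version such as "1.2.3" or "1.2.3-beta.1".
// Ranges ("^1.2.3", "~1.2.3"), tags ("latest"), and wildcards do not match.
const EXACT_VERSION = /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$/;

function findUnpinned(manifest: PackageManifest): string[] {
  const all: DependencyMap = {
    ...(manifest.dependencies ?? {}),
    ...(manifest.devDependencies ?? {}),
  };
  return Object.entries(all)
    .filter(([, version]) => !EXACT_VERSION.test(version))
    .map(([name, version]) => `${name}@${version}`)
    .sort();
}

// Made-up manifest for illustration only.
const sampleManifest: PackageManifest = {
  dependencies: { "pkg-a": "1.2.3", "pkg-b": "^2.0.0" },
  devDependencies: { "pkg-c": "~3.1.0", "pkg-d": "4.0.0-beta.1" },
};

// Lists the two range-based entries: pkg-b and pkg-c.
const unpinned = findUnpinned(sampleManifest);
```

In a real project you would pass the parsed contents of package.json instead of the sample manifest and fail the build when the returned list is non-empty.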
Step 2: Set up environment variables
The proxy reads its configuration from environment variables. Create .env.example at the project root:
env
# OpenRouter Cost Control: environment variables
# Keep placeholders only; never commit real values.

# Required: OpenRouter API key (from https://openrouter.ai/keys)
OPENROUTER_API_KEY=<your-openrouter-key>

# Budget defaults (in USD)
DEFAULT_DAILY_BUDGET=100
DEFAULT_MONTHLY_BUDGET=2000

# Optional: Per-tenant budget overrides as JSON
# Format: {"tenant-id":{"daily":200,"monthly":4000}}
TENANT_BUDGETS={"acme-corp":{"daily":200,"monthly":4000},"startup-inc":{"daily":50,"monthly":1000}}

# Model selection
PRIMARY_MODEL=openai/gpt-5.2
FALLBACK_MODEL_CHAIN=openai/gpt-4o-mini,deepseek/deepseek-v4-flash

# Telemetry
OTEL_SERVICE_NAME=openrouter-cost-control
LOKI_HOST=http://loki:3100
Copy it to .env.local and add your real OpenRouter API key:
terminal
cp .env.example .env.local
Expected output: .env.local exists and, once you have edited it, the OPENROUTER_API_KEY placeholder is replaced with your key. The budget defaults and model chain can stay as they are.
Step 3: Define the shared types
Create src/types.ts with the domain types every module references:
PACKAGE_NAME is a constant used across the codebase. ChatCompletionBody mirrors the OpenAI chat completion request shape. ProxyConfig holds the settings the proxy needs to talk to OpenRouter and enforce budgets.
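A minimal sketch of what src/types.ts might contain. The value of PACKAGE_NAME and the four ProxyConfig field names are taken from the config tests later in this tutorial; the ChatCompletionBody fields are a simplified subset of the OpenAI request shape, and isChatCompletionBody is a hypothetical helper added here for illustration. In the project these declarations would be exported.

```typescript
// Value asserted by the config tests in this tutorial.
const PACKAGE_NAME = "openrouter-cost-control";

type ChatRole = "system" | "user" | "assistant" | "tool";

// Simplified: real chat messages can also carry structured content.
interface ChatMessage {
  role: ChatRole;
  content: string;
}

// Assumed subset of the OpenAI chat completion request body.
interface ChatCompletionBody {
  model: string;
  messages: ChatMessage[];
  temperature?: number;
  max_tokens?: number;
  stream?: boolean;
}

// Field names match what the config tests read; budgets are in USD.
interface ProxyConfig {
  primaryModel: string;
  fallbackModels: string[];
  defaultDailyBudget: number;
  defaultMonthlyBudget: number;
}

// Hypothetical runtime guard for incoming JSON bodies.
function isChatCompletionBody(value: unknown): value is ChatCompletionBody {
  if (typeof value !== "object" || value === null) return false;
  const body = value as Record<string, unknown>;
  const messages = body["messages"];
  if (typeof body["model"] !== "string" || !Array.isArray(messages)) return false;
  return messages.every((message: unknown) => {
    if (typeof message !== "object" || message === null) return false;
    const fields = message as Record<string, unknown>;
    return typeof fields["role"] === "string" && typeof fields["content"] === "string";
  });
}

// Example config object built from the values in .env.example.
const exampleConfig: ProxyConfig = {
  primaryModel: "openai/gpt-5.2",
  fallbackModels: ["openai/gpt-4o-mini", "deepseek/deepseek-v4-flash"],
  defaultDailyBudget: 100,
  defaultMonthlyBudget: 2000,
};
```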
Step 4: Build the configuration loader
Create src/lib/config.ts. This module reads environment variables and returns typed config objects:
getEnvVar and getEnvFloat come from @reaatech/llm-cost-telemetry and handle missing values with typed defaults. parseFallbackChain splits a comma-separated string of model IDs into an array. getTenantFromRequest reads the X-Tenant-Id header so you can track spend per customer.
Expected output: Running npx tsx -e "import { parseFallbackChain } from './src/lib/config.js'; console.log(parseFallbackChain('a,b,c'))" prints [ 'a', 'b', 'c' ].
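The behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the project's config.ts: envFloat is a local stand-in for the package's getEnvFloat (whose signature is not shown here), getProxyConfig takes the environment as an argument instead of reading process.env, and HeaderReader is a hypothetical minimal interface that a standard Headers object satisfies. The default values match the ones the config tests expect.

```typescript
type Env = Record<string, string | undefined>;

// Minimal shape of a headers object; the standard Headers class fits it.
interface HeaderReader {
  get(name: string): string | null;
}

// Split a comma-separated model list, trimming whitespace and dropping empties.
function parseFallbackChain(raw: string): string[] {
  return raw
    .split(",")
    .map((id) => id.trim())
    .filter((id) => id.length > 0);
}

// Read the tenant from X-Tenant-Id, falling back to "default".
function getTenantFromRequest(headers: HeaderReader): string {
  const tenant = headers.get("X-Tenant-Id")?.trim();
  return tenant ? tenant : "default";
}

// Local stand-in for getEnvFloat: parse a number or use the fallback.
function envFloat(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number.parseFloat(raw);
  return Number.isFinite(parsed) ? parsed : fallback;
}

function getProxyConfig(env: Env) {
  return {
    primaryModel: env["PRIMARY_MODEL"] ?? "openai/gpt-5.2",
    fallbackModels: parseFallbackChain(env["FALLBACK_MODEL_CHAIN"] ?? ""),
    defaultDailyBudget: envFloat(env, "DEFAULT_DAILY_BUDGET", 100),
    defaultMonthlyBudget: envFloat(env, "DEFAULT_MONTHLY_BUDGET", 2000),
  };
}
```

Passing the environment in as a parameter keeps the functions pure, which makes them easier to test than code that mutates process.env.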
Step 5: Create telemetry helpers
Create src/lib/telemetry.ts — the bridge to the cost telemetry package:
createCostSpan builds a CostSpan with a generated ID and a USD cost computed from total token count. estimateCallCost wraps the calculator’s estimateCost so you can check whether a call fits within budget before forwarding.
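To make the cost calculation concrete, here is a rough sketch of both helpers. It does not use the telemetry package: CostSpanSketch, the flat per-1,000-token rate table, and the model names in it are hypothetical, and the rates are placeholders rather than real prices. Only the tenant and costUsd field names are taken from the code in this tutorial.

```typescript
// Hypothetical span shape; tenant and costUsd mirror fields used by the cost pipeline.
interface CostSpanSketch {
  id: string;
  tenant: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
  timestamp: Date;
}

// Placeholder flat rates in USD per 1,000 tokens. Not real prices.
const RATE_PER_1K_TOKENS: Record<string, number> = {
  "example/large-model": 0.01,
  "example/small-model": 0.001,
};
const DEFAULT_RATE_PER_1K = 0.005;

// Estimate cost from the total token count, as described above.
function estimateCallCost(model: string, totalTokens: number): number {
  const rate = RATE_PER_1K_TOKENS[model] ?? DEFAULT_RATE_PER_1K;
  return (Math.max(0, totalTokens) / 1000) * rate;
}

let spanCounter = 0;

// Build a span with a generated ID and a cost derived from input + output tokens.
function createCostSpan(
  tenant: string,
  model: string,
  inputTokens: number,
  outputTokens: number,
): CostSpanSketch {
  spanCounter += 1;
  return {
    id: `span-${Date.now().toString(36)}-${spanCounter}`,
    tenant,
    model,
    inputTokens,
    outputTokens,
    costUsd: estimateCallCost(model, inputTokens + outputTokens),
    timestamp: new Date(),
  };
}
```

A real calculator would normally price input and output tokens separately; the single flat rate here only mirrors the "total token count" wording above.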
Step 6: Set up budget enforcement
Create src/lib/budget-check.ts. This module wraps @reaatech/agent-budget-engine’s BudgetController:
ts
import { BudgetController } from "@reaatech/agent-budget-engine";
import { SpendStore } from "@reaatech/agent-budget-spend-tracker";
import { BudgetScope, type BudgetCheckResult } from "@reaatech/agent-budget-types";

export function createBudgetController(): BudgetController {
  return new BudgetController({ spendTracker: new SpendStore() });
}

export function defineTenantBudget(
  controller: BudgetController,
  tenantId: string,
  dailyLimit: number,
  monthlyLimit: number,
  autoDowngrade?: Array<{ from: string[]; to: string }>,
): void {
  // Only the daily limit is registered here; monthlyLimit is accepted but not used by this function.
  controller.defineBudget({
    scopeType: BudgetScope.User,
    scopeKey: tenantId,
    limit: dailyLimit,
    policy: {
      softCap: 0.8,
      hardCap: 1.0,
      autoDowngrade: autoDowngrade ?? [],
    },
  });
}

// Ask the controller whether a call with this estimated cost fits the tenant's budget.
export function checkBudget(
  controller: BudgetController,
  tenantId: string,
  estimatedCost: number,
  modelId: string,
  tools?: string[],
): BudgetCheckResult {
  return controller.check({
    scopeType: BudgetScope.User,
    scopeKey: tenantId,
    estimatedCost,
    modelId,
    tools: tools ?? [],
  });
}

// Record the actual spend of a completed call against the tenant's budget.
export function recordSpend(
  controller: BudgetController,
  tenantId: string,
  requestId: string,
  cost: number,
  inputTokens: number,
  outputTokens: number,
  modelId: string,
  provider: string,
): void {
  controller.record({
    requestId,
    scopeType: BudgetScope.User,
    scopeKey: tenantId,
    cost,
    inputTokens,
    outputTokens,
    modelId,
    provider,
    timestamp: new Date(),
  });
}

export function onThresholdBreach(event: { threshold: number; scopeType: string; scopeKey: string }): void {
  console.warn(`Budget threshold breached at ${String(event.threshold * 100)}% for ${event.scopeType}:${event.scopeKey}`);
}

export function onHardStop(event: { spent: number; limit: number; scopeType: string; scopeKey: string }): void {
  console.error(`Hard stop triggered for ${event.scopeType}:${event.scopeKey}: spent ${String(event.spent)} / limit ${String(event.limit)}`);
}

export function onStateChange(event: { from: string; to: string; scopeType: string; scopeKey: string }): void {
  console.log(`Budget state change for ${event.scopeType}:${event.scopeKey}: ${event.from} -> ${event.to}`);
}

export function attachBudgetEvents(controller: BudgetController): void {
  controller.on("threshold-breach", onThresholdBreach);
  controller.on("hard-stop", onHardStop);
  controller.on("state-change", onStateChange);
}
The budget lifecycle uses a state machine: Active → Warned (80%) → Degraded → Stopped (100%). When a check returns suggestedModel, the proxy switches to that cheaper model. attachBudgetEvents subscribes to the controller’s event emitter so you get console logs when thresholds are breached or a budget hard-stops.
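The lifecycle above can be pictured as a function of utilization (spent divided by limit). The sketch below is an illustration only and does not reflect how BudgetController is implemented. The 0.8 and 1.0 thresholds and the autoDowngrade rule shape come from this step; the point at which Degraded begins is not stated in the tutorial, so the degradeAt value of 0.9 is a placeholder assumption, and resolveState and suggestModel are hypothetical names.

```typescript
type BudgetState = "Active" | "Warned" | "Degraded" | "Stopped";

interface BudgetPolicySketch {
  softCap: number; // fraction of the limit, 0.8 in this tutorial
  hardCap: number; // fraction of the limit, 1.0 in this tutorial
  autoDowngrade: Array<{ from: string[]; to: string }>;
}

// Map spend to a lifecycle state.
// ASSUMPTION: degradeAt (0.9) is a placeholder; the tutorial does not say where Degraded starts.
function resolveState(
  spent: number,
  limit: number,
  policy: BudgetPolicySketch,
  degradeAt = 0.9,
): BudgetState {
  if (limit <= 0) return "Stopped";
  const utilization = spent / limit;
  if (utilization >= policy.hardCap) return "Stopped";
  if (utilization >= degradeAt) return "Degraded";
  if (utilization >= policy.softCap) return "Warned";
  return "Active";
}

// Find the cheaper model a downgrade rule points to, if any rule covers modelId.
function suggestModel(policy: BudgetPolicySketch, modelId: string): string | undefined {
  return policy.autoDowngrade.find((rule) => rule.from.includes(modelId))?.to;
}

// Example policy using the model IDs from .env.example.
const examplePolicy: BudgetPolicySketch = {
  softCap: 0.8,
  hardCap: 1.0,
  autoDowngrade: [{ from: ["openai/gpt-5.2"], to: "openai/gpt-4o-mini" }],
};
```

With this policy and a 100 USD daily limit, 79 USD of spend is Active, 80 USD is Warned, and 100 USD is Stopped.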
Step 7: Build the fallback chain
Create src/lib/fallback.ts — a wrapper around @reaatech/llm-router-fallback:
ts
import { FallbackChain, createFallbackChain, FallbackChainExhaustedError } from "@reaatech/llm-router-fallback";
import type { FallbackChainDefinition, ModelDefinition } from "@reaatech/llm-router-core";

export class FallbackExhaustedError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "FallbackExhaustedError";
  }
}

export function buildFallbackChain(name: string, modelIds: string[]): FallbackChain {
  const definition: FallbackChainDefinition = {
    name,
    models: modelIds,
    circuitBreaker: {
      failureThreshold: 3,
      resetTimeoutMs: 60000,
      halfOpenMaxCalls: 2,
    },
  };
  return createFallbackChain(definition);
}

export function registerFallbackModels(chain: FallbackChain, models: ModelDefinition[]): void {
  chain.registerModels(models);
}

export async function executePrimaryOrFallback(
  chain: FallbackChain,
  primaryModelId: string,
  executor: (modelId: string) => Promise<Response>,
  models: ModelDefinition[],
): Promise<{ response: Response; selectedModel: ModelDefinition; isFallback: boolean }> {
  let capturedResponse: Response | undefined;
  try {
    const result = await chain.executeFrom(
      primaryModelId,
      async (model) => {
        capturedResponse = await executor(model.id);
        return capturedResponse;
      },
      models,
    );
    if (!capturedResponse) {
      throw new Error("executor did not return a response");
    }
    return {
      response: capturedResponse,
      selectedModel: result.selectedModel,
      isFallback: result.isFallback,
    };
  } catch (err) {
    if (err instanceof FallbackChainExhaustedError) {
      throw new FallbackExhaustedError("all fallback models exhausted");
    }
    throw err;
  }
}
buildFallbackChain creates an ordered fallback chain with a circuit breaker per model. After 3 failures, that model’s breaker opens for 60 seconds. executePrimaryOrFallback tries the primary model first and walks the chain automatically on failure.
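For readers unfamiliar with circuit breakers, the sketch below shows the general pattern that the failureThreshold and resetTimeoutMs settings describe. It is a simplified, synchronous illustration and not the implementation inside @reaatech/llm-router-fallback: CircuitBreakerSketch and runWithFallback are hypothetical names, the real executor is asynchronous, and halfOpenMaxCalls is not modelled.

```typescript
interface BreakerOptions {
  failureThreshold: number;
  resetTimeoutMs: number;
}

class CircuitBreakerSketch {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly options: BreakerOptions;
  private readonly now: () => number;

  // The clock is injectable so the timeout can be exercised without waiting.
  constructor(options: BreakerOptions, now: () => number = Date.now) {
    this.options = options;
    this.now = now;
  }

  canAttempt(): boolean {
    if (this.openedAt === null) return true; // closed: calls flow normally
    // Open: block calls until the reset timeout has elapsed, then allow a trial call.
    return this.now() - this.openedAt >= this.options.resetTimeoutMs;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure(): void {
    this.failures += 1;
    if (this.failures >= this.options.failureThreshold) {
      this.openedAt = this.now();
    }
  }
}

// Try each model in order, skipping models whose breaker is open.
function runWithFallback<T>(
  modelIds: string[],
  breakers: Map<string, CircuitBreakerSketch>,
  attempt: (modelId: string) => T,
): { modelId: string; result: T; isFallback: boolean } {
  for (let index = 0; index < modelIds.length; index += 1) {
    const modelId = modelIds[index] as string;
    const breaker = breakers.get(modelId);
    if (breaker && !breaker.canAttempt()) continue;
    try {
      const result = attempt(modelId);
      breaker?.recordSuccess();
      return { modelId, result, isFallback: index > 0 };
    } catch {
      breaker?.recordFailure();
    }
  }
  throw new Error("all fallback models exhausted");
}
```

The benefit of the breaker is that a model that keeps failing stops being called at all for the duration of the timeout, so requests go straight to the next model instead of paying for a doomed attempt each time.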
Step 8: Create the cost aggregation pipeline
Create src/lib/cost-sink.ts — the aggregation and export side:
ts
import {
  CostCollector,
  CostAggregator,
  BudgetManager,
} from "@reaatech/llm-cost-telemetry-aggregation";
import {
  PhoenixExporter,
  type ExportResult,
} from "@reaatech/llm-cost-telemetry-exporters";
import type { CostSpan, CostRecord } from "@reaatech/llm-cost-telemetry";

interface CostExporterLike {
  isEnabled: boolean;
  exportSpans(spans: CostSpan[]): Promise<ExportResult>;
  exportRecords(records: CostRecord[]): Promise<ExportResult>;
}

export function createCostPipeline(tenants: Record<string, { daily: number; monthly: number }>): {
  collector: CostCollector;
  aggregator: CostAggregator;
  budgetManager: BudgetManager;
} {
  const aggregator = new CostAggregator({
    dimensions: ["tenant", "feature", "provider", "model"],
    timeWindows: ["hour", "day", "month"],
  });
  const budgetManager = new BudgetManager({ tenants });
  const collector = new CostCollector({
    maxBufferSize: 1000,
    flushIntervalMs: 60000,
    onFlush: (spans: CostSpan[]) => {
      for (const span of spans) {
        aggregator.add(span);
        void budgetManager.record({ tenant: span.tenant ?? "unknown", cost: span.costUsd });
      }
    },
  });
  return { collector, aggregator, budgetManager };
}

export function createPhoenixExporter(host: string): PhoenixExporter {
  return new PhoenixExporter({
    host,
    defaultLabels: { service: "openrouter-cost-control" },
  });
}

export async function pushCostData(
  exporters: CostExporterLike[],
  spans: CostSpan[],
  records: CostRecord[],
): Promise<void> {
  for (const exporter of exporters) {
    if (exporter.isEnabled) {
      await exporter.exportSpans(spans);
      await exporter.exportRecords(records);
    }
  }
}
CostCollector buffers spans and flushes every 60 seconds or when the buffer hits 1,000 entries. On flush, spans are added to the CostAggregator (grouped by tenant, feature, provider, model) and recorded in the BudgetManager for running totals.
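The buffer-then-aggregate pattern described above can be sketched in a few lines. This is a simplified stand-in and not the package's CostCollector or CostAggregator: BufferedAggregatorSketch and SpanLike are hypothetical, only two grouping dimensions (tenant and model) are kept, and the 60-second timer flush is left out so the example stays deterministic.

```typescript
// Minimal span shape; tenant and costUsd mirror the fields used in onFlush above.
interface SpanLike {
  tenant?: string;
  model: string;
  costUsd: number;
}

class BufferedAggregatorSketch {
  private buffer: SpanLike[] = [];
  private readonly totals = new Map<string, number>();
  private readonly maxBufferSize: number;

  constructor(maxBufferSize: number) {
    this.maxBufferSize = maxBufferSize;
  }

  // Buffer a span; flush automatically once the buffer is full.
  collect(span: SpanLike): void {
    this.buffer.push(span);
    if (this.buffer.length >= this.maxBufferSize) {
      this.flush();
    }
  }

  // Fold buffered spans into running totals keyed by tenant and model.
  flush(): void {
    for (const span of this.buffer) {
      const key = `${span.tenant ?? "unknown"}|${span.model}`;
      this.totals.set(key, (this.totals.get(key) ?? 0) + span.costUsd);
    }
    this.buffer = [];
  }

  total(tenant: string, model: string): number {
    return this.totals.get(`${tenant}|${model}`) ?? 0;
  }

  pending(): number {
    return this.buffer.length;
  }
}
```

One consequence of buffering worth keeping in mind: totals lag behind real spend by up to one flush interval, so spans still sitting in the buffer are not yet visible to anything that reads the aggregated numbers.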
Step 9: Wire up the proxy service
Create src/services/proxy-service.ts — the orchestrator that ties everything together:
ts
import OpenAI from "openai";
import { BudgetController } from "@reaatech/agent-budget-engine";
import { FallbackChain, FallbackChainExhaustedError } from "@reaatech/llm-router-fallback";
import {
  CostCollector,
  CostAggregator,
  BudgetManager,
} from "@reaatech/llm-cost-telemetry-aggregation";
import { BaseExporter } from "@reaatech/llm-cost-telemetry-exporters";
import { NextRequest, NextResponse } from "next/server";
import type { ProxyConfig, ChatCompletionBody } from "../types.js";
import { getTenantFromRequest } from "../lib/config.js";
import { createCostSpan, estimateCallCost } from "../lib/telemetry.js";
The handleChatCompletion method: parses and validates the JSON body, extracts the tenant from the X-Tenant-Id header (defaults to "default"), estimates the cost, checks the budget, and either returns 429 when the budget is exceeded or forwards the request through the fallback chain. On success it records a cost span and reports spend. The forwardToOpenRouter private method sends the request to OpenRouter’s API.
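The core decision in that flow (reject with 429, forward as requested, or forward to a cheaper model) can be isolated as a pure function. The sketch below is an assumption-laden illustration rather than the service's code: BudgetCheckSketch is a hypothetical stand-in for BudgetCheckResult, whose only field confirmed by this tutorial is suggestedModel, so the allowed flag is assumed; decide and ProxyDecision are hypothetical names.

```typescript
// Hypothetical stand-in for the budget check result; "allowed" is an assumed field.
interface BudgetCheckSketch {
  allowed: boolean;
  suggestedModel?: string;
}

type ProxyDecision =
  | { action: "reject"; status: 429; reason: string }
  | { action: "forward"; model: string; downgraded: boolean };

// Turn a budget check into the proxy's next step.
function decide(check: BudgetCheckSketch, requestedModel: string): ProxyDecision {
  if (!check.allowed) {
    return { action: "reject", status: 429, reason: "budget exceeded" };
  }
  // Prefer the cheaper model when the budget check suggests one.
  const model = check.suggestedModel ?? requestedModel;
  return { action: "forward", model, downgraded: model !== requestedModel };
}

// Example: a tenant near its cap gets downgraded instead of rejected.
const nearCap = decide(
  { allowed: true, suggestedModel: "openai/gpt-4o-mini" },
  "openai/gpt-5.2",
);
```

Keeping this decision free of I/O means the 429 path and the downgrade path can be unit tested without mocking OpenRouter.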
Step 10: Bootstrap the proxy factory
Create src/index.ts — the factory function that wires all components together:
This factory reads the configuration, sets up the budget controller with per-tenant limits (parsing the TENANT_BUDGETS JSON), builds the fallback chain, creates the cost pipeline, optionally creates the Phoenix exporter when LOKI_HOST is set, and constructs the OpenAI client pointed at OpenRouter’s API.
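Parsing TENANT_BUDGETS is the step most likely to fail at runtime, since the value is hand-written JSON in an environment variable. Here is a minimal sketch of defensive parsing, assuming the {"tenant-id":{"daily":...,"monthly":...}} format shown in .env.example. The names parseTenantBudgets and budgetFor are hypothetical, and ignoring malformed input (instead of throwing) is a design choice made for this illustration.

```typescript
interface TenantBudget {
  daily: number;   // USD
  monthly: number; // USD
}

// Parse the TENANT_BUDGETS JSON; malformed input or entries are ignored.
function parseTenantBudgets(raw: string | undefined): Record<string, TenantBudget> {
  if (raw === undefined || raw.trim() === "") return {};
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return {};
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) return {};

  const budgets: Record<string, TenantBudget> = {};
  for (const [tenantId, value] of Object.entries(parsed as Record<string, unknown>)) {
    if (typeof value !== "object" || value === null) continue;
    const { daily, monthly } = value as Record<string, unknown>;
    if (typeof daily === "number" && typeof monthly === "number" && daily >= 0 && monthly >= 0) {
      budgets[tenantId] = { daily, monthly };
    }
  }
  return budgets;
}

// Use the tenant's override when present, otherwise the defaults.
function budgetFor(
  tenantId: string,
  overrides: Record<string, TenantBudget>,
  defaults: TenantBudget,
): TenantBudget {
  return overrides[tenantId] ?? defaults;
}

// The sample value from .env.example.
const overrides = parseTenantBudgets(
  '{"acme-corp":{"daily":200,"monthly":4000},"startup-inc":{"daily":50,"monthly":1000}}',
);
```

With the defaults from .env.example, budgetFor("acme-corp", overrides, { daily: 100, monthly: 2000 }) yields the 200/4000 override, while an unlisted tenant gets 100/2000. If silent fallback is too lenient for your deployment, throwing on invalid JSON at startup is the stricter alternative.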
Step 11: Wire up Next.js route handlers
Create three API routes under app/api/.
Health check — app/api/health/route.ts:
ts
import { NextResponse } from "next/server";

export function GET(): NextResponse {
  return NextResponse.json({ status: "ok", timestamp: new Date().toISOString() });
}
In Next.js 16, the params argument of a dynamic route handler is a Promise and must be awaited before use. Use NextRequest for typed request objects, and return NextResponse.json(...) so that the Content-Type: application/json header is set.
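To illustrate the awaited params pattern, here is a rough sketch of a dynamic route handler. It assumes a route file at app/api/usage/[tenant]/route.ts, which this tutorial mentions but does not show. To stay runnable on its own it uses the standard Request and Response types instead of NextRequest and NextResponse, and usageByTenant is a hypothetical in-memory table with made-up numbers.

```typescript
// Shape of the second handler argument for a [tenant] dynamic segment.
type UsageRouteContext = { params: Promise<{ tenant: string }> };

// Made-up usage data for illustration; a real handler would query the cost pipeline.
const usageByTenant: Record<string, { dailyUsd: number; monthlyUsd: number }> = {
  "acme-corp": { dailyUsd: 12.5, monthlyUsd: 240 },
};

async function GET(_request: Request, context: UsageRouteContext): Promise<Response> {
  // params is a Promise, so it has to be awaited before reading the segment.
  const { tenant } = await context.params;
  const usage = usageByTenant[tenant];
  if (!usage) {
    return Response.json({ error: `unknown tenant: ${tenant}` }, { status: 404 });
  }
  return Response.json({ tenant, ...usage });
}
```

Inside the Next.js project you would export GET and swap Response.json for NextResponse.json; the awaiting of params stays the same.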
Step 12: Write and run the tests
Create a test for the health endpoint at tests/app/health-route.test.ts. Import route handlers with .js extensions per the NodeNext module resolution:
ts
import { describe, it, expect } from "vitest";
import { GET } from "../../app/api/health/route.js";

describe("GET /api/health", () => {
  it('returns 200 with { status: "ok", timestamp: <iso-string> }', async () => {
    const res = GET();
    expect(res.status).toBe(200);
    const data = await res.json() as Record<string, unknown>;
    expect(data.status).toBe("ok");
    expect(data.timestamp).toBeDefined();
    expect(typeof data.timestamp).toBe("string");
    // new Date(...) never throws on bad input, so check that the string parses to a valid date.
    expect(Number.isNaN(Date.parse(data.timestamp as string))).toBe(false);
  });

  it("response has Content-Type: application/json header", () => {
    const res = GET();
    expect(res.headers.get("Content-Type")).toBe("application/json");
  });
});
Create a test for the config module at tests/lib/config.test.ts:
ts
import { describe, it, expect, afterEach } from "vitest";
import { PACKAGE_NAME } from "../../src/types.js";
import {
  getProxyConfig,
  parseFallbackChain,
  getTenantFromRequest,
  loadAppConfig,
} from "../../src/lib/config.js";

const OLD_ENV = { ...process.env };

// Restore the environment after each test so cases do not leak into each other.
afterEach(() => {
  process.env = { ...OLD_ENV };
});

describe("getProxyConfig", () => {
  it("returns primaryModel from env when PRIMARY_MODEL is set", () => {
    process.env["PRIMARY_MODEL"] = "openai/gpt-5.2";
    process.env["FALLBACK_MODEL_CHAIN"] = "openai/gpt-4,anthropic/claude-3";
    process.env["DEFAULT_DAILY_BUDGET"] = "50";
    process.env["DEFAULT_MONTHLY_BUDGET"] = "1000";
    const config = getProxyConfig();
    expect(config.primaryModel).toBe("openai/gpt-5.2");
    expect(config.fallbackModels).toEqual(["openai/gpt-4", "anthropic/claude-3"]);
    expect(config.defaultDailyBudget).toBe(50);
    expect(config.defaultMonthlyBudget).toBe(1000);
  });

  it("returns defaults when env vars are missing", () => {
    delete process.env["PRIMARY_MODEL"];
    delete process.env["FALLBACK_MODEL_CHAIN"];
    // Also clear the budget variables so the default assertions do not depend on the host environment.
    delete process.env["DEFAULT_DAILY_BUDGET"];
    delete process.env["DEFAULT_MONTHLY_BUDGET"];
    const config = getProxyConfig();
    expect(config.primaryModel).toBe("openai/gpt-5.2");
    expect(config.fallbackModels).toEqual([]);
    expect(config.defaultDailyBudget).toBe(100);
    expect(config.defaultMonthlyBudget).toBe(2000);
  });
});

describe("types", () => {
  it("PACKAGE_NAME is defined", () => {
    expect(PACKAGE_NAME).toBe("openrouter-cost-control");
  });
});

describe("parseFallbackChain", () => {
  it('returns ["a","b","c"] for "a,b,c"', () => {
    expect(parseFallbackChain("a,b,c")).toEqual(["a", "b", "c"]);
  });

  it('returns [] for ""', () => {
    expect(parseFallbackChain("")).toEqual([]);
  });

  it("trims whitespace around model names", () => {
    expect(parseFallbackChain(" a , b , c ")).toEqual(["a", "b", "c"]);
  });
});

describe("getTenantFromRequest", () => {
  it("extracts X-Tenant-Id header", () => {
    const headers = new Headers({ "X-Tenant-Id": "acme" });
    expect(getTenantFromRequest(headers)).toBe("acme");
  });

  it('returns "default" with missing header', () => {
    const headers = new Headers();
    expect(getTenantFromRequest(headers)).toBe("default");
  });
});

describe("loadAppConfig", () => {
  it("returns proxy and budget sections", () => {
    const appConfig = loadAppConfig();
    expect(appConfig.proxy).toBeDefined();
    expect(appConfig.budget).toBeDefined();
    expect(appConfig.proxy.primaryModel).toBeDefined();
  });
});
Now run the suite:
terminal
pnpm run typecheck
pnpm run lint
pnpm vitest run --coverage --reporter=json --outputFile=vitest-report.json
Expected output: TypeScript compiles with zero errors, ESLint reports zero issues, and the test runner shows all tests passing with coverage above 90% on src/**/*.ts and app/**/route.ts.
Next steps
Add CloudWatch export — pull in @aws-sdk/client-cloudwatch and configure CloudWatchExporter to push cost metrics to AWS
Build a dashboard — create a page under app/dashboard/ that reads from GET /api/usage/:tenant and visualizes daily/monthly spend with charts
Add streaming support — modify forwardToOpenRouter to stream token-by-token while still recording cost after the stream completes
Webhook alerts — subscribe to the threshold-breach event on BudgetController and POST to Slack or PagerDuty when a tenant hits 80% utilization
Multi-region routing — extend the fallback chain with geographic latency-aware routing for globally distributed tenants
import { checkBudget, recordSpend } from "../lib/budget-check.js";
import { executePrimaryOrFallback, FallbackExhaustedError } from "../lib/fallback.js";
import type { ModelDefinition } from "@reaatech/llm-router-core";