Small businesses using OpenAI for customer-facing agents watch bills climb from repeated questions and no way to cap monthly spend. They need automatic cost tracking, caching of common answers, and hard budget limits.
This recipe builds a complete cost-control system around every OpenAI call. It tracks token costs automatically with @reaatech/llm-cost-telemetry, enforces monthly budgets with hard-stop limits using @reaatech/agent-budget-engine, and serves cached responses for semantically similar prompts through @reaatech/llm-cache backed by Redis, which cuts redundant API spending. Four Next.js API routes expose budget state, cost aggregation, usage history, and cache health.
Prerequisites
Node.js 22+ and pnpm 10+ installed
A Next.js project scaffolded with the App Router — or create one with npx create-next-app@latest
OpenAI API key — set as OPENAI_API_KEY
A running Redis instance — set as REDIS_URL (defaults to redis://localhost:6379)
A running PostgreSQL database — set as DATABASE_URL
A Langfuse account (optional) — set as LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY
Familiarity with TypeScript, Next.js App Router, and environment variable management
Step 1: Create the project and install dependencies
Scaffold a fresh Next.js project with the App Router, then install all the cost-control packages in one go.
terminal
npx create-next-app@latest openai-cost-control --typescript --app --src-dir
cd openai-cost-control
Add the REAA foundation packages for cost telemetry, budget enforcement, caching, and aggregation:
Now open your package.json and verify every dependency is pinned to an exact version with no ^ or ~ prefix. Your dependencies block should look like this:
Expected output: pnpm install completes without errors, and pnpm typecheck reports zero type errors.
Step 2: Configure environment variables
Create a .env.example file with placeholder values for every environment variable the application reads:
env
# Env vars used by openai-cost-control-for-smb-agent-workflows.
# Keep placeholders only — never commit real values.
NODE_ENV=development
OPENAI_API_KEY=<your-openai-key>
REDIS_URL=redis://localhost:6379
DATABASE_URL=postgres://localhost:5432/cost_control
LANGFUSE_PUBLIC_KEY=<your-langfuse-public-key>
LANGFUSE_SECRET_KEY=<your-langfuse-secret-key>
LANGFUSE_HOST=https://cloud.langfuse.com
DEFAULT_DAILY_BUDGET=100
DEFAULT_MONTHLY_BUDGET=2000
Copy it to .env.local and fill in your real values.
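The two budget variables arrive as strings, so the application has to parse them and fall back to defaults when they are missing or malformed. The following is a minimal sketch of that parsing, assuming the fallbacks of 100 and 2000 shown in the example file. The names `readDefaultBudgets`, `parsePositive`, and `DefaultBudgets` are hypothetical and not part of the recipe's code.

```typescript
// Hypothetical helper: names and shape are assumptions for illustration.
interface DefaultBudgets {
  daily: number;
  monthly: number;
}

type Env = Record<string, string | undefined>;

// Parse a positive number, falling back when the value is missing or invalid.
function parsePositive(raw: string | undefined, fallback: number): number {
  const value = Number(raw);
  // Number(undefined) is NaN and Number("") is 0, so both use the fallback.
  return Number.isFinite(value) && value > 0 ? value : fallback;
}

function readDefaultBudgets(env: Env): DefaultBudgets {
  return {
    daily: parsePositive(env["DEFAULT_DAILY_BUDGET"], 100),
    monthly: parsePositive(env["DEFAULT_MONTHLY_BUDGET"], 2000),
  };
}
```

In a Node.js process you would call `readDefaultBudgets(process.env)`. Taking the environment as a parameter keeps the helper easy to test.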
Step 3: Set up the database connection and schema
Create a database client module that instantiates Postgres via the postgres library and wraps it with Drizzle ORM.
Create src/lib/db.ts:
ts
import postgres from "postgres";
import { drizzle } from "drizzle-orm/postgres-js";

const databaseUrl = process.env.DATABASE_URL ?? "";
const client = postgres(databaseUrl);

export const db = drizzle(client);
export { client as sql };
Now define the database schema with three tables: tenants for per-tenant budget configuration, budget_alerts for threshold-based alert rules, and usage_log for recording every LLM call’s token and cost data.
ts
export { db, sql } from "../db.js";
export { tenants, budgetAlerts, usageLog } from "./schema.js";
Expected output: pnpm typecheck passes. The schema compiles without type errors.
Step 4: Define shared types
Create the types that flow through every layer of the pipeline. These interfaces describe how OpenAI requests and responses look, how budget checks are returned, and how tenant budget configurations are structured.
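The budget-related types can be inferred from how Step 6 uses them. The sketch below is a reconstruction under that assumption, not the recipe's actual cost-types file. The field types are guesses, and treating `softCap` and `hardCap` as fractions of the monthly budget is an assumption.

```typescript
// Inferred from how budget-middleware.ts reads and returns these values.
// The real cost-types file may define more fields or different types.
type BudgetAction = "Allow" | "Warn" | "Block";

interface BudgetCheckResult {
  allowed: boolean;
  action: BudgetAction;
  suggestedModel: string | null;
  disabledTools: string[];
  remaining: number;
}

interface TenantBudgetConfig {
  monthlyBudget: number;
  softCap: number; // assumed: fraction of the budget that triggers a warning
  hardCap: number; // assumed: fraction of the budget that blocks requests
  autoDowngrade: boolean;
  disabledTools: string[];
}

// Example values only; tune these per tenant.
const exampleConfig: TenantBudgetConfig = {
  monthlyBudget: 2000,
  softCap: 0.8,
  hardCap: 1,
  autoDowngrade: true,
  disabledTools: [],
};

const exampleBlocked: BudgetCheckResult = {
  allowed: false,
  action: "Block",
  suggestedModel: null,
  disabledTools: [],
  remaining: 0,
};
```

When placing these in a shared module, add `export` to each declaration so the middleware and orchestrator can import them.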
Step 5: Build the OpenAI wrapper with cost telemetry
This wrapper instantiates the OpenAI client, calls the Responses API (client.responses.create), wraps every response in a validated CostSpan using @reaatech/llm-cost-telemetry, and emits the span to the CostCollector for aggregation. It also catches API errors and connection failures, wrapping them in a custom ApplicationError.
Expected output: The calculateCostFromTokens function computes (tokens / 1,000,000) * pricePerMillion for each cost bucket, and every span passes CostSpanSchema.parse() validation before being emitted.
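The per-bucket formula above can be sketched in a few lines. This is an illustrative stand-in, not the wrapper's actual `calculateCostFromTokens`. The names `estimateCostUsd`, `bucketCost`, `ModelPricing`, and `TokenUsage` are hypothetical, and the prices are placeholders rather than real OpenAI rates.

```typescript
// Hypothetical types for illustration; real per-model prices come from
// your provider's pricing page.
interface ModelPricing {
  inputPerMillion: number; // USD per 1,000,000 input tokens
  outputPerMillion: number; // USD per 1,000,000 output tokens
}

interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

interface CostBreakdown {
  inputCostUsd: number;
  outputCostUsd: number;
  totalCostUsd: number;
}

// (tokens / 1,000,000) * pricePerMillion, applied to one cost bucket.
function bucketCost(tokens: number, pricePerMillion: number): number {
  return (tokens / 1_000_000) * pricePerMillion;
}

function estimateCostUsd(usage: TokenUsage, pricing: ModelPricing): CostBreakdown {
  const inputCostUsd = bucketCost(usage.inputTokens, pricing.inputPerMillion);
  const outputCostUsd = bucketCost(usage.outputTokens, pricing.outputPerMillion);
  return { inputCostUsd, outputCostUsd, totalCostUsd: inputCostUsd + outputCostUsd };
}

// Placeholder prices, not real OpenAI rates.
const examplePricing: ModelPricing = { inputPerMillion: 2, outputPerMillion: 8 };
const exampleCost = estimateCostUsd({ inputTokens: 1500, outputTokens: 500 }, examplePricing);
// exampleCost.totalCostUsd is approximately 0.007 (0.003 input + 0.004 output).
```

Because the result is a floating-point sum, compare costs with a small tolerance rather than strict equality when writing tests.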
Step 6: Build the budget middleware
The budget middleware integrates @reaatech/agent-budget-engine into your application. It creates a SpendStore and a BudgetController, defines per-tenant budgets with soft and hard caps, performs pre-flight checks before each LLM call, and records spend after calls complete. It also subscribes to budget events for logging.
Create src/lib/budget-middleware.ts:
ts
import {
  BudgetController,
  PolicyEvaluator,
  DowngradeEngine,
  ToolFilter,
} from "@reaatech/agent-budget-engine";
import { SpendStore } from "@reaatech/agent-budget-spend-tracker";
import type { BudgetScope } from "@reaatech/agent-budget-types";
import type { TenantBudgetConfig, BudgetCheckResult } from "./cost-types.js";

const store = new SpendStore();
const controller = new BudgetController({ spendTracker: store });
const policyEvaluator = new PolicyEvaluator();
const downgradeEngine = new DowngradeEngine();
const toolFilter = new ToolFilter();

controller.on("threshold-breach", (event) => {
  const evt = event as { threshold: number };
  console.warn("Budget at " + String(evt.threshold * 100) + "% for scope");
});

controller.on("hard-stop", (event) => {
  const evt = event as { spent: number };
  console.error("Budget exhausted for scope", evt.spent);
});

controller.on("budget-reset", (event) => {
  console.info("Budget reset for scope", event);
});

void policyEvaluator;
void downgradeEngine;
void toolFilter;

export function loadBudgetsForTenant(
  tenantId: string,
  config: TenantBudgetConfig,
): void {
  controller.defineBudget({
    scopeType: "user" as BudgetScope,
    scopeKey: tenantId,
    limit: config.monthlyBudget,
    policy: {
      softCap: config.softCap,
      hardCap: config.hardCap,
      autoDowngrade: config.autoDowngrade
        ? [{ from: ["gpt-5.2"], to: "gpt-5.2-mini" }]
        : [],
      disableTools: config.disabledTools,
    },
  });
}

export function budgetGuard(
  tenantId: string,
  estimatedCost: number,
  modelId: string,
  tools?: string[],
): BudgetCheckResult {
  const r = controller.check({
    scopeType: "user" as BudgetScope,
    scopeKey: tenantId,
    estimatedCost,
    modelId,
    tools: tools ?? [],
  }) as Partial<{
    allowed: boolean;
    action: string;
    suggestedModel: string | null;
    disabledTools: string[];
    remaining: number;
  }>;
  const action = r.action as string;
  return {
    allowed: r.allowed ?? true,
    action: action === "hard-stop" ? "Block" : action === "allow" ? "Allow" : "Warn",
    suggestedModel: r.suggestedModel ?? null,
    disabledTools: r.disabledTools ?? [],
    remaining: r.remaining ?? 0,
  };
}

export function recordSpend(entry: {
  requestId: string;
  tenantId: string;
  cost: number;
  inputTokens: number;
  outputTokens: number;
  modelId: string;
  provider: string;
}): void {
  controller.record({
    requestId: entry.requestId,
    scopeType: "user" as BudgetScope,
    scopeKey: entry.tenantId,
    cost: entry.cost,
    inputTokens: entry.inputTokens,
    outputTokens: entry.outputTokens,
    modelId: entry.modelId,
    provider: entry.provider,
    timestamp: new Date(),
  });
}

export function getBudgetState(tenantId: string) {
  return controller.getState("user" as BudgetScope, tenantId);
}

export function getDisabledToolsForTenant(tenantId: string): string[] {
  return controller.getDisabledTools("user" as BudgetScope, tenantId);
}
Expected output: The budgetGuard function maps the engine’s internal action values ("hard-stop", "allow") to the public enum ("Block", "Allow", "Warn"). On a budget exceedance, allowed is false and the action is "Block".
Step 7: Build the cache layer with Redis and semantic matching
The cache layer uses @reaatech/llm-cache with a RedisAdapter for exact-match storage and an InMemoryAdapter for vector-based semantic matching. The OpenAIEmbedder generates embeddings using text-embedding-3-small at 1536 dimensions. A similarity threshold of 0.85 controls when a semantic match qualifies as a cache hit.
Create src/lib/cache-layer.ts:
ts
import {
  CacheEngine,
  InMemoryAdapter,
  OpenAIEmbedder,
  buildPromptHash,
  buildCacheFingerprint,
} from "@reaatech/llm-cache";
import { RedisAdapter } from "@reaatech/llm-cache-adapters-redis";
import { createClient } from "redis";
import { generateId, now, type CostSpan, type TelemetryContext } from "@reaatech/llm-cost-telemetry";
import type { OpenAIRequest } from "./cost-types.js";
import { makeOpenAICall } from "./wrap-openai.js";

const redisUrl = process.env.REDIS_URL ?? "redis://localhost:6379";
const storage = new
Expected output: On a cache hit, cachedCompletion returns hit: true with costUsd: 0. On a miss, it calls makeOpenAICall, stores the response with cache.set, and returns the live result.
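To make the 0.85 similarity threshold concrete, here is a minimal sketch of threshold-based semantic matching. It assumes cosine similarity over embedding vectors, which is a common choice but is not confirmed as the metric @reaatech/llm-cache uses. The names `cosineSimilarity`, `findSemanticHit`, and `CachedEntry` are hypothetical.

```typescript
// Assumption: similarity is cosine similarity between embedding vectors.
const SIMILARITY_THRESHOLD = 0.85;

interface CachedEntry {
  embedding: number[];
  response: string;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("Vectors must be non-empty and the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as matching nothing.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the most similar cached entry at or above the threshold, else null.
function findSemanticHit(
  query: number[],
  entries: CachedEntry[],
  threshold: number = SIMILARITY_THRESHOLD,
): CachedEntry | null {
  let best: CachedEntry | null = null;
  let bestScore = threshold;
  for (const entry of entries) {
    const score = cosineSimilarity(query, entry.embedding);
    if (score >= bestScore) {
      best = entry;
      bestScore = score;
    }
  }
  return best;
}
```

Raising the threshold reduces the chance of serving an answer to a different question, at the cost of fewer cache hits. Lowering it does the opposite.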
Step 8: Build the cost aggregation layer
The aggregation layer uses @reaatech/llm-cost-telemetry-aggregation to buffer, flush, and aggregate cost spans across tenants, features, providers, and models. It also manages global daily and monthly budgets with cascading alert thresholds at 50%, 75%, and 90% utilization.
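One way to read the cascading thresholds is that each level fires once, at the moment spend first crosses it. The sketch below illustrates that interpretation. It is an assumption about the behavior, not the aggregation package's implementation, and `newlyCrossedThresholds` is a hypothetical name.

```typescript
// Alert levels from this step, as fractions of the budget limit.
const ALERT_THRESHOLDS = [0.5, 0.75, 0.9] as const;

// Return the thresholds crossed when spend moves from previousSpend to
// currentSpend. A threshold already passed before this update is not repeated.
function newlyCrossedThresholds(
  previousSpend: number,
  currentSpend: number,
  limit: number,
): number[] {
  if (limit <= 0) return [];
  const before = previousSpend / limit;
  const after = currentSpend / limit;
  return ALERT_THRESHOLDS.filter((t) => before < t && after >= t);
}
```

A single large request can cross more than one level at once, in which case the function returns all of them in ascending order.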
Expected output: The flushHandler callback is wired into the CostCollector via the onFlush constructor option. Each time the collector buffers 1,000 spans or 60 seconds pass, the handler moves spans into the aggregator and budget manager.
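The size-or-age flush rule can be illustrated with a small buffer. This is a simplified sketch, not the CostCollector implementation: it checks the age only when a new item arrives, whereas a real collector would more likely use a timer. The `SpanBuffer` class and its injected clock are hypothetical.

```typescript
// Simplified stand-in for a buffering collector. Defaults mirror the
// 1,000-span and 60-second limits described in this step.
class SpanBuffer<T> {
  private items: T[] = [];
  private lastFlush: number;
  private readonly onFlush: (batch: T[]) => void;
  private readonly maxSize: number;
  private readonly maxAgeMs: number;
  private readonly clock: () => number;

  constructor(
    onFlush: (batch: T[]) => void,
    maxSize: number = 1000,
    maxAgeMs: number = 60_000,
    clock: () => number = Date.now,
  ) {
    this.onFlush = onFlush;
    this.maxSize = maxSize;
    this.maxAgeMs = maxAgeMs;
    this.clock = clock;
    this.lastFlush = clock();
  }

  // Buffer one item, then flush if the buffer is full or old enough.
  add(item: T): void {
    this.items.push(item);
    const tooOld = this.clock() - this.lastFlush >= this.maxAgeMs;
    if (this.items.length >= this.maxSize || tooOld) {
      this.flush();
    }
  }

  // Hand the current batch to the handler and start a new window.
  flush(): void {
    const batch = this.items;
    this.items = [];
    this.lastFlush = this.clock();
    if (batch.length > 0) {
      this.onFlush(batch);
    }
  }
}
```

Injecting the clock lets tests advance time deterministically instead of waiting for real seconds to pass.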
Step 9: Add Langfuse observability
The Langfuse integration creates traces for every OpenAI completion and budget event, providing OTel-compliant observability into your cost-control pipeline.
Step 10: Build the settings store (Drizzle-backed CRUD)
The settings store is the database access layer for tenant budget config and usage logs. It reads and writes from the tenants and usage_log tables using Drizzle ORM queries.
Step 11: Build the orchestrator — the unified pipeline
The orchestrator ties every subsystem together into a single processLLMRequest function. The flow is:
Load the tenant’s budget config from the database (or use default env-var values).
Run a budget guard check — if the estimated cost exceeds remaining budget, return immediately.
Try cachedCompletion() — on cache hit, return the cached response.
On cache miss, record the spend, log usage to the database, and trace the completion to Langfuse.
If the OpenAI call itself fails (catch block), log usage with costUsd: 0 so the failure is recorded.
Create src/lib/orchestrator.ts:
ts
import { cachedCompletion } from "./cache-layer.js";
import { budgetGuard, recordSpend, loadBudgetsForTenant, getBudgetState } from "./budget-middleware.js";
import { traceCompletion } from "./langfuse.js";
import { generateId, now, type CostSpan } from "@reaatech/llm-cost-telemetry";
import { getTenantCosts, getBudgetStatus, collector } from "./aggregation.js";
import { getTenantBudget, logUsage } from "./settings-store.js";
import { flushTraces } from "./langfuse.js";
import type { OpenAIRequest, OpenAIResponse, BudgetCheckResult } from "./cost-types.js";

export { getBudgetState, getDisabledToolsForTenant, budgetGuard } from "./budget-middleware.js";

export async function
Expected output: initializeTenant has a complete else branch — when a tenant has no DB entry, it falls back to DEFAULT_DAILY_BUDGET and DEFAULT_MONTHLY_BUDGET env vars with sensible defaults (100 and 2000). The processLLMRequest catch block logs usage with costUsd: 0 on API failure.
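The control flow of the pipeline can be sketched independently of the real modules. The version below is a simplified illustration with injected dependencies, not the recipe's processLLMRequest. It omits loading the tenant config and Langfuse tracing, and the `PipelineDeps` interface, the `PipelineResult` type, and `processRequestSketch` are hypothetical names.

```typescript
// Hypothetical dependency interface standing in for the real modules.
interface PipelineDeps {
  checkBudget(tenantId: string, estimatedCost: number): { allowed: boolean };
  cachedCompletion(prompt: string): Promise<{ hit: boolean; text: string; costUsd: number }>;
  recordSpend(tenantId: string, costUsd: number): void;
  logUsage(tenantId: string, costUsd: number, succeeded: boolean): Promise<void>;
}

type PipelineResult =
  | { status: "blocked" }
  | { status: "ok"; text: string; cached: boolean; costUsd: number }
  | { status: "error"; message: string };

async function processRequestSketch(
  deps: PipelineDeps,
  tenantId: string,
  prompt: string,
  estimatedCost: number,
): Promise<PipelineResult> {
  // Budget guard: stop before spending anything.
  if (!deps.checkBudget(tenantId, estimatedCost).allowed) {
    return { status: "blocked" };
  }
  try {
    // Cache lookup; on a miss this is assumed to make the live call.
    const result = await deps.cachedCompletion(prompt);
    if (!result.hit) {
      // Only live calls cost money, so only they are recorded and logged.
      deps.recordSpend(tenantId, result.costUsd);
      await deps.logUsage(tenantId, result.costUsd, true);
    }
    return { status: "ok", text: result.text, cached: result.hit, costUsd: result.costUsd };
  } catch (err) {
    // Failed calls are logged with zero cost so the failure is still visible.
    await deps.logUsage(tenantId, 0, false);
    return { status: "error", message: err instanceof Error ? err.message : String(err) };
  }
}
```

Passing the dependencies in as an object keeps each branch testable with plain stubs, without mocking Redis, Postgres, or the OpenAI client.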
Step 12: Create the Next.js API routes
Create four API route handlers under app/api/ that expose the cost-control system to external dashboards and clients.
ts
import { type NextRequest, NextResponse } from "next/server";
import { getUsageHistory } from "../../../src/lib/orchestrator.js";

export async function GET(req: NextRequest): Promise<Response> {
  const tenantId = req.nextUrl.searchParams.get("tenantId");
  if (!tenantId) {
    return NextResponse.json({ error: "tenantId required" }, { status: 400 });
  }
  const sinceRaw = req.nextUrl.searchParams.get("since");
  const since = sinceRaw ? new Date(sinceRaw) : undefined;
  const history = await getUsageHistory(tenantId, since);
  return NextResponse.json(history);
}
Create app/api/cache/route.ts — cache health:
ts
import { NextResponse } from "next/server";
import { getCacheHealth } from "../../../src/lib/orchestrator.js";

export async function GET(): Promise<Response> {
  const health = await getCacheHealth();
  return NextResponse.json(health);
}
Expected output: Route handlers that read query parameters use NextRequest for typed access, and every handler responds with NextResponse.json(). Missing tenantId on the budget and usage endpoints returns a 400 error. Budget exhaustion on the POST budget endpoint returns a 429 status.
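The usage route passes the since query parameter straight to new Date(), which yields an Invalid Date for unparseable input. A small validation helper, sketched below, could reject such values before they reach the database query. The `parseSince` helper and its `SinceParam` result type are hypothetical additions, not part of the recipe.

```typescript
// Hypothetical helper: validates the optional "since" query parameter.
type SinceParam =
  | { ok: true; since: Date | undefined }
  | { ok: false; error: string };

function parseSince(raw: string | null): SinceParam {
  // A missing or empty parameter means "no lower bound".
  if (raw === null || raw === "") {
    return { ok: true, since: undefined };
  }
  const parsed = new Date(raw);
  // Invalid dates report NaN from getTime().
  if (Number.isNaN(parsed.getTime())) {
    return { ok: false, error: "since must be a valid date" };
  }
  return { ok: true, since: parsed };
}
```

In the route handler, an `ok: false` result could map to a 400 response, mirroring how a missing tenantId is handled.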
Step 13: Write the tests
The test suite covers every module with happy-path, error-path, and boundary tests. Here are the key test files.
Write the remaining test files for the cache layer, aggregation, budget-manager, settings-store, langfuse, events, and each API route using the same pattern of vi.hoisted mocks and direct function imports.
Expected output: pnpm test exits with numFailedTests: 0 and coverage above 90% across all metrics (lines, branches, functions, statements).
Next steps
Add a customer-facing dashboard — build React components under app/dashboard/ that call the cost, budget, usage, and cache endpoints to display real-time spend data and budget alerts.
Extend tenant config via the settings store — add a PATCH /api/budget endpoint that calls updateTenantBudget() so administrators can adjust budgets per tenant without redeploying.
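Wire up the instrumentation hook — add src/instrumentation.ts that exports a register() function calling initCache() and startCollector() on server startup. Next.js 15 and later load this file automatically; on Next.js 14, also set experimental.instrumentationHook: true in next.config.ts.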
Add multi-provider support — extend wrap-openai.ts to support Anthropic and Google models, routing each provider’s spans through the same cost telemetry pipeline.