This recipe adds real-time spend tracking and budget enforcement to any Next.js app powered by Google Gemini. You will instrument every Gemini API call with pre-flight budget checks, automatic model downgrading when costs approach limits, and a live spend dashboard at GET /api/spend. By the end, your app will log token counts, record costs, block requests that would exceed the hard cap, and surface per-tenant budget status in HTTP response headers.
GeminiModelId enumerates the four supported models. ScopeType names the four budget scopes the engine uses. GeminiCallResult is the shape returned by every wrapped Gemini call.
Step 3: Build the pricing provider
Create src/lib/gemini-pricing-provider.ts. This converts raw token counts into USD estimates using Google’s per-million-token pricing.
estimateCost uses input tokens only for pre-flight estimates. estimateDetailedCost accounts for both input and output token costs — this is the method called after every Gemini response to record the actual spend. The /* c8 ignore */ comments suppress coverage on unreachable error branches.
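To make the arithmetic concrete, here is a minimal sketch of the two estimates at per-million-token rates. The `Rates` type, the `PRO_RATES` constant, and the free-function signatures are assumptions for illustration and are not the recipe's actual GeminiPricingProvider class; the numbers are the gemini-2.5-pro values from the model registry in Step 6.

```typescript
// Hypothetical rate table. The values are the gemini-2.5-pro figures
// from the model registry shown in Step 6 of this recipe.
interface Rates {
  inputPerMillion: number; // USD per 1M input tokens
  outputPerMillion: number; // USD per 1M output tokens
}

const PRO_RATES: Rates = { inputPerMillion: 1.25, outputPerMillion: 5.0 };

// Pre-flight estimate: only the input side is known before the call.
function estimateCost(inputTokens: number, rates: Rates): number {
  return (inputTokens / 1_000_000) * rates.inputPerMillion;
}

// Post-call cost: input and output token counts are both known.
function estimateDetailedCost(
  inputTokens: number,
  outputTokens: number,
  rates: Rates,
): number {
  return (
    (inputTokens / 1_000_000) * rates.inputPerMillion +
    (outputTokens / 1_000_000) * rates.outputPerMillion
  );
}
```

Because the pre-flight figure ignores output tokens, it is a lower bound on what the call will actually cost; the detailed figure is the one to record as spend.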
Step 4: Build the in-memory spend store
Create src/lib/spend-store.ts. The budget engine requires a SpendStore to persist usage data per scope.
The store maps scopeType:scopeKey to an array of spend entries. getTotalSpend sums costs across all entries in a scope — the engine calls this to compute remaining budget.
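The storage layout described above can be sketched in a few lines. The class name `SpendStoreSketch`, the `SpendEntry` fields, and the `record` method are hypothetical; the real SpendStore interface expected by the budget engine may differ.

```typescript
// One recorded charge. Field names are assumptions for this sketch.
interface SpendEntry {
  costUsd: number;
  timestamp: number; // epoch milliseconds
}

class SpendStoreSketch {
  // Key format follows the text: "scopeType:scopeKey".
  private entries = new Map<string, SpendEntry[]>();

  private key(scopeType: string, scopeKey: string): string {
    return `${scopeType}:${scopeKey}`;
  }

  record(scopeType: string, scopeKey: string, costUsd: number): void {
    const k = this.key(scopeType, scopeKey);
    const list = this.entries.get(k) ?? [];
    list.push({ costUsd, timestamp: Date.now() });
    this.entries.set(k, list);
  }

  // Sum of every entry in the scope; an unknown scope has spent nothing.
  getTotalSpend(scopeType: string, scopeKey: string): number {
    const list = this.entries.get(this.key(scopeType, scopeKey)) ?? [];
    return list.reduce((sum, entry) => sum + entry.costUsd, 0);
  }
}
```

Remaining budget is then the scope's limit minus `getTotalSpend` for that scope.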
Step 5: Create the Gemini cost wrapper
Create src/lib/gemini-cost-wrapper.ts. This is the central integration module: it wraps the Google GenAI SDK, injects a budget check before every call, records the actual spend after, and pushes a telemetry span into the aggregation pipeline.
typescript
import { GoogleGenAI } from "@google/genai";
import { BudgetController } from "@reaatech/agent-budget-engine";
import { BudgetScope } from "@reaatech/agent-budget-types";
import { CostCollector } from "@reaatech/llm-cost-telemetry-aggregation";
import { generateId, now } from "@reaatech/llm-cost-telemetry";
import type { CostSpan } from "@reaatech/llm-cost-telemetry";
import pLimit from "p-limit";
import type { GeminiModelId, GeminiCallResult, ScopeType } from "./types.js";
import { GeminiPricingProvider } from "./gemini-pricing-provider.js";

export class GeminiCostWrapper {
  private
generateWithBudget is the main method: it estimates cost, checks the budget controller, calls Gemini, records spend, and pushes a telemetry span. generateStreamWithBudget does the same for streaming responses. getBudgetState lets you inspect the current budget state for any scope. The /* c8 ignore */ blocks exclude the paths that make live API calls from coverage, so the test suite can meet its coverage target without a real Gemini key.
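The five-step flow of generateWithBudget can be sketched with its collaborators injected as plain functions. Everything here is hypothetical scaffolding (`WrapperDeps`, `ModelReply`, `generateWithBudgetSketch`, and the four-characters-per-token heuristic); it illustrates the order of operations, not the wrapper's real signature or the SDK calls it makes.

```typescript
// Hypothetical stand-ins for the pricing provider, budget controller,
// Gemini client, and telemetry collector used by the real wrapper.
interface ModelReply {
  text: string;
  inputTokens: number;
  outputTokens: number;
}

interface WrapperDeps {
  estimateCost(inputTokens: number): number; // pre-flight, input side only
  remainingBudgetUsd(): number;
  callModel(prompt: string): Promise<ModelReply>;
  actualCost(inputTokens: number, outputTokens: number): number;
  recordSpend(costUsd: number): void;
  pushSpan(span: { costUsd: number; inputTokens: number; outputTokens: number }): void;
}

async function generateWithBudgetSketch(
  prompt: string,
  deps: WrapperDeps,
): Promise<ModelReply & { costUsd: number }> {
  // 1. Estimate. About 4 characters per token is a crude heuristic,
  //    used here only so the sketch needs no tokenizer.
  const approxInputTokens = Math.ceil(prompt.length / 4);
  const estimatedUsd = deps.estimateCost(approxInputTokens);

  // 2. Pre-flight check: refuse before any money is spent.
  if (estimatedUsd > deps.remainingBudgetUsd()) {
    throw new Error("Budget exceeded: request blocked before calling the model");
  }

  // 3. Call the model.
  const reply = await deps.callModel(prompt);

  // 4. Record the actual spend from the reported token counts.
  const costUsd = deps.actualCost(reply.inputTokens, reply.outputTokens);
  deps.recordSpend(costUsd);

  // 5. Emit a telemetry span for the aggregation pipeline.
  deps.pushSpan({
    costUsd,
    inputTokens: reply.inputTokens,
    outputTokens: reply.outputTokens,
  });

  return { ...reply, costUsd };
}
```

Injecting the collaborators keeps the flow testable with fakes, which is the same reason the recipe can exclude the live-call paths from coverage.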
Step 6: Create the model router
Create src/lib/model-router.ts. This module defines the Gemini model registry and exposes a cost-optimized routing function that picks the cheapest model fitting the remaining budget.
typescript
import {
  type ModelDefinition,
  type FallbackChainDefinition,
  type RoutingDecision,
  ModelDefinitionSchema,
} from "@reaatech/llm-router-core";
import type { GeminiModelId } from "./types.js";

const RAW_MODEL_REGISTRY: Omit<ModelDefinition, "enabled">[] = [
  {
    id: "gemini-2.5-pro",
    provider: "google",
    costPerMillionInput: 1.25,
    costPerMillionOutput: 5.0,
    maxTokens: 1048576,
    capabilities: ["reasoning", "code", "complex-reasoning"
getFallbackChain returns the three-model circuit-breaker chain. selectModelForRequest walks the chain top-to-bottom and picks the first model whose per-call cost fits the remaining budget.
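A minimal sketch of that top-to-bottom walk follows. Only the first chain entry uses values from the registry above; the two `placeholder-*` entries, their prices, and the names `ModelSketch`, `perCallCost`, and `selectModelSketch` are invented for illustration and do not reflect the recipe's real fallback models.

```typescript
interface ModelSketch {
  id: string;
  costPerMillionInput: number; // USD
  costPerMillionOutput: number; // USD
}

// Ordered from most to least capable. The second and third entries are
// placeholders with made-up prices, not real Gemini models.
const CHAIN: ModelSketch[] = [
  { id: "gemini-2.5-pro", costPerMillionInput: 1.25, costPerMillionOutput: 5.0 },
  { id: "placeholder-mid-tier", costPerMillionInput: 0.5, costPerMillionOutput: 2.0 },
  { id: "placeholder-low-tier", costPerMillionInput: 0.125, costPerMillionOutput: 0.5 },
];

function perCallCost(
  model: ModelSketch,
  inputTokens: number,
  outputTokens: number,
): number {
  return (
    (inputTokens / 1_000_000) * model.costPerMillionInput +
    (outputTokens / 1_000_000) * model.costPerMillionOutput
  );
}

// Walk the chain in order and return the first model that fits;
// undefined means no model is affordable and the request should be blocked.
function selectModelSketch(
  chain: ModelSketch[],
  remainingUsd: number,
  inputTokens: number,
  outputTokens: number,
): ModelSketch | undefined {
  for (const model of chain) {
    if (perCallCost(model, inputTokens, outputTokens) <= remainingUsd) {
      return model;
    }
  }
  return undefined;
}
```

Because the chain is ordered by capability, the first model that fits is the most capable one the budget still allows, so requests degrade gradually as the budget shrinks.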
Step 7: Create the OTel bridge
Create src/lib/otel-bridge.ts. This wires OpenTelemetry span data to the budget engine — every time a GenAI span ends, the bridge extracts token counts and cost and records a spend entry.
typescript
import { SpanListener } from "@reaatech/agent-budget-otel-bridge";
import { BudgetController } from "@reaatech/agent-budget-engine";

export function createSpanListener(controller: BudgetController): SpanListener {
  return new SpanListener({ controller });
}
createSpanListener returns a SpanListener that reads gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, gen_ai.request.model, and llm.cost.total_usd from span attributes. When you wire this into your OTel tracer, every instrumented Gemini call automatically updates budget state.
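To show what reading those attributes involves, here is a small sketch that turns a span's attribute bag into a spend record. The attribute keys come from the text above; the `SpanAttributes` and `SpendFromSpan` types and the `readSpendFromSpan` function are assumptions, and the real SpanListener may validate or default values differently.

```typescript
// A loose stand-in for an OpenTelemetry attribute bag.
type SpanAttributes = Record<string, string | number | boolean | undefined>;

interface SpendFromSpan {
  model: string;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

// Returns undefined for spans that carry no model or no cost,
// so non-GenAI spans are silently skipped.
function readSpendFromSpan(attrs: SpanAttributes): SpendFromSpan | undefined {
  const model = attrs["gen_ai.request.model"];
  const cost = attrs["llm.cost.total_usd"];
  if (typeof model !== "string" || typeof cost !== "number") {
    return undefined;
  }
  const input = attrs["gen_ai.usage.input_tokens"];
  const output = attrs["gen_ai.usage.output_tokens"];
  return {
    model,
    inputTokens: typeof input === "number" ? input : 0,
    outputTokens: typeof output === "number" ? output : 0,
    costUsd: cost,
  };
}
```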
Step 8: Create the aggregation pipeline
Create src/lib/aggregation-pipeline.ts. This wires together the buffered cost collector, multi-dimensional aggregator, and per-tenant budget manager into a single pipeline you can initialize once at startup.
When a flush fires, each span feeds into both the aggregator (for spend queries) and the budget manager (for threshold tracking). setTenantBudget updates limits at runtime without restarting the pipeline.
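The flush behavior can be pictured with a stripped-down pipeline. This is a sketch under heavy assumptions: `AggregationPipelineSketch`, `collect`, `getSpend`, and `isOverDailyLimit` are hypothetical names, the span shape is reduced to tenant and cost, and daily or monthly window resets are ignored entirely.

```typescript
interface CostSpanSketch {
  tenant: string;
  costUsd: number;
}

interface TenantLimits {
  daily: number; // USD
  monthly: number; // USD
}

class AggregationPipelineSketch {
  private buffer: CostSpanSketch[] = [];
  private aggregated = new Map<string, number>(); // aggregator: spend queries
  private tracked = new Map<string, number>(); // budget manager: thresholds
  private limits = new Map<string, TenantLimits>();

  // Limits can change at any time; nothing else is rebuilt.
  setTenantBudget(tenant: string, limits: TenantLimits): void {
    this.limits.set(tenant, limits);
  }

  // Spans are buffered and only counted when flush() runs.
  collect(span: CostSpanSketch): void {
    this.buffer.push(span);
  }

  // Each buffered span feeds both the aggregator and the budget manager.
  flush(): void {
    for (const span of this.buffer) {
      this.aggregated.set(
        span.tenant,
        (this.aggregated.get(span.tenant) ?? 0) + span.costUsd,
      );
      this.tracked.set(
        span.tenant,
        (this.tracked.get(span.tenant) ?? 0) + span.costUsd,
      );
    }
    this.buffer = [];
  }

  getSpend(tenant: string): number {
    return this.aggregated.get(tenant) ?? 0;
  }

  isOverDailyLimit(tenant: string): boolean {
    const limit = this.limits.get(tenant);
    if (limit === undefined) return false;
    return (this.tracked.get(tenant) ?? 0) >= limit.daily;
  }
}
```

One consequence of buffering is that spend recorded since the last flush is not yet visible to threshold checks, so the flush interval bounds how far a tenant can overshoot.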
Step 9: Create the budget middleware helper
Create src/lib/budget-middleware.ts. This wraps BudgetInterceptor for use in Next.js route handlers and the root middleware.
createBudgetGuard returns a guard function that calls interceptor.beforeStep. When budget is exceeded, BudgetExceededError is caught and turned into a structured allowed: false response. recordSpend is called after each LLM call to update the running spend total. The /* c8 ignore */ annotations suppress coverage on the error branch (requires a real budget controller to trigger) and on the standalone helper function.
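The catch-and-convert pattern looks roughly like this. `InterceptorLike`, its one-argument `beforeStep`, `GuardResult`, and `createBudgetGuardSketch` are assumptions made for the sketch; the real BudgetInterceptor and BudgetExceededError come from the budget engine packages and may have different shapes.

```typescript
// Hypothetical minimal interceptor interface; the real beforeStep
// signature is not shown in this recipe excerpt.
interface InterceptorLike {
  beforeStep(estimatedCostUsd: number): void; // throws when over budget
}

interface GuardResult {
  allowed: boolean;
  reason?: string;
}

function createBudgetGuardSketch(interceptor: InterceptorLike) {
  return function guard(estimatedCostUsd: number): GuardResult {
    try {
      interceptor.beforeStep(estimatedCostUsd);
      return { allowed: true };
    } catch (err) {
      // Only the budget error becomes a structured denial;
      // anything else is a real failure and is rethrown.
      if (err instanceof Error && err.name === "BudgetExceededError") {
        return { allowed: false, reason: err.message };
      }
      throw err;
    }
  };
}
```

Returning a result object instead of throwing lets route handlers and the root middleware branch on `allowed` without wrapping every call in their own try/catch.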
Step 10: Create the Next.js root middleware
Create middleware.ts at the project root (not inside app/). This runs before every API route and enforces the budget at the HTTP boundary.
The x-budget-scope-type and x-budget-scope-key request headers let callers scope budget enforcement per user, session, task, or org. When the budget is exhausted, the response is a 402 (Payment Required) carrying five headers: X-Budget-Remaining, X-Budget-Status, X-Budget-Limit, X-Budget-Spent, and X-Budget-Suggested-Model. On allowed requests, the same five headers are injected into the response.
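As an illustration of the header contract, the sketch below derives the five headers and the status code from a budget snapshot. The header names are from the text; the `BudgetSnapshot` type, the function names, the four-decimal formatting, and the "ok" and "exhausted" status strings are assumptions, since the middleware's actual values are not shown here.

```typescript
interface BudgetSnapshot {
  limitUsd: number;
  spentUsd: number;
  suggestedModel: string;
}

function budgetHeaders(state: BudgetSnapshot): Record<string, string> {
  const remaining = Math.max(0, state.limitUsd - state.spentUsd);
  return {
    "X-Budget-Remaining": remaining.toFixed(4),
    // Status strings are placeholders chosen for this sketch.
    "X-Budget-Status": remaining > 0 ? "ok" : "exhausted",
    "X-Budget-Limit": state.limitUsd.toFixed(4),
    "X-Budget-Spent": state.spentUsd.toFixed(4),
    "X-Budget-Suggested-Model": state.suggestedModel,
  };
}

// 402 when nothing is left, otherwise let the request through (200 here
// stands in for "continue to the route handler").
function budgetDecision(state: BudgetSnapshot): {
  status: number;
  headers: Record<string, string>;
} {
  const headers = budgetHeaders(state);
  const exhausted = headers["X-Budget-Status"] === "exhausted";
  return { status: exhausted ? 402 : 200, headers };
}
```

Sending the same headers on both outcomes means clients can watch X-Budget-Remaining and switch to the suggested model before they ever hit a 402.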
Step 11: Wire the instrumentation hook
Create src/instrumentation.ts. This runs once at Next.js startup in Node.js environments. It initializes the aggregation pipeline as a global singleton and wires the OTel span bridge.
typescript
import { BudgetController } from "@reaatech/agent-budget-engine";
import { SpendStore } from "@reaatech/agent-budget-spend-tracker";
import { SpanListener } from "@reaatech/agent-budget-otel-bridge";

/* c8 ignore start */
export async function register() {
  if (process.env.NEXT_RUNTIME === "nodejs") {
    // --- Aggregation pipeline ---
    const { AggregationPipeline } = await import(
      "./lib/aggregation-pipeline.js"
    );
    const pipeline = new AggregationPipeline();
    globalThis.__pipeline = pipeline;

    const daily = Number(process.env.DEFAULT_DAILY_BUDGET_USD ?? "100");
    const monthly = Number(
      process.env.DEFAULT_MONTHLY_BUDGET_USD ?? "2000",
    );
    if (daily > 0 && monthly > 0) {
      pipeline.setTenantBudget("default", { daily, monthly });
    }

    // --- OTel span-to-spend bridge ---
    const controller = new BudgetController({
      spendTracker: new SpendStore(),
    });
    const listener = new SpanListener({ controller });
    globalThis.__budgetController = controller;
    globalThis.__spanListener = listener;
  }
}
/* c8 ignore stop */

declare global {
  var __pipeline:
    | import("./lib/aggregation-pipeline.js").AggregationPipeline
    | undefined;
  var __budgetController:
    | import("@reaatech/agent-budget-engine").BudgetController
    | undefined;
  var __spanListener:
    | import("@reaatech/agent-budget-otel-bridge").SpanListener
    | undefined;
}
The register() function reads DEFAULT_DAILY_BUDGET_USD and DEFAULT_MONTHLY_BUDGET_USD from the environment and sets the default tenant budget. It also initializes a BudgetController with a SpendStore and a SpanListener so OTel-instrumented calls automatically update budget state.
Step 12: Create the spend dashboard API route
Create app/api/spend/route.ts. This route reads from the global pipeline singleton to serve spend summaries and configure budget limits.
GET without a tenant param returns the aggregated spend summary across all tenants. GET ?tenant=acme-corp returns costs and budget status for that tenant. POST configures daily and monthly limits for a tenant. The /* c8 ignore */ block wraps both handlers since they require the pipeline to be live.
Step 13: Export everything from src/index.ts
Replace src/index.ts to re-export all the public types and classes.
typescript
export { GeminiCostWrapper } from "./lib/gemini-cost-wrapper.js";
export { AggregationPipeline } from "./lib/aggregation-pipeline.js";
export { GeminiPricingProvider } from "./lib/gemini-pricing-provider.js";
export { InMemorySpendStore } from "./lib/spend-store.js";
export { createBudgetGuard, recordSpend } from "./lib/budget-middleware.js";
export { selectModelForRequest, getFallbackChain } from "./lib/model-router.js";
export { createSpanListener } from "./lib/otel-bridge.js";
export type {
  GeminiModelId,
  ScopeType,
  SpendScope,
  GeminiCallResult,
} from "./lib/types.js";
Step 14: Configure environment variables
Update .env.example with every environment variable the recipe reads.
env
NODE_ENV=development

# Google Gemini API
GOOGLE_API_KEY=<your-gemini-api-key>
GOOGLE_CLOUD_PROJECT=<your-gcp-project-id>
GOOGLE_CLOUD_LOCATION=us-central1
GOOGLE_GENAI_USE_ENTERPRISE=false

# Budget defaults
DEFAULT_DAILY_BUDGET_USD=100.0
DEFAULT_MONTHLY_BUDGET_USD=2000.0

# Concurrency
GEMINI_CONCURRENCY_LIMIT=5

# OpenTelemetry
OTEL_SERVICE_NAME=gemini-spend-control
Copy this to .env.local and fill in your GOOGLE_API_KEY. GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_LOCATION are only needed when GOOGLE_GENAI_USE_ENTERPRISE=true (Vertex AI / Enterprise Agent Platform mode).
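If you want a startup check for these rules, something like the following sketch would do. The function `missingEnvVars` is hypothetical and not part of the recipe, and treating GOOGLE_API_KEY as required in both modes is an assumption; adjust it if your enterprise setup authenticates differently.

```typescript
// Accepts any env-like object so the check can be tested without
// touching the real process environment.
type Env = Record<string, string | undefined>;

function missingEnvVars(env: Env): string[] {
  // Assumption: the API key is always required.
  const required = ["GOOGLE_API_KEY"];
  // Project and location only matter in enterprise mode.
  if (env["GOOGLE_GENAI_USE_ENTERPRISE"] === "true") {
    required.push("GOOGLE_CLOUD_PROJECT", "GOOGLE_CLOUD_LOCATION");
  }
  // Unset and empty-string values both count as missing.
  return required.filter((name) => !env[name]);
}
```

Calling it with `process.env` early in startup and failing fast on a non-empty result surfaces configuration mistakes before the first Gemini call.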
Step 15: Run the tests
Run the full test suite to verify every module behaves correctly.
terminal
pnpm vitest run --coverage --reporter=json --outputFile=vitest-report.json
Expected output: numFailedTests=0 and numTotalTests >= 60. Coverage for lines, branches, functions, and statements should each be >= 90% on src/**/*.ts and app/**/route.ts. The route handler (app/api/spend/route.ts) is included in coverage; page components are excluded by the vitest config.
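The two test-count expectations are easy to check mechanically once the JSON report has been parsed. In this sketch, `VitestSummary` and `reportProblems` are hypothetical names; the two fields are the ones named in the expected output above, and reading vitest-report.json from disk is left to the caller.

```typescript
// Only the two fields this check needs; the real report has more.
interface VitestSummary {
  numFailedTests: number;
  numTotalTests: number;
}

// Returns a list of human-readable problems; empty means the run passes.
function reportProblems(report: VitestSummary, minTotal = 60): string[] {
  const problems: string[] = [];
  if (report.numFailedTests !== 0) {
    problems.push(`${report.numFailedTests} test(s) failed`);
  }
  if (report.numTotalTests < minTotal) {
    problems.push(
      `only ${report.numTotalTests} tests ran, expected at least ${minTotal}`,
    );
  }
  return problems;
}
```

Coverage thresholds are not covered by this check; they are reported separately by the coverage run.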
Step 16: Verify with preflight
Run the preflight validator to confirm the artifact is complete and passes all quality checks.
Next steps
Add per-user budget limits by reading x-budget-scope-key from your auth middleware and passing it to createBudgetGuard.
Integrate GeminiCostWrapper into your existing AI features: call generateWithBudget or generateStreamWithBudget instead of calling ai.models.generateContent directly.
Export telemetry spans to Grafana Phoenix or AWS CloudWatch by configuring the OTel exporter in instrumentation.ts.
Back InMemorySpendStore with Redis or Postgres in production so budget state survives server restarts.