As a product manager at a vertical SaaS company, you need to offer AI features to your SMB customers, but each customer may use a different LLM provider depending on their plan. Without per-tenant cost tracking, you cannot charge usage back accurately, which erodes margins and keeps you from scaling AI features profitably. You need a solution that captures LLM costs per tenant and integrates with your existing billing system.
This recipe builds a per-tenant LLM cost chargeback system for a vertical SaaS platform. You’ll track every LLM call by customer tenant, attribute costs to the right provider and model, define per-tenant budgets, detect spend anomalies, and generate Stripe chargeback invoices — all through Next.js API routes. If you’ve ever needed to bill each SMB customer individually for the AI features they use, this is the pattern.
Prerequisites
Node.js 22+ and pnpm 10 installed
A Stripe account with a secret key (for chargeback invoicing)
An OpenAI API key (for GPT-5.2 calls)
An Anthropic API key (for Claude Sonnet 4-6 calls)
Familiarity with Next.js App Router, TypeScript, and basic Zod schemas
Step 1: Scaffold the Next.js project
Create the project with the Next.js App Router and install the core dependencies. The scaffold includes TypeScript, ESLint, Vitest with coverage, and all the workspace configs so you can focus on feature code.
Now install the packages you’ll use — REAA cost-tracking libraries, the Vercel AI SDK with both OpenAI and Anthropic providers, OpenTelemetry API, Zod for validation, and Stripe for billing:
Expected output: Your package.json shows exact-pinned versions (no ^ or ~) for every dependency. The scripts block includes dev, build, typecheck, lint, start, and test.
Step 2: Define shared types with Zod
Create the type definitions that every service in the system shares. You need schemas for tenant configuration, cost events, chargeback records, and cost reports.
Expected output: A clean TypeScript file with Zod-validated schemas and typed interfaces. The CostEventSchema is used later to validate incoming POST bodies on the costs route.
Step 3: Create the TenantStore service
Build an in-memory tenant registry with seed data — three SMB customers (Acme Corp, Beta Inc, Gamma LLC) on different plans. The store supports CRUD operations and validates everything through TenantConfigSchema.
Expected output: A 60-line service that initializes with 3 tenants. Note that tenant-3 (Gamma LLC) has a stripeCustomerId — this tenant is the only one who can generate Stripe chargebacks.
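As a rough sketch of the shape such a store might take (not the recipe's actual file), here is a Map-backed registry with three seed tenants. The field names `plan` and `dailyBudget`, the plan labels, the budget figures, and the `cus_example` ID are assumptions for illustration; only the tenant names and the `stripeCustomerId` on tenant-3 come from the description above. Validation through TenantConfigSchema is left out to keep the sketch dependency-free.

```ts
// Hypothetical tenant shape; the real TenantConfigSchema may differ.
interface TenantConfig {
  id: string;
  name: string;
  plan: "starter" | "growth" | "enterprise"; // assumed plan labels
  dailyBudget: number; // assumed field, in USD
  stripeCustomerId?: string;
}

class InMemoryTenantStore {
  private tenants = new Map<string, TenantConfig>();

  constructor(seed: TenantConfig[]) {
    for (const tenant of seed) this.tenants.set(tenant.id, tenant);
  }

  list(): TenantConfig[] {
    return Array.from(this.tenants.values());
  }

  get(id: string): TenantConfig | undefined {
    return this.tenants.get(id);
  }

  create(tenant: TenantConfig): TenantConfig {
    if (this.tenants.has(tenant.id)) {
      throw new Error(`Tenant ${tenant.id} already exists`);
    }
    this.tenants.set(tenant.id, tenant);
    return tenant;
  }

  // Returns undefined for unknown IDs so a route handler can map that to a 404.
  update(id: string, patch: Partial<Omit<TenantConfig, "id">>): TenantConfig | undefined {
    const existing = this.tenants.get(id);
    if (!existing) return undefined;
    const updated: TenantConfig = { ...existing, ...patch };
    this.tenants.set(id, updated);
    return updated;
  }

  delete(id: string): boolean {
    return this.tenants.delete(id);
  }
}

// Seed data: names from the recipe, all other values are placeholders.
const store = new InMemoryTenantStore([
  { id: "tenant-1", name: "Acme Corp", plan: "starter", dailyBudget: 50 },
  { id: "tenant-2", name: "Beta Inc", plan: "growth", dailyBudget: 100 },
  { id: "tenant-3", name: "Gamma LLC", plan: "enterprise", dailyBudget: 250, stripeCustomerId: "cus_example" },
]);
```

Returning `undefined` from `update()` rather than throwing keeps the "not found" case cheap for route handlers to check.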
Step 4: Create the CostTracker with SpendStore
Wrap @reaatech/agent-budget-spend-tracker’s SpendStore in a service that records cost events per tenant and exposes query methods: spend totals, rate (dollars per minute), projections, anomaly detection, and filtered entry queries.
Expected output: The CostTracker delegates to SpendStore under the hood. scopeType: BudgetScope.Org paired with scopeKey: event.tenantId is what gives you per-tenant isolation — every query method accepts the same pattern.
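To make the per-tenant isolation idea concrete without depending on the SpendStore API, the following is a minimal in-memory sketch. The class name `TenantSpendLedger`, its method names, and the choice of a population standard deviation for anomaly detection are all assumptions; the library's own implementation may work differently.

```ts
interface SpendEntry {
  tenantId: string;
  costUsd: number;
  timestampMs: number;
}

// Hypothetical stand-in for a spend store; every query filters by tenant first.
class TenantSpendLedger {
  private entries: SpendEntry[] = [];

  record(entry: SpendEntry): void {
    this.entries.push(entry);
  }

  private forTenant(tenantId: string): SpendEntry[] {
    return this.entries.filter((e) => e.tenantId === tenantId);
  }

  total(tenantId: string): number {
    return this.forTenant(tenantId).reduce((sum, e) => sum + e.costUsd, 0);
  }

  // Dollars per minute over the trailing window that ends at nowMs.
  ratePerMinute(tenantId: string, windowMinutes: number, nowMs: number): number {
    const cutoff = nowMs - windowMinutes * 60_000;
    const recent = this.forTenant(tenantId).filter(
      (e) => e.timestampMs > cutoff && e.timestampMs <= nowMs
    );
    return recent.reduce((sum, e) => sum + e.costUsd, 0) / windowMinutes;
  }

  // Flags entries costing more than thresholdStdDev standard deviations above the tenant's mean.
  detectAnomalies(tenantId: string, thresholdStdDev = 2): SpendEntry[] {
    const own = this.forTenant(tenantId);
    if (own.length < 2) return [];
    const mean = own.reduce((s, e) => s + e.costUsd, 0) / own.length;
    const variance = own.reduce((s, e) => s + (e.costUsd - mean) ** 2, 0) / own.length;
    const stdDev = Math.sqrt(variance);
    if (stdDev === 0) return [];
    return own.filter((e) => (e.costUsd - mean) / stdDev > thresholdStdDev);
  }
}

const ledger = new TenantSpendLedger();
const baseTime = 1_700_000_000_000;
// Nine small calls followed by one expensive call for tenant-1.
for (let i = 0; i < 9; i++) {
  ledger.record({ tenantId: "tenant-1", costUsd: 0.01, timestampMs: baseTime + i * 1000 });
}
ledger.record({ tenantId: "tenant-1", costUsd: 1.0, timestampMs: baseTime + 9000 });
// A single call for tenant-2, which must not affect tenant-1's numbers.
ledger.record({ tenantId: "tenant-2", costUsd: 0.5, timestampMs: baseTime });

const spikes = ledger.detectAnomalies("tenant-1");
```

With this data the single $1.00 call sits about three standard deviations above tenant-1's mean, so it is the only entry flagged at the default threshold of 2.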
Step 5: Create the LlmClient for multi-provider LLM calls
Build a client that calls either OpenAI or Anthropic through the Vercel AI SDK, calculates cost from token usage using a pricing table, and automatically records each call’s cost through the CostTracker.
Create src/services/llm-client.ts:
ts
import { generateText, Output } from "ai";
import { openai } from "@ai-sdk/openai";
import { anthropic } from "@ai-sdk/anthropic";
import type { LanguageModel } from "ai";
import { z } from "zod";
import { generateId, now, calculateCostFromTokens } from "@reaatech/llm-cost-telemetry";
import type { CostEvent } from "../lib/types.js";
import { CostTracker } from "./cost-tracker.js";

const MODEL_PRICING: Record<string, { input: number; output: number }> = {
Expected output: Both generate() and generateStructured() create a CostEvent and call costTracker.recordCost() before returning. Every LLM call is automatically attributed to the tenant, provider, model, and feature.
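The pricing-table lookup can be illustrated on its own. The sketch below is not the library's `calculateCostFromTokens`; the function name `estimateCostUsd`, the model keys, the per-million-token unit, and every price are placeholders chosen for the example, not real provider pricing.

```ts
// Prices in USD per one million tokens. Placeholder numbers, NOT real provider pricing.
const EXAMPLE_PRICING: Record<string, { input: number; output: number }> = {
  "example-openai-model": { input: 2.0, output: 8.0 },
  "example-anthropic-model": { input: 3.0, output: 15.0 },
};

// Hypothetical helper: turns token counts into a dollar cost using the table above.
function estimateCostUsd(model: string, inputTokens: number, outputTokens: number): number {
  const price = EXAMPLE_PRICING[model];
  if (!price) {
    // Fail loudly so an unpriced model is never recorded as costing nothing.
    throw new Error(`No pricing entry for model "${model}"`);
  }
  return (inputTokens / 1_000_000) * price.input + (outputTokens / 1_000_000) * price.output;
}

// 1,000 input tokens at $3 per million plus 500 output tokens at $15 per million.
const exampleCost = estimateCostUsd("example-anthropic-model", 1_000, 500);
```

Throwing on an unknown model is a deliberate choice here: silently returning zero would undercharge the tenant and hide the missing table entry.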
Step 6: Create the Reporter for budget utilization and exports
The Reporter generates per-tenant cost reports aggregated by hour/day/week/month, calculates budget utilization percentages, and exports cost data to CloudWatch or Loki/Phoenix.
Create src/services/reporter.ts:
ts
import {
  now,
  getWindowStart,
  getWindowEnd,
  percentage,
  roundTo,
  type CostSpan,
  type Provider,
} from "@reaatech/llm-cost-telemetry";
import { CloudWatchExporter, PhoenixExporter } from "@reaatech/llm-cost-telemetry-exporters";
import { CostTracker } from "./cost-tracker.js";
import { TenantStore } from "./tenant-store.js";
import type { CostReport } from "../lib/types.js";

export class Reporter {
  private costTracker: CostTracker;
  private tenantStore: TenantStore;

  constructor(costTracker: CostTracker, tenantStore: TenantStore) {
    this.costTracker
Expected output: getBudgetUtilization() uses percentage(spend, budget) from the telemetry library. If Acme Corp has spent $40 against a $50 daily budget, dailyPercent returns 80.
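The arithmetic behind that figure is simple enough to sketch directly. The helper below is a stand-in written for illustration, not the library's `percentage` function; its name, the two-decimal rounding, and the zero-budget convention are assumptions.

```ts
// Share of the budget consumed, as a percentage rounded to two decimals.
function utilizationPercent(spendUsd: number, budgetUsd: number): number {
  if (budgetUsd <= 0) return 0; // assumed convention for tenants without a budget
  return Math.round((spendUsd / budgetUsd) * 10_000) / 100;
}

const acmeDailyPercent = utilizationPercent(40, 50); // 80
const overBudgetPercent = utilizationPercent(60, 50); // values above 100 signal overspend
```

Leaving the result uncapped lets a caller distinguish "at budget" from "over budget" without a second field.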
Step 7: Create the BillingService with Stripe chargeback
The BillingService takes a tenant’s LLM spend, creates a Stripe invoice item, and generates an invoice — all keyed to the tenant’s Stripe customer ID. Only tenants with a stripeCustomerId can generate chargebacks.
Expected output: The amount is converted to cents (Math.round(totalCost * 100)) since Stripe expects amounts in the smallest currency unit. The invoice is created with auto_advance: true so Stripe will attempt to collect payment automatically.
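The rounding step matters because floating-point multiplication rarely lands exactly on a whole number of cents. The sketch below shows the conversion plus a guard for tenants without a Stripe customer ID. Both helper names are hypothetical, and the returned object is only a plain parameter bag shaped like the input to an invoice-item call; no Stripe request is made.

```ts
// Converts dollars to the smallest currency unit (cents) that Stripe expects.
function toStripeAmount(totalCostUsd: number): number {
  if (!isFinite(totalCostUsd) || totalCostUsd < 0) {
    throw new Error("totalCostUsd must be a non-negative finite number");
  }
  // 19.99 * 100 evaluates to 1998.9999999999998, so rounding is required.
  return Math.round(totalCostUsd * 100);
}

// Hypothetical helper: builds the parameters for one chargeback line item.
function buildChargebackItem(
  tenant: { id: string; stripeCustomerId?: string },
  totalCostUsd: number
): { customer: string; amount: number; currency: string; description: string } {
  if (!tenant.stripeCustomerId) {
    throw new Error(`Tenant ${tenant.id} has no stripeCustomerId`);
  }
  return {
    customer: tenant.stripeCustomerId,
    amount: toStripeAmount(totalCostUsd),
    currency: "usd",
    description: `LLM usage for ${tenant.id}`,
  };
}

const chargebackItem = buildChargebackItem(
  { id: "tenant-3", stripeCustomerId: "cus_example" }, // placeholder customer ID
  40
);
```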
Step 8: Create the OpenTelemetry instrumentation
Set up OpenTelemetry cost telemetry that fires inside the Next.js Node.js runtime. Next.js discovers the instrumentation.ts file automatically. Only Next.js 14 and earlier require experimental.instrumentationHook: true in your config; from Next.js 15 onward the instrumentation hook is stable and the flag is no longer needed.
Create src/instrumentation.ts:
ts
export async function register() {
  if (process.env.NEXT_RUNTIME === "nodejs") {
    const { setupOtelCostTelemetry } = await import("./services/otel-setup.js");
    await setupOtelCostTelemetry();
  }
}
Expected output: The register() function uses a dynamic import() for the OTel setup module. This is required because register() runs in both Node and Edge runtimes — the dynamic import ensures Node-only APIs are loaded only when NEXT_RUNTIME === "nodejs".
Step 9: Create the OTel span listener adapter
This adapter bridges OpenTelemetry spans into the cost-tracking system. When an OTel span carries a budget.scope_key attribute, the listener extracts the tenant ID and records the spend.
Expected output: The scopeExtractor reads budget.scope_key from span attributes. If it is present, the span’s cost is attributed to that tenant. The controller object is cast with as BudgetController; because BudgetController is imported as a type only, the import is erased at compile time.
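A scope extractor of this kind can be sketched as a pure function over span attributes. Only the `budget.scope_key` attribute name comes from the text above; the `llm.cost.usd` attribute, the `SpanAttributes` alias, and both function names are assumptions made for the example and may not match the real adapter.

```ts
type SpanAttributes = Record<string, unknown>;

// Returns the tenant ID carried on the span, or undefined if the span is not tenant-scoped.
function extractTenantId(attributes: SpanAttributes): string | undefined {
  const value = attributes["budget.scope_key"];
  return typeof value === "string" && value.length > 0 ? value : undefined;
}

// Hypothetical attribute name for the span's cost; the real adapter may read it differently.
const COST_ATTRIBUTE = "llm.cost.usd";

// Produces a spend record only when both the tenant and a numeric cost are present.
function toSpend(attributes: SpanAttributes): { tenantId: string; costUsd: number } | undefined {
  const tenantId = extractTenantId(attributes);
  const cost = attributes[COST_ATTRIBUTE];
  if (tenantId === undefined || typeof cost !== "number") return undefined;
  return { tenantId, costUsd: cost };
}

const scopedSpend = toSpend({ "budget.scope_key": "tenant-1", "llm.cost.usd": 0.02 });
const unscopedSpend = toSpend({ "llm.cost.usd": 0.02 });
```

Spans without a scope key are skipped rather than attributed to a default tenant, so untagged traffic never lands on a customer's invoice.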
Step 10: Create the instance registry
Centralize your singleton instances so route handlers can import them without creating duplicate CostTracker or TenantStore objects.
Create src/lib/instances.ts:
ts
import { CostTracker } from "../services/cost-tracker.js";
import { TenantStore } from "../services/tenant-store.js";

export const costTracker = new CostTracker();
export const tenantStore = new TenantStore();
Expected output: Both instances are created once at module load time and shared across every route handler and service that imports from this file.
Step 11: Wire the API route handlers
Now connect everything through Next.js App Router API routes. You’ll create seven route files that expose the chargeback system.
Create app/api/tenants/route.ts — list and create tenants:
ts
import { type NextRequest, NextResponse } from "next/server";
import { ZodError } from "zod";
import { TenantConfigSchema } from "../../../src/lib/types.js";
// Use the shared instance from Step 10 so every route sees the same tenants.
import { tenantStore } from "../../../src/lib/instances.js";

export function GET() {
  return NextResponse.json(tenantStore.list());
}

export async function POST(req: NextRequest) {
  try {
    const body = (await req.json()) as Record<string, unknown>;
    // Reject malformed tenant payloads before they reach the store.
    const parsed = TenantConfigSchema.parse(body);
    const created = tenantStore.create(parsed);
    return NextResponse.json(created, { status: 201 });
  } catch (err) {
    if (err instanceof ZodError) {
      return NextResponse.json({ error: err.message }, { status: 400 });
    }
    throw err;
  }
}
Create app/api/tenants/[id]/route.ts — get, update, and delete a single tenant:
ts
import { type NextRequest, NextResponse } from "next/server";
// Use the shared instance from Step 10 so every route sees the same tenants.
import { tenantStore } from "../../../../src/lib/instances.js";

export async function GET(
  _req: NextRequest,
  { params }: { params: Promise<{ id: string }> }
) {
  const { id } = await params;
  const tenant = tenantStore.get(id);
  if (!tenant) {
    return NextResponse.json({ error: "Tenant not found" }, { status: 404 });
  }
  return NextResponse.json(tenant);
}

export async function PATCH(
  req: NextRequest,
  { params }: { params: Promise<{ id: string }> }
) {
  const { id } = await params;
  const body = (await req.json()) as Record<string, unknown>;
  const updated = tenantStore.update(id, body);
  if (!updated) {
    return NextResponse.json({ error: "Tenant not found" }, { status: 404 });
  }
  return NextResponse.json(updated);
}

export async function DELETE(
  _req: NextRequest,
  { params }: { params: Promise<{ id: string }> }
) {
  const { id } = await params;
  tenantStore.delete(id);
  return new NextResponse(null, { status: 204 });
}
Create app/api/tenants/[id]/costs/route.ts — record and query cost events per tenant:
Create app/api/tenants/[id]/budget/route.ts — query and update budget for a tenant:
ts
import { type NextRequest, NextResponse } from "next/server";
import { Reporter } from "../../../../../src/services/reporter.js";
import { ZodError } from "zod";
import { costTracker, tenantStore } from "../../../../../src/lib/instances.js";

const reporter = new Reporter(costTracker, tenantStore);

export async function GET(
  _req: NextRequest,
  { params }: { params: Promise<{ id: string }> }
) {
  const { id } = await params;
  const utilization = reporter.getBudgetUtilization(id);
  return NextResponse.json(utilization);
}

export async function POST(
  req: NextRequest,
  { params }: { params: Promise<{ id: string }> }
) {
  const { id } = await params;
  try {
    const body = (await req.json()) as Record<string, unknown>;
    const updated = tenantStore.update(id, body);
    if (!updated) {
      return NextResponse.json({ error: "Tenant not found" }, { status: 404 });
    }
    return NextResponse.json(updated);
  } catch (err) {
    if (err instanceof ZodError) {
      return NextResponse.json({ error: err.message }, { status: 400 });
    }
    throw err;
  }
}
Create app/api/costs/report/route.ts — generate cost reports for one or all tenants:
ts
import { type NextRequest, NextResponse } from "next/server";
import { Reporter } from "../../../../src/services/reporter.js";
// Reuse the shared instances from Step 10 so reports see the spend recorded by other routes.
import { costTracker, tenantStore } from "../../../../src/lib/instances.js";

const reporter = new Reporter(costTracker, tenantStore);

const PERIODS = ["hour", "day", "week", "month"] as const;
type Period = (typeof PERIODS)[number];

// Narrow the raw query value instead of casting it unchecked.
function isPeriod(value: string): value is Period {
  return (PERIODS as readonly string[]).includes(value);
}

export function GET(req: NextRequest) {
  const { searchParams } = req.nextUrl;
  const tenantId = searchParams.get("tenantId");
  const periodRaw = searchParams.get("period") || "day";
  if (!isPeriod(periodRaw)) {
    return NextResponse.json(
      { error: "period must be one of hour, day, week, month" },
      { status: 400 }
    );
  }
  if (tenantId) {
    const report = reporter.generateTenantReport(tenantId, periodRaw);
    return NextResponse.json(report);
  }
  const reports = reporter.generateAllTenantsReport(periodRaw);
  return NextResponse.json(reports);
}
Create app/api/costs/anomalies/route.ts — detect spend spikes for a tenant:
ts
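import { type NextRequest, NextResponse } from "next/server";
// Reuse the shared tracker from Step 10 so anomalies are computed over the spend other routes recorded.
import { costTracker } from "../../../../src/lib/instances.js";

export function GET(req: NextRequest) {
  const { searchParams } = req.nextUrl;
  const tenantId = searchParams.get("tenantId");
  const thresholdRaw = searchParams.get("thresholdStdDev");
  if (!tenantId) {
    return NextResponse.json([]);
  }
  // Fall back to the tracker's default threshold when the parameter is absent.
  const threshold = thresholdRaw ? Number(thresholdRaw) : undefined;
  if (threshold !== undefined && Number.isNaN(threshold)) {
    return NextResponse.json(
      { error: "thresholdStdDev must be a number" },
      { status: 400 }
    );
  }
  const anomalies = costTracker.detectAnomalies(tenantId, threshold);
  return NextResponse.json(anomalies);
}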
Create app/api/billing/chargeback/route.ts — generate and list Stripe chargeback invoices:
ts
import { type NextRequest, NextResponse } from "next/server";
import { BillingService } from "../../../../src/services/billing-service.js";
import { costTracker, tenantStore } from "../../../../src/lib/instances.js";

const stripeSecretKey = process.env.STRIPE_SECRET_KEY || "";
const billingService = new BillingService(stripeSecretKey, costTracker, tenantStore);

export async function GET(req: NextRequest) {
  const { searchParams } = req.nextUrl;
  const tenantId = searchParams.get("tenantId");
  if (!tenantId) {
    return NextResponse.json(
      { error: "tenantId is required" },
      { status: 400 }
    );
  }
  const invoices = await billingService.listInvoices(tenantId);
  return NextResponse.json(invoices);
}

export async function POST(req: NextRequest) {
  try {
    const body = (await req.json()) as Record<string, unknown>;
    const { tenantId, periodStart, periodEnd } = body as Record<string, string>;
    if (!tenantId || !periodStart || !periodEnd) {
      return NextResponse.json(
        { error: "tenantId, periodStart, and periodEnd are required" },
        { status: 400 }
      );
    }
    const invoice = await billingService.generateChargeback(
      tenantId,
      new Date(periodStart),
      new Date(periodEnd)
    );
    return NextResponse.json(invoice, { status: 201 });
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    if (message.includes("STRIPE_ERROR")) {
      return NextResponse.json(
        { error: "Stripe chargeback failed", details: message },
        { status: 402 }
      );
    }
    throw err;
  }
}
Expected output: Every route handler uses NextRequest/NextResponse from next/server, NOT bare Request/Response. Dynamic route params use { params }: { params: Promise<{ id: string }> } with an await — the Next 16 pattern.
Step 12: Configure environment variables
Create your .env file from the example and fill in the keys:
env
# Env vars used by agnostic-per-tenant-llm-cost-chargeback-2.
OPENAI_API_KEY=<your-openai-key>
ANTHROPIC_API_KEY=<your-anthropic-key>
STRIPE_SECRET_KEY=<your-stripe-secret-key>
DEFAULT_DAILY_BUDGET=100.0
OTEL_SERVICE_NAME=llm-cost-chargeback
NEXT_PUBLIC_BASE_URL=http://localhost:3000
Expected output: OPENAI_API_KEY and ANTHROPIC_API_KEY are used by the LlmClient. STRIPE_SECRET_KEY is consumed by BillingService. DEFAULT_DAILY_BUDGET is available for budget defaults. OTEL_SERVICE_NAME sets the OpenTelemetry service identity. NEXT_PUBLIC_BASE_URL is used on the client side for API fetches.
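One way to consume these variables is to validate them once at startup rather than reading process.env ad hoc. The sketch below is an assumption about how that could look, not part of the recipe: `loadConfig`, the `AppConfig` shape, and the fallback values (taken from the example file) are all hypothetical.

```ts
interface AppConfig {
  openaiApiKey: string;
  anthropicApiKey: string;
  stripeSecretKey: string;
  defaultDailyBudget: number;
  otelServiceName: string;
}

// Takes the environment as a parameter so it can be exercised without touching process.env.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const required = (key: string): string => {
    const value = env[key];
    if (!value) throw new Error(`Missing required environment variable ${key}`);
    return value;
  };

  // Assumed fallback of 100, mirroring the value in the example file.
  const budget = Number(env["DEFAULT_DAILY_BUDGET"] ?? "100");
  if (!isFinite(budget) || budget < 0) {
    throw new Error("DEFAULT_DAILY_BUDGET must be a non-negative number");
  }

  return {
    openaiApiKey: required("OPENAI_API_KEY"),
    anthropicApiKey: required("ANTHROPIC_API_KEY"),
    stripeSecretKey: required("STRIPE_SECRET_KEY"),
    defaultDailyBudget: budget,
    otelServiceName: env["OTEL_SERVICE_NAME"] ?? "llm-cost-chargeback",
  };
}

// In the app this would be called as loadConfig(process.env); the keys below are fake.
const exampleConfig = loadConfig({
  OPENAI_API_KEY: "sk-example",
  ANTHROPIC_API_KEY: "sk-ant-example",
  STRIPE_SECRET_KEY: "sk_test_example",
  DEFAULT_DAILY_BUDGET: "100.0",
});
```

Failing at startup on a missing key surfaces misconfiguration earlier than an empty-string fallback, which would only fail on the first provider or Stripe call.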
Step 13: Run the tests
The test suite covers every service — cost tracking with multi-tenant isolation, billing service with Stripe mocks, reporter aggregation, tenant CRUD, LLM client cost recording, span listener bridging, and full route handler integration tests.
All 99 tests pass with coverage above 90% across all four axes (statements, branches, functions, and lines). The Stripe-dependent tests use vi.mock("stripe", ...) to avoid live API calls while verifying that the correct invoice items and invoices are created.
Next steps
Add a Stripe webhook handler — listen for invoice.paid / invoice.payment_failed events to update ChargebackRecord.status from "pending" to "paid" or "failed".
Replace the in-memory TenantStore with a database — use Prisma or Drizzle with PostgreSQL so tenant data persists across restarts and can be managed from a dashboard.
Add rate-limiting per tenant — use CostTracker.getTenantRate() to implement an API gateway middleware that rejects requests when a tenant’s spend rate exceeds a threshold.
Wire real OpenTelemetry export — configure OTEL_EXPORTER_OTLP_ENDPOINT and point the OTel setup at a real collector (Jaeger, Grafana Tempo, or a managed OTel backend) to visualize cost data alongside traces.
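For the webhook idea in the first item above, the status transition itself can be sketched as a small pure function. The event names and the three status values come from that item; the function name and the rule that a paid record stays paid are assumptions, and a real handler would also need to verify the webhook signature before trusting the event.

```ts
type ChargebackStatus = "pending" | "paid" | "failed";

// Hypothetical helper: maps a Stripe invoice event type to the next ChargebackRecord status.
function nextChargebackStatus(current: ChargebackStatus, eventType: string): ChargebackStatus {
  switch (eventType) {
    case "invoice.paid":
      return "paid";
    case "invoice.payment_failed":
      // Assumed rule: a late failure event must not undo a recorded payment.
      return current === "paid" ? "paid" : "failed";
    default:
      // Unrelated events leave the record untouched.
      return current;
  }
}

const afterPayment = nextChargebackStatus("pending", "invoice.paid");
const afterFailure = nextChargebackStatus("pending", "invoice.payment_failed");
```

Keeping the transition pure makes it easy to unit test separately from signature verification and persistence.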