Small businesses running AI agents on multiple models through Vercel AI Gateway quickly lose visibility into token consumption, latency, and failure rates across providers. Without centralized monitoring, cost spikes go unnoticed, performance degrades silently, and budgets get blown. This tutorial builds an observability layer that auto-instruments every LLM call with OpenTelemetry GenAI semantics, tracks per-agent spend, enforces budgets, and sends Slack alerts — without writing per-provider instrumentation code.
You’ll use the REAA telemetry ecosystem (@reaatech/otel-genai-semconv-core, @reaatech/llm-cost-telemetry, and friends) to instrument your gateway, aggregate costs in Supabase, export traces to Langfuse, and surface everything through a Next.js admin dashboard with a Hono API backend.
Prerequisites
Node.js 22+ — the project uses "node": ">=22" and ESM ("type": "module")
pnpm 10+ — the package manager is pinned in package.json as "packageManager": "pnpm@10.0.0"
Vercel AI Gateway account and API key — the dashboard fetches real-time metrics from the gateway
Langfuse account — for trace export (get your public and secret keys from the project settings)
Supabase project — for persisting cost spans, budget configs, and alerts
Slack webhook URL — optional, for budget alert notifications
TypeScript and basic Next.js App Router familiarity
Step 1: Scaffold the Next.js project and install dependencies
Create a new Next.js 16 project with the App Router and TypeScript:
Next, edit package.json to pin every dependency to an exact version (no ^ or ~ prefixes). The scaffolded file will include Next and React. Add all the remaining dependencies your observability layer needs:
Expected output: pnpm creates a pnpm-lock.yaml and installs all packages into node_modules/.
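The exact-version rule from this step can be checked mechanically. The following is a minimal sketch, not part of the generated project: `findUnpinned`, `PackageManifest`, and the sample version numbers are hypothetical and only illustrate the check.

```typescript
// Hypothetical helper: reports dependencies whose version is not an exact pin.
type DependencyMap = Record<string, string>;

interface PackageManifest {
  dependencies?: DependencyMap;
  devDependencies?: DependencyMap;
}

// Accepts "1.2.3" or "1.2.3-beta.1"; rejects "^1.2.3", "~1.2.3", "latest", and ranges.
const EXACT_VERSION = /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$/;

function findUnpinned(manifest: PackageManifest): string[] {
  const all: DependencyMap = { ...manifest.dependencies, ...manifest.devDependencies };
  return Object.entries(all)
    .filter(([, version]) => !EXACT_VERSION.test(version))
    .map(([pkg, version]) => `${pkg}@${version}`)
    .sort();
}

// Sample manifest with illustrative version numbers (not the project's real ones).
const sample: PackageManifest = {
  dependencies: { hono: "4.6.0", zod: "^3.23.8" },
  devDependencies: { vitest: "~2.1.0", typescript: "5.6.3" },
};

const unpinned = findUnpinned(sample);
console.log(unpinned);
```

Running a check like this in CI would catch a `^` or `~` prefix that slips back in after a later `pnpm add`.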
Step 2: Configure environment variables
Create .env.example in the project root with every variable the system reads:
env
# Env vars used by vercel-ai-gateway-observability-for-smb-ai-agent-operations.
# The builder adds entries here as it wires up each integration.
# Keep placeholders only — never commit real values.
NODE_ENV=development

# OpenTelemetry
OTEL_SERVICE_NAME=vercel-ai-gateway-obs
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318

# Langfuse
LANGFUSE_PUBLIC_KEY=<your-public-key>
LANGFUSE_SECRET_KEY=<your-secret-key>

# Supabase
SUPABASE_URL=<your-supabase-url>
SUPABASE_ANON_KEY=<your-anon-key>
NEXT_PUBLIC_SUPABASE_URL=<your-supabase-url>
NEXT_PUBLIC_SUPABASE_ANON_KEY=<your-anon-key>

# Slack
SLACK_WEBHOOK_URL=<your-slack-webhook-url>

# Budget defaults
DEFAULT_DAILY_BUDGET=100
DEFAULT_MONTHLY_BUDGET=2000
Copy it to .env.local and fill in your real values:
terminal
cp .env.example .env.local
Step 3: Create typed configuration with Zod
Create src/config.ts. This module reads environment variables, validates them with a Zod schema, and exports a singleton config object:
Expected output: Fields like otelEndpoint and serviceName have sensible defaults. Required fields (supabaseUrl, langfusePublicKey, etc.) throw a ZodError at startup if missing.
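To make the default-versus-required behaviour concrete, here is a dependency-free sketch of the same kind of checks. It deliberately uses plain TypeScript instead of Zod, so it is not the project's `src/config.ts`. The helper names (`loadConfig`, `required`, `numberWithDefault`) are hypothetical, and the defaults are assumed from the placeholder values in `.env.example`.

```typescript
// Plain-TypeScript stand-in for schema validation; the real module uses a Zod schema.
type Env = Record<string, string | undefined>;

interface SketchConfig {
  serviceName: string;
  otelEndpoint: string;
  supabaseUrl: string;
  langfusePublicKey: string;
  defaultDailyBudget: number;
  defaultMonthlyBudget: number;
}

// Throws when a required variable is missing or blank.
function required(env: Env, key: string): string {
  const value = env[key];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${key}`);
  }
  return value;
}

// Parses a numeric variable, falling back to a default when it is unset.
function numberWithDefault(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  if (!Number.isFinite(parsed) || parsed < 0) {
    throw new Error(`Environment variable ${key} must be a non-negative number`);
  }
  return parsed;
}

function loadConfig(env: Env): SketchConfig {
  return {
    serviceName: env["OTEL_SERVICE_NAME"] ?? "vercel-ai-gateway-obs",
    otelEndpoint: env["OTEL_EXPORTER_OTLP_ENDPOINT"] ?? "http://localhost:4318",
    supabaseUrl: required(env, "SUPABASE_URL"),
    langfusePublicKey: required(env, "LANGFUSE_PUBLIC_KEY"),
    defaultDailyBudget: numberWithDefault(env, "DEFAULT_DAILY_BUDGET", 100),
    defaultMonthlyBudget: numberWithDefault(env, "DEFAULT_MONTHLY_BUDGET", 2000),
  };
}

const cfg = loadConfig({
  SUPABASE_URL: "https://example.supabase.co",
  LANGFUSE_PUBLIC_KEY: "pk-test",
});
console.log(cfg.otelEndpoint, cfg.defaultDailyBudget);
```

Failing at startup rather than on the first request is the point of the design: a missing key surfaces during deploy, not in front of a customer.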
Step 4: Create the Supabase client wrapper
Create src/lib/supabase.ts for typed database access:
ts
import { createClient } from "@supabase/supabase-js";
import type { CostSpan } from "@reaatech/llm-cost-telemetry";
import { config } from "../config.js";

export const supabase = createClient(config.supabaseUrl, config.supabaseAnonKey);

interface DbResult<T> {
  data: T | null;
  error: Error | null;
}

// Insert one cost span and return the stored row.
export async function insertCostSpan(span: CostSpan): Promise<CostSpan> {
  const { data, error }: DbResult<CostSpan> = await supabase.from("cost_spans").insert(span).select().single();
  if (error) throw error;
  return data as CostSpan;
}

// Load all per-tenant budget overrides.
export async function getBudgetConfigs(): Promise<Record<string, unknown>[]> {
  const { data, error }: DbResult<Record<string, unknown>[]> = await supabase.from("budget_configs").select("*");
  if (error) throw error;
  return data as Record<string, unknown>[];
}

// Persist a triggered alert and return the stored row.
export async function saveAlert(alert: Record<string, unknown>): Promise<Record<string, unknown>> {
  const { data, error }: DbResult<Record<string, unknown>> = await supabase.from("alerts").insert(alert).select().single();
  if (error) throw error;
  return data as Record<string, unknown>;
}

// Load all alert configurations.
export async function getAlertConfigs(): Promise<Record<string, unknown>[]> {
  const { data, error }: DbResult<Record<string, unknown>[]> = await supabase.from("alert_configs").select("*");
  if (error) throw error;
  return data as Record<string, unknown>[];
}
Expected output: Each function returns typed data or throws on error. The createClient call uses the singleton config object from step 3.
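Each wrapper repeats the same "throw on error, otherwise return data" step. As a sketch only, that step could be factored into a small helper. `unwrap` is a hypothetical name that does not appear in the project, and unlike the `data as T` casts above it also rejects a `null` payload explicitly.

```typescript
// Mirrors the DbResult shape used in src/lib/supabase.ts.
interface DbResult<T> {
  data: T | null;
  error: Error | null;
}

// Hypothetical helper: returns the payload or throws, never returns null.
function unwrap<T>(result: DbResult<T>, context: string): T {
  if (result.error !== null) throw result.error;
  if (result.data === null) throw new Error(`${context}: query returned no data`);
  return result.data;
}

// Usage with an in-memory result instead of a real Supabase call.
const rows = unwrap<{ tenant: string }[]>(
  { data: [{ tenant: "acme" }], error: null },
  "budget_configs",
);
console.log(rows.length);
```

The `context` argument is only there to make the "no data" error traceable to a table; it is an illustrative choice.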
Step 5: Build the span enricher
Create src/lib/span-enricher.ts. This wraps @reaatech/otel-genai-semconv-core’s SpanBuilder to build OpenTelemetry spans with GenAI semantic convention attributes:
ts
import {
  SpanBuilder,
  GEN_AI_ATTRIBUTES,
  createAttributeMapper,
  type LLMRequest,
  type LLMResponse,
  type CostData,
  type ProviderType,
} from "@reaatech/otel-genai-semconv-core";

export class SpanEnricher {
  private builder: SpanBuilder;
  private mapper: ReturnType<typeof createAttributeMapper>;

  constructor(provider: string) {
    void GEN_AI_ATTRIBUTES;
    this.builder = new SpanBuilder({ provider: provider as ProviderType, addMessageEvents: true, addChoiceEvents: true });
    this.mapper = createAttributeMapper(provider as ProviderType);
  }

  // Start a span for the request; the name defaults to "gen_ai.chat.completion".
  buildRequestSpan(request: LLMRequest, spanName?: string) {
    return this.builder.startSpan(request, spanName ?? "gen_ai.chat.completion");
  }

  enrichWithResponse(_span: unknown, response: LLMResponse) {
    this.builder.addResponse(response);
  }

  addCostData(_span: unknown, costData: CostData) {
    this.builder.addCostAttributes(costData);
  }

  recordError(_span: unknown, error: Error) {
    this.builder.recordError(error);
  }

  // Mark the span as OK and end it.
  finalizeOk(_span: unknown) {
    void _span;
    this.builder.setOk();
    this.builder.endSpan();
  }

  mapFinishReason(reason: string) {
    return this.mapper.mapFinishReason(reason);
  }
}

export function createSpanEnricher(provider: string) {
  return new SpanEnricher(provider);
}
Expected output: The enricher manages a SpanBuilder instance internally. The buildRequestSpan method defaults the span name to "gen_ai.chat.completion" when none is provided.
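For orientation, the sketch below shows the kind of attribute set a GenAI-convention span typically carries. It is a standalone illustration, not the output of `SpanBuilder`: the `SketchRequest` and `SketchResponse` shapes and the `toGenAiAttributes` function are hypothetical. The attribute keys follow the OpenTelemetry GenAI semantic conventions as published at the time of writing, and they may differ between convention versions.

```typescript
// Hypothetical request/response shapes; the library's LLMRequest/LLMResponse may differ.
interface SketchRequest {
  model: string;
  maxTokens?: number;
  temperature?: number;
}

interface SketchResponse {
  model: string;
  inputTokens: number;
  outputTokens: number;
  finishReason: string;
}

type SpanAttributes = Record<string, string | number | string[]>;

// Maps one chat call to GenAI semantic-convention attribute keys.
function toGenAiAttributes(req: SketchRequest, res: SketchResponse): SpanAttributes {
  const attrs: SpanAttributes = {
    "gen_ai.operation.name": "chat",
    "gen_ai.request.model": req.model,
    "gen_ai.response.model": res.model,
    "gen_ai.usage.input_tokens": res.inputTokens,
    "gen_ai.usage.output_tokens": res.outputTokens,
    "gen_ai.response.finish_reasons": [res.finishReason],
  };
  // Optional request parameters are only set when the caller supplied them.
  if (req.maxTokens !== undefined) attrs["gen_ai.request.max_tokens"] = req.maxTokens;
  if (req.temperature !== undefined) attrs["gen_ai.request.temperature"] = req.temperature;
  return attrs;
}

const attrs = toGenAiAttributes(
  { model: "example-model", maxTokens: 256 },
  { model: "example-model", inputTokens: 120, outputTokens: 48, finishReason: "stop" },
);
console.log(Object.keys(attrs).length);
```

Because every provider maps onto the same keys, dashboards and cost queries can group by `gen_ai.request.model` without knowing which provider served the call.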
Step 6: Create the cost tracker
Create src/lib/cost-tracker.ts. This tracks LLM call costs using the REAA cost telemetry ecosystem:
Expected output: trackCall() builds a CostSpan, validates it with CostSpanSchema.parse(), emits it to the CostCollector for in-memory aggregation, and persists it to Supabase.
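The arithmetic behind a cost span is simple enough to sketch without the library. Everything here is an assumption for illustration: the `PRICING` table uses made-up prices for a made-up model, the field names other than `tenant` and `costUsd` are hypothetical, and the inline checks only stand in for the schema validation described above.

```typescript
// Illustrative per-million-token prices; not real provider pricing.
interface Pricing {
  inputPerMillionUsd: number;
  outputPerMillionUsd: number;
}

const PRICING: Record<string, Pricing> = {
  "example-model": { inputPerMillionUsd: 3, outputPerMillionUsd: 15 },
};

interface CallUsage {
  tenant: string;
  provider: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
}

interface SketchCostSpan extends CallUsage {
  costUsd: number;
  timestamp: string;
}

// Validates the usage numbers and prices the call.
function buildCostSpan(usage: CallUsage, now: Date = new Date()): SketchCostSpan {
  const price = PRICING[usage.model];
  if (price === undefined) throw new Error(`No pricing for model ${usage.model}`);
  for (const tokens of [usage.inputTokens, usage.outputTokens]) {
    if (!Number.isInteger(tokens) || tokens < 0) {
      throw new Error("Token counts must be non-negative integers");
    }
  }
  const costUsd =
    (usage.inputTokens * price.inputPerMillionUsd +
      usage.outputTokens * price.outputPerMillionUsd) / 1e6;
  return { ...usage, costUsd, timestamp: now.toISOString() };
}

const costSpan = buildCostSpan(
  { tenant: "acme", provider: "example", model: "example-model", inputTokens: 1000, outputTokens: 500 },
  new Date("2025-01-01T00:00:00.000Z"),
);
console.log(costSpan.costUsd);
```

With these sample prices, 1,000 input tokens and 500 output tokens cost (1000 × 3 + 500 × 15) / 1,000,000 = 0.0105 USD.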
Step 7: Wire up the cost aggregation service
Create src/services/cost-aggregation-service.ts. This is the central hub that owns the CostCollector, CostAggregator, and BudgetManager from @reaatech/llm-cost-telemetry-aggregation:
ts
import { CostCollector, CostAggregator, BudgetManager } from "@reaatech/llm-cost-telemetry-aggregation";
import type { AppConfig } from "../config.js";
import { getBudgetConfigs } from "../lib/supabase.js";

export class CostAggregationService {
  collector: CostCollector;
  aggregator: CostAggregator;
  budget: BudgetManager;

  constructor(config: AppConfig) {
    this.aggregator = new CostAggregator({
      dimensions: ["tenant", "feature", "provider", "model"],
      timeWindows: ["hour", "day", "month"],
    });
    this.budget = new BudgetManager({
      global: { daily: config.defaultDailyBudget, monthly: config.defaultMonthlyBudget },
      tenants: {},
      alerts: [
        { threshold: 0.5, action: "log" },
        { threshold: 0.75, action: "notify" },
        { threshold: 0.9, action: "block" },
      ],
    });
    this.collector = new CostCollector({
      maxBufferSize: 1000,
      flushIntervalMs: 60000,
      // Feed every flushed span to both the aggregator and the budget manager.
      onFlush: (spans) => {
        for (const s of spans) {
          this.aggregator.add(s);
          void this.budget.record({ tenant: s.tenant ?? "default", cost: s.costUsd });
        }
      },
    });
  }

  // Load tenant-specific budget overrides from Supabase.
  async init() {
    try {
      const configs = await getBudgetConfigs();
      for (const c of configs) {
        const row = c as { tenant: string; daily?: number; monthly?: number };
        this.budget.setLimits(row.tenant, { daily: row.daily, monthly: row.monthly });
      }
    } catch {
      // budget configs not available — use defaults
    }
  }

  async checkBudget(tenant: string, estimatedCost: number) {
    return await this.budget.check({ tenant, estimatedCost });
  }

  getTenantCosts(tenant: string, period?: string) {
    const window = period as "hour" | "day" | "month" | undefined;
    return this.aggregator.getByTenant(tenant, window);
  }

  getSummary(options?: { period?: string; groupBy?: string[] }) {
    return this.aggregator.getSummary({
      period: (options?.period ?? "day") as "hour" | "day" | "month",
      groupBy: options?.groupBy as ("tenant" | "feature" | "provider" | "model")[] | undefined,
    });
  }

  async flush() {
    await this.collector.flush();
  }

  close() {
    void this.collector.close();
  }
}

export async function createCostAggregationService(config: AppConfig) {
  const service = new CostAggregationService(config);
  await service.init();
  return service;
}
Expected output: The constructor wires the three sub-components together. The onFlush callback feeds each flushed span to both the aggregator and the budget manager. init() loads tenant-specific budget overrides from Supabase.
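The buffer-then-flush wiring is easier to reason about in miniature. The sketch below re-creates the idea with two hypothetical classes, `MiniCollector` and `MiniAggregator`. They are not the library's `CostCollector` and `CostAggregator`, they omit the timed flush, and they aggregate on only three dimensions.

```typescript
interface MiniSpan {
  tenant: string;
  provider: string;
  model: string;
  costUsd: number;
}

// Buffers spans and hands them to onFlush in batches.
class MiniCollector {
  private buffer: MiniSpan[] = [];
  private readonly maxBufferSize: number;
  private readonly onFlush: (spans: MiniSpan[]) => void;

  constructor(maxBufferSize: number, onFlush: (spans: MiniSpan[]) => void) {
    this.maxBufferSize = maxBufferSize;
    this.onFlush = onFlush;
  }

  collect(span: MiniSpan): void {
    this.buffer.push(span);
    if (this.buffer.length >= this.maxBufferSize) this.flush();
  }

  flush(): void {
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = []; // swap first so a slow callback never sees new spans
    this.onFlush(batch);
  }
}

// Sums cost per tenant/provider/model key.
class MiniAggregator {
  private totals = new Map<string, number>();

  add(span: MiniSpan): void {
    const key = `${span.tenant}|${span.provider}|${span.model}`;
    this.totals.set(key, (this.totals.get(key) ?? 0) + span.costUsd);
  }

  totalForTenant(tenant: string): number {
    let sum = 0;
    this.totals.forEach((value, key) => {
      if (key.startsWith(`${tenant}|`)) sum += value;
    });
    return sum;
  }
}

const miniAggregator = new MiniAggregator();
const miniCollector = new MiniCollector(2, (spans) => spans.forEach((s) => miniAggregator.add(s)));

miniCollector.collect({ tenant: "acme", provider: "p1", model: "m1", costUsd: 0.25 });
miniCollector.collect({ tenant: "acme", provider: "p2", model: "m2", costUsd: 0.5 }); // buffer full, auto-flush
miniCollector.collect({ tenant: "acme", provider: "p1", model: "m1", costUsd: 0.25 }); // stays buffered
const beforeFlush = miniAggregator.totalForTenant("acme");
miniCollector.flush();
const afterFlush = miniAggregator.totalForTenant("acme");
console.log(beforeFlush, afterFlush);
```

The gap between `beforeFlush` and `afterFlush` shows why the service exposes an explicit `flush()`: spans still in the buffer are invisible to cost queries and budget checks until they are flushed.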
Step 8: Set up OpenTelemetry instrumentation
Create src/services/instrumentation.ts. This initialises the OTel SDK, Langfuse exporter, tracing, and metrics:
Expected output: initInstrumentation() creates and starts the TracingManager, MetricsManager, LangfuseExporter, and the NodeSDK. The instrumentation singleton is used by the rest of the system to access the logger and metrics.
Step 9: Build the telemetry service — the orchestration hub
Create src/services/telemetry-service.ts. This orchestrates the span-enrichment, cost-tracking, and metrics-logging pipeline:
Expected output: A single recordLLMCall() call runs the full pipeline: validate context → build OTel span → enrich with response → compute cost → track and persist → record metrics → log → finalize.
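The ordering of that pipeline, and one plausible way of handling a failure partway through, can be sketched with injected stand-ins. This is not the project's `TelemetryService`: `runPipeline`, `PipelineDeps`, and the stage labels are hypothetical, the span and metrics steps are reduced to labels, and returning an error result instead of rethrowing is an assumption.

```typescript
interface CallContext {
  tenant: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
}

// Injected collaborators so the sketch runs without OTel or Supabase.
interface PipelineDeps {
  computeCost(ctx: CallContext): number;
  persist(ctx: CallContext, costUsd: number): void;
}

interface PipelineResult {
  ok: boolean;
  costUsd: number;
  stages: string[];
  error?: string;
}

function runPipeline(ctx: CallContext, deps: PipelineDeps): PipelineResult {
  const stages: string[] = [];
  try {
    if (ctx.tenant.trim() === "") throw new Error("tenant is required");
    stages.push("validate");
    stages.push("startSpan"); // a real implementation would open the OTel span here
    const costUsd = deps.computeCost(ctx);
    stages.push("computeCost");
    deps.persist(ctx, costUsd);
    stages.push("persist");
    stages.push("finalizeOk"); // span marked OK and ended
    return { ok: true, costUsd, stages };
  } catch (err) {
    stages.push("recordError"); // span would be marked as failed instead
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, costUsd: 0, stages, error: message };
  }
}

const persisted: number[] = [];
const pipelineDeps: PipelineDeps = {
  computeCost: (ctx) => (ctx.inputTokens + ctx.outputTokens) / 1000,
  persist: (_ctx, costUsd) => { persisted.push(costUsd); },
};

const okResult = runPipeline({ tenant: "acme", model: "m1", inputTokens: 1500, outputTokens: 500 }, pipelineDeps);
const badResult = runPipeline({ tenant: "", model: "m1", inputTokens: 1, outputTokens: 1 }, pipelineDeps);
console.log(okResult.stages.join(" > "), badResult.error);
```

Keeping every stage behind one entry point means a caller cannot persist a cost without a matching span, or end a span without recording its outcome.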
Step 10: Create the Hono API
Create src/api/telemetry.ts. This is a Hono app that exposes REST endpoints for ingesting spans, querying costs, checking budgets, and managing alert configs:
ts
import { Hono } from "hono";
import { z } from "zod";
import { createSpanEnricher } from "../lib/span-enricher.js";
import { createCostTracker } from "../lib/cost-tracker.js";
import { createCostAggregationService, CostAggregationService } from "../services/cost-aggregation-service.js";
import { TelemetryService } from "../services/telemetry-service.js";
import { config } from "../config.js";
import { insertCostSpan, getAlertConfigs, saveAlert } from "../lib/supabase.js";
import { getLogger } from "@reaatech/llm-cost-telemetry-observability";
import { CostSpanSchema } from "@reaatech/llm-cost-telemetry";

const logger = getLogger({ name: "telemetry-api" });

// Module-level state, initially unset.
let telemetryService: TelemetryService | null = null;
let aggregation: CostAggregationService | null = null;
Expected output: Eight REST endpoints: span ingestion (POST), per-tenant costs (GET), summary (GET), budget check (POST), budget status (GET), alert config create and list (POST/GET), and dashboard metrics (GET). Unhandled errors route through the onError handler and return a 500 with { error: "internal error" }.
Step 11: Wire up the Next.js route handler and instrumentation hook
Create the Next.js catch-all route at app/api/telemetry/[[...route]]/route.ts to bridge the Hono app into Next.js:
ts
import type { NextRequest } from "next/server";
import app from "../../../../src/api/telemetry.js";

// Every HTTP method is forwarded unchanged to the Hono app.
export async function GET(request: NextRequest) {
  return app.fetch(request);
}

export async function POST(request: NextRequest) {
  return app.fetch(request);
}

export async function PUT(request: NextRequest) {
  return app.fetch(request);
}

export async function DELETE(request: NextRequest) {
  return app.fetch(request);
}
Make sure next.config.ts exports a valid Next.js config:
ts
import type { NextConfig } from "next";

const nextConfig: NextConfig = {};

export default nextConfig;
Create the instrumentation hook at src/instrumentation.ts to bootstrap OTel and the budget scheduler at server startup:
Expected output: The register() function runs only in the Node.js runtime (not Edge). It initialises the OTel SDK, creates the aggregation service, and starts a cron-based budget alert scheduler. On SIGTERM or SIGINT, the instrumentation is shut down gracefully.
Step 12: Create the budget alert scheduler
Create src/jobs/budget-alerts.ts. This runs a cron job every five minutes that evaluates budget thresholds and sends Slack notifications:
ts
import cron from "node-cron";
import type { ScheduledTask } from "node-cron";
import { BudgetAlertService } from "../services/budget-alert-service.js";
import type { CostAggregationService } from "../services/cost-aggregation-service.js";
import { supabase } from "../lib/supabase.js";
import { getLogger } from "@reaatech/llm-cost-telemetry-observability";

const logger = getLogger({ name: "budget-alert-scheduler" });

export function startBudgetAlertScheduler(aggregation: CostAggregationService, slackWebhookUrl: string) {
  const alertService = new BudgetAlertService({
    aggregation,
    supabaseClient: supabase,
    slackWebhookUrl,
  });

  // "*/5 * * * *" fires every five minutes.
  const task = cron.schedule("*/5 * * * *", async () => {
    try {
      await alertService.evaluateAlerts();
    } catch (err) {
      logger.logError(err, { action: "budgetAlertScheduler" });
    }
  });

  return task;
}

export function stopBudgetAlertScheduler(task: ScheduledTask) {
  void task.stop();
}
Expected output: Every five minutes, the scheduler evaluates all tenants’ budget status. For tenants at 75%+ utilization it sends a Slack notification. For tenants at 90%+ it persists a “block” alert to Supabase and also sends a Slack notification.
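The threshold rules can be written as a small pure function. This is a sketch of the decision logic only, using the 0.5, 0.75, and 0.9 thresholds configured in the aggregation service. The `decideAlert` function and its types are hypothetical, and treating a non-positive limit as zero utilization is an assumption.

```typescript
type AlertAction = "none" | "log" | "notify" | "block";

interface BudgetStatus {
  tenant: string;
  spentUsd: number;
  limitUsd: number;
}

interface AlertDecision {
  tenant: string;
  utilization: number;
  action: AlertAction;
  sendSlack: boolean;
  persistAlert: boolean;
}

// Maps a tenant's budget utilization to the action described above.
function decideAlert(budget: BudgetStatus): AlertDecision {
  // Assumption: a missing or non-positive limit counts as 0% utilization.
  const utilization = budget.limitUsd > 0 ? budget.spentUsd / budget.limitUsd : 0;

  let action: AlertAction = "none";
  if (utilization >= 0.9) action = "block";
  else if (utilization >= 0.75) action = "notify";
  else if (utilization >= 0.5) action = "log";

  return {
    tenant: budget.tenant,
    utilization,
    action,
    sendSlack: action === "notify" || action === "block",
    persistAlert: action === "block",
  };
}

const decisions = [
  decideAlert({ tenant: "small", spentUsd: 10, limitUsd: 100 }),
  decideAlert({ tenant: "warm", spentUsd: 80, limitUsd: 100 }),
  decideAlert({ tenant: "hot", spentUsd: 95, limitUsd: 100 }),
];
console.log(decisions.map((d) => `${d.tenant}:${d.action}`).join(" "));
```

Keeping the decision separate from the Slack and Supabase side effects makes the thresholds testable without a network connection.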
Step 13: Create the dashboard page
Create app/dashboard/page.tsx — a server component that fetches metrics from the API and renders summary cards:
tsx
export default async function DashboardPage() {
  let metrics: Record<string, unknown> = {};
  try {
    const baseUrl = process.env.NEXT_PUBLIC_VERCEL_URL
      ? `https://${process.env.NEXT_PUBLIC_VERCEL_URL}`
      : "http://localhost:3000";
    const res = await fetch(`${baseUrl}/api/telemetry/dashboard/metrics`, { cache: "no-store" });
    if (res.ok) {
      metrics = await res.json() as Record<string, unknown>;
    }
  } catch {
    // API not available during static generation
  }

  return (
    <div>
      <h1 style={{ fontSize: "1.5rem", fontWeight: 600, marginBottom: "1rem" }}>Overview</h1>
      <div style={{ display: "grid", gridTemplateColumns: "repeat(auto-fill, minmax(200px, 1fr))", gap: "1rem" }}>
        <SummaryCard title="Total Cost" value={String((metrics as { totalUsd?: number }).totalUsd ?? "—")} />
        <SummaryCard title="Total Calls" value={String((metrics as { totalCalls?: number }).totalCalls ?? "—")} />
        <SummaryCard title="Input Tokens" value={String((metrics as { totalInputTokens?: number }).totalInputTokens ?? "—")} />
        <SummaryCard title="Output Tokens" value={String((metrics as { totalOutputTokens?: number }).totalOutputTokens ?? "—")} />
      </div>
    </div>
  );
}

function SummaryCard({ title, value }: { title: string; value: string }) {
  return (
    <div style={{ border: "1px solid #e5e7eb", borderRadius: "0.5rem", padding: "1rem" }}>
      <p style={{ fontSize: "0.875rem", color: "#6b7280", margin: 0 }}>{title}</p>
      <p style={{ fontSize: "1.5rem", fontWeight: 700, margin: "0.25rem 0 0" }}>{value}</p>
    </div>
  );
}
Expected output: The dashboard page fetches from /api/telemetry/dashboard/metrics and renders four summary cards: total cost, total calls, input tokens, and output tokens. The cache: "no-store" flag ensures fresh data on every request.
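The page casts `metrics` separately for each card. One possible alternative, sketched here under assumptions, is to turn the untyped response into card data in a single step. `toCards` and `readNumber` are hypothetical helpers, and formatting the cost with a dollar sign and two decimals is a presentation choice the original page does not make.

```typescript
interface Card {
  title: string;
  value: string;
}

// Same placeholder character the page shows when a metric is missing.
const PLACEHOLDER = "\u2014";

// Reads a finite number from an untyped object, or returns undefined.
function readNumber(source: unknown, key: string): number | undefined {
  if (typeof source !== "object" || source === null) return undefined;
  const value = (source as Record<string, unknown>)[key];
  return typeof value === "number" && Number.isFinite(value) ? value : undefined;
}

function toCards(metrics: unknown): Card[] {
  const usd = readNumber(metrics, "totalUsd");
  const count = (key: string): string => {
    const n = readNumber(metrics, key);
    return n === undefined ? PLACEHOLDER : String(n);
  };
  return [
    { title: "Total Cost", value: usd === undefined ? PLACEHOLDER : `$${usd.toFixed(2)}` },
    { title: "Total Calls", value: count("totalCalls") },
    { title: "Input Tokens", value: count("totalInputTokens") },
    { title: "Output Tokens", value: count("totalOutputTokens") },
  ];
}

const cards = toCards({ totalUsd: 12.5, totalCalls: 3, totalInputTokens: "n/a" });
console.log(cards.map((c) => `${c.title}=${c.value}`).join(", "));
```

A malformed field such as the string `"n/a"` above falls back to the placeholder instead of being rendered verbatim.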
Step 14: Run the tests
The project includes a comprehensive test suite. Run all tests with coverage reporting:
terminal
pnpm vitest run --coverage --reporter=json --outputFile=vitest-report.json
Expected output: All tests pass with 0 failures, and coverage (lines, branches, functions, statements) is 90% or higher on runtime code. The JSON test report is written to vitest-report.json; the coverage report itself goes to the coverage output directory (coverage/ by default).
You can also run the type checker and linter:
terminal
pnpm typecheck
pnpm lint
Expected output: pnpm typecheck exits with no TypeScript errors. pnpm lint exits with no lint violations.
Next steps
Add per-tenant dashboard views — extend app/dashboard/costs/page.tsx with tables showing cost breakdown by model and provider for each tenant
Wire up Slack notifications — deploy with a real SLACK_WEBHOOK_URL and test the 75%/90% budget alert thresholds by sending spans that push utilization over the limit
Add rate limiting to the Hono API — use Hono’s built-in middleware or a rate-limiter to protect your telemetry ingestion endpoint from abuse
Deploy to Vercel — add NEXT_PUBLIC_VERCEL_URL to your environment variables and deploy; the instrumentation hook bootstraps OTel and the budget scheduler automatically on the server runtime
Extend to multi-model cost calculations — the @reaatech/llm-cost-telemetry-calculator already supports provider-specific pricing; add support for custom model pricing by extending the calculator’s model registry