Small businesses running Cohere-powered support bots have no per-call cost visibility; a single verbose handling loop can silently triple the monthly bill.
This recipe wraps the Cohere TypeScript SDK (cohere-ai) with per-call cost telemetry, OpenTelemetry spans, and real-time budget tracking so small businesses running support bots can see exactly where their LLM budget goes. You’ll build an InstrumentedCohereClient that captures token counts and calculates costs on every chat() and chatStream() call, an in-memory spend dashboard with a Next.js API route, and a polling BudgetWatcher that fires Pino alerts when daily limits are breached.
Prerequisites
Node.js 22+ — runtime for the Next.js app
pnpm 10 — package manager (exact version: 10.0.0; npm or yarn won’t match the lockfile)
Langfuse account — optional but recommended for the OTel dashboard; sign up at langfuse.com
Basic familiarity with Next.js App Router, TypeScript, and pnpm
Step 1: Scaffold the project
Create an empty directory and initialise a Next.js project. This recipe pins every dependency to an exact version so you don’t hit surprises on upgrades.
Create package.json with the full dependency list. The foundation is @reaatech/llm-cost-telemetry and its three companion packages: @reaatech/llm-cost-telemetry-calculator for per-model pricing, @reaatech/llm-cost-telemetry-observability for OTel tracing and Pino logging, and @reaatech/otel-genai-semconv-core for OpenTelemetry GenAI semantic convention types.
Now create .env.example with the environment variables you’ll wire up in the next steps. The Cohere SDK also reads CO_API_KEY, which should match COHERE_API_KEY.
Expected output: You now have a Next.js 16 project with all dependencies installed at exact versions. Running pnpm ls --depth=0 shows every package above.
Step 2: Create the Cohere config loader
You’ll load runtime configuration from environment variables using Zod schema validation. Create src/lib/config.ts.
The helper calls loadConfig() from the foundation package — it reads OTEL_*, DEFAULT_DAILY_BUDGET, and TENANT_BUDGETS env vars and populates the global configuration. Then CohereConfigSchema.parse(...) validates five fields and throws a ZodError if anything is invalid (for example, a negative dailyBudget).
Expected output: loadCohereConfig() returns a typed CohereConfig object, or throws with clear validation errors when env vars are missing or invalid.
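The validation step can be pictured with a dependency-free sketch. It is a hand-written stand-in for the Zod-based CohereConfigSchema, not the recipe's actual loader. The env var names COHERE_MODEL, TENANT_ID, and FEATURE, the function name parseCohereConfig, and every fallback default are assumptions made for illustration; only COHERE_API_KEY and DEFAULT_DAILY_BUDGET appear in the recipe text.

```typescript
// Dependency-free stand-in for the recipe's Zod schema (illustrative only).
interface CohereConfig {
  apiKey: string;
  model: string;
  tenantId: string;
  feature: string;
  dailyBudget: number;
}

type Env = Record<string, string | undefined>;

// Hypothetical helper: validates the five config fields from an env-like object.
function parseCohereConfig(env: Env): CohereConfig {
  const errors: string[] = [];

  const apiKey = env["COHERE_API_KEY"] ?? "";
  if (apiKey.length === 0) {
    errors.push("COHERE_API_KEY is required");
  }

  // Assumed fallback of 10 USD per day; the real default is not shown in the recipe.
  const dailyBudget = Number(env["DEFAULT_DAILY_BUDGET"] ?? "10");
  if (!Number.isFinite(dailyBudget) || dailyBudget < 0) {
    errors.push("DEFAULT_DAILY_BUDGET must be a non-negative number");
  }

  if (errors.length > 0) {
    // The real loader throws a ZodError; a plain Error keeps this sketch self-contained.
    throw new Error(`Invalid Cohere config: ${errors.join("; ")}`);
  }

  return {
    apiKey,
    model: env["COHERE_MODEL"] ?? "command-r-plus", // hypothetical env var and default
    tenantId: env["TENANT_ID"] ?? "default", // hypothetical env var and default
    feature: env["FEATURE"] ?? "support-bot", // hypothetical env var and default
    dailyBudget,
  };
}
```

Collecting every problem before throwing mirrors the way a schema validator reports all invalid fields at once rather than stopping at the first.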
Step 3: Define shared types
Create src/cost/types.ts with the domain types used across the codebase.
ts
import {
  type CostSpan,
  type CostBreakdown,
  type BudgetConfig,
  type BudgetStatus,
  type TelemetryContext,
} from "@reaatech/llm-cost-telemetry";
import { type LLMRequest, type LLMResponse } from "@reaatech/otel-genai-semconv-core";

export type { CostSpan, CostBreakdown, BudgetConfig, BudgetStatus, TelemetryContext };
export type { LLMRequest, LLMResponse };

export interface CohereCallRecord extends CostSpan {
  conversationId: string;
  tenantId: string;
  feature: string;
  model: string;
}

export interface AggregatedSpend {
  tenantId: string;
  periodStart: Date;
  periodEnd: Date;
  totalCostUsd: number;
  totalCalls: number;
  totalInputTokens: number;
  totalOutputTokens: number;
}

export type BudgetAlert = {
  tenantId: string;
  level: "warn" | "critical";
  utilizationPercent: number;
  currentSpendUsd: number;
  limitUsd: number;
  timestamp: Date;
};

export interface DashboardQuery {
  tenantId: string;
  startDate: string;
  endDate: string;
  granularity: "hour" | "day" | "week";
}

export interface CohereConfig {
  apiKey: string;
  model: string;
  tenantId: string;
  feature: string;
  dailyBudget: number;
}

export interface SpanStore {
  querySpans(filter: DashboardQuery): AggregatedSpend[];
}
Expected output: The foundation types (CostSpan, CostBreakdown, etc.) are re-exported from @reaatech/llm-cost-telemetry, and you add six recipe-specific types: CohereCallRecord, AggregatedSpend, BudgetAlert, DashboardQuery, CohereConfig, and SpanStore.
Step 4: Initialise telemetry with Langfuse
Create src/cost/telemetry.ts. This module initialises Langfuse, OTel tracing, metrics, and a Pino logger — all as module-level singletons.
ts
import {
  TracingManager,
  MetricsManager,
  getLogger,
  CostLogger,
  type TracingOptions,
  type MetricsOptions,
} from "@reaatech/llm-cost-telemetry-observability";
import { Langfuse } from "langfuse";

// Module-level singletons, populated by initTelemetry().
let tracer: TracingManager | undefined;
let metrics: MetricsManager | undefined;
let logger: CostLogger | undefined;
let langfuseClient: Langfuse | undefined;

export async function initTelemetry(): Promise<void> {
  const otlpEndpoint = process.env.OTEL_EXPORTER_OTLP_ENDPOINT ?? "";
  const serviceName = process.env.OTEL_SERVICE_NAME ?? "cohere-cost-observability";

  langfuseClient = new Langfuse({
    publicKey: process.env.LANGFUSE_PUBLIC_KEY ?? "",
    secretKey: process.env.LANGFUSE_SECRET_KEY ?? "",
    baseUrl: process.env.LANGFUSE_HOST ?? "https://us.cloud.langfuse.com",
  });

  const tracingManager = new TracingManager({
    serviceName,
    otlpEndpoint: `${otlpEndpoint}/v1/traces`,
  } satisfies TracingOptions);
  tracingManager.init();
  tracer = tracingManager;

  const metricsManager = new MetricsManager({
    serviceName,
    otlpEndpoint: `${otlpEndpoint}/v1/metrics`,
  } satisfies MetricsOptions);
  metricsManager.init();
  metrics = metricsManager;

  logger = getLogger({ name: "cohere-cost" });
}

export async function shutdownTelemetry(): Promise<void> {
  if (langfuseClient) {
    // Await the flush so queued events are sent before the process exits.
    await langfuseClient.flushAsync();
  }
  if (tracer) {
    await tracer.close();
  }
  if (metrics) {
    await metrics.close();
  }
}

export function getTracer(): TracingManager | undefined {
  return tracer;
}

export function getMetricsRecorder(): MetricsManager | undefined {
  return metrics;
}

export function getCostLogger(): CostLogger | undefined {
  return logger;
}

export function getLangfuse(): Langfuse | undefined {
  return langfuseClient;
}
TracingManager opens an OTLP HTTP exporter that sends spans to Langfuse. MetricsManager exports counters and histograms for token usage, costs, and call volumes. Both init() calls are synchronous — the SDK buffers and exports asynchronously.
Expected output: Calling initTelemetry() sets up four global singletons you can retrieve with getTracer(), getMetricsRecorder(), getCostLogger(), and getLangfuse().
Step 5: Build the instrumented Cohere client
This is the core of the recipe. Create src/cost/cohere-wrapper.ts — a wrapper around CohereClientV2 that records every API call as a CostSpan with OTel attributes, logs it, and reports it to the metrics pipeline.
ts
import { CohereClientV2, CohereError, CohereTimeoutError } from "cohere-ai";
import {
  generateId,
  now,
  calculateCostFromTokens,
  CostSpanSchema,
  type CostSpan,
  type TelemetryContext,
} from "@reaatech/llm-cost-telemetry";
void ({} as TelemetryContext);
import { addCustomPricing, getPricing } from "@reaatech/llm-cost-telemetry-calculator";
import { SpanBuilder } from "@reaatech/otel-genai-semconv-core";
import { getTracer, getMetricsRecorder, getCostLogger } from "./telemetry.js";
import { type CohereConfig } from "./types.js";

addCustomPricing([ { provider:
Three things happen on every call:
SpanBuilder from @reaatech/otel-genai-semconv-core creates an OTel-compliant span with GenAI attributes (gen_ai.request.model, gen_ai.usage.input_tokens, llm.cost.total, etc.).
calculateCostFromTokens converts token counts to USD using real Cohere pricing. The addCustomPricing() call at the top of the file registers per-model rates for three Cohere models.
Telemetry propagation — the span is sent to all three outputs: OTel tracing (recordCostSpan), OTel metrics (recordCostSpan), and the Pino logger (logCostSpan).
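The cost step boils down to simple arithmetic: tokens divided by one million, multiplied by a per-million rate, summed over input and output. The sketch below illustrates that calculation only. It does not reproduce calculateCostFromTokens or addCustomPricing, whose signatures are not shown here; the names ModelPricing, examplePricing, and costFromTokens are hypothetical, and the rates are placeholders rather than Cohere's published prices.

```typescript
// Illustrative per-model pricing table. The rates are placeholders, NOT real Cohere prices.
interface ModelPricing {
  inputPerMillionUsd: number;
  outputPerMillionUsd: number;
}

const examplePricing: Record<string, ModelPricing> = {
  "example-model": { inputPerMillionUsd: 2.5, outputPerMillionUsd: 10 },
};

// Hypothetical helper: converts token counts to USD for a registered model.
function costFromTokens(model: string, inputTokens: number, outputTokens: number): number {
  const pricing = examplePricing[model];
  if (!pricing) {
    throw new Error(`No pricing registered for model "${model}"`);
  }
  const inputCost = (inputTokens / 1_000_000) * pricing.inputPerMillionUsd;
  const outputCost = (outputTokens / 1_000_000) * pricing.outputPerMillionUsd;
  return inputCost + outputCost;
}
```

With these placeholder rates, a call that uses 1,200 input tokens and 300 output tokens costs roughly $0.006. Failing loudly on an unknown model is a deliberate choice in this sketch: silently pricing a call at zero would hide exactly the spend the recipe is trying to surface.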
The void ({} as TelemetryContext) line references TelemetryContext in a type position so that the type-only import is not flagged as unused; the expression is discarded and has no runtime effect.
The factory function createInstrumentedCohere handles the two-step initialisation: it calls initTelemetry() lazily and only then constructs the client.
Expected output: A class that wraps CohereClientV2.chat() and .chatStream() and returns { response, span, costUsd } on every call, with full OTel tracing and Pino logging in the background.
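The overall shape of the wrapper can be sketched without the SDK. The example below wraps any chat-like function and returns { response, span, costUsd }, which is the contract described above. Everything in it is a simplified stand-in: ChatResult, SketchSpan, WrapperOptions, and instrument are hypothetical names, the usage field is assumed to expose input and output token counts, and the real class builds a CostSpan through SpanBuilder instead of a plain object.

```typescript
// Structural stand-ins for the SDK response and the recipe's CostSpan (illustrative only).
interface ChatResult {
  text: string;
  usage: { inputTokens: number; outputTokens: number };
}
type ChatFn = (message: string) => Promise<ChatResult>;

interface SketchSpan {
  id: string;
  tenantId: string;
  feature: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
  durationMs: number;
}

interface WrapperOptions {
  tenantId: string;
  feature: string;
  model: string;
  inputPerMillionUsd: number; // placeholder rate supplied by the caller
  outputPerMillionUsd: number; // placeholder rate supplied by the caller
  onSpan?: (span: SketchSpan) => void; // stands in for tracing, metrics, and logging
}

let spanCounter = 0;

// Hypothetical factory: returns an instrumented version of the given chat function.
function instrument(chat: ChatFn, opts: WrapperOptions) {
  return async (message: string) => {
    const startedAt = Date.now();
    const response = await chat(message); // errors propagate to the caller unchanged
    const { inputTokens, outputTokens } = response.usage;
    const costUsd =
      (inputTokens / 1_000_000) * opts.inputPerMillionUsd +
      (outputTokens / 1_000_000) * opts.outputPerMillionUsd;
    spanCounter += 1;
    const span: SketchSpan = {
      id: `span-${spanCounter}`,
      tenantId: opts.tenantId,
      feature: opts.feature,
      model: opts.model,
      inputTokens,
      outputTokens,
      costUsd,
      durationMs: Date.now() - startedAt,
    };
    opts.onSpan?.(span); // optional sink, so the wrapper still works with no telemetry
    return { response, span, costUsd };
  };
}
```

Making the sink optional reflects the behaviour the test suite checks for: the wrapper should keep returning responses and costs even when no telemetry layer is initialised.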
Step 6: Build the dashboard API route
Create app/api/dashboard/route.ts — a Next.js route handler that stores cost spans in memory and supports aggregated queries.
ts
import { type NextRequest, NextResponse } from "next/server";
import { CostSpanSchema, type CostSpan, getWindowStart, getWindowEnd, roundTo } from "@reaatech/llm-cost-telemetry";
import { z } from "zod";
import { type AggregatedSpend, type DashboardQuery } from "../../../src/cost/types.js";

const costStore: CostSpan[] = [];

export function addSpan(span: CostSpan): void {
  costStore.push(span);
}

export function querySpans(filter: DashboardQuery)
The POST handler accepts any valid CostSpan (validated via CostSpanSchema.parse()). The GET handler filters by tenantId, optional date range, and granularity — it uses getWindowStart() / getWindowEnd() from the foundation package to bucket spans into hour, day, or week windows.
Note the use of NextRequest and NextResponse.json() — this is the required pattern for Next.js App Router route handlers.
Expected output: Two endpoints — POST /api/dashboard accepts a cost span body and returns 201 { stored: true, id } (or 400 on validation failure), and GET /api/dashboard?tenantId=acme&granularity=day returns aggregated spend records.
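The bucketing behind the GET handler can be illustrated with a small, self-contained aggregation. This is a sketch under stated assumptions, not the route's code: windowStartUtc is a hypothetical replacement for getWindowStart(), it works in UTC, and it assumes weeks start on Monday. SpanLike and Bucket are simplified stand-ins for CostSpan and AggregatedSpend.

```typescript
type Granularity = "hour" | "day" | "week";

// Simplified stand-ins for CostSpan and AggregatedSpend (illustrative only).
interface SpanLike {
  tenantId: string;
  timestamp: Date;
  costUsd: number;
  inputTokens: number;
  outputTokens: number;
}

interface Bucket {
  tenantId: string;
  periodStart: Date;
  totalCostUsd: number;
  totalCalls: number;
  totalInputTokens: number;
  totalOutputTokens: number;
}

// Hypothetical helper: truncates a timestamp to the start of its window, in UTC.
function windowStartUtc(ts: Date, granularity: Granularity): Date {
  const d = new Date(ts.getTime());
  d.setUTCMinutes(0, 0, 0);
  if (granularity === "hour") return d;
  d.setUTCHours(0);
  if (granularity === "day") return d;
  // Assumption: a week starts on Monday. getUTCDay() returns 0 for Sunday.
  const daysSinceMonday = (d.getUTCDay() + 6) % 7;
  d.setUTCDate(d.getUTCDate() - daysSinceMonday);
  return d;
}

// Groups one tenant's spans by window and sums cost, calls, and tokens per bucket.
function aggregateSpend(spans: SpanLike[], tenantId: string, granularity: Granularity): Bucket[] {
  const buckets = new Map<number, Bucket>();
  for (const span of spans) {
    if (span.tenantId !== tenantId) continue;
    const periodStart = windowStartUtc(span.timestamp, granularity);
    const key = periodStart.getTime();
    const bucket = buckets.get(key) ?? {
      tenantId,
      periodStart,
      totalCostUsd: 0,
      totalCalls: 0,
      totalInputTokens: 0,
      totalOutputTokens: 0,
    };
    bucket.totalCostUsd += span.costUsd;
    bucket.totalCalls += 1;
    bucket.totalInputTokens += span.inputTokens;
    bucket.totalOutputTokens += span.outputTokens;
    buckets.set(key, bucket);
  }
  // Return buckets in chronological order.
  return Array.from(buckets.values()).sort(
    (a, b) => a.periodStart.getTime() - b.periodStart.getTime(),
  );
}
```

Keying the map on the window's start timestamp is what merges spans from the same hour, day, or week into a single record.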
Step 7: Create the budget watcher
Create src/lib/budget-watch.ts. This module polls the SpanStore on a configurable interval and fires Pino alerts when spend exceeds warning or critical thresholds.
ts
import { getPricing } from "@reaatech/llm-cost-telemetry-calculator";
import {
  calculateCostFromTokens,
  getWindowStart,
  getWindowEnd,
  percentage,
  retryWithBackoff,
  type BudgetStatus,
} from "@reaatech/llm-cost-telemetry";
import { getCostLogger } from "../cost/telemetry.js";
import { type DashboardQuery, type SpanStore } from "../cost/types.js";

export interface BudgetWatcherConfig {
  pollIntervalMs: number;
  warnThreshold: number;
  critThreshold: number;
  dailyLimit: number
The start() method uses retryWithBackoff for the initial poll (retries up to 3 times with exponential backoff) then switches to a regular setInterval at the configured pollIntervalMs. The tick() method queries the SpanStore via querySpans(), computes the utilisation percentage, and calls logger.logBudgetAlert() when thresholds are exceeded.
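The threshold check inside tick() reduces to a small classification. The sketch below assumes warnThreshold and critThreshold are percentages of the daily limit (for example 80 and 95) and that reaching a threshold counts as exceeding it. The names classifySpend and Thresholds are hypothetical, and the real method also queries the store and calls the logger.

```typescript
type AlertLevel = "ok" | "warn" | "critical";

// Field names follow BudgetWatcherConfig; thresholds are assumed to be percentages.
interface Thresholds {
  warnThreshold: number;
  critThreshold: number;
  dailyLimit: number;
}

// Hypothetical helper: maps current spend to a utilisation percentage and an alert level.
function classifySpend(
  spendUsd: number,
  t: Thresholds,
): { level: AlertLevel; utilizationPercent: number } {
  // This sketch treats a non-positive limit as "no budget configured".
  if (t.dailyLimit <= 0) {
    return { level: "ok", utilizationPercent: 0 };
  }
  const utilizationPercent = (spendUsd / t.dailyLimit) * 100;
  if (utilizationPercent >= t.critThreshold) {
    return { level: "critical", utilizationPercent };
  }
  if (utilizationPercent >= t.warnThreshold) {
    return { level: "warn", utilizationPercent };
  }
  return { level: "ok", utilizationPercent };
}
```

Checking the critical threshold first ensures that spend above both thresholds is reported once, at the higher level.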
withinBudget() lets you check “can I afford this call?” before making it — it estimates output tokens at 30% of input, computes the projected cost, and checks whether the result fits inside the daily budget.
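A pre-flight check along these lines might look like the following sketch. The 30% output estimate comes from the description above; the rest is assumed for illustration, including the function signature, the rounding of the estimate, the caller-supplied placeholder rates, and the choice to add the projected cost to the spend already recorded today before comparing against the limit.

```typescript
// Output tokens are estimated at 30% of input tokens, as described above.
const OUTPUT_TO_INPUT_RATIO = 0.3;

interface Rates {
  inputPerMillionUsd: number; // placeholder rate supplied by the caller
  outputPerMillionUsd: number; // placeholder rate supplied by the caller
}

// Hypothetical signature: returns true if the projected call still fits in today's budget.
function withinBudget(
  inputTokens: number,
  currentSpendUsd: number,
  dailyLimitUsd: number,
  rates: Rates,
): boolean {
  const estimatedOutputTokens = Math.round(inputTokens * OUTPUT_TO_INPUT_RATIO);
  const projectedCostUsd =
    (inputTokens / 1_000_000) * rates.inputPerMillionUsd +
    (estimatedOutputTokens / 1_000_000) * rates.outputPerMillionUsd;
  // Assumption: the projected cost is added to what has already been spent today.
  return currentSpendUsd + projectedCostUsd <= dailyLimitUsd;
}
```

Because the output size is only an estimate, a check like this is best treated as a soft gate; the watcher's threshold alerts remain the record of actual spend.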
Expected output: A BudgetWatcher that queries spend every N milliseconds and logs "warn" or "critical" alerts to Pino when spend reaches the configured threshold percentages of the daily limit.
Step 8: Wire up the Next.js instrumentation hook
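Create src/instrumentation.ts. Next.js calls register() once when the server starts. Older Next.js releases required experimental.instrumentationHook for this; since Next.js 15 the instrumentation file is stable and works without the flag, though this recipe still sets it.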
The dynamic import() ensures this module only loads in the Node.js runtime — the Edge runtime skips it entirely.
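The runtime guard can be sketched as follows. To keep the example runnable on its own, the runtime value and the module loader are passed in as parameters; registerFor and TelemetryModule are hypothetical names, and the comment at the end shows how the file would plausibly wire in process.env.NEXT_RUNTIME and the telemetry module from Step 4.

```typescript
// Shape of the module that the dynamic import is expected to resolve to.
interface TelemetryModule {
  initTelemetry: () => Promise<void>;
}

// Hypothetical helper: initialises telemetry only in the Node.js runtime.
// Returns true when initTelemetry() was called.
async function registerFor(
  runtime: string | undefined,
  loadTelemetry: () => Promise<TelemetryModule>,
): Promise<boolean> {
  if (runtime !== "nodejs") {
    // Edge runtime (or unknown): skip the import entirely so no Node-only code loads.
    return false;
  }
  const { initTelemetry } = await loadTelemetry();
  await initTelemetry();
  return true;
}

// In src/instrumentation.ts this would be wired up roughly as:
//
//   export async function register(): Promise<void> {
//     await registerFor(process.env.NEXT_RUNTIME, () => import("./cost/telemetry.js"));
//   }
```

Deferring the import until after the runtime check is the point of the pattern: a static top-level import would be bundled for the Edge runtime as well.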
Enable the hook in next.config.ts:
ts
import type { NextConfig } from "next";

const nextConfig = {
  experimental: {
    instrumentationHook: true,
  },
} as NextConfig;

export default nextConfig;
Expected output: Every time your Next.js dev server starts, initTelemetry() fires automatically, connecting Langfuse and OTel before any request arrives.
Step 9: Export the public API surface
Create src/index.ts — a barrel file that exposes everything consumers need to import.
ts
export { createInstrumentedCohere, InstrumentedCohereClient } from "./cost/cohere-wrapper.js";
export { BudgetWatcher, createBudgetWatcher } from "./lib/budget-watch.js";
export { initTelemetry, shutdownTelemetry } from "./cost/telemetry.js";
export type {
  CohereConfig,
  CohereCallRecord,
  AggregatedSpend,
  BudgetAlert,
  DashboardQuery,
} from "./cost/types.js";
Expected output: Other modules or tests can import from ./src/index.js without knowing the internal directory layout — import { createInstrumentedCohere, BudgetWatcher } from "./src/index.js".
Step 10: Run the tests
The recipe includes a full test suite. Run it with:
terminal
pnpm test
Expected output: You’ll see a JSON report. All 66 tests pass with numFailedTests: 0. Coverage meets 90%+ on lines, branches, functions, and statements across runtime code (src/**/*.ts and app/**/route.ts).
Test files live in the tests/ directory mirroring the source structure:
tests/cost/cohere-wrapper.test.ts — mocks the Cohere SDK via vi.mock and tests chat(), chatStream(), error propagation, token accounting, and span ID uniqueness. Verifies that tenantId and feature propagate into the returned CostSpan.
tests/cost/cohere-wrapper-no-telemetry.test.ts — tests the InstrumentedCohereClient in isolation without the telemetry layer initialised, verifying that the wrapper degrades gracefully when tracing is unavailable.
tests/api/dashboard/route.test.ts — tests POST and GET handlers end-to-end via NextRequest, including validation errors, bucket merging, and date-range filtering.
tests/lib/budget-watch.test.ts — tests threshold alerts, zero spend, error handling, start()/stop() lifecycle, and withinBudget() gating.
tests/instrumentation.test.ts — tests that register() calls initTelemetry() only when NEXT_RUNTIME === "nodejs".
tests/lib/config.test.ts — tests loadCohereConfig() with various env var combinations, fallback defaults, and Zod validation failures.
tests/index.test.ts — verifies the public barrel exports, confirming that createInstrumentedCohere, BudgetWatcher, initTelemetry, and shutdownTelemetry are all exported correctly.
All tests mock external network calls (cohere-ai, langfuse, @reaatech/* packages) via vi.mock — no live HTTP from tests.
Next steps
Add a real database backend — replace the in-memory costStore array in the dashboard route with a SQLite or PostgreSQL table so spend data survives restarts.
Extend custom pricing — call addCustomPricing() with your own negotiated Cohere rates or support additional models like command-r7b-12-2024.
Wire up Helicone — the helicone package is already in the dependency list. Route Cohere calls through the Helicone proxy for an alternative cost dashboard and request replay.
Deploy to production — set the env vars on your hosting platform (Vercel, Railway, Fly.io) and point the OTLP exporter at a managed Langfuse instance or your own OpenTelemetry collector.