Drop-in OpenTelemetry instrumentation that gives SMBs real-time cost, latency, and error insights across all Anthropic API calls, plus pre-built dashboards for Langfuse and Phoenix.
Small businesses using Anthropic's Claude models for customer support or content generation lack visibility into token spend, latency patterns, and sudden error rate changes. Without integrated observability, they overspend and cannot diagnose issues before customers complain.
In this tutorial, you’ll build an observability layer for Anthropic’s Claude API that gives any small business real-time visibility into token spend, latency patterns, and error rates — without standing up a custom observability stack. You’ll instrument the Anthropic Node.js SDK with OpenTelemetry GenAI semantic convention spans, track per-request cost using @reaatech/llm-cost-telemetry, enforce per-tenant daily budgets with @reaatech/agent-budget-otel-bridge, and export everything to Langfuse or Phoenix via @reaatech/otel-genai-semconv-exporters. By the end, you’ll have a Next.js project with an Express API server that exposes a chat API, a streaming SSE endpoint, a budget status endpoint, and a Langfuse webhook receiver — all wired through a single instrumentation pipeline.
Prerequisites
Node.js >= 22 and pnpm 10.x installed on your machine
A Langfuse account (or a Phoenix instance) for the observability backend — sign up at langfuse.com or phoenix.arize.com
Familiarity with TypeScript, Express, Next.js, and basic OpenTelemetry concepts
Step 1: Create the project and install dependencies
Start by creating a new project directory and initializing it with pnpm. Then create package.json with the full set of dependencies — each pinned to an exact version.
Expected output: After you run pnpm install, pnpm resolves all packages and writes pnpm-lock.yaml. There should be no errors.
Step 2: Configure TypeScript
Create tsconfig.json at the project root. This is a Next.js-compatible TypeScript config with strict mode, JSX support, and the bundler module resolution strategy.
Step 3: Configure environment variables
Create a .env file at the project root. It holds the variables that later steps read: ANTHROPIC_API_KEY, EXPORTER_BACKEND, LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY (or PHOENIX_ENDPOINT), TENANT_BUDGETS, and optionally PORT.
The EXPORTER_BACKEND variable selects the active observability backend. Set it to phoenix if you’re running a Phoenix instance locally. The TENANT_BUDGETS JSON defines per-tenant daily spend caps in USD — acme-corp gets $5/day, widget-co gets $20/day.
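To make the TENANT_BUDGETS format concrete, here is a minimal sketch of how such a value could be parsed and validated before use. The helper name parseTenantBudgets and its error messages are hypothetical; the recipe's own parsing lives in src/lib/budget.ts and is more lenient (it falls back to defaults on invalid JSON).

```typescript
// Hypothetical helper (not part of the recipe): parse a TENANT_BUDGETS-style
// JSON string into a map of tenant ID -> daily limit in USD.
type TenantBudgets = Record<string, number>;

function parseTenantBudgets(raw: string | undefined): TenantBudgets {
  if (!raw) return {}; // variable not set: no per-tenant limits
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw new Error("TENANT_BUDGETS is not valid JSON");
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("TENANT_BUDGETS must be a JSON object of tenantId -> USD limit");
  }
  const budgets: TenantBudgets = {};
  for (const [tenantId, limit] of Object.entries(parsed)) {
    // Reject strings, NaN, Infinity, and negative limits.
    if (typeof limit !== "number" || !Number.isFinite(limit) || limit < 0) {
      throw new Error(`Invalid daily limit for tenant "${tenantId}"`);
    }
    budgets[tenantId] = limit;
  }
  return budgets;
}

// The two tenants described above: $5/day and $20/day.
const budgets = parseTenantBudgets('{"acme-corp":5,"widget-co":20}');
console.log(budgets["acme-corp"], budgets["widget-co"]); // 5 20
```

Failing fast on a malformed value is a design choice for this sketch; silently falling back to defaults, as the recipe does, keeps the server running but can hide a typo in a budget.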
Step 4: Define shared types
Create src/types.ts with the domain types used across the recipe. These define the shape of tenant labels, API requests and responses, the Langfuse webhook payload, and budget status.
Expected output: A 44-line TypeScript file exporting six interfaces.
Step 5: Set up Anthropic SDK instrumentation
This is the heart of the observability layer. src/lib/instrumentation.ts wraps the Anthropic client with OpenTelemetry GenAI semantic convention spans using @reaatech/otel-genai-semconv-anthropic, records cost telemetry using @reaatech/llm-cost-telemetry, and exports a lazily-initialized singleton.
Create src/lib/instrumentation.ts:
ts
import Anthropic from "@anthropic-ai/sdk";
import { AnthropicInstrumentation } from "@reaatech/otel-genai-semconv-anthropic";
import { SpanBuilder, type LLMResponse } from "@reaatech/otel-genai-semconv-core";
import { generateId, now, calculateCostFromTokens } from "@reaatech/llm-cost-telemetry";
import type { CostSpan } from "@reaatech/llm-cost-telemetry";
import type { Span } from "@opentelemetry/api";
import type { Message } from "@anthropic-ai/sdk/resources";

let instrumentation: AnthropicInstrumentation | null = null;
let instrumentedClient: Anthropic | null
The key flow: getInstrumentedClient() lazily creates an Anthropic client, wraps it with AnthropicInstrumentation (which hooks into client.messages.create() to emit OTel spans), and caches the result. The onStart hook sets a provider attribute; onEnd builds an LLMResponse from the Anthropic Message and calls spanBuilder.addResponse() to populate GenAI semantic convention attributes.
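The lazy-singleton-plus-hooks flow described above can be sketched without any SDK. Everything in this block is a stand-in: FakeClient, instrument, and getInstrumentedFakeClient are hypothetical names, and the real AnthropicInstrumentation may wire its hooks differently.

```typescript
// Hooks fired around each call; both are optional.
type Hooks<Req, Res> = {
  onStart?: (request: Req) => void;
  onEnd?: (request: Req, response: Res, durationMs: number) => void;
};

// Wrap an async function so the hooks run before and after every call.
// In this sketch onEnd is skipped when the wrapped call rejects.
function instrument<Req, Res>(
  fn: (request: Req) => Promise<Res>,
  hooks: Hooks<Req, Res>,
): (request: Req) => Promise<Res> {
  return async (request) => {
    hooks.onStart?.(request);
    const started = Date.now();
    const response = await fn(request);
    hooks.onEnd?.(request, response, Date.now() - started);
    return response;
  };
}

// Stand-in for the Anthropic client (hypothetical, for illustration only).
interface FakeClient {
  create: (prompt: string) => Promise<string>;
}

let cachedClient: FakeClient | null = null;
const events: string[] = [];

// Build the wrapped client on first use, then return the cached instance.
function getInstrumentedFakeClient(): FakeClient {
  if (cachedClient) return cachedClient;
  const raw: FakeClient = { create: async (prompt) => `echo:${prompt}` };
  cachedClient = {
    create: instrument(raw.create, {
      onStart: () => events.push("start"),
      onEnd: () => events.push("end"),
    }),
  };
  return cachedClient;
}
```

Calling getInstrumentedFakeClient() twice returns the same object, and each create() call appends "start" and then "end" to events. Caching matters because wrapping the client more than once would emit duplicate spans for a single request.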
Step 6: Create the telemetry layer
src/lib/telemetry.ts sets up the exporters (Langfuse or Phoenix), enriches spans with tenant labels using OpenInference context helpers, and registers the OpenInference instrumentation that runs alongside the @reaatech instrumentation from Step 5.
Create src/lib/telemetry.ts:
ts
import { LangfuseExporter, PhoenixExporter } from "@reaatech/otel-genai-semconv-exporters";
import { setUser, setSession, setMetadata, setAttributes } from "@arizeai/openinference-core";
import { context } from "@opentelemetry/api";
import type { Context } from "@opentelemetry/api";
import { AnthropicInstrumentation as OIAnthropicInstrumentation } from "@arizeai/openinference-instrumentation-anthropic";
import Anthropic from "@anthropic-ai/sdk";

const activeExporters: Array<LangfuseExporter | PhoenixExporter> = [];

export function setupLangfuseExporter(): LangfuseExporter {
  const publicKey = process.env.LANGFUSE_PUBLIC_KEY;
  const secretKey = process.env.LANGFUSE_SECRET_KEY;
  const exporter = new LangfuseExporter({ publicKey, secretKey });
  activeExporters.push(exporter);
  return exporter;
}

export function setupPhoenixExporter(): PhoenixExporter {
  const endpoint = process.env.PHOENIX_ENDPOINT;
  const exporter = new PhoenixExporter({
    endpoint,
    datasetName: "anthropic-observability",
    maxSpans: 1000,
  });
  activeExporters.push(exporter);
  return exporter;
}

export function enrichWithTenantLabels(tenant: { id: string; name?: string }): Context {
  const sessionId = crypto.randomUUID();
  let ctx = context.active();
  ctx = setSession(ctx, { sessionId });
  ctx = setUser(ctx, { userId: tenant.id });
  ctx = setMetadata(ctx, { tenantName: tenant.name ?? "" });
  ctx = setAttributes(ctx, {
    "budget.scope_type": "user",
    "budget.scope_key": tenant.id,
  });
  return ctx;
}

export function registerOpenInferenceInstrumentation(): OIAnthropicInstrumentation {
  const instance = new OIAnthropicInstrumentation({ traceConfig: {} });
  instance.manuallyInstrument(Anthropic);
  return instance;
}

export function shutdownExporters(): void {
  for (const exporter of activeExporters) {
    void exporter.shutdown();
  }
  activeExporters.length = 0;
}

export function getActiveExporters(): Array<LangfuseExporter | PhoenixExporter> {
  return activeExporters;
}
The enrichWithTenantLabels function creates an OpenTelemetry context that associates every span with the tenant’s user ID, a session ID, and the budget.scope_type / budget.scope_key attributes that the SpanListener reads for budget enforcement.
Step 7: Build budget enforcement
src/lib/budget.ts implements an in-memory budget controller and wraps it with the SpanListener from @reaatech/agent-budget-otel-bridge. When OTel spans end, the listener reads the budget.scope_type and budget.scope_key attributes and records the spend against the matching tenant’s daily limit.
Create src/lib/budget.ts:
ts
import { SpanListener } from "@reaatech/agent-budget-otel-bridge";
import { type BudgetStatus } from "../types.js";

interface BudgetEntry {
  limit: number;
  used: number;
}

const budgets = new Map<string, BudgetEntry>();

export function defineBudget(args: { scopeKey?: string; limit?: number }): void {
  const key = args.scopeKey ?? "default";
  if (!budgets.has(key)) {
    budgets.set(key, { limit: args.limit ?? 10, used: 0 });
  }
}

export function createBudgetListener(): SpanListener {
  budgets.clear();
  const raw = process.env.TENANT_BUDGETS;
  if (raw) {
    try {
      const parsed: Record<string, number> = JSON.parse(raw) as Record<string, number>;
      for (const [tenantId, limit] of Object.entries(parsed)) {
        defineBudget({ scopeKey: tenantId, limit });
      }
    } catch {
      console.debug("Invalid TENANT_BUDGETS JSON, using defaults");
    }
  }
  const controller = {
    defineBudget,
    record: (entry: { scopeKey?: string; cost?: number }) => {
      const key = entry.scopeKey ?? "default";
      const budget = budgets.get(key);
      if (budget) {
        budget.used += entry.cost ?? 0;
      }
    },
  };
  const listener = new SpanListener({ controller: controller as never });
  return listener;
}

export function getBudgetStatus(tenant: string): BudgetStatus {
  const budget = budgets.get(tenant);
  if (!budget) {
    return { tenant, dailyUsed: 0, dailyLimit: 0, percentage: 0, allowed: true };
  }
  const percentage = budget.limit > 0 ? (budget.used / budget.limit) * 100 : 0;
  return {
    tenant,
    dailyUsed: budget.used,
    dailyLimit: budget.limit,
    percentage,
    allowed: budget.used <= budget.limit,
  };
}

export function checkBudget(tenant: string): Promise<{ allowed: boolean; dailyUsed: number; dailyLimit: number }> {
  const budget = budgets.get(tenant);
  if (!budget) {
    return Promise.resolve({ allowed: true, dailyUsed: 0, dailyLimit: 0 });
  }
  return Promise.resolve({ allowed: budget.used <= budget.limit, dailyUsed: budget.used, dailyLimit: budget.limit });
}
The SpanListener uses its built-in default scope extractor — it reads budget.scope_type and budget.scope_key from span attributes (which were set by enrichWithTenantLabels). When onSpanEnd(attributes) is called, it records the cost from llm.cost.total_usd against the correct tenant’s budget.
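As a rough illustration of that attribute-to-budget path, the sketch below extracts the scope and cost from a plain attribute map and accumulates spend per tenant. It mirrors the behavior described in the text, not the library's actual source; extractSpend and the standalone onSpanEnd function here are hypothetical.

```typescript
type SpanAttributes = Record<string, string | number | boolean | undefined>;

interface SpendRecord {
  scopeType: string;
  scopeKey: string;
  cost: number;
}

// Hypothetical extractor: returns null when the span carries no usable
// budget scope or cost, so unrelated spans are ignored.
function extractSpend(attributes: SpanAttributes): SpendRecord | null {
  const scopeType = attributes["budget.scope_type"];
  const scopeKey = attributes["budget.scope_key"];
  const cost = attributes["llm.cost.total_usd"];
  if (typeof scopeType !== "string" || typeof scopeKey !== "string") return null;
  if (typeof cost !== "number" || !Number.isFinite(cost) || cost < 0) return null;
  return { scopeType, scopeKey, cost };
}

// Running spend per tenant, keyed by budget.scope_key.
const used = new Map<string, number>();

function onSpanEnd(attributes: SpanAttributes): void {
  const spend = extractSpend(attributes);
  if (!spend) return;
  used.set(spend.scopeKey, (used.get(spend.scopeKey) ?? 0) + spend.cost);
}

onSpanEnd({ "budget.scope_type": "user", "budget.scope_key": "acme-corp", "llm.cost.total_usd": 0.25 });
onSpanEnd({ "budget.scope_type": "user", "budget.scope_key": "acme-corp", "llm.cost.total_usd": 0.5 });
onSpanEnd({ "llm.cost.total_usd": 1 }); // no scope attributes: ignored
console.log(used.get("acme-corp")); // 0.75
```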
Step 8: Write Express middleware
src/lib/middleware.ts provides the HTTP middleware pipeline: tenant extraction from headers, async error handling, request logging, and centralized error formatting. It also extends the Express Request type with custom properties your routes will use.
The middleware stack works in this order: requestLogger captures timing, tenantMiddleware validates the X-Tenant-Id header and attaches the tenant object to the request, then the route handler runs. If any handler throws, errorHandler catches it and returns a JSON error response.
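The ordering above can be illustrated with a framework-free sketch. The Req, Res, and run definitions are minimal stand-ins for Express, and the 400 response for a missing header is an assumption; the recipe's real tenantMiddleware may respond differently or exempt routes such as /health.

```typescript
interface Tenant { id: string; name?: string }
interface Req { headers: Record<string, string | undefined>; tenant?: Tenant }
interface Res { statusCode: number; body?: unknown }
type Next = () => void;
type Middleware = (req: Req, res: Res, next: Next) => void;

// Assumption: reject with 400 when the header is missing or blank.
// Header names are lowercase here, as Node.js normalizes them.
const tenantMiddleware: Middleware = (req, res, next) => {
  const id = req.headers["x-tenant-id"]?.trim();
  if (!id) {
    res.statusCode = 400;
    res.body = { error: "Missing X-Tenant-Id header" };
    return; // stop the chain: the route handler never runs
  }
  req.tenant = { id };
  next();
};

// Tiny runner: call each middleware in order until one declines to call next().
function run(stack: Middleware[], req: Req, res: Res): void {
  const step = (index: number): void => {
    const middleware = stack[index];
    if (middleware) middleware(req, res, () => step(index + 1));
  };
  step(0);
}

const demoReq: Req = { headers: { "x-tenant-id": "acme-corp" } };
const demoRes: Res = { statusCode: 200 };
run(
  [tenantMiddleware, (req, res, next) => { res.body = { tenant: req.tenant?.id }; next(); }],
  demoReq,
  demoRes,
);
console.log(demoRes.body); // { tenant: 'acme-corp' }
```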
Step 9: Create the API routes
src/api/routes.ts wires together everything you’ve built so far. It exposes five endpoints: health check, chat (with budget check and tenant context), streaming chat, Langfuse webhook receiver, and budget status query.
Create src/api/routes.ts:
ts
import { Router } from "express";
import type { Request, Response } from "express";
import { getInstrumentedClient } from "../lib/instrumentation.js";
import { enrichWithTenantLabels, getActiveExporters } from "../lib/telemetry.js";
import { checkBudget, getBudgetStatus } from "../lib/budget.js";
import { asyncHandler } from "../lib/middleware.js";
import { context } from "@opentelemetry/api";
import { z } from "zod";

const chatRequestSchema = z.object({
  model: z.string(),
  messages: z.array(
    z.
The POST /api/chat route is the most important one. It:
Validates the request body with a Zod schema
Calls checkBudget(tenantId) — returns a 429 if the tenant’s daily budget is exhausted
Creates an enriched OTel context via enrichWithTenantLabels() — this sets the tenant’s user ID, session ID, and budget scope attributes on the current context
Runs the Anthropic API call inside context.with(enrichedCtx, ...) so the generated OTel span inherits those attributes
Lets the instrumented client emit GenAI spans with request metadata, token counts, and cost tracking
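The budget gate in that flow can be sketched as a pure function with injected dependencies. This is a simplified illustration, not the recipe's route handler: handleChat, ChatDeps, and the response shapes are hypothetical, and request validation and OTel context propagation are left out.

```typescript
interface BudgetCheck { allowed: boolean; dailyUsed: number; dailyLimit: number }

// Injected dependencies so the flow can be exercised without a network call.
interface ChatDeps {
  checkBudget: (tenantId: string) => Promise<BudgetCheck>;
  callModel: (prompt: string) => Promise<string>;
}

type ChatResult =
  | { status: 200; body: { content: string } }
  | { status: 429; body: { error: string; dailyUsed: number; dailyLimit: number } };

async function handleChat(tenantId: string, prompt: string, deps: ChatDeps): Promise<ChatResult> {
  // Check the budget first so an exhausted tenant never reaches the model.
  const budget = await deps.checkBudget(tenantId);
  if (!budget.allowed) {
    return {
      status: 429,
      body: { error: "Daily budget exhausted", dailyUsed: budget.dailyUsed, dailyLimit: budget.dailyLimit },
    };
  }
  const content = await deps.callModel(prompt);
  return { status: 200, body: { content } };
}

// Example wiring with stubbed dependencies.
const demoDeps: ChatDeps = {
  checkBudget: async () => ({ allowed: false, dailyUsed: 5.2, dailyLimit: 5 }),
  callModel: async (prompt) => `reply to: ${prompt}`,
};
void handleChat("acme-corp", "Hello", demoDeps).then((result) => console.log(result.status)); // 429
```

Because the check happens before the call, a tenant can overshoot its limit by at most the cost of the request that crosses it.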
Step 10: Wire everything together in the server
src/server.ts is the Express entry point. It loads environment variables, creates the Express app, sets up instrumentation, selects the exporter, mounts routes, and starts listening.
Create src/server.ts:
ts
import "dotenv/config";
import express from "express";
import { getInstrumentedClient, setupInstrumentation, createSpanBuilder } from "./lib/instrumentation.js";
import { setupLangfuseExporter, setupPhoenixExporter, shutdownExporters } from "./lib/telemetry.js";
import { createBudgetListener } from "./lib/budget.js";
import { tenantMiddleware, errorHandler, requestLogger } from "./lib/middleware.js";
import { createRouter } from "./api/routes.js";
import { loadConfig } from "@reaatech/llm-cost-telemetry";

function main(): void {
  loadConfig();

  const app = express();
  app.use(express.json());
  app.use(requestLogger);
  app.use(tenantMiddleware);

  const client = getInstrumentedClient();
  const spanBuilder = createSpanBuilder();
  setupInstrumentation(client, spanBuilder);

  const exporterBackend = process.env.EXPORTER_BACKEND;
  if (exporterBackend === "phoenix") {
    setupPhoenixExporter();
  } else {
    setupLangfuseExporter();
  }

  createBudgetListener();

  app.use("/", createRouter());
  app.use(errorHandler);

  const port = Number(process.env.PORT ?? 3000);
  app.listen(port, () => {
    console.log(`listening on port ${String(port)}`);
  });

  function shutdown(): void {
    shutdownExporters();
    process.exit(0);
  }

  process.on("SIGTERM", shutdown);
  process.on("SIGINT", shutdown);
}

main();
Before the server accepts its first request, main() sets up the pipeline:
getInstrumentedClient() creates and caches the instrumented Anthropic client
The exporter is selected based on EXPORTER_BACKEND — Langfuse by default, Phoenix if set to phoenix
createBudgetListener() reads tenant budgets from the TENANT_BUDGETS env var and registers them with the SpanListener
The project also includes a Next.js frontend (app/page.tsx and app/layout.tsx). Running pnpm dev starts the Next.js dev server, which serves the landing page on port 3000. To run the Express API server instead, use npx tsx src/server.ts (or install tsx globally). Both default to port 3000, so set PORT for the Express server if you want to run them side by side.
Step 11: Run the tests
The project includes a test suite that covers instrumentation, telemetry, budget enforcement, middleware, routes, and integration flows. Run the type checker first, then the tests:
terminal
pnpm typecheck
Expected output: No TypeScript errors.
terminal
pnpm test
Expected output: vitest runs 85 tests across 11 test files. All tests pass, and coverage exceeds 90% on all metrics (lines, branches, functions, statements).
The project has two run modes. Start the Express API server:
terminal
npx tsx src/server.ts
Expected output: The terminal prints listening on port 3000 (or whatever PORT you set). The server is now running with full instrumentation, budget enforcement, and telemetry export.
In another terminal, test each endpoint:
Health check:
terminal
curl http://localhost:3000/health
Expected output:
json
{"status":"ok","uptime":0.123,"version":"0.1.0"}
Chat (make sure your actual Anthropic API key is set in .env):
terminal
curl -X POST http://localhost:3000/api/chat \
  -H "Content-Type: application/json" \
  -H "X-Tenant-Id: acme-corp" \
  -d '{"model":"claude-sonnet-4-20250514","messages":[{"role":"user","content":"Hello, what is observability?"}],"maxTokens":200}'
Expected output: A JSON response with id, model, content, usage (input and output token counts), and durationMs. Behind the scenes, this call emitted a GenAI OTel span with all the gen_ai.* attributes, recorded the cost against the acme-corp budget, and exported the span to Langfuse (or Phoenix).
Streaming chat:
terminal
curl -X POST http://localhost:3000/api/chat/stream \
  -H "Content-Type: application/json" \
  -H "X-Tenant-Id: widget-co" \
  -d '{"model":"claude-sonnet-4-20250514","messages":[{"role":"user","content":"Count to 5"}],"maxTokens":200}'
Expected output: Server-Sent Events stream with data: lines for each text delta, ending with a data: {"event":"done",...} event.
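If you want to consume that stream from code rather than curl, a small parser for the data: lines is enough. The sketch below assumes each event carries a JSON payload; the delta shape {"event":"delta","text":...} is hypothetical, since only the final done event is documented above. Non-JSON payloads are kept as raw strings.

```typescript
// Split a raw SSE payload into the values carried by its `data:` lines.
// Events are separated by a blank line; other SSE fields are ignored.
function parseSseData(raw: string): unknown[] {
  const events: unknown[] = [];
  for (const block of raw.split(/\r?\n\r?\n/)) {
    const payload = block
      .split(/\r?\n/)
      .filter((line) => line.startsWith("data:"))
      // Drop the field name and the single optional space that follows it.
      .map((line) => line.slice("data:".length).replace(/^ /, ""))
      .join("\n");
    if (payload === "") continue;
    try {
      events.push(JSON.parse(payload));
    } catch {
      events.push(payload); // not JSON: keep the raw text
    }
  }
  return events;
}

// Hypothetical stream contents for illustration.
const sample =
  'data: {"event":"delta","text":"1"}\n\n' +
  'data: {"event":"delta","text":"2"}\n\n' +
  'data: {"event":"done"}\n\n';
console.log(parseSseData(sample).length); // 3
```

This parser expects the complete response text. A real streaming client would buffer incoming chunks and parse only the blocks that are already terminated by a blank line.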
To see the Next.js landing page instead, run pnpm dev and visit http://localhost:3000 in your browser. The page lists the available API endpoints and shows a status indicator.
Next steps
Add more exporters — the @reaatech/otel-genai-semconv-exporters package also ships a CloudTraceExporter for Google Cloud Trace; wire it in alongside Langfuse or Phoenix
Persist budget state — replace the in-memory BudgetEntry map with a Redis, SQLite, or Postgres backend so budget limits survive server restarts
Add a webhook alert — extend the budget controller to fire a Slack or email notification when a tenant crosses 80% of their daily limit, giving the SMB owner proactive visibility
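As a starting point for the alerting idea, the sketch below wraps spend recording with a one-time threshold notification. It is an assumption-laden illustration: createAlertingRecorder and the notify callback are hypothetical, and a real version would post to Slack or send email inside notify and reset the alerted set when the daily window rolls over.

```typescript
interface BudgetEntry { limit: number; used: number }

// Returns a record(tenant, cost) function that calls notify once per tenant,
// the first time spend reaches the given fraction of the daily limit.
function createAlertingRecorder(
  budgets: Map<string, BudgetEntry>,
  notify: (tenant: string, percentage: number) => void,
  threshold = 0.8,
): (tenant: string, cost: number) => void {
  const alerted = new Set<string>();
  return (tenant, cost) => {
    const budget = budgets.get(tenant);
    if (!budget || budget.limit <= 0) return; // unknown tenant or no limit set
    budget.used += cost;
    const ratio = budget.used / budget.limit;
    if (ratio >= threshold && !alerted.has(tenant)) {
      alerted.add(tenant);
      notify(tenant, ratio * 100);
    }
  };
}

const demoBudgets = new Map<string, BudgetEntry>([["acme-corp", { limit: 5, used: 0 }]]);
const record = createAlertingRecorder(demoBudgets, (tenant, percentage) => {
  console.log(`${tenant} has used ${percentage.toFixed(0)}% of its daily budget`);
});
record("acme-corp", 3);   // 60%: no alert
record("acme-corp", 1.5); // 90%: alert fires once
record("acme-corp", 0.1); // already alerted: no second notification
```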
= null;

export function createAnthropicClient(): Anthropic {
  const apiKey = process.env.ANTHROPIC_API_KEY;
  if (!apiKey) {
    throw new Error("ANTHROPIC_API_KEY environment variable is required");
  }
  return new Anthropic({ apiKey });
}

export function createSpanBuilder(): SpanBuilder {
  return new SpanBuilder({
    provider: "anthropic",
    addMessageEvents: true,
    addChoiceEvents: true,
  });
}

export function handleInstrumentationStart(span: Span): void {