SMBs adopting LangChain for multi‑step LLM workflows have no built‑in way to see where latency piles up, which chain step costs the most, or why a particular prompt is bleeding tokens. They either fly blind or pay for a separate SaaS with complex setup.
A complete, working implementation of this recipe — downloadable as a zip or browsable file by file. Generated by our build pipeline; tested with full coverage before publishing.
You’ll build an Express sidecar that instruments any LangChain application with OpenTelemetry traces, cost tracking, and real-time metrics. By the end, you’ll have a standalone server exposing /health and /metrics endpoints, with traces exported to Langfuse and budget-attributed cost data aggregated per model and per chain. If you’re running LLM pipelines for internal tools or customer-facing features without visibility into where your spend goes, this recipe gives you that visibility in an afternoon.
A Langfuse account (free tier at cloud.langfuse.com) with public and secret keys
An OpenTelemetry collector endpoint (Langfuse’s OTLP endpoint works — you’ll configure it in .env)
Familiarity with TypeScript and Express routing
Step 1: Scaffold the project and install dependencies
Create a fresh directory, initialize the project, and set it to ESM ("type": "module" in package.json). Declare all the dependencies in package.json up front, then install everything in one shot.
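For example (the directory name is illustrative, and the full dependency list lives in the artifact's package.json):

terminal
mkdir langchain-observability-sidecar && cd langchain-observability-sidecar
pnpm init
# set "type": "module" in package.json and add the dependencies from the downloadable artifact
pnpm install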
Expected output: pnpm downloads all packages and generates pnpm-lock.yaml. The node_modules directory appears with all runtime and dev dependencies.
Step 2: Configure TypeScript and linting
This project uses strict TypeScript with NodeNext module resolution (required because the project is ESM). Create the TypeScript config and the ESLint config.
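The exact configs are in the downloadable artifact; a minimal tsconfig.json in that spirit looks something like this:

json
{
  "compilerOptions": {
    "target": "ES2022",
    "module": "NodeNext",
    "moduleResolution": "NodeNext",
    "strict": true,
    "noUncheckedIndexedAccess": true,
    "noEmit": true,
    "skipLibCheck": true
  },
  "include": ["src", "tests"]
}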
Expected output: No errors (there are no source files yet, so tsc exits silently).
Step 3: Set environment variables
Create a directory src/ and an .env.example file. The recipe reads these variables to configure Langfuse credentials, the OTLP collector endpoint, and the server port.
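A sketch covering every variable the later steps read (placeholder values shown):

env
LANGFUSE_PUBLIC_KEY=pk-lf-replace-me
LANGFUSE_SECRET_KEY=sk-lf-replace-me
LANGFUSE_HOST=https://cloud.langfuse.com
OTEL_EXPORTER_OTLP_ENDPOINT=https://cloud.langfuse.com/api/public/otel
PORT=3000
# Optional: only needed if you also use LangSmith
LANGCHAIN_API_KEY=
LANGCHAIN_PROJECT=

Copy it to .env with cp .env.example .env.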
Now open .env and replace the placeholders with real values. For Langfuse Cloud, the OTLP endpoint is https://cloud.langfuse.com/api/public/otel. Your Langfuse public and secret keys come from your project settings. LANGCHAIN_API_KEY and LANGCHAIN_PROJECT are for LangSmith — they are optional if you only use Langfuse.
Step 4: Create the utility helpers
Create src/utils.ts. This module provides environment validation, PII redaction, safe JSON parsing, string truncation, and duration formatting — small helpers that the rest of the codebase leans on.
ts
import { logger } from "@reaatech/agent-mesh-observability";
import { redactSensitiveData } from "@reaatech/agent-runbook-observability";

export function validateEnv(vars: string[]): void {
  for (const v of vars) {
    const value = process.env[v];
    if (value === undefined || value === "") {
      throw new Error(`Missing required env: ${v}`);
    }
  }
}

export function redactPayload(data: Record<string, unknown>): Record<string, unknown> {
  return redactSensitiveData(data);
}

export function safeJsonParse(text: string): unknown {
  try {
    return JSON.parse(text);
  } catch {
    logger.warn("safeJsonParse failed", { text });
    return undefined;
  }
}

export function truncateString(s: string, maxLen: number): string {
  if (s.length <= maxLen) {
    return s;
  }
  return s.slice(0, maxLen) + "…";
}

export function formatDuration(ms: number): string {
  if (ms < 1000) {
    return `${String(ms)}ms`;
  }
  const seconds = ms / 1000;
  if (seconds < 60) {
    return `${seconds.toFixed(1)}s`;
  }
  const minutes = Math.floor(seconds / 60);
  const remaining = seconds % 60;
  return `${String(minutes)}m ${remaining.toFixed(0)}s`;
}
Expected output: Run pnpm typecheck to confirm the helpers compile. You may still see errors about missing type declarations for Express, which you'll fix in the next step.
Step 5: Add type declarations
Express 5 and Supertest don’t ship their own TypeScript types in a way that works with strict ESM and noUncheckedIndexedAccess. Create src/ambient.d.ts to declare the shapes your app needs.
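The exact declarations ship with the downloadable artifact. A minimal sketch that satisfies the type imports used in later steps (ExpressApp, ExpressRequest, ExpressResponse, ExpressNext) could look like this:

ts
// src/ambient.d.ts: a sketch only; the artifact also declares the Supertest shapes
declare module "express" {
  import type { Server } from "http";

  export interface ExpressRequest {
    headers: Record<string, string | string[] | undefined>;
    body: unknown;
  }
  export interface ExpressResponse {
    status(code: number): ExpressResponse;
    json(body: unknown): void;
    setHeader(name: string, value: string): void;
  }
  export type ExpressNext = () => void;
  export interface ExpressApp {
    use(handler: unknown): void;
    get(path: string, handler: (req: ExpressRequest, res: ExpressResponse) => void): void;
    listen(port: number, callback?: () => void): Server;
  }

  interface ExpressFactory {
    (): ExpressApp;
    json(): unknown;
  }
  const express: ExpressFactory;
  export default express;
}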
Run pnpm typecheck again — the Express-related errors from step 4 should now be resolved.
Step 6: Create the instrumentation layer
This is where the observability pipeline comes together. src/instrumentation.ts initialises OpenTelemetry, wires up the REAA budget bridge, and exports traces via OTLP.
Create src/instrumentation.ts:
ts
import { SpanListener } from "@reaatech/agent-budget-otel-bridge";
import { BudgetController } from "@reaatech/agent-budget-engine";
import { SpendStore } from "@reaatech/agent-budget-spend-tracker";
import { getTracingManager } from "@reaatech/agent-eval-harness-observability";
import { initOtel, logger, shutdownOtel } from "@reaatech/agent-mesh-observability";
import { initLogger } from "@reaatech/agent-runbook-observability";
import { NodeTracerProvider, BatchSpanProcessor } from "@opentelemetry/sdk-trace-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";

let initialized = false;

export function createSpanProcessor(listener: SpanListener) {
  return {
    onStart(): void {
      // No-op required by SpanProcessor interface
    },
    onEnd(span: { attributes: Record<string, unknown> }): void {
      listener.onSpanEnd(span.attributes);
    },
    forceFlush: () => Promise.resolve(),
    shutdown: () => Promise.resolve(),
  };
}

export async function register(): Promise<void> {
  if (initialized) {
    return;
  }
  try {
    initOtel();
    const store = new SpendStore();
    const controller = new BudgetController({ spendTracker: store });
    const listener = new SpanListener({ controller });
    const provider = new NodeTracerProvider({
      spanProcessors: [
        new BatchSpanProcessor(new OTLPTraceExporter()),
        createSpanProcessor(listener),
      ],
    });
    provider.register();
    const tracingManager = getTracingManager();
    tracingManager.init();
    await initLogger({ level: "info", service: "langchain-observability-sm" });
    initialized = true;
  } catch (err) {
    logger.error("Instrumentation initialization failed", { err });
    throw err;
  }
}

export { shutdownOtel };
The register() function is the single entry point. It creates a SpanListener that feeds span attributes to the budget controller, which tracks spend per scope. A BatchSpanProcessor attached to an OTLPTraceExporter sends traces to the collector endpoint defined by the OTEL_EXPORTER_OTLP_ENDPOINT env var. The createSpanProcessor function wraps SpanListener as an OTel span processor so that every completed span also updates the in-memory spend tracker.
The function is idempotent — calling register() twice is safe because initialized guards against double-setup.
Step 7: Create the metrics aggregator
The /metrics endpoint needs to produce a JSON summary that answers the questions every SMB operator asks: how much have we spent, on which models, and on which chain? Create src/metrics.ts (the full source is in the downloadable artifact; a sketch of its core logic follows the description below):
The getMetrics function iterates every spend entry in the store and buckets costs by modelId and chainName. Token usage is summed across all entries. Latency percentiles come from the dashboard manager exposed by agent-eval-harness-observability. A 5-second cache prevents the metrics endpoint from re-aggregating on every request when traffic is high. resetMetrics clears the cache and is exported so tests can start from a clean slate.
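A stripped-down sketch of the cost bucketing and the 5-second cache might look like the following (the SpendEntry shape and the getEntries() call are assumptions about the SpendStore API, and the latency percentiles are omitted):

ts
import { SpendStore } from "@reaatech/agent-budget-spend-tracker";

// Assumed entry shape; the actual SpendStore fields may differ.
interface SpendEntry {
  modelId: string;
  chainName: string;
  costUsd: number;
  inputTokens: number;
  outputTokens: number;
}

interface MetricsSummary {
  totalCostUsd: number;
  costByModel: Record<string, number>;
  costByChain: Record<string, number>;
  totalInputTokens: number;
  totalOutputTokens: number;
}

const CACHE_TTL_MS = 5000;
let cached: { at: number; data: MetricsSummary } | undefined;

export function getMetrics(store: SpendStore): MetricsSummary {
  // Serve from the 5-second cache to avoid re-aggregating on every request.
  if (cached && Date.now() - cached.at < CACHE_TTL_MS) {
    return cached.data;
  }
  // getEntries() is an assumption about how SpendStore exposes its entries.
  const entries = (store as unknown as { getEntries: () => SpendEntry[] }).getEntries();
  const data: MetricsSummary = {
    totalCostUsd: 0,
    costByModel: {},
    costByChain: {},
    totalInputTokens: 0,
    totalOutputTokens: 0,
  };
  for (const e of entries) {
    data.totalCostUsd += e.costUsd;
    data.costByModel[e.modelId] = (data.costByModel[e.modelId] ?? 0) + e.costUsd;
    data.costByChain[e.chainName] = (data.costByChain[e.chainName] ?? 0) + e.costUsd;
    data.totalInputTokens += e.inputTokens;
    data.totalOutputTokens += e.outputTokens;
  }
  cached = { at: Date.now(), data };
  return data;
}

export function resetMetrics(): void {
  cached = undefined;
}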
Step 8: Create the LangChain callback handler
The callback handler wraps LangChain’s BaseCallbackHandler to capture every LLM, chain, and tool invocation as an OpenTelemetry span. Create src/langchain-callback.ts:
ts
import { BaseCallbackHandler } from "@langchain/core/callbacks/base";
import type { Serialized } from "@langchain/core/load/serializable";
import type { LLMResult } from "@langchain/core/outputs";
import type { ChainValues } from "@langchain/core/utils/types";
import { getTracingManager } from "@reaatech/agent-eval-harness-observability";
import { logger } from "@reaatech/agent-mesh-observability";
import { truncateString } from "./utils.js";

const MAX_ATTR_LEN = 4096;

interface CallbackConfig {
  listener?: {
    onSpanEnd: (
      attributes: Record<string, unknown>,
      overrides?: Record<string, unknown>,
    ) => boolean;
  };
}

function extractName(serialized: Serialized): string {
  const s = serialized as { id?: string[]; name?: string };
  if (s.name) {
    return s.name;
  }
  if (Array.isArray(s.id) && s.id.length > 0) {
    return s.id[s.id.length - 1] ?? "unknown";
  }
  return "unknown";
}

class ObservabilityCallbackHandler extends BaseCallbackHandler {
  name = "ObservabilityCallbackHandler";
  private spans = new Map<string, ReturnType<ReturnType<typeof getTracingManager>["startEvalRunSpan"]>>();

  // … the LLM, chain, and tool handlers continue in the downloadable artifact
}
Key details: every attribute value is truncated to 4096 characters so a single runaway prompt doesn’t blow up the span payload. The startEvalRunSpan method from agent-eval-harness-observability creates an OpenTelemetry span backed by the SDK internals. The onEnd handler passes span attributes to an optional listener — this is how the budget bridge’s SpanListener gets fed at runtime.
Step 9: Create the Express server
The server exposes two endpoints and wires up logging, graceful shutdown, and error handling. Create src/server.ts:
ts
import express, { type ExpressApp, type ExpressRequest, type ExpressResponse, type ExpressNext } from "express";
import { createChildLogger, logger, shutdownOtel } from "@reaatech/agent-mesh-observability";
import { getMetrics, resetMetrics } from "./metrics.js";
import { register } from "./instrumentation.js";
import { SpendStore } from "@reaatech/agent-budget-spend-tracker";
import { type Server } from "http";

const store = new SpendStore();
let server: Server | undefined;

export function buildApp(): ExpressApp {
  const app = express();
  app.use(express.json());
  app.use((_req: ExpressRequest, _res: ExpressResponse, next: ExpressNext) => {
    const requestId = crypto.randomUUID();
    const child = createChildLogger({ request_id: requestId });
    child.info("Request started");
    next();
  });
  app.get("/health", (_req: ExpressRequest, res: ExpressResponse) => {
    res.json({ status: "ok", uptime: process.uptime() });
  });
  app.get("/metrics", (_req: ExpressRequest, res: ExpressResponse) => {
    try {
      const data = getMetrics(store);
      res.setHeader("Content-Type", "application/json");
      res.json(data);
    } catch (err: unknown) {
      const message = err instanceof Error ? err.message : "Unknown error";
      res.status(503).json({ error: "Metrics collection failed", detail: message });
    }
  });
  return app;
}

export async function bootstrap(): Promise<Server | undefined> {
  await register();
  const application = buildApp();
  const requiredEnv = [
    "LANGFUSE_PUBLIC_KEY",
    "LANGFUSE_SECRET_KEY",
    "LANGFUSE_HOST",
    "OTEL_EXPORTER_OTLP_ENDPOINT",
  ];
  for (const key of requiredEnv) {
    if (!process.env[key]) {
      logger.warn(`Missing env var: ${key}`);
    }
  }
  const port = process.env.PORT ? Number(process.env.PORT) : 3000;
  server = application.listen(port, () => {
    logger.info(`Server listening on port ${String(port)}`);
  });
  return server;
}

export function setupShutdownHandlers(): void {
  process.on("SIGTERM", () => {
    if (server) {
      server.close(() => {
        void shutdownOtel().then(() => {
          process.exit(0);
        });
      });
    }
  });
  process.on("unhandledRejection", (reason) => {
    logger.error("Unhandled rejection", { err: reason });
  });
}

export { resetMetrics };
buildApp() returns the Express application without starting it — this is the pattern for testability. bootstrap() calls register() from the instrumentation module, then starts listening. The SpendStore is a module-level singleton so both the instrumentation layer (which writes spend entries via the SpanListener) and the metrics endpoint (which reads them) share the same store.
The SIGTERM handler drains the OTel exporter before exiting so no in-flight traces are lost.
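The recipe doesn't show an entry point, but assuming a small src/index.ts (a name chosen here for illustration), wiring it together takes a few lines:

ts
// src/index.ts (hypothetical entry point): register shutdown handlers, then start the server
import { bootstrap, setupShutdownHandlers } from "./server.js";

setupShutdownHandlers();
await bootstrap();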
Step 10: Run the tests
Create the Vitest config and a test setup file, then write the test suite.
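A vitest.config.ts consistent with the coverage thresholds and report files described in this step might look like the sketch below (the setup-file path is an assumption; the artifact has the exact config):

ts
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    setupFiles: ["./tests/setup.ts"], // assumed path for the setup file shown next
    coverage: {
      provider: "v8",
      thresholds: { lines: 90, branches: 90, functions: 90, statements: 90 },
    },
    reporters: ["default", "json"],
    outputFile: "vitest-report.json",
  },
});

The setup file itself is short: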
ts
import { vi } from "vitest";

// Stub env vars for test runs
vi.stubEnv("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4318");
vi.stubEnv("LANGFUSE_PUBLIC_KEY", "pk-test");
vi.stubEnv("LANGFUSE_SECRET_KEY", "sk-test");
vi.stubEnv("LANGFUSE_HOST", "cloud.langfuse.com");
vi.stubEnv("PORT", "0");
The setup file stubs required environment variables so tests never need a real .env file. PORT=0 tells Express to pick an available port dynamically.
Now create the test files. Create tests/server.test.ts:
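The downloadable artifact contains the full file; a minimal sketch that exercises both endpoints with Supertest against buildApp() looks roughly like this:

ts
import { describe, expect, it } from "vitest";
import request from "supertest";
import { buildApp } from "../src/server.js";

describe("server endpoints", () => {
  it("reports health", async () => {
    const res = await request(buildApp()).get("/health");
    expect(res.status).toBe(200);
    expect(res.body.status).toBe("ok");
  });

  it("returns aggregated metrics as JSON", async () => {
    const res = await request(buildApp()).get("/metrics");
    expect(res.status).toBe(200);
    expect(res.headers["content-type"]).toContain("application/json");
  });
});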
The full test suite also covers the callback handler, instrumentation registration, metrics aggregation, utility functions, and end-to-end integration with MSW-mocked OTLP endpoints. Create the remaining test files (tests/utils.test.ts, tests/metrics.test.ts, tests/langchain-callback.test.ts, tests/integration.test.ts, tests/instrumentation.test.ts, and tests/mocks.ts) from the downloadable artifact.
Run the tests:
terminal
pnpm test
Expected output: Vitest runs all six spec files. Every test passes, and the coverage report shows each metric (lines, branches, functions, statements) at or above the 90% threshold. A vitest-report.json file is written alongside a coverage/ directory with detailed coverage data.
Next steps
Wire this callback handler into your own LangChain chains by passing createLangChainCallback() in the callbacks array of any LLMChain, RetrievalQAChain, or custom RunnableSequence (see the sketch after this list).
Add a /budget endpoint that exposes remaining budget per scope using BudgetController.getRemainingBudget(), letting you set hard caps and alert when a pipeline exceeds its allocation.
Deploy the Express sidecar alongside your application with Docker Compose — point the OTLP exporter at a self-hosted collector and visualize traces in Langfuse’s trace UI.
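For the first item, assuming langchain-callback.ts exports a createLangChainCallback() factory as described, wiring it into one of your own chains looks roughly like this (the chain, model, and import path are illustrative):

ts
import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { createLangChainCallback } from "./src/langchain-callback.js";

const chain = ChatPromptTemplate.fromTemplate("Summarize: {input}").pipe(
  new ChatOpenAI({ model: "gpt-4o-mini" }),
);

// Pass the handler per invocation so every LLM, chain, and tool event becomes a span.
const result = await chain.invoke(
  { input: "Quarterly support tickets..." },
  { callbacks: [createLangChainCallback()] },
);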