Databricks LLM Observability for SMB Production AI
Drop-in OpenTelemetry tracing and cost attribution for every Databricks model call, visualized in Langfuse, so small teams can monitor LLM performance without building custom instrumentation.
Small businesses deploying Databricks-hosted LLMs lack visibility into latency, token usage, and spend across their applications, making it hard to debug slowdowns or control costs.
This recipe builds an observability pipeline for Databricks-hosted LLMs. When your team deploys models through Databricks Model Serving, latency, token usage, and cost data are sent to Langfuse without hand-rolled instrumentation. Every trackModelCall invocation creates an OpenTelemetry GenAI span, attaches a per-token cost breakdown, and exports the trace to Langfuse, where you can visualize performance and set alert thresholds.
You’ll build DatabricksWrapper for API calls, ModelSpan for OpenTelemetry span creation via the REAA GenAI semconv packages, CostTracker for token-based pricing, an alert service for threshold checks, and a GET/POST API route that surfaces metrics. The artifact is a Next.js 16 App Router project with full test coverage.
Prerequisites
Node.js >= 22 and pnpm 10 installed
A Databricks workspace with Model Serving enabled and a personal access token (PAT)
A Langfuse account (cloud or self-hosted) with public and secret keys
Basic familiarity with TypeScript, Next.js App Router, and OpenTelemetry concepts
Expected output: The dependencies and devDependencies entries in package.json are filled with exact semver pins.
Step 3: Configure environment variables
Create a .env file from the example template. These variables connect the pipeline to Databricks and Langfuse:
terminal
cp .env.example .env
The .env.example should contain these placeholders:
env
# Env vars used by databricks-llm-observability-for-smb-production-ai.
# Keep placeholders only -- never commit real values.
NODE_ENV=development
DATABRICKS_HOST=<your-databricks-workspace-hostname>
DATABRICKS_TOKEN=<your-databricks-pat-token>
LANGFUSE_PUBLIC_KEY=<your-langfuse-public-key>
LANGFUSE_SECRET_KEY=<your-langfuse-secret-key>
LANGFUSE_BASE_URL=<your-langfuse-base-url>
OTEL_EXPORTER_OTLP_ENDPOINT=<your-otlp-endpoint-url>
OTEL_SERVICE_NAME=databricks-llm-observability
P95_LATENCY_THRESHOLD_MS=5000
ERROR_RATE_THRESHOLD=0.05
COST_THRESHOLD_USD=10
LOG_LEVEL=info
Expected output: .env exists with real values filled in. Never commit .env to version control.
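Reading these variables through a small validated loader makes a missing key fail at startup rather than at the first model call. The sketch below is illustrative only: loadConfig, PipelineConfig, and the helper names are hypothetical and not part of the recipe's source. The numeric defaults mirror the placeholders in .env.example.

```typescript
// Hypothetical config loader; names and shape are illustrative, not the recipe's actual code.
type Env = Record<string, string | undefined>;

interface PipelineConfig {
  databricksHost: string;
  databricksToken: string;
  langfusePublicKey: string;
  langfuseSecretKey: string;
  p95LatencyThresholdMs: number;
  errorRateThreshold: number;
  costThresholdUsd: number;
}

// Read a required variable, failing fast with a clear message when it is missing.
function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Parse a numeric variable, falling back to the given default when it is unset.
function numberVar(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  if (!Number.isFinite(parsed)) {
    throw new Error(`Environment variable ${name} must be a number, got "${raw}"`);
  }
  return parsed;
}

function loadConfig(env: Env): PipelineConfig {
  return {
    databricksHost: requireVar(env, "DATABRICKS_HOST"),
    databricksToken: requireVar(env, "DATABRICKS_TOKEN"),
    langfusePublicKey: requireVar(env, "LANGFUSE_PUBLIC_KEY"),
    langfuseSecretKey: requireVar(env, "LANGFUSE_SECRET_KEY"),
    p95LatencyThresholdMs: numberVar(env, "P95_LATENCY_THRESHOLD_MS", 5000),
    errorRateThreshold: numberVar(env, "ERROR_RATE_THRESHOLD", 0.05),
    costThresholdUsd: numberVar(env, "COST_THRESHOLD_USD", 10),
  };
}
```

In application code this would be called once at startup, for example as loadConfig(process.env).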
Step 4: Define the TypeScript types
Create src/lib/types.ts — these types describe the Databricks model request/response shape and the observability metrics the pipeline produces:
Expected output: vitest.config.ts at the project root with 90% coverage thresholds across all four categories.
Step 6: Create the OpenTelemetry instrumentation
This file initializes the OpenTelemetry SDK when Next.js starts its Node.js runtime. It creates an OTLP trace exporter and a Langfuse exporter, passes the OTLP exporter to the SDK as its trace exporter, and registers the Langfuse exporter through a SimpleSpanProcessor.
Create src/instrumentation.ts:
ts
import { NodeSDK } from "@opentelemetry/sdk-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";
import { LangfuseExporter } from "@reaatech/otel-genai-semconv-exporters";
import { SimpleSpanProcessor } from "@opentelemetry/sdk-trace-base";

export async function register(): Promise<void> {
  if (process.env.NEXT_RUNTIME !== "nodejs") {
    return;
  }
  try {
    const otlpExporter = new OTLPTraceExporter({
      url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT,
    });
    const langfuseExporter = new LangfuseExporter({
      publicKey: process.env.LANGFUSE_PUBLIC_KEY,
      secretKey: process.env.LANGFUSE_SECRET_KEY,
      baseUrl: process.env.LANGFUSE_BASE_URL,
    });
    const sdk = new NodeSDK({
      traceExporter: otlpExporter,
      spanProcessors: [new SimpleSpanProcessor(langfuseExporter)],
      serviceName: process.env.OTEL_SERVICE_NAME ?? "databricks-llm-observability",
    });
    sdk.start();
    await Promise.resolve();
    const shutdown = (): void => {
      void sdk.shutdown().finally(() => process.exit(0));
    };
    process.on("SIGTERM", shutdown);
    process.on("SIGINT", shutdown);
  } catch (err) {
    console.error("Failed to initialize OTel SDK", err);
  }
}
Expected output: src/instrumentation.ts with the register() function. The edge-runtime guard ensures this only fires in the Node.js runtime.
Now enable the instrumentation hook in next.config.ts so Next.js calls register() at startup:
Expected output: src/lib/databricks-wrapper.ts with 73 lines. The constructor strips https:// from the hostname to avoid double-protocol bugs in the SDK config.
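The hostname normalization mentioned above can be sketched as a standalone helper. This is a minimal illustration, not the wrapper's actual code: normalizeHost and buildBaseUrl are hypothetical names, and stripping http:// and trailing slashes goes slightly beyond the https:// stripping the recipe describes.

```typescript
// Hypothetical helper; the recipe's DatabricksWrapper constructor may implement this differently.
function normalizeHost(rawHost: string): string {
  return rawHost
    .trim()
    .replace(/^https?:\/\//i, "") // drop a leading protocol so it is never doubled later
    .replace(/\/+$/, ""); // drop trailing slashes
}

// Build the base URL from a host that may or may not already carry a protocol.
function buildBaseUrl(rawHost: string): string {
  return `https://${normalizeHost(rawHost)}`;
}
```

With this in place, both "adb-123.azuredatabricks.net" and "https://adb-123.azuredatabricks.net/" resolve to the same base URL, so either form can be pasted into DATABRICKS_HOST.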
Step 8: Create the span builder
The ModelSpan wraps the REAA SpanBuilder to create and complete OpenTelemetry GenAI spans. The LangfuseSpanManager exports those spans to Langfuse.
Create src/lib/span-builder.ts:
ts
import { SpanBuilder, GEN_AI_ATTRIBUTES } from "@reaatech/otel-genai-semconv-core";
import type { LLMRequest, LLMResponse, CostData, ProviderType } from "@reaatech/otel-genai-semconv-core";
import { LangfuseExporter } from "@reaatech/otel-genai-semconv-exporters";
import { trace, type Span } from "@opentelemetry/api";
import type { Span as SdkSpan } from "@opentelemetry/sdk-trace-base";

void trace.getTracer("databricks-llm-observability");
void (GEN_AI_ATTRIBUTES.REQUEST_MODEL);

export class ModelSpan {
  private builder: SpanBuilder;

  constructor() {
    this.builder = new SpanBuilder({ provider: "databricks" as ProviderType, addMessageEvents: true });
  }

  startSpan(request: LLMRequest, modelName: string): Span {
    return this.builder.startSpan(request, `gen_ai.chat.completion ${modelName}`);
  }

  endSpan(span: Span, response: LLMResponse): void {
    this.builder.addResponse(response);
    this.builder.setOk();
    this.builder.endSpan();
  }

  recordError(span: Span, error: Error): void {
    this.builder.recordError(error);
    this.builder.endSpan();
  }

  addCost(span: Span, costData: CostData): void {
    this.builder.addCostAttributes(costData);
  }
}

export class LangfuseSpanManager {
  private exporter: LangfuseExporter;

  constructor() {
    const publicKey = process.env.LANGFUSE_PUBLIC_KEY ?? "";
    const secretKey = process.env.LANGFUSE_SECRET_KEY ?? "";
    const baseUrl = process.env.LANGFUSE_BASE_URL;
    const config: { publicKey: string; secretKey: string; baseUrl?: string } = { publicKey, secretKey };
    if (baseUrl) {
      config.baseUrl = baseUrl;
    }
    this.exporter = new LangfuseExporter(config);
  }

  exportSpan(span: Span): void {
    this.exporter.export([span as SdkSpan], () => {});
  }

  getFormattedTraces() {
    return this.exporter.getLangfuseFormat();
  }

  shutdown(): void {
    void this.exporter.shutdown();
  }
}

export function createSpanPipeline(): { modelSpan: ModelSpan; langfuseManager: LangfuseSpanManager } {
  return {
    modelSpan: new ModelSpan(),
    langfuseManager: new LangfuseSpanManager(),
  };
}
Expected output: src/lib/span-builder.ts with 69 lines. The ModelSpan delegates to the REAA SpanBuilder imported from @reaatech/otel-genai-semconv-core, and LangfuseSpanManager exports spans via LangfuseExporter from @reaatech/otel-genai-semconv-exporters.
Step 9: Build the cost tracker
The CostTracker maintains a pricing table for common Databricks models and calculates token-based costs. It also builds CostSpan objects that the observability pipeline attaches to traces.
Expected output: src/lib/cost-tracker.ts with 102 lines. The tracker pre-loads pricing for four Databricks models and uses TOKENS_PER_UNIT from the REAA cost exporter to convert token counts to USD.
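The token-to-USD conversion can be illustrated with a small standalone function. This is a sketch under stated assumptions rather than the tracker's real code: it assumes prices are quoted per TOKENS_PER_UNIT tokens and that the unit is one million tokens (the actual constant comes from the REAA cost exporter and may differ). calculateCost and CostResult are hypothetical names; the ModelPrice fields follow the setModelPrice example in the next steps.

```typescript
// Assumption: prices are quoted per 1,000,000 tokens. The real value is exported by
// @reaatech/otel-cost-exporter-core and should be used instead of this local constant.
const TOKENS_PER_UNIT = 1_000_000;

interface ModelPrice {
  inputTokenPrice: number; // USD per TOKENS_PER_UNIT input tokens
  outputTokenPrice: number; // USD per TOKENS_PER_UNIT output tokens
  effectiveDate: string;
}

// Hypothetical result shape for illustration.
interface CostResult {
  inputCostUsd: number;
  outputCostUsd: number;
  totalCostUsd: number;
}

// Convert token counts to USD by scaling each count to pricing units.
function calculateCost(price: ModelPrice, inputTokens: number, outputTokens: number): CostResult {
  const inputCostUsd = (inputTokens / TOKENS_PER_UNIT) * price.inputTokenPrice;
  const outputCostUsd = (outputTokens / TOKENS_PER_UNIT) * price.outputTokenPrice;
  return { inputCostUsd, outputCostUsd, totalCostUsd: inputCostUsd + outputCostUsd };
}
```

Under these assumptions, a call that consumes 1,000,000 input tokens and 2,000,000 output tokens at prices of 1.5 and 5 costs 1.5 + 10 = 11.5 USD. Because the arithmetic is floating point, compare costs with a tolerance rather than strict equality.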
Step 10: Create the alert service
The AlertService compares observability metrics against configurable thresholds and returns a list of exceeded alerts.
Expected output: src/lib/alert-service.ts with 51 lines. The default thresholds (5s p95 latency, 5% error rate, $10 cost) come from DEFAULT_ALERT_THRESHOLDS in types.ts.
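A threshold check of this kind reduces to comparing each metric with its limit and collecting the ones that exceed it. The following is a minimal sketch, not the recipe's AlertService: the field names, the Alert shape, and checkThresholds are hypothetical, and only the default values (5000 ms, 0.05, 10 USD) are taken from the text above.

```typescript
// Hypothetical shapes; the recipe's types.ts may name and structure these differently.
interface Metrics {
  p95LatencyMs: number;
  errorRate: number; // fraction between 0 and 1
  totalCostUsd: number;
}

interface Thresholds {
  p95LatencyMs: number;
  errorRate: number;
  costUsd: number;
}

interface Alert {
  metric: string;
  value: number;
  threshold: number;
}

// Defaults stated in the recipe: 5s p95 latency, 5% error rate, $10 cost.
const DEFAULT_THRESHOLDS: Thresholds = { p95LatencyMs: 5000, errorRate: 0.05, costUsd: 10 };

// Return one alert per metric that is strictly above its threshold.
function checkThresholds(metrics: Metrics, thresholds: Thresholds = DEFAULT_THRESHOLDS): Alert[] {
  const candidates: Alert[] = [
    { metric: "p95LatencyMs", value: metrics.p95LatencyMs, threshold: thresholds.p95LatencyMs },
    { metric: "errorRate", value: metrics.errorRate, threshold: thresholds.errorRate },
    { metric: "totalCostUsd", value: metrics.totalCostUsd, threshold: thresholds.costUsd },
  ];
  return candidates.filter((alert) => alert.value > alert.threshold);
}
```

Treating a value equal to the threshold as healthy is a design choice made here for the sketch; a stricter service could use >= instead.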
Step 11: Wire the observability service
The ObservabilityService is the main orchestrator. It connects every piece: the Databricks wrapper for API calls, the span builder for OTel spans, the cost tracker for pricing, the alert service for threshold checks, and the Langfuse client for fetching traces.
Create src/lib/observability-service.ts:
ts
import type { LLMResponse, CostData } from "@reaatech/otel-genai-semconv-core";
import { LLMRequestSchema } from "@reaatech/otel-genai-semconv-core";
import type { Span } from "@opentelemetry/api";
import type { CostBreakdown, CostSpan } from "@reaatech/otel-cost-exporter-core";
import { createDatabricksWrapper } from "./databricks-wrapper.js";
import type { DatabricksWrapper } from "./databricks-wrapper.js";
import { createSpanPipeline } from "./span-builder.js";
import type { ModelSpan, LangfuseSpanManager } from "./span-builder.js";
import { createCostTracker } from "./cost-tracker.js";
import type { CostTracker } from
Expected output: src/lib/observability-service.ts with 139 lines. The trackModelCall method is the core pipeline: validate request, start span, call Databricks, calculate cost, attach cost, end span, export to Langfuse.
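The ordering of those pipeline steps can be sketched independently of the real classes by injecting each step as a function. Everything below is hypothetical: PipelineDeps and trackCall are illustrative names, and recording the error and still exporting the span on failure is an assumption about sensible behavior, not a description of the recipe's trackModelCall.

```typescript
// Hypothetical dependency interface; each member stands in for one stage of the pipeline.
interface PipelineDeps<Req, Res, SpanT> {
  validate(rawRequest: unknown): Req;
  startSpan(request: Req): SpanT;
  callModel(request: Req): Promise<Res>;
  calculateCost(response: Res): number;
  addCost(span: SpanT, costUsd: number): void;
  endSpan(span: SpanT, response: Res): void;
  recordError(span: SpanT, error: Error): void;
  exportSpan(span: SpanT): void;
}

// Run the stages in the documented order: validate, start span, call the model,
// calculate cost, attach cost, end span, export.
async function trackCall<Req, Res, SpanT>(
  deps: PipelineDeps<Req, Res, SpanT>,
  rawRequest: unknown,
): Promise<Res> {
  const request = deps.validate(rawRequest); // throws before any span exists
  const span = deps.startSpan(request);
  try {
    const response = await deps.callModel(request);
    const costUsd = deps.calculateCost(response);
    deps.addCost(span, costUsd);
    deps.endSpan(span, response);
    return response;
  } catch (err) {
    // Assumption: failed calls are recorded on the span and rethrown to the caller.
    deps.recordError(span, err instanceof Error ? err : new Error(String(err)));
    throw err;
  } finally {
    // Assumption: the span is exported whether the call succeeded or failed.
    deps.exportSpan(span);
  }
}
```

Injecting the stages this way also makes the ordering easy to verify in a unit test with fakes, without touching Databricks or Langfuse.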
Step 12: Create the API route handler
The route at app/api/observability/route.ts provides two endpoints: GET returns aggregated metrics from Langfuse, and POST evaluates those metrics against alert thresholds.
Expected output: app/api/observability/route.ts with 33 lines. Both handlers use NextRequest and NextResponse from next/server.
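The aggregation behind the GET endpoint can be pictured as reducing a list of per-call records to a few summary numbers. The sketch below is an assumption about how such metrics might be computed, not the recipe's implementation: TraceRecord, AggregatedMetrics, percentile, and aggregate are hypothetical names, and the nearest-rank method is one of several common ways to define p95.

```typescript
// Hypothetical per-call record, e.g. derived from traces fetched from Langfuse.
interface TraceRecord {
  latencyMs: number;
  costUsd: number;
  isError: boolean;
}

interface AggregatedMetrics {
  p95LatencyMs: number;
  errorRate: number;
  totalCostUsd: number;
  callCount: number;
}

// Nearest-rank percentile: the smallest value with at least p% of samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1] ?? 0;
}

function aggregate(records: TraceRecord[]): AggregatedMetrics {
  const callCount = records.length;
  if (callCount === 0) {
    return { p95LatencyMs: 0, errorRate: 0, totalCostUsd: 0, callCount: 0 };
  }
  const errorCount = records.filter((r) => r.isError).length;
  return {
    p95LatencyMs: percentile(records.map((r) => r.latencyMs), 95),
    errorRate: errorCount / callCount,
    totalCostUsd: records.reduce((sum, r) => sum + r.costUsd, 0),
    callCount,
  };
}
```

The result of aggregate can be passed directly to a threshold check for the POST endpoint, which keeps the route handlers thin.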
Step 13: Create the public API barrel export
Create src/index.ts so consumers can import everything from a single entry point:
ts
export { createObservabilityService } from "./lib/observability-service.js";
export { createDatabricksWrapper } from "./lib/databricks-wrapper.js";
export { createCostTracker } from "./lib/cost-tracker.js";
export { createAlertService } from "./lib/alert-service.js";
export type {
  ObservabilityMetrics,
  AlertThreshold,
  AlertStatus,
  DatabricksModelRequest,
  DatabricksModelResponse,
} from "./lib/types.js";
export { DEFAULT_ALERT_THRESHOLDS } from "./lib/types.js";
Expected output: src/index.ts re-exporting all factory functions and types.
Step 14: Run the tests
The project includes a full vitest test suite covering every module — Databricks wrapper, span builder, cost tracker, alert service, observability service, instrumentation, API route handlers, integration tests, types, and barrel exports. All external calls are mocked via MSW so no live network is needed.
terminal
pnpm test
Expected output: All 72 tests pass with 90%+ coverage across lines, branches, functions, and statements. The test runner processes these files:
Expected output: Both commands exit 0 with no errors. The next.config.ts has experimental.instrumentationHook: true so the register() function in src/instrumentation.ts is live.
Next steps
Add custom model pricing — call costTracker.setModelPrice("your-model", { inputTokenPrice: 1.5, outputTokenPrice: 5, effectiveDate: "2026-06-01" }) to extend the pricing table to models beyond the four defaults
Wire webhook alerts — extend the POST /api/observability endpoint to push exceeded thresholds to Slack, PagerDuty, or email
Build a metrics dashboard — replace app/page.tsx with a client component that calls GET /api/observability and renders live p95 latency, cost, and error-rate panels
Dual-export to Grafana — the existing OTLP exporter can also send spans to a Grafana OTLP endpoint for a second visualization layer alongside Langfuse
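As a starting point for the webhook idea, the sketch below formats exceeded thresholds into a Slack incoming-webhook message and posts it. It is an illustration under assumptions: ExceededAlert, buildSlackPayload, and postAlerts are hypothetical names not present in the recipe, the webhook URL is supplied by the caller, and the code relies on the global fetch available in Node.js 18 and later.

```typescript
// Hypothetical alert shape; adapt it to whatever the alert service actually returns.
interface ExceededAlert {
  metric: string;
  value: number;
  threshold: number;
}

// Slack incoming webhooks accept a JSON body with a "text" field.
function buildSlackPayload(alerts: ExceededAlert[]): { text: string } {
  const header = `LLM observability alert: ${alerts.length} threshold(s) exceeded`;
  const lines = alerts.map((a) => `- ${a.metric}: ${a.value} exceeded threshold ${a.threshold}`);
  return { text: [header, ...lines].join("\n") };
}

// Post the alerts to the given webhook URL; does nothing when there are no alerts.
async function postAlerts(webhookUrl: string, alerts: ExceededAlert[]): Promise<void> {
  if (alerts.length === 0) return;
  const response = await fetch(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildSlackPayload(alerts)),
  });
  if (!response.ok) {
    throw new Error(`Webhook responded with status ${response.status}`);
  }
}
```

Keeping the payload builder separate from the network call means the message format can be unit tested without a live webhook, and a PagerDuty or email sender can reuse the same alert list with its own formatter.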