Databricks LLM Observability for SMB Production AI

Drop-in OpenTelemetry tracing and cost attribution for every Databricks model call, visualized in Langfuse, so small teams can monitor LLM performance without building custom instrumentation.

databricks llm-observability opentelemetry langfuse nextjs cost-tracking smb

The problem

Small businesses deploying Databricks-hosted LLMs lack visibility into latency, token usage, and spend across their applications, making it hard to debug slowdowns or control costs.

Intro

This recipe builds an observability pipeline for Databricks-hosted LLMs. When your team deploys models through Databricks Model Serving, latency, token usage, and cost data flow to Langfuse without hand-rolled instrumentation. Every trackModelCall creates an OpenTelemetry GenAI span, attaches per-token cost breakdowns, and exports the trace to Langfuse, where you can visualize performance and set alert thresholds.

You’ll build DatabricksWrapper for API calls, ModelSpan for OpenTelemetry span creation via the @reaatech GenAI semconv packages, CostTracker for token-based pricing, an alert service for threshold checks, and a GET/POST API route that surfaces metrics. The artifact is a Next.js 16 App Router project tested with Vitest.
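
As a rough illustration of the token-based pricing that CostTracker is described as handling, the sketch below turns token counts into a per-call cost breakdown. The TokenPricing shape, the estimateCost name, and the example prices are assumptions made for illustration; they are not the recipe's CostTracker API and not real Databricks rates.

```typescript
// Hypothetical pricing shape: USD per one million tokens (illustrative values only).
interface TokenPricing {
  inputUsdPerMillion: number;
  outputUsdPerMillion: number;
}

interface CostBreakdown {
  inputCostUsd: number;
  outputCostUsd: number;
  totalCostUsd: number;
}

// Converts token counts into a cost breakdown using separate input and output rates.
function estimateCost(
  inputTokens: number,
  outputTokens: number,
  pricing: TokenPricing,
): CostBreakdown {
  if (inputTokens < 0 || outputTokens < 0) {
    throw new RangeError("Token counts must be non-negative");
  }
  const inputCostUsd = (inputTokens / 1_000_000) * pricing.inputUsdPerMillion;
  const outputCostUsd = (outputTokens / 1_000_000) * pricing.outputUsdPerMillion;
  return { inputCostUsd, outputCostUsd, totalCostUsd: inputCostUsd + outputCostUsd };
}

// Made-up rates: $1 per million input tokens, $2 per million output tokens.
const examplePricing: TokenPricing = { inputUsdPerMillion: 1, outputUsdPerMillion: 2 };
const cost = estimateCost(500_000, 250_000, examplePricing);
console.log(cost);
```

Keeping input and output rates separate matters because output tokens are commonly priced differently from input tokens, so a single blended rate can misattribute spend.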

Prerequisites

Node.js >= 22 and pnpm 10 installed
A Databricks workspace with Model Serving enabled and a personal access token (PAT)
A Langfuse account (cloud or self-hosted) with public and secret keys
Basic familiarity with TypeScript, Next.js App Router, and OpenTelemetry concepts
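
The credentials listed above are typically supplied through environment variables. A minimal sketch of a startup check follows. The variable names are the ones the recipe's wrapper and exporter code read; the findMissingEnv helper is hypothetical, and treating the two Langfuse keys as required is an assumption of this sketch.

```typescript
// Variable names read by the recipe's code. LANGFUSE_BASE_URL is optional there,
// so it is left out of this list.
const REQUIRED_ENV = [
  "DATABRICKS_HOST",
  "DATABRICKS_TOKEN",
  "LANGFUSE_PUBLIC_KEY",
  "LANGFUSE_SECRET_KEY",
] as const;

// Hypothetical helper: returns the names of required variables that are unset or blank.
function findMissingEnv(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV.filter((name) => !env[name]?.trim());
}

// Example with a partial environment: the host is set, the token is blank,
// and the Langfuse keys are absent.
const missing = findMissingEnv({
  DATABRICKS_HOST: "example.cloud.databricks.com",
  DATABRICKS_TOKEN: "",
});
console.log(missing);
```

In a real project the helper could be called with process.env at startup, failing fast with the list of missing names instead of surfacing an authentication error on the first model call.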

Step 1: Scaffold the Next.js project

Create the project with the Next.js App Router:

terminal

npx create-next-app@latest databricks-llm-observability --typescript --eslint --app --import-alias "@/*"

Example artifact

A complete, working implementation of this recipe — downloadable as a zip or browsable file by file. Generated by our build pipeline; tested with full coverage before publishing.

170 kB · 72 tests · 97.9% coverage · vitest passing

SHA-256: d8571b0f480068323bd4c59f6e2816cd3b769bf4bdb79a3d5c6ac94b4ce77e37

Step 7: Build the Databricks wrapper

import { Config, WorkspaceClient, ApiError } from "@databricks/sdk-experimental";
import { createLogger } from "@reaatech/otel-cost-exporter-core";
import type { DatabricksModelResponse } from "./types";

const logger = createLogger("info", "json");

export class DatabricksWrapper {
  private _client: WorkspaceClient | null = null;
  private host: string;
  private token: string;

  constructor() {
    const host = process.env.DATABRICKS_HOST;
    const token = process.env.DATABRICKS_TOKEN;
    if (!host || !token) {
      throw new Error("DATABRICKS_HOST and DATABRICKS_TOKEN must be set");
    }
    // Store the host without its scheme; https:// is re-added wherever a URL is built.
    this.host = host.replace(/^https?:\/\//, "");
    this.token = token;
  }

  // Creates the SDK client on first use and reuses it afterwards.
  private get client(): WorkspaceClient {
    if (!this._client) {
      const config = new Config({ host: `https://${this.host}`, token: this.token });
      this._client = new WorkspaceClient(config);
    }
    return this._client;
  }

  // Checks the credentials by fetching the current user; returns a result object instead of throwing.
  async verifyConnection(): Promise<
    { authenticated: true; user: string } | { authenticated: false; error: string }
  > {
    try {
      const me = await this.client.currentUser.me();
      logger.info({ user: me.userName }, "Databricks connection verified");
      return { authenticated: true, user: me.userName ?? me.displayName ?? "unknown" };
    } catch (error) {
      if (error instanceof ApiError) {
        return { authenticated: false, error: error.message };
      }
      return { authenticated: false, error: String(error) };
    }
  }

  // Posts the payload to a serving endpoint and measures the request duration.
  async queryServingEndpoint(
    endpointName: string,
    payload: Record<string, unknown>,
  ): Promise<{ response: DatabricksModelResponse; durationMs: number }> {
    const url = `https://${this.host}/api/2.0/serving-endpoints/${endpointName}/invocations`;
    const start = performance.now();
    const res = await fetch(url, {
      method: "POST",
      headers: {
        Authorization: `Bearer ${this.token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(payload),
    });
    // The duration is taken when the response headers arrive, before the body is parsed.
    const durationMs = performance.now() - start;
    if (res.status === 429 || res.status === 503) {
      throw new Error(`Serving endpoint rate limited or unavailable: ${String(res.status)}`);
    }
    if (!res.ok) {
      throw new Error(`Serving endpoint error: ${String(res.status)}`);
    }
    const response = (await res.json()) as DatabricksModelResponse;
    return { response, durationMs };
  }
}

export function createDatabricksWrapper(): DatabricksWrapper {
  return new DatabricksWrapper();
}
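
The wrapper throws when the serving endpoint answers 429 or 503, so a caller may want to retry those failures. The following is a minimal sketch of one way to do that with exponential backoff. The backoffDelayMs and withRetry helpers, their defaults, and the shouldRetry predicate are hypothetical and not part of the recipe.

```typescript
// Delay before the next attempt: base, 2x base, 4x base, ... capped at maxDelayMs.
function backoffDelayMs(attempt: number, baseDelayMs = 200, maxDelayMs = 5_000): number {
  return Math.min(baseDelayMs * 2 ** (attempt - 1), maxDelayMs);
}

// Hypothetical helper: retries an async operation while shouldRetry(error) is true,
// up to maxAttempts attempts in total, sleeping between attempts.
async function withRetry<T>(
  operation: () => Promise<T>,
  shouldRetry: (error: unknown) => boolean,
  maxAttempts = 3,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      // Give up on the last attempt or on errors the caller does not consider transient.
      if (attempt >= maxAttempts || !shouldRetry(error)) throw error;
      const delayMs = backoffDelayMs(attempt);
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

A caller could pass a function that invokes queryServingEndpoint as the operation. Because the wrapper raises plain Error objects, the predicate would have to inspect the error message, which is brittle; a dedicated error class carrying the status code would be a sturdier design.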

Step 8: Create the span builder

import { SpanBuilder, GEN_AI_ATTRIBUTES } from "@reaatech/otel-genai-semconv-core";
import type { LLMRequest, LLMResponse, CostData, ProviderType } from "@reaatech/otel-genai-semconv-core";
import { LangfuseExporter } from "@reaatech/otel-genai-semconv-exporters";
import { trace, type Span } from "@opentelemetry/api";
import type { Span as SdkSpan } from "@opentelemetry/sdk-trace-base";

// The results of these two expressions are discarded.
void trace.getTracer("databricks-llm-observability");
void (GEN_AI_ATTRIBUTES.REQUEST_MODEL);

export class ModelSpan {
  private builder: SpanBuilder;

  constructor() {
    this.builder = new SpanBuilder({ provider: "databricks" as ProviderType, addMessageEvents: true });
  }

  startSpan(request: LLMRequest, modelName: string): Span {
    return this.builder.startSpan(request, `gen_ai.chat.completion ${modelName}`);
  }

  // The span parameter is unused in the three methods below; every call goes through the shared SpanBuilder.
  endSpan(span: Span, response: LLMResponse): void {
    this.builder.addResponse(response);
    this.builder.setOk();
    this.builder.endSpan();
  }

  recordError(span: Span, error: Error): void {
    this.builder.recordError(error);
    this.builder.endSpan();
  }

  addCost(span: Span, costData: CostData): void {
    this.builder.addCostAttributes(costData);
  }
}

export class LangfuseSpanManager {
  private exporter: LangfuseExporter;

  constructor() {
    // Missing keys fall back to empty strings rather than throwing.
    const publicKey = process.env.LANGFUSE_PUBLIC_KEY ?? "";
    const secretKey = process.env.LANGFUSE_SECRET_KEY ?? "";
    const baseUrl = process.env.LANGFUSE_BASE_URL;
    const config: { publicKey: string; secretKey: string; baseUrl?: string } = { publicKey, secretKey };
    if (baseUrl) {
      config.baseUrl = baseUrl;
    }
    this.exporter = new LangfuseExporter(config);
  }

  exportSpan(span: Span): void {
    // The export result callback is a no-op.
    this.exporter.export([span as SdkSpan], () => {});
  }

  getFormattedTraces() {
    return this.exporter.getLangfuseFormat();
  }

  shutdown(): void {
    void this.exporter.shutdown();
  }
}

export function createSpanPipeline(): { modelSpan: ModelSpan; langfuseManager: LangfuseSpanManager } {
  return {
    modelSpan: new ModelSpan(),
    langfuseManager: new LangfuseSpanManager(),
  };
}
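
The span code above delegates attribute handling to SpanBuilder, so the attributes it actually sets are not visible here. As a hedged illustration of what GenAI span attributes can look like, the sketch below maps a call summary to keys from the OpenTelemetry GenAI semantic conventions. The ModelCallSummary shape and the toGenAiAttributes helper are hypothetical, and the mapping is not taken from the @reaatech packages.

```typescript
// Hypothetical input shape for one completed model call.
interface ModelCallSummary {
  requestModel: string;
  responseModel?: string;
  inputTokens: number;
  outputTokens: number;
}

// Maps a call summary to OpenTelemetry GenAI semantic-convention attribute keys.
function toGenAiAttributes(call: ModelCallSummary): Record<string, string | number> {
  const attributes: Record<string, string | number> = {
    "gen_ai.operation.name": "chat",
    "gen_ai.request.model": call.requestModel,
    "gen_ai.usage.input_tokens": call.inputTokens,
    "gen_ai.usage.output_tokens": call.outputTokens,
  };
  // The response model is only recorded when the endpoint reports one.
  if (call.responseModel) {
    attributes["gen_ai.response.model"] = call.responseModel;
  }
  return attributes;
}

// "example-model" is a placeholder name, not a real endpoint.
const attributes = toGenAiAttributes({
  requestModel: "example-model",
  inputTokens: 120,
  outputTokens: 45,
});
console.log(attributes);
```

A record like this could be passed to a span's setAttributes method from @opentelemetry/api when instrumenting by hand instead of through a builder.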

Step 10: Create the alert service

import type { ObservabilityMetrics, AlertThreshold, AlertStatus } from "./types";
import { DEFAULT_ALERT_THRESHOLDS } from "./types";
import { createLogger, parseIntervalMs } from "@reaatech/otel-cost-exporter-core";

const logger = createLogger("info", "text");

// Runs once, when the module is loaded.
const _defaultIntervalMin = parseIntervalMs("5m");
logger.info({ defaultIntervalMin: _defaultIntervalMin }, "AlertService loaded");

export class AlertService {
  private thresholds: AlertThreshold;

  constructor(thresholds?: AlertThreshold) {
    // Caller-supplied thresholds override the defaults.
    this.thresholds = { ...DEFAULT_ALERT_THRESHOLDS, ...thresholds };
  }

  // Returns one AlertStatus per exceeded threshold; an empty array means no alerts.
  evaluate(metrics: ObservabilityMetrics): AlertStatus[] {
    const alerts: AlertStatus[] = [];

    if (metrics.p95LatencyMs > this.thresholds.p95LatencyMsThreshold) {
      alerts.push({
        thresholdExceeded: true,
        currentValue: metrics.p95LatencyMs,
        threshold: this.thresholds.p95LatencyMsThreshold,
        metricName: "p95LatencyMs",
        message: `P95 latency ${String(metrics.p95LatencyMs)}ms exceeds threshold of ${String(this.thresholds.p95LatencyMsThreshold)}ms`,
      });
    }

    if (metrics.errorRate > this.thresholds.errorRateThreshold) {
      alerts.push({
        thresholdExceeded: true,
        currentValue: metrics.errorRate,
        threshold: this.thresholds.errorRateThreshold,
        metricName: "errorRate",
        message: `Error rate ${String(metrics.errorRate)} exceeds threshold of ${String(this.thresholds.errorRateThreshold)}`,
      });
    }

    if (metrics.totalCostUsd > this.thresholds.costThresholdUsd) {
      alerts.push({
        thresholdExceeded: true,
        currentValue: metrics.totalCostUsd,
        threshold: this.thresholds.costThresholdUsd,
        metricName: "totalCostUsd",
        message: `Total cost $${String(metrics.totalCostUsd)} exceeds threshold of $${String(this.thresholds.costThresholdUsd)}`,
      });
    }

    return alerts;
  }
}

export function createAlertService(thresholds?: AlertThreshold): AlertService {
  return new AlertService(thresholds);
}
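
The alert service compares metrics.p95LatencyMs against a threshold, but how that value is computed is not shown here. One common approach is the nearest-rank percentile, sketched below. The percentile function, its handling of empty input, and the sample latencies are assumptions for illustration; the recipe may compute the metric differently.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it. Returns 0 for an empty input.
function percentile(samples: number[], p: number): number {
  if (p <= 0 || p > 100) {
    throw new RangeError("p must be in the range (0, 100]");
  }
  if (samples.length === 0) return 0;
  // Sort a copy so the caller's array is left untouched.
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing to limit floating-point drift in the rank.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1] ?? 0;
}

// Made-up latencies in milliseconds.
const latenciesMs = [120, 95, 310, 150, 101, 99, 2200, 130, 140, 115];
const p95LatencyMs = percentile(latenciesMs, 95);
console.log(p95LatencyMs);
```

With only ten samples the 95th percentile under nearest rank is the slowest call, so a single outlier can trip a latency alert; evaluating over a larger window makes the metric steadier.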

Intro

Prerequisites

Step 1: Scaffold the Next.js project

Step 2: Install dependencies

Step 3: Configure environment variables

Step 4: Define the TypeScript types

Step 5: Configure Vitest

Step 6: Create the OpenTelemetry instrumentation

Step 7: Build the Databricks wrapper

Step 8: Create the span builder

Step 9: Build the cost tracker

Step 10: Create the alert service

Step 11: Wire the observability service

Step 12: Create the API route handler

Step 13: Create the public API barrel export

Step 14: Run the tests

Step 15: Verify with typecheck and lint

Next steps