Ollama AI Observability with Cost Allocation for SMBs

Gain OpenTelemetry tracing and per-department cost attribution for your Ollama LLM deployments running on-prem or at the edge.

ollama observability opentelemetry cost-allocation langfuse traceloop nextjs typescript

The problem

On-prem LLM deployments lack visibility: IT teams can't tell which departments are consuming tokens, how much each call costs in terms of compute or proxy fees, or where bottlenecks occur. Without observability, they can't optimize or perform internal chargebacks.

Intro

This tutorial walks you through building an Ollama AI Observability system with per-department cost allocation for small-to-medium businesses. You’ll create a Next.js application that wraps every Ollama LLM call with OpenTelemetry tracing from the @reaatech/otel-genai-semconv-instrumentation package, calculates token costs using @reaatech/llm-cost-telemetry-calculator, aggregates usage by tenant and department via @reaatech/llm-cost-telemetry-aggregation, and exposes a dashboard endpoint for SMB admins to see who’s spending what. By the end, you’ll have a fully instrumented chat API and a cost dashboard running on your local Ollama instance.
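
The cost-attribution idea can be sketched without any of the packages above. The following is a minimal illustration, not the API of @reaatech/llm-cost-telemetry-calculator or @reaatech/llm-cost-telemetry-aggregation: the UsageRecord and Rate types, the function names, and the per-1,000-token rates are all hypothetical placeholders. On-prem models have no vendor price list, so the rate here stands in for whatever internal compute rate you choose to charge back.

```typescript
// Hypothetical shapes for illustration only; not the @reaatech package API.
interface UsageRecord {
  tenantId: string;
  department: string;
  inputTokens: number;
  outputTokens: number;
}

interface Rate {
  inputPer1k: number; // USD per 1,000 input tokens (assumed internal rate)
  outputPer1k: number; // USD per 1,000 output tokens (assumed internal rate)
}

// Cost of a single call: token counts scaled by the per-1,000-token rates.
function costUsd(record: UsageRecord, rate: Rate): number {
  return (
    (record.inputTokens / 1000) * rate.inputPer1k +
    (record.outputTokens / 1000) * rate.outputPer1k
  );
}

// Sum call costs under a "tenant/department" key.
function aggregateByDepartment(records: UsageRecord[], rate: Rate): Map<string, number> {
  const totals = new Map<string, number>();
  for (const record of records) {
    const key = `${record.tenantId}/${record.department}`;
    totals.set(key, (totals.get(key) ?? 0) + costUsd(record, rate));
  }
  return totals;
}

// Placeholder rate and sample usage records.
const rate: Rate = { inputPer1k: 0.25, outputPer1k: 0.5 };
const totals = aggregateByDepartment(
  [
    { tenantId: "acme", department: "engineering", inputTokens: 2000, outputTokens: 1000 },
    { tenantId: "acme", department: "engineering", inputTokens: 4000, outputTokens: 2000 },
    { tenantId: "acme", department: "sales", inputTokens: 1000, outputTokens: 0 },
  ],
  rate,
);
console.log(Object.fromEntries(totals));
```

Keying on tenant and department together keeps departments with the same name in different tenants apart, which is what a dashboard grouped by both dimensions needs.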

Prerequisites

Node.js >= 22 and pnpm 10 installed on your machine
Ollama running locally (default: http://127.0.0.1:11434) with at least one model pulled (e.g., llama3.1)
A Langfuse account (free tier works) — get your public and secret keys from the Langfuse project settings
A Traceloop API key
Familiarity with TypeScript, Next.js App Router, and basic OpenTelemetry concepts
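
Before scaffolding, it can help to confirm that Ollama is reachable and that a model has been pulled. The sketch below queries Ollama's GET /api/tags endpoint, which lists locally available models. The function name listOllamaModels and the FetchLike type are illustrative, not part of the tutorial's code; the injectable fetch parameter exists only so the check can be exercised without a running server.

```typescript
// Minimal fetch signature so a stub can be injected in tests (illustrative type).
type FetchLike = (url: string) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

// Returns the names of the models the local Ollama instance has pulled.
async function listOllamaModels(
  baseUrl: string = "http://127.0.0.1:11434",
  fetchImpl: FetchLike = (url) => fetch(url),
): Promise<string[]> {
  const res = await fetchImpl(`${baseUrl}/api/tags`);
  if (!res.ok) {
    throw new Error(`Ollama at ${baseUrl} returned a non-2xx response`);
  }
  const body = (await res.json()) as { models?: { name: string }[] };
  return (body.models ?? []).map((m) => m.name);
}
```

Calling listOllamaModels() with no arguments targets the default address from the prerequisites; an empty array means the server is up but no model has been pulled yet.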

Step 1: Scaffold the Next.js project

Create the project with the Next.js App Router and install all dependencies with exact versions.

terminal

npx create-next-app@latest ollama-ai-observability --typescript --eslint --app --src-dir --import-alias

Example artifact

A complete, working implementation of this recipe — downloadable as a zip or browsable file by file. Generated by our build pipeline; tested with full coverage before publishing.

177 kB · 73 tests · 100.0% coverage · vitest passing

SHA-256: 7b88e37bd373cc34e2f24630cc4420ad615ff220916cb4261f7dea47aaae80df

// Tracer module: shared tracer, lifecycle hooks, error handling, and circuit breakers.
import {
  TracerManager,
  HookManager,
  ErrorHandler,
  CircuitBreakerRegistry,
  type CircuitBreaker,
  type RequestHookContext,
  type ResponseHookContext,
} from "@reaatech/otel-genai-semconv-instrumentation";
import { createLangfuseExporter } from "@reaatech/otel-genai-semconv-exporters";
import { GEN_AI_ATTRIBUTES } from "@reaatech/otel-genai-semconv-core";
import { logger } from "./logger";

export const tracerManager = new TracerManager({
  tracerName: "ollama-observability",
  tracerVersion: "0.1.0",
});

export const hookManager = new HookManager();
export const errorHandler = new ErrorHandler();

export const circuitRegistry = new CircuitBreakerRegistry({
  failureThreshold: 3,
  successThreshold: 2,
  recoveryTimeoutMs: 60000,
});

export const langfuseExporter = createLangfuseExporter({
  publicKey: process.env.LANGFUSE_PUBLIC_KEY,
  secretKey: process.env.LANGFUSE_SECRET_KEY,
});

// Tag each span with the caller's department, defaulting to "unknown".
export const onStartHook = (ctx: RequestHookContext) => {
  const dept = (ctx.request as { department?: string }).department ?? "unknown";
  ctx.span.setAttribute("custom.department", dept);
};

// Log the trace ID when a span ends.
export const onEndHook = (ctx: ResponseHookContext) => {
  logger.info({ traceId: ctx.span.spanContext().traceId }, "span ended");
};

hookManager.onStart(onStartHook);
hookManager.onEnd(onEndHook);

export function getProviderBreaker(provider: string): CircuitBreaker {
  return circuitRegistry.get(provider);
}

// Start a chat span carrying the model, provider, department, and optional tenant.
export function startChatSpan(
  model: string,
  metadata?: { department?: string; tenantId?: string }
) {
  const span = tracerManager.startSpan("gen_ai.chat.completion");
  span.setAttribute(GEN_AI_ATTRIBUTES.REQUEST_MODEL, model);
  span.setAttribute("gen_ai.system", "ollama");
  span.setAttribute("custom.department", metadata?.department ?? "unknown");
  if (metadata?.tenantId) {
    span.setAttribute("custom.tenant_id", metadata.tenantId);
  }
  return span;
}

// Record the call's cost and output token count, then close the span.
export function endChatSpan(
  span: { setAttribute: (k: string, v: string | number) => void; end: () => void },
  costUsd: number,
  outputTokens?: number
): void {
  span.setAttribute("llm.cost.total", costUsd);
  span.setAttribute("gen_ai.usage.output_tokens", outputTokens ?? 0);
  span.end();
}

// Classify the error, attach it to the span, and count it against the Ollama breaker.
export function handleChatError(
  span: { setAttribute: (k: string, v: string | number) => void },
  error: unknown
) {
  const errorType = errorHandler.classifyError(
    error as Parameters<typeof errorHandler.classifyError>[0]
  );
  errorHandler.captureError(
    span as Parameters<typeof errorHandler.captureError>[0],
    error as Parameters<typeof errorHandler.captureError>[1]
  );
  const breaker = getProviderBreaker("ollama");
  breaker.recordFailure(errorType, span as Parameters<typeof breaker.recordFailure>[1]);
  return errorType;
}

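The registry above is configured with failureThreshold, successThreshold, and recoveryTimeoutMs. The sketch below shows the conventional circuit-breaker state machine those three settings usually describe. It is an assumption about typical behavior, not the implementation in @reaatech/otel-genai-semconv-instrumentation; the SimpleCircuitBreaker class and its method names are hypothetical.

```typescript
type BreakerState = "closed" | "open" | "half-open";

interface BreakerOptions {
  failureThreshold: number; // consecutive failures before the circuit opens
  successThreshold: number; // successes in half-open before it closes again
  recoveryTimeoutMs: number; // how long the circuit stays open before a retry is allowed
}

// Hypothetical breaker for illustration; the clock is injectable for testing.
class SimpleCircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private successes = 0;
  private openedAt = 0;
  private readonly opts: BreakerOptions;
  private readonly now: () => number;

  constructor(opts: BreakerOptions, now: () => number = () => Date.now()) {
    this.opts = opts;
    this.now = now;
  }

  // Returns true when a request may be attempted; moves open -> half-open after the timeout.
  canRequest(): boolean {
    if (this.state === "open" && this.now() - this.openedAt >= this.opts.recoveryTimeoutMs) {
      this.state = "half-open";
      this.successes = 0;
    }
    return this.state !== "open";
  }

  recordFailure(): void {
    if (this.state === "half-open") {
      this.trip(); // a failed probe reopens the circuit immediately
      return;
    }
    this.failures += 1;
    if (this.failures >= this.opts.failureThreshold) this.trip();
  }

  recordSuccess(): void {
    if (this.state === "half-open") {
      this.successes += 1;
      if (this.successes >= this.opts.successThreshold) {
        this.state = "closed";
        this.failures = 0;
      }
    } else {
      this.failures = 0; // any success while closed resets the failure streak
    }
  }

  getState(): BreakerState {
    return this.state;
  }

  private trip(): void {
    this.state = "open";
    this.openedAt = this.now();
    this.failures = 0;
  }
}
```

Under this model, the settings shown in the tracer module would stop calls to Ollama after three consecutive failures, allow a probe after 60 seconds, and resume normal traffic after two successful probes.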
// Chat API route: validates the request, runs the instrumented chat call, and maps errors to HTTP statuses.
import { type NextRequest, NextResponse } from "next/server";
import { z } from "zod";
import { instrumentedChat } from "@/src/lib/instrumentation";
import { errorHandler } from "@/src/lib/tracer";

const chatSchema = z.object({
  model: z.string().min(1),
  messages: z
    .array(
      z.object({
        role: z.enum(["user", "system", "assistant"]),
        content: z.string(),
      })
    )
    .min(1),
  stream: z.boolean().optional(),
});

export async function POST(req: NextRequest) {
  try {
    const body: unknown = await req.json();
    const result = chatSchema.safeParse(body);
    if (!result.success) {
      return NextResponse.json(
        { error: "Validation failed", details: result.error.issues },
        { status: 400 }
      );
    }

    // Attribution comes from request headers, with environment and literal fallbacks.
    const department =
      req.headers.get("x-department") ?? process.env.DEFAULT_DEPARTMENT ?? "unknown";
    const tenantId = req.headers.get("x-tenant-id") ?? "default";

    const response = await instrumentedChat({
      model: result.data.model,
      messages: result.data.messages,
      department,
      tenantId,
      stream: result.data.stream,
    });

    return NextResponse.json(
      {
        content: response.content,
        traceId: response.traceId,
        spanId: response.spanId,
        costUsd: response.costUsd,
        model: response.model,
        inputTokens: response.inputTokens,
        outputTokens: response.outputTokens,
      },
      { status: 200 }
    );
  } catch (error) {
    // Errors carrying statusCode 429 are reported as an exhausted budget.
    if (
      error instanceof Error &&
      (error as Error & { statusCode?: number }).statusCode === 429
    ) {
      return NextResponse.json(
        { error: "Budget exhausted", status: "budget_blocked" },
        { status: 429 }
      );
    }

    // Map the classified error type to an HTTP status; anything unmapped becomes 500.
    const errType = errorHandler.classifyError(
      error as Parameters<typeof errorHandler.classifyError>[0]
    );
    const statusMap: Record<string, number> = {
      RATE_LIMIT: 429,
      INVALID_REQUEST: 400,
      AUTHENTICATION: 401,
      AUTHORIZATION: 403,
      TIMEOUT: 504,
    };
    const status = statusMap[errType as string] ?? 500;
    return NextResponse.json(
      { error: error instanceof Error ? error.message : "Internal server error" },
      { status }
    );
  }
}
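
A client attributes a call by sending the x-department and x-tenant-id headers that the route reads. The sketch below builds such a request. The /api/chat path is an assumption, since the route file's location is not shown here, and buildChatRequest and ChatMessage are illustrative names, not part of the tutorial's code.

```typescript
// Message shape matching the roles the route's schema accepts.
interface ChatMessage {
  role: "user" | "system" | "assistant";
  content: string;
}

// Builds a POST request carrying the attribution headers the chat route reads.
// The "/api/chat" path is assumed; adjust it to wherever the route file lives.
function buildChatRequest(
  baseUrl: string,
  department: string,
  tenantId: string,
  model: string,
  messages: ChatMessage[],
): Request {
  return new Request(`${baseUrl}/api/chat`, {
    method: "POST",
    headers: {
      "content-type": "application/json",
      "x-department": department,
      "x-tenant-id": tenantId,
    },
    body: JSON.stringify({ model, messages }),
  });
}

// Example: a finance-department call against a local dev server (port 3000 assumed).
const chatRequest = buildChatRequest("http://localhost:3000", "finance", "acme", "llama3.1", [
  { role: "user", content: "Summarize last quarter's invoices." },
]);
```

Passing chatRequest to fetch sends the call. On success the route's JSON body contains content, traceId, spanId, costUsd, model, inputTokens, and outputTokens. If either header is omitted, the route falls back to DEFAULT_DEPARTMENT (or "unknown") and the "default" tenant.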

The full tutorial continues with the following steps:

Step 2: Configure environment variables
Step 3: Create the logger and types
Step 4: Build the tracer module
Step 5: Build the cost collector module
Step 6: Create the Ollama instrumentation wrapper
Step 7: Wire up Next.js server instrumentation
Step 8: Create the chat API route
Step 9: Create the dashboard API route
Step 10: Update the home page and create the barrel export
Step 11: Write and run the tests
Next steps