On-prem LLM deployments lack visibility: IT teams can't tell which departments are consuming tokens, how much each call costs in terms of compute or proxy fees, or where bottlenecks occur. Without observability, they can't optimize or perform internal chargebacks.
This tutorial walks you through building an Ollama AI Observability system with per-department cost allocation for small-to-medium businesses. You’ll create a Next.js application that wraps every Ollama LLM call with OpenTelemetry tracing from the @reaatech/otel-genai-semconv-instrumentation package, calculates token costs using @reaatech/llm-cost-telemetry-calculator, aggregates usage by tenant and department via @reaatech/llm-cost-telemetry-aggregation, and exposes a dashboard endpoint for SMB admins to see who’s spending what. By the end, you’ll have a fully instrumented chat API and a cost dashboard running on your local Ollama instance.
Prerequisites
Node.js >= 22 and pnpm 10 installed on your machine
Ollama running locally (default: http://127.0.0.1:11434) with at least one model pulled (e.g., llama3.1)
A Langfuse account (free tier works) — get your public and secret keys from the Langfuse project settings
A Traceloop API key
Familiarity with TypeScript, Next.js App Router, and basic OpenTelemetry concepts
Step 1: Scaffold the Next.js project
Create the project with the Next.js App Router and install all dependencies with exact versions.
Now open package.json and replace the dependencies and devDependencies sections with these exact pinned versions. The REAA (REAA Technologies) packages handle instrumentation, cost calculation, and aggregation. Third-party packages provide logging, validation, and observability backends.
Now enable the experimental.instrumentationHook flag in next.config.ts — this is required for Next.js to invoke your startup instrumentation when the server starts.
Step 2: Configure environment variables
Replace the placeholder values with your actual Langfuse and Traceloop keys. The DEPARTMENT_BUDGETS JSON sets daily and monthly spending caps per department.
Expected output: A working .env file alongside .env.example (which you should keep as a template for other developers).
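To make the budget variable concrete, here is a minimal sketch of how a DEPARTMENT_BUDGETS string could be parsed and validated at startup. The JSON shape (a map from department name to `daily` and `monthly` caps in USD) and the `parseDepartmentBudgets` helper are assumptions made for illustration; the recipe's actual schema may differ.

```typescript
// Hypothetical shape: {"engineering": {"daily": 5, "monthly": 100}}.
// This is an assumption, not the recipe's confirmed schema.
interface DepartmentBudget {
  daily: number;
  monthly: number;
}

function parseDepartmentBudgets(raw: string | undefined): Record<string, DepartmentBudget> {
  if (!raw) return {}; // no budgets configured
  const parsed: unknown = JSON.parse(raw); // throws on malformed JSON
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("DEPARTMENT_BUDGETS must be a JSON object");
  }
  const budgets: Record<string, DepartmentBudget> = {};
  for (const [department, value] of Object.entries(parsed as Record<string, unknown>)) {
    const { daily, monthly } = (value ?? {}) as Partial<DepartmentBudget>;
    if (typeof daily !== "number" || typeof monthly !== "number" || daily < 0 || monthly < 0) {
      throw new Error(`Invalid budget for department "${department}"`);
    }
    budgets[department] = { daily, monthly };
  }
  return budgets;
}

const exampleBudgets = parseDepartmentBudgets(
  '{"engineering":{"daily":5,"monthly":100},"marketing":{"daily":2,"monthly":40}}',
);
```

Failing fast on a malformed value at startup is usually preferable to discovering a missing cap when the first budget check runs.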
Step 3: Create the logger and types
Start with the foundational modules. The logger uses pino for structured JSON logging, and the types file defines the interfaces shared across the application.
Expected output: Two small modules. src/lib/logger.ts exports a pino logger instance. src/lib/types.ts exports the five interfaces.
Step 4: Build the tracer module
The tracer module creates the OTel tracer, lifecycle hooks, error handler, per-provider circuit breakers, and the Langfuse exporter. These are the building blocks for instrumenting every LLM call.
TracerManager creates and manages the OTel tracer. You give it a name and version.
HookManager registers onStart and onEnd lifecycle hooks — the start hook injects the department as a span attribute, the end hook logs the trace ID.
CircuitBreakerRegistry creates per-provider circuit breakers. Here it’s configured with a failure threshold of 3 and a 60-second recovery timeout.
startChatSpan creates a span named gen_ai.chat.completion with the model, provider, department, and tenant attributes.
endChatSpan records the cost and output tokens before ending the span.
handleChatError classifies the error using ErrorHandler, captures it on the span, and records the failure in the circuit breaker.
Expected output: A module that exports tracer components, lifecycle hooks, and helper functions for managing chat spans.
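The circuit breaker behavior described above (open after 3 failures, retry after 60 seconds) can be sketched as a small state machine. This is not the CircuitBreakerRegistry API from the REAA package; `SimpleCircuitBreaker` and its methods are hypothetical names used only to show the idea.

```typescript
// Hypothetical stand-in for a per-provider circuit breaker.
type BreakerState = "closed" | "open" | "half-open";

class SimpleCircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly recoveryTimeoutMs: number;
  private readonly now: () => number;

  constructor(
    failureThreshold = 3,
    recoveryTimeoutMs = 60000,
    now: () => number = () => Date.now(), // injectable clock for tests
  ) {
    this.failureThreshold = failureThreshold;
    this.recoveryTimeoutMs = recoveryTimeoutMs;
    this.now = now;
  }

  // Returns true when a call may proceed.
  canExecute(): boolean {
    if (this.state === "open" && this.now() - this.openedAt >= this.recoveryTimeoutMs) {
      this.state = "half-open"; // recovery window elapsed: allow trial calls
    }
    return this.state !== "open";
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = "closed";
  }

  recordFailure(): void {
    this.failures += 1;
    // A failed trial call reopens immediately; otherwise open at the threshold.
    if (this.state === "half-open" || this.failures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = this.now();
    }
  }

  getState(): BreakerState {
    return this.state;
  }
}
```

Keeping one breaker per provider means a failing Ollama host does not block calls to any other backend you add later.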
Step 5: Build the cost collector module
The cost collector combines cost calculation, multi-dimensional aggregation, and budget enforcement. It buffers cost spans, groups them by tenant and department, and checks budgets before allowing LLM calls.
Create src/lib/cost-collector.ts:
```ts
import {
  CostCollector,
  CostAggregator,
  BudgetManager,
} from "@reaatech/llm-cost-telemetry-aggregation";
import type { CostSpan, BudgetStatus } from "@reaatech/llm-cost-telemetry";
export { calculateCost, countText } from "@reaatech/llm-cost-telemetry-calculator";
import { getPricing, addCustomPricing } from "@reaatech/llm-cost-telemetry-calculator";
import { langfuseExporter } from "./tracer";
import Langfuse from "langfuse";
import type { DashboardResponse } from "./types";

export const costCollector = new CostCollector({
  maxBufferSize: 500,
```
The onFlush callback is the central pipeline: when CostCollector flushes its buffer, each span is fed to CostAggregator for multi-dimensional grouping and to BudgetManager for running totals. The spans are also exported to Langfuse through the Langfuse exporter.
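The flush pipeline can be illustrated without the REAA classes. The sketch below buffers spans, flushes when the buffer is full, and groups totals by tenant and department. `BufferedCollector`, `SpanRecord`, and `aggregate` are hypothetical names, and the periodic timer implied by a flush interval is left out for brevity.

```typescript
// Hypothetical types: a simplified cost span and its running totals.
interface SpanRecord {
  tenant: string;
  department: string;
  costUsd: number;
  totalTokens: number;
}

interface Totals {
  costUsd: number;
  totalTokens: number;
  calls: number;
}

class BufferedCollector {
  private buffer: SpanRecord[] = [];
  private readonly maxBufferSize: number;
  private readonly onFlush: (spans: SpanRecord[]) => void;

  constructor(maxBufferSize: number, onFlush: (spans: SpanRecord[]) => void) {
    this.maxBufferSize = maxBufferSize;
    this.onFlush = onFlush;
  }

  record(span: SpanRecord): void {
    this.buffer.push(span);
    if (this.buffer.length >= this.maxBufferSize) this.flush();
  }

  flush(): void {
    if (this.buffer.length === 0) return;
    const spans = this.buffer;
    this.buffer = []; // swap first so spans recorded during onFlush are kept
    this.onFlush(spans);
  }
}

// Running totals keyed by "tenant/department".
const totalsByKey = new Map<string, Totals>();

function aggregate(spans: SpanRecord[]): void {
  for (const span of spans) {
    const key = `${span.tenant}/${span.department}`;
    const current = totalsByKey.get(key) ?? { costUsd: 0, totalTokens: 0, calls: 0 };
    current.costUsd += span.costUsd;
    current.totalTokens += span.totalTokens;
    current.calls += 1;
    totalsByKey.set(key, current);
  }
}

const collector = new BufferedCollector(2, aggregate);
collector.record({ tenant: "acme", department: "engineering", costUsd: 0.25, totalTokens: 1200 });
collector.record({ tenant: "acme", department: "engineering", costUsd: 0.5, totalTokens: 800 });
// The buffer reached its limit of 2, so both spans have been aggregated.
```

Batching this way keeps the per-request path cheap: recording a span is an array push, and the grouping work happens once per flush.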
Expected output: A cost collector module with five key exports: recordCostSpan, checkBudget, getDashboardData, configureDepartmentBudgets, and initOllamaPricing.
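Local Ollama models have no published list price, so any per-token rate is an internal chargeback figure you choose yourself. As a rough sketch of the arithmetic behind custom pricing, assuming rates quoted per million tokens (the `registerPricing` and `estimateCostUsd` helpers and the rates are hypothetical, not the calculator package's API):

```typescript
// Hypothetical pricing record: USD per one million tokens.
interface ModelPricing {
  inputPerMillionUsd: number;
  outputPerMillionUsd: number;
}

const customPricing = new Map<string, ModelPricing>();

function registerPricing(model: string, pricing: ModelPricing): void {
  customPricing.set(model, pricing);
}

function estimateCostUsd(model: string, inputTokens: number, outputTokens: number): number {
  const pricing = customPricing.get(model);
  if (!pricing) throw new Error(`No pricing registered for model "${model}"`);
  // cost = input tokens * input rate + output tokens * output rate
  return (
    (inputTokens / 1000000) * pricing.inputPerMillionUsd +
    (outputTokens / 1000000) * pricing.outputPerMillionUsd
  );
}

// Illustrative internal rates only; pick values that reflect your hardware and power costs.
registerPricing("llama3.1", { inputPerMillionUsd: 0.5, outputPerMillionUsd: 1.5 });

// 2M input tokens and 1M output tokens: 2 * 0.5 + 1 * 1.5 = 2.5 USD
const exampleCost = estimateCostUsd("llama3.1", 2000000, 1000000);
```

Throwing on an unknown model, rather than silently returning zero, makes a missing pricing entry visible before it distorts a department's totals.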
Step 6: Create the Ollama instrumentation wrapper
The instrumentation module is the heart of the system. It wraps Ollama’s chat() method with OpenTelemetry spans, budget checks, circuit breakers, retry logic (for non-streaming), and cost recording. It exports two functions: instrumentedChat for single-response completions (with optional internal streaming handling) and instrumentedChatStream for dedicated streaming use.
Create src/lib/instrumentation.ts:
```ts
import { Ollama } from "ollama";
import {
  RetryHandler,
  ChunkAggregator,
  instrumentStream,
  StreamingHandler,
} from "@reaatech/otel-genai-semconv-instrumentation";
import {
  startChatSpan,
  endChatSpan,
  handleChatError,
  getProviderBreaker,
  hookManager,
} from "./tracer";
import { checkBudget, recordCostSpan, calculateCost, countText } from "./cost-collector";
import { logger } from "./logger";
import type { InstrumentedChatResponse } from "./types";

const OLLAMA_COST_PROVIDER = "ollama" as never
```
The instrumentedChat function handles both streaming and non-streaming completions in a single API. When stream: true is passed, it iterates through stream chunks with StreamingHandler and records time-to-first-token. The non-streaming path uses RetryHandler for automatic retries on transient failures.
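For readers unfamiliar with retry wrappers, the non-streaming retry behavior can be approximated with exponential backoff. This is a generic sketch and not the RetryHandler API; `withRetry`, `RetryOptions`, and the injected `sleep` parameter are hypothetical.

```typescript
// Hypothetical retry options; the real RetryHandler may expose different settings.
interface RetryOptions {
  maxAttempts: number;
  baseDelayMs: number;
  isRetryable: (error: unknown) => boolean;
}

function defaultSleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function withRetry<T>(
  operation: (attempt: number) => Promise<T>,
  options: RetryOptions,
  sleep: (ms: number) => Promise<void> = defaultSleep, // injectable for tests
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= options.maxAttempts; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      lastError = error;
      const isLastAttempt = attempt === options.maxAttempts;
      if (isLastAttempt || !options.isRetryable(error)) break;
      // Backoff doubles each time: base, 2x base, 4x base, ...
      await sleep(options.baseDelayMs * 2 ** (attempt - 1));
    }
  }
  throw lastError;
}
```

A likely reason to limit retries to the non-streaming path is that a stream may already have delivered tokens to the caller, so replaying the request would duplicate output.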
The instrumentedChatStream function is the dedicated streaming path — it uses ChunkAggregator to reassemble streamed chunks into a complete response while still recording cost and trace metadata.
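Reassembling a stream boils down to concatenating chunk content and reading the token counts from the final chunk. The sketch below assumes chunks shaped like Ollama's streaming chat responses (`message.content`, `done`, and `prompt_eval_count`/`eval_count` on the last chunk); `aggregateChunks` is a hypothetical helper, not the ChunkAggregator API.

```typescript
// Subset of a streamed chat chunk, declared locally for this sketch.
interface ChatChunk {
  message: { content: string };
  done: boolean;
  prompt_eval_count?: number; // input tokens, present on the final chunk
  eval_count?: number; // output tokens, present on the final chunk
}

interface AggregatedChat {
  content: string;
  inputTokens: number;
  outputTokens: number;
  timeToFirstTokenMs: number | null;
}

async function aggregateChunks(
  stream: AsyncIterable<ChatChunk>,
  now: () => number = () => Date.now(), // injectable clock for tests
): Promise<AggregatedChat> {
  const startedAt = now();
  let content = "";
  let inputTokens = 0;
  let outputTokens = 0;
  let timeToFirstTokenMs: number | null = null;

  for await (const chunk of stream) {
    // Record latency when the first non-empty piece of text arrives.
    if (timeToFirstTokenMs === null && chunk.message.content.length > 0) {
      timeToFirstTokenMs = now() - startedAt;
    }
    content += chunk.message.content;
    if (chunk.done) {
      inputTokens = chunk.prompt_eval_count ?? 0;
      outputTokens = chunk.eval_count ?? 0;
    }
  }
  return { content, inputTokens, outputTokens, timeToFirstTokenMs };
}

// A fake stream standing in for a real streaming response.
async function* fakeStream(): AsyncGenerator<ChatChunk> {
  yield { message: { content: "Hel" }, done: false };
  yield { message: { content: "lo" }, done: false };
  yield { message: { content: "" }, done: true, prompt_eval_count: 12, eval_count: 2 };
}
```

Because the token counts only arrive with the last chunk, cost can be recorded only after the stream has been fully consumed.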
Both paths follow the same pattern: check the budget, start a chat span, execute lifecycle hooks, check the circuit breaker, make the Ollama call, calculate costs, record the cost span, and end the chat span with cost metadata.
Expected output: A module that exports instrumentedChat and instrumentedChatStream — the central functions that wrap Ollama chat completions with full observability.
Step 7: Wire up Next.js server instrumentation
Next.js supports a register() function in src/instrumentation.ts that runs when the Node.js server starts. Here you’ll initialize the Traceloop SDK, configure department budgets, and set up custom Ollama pricing.
The NEXT_RUNTIME guard ensures this code only runs in the Node.js runtime (not Edge). Each import uses dynamic import() because the modules depend on Node-only packages that would fail in the Edge runtime.
Expected output: src/instrumentation.ts with a register() function that initializes observability at server startup.
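The runtime guard is easier to reason about when it is isolated from the actual imports. In the sketch below, `registerForRuntime` is a hypothetical helper: it takes the runtime name and a list of lazy initializers, which in a real register() would presumably be `process.env.NEXT_RUNTIME` and functions that call `import()` on the Node-only modules.

```typescript
// Each initializer is lazy, mirroring a dynamic import() that only loads when called.
type Initializer = () => Promise<void>;

// Runs the initializers only for the Node.js runtime; returns how many ran.
async function registerForRuntime(
  runtime: string | undefined,
  initializers: Initializer[],
): Promise<number> {
  if (runtime !== "nodejs") return 0; // Edge or unknown runtime: skip Node-only setup
  for (const initialize of initializers) {
    await initialize(); // sequential, so later steps can rely on earlier ones
  }
  return initializers.length;
}
```

Because the initializers are only invoked inside the guard, nothing Node-specific is loaded when the same file is evaluated for the Edge runtime.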
Step 8: Create the chat API route
The chat route accepts POST requests with a model name and messages array, validates the input with Zod, extracts department and tenant from HTTP headers, and returns the instrumented response with cost and trace metadata.
Notice the route handler uses NextRequest and NextResponse.json() (not bare Request/new Response(JSON.stringify(...))) so that the Content-Type: application/json header is set automatically.
Expected output: A POST endpoint at /api/chat that accepts JSON bodies, validates them, calls the instrumented Ollama client, and returns the response with cost and trace metadata.
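The validation and attribution steps of the route can be sketched on their own. To stay dependency-free, this version uses a plain type guard in place of the Zod schema. The header names `x-department` and `x-tenant`, the fallback values, and the accepted roles are assumptions, not values confirmed by the recipe.

```typescript
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
}

type ParseResult = { ok: true; data: ChatRequest } | { ok: false; error: string };

const ROLES: readonly string[] = ["system", "user", "assistant"];

function isChatMessage(value: unknown): value is ChatMessage {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  const role = candidate["role"];
  return typeof role === "string" && ROLES.includes(role) && typeof candidate["content"] === "string";
}

function parseChatRequest(body: unknown): ParseResult {
  if (typeof body !== "object" || body === null) {
    return { ok: false, error: "body must be a JSON object" };
  }
  const candidate = body as Record<string, unknown>;
  const model = candidate["model"];
  const messages = candidate["messages"];
  if (typeof model !== "string" || model.length === 0) {
    return { ok: false, error: "model must be a non-empty string" };
  }
  if (!Array.isArray(messages) || messages.length === 0 || !messages.every(isChatMessage)) {
    return { ok: false, error: "messages must be a non-empty array of { role, content }" };
  }
  return { ok: true, data: { model, messages } };
}

// Structural type satisfied by the standard Headers object.
interface HeaderReader {
  get(name: string): string | null;
}

// Hypothetical header names; requests without them fall into catch-all buckets.
function readAttribution(headers: HeaderReader): { department: string; tenant: string } {
  return {
    department: headers.get("x-department")?.trim() || "unassigned",
    tenant: headers.get("x-tenant")?.trim() || "default",
  };
}
```

In the route handler, a failed parse would typically map to a 400 response, and the attribution values would be passed along with the model and messages to the instrumented chat call.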
Step 9: Create the dashboard API route
The dashboard endpoint returns aggregated cost data grouped by tenant, with optional filtering by tenant and time period.
Expected output: A GET endpoint at /api/dash that returns cost aggregations. Query it with ?period=month&tenant=engineering to filter by time window and tenant.
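How the query parameters narrow the result can be shown with a small filter. This sketch assumes a rolling window (24 hours for `day`, 30 days for `month`) and a flat list of cost rows. `parseDashboardQuery`, `summarize`, and `CostRow` are hypothetical and may not match how getDashboardData defines its periods.

```typescript
type Period = "day" | "month";

interface CostRow {
  tenant: string;
  department: string;
  costUsd: number;
  timestampMs: number;
}

interface DashboardQuery {
  period: Period;
  tenant?: string;
}

// Structural type satisfied by URLSearchParams.
interface ParamReader {
  get(name: string): string | null;
}

function parseDashboardQuery(params: ParamReader): DashboardQuery {
  const period: Period = params.get("period") === "day" ? "day" : "month"; // default: month
  const tenant = params.get("tenant");
  return tenant ? { period, tenant } : { period };
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Sums cost per department for rows matching the tenant filter and time window.
function summarize(rows: CostRow[], query: DashboardQuery, nowMs: number): Record<string, number> {
  const windowMs = query.period === "day" ? DAY_MS : 30 * DAY_MS; // rolling, not calendar-based
  const totals: Record<string, number> = {};
  for (const row of rows) {
    if (query.tenant && row.tenant !== query.tenant) continue;
    if (nowMs - row.timestampMs > windowMs) continue;
    totals[row.department] = (totals[row.department] ?? 0) + row.costUsd;
  }
  return totals;
}
```

In a route handler, the search params of the request URL could be passed straight to `parseDashboardQuery`, since URLSearchParams already provides a compatible `get` method.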
Step 10: Update the home page and create the barrel export
Replace the placeholder app/page.tsx with a landing page that links to the dashboard:
```tsx
import styles from "./page.module.css";

export default function Home() {
  return (
    <div className={styles.page}>
      <main className={styles.main}>
        <h1>Ollama AI Observability</h1>
        <p>
          OpenTelemetry tracing and per-department cost attribution for on-prem
          Ollama LLM deployments.
        </p>
        <div className={styles.ctas}>
          <a
            className={styles.primary}
            href="/api/dash"
            target="_blank"
            rel="noopener noreferrer"
          >
            View Dashboard →
          </a>
        </div>
      </main>
    </div>
  );
}
```
Update src/index.ts to export the public API surface:
```ts
export { instrumentedChat } from "./lib/instrumentation.js";
export { getDashboardData, checkBudget, configureDepartmentBudgets } from "./lib/cost-collector.js";
export { tracerManager, errorHandler } from "./lib/tracer.js";
export { logger } from "./lib/logger.js";
```
Expected output: A home page that links to the dashboard and a barrel module that exports the key library functions.
Step 11: Write and run the tests
The recipe includes a full test suite covering every module: cost-collector, instrumentation, tracer, logger, types, API routes, and the barrel export. Start with the cost-collector test, which verifies span recording, budget checking, dashboard data aggregation, and the flush pipeline.
The full recipe includes additional test files for the chat route, dashboard route, tracer, instrumentation, logger, types, and barrel export. Create those following the same vitest pattern. When you’re ready, run the full suite:
```sh
pnpm test
```
Expected output: All tests passing with no failures (numFailedTests=0) and at least 90% code coverage on the runtime code in src/ and app/api/ (your dashboard and chat route handlers).
Next steps
Add real-time alerting — wire the budget trigger alerts to a notification channel (Slack, email, PagerDuty) so you’re notified when a department hits 75% or 90% of its monthly budget
Extend to other providers: the REAA instrumentation framework supports OpenAI, Anthropic, and Google, so you can add provider branches in the instrumentation module and hook up their cost models
Persist cost data — the in-memory CostCollector buffer resets on restart; replace it with a persistent store (PostgreSQL, SQLite, or a time-series database) so cost data survives server reboots
Build a visual dashboard — replace the JSON /api/dash endpoint with a real UI using charts (Recharts, Chart.js) showing daily/monthly spend per department, model breakdowns, and budget utilization gauges
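As a starting point for the alerting idea, the threshold logic itself is small. The sketch below maps monthly spend to an alert level using the 75% and 90% marks mentioned above; `budgetAlertLevel` and the level names are hypothetical, and delivery to Slack, email, or PagerDuty is left out.

```typescript
type AlertLevel = "ok" | "warning" | "critical" | "exceeded";

// Maps current spend against a monthly cap to an alert level.
function budgetAlertLevel(spentUsd: number, monthlyBudgetUsd: number): AlertLevel {
  if (monthlyBudgetUsd <= 0) return "exceeded"; // a zero cap means no spend is allowed
  const utilization = spentUsd / monthlyBudgetUsd;
  if (utilization >= 1) return "exceeded";
  if (utilization >= 0.9) return "critical";
  if (utilization >= 0.75) return "warning";
  return "ok";
}
```

To avoid repeated notifications, a caller would likely store the last level per department and send a message only when the level changes.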
```ts
  flushIntervalMs: 60000,
});
export const costAggregator = new CostAggregator({
```