SMBs adopt several AI agents (support bot, lead qualifier, appointment setter) but have no real visibility into their behavior, leading to silent failures, cost overruns, and distrust in the automation.
A complete, working implementation of this recipe — downloadable as a zip or browsable file by file. Generated by our build pipeline; tested with full coverage before publishing.
You’ll build a Next.js observability dashboard that tracks every AI agent your small business runs — support bots, lead qualifiers, appointment setters — in a single pane of glass. Each agent call is instrumented with OpenTelemetry spans, logged with Pino, and routed to Langfuse for aggregation. By the end you’ll have a working dashboard that shows cost, latency, and failure rate, and lets you replay any past conversation to debug what went wrong.
Prerequisites
Node.js >= 22 (check with node --version)
pnpm 10.15.1 (check with pnpm --version; install with npm install -g pnpm@10.15.1)
A Langfuse account (cloud at langfuse.com or self-hosted) — you’ll need a public key, secret key, and host URL
An OpenTelemetry Collector endpoint — this recipe sends traces via OTLP/HTTP; you can use the Langfuse OTel endpoint directly or run a local collector
A Slack webhook URL (optional, for health check alerts)
Familiarity with TypeScript and Next.js App Router — you should know how src/app/ route handlers and pages work
Step 1: Scaffold the project
Create an empty directory, then add the project manifest and TypeScript configuration.
The @reaatech/* packages provide the observability hooks, Langfuse aggregates traces, OpenTelemetry handles distributed tracing, and Pino gives you structured JSON logging.
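For reference, here is a sketch of the manifest. The dependency list is inferred from the imports used in this recipe, and the versions are placeholders (pin the real ones from the artifact's pnpm-lock.yaml); the eval-harness observability package is omitted because the excerpts don't show its exact name. The tsconfig.json is a standard Next.js App Router configuration and ships in the artifact.

```json
{
  "name": "multi-agent-obs",
  "private": true,
  "packageManager": "pnpm@10.15.1",
  "scripts": {
    "dev": "next dev",
    "build": "next build",
    "test": "vitest run",
    "test:coverage": "vitest run --coverage"
  },
  "dependencies": {
    "@opentelemetry/api": "latest",
    "@opentelemetry/exporter-trace-otlp-http": "latest",
    "@opentelemetry/sdk-node": "latest",
    "@reaatech/agent-budget-otel-bridge": "latest",
    "@reaatech/agent-mesh-observability": "latest",
    "@reaatech/agent-replay-core": "latest",
    "@reaatech/agent-runbook-observability": "latest",
    "langfuse": "latest",
    "next": "15.2.6",
    "pino": "latest",
    "react": "latest",
    "react-dom": "latest"
  },
  "devDependencies": {
    "@vitest/coverage-v8": "latest",
    "pino-pretty": "latest",
    "typescript": "latest",
    "vitest": "latest"
  }
}
```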
The Vitest configuration tells Vitest to match the @/* import alias, run tests in a Node environment, and enforce 90% coverage thresholds across all metrics. UI components under src/app/dashboard are excluded from coverage since they’re visual.
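If you need a starting point, here is a vitest.config.ts sketch consistent with that description; the option names follow standard Vitest config, but the artifact's version is authoritative:

```ts
// vitest.config.ts: a sketch matching the settings described above
import { defineConfig } from "vitest/config";

export default defineConfig({
  resolve: {
    // Match the @/* import alias used throughout src/
    alias: { "@": new URL("./src", import.meta.url).pathname },
  },
  test: {
    environment: "node",
    coverage: {
      // 90% thresholds across all metrics
      thresholds: { lines: 90, branches: 90, functions: 90, statements: 90 },
      // Dashboard UI components are visual and excluded from coverage
      exclude: ["src/app/dashboard/**"],
    },
  },
});
```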
Create a .gitignore file to keep build artifacts out of version control:
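```code
# a minimal set; the artifact's .gitignore may list more
node_modules/
.next/
coverage/
.env
*.log
```

Step 2: Install dependencies

Install everything declared in package.json:

```terminal
pnpm install
```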
Expected output: pnpm resolves and installs all packages. You’ll see a “Done” message along with the install time. The pnpm-lock.yaml is created automatically.
Step 3: Configure environment variables
Copy .env.example to .env and fill in your real values. The artifact ships with this template:
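If your copy of the artifact is missing the template, this sketch covers every variable the recipe uses (the values shown are placeholders; LOG_LEVEL is optional and defaults to info):

```code
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_HOST=https://cloud.langfuse.com
OTEL_EXPORTER_OTLP_ENDPOINT=
CRON_SECRET=
SLACK_WEBHOOK_URL=
LOG_LEVEL=info
```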
The app validates these at startup (you’ll write that validator in the next step). SLACK_WEBHOOK_URL is optional — if blank, health-check failures are logged but not sent to Slack.
Step 4: Write the observability primitives
Create the src/lib/ directory and add four foundational modules: OpenTelemetry span helpers, a Pino logger, the Langfuse API client, and an environment validator.
Create src/lib/otel.ts:
```ts
import { trace, Span, SpanStatusCode } from "@opentelemetry/api";

const TRACER_NAME = "multi-agent-obs";

export function createSpan(name: string): Span {
  return trace.getTracer(TRACER_NAME).startSpan(name);
}

export function endSpan(span: Span, error?: Error): void {
  if (error) {
    span.recordException(error);
    span.setStatus({ code: SpanStatusCode.ERROR, message: error.message });
  } else {
    span.setStatus({ code: SpanStatusCode.OK });
  }
  span.end();
}
```
createSpan starts a named span using the OpenTelemetry API. endSpan records any thrown error as an exception and marks the span as errored (status code 2) or OK (status code 1).
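Create src/lib/logger.ts. The original listing for this file isn't reproduced here, so the following is a minimal sketch; the export name log matches how the API routes import it, but the transport options are assumptions:

```ts
// src/lib/logger.ts: structured logging with Pino
import pino from "pino";

export const log = pino({
  level: process.env.LOG_LEVEL ?? "info",
  // Human-readable output in development; raw JSON everywhere else
  // (transport wiring assumed; see the artifact for the exact setup).
  ...(process.env.NODE_ENV === "development"
    ? { transport: { target: "pino-pretty" } }
    : {}),
});
```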
In development mode (NODE_ENV=development) Pino uses pino-pretty for human-readable output. In production it emits raw JSON — ideal for log aggregation systems.
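Create src/lib/langfuse.ts. The full module is in the artifact; this sketch shows the SDK client and the two REST helpers. The field names on LangfuseTraceData are assumptions, so extend them to match what your dashboard reads:

```ts
// src/lib/langfuse.ts: SDK client for writing traces, REST helpers for reading them
import { Langfuse } from "langfuse";

export const langfuse = new Langfuse({
  publicKey: process.env.LANGFUSE_PUBLIC_KEY!,
  secretKey: process.env.LANGFUSE_SECRET_KEY!,
  baseUrl: process.env.LANGFUSE_HOST,
});

// Field names below are assumptions based on the Langfuse public API.
export interface LangfuseTraceData {
  id: string;
  name?: string;
  timestamp?: string;
  latency?: number;
  status?: string;
  observations?: unknown[];
}

// HTTP Basic auth from the Langfuse project keys.
function authHeader(): string {
  const creds = `${process.env.LANGFUSE_PUBLIC_KEY}:${process.env.LANGFUSE_SECRET_KEY}`;
  return `Basic ${Buffer.from(creds).toString("base64")}`;
}

// List recent traces via the Langfuse public REST API.
export async function fetchTraces(limit = 100): Promise<LangfuseTraceData[]> {
  const res = await fetch(
    `${process.env.LANGFUSE_HOST}/api/public/traces?limit=${limit}`,
    { headers: { Authorization: authHeader() } },
  );
  if (!res.ok) throw new Error(`Langfuse API error: ${res.status}`);
  return (await res.json()).data;
}

// Retrieve a single trace, including its observations.
export async function fetchTrace(traceId: string): Promise<LangfuseTraceData> {
  const res = await fetch(
    `${process.env.LANGFUSE_HOST}/api/public/traces/${traceId}`,
    { headers: { Authorization: authHeader() } },
  );
  if (!res.ok) throw new Error(`Langfuse API error: ${res.status}`);
  return res.json();
}
```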
This module creates a Langfuse SDK instance for writing traces and provides two REST helpers — fetchTraces for listing and fetchTrace for retrieving a single trace by ID — using HTTP Basic auth with your Langfuse project keys.
Create src/lib/validate-env.ts:
```ts
const REQUIRED_VARS = [
  "LANGFUSE_PUBLIC_KEY",
  "LANGFUSE_SECRET_KEY",
  "LANGFUSE_HOST",
  "OTEL_EXPORTER_OTLP_ENDPOINT",
  "CRON_SECRET",
] as const;

export function validateEnv(): void {
  for (const varName of REQUIRED_VARS) {
    const val = process.env[varName];
    if (!val || val.trim() === "") {
      throw new Error(`Missing required environment variable: ${varName}`);
    }
  }
}
```
This runs at import time inside instrumentation.ts (you’ll write that next) to fail fast if any required variable is missing or empty.
Step 5: Wire up REAA observability and instrumentation
Now add the wrapper modules that bridge the REAA packages to your app, plus the instrumentation.ts entrypoint that Next.js loads on startup.
Create src/lib/runbook-obs.ts:
```ts
import {
  initLogger,
  initTracing,
  initMetrics,
  recordAgentCost as reaaRecordAgentCost,
  recordGeneration as reaaRecordGeneration,
} from "@reaatech/agent-runbook-observability";

export function initRunbookObs(): void {
  initLogger({ level: process.env.LOG_LEVEL ?? "info", service: "multi-agent-obs" });
  initTracing({
    serviceName: "multi-agent-obs",
    otlpEndpoint: process.env.OTEL_EXPORTER_OTLP_ENDPOINT,
  });
  // initMetrics options assumed; the excerpt is truncated at this point.
  initMetrics({ serviceName: "multi-agent-obs" });
}

// Re-exported so API routes can import them from this module.
export const recordGeneration = reaaRecordGeneration;
export const recordAgentCost = reaaRecordAgentCost;
```
initRunbookObs bootstraps the REAA runbook observability layer — logging, tracing, and metrics — all pointed at your OTLP endpoint.
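Create src/lib/eval-harness-obs.ts. The package name and factory functions below are assumptions; the excerpts name only the exports (initEvalObs, evalMetrics, evalDashboard), so treat this as a sketch of the shape the rest of the recipe relies on:

```ts
// src/lib/eval-harness-obs.ts: singletons from the eval harness package.
// NOTE: the package name and factory calls here are assumed, not confirmed.
import {
  getLogger,
  getTracingManager,
  getMetricsManager,
  getDashboardManager,
} from "@reaatech/agent-eval-harness-observability";

export const evalLogger = getLogger();
export const evalTracing = getTracingManager();
export const evalMetrics = getMetricsManager();     // evalMetrics.recordRun(...)
export const evalDashboard = getDashboardManager(); // evalDashboard.getSummary()

export function initEvalObs(): void {
  // Singletons are created at import time in this sketch; kept as an explicit
  // hook so instrumentation.ts has one init call per observability layer.
}
```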
This exposes singletons from the eval harness package: a logger, tracing manager, metrics manager, and dashboard manager. API routes call evalMetrics.recordRun and evalDashboard.getSummary to track response quality and surface it in the dashboard.
Create src/lib/mesh-obs.ts:
```ts
import {
  logger,
  createChildLogger,
  initOtel,
  shutdownOtel,
  recordAgentDispatchDuration as meshRecordDispatchDuration,
  recordAgentDispatchError as meshRecordDispatchError,
} from "@reaatech/agent-mesh-observability";

export function initMeshObs(): void {
  initOtel();
}

export function shutdownMeshObs(): Promise<void> {
  return shutdownOtel();
}

// Child logger carrying request/session IDs across agent dispatches.
// (createChildLogger's argument shape is assumed; the excerpt ends mid-signature.)
export function createRequestLogger(requestId: string, sessionId: string) {
  return createChildLogger(logger, { requestId, sessionId });
}

export const recordDispatchDuration = meshRecordDispatchDuration;
export const recordDispatchError = meshRecordDispatchError;
```
The mesh package tracks inter-agent communication. createRequestLogger produces a child logger with request and session IDs attached, so you can trace a conversation across multiple agent dispatches.
Create src/lib/budget-bridge.ts:
```ts
import { SpanListener } from "@reaatech/agent-budget-otel-bridge";

export class InMemoryBudgetStore {
  private store: Map<string, number> = new Map();

  recordSpend(scopeKey: string, amount: number): void {
    const current = this.store.get(scopeKey) ?? 0;
    this.store.set(scopeKey, current + amount);
  }

  getSpend(scopeKey: string): number {
    return this.store.get(scopeKey) ?? 0;
  }
}

export const budgetStore = new InMemoryBudgetStore();

// The SpanListener hooks into completed spans and records costs automatically;
// its constructor options are not shown in the excerpt, so this wiring is assumed.
export const budgetSpanListener = new SpanListener();
```
InMemoryBudgetStore keeps a running tally of spend per agent scope. The SpanListener from the budget bridge hooks into completed spans and records costs automatically.
Create src/lib/replay.ts:
```ts
import {
  RecordingEngine,
  ReplayEngine,
  LocalFileStorage,
  DiffEngine,
} from "@reaatech/agent-replay-core";

export function createRecordingEngine(): RecordingEngine {
  return new RecordingEngine();
}

export function createReplayEngine(): ReplayEngine {
  return new ReplayEngine();
}

// The factories below the truncation point are assumed from the import list.
export function createStorage(): LocalFileStorage {
  return new LocalFileStorage();
}

export function createDiffEngine(): DiffEngine {
  return new DiffEngine();
}

// recordInteraction (full body in the artifact) starts a recording session,
// opens an LLM span, captures the human message as an event, and stops the
// recording, producing a serialized trace.
```
recordInteraction starts a recording session, opens an LLM span, captures the human message as an event, and stops the recording — producing a serialized trace you can replay later without consuming LLM tokens.
Create src/instrumentation.ts:
```ts
import { NodeSDK } from "@opentelemetry/sdk-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";
import { validateEnv } from "./lib/validate-env";
import { initRunbookObs } from "./lib/runbook-obs";
import { initEvalObs } from "./lib/eval-harness-obs";
import { initMeshObs } from "./lib/mesh-obs";

// Fail fast at import time if any required env var is missing.
validateEnv();

const otlpExporter = new OTLPTraceExporter({
  url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT,
});

const sdk = new NodeSDK({
  serviceName: "multi-agent-obs",
  traceExporter: otlpExporter,
});

sdk.start();
initRunbookObs();
initEvalObs();
initMeshObs();

// Flush in-flight spans on shutdown.
process.on("SIGTERM", () => {
  void sdk.shutdown();
});

// Next.js looks for a register export in instrumentation files; the setup
// above runs when the module is first imported.
export async function register(): Promise<void> {}
```
Next.js automatically loads src/instrumentation.ts on startup. This file: validates all required env vars, configures the OTLP trace exporter, starts the OpenTelemetry SDK, and initializes the three REAA observability layers. On SIGTERM it shuts down cleanly so no in-flight spans are lost.
Step 6: Build the API routes
With the observability layer in place, you can now write the route handlers that agents and the dashboard call.
Create src/app/api/support/chat/route.ts:
```ts
import { NextResponse } from "next/server";
import { createSpan, endSpan } from "../../../../lib/otel";
import { log } from "../../../../lib/logger";
import { langfuse } from "../../../../lib/langfuse";
import { recordGeneration, recordAgentCost } from "../../../../lib/runbook-obs";
import { evalMetrics } from "../../../../lib/eval-harness-obs";

// The excerpt is truncated after the first imports; the condensed handler below
// follows the behavior described after this block. Replay-engine recording is
// elided and the metric argument shapes are assumed.
export async function POST(request: Request): Promise<NextResponse> {
  const span = createSpan("support.chat");
  const startedAt = Date.now();
  try {
    const { message, sessionId } = await request.json();

    // Validation: message and sessionId required, 4000-character cap.
    if (!message || !sessionId) {
      return NextResponse.json(
        { error: "message and sessionId are required" },
        { status: 400 },
      );
    }
    if (message.length > 4000) {
      return NextResponse.json(
        { error: "message exceeds 4000 characters" },
        { status: 400 },
      );
    }

    const reply = `Thanks for your message about: "${message}". A support agent will follow up soon.`;

    // Runbook metrics, eval metrics, and the Langfuse trace.
    recordGeneration({ agent: "support-bot", status: "success" }); // shape assumed
    recordAgentCost("support-bot", 0.0001);                        // shape assumed
    evalMetrics.recordRun({ latencyMs: Date.now() - startedAt });  // shape assumed
    langfuse.trace({ name: "support.chat", sessionId, input: message, output: reply });

    log.info({ sessionId }, "support chat handled");
    endSpan(span);
    return NextResponse.json({ reply, sessionId });
  } catch (err) {
    endSpan(span, err as Error);
    log.error({ err }, "support chat failed");
    return NextResponse.json({ error: "internal error" }, { status: 500 });
  }
}
```
This is the main support agent endpoint. Each request: validates input (message required, sessionId required, 4000-char cap), records the interaction via the replay engine, emits runbook metrics (generation status, agent cost), records eval metrics (run count, P99 latency), pushes the trace to Langfuse, and logs the outcome with Pino. On failure it records a failure metric and returns a 500.
Create src/app/api/metrics/route.ts:
```ts
import { NextResponse } from "next/server";
import { createSpan, endSpan } from "../../../lib/otel";
import { recordGeneration, recordAgentCost } from "../../../lib/runbook-obs";
import { evalDashboard } from "../../../lib/eval-harness-obs";

// The excerpt is truncated mid-handler; the body below is reconstructed from
// the description that follows this block.
export async function GET(): Promise<NextResponse> {
  const span = createSpan("metrics.summary");
  try {
    // Total runs, cost per task, P99 latency, quality scores, alerts, trends.
    const summary = evalDashboard.getSummary();
    endSpan(span);
    return NextResponse.json(summary);
  } catch (err) {
    endSpan(span, err as Error);
    return NextResponse.json({ error: "failed to load metrics" }, { status: 500 });
  }
}
```
This endpoint reads the eval dashboard summary (total runs, cost per task, P99 latency, quality scores, active alerts, trends) and returns it as JSON. Downstream monitoring systems or the dashboard UI can poll this.
Create src/app/api/replay/route.ts:
```ts
import { NextResponse } from "next/server";
import { fetchTrace } from "../../../lib/langfuse";
import { log } from "../../../lib/logger";

export async function GET(request: Request): Promise<NextResponse> {
  const url = new URL(request.url);
  const traceId = url.searchParams.get("traceId");
  // Completion below this point is reconstructed from the description that follows.
  if (!traceId) {
    return NextResponse.json(
      { error: "traceId query parameter is required" },
      { status: 400 },
    );
  }
  try {
    // Full trace, including observations, from the Langfuse REST API.
    const trace = await fetchTrace(traceId);
    return NextResponse.json(trace);
  } catch (err) {
    log.error({ err, traceId }, "failed to fetch trace");
    return NextResponse.json({ error: "trace not found" }, { status: 404 });
  }
}
```
Given a traceId query parameter, this fetches the full trace (including observations) from Langfuse. The replay viewer page calls this to load trace data.
Create src/app/api/cron/health/route.ts:
```ts
import { NextResponse } from "next/server";
import { createSpan, endSpan } from "../../../../lib/otel";
import { log } from "../../../../lib/logger";
import { langfuse } from "../../../../lib/langfuse";

// The excerpt is truncated after the imports; the condensed handler below
// follows the description after this block. runHealthCheck is a hypothetical
// stand-in for the runbook health check used in the artifact.
async function runHealthCheck(): Promise<void> {
  // Probe agent endpoints and upstream dependencies here.
}

export async function GET(request: Request): Promise<NextResponse> {
  // Authenticate the external scheduler with a Bearer token.
  const auth = request.headers.get("authorization");
  if (auth !== `Bearer ${process.env.CRON_SECRET}`) {
    return NextResponse.json({ error: "unauthorized" }, { status: 401 });
  }

  const span = createSpan("cron.health");
  try {
    await runHealthCheck();
    langfuse.trace({ name: "cron.health", output: "ok" });
    endSpan(span);
    return NextResponse.json({ status: "ok" });
  } catch (err) {
    endSpan(span, err as Error);
    log.error({ err }, "health check failed");
    // Alert Slack when a webhook is configured; otherwise the log entry stands.
    if (process.env.SLACK_WEBHOOK_URL) {
      await fetch(process.env.SLACK_WEBHOOK_URL, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ text: "Agent health check failed" }),
      });
    }
    return NextResponse.json({ status: "failed" }, { status: 500 });
  }
}
```
This is designed to be hit by an external cron scheduler (e.g., Vercel Cron Jobs, GitHub Actions). It authenticates with a Bearer token from CRON_SECRET, runs the agent runbook health check, pushes a health-check trace to Langfuse, records metrics, and on failure posts an alert to Slack.
Step 7: Build the dashboard UI
The dashboard queries Langfuse, computes aggregate metrics, and renders summary cards, a cost-by-agent breakdown, and a recent-traces table. The replay viewer lets you step through individual trace spans.
Create src/lib/dashboard-data.ts:
```ts
import { fetchTraces, type LangfuseTraceData } from "./langfuse";

// The excerpt is truncated after the import; the sketch below implements the
// computation described after this block. COST_PER_MS and the trace field
// names are assumptions.
const COST_PER_MS = Number(process.env.COST_PER_MS ?? "0.000001");

// Nearest-rank percentile over a pre-sorted array (no interpolation, for simplicity).
function percentile(sorted: number[], p: number): number {
  if (sorted.length === 0) return 0;
  const idx = Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[idx];
}

export async function getDashboardData() {
  // 100 most recent traces from Langfuse.
  const traces: LangfuseTraceData[] = await fetchTraces(100);

  const latencies = traces.map((t) => t.latency ?? 0).sort((a, b) => a - b);
  const avgLatency = latencies.length
    ? latencies.reduce((sum, l) => sum + l, 0) / latencies.length
    : 0;
  const failures = traces.filter((t) => t.status === "error").length;

  // Trace counts grouped by agent (trace) name.
  const byAgent: Record<string, number> = {};
  for (const t of traces) {
    const name = t.name ?? "unknown";
    byAgent[name] = (byAgent[name] ?? 0) + 1;
  }

  return {
    totalTraces: traces.length,
    avgLatency,
    failureRate: traces.length ? (failures / traces.length) * 100 : 0,
    p50: percentile(latencies, 50),
    p95: percentile(latencies, 95),
    p99: percentile(latencies, 99),
    // Estimated cost: duration times a configurable rate.
    estimatedCost: latencies.reduce((sum, l) => sum + l * COST_PER_MS, 0),
    byAgent,
    recentTraces: traces.slice(0, 10),
  };
}
```
This module fetches the 100 most recent traces from Langfuse and computes: average latency, failure rate as a percentage, P50/P95/P99 latencies (sorted, no interpolation for simplicity), estimated cost (duration × a configurable rate), trace counts grouped by agent name, and the 10 most recent traces for the table view.
Create src/app/dashboard/page.tsx:
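The full page is in the artifact; here is a condensed sketch of its structure, with the markup simplified and the card components inlined:

```tsx
// src/app/dashboard/page.tsx: condensed sketch of the server component
import { Suspense } from "react";
import { getDashboardData } from "@/lib/dashboard-data";

// Re-fetch Langfuse data on every request instead of caching the page.
export const dynamic = "force-dynamic";

async function DashboardContent() {
  const data = await getDashboardData();
  return (
    <main>
      <section>
        <p>Total traces: {data.totalTraces}</p>
        <p>Avg latency: {Math.round(data.avgLatency)} ms</p>
        <p>Failure rate: {data.failureRate.toFixed(1)}%</p>
        <p>Estimated cost: ${data.estimatedCost.toFixed(4)}</p>
      </section>
      {/* Cost-by-agent grid and recent-traces table omitted in this sketch. */}
    </main>
  );
}

export default function DashboardPage() {
  return (
    <Suspense fallback={<p>Loading dashboard…</p>}>
      <DashboardContent />
    </Suspense>
  );
}
```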
This is a React Server Component (no "use client" directive) with dynamic = "force-dynamic" so Next.js re-fetches data on every request. DashboardContent (an async component) calls getDashboardData(), then renders summary cards (total traces, average latency, failure rate, estimated cost), a cost-by-agent breakdown grid, and a recent-traces table. React Suspense wraps it all with skeleton cards while the Langfuse API is loading.
Create src/app/dashboard/replay/[traceId]/page.tsx (this file is long — the complete source is in the downloadable artifact; the excerpt below shows the core structure):
```tsx
"use client";

import { useEffect, useState, useCallback } from "react";
import { fetchTrace } from "@/lib/langfuse";
import type { LangfuseTraceData } from "@/lib/langfuse";

// ...the rest of the component is in the downloadable artifact.
```
This is a client component. It resolves the dynamic [traceId] segment, calls fetchTrace() to pull the full trace from Langfuse, parses observations into timeline steps, and renders a split-pane UI: a step list on the left, event details on the right. A “Run CI/CD Check” button runs a regression diff using the replay engine and displays the result. The full file (388 lines) is available in the downloadable artifact — copy it verbatim into place.
Step 8: Run the tests
The project includes 64 tests across 32 suites covering every module. Run them with:
```terminal
pnpm test
```
Expected output: all 64 tests pass with zero failures. You’ll see output like:
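```code
 Test Files  32 passed (32)
      Tests  64 passed (64)
```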
For coverage (with 90% thresholds on lines, branches, functions, and statements):
```terminal
pnpm test:coverage
```
Step 9: Start the dev server and verify
Launch the development server:
```terminal
pnpm dev
```
Expected output:
```code
  ▲ Next.js 15.2.6
  - Local:  http://localhost:3000
```
Test the support chat endpoint. In another terminal, send a message:
```terminal
curl -X POST http://localhost:3000/api/support/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Hi, I need help with my billing.", "sessionId": "sess-001"}'
```
Expected output:
```json
{"reply":"Thanks for your message about: \"Hi, I need help with my billing.\".... A support agent will follow up soon.","sessionId":"sess-001"}
```
Check the metrics endpoint:
```terminal
curl http://localhost:3000/api/metrics
```
Expected output: a JSON object with runs, cost, latency, and quality fields.
View the dashboard at http://localhost:3000/dashboard. After sending a few chat messages, you’ll see summary cards with total traces, average latency, failure rate, and estimated cost, plus a recent-traces table.
Replay a conversation by clicking a trace ID in the table or navigating directly to http://localhost:3000/dashboard/replay/<traceId>.
Next steps
Add more agent types. The src/app/api/support/chat/route.ts pattern works for any agent — copy it for a lead qualifier, appointment setter, or knowledge-base bot, each with its own trace name and cost tracking.
Deploy the health check as a real cron. Point Vercel Cron Jobs or a GitHub Actions scheduled workflow at GET /api/cron/health with the Authorization: Bearer <CRON_SECRET> header so you get Slack alerts when agents go down; a minimal vercel.json sketch follows this list.
Connect the metrics endpoint to a monitoring system. GET /api/metrics returns JSON — pipe it into Grafana, Datadog, or a simple internal status page to get real-time visibility without opening the dashboard.
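For the cron deployment above, a minimal vercel.json sketch. The five-minute schedule is only an example; when a CRON_SECRET environment variable is set, Vercel sends it as the Bearer token on cron invocations, matching the check in the route:

```json
{
  "crons": [
    { "path": "/api/cron/health", "schedule": "*/5 * * * *" }
  ]
}
```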