@reaatech/mcp-gateway-observability

npm v1.0.0

Provides OpenTelemetry instrumentation, pre-configured metrics, and health check utilities for the MCP Gateway. It exports functions for registering custom health probes and initializes standard OTel tracing and Pino-based structured logging.

Status: Pre-1.0 — APIs may change in minor versions. Pin to a specific version in production.

OpenTelemetry tracing, metrics, health checks, and structured logging for the MCP Gateway. Provides auto-configured OTel SDK initialization, pre-built gateway metrics (counters, histograms, gauges), liveness/readiness/deep-health endpoints, and structured JSON logging via Pino.

Installation

```sh
npm install @reaatech/mcp-gateway-observability
# or
pnpm add @reaatech/mcp-gateway-observability
```

Feature Overview

  • Auto-configured OpenTelemetry: initializes the SDK if OTEL_EXPORTER_OTLP_ENDPOINT is set
  • Pre-built metrics: counters for requests, auth attempts, cache hits/misses, rate limits, upstream errors, and fan-out; histograms for request and upstream latency; gauges for cache size and remaining rate-limit tokens
  • Distributed tracing: spans for auth, rate limiting, cache, validation, allowlist, upstream, and fan-out operations
  • Health checks: liveness (GET /health) and deep probes with component-level status (GET /health/deep)
  • Pluggable probes: register custom health probes for Redis, upstreams, or any other dependency
  • Structured logging: Pino-based JSON logger re-exported from @reaatech/mcp-gateway-core
  • Prometheus-style conventions: metrics use consistent label names such as tenant_id, status, and method (each metric's labels are listed in the tables below)
  • Dual ESM/CJS output: works with both import and require

Quick Start

```typescript
import express from "express";
import Redis from "ioredis"; // example client; any client exposing ping() works
import {
  getLiveness,
  getDeepHealth,
  registerProbe,
  createRedisProbe,
} from "@reaatech/mcp-gateway-observability";

const app = express();
const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");

// Register probes before serving traffic so /health/deep reports them
registerProbe("redis", createRedisProbe(() => redis.ping()));

// Liveness: quick check, always returns 200 if the process is alive
app.get("/health", (_req, res) => {
  res.json(getLiveness());
});

// Deep health: runs all registered probes; 503 when the aggregate status is unhealthy
app.get("/health/deep", async (_req, res) => {
  const status = await getDeepHealth();
  res.status(status.status === "unhealthy" ? 503 : 200).json(status);
});

app.listen(3000);
```

API Reference

Health Checks

| Export | Description |
| --- | --- |
| `registerProbe(name, probe)` | Register a named health probe function |
| `unregisterProbe(name)` | Remove a registered probe |
| `resetProbes()` | Clear all registered probes (for testing) |
| `getLiveness()` | Returns `{ status: 'healthy', version, uptimeSeconds }`; always succeeds |
| `getReadiness()` | Returns the combined readiness status from all probes |
| `getDeepHealth()` | Returns per-component health with individual latency timings |
| `createRedisProbe(pingFn, timeoutMs?)` | Factory for a Redis ping-based health probe |
| `createUpstreamProbe(url, timeoutMs?)` | Factory for an HTTP GET-based upstream health probe |
| `HealthProbe` | Type: `() => Promise<ComponentHealth>` |
| `HealthStatus` | Type: `{ status, version?, uptimeSeconds?, components? }` |
| `ComponentHealth` | Type: `{ status, message?, latencyMs? }` |
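To make the types above concrete, here is a minimal, dependency-free sketch of how a set of `HealthProbe` functions could be run to build a `HealthStatus` with per-component latency. It is an illustration under assumptions, not the package's implementation: the status union (`"healthy" | "unhealthy"`), the rule that any unhealthy component makes the overall result unhealthy, and the helper name `runProbes` are all hypothetical.

```typescript
// Hypothetical local mirrors of the documented types; the package's real
// definitions may differ (for example, additional status values).
type Status = "healthy" | "unhealthy";

interface ComponentHealth {
  status: Status;
  message?: string;
  latencyMs?: number;
}

type HealthProbe = () => Promise<ComponentHealth>;

interface HealthStatus {
  status: Status;
  components: Record<string, ComponentHealth>;
}

// Runs every probe concurrently and times each one.
// A probe that throws is reported as unhealthy instead of failing the whole check.
async function runProbes(probes: Map<string, HealthProbe>): Promise<HealthStatus> {
  const entries = await Promise.all(
    Array.from(probes).map(async ([name, probe]): Promise<[string, ComponentHealth]> => {
      const start = Date.now();
      try {
        const result = await probe();
        // Keep a probe-reported latency if present, otherwise use our own timing
        return [name, { latencyMs: Date.now() - start, ...result }];
      } catch (err) {
        return [
          name,
          {
            status: "unhealthy",
            message: err instanceof Error ? err.message : String(err),
            latencyMs: Date.now() - start,
          },
        ];
      }
    }),
  );

  const components: Record<string, ComponentHealth> = {};
  for (const [name, health] of entries) {
    components[name] = health;
  }

  // Aggregation rule (assumed): any unhealthy component makes the overall status unhealthy
  const status: Status = entries.some(([, health]) => health.status === "unhealthy")
    ? "unhealthy"
    : "healthy";
  return { status, components };
}
```

Running probes concurrently keeps the deep-health response roughly as slow as the slowest probe rather than the sum of all of them.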

Metrics

All metrics are recorded through the OpenTelemetry API. The meter name is `SERVICE_NAME` (`mcp-gateway`).

Counters

| Metric | Labels | Description |
| --- | --- | --- |
| `gateway.requests.total` | `tenant_id`, `status` | Total requests processed |
| `gateway.auth.attempts` | `method`, `result` | Auth attempts by type and outcome |
| `gateway.auth.failures` | `reason` | Failed auth attempts |
| `gateway.cache.hits` | `tool` | Cache hit count per tool |
| `gateway.cache.misses` | `tool` | Cache miss count per tool |
| `gateway.rate_limit.exceeded` | `tenant_id` | Rate limit exceeded count |
| `gateway.allowlist.denied` | `tenant_id`, `tool` | Allowlist denial count |
| `gateway.upstream.requests` | `upstream`, `method` | Upstream request count |
| `gateway.upstream.errors` | `upstream`, `error_type` | Upstream error count |
| `gateway.fanout.upstreams` | `strategy` | Fan-out upstream count |
| `gateway.validation.errors` | `type` | Validation error count |
| `gateway.audit.events` | `event_type` | Audit event count |

Histograms

| Metric | Labels | Description |
| --- | --- | --- |
| `gateway.requests.duration_ms` | `tenant_id`, `method` | Request processing time (ms) |
| `gateway.upstream.latency_ms` | `upstream` | Upstream call latency (ms) |

Gauges

| Metric | Labels | Description |
| --- | --- | --- |
| `gateway.cache.size` | (none) | Current cache entry count |
| `gateway.rate_limit.remaining` | `tenant_id` | Remaining rate limit tokens |
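The histograms above record durations in milliseconds. If you add a histogram of your own next to them, a common pattern is to time the work and record in a `finally` block so failed calls are measured too. The sketch below is dependency-free; `timeMs`, `RecordFn`, and `Attributes` are hypothetical names, and the label values in the comments are examples only.

```typescript
// Attribute bag for one measurement, e.g. { tenant_id: "acme", method: "tools/call" }
type Attributes = Record<string, string>;

// Callback that records one duration; with OpenTelemetry it could wrap histogram.record
type RecordFn = (valueMs: number, attributes: Attributes) => void;

// Times an async operation and records its duration in milliseconds,
// whether the operation resolves or rejects.
async function timeMs<T>(
  record: RecordFn,
  attributes: Attributes,
  work: () => Promise<T>,
): Promise<T> {
  const start = Date.now();
  try {
    return await work();
  } finally {
    // Runs on both success and failure, so errors still produce a measurement
    record(Date.now() - start, attributes);
  }
}
```

With `@opentelemetry/api`, `record` could be `(ms, attrs) => histogram.record(ms, attrs)` for a histogram obtained from `metrics.getMeter(...).createHistogram(...)`; the meter and histogram names would be your own choice.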

Utility

| Export | Description |
| --- | --- |
| `resetMetricsState()` | Reset all metric values (for testing only) |

Tracing

| Export | Description |
| --- | --- |
| `getTracer()` | Get the OpenTelemetry tracer |
| `startSpan(name, options?)` | Start a new span with standard attributes |
| `endSpan(span, status?)` | End a span with an optional status code |
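Every `startSpan` call needs a matching `endSpan`, including when the traced operation throws. One way to guarantee that is a small wrapper. The sketch below is dependency-free and hypothetical (`withSpan` is not a package export); the start and end functions are passed in so it does not assume the exact types of `startSpan`, `endSpan`, or the status argument.

```typescript
// Hypothetical helper: pairs a span start with exactly one span end.
// `end` receives the thrown value when the work fails, so the caller can map it
// to whatever status the real endSpan expects.
async function withSpan<S, T>(
  start: () => S,
  end: (span: S, error?: unknown) => void,
  work: (span: S) => Promise<T>,
): Promise<T> {
  const span = start();
  let failed = false;
  let error: unknown;
  try {
    return await work(span);
  } catch (err) {
    failed = true;
    error = err;
    throw err;
  } finally {
    // Runs once on both paths, so the span is always ended
    if (failed) {
      end(span, error);
    } else {
      end(span);
    }
  }
}
```

A caller might pass `() => startSpan("my.operation")` as `start` and an `end` that calls `endSpan(span, ...)`; the span name here is an illustrative placeholder.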

OpenTelemetry Lifecycle

| Export | Description |
| --- | --- |
| `setupOTel()` | Initialize the OTel SDK (called automatically on import when `OTEL_EXPORTER_OTLP_ENDPOINT` is set) |
| `shutdownOTel()` | Gracefully shut down OTel, flushing pending telemetry |

Auto-Init Behavior

The SDK initializes automatically when the package is imported, but only if the `OTEL_EXPORTER_OTLP_ENDPOINT` environment variable is set. If it is unset, initialization is skipped; in production, a warning is also logged. Logs are always emitted as raw JSON, with no pretty-printing in production.
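The auto-init rules reduce to a small decision table. This sketch restates them as a pure function so they are easy to check; it is not the package's code. Detecting production via `NODE_ENV` and treating an empty endpoint value as unset are both assumptions.

```typescript
// Possible outcomes of the documented auto-init rules
type InitDecision = "initialize" | "skip-with-warning" | "skip";

function decideOTelInit(env: Record<string, string | undefined>): InitDecision {
  const endpoint = env.OTEL_EXPORTER_OTLP_ENDPOINT;
  // Assumption: an empty or whitespace-only value counts as unset
  if (endpoint !== undefined && endpoint.trim() !== "") {
    return "initialize";
  }
  // Assumption: "production" is detected via NODE_ENV
  return env.NODE_ENV === "production" ? "skip-with-warning" : "skip";
}
```

Passing `process.env` type-checks directly, since its values are `string | undefined`.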

Usage Patterns

Enabling OpenTelemetry

```sh
# Point at an OTLP/HTTP receiver (e.g., an OpenTelemetry Collector or Jaeger)
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318

# The SDK auto-initializes on import; no code needed
node dist/index.js
```

Custom health probe

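```typescript
import { Pool } from "pg"; // example database client; substitute your own
import { registerProbe } from "@reaatech/mcp-gateway-observability";

const db = new Pool({ connectionString: process.env.DATABASE_URL });

registerProbe("database", async () => {
  const start = Date.now();
  try {
    // Cheap round trip to confirm the database is reachable
    await db.query("SELECT 1");
    return { status: "healthy", latencyMs: Date.now() - start };
  } catch (err) {
    // `err` is `unknown` under strict TypeScript, so narrow before reading .message
    const message = err instanceof Error ? err.message : String(err);
    return { status: "unhealthy", message, latencyMs: Date.now() - start };
  }
});
```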
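`createRedisProbe` and `createUpstreamProbe` accept a `timeoutMs`, but a hand-written probe like the one above has no built-in bound, so a hung dependency could stall the deep-health response. One option is to race the probe against a timer. `withTimeout` below is a hypothetical helper, not a package export, and `ComponentHealth` is a local mirror of the documented type with assumed status values.

```typescript
// Local mirror of the documented ComponentHealth shape (status values assumed)
interface ComponentHealth {
  status: "healthy" | "unhealthy";
  message?: string;
  latencyMs?: number;
}

type HealthProbe = () => Promise<ComponentHealth>;

// Wraps a probe so it reports unhealthy after `timeoutMs` instead of hanging
function withTimeout(probe: HealthProbe, timeoutMs: number): HealthProbe {
  return () => {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timedOut = new Promise<ComponentHealth>((resolve) => {
      timer = setTimeout(
        () => resolve({ status: "unhealthy", message: `timed out after ${timeoutMs}ms` }),
        timeoutMs,
      );
    });
    // A probe that rejects is reported as unhealthy rather than failing the check
    const guarded = probe().catch((err: unknown): ComponentHealth => ({
      status: "unhealthy",
      message: err instanceof Error ? err.message : String(err),
    }));
    // Whichever settles first wins; the timer is cleared in both cases
    return Promise.race([guarded, timedOut]).finally(() => clearTimeout(timer));
  };
}
```

The wrapped function could then be registered as `registerProbe("database", withTimeout(databaseProbe, 2000))`, where `databaseProbe` is a hypothetical variable holding a probe like the one above and 2000 ms is an arbitrary example.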

Graceful shutdown with OTel flush

```typescript
import { shutdownOTel } from "@reaatech/mcp-gateway-observability";

// On SIGTERM, flush pending spans and metrics before the process exits
process.on("SIGTERM", async () => {
  await shutdownOTel();
  process.exit(0);
});
```
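If the OTLP endpoint is slow or unreachable, flushing can take a while, and the platform may kill the process once its shutdown grace period ends. One option is to bound the wait. This sketch races a flush function against a deadline and reports which one finished first; `flushWithDeadline` is a hypothetical helper, not a package export.

```typescript
type FlushOutcome = "flushed" | "timed-out" | "failed";

// Waits for `flush`, but gives up after `deadlineMs` so a stuck exporter
// cannot hold the process open indefinitely.
async function flushWithDeadline(
  flush: () => Promise<unknown>,
  deadlineMs: number,
): Promise<FlushOutcome> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<FlushOutcome>((resolve) => {
    timer = setTimeout(() => resolve("timed-out"), deadlineMs);
  });
  // Map success and failure to outcomes so the race never rejects
  const work = flush().then(
    (): FlushOutcome => "flushed",
    (): FlushOutcome => "failed",
  );
  try {
    return await Promise.race([work, deadline]);
  } finally {
    clearTimeout(timer);
  }
}
```

In a SIGTERM handler like the one above, you could `await flushWithDeadline(shutdownOTel, 5000)` and pick the exit code from the result; 5000 ms is an arbitrary example, not a package default.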

License

MIT