# vLLM Observability Suite for SMB AI Operations
A prebuilt observability stack with OpenTelemetry traces and dashboards for any AI agent that uses vLLM as its inference backend. It automatically instruments every LLM call to vLLM's OpenAI-compatible endpoint, exports spans to Langfuse, tracks per-model costs, and renders real-time dashboards.
## Features
- **Automatic OTel span instrumentation** — wraps the OpenAI-compatible vLLM client via `@reaatech/otel-genai-semconv-openai`; every `chat.completions.create()` call emits GenAI semantic convention spans with request metadata, token usage, and streaming metrics
- **Span export to Langfuse** — converts OTel spans to Langfuse trace/observation format via `@reaatech/otel-genai-semconv-exporters` and pushes them to the Langfuse API
- **Per-model cost tracking** — calculates LLM API costs across models using `@reaatech/llm-cost-telemetry-calculator` with built-in pricing tables and custom model pricing support
- **Structured logging** — Pino-based logging with PII redaction via `@reaatech/llm-cost-telemetry-observability`
- **SQLite aggregation** — background job collects span metrics and stores them in SQLite via Drizzle ORM + libSQL for historical queries and dashboard rendering
- **Real-time dashboard** — Next.js App Router page at `/dashboard` showing total spend, token usage, model cost breakdown, and recent spans
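
The arithmetic behind per-model cost tracking can be sketched in a few lines. This is an illustration only, not the API of `@reaatech/llm-cost-telemetry-calculator`: the type names, function names, model names, and prices below are all hypothetical placeholders.

```typescript
// Hypothetical pricing table in USD per 1M tokens. Placeholder values only;
// these are not real prices and not the calculator package's built-in table.
interface ModelPricing {
  inputPerMillionUsd: number;
  outputPerMillionUsd: number;
}

interface TokenUsage {
  model: string;
  inputTokens: number;
  outputTokens: number;
}

const PRICING: Record<string, ModelPricing> = {
  "example-model-small": { inputPerMillionUsd: 0.5, outputPerMillionUsd: 1.5 },
  "example-model-large": { inputPerMillionUsd: 3, outputPerMillionUsd: 15 },
};

// Cost of one call: (tokens * price per million) / 1e6, summed over input and output.
function costUsd(
  usage: TokenUsage,
  pricing: Record<string, ModelPricing> = PRICING,
): number {
  const p = pricing[usage.model];
  if (p === undefined) {
    throw new Error(`No pricing configured for model "${usage.model}"`);
  }
  return (
    (usage.inputTokens * p.inputPerMillionUsd +
      usage.outputTokens * p.outputPerMillionUsd) /
    1_000_000
  );
}

// Accumulate total cost per model across many calls.
function costByModel(usages: TokenUsage[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const u of usages) {
    totals.set(u.model, (totals.get(u.model) ?? 0) + costUsd(u));
  }
  return totals;
}
```

Throwing on an unknown model is one possible design choice: it surfaces missing custom pricing early instead of silently recording a zero cost.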
## Architecture
```
src/instrumentation.ts → @opentelemetry/sdk-node → OTLPTraceExporter (OTLP endpoint)
                                                  + LangfuseExporter (buffers GenAI spans)
        ↓
src/services/span-aggregator.ts ← reads LangfuseExporter buffer
        → pushes to Langfuse API via langfuse SDK
        → persists to SQLite via Drizzle ORM + libSQL (local .db file)
        ↓
app/api/spans/route.ts  → GET /api/spans?limit=&offset=&model=&status=
app/api/costs/route.ts  → GET /api/costs?groupBy=model|day&from=&to=
app/dashboard/page.tsx  → server component fetching /api/costs + /api/spans
```
## Prerequisites
- Node.js >=22
- pnpm (install via `corepack enable && corepack prepare pnpm@10 --activate`)
- A running vLLM instance with an OpenAI-compatible endpoint (default: `http://localhost:8000/v1`)
- A Langfuse account (cloud at https://langfuse.com or self-hosted) with public/secret API keys
## Environment Variables
| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `VLLM_BASE_URL` | No | `http://localhost:8000/v1` | vLLM OpenAI-compatible API endpoint |
| `VLLM_API_KEY` | No | `""` | API key for vLLM (if configured) |
| `LANGFUSE_PUBLIC_KEY` | Yes | — | Langfuse project public key |
| `LANGFUSE_SECRET_KEY` | Yes | — | Langfuse project secret key |
| `LANGFUSE_BASE_URL` | No | `https://cloud.langfuse.com` | Langfuse API base URL |
| `OTLP_ENDPOINT` | No | `http://localhost:4318/v1/traces` | OTLP HTTP trace collector endpoint |
| `DATABASE_URL` | No | `file:local.db` | libSQL database URL |
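
As a rough illustration of how the table above maps to runtime configuration, the sketch below applies the documented defaults and fails fast when a required key is missing. The `SuiteConfig` shape and the `loadConfig` helper are hypothetical names introduced for this example. The project lists Zod for validation, so its actual loader may look different; plain TypeScript is used here to keep the sketch dependency-free.

```typescript
// Hypothetical config shape mirroring the environment variable table.
interface SuiteConfig {
  vllmBaseUrl: string;
  vllmApiKey: string;
  langfusePublicKey: string;
  langfuseSecretKey: string;
  langfuseBaseUrl: string;
  otlpEndpoint: string;
  databaseUrl: string;
}

type Env = Record<string, string | undefined>;

// Read a required variable, rejecting both unset and empty values.
function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Apply the documented defaults for optional variables.
function loadConfig(env: Env): SuiteConfig {
  return {
    vllmBaseUrl: env["VLLM_BASE_URL"] ?? "http://localhost:8000/v1",
    vllmApiKey: env["VLLM_API_KEY"] ?? "",
    langfusePublicKey: requireVar(env, "LANGFUSE_PUBLIC_KEY"),
    langfuseSecretKey: requireVar(env, "LANGFUSE_SECRET_KEY"),
    langfuseBaseUrl: env["LANGFUSE_BASE_URL"] ?? "https://cloud.langfuse.com",
    otlpEndpoint: env["OTLP_ENDPOINT"] ?? "http://localhost:4318/v1/traces",
    databaseUrl: env["DATABASE_URL"] ?? "file:local.db",
  };
}
```

In a Node.js process this would typically be called once at startup as `loadConfig(process.env)`.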
## Getting Started
```bash
# Install dependencies
pnpm install

# Copy and fill in environment variables
cp .env.example .env
# Edit .env with your vLLM endpoint and Langfuse credentials

# Start the dev server
pnpm dev

# Visit http://localhost:3000/dashboard to see metrics
# Make a vLLM chat call to generate span data
```
## API Reference
### `GET /api/health`
Returns service health status.

**Response:** `{ status: "ok", service: "vllm-observability", timestamp: "<ISO-8601>" }`
### `GET /api/spans`
Returns paginated aggregated span records from SQLite.

**Query params:** `limit` (default 50), `offset` (default 0), `model` (optional filter), `status` (optional filter)

**Response:** `{ spans: SpanRow[], total: number }`
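
A client-side helper for building the `/api/spans` query string might look like the sketch below. `SpansQuery` and `buildSpansUrl` are hypothetical names, and the base URL assumes the dev server address from Getting Started. The fields of `SpanRow` are not documented here, so the sketch stops at the URL rather than guessing a response type.

```typescript
// Query parameters documented for GET /api/spans; all are optional.
interface SpansQuery {
  limit?: number;
  offset?: number;
  model?: string;
  status?: string;
}

// Build the request URL, URL-encoding values and omitting unset parameters
// so the server-side defaults (limit 50, offset 0) apply.
function buildSpansUrl(baseUrl: string, query: SpansQuery = {}): string {
  const params: string[] = [];
  const add = (key: string, value: string | number | undefined): void => {
    if (value !== undefined) {
      params.push(`${encodeURIComponent(key)}=${encodeURIComponent(String(value))}`);
    }
  };
  add("limit", query.limit);
  add("offset", query.offset);
  add("model", query.model);
  add("status", query.status);

  const queryString = params.length > 0 ? `?${params.join("&")}` : "";
  // Strip trailing slashes from the base URL to avoid "//api/spans".
  return `${baseUrl.replace(/\/+$/, "")}/api/spans${queryString}`;
}
```

For example, fetching the third page of 20 spans would use `buildSpansUrl("http://localhost:3000", { limit: 20, offset: 40 })` and pass the result to `fetch`.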
### `GET /api/costs`
Returns aggregated cost data grouped by model or day.

**Query params:** `from` (ISO date), `to` (ISO date), `groupBy` (`"model"` or `"day"`)

**Response:** `{ costs: Array<{ key, costUsd, inputTokens, outputTokens }>, totalCostUsd, period }`
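
To show how a consumer might work with this response, the sketch below ranks entries by cost and computes each one's share of total spend. The field types are assumptions inferred from the names in the response shape above, `period` is left as `unknown` because its structure is not documented here, and `costShares` is a hypothetical helper.

```typescript
// Assumed types for the documented response shape.
interface CostEntry {
  key: string; // model name or day, depending on groupBy
  costUsd: number;
  inputTokens: number;
  outputTokens: number;
}

interface CostsResponse {
  costs: CostEntry[];
  totalCostUsd: number;
  period: unknown; // structure not documented; left opaque
}

interface CostShare {
  key: string;
  costUsd: number;
  sharePct: number;
}

// Rank entries by cost (highest first) and attach each entry's
// percentage of the total. A zero total yields 0% for every entry.
function costShares(res: CostsResponse): CostShare[] {
  return [...res.costs]
    .sort((a, b) => b.costUsd - a.costUsd)
    .map((entry) => ({
      key: entry.key,
      costUsd: entry.costUsd,
      sharePct: res.totalCostUsd > 0 ? (entry.costUsd / res.totalCostUsd) * 100 : 0,
    }));
}
```

Copying the array before sorting keeps the original response untouched, which matters if the same object is reused elsewhere on the page.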
## Tech Stack
- **Framework:** Next.js 16+ (App Router)
- **Language:** TypeScript (strict, NodeNext module resolution)
- **Observability:** OpenTelemetry, @reaatech/otel-genai-semconv-core, @reaatech/otel-genai-semconv-openai, @reaatech/otel-genai-semconv-exporters
- **Cost tracking:** @reaatech/llm-cost-telemetry-calculator, @reaatech/llm-cost-telemetry-observability
- **Database:** Drizzle ORM, @libsql/client (libSQL/SQLite)
- **Validation:** Zod
- **Testing:** Vitest, MSW
## License
MIT — see [LICENSE](./LICENSE).