# Vercel AI Gateway Reliability Suite for SMB AI Operations
 
A self-serve reliability dashboard that monitors, replays, and self-heals AI workflows running through Vercel AI Gateway, so small teams can keep LLM apps running 24/7.
 
## Problem
 
SMBs deploying LLM features on Vercel have no visibility into why a response was slow, failed, or drifted. Without replays, health checks, and incident runbooks, a weekend spike in errors means lost revenue and frantic debugging on Monday.
 
## Solution
 
This suite builds on REAA's operational packages:
- `agent-runbook-health-checks` — health probe generation and aggregation
- `agent-runbook-incident` — incident workflow and escalation management
- `agent-runbook-service-map` — service dependency analysis and graph export
- `agent-runbook-rollback` — rollback procedure generation per platform
- `agent-replay` / `agent-replay-core` — LLM trace recording, deterministic replay, and diff
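
As a rough illustration of what record, replay, and diff mean in the last item, here is a minimal, dependency-free sketch. It does not use the `agent-replay` API, which is not documented in this README; `Complete`, `Trace`, `TraceStore`, `traceKey`, and `diffResponse` are hypothetical names introduced for illustration only.

```typescript
// Hypothetical sketch: none of these names come from agent-replay.
type Complete = (model: string, prompt: string) => Promise<string>;

interface Trace {
  model: string;
  prompt: string;
  response: string;
}

// Identical model + prompt pairs always map to the same key.
function traceKey(model: string, prompt: string): string {
  return JSON.stringify([model, prompt]);
}

class TraceStore {
  private traces = new Map<string, Trace>();

  // Wraps a live completion function and records each call it serves.
  record(live: Complete): Complete {
    return async (model, prompt) => {
      const response = await live(model, prompt);
      this.traces.set(traceKey(model, prompt), { model, prompt, response });
      return response;
    };
  }

  // Returns a completion function that only serves recorded responses,
  // so a replay never reaches the network and always gives the same answer.
  replay(): Complete {
    return async (model, prompt) => {
      const trace = this.traces.get(traceKey(model, prompt));
      if (!trace) throw new Error(`no recorded trace for model "${model}"`);
      return trace.response;
    };
  }
}

// Compares a recorded response with a fresh one from the live model.
function diffResponse(recorded: string, fresh: string) {
  return { changed: recorded !== fresh, recorded, fresh };
}
```

In this sketch a test or an incident investigation would call `store.replay()` in place of the live client, then pass the recorded and fresh responses to `diffResponse` to spot drift.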
 
It continuously probes live endpoints, records every LLM interaction for deterministic replay, and automatically triggers incident diagnostics when it detects an anomaly. A Next.js dashboard surfaces health status, incident workflows, replay traces, service maps, and rollback controls.
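
The probe, aggregate, and trigger loop described above could look roughly like the following sketch. It is not the `agent-runbook-health-checks` API; the `ProbeResult` shape, the status names, the thresholds, and the `onAnomaly` callback are all assumptions made for illustration.

```typescript
// Hypothetical types; not taken from agent-runbook-health-checks.
type ProbeResult = { endpoint: string; ok: boolean; latencyMs: number };
type HealthStatus = "healthy" | "degraded" | "down";

interface Thresholds {
  maxErrorRate: number; // tolerated fraction of failed probes, e.g. 0.2
  maxP95LatencyMs: number; // tolerated 95th percentile latency
}

// Nearest-rank percentile over an unsorted sample.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Reduces one round of probe results to a single status.
function aggregate(results: ProbeResult[], t: Thresholds): HealthStatus {
  if (results.length === 0) return "down"; // no data is treated as an outage
  const failures = results.filter((r) => !r.ok).length;
  if (failures === results.length) return "down";
  const errorRate = failures / results.length;
  const p95 = percentile(
    results.filter((r) => r.ok).map((r) => r.latencyMs),
    95,
  );
  if (errorRate > t.maxErrorRate || p95 > t.maxP95LatencyMs) return "degraded";
  return "healthy";
}

// Probes one endpoint with the global fetch (Node 18+) and a timeout.
async function probe(endpoint: string, timeoutMs = 5000): Promise<ProbeResult> {
  const started = Date.now();
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetch(endpoint, { signal: controller.signal });
    return { endpoint, ok: res.ok, latencyMs: Date.now() - started };
  } catch {
    return { endpoint, ok: false, latencyMs: Date.now() - started };
  } finally {
    clearTimeout(timer);
  }
}

// One monitoring round: probe everything, aggregate, and hand any
// non-healthy status to a caller-supplied diagnostics hook.
async function checkOnce(
  endpoints: string[],
  t: Thresholds,
  onAnomaly: (status: HealthStatus, results: ProbeResult[]) => Promise<void>,
): Promise<HealthStatus> {
  const results = await Promise.all(endpoints.map((e) => probe(e)));
  const status = aggregate(results, t);
  if (status !== "healthy") await onAnomaly(status, results);
  return status;
}
```

Keeping `aggregate` a pure function makes the thresholds easy to unit test without network access.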
 
## Getting Started
 
```bash
pnpm install
cp .env.example .env
# Fill in your environment variables
pnpm dev
```
 
## Scripts
 
- `pnpm typecheck` — TypeScript type checking
- `pnpm lint` — ESLint linting
- `pnpm test` — Vitest test suite with coverage
 
## Architecture
 
```
src/
  app/                 Next.js App Router pages and API routes
  lib/                 Core library modules
  types/               Shared TypeScript types
  workers/             Scheduled job runners
  instrumentation.ts   Next.js instrumentation hook
tests/                 Test files (co-located tests)
```
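
To show how `instrumentation.ts` and `workers/` could fit together, here is a minimal sketch. `register` is the function Next.js calls when a server instance starts, and `NEXT_RUNTIME` is set by Next.js; `Job`, `startJobs`, and the placeholder job are hypothetical and not taken from this repository.

```typescript
// Hypothetical job shape for runners under src/workers/.
type Job = { name: string; intervalMs: number; run: () => Promise<void> };

// Starts every job on its own interval and returns a function that stops them all.
function startJobs(jobs: Job[]): () => void {
  const timers = jobs.map((job) =>
    setInterval(() => {
      // A failing run is logged and does not prevent later runs.
      job.run().catch((err) => console.error(`[worker:${job.name}]`, err));
    }, job.intervalMs),
  );
  return () => timers.forEach((timer) => clearInterval(timer));
}

// Next.js calls register() once when a new server instance starts.
export async function register(): Promise<void> {
  // register() runs in every runtime; only start timers under Node.js.
  if (process.env.NEXT_RUNTIME !== "nodejs") return;
  startJobs([
    {
      name: "health-probes", // placeholder job for illustration
      intervalMs: 60_000,
      run: async () => console.log("running health probes"),
    },
  ]);
}
```

Interval timers inside the server process only run while an instance is alive, so on a serverless deployment an external scheduler may be a better fit for the same jobs.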
 
See [DEV_PLAN.md](./DEV_PLAN.md) for the full implementation plan.