# vLLM Agent Quality Gate for On-Prem SMB Support Bots
 
> Automated regression testing for self-hosted vLLM agent pipelines, with CI gates that block deployment when support-bot quality drops.
 
A tutorial-style reference solution from [reaatech.com](https://reaatech.com) that demonstrates how to build production-grade AI systems with the `@reaatech/*` package family.
 
## Description
 
This recipe provides a quality-gate pipeline for on-premises, vLLM-hosted support bots. It loads golden trajectory datasets, runs the agent against a vLLM endpoint, compares the resulting trajectories with the golden references, enforces configurable quality thresholds, and exports metrics to Langfuse for observability.
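Because vLLM serves an OpenAI-compatible API, each evaluation turn can be sent as an ordinary `POST /chat/completions` request. The sketch below shows one way to build that request from the configuration values; `buildChatRequest`, `ChatMessage`, and `VllmSettings` are hypothetical names for illustration, not code from this repository's `src/` tree.

```typescript
// Hypothetical helper (not from this repository): builds a request for
// vLLM's OpenAI-compatible chat completions endpoint.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface VllmSettings {
  endpoint: string; // e.g. "http://localhost:8000/v1"
  apiKey?: string; // sent only when set (vLLM checks it if started with --api-key)
  model: string;
}

interface ChatRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildChatRequest(settings: VllmSettings, messages: ChatMessage[]): ChatRequest {
  // Drop trailing slashes so "http://host/v1/" and "http://host/v1" match.
  const base = settings.endpoint.replace(/\/+$/, "");
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (settings.apiKey) {
    headers["Authorization"] = `Bearer ${settings.apiKey}`;
  }
  return {
    url: `${base}/chat/completions`,
    method: "POST",
    headers,
    // temperature 0 selects greedy decoding, which keeps runs as repeatable
    // as the server allows when comparing against golden trajectories.
    body: JSON.stringify({ model: settings.model, messages, temperature: 0 }),
  };
}
```

With Node 22's built-in `fetch`, the request can be sent as `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`; the reply follows the OpenAI chat completions schema, with the assistant message under `choices[0].message`.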
 
## Prerequisites
 
- **Node.js** >= 22
- **pnpm** 10.x
- A running **vLLM instance** with OpenAI-compatible API enabled
- A **Langfuse** account (optional — metric tracking works without it)
 
## Installation
 
```bash
pnpm install
cp .env.example .env
# Edit .env with your vLLM endpoint, API key, model, and Langfuse credentials
```
 
## Configuration
 
| Variable | Default | Description |
|----------|---------|-------------|
| `VLLM_ENDPOINT` | `http://localhost:8000/v1` | vLLM OpenAI-compatible base URL |
| `VLLM_API_KEY` | — | API key for vLLM authentication |
| `VLLM_MODEL` | `deepseek-v4-flash` | Model name deployed on vLLM |
| `LANGFUSE_PUBLIC_KEY` | — | Langfuse project public key |
| `LANGFUSE_SECRET_KEY` | — | Langfuse project secret key |
| `LANGFUSE_BASE_URL` | `https://cloud.langfuse.com` | Langfuse host |
| `GOLDEN_DATA_DIR` | `./golden` | Directory for golden JSONL trajectory files |
| `EVAL_RESULTS_DIR` | `./results` | Output directory for evaluation results |
| `GATE_PRESET` | `standard` | Gate preset name (`standard`, `strict`, `lenient`) |
| `QUALITY_THRESHOLD` | `0.9` | Minimum pass rate for CI gate (0–1) |
| `CI_MODE` | `false` | Set to `true` to have the CLI call `process.exit` with the gate result in CI pipelines |
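`src/eval/config.ts` validates these variables with Zod. As a dependency-free illustration of the same idea, the sketch below applies the defaults from the table and rejects out-of-range values. The `loadConfig` name, the `EvalConfig` shape, and the rule that Langfuse export is enabled only when both keys are set are all assumptions of this sketch, not the repository's actual behavior.

```typescript
// Hypothetical, dependency-free sketch of config parsing using the defaults
// from the table above. The repository's config.ts uses Zod instead.
const GATE_PRESETS = ["standard", "strict", "lenient"] as const;
type GatePreset = (typeof GATE_PRESETS)[number];

interface EvalConfig {
  vllmEndpoint: string;
  vllmApiKey?: string;
  vllmModel: string;
  langfuse?: { publicKey: string; secretKey: string; baseUrl: string };
  goldenDataDir: string;
  evalResultsDir: string;
  gatePreset: GatePreset;
  qualityThreshold: number;
  ciMode: boolean;
}

type Env = Record<string, string | undefined>;

function isGatePreset(value: string): value is GatePreset {
  return (GATE_PRESETS as readonly string[]).includes(value);
}

function loadConfig(env: Env): EvalConfig {
  const preset = env.GATE_PRESET ?? "standard";
  if (!isGatePreset(preset)) {
    throw new Error(`GATE_PRESET must be one of ${GATE_PRESETS.join(", ")}; got "${preset}"`);
  }
  const threshold = Number(env.QUALITY_THRESHOLD ?? "0.9");
  if (!Number.isFinite(threshold) || threshold < 0 || threshold > 1) {
    throw new Error("QUALITY_THRESHOLD must be a number between 0 and 1");
  }
  // Assumption: Langfuse export stays disabled unless both keys are provided.
  const publicKey = env.LANGFUSE_PUBLIC_KEY;
  const secretKey = env.LANGFUSE_SECRET_KEY;
  const langfuse =
    publicKey && secretKey
      ? { publicKey, secretKey, baseUrl: env.LANGFUSE_BASE_URL ?? "https://cloud.langfuse.com" }
      : undefined;
  return {
    vllmEndpoint: env.VLLM_ENDPOINT ?? "http://localhost:8000/v1",
    vllmApiKey: env.VLLM_API_KEY,
    vllmModel: env.VLLM_MODEL ?? "deepseek-v4-flash",
    langfuse,
    goldenDataDir: env.GOLDEN_DATA_DIR ?? "./golden",
    evalResultsDir: env.EVAL_RESULTS_DIR ?? "./results",
    gatePreset: preset,
    qualityThreshold: threshold,
    ciMode: env.CI_MODE === "true",
  };
}
```

In Node, the function would be called as `loadConfig(process.env)`; taking the environment as a parameter keeps it easy to unit-test with plain objects.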
 
## Usage
 
```bash
# Run the evaluation pipeline locally
pnpm tsx src/index.ts
 
# Or with specific directories
GOLDEN_DATA_DIR=./my-golden EVAL_RESULTS_DIR=./my-results pnpm tsx src/index.ts
```
 
## CI Integration
 
### GitLab CI
 
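```yaml
eval-quality-gate:
  stage: test
  image: node:22
  before_script:
    - npm install -g pnpm@10
  script:
    - pnpm install --frozen-lockfile
    - pnpm tsx src/index.ts
  variables:
    CI_MODE: "true"
    QUALITY_THRESHOLD: "0.85"
    # Set VLLM_ENDPOINT, VLLM_API_KEY, and the Langfuse keys as masked CI/CD variables
  only:
    - merge_requests
```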
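For GitLab to block the merge request, the job has to exit with a non-zero status when quality falls below `QUALITY_THRESHOLD`. A minimal sketch of that decision, assuming each evaluated trajectory yields a boolean `passed` flag; `TrajectoryResult`, `evaluateGate`, and `exitCodeFor` are hypothetical names, not the API of `src/lib/gate.ts` or the types in `src/eval/types.ts`.

```typescript
// Hypothetical sketch of the CI gate decision; names and shapes are
// illustrative, not the repository's gate.ts API.
interface TrajectoryResult {
  id: string;
  passed: boolean;
}

interface GateOutcome {
  passRate: number;
  threshold: number;
  passed: boolean;
  failedIds: string[];
}

function evaluateGate(results: TrajectoryResult[], threshold: number): GateOutcome {
  // Design choice of this sketch: an empty run fails instead of passing silently.
  if (results.length === 0) {
    return { passRate: 0, threshold, passed: false, failedIds: [] };
  }
  const failedIds = results.filter((r) => !r.passed).map((r) => r.id);
  const passRate = (results.length - failedIds.length) / results.length;
  return { passRate, threshold, passed: passRate >= threshold, failedIds };
}

// Exit status a CLI would pass to process.exit when CI_MODE is "true".
function exitCodeFor(outcome: GateOutcome): number {
  return outcome.passed ? 0 : 1;
}
```

In the CLI, `process.exit(exitCodeFor(outcome))` would run only when `CI_MODE` is `true`, so local runs still print their results without forcing an exit status.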
 
## Architecture
 
```
Golden Trajectories (JSONL)
       │
       ▼
  ┌──────────────┐
  │ Load Golden  │
  │ Trajectories │
  └──────┬───────┘
         │
         ▼
  ┌──────────────┐     ┌──────────────────┐
  │ vLLM Agent   │────▶│ Compare Against  │
  │ Evaluation   │     │ Golden Reference │
  └──────────────┘     └────────┬─────────┘
                                │
         ┌──────────────────────┤
         ▼                      ▼
  ┌──────────────┐     ┌──────────────┐
  │ Quality Gate │     │ Export       │
  │ Check        │     │ Metrics to   │
  │ (pass/fail)  │     │ Langfuse     │
  └──────────────┘     └──────────────┘
```
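The "Compare Against Golden Reference" step can be implemented in many ways. As one hedged illustration, the sketch below scores a candidate trajectory turn by turn: a turn matches when its role agrees with the golden turn and the same tools were called in the same order. This structural rule, and the names `compareTrajectory` and `Comparison`, are assumptions for illustration; `src/lib/golden.ts` may use different criteria.

```typescript
// Hypothetical comparison rule for illustration: a turn matches when the
// role agrees and the same tools were called in the same order.
// This is not necessarily the rule used by src/lib/golden.ts.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

interface Turn {
  role: string;
  content: string;
  tool_calls: ToolCall[];
}

interface Comparison {
  matchedTurns: number;
  totalTurns: number;
  score: number; // 0 to 1
}

function toolSignature(turn: Turn): string {
  return turn.tool_calls.map((call) => call.name).join(",");
}

function compareTrajectory(golden: Turn[], candidate: Turn[]): Comparison {
  let matchedTurns = 0;
  golden.forEach((expected, i) => {
    const actual = candidate[i];
    // Free-text content is deliberately ignored; only structure is checked.
    if (
      actual !== undefined &&
      actual.role === expected.role &&
      toolSignature(actual) === toolSignature(expected)
    ) {
      matchedTurns += 1;
    }
  });
  // Dividing by the longer trajectory penalizes both extra and missing turns.
  const totalTurns = Math.max(golden.length, candidate.length);
  return { matchedTurns, totalTurns, score: totalTurns === 0 ? 1 : matchedTurns / totalTurns };
}
```

A per-trajectory score like this could then be thresholded into the pass/fail flag that the quality gate aggregates; comparing free-text content would need a separate similarity or judge step.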
 
## Golden Dataset Format
 
Golden trajectories are stored as JSONL (one JSON object per line):
 
```jsonl
{"turn_id": "turn-1", "role": "user", "content": "How do I reset my password?", "tool_calls": [], "timestamp": "2026-01-01T00:00:00Z"}
{"turn_id": "turn-2", "role": "assistant", "content": "You can reset your password by visiting the account settings page.", "tool_calls": [{"name": "get_account_settings", "args": {}}], "timestamp": "2026-01-01T00:00:01Z"}
```
 
Each turn contains `turn_id`, `role`, `content`, `tool_calls`, and `timestamp`.
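Because every line is an independent JSON object, a golden file can be loaded by splitting on newlines and validating each record. The sketch below is a hypothetical loader (`parseGoldenJsonl` and `GoldenTurn` are not taken from this repository) that reports the offending line number instead of surfacing a bare `JSON.parse` error.

```typescript
// Hypothetical JSONL loader for golden trajectory files.
interface GoldenToolCall {
  name: string;
  args: Record<string, unknown>;
}

interface GoldenTurn {
  turn_id: string;
  role: string;
  content: string;
  tool_calls: GoldenToolCall[];
  timestamp: string;
}

function parseGoldenJsonl(text: string): GoldenTurn[] {
  const turns: GoldenTurn[] = [];
  text.split(/\r?\n/).forEach((line, index) => {
    if (line.trim() === "") return; // tolerate blank lines, e.g. a trailing newline
    const lineNo = index + 1;
    let value: unknown;
    try {
      value = JSON.parse(line);
    } catch {
      throw new Error(`line ${lineNo}: invalid JSON`);
    }
    if (typeof value !== "object" || value === null) {
      throw new Error(`line ${lineNo}: expected a JSON object`);
    }
    const t = value as Partial<GoldenTurn>;
    // Check the five fields listed above; tool call args are not validated here.
    if (
      typeof t.turn_id !== "string" ||
      typeof t.role !== "string" ||
      typeof t.content !== "string" ||
      typeof t.timestamp !== "string" ||
      !Array.isArray(t.tool_calls) ||
      !t.tool_calls.every((call) => typeof call?.name === "string")
    ) {
      throw new Error(`line ${lineNo}: missing or malformed turn fields`);
    }
    turns.push(t as GoldenTurn);
  });
  return turns;
}
```

In Node, the file contents would typically come from `readFileSync(path, "utf8")` in `node:fs` before being passed to the parser.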
 
## Project Structure
 
```
src/
  eval/
    types.ts       — TypeScript interfaces for config and results
    config.ts      — Zod-validated config parser from environment
    run.ts         — Main evaluation pipeline orchestrator
  lib/
    langfuse.ts    — Langfuse metrics export wrapper
    golden.ts      — Golden-trajectory comparison orchestration
    gate.ts        — CI gate evaluation wrapper
  index.ts         — CLI entry point
  instrumentation.ts  — Next.js instrumentation hook
tests/
  eval/
    config.test.ts — Config parsing tests
    run.test.ts    — Pipeline orchestration tests
  lib/
    langfuse.test.ts
    golden.test.ts
    gate.test.ts
  index.test.ts
app/
  api/eval/route.ts — HTTP endpoint for CI webhook triggers
```
 
## Development
 
```bash
pnpm typecheck   # TypeScript type checking
pnpm lint        # ESLint
pnpm test        # Vitest with coverage
```
 
## License
 
MIT — see [LICENSE](./LICENSE).