# vLLM Agent Quality Gate for On-Prem SMB Support Bots
> Automated regression testing for self-hosted vLLM agent pipelines, with CI gates that block deployment when support-bot quality drops.
A tutorialized reference solution from [reaatech.com](https://reaatech.com), demonstrating how to build production-grade AI systems with the `@reaatech/*` package family.
## Description
This recipe provides a quality-gate pipeline for on-premise vLLM-hosted support bots. It loads golden trajectory datasets, runs evaluations against a vLLM endpoint, compares agent trajectories against golden references, enforces configurable quality thresholds, and exports metrics to Langfuse for observability.
## Prerequisites
- **Node.js** >= 22
- **pnpm** 10.x
- A running **vLLM instance** serving its OpenAI-compatible API
- A **Langfuse** account (optional — metric tracking works without it)
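Because vLLM serves an OpenAI-compatible API, each evaluation call is an ordinary HTTP POST to `<VLLM_ENDPOINT>/chat/completions`. The sketch below builds such a request without dependencies. `buildChatRequest`, `VllmTarget`, and the `temperature: 0` setting are illustrative assumptions, not part of the `@reaatech/*` packages or this recipe's code.

```typescript
// Hypothetical request builder for vLLM's OpenAI-compatible chat route.
// Names and the temperature setting are illustrative assumptions.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface VllmTarget {
  endpoint: string; // e.g. "http://localhost:8000/v1"
  apiKey?: string; // sent as a Bearer token when present
  model: string;
}

interface PostRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildChatRequest(target: VllmTarget, messages: ChatMessage[]): PostRequest {
  // Drop trailing slashes so ".../v1/" and ".../v1" yield the same URL.
  const base = target.endpoint.replace(/\/+$/, "");
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (target.apiKey) {
    headers["Authorization"] = `Bearer ${target.apiKey}`;
  }
  return {
    url: `${base}/chat/completions`,
    init: {
      method: "POST",
      headers,
      // Assumption: temperature 0 keeps runs as repeatable as the server
      // allows, which helps when diffing against golden trajectories.
      body: JSON.stringify({ model: target.model, messages, temperature: 0 }),
    },
  };
}
```

The returned `url` and `init` can be passed straight to Node 22's built-in `fetch`; in an OpenAI-compatible response, the assistant's reply is at `choices[0].message.content`.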
## Installation
```bash
pnpm install
cp .env.example .env
# Edit .env with your vLLM endpoint, API key, model, and Langfuse credentials
```
## Configuration
| Variable | Default | Description |
|----------|---------|-------------|
| `VLLM_ENDPOINT` | `http://localhost:8000/v1` | vLLM OpenAI-compatible base URL |
| `VLLM_API_KEY` | — | API key for vLLM authentication |
| `VLLM_MODEL` | `deepseek-v4-flash` | Model name deployed on vLLM |
| `LANGFUSE_PUBLIC_KEY` | — | Langfuse project public key |
| `LANGFUSE_SECRET_KEY` | — | Langfuse project secret key |
| `LANGFUSE_BASE_URL` | `https://cloud.langfuse.com` | Langfuse host |
| `GOLDEN_DATA_DIR` | `./golden` | Directory for golden JSONL trajectory files |
| `EVAL_RESULTS_DIR` | `./results` | Output directory for evaluation results |
| `GATE_PRESET` | `standard` | Gate preset name (`standard`, `strict`, `lenient`) |
| `QUALITY_THRESHOLD` | `0.9` | Minimum pass rate for CI gate (0–1) |
| `CI_MODE` | `false` | Set to `true` in CI pipelines so the run ends via `process.exit` and a failed gate fails the job |
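The recipe parses these variables with Zod in `src/eval/config.ts`. As a rough, dependency-free stand-in (not that file), the sketch below applies the defaults from the table and rejects out-of-range values. `loadConfig` and `EvalConfig` are hypothetical names, and treating an empty string (e.g. `VLLM_API_KEY=` in `.env`) as unset is an assumption.

```typescript
// Hypothetical config loader mirroring the table above; not the recipe's Zod schema.
type GatePreset = "standard" | "strict" | "lenient";

interface EvalConfig {
  vllmEndpoint: string;
  vllmApiKey?: string;
  vllmModel: string;
  goldenDataDir: string;
  evalResultsDir: string;
  gatePreset: GatePreset;
  qualityThreshold: number;
  ciMode: boolean;
}

const PRESETS: readonly string[] = ["standard", "strict", "lenient"];

function loadConfig(env: Record<string, string | undefined>): EvalConfig {
  // `||` (not `??`) so that empty strings fall back to the defaults too.
  const preset = env.GATE_PRESET || "standard";
  if (!PRESETS.includes(preset)) {
    throw new Error(`GATE_PRESET must be one of ${PRESETS.join(", ")}, got "${preset}"`);
  }
  const rawThreshold = env.QUALITY_THRESHOLD || "0.9";
  const threshold = Number(rawThreshold);
  if (!Number.isFinite(threshold) || threshold < 0 || threshold > 1) {
    throw new Error(`QUALITY_THRESHOLD must be a number in [0, 1], got "${rawThreshold}"`);
  }
  return {
    vllmEndpoint: env.VLLM_ENDPOINT || "http://localhost:8000/v1",
    vllmApiKey: env.VLLM_API_KEY || undefined,
    vllmModel: env.VLLM_MODEL || "deepseek-v4-flash",
    goldenDataDir: env.GOLDEN_DATA_DIR || "./golden",
    evalResultsDir: env.EVAL_RESULTS_DIR || "./results",
    gatePreset: preset as GatePreset,
    qualityThreshold: threshold,
    ciMode: env.CI_MODE === "true", // only the exact string "true" enables CI mode
  };
}
```

In the entry point this would be called as `loadConfig(process.env)`; failing fast on a bad threshold keeps a typo such as `QUALITY_THRESHOLD=85` from silently disabling the gate.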
## Usage
```bash
# Run the evaluation pipeline locally
pnpm tsx src/index.ts
# Or with specific directories
GOLDEN_DATA_DIR=./my-golden EVAL_RESULTS_DIR=./my-results pnpm tsx src/index.ts
```
## CI Integration
### GitLab CI
```yaml
eval-quality-gate:
  stage: test
  script:
    - pnpm install
    - pnpm tsx src/index.ts
  variables:
    CI_MODE: "true"
    QUALITY_THRESHOLD: "0.85"
  only:
    - merge_requests
```
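GitLab marks a job as failed when a `script` command exits non-zero, so with `CI_MODE: "true"` the gate verdict travels through the process exit status. A minimal sketch of that mapping, assuming the gate compares the fraction of passing cases against `QUALITY_THRESHOLD` (it ignores `GATE_PRESET`). `evaluateGate`, `reportGate`, and `CaseResult` are hypothetical; the recipe's actual logic lives in `src/lib/gate.ts`.

```typescript
// Hypothetical gate: pass rate vs. threshold, turned into an exit status in CI.
interface CaseResult {
  id: string;
  passed: boolean;
}

interface GateOutcome {
  passRate: number;
  passed: boolean;
}

function evaluateGate(results: CaseResult[], threshold: number): GateOutcome {
  // An empty run counts as a failure rather than a vacuous pass.
  if (results.length === 0) return { passRate: 0, passed: false };
  const passRate = results.filter((r) => r.passed).length / results.length;
  return { passRate, passed: passRate >= threshold };
}

function reportGate(outcome: GateOutcome, ciMode: boolean): number {
  const code = outcome.passed ? 0 : 1;
  const pct = (outcome.passRate * 100).toFixed(1);
  console.log(`quality gate: pass rate ${pct}% -> ${outcome.passed ? "PASS" : "FAIL"}`);
  if (ciMode) {
    // Node exits with this status once the event loop drains.
    process.exitCode = code;
  }
  return code;
}
```

Setting `process.exitCode` instead of calling `process.exit` directly is a design choice in this sketch: pending log output and any Langfuse flush can finish before the process ends. Treating an empty result set as a failure means a misconfigured `GOLDEN_DATA_DIR` cannot produce a green merge request.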
## Architecture
```
Golden Trajectories (JSONL)
           │
           ▼
    ┌──────────────┐
    │ Load Golden  │
    │ Trajectories │
    └──────┬───────┘
           │
           ▼
    ┌──────────────┐     ┌──────────────────┐
    │ vLLM Agent   │────▶│ Compare Against  │
    │ Evaluation   │     │ Golden Reference │
    └──────────────┘     └────────┬─────────┘
                                  │
               ┌──────────────────┤
               ▼                  ▼
        ┌──────────────┐   ┌──────────────┐
        │ Quality Gate │   │ Export       │
        │ Check        │   │ Metrics to   │
        │ (pass/fail)  │   │ Langfuse     │
        └──────────────┘   └──────────────┘
```
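The "Compare Against Golden Reference" stage can be illustrated with one simple metric: did the agent call the same tools, in the same order, as the golden trajectory? This is an assumed example metric, not necessarily what `src/lib/golden.ts` computes; `compareTrajectory` and `toolSequence` are hypothetical names.

```typescript
// Hypothetical comparison: exact match of the ordered tool-call names.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

interface Turn {
  role: string;
  content: string;
  tool_calls: ToolCall[];
}

interface Comparison {
  toolSequenceMatch: boolean;
  expected: string[];
  actual: string[];
}

// Flatten every assistant tool call into an ordered list of tool names.
function toolSequence(turns: Turn[]): string[] {
  return turns
    .filter((t) => t.role === "assistant")
    .flatMap((t) => t.tool_calls.map((c) => c.name));
}

function compareTrajectory(golden: Turn[], actual: Turn[]): Comparison {
  const expected = toolSequence(golden);
  const got = toolSequence(actual);
  const toolSequenceMatch =
    expected.length === got.length && expected.every((name, i) => name === got[i]);
  return { toolSequenceMatch, expected, actual: got };
}
```

Tool-call order is a deliberately strict check that ignores wording; a real scorer would usually pair it with a softer check on `content`, since two correct answers rarely match word for word.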
## Golden Dataset Format
Golden trajectories are stored as JSONL (one JSON object per line):
```jsonl
{"turn_id": "turn-1", "role": "user", "content": "How do I reset my password?", "tool_calls": [], "timestamp": "2026-01-01T00:00:00Z"}
{"turn_id": "turn-2", "role": "assistant", "content": "You can reset your password by visiting the account settings page.", "tool_calls": [{"name": "get_account_settings", "args": {}}], "timestamp": "2026-01-01T00:00:01Z"}
```
Each turn contains `turn_id`, `role`, `content`, `tool_calls`, and `timestamp`.
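A golden file in this format can be read line by line with plain `JSON.parse`. The sketch below checks each line for the five documented fields and reports the offending line number; `parseGoldenJsonl` is a hypothetical helper, not the recipe's loader, and it only assumes what the example above shows.

```typescript
// Hypothetical, dependency-free loader for the golden JSONL format.
interface GoldenToolCall {
  name: string;
  args: Record<string, unknown>;
}

interface GoldenTurn {
  turn_id: string;
  role: string; // "user" or "assistant" in the example above
  content: string;
  tool_calls: GoldenToolCall[];
  timestamp: string; // ISO 8601, e.g. "2026-01-01T00:00:00Z"
}

function parseGoldenJsonl(text: string): GoldenTurn[] {
  const turns: GoldenTurn[] = [];
  for (const [i, rawLine] of text.split(/\r?\n/).entries()) {
    const line = rawLine.trim();
    if (line === "") continue; // skip blank lines, e.g. a trailing newline
    const where = `line ${i + 1}`;
    let raw: Partial<GoldenTurn> | null;
    try {
      raw = JSON.parse(line);
    } catch {
      throw new Error(`${where}: not valid JSON`);
    }
    if (raw === null || typeof raw !== "object") throw new Error(`${where}: expected a JSON object`);
    if (typeof raw.turn_id !== "string") throw new Error(`${where}: missing turn_id`);
    if (typeof raw.role !== "string") throw new Error(`${where}: missing role`);
    if (typeof raw.content !== "string") throw new Error(`${where}: missing content`);
    if (!Array.isArray(raw.tool_calls) || !raw.tool_calls.every((c) => typeof c?.name === "string")) {
      throw new Error(`${where}: tool_calls must be an array whose entries have a string name`);
    }
    if (typeof raw.timestamp !== "string" || Number.isNaN(Date.parse(raw.timestamp))) {
      throw new Error(`${where}: invalid timestamp`);
    }
    turns.push(raw as GoldenTurn);
  }
  return turns;
}
```

With `readFile` from `node:fs/promises`, a file under `GOLDEN_DATA_DIR` would be loaded as `parseGoldenJsonl(await readFile(path, "utf8"))`.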
## Project Structure
```
src/
  eval/
    types.ts            # TypeScript interfaces for config and results
    config.ts           # Zod-validated config parser from environment
    run.ts              # Main evaluation pipeline orchestrator
  lib/
    langfuse.ts         # Langfuse metrics export wrapper
    golden.ts           # Golden-trajectory comparison orchestration
    gate.ts             # CI gate evaluation wrapper
  index.ts              # CLI entry point
  instrumentation.ts    # Next.js instrumentation hook
tests/
  eval/
    config.test.ts      # Config parsing tests
    run.test.ts         # Pipeline orchestration tests
  lib/
    langfuse.test.ts
    golden.test.ts
    gate.test.ts
  index.test.ts
app/
  api/eval/route.ts     # HTTP endpoint for CI webhook triggers
```
## Development
```bash
pnpm typecheck # TypeScript type checking
pnpm lint # ESLint
pnpm test # Vitest with coverage
```
## License
MIT — see [LICENSE](./LICENSE).