# Perplexity RAG Eval Suite for SMB Knowledge Bases
## What It Does
This CLI harness runs a configurable RAG evaluation pipeline. It scores answer faithfulness, relevance, context precision, and context recall using heuristic metrics, with an optional LLM-as-judge pass via Perplexity. Results are checked against configurable quality gates and written as JSON or JUnit XML for CI integration.
## Quick Start
```bash
pnpm install
export PERPLEXITY_API_KEY="<your-key>"
pnpm tsx src/cli/eval.ts --dataset eval-dataset.jsonl
```
## CLI Usage
| Flag | Description | Default |
|------|-------------|---------|
| --dataset | Path to evaluation dataset (JSONL/JSON/YAML) | (required) |
| --config | Path to eval config YAML | ./eval-config.yaml |
| --fidelity | Evaluation fidelity: heuristic-only or full-judge | heuristic-only |
| --output | Output format: json or junit | json |
| --baseline | Path to baseline results JSON for regression gates | — |
## Architecture
Dataset → Heuristic Scorer → LLM Judge (high-ambiguity only) → Cost Tracker → Gate Checker → Formatted Output
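The flow above can be read as a two-stage scorer: every record gets a cheap heuristic score, and only records whose score lands in an ambiguous band are escalated to the judge while budget remains. The sketch below is illustrative only. `EvalRecord`, `heuristicFaithfulness`, the ambiguity band, and the synchronous `judge` callback are hypothetical names and values, not the suite's actual API (a real judge call would be asynchronous).

```typescript
// Hypothetical types; the real suite's record and result shapes may differ.
interface EvalRecord {
  question: string;
  answer: string;
  contexts: string[];
}

interface ScoredRecord {
  score: number; // final faithfulness score in [0, 1]
  source: "heuristic" | "judge";
}

// Lowercased alphanumeric tokens, ignoring punctuation.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Toy heuristic: fraction of answer tokens that also appear in the contexts.
function heuristicFaithfulness(record: EvalRecord): number {
  const answerTokens = tokenize(record.answer);
  if (answerTokens.length === 0) return 0;
  const contextTokens = new Set(tokenize(record.contexts.join(" ")));
  const supported = answerTokens.filter((t) => contextTokens.has(t)).length;
  return supported / answerTokens.length;
}

// Assumed ambiguity band: scores here are neither clearly good nor clearly bad.
const AMBIGUOUS_LOW = 0.4;
const AMBIGUOUS_HIGH = 0.7;

// Escalate to the judge only for ambiguous scores, and only while budget remains.
function scoreRecord(
  record: EvalRecord,
  judge: (r: EvalRecord) => number, // stand-in for an LLM-as-judge call
  budget: { remainingCalls: number },
): ScoredRecord {
  const heuristic = heuristicFaithfulness(record);
  const ambiguous = heuristic >= AMBIGUOUS_LOW && heuristic <= AMBIGUOUS_HIGH;
  if (ambiguous && budget.remainingCalls > 0) {
    budget.remainingCalls -= 1;
    return { score: judge(record), source: "judge" };
  }
  return { score: heuristic, source: "heuristic" };
}
```

Keeping the judge behind a callback makes the escalation rule testable with a stub and keeps heuristic-only runs free of network calls.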
## Configuration
See eval-config.yaml for the full configuration structure.
The YAML file supports:
- metrics: which metrics to evaluate (faithfulness, relevance, context_precision, context_recall)
- judge: LLM judge configuration (model, enabled)
- cost: budget limits
- gates: quality gates with type (threshold/baseline-comparison), metric, operator, and threshold
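To illustrate how a gate with a type, metric, operator, and threshold might be evaluated, here is a minimal sketch. The field names mirror the list above, but the `Gate` shape, the operator set, and the reading of baseline-comparison as "current score minus baseline score" are assumptions; eval-config.yaml remains the authoritative reference.

```typescript
type Metric = "faithfulness" | "relevance" | "context_precision" | "context_recall";
type Operator = ">=" | ">" | "<=" | "<"; // assumed operator set

// Hypothetical gate shape based on the fields listed in this README.
interface Gate {
  type: "threshold" | "baseline-comparison";
  metric: Metric;
  operator: Operator;
  threshold: number;
}

type Scores = Partial<Record<Metric, number>>;

function compare(value: number, operator: Operator, threshold: number): boolean {
  switch (operator) {
    case ">=": return value >= threshold;
    case ">": return value > threshold;
    case "<=": return value <= threshold;
    case "<": return value < threshold;
  }
}

// Returns true when the gate passes. A gate whose metric is missing fails closed.
function checkGate(gate: Gate, current: Scores, baseline?: Scores): boolean {
  const value = current[gate.metric];
  if (value === undefined) return false;
  if (gate.type === "threshold") {
    return compare(value, gate.operator, gate.threshold);
  }
  // baseline-comparison: test the change relative to the baseline (assumed semantics).
  const base = baseline?.[gate.metric];
  if (base === undefined) return false;
  return compare(value - base, gate.operator, gate.threshold);
}
```

Under these assumptions, a gate of `{ type: "baseline-comparison", metric: "faithfulness", operator: ">=", threshold: -0.05 }` would tolerate a drop of up to 0.05 against the file passed via --baseline.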
## CI Integration
```bash
pnpm tsx src/cli/eval.ts --dataset eval-dataset.jsonl --fidelity full-judge || exit 1
```
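When `--output junit` is selected, CI systems generally expect a JUnit-style XML report in which each failed check carries a failure element. The following is a rough sketch of how gate results could be serialized. The `GateResult` shape and the suite name are hypothetical, and the report layout this tool actually emits may differ.

```typescript
// Hypothetical result shape for one evaluated gate.
interface GateResult {
  name: string;
  passed: boolean;
  message: string;
}

// Escape the five XML special characters so values are safe inside attributes.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Render one testcase per gate; failed gates get a nested failure element.
function toJUnitXml(suiteName: string, results: GateResult[]): string {
  const failures = results.filter((r) => !r.passed).length;
  const cases = results.map((r) => {
    const open = `  <testcase name="${escapeXml(r.name)}"`;
    return r.passed
      ? `${open}/>`
      : `${open}>\n    <failure message="${escapeXml(r.message)}"/>\n  </testcase>`;
  });
  return [
    `<?xml version="1.0" encoding="UTF-8"?>`,
    `<testsuite name="${escapeXml(suiteName)}" tests="${results.length}" failures="${failures}">`,
    ...cases,
    `</testsuite>`,
  ].join("\n");
}
```

A non-zero `failures` count in the report would correspond to the non-zero exit code that the `|| exit 1` guard above relies on.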
## Environment Variables
| Variable | Description |
|----------|-------------|
| PERPLEXITY_API_KEY | Perplexity API key for LLM judge |
| LANGFUSE_PUBLIC_KEY | Langfuse public key (optional) |
| LANGFUSE_SECRET_KEY | Langfuse secret key (optional) |
| LANGFUSE_HOST | Langfuse host URL |
| DEFAULT_DAILY_BUDGET | Daily budget cap for judge costs |
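
A daily cap such as DEFAULT_DAILY_BUDGET could be enforced with a small tracker that refuses judge calls once the cap would be exceeded. The sketch below is hypothetical: `BudgetTracker`, `parseBudget`, and the fallback value are illustrative names, the unit of the budget is not specified in this README, and the raw value is passed in explicitly rather than read from the environment.

```typescript
// Hypothetical tracker; the suite's real cost tracker may work differently.
class BudgetTracker {
  private spent = 0;
  private readonly dailyCap: number;

  constructor(dailyCap: number) {
    if (!Number.isFinite(dailyCap) || dailyCap < 0) {
      throw new Error(`Invalid daily budget: ${dailyCap}`);
    }
    this.dailyCap = dailyCap;
  }

  // Records the cost and returns true if it fits under the cap; otherwise
  // leaves the running total unchanged and returns false.
  tryCharge(cost: number): boolean {
    if (this.spent + cost > this.dailyCap) return false;
    this.spent += cost;
    return true;
  }

  get remaining(): number {
    return this.dailyCap - this.spent;
  }
}

// Parse an environment-style string, falling back when it is unset or not numeric.
function parseBudget(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  return Number.isFinite(value) ? value : fallback;
}
```

In a Node.js process, the tracker could be constructed from `parseBudget(process.env.DEFAULT_DAILY_BUDGET, fallback)`, with the fallback chosen by the operator.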
## License
MIT — see [LICENSE](./LICENSE).