reaatech/rag-eval-pack

Last commit: Jun 4, 2026

These packages give you a full RAG evaluation pipeline—heuristic scorers for faithfulness, relevance, context precision, and context recall, plus an LLM-as-judge with multi-provider support, cost tracking with budget enforcement, and CI quality gates that can fail a build. You'd adopt them to catch regressions in a RAG system before deployment, whether that's a pre-commit smoke check or a nightly regression suite. The distinctive design is that every metric can run at three fidelity levels—free lexical scoring, embedding-based semantic scoring, or LLM judging—so you can trade cost for accuracy per use case without changing the evaluation interface.

Packages

10 packages

@reaatech/rag-eval-cli

v0.1.0
A CLI that runs RAG evaluation suites, quality gates, run comparisons, cost breakdowns, markdown reports, LLM-based judging, and an MCP server, exposed as the `rag-eval-pack` command. It also re-exports the full programmatic API from all `@reaatech/rag-eval-*` packages as a single importable library.
Status: published 14 days ago

@reaatech/rag-eval-core

v0.1.0
Canonical TypeScript types and Zod schemas for RAG evaluation data shapes. Exports 18+ types (`EvaluationSample`, `EvalSuiteConfig`, `SampleEvalResult`, `GateConfig`, `JudgeConfig`, etc.) and two Zod schemas (`EvaluationSampleSchema`, `EvalSuiteConfigSchema`) for runtime validation, with zero runtime dependencies beyond `zod`.
Status: published 14 days ago

@reaatech/rag-eval-cost

v0.1.0
Cost tracking, pricing, budgeting, and reporting infrastructure for RAG evaluations, providing `CostTracker`, `Pricing`, `BudgetManager`, and `CostReporter` classes that track per-sample token consumption, enforce budget limits with configurable alert thresholds, and generate cost reports in JSON and JUnit XML formats.
Status: published 14 days ago
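
As a rough illustration of budget enforcement with an alert threshold, the sketch below accumulates per-sample token cost and reports whether the run is still within budget. Every name here (`BudgetSketch`, `record`, `PriceTable`) and the prices used are hypothetical; this is not the package's `BudgetManager` or `Pricing` API.

```typescript
// Hypothetical sketch of budget enforcement; not the @reaatech/rag-eval-cost API.
type BudgetStatus = "ok" | "alert" | "exceeded";

interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

// Per-1K-token prices in USD. The caller supplies them; none are real quotes.
interface PriceTable {
  inputPer1k: number;
  outputPer1k: number;
}

class BudgetSketch {
  private spentUsd = 0;
  private readonly limitUsd: number;
  private readonly alertFraction: number; // e.g. 0.8 means "warn at 80% of the limit"
  private readonly prices: PriceTable;

  constructor(limitUsd: number, alertFraction: number, prices: PriceTable) {
    this.limitUsd = limitUsd;
    this.alertFraction = alertFraction;
    this.prices = prices;
  }

  // Adds one sample's cost to the running total and returns the new status.
  record(usage: TokenUsage): BudgetStatus {
    this.spentUsd +=
      (usage.inputTokens / 1000) * this.prices.inputPer1k +
      (usage.outputTokens / 1000) * this.prices.outputPer1k;
    return this.status();
  }

  status(): BudgetStatus {
    if (this.spentUsd > this.limitUsd) return "exceeded";
    if (this.spentUsd >= this.limitUsd * this.alertFraction) return "alert";
    return "ok";
  }

  total(): number {
    return this.spentUsd;
  }
}
```

A caller would typically stop issuing LLM judge requests once `record` returns `"exceeded"`, and log a warning on `"alert"`.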

@reaatech/rag-eval-dataset

v0.1.0
A Zod-validated dataset loader and validator for RAG evaluation samples, supporting JSONL, JSON, and YAML formats with duplicate detection, synthetic generation from templates, and version tracking. Exports `DatasetLoader`, `DatasetValidator`, and `loadEvalConfig`.
Status: published 14 days ago
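
To make JSONL loading with duplicate detection concrete, here is a minimal sketch. It assumes a sample shape with `id`, `question`, `answer`, and `contexts` fields, which may not match the real `EvaluationSample` type, and it uses a hand-written type guard in place of the Zod validation the package performs. `parseJsonl` and `SampleSketch` are illustrative names, not exports of the package.

```typescript
// Hypothetical sample shape; the real EvaluationSample type may differ.
interface SampleSketch {
  id: string;
  question: string;
  answer: string;
  contexts: string[];
}

interface LoadResult {
  samples: SampleSketch[];
  errors: string[];
}

// Hand-written structural check standing in for a Zod schema.
function isSample(value: unknown): value is SampleSketch {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const contexts = v.contexts;
  return (
    typeof v.id === "string" &&
    typeof v.question === "string" &&
    typeof v.answer === "string" &&
    Array.isArray(contexts) &&
    contexts.every((c) => typeof c === "string")
  );
}

// Parses one JSON object per line, collecting errors instead of throwing.
function parseJsonl(text: string): LoadResult {
  const samples: SampleSketch[] = [];
  const errors: string[] = [];
  const seenIds = new Set<string>();

  text.split(/\r?\n/).forEach((line, index) => {
    if (line.trim() === "") return; // skip blank lines
    const lineNo = index + 1;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      errors.push(`line ${lineNo}: invalid JSON`);
      return;
    }
    if (!isSample(parsed)) {
      errors.push(`line ${lineNo}: missing or mistyped field`);
      return;
    }
    if (seenIds.has(parsed.id)) {
      errors.push(`line ${lineNo}: duplicate id "${parsed.id}"`);
      return;
    }
    seenIds.add(parsed.id);
    samples.push(parsed);
  });

  return { samples, errors };
}
```

Collecting errors per line, rather than failing on the first one, lets a validator report every problem in a dataset in a single pass.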

@reaatech/rag-eval-gate

v0.1.0
A quality gate engine for RAG evaluation pipelines that enforces threshold-based metric checks and baseline regression detection, returning a `GateResult` object with pass/fail status and per-gate failure messages. It pairs with `@reaatech/rag-eval-core` for evaluation result types and is designed for CI/CD integration with formatted output and configurable exit codes.
Status: published 14 days ago
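
The two kinds of check mentioned above, absolute thresholds and regression against a baseline, can be sketched as follows. The `GateSketch` and `GateOutcome` shapes and the `evaluateGates` function are assumptions made for illustration; the package's actual `GateConfig` and `GateResult` types may be structured differently.

```typescript
// Hypothetical gate shape; not the package's GateConfig type.
interface GateSketch {
  metric: string;
  min: number; // absolute threshold the metric must reach
  maxRegression?: number; // largest allowed drop relative to the baseline
}

interface GateOutcome {
  passed: boolean;
  failures: string[];
}

function evaluateGates(
  gates: GateSketch[],
  current: Record<string, number>,
  baseline: Record<string, number> = {},
): GateOutcome {
  const failures: string[] = [];

  for (const gate of gates) {
    const value = current[gate.metric];
    if (value === undefined) {
      failures.push(`${gate.metric}: metric not reported`);
      continue;
    }
    // Threshold check: the metric must meet the configured minimum.
    if (value < gate.min) {
      failures.push(`${gate.metric}: ${value} is below minimum ${gate.min}`);
    }
    // Regression check: only runs when both a limit and a baseline value exist.
    const previous = baseline[gate.metric];
    if (gate.maxRegression !== undefined && previous !== undefined) {
      const drop = previous - value;
      if (drop > gate.maxRegression) {
        failures.push(
          `${gate.metric}: dropped ${drop.toFixed(3)} from baseline ${previous}`,
        );
      }
    }
  }

  return { passed: failures.length === 0, failures };
}
```

In a CI job, the caller would map `passed` to a process exit code (for example 0 on pass and 1 on failure) and print `failures` so the build log explains which gate tripped.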

@reaatech/rag-eval-judge

v0.1.0
A TypeScript class (`JudgeEngine`) that uses an LLM (Anthropic, OpenAI, or Google) to score RAG outputs on metrics like faithfulness and relevance, with optional consensus voting across multiple models and calibration against human labels.
Status: published 14 days ago
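
One plausible way to implement consensus voting is to ask several judges for a score and take the median, ignoring judges that fail. The sketch below uses stub judge functions rather than real provider calls, and the names `JudgeFn`, `JudgeInput`, and `consensusScore` are hypothetical. The aggregation rule `JudgeEngine` actually uses is not documented here, so the median is an assumption.

```typescript
// Hypothetical types; a real judge would call an LLM provider here.
interface JudgeInput {
  question: string;
  answer: string;
  contexts: string[];
}

type JudgeFn = (input: JudgeInput) => Promise<number>;

interface ConsensusResult {
  score: number; // median of the successful votes
  votes: number[];
  failed: number; // judges that threw or rejected
}

function median(values: number[]): number {
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

async function consensusScore(
  judges: JudgeFn[],
  input: JudgeInput,
): Promise<ConsensusResult> {
  // Run all judges concurrently; a failing judge contributes null, not an error.
  const results = await Promise.all(
    judges.map(async (judge) => {
      try {
        return await judge(input);
      } catch {
        return null;
      }
    }),
  );
  const votes = results.filter((r): r is number => r !== null);
  if (votes.length === 0) {
    throw new Error("no judge returned a score");
  }
  return { score: median(votes), votes, failed: results.length - votes.length };
}
```

The median is less sensitive than the mean to a single outlier judge, which is why it is used in this sketch.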

@reaatech/rag-eval-mcp-server

v0.1.0
An MCP server that exposes RAG evaluation tools as a three-layer API of atomic judge operations, orchestrated suite runs, and CI-style regression gates, providing `createMcpServer()` and `startMcpServer()` functions for integration with MCP clients like Claude Desktop or Cursor.
Status: published 14 days ago

@reaatech/rag-eval-metrics

v0.1.0
Provides four heuristic metric scorers (faithfulness, relevance, context precision, context recall) for evaluating RAG outputs, plus a `MetricsEngine` orchestrator that runs them in parallel with configurable concurrency. Each scorer is a class with a `score` method that returns a numeric score and supporting details, using only NLP libraries (`compromise`, `natural`) with no LLM calls.
Status: published 14 days ago
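
For a sense of what an LLM-free heuristic scorer can look like, the sketch below computes a lexical context recall: the fraction of distinct ground-truth tokens that appear somewhere in the retrieved contexts. It uses plain regex tokenization and a tiny stopword list instead of `compromise` or `natural`. The function name, the return shape, and the scoring formula are assumptions for illustration, not the package's scorer.

```typescript
// Hypothetical result shape: a numeric score plus supporting details.
interface ScoreResult {
  score: number;
  details: { matched: string[]; missing: string[] };
}

// Deliberately small stopword list for the sketch.
const STOPWORDS = new Set([
  "a", "an", "the", "is", "are", "was", "of", "in", "on", "to", "and", "it", "for",
]);

// Lowercases, keeps alphanumeric runs, and drops stopwords.
function tokenize(text: string): string[] {
  const raw = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  return raw.filter((token) => !STOPWORDS.has(token));
}

function contextRecallSketch(groundTruth: string, contexts: string[]): ScoreResult {
  const contextTokens = new Set<string>();
  for (const context of contexts) {
    for (const token of tokenize(context)) contextTokens.add(token);
  }

  const truthTokens = Array.from(new Set(tokenize(groundTruth)));
  if (truthTokens.length === 0) {
    // Nothing to recall; report 0 rather than dividing by zero.
    return { score: 0, details: { matched: [], missing: [] } };
  }

  const matched = truthTokens.filter((t) => contextTokens.has(t));
  const missing = truthTokens.filter((t) => !contextTokens.has(t));
  return { score: matched.length / truthTokens.length, details: { matched, missing } };
}
```

Exact token matching misses paraphrases and inflected forms, a limitation that embedding-based or LLM-based scoring is meant to address at higher cost.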

@reaatech/rag-eval-observability

v0.1.0
Provides structured JSON logging via Pino, OpenTelemetry tracing, and OpenTelemetry metrics specifically for RAG evaluation pipelines, exporting functions like `createLogger`, `traceEvalRun`, and `recordEvalRun`.
Status: published 14 days ago

@reaatech/rag-eval-suite

v0.1.0
A class (`EvaluationSuite`) that orchestrates RAG evaluation runs by executing heuristic metrics, optional LLM judge scoring, cost tracking, and quality gates against a dataset, returning a `SuiteRunResult` with aggregated metrics and gate pass/fail status.
Status: published 14 days ago
