reaatech/rag-eval-pack

★ 0 · Last commit: May 11, 2026

These packages provide a modular toolkit for evaluating RAG systems using heuristic scorers, LLM-as-a-judge scoring, and automated quality gates. They help teams measure retrieval and generation performance while enforcing cost budgets and CI/CD regression thresholds. The toolkit is a composable suite in which an orchestration engine coordinates data loading, metric calculation, and observability across independent, type-safe packages.

agentic-ai answer-relevance ci-cd context-precision context-recall evaluation-metrics faithfulness llm-eval mlops rag rag-evaluation retrieval-augmented-generation testing-tools typescript

Packages

10 packages

rag-eval-cli

@reaatech/rag-eval-cli

pending npm

Provides a CLI for executing, gating, and comparing RAG evaluation suites, while also acting as a barrel package that re-exports the entire `@reaatech/rag-eval-*` library for programmatic use.

status: awaiting publish

rag-eval-core

@reaatech/rag-eval-core

pending npm

Provides TypeScript types and Zod schemas for defining RAG evaluation suites, including configurations for judges, cost tracking, and quality gates. It serves as a shared schema library for the `@reaatech/rag-eval-*` ecosystem, requiring only `zod` as a runtime dependency.

status: awaiting publish

rag-eval-cost

@reaatech/rag-eval-cost

pending npm

Tracks token consumption and enforces budget limits for RAG evaluations using a set of classes for cost accounting, model pricing lookups, and report generation. It provides utilities to record per-sample costs and export results in JSON or JUnit XML formats for CI integration.
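As a rough illustration of the per-sample cost accounting and budget enforcement described above, the sketch below multiplies token counts by a per-model price and refuses to record a sample that would push the run over budget. Every name here (`CostTracker`, `ModelPrice`, `SampleUsage`, the `demo-model` price entry) is hypothetical and chosen for illustration; the package's real classes and signatures are not shown in this listing and may differ.

```typescript
// Hypothetical sketch; NOT the @reaatech/rag-eval-cost API.
interface ModelPrice {
  inputPer1k: number;  // USD per 1,000 prompt tokens (illustrative value)
  outputPer1k: number; // USD per 1,000 completion tokens (illustrative value)
}

interface SampleUsage {
  sampleId: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
}

class CostTracker {
  private readonly prices: Record<string, ModelPrice>;
  private readonly budgetUsd: number;
  private totalUsd = 0;
  private readonly perSample = new Map<string, number>();

  constructor(prices: Record<string, ModelPrice>, budgetUsd: number) {
    this.prices = prices;
    this.budgetUsd = budgetUsd;
  }

  // Records one sample's usage and returns its cost in USD.
  // Throws (without recording) if the sample would exceed the budget.
  record(usage: SampleUsage): number {
    const price = this.prices[usage.model];
    if (price === undefined) {
      throw new Error(`no price configured for model ${usage.model}`);
    }
    const cost =
      (usage.inputTokens / 1000) * price.inputPer1k +
      (usage.outputTokens / 1000) * price.outputPer1k;
    if (this.totalUsd + cost > this.budgetUsd) {
      throw new Error(`budget of $${this.budgetUsd} would be exceeded`);
    }
    this.totalUsd += cost;
    this.perSample.set(usage.sampleId, (this.perSample.get(usage.sampleId) ?? 0) + cost);
    return cost;
  }

  // Plain-object summary that could be serialized with JSON.stringify.
  report(): { totalUsd: number; budgetUsd: number; perSample: Record<string, number> } {
    const perSample: Record<string, number> = {};
    this.perSample.forEach((cost, id) => {
      perSample[id] = cost;
    });
    return { totalUsd: this.totalUsd, budgetUsd: this.budgetUsd, perSample };
  }
}
```

Checking the budget before adding the cost keeps the running total consistent when a sample is rejected, so a caller can catch the error and still emit a report for the samples that did run.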

status: awaiting publish

rag-eval-dataset

@reaatech/rag-eval-dataset

pending npm

Manages RAG evaluation datasets by providing classes to load, validate, and version-track samples from JSON, JSONL, and YAML files. It relies on Zod for schema enforcement and integrates with `@reaatech/rag-eval-core` for sample definitions.

status: awaiting publish

rag-eval-gate

@reaatech/rag-eval-gate

pending npm

Enforces quality standards on RAG evaluation metrics using a `GateEngine` class that validates results against fixed thresholds or historical baselines. It provides CI-friendly output and configurable exit codes, typically paired with evaluation data structures from `@reaatech/rag-eval-core`.
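To make the two gating modes concrete, here is a minimal sketch of checking scores against a fixed threshold and against a historical baseline, then mapping the outcome to a CI exit code. The `GateRule` shape, `checkGates`, and `exitCodeFor` are assumptions made for this example; they are not the `GateEngine` interface, which this listing does not document.

```typescript
// Hypothetical sketch; NOT the @reaatech/rag-eval-gate API.
type Scores = Record<string, number>;

interface GateRule {
  metric: string;
  minScore?: number;      // absolute floor the metric must reach, if set
  maxRegression?: number; // largest allowed drop versus the baseline, if set
}

interface GateFailure {
  metric: string;
  reason: string;
}

function checkGates(current: Scores, rules: GateRule[], baseline: Scores = {}): GateFailure[] {
  const failures: GateFailure[] = [];
  for (const rule of rules) {
    const score = current[rule.metric];
    if (score === undefined) {
      failures.push({ metric: rule.metric, reason: "metric missing from results" });
      continue;
    }
    // Fixed-threshold check.
    if (rule.minScore !== undefined && score < rule.minScore) {
      failures.push({ metric: rule.metric, reason: `score ${score} is below minimum ${rule.minScore}` });
    }
    // Baseline check: skipped when no baseline value exists for the metric.
    const base = baseline[rule.metric];
    if (rule.maxRegression !== undefined && base !== undefined && base - score > rule.maxRegression) {
      failures.push({ metric: rule.metric, reason: `dropped from ${base} to ${score}` });
    }
  }
  return failures;
}

// Conventional CI mapping: 0 when every gate passes, 1 otherwise.
function exitCodeFor(failures: GateFailure[]): number {
  return failures.length === 0 ? 0 : 1;
}
```

In a CI script, the result of `exitCodeFor` could be passed to `process.exit` after printing the failure reasons, so a regression fails the build even when the absolute threshold is still met.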

status: awaiting publish

rag-eval-judge

@reaatech/rag-eval-judge

pending npm

Evaluates RAG pipeline outputs using LLM-as-a-judge with support for multi-model consensus, provider fallbacks, and human-label calibration. It provides a `JudgeEngine` class that executes pre-defined prompt templates for metrics like faithfulness and relevance, returning structured scores and reasoning.

status: awaiting publish

rag-eval-mcp-server

@reaatech/rag-eval-mcp-server

pending npm

Exposes RAG evaluation tools—including atomic judges, test suites, and regression gates—as an MCP server for integration with clients like Claude Desktop or Cursor. It provides a set of tool handler functions and server initialization utilities that rely on the `@modelcontextprotocol/sdk` to execute evaluation tasks via stdio.

status: awaiting publish

rag-eval-metrics

@reaatech/rag-eval-metrics

pending npm

Calculates heuristic-based RAG evaluation metrics including faithfulness, relevance, context precision, and context recall without requiring LLM API calls. It provides individual scorer classes and a `MetricsEngine` orchestrator for executing these evaluations in parallel.
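The listing does not say which heuristics the scorers use. As one plausible illustration of scoring without any LLM call, the sketch below approximates context recall and context precision by token overlap between a reference answer and the retrieved contexts. The function names, the tokenizer, and the `minShared` cutoff are all assumptions for this example, not the package's implementation.

```typescript
// Illustrative token-overlap heuristics; the package's actual scorers may differ.
function tokenize(text: string): Set<string> {
  const tokens: string[] = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  return new Set(tokens);
}

// Number of tokens in `a` that also occur in `b`.
function sharedTokenCount(a: Set<string>, b: Set<string>): number {
  return Array.from(a).filter((t) => b.has(t)).length;
}

// Share of reference-answer tokens found somewhere in the retrieved contexts.
function contextRecall(reference: string, contexts: string[]): number {
  const ref = tokenize(reference);
  if (ref.size === 0) return 0;
  return sharedTokenCount(ref, tokenize(contexts.join(" "))) / ref.size;
}

// Share of retrieved contexts containing at least `minShared` reference tokens.
function contextPrecision(reference: string, contexts: string[], minShared = 2): number {
  if (contexts.length === 0) return 0;
  const ref = tokenize(reference);
  const relevant = contexts.filter((c) => sharedTokenCount(ref, tokenize(c)) >= minShared);
  return relevant.length / contexts.length;
}
```

Overlap heuristics like these are cheap and deterministic, which suits CI runs, but they cannot detect paraphrase; that trade-off is the usual reason to pair them with an LLM judge.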

status: awaiting publish

rag-eval-observability

@reaatech/rag-eval-observability

pending npm

Provides structured logging via Pino and OpenTelemetry instrumentation for tracing and metrics specific to RAG evaluation workflows. It exports a set of wrapper functions for tracing evaluation runs, judge calls, and metric calculations, alongside a factory function for pre-configured loggers.

status: awaiting publish

rag-eval-suite

@reaatech/rag-eval-suite

pending npm

Orchestrates RAG pipeline evaluations by combining metric computation, LLM-based judging, cost tracking, and quality gate enforcement. It provides an `EvaluationSuite` class that executes these tasks against datasets to generate aggregated performance reports and regression analysis.