
rag-eval-pack · packages

Every package shipped from reaatech/rag-eval-pack, published or pending.

10 packages

@reaatech/rag-eval-cli

pending npm
Provides a CLI for executing, gating, and comparing RAG evaluation suites, while also acting as a barrel package that re-exports the entire `@reaatech/rag-eval-*` library for programmatic use.
status
awaiting publish
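Since the package is unpublished, no command surface is documented yet. A purely illustrative invocation, with every subcommand and flag a hypothetical placeholder:

```sh
# Hypothetical: run a suite and fail CI if a quality gate trips
npx @reaatech/rag-eval-cli run --suite eval/suite.yaml --gate

# Hypothetical: compare a candidate run against a stored baseline
npx @reaatech/rag-eval-cli compare --baseline runs/main.json --candidate runs/pr.json
```

None of these names are confirmed; check the published README once the package lands on npm.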

@reaatech/rag-eval-core

pending npm
Provides TypeScript types and Zod schemas for defining RAG evaluation suites, including configurations for judges, cost tracking, and quality gates. It serves as a shared schema library for the `@reaatech/rag-eval-*` ecosystem, requiring only `zod` as a runtime dependency.
status
awaiting publish
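The suite definitions this package describes might look like the sketch below. Plain TypeScript types and a hand-rolled check stand in for the real Zod schemas, and every field name (`judges`, `gates`, `budgetUsd`, etc.) is an assumption, not the published shape:

```typescript
// Hypothetical shape of a RAG evaluation suite config. The real package
// defines these with Zod; plain types keep this sketch dependency-free.
interface JudgeConfig {
  model: string;                               // e.g. "gpt-4o"
  metric: "faithfulness" | "relevance";
}

interface GateConfig {
  metric: string;
  minScore: number;                            // fail the run below this
}

interface SuiteConfig {
  name: string;
  judges: JudgeConfig[];
  gates: GateConfig[];
  budgetUsd?: number;                          // optional cost ceiling
}

// Minimal runtime validation standing in for schema.parse()
function parseSuiteConfig(input: unknown): SuiteConfig {
  const c = input as SuiteConfig;
  if (typeof c?.name !== "string" || !Array.isArray(c?.judges) || !Array.isArray(c?.gates)) {
    throw new Error("invalid suite config");
  }
  return c;
}

const config = parseSuiteConfig({
  name: "checkout-faq",
  judges: [{ model: "gpt-4o", metric: "faithfulness" }],
  gates: [{ metric: "faithfulness", minScore: 0.8 }],
});
```

With Zod, the same shape would be a `z.object(...)` whose `parse` both validates and narrows the type in one step.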

@reaatech/rag-eval-cost

pending npm
Tracks token consumption and enforces budget limits for RAG evaluations using a set of classes for cost accounting, model pricing lookups, and report generation. It provides utilities to record per-sample costs and export results in JSON or JUnit XML formats for CI integration.
status
awaiting publish
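The cost-accounting pattern the entry describes can be sketched as follows. The class name, method names, and pricing table are all assumptions made for illustration, not the package's actual API:

```typescript
// Illustrative cost accounting: per-sample token counts priced against a
// per-1K-token table, with a hard budget ceiling.
const PRICE_PER_1K_TOKENS: Record<string, { input: number; output: number }> = {
  "gpt-4o": { input: 0.0025, output: 0.01 },   // example figures only
};

class CostTracker {
  private totalUsd = 0;
  constructor(private budgetUsd: number) {}

  // Record one sample's usage; throw once the budget is exceeded.
  record(model: string, inputTokens: number, outputTokens: number): number {
    const price = PRICE_PER_1K_TOKENS[model];
    if (!price) throw new Error(`no pricing for ${model}`);
    const cost =
      (inputTokens / 1000) * price.input + (outputTokens / 1000) * price.output;
    this.totalUsd += cost;
    if (this.totalUsd > this.budgetUsd) {
      throw new Error(`budget of $${this.budgetUsd} exceeded`);
    }
    return cost;
  }

  total(): number {
    return this.totalUsd;
  }
}

const tracker = new CostTracker(1.0);
tracker.record("gpt-4o", 2000, 500); // 2000 input + 500 output tokens
```

Exporting `total()` alongside per-sample records is what makes the JSON/JUnit report generation the entry mentions straightforward to bolt on.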

@reaatech/rag-eval-dataset

pending npm
Manages RAG evaluation datasets by providing classes to load, validate, and version-track samples from JSON, JSONL, and YAML files. It relies on Zod for schema enforcement and integrates with `@reaatech/rag-eval-core` for sample definitions.
status
awaiting publish
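The per-line parse-and-validate pattern for JSONL datasets might look like this sketch. The sample fields (`id`, `question`, `expectedAnswer`, `contexts`) are assumptions, and a plain check stands in for the package's Zod enforcement:

```typescript
// Hypothetical JSONL loading: one JSON object per line, validated as it
// streams in so a bad line fails with its line number.
interface EvalSample {
  id: string;
  question: string;
  expectedAnswer: string;
  contexts: string[];
}

function loadJsonl(text: string): EvalSample[] {
  return text
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line, i) => {
      const sample = JSON.parse(line) as EvalSample;
      if (!sample.id || !sample.question) {
        throw new Error(`invalid sample on line ${i + 1}`);
      }
      return sample;
    });
}

const samples = loadJsonl(
  '{"id":"s1","question":"What is RAG?","expectedAnswer":"Retrieval-augmented generation.","contexts":["RAG combines retrieval with generation."]}\n'
);
```

Version-tracking, as the entry describes it, would layer on top of this: hash the validated samples so a changed dataset is detectable across runs.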

@reaatech/rag-eval-gate

pending npm
Enforces quality standards on RAG evaluation metrics using a `GateEngine` class that validates results against fixed thresholds or historical baselines. It provides CI-friendly output and configurable exit codes, typically paired with evaluation data structures from `@reaatech/rag-eval-core`.
status
awaiting publish
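A minimal stand-in for the fixed-threshold half of the `GateEngine` described above: compare metric results to thresholds and surface a CI-friendly exit code. The baseline-comparison mode is omitted, and all names besides the gate concept itself are assumptions:

```typescript
// Gate check: every rule must hold or the run fails with exit code 1.
interface GateRule {
  metric: string;
  minScore: number;
}

interface GateResult {
  passed: boolean;
  failures: string[];
  exitCode: number; // 0 on pass, 1 on failure, as CI expects
}

function runGates(scores: Record<string, number>, rules: GateRule[]): GateResult {
  const failures = rules
    .filter((r) => (scores[r.metric] ?? 0) < r.minScore)
    .map((r) => `${r.metric}: ${scores[r.metric] ?? 0} < ${r.minScore}`);
  return {
    passed: failures.length === 0,
    failures,
    exitCode: failures.length === 0 ? 0 : 1,
  };
}

const result = runGates(
  { faithfulness: 0.91, relevance: 0.72 },
  [
    { metric: "faithfulness", minScore: 0.8 },
    { metric: "relevance", minScore: 0.75 },
  ]
);
// relevance misses its threshold, so this gate fails
```

A baseline-driven gate would derive `minScore` from a stored historical run instead of a fixed config value; the comparison step stays the same.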

@reaatech/rag-eval-judge

pending npm
Evaluates RAG pipeline outputs using LLM-as-a-judge with support for multi-model consensus, provider fallbacks, and human-label calibration. It provides a `JudgeEngine` class that executes pre-defined prompt templates for metrics like faithfulness and relevance, returning structured scores and reasoning.
status
awaiting publish
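The multi-model consensus and provider-fallback behavior can be sketched with stubbed judges in place of real LLM calls. `JudgeEngine`'s actual interface is unpublished; only the consensus idea is shown, and every name here is an assumption:

```typescript
// Consensus over several judges: skip any judge whose provider errors,
// average the surviving scores, and concatenate the reasoning strings.
interface JudgeVerdict {
  score: number;      // 0..1
  reasoning: string;
}

type JudgeFn = (answer: string, context: string) => JudgeVerdict;

function consensus(judges: JudgeFn[], answer: string, context: string): JudgeVerdict {
  const verdicts: JudgeVerdict[] = [];
  for (const judge of judges) {
    try {
      verdicts.push(judge(answer, context));
    } catch {
      // provider failure: fall through to the next judge
    }
  }
  if (verdicts.length === 0) throw new Error("all judges failed");
  const score = verdicts.reduce((sum, v) => sum + v.score, 0) / verdicts.length;
  return { score, reasoning: verdicts.map((v) => v.reasoning).join("; ") };
}

const verdict = consensus(
  [
    () => { throw new Error("provider down"); },        // primary fails over
    () => ({ score: 0.8, reasoning: "grounded" }),
    () => ({ score: 0.6, reasoning: "partially grounded" }),
  ],
  "RAG retrieves documents before generating.",
  "RAG combines retrieval with generation."
);
```

Human-label calibration, as the entry describes it, would sit downstream of this: adjust the averaged score against a set of human-annotated reference verdicts.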

@reaatech/rag-eval-mcp-server

pending npm
Exposes RAG evaluation tools—including atomic judges, test suites, and regression gates—as an MCP server for integration with clients like Claude Desktop or Cursor. It provides a set of tool handler functions and server initialization utilities that rely on the `@modelcontextprotocol/sdk` to execute evaluation tasks via stdio.
status
awaiting publish
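The package wires its handlers into `@modelcontextprotocol/sdk` over stdio; the sketch below shows only the tool-handler shape, with a hand-rolled dispatch table standing in for the SDK's server plumbing. The tool name and its payload are invented for illustration:

```typescript
// Minimal tool dispatch: each named tool maps to a handler that takes a
// JSON argument bag and returns serialized content.
interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

type ToolHandler = (args: Record<string, unknown>) => { content: string };

const tools: Record<string, ToolHandler> = {
  // Hypothetical atomic-judge tool; a real handler would call the judge
  // engine rather than return a canned score.
  judge_faithfulness: (args) => ({
    content: JSON.stringify({ score: 0.9, answer: args.answer }),
  }),
};

function dispatch(call: ToolCall): { content: string } {
  const handler = tools[call.name];
  if (!handler) throw new Error(`unknown tool: ${call.name}`);
  return handler(call.arguments);
}

const response = dispatch({
  name: "judge_faithfulness",
  arguments: { answer: "RAG retrieves before generating." },
});
```

In the real server, the SDK handles stdio transport and tool registration; clients like Claude Desktop or Cursor only ever see the declared tool names and their JSON schemas.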

@reaatech/rag-eval-metrics

pending npm
Calculates heuristic-based RAG evaluation metrics including faithfulness, relevance, context precision, and context recall without requiring LLM API calls. It provides individual scorer classes and a `MetricsEngine` orchestrator for executing these evaluations in parallel.
status
awaiting publish
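One plausible heuristic behind "without requiring LLM API calls" is token overlap between the answer and the retrieved context as a cheap faithfulness proxy. The package's actual scorers are unpublished; this only illustrates the style of computation:

```typescript
// Tokenize into a set of lowercase alphanumeric words.
function tokenize(text: string): Set<string> {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

// Precision-like proxy: fraction of answer tokens found in the context.
function faithfulness(answer: string, context: string): number {
  const answerTokens = tokenize(answer);
  const contextTokens = tokenize(context);
  if (answerTokens.size === 0) return 0;
  let hits = 0;
  for (const t of answerTokens) if (contextTokens.has(t)) hits++;
  return hits / answerTokens.size;
}

const score = faithfulness(
  "RAG combines retrieval with generation",
  "RAG combines retrieval with generation to ground answers"
);
```

Context precision and recall invert the direction of the same overlap (context tokens found in the reference, and vice versa), which is what makes these metrics cheap enough to run in parallel on every sample.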

@reaatech/rag-eval-observability

pending npm
Provides structured logging via Pino and OpenTelemetry instrumentation for tracing and metrics specific to RAG evaluation workflows. It exports a set of wrapper functions for tracing evaluation runs, judge calls, and metric calculations, alongside a factory function for pre-configured loggers.
status
awaiting publish
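The wrapper-function pattern the entry describes can be sketched without Pino or OpenTelemetry: time a call, emit a structured record, and propagate failures. An in-memory log array stands in for the logger and span exporter:

```typescript
// Structured record standing in for a Pino log line / OTel span.
interface LogRecord {
  span: string;
  durationMs: number;
  ok: boolean;
}

const logs: LogRecord[] = [];

// Wrap any synchronous evaluation step in a timed, logged "span".
function traced<T>(span: string, fn: () => T): T {
  const start = Date.now();
  try {
    const result = fn();
    logs.push({ span, durationMs: Date.now() - start, ok: true });
    return result;
  } catch (err) {
    logs.push({ span, durationMs: Date.now() - start, ok: false });
    throw err;
  }
}

const score = traced("judge.faithfulness", () => 0.85);
```

The real package would presumably expose async variants and attach run/sample identifiers to each span, so one evaluation run traces end to end.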

@reaatech/rag-eval-suite

pending npm
Orchestrates RAG pipeline evaluations by combining metric computation, LLM-based judging, cost tracking, and quality gate enforcement. It provides an `EvaluationSuite` class that executes these tasks against datasets to generate aggregated performance reports and regression analysis.
status
awaiting publish
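The orchestration this entry describes can be sketched end to end: run a metric over every sample, aggregate, then apply a gate. `EvaluationSuite` is the only name the entry confirms; the constructor, method, and toy metric below are assumptions:

```typescript
interface Sample {
  answer: string;
  context: string;
}

// Hypothetical orchestrator: metric per sample, mean aggregate, one gate.
class EvaluationSuite {
  constructor(
    private metric: (answer: string, context: string) => number,
    private minMeanScore: number
  ) {}

  run(samples: Sample[]): { meanScore: number; passed: boolean } {
    const scores = samples.map((s) => this.metric(s.answer, s.context));
    const meanScore = scores.reduce((a, b) => a + b, 0) / scores.length;
    return { meanScore, passed: meanScore >= this.minMeanScore };
  }
}

// Toy metric: 1 if the answer appears verbatim in the context, else 0.
const suite = new EvaluationSuite(
  (answer, context) => (context.includes(answer) ? 1 : 0),
  0.5
);
const report = suite.run([
  { answer: "retrieval", context: "retrieval-augmented generation" },
  { answer: "wrong", context: "unrelated context" },
]);
```

The real suite would slot the judge engine, cost tracker, and gate engine from the sibling packages into this loop, which is why it sits at the top of the dependency graph.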