reaatech

rag-eval-pack · packages

Every package shipped from reaatech/rag-eval-pack, published or pending.

10 packages

@reaatech/rag-eval-cli

v0.1.0
A CLI, exposed as the `rag-eval-pack` command, for running RAG evaluation suites and quality gates, comparing runs, breaking down costs, generating Markdown reports, running LLM-based judging, and starting an MCP server. It also re-exports the full programmatic API of every `@reaatech/rag-eval-*` package, so it can be imported as a single library.
Status: published · Published 1 month ago

@reaatech/rag-eval-core

v0.1.0
Canonical TypeScript types and Zod schemas for RAG evaluation data shapes. Exports 18+ types (`EvaluationSample`, `EvalSuiteConfig`, `SampleEvalResult`, `GateConfig`, `JudgeConfig`, etc.) and two Zod schemas (`EvaluationSampleSchema`, `EvalSuiteConfigSchema`) for runtime validation, with zero runtime dependencies beyond `zod`.
Status: published · Published 1 month ago

@reaatech/rag-eval-cost

v0.1.0
Cost tracking, pricing, budgeting, and reporting infrastructure for RAG evaluations, providing `CostTracker`, `Pricing`, `BudgetManager`, and `CostReporter` classes that track per-sample token consumption, enforce budget limits with configurable alert thresholds, and generate cost reports in JSON and JUnit XML formats.
Status: published · Published 1 month ago
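
To make the budget-and-alert idea concrete, here is a minimal sketch of per-sample cost accumulation with an alert threshold. It does not use the package's API: `BudgetSketch`, `TokenUsage`, `ModelPrice`, and the prices shown are hypothetical names and numbers chosen for illustration, and the real `CostTracker` and `BudgetManager` classes may be shaped differently.

```typescript
// Hypothetical shapes for illustration; not the types exported by @reaatech/rag-eval-cost.
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

interface ModelPrice {
  inputPerMillion: number;  // USD per 1M input tokens (made-up figure below)
  outputPerMillion: number; // USD per 1M output tokens (made-up figure below)
}

type BudgetStatus = "ok" | "alert" | "exceeded";

class BudgetSketch {
  private spent = 0;
  private readonly limitUsd: number;
  private readonly alertFraction: number;
  private readonly price: ModelPrice;

  constructor(limitUsd: number, alertFraction: number, price: ModelPrice) {
    this.limitUsd = limitUsd;
    this.alertFraction = alertFraction;
    this.price = price;
  }

  // Add one sample's token usage to the running total and report the new status.
  record(usage: TokenUsage): BudgetStatus {
    this.spent +=
      (usage.inputTokens / 1_000_000) * this.price.inputPerMillion +
      (usage.outputTokens / 1_000_000) * this.price.outputPerMillion;
    return this.status();
  }

  // "exceeded" once spend passes the limit, "alert" once it reaches the alert fraction.
  status(): BudgetStatus {
    if (this.spent > this.limitUsd) return "exceeded";
    if (this.spent >= this.limitUsd * this.alertFraction) return "alert";
    return "ok";
  }

  get totalUsd(): number {
    return this.spent;
  }
}

// A 1 USD budget that alerts at 80% of the limit.
const budget = new BudgetSketch(1.0, 0.8, { inputPerMillion: 2, outputPerMillion: 8 });
const first = budget.record({ inputTokens: 100_000, outputTokens: 25_000 });
const second = budget.record({ inputTokens: 100_000, outputTokens: 30_000 });
const third = budget.record({ inputTokens: 100_000, outputTokens: 25_000 });
```

In this sketch the status is returned from every `record` call, so a caller can stop scheduling further samples as soon as it sees `"exceeded"`.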

@reaatech/rag-eval-dataset

v0.1.0
A Zod-validated dataset loader and validator for RAG evaluation samples, supporting JSONL, JSON, and YAML formats with duplicate detection, synthetic generation from templates, and version tracking. Exports `DatasetLoader`, `DatasetValidator`, and `loadEvalConfig`.
Status: published · Published 1 month ago
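
As a rough illustration of JSONL loading with duplicate detection, the sketch below parses one sample per line, rejects malformed records, and flags repeated ids. It is dependency-free and does not call the package: `SampleSketch`, its `id` and `query` fields, and `loadJsonl` are assumptions made for this example, and the published loader validates with Zod rather than hand-written checks.

```typescript
// Hypothetical sample shape; the real EvaluationSample fields are not listed on this page.
interface SampleSketch {
  id: string;
  query: string;
}

interface LoadResult {
  samples: SampleSketch[];
  errors: string[];
}

function loadJsonl(text: string): LoadResult {
  const samples: SampleSketch[] = [];
  const errors: string[] = [];
  const seen = new Set<string>();

  text.split("\n").forEach((line, index) => {
    if (line.trim() === "") return; // skip blank lines
    const lineNo = index + 1;

    let value: unknown;
    try {
      value = JSON.parse(line);
    } catch {
      errors.push(`line ${lineNo}: invalid JSON`);
      return;
    }

    // Minimal structural check standing in for schema validation.
    const record = value as Partial<SampleSketch> | null;
    if (!record || typeof record.id !== "string" || typeof record.query !== "string") {
      errors.push(`line ${lineNo}: expected string fields "id" and "query"`);
      return;
    }

    // Duplicate detection keyed on the sample id.
    if (seen.has(record.id)) {
      errors.push(`line ${lineNo}: duplicate id "${record.id}"`);
      return;
    }
    seen.add(record.id);
    samples.push({ id: record.id, query: record.query });
  });

  return { samples, errors };
}

const loaded = loadJsonl(
  [
    '{"id":"q1","query":"What is RAG?"}',
    '{"id":"q2","query":"What is a quality gate?"}',
    '{"id":"q1","query":"What is RAG?"}',
    "{not json}",
  ].join("\n"),
);
```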

@reaatech/rag-eval-gate

v0.1.0
A quality gate engine for RAG evaluation pipelines that enforces threshold-based metric checks and baseline regression detection, returning a `GateResult` object with pass/fail status and per-gate failure messages. It pairs with `@reaatech/rag-eval-core` for evaluation result types and is designed for CI/CD integration with formatted output and configurable exit codes.
Status: published · Published 1 month ago
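
The two checks described above, a minimum threshold per metric and a maximum allowed drop against a baseline, can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: `ThresholdGate`, `GateOutcome`, and `evaluateGates` are hypothetical names, and the real `GateConfig` and `GateResult` types may carry different fields.

```typescript
// Hypothetical shapes for illustration; not the types exported by @reaatech/rag-eval-gate.
interface ThresholdGate {
  metric: string;
  min: number;            // minimum acceptable score
  maxRegression?: number; // largest allowed drop versus the baseline score
}

interface GateOutcome {
  passed: boolean;
  failures: string[];
}

function evaluateGates(
  gates: ThresholdGate[],
  scores: Record<string, number>,
  baseline: Record<string, number> = {},
): GateOutcome {
  const failures: string[] = [];

  for (const gate of gates) {
    const score = scores[gate.metric];
    if (score === undefined) {
      failures.push(`${gate.metric}: no score reported`);
      continue;
    }

    // Threshold check: the score must reach the configured minimum.
    if (score < gate.min) {
      failures.push(`${gate.metric}: ${score} is below threshold ${gate.min}`);
    }

    // Regression check: the score must not fall too far below the baseline.
    const previous = baseline[gate.metric];
    if (gate.maxRegression !== undefined && previous !== undefined) {
      const drop = previous - score;
      if (drop > gate.maxRegression) {
        failures.push(`${gate.metric}: regressed by ${drop.toFixed(3)} from baseline ${previous}`);
      }
    }
  }

  return { passed: failures.length === 0, failures };
}

const outcome = evaluateGates(
  [
    { metric: "faithfulness", min: 0.8, maxRegression: 0.05 },
    { metric: "relevance", min: 0.7 },
  ],
  { faithfulness: 0.82, relevance: 0.65 },
  { faithfulness: 0.9 },
);
```

Here `faithfulness` clears its threshold but fails the regression check, and `relevance` fails its threshold, so `outcome.passed` is `false` with two failure messages. A CI wrapper would typically map that flag to a non-zero exit code.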

@reaatech/rag-eval-judge

v0.1.0
A TypeScript class (`JudgeEngine`) that uses an LLM from Anthropic, OpenAI, or Google to score RAG outputs on metrics such as faithfulness and relevance, with optional consensus voting across multiple models and calibration against human labels.
Status: published · Published 1 month ago
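
One plausible way to combine scores from several judge models is to take the median and report how many judges agree with the resulting verdict. The sketch below assumes that approach; the listing does not say which voting rule `JudgeEngine` uses, and `JudgeVote`, `Consensus`, `consensus`, and the model names are hypothetical. No LLM is called here: the votes are hard-coded.

```typescript
// Hypothetical vote shape; JudgeEngine's real output is not described on this page.
interface JudgeVote {
  model: string;
  score: number; // assumed to lie in [0, 1]
}

interface Consensus {
  score: number;     // median of the judges' scores
  agreement: number; // fraction of judges on the same side of the threshold as the verdict
  verdict: "pass" | "fail";
}

function consensus(votes: JudgeVote[], passThreshold = 0.5): Consensus {
  if (votes.length === 0) throw new Error("at least one vote is required");

  // Median is used instead of the mean so one outlier judge cannot dominate.
  const sorted = votes.map((v) => v.score).sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const median =
    sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;

  const verdict = median >= passThreshold ? "pass" : "fail";
  const agreeing = votes.filter(
    (v) => (v.score >= passThreshold) === (verdict === "pass"),
  ).length;

  return { score: median, agreement: agreeing / votes.length, verdict };
}

const result = consensus([
  { model: "judge-a", score: 0.9 },
  { model: "judge-b", score: 0.7 },
  { model: "judge-c", score: 0.3 },
]);
```

With these three votes the median is 0.7, the verdict is `"pass"`, and two of the three judges agree with it.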

@reaatech/rag-eval-mcp-server

v0.1.0
An MCP server that exposes RAG evaluation tools as a three-layer API of atomic judge operations, orchestrated suite runs, and CI-style regression gates, providing `createMcpServer()` and `startMcpServer()` functions for integration with MCP clients like Claude Desktop or Cursor.
Status: published · Published 1 month ago

@reaatech/rag-eval-metrics

v0.1.0
Provides four heuristic metric scorers (faithfulness, relevance, context precision, context recall) for evaluating RAG outputs, plus a `MetricsEngine` orchestrator that runs them in parallel with configurable concurrency. Each scorer is a class with a `score` method that returns a numeric score and supporting details, using only NLP libraries (`compromise`, `natural`) with no LLM calls.
Status: published · Published 1 month ago

@reaatech/rag-eval-observability

v0.1.0
Provides structured JSON logging via Pino, plus OpenTelemetry tracing and metrics, for RAG evaluation pipelines. Exports functions such as `createLogger`, `traceEvalRun`, and `recordEvalRun`.
Status: published · Published 1 month ago

@reaatech/rag-eval-suite

v0.1.0
A class (`EvaluationSuite`) that orchestrates RAG evaluation runs by executing heuristic metrics, optional LLM judge scoring, cost tracking, and quality gates against a dataset, returning a `SuiteRunResult` with aggregated metrics and gate pass/fail status.
Status: published · Published 1 month ago