rag-eval-pack · packages
Every package shipped from reaatech/rag-eval-pack, published or pending.
10 packages
@reaatech/rag-eval-cli
A CLI, exposed as the `rag-eval-pack` command, that runs RAG evaluation suites and quality gates, compares runs, breaks down costs, generates Markdown reports, performs LLM-based judging, and starts an MCP server. It also re-exports the full programmatic API of every `@reaatech/rag-eval-*` package as a single importable library.
- status: published
- published: 1 month ago
@reaatech/rag-eval-core
Canonical TypeScript types and Zod schemas for RAG evaluation data shapes. Exports 18+ types (`EvaluationSample`, `EvalSuiteConfig`, `SampleEvalResult`, `GateConfig`, `JudgeConfig`, etc.) and two Zod schemas (`EvaluationSampleSchema`, `EvalSuiteConfigSchema`) for runtime validation, with zero runtime dependencies beyond `zod`.
- status: published
- published: 1 month ago
@reaatech/rag-eval-cost
Cost tracking, pricing, budgeting, and reporting infrastructure for RAG evaluations, providing `CostTracker`, `Pricing`, `BudgetManager`, and `CostReporter` classes that track per-sample token consumption, enforce budget limits with configurable alert thresholds, and generate cost reports in JSON and JUnit XML formats.
- status: published
- published: 1 month ago
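To make the budget-enforcement idea in the `@reaatech/rag-eval-cost` description concrete, here is a minimal sketch of per-sample cost tracking with alert thresholds. It does not use the package's actual `CostTracker` or `BudgetManager` API; `BudgetSketch`, `ModelPriceSketch`, the prices, and the threshold values are all hypothetical names and numbers chosen for illustration.

```typescript
// Hypothetical price table entry (USD per 1,000 tokens); not real pricing.
interface ModelPriceSketch {
  inputPer1k: number;
  outputPer1k: number;
}

interface RecordOutcome {
  cost: number;        // cost of this sample
  alerts: number[];    // thresholds crossed for the first time by this sample
  exceeded: boolean;   // true once total spend is above the limit
}

// Illustrative stand-in for a budget manager; the real class may differ.
class BudgetSketch {
  private spent = 0;
  private readonly fired = new Set<number>();
  private readonly limit: number;
  private readonly alertThresholds: number[];

  constructor(limit: number, alertThresholds: number[] = [0.5, 0.8]) {
    this.limit = limit;
    this.alertThresholds = alertThresholds;
  }

  record(price: ModelPriceSketch, inputTokens: number, outputTokens: number): RecordOutcome {
    const cost =
      (inputTokens / 1000) * price.inputPer1k +
      (outputTokens / 1000) * price.outputPer1k;
    this.spent += cost;

    // Fire each alert threshold at most once, when the spend ratio reaches it.
    const ratio = this.spent / this.limit;
    const alerts: number[] = [];
    for (const threshold of this.alertThresholds) {
      if (ratio >= threshold && !this.fired.has(threshold)) {
        this.fired.add(threshold);
        alerts.push(threshold);
      }
    }
    return { cost, alerts, exceeded: this.spent > this.limit };
  }

  get total(): number {
    return this.spent;
  }
}

const price: ModelPriceSketch = { inputPer1k: 0.5, outputPer1k: 1.0 };
const budget = new BudgetSketch(1.0);
const first = budget.record(price, 1000, 0);  // spend reaches 50% of the limit
const second = budget.record(price, 0, 500);  // spend reaches 100% of the limit
const third = budget.record(price, 1000, 0);  // spend goes over the limit
```

In this sketch the budget is treated as exceeded only when spend goes strictly above the limit; whether the package treats the boundary the same way is not stated in the description.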
@reaatech/rag-eval-dataset
A Zod-validated dataset loader and validator for RAG evaluation samples, supporting JSONL, JSON, and YAML formats with duplicate detection, synthetic generation from templates, and version tracking. Exports `DatasetLoader`, `DatasetValidator`, and `loadEvalConfig`.
- status: published
- published: 1 month ago
@reaatech/rag-eval-gate
A quality gate engine for RAG evaluation pipelines that enforces threshold-based metric checks and baseline regression detection, returning a `GateResult` object with pass/fail status and per-gate failure messages. It pairs with `@reaatech/rag-eval-core` for evaluation result types and is designed for CI/CD integration with formatted output and configurable exit codes.
- status: published
- published: 1 month ago
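The two checks named in the `@reaatech/rag-eval-gate` description, threshold enforcement and baseline regression detection, can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: `GateSketch`, `GateResultSketch`, and `evaluateGates` are hypothetical names, and the real `GateConfig` and `GateResult` shapes may differ.

```typescript
// Hypothetical gate definition; the real GateConfig type may differ.
interface GateSketch {
  metric: string;
  minScore?: number;       // absolute threshold the score must meet
  maxRegression?: number;  // largest allowed drop versus the baseline
}

// Hypothetical result shape modelled on the description of GateResult.
interface GateResultSketch {
  passed: boolean;
  failures: string[];
}

type MetricScores = Partial<Record<string, number>>;

function evaluateGates(
  gates: GateSketch[],
  current: MetricScores,
  baseline: MetricScores = {},
): GateResultSketch {
  const failures: string[] = [];
  for (const gate of gates) {
    const score = current[gate.metric];
    if (score === undefined) {
      failures.push(`${gate.metric}: no score reported`);
      continue;
    }
    // Threshold check: the score must not fall below the configured minimum.
    if (gate.minScore !== undefined && score < gate.minScore) {
      failures.push(
        `${gate.metric}: ${score.toFixed(2)} is below threshold ${gate.minScore.toFixed(2)}`,
      );
    }
    // Regression check: the score must not drop too far below the baseline.
    const base = baseline[gate.metric];
    if (
      gate.maxRegression !== undefined &&
      base !== undefined &&
      base - score > gate.maxRegression
    ) {
      failures.push(
        `${gate.metric}: dropped ${(base - score).toFixed(2)} from baseline ${base.toFixed(2)}`,
      );
    }
  }
  return { passed: failures.length === 0, failures };
}

const gateResult = evaluateGates(
  [
    { metric: "faithfulness", minScore: 0.8 },
    { metric: "relevance", maxRegression: 0.05 },
  ],
  { faithfulness: 0.75, relevance: 0.9 },
  { relevance: 0.92 },
);

// In CI, a failing result would typically map to a non-zero exit code.
const exitCode = gateResult.passed ? 0 : 1;
```

Here the faithfulness gate fails its threshold while the relevance gate stays within its allowed regression, so the run fails with a single failure message.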
@reaatech/rag-eval-judge
A TypeScript class (`JudgeEngine`) that uses an LLM (Anthropic, OpenAI, or Google) to score RAG outputs on metrics like faithfulness and relevance, with optional consensus voting across multiple models and calibration against human labels.
- status: published
- published: 1 month ago
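Consensus voting across multiple judge models, as mentioned for `@reaatech/rag-eval-judge`, could be aggregated roughly like this. The description does not say which aggregation rule `JudgeEngine` uses, so the median plus majority vote below is an assumption, and `JudgeVoteSketch`, `consensus`, and the model names are hypothetical.

```typescript
// Hypothetical vote shape: one score in [0, 1] per judge model.
interface JudgeVoteSketch {
  model: string;
  score: number;
}

interface ConsensusSketch {
  median: number;         // robust central score across judges
  majorityPass: boolean;  // true if more than half the judges pass the sample
  spread: number;         // max minus min, a rough disagreement signal
}

function consensus(votes: JudgeVoteSketch[], passThreshold = 0.5): ConsensusSketch {
  if (votes.length === 0) {
    throw new Error("consensus needs at least one vote");
  }
  const sorted = votes.map((v) => v.score).sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  // Odd count: middle element. Even count: mean of the two middle elements.
  const median =
    sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  const passVotes = votes.filter((v) => v.score >= passThreshold).length;
  return {
    median,
    majorityPass: passVotes * 2 > votes.length,
    spread: sorted[sorted.length - 1] - sorted[0],
  };
}

// Placeholder model names; real runs would use provider model identifiers.
const verdict = consensus([
  { model: "model-a", score: 0.9 },
  { model: "model-b", score: 0.7 },
  { model: "model-c", score: 0.4 },
]);
```

A median is used here because it is less sensitive than a mean to a single outlying judge; a large `spread` could flag samples worth checking against human labels.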
@reaatech/rag-eval-mcp-server
An MCP server that exposes RAG evaluation tools as a three-layer API of atomic judge operations, orchestrated suite runs, and CI-style regression gates, providing `createMcpServer()` and `startMcpServer()` functions for integration with MCP clients like Claude Desktop or Cursor.
- status: published
- published: 1 month ago
@reaatech/rag-eval-metrics
Provides four heuristic metric scorers (faithfulness, relevance, context precision, context recall) for evaluating RAG outputs, plus a `MetricsEngine` orchestrator that runs them in parallel with configurable concurrency. Each scorer is a class with a `score` method that returns a numeric score and supporting details, using only NLP libraries (`compromise`, `natural`) with no LLM calls.
- status: published
- published: 1 month ago
@reaatech/rag-eval-observability
Provides structured JSON logging via Pino, OpenTelemetry tracing, and OpenTelemetry metrics specifically for RAG evaluation pipelines, exporting functions like `createLogger`, `traceEvalRun`, and `recordEvalRun`.
- status: published
- published: 1 month ago
@reaatech/rag-eval-suite
A class (`EvaluationSuite`) that orchestrates RAG evaluation runs by executing heuristic metrics, optional LLM judge scoring, cost tracking, and quality gates against a dataset, returning a `SuiteRunResult` with aggregated metrics and gate pass/fail status.
- status: published
- published: 1 month ago