
# @reaatech/rag-eval-core


> **Status: Pre-1.0** — APIs may change in minor versions. Pin to a specific version in production.

Canonical TypeScript types and Zod schemas for RAG (Retrieval-Augmented Generation) evaluation. This package is the single source of truth for all evaluation shapes used throughout the @reaatech/rag-eval-* ecosystem.

## Installation

```bash
npm install @reaatech/rag-eval-core
# or
pnpm add @reaatech/rag-eval-core
```

## Feature Overview

- **18+ exported types**: `EvaluationSample`, `EvalSuiteConfig`, `SampleEvalResult`, `EvalResults`, `GateConfig`, `JudgeConfig`, and more
- **2 Zod schemas**: `EvaluationSampleSchema` and `EvalSuiteConfigSchema` for runtime validation
- **Full cost accounting types**: `CostBreakdown`, `TokenUsage`, `PricingConfig` for per-evaluation cost tracking
- **Gate configuration types**: `ThresholdGateConfig`, `BaselineGateConfig` with full operator support
- **Zero runtime dependencies beyond `zod`** — lightweight and tree-shakeable
- **Dual ESM/CJS output** — works with both `import` and `require`

## Quick Start

```typescript
import {
  EvaluationSampleSchema,
  EvalSuiteConfigSchema,
  type EvaluationSample,
  type EvalSuiteConfig,
} from "@reaatech/rag-eval-core";

// Validate an evaluation sample at the boundary
const rawSample = JSON.parse(incomingJson);
const sample: EvaluationSample = EvaluationSampleSchema.parse(rawSample);

// Validate a suite configuration
const rawConfig = JSON.parse(configJson);
const config: EvalSuiteConfig = EvalSuiteConfigSchema.parse(rawConfig);
```

## Exports

### Core Types

| Export | Description |
| --- | --- |
| `EvaluationSample` | Input sample: `query`, `context`, `ground_truth`, `generated_answer`, plus optional `retrieved_chunk_ids` and `metadata` |
| `EvalSuiteConfig` | Full evaluation suite configuration: metrics, judge, cost, execution, gates |
| `SampleEvalResult` | Per-sample result with heuristic and judge scores for each metric |
| `EvalResults` | Aggregated results: `run_id`, `metrics`, `samples`, `total_cost`, `cost_breakdown`, `duration_ms` |
| `AggregatedMetrics` | Aggregated metric scores: `overall_score`, `avg_faithfulness`, `avg_relevance`, mean and std dev per metric |
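As a concrete illustration of the input shape, here is a sample object built from the field names in the Core Types table. The inline interface is a sketch for this example only (the real type is exported by the package), and whether `context` is a single string or a list is an assumption here:

```typescript
// Illustrative shape only — field names follow the Core Types table;
// `context: string[]` is an assumption, not the package's definition.
interface EvaluationSampleShape {
  query: string;
  context: string[];
  ground_truth: string;
  generated_answer: string;
  retrieved_chunk_ids?: string[];
  metadata?: Record<string, unknown>;
}

const sample: EvaluationSampleShape = {
  query: "What is the refund window?",
  context: ["Refunds are accepted within 30 days of purchase."],
  ground_truth: "30 days",
  generated_answer: "You can request a refund within 30 days.",
  retrieved_chunk_ids: ["chunk-42"],
};
```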

### Judge Types

| Export | Description |
| --- | --- |
| `JudgeConfig` | Judge configuration: model, enabled, consensus, calibration, cost |
| `ConsensusConfig` | Multi-model consensus: enabled, models with weights, voting_strategy, tie_breaker |
| `CalibrationConfig` | Calibration settings: enabled, human_labels path, calibration_method |
| `JudgeMetric` | Metric union: `faithfulness \| relevance \| context_precision \| context_recall \| overall` |
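A rough sketch of what a judge configuration with multi-model consensus might look like. The keys follow the Judge Types table above, but the model names, weights, and the `voting_strategy`/`tie_breaker` values are invented for illustration; consult the package's exported types for the exact fields:

```typescript
// Hypothetical values — only the key names come from the table above.
const judgeConfig = {
  enabled: true,
  model: "gpt-4o-mini",
  consensus: {
    enabled: true,
    models: [
      { model: "gpt-4o-mini", weight: 0.5 },
      { model: "claude-3-5-haiku", weight: 0.5 },
    ],
    voting_strategy: "weighted-average",
    tie_breaker: "gpt-4o-mini",
  },
  calibration: {
    enabled: false,
  },
};
```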

### Gate Types

| Export | Description |
| --- | --- |
| `GateConfig` | Union of `ThresholdGateConfig \| BaselineGateConfig` |
| `ThresholdGateConfig` | `{ name, type: "threshold", metric, operator, threshold }` |
| `BaselineGateConfig` | `{ name, type: "baseline-comparison", metric, allow_regression }` |
| `GateResult` | Gate evaluation result: `passed`, `gates[]`, `failures[]` |
| `GateEvalResult` | Per-gate result: `passed`, `gate_name`, `actual_value`, `message` |
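To make the threshold-gate shape concrete, here is a sketch of a gate and how a runner might evaluate it. `evaluateThresholdGate` and the `Operator` union are hypothetical helpers for this example, not package exports; the config fields come from the table above:

```typescript
// Sketch only — the field names follow ThresholdGateConfig above, but the
// operator set and the evaluator function are assumptions for illustration.
type Operator = ">=" | ">" | "<=" | "<";

interface ThresholdGateShape {
  name: string;
  type: "threshold";
  metric: string;
  operator: Operator;
  threshold: number;
}

function evaluateThresholdGate(gate: ThresholdGateShape, actual: number): boolean {
  switch (gate.operator) {
    case ">=": return actual >= gate.threshold;
    case ">":  return actual > gate.threshold;
    case "<=": return actual <= gate.threshold;
    case "<":  return actual < gate.threshold;
  }
}

const faithfulnessGate: ThresholdGateShape = {
  name: "min-faithfulness",
  type: "threshold",
  metric: "faithfulness",
  operator: ">=",
  threshold: 0.8,
};
```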

### Cost Types

| Export | Description |
| --- | --- |
| `CostBreakdown` | `{ total, by_metric, by_provider, per_sample }` |
| `TokenUsage` | `{ input, output, total }` |
| `PricingConfig` | Per-model pricing: `{ model, provider, inputCostPerMillion, outputCostPerMillion }` |
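The `inputCostPerMillion` / `outputCostPerMillion` field names suggest per-million-token pricing. A sketch of the arithmetic under that assumption (`estimateCost` is a hypothetical helper, and the pricing figures are made up, not real provider rates):

```typescript
// Sketch — shapes mirror TokenUsage and PricingConfig from the table above;
// the per-million-token formula is an assumption based on the field names.
interface TokenUsageShape { input: number; output: number; total: number; }
interface PricingShape {
  model: string;
  provider: string;
  inputCostPerMillion: number;
  outputCostPerMillion: number;
}

function estimateCost(usage: TokenUsageShape, pricing: PricingShape): number {
  return (usage.input / 1_000_000) * pricing.inputCostPerMillion
       + (usage.output / 1_000_000) * pricing.outputCostPerMillion;
}

const usage: TokenUsageShape = { input: 12_000, output: 800, total: 12_800 };
const pricing: PricingShape = {
  model: "gpt-4o-mini",            // hypothetical model
  provider: "openai",
  inputCostPerMillion: 0.15,       // made-up rates for illustration
  outputCostPerMillion: 0.6,
};
```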

### Schemas

| Export | Description |
| --- | --- |
| `EvaluationSampleSchema` | Zod schema for validating evaluation samples (query, context, ground_truth, generated_answer) |
| `EvalSuiteConfigSchema` | Zod schema for validating suite configuration (metrics, judge, cost, gates, execution) |

## Usage Pattern

Every schema export has a matching type export. Use the schema for runtime validation and the type for compile-time checking:

```typescript
import { EvaluationSampleSchema, type EvaluationSample } from "@reaatech/rag-eval-core";

function processSample(raw: unknown): EvaluationSample {
  // Parse at the boundary — throws ZodError on invalid data
  return EvaluationSampleSchema.parse(raw);
}
```
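When throwing at the boundary is undesirable, Zod schemas also offer the non-throwing `safeParse`, which returns a `{ success: true, data }` / `{ success: false, error }` union instead of raising `ZodError`. The sketch below shows that result-handling pattern without importing the package: `safeProcessSample` is a hypothetical stand-in whose minimal hand-rolled check takes the place of `EvaluationSampleSchema.safeParse`:

```typescript
// Stand-in for the safeParse pattern — the result union mirrors Zod's, but
// the validation logic here is a toy check, not the package's schema.
type SafeResult<T> = { success: true; data: T } | { success: false; error: string };

function safeProcessSample(raw: unknown): SafeResult<{ query: string }> {
  if (typeof raw === "object" && raw !== null && typeof (raw as { query?: unknown }).query === "string") {
    return { success: true, data: raw as { query: string } };
  }
  return { success: false, error: "not a valid EvaluationSample" };
}
```

Branching on `success` narrows the result type, so invalid data is handled explicitly rather than via exception control flow.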

## License

MIT