@reaatech/rag-eval-dataset

Manages RAG evaluation datasets by providing classes to load, validate, and version-track samples from JSON, JSONL, and YAML files. It relies on Zod for schema enforcement and integrates with @reaatech/rag-eval-core for sample definitions.

Status: Pre-1.0 — APIs may change in minor versions. Pin to a specific version in production.

Dataset management utilities for RAG evaluation. Loads evaluation samples from JSONL, JSON, and YAML files; validates samples against Zod schemas; generates synthetic datasets from templates; and tracks dataset versioning.

Installation

terminal
npm install @reaatech/rag-eval-dataset
# or
pnpm add @reaatech/rag-eval-dataset

Feature Overview

  • Multi-format loading — read evaluation samples from JSONL, JSON array, and YAML files
  • Schema validation — validate every sample against EvaluationSampleSchema with detailed error reporting
  • Duplicate detection — identify duplicate queries and context sets across samples
  • Synthetic generation — generate evaluation datasets from templates with configurable difficulty
  • Version tracking — maintain dataset changelogs and version identifiers
  • Config loading — load evaluation suite configurations from YAML or JSON files

Quick Start

typescript
import { DatasetLoader, DatasetValidator } from "@reaatech/rag-eval-dataset";
 
const loader = new DatasetLoader();
 
// Load samples from any supported format
const samples = await loader.load("datasets/eval-samples.jsonl");
console.log(`Loaded ${samples.length} samples`);
 
// Validate the dataset
const validator = new DatasetValidator();
const result = validator.validate(samples);
 
if (!result.valid) {
  for (const error of result.errors) {
    console.error(`[${error.field}] ${error.message}`);
  }
}

API Reference

DatasetLoader

Loads evaluation datasets from files or strings.

typescript
import { DatasetLoader } from "@reaatech/rag-eval-dataset";
 
const loader = new DatasetLoader();

Loading Methods

| Method | Returns | Description |
| --- | --- | --- |
| load(path: string) | Promise<EvaluationSample[]> | Auto-detect format from file extension and load |
| loadFromString(content, format) | Promise<EvaluationSample[]> | Parse content string in specified format ("jsonl" \| "json") |
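
The extension-based auto-detection that load performs can be pictured with a small helper. This is a minimal sketch, not the package's implementation: detectFormat and DatasetFormat are hypothetical names, and the extension mapping is taken from the Supported Formats table.

```typescript
// Hypothetical helper for illustration; not part of the documented API.
type DatasetFormat = "jsonl" | "json" | "yaml";

function detectFormat(path: string): DatasetFormat {
  // Use the text after the last dot, compared case-insensitively.
  const dot = path.lastIndexOf(".");
  const ext = dot === -1 ? "" : path.slice(dot + 1).toLowerCase();
  switch (ext) {
    case "jsonl":
      return "jsonl";
    case "json":
      return "json";
    case "yaml":
    case "yml":
      return "yaml";
    default:
      // Assumption: an unknown extension is treated as an error.
      throw new Error(`Unsupported dataset extension: "${ext}"`);
  }
}

const detected = detectFormat("datasets/eval-samples.jsonl");
```

Rejecting unknown extensions early is a design choice made for this sketch; the real loader may behave differently.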

Supported Formats

| Format | Extension | Structure |
| --- | --- | --- |
| JSONL | .jsonl | One JSON object per line |
| JSON | .json | Array of sample objects |
| YAML | .yaml, .yml | Array of sample objects |

Each sample is validated against EvaluationSampleSchema from @reaatech/rag-eval-core. Invalid lines in JSONL files are skipped with a warning.
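
The skip-with-a-warning behavior for JSONL can be sketched as follows. This is an illustration under stated assumptions, not the loader's actual code: SampleSketch is a simplified stand-in for EvaluationSample (the field types are guesses), a hand-written type guard replaces the Zod schema so the example has no dependencies, and parseJsonlLenient is a hypothetical name.

```typescript
// Simplified stand-in for EvaluationSample; the field types are assumptions.
interface SampleSketch {
  query: string;
  context: string[];
  ground_truth: string;
  generated_answer: string;
}

// Hand-written guard used here in place of EvaluationSampleSchema.
function isSampleSketch(value: unknown): value is SampleSketch {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const context = v.context;
  return (
    typeof v.query === "string" &&
    Array.isArray(context) &&
    context.every((chunk) => typeof chunk === "string") &&
    typeof v.ground_truth === "string" &&
    typeof v.generated_answer === "string"
  );
}

// Hypothetical parser: keeps valid lines and records a warning for each skipped line.
function parseJsonlLenient(content: string): { samples: SampleSketch[]; warnings: string[] } {
  const samples: SampleSketch[] = [];
  const warnings: string[] = [];
  content.split(/\r?\n/).forEach((line, index) => {
    if (line.trim() === "") return; // blank lines carry no sample
    try {
      const parsed: unknown = JSON.parse(line);
      if (isSampleSketch(parsed)) {
        samples.push(parsed);
      } else {
        warnings.push(`line ${index + 1}: does not match the sample shape`);
      }
    } catch {
      warnings.push(`line ${index + 1}: invalid JSON`);
    }
  });
  return { samples, warnings };
}
```

Returning the warnings alongside the samples, rather than logging them, lets the caller decide whether skipped lines should fail a run.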

Config Loading

typescript
import { loadEvalConfig } from "@reaatech/rag-eval-dataset";
 
const config = await loadEvalConfig("eval-config.yaml");
// → EvalSuiteConfig with metrics, judge, cost, gates, execution

| Export | Description |
| --- | --- |
| loadEvalConfig(path) | Load and validate an EvalSuiteConfig from YAML or JSON |

DatasetValidator

Validates datasets for structural correctness and quality issues.

typescript
import { DatasetValidator } from "@reaatech/rag-eval-dataset";
 
const validator = new DatasetValidator();
const result = validator.validate(samples);

ValidationResult

| Property | Type | Description |
| --- | --- | --- |
| valid | boolean | Whether the dataset passed all checks |
| errors | ValidationError[] | Errors found (empty if valid) |
| warnings | ValidationWarning[] | Non-blocking warnings |

ValidationError

| Property | Type | Description |
| --- | --- | --- |
| field | string | Field name or sample index |
| message | string | Human-readable error description |
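
When a dataset produces many errors, grouping them by field can make a report easier to scan. The sketch below mirrors the ValidationError shape from the table above in a local interface; groupErrorsByField is a hypothetical helper and is not exported by the package.

```typescript
// Local mirror of the documented ValidationError shape.
interface ValidationError {
  field: string;
  message: string;
}

// Hypothetical helper: collect messages under the field (or sample index) they refer to.
function groupErrorsByField(errors: ValidationError[]): Map<string, string[]> {
  const grouped = new Map<string, string[]>();
  for (const { field, message } of errors) {
    const messages = grouped.get(field) ?? [];
    messages.push(message);
    grouped.set(field, messages);
  }
  return grouped;
}
```

In practice the input would be result.errors from validator.validate(samples).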

Validations Performed

  • Schema compliance — every sample matches EvaluationSampleSchema
  • Required fields — query, context, ground_truth, generated_answer
  • Non-empty context — context arrays must contain at least one chunk
  • Non-empty dataset — dataset must contain at least one sample
  • Duplicate detection — identical queries with matching context trigger a warning
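
The duplicate check in the list above can be illustrated with a short function. This is one possible approach, not the validator's actual logic: findDuplicateSamples and QueryContextSample are hypothetical names, context chunks are assumed to be strings, and two samples count as duplicates only when the query and the ordered context array both match exactly.

```typescript
// Only the fields relevant to duplicate detection; chunk type is an assumption.
interface QueryContextSample {
  query: string;
  context: string[];
}

// Hypothetical helper: returns [firstIndex, duplicateIndex] pairs.
function findDuplicateSamples(samples: QueryContextSample[]): Array<[number, number]> {
  const firstSeen = new Map<string, number>();
  const duplicates: Array<[number, number]> = [];
  samples.forEach((sample, index) => {
    // JSON.stringify yields an unambiguous key; chunk order is treated as significant.
    const key = JSON.stringify([sample.query, sample.context]);
    const original = firstSeen.get(key);
    if (original === undefined) {
      firstSeen.set(key, index);
    } else {
      duplicates.push([original, index]);
    }
  });
  return duplicates;
}
```

A stricter or looser notion of "matching context" (for example, ignoring chunk order or whitespace) would only change how the key is built.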

DatasetGenerator

Generates synthetic evaluation datasets from templates.

typescript
import { DatasetGenerator } from "@reaatech/rag-eval-dataset";
 
const generator = new DatasetGenerator();
const samples = generator.generate({
  templates: myTemplates, // a DatasetTemplate[] defined elsewhere
  count: 100,
  difficulty: "medium",
  domain: "customer-support",
});

GeneratorConfig

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| templates | DatasetTemplate[] | (required) | Templates for sample generation |
| count | number | 10 | Number of samples to generate |
| difficulty | "easy" \| "medium" \| "hard" | "medium" | Difficulty level |
| domain | string | | Domain label for metadata |
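
The defaults in the table can be read as the following resolution step. This is a sketch of the documented defaults only: GeneratorConfigSketch and resolveGeneratorConfig are hypothetical names, and the template type is left generic because the shape of DatasetTemplate is not described here.

```typescript
type Difficulty = "easy" | "medium" | "hard";

// Local stand-in for GeneratorConfig; T stands in for DatasetTemplate.
interface GeneratorConfigSketch<T> {
  templates: T[];
  count?: number;
  difficulty?: Difficulty;
  domain?: string;
}

interface ResolvedGeneratorConfig<T> {
  templates: T[];
  count: number;
  difficulty: Difficulty;
  domain: string | undefined;
}

// Hypothetical helper: fill in the documented defaults (count 10, difficulty "medium").
function resolveGeneratorConfig<T>(config: GeneratorConfigSketch<T>): ResolvedGeneratorConfig<T> {
  return {
    templates: config.templates,
    count: config.count ?? 10,
    difficulty: config.difficulty ?? "medium",
    domain: config.domain, // no default is documented, so it stays undefined when omitted
  };
}
```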

DatasetVersioning

Tracks dataset version history and changelogs.

typescript
import { DatasetVersioning } from "@reaatech/rag-eval-dataset";
 
const versioning = new DatasetVersioning();
 
// Record a new version
versioning.addVersion({
  version: "v1.1.0",
  description: "Added 50 new e-commerce samples",
  timestamp: new Date().toISOString(),
});
 
// Get version history
const history = versioning.getHistory();
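
A common follow-up is picking the most recent entry from the history. The sketch below assumes that getHistory() returns an array of entries with the same shape passed to addVersion, which this README does not confirm; VersionEntrySketch and latestVersion are hypothetical names.

```typescript
// Assumed entry shape, copied from the addVersion call above.
interface VersionEntrySketch {
  version: string;
  description: string;
  timestamp: string; // ISO 8601 string, e.g. from new Date().toISOString()
}

// Hypothetical helper: return the entry with the newest timestamp, if any.
function latestVersion(history: VersionEntrySketch[]): VersionEntrySketch | undefined {
  let latest: VersionEntrySketch | undefined;
  for (const entry of history) {
    if (latest === undefined || Date.parse(entry.timestamp) > Date.parse(latest.timestamp)) {
      latest = entry;
    }
  }
  return latest;
}
```

Comparing parsed timestamps rather than array positions keeps the result correct even if entries were recorded out of order.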

Usage Patterns

Loading and Validating a Dataset

typescript
import { DatasetLoader, DatasetValidator } from "@reaatech/rag-eval-dataset";
 
const loader = new DatasetLoader();
const validator = new DatasetValidator();
 
try {
  const samples = await loader.load("eval-dataset.jsonl");
  const result = validator.validate(samples);
 
  if (!result.valid) {
    console.error("Dataset validation failed:");
    for (const error of result.errors) {
      console.error(`  - ${error.message}`);
    }
    process.exit(1);
  }
 
  console.log(`Ready to evaluate ${samples.length} samples`);
} catch (err) {
  console.error("Failed to load dataset:", err);
  process.exit(1);
}

Loading Config from YAML

yaml
# eval-config.yaml
metrics:
  - faithfulness
  - relevance
  - context_precision
  - context_recall
 
judge:
  model: claude-opus
  enabled: true
 
cost:
  budget_limit: 10.00
 
gates:
  - name: min-faithfulness
    type: threshold
    metric: avg_faithfulness
    operator: ">="
    threshold: 0.85

typescript
import { loadEvalConfig } from "@reaatech/rag-eval-dataset";
 
const config = await loadEvalConfig("eval-config.yaml");
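
To show how a threshold gate like min-faithfulness might be interpreted, here is a minimal sketch. It is an assumption about the semantics, not code from this package: ThresholdGateSketch mirrors the fields in the YAML above, the operator set beyond ">=" is a guess, and passesGate is a hypothetical name.

```typescript
// Operators other than ">=" are assumed for illustration.
type GateOperator = ">=" | ">" | "<=" | "<";

// Mirrors the gate fields shown in eval-config.yaml.
interface ThresholdGateSketch {
  name: string;
  type: "threshold";
  metric: string;
  operator: GateOperator;
  threshold: number;
}

// Hypothetical helper: compare an aggregated metric value against the gate.
function passesGate(gate: ThresholdGateSketch, metrics: Record<string, number>): boolean {
  const value = metrics[gate.metric];
  if (value === undefined) return false; // a missing metric cannot satisfy the gate
  switch (gate.operator) {
    case ">=":
      return value >= gate.threshold;
    case ">":
      return value > gate.threshold;
    case "<=":
      return value <= gate.threshold;
    case "<":
      return value < gate.threshold;
  }
}

const minFaithfulness: ThresholdGateSketch = {
  name: "min-faithfulness",
  type: "threshold",
  metric: "avg_faithfulness",
  operator: ">=",
  threshold: 0.85,
};
```

Treating a missing metric as a failure is a conservative choice for this sketch; a real evaluator might report it as a configuration error instead.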

License

MIT