Manages reference agent trajectories for regression testing through a collection of utility functions and a `GoldenCurator` class. It provides tools to create, annotate, and validate golden datasets, and includes a comparison engine to detect regressions by diffing candidate trajectories against these references.
Status: Pre-1.0 — APIs may change in minor versions. Pin to a specific version in production.
Golden trajectory management for agent evaluation regression testing. Create, annotate, validate, and curate reference trajectories, then compare candidate agent runs against them with detailed diff analysis and regression detection.
Installation
terminal
npm install @reaatech/agent-eval-harness-golden
Feature Overview
Golden trajectory CRUD — load, create, update, and filter reference trajectories by tags and scenarios
Annotation workflow — mark expected turns, add quality notes, tag golden trajectories for organization
Batch comparison — compare multiple candidates against a library of golden references
Quick Start
typescript
import { createGolden, compareAgainstGolden, quickCreateGolden } from '@reaatech/agent-eval-harness-golden';import type { Trajectory } from '@reaatech/agent-eval-harness-types';// Quick creation for simple scenariosconst golden = quickCreateGolden(trajectory, 'password-reset', ['auth', 'critical']);// Compare a new run against the goldenconst result = compareAgainstGolden(golden, candidateTrajectory, { similarityThreshold: 0.85 });console.log(`Similarity: ${result.similarity}, Regressions: ${result.regressions.length}`);
API Reference
Golden Manager
Name
Type
Description
loadGoldenTrajectories(jsonlContent)
function
Parse JSONL string into an array of GoldenTrajectory objects
validateGolden(golden)
function
Validate a golden trajectory structure; returns { valid, errors, warnings, score }
goldenToJSONL(golden)
function
Serialize a golden trajectory back to JSONL string format
createGolden(trajectory, options)
function
Create a new golden trajectory from a candidate trajectory with metadata options
updateGolden(golden, changes)
function
Update a golden trajectory’s metadata and bump the updatedAt timestamp
filterByTags(goldens, tags)
function
Filter golden trajectories by tag intersection
getByScenario(goldens, scenario)
function
Search golden trajectories by scenario name (description or trajectory ID match)
Comparison Engine
Name
Type
Description
compareAgainstGolden(golden, candidate, config?)
function
Compare a candidate trajectory against a golden; returns TrajectoryComparisonResult
batchCompare(golden, candidates, config?)
function
Compare multiple candidates against a single golden in one call
findBestGolden(candidate, goldens, config?)
function
Find the best-matching golden for a candidate across a golden library