@reaatech/agent-eval-harness-trajectory

npm v0.1.0

Utilities for loading, validating, and evaluating agent conversation trajectories stored as JSONL files. The package exports functions for parsing turns, scoring coherence and goal completion, and comparing candidate trajectories against golden references. Schema validation relies on `@reaatech/agent-eval-harness-types`.

Status: Pre-1.0 — APIs may change in minor versions. Pin to a specific version in production.

Trajectory loading, multi-turn evaluation, and golden-comparison utilities for agent conversation analysis. Parses JSONL turn files, scores coherence and goal completion, and diffs candidate trajectories against golden references.

Installation

terminal
npm install @reaatech/agent-eval-harness-trajectory
# or
pnpm add @reaatech/agent-eval-harness-trajectory

Feature Overview

  • JSONL loader: parse, validate, and serialize trajectory files with full Zod schema validation via @reaatech/agent-eval-harness-types
  • Multi-turn evaluation: coherence analysis, goal completion scoring, and conversation flow metrics
  • Golden comparison: diff candidate trajectories against a reference, with regression detection and improvement tracking
  • Directory batch load: load and validate all .jsonl files in a directory in a single call
  • Custom error types: TrajectoryLoadError with file path and cause tracking for precise debugging

Quick Start

typescript
import { loadFromContent, evaluate, compare } from '@reaatech/agent-eval-harness-trajectory';
import type { Trajectory } from '@reaatech/agent-eval-harness-types';
 
const jsonl =
  '{"turn_id":1,"role":"user","content":"Reset password","timestamp":"2026-04-15T00:00:00Z"}\n' +
  '{"turn_id":2,"role":"agent","content":"What\'s your email?","tool_calls":[],"timestamp":"2026-04-15T00:00:01Z"}';
 
const trajectory = loadFromContent(jsonl);
const result = evaluate(trajectory);
console.log(`Score: ${result.overall_score}, Coherence: ${result.metrics.coherence}`);

API Reference

Loader Functions

| Name | Type | Description |
| --- | --- | --- |
| `parseTurn(line, lineNumber)` | `(string, number) => Turn` | Parse a single JSONL line into a validated Turn object |
| `loadFromContent(content, options?)` | `(string, LoadOptions?) => Trajectory` | Load a trajectory from a JSONL content string |
| `loadFromFile(filePath, options?)` | `(string, LoadOptions?) => Promise<Trajectory>` | Load a trajectory from a .jsonl file on disk |
| `loadFromDirectory(dirPath, options?)` | `(string, LoadOptions?) => Promise<Trajectory[]>` | Load all .jsonl files in a directory |
| `serializeToJsonl(trajectory)` | `(Trajectory) => string` | Serialize a trajectory to JSONL string format |
| `saveToFile(trajectory, filePath)` | `(Trajectory, string) => Promise<void>` | Save a trajectory to a .jsonl file |
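
To make the line-oriented loading model concrete, here is a minimal, self-contained sketch of how a JSONL string might be split into turns with line-aware errors. It does not use the package: `SketchTurn`, `parseTurnSketch`, and `loadTurnsSketch` are hypothetical names, the field checks are assumptions based on the Quick Start sample, and the real loader validates against the Zod schemas in @reaatech/agent-eval-harness-types rather than with hand-written checks.

```typescript
// Hypothetical, simplified stand-in for the Turn type. The real schema lives in
// @reaatech/agent-eval-harness-types and may require different fields.
interface SketchTurn {
  turn_id: number;
  role: string;
  content: string;
  timestamp: string;
}

// Parse one JSONL line. lineNumber is 1-based and only feeds the error message.
function parseTurnSketch(line: string, lineNumber: number): SketchTurn {
  let raw: unknown;
  try {
    raw = JSON.parse(line);
  } catch {
    throw new Error(`Line ${lineNumber}: not valid JSON`);
  }
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    throw new Error(`Line ${lineNumber}: expected a JSON object`);
  }
  const { turn_id, role, content, timestamp } = raw as Record<string, unknown>;
  if (typeof turn_id !== "number") {
    throw new Error(`Line ${lineNumber}: turn_id must be a number`);
  }
  if (
    typeof role !== "string" ||
    typeof content !== "string" ||
    typeof timestamp !== "string"
  ) {
    throw new Error(`Line ${lineNumber}: role, content, and timestamp must be strings`);
  }
  return { turn_id, role, content, timestamp };
}

// Split a JSONL string into turns, skipping blank lines but keeping
// the original line numbers for error reporting.
function loadTurnsSketch(content: string): SketchTurn[] {
  const turns: SketchTurn[] = [];
  content.split(/\r?\n/).forEach((line, index) => {
    if (line.trim() === "") return;
    turns.push(parseTurnSketch(line, index + 1));
  });
  return turns;
}
```

Passing the line number into the per-line parser keeps error messages pointing at the exact record that failed, which is presumably why `parseTurn` takes a `lineNumber` argument.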

Loader Types

| Name | Type | Description |
| --- | --- | --- |
| `LoadOptions` | interface | Options with `validate` (boolean, default true) and `generateId` (boolean, default true) |
| `TrajectoryLoadError` | class extends Error | Custom error with cause, filePath, and descriptive message |
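
The pattern behind an error type that carries a file path and an underlying cause can be sketched as follows. This is an illustrative stand-in, not the package's class: `SketchLoadError` and `parseFileContentSketch` are hypothetical, and the field is named `rootCause` here only to keep the sketch independent of compiler settings, whereas the table above describes the real class as exposing `cause`.

```typescript
// Hypothetical error type that mirrors the idea of TrajectoryLoadError:
// a descriptive message plus the file path and the underlying failure.
class SketchLoadError extends Error {
  readonly filePath?: string;
  readonly rootCause?: unknown;

  constructor(message: string, filePath?: string, rootCause?: unknown) {
    super(message);
    this.name = "SketchLoadError";
    this.filePath = filePath;
    this.rootCause = rootCause;
    // Keeps instanceof working when compiling to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Parse the non-blank records of a JSONL string, wrapping any JSON failure
// in a SketchLoadError that remembers which file was being read.
function parseFileContentSketch(content: string, filePath: string): unknown[] {
  return content
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .map((line, index) => {
      try {
        return JSON.parse(line) as unknown;
      } catch (err) {
        throw new SketchLoadError(
          `Failed to parse record ${index + 1} in ${filePath}`,
          filePath,
          err
        );
      }
    });
}
```

Callers can then branch on `instanceof` and report the offending path without parsing it back out of the message text.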

Evaluator Functions

| Name | Type | Description |
| --- | --- | --- |
| `evaluate(trajectory, options?)` | `(Trajectory, EvaluateOptions?) => EvalResult` | Full trajectory evaluation returning overall score, per-metric scores, and issues |
| `analyzeCoherence(trajectory)` | `(Trajectory) => CoherenceResult` | Multi-turn coherence analysis with per-transition scoring |
| `analyzeGoalCompletion(trajectory)` | `(Trajectory) => GoalCompletionResult` | Determine if the agent completed the user's goal with confidence scoring |
| `analyzeConversationFlow(trajectory)` | `(Trajectory) => FlowAnalysis` | Conversation flow analysis with topic changes, interruptions, and flow score |

Evaluator Types

| Name | Type | Description |
| --- | --- | --- |
| `EvaluateOptions` | interface | Options with `checkCoherence`, `checkGoalCompletion`, `analyzeFlow`, and `coherenceThreshold` |
| `CoherenceResult` | interface | Coherence score, issues list, and per-transition analysis |
| `TurnTransition` | interface | Single turn transition with `from`, `to`, `coherent`, and optional `reason` |
| `GoalCompletionResult` | interface | Completion status, confidence, evidence array, and unresolved turn IDs |
| `FlowAnalysis` | interface | Flow metrics: avg turns per topic, topic changes, interruptions, and flow score |
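
As an illustration of how per-transition results could roll up into a single coherence score, the sketch below takes the fraction of coherent transitions and checks it against a threshold. This aggregation rule is an assumption chosen for simplicity, since the package's actual scoring is not documented here. `SketchTransition`, `SketchCoherence`, and `summarizeCoherence` are hypothetical, and `from` and `to` are assumed to be numeric turn IDs.

```typescript
// Hypothetical shape modeled on the documented TurnTransition fields.
// Assumption: from and to are numeric turn IDs.
interface SketchTransition {
  from: number;
  to: number;
  coherent: boolean;
  reason?: string;
}

interface SketchCoherence {
  score: number;    // fraction of transitions marked coherent, in [0, 1]
  issues: string[]; // one entry per incoherent transition
  passed: boolean;  // score >= threshold
}

// Assumed aggregation: score is the share of coherent transitions.
// A trajectory with no transitions is treated as fully coherent.
function summarizeCoherence(
  transitions: SketchTransition[],
  threshold: number
): SketchCoherence {
  if (transitions.length === 0) {
    return { score: 1, issues: [], passed: 1 >= threshold };
  }
  const coherentCount = transitions.filter((t) => t.coherent).length;
  const score = coherentCount / transitions.length;
  const issues = transitions
    .filter((t) => !t.coherent)
    .map((t) => `Turn ${t.from} -> ${t.to}: ${t.reason ?? "no reason given"}`);
  return { score, issues, passed: score >= threshold };
}
```

The threshold is passed explicitly because the default value of `coherenceThreshold` is not stated in this README.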

Comparator

| Name | Type | Description |
| --- | --- | --- |
| `compare(candidate, golden, options?)` | `(Trajectory, GoldenTrajectory \| Trajectory, CompareOptions?) => ComparisonResult` | Compare a candidate trajectory against a golden reference |

Comparator Types

| Name | Type | Description |
| --- | --- | --- |
| `CompareOptions` | interface | Options with `similarityThreshold` (default 0.85), `compareTools`, `compareLatency`, `compareCosts`, and `strict` |
| `ComparisonResult` | interface | Overall similarity, pass/fail, diff, regressions, improvements, and per-turn comparisons |
| `TrajectoryDiff` | interface | Detailed diff with missing turns, extra turns, modified turns, and tool differences |
| `TurnDiff` | interface | Difference in a single turn field with expected and actual values |
| `ToolDiff` | interface | Tool call difference with expected/actual tool names and argument differences |
| `ArgumentDiff` | interface | Single argument difference with expected and actual values |
| `Regression` | interface | Regression record with type, severity, description, turn ID, and impact score |
| `Improvement` | interface | Improvement record with type, description, turn ID, and benefit score |
| `TurnComparison` | interface | Per-turn comparison with similarity, match status, and differences |
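
The threshold-based pass/fail idea behind golden comparison can be sketched with a deliberately simple similarity measure. The example below uses token-level Jaccard similarity per turn and averages it across positions. That measure is a stand-in chosen for illustration, not the package's algorithm, and `SketchTurn`, `SketchComparison`, `jaccard`, and `compareSketch` are hypothetical names. Only the 0.85 default threshold comes from the `CompareOptions` description above.

```typescript
// Hypothetical minimal turn shape for this sketch.
interface SketchTurn {
  role: string;
  content: string;
}

interface SketchComparison {
  similarity: number; // mean per-turn similarity in [0, 1]
  passed: boolean;    // similarity >= similarityThreshold
  perTurn: number[];  // similarity at each turn position
}

// Lowercased alphanumeric tokens of a string, as a set.
function tokenize(text: string): Set<string> {
  return new Set(
    text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0)
  );
}

// Jaccard similarity: shared tokens divided by the size of the union.
function jaccard(a: string, b: string): number {
  const setA = tokenize(a);
  const setB = tokenize(b);
  if (setA.size === 0 && setB.size === 0) return 1;
  let shared = 0;
  setA.forEach((token) => {
    if (setB.has(token)) shared += 1;
  });
  return shared / (setA.size + setB.size - shared);
}

// Compare turn by turn. A missing, extra, or role-mismatched turn scores 0.
function compareSketch(
  candidate: SketchTurn[],
  golden: SketchTurn[],
  similarityThreshold = 0.85
): SketchComparison {
  const length = Math.max(candidate.length, golden.length);
  if (length === 0) {
    return { similarity: 1, passed: 1 >= similarityThreshold, perTurn: [] };
  }
  const perTurn: number[] = [];
  for (let i = 0; i < length; i++) {
    const c = candidate[i];
    const g = golden[i];
    perTurn.push(c && g && c.role === g.role ? jaccard(c.content, g.content) : 0);
  }
  const similarity = perTurn.reduce((sum, v) => sum + v, 0) / length;
  return { similarity, passed: similarity >= similarityThreshold, perTurn };
}
```

Scoring unmatched positions as 0 means that extra or missing turns pull the overall similarity down, which loosely mirrors the missing-turn and extra-turn entries described for `TrajectoryDiff`.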

Related Packages

| Package | Description |
| --- | --- |
| @reaatech/agent-eval-harness-types | Shared domain types and Zod schemas |
| @reaatech/agent-eval-harness-trajectory | Trajectory loading, evaluation, and golden comparison |
| @reaatech/agent-eval-harness-tool-use | Tool-use validation and schema compliance |
| @reaatech/agent-eval-harness-cost | Cost tracking, budgets, and reporting |
| @reaatech/agent-eval-harness-latency | Latency monitoring, SLA enforcement, and optimization |
| @reaatech/agent-eval-harness-judge | LLM-as-judge with calibration and consensus |
| @reaatech/agent-eval-harness-golden | Golden trajectory management and curation |
| @reaatech/agent-eval-harness-suite | Suite runner, results aggregation, and comparison |
| @reaatech/agent-eval-harness-gate | CI regression gates with JUnit and GitHub output |
| @reaatech/agent-eval-harness-mcp-server | MCP server with three-layer tool architecture |
| @reaatech/agent-eval-harness-cli | Command-line interface |
| @reaatech/agent-eval-harness-observability | OTel tracing, metrics, structured logging, and dashboards |

License

MIT