@reaatech/agent-eval-harness-types

npm v0.1.0

Canonical TypeScript domain types, Zod schemas, and interfaces for the agent-eval-harness ecosystem. Exports 19 interfaces (`Turn`, `Trajectory`, `EvalResult`, `JudgeScore`, `CostBreakdown`, `LatencyBudget`, `GoldenTrajectory`, `RegressionGate`, and more) plus 20 Zod schemas with full type inference, with no runtime dependencies beyond `zod`.

Status: Pre-1.0 — APIs may change in minor versions. Pin to a specific version in production.

This package is the foundational dependency of every other package in the monorepo.

Installation

terminal
npm install @reaatech/agent-eval-harness-types
# or
pnpm add @reaatech/agent-eval-harness-types

Feature Overview

  • 19 domain type interfaces: `Turn`, `Trajectory`, `EvalResult`, `JudgeScore`, `CostBreakdown`, `LatencyBudget`, `GoldenTrajectory`, `RegressionGate`, and more
  • 20 Zod schemas: runtime validation for every domain type, with full type inference via `z.infer`
  • Zero runtime dependencies beyond `zod`
  • Dual ESM/CJS output: works with both `import` and `require`
  • Golden trajectory markers: `golden`, `expected`, and `quality_notes` fields on every turn
  • CI gate types: threshold, baseline-comparison, and distribution gates with regression tracking
  • Suite runner types: configuration, run status, comparison, and metric regression interfaces

Quick Start

typescript
import { TurnSchema, type Trajectory, type EvalResult } from '@reaatech/agent-eval-harness-types';
 
const turn = TurnSchema.parse({
  turn_id: 1,
  role: 'user',
  content: 'Hello',
  timestamp: '2026-04-15T00:00:00Z',
});
 
const trajectory: Trajectory = { turns: [turn], metadata: { total_turns: 1 } };

API Reference

Core Types

| Name | Type | Description |
| --- | --- | --- |
| `Turn` | interface | Single turn in a trajectory with role, content, timestamp, and optional tool calls, latency, and cost |
| `ToolCall` | interface | Tool invocation with name, arguments, and optional result |
| `CostData` | interface | Token usage and cost for a single turn |
| `Trajectory` | interface | Complete agent execution with turns array and optional metadata |
| `EvalResult` | interface | Evaluation result with overall score, per-metric scores, and issues |
| `EvalIssue` | interface | Issue found during evaluation with type, severity, and description |
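
To make the relationship between turns and trajectory metadata concrete, here is a minimal sketch that builds a trajectory whose `metadata.total_turns` always matches its `turns` array. It deliberately uses local stand-in types rather than the package exports: `TurnLike`, `TrajectoryLike`, `buildTrajectory`, and the set of role values are assumptions for illustration, and only the fields shown in the Quick Start are modeled.

```typescript
// Local stand-ins (assumption): only the fields from the Quick Start are modeled.
type RoleLike = 'user' | 'assistant' | 'system' | 'tool'; // hypothetical role set

interface TurnLike {
  turn_id: number;
  role: RoleLike;
  content: string;
  timestamp: string; // ISO 8601 string, as in the Quick Start
}

interface TrajectoryLike {
  turns: TurnLike[];
  metadata?: { total_turns: number };
}

// Build a trajectory and derive total_turns from the turns themselves,
// so the metadata cannot drift out of sync with the array.
function buildTrajectory(turns: TurnLike[]): TrajectoryLike {
  if (turns.length === 0) {
    // Mirrors the "minimum one turn" rule described for TrajectorySchema.
    throw new Error('a trajectory needs at least one turn');
  }
  // Sort a copy by turn_id so callers may pass turns in any order.
  const ordered = [...turns].sort((a, b) => a.turn_id - b.turn_id);
  return { turns: ordered, metadata: { total_turns: ordered.length } };
}

const demoTrajectory = buildTrajectory([
  { turn_id: 2, role: 'assistant', content: 'Hi there', timestamp: '2026-04-15T00:00:01Z' },
  { turn_id: 1, role: 'user', content: 'Hello', timestamp: '2026-04-15T00:00:00Z' },
]);
```

In real code the object returned by such a helper would typically be passed through `TrajectorySchema.parse` so the package's own validation rules apply.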

Judge Types

| Name | Type | Description |
| --- | --- | --- |
| `JudgeScore` | interface | LLM judge scoring result with score, explanation, confidence, and calibration status |

Cost Types

| Name | Type | Description |
| --- | --- | --- |
| `CostBreakdown` | interface | Full cost breakdown for a trajectory with LLM, tool, and per-turn costs |
| `TurnCost` | interface | Cost breakdown for a single turn with token counts |
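
As an illustration of how per-turn costs might roll up into a trajectory-level breakdown, the sketch below sums LLM cost, tool cost, and token counts. The field names (`input_tokens`, `llm_cost_usd`, `per_turn`, and so on) and the `summarizeCosts` helper are hypothetical; the real `TurnCost` and `CostBreakdown` shapes may differ.

```typescript
// Hypothetical shapes: field names are illustrative, not the package's.
interface TurnCostLike {
  turn_id: number;
  input_tokens: number;
  output_tokens: number;
  llm_cost_usd: number;
  tool_cost_usd: number;
}

interface CostBreakdownLike {
  total_cost_usd: number;
  llm_cost_usd: number;
  tool_cost_usd: number;
  total_tokens: number;
  per_turn: TurnCostLike[];
}

// Aggregate per-turn records into one breakdown for the whole trajectory.
function summarizeCosts(perTurn: TurnCostLike[]): CostBreakdownLike {
  let llm = 0;
  let tool = 0;
  let tokens = 0;
  for (const turn of perTurn) {
    llm += turn.llm_cost_usd;
    tool += turn.tool_cost_usd;
    tokens += turn.input_tokens + turn.output_tokens;
  }
  return {
    total_cost_usd: llm + tool,
    llm_cost_usd: llm,
    tool_cost_usd: tool,
    total_tokens: tokens,
    per_turn: perTurn,
  };
}

const demoCosts = summarizeCosts([
  { turn_id: 1, input_tokens: 100, output_tokens: 50, llm_cost_usd: 0.25, tool_cost_usd: 0 },
  { turn_id: 2, input_tokens: 200, output_tokens: 150, llm_cost_usd: 0.5, tool_cost_usd: 0.125 },
]);
```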

Latency Types

| Name | Type | Description |
| --- | --- | --- |
| `LatencyBudget` | interface | Latency SLA budget with P50, P90, P99 thresholds and component breakdowns |
| `LatencyResult` | interface | Latency measurement result with percentiles, violations, and SLA status |
| `LatencyViolation` | interface | SLA violation record with turn ID, actual vs threshold values |
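
The percentile budget idea can be sketched as follows: compute P50, P90, and P99 from a set of latency samples and compare each against its threshold. This is a simplified sketch, not the package's implementation. It uses the nearest-rank percentile method (one of several common definitions), the field names such as `p50_ms` are assumptions, and the violation record here is keyed by percentile rather than by turn ID.

```typescript
// Hypothetical shapes: field names are illustrative, not the package's.
interface LatencyBudgetLike {
  p50_ms: number;
  p90_ms: number;
  p99_ms: number;
}

type PercentileName = 'p50' | 'p90' | 'p99';

interface PercentileViolation {
  percentile: PercentileName;
  actual_ms: number;
  threshold_ms: number;
}

interface LatencyCheck extends LatencyBudgetLike {
  violations: PercentileViolation[];
  within_sla: boolean;
}

// Nearest-rank percentile over an ascending-sorted, non-empty array.
function nearestRank(sorted: number[], p: number): number {
  if (sorted.length === 0) throw new Error('no latency samples');
  const rank = Math.ceil((p / 100) * sorted.length);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index] as number;
}

// Measure percentiles and record every threshold that was exceeded.
function checkLatency(samplesMs: number[], budget: LatencyBudgetLike): LatencyCheck {
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const measured: LatencyBudgetLike = {
    p50_ms: nearestRank(sorted, 50),
    p90_ms: nearestRank(sorted, 90),
    p99_ms: nearestRank(sorted, 99),
  };
  const pairs: Array<[PercentileName, number, number]> = [
    ['p50', measured.p50_ms, budget.p50_ms],
    ['p90', measured.p90_ms, budget.p90_ms],
    ['p99', measured.p99_ms, budget.p99_ms],
  ];
  const violations: PercentileViolation[] = [];
  for (const [percentile, actual, threshold] of pairs) {
    if (actual > threshold) {
      violations.push({ percentile, actual_ms: actual, threshold_ms: threshold });
    }
  }
  return { ...measured, violations, within_sla: violations.length === 0 };
}

const demoLatency = checkLatency(
  [120, 80, 300, 150, 95, 110, 2000, 130, 105, 140],
  { p50_ms: 200, p90_ms: 500, p99_ms: 1500 },
);
```

With these ten samples, only the P99 threshold is exceeded, because a single slow turn dominates the tail.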

Golden Types

| Name | Type | Description |
| --- | --- | --- |
| `GoldenTrajectory` | interface | Golden reference trajectory with versioning and quality markers |

Gate Types

| Name | Type | Description |
| --- | --- | --- |
| `RegressionGate` | interface | Gate definition with threshold, baseline-comparison, or distribution types |
| `GateResult` | interface | Single gate evaluation result with pass/fail and actual vs expected values |
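
The difference between a threshold gate and a baseline-comparison gate can be sketched with a small discriminated union. Everything here is an assumption for illustration: the `GateLike` shape, the `min` and `max_regression_pct` fields, the `evaluateGate` helper, and the convention that higher metric values are better. Distribution gates are omitted from this sketch.

```typescript
// Hypothetical gate shapes: a discriminated union on `type`.
type GateLike =
  | { type: 'threshold'; metric: string; min: number }
  | { type: 'baseline-comparison'; metric: string; max_regression_pct: number };

interface GateResultLike {
  metric: string;
  passed: boolean;
  actual: number;
  expected: number; // the lowest value that would still pass
}

// Evaluate one gate, assuming higher metric values are better.
function evaluateGate(gate: GateLike, actual: number, baseline?: number): GateResultLike {
  if (gate.type === 'threshold') {
    // Absolute check: the metric must reach a fixed minimum.
    return { metric: gate.metric, passed: actual >= gate.min, actual, expected: gate.min };
  }
  if (baseline === undefined) {
    throw new Error(`gate on "${gate.metric}" needs a baseline value`);
  }
  // Relative check: the metric may drop by at most max_regression_pct
  // percent compared with the baseline run.
  const floor = baseline * (1 - gate.max_regression_pct / 100);
  return { metric: gate.metric, passed: actual >= floor, actual, expected: floor };
}

const thresholdResult = evaluateGate({ type: 'threshold', metric: 'accuracy', min: 0.8 }, 0.85);
const baselineFail = evaluateGate(
  { type: 'baseline-comparison', metric: 'score', max_regression_pct: 5 },
  93,
  100,
);
const baselinePass = evaluateGate(
  { type: 'baseline-comparison', metric: 'score', max_regression_pct: 5 },
  97,
  100,
);
```

A threshold gate catches absolute quality failures, while a baseline-comparison gate catches drift relative to a previous run even when the absolute value is still acceptable.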

Suite Types

| Name | Type | Description |
| --- | --- | --- |
| `EvalSuiteConfig` | interface | Suite configuration with metrics, judge model, budgets, gates, and parallelism |
| `EvalRunStatus` | interface | Suite run progress with status, completion counts, and timing |
| `RunComparison` | interface | Comparison of two evaluation runs with metric diffs and significance testing |
| `MetricRegression` | interface | Single regression with baseline and candidate values and change percentage |
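
A regression record with baseline value, candidate value, and change percentage could be derived along these lines. This is a sketch under stated assumptions: `MetricRegressionLike`, `findRegressions`, and the `tolerancePct` parameter are hypothetical names, higher metric values are assumed to be better, and the significance testing mentioned for `RunComparison` is not attempted.

```typescript
// Hypothetical shape: field names are illustrative, not the package's.
interface MetricRegressionLike {
  metric: string;
  baseline: number;
  candidate: number;
  change_pct: number; // negative means the candidate is worse
}

// Compare two runs metric by metric and report drops beyond the tolerance.
function findRegressions(
  baseline: Record<string, number>,
  candidate: Record<string, number>,
  tolerancePct: number,
): MetricRegressionLike[] {
  const regressions: MetricRegressionLike[] = [];
  for (const metric of Object.keys(baseline)) {
    const base = baseline[metric];
    const cand = candidate[metric];
    // Skip metrics missing from either run, and zero baselines,
    // for which a percentage change is undefined.
    if (base === undefined || cand === undefined || base === 0) continue;
    const changePct = ((cand - base) / base) * 100;
    if (changePct < -tolerancePct) {
      regressions.push({ metric, baseline: base, candidate: cand, change_pct: changePct });
    }
  }
  return regressions;
}

const demoRegressions = findRegressions(
  { accuracy: 0.8, tool_use: 0.5 },
  { accuracy: 0.6, tool_use: 0.5 },
  5,
);
```

Here only `accuracy` is reported, since it drops by roughly 25% while `tool_use` is unchanged.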

Schemas

| Name | Type | Description |
| --- | --- | --- |
| `ToolCallSchema` | ZodObject | Validates tool invocation structure |
| `CostDataSchema` | ZodObject | Validates token counts and cost data |
| `TurnSchema` | ZodObject | Validates turn structure with optional tool calls, latency, and golden markers |
| `TrajectoryMetadataSchema` | ZodObject | Validates trajectory metadata |
| `TrajectorySchema` | ZodObject | Validates complete trajectory (minimum one turn, optional metadata) |
| `EvalIssueSchema` | ZodObject | Validates evaluation issue records |
| `EvalResultSchema` | ZodObject | Validates evaluation results with metrics and issues |
| `JudgeScoreSchema` | ZodObject | Validates judge scoring output |
| `CostBreakdownSchema` | ZodObject | Validates cost breakdowns with per-turn cost arrays |
| `LatencyBudgetSchema` | ZodObject | Validates latency budget configuration |
| `LatencyViolationSchema` | ZodObject | Validates latency SLA violations |
| `LatencyResultSchema` | ZodObject | Validates latency measurement results |
| `QualityMarkersSchema` | ZodObject | Validates golden trajectory quality markers |
| `GoldenTrajectorySchema` | ZodObject | Validates golden trajectories with nested trajectory and quality markers |
| `RegressionGateSchema` | ZodObject | Validates regression gate definitions |
| `GateResultSchema` | ZodObject | Validates gate evaluation results |
| `EvalSuiteConfigSchema` | ZodObject | Validates suite configuration with nested latency budget and gates |
| `EvalRunStatusSchema` | ZodObject | Validates suite run status |
| `MetricRegressionSchema` | ZodObject | Validates metric regression records |
| `RunComparisonSchema` | ZodObject | Validates run comparison results with statistical significance arrays |

Related Packages

| Package | Description |
| --- | --- |
| `@reaatech/agent-eval-harness-types` | Shared domain types and Zod schemas |
| `@reaatech/agent-eval-harness-trajectory` | Trajectory loading, evaluation, and golden comparison |
| `@reaatech/agent-eval-harness-tool-use` | Tool-use validation and schema compliance |
| `@reaatech/agent-eval-harness-cost` | Cost tracking, budgets, and reporting |
| `@reaatech/agent-eval-harness-latency` | Latency monitoring, SLA enforcement, and optimization |
| `@reaatech/agent-eval-harness-judge` | LLM-as-judge with calibration and consensus |
| `@reaatech/agent-eval-harness-golden` | Golden trajectory management and curation |
| `@reaatech/agent-eval-harness-suite` | Suite runner, results aggregation, and comparison |
| `@reaatech/agent-eval-harness-gate` | CI regression gates with JUnit and GitHub output |
| `@reaatech/agent-eval-harness-mcp-server` | MCP server with three-layer tool architecture |
| `@reaatech/agent-eval-harness-cli` | Command-line interface |
| `@reaatech/agent-eval-harness-observability` | OTel tracing, metrics, structured logging, and dashboards |

License

MIT