@reaatech/agent-eval-harness-types
Status: Pre-1.0 — APIs may change in minor versions. Pin to a specific version in production.
Canonical TypeScript domain types, Zod schemas, and interfaces for the agent-eval-harness ecosystem. This package is the foundational dependency of every other package in the monorepo.
Installation
npm install @reaatech/agent-eval-harness-types
# or
pnpm add @reaatech/agent-eval-harness-types
Feature Overview
19 domain type interfaces — Turn, Trajectory, EvalResult, JudgeScore, CostBreakdown, LatencyBudget, GoldenTrajectory, RegressionGate, and more
20 Zod schemas — runtime validation for every domain type with full type inference via z.infer
Zero runtime dependencies beyond zod
Dual ESM/CJS output — works with import and require
Golden trajectory markers — golden, expected, and quality_notes fields on every turn
CI gate types — threshold, baseline-comparison, and distribution gates with regression tracking
Suite runner types — configuration, run status, comparison, and metric regression interfaces
Quick Start
import { TurnSchema, type Trajectory, type EvalResult } from '@reaatech/agent-eval-harness-types';

// Validate a raw turn at runtime; parse throws if the input does not match the schema.
const turn = TurnSchema.parse({
  turn_id: 1,
  role: 'user',
  content: 'Hello',
  timestamp: '2026-04-15T00:00:00Z',
});

const trajectory: Trajectory = { turns: [turn], metadata: { total_turns: 1 } };
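The golden markers mentioned in the feature overview can be illustrated with a small sketch. It uses a local stand-in type rather than the package's own Turn interface, so it runs without any dependencies. The field names come from this README, but their types and optionality, the `TurnSketch` name, and the `goldenTurns` helper are assumptions made for illustration.

```typescript
// Local stand-in for the package's Turn type. The field names appear in this
// README; their exact types and optionality are assumptions.
interface TurnSketch {
  turn_id: number;
  role: string;
  content: string;
  timestamp: string;
  golden?: boolean;
  expected?: string;
  quality_notes?: string;
}

// Hypothetical helper: return only the turns flagged as golden references.
function goldenTurns(turns: TurnSketch[]): TurnSketch[] {
  return turns.filter((t) => t.golden === true);
}

const sampleTurns: TurnSketch[] = [
  { turn_id: 1, role: 'user', content: 'Hello', timestamp: '2026-04-15T00:00:00Z' },
  {
    turn_id: 2,
    role: 'assistant',
    content: 'Hi, how can I help?',
    timestamp: '2026-04-15T00:00:01Z',
    golden: true,
    expected: 'A short greeting',
    quality_notes: 'Tone should stay neutral',
  },
];

// Collect the IDs of the golden turns: [2]
const goldenIds = goldenTurns(sampleTurns).map((t) => t.turn_id);
```

In real code the turns would normally come from TurnSchema.parse, so that the markers are validated before they are read.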
API Reference
Core Types
| Name | Type | Description |
| --- | --- | --- |
| Turn | interface | Single turn in a trajectory with role, content, timestamp, and optional tool calls, latency, and cost |
| ToolCall | interface | Tool invocation with name, arguments, and optional result |
| CostData | interface | Token usage and cost for a single turn |
| Trajectory | interface | Complete agent execution with turns array and optional metadata |
| EvalResult | interface | Evaluation result with overall score, per-metric scores, and issues |
| EvalIssue | interface | Issue found during evaluation with type, severity, and description |

Judge Types
| Name | Type | Description |
| --- | --- | --- |
| JudgeScore | interface | LLM judge scoring result with score, explanation, confidence, and calibration status |

Cost Types
| Name | Type | Description |
| --- | --- | --- |
| CostBreakdown | interface | Full cost breakdown for a trajectory with LLM, tool, and per-turn costs |
| TurnCost | interface | Cost breakdown for a single turn with token counts |

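As a rough illustration of how per-turn costs might roll up into a trajectory-level breakdown, the sketch below uses local stand-in types. Every field name here (`input_tokens`, `cost_usd`, `per_turn`, and so on) and the `buildBreakdown` helper are hypothetical; the actual TurnCost and CostBreakdown interfaces may be shaped differently.

```typescript
// Hypothetical stand-ins for TurnCost and CostBreakdown; field names are assumptions.
interface TurnCostSketch {
  turn_id: number;
  input_tokens: number;
  output_tokens: number;
  cost_usd: number;
}

interface CostBreakdownSketch {
  llm_cost_usd: number;
  tool_cost_usd: number;
  total_cost_usd: number;
  per_turn: TurnCostSketch[];
}

// Sum the per-turn LLM costs and add a separately tracked tool cost.
function buildBreakdown(perTurn: TurnCostSketch[], toolCostUsd: number): CostBreakdownSketch {
  const llmCost = perTurn.reduce((sum, t) => sum + t.cost_usd, 0);
  return {
    llm_cost_usd: llmCost,
    tool_cost_usd: toolCostUsd,
    total_cost_usd: llmCost + toolCostUsd,
    per_turn: perTurn,
  };
}

const breakdown = buildBreakdown(
  [
    { turn_id: 1, input_tokens: 100, output_tokens: 50, cost_usd: 0.25 },
    { turn_id: 2, input_tokens: 200, output_tokens: 80, cost_usd: 0.5 },
  ],
  0.125,
);
// breakdown.llm_cost_usd is 0.75 and breakdown.total_cost_usd is 0.875
```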
Latency Types
| Name | Type | Description |
| --- | --- | --- |
| LatencyBudget | interface | Latency SLA budget with P50, P90, P99 thresholds and component breakdowns |
| LatencyResult | interface | Latency measurement result with percentiles, violations, and SLA status |
| LatencyViolation | interface | SLA violation record with turn ID, actual vs threshold values |

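To make the percentile thresholds concrete, here is a minimal sketch that checks a set of latency samples against a P50/P90/P99 budget. It uses the nearest-rank percentile method, which is one common choice and not necessarily what the latency package uses. The type and field names are hypothetical, and per-turn violation records are left out for brevity.

```typescript
// Hypothetical stand-ins; the real LatencyBudget and LatencyResult may differ.
interface LatencyBudgetSketch {
  p50_ms: number;
  p90_ms: number;
  p99_ms: number;
}

interface LatencyResultSketch {
  p50_ms: number;
  p90_ms: number;
  p99_ms: number;
  sla_met: boolean;
}

// Nearest-rank percentile over an ascending-sorted array.
function percentile(sorted: number[], p: number): number {
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  const value = sorted[rank - 1];
  if (value === undefined) throw new Error('no latency samples');
  return value;
}

// Compare measured percentiles against the budget thresholds.
function checkLatency(samplesMs: number[], budget: LatencyBudgetSketch): LatencyResultSketch {
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const p50 = percentile(sorted, 50);
  const p90 = percentile(sorted, 90);
  const p99 = percentile(sorted, 99);
  return {
    p50_ms: p50,
    p90_ms: p90,
    p99_ms: p99,
    sla_met: p50 <= budget.p50_ms && p90 <= budget.p90_ms && p99 <= budget.p99_ms,
  };
}

const latencyResult = checkLatency(
  [120, 80, 200, 150, 100, 90, 110, 130, 95, 300],
  { p50_ms: 150, p90_ms: 250, p99_ms: 280 },
);
// P50 = 110, P90 = 200, P99 = 300; the P99 threshold of 280 is exceeded.
```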
Golden Types
| Name | Type | Description |
| --- | --- | --- |
| GoldenTrajectory | interface | Golden reference trajectory with versioning and quality markers |

Gate Types
| Name | Type | Description |
| --- | --- | --- |
| RegressionGate | interface | Gate definition with threshold, baseline-comparison, or distribution types |
| GateResult | interface | Single gate evaluation result with pass/fail and actual vs expected values |

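The gate kinds can be pictured as a discriminated union. The sketch below covers only the threshold and baseline-comparison kinds and omits distribution gates. All names (`GateSketch`, `min`, `max_drop`, `evaluateGate`) are hypothetical, and the pass/fail rules are one plausible reading of the descriptions above, not the gate package's actual logic.

```typescript
// Hypothetical gate shapes; the real RegressionGate interface may differ.
type GateSketch =
  | { type: 'threshold'; metric: string; min: number }
  | { type: 'baseline-comparison'; metric: string; max_drop: number };

interface GateResultSketch {
  metric: string;
  passed: boolean;
  actual: number;
  expected: number;
}

// Evaluate one gate against an observed metric value.
function evaluateGate(gate: GateSketch, actual: number, baseline?: number): GateResultSketch {
  switch (gate.type) {
    case 'threshold':
      // Pass when the metric is at or above the fixed minimum.
      return { metric: gate.metric, passed: actual >= gate.min, actual, expected: gate.min };
    case 'baseline-comparison': {
      if (baseline === undefined) {
        throw new Error('baseline-comparison gate needs a baseline value');
      }
      // Pass when the metric has not dropped more than max_drop below the baseline.
      const expected = baseline - gate.max_drop;
      return { metric: gate.metric, passed: actual >= expected, actual, expected };
    }
  }
}

const thresholdResult = evaluateGate({ type: 'threshold', metric: 'accuracy', min: 0.75 }, 0.625);
const baselineResult = evaluateGate(
  { type: 'baseline-comparison', metric: 'accuracy', max_drop: 0.25 },
  0.625,
  0.75,
);
// thresholdResult.passed is false (0.625 < 0.75)
// baselineResult.passed is true (0.625 >= 0.75 - 0.25)
```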
Suite Types
| Name | Type | Description |
| --- | --- | --- |
| EvalSuiteConfig | interface | Suite configuration with metrics, judge model, budgets, gates, and parallelism |
| EvalRunStatus | interface | Suite run progress with status, completion counts, and timing |
| RunComparison | interface | Comparison of two evaluation runs with metric diffs and significance testing |
| MetricRegression | interface | Single regression with baseline and candidate values and change percentage |

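The change percentage on a metric regression can be read as the relative difference between the candidate and baseline values. The following sketch assumes that definition. The field names and the `toRegression` helper are hypothetical and may not match the MetricRegression interface.

```typescript
// Hypothetical stand-in for MetricRegression; field names are assumptions.
interface MetricRegressionSketch {
  metric: string;
  baseline_value: number;
  candidate_value: number;
  change_percent: number;
}

// Relative change of the candidate against the baseline, in percent.
function toRegression(metric: string, baseline: number, candidate: number): MetricRegressionSketch {
  if (baseline === 0) {
    throw new Error('relative change is undefined for a zero baseline');
  }
  return {
    metric,
    baseline_value: baseline,
    candidate_value: candidate,
    change_percent: ((candidate - baseline) / baseline) * 100,
  };
}

const regression = toRegression('accuracy', 0.5, 0.375);
// regression.change_percent is -25
```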
Schemas
| Name | Type | Description |
| --- | --- | --- |
| ToolCallSchema | ZodObject | Validates tool invocation structure |
| CostDataSchema | ZodObject | Validates token counts and cost data |
| TurnSchema | ZodObject | Validates turn structure with optional tool calls, latency, and golden markers |
| TrajectoryMetadataSchema | ZodObject | Validates trajectory metadata |
| TrajectorySchema | ZodObject | Validates complete trajectory (minimum one turn, optional metadata) |
| EvalIssueSchema | ZodObject | Validates evaluation issue records |
| EvalResultSchema | ZodObject | Validates evaluation results with metrics and issues |
| JudgeScoreSchema | ZodObject | Validates judge scoring output |
| CostBreakdownSchema | ZodObject | Validates cost breakdowns with per-turn cost arrays |
| LatencyBudgetSchema | ZodObject | Validates latency budget configuration |
| LatencyViolationSchema | ZodObject | Validates latency SLA violations |
| LatencyResultSchema | ZodObject | Validates latency measurement results |
| QualityMarkersSchema | ZodObject | Validates golden trajectory quality markers |
| GoldenTrajectorySchema | ZodObject | Validates golden trajectories with nested trajectory and quality markers |
| RegressionGateSchema | ZodObject | Validates regression gate definitions |
| GateResultSchema | ZodObject | Validates gate evaluation results |
| EvalSuiteConfigSchema | ZodObject | Validates suite configuration with nested latency budget and gates |
| EvalRunStatusSchema | ZodObject | Validates suite run status |
| MetricRegressionSchema | ZodObject | Validates metric regression records |
| RunComparisonSchema | ZodObject | Validates run comparison results with statistical significance arrays |

Related Packages
| Package | Description |
| --- | --- |
| @reaatech/agent-eval-harness-types | Shared domain types and Zod schemas |
| @reaatech/agent-eval-harness-trajectory | Trajectory loading, evaluation, and golden comparison |
| @reaatech/agent-eval-harness-tool-use | Tool-use validation and schema compliance |
| @reaatech/agent-eval-harness-cost | Cost tracking, budgets, and reporting |
| @reaatech/agent-eval-harness-latency | Latency monitoring, SLA enforcement, and optimization |
| @reaatech/agent-eval-harness-judge | LLM-as-judge with calibration and consensus |
| @reaatech/agent-eval-harness-golden | Golden trajectory management and curation |
| @reaatech/agent-eval-harness-suite | Suite runner, results aggregation, and comparison |
| @reaatech/agent-eval-harness-gate | CI regression gates with JUnit and GitHub output |
| @reaatech/agent-eval-harness-mcp-server | MCP server with three-layer tool architecture |
| @reaatech/agent-eval-harness-cli | Command-line interface |
| @reaatech/agent-eval-harness-observability | OTel tracing, metrics, structured logging, and dashboards |

License
MIT