reaatech

agent-eval-harness · packages

Every package shipped from reaatech/agent-eval-harness, published or pending.

13 packages · page 1 of 2

@reaatech/agent-eval-harness-cli

v0.1.0
This CLI provides a suite of commands for executing agent evaluation pipelines, managing golden trajectories, and enforcing CI quality gates. It also functions as an MCP server in stdio mode, exposing its evaluation tools to other AI agents.
status: published · 7 days ago

@reaatech/agent-eval-harness-cost

v0.1.0
Calculates and enforces spending limits for AI agent trajectories by providing functions to compute token-based costs, compare performance, and trigger budget alerts. It exports a suite of utility functions that operate on trajectory objects to generate granular cost breakdowns and optimization recommendations across major LLM providers.
status: published · 8 days ago
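The actual exports of this package are not documented on this page, so the following is only an illustrative sketch of a token-based cost breakdown with a budget alert; the `TurnUsage` and `ModelPricing` shapes and the per-million-token prices are assumptions, not real provider rates.

```typescript
// Illustrative sketch only: the real @reaatech/agent-eval-harness-cost API is
// not shown on this page, so the types and function below are assumptions.
interface TurnUsage {
  inputTokens: number;
  outputTokens: number;
}

// Per-million-token prices; the values used below are placeholders.
interface ModelPricing {
  inputPerMTok: number;
  outputPerMTok: number;
}

// Compute a granular cost breakdown for a trajectory (a list of turns).
function computeTrajectoryCost(turns: TurnUsage[], pricing: ModelPricing) {
  let inputCost = 0;
  let outputCost = 0;
  for (const t of turns) {
    inputCost += (t.inputTokens / 1_000_000) * pricing.inputPerMTok;
    outputCost += (t.outputTokens / 1_000_000) * pricing.outputPerMTok;
  }
  return { inputCost, outputCost, total: inputCost + outputCost };
}

const cost = computeTrajectoryCost(
  [{ inputTokens: 1200, outputTokens: 400 }],
  { inputPerMTok: 3, outputPerMTok: 15 },
);

// A budget alert fires when the trajectory exceeds a spending limit.
const overBudget = cost.total > 0.01;
```

Separating input and output cost matters because output tokens are typically priced several times higher than input tokens on major providers.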

@reaatech/agent-eval-harness-gate

v0.1.0
Enforces CI/CD regression thresholds for AI agent performance, cost, and quality metrics. It provides a `GateEngine` class to evaluate agent results against configurable gates and generates JUnit XML, GitHub Actions annotations, and JSON summaries.
status: published · 8 days ago
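The `GateEngine` interface itself is not shown here, so this is a minimal sketch of the underlying idea, evaluating metric values against configurable min/max thresholds as a CI step might; the `Gate` and `GateResult` shapes are hypothetical.

```typescript
// Hypothetical sketch: the real GateEngine API is not documented on this page.
type Comparison = "min" | "max";

interface Gate {
  metric: string;
  threshold: number;
  comparison: Comparison; // "min": value must be >= threshold; "max": <= threshold
}

interface GateResult {
  metric: string;
  value: number;
  passed: boolean;
}

// Evaluate a set of metric values against configured gates.
function evaluateGates(metrics: Record<string, number>, gates: Gate[]): GateResult[] {
  return gates.map((g) => {
    const value = metrics[g.metric] ?? NaN;
    const passed =
      g.comparison === "min" ? value >= g.threshold : value <= g.threshold;
    return { metric: g.metric, value, passed };
  });
}

const results = evaluateGates(
  { accuracy: 0.91, costUsd: 0.04 },
  [
    { metric: "accuracy", threshold: 0.9, comparison: "min" },
    { metric: "costUsd", threshold: 0.05, comparison: "max" },
  ],
);

// A CI job would fail the build when any gate fails.
const allPassed = results.every((r) => r.passed);
```

Each `GateResult` carries the observed value alongside the pass flag, which is the shape a JUnit XML or GitHub Actions annotation emitter would consume.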

@reaatech/agent-eval-harness-golden

v0.1.0
Manages reference agent trajectories for regression testing through a collection of utility functions and a `GoldenCurator` class. It provides tools to create, annotate, and validate golden datasets, and includes a comparison engine to detect regressions by diffing candidate trajectories against these references.
status: published · 8 days ago
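To make the comparison-engine idea concrete, here is a hedged sketch of diffing a candidate trajectory against a golden reference; the `Step` shape and `firstDivergence` helper are assumptions for illustration, not the package's actual API.

```typescript
// Illustrative comparison engine: real GoldenCurator diffing is not shown here.
interface Step {
  tool: string;
  outcome: "success" | "error";
}

// Diff a candidate trajectory against a golden reference and report the
// index of the first step where they diverge (or -1 if they match fully).
function firstDivergence(golden: Step[], candidate: Step[]): number {
  const len = Math.max(golden.length, candidate.length);
  for (let i = 0; i < len; i++) {
    const g = golden[i];
    const c = candidate[i];
    if (!g || !c || g.tool !== c.tool || g.outcome !== c.outcome) return i;
  }
  return -1;
}

const goldenRun: Step[] = [
  { tool: "search", outcome: "success" },
  { tool: "summarize", outcome: "success" },
];
const candidateRun: Step[] = [
  { tool: "search", outcome: "success" },
  { tool: "summarize", outcome: "error" }, // regression relative to golden
];

const divergesAt = firstDivergence(goldenRun, candidateRun);
```

Reporting the earliest divergence point, rather than a boolean, lets a regression report point straight at the step that changed.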

@reaatech/agent-eval-harness-infra

pending npm
Provides a collection of Terraform modules and environment configurations for deploying the agent-eval-harness across AWS, Azure, GCP, OCI, Vercel, and Netlify. It requires Terraform 1.0+ and cloud-specific provider credentials to provision the necessary compute, database, and storage infrastructure.
status: awaiting publish

@reaatech/agent-eval-harness-judge

v0.1.0
Evaluates agent responses using LLM-as-a-judge patterns with support for multi-model consensus, automated calibration, and cost tracking. It provides a `JudgeEngine` class that interfaces with OpenAI-compatible providers to score faithfulness, relevance, and tool correctness.
status: published · 8 days ago
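The multi-model consensus step can be sketched independently of any provider call; this is an assumed shape where the per-judge scores are given directly so the aggregation is runnable, and it is not the package's real `JudgeEngine` interface.

```typescript
// Assumed shape only: the real JudgeEngine calls OpenAI-compatible providers;
// here per-model scores are supplied directly so the consensus step is runnable.
interface JudgeScore {
  model: string;
  faithfulness: number; // 0..1
  relevance: number;    // 0..1
}

// Median of a list of numbers.
function median(xs: number[]): number {
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Multi-model consensus: take the median of each metric across judges,
// which is more robust to a single outlier judge than a plain mean.
function consensus(scores: JudgeScore[]) {
  return {
    faithfulness: median(scores.map((s) => s.faithfulness)),
    relevance: median(scores.map((s) => s.relevance)),
  };
}

const verdict = consensus([
  { model: "judge-a", faithfulness: 0.9, relevance: 0.8 },
  { model: "judge-b", faithfulness: 0.95, relevance: 0.85 },
  { model: "judge-c", faithfulness: 0.2, relevance: 0.9 }, // outlier judge
]);
```

With a mean, the outlier judge would drag faithfulness down to about 0.68; the median keeps the consensus at 0.9, which is the usual argument for median-based aggregation across judges.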

@reaatech/agent-eval-harness-latency

v0.1.0
Computes latency metrics, enforces SLA budgets, and identifies performance bottlenecks for AI agent trajectories. It provides a suite of utility functions and a `LatencyTracker` class to analyze turn-level and component-specific timing data.
status: published · 8 days ago
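As an illustration of the latency-budget idea (not the package's actual `LatencyTracker` API), a percentile over turn timings plus an SLA filter can be sketched in a few lines; the function names here are assumptions.

```typescript
// Illustrative only: the actual LatencyTracker interface is an assumption.
// Nearest-rank percentile over a list of latency samples in milliseconds.
function percentile(samplesMs: number[], p: number): number {
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const idx = Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[Math.max(0, idx)];
}

// Return the turns whose latency exceeds an SLA budget.
function violations(samplesMs: number[], budgetMs: number): number[] {
  return samplesMs.filter((ms) => ms > budgetMs);
}

const turnLatencies = [120, 180, 250, 900, 140];
const p95 = percentile(turnLatencies, 95);
const slowTurns = violations(turnLatencies, 500);
```

Tail percentiles (p95/p99) are the usual SLA signal for agent turns because a mean hides the occasional slow tool call that dominates user-perceived latency.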

@reaatech/agent-eval-harness-mcp-server

v0.1.0
Exposes 13 evaluation tools for AI agents via the Model Context Protocol (MCP) using stdio transport. It provides a factory function to instantiate a server that handles atomic judgments, suite orchestration, and CI gate operations.
status: published · 8 days ago

@reaatech/agent-eval-harness-observability

v0.1.0
Provides OpenTelemetry instrumentation, Pino-based structured logging with PII redaction, and an in-memory dashboard manager for tracking agent evaluation pipelines. It exposes a set of singleton managers for recording metrics, tracing execution spans, and aggregating performance trends.
status: published · 8 days ago

@reaatech/agent-eval-harness-suite

v0.1.0
Executes batch evaluations of agent trajectories using a YAML-configured runner class that aggregates multi-metric scores and performs statistical regression analysis between runs. It requires an external evaluator function and trajectory data to process concurrent test suites.
status: published · 8 days ago

@reaatech/agent-eval-harness-tool-use

v0.1.0
Validates agent tool-use trajectories by checking schema compliance, argument accuracy, and result integration. It provides a set of utility functions to evaluate individual tool calls or full conversation turns against defined tool schemas.
status: published · 8 days ago
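The package's schema format is not reproduced on this page, so the following checks a tool call against a minimal JSON-Schema-like spec purely as a sketch; `ToolSchema`, `ToolCall`, and `validateToolCall` are hypothetical names.

```typescript
// Sketch under assumptions: the real package's tool schema format is not shown
// here. This validates a call's arguments against a minimal spec.
interface ToolSchema {
  name: string;
  required: string[];
  types: Record<string, "string" | "number" | "boolean">;
}

interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

// Return a list of schema-compliance errors; an empty list means the call passes.
function validateToolCall(call: ToolCall, schema: ToolSchema): string[] {
  const errors: string[] = [];
  if (call.name !== schema.name) errors.push(`unknown tool: ${call.name}`);
  for (const key of schema.required) {
    if (!(key in call.args)) errors.push(`missing required arg: ${key}`);
  }
  for (const [key, value] of Object.entries(call.args)) {
    const expected = schema.types[key];
    if (expected && typeof value !== expected) {
      errors.push(`arg ${key}: expected ${expected}, got ${typeof value}`);
    }
  }
  return errors;
}

const searchSchema: ToolSchema = {
  name: "search",
  required: ["query"],
  types: { query: "string", limit: "number" },
};

const okCall = validateToolCall(
  { name: "search", args: { query: "agents", limit: 5 } },
  searchSchema,
);
const badCall = validateToolCall(
  { name: "search", args: { limit: "5" } }, // missing query, wrong limit type
  searchSchema,
);
```

Returning the full error list, rather than failing on the first problem, gives an evaluation report one row per violation across a conversation turn.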

@reaatech/agent-eval-harness-trajectory

v0.1.0
Provides utilities for loading, validating, and evaluating agent conversation trajectories from JSONL files. It exports functions for parsing data, calculating coherence and goal completion metrics, and comparing candidate trajectories against golden references, requiring `@reaatech/agent-eval-harness-types` for schema validation.
status: published · 8 days ago
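The trajectory schema lives in `@reaatech/agent-eval-harness-types` and is not reproduced on this page, so this sketch assumes a simple turn shape and shows only the JSONL-loading step: one trajectory per line, blank lines skipped, with the line number surfaced on malformed JSON.

```typescript
// Hypothetical turn shape; the real schema is defined in
// @reaatech/agent-eval-harness-types and is not shown on this page.
interface Turn {
  role: "user" | "assistant" | "tool";
  content: string;
}

// Parse one trajectory (an array of turns) per JSONL line.
function parseTrajectories(jsonl: string): Turn[][] {
  const trajectories: Turn[][] = [];
  const lines = jsonl.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i].trim();
    if (!line) continue; // skip blank lines
    try {
      trajectories.push(JSON.parse(line) as Turn[]);
    } catch {
      // Report the 1-based line number so bad records are easy to find.
      throw new Error(`invalid JSON on line ${i + 1}`);
    }
  }
  return trajectories;
}

const jsonlData = [
  '[{"role":"user","content":"hi"},{"role":"assistant","content":"hello"}]',
  "",
  '[{"role":"user","content":"bye"}]',
].join("\n");

const loaded = parseTrajectories(jsonlData);
```

JSONL works well for trajectories because each run stays an independent record: a corrupt line can be reported and skipped or rejected without invalidating the rest of the file.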