agent-eval-harness · packages
Every package shipped from reaatech/agent-eval-harness, published or pending.
13 packages · page 1 of 2
@reaatech/agent-eval-harness-cli
This CLI provides a suite of commands for executing agent evaluation pipelines, managing golden trajectories, and enforcing CI quality gates. It also functions as an MCP server in stdio mode, exposing its evaluation tools to other AI agents.
- status: published · 7 days ago
@reaatech/agent-eval-harness-cost
Calculates and enforces spending limits for AI agent trajectories by providing functions to compute token-based costs, compare performance, and trigger budget alerts. It exports a suite of utility functions that operate on trajectory objects to generate granular cost breakdowns and optimization recommendations across major LLM providers.
- status: published · 8 days ago
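The token-cost arithmetic the cost package automates is easy to sketch. The `computeCost` helper and the pricing table below are illustrative stand-ins, not the package's actual API or real provider prices:

```typescript
// Illustrative token-based cost accounting. Function names, shapes,
// and per-million-token prices (USD) are hypothetical.
interface Usage { model: string; inputTokens: number; outputTokens: number; }

const PRICING: Record<string, { input: number; output: number }> = {
  "model-a": { input: 2.5, output: 10 },
  "model-b": { input: 3, output: 15 },
};

function computeCost(turns: Usage[]): number {
  return turns.reduce((total, t) => {
    const p = PRICING[t.model];
    if (!p) throw new Error(`No pricing for model: ${t.model}`);
    return total +
      (t.inputTokens / 1_000_000) * p.input +
      (t.outputTokens / 1_000_000) * p.output;
  }, 0);
}

// A granular breakdown falls out by running computeCost per turn
// instead of over the whole trajectory.
const cost = computeCost([
  { model: "model-a", inputTokens: 10_000, outputTokens: 2_000 },
]);
console.log(cost.toFixed(3)); // "0.045"
```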
@reaatech/agent-eval-harness-gate
Enforces CI/CD regression thresholds for AI agent performance, cost, and quality metrics. It provides a `GateEngine` class to evaluate agent results against configurable gates and generates JUnit XML, GitHub Actions annotations, and JSON summaries.
- status: published · 8 days ago
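The core of threshold gating can be sketched in a few lines. The gate shape and `evaluateGates` function below are hypothetical, not the `GateEngine` API:

```typescript
// Minimal sketch of metric gating: each gate pins a metric to an
// optional min/max bound. Names and shapes are illustrative only.
interface Gate { metric: string; max?: number; min?: number; }
interface GateResult { metric: string; value: number; passed: boolean; }

function evaluateGates(metrics: Record<string, number>, gates: Gate[]): GateResult[] {
  return gates.map((g) => {
    const value = metrics[g.metric] ?? Number.NaN; // missing metric fails the gate
    const passed =
      Number.isFinite(value) &&
      (g.max === undefined || value <= g.max) &&
      (g.min === undefined || value >= g.min);
    return { metric: g.metric, value, passed };
  });
}

const results = evaluateGates(
  { costUsd: 0.12, score: 0.91 },
  [{ metric: "costUsd", max: 0.5 }, { metric: "score", min: 0.8 }],
);
console.log(results.every((r) => r.passed)); // true
```

A CI reporter would then render these results as JUnit XML or GitHub Actions annotations, as the package description notes.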
@reaatech/agent-eval-harness-golden
Manages reference agent trajectories for regression testing through a collection of utility functions and a `GoldenCurator` class. It provides tools to create, annotate, and validate golden datasets, and includes a comparison engine to detect regressions by diffing candidate trajectories against these references.
- status: published · 8 days ago
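Diffing a candidate trajectory against a golden reference can be sketched as a positional comparison of tool calls plus a final-answer check. The record shape and `diffAgainstGolden` helper are illustrative, not the comparison engine's real interface:

```typescript
// Sketch of golden-reference diffing; field names are hypothetical.
interface Trajectory { id: string; toolCalls: string[]; finalAnswer: string; }

function diffAgainstGolden(candidate: Trajectory, golden: Trajectory): string[] {
  const issues: string[] = [];
  golden.toolCalls.forEach((call, i) => {
    if (candidate.toolCalls[i] !== call) {
      issues.push(`turn ${i}: expected tool ${call}, got ${candidate.toolCalls[i] ?? "nothing"}`);
    }
  });
  if (candidate.toolCalls.length > golden.toolCalls.length) {
    issues.push(`unexpected extra tool calls: ${candidate.toolCalls.slice(golden.toolCalls.length).join(", ")}`);
  }
  if (candidate.finalAnswer !== golden.finalAnswer) {
    issues.push("final answer diverges from golden");
  }
  return issues; // empty array means no regression detected
}
```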
@reaatech/agent-eval-harness-infra
Provides a collection of Terraform modules and environment configurations for deploying the agent-eval-harness across AWS, Azure, GCP, OCI, Vercel, and Netlify. It requires Terraform 1.0+ and cloud-specific provider credentials to provision the necessary compute, database, and storage infrastructure.
- status: awaiting publish
@reaatech/agent-eval-harness-judge
Evaluates agent responses using LLM-as-a-judge patterns with support for multi-model consensus, automated calibration, and cost tracking. It provides a `JudgeEngine` class that interfaces with OpenAI-compatible providers to score faithfulness, relevance, and tool correctness.
- status: published · 8 days ago
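Multi-model consensus reduces to aggregating per-judge scores and checking agreement. The sketch below stubs out the judge calls entirely (the real package talks to OpenAI-compatible providers); the `consensus` helper and its spread heuristic are assumptions, not the `JudgeEngine` API:

```typescript
// Illustrative consensus over judge scores in [0, 1]. The agreement
// rule (max-min spread) is a hypothetical stand-in for whatever
// calibration the package actually applies.
interface JudgeScore { model: string; score: number; }

function consensus(scores: JudgeScore[], maxSpread = 0.3) {
  const values = scores.map((s) => s.score);
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  const spread = Math.max(...values) - Math.min(...values);
  return { mean, agreed: spread <= maxSpread };
}

const verdict = consensus([
  { model: "judge-a", score: 0.8 },
  { model: "judge-b", score: 0.9 },
]);
console.log(verdict.agreed); // true
```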
@reaatech/agent-eval-harness-latency
Computes latency metrics, enforces SLA budgets, and identifies performance bottlenecks for AI agent trajectories. It provides a suite of utility functions and a `LatencyTracker` class to analyze turn-level and component-specific timing data.
- status: published · 8 days ago
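SLA budget enforcement boils down to a percentile computation over turn latencies. The nearest-rank percentile and `withinBudget` helper below are a sketch, unrelated to the actual `LatencyTracker` internals:

```typescript
// Nearest-rank percentile over latency samples in milliseconds.
function percentile(samplesMs: number[], p: number): number {
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const idx = Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[Math.max(0, idx)];
}

// Hypothetical budget check: pass if the p95 latency fits the SLA.
function withinBudget(samplesMs: number[], p95BudgetMs: number): boolean {
  return percentile(samplesMs, 95) <= p95BudgetMs;
}

const turns = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000];
console.log(withinBudget(turns, 1000)); // true
```

Component-specific bottlenecks fall out of the same math run per component rather than per trajectory.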
@reaatech/agent-eval-harness-mcp-server
Exposes 13 evaluation tools for AI agents via the Model Context Protocol (MCP) using stdio transport. It provides a factory function to instantiate a server that handles atomic judgments, suite orchestration, and CI gate operations.
- status: published · 8 days ago
@reaatech/agent-eval-harness-observability
Provides OpenTelemetry instrumentation, Pino-based structured logging with PII redaction, and an in-memory dashboard manager for tracking agent evaluation pipelines. It exposes a set of singleton managers for recording metrics, tracing execution spans, and aggregating performance trends.
- status: published · 8 days ago
@reaatech/agent-eval-harness-suite
Executes batch evaluations of agent trajectories using a YAML-configured runner class that aggregates multi-metric scores and performs statistical regression analysis between runs. It requires an external evaluator function and trajectory data to process concurrent test suites.
- status: published · 8 days ago
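The regression analysis between runs can be sketched as a mean-score comparison with a tolerance; the package's actual statistical test may be more rigorous, and the helper names here are hypothetical:

```typescript
// Illustrative run-over-run regression check: flag the candidate run
// if its mean score drops more than `tolerance` below the baseline.
function meanScore(scores: number[]): number {
  return scores.reduce((a, b) => a + b, 0) / scores.length;
}

function detectRegression(baseline: number[], candidate: number[], tolerance = 0.05): boolean {
  return meanScore(baseline) - meanScore(candidate) > tolerance;
}

console.log(detectRegression([0.9, 0.9], [0.8, 0.8])); // true
```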
@reaatech/agent-eval-harness-tool-use
Validates agent tool-use trajectories by checking schema compliance, argument accuracy, and result integration. It provides a set of utility functions to evaluate individual tool calls or full conversation turns against defined tool schemas.
- status: published · 8 days ago
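Schema-compliance checking for a single tool call can be sketched with a minimal schema that only tracks required argument names; real tool schemas (and the package's validators) carry full type information:

```typescript
// Illustrative tool-call validation; shapes are hypothetical.
interface ToolSchema { name: string; required: string[]; }
interface ToolCall { name: string; args: Record<string, unknown>; }

function validateCall(call: ToolCall, schemas: ToolSchema[]): string[] {
  const schema = schemas.find((s) => s.name === call.name);
  if (!schema) return [`unknown tool: ${call.name}`];
  return schema.required
    .filter((key) => !(key in call.args))
    .map((key) => `missing required argument: ${key}`);
}

const schemas = [{ name: "search", required: ["query"] }];
console.log(validateCall({ name: "search", args: { query: "x" } }, schemas)); // []
```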
@reaatech/agent-eval-harness-trajectory
Provides utilities for loading, validating, and evaluating agent conversation trajectories from JSONL files. It exports functions for parsing data, calculating coherence and goal completion metrics, and comparing candidate trajectories against golden references, requiring `@reaatech/agent-eval-harness-types` for schema validation.
- status: published · 8 days ago
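JSONL loading is one JSON object per line, with malformed lines worth reporting rather than silently dropping. The record shape below is illustrative; the real schema validation lives in `@reaatech/agent-eval-harness-types`:

```typescript
// Sketch of JSONL trajectory parsing with per-line error reporting.
// Field names are hypothetical, not the package's schema.
interface TrajectoryRecord { id: string; turns: unknown[]; }

function parseJsonl(text: string): { records: TrajectoryRecord[]; errors: string[] } {
  const records: TrajectoryRecord[] = [];
  const errors: string[] = [];
  text.split("\n").forEach((line, i) => {
    if (!line.trim()) return; // skip blank lines
    try {
      records.push(JSON.parse(line) as TrajectoryRecord);
    } catch {
      errors.push(`line ${i + 1}: invalid JSON`);
    }
  });
  return { records, errors };
}

const parsed = parseJsonl('{"id":"a","turns":[]}\nnot json\n');
console.log(parsed.records.length, parsed.errors.length); // 1 1
```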
