reaatech/agent-eval-harness
Packages
@reaatech/agent-eval-harness-gate
pnpm add @reaatech/agent-eval-harness-gate
@reaatech/agent-eval-harness-latency
@reaatech/agent-eval-harness-latency provides turn-level and trajectory-level latency monitoring for AI agents, computing P50/P90/P99 percentiles, detecting anomalies, and generating optimization recommendations with SLA enforcement. It depends on standard npm runtime libraries for statistical calculations and configuration management.
pnpm add @reaatech/agent-eval-harness-latency
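The percentile computation described above can be sketched as follows. This is a hypothetical illustration of the nearest-rank method over turn latencies, not the package's actual API; the function names are invented for this sketch.

```typescript
// Nearest-rank percentile over a sorted array of samples.
function percentile(sorted: number[], p: number): number {
  if (sorted.length === 0) throw new Error("no samples");
  const idx = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.min(sorted.length - 1, Math.max(0, idx))];
}

// Summarize turn-level latencies into the P50/P90/P99 the package reports.
function latencySummary(turnLatenciesMs: number[]) {
  const sorted = [...turnLatenciesMs].sort((a, b) => a - b);
  return {
    p50: percentile(sorted, 50),
    p90: percentile(sorted, 90),
    p99: percentile(sorted, 99),
  };
}
```

For trajectory-level monitoring, the same summary can be computed over per-trajectory totals instead of individual turns.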
@reaatech/agent-eval-harness-tool-use
pnpm add @reaatech/agent-eval-harness-tool-use
@reaatech/agent-eval-harness-cli
pnpm add @reaatech/agent-eval-harness-cli
@reaatech/agent-eval-harness-cost
This package provides per-task cost calculation, budget enforcement, and cost reporting for AI agent trajectories, tracking LLM token usage and tool invocation costs across 8 supported models with configurable pricing and 3-tier budget alerting. It depends on `@reaatech/agent-eval-harness-types` for trajectory type definitions.
pnpm add @reaatech/agent-eval-harness-cost
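The cost model above (token-based pricing plus tool invocation costs, with tiered budget alerts) can be sketched like this. The model name, prices, and alert thresholds here are invented for illustration; the package ships its own configurable pricing table.

```typescript
type Usage = { inputTokens: number; outputTokens: number };

// Hypothetical pricing in USD per million tokens.
const PRICE_PER_MTOK: Record<string, { input: number; output: number }> = {
  "example-model": { input: 3.0, output: 15.0 },
};

// Per-task cost: token charges plus any flat tool-invocation costs.
function taskCostUsd(model: string, usage: Usage, toolCostUsd = 0): number {
  const p = PRICE_PER_MTOK[model];
  if (!p) throw new Error(`no pricing for ${model}`);
  return (
    (usage.inputTokens / 1e6) * p.input +
    (usage.outputTokens / 1e6) * p.output +
    toolCostUsd
  );
}

// A 3-tier budget alert: warning, critical, exceeded (thresholds assumed).
function budgetTier(spentUsd: number, budgetUsd: number) {
  const ratio = spentUsd / budgetUsd;
  if (ratio >= 1) return "exceeded";
  if (ratio >= 0.9) return "critical";
  if (ratio >= 0.75) return "warning";
  return "ok";
}
```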
@reaatech/agent-eval-harness-golden
This package provides tools for creating, annotating, curating, and comparing golden reference trajectories against candidate agent runs, with diff analysis and regression detection. It depends on `@reaatech/agent-eval-harness-types` for trajectory type definitions.
pnpm add @reaatech/agent-eval-harness-golden
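The golden-vs-candidate comparison can be sketched as a step-by-step diff over tool calls. The `Step` type and function below are invented for this illustration; the real package defines its own trajectory types in `@reaatech/agent-eval-harness-types`.

```typescript
type Step = { tool: string; args: Record<string, unknown> };

// Compare a candidate run against a golden reference, step by step,
// reporting missing, extra, and mismatched tool calls.
function diffTrajectories(golden: Step[], candidate: Step[]): string[] {
  const issues: string[] = [];
  const len = Math.max(golden.length, candidate.length);
  for (let i = 0; i < len; i++) {
    const g = golden[i];
    const c = candidate[i];
    if (!g) issues.push(`extra step ${i}: candidate called ${c!.tool}`);
    else if (!c) issues.push(`missing step ${i}: golden expected ${g.tool}`);
    else if (g.tool !== c.tool)
      issues.push(`step ${i}: expected ${g.tool}, got ${c.tool}`);
  }
  return issues;
}
```

An empty result means the candidate matched the golden trajectory's tool sequence; any entries can be surfaced as regressions.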
@reaatech/agent-eval-harness-infra
Provides Terraform configurations for deploying the agent-eval-harness across multiple cloud providers (AWS, Azure, GCP, OCI, Netlify, Vercel), with reusable modules for compute, database, cache, storage, and related infrastructure.
pnpm add @reaatech/agent-eval-harness-infra
@reaatech/agent-eval-harness-judge
@reaatech/agent-eval-harness-judge is a provider-agnostic LLM-as-judge engine that scores agent responses on faithfulness, relevance, tool correctness, and overall quality, supporting Claude, GPT-4, Gemini, and any OpenAI-compatible provider. It depends on the corresponding LLM SDKs (Anthropic, OpenAI, Google Generative AI) and provides calibration, multi-model consensus, rate limiting, and cost tracking.
pnpm add @reaatech/agent-eval-harness-judge
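The multi-model consensus mentioned above can be sketched as averaging per-criterion scores across judges and flagging disagreement. The types and thresholds here are invented for illustration and are not the package's actual API.

```typescript
type JudgeScore = { faithfulness: number; relevance: number; toolCorrectness: number };

// Average each criterion across judges; report the widest per-criterion
// spread so callers can flag low inter-judge agreement.
function consensus(scores: JudgeScore[]): { mean: JudgeScore; spread: number } {
  if (scores.length === 0) throw new Error("no judge scores");
  const keys = ["faithfulness", "relevance", "toolCorrectness"] as const;
  const mean = {} as JudgeScore;
  let spread = 0;
  for (const k of keys) {
    const vals = scores.map((s) => s[k]);
    mean[k] = vals.reduce((a, b) => a + b, 0) / vals.length;
    spread = Math.max(spread, Math.max(...vals) - Math.min(...vals));
  }
  return { mean, spread };
}
```

A large spread suggests the judges disagree and the item may need recalibration or human review.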
@reaatech/agent-eval-harness-mcp-server
pnpm add @reaatech/agent-eval-harness-mcp-server
@reaatech/agent-eval-harness-observability
pnpm add @reaatech/agent-eval-harness-observability
@reaatech/agent-eval-harness-suite
pnpm add @reaatech/agent-eval-harness-suite
@reaatech/agent-eval-harness-trajectory
pnpm add @reaatech/agent-eval-harness-trajectory
@reaatech/agent-eval-harness-types
pnpm add @reaatech/agent-eval-harness-types
