reaatech/agent-eval-harness

★ 0 · Last commit: Apr 10, 2026 · MIT · TypeScript · GitHub →

SEED

Packages

@reaatech/agent-eval-harness-gate

pnpm add @reaatech/agent-eval-harness-gate

@reaatech/agent-eval-harness-latency

@reaatech/agent-eval-harness-latency provides turn-level and trajectory-level latency monitoring for AI agents, computing P50/P90/P99 percentiles, detecting anomalies, and generating optimization recommendations with SLA enforcement. It depends on standard npm runtime libraries for statistical calculations and configuration management.

pnpm add @reaatech/agent-eval-harness-latency
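
To make the P50/P90/P99 summary concrete, here is a minimal sketch of a nearest-rank percentile calculation with a simple SLA check. It does not use the package's API: `percentile`, `summarize`, `violatesSla`, and the `LatencySummary` shape are hypothetical names chosen for illustration, and the package may use a different percentile method (for example, interpolation).

```typescript
// Hypothetical sketch, NOT the package's API: nearest-rank percentiles
// over per-turn latencies in milliseconds.
interface LatencySummary {
  p50: number;
  p90: number;
  p99: number;
}

function percentile(samples: readonly number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  // Sort a copy so the caller's array is left untouched.
  const sorted = [...samples].sort((a, b) => a - b);
  // Nearest-rank: the smallest 1-based rank covering p percent of samples.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1] as number;
}

function summarize(latenciesMs: readonly number[]): LatencySummary {
  return {
    p50: percentile(latenciesMs, 50),
    p90: percentile(latenciesMs, 90),
    p99: percentile(latenciesMs, 99),
  };
}

// Assumed SLA rule for illustration: the P99 must stay within a budget.
function violatesSla(summary: LatencySummary, p99BudgetMs: number): boolean {
  return summary.p99 > p99BudgetMs;
}
```

With the samples `[120, 80, 200, 95, 110]`, `summarize` picks 110 ms as the P50 and 200 ms as both the P90 and P99, so a 150 ms P99 budget would be reported as violated.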

@reaatech/agent-eval-harness-tool-use

pnpm add @reaatech/agent-eval-harness-tool-use

@reaatech/agent-eval-harness-cli

pnpm add @reaatech/agent-eval-harness-cli

@reaatech/agent-eval-harness-cost

This package provides per-task cost calculation, budget enforcement, and cost reporting for AI agent trajectories, tracking LLM token usage and tool invocation costs across 8 supported models with configurable pricing and 3-tier budget alerting. It depends on `@reaatech/agent-eval-harness-types` for trajectory type definitions.

pnpm add @reaatech/agent-eval-harness-cost
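
The following sketch illustrates how per-task cost and tiered budget alerting might fit together. Everything in it is an assumption made for illustration: the model name `example-model`, its prices, the tier names (`info`, `warning`, `critical`), and the 50%/80%/100% thresholds are not taken from the package, whose pricing table and alert levels may differ.

```typescript
// Hypothetical sketch, NOT the package's API or real pricing.
interface ModelPricing {
  inputPerMTok: number; // USD per one million input tokens
  outputPerMTok: number; // USD per one million output tokens
}

interface Usage {
  model: string;
  inputTokens: number;
  outputTokens: number;
  toolCalls: number;
}

type BudgetTier = "ok" | "info" | "warning" | "critical";

// Made-up pricing entry; a real table would cover each supported model.
const PRICING: Record<string, ModelPricing> = {
  "example-model": { inputPerMTok: 2, outputPerMTok: 4 },
};

function taskCostUsd(
  usage: Usage,
  pricing: Record<string, ModelPricing>,
  toolCallCostUsd = 0,
): number {
  const p = pricing[usage.model];
  if (!p) throw new Error(`no pricing configured for ${usage.model}`);
  const tokenCost =
    (usage.inputTokens * p.inputPerMTok + usage.outputTokens * p.outputPerMTok) /
    1_000_000;
  return tokenCost + usage.toolCalls * toolCallCostUsd;
}

// Assumed thresholds: 50% -> info, 80% -> warning, 100% -> critical.
function budgetTier(costUsd: number, budgetUsd: number): BudgetTier {
  if (budgetUsd <= 0) throw new RangeError("budget must be positive");
  const ratio = costUsd / budgetUsd;
  if (ratio >= 1) return "critical";
  if (ratio >= 0.8) return "warning";
  if (ratio >= 0.5) return "info";
  return "ok";
}
```

Under these assumed prices, a task that consumes 1,000,000 input tokens and 500,000 output tokens with no tool calls costs 4 USD, which lands in the `warning` tier against a 5 USD budget.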

@reaatech/agent-eval-harness-golden

This package provides tools for creating, annotating, and curating golden reference trajectories and for comparing them against candidate agent runs, with diff analysis and regression detection. It depends on `@reaatech/agent-eval-harness-types` for trajectory type definitions.

pnpm add @reaatech/agent-eval-harness-golden
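
As a rough illustration of comparing a golden trajectory with a candidate run, the sketch below does a position-by-position diff of tool-call names and flags any non-matching step as a regression. It is a deliberately simplified assumption: `StepDiff`, `diffTrajectories`, and `hasRegression` are hypothetical names, trajectories are reduced to lists of tool names, and the package's actual trajectory types and diff algorithm are not shown here.

```typescript
// Hypothetical sketch, NOT the package's API: positional diff of tool names.
type DiffKind = "match" | "changed" | "missing" | "extra";

interface StepDiff {
  index: number;
  kind: DiffKind;
  golden: string | null; // tool expected at this step, if any
  candidate: string | null; // tool actually called at this step, if any
}

function diffTrajectories(
  golden: readonly string[],
  candidate: readonly string[],
): StepDiff[] {
  const length = Math.max(golden.length, candidate.length);
  const diffs: StepDiff[] = [];
  for (let index = 0; index < length; index++) {
    const g = golden[index] ?? null;
    const c = candidate[index] ?? null;
    let kind: DiffKind;
    if (g === null) kind = "extra"; // candidate ran longer than golden
    else if (c === null) kind = "missing"; // candidate stopped early
    else kind = g === c ? "match" : "changed";
    diffs.push({ index, kind, golden: g, candidate: c });
  }
  return diffs;
}

// Assumed rule: any step that does not match counts as a regression.
function hasRegression(diffs: readonly StepDiff[]): boolean {
  return diffs.some((d) => d.kind !== "match");
}
```

A positional diff is the simplest choice and reports a skipped step as a run of changes; a sequence-alignment diff (for example, one based on longest common subsequence) would localize insertions and deletions more precisely.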

@reaatech/agent-eval-harness-infra

Provides Terraform configurations for deploying the agent-eval-harness across multiple cloud providers (AWS, Azure, GCP, OCI, Netlify, Vercel), with reusable modules for compute, database, cache, storage,

pnpm add @reaatech/agent-eval-harness-infra

@reaatech/agent-eval-harness-judge

@reaatech/agent-eval-harness-judge is a provider-agnostic LLM-as-judge engine that scores agent responses on faithfulness, relevance, tool correctness, and overall quality, supporting Claude, GPT-4, Gemini, and any OpenAI-compatible provider. It depends on the corresponding LLM SDKs (Anthropic, OpenAI, Google Generative AI) and provides calibration, multi-model consensus, rate limiting, and cost tracking.

pnpm add @reaatech/agent-eval-harness-judge
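
One plausible way to combine several judge models into a consensus is to take the per-dimension median of their scores, which limits the influence of a single outlier judge. The sketch below assumes that approach; the `Scores` shape, the `consensus` function, and the 0 to 1 score range are hypothetical and do not describe how the package actually aggregates judgements.

```typescript
// Hypothetical sketch, NOT the package's API: median-based judge consensus.
type Dimension = "faithfulness" | "relevance" | "toolCorrectness" | "overall";
type Scores = Record<Dimension, number>; // assumed range: 0 to 1

const DIMENSIONS: readonly Dimension[] = [
  "faithfulness",
  "relevance",
  "toolCorrectness",
  "overall",
];

function median(values: readonly number[]): number {
  if (values.length === 0) throw new Error("no values");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  // Odd count: middle element. Even count: mean of the two middle elements.
  return sorted.length % 2 === 1
    ? (sorted[mid] as number)
    : ((sorted[mid - 1] as number) + (sorted[mid] as number)) / 2;
}

// Each entry in `judgements` is one judge model's scores for the same response.
function consensus(judgements: readonly Scores[]): Scores {
  if (judgements.length === 0) throw new Error("no judgements");
  const result = {} as Scores;
  for (const dimension of DIMENSIONS) {
    result[dimension] = median(judgements.map((j) => j[dimension]));
  }
  return result;
}
```

With three judges scoring faithfulness at 0.9, 0.8, and 0.4, the consensus faithfulness is 0.8, so the single low score does not drag the result down the way a mean would.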

@reaatech/agent-eval-harness-mcp-server

pnpm add @reaatech/agent-eval-harness-mcp-server

@reaatech/agent-eval-harness-observability

pnpm add @reaatech/agent-eval-harness-observability

@reaatech/agent-eval-harness-suite

pnpm add @reaatech/agent-eval-harness-suite

@reaatech/agent-eval-harness-trajectory

pnpm add @reaatech/agent-eval-harness-trajectory

@reaatech/agent-eval-harness-types

pnpm add @reaatech/agent-eval-harness-types