@reaatech/agent-eval-harness-latency

npm v0.1.0

Computes latency metrics, enforces SLA budgets, and identifies performance bottlenecks for AI agent trajectories. It provides a suite of utility functions and a `LatencyTracker` class to analyze turn-level and component-specific timing data.

Status: Pre-1.0 — APIs may change in minor versions. Pin to a specific version in production.

Turn-level and trajectory-level latency monitoring with SLA enforcement and optimization analysis. Computes P50/P90/P99 percentiles, detects anomalies, and provides actionable bottleneck recommendations for AI agent latency budgets.

Installation

terminal
npm install @reaatech/agent-eval-harness-latency

Feature Overview

  • Percentile computation — P50, P90, P99 latency metrics computed per turn and aggregated across the full trajectory
  • Component breakdown — Separates LLM call latency from tool invocation latency and system overhead for targeted optimization
  • SLA enforcement — Configurable per-turn and per-trajectory latency thresholds with severity-graded violation detection and early-warning signals
  • Three latency presets: `strict` (P50: 500ms, P90: 1000ms, P99: 2000ms), `moderate` (P50: 1000ms, P90: 2000ms, P99: 5000ms), `lenient` (P50: 2000ms, P90: 4000ms, P99: 10000ms)
  • Anomaly detection — Identifies outlier turns whose latency exceeds a configurable multiplier of the average, with a minimum 1000ms floor
  • Optimization analysis — Ranked bottleneck identification (LLM call, tool invocation, overhead, total) with priority-ordered recommendations covering model selection, batching, streaming, caching, prompt shortening, and turn reduction
  • Latency trend tracking: the `LatencyTracker` class records history and computes improvement trends across evaluation runs
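
To make the percentile figures concrete, here is a minimal, self-contained sketch of how P50/P90/P99 could be derived from per-turn latencies. It assumes the nearest-rank method; the package does not document which method it uses, and `percentile` and `turnLatenciesMs` are hypothetical names for illustration only.

```typescript
// Nearest-rank percentile over a list of latencies in milliseconds.
// Assumption: nearest-rank; the package itself may interpolate instead.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  // Rank is 1-based: the smallest index covering p percent of the samples.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}

// Hypothetical per-turn latencies for a ten-turn trajectory.
const turnLatenciesMs = [120, 340, 560, 780, 900, 1100, 1500, 2100, 2600, 4800];

const p50 = percentile(turnLatenciesMs, 50);
const p90 = percentile(turnLatenciesMs, 90);
const p99 = percentile(turnLatenciesMs, 99);

console.log(`P50: ${p50}ms, P90: ${p90}ms, P99: ${p99}ms`);
```

With only ten samples, P99 collapses onto the slowest turn, so short trajectories make the P99 threshold behave much like a per-turn maximum.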

Quick Start

typescript
import { monitorLatency, enforceBudget, createLatencyBudget } from '@reaatech/agent-eval-harness-latency';
import type { Trajectory } from '@reaatech/agent-eval-harness-types';
 
// Assume `trajectory` has already been loaded (for example, parsed from JSONL)
declare const trajectory: Trajectory;
const result = monitorLatency(trajectory);
console.log(`P50: ${result.p50Ms}ms, P99: ${result.p99Ms}ms, Total: ${result.totalLatencyMs}ms`);
 
const budget = createLatencyBudget('moderate');
const enforcement = enforceBudget(result, budget);
console.log(`Within SLA: ${enforcement.passed}, Violations: ${enforcement.violations.length}`);

API Reference

Monitoring Functions

| Export | Signature | Description |
| --- | --- | --- |
| `monitorLatency` | `(trajectory: Trajectory) => LatencyResult` | Extracts per-turn latency from agent turns, computes P50/P90/P99 percentiles, total, average, min, and max latency |
| `getComponentBreakdown` | `(result: LatencyResult) => ComponentBreakdown` | Breaks down latency into average and total LLM call, tool invocation, and overhead components |
| `compareLatency` | `(baseline: LatencyResult, candidate: LatencyResult) => { avgDiffMs, p99DiffMs, faster, percentageChange }` | Compares two latency results and returns differences with directional indication |
| `detectAnomalies` | `(result: LatencyResult, thresholdMultiplier?: number) => TurnLatency[]` | Returns turns where latency exceeds `avgLatencyMs * thresholdMultiplier` (default 2x) and is above 1000ms |
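
The anomaly rule described for `detectAnomalies` can be sketched in a few lines. This is an illustration of the documented rule, not the package's implementation: `findAnomalies`, `TurnSample`, and the `latencyMs` field name are assumptions, since the shape of `TurnLatency` is not listed here.

```typescript
// Hypothetical per-turn shape; the real TurnLatency fields may differ.
interface TurnSample {
  turnId: number;
  latencyMs: number;
}

// A turn is flagged only if it exceeds BOTH the relative threshold
// (average * multiplier) and the absolute 1000ms floor.
function findAnomalies(
  turns: TurnSample[],
  thresholdMultiplier = 2,
  floorMs = 1000,
): TurnSample[] {
  if (turns.length === 0) return [];
  const avg = turns.reduce((sum, t) => sum + t.latencyMs, 0) / turns.length;
  return turns.filter(
    (t) => t.latencyMs > avg * thresholdMultiplier && t.latencyMs > floorMs,
  );
}

// Average is 1040ms, so the relative threshold is 2080ms: only turn 4 qualifies.
const slowRun: TurnSample[] = [
  { turnId: 1, latencyMs: 400 },
  { turnId: 2, latencyMs: 500 },
  { turnId: 3, latencyMs: 450 },
  { turnId: 4, latencyMs: 3200 },
  { turnId: 5, latencyMs: 650 },
];

// Average is 170ms; turn 3 is above 2x the average but below the 1000ms floor.
const fastRun: TurnSample[] = [
  { turnId: 1, latencyMs: 50 },
  { turnId: 2, latencyMs: 60 },
  { turnId: 3, latencyMs: 400 },
];

const slowAnomalies = findAnomalies(slowRun);
const fastAnomalies = findAnomalies(fastRun);
console.log(slowAnomalies.map((t) => t.turnId), fastAnomalies.length);
```

The absolute floor keeps uniformly fast trajectories from reporting anomalies just because one turn is slow relative to a very small average.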

Budget Enforcement Functions

| Export | Signature | Description |
| --- | --- | --- |
| `enforceBudget` | `(result: LatencyResult, budget: LatencyBudget) => BudgetEnforcementResult` | Validates latency result against budget thresholds, returns violations, warnings, and a composite score (0–1) |
| `createLatencyBudget` | `(preset: 'strict' \| 'moderate' \| 'lenient') => LatencyBudget` | Returns a pre-configured budget with P50/P90/P99, max turn, trajectory total, and component thresholds |
| `formatLatency` | `(ms: number) => string` | Formats milliseconds into human-readable strings: ms, s, or m |

Optimization Functions

| Export | Signature | Description |
| --- | --- | --- |
| `analyzeOptimization` | `(result: LatencyResult, trajectory?: Trajectory) => OptimizationResult` | Identifies bottlenecks, generates ranked recommendations with estimated improvement, and computes an optimization score |
| `LatencyTracker` | class | Maintains latency history, computes trends (`getTrend()`), average scores (`getAverageScore()`), and history retrieval (`getHistory()`) |

Types

LatencyBudget

| Field | Type | Description |
| --- | --- | --- |
| `p50` | `number?` | Maximum allowed P50 latency in ms |
| `p90` | `number?` | Maximum allowed P90 latency in ms |
| `p99` | `number?` | Maximum allowed P99 latency in ms |
| `maxTurn` | `number?` | Maximum allowed per-turn latency in ms |
| `total` | `number?` | Maximum allowed total trajectory latency in ms |
| `components` | `ComponentBudget?` | Per-component budget thresholds |
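
Because every `LatencyBudget` field is optional, a custom budget can set only the thresholds that matter. The sketch below mirrors the fields from the table in local interfaces and checks percentiles by hand. It assumes that unset fields are simply not enforced and that a value equal to the threshold passes; `checkPercentiles` is a hypothetical helper, not a package export, and the `ComponentBudget` field names are taken from the comment in the component-level example later in this README.

```typescript
// Local mirrors of the documented shapes, for illustration only.
interface ComponentBudget {
  llmCall?: number;
  toolInvocation?: number;
  overhead?: number;
}

interface LatencyBudget {
  p50?: number;
  p90?: number;
  p99?: number;
  maxTurn?: number;
  total?: number;
  components?: ComponentBudget;
}

type PercentileViolation = 'p50_exceeded' | 'p90_exceeded' | 'p99_exceeded';

// Hypothetical helper: reports which percentile thresholds are exceeded.
// Assumption: a threshold that is not set is skipped rather than defaulted.
function checkPercentiles(
  measured: { p50Ms: number; p90Ms: number; p99Ms: number },
  budget: LatencyBudget,
): PercentileViolation[] {
  const violations: PercentileViolation[] = [];
  if (budget.p50 !== undefined && measured.p50Ms > budget.p50) violations.push('p50_exceeded');
  if (budget.p90 !== undefined && measured.p90Ms > budget.p90) violations.push('p90_exceeded');
  if (budget.p99 !== undefined && measured.p99Ms > budget.p99) violations.push('p99_exceeded');
  return violations;
}

// A partial budget: no P99, trajectory total, tool, or overhead limits.
const customBudget: LatencyBudget = {
  p50: 800,
  p90: 1500,
  maxTurn: 4000,
  components: { llmCall: 600 },
};

const percentileViolations = checkPercentiles(
  { p50Ms: 900, p90Ms: 1400, p99Ms: 6000 },
  customBudget,
);
console.log(percentileViolations);
```

Here the 6000ms P99 is not reported because the custom budget leaves `p99` unset.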

LatencyResult

| Field | Type | Description |
| --- | --- | --- |
| `turns` | `TurnLatency[]` | Per-turn latency breakdown |
| `totalLatencyMs` | `number` | Sum of all agent turn latencies |
| `avgLatencyMs` | `number` | Mean latency across agent turns |
| `p50Ms` | `number` | 50th percentile |
| `p90Ms` | `number` | 90th percentile |
| `p99Ms` | `number` | 99th percentile |
| `maxLatencyMs` | `number` | Maximum single-turn latency |
| `minLatencyMs` | `number` | Minimum single-turn latency |
| `turnCount` | `number` | Number of agent turns evaluated |

LatencyViolation

| Field | Type | Description |
| --- | --- | --- |
| `type` | `ViolationType` | Category (`p50_exceeded`, `p90_exceeded`, `p99_exceeded`, `max_turn_exceeded`, `total_exceeded`, `llm_call_exceeded`, `tool_invocation_exceeded`, `overhead_exceeded`) |
| `severity` | `'low' \| 'medium' \| 'high' \| 'critical'` | Impact level of the violation |
| `description` | `string` | Human-readable violation description |
| `actual` | `number` | Measured value in ms |
| `threshold` | `number` | Budget threshold in ms |
| `turnId` | `number?` | Affected turn (for `max_turn_exceeded` violations) |

ComponentBreakdown

| Field | Type | Description |
| --- | --- | --- |
| `avgLlmCallMs` | `number` | Average LLM call latency across turns |
| `avgToolInvocationMs` | `number` | Average tool invocation latency across turns |
| `avgOverheadMs` | `number` | Average system overhead across turns |
| `totalLlmCallMs` | `number` | Sum of all LLM call latencies |
| `totalToolInvocationMs` | `number` | Sum of all tool invocation latencies |
| `totalOverheadMs` | `number` | Sum of all overhead latencies |
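
As a rough illustration of how these six fields relate, the sketch below aggregates per-turn component timings into totals and averages. The input shape (`TurnComponents` with `llmCallMs`, `toolInvocationMs`, `overheadMs`) and the `summarizeComponents` helper are assumptions made for this example; only the output field names come from the table above.

```typescript
// Hypothetical per-turn component timings; the real TurnLatency shape may differ.
interface TurnComponents {
  llmCallMs: number;
  toolInvocationMs: number;
  overheadMs: number;
}

// Output fields mirror the documented ComponentBreakdown table.
interface ComponentBreakdown {
  avgLlmCallMs: number;
  avgToolInvocationMs: number;
  avgOverheadMs: number;
  totalLlmCallMs: number;
  totalToolInvocationMs: number;
  totalOverheadMs: number;
}

function summarizeComponents(turns: TurnComponents[]): ComponentBreakdown {
  const sum = (pick: (t: TurnComponents) => number): number =>
    turns.reduce((acc, t) => acc + pick(t), 0);
  // Guard against division by zero for an empty trajectory.
  const avg = (total: number): number => (turns.length === 0 ? 0 : total / turns.length);

  const totalLlmCallMs = sum((t) => t.llmCallMs);
  const totalToolInvocationMs = sum((t) => t.toolInvocationMs);
  const totalOverheadMs = sum((t) => t.overheadMs);

  return {
    avgLlmCallMs: avg(totalLlmCallMs),
    avgToolInvocationMs: avg(totalToolInvocationMs),
    avgOverheadMs: avg(totalOverheadMs),
    totalLlmCallMs,
    totalToolInvocationMs,
    totalOverheadMs,
  };
}

const componentSummary = summarizeComponents([
  { llmCallMs: 800, toolInvocationMs: 150, overheadMs: 50 },
  { llmCallMs: 1200, toolInvocationMs: 0, overheadMs: 40 },
  { llmCallMs: 1000, toolInvocationMs: 300, overheadMs: 60 },
]);
console.log(componentSummary);
```

In this made-up trajectory the LLM calls account for 3000ms of the 3600ms total, which is the kind of imbalance the breakdown is meant to surface.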

Latency Presets

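| Preset | P50 | P90 | P99 | Max Turn | Trajectory Total |
| --- | --- | --- | --- | --- | --- |
| `strict` | 500ms | 1000ms | 2000ms | 3000ms | 15000ms |
| `moderate` | 1000ms | 2000ms | 5000ms | 8000ms | 30000ms |
| `lenient` | 2000ms | 4000ms | 10000ms | 15000ms | 60000ms |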
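
One way to read this table is to ask which preset is the tightest one a given run would satisfy. The sketch below copies the thresholds into a local lookup and checks them in order. It is a standalone illustration: `PRESETS`, `tightestPreset`, and the rule that a value equal to the threshold passes are assumptions, and the package's own `enforceBudget` also applies component budgets that are not modelled here.

```typescript
type PresetName = 'strict' | 'moderate' | 'lenient';

interface PresetThresholds {
  p50: number;
  p90: number;
  p99: number;
  maxTurn: number;
  total: number;
}

// Threshold values copied from the preset table (all in milliseconds).
const PRESETS: Record<PresetName, PresetThresholds> = {
  strict: { p50: 500, p90: 1000, p99: 2000, maxTurn: 3000, total: 15000 },
  moderate: { p50: 1000, p90: 2000, p99: 5000, maxTurn: 8000, total: 30000 },
  lenient: { p50: 2000, p90: 4000, p99: 10000, maxTurn: 15000, total: 60000 },
};

// Subset of the documented LatencyResult fields needed for the check.
interface MeasuredLatency {
  p50Ms: number;
  p90Ms: number;
  p99Ms: number;
  maxLatencyMs: number;
  totalLatencyMs: number;
}

// Hypothetical helper: returns the first preset, from tightest to loosest,
// whose thresholds are all met, or null if even `lenient` is exceeded.
function tightestPreset(m: MeasuredLatency): PresetName | null {
  const order: PresetName[] = ['strict', 'moderate', 'lenient'];
  for (const name of order) {
    const t = PRESETS[name];
    const withinBudget =
      m.p50Ms <= t.p50 &&
      m.p90Ms <= t.p90 &&
      m.p99Ms <= t.p99 &&
      m.maxLatencyMs <= t.maxTurn &&
      m.totalLatencyMs <= t.total;
    if (withinBudget) return name;
  }
  return null;
}

const typicalRun = tightestPreset({
  p50Ms: 700, p90Ms: 1800, p99Ms: 4200, maxLatencyMs: 4500, totalLatencyMs: 22000,
});
const slowestRun = tightestPreset({
  p50Ms: 1500, p90Ms: 3500, p99Ms: 12000, maxLatencyMs: 12000, totalLatencyMs: 50000,
});
console.log(typicalRun, slowestRun);
```

The first run misses `strict` on P50 alone (700ms against a 500ms limit) but fits `moderate`; the second exceeds the `lenient` P99 limit, so no preset matches.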

Advanced: Component-Level SLA Enforcement

Each preset also includes per-component budgets. To use different limits, pass a custom `LatencyBudget` with a `components` field; LLM call, tool invocation, and overhead thresholds are then enforced independently:

typescript
import { createLatencyBudget, enforceBudget, monitorLatency } from '@reaatech/agent-eval-harness-latency';
 
const budget = createLatencyBudget('strict');
// budget.components = { llmCall: 400, toolInvocation: 100, overhead: 50 }
 
const result = monitorLatency(trajectory);
const enforcement = enforceBudget(result, budget);
 
for (const v of enforcement.violations) {
  console.log(`[${v.severity.toUpperCase()}] ${v.type}: ${v.description}`);
}
 
// Enforcement score: 1.0 = perfect, deducts 0.4 for critical, 0.25 for high, etc.
console.log(`Enforcement score: ${enforcement.score}`);

Advanced: Optimization Analysis

The optimizer identifies the most impactful bottlenecks and generates actionable, priority-ranked recommendations:

typescript
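import { analyzeOptimization, monitorLatency, LatencyTracker } from '@reaatech/agent-eval-harness-latency';
 
// Assume `trajectory` has already been loaded, as in Quick Start
const result = monitorLatency(trajectory);
const optimization = analyzeOptimization(result, trajectory);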
 
console.log(`Bottlenecks: ${optimization.bottlenecks.length}`);
for (const b of optimization.bottlenecks) {
  console.log(`  ${b.type}: severity=${b.severity.toFixed(2)}, ${b.description}`);
}
 
console.log(`Top recommendations:`);
for (const r of optimization.recommendations.slice(0, 3)) {
  console.log(`  [${r.priority}] ${r.description} (effort: ${r.effort}, est. gain: ${r.expectedImprovementMs}ms)`);
}
 
// Track latency across multiple evaluation runs
const tracker = new LatencyTracker();
tracker.record(result);
console.log(`Trend: ${tracker.getTrend().improving ? 'improving' : 'degrading'}`);
console.log(`Average score: ${tracker.getAverageScore()}`);

License

MIT