@reaatech/agent-eval-harness-tool-use

npm v0.1.0

Validates agent tool-use trajectories by checking schema compliance, argument accuracy, and result integration. It provides a set of utility functions to evaluate individual tool calls or full conversation turns against defined tool schemas.


Status: Pre-1.0 — APIs may change in minor versions. Pin to a specific version in production.

Tool-call validation and result verification for agent trajectories. Validates tool selection against schemas, checks argument compliance, detects hallucinated results, and verifies proper result integration into agent responses.

Installation

terminal
npm install @reaatech/agent-eval-harness-tool-use

Feature Overview

  • Tool selection validation — checks that the agent picked the right tool for the task
  • Schema compliance — validates tool arguments against JSON Schema or custom ToolSchema definitions
  • Result verification — detects hallucinated results that don’t match actual tool output
  • Integration checking — verifies tool results are properly used in agent responses
  • 13 issue types — structured categorization of tool-use problems from critical (missing tool name) to low (result unused)
  • Trajectory-wide summarization — aggregate result verification across all tool calls

Quick Start

typescript
import { validateToolCall, createToolSchema, verifyResult } from '@reaatech/agent-eval-harness-tool-use';
import type { ToolCall, Turn } from '@reaatech/agent-eval-harness-types';
 
// Describe the tool: `to` is required and uses the email format; `subject` is optional.
const schema = createToolSchema('send_email', {
  properties: { to: { type: 'string', format: 'email' }, subject: { type: 'string' } },
  required: ['to']
});
 
// A recorded tool call (including its actual result) and the agent turn that issued it.
const call: ToolCall = { name: 'send_email', arguments: { to: 'user@example.com', subject: 'Hi' }, result: { status: 'sent' } };
const turn: Turn = { turn_id: 2, role: 'agent', content: 'Email sent!', timestamp: '2026-04-15T00:00:00Z', tool_calls: [call] };
 
// Validate the call against the schema.
const validation = validateToolCall(call, schema);
console.log(`Valid: ${validation.valid}, Score: ${validation.score}`);
 
// Verify the recorded result against the agent's response in the turn.
const verification = verifyResult(call, turn);
console.log(`Hallucinated: ${verification.hallucinated}, Integrated: ${verification.integrated}`);

API Reference

Validation Functions

| Export | Signature | Description |
| --- | --- | --- |
| validateTrajectory | (trajectory: Trajectory, toolSchemas?: Record<string, ToolSchema>, options?: ValidateOptions) => ValidationResult[] | Validates all tool calls across every agent turn in a trajectory. Returns one ValidationResult per agent turn with tool calls. |
| validateTurn | (turn: Turn, toolSchemas?: Record<string, ToolSchema>, options?: ValidateOptions) => ValidationResult | Validates all tool calls in a single turn. Handles missing_tool_name, unknown_tool, deprecated_tool, missing_arguments, missing_result, schema violations, and hallucination detection. |
| validateToolCall | (toolCall: ToolCall, schema?: ToolSchema, options?: ValidateOptions) => ValidationResult | Validates a single tool call against an optional schema. Convenience wrapper that creates a synthetic turn internally. |
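
The structural checks listed for validateTurn can be pictured with a small self-contained sketch. Everything below is hypothetical: structuralChecksSketch and the local types are stand-ins written for this README, not the package's implementation, and the rule that strict mode raises unknown_tool from medium to high is an assumption based on the issue table further down.

```typescript
type Severity = 'low' | 'medium' | 'high' | 'critical';

// Local stand-ins for the real types; only the fields used here are modeled.
interface ToolCallSketch {
  name?: string;
  arguments?: Record<string, unknown>;
  result?: unknown;
}
interface ToolSchemaSketch {
  name: string;
  deprecated?: boolean;
  replacedBy?: string;
}
interface IssueSketch {
  type: string;
  severity: Severity;
  description: string;
}

// Hypothetical helper, not part of the package API.
function structuralChecksSketch(
  call: ToolCallSketch,
  schemas: Record<string, ToolSchemaSketch>,
  options: { allowUnknownTools?: boolean; strict?: boolean } = {},
): { issues: IssueSketch[]; suggestions: string[] } {
  const issues: IssueSketch[] = [];
  const suggestions: string[] = [];

  if (!call.name) {
    // Nothing else can be checked without a tool name.
    issues.push({ type: 'missing_tool_name', severity: 'critical', description: 'Tool call has no name' });
    return { issues, suggestions };
  }

  const known = Object.prototype.hasOwnProperty.call(schemas, call.name);
  const schema = known ? schemas[call.name] : undefined;
  if (!schema && !options.allowUnknownTools) {
    // Assumption: strict mode raises the severity from medium to high.
    issues.push({
      type: 'unknown_tool',
      severity: options.strict ? 'high' : 'medium',
      description: `Tool "${call.name}" is not in the provided schemas`,
    });
  }
  if (schema && schema.deprecated) {
    issues.push({ type: 'deprecated_tool', severity: 'medium', description: `Tool "${call.name}" is deprecated` });
    if (schema.replacedBy) suggestions.push(`Use ${schema.replacedBy} instead of ${call.name}.`);
  }
  if (call.arguments === undefined) {
    issues.push({ type: 'missing_arguments', severity: 'high', description: 'Tool call has no arguments' });
  }
  if (call.result === undefined) {
    issues.push({ type: 'missing_result', severity: 'medium', description: 'Tool call has no result' });
  }
  return { issues, suggestions };
}

// A deprecated tool called without a recorded result.
const checked = structuralChecksSketch(
  { name: 'send_mail', arguments: { to: 'user@example.com' } },
  {
    send_mail: { name: 'send_mail', deprecated: true, replacedBy: 'send_email' },
    send_email: { name: 'send_email' },
  },
);
console.log(checked.issues.map((issue) => issue.type).join(', ')); // deprecated_tool, missing_result
console.log(checked.suggestions.join(' ')); // Use send_email instead of send_mail.
```

The early return on a missing name is a design choice of the sketch: without a name there is no schema to look up, so later checks would only add noise.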

Schema Functions

| Export | Signature | Description |
| --- | --- | --- |
| validateSchema | (toolCall: ToolCall, schema: ToolSchema) => SchemaValidationResult | Deep schema validation of tool arguments against a ToolSchema. Checks required fields, types, enums, formats (email, uri, date, date-time), and nested object/array properties. |
| createToolSchema | (name: string, jsonSchema: Record<string, unknown>, description?: string) => ToolSchema | Creates a ToolSchema from a JSON Schema-like definition. Converts properties and required arrays into the internal ToolSchema parameter structure. |
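
To make the conversion concrete, here is a minimal sketch of how a JSON Schema-like definition could map onto the ToolSchema shape documented under Types. toToolSchemaSketch is a hypothetical stand-in and not the package's createToolSchema; the real function may validate or normalize its input in ways this sketch does not.

```typescript
// Local copies of the documented shapes so the sketch is self-contained.
interface ParameterSchema {
  type: 'string' | 'number' | 'boolean' | 'object' | 'array';
  description?: string;
  enum?: unknown[];
  format?: string;
  items?: ParameterSchema;
  properties?: Record<string, ParameterSchema>;
}
interface ToolSchema {
  name: string;
  description?: string;
  parameters: {
    type: 'object';
    properties: Record<string, ParameterSchema>;
    required?: string[];
  };
}

// Hypothetical helper: copies `properties` and `required` into the parameters block.
// Assumption: the input properties already follow the ParameterSchema shape.
function toToolSchemaSketch(
  name: string,
  jsonSchema: Record<string, unknown>,
  description?: string,
): ToolSchema {
  const properties = (jsonSchema['properties'] ?? {}) as Record<string, ParameterSchema>;
  const rawRequired = jsonSchema['required'];
  const schema: ToolSchema = { name, parameters: { type: 'object', properties } };
  if (description !== undefined) schema.description = description;
  if (Array.isArray(rawRequired)) {
    // Keep only string entries so the result matches `required?: string[]`.
    schema.parameters.required = rawRequired.filter(
      (entry: unknown): entry is string => typeof entry === 'string',
    );
  }
  return schema;
}

const sendEmail = toToolSchemaSketch('send_email', {
  properties: { to: { type: 'string', format: 'email' }, subject: { type: 'string' } },
  required: ['to'],
});
console.log(JSON.stringify(sendEmail.parameters.required)); // ["to"]
```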

Result Verification Functions

| Export | Signature | Description |
| --- | --- | --- |
| verifyResult | (toolCall: ToolCall, turn: Turn, trajectory?: Trajectory, options?: VerifyOptions) => ResultVerificationResult | Verifies a single tool call’s result against the agent’s response. Checks for hallucination, result integration, contradictions, and missing/empty/error results. Accepts optional full trajectory for cross-turn usage detection. |
| verifyTurnResults | (turn: Turn, trajectory?: Trajectory, options?: VerifyOptions) => ResultVerificationResult[] | Runs verifyResult on every tool call in a turn. Returns an array of verification results. |
| summarizeResultVerification | (trajectory: Trajectory, options?: VerifyOptions) => { totalTools, validResults, hallucinatedResults, integratedResults, averageScore, issues } | Aggregates result verification across an entire trajectory. Returns counts for total tools, valid results, hallucinated results, integrated results, average score, and all issues. |
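
The aggregation that summarizeResultVerification describes can be sketched over a list of per-call results. summarizeSketch below is a hypothetical helper that takes already-computed results rather than a trajectory, and returning an average of 0 for an empty list is an assumption, not documented behavior.

```typescript
// Local copies of the documented result shapes.
interface ResultIssue {
  type: string;
  severity: 'low' | 'medium' | 'high' | 'critical';
  description: string;
}
interface ResultVerificationResult {
  valid: boolean;
  issues: ResultIssue[];
  score: number;
  hallucinated: boolean;
  integrated: boolean;
}
interface SummarySketch {
  totalTools: number;
  validResults: number;
  hallucinatedResults: number;
  integratedResults: number;
  averageScore: number;
  issues: ResultIssue[];
}

// Hypothetical helper: folds per-call results into the documented summary fields.
function summarizeSketch(results: ResultVerificationResult[]): SummarySketch {
  const summary: SummarySketch = {
    totalTools: results.length,
    validResults: 0,
    hallucinatedResults: 0,
    integratedResults: 0,
    averageScore: 0,
    issues: [],
  };
  let scoreSum = 0;
  for (const result of results) {
    if (result.valid) summary.validResults += 1;
    if (result.hallucinated) summary.hallucinatedResults += 1;
    if (result.integrated) summary.integratedResults += 1;
    scoreSum += result.score;
    summary.issues.push(...result.issues);
  }
  // Assumption: an empty trajectory averages to 0 rather than NaN.
  summary.averageScore = results.length === 0 ? 0 : scoreSum / results.length;
  return summary;
}

const summary = summarizeSketch([
  { valid: true, issues: [], score: 1, hallucinated: false, integrated: true },
  {
    valid: false,
    issues: [{ type: 'hallucinated_content', severity: 'high', description: 'Response cites data not in the result' }],
    score: 0.5,
    hallucinated: true,
    integrated: false,
  },
]);
console.log(summary.totalTools, summary.hallucinatedResults, summary.averageScore); // 2 1 0.75
```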

Types

ToolSchema

typescript
interface ToolSchema {
  name: string;
  description?: string;
  parameters: {
    type: 'object';
    properties: Record<string, ParameterSchema>;
    required?: string[];
  };
  deprecated?: boolean;
  replacedBy?: string;
}
 
interface ParameterSchema {
  type: 'string' | 'number' | 'boolean' | 'object' | 'array';
  description?: string;
  enum?: unknown[];
  format?: string;
  items?: ParameterSchema;
  properties?: Record<string, ParameterSchema>;
}

ValidationResult

typescript
interface ValidationResult {
  valid: boolean;          // true if no critical issues
  issues: ToolUseIssue[];  // all detected issues
  suggestions: string[];   // remediation suggestions (e.g., deprecated tool replacement)
  score: number;           // 0.0–1.0 weighted by issue severity
}
 
interface ToolUseIssue {
  type: ToolUseIssueType;
  severity: 'low' | 'medium' | 'high' | 'critical';
  description: string;
  turnId?: number;
  toolName?: string;
  details?: Record<string, unknown>;
}
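
The comments above say that valid means no critical issues and that score is weighted by severity, but the weights themselves are not documented here. The sketch below shows one way such a score could be computed. The penalty values and the scoreIssuesSketch helper are invented for illustration only and should not be read as the package's actual weighting.

```typescript
type Severity = 'low' | 'medium' | 'high' | 'critical';

interface IssueSketch {
  type: string;
  severity: Severity;
  description: string;
}

// Assumed penalties, chosen only for illustration; the package may weight differently.
const PENALTY: Record<Severity, number> = {
  low: 0.125,
  medium: 0.25,
  high: 0.5,
  critical: 1,
};

// Hypothetical helper: subtract each issue's penalty from 1 and clamp at 0.
function scoreIssuesSketch(issues: IssueSketch[]): { valid: boolean; score: number } {
  const totalPenalty = issues.reduce((sum, issue) => sum + PENALTY[issue.severity], 0);
  return {
    valid: !issues.some((issue) => issue.severity === 'critical'),
    score: Math.max(0, 1 - totalPenalty),
  };
}

const scored = scoreIssuesSketch([
  { type: 'deprecated_tool', severity: 'medium', description: 'Tool is deprecated' },
  { type: 'result_unused', severity: 'low', description: 'Result not referenced' },
]);
console.log(scored.valid, scored.score); // true 0.625
```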

ValidateOptions

typescript
interface ValidateOptions {
  allowUnknownTools?: boolean;   // default: false — set true to skip unknown tool errors
  validateSchemas?: boolean;     // default: true — enable parameter-level schema checks
  checkResultUsage?: boolean;    // default: true — check for unused tool results
  detectHallucination?: boolean; // default: true — check for fabricated result usage
  strict?: boolean;              // default: false — when true, score drops to 0.0 if any high/critical issue
}
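
The documented defaults and the effect of strict can be illustrated without the package. resolveOptionsSketch and applyStrictSketch are hypothetical helpers; only the default values and the strict rule in the comments above come from this README.

```typescript
interface ValidateOptions {
  allowUnknownTools?: boolean;
  validateSchemas?: boolean;
  checkResultUsage?: boolean;
  detectHallucination?: boolean;
  strict?: boolean;
}

// Defaults as documented in the ValidateOptions comments.
const DOCUMENTED_DEFAULTS: Required<ValidateOptions> = {
  allowUnknownTools: false,
  validateSchemas: true,
  checkResultUsage: true,
  detectHallucination: true,
  strict: false,
};

// Hypothetical helper: fill in any option the caller left undefined.
function resolveOptionsSketch(options: ValidateOptions = {}): Required<ValidateOptions> {
  return {
    allowUnknownTools: options.allowUnknownTools ?? DOCUMENTED_DEFAULTS.allowUnknownTools,
    validateSchemas: options.validateSchemas ?? DOCUMENTED_DEFAULTS.validateSchemas,
    checkResultUsage: options.checkResultUsage ?? DOCUMENTED_DEFAULTS.checkResultUsage,
    detectHallucination: options.detectHallucination ?? DOCUMENTED_DEFAULTS.detectHallucination,
    strict: options.strict ?? DOCUMENTED_DEFAULTS.strict,
  };
}

type Severity = 'low' | 'medium' | 'high' | 'critical';

// Hypothetical helper: in strict mode, any high or critical issue forces the score to 0.
function applyStrictSketch(score: number, severities: Severity[], options: ValidateOptions = {}): number {
  const { strict } = resolveOptionsSketch(options);
  const hasSevere = severities.some((severity) => severity === 'high' || severity === 'critical');
  return strict && hasSevere ? 0 : score;
}

console.log(applyStrictSketch(0.5, ['high'], { strict: true })); // 0
console.log(applyStrictSketch(0.5, ['high'])); // 0.5
console.log(applyStrictSketch(0.75, ['medium'], { strict: true })); // 0.75
```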

SchemaValidationResult

typescript
interface SchemaValidationResult {
  valid: boolean;
  issues: SchemaIssue[];
  score: number;
}
 
interface SchemaIssue {
  type: string;         // e.g., 'missing_arguments', 'type_error', 'invalid_format', 'required_field_missing'
  severity: 'low' | 'medium' | 'high' | 'critical';
  path: string;         // dot-notation path to the problematic parameter
  message: string;
  expected?: unknown;
  actual?: unknown;
}
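
The path field is easiest to understand from a worked example. The sketch below walks nested arguments and reports each problem with a dot-notation path. validateArgumentsSketch and checkValueSketch are hypothetical helpers, the severities are assumptions, and using a numeric index as the path segment for array items is also an assumption about the notation.

```typescript
type Severity = 'low' | 'medium' | 'high' | 'critical';

interface ParameterSchema {
  type: 'string' | 'number' | 'boolean' | 'object' | 'array';
  items?: ParameterSchema;
  properties?: Record<string, ParameterSchema>;
}
interface SchemaIssue {
  type: string;
  severity: Severity;
  path: string;
  message: string;
  expected?: unknown;
  actual?: unknown;
}

// Map a runtime value onto the ParameterSchema type names.
function kindOf(value: unknown): string {
  if (Array.isArray(value)) return 'array';
  if (value === null) return 'null';
  return typeof value;
}

// Hypothetical recursive check of one value against its schema.
function checkValueSketch(value: unknown, schema: ParameterSchema, path: string): SchemaIssue[] {
  const actual = kindOf(value);
  if (actual !== schema.type) {
    return [{ type: 'type_error', severity: 'high', path, message: `Expected ${schema.type}, got ${actual}`, expected: schema.type, actual }];
  }
  const issues: SchemaIssue[] = [];
  if (schema.type === 'object' && schema.properties) {
    const props = schema.properties;
    const obj = value as Record<string, unknown>;
    for (const key of Object.keys(props)) {
      const child = props[key];
      if (child && key in obj) issues.push(...checkValueSketch(obj[key], child, `${path}.${key}`));
    }
  }
  if (schema.type === 'array' && schema.items) {
    const items = schema.items;
    (value as unknown[]).forEach((item, index) => {
      issues.push(...checkValueSketch(item, items, `${path}.${index}`));
    });
  }
  return issues;
}

// Hypothetical entry point: required fields first, then per-parameter checks.
function validateArgumentsSketch(
  args: Record<string, unknown>,
  properties: Record<string, ParameterSchema>,
  required: string[] = [],
): SchemaIssue[] {
  const issues: SchemaIssue[] = [];
  for (const name of required) {
    if (!(name in args)) {
      issues.push({ type: 'required_field_missing', severity: 'high', path: name, message: `Missing required parameter ${name}` });
    }
  }
  for (const name of Object.keys(properties)) {
    const schema = properties[name];
    if (schema && name in args) issues.push(...checkValueSketch(args[name], schema, name));
  }
  return issues;
}

const found = validateArgumentsSketch(
  { recipient: { address: 42 } },
  {
    recipient: { type: 'object', properties: { address: { type: 'string' } } },
    subject: { type: 'string' },
  },
  ['recipient', 'subject'],
);
console.log(found.map((issue) => `${issue.type} at ${issue.path}`).join('; '));
// required_field_missing at subject; type_error at recipient.address
```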

ResultVerificationResult

typescript
interface ResultVerificationResult {
  valid: boolean;
  issues: ResultIssue[];
  score: number;
  hallucinated: boolean;  // true if hallucination score exceeds threshold
  integrated: boolean;    // true if result values appear in the agent response
}
 
interface ResultIssue {
  type: ResultIssueType;
  severity: 'low' | 'medium' | 'high' | 'critical';
  description: string;
  turnId?: number;
  toolName?: string;
  details?: Record<string, unknown>;
}

VerifyOptions

typescript
interface VerifyOptions {
  checkUsage?: boolean;             // default: true — verify result usage in response
  detectHallucination?: boolean;    // default: true — detect fabricated result content
  checkContradictions?: boolean;    // default: true — catch result/response contradictions
  hallucinationThreshold?: number;  // default: 0.3 — score above this triggers hallucinated flag
}
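
To show how hallucinationThreshold interacts with a score, here is a deliberately naive sketch. hallucinationScoreSketch is a hypothetical heuristic that only looks at numbers in the response and checks them by substring match against the serialized result; the package's actual detection logic is not described in this README and is likely more involved. Only the 0.3 default and the "above the threshold" rule come from the comments above.

```typescript
interface VerifyOptions {
  checkUsage?: boolean;
  detectHallucination?: boolean;
  checkContradictions?: boolean;
  hallucinationThreshold?: number;
}

// Hypothetical heuristic: fraction of numbers in the response that do not
// appear anywhere in the serialized tool result. Substring matching is crude
// (for example, "3" would match inside "1387"), which is acceptable for a sketch.
function hallucinationScoreSketch(response: string, result: unknown): number {
  const numbers: string[] = response.match(/\d+(?:\.\d+)?/g) ?? [];
  if (numbers.length === 0) return 0;
  const haystack = String(JSON.stringify(result));
  const unsupported = numbers.filter((value) => haystack.indexOf(value) === -1).length;
  return unsupported / numbers.length;
}

// The flag is set only when the score is strictly above the threshold (default 0.3).
function isHallucinatedSketch(score: number, options: VerifyOptions = {}): boolean {
  const threshold = options.hallucinationThreshold ?? 0.3;
  return score > threshold;
}

const toolResult = { order_id: 1187, eta_days: 3 };
const faithful = hallucinationScoreSketch('Order 1187 ships in 3 days.', toolResult);
const invented = hallucinationScoreSketch('Order 1187 ships in 5 days.', toolResult);
console.log(faithful, isHallucinatedSketch(faithful)); // 0 false
console.log(invented, isHallucinatedSketch(invented)); // 0.5 true
console.log(isHallucinatedSketch(invented, { hallucinationThreshold: 0.6 })); // false
```

Raising the threshold makes the flag more tolerant of unsupported content, while lowering it makes verification stricter.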

Enums

ToolUseIssueType (13 values)

| Value | Severity | Description |
| --- | --- | --- |
| missing_tool_name | critical | Tool call has no name field |
| missing_arguments | high | Tool call has no arguments field |
| invalid_arguments | medium | Argument value not in allowed enum |
| tool_not_found | high | Tool name not in provided schemas |
| tool_misuse | medium | Tool used incorrectly for the context |
| missing_result | medium | Tool was called but no result returned |
| result_unused | low | Tool result fields not found in agent response |
| hallucinated_result | high | Agent response references data not in the actual tool result |
| schema_violation | high | Arguments fail schema-level validation |
| type_mismatch | high | Argument type does not match schema (e.g., string for number) |
| missing_required_param | high | Required parameter missing from arguments |
| unknown_tool | high/medium | Tool name not recognized; severity depends on strict mode |
| deprecated_tool | medium | Tool is marked as deprecated; suggestion includes replacement |

ResultIssueType (8 values)

| Value | Severity | Description |
| --- | --- | --- |
| missing_result | medium | Tool call has no result object |
| empty_result | low | Tool returned an empty result ({}) |
| error_result | high | Result status is error |
| hallucinated_content | high | Response contains fabricated data not in the result |
| unused_result | medium | Result values not referenced in agent response |
| contradicts_response | high | Result indicates success but response says failure (or vice versa) |
| incomplete_integration | medium | Only partial result data used in response |
| malformed_result | high | Result structure is unexpected or invalid |

License

MIT