@reaatech/agent-eval-harness-tool-use
Status: Pre-1.0 — APIs may change in minor versions. Pin to a specific version in production.
Tool-call validation and result verification for agent trajectories. Validates tool selection against schemas, checks argument compliance, detects hallucinated results, and verifies proper result integration into agent responses.
Installation
terminal
npm install @reaatech/agent-eval-harness-tool-useFeature Overview
- Tool selection validation — checks that the agent picked the right tool for the task
- Schema compliance — validates tool arguments against JSON Schema or custom ToolSchema definitions
- Result verification — detects hallucinated results that don’t match actual tool output
- Integration checking — verifies tool results are properly used in agent responses
- 13 issue types — structured categorization of tool-use problems from critical (missing tool name) to low (result unused)
- Trajectory-wide summarization — aggregate result verification across all tool calls
Quick Start
typescript
import { validateToolCall, createToolSchema, verifyResult } from '@reaatech/agent-eval-harness-tool-use';
import type { ToolCall, Turn } from '@reaatech/agent-eval-harness-types';
const schema = createToolSchema('send_email', {
properties: { to: { type: 'string', format: 'email' }, subject: { type: 'string' } },
required: ['to']
});
const call: ToolCall = { name: 'send_email', arguments: { to: 'user@example.com', subject: 'Hi' }, result: { status: 'sent' } };
const turn: Turn = { turn_id: 2, role: 'agent', content: 'Email sent!', timestamp: '2026-04-15T00:00:00Z', tool_calls: [call] };
const validation = validateToolCall(call, schema);
console.log(`Valid: ${validation.valid}, Score: ${validation.score}`);
const verification = verifyResult(call, turn);
console.log(`Hallucinated: ${verification.hallucinated}, Integrated: ${verification.integrated}`);API Reference
Validation Functions
| Export | Signature | Description |
|---|---|---|
validateTrajectory | (trajectory: Trajectory, toolSchemas?: Record<string, ToolSchema>, options?: ValidateOptions) => ValidationResult[] | Validates all tool calls across every agent turn in a trajectory. Returns one ValidationResult per agent turn with tool calls. |
validateTurn | (turn: Turn, toolSchemas?: Record<string, ToolSchema>, options?: ValidateOptions) => ValidationResult | Validates all tool calls in a single turn. Handles missing_tool_name, unknown_tool, deprecated_tool, missing_arguments, missing_result, schema violations, and hallucination detection. |
validateToolCall | (toolCall: ToolCall, schema?: ToolSchema, options?: ValidateOptions) => ValidationResult | Validates a single tool call against an optional schema. Convenience wrapper that creates a synthetic turn internally. |
Schema Functions
| Export | Signature | Description |
|---|---|---|
validateSchema | (toolCall: ToolCall, schema: ToolSchema) => SchemaValidationResult | Deep schema validation of tool arguments against a ToolSchema. Checks required fields, types, enums, formats (email, uri, date, date-time), and nested object/array properties. |
createToolSchema | (name: string, jsonSchema: Record<string, unknown>, description?: string) => ToolSchema | Creates a ToolSchema from a JSON Schema-like definition. Converts properties and required arrays into the internal ToolSchema parameter structure. |
Result Verification Functions
| Export | Signature | Description |
|---|---|---|
verifyResult | (toolCall: ToolCall, turn: Turn, trajectory?: Trajectory, options?: VerifyOptions) => ResultVerificationResult | Verifies a single tool call’s result against the agent’s response. Checks for hallucination, result integration, contradictions, and missing/empty/error results. Accepts optional full trajectory for cross-turn usage detection. |
verifyTurnResults | (turn: Turn, trajectory?: Trajectory, options?: VerifyOptions) => ResultVerificationResult[] | Runs verifyResult on every tool call in a turn. Returns an array of verification results. |
summarizeResultVerification | (trajectory: Trajectory, options?: VerifyOptions) => { totalTools, validResults, hallucinatedResults, integratedResults, averageScore, issues } | Aggregates result verification across an entire trajectory. Returns counts for total tools, valid results, hallucinated results, integrated results, average score, and all issues. |
Types
ToolSchema
typescript
interface ToolSchema {
name: string;
description?: string;
parameters: {
type: 'object';
properties: Record<string, ParameterSchema>;
required?: string[];
};
deprecated?: boolean;
replacedBy?: string;
}
interface ParameterSchema {
type: 'string' | 'number' | 'boolean' | 'object' | 'array';
description?: string;
enum?: unknown[];
format?: string;
items?: ParameterSchema;
properties?: Record<string, ParameterSchema>;
}ValidationResult
typescript
interface ValidationResult {
valid: boolean; // true if no critical issues
issues: ToolUseIssue[]; // all detected issues
suggestions: string[]; // remediation suggestions (e.g., deprecated tool replacement)
score: number; // 0.0–1.0 weighted by issue severity
}
interface ToolUseIssue {
type: ToolUseIssueType;
severity: 'low' | 'medium' | 'high' | 'critical';
description: string;
turnId?: number;
toolName?: string;
details?: Record<string, unknown>;
}ValidateOptions
typescript
interface ValidateOptions {
allowUnknownTools?: boolean; // default: false — set true to skip unknown tool errors
validateSchemas?: boolean; // default: true — enable parameter-level schema checks
checkResultUsage?: boolean; // default: true — check for unused tool results
detectHallucination?: boolean; // default: true — check for fabricated result usage
strict?: boolean; // default: false — when true, score drops to 0.0 if any high/critical issue
}SchemaValidationResult
typescript
interface SchemaValidationResult {
valid: boolean;
issues: SchemaIssue[];
score: number;
}
interface SchemaIssue {
type: string; // e.g., 'missing_arguments', 'type_error', 'invalid_format', 'required_field_missing'
severity: 'low' | 'medium' | 'high' | 'critical';
path: string; // dot-notation path to the problematic parameter
message: string;
expected?: unknown;
actual?: unknown;
}ResultVerificationResult
typescript
interface ResultVerificationResult {
valid: boolean;
issues: ResultIssue[];
score: number;
hallucinated: boolean; // true if hallucination score exceeds threshold
integrated: boolean; // true if result values appear in the agent response
}
interface ResultIssue {
type: ResultIssueType;
severity: 'low' | 'medium' | 'high' | 'critical';
description: string;
turnId?: number;
toolName?: string;
details?: Record<string, unknown>;
}VerifyOptions
typescript
interface VerifyOptions {
checkUsage?: boolean; // default: true — verify result usage in response
detectHallucination?: boolean; // default: true — detect fabricated result content
checkContradictions?: boolean; // default: true — catch result/response contradictions
hallucinationThreshold?: number; // default: 0.3 — score above this triggers hallucinated flag
}Enums
ToolUseIssueType (13 values)
| Value | Severity | Description |
|---|---|---|
missing_tool_name | critical | Tool call has no name field |
missing_arguments | high | Tool call has no arguments field |
invalid_arguments | medium | Argument value not in allowed enum |
tool_not_found | high | Tool name not in provided schemas |
tool_misuse | medium | Tool used incorrectly for the context |
missing_result | medium | Tool was called but no result returned |
result_unused | low | Tool result fields not found in agent response |
hallucinated_result | high | Agent response references data not in the actual tool result |
schema_violation | high | Arguments fail schema-level validation |
type_mismatch | high | Argument type does not match schema (e.g., string for number) |
missing_required_param | high | Required parameter missing from arguments |
unknown_tool | high/medium | Tool name not recognized; severity depends on strict mode |
deprecated_tool | medium | Tool is marked as deprecated; suggestion includes replacement |
ResultIssueType (8 values)
| Value | Severity | Description |
|---|---|---|
missing_result | medium | Tool call has no result object |
empty_result | low | Tool returned an empty result ({}) |
error_result | high | Result status is error |
hallucinated_content | high | Response contains fabricated data not in the result |
unused_result | medium | Result values not referenced in agent response |
contradicts_response | high | Result indicates success but response says failure (or vice versa) |
incomplete_integration | medium | Only partial result data used in response |
malformed_result | high | Result structure is unexpected or invalid |
Related Packages
| Package | Description |
|---|---|
| @reaatech/agent-eval-harness-types | Shared domain types and schemas |
| @reaatech/agent-eval-harness-trajectory | Trajectory evaluation |
| @reaatech/agent-eval-harness-tool-use | Tool-use validation |
| @reaatech/agent-eval-harness-cost | Cost tracking |
| @reaatech/agent-eval-harness-latency | Latency monitoring |
| @reaatech/agent-eval-harness-judge | LLM-as-judge |
| @reaatech/agent-eval-harness-golden | Golden trajectories |
| @reaatech/agent-eval-harness-suite | Suite runner |
| @reaatech/agent-eval-harness-gate | CI gates |
| @reaatech/agent-eval-harness-mcp-server | MCP server |
| @reaatech/agent-eval-harness-cli | CLI |
| @reaatech/agent-eval-harness-observability | Observability |
License
MIT
