@reaatech/pi-bench-mcp-server


Status: Pre-1.0 — APIs may change in minor versions. Pin to a specific version in production.

MCP (Model Context Protocol) server for prompt-injection-bench, plus report data normalization and deterministic reproducibility tools. Exposes benchmark operations as MCP tools consumable by any MCP client.

Installation

terminal
npm install @reaatech/pi-bench-mcp-server
# or
pnpm add @reaatech/pi-bench-mcp-server
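
Because the package is pre-1.0, pinning an exact version in package.json avoids surprise breaking changes (the version shown below is a placeholder, not a real release):

json
{
  "dependencies": {
    "@reaatech/pi-bench-mcp-server": "0.4.2"
  }
}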

Feature Overview

  • 4 MCP tools — run_benchmark, compare_defenses, generate_report, submit_results
  • Stdio transport — Connect to any MCP-compatible client via standard I/O
  • Report generation — HTML and Markdown reports with category breakdowns
  • Path traversal protection — Validates all file paths in tool operations (see the sketch after this list)
  • Deterministic seed manager — LCG-based PRNG with SHA-256 proof hashes for reproducible benchmarks
  • Dual ESM/CJS output — works with import and require
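
The package does not document its path validation routine; a typical guard resolves each requested path and confirms it stays under an allowed root. A minimal sketch of that pattern (the function name and root handling are illustrative, not the package's API):

typescript
import { resolve, sep } from "node:path";

// Illustrative only: not the package's actual API.
// Rejects any path that escapes the allowed root after resolution,
// including "../" sequences and absolute paths.
function assertInsideRoot(requested: string, root: string): string {
  const base = resolve(root);
  const resolved = resolve(base, requested);
  if (resolved !== base && !resolved.startsWith(base + sep)) {
    throw new Error(`Path escapes allowed root: ${requested}`);
  }
  return resolved;
}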

Quick Start

typescript
import { createMCPServer } from "@reaatech/pi-bench-mcp-server";
 
const server = createMCPServer();
await server.start();

Or connect via MCP client configuration:

json
{
  "mcpServers": {
    "prompt-injection-bench": {
      "command": "npx",
      "args": ["@reaatech/pi-bench-mcp-server"]
    }
  }
}
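
For Claude Desktop, this block belongs in claude_desktop_config.json; most MCP clients accept a similar mcpServers entry in their own configuration files.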

API Reference

BenchmarkMCPServer

Method | Description
start() | Start the MCP server on stdio transport
stop() | Stop the server

createMCPServer(config?)

Factory function that returns a BenchmarkMCPServer instance; the config argument is optional.

MCP Tools

run_benchmark

Execute a full benchmark against a defense:

json
{
  "name": "run_benchmark",
  "arguments": {
    "defense": "rebuff",
    "corpus": "default",
    "categories": ["direct-injection", "prompt-leaking", "role-playing"],
    "parallel": 10,
    "timeout_ms": 30000
  }
}

compare_defenses

Compare multiple defense results:

json
{
  "name": "compare_defenses",
  "arguments": {
    "results": ["results/rebuff.json", "results/lakera.json"],
    "significance_level": 0.05
  }
}

generate_report

Generate HTML/JSON/Markdown reports:

json
{
  "name": "generate_report",
  "arguments": {
    "results": "results/latest.json",
    "format": "html",
    "include_categories": true,
    "output": "reports/benchmark-report.html"
  }
}

submit_results

Submit results to the public leaderboard:

json
{
  "name": "submit_results",
  "arguments": {
    "results": "results/latest.json",
    "defense_name": "my-custom-defense",
    "defense_version": "1.0.0",
    "reproducibility_proof": {
      "seed": "abc123",
      "corpus_version": "2026.04",
      "adapter_versions": { "rebuff": "1.2.0" }
    }
  }
}
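
The reproducibility_proof fields mirror the inputs to SeedManager's generateProof (described below); a sketch of deriving the proof hash from the same inputs, assuming that is how the two connect:

typescript
import { createSeedManager } from "@reaatech/pi-bench-mcp-server";

// Assumption: the proof hash is derived from the same seed/corpus/adapter inputs.
const seed = createSeedManager({ seed: "abc123" });
const proofHash = seed.generateProof("2026.04", { rebuff: "1.2.0" });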

Report Data Normalization

typescript
import { normalizeReportData } from "@reaatech/pi-bench-mcp-server";
 
const data = normalizeReportData(rawResults);
// Returns normalized { defense, score, overallMetrics } regardless of input format

normalizeReportData accepts multiple result formats (example after the list):

  • Full BenchmarkResult objects
  • Pre-computed DefenseScore objects
  • JSON file paths (auto-reads and parses)
  • Mixed arrays of any of the above
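
For example, a file path and an in-memory score can be mixed in a single call (the array return shape and the precomputedScore variable are assumptions for illustration):

typescript
import { normalizeReportData } from "@reaatech/pi-bench-mcp-server";

// Assumed to exist: a DefenseScore object already in memory.
declare const precomputedScore: object;

// Assumption: an array input yields one normalized entry per element.
const entries = normalizeReportData(["results/rebuff.json", precomputedScore]);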

SeedManager

Deterministic PRNG for reproducible benchmarks:

typescript
import { createSeedManager } from "@reaatech/pi-bench-mcp-server";
 
const seed = createSeedManager({ seed: "my-benchmark-v1" });
 
const nextInt = seed.nextInt(1, 1000);
const shuffled = seed.shuffle(samples);
const proofHash = seed.generateProof(corpusVersion, adapterVersions);

Method | Description
next() | Next float in [0, 1)
nextInt(min, max) | Next integer in [min, max]
shuffle(array) | Deterministic Fisher-Yates shuffle
generateProof(corpusVersion, adapterVersions) | SHA-256 proof hash
reset() | Reset to initial seed

createSeedManager(config?)

Factory function. Accepts optional SeedConfig with seed (string) and algorithm (default: "lcg").
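
The internals are not published here, but the described design (an LCG stream plus a SHA-256 hash over the run's determining inputs) can be sketched roughly as follows; the constants and hashed field layout are illustrative:

typescript
import { createHash } from "node:crypto";

// Illustrative LCG with common 32-bit constants; not the package's actual state.
let state = 12345;
function next(): number {
  state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
  return state / 2 ** 32; // float in [0, 1)
}

// Illustrative proof: hash everything that determines a run.
function generateProof(
  seedValue: string,
  corpusVersion: string,
  adapterVersions: Record<string, string>,
): string {
  return createHash("sha256")
    .update(JSON.stringify({ seedValue, corpusVersion, adapterVersions }))
    .digest("hex");
}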

Usage Patterns

MCP Tool Invocation from an Agent

typescript
// From an MCP client (e.g., Claude Desktop, agent-mesh):
const result = await mcpClient.callTool("run_benchmark", {
  defense: "mock",
  corpus: "default",
  categories: ["direct-injection", "role-playing"],
  parallel: 10,
});
 
console.log(`Defense: ${result.defense}`);
console.log(`Detection rate: ${result.detectionRate}`);
console.log(`Score: ${result.overallScore}`);

Generating Reports Programmatically

typescript
import { normalizeReportData } from "@reaatech/pi-bench-mcp-server";
 
const data = normalizeReportData("results/benchmark.json");
 
// Build a Markdown report
let md = `# Benchmark Report\n\n`;
md += `**Defense:** ${data.defense}\n`;
md += `**Overall Score:** ${data.score.overallScore.toFixed(3)}\n`;
md += `**Detection Rate:** ${((1 - data.score.attackSuccessRate) * 100).toFixed(1)}%\n`;
 
for (const [category, catData] of Object.entries(data.score.categoryScores)) {
  md += `- **${category}:** ${(catData.detectionRate * 100).toFixed(1)}%\n`;
}
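
To persist the report, write the accumulated string with Node's fs module:

typescript
import { mkdirSync, writeFileSync } from "node:fs";

mkdirSync("reports", { recursive: true });
writeFileSync("reports/benchmark-report.md", md);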

Reproducible Runs

typescript
import { createSeedManager } from "@reaatech/pi-bench-mcp-server";

const seed = createSeedManager({ seed: "my-benchmark-v1" });
 
// Use seed for deterministic corpus generation and result ordering
const shuffled = seed.shuffle(corpus);
const proof = seed.generateProof("2026.04", { mock: "1.0.0" });
console.log(`Reproducibility proof: ${proof}`);

License

MIT