@reaatech/pi-bench-mcp-server
Status: Pre-1.0 — APIs may change in minor versions. Pin to a specific version in production.
MCP (Model Context Protocol) server for prompt-injection-bench, plus report data normalization and deterministic reproducibility tools. Exposes benchmark operations as MCP tools consumable by any MCP client.
Installation
npm install @reaatech/pi-bench-mcp-server
# or
pnpm add @reaatech/pi-bench-mcp-serverFeature Overview
- 4 MCP tools —
run_benchmark,compare_defenses,generate_report,submit_results - Stdio transport — Connect to any MCP-compatible client via standard I/O
- Report generation — HTML and Markdown reports with category breakdowns
- Path traversal protection — Validates all file paths in tool operations
- Deterministic seed manager — LCG-based PRNG with SHA-256 proof hashes for reproducible benchmarks
- Dual ESM/CJS output — works with
importandrequire
Quick Start
import { createMCPServer } from "@reaatech/pi-bench-mcp-server";
const server = createMCPServer();
await server.start();Or connect via MCP client configuration:
{
"mcpServers": {
"prompt-injection-bench": {
"command": "npx",
"args": ["@reaatech/pi-bench-mcp-server"]
}
}
}API Reference
BenchmarkMCPServer
| Method | Description |
|---|---|
start() | Start the MCP server on stdio transport |
stop() | Stop the server |
createMCPServer(config?)
Factory function.
MCP Tools
run_benchmark
Execute a full benchmark against a defense:
{
"name": "run_benchmark",
"arguments": {
"defense": "rebuff",
"corpus": "default",
"categories": ["direct-injection", "prompt-leaking", "role-playing"],
"parallel": 10,
"timeout_ms": 30000
}
}compare_defenses
Compare multiple defense results:
{
"name": "compare_defenses",
"arguments": {
"results": ["results/rebuff.json", "results/lakera.json"],
"significance_level": 0.05
}
}generate_report
Generate HTML/JSON/Markdown reports:
{
"name": "generate_report",
"arguments": {
"results": "results/latest.json",
"format": "html",
"include_categories": true,
"output": "reports/benchmark-report.html"
}
}submit_results
Submit results to the public leaderboard:
{
"name": "submit_results",
"arguments": {
"results": "results/latest.json",
"defense_name": "my-custom-defense",
"defense_version": "1.0.0",
"reproducibility_proof": {
"seed": "abc123",
"corpus_version": "2026.04",
"adapter_versions": { "rebuff": "1.2.0" }
}
}
}Report Data Normalization
import { normalizeReportData } from "@reaatech/pi-bench-mcp-server";
const data = normalizeReportData(rawResults);
// Returns normalized { defense, score, overallMetrics } regardless of input formatnormalizeReportData accepts multiple result formats:
- Full
BenchmarkResultobjects - Pre-computed
DefenseScoreobjects - JSON file paths (auto-reads and parses)
- Mixed arrays of any of the above
SeedManager
Deterministic PRNG for reproducible benchmarks:
import { createSeedManager } from "@reaatech/pi-bench-mcp-server";
const seed = createSeedManager({ seed: "my-benchmark-v1" });
const nextInt = seed.nextInt(1, 1000);
const shuffled = seed.shuffle(samples);
const proofHash = seed.generateProof(corpusVersion, adapterVersions);| Method | Description |
|---|---|
next() | Next float in [0, 1) |
nextInt(min, max) | Next integer in [min, max] |
shuffle(array) | Deterministic Fisher-Yates shuffle |
generateProof(corpusVersion, adapterVersions) | SHA-256 proof hash |
reset() | Reset to initial seed |
createSeedManager(config?)
Factory function. Accepts optional SeedConfig with seed (string) and algorithm (default: "lcg").
Usage Patterns
MCP Tool Invocation from an Agent
// From an MCP client (e.g., Claude Desktop, agent-mesh):
const result = await mcpClient.callTool("run_benchmark", {
defense: "mock",
corpus: "default",
categories: ["direct-injection", "role-playing"],
parallel: 10,
});
console.log(`Defense: ${result.defense}`);
console.log(`Detection rate: ${result.detectionRate}`);
console.log(`Score: ${result.overallScore}`);Generating Reports Programmatically
import { normalizeReportData } from "@reaatech/pi-bench-mcp-server";
const data = normalizeReportData("results/benchmark.json");
// Build a Markdown report
let md = `# Benchmark Report\n\n`;
md += `**Defense:** ${data.defense}\n`;
md += `**Overall Score:** ${data.score.overallScore.toFixed(3)}\n`;
md += `**Detection Rate:** ${(1 - data.score.attackSuccessRate) * 100}%\n`;
for (const [category, catData] of Object.entries(data.score.categoryScores)) {
md += `- **${category}:** ${(catData.detectionRate * 100).toFixed(1)}%\n`;
}Reproducible Runs
const seed = createSeedManager({ seed: "my-benchmark-v1" });
// Use seed for deterministic corpus generation and result ordering
const shuffled = seed.shuffle(corpus);
const proof = seed.generateProof("2026.04", { mock: "1.0.0" });
console.log(`Reproducibility proof: ${proof}`);Related Packages
@reaatech/pi-bench-runner— Benchmark execution engine@reaatech/pi-bench-leaderboard— Leaderboard managementprompt-injection-bench— CLI and umbrella package
