@reaatech/pi-bench-leaderboard

Manages and persists ranked leaderboard data for prompt injection defenses using a factory-provided manager object. It calculates composite scores and assigns performance tiers, with built-in support for JSON file I/O and pairwise entry comparisons.

Status: Pre-1.0 — APIs may change in minor versions. Pin to a specific version in production.

Leaderboard management and JSON file persistence for prompt-injection-bench. Ranks defenses by composite score, assigns S/A/B/C/D tiers, supports pairwise comparison of entries, and stores entries on disk as JSON.

Installation

terminal
npm install @reaatech/pi-bench-leaderboard
# or
pnpm add @reaatech/pi-bench-leaderboard

Feature Overview

  • Tiered ranking — S/A/B/C/D tiers based on composite weighted score
  • Limit enforcement — Configurable max entries with automatic eviction of lowest scores
  • JSON file persistence — Auto-creates .prompt-injection-bench/leaderboard.json in working directory
  • Pairwise comparison — Compare any two entries in the leaderboard
  • Duplicate handling — Same defense+version replaces previous entry
  • Dual ESM/CJS output — works with import and require
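
The duplicate-handling and limit-enforcement behavior listed above can be pictured with a small standalone sketch. It does not use the package. `SketchEntry`, `upsertWithLimit`, and the sample versions and scores are hypothetical names and values chosen for illustration, and the real implementation may differ.

```typescript
// Hypothetical entry shape for illustration only; the package's own types may differ.
interface SketchEntry {
  defense: string;
  version: string;
  score: number; // composite score, higher is better
}

// Insert or replace an entry, then keep only the top `maxEntries` by score.
function upsertWithLimit(
  entries: SketchEntry[],
  incoming: SketchEntry,
  maxEntries: number,
): SketchEntry[] {
  // Duplicate handling: drop any existing entry with the same defense+version.
  const withoutDuplicate = entries.filter(
    (e) => !(e.defense === incoming.defense && e.version === incoming.version),
  );
  // Sort descending by score, then evict everything past the limit.
  return [...withoutDuplicate, incoming]
    .sort((a, b) => b.score - a.score)
    .slice(0, maxEntries);
}

let sketchBoard: SketchEntry[] = [];
sketchBoard = upsertWithLimit(sketchBoard, { defense: "rebuff", version: "1.0.0", score: 0.82 }, 2);
sketchBoard = upsertWithLimit(sketchBoard, { defense: "lakera", version: "2.1.0", score: 0.91 }, 2);
// Same defense+version: replaces the 0.82 entry instead of adding a second one.
sketchBoard = upsertWithLimit(sketchBoard, { defense: "rebuff", version: "1.0.0", score: 0.88 }, 2);
// Lowest score with the board already full: evicted immediately.
sketchBoard = upsertWithLimit(sketchBoard, { defense: "baseline", version: "0.1.0", score: 0.4 }, 2);

console.log(sketchBoard.map((e) => `${e.defense}@${e.version}: ${e.score}`));
```

Returning a new array instead of mutating in place keeps the sketch easy to reason about; a real manager would more likely hold this state internally.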

Quick Start

typescript
import { createLeaderboardManager } from "@reaatech/pi-bench-leaderboard";
import { type DefenseScore } from "@reaatech/pi-bench-core";

// Placeholder: a DefenseScore produced by an earlier benchmark run
declare const rebuffScore: DefenseScore;

const manager = createLeaderboardManager({ maxEntries: 100 });

// Add a defense score to the leaderboard
manager.addEntry(rebuffScore);

// Get ranked entries (sorted by composite score, descending)
const rankings = manager.getRankings();
for (const entry of rankings) {
  console.log(`${entry.tier} | ${entry.defense} v${entry.version}: ${entry.score.toFixed(3)}`);
}

// Compare two entries by defense name
const comparison = manager.compare("rebuff", "lakera");

API Reference

LeaderboardManager

| Method | Description |
| --- | --- |
| addEntry(score) | Add or update a defense's score |
| getRankings() | Return all entries sorted by composite score |
| getEntry(defense, version?) | Get a specific entry |
| compare(defenseA, defenseB) | Pairwise comparison of two defenses |
| removeEntry(defense, version) | Remove an entry |
| clear() | Clear all entries |
| size | Current number of entries |

LeaderboardConfig

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| maxEntries | number | 100 | Maximum leaderboard entries |
| compositeWeights | object | | Weights for composite score: { detection: 0.5, fpr: 0.2, latency: 0.15, consistency: 0.15 } |
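
As a rough illustration of how such weights could combine into one score and a tier, here is a minimal standalone sketch. It assumes every component is already normalized to the range 0 to 1 with higher meaning better (so `fpr` here would stand for something like one minus the false-positive rate); the source does not say how the package normalizes its inputs. The tier cutoffs are placeholders invented for this example, since the source does not list the real thresholds.

```typescript
// Weights copied from the LeaderboardConfig description above.
const sketchWeights = { detection: 0.5, fpr: 0.2, latency: 0.15, consistency: 0.15 };

// ASSUMPTION: each component is pre-normalized to [0, 1], higher is better.
interface NormalizedComponents {
  detection: number;
  fpr: number;
  latency: number;
  consistency: number;
}

// Weighted sum of the normalized components.
function compositeScore(c: NormalizedComponents, w: NormalizedComponents = sketchWeights): number {
  return (
    c.detection * w.detection +
    c.fpr * w.fpr +
    c.latency * w.latency +
    c.consistency * w.consistency
  );
}

type Tier = "S" | "A" | "B" | "C" | "D";

// PLACEHOLDER cutoffs for illustration only; not the package's real thresholds.
const placeholderCutoffs: Array<[Tier, number]> = [
  ["S", 0.9],
  ["A", 0.8],
  ["B", 0.7],
  ["C", 0.6],
];

// Return the first tier whose cutoff the score meets, falling back to D.
function tierFor(score: number, cutoffs: Array<[Tier, number]> = placeholderCutoffs): Tier {
  for (const [tier, min] of cutoffs) {
    if (score >= min) return tier;
  }
  return "D";
}

const sampleScore = compositeScore({ detection: 0.9, fpr: 0.8, latency: 0.6, consistency: 1.0 });
console.log(sampleScore.toFixed(3), tierFor(sampleScore)); // 0.850 A (with the placeholder cutoffs)
```

Because the four weights sum to 1, a composite built this way stays in the same 0 to 1 range as its inputs.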

createLeaderboardManager(config?)

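Factory function. Returns a LeaderboardManager configured with the given LeaderboardConfig. The config argument is optional; omitted options fall back to the defaults listed above.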

Storage

| Export | Description |
| --- | --- |
| getDefaultLeaderboardPath() | Returns .prompt-injection-bench/leaderboard.json in CWD |
| loadLeaderboardEntries(path?) | Load entries from disk |
| saveLeaderboardEntries(entries, path?) | Persist entries to disk |
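
For readers curious what JSON file persistence of this kind typically involves, the following is a minimal sketch built only on Node's `fs`, `os`, and `path` modules. It is not the package's implementation. `StoredEntry`, `defaultPath`, `saveEntries`, and `loadEntries` are hypothetical names, and returning an empty list when the file is missing is an assumption.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical on-disk shape, mirroring the LeaderboardEntry properties in this README.
interface StoredEntry {
  defense: string;
  version: string;
  score: number;
  tier: string;
  submittedAt: string; // ISO timestamp
}

// Default location: .prompt-injection-bench/leaderboard.json under the working directory.
function defaultPath(): string {
  return path.join(process.cwd(), ".prompt-injection-bench", "leaderboard.json");
}

// Create the parent directory if needed, then write pretty-printed JSON.
function saveEntries(entries: StoredEntry[], filePath: string = defaultPath()): void {
  fs.mkdirSync(path.dirname(filePath), { recursive: true });
  fs.writeFileSync(filePath, JSON.stringify(entries, null, 2), "utf8");
}

// ASSUMPTION: a missing file is treated as an empty leaderboard.
function loadEntries(filePath: string = defaultPath()): StoredEntry[] {
  if (!fs.existsSync(filePath)) return [];
  return JSON.parse(fs.readFileSync(filePath, "utf8")) as StoredEntry[];
}

// Round trip through a temporary directory so the working directory is left untouched.
const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), "leaderboard-sketch-"));
const tmpFile = path.join(tmpDir, ".prompt-injection-bench", "leaderboard.json");
saveEntries(
  [{ defense: "rebuff", version: "1.0.0", score: 0.88, tier: "A", submittedAt: "2024-01-01T00:00:00.000Z" }],
  tmpFile,
);
const roundTripped = loadEntries(tmpFile);
console.log(roundTripped.length, roundTripped[0]?.defense);
```

Passing an explicit path, as the optional `path?` parameters in the table suggest, is what makes this kind of storage easy to test against a temporary directory.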

LeaderboardEntry

| Property | Type | Description |
| --- | --- | --- |
| defense | string | Defense name |
| version | string | Defense version |
| score | number | Composite weighted score |
| tier | string | S, A, B, C, or D |
| submittedAt | string | ISO timestamp |

Usage Patterns

Persist to Disk

typescript
import {
  createLeaderboardManager,
  loadLeaderboardEntries,
  saveLeaderboardEntries,
} from "@reaatech/pi-bench-leaderboard";
import { type DefenseScore } from "@reaatech/pi-bench-core";

const manager = createLeaderboardManager({ maxEntries: 50 });

// Load existing entries from the default path and re-add them
const existing = loadLeaderboardEntries();
for (const entry of existing) manager.addEntry(entry.score);

// Run a new benchmark, add the result (placeholder for a real DefenseScore)
declare const score: DefenseScore;
manager.addEntry(score);

// Persist the ranked entries back to disk
saveLeaderboardEntries(manager.getRankings());

Check Ranking

typescript
const rankings = manager.getRankings();
const myDefense = rankings.find((e) => e.defense === "my-defense");
 
if (myDefense) {
  console.log(
    `My defense is ranked #${rankings.indexOf(myDefense) + 1} ` +
    `(Tier ${myDefense.tier}, Score: ${myDefense.score.toFixed(3)})`
  );
}

Comparison

typescript
const comparison = manager.compare("rebuff", "lakera");
if (comparison) {
  console.log(`Winner: ${comparison.winner} by ${comparison.margin.toFixed(3)}`);
  console.log(`Effect size: ${comparison.effectSize}`);
}
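
The `winner` and `margin` fields used above can be read as the outcome of a simple score comparison. The standalone sketch below shows one way such a result could be derived. `RankedEntry`, `SketchComparison`, and `compareByScore` are hypothetical names, returning `null` for a missing defense is an assumption suggested by the `if (comparison)` guard, and `effectSize` is left out because the source does not define how it is computed.

```typescript
// Hypothetical minimal shapes for illustration; the package's result type may differ.
interface RankedEntry {
  defense: string;
  score: number;
}

interface SketchComparison {
  winner: string;
  margin: number; // absolute difference between the two composite scores
}

// Compare two defenses by composite score.
// ASSUMPTION: returns null when either defense is not on the board.
function compareByScore(
  entries: RankedEntry[],
  defenseA: string,
  defenseB: string,
): SketchComparison | null {
  const a = entries.find((e) => e.defense === defenseA);
  const b = entries.find((e) => e.defense === defenseB);
  if (!a || !b) return null;
  // On a tie this sketch reports the first argument as the winner with a margin of 0.
  const aWins = a.score >= b.score;
  const top = aWins ? a : b;
  const bottom = aWins ? b : a;
  return { winner: top.defense, margin: top.score - bottom.score };
}

const sketchEntries: RankedEntry[] = [
  { defense: "rebuff", score: 0.88 },
  { defense: "lakera", score: 0.91 },
];

const sketchResult = compareByScore(sketchEntries, "rebuff", "lakera");
if (sketchResult) {
  console.log(`Winner: ${sketchResult.winner} by ${sketchResult.margin.toFixed(3)}`);
}
```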

License

MIT