@reaatech/pi-bench-leaderboard
Status: Pre-1.0 — APIs may change in minor versions. Pin to a specific version in production.
Leaderboard management and JSON file persistence for prompt-injection-bench. Ranks defenses by composite score, assigns S/A/B/C/D tiers, and supports pairwise comparison between stored entries.
Installation
terminal
npm install @reaatech/pi-bench-leaderboard
# or
pnpm add @reaatech/pi-bench-leaderboard
Feature Overview
- Tiered ranking — S/A/B/C/D tiers based on composite weighted score
- Limit enforcement — Configurable max entries with automatic eviction of lowest scores
- JSON file persistence — Auto-creates `.prompt-injection-bench/leaderboard.json` in the working directory
- Pairwise comparison — Compare any two entries in the leaderboard
- Duplicate handling — Same defense+version replaces previous entry
- Dual ESM/CJS output — works with `import` and `require`
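The limit-enforcement and duplicate-handling rules in the list above can be sketched with a small in-memory model. This is an illustration under stated assumptions, not the package's internals: the `Entry` shape, the `upsertWithLimit` helper, and all scores are hypothetical.

```typescript
// Hypothetical model of the rules described above; not the package's own code.
interface Entry {
  defense: string;
  version: string;
  score: number; // composite weighted score, higher is better
}

function upsertWithLimit(entries: Entry[], incoming: Entry, maxEntries: number): Entry[] {
  // Duplicate handling: the same defense+version replaces the previous entry
  const kept = entries.filter(
    (e) => !(e.defense === incoming.defense && e.version === incoming.version),
  );
  kept.push(incoming);
  // Rank by score (descending), then evict whatever falls past the limit
  kept.sort((a, b) => b.score - a.score);
  return kept.slice(0, maxEntries);
}

// Illustrative values only
let board: Entry[] = [];
board = upsertWithLimit(board, { defense: "rebuff", version: "1.0.0", score: 0.81 }, 2);
board = upsertWithLimit(board, { defense: "lakera", version: "2.1.0", score: 0.9 }, 2);
// Same defense+version: replaces the 0.81 entry instead of adding a third row
board = upsertWithLimit(board, { defense: "rebuff", version: "1.0.0", score: 0.95 }, 2);
// Lowest score with the board already full: evicted immediately
board = upsertWithLimit(board, { defense: "baseline", version: "0.1.0", score: 0.4 }, 2);
console.log(board.map((e) => `${e.defense}@${e.version}`));
```

Sorting before slicing keeps the eviction rule and the ranking order in one place, at the cost of re-sorting on every insert, which is negligible for leaderboards of this size.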
Quick Start
typescript
import { createLeaderboardManager } from "@reaatech/pi-bench-leaderboard";
import type { DefenseScore } from "@reaatech/pi-bench-core";

// A DefenseScore produced by an earlier benchmark run (not shown here)
declare const rebuffScore: DefenseScore;

const manager = createLeaderboardManager({ maxEntries: 100 });
// Add a defense score to the leaderboard
manager.addEntry(rebuffScore);
// Get ranked entries (sorted by composite score, descending)
const rankings = manager.getRankings();
for (const entry of rankings) {
  console.log(`${entry.tier} | ${entry.defense} v${entry.version}: ${entry.score.toFixed(3)}`);
}
// Compare two entries
const comparison = manager.compare("rebuff", "lakera");
API Reference
LeaderboardManager
| Method | Description |
|---|---|
| `addEntry(score)` | Add or update a defense’s score |
| `getRankings()` | Return all entries sorted by composite score |
| `getEntry(defense, version?)` | Get a specific entry |
| `compare(defenseA, defenseB)` | Pairwise comparison of two defenses |
| `removeEntry(defense, version)` | Remove an entry |
| `clear()` | Clear all entries |
| `size` | Current number of entries |
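To make the `compare(defenseA, defenseB)` row more concrete, here is a rough sketch of what a pairwise comparison could compute. It covers only the `winner` and `margin` fields read in the Comparison usage pattern; `effectSize` is left out because this README does not say how it is derived. The `compareDefenses` helper, its types, and the scores are hypothetical.

```typescript
// Hypothetical sketch; the real compare() may resolve entries and ties differently.
interface RankedEntry {
  defense: string;
  score: number; // composite weighted score
}

interface Comparison {
  winner: string; // name of the higher-scoring defense
  margin: number; // absolute score difference
}

function compareDefenses(
  entries: RankedEntry[],
  nameA: string,
  nameB: string,
): Comparison | undefined {
  const a = entries.find((e) => e.defense === nameA);
  const b = entries.find((e) => e.defense === nameB);
  // Nothing to compare unless both defenses are on the leaderboard
  if (!a || !b) return undefined;
  const winner = a.score >= b.score ? a : b; // assumption: ties go to the first argument
  return { winner: winner.defense, margin: Math.abs(a.score - b.score) };
}

// Illustrative values only
const demoEntries: RankedEntry[] = [
  { defense: "rebuff", score: 0.81 },
  { defense: "lakera", score: 0.9 },
];
const result = compareDefenses(demoEntries, "rebuff", "lakera");
if (result) {
  console.log(`Winner: ${result.winner} by ${result.margin.toFixed(3)}`);
}
```

Returning `undefined` for a missing defense mirrors the `if (comparison)` guard used in the usage pattern, though the actual return type is not documented here.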
LeaderboardConfig
| Property | Type | Default | Description |
|---|---|---|---|
| `maxEntries` | number | 100 | Maximum leaderboard entries |
| `compositeWeights` | object | — | Weights for composite score: `{ detection: 0.5, fpr: 0.2, latency: 0.15, consistency: 0.15 }` |
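As a rough illustration of how `compositeWeights` might combine into a single score, the sketch below computes a plain weighted sum using the weights shown in the table. It assumes every component has already been normalized to the range 0 to 1 with higher meaning better (so a low false-positive rate maps to a high `fpr` component); the package may normalize differently. The `compositeScore` helper and the sample values are hypothetical.

```typescript
// Hypothetical weighted-sum sketch; not the package's scoring implementation.
interface Components {
  detection: number;
  fpr: number;
  latency: number;
  consistency: number;
}

// Weights as listed in the LeaderboardConfig table
const exampleWeights: Components = {
  detection: 0.5,
  fpr: 0.2,
  latency: 0.15,
  consistency: 0.15,
};

// Assumption: each component is in [0, 1], higher is better
function compositeScore(c: Components, w: Components = exampleWeights): number {
  return (
    c.detection * w.detection +
    c.fpr * w.fpr +
    c.latency * w.latency +
    c.consistency * w.consistency
  );
}

// Illustrative values only
const sample: Components = { detection: 0.9, fpr: 0.8, latency: 0.6, consistency: 0.7 };
console.log(compositeScore(sample).toFixed(3));
```

Because these weights sum to 1, the composite stays in the same 0 to 1 range as its inputs, which keeps scores comparable across defenses.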
createLeaderboardManager(config?)
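Factory function that returns a `LeaderboardManager`. The `config` argument is optional; omitted properties fall back to their defaults (for example, `maxEntries` defaults to 100).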
Storage
| Export | Description |
|---|---|
| `getDefaultLeaderboardPath()` | Returns `.prompt-injection-bench/leaderboard.json` in CWD |
| `loadLeaderboardEntries(path?)` | Load entries from disk |
| `saveLeaderboardEntries(entries, path?)` | Persist entries to disk |
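The storage helpers in the table above could plausibly be built on Node's `fs` module along the following lines. This is a minimal sketch, not the package's code: `defaultPath`, `saveEntries`, and `loadEntries` are hypothetical stand-ins, and the behavior for a missing or malformed file is an assumption.

```typescript
import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Hypothetical stand-in for getDefaultLeaderboardPath()
function defaultPath(cwd: string = process.cwd()): string {
  return join(cwd, ".prompt-injection-bench", "leaderboard.json");
}

// Hypothetical stand-in for saveLeaderboardEntries()
function saveEntries(entries: unknown[], path: string = defaultPath()): void {
  mkdirSync(dirname(path), { recursive: true }); // create the directory if needed
  writeFileSync(path, JSON.stringify(entries, null, 2), "utf8");
}

// Hypothetical stand-in for loadLeaderboardEntries()
function loadEntries(path: string = defaultPath()): unknown[] {
  if (!existsSync(path)) return []; // assumption: no file means an empty leaderboard
  const parsed: unknown = JSON.parse(readFileSync(path, "utf8"));
  return Array.isArray(parsed) ? parsed : [];
}

// Round trip in a temp directory so the demo does not touch the working directory
const demoPath = join(tmpdir(), "pi-bench-demo", "leaderboard.json");
saveEntries(
  [
    {
      defense: "rebuff", // illustrative values only
      version: "1.0.0",
      score: 0.81,
      tier: "A",
      submittedAt: new Date().toISOString(),
    },
  ],
  demoPath,
);
console.log(loadEntries(demoPath).length);
```

Passing an explicit path, as the optional `path?` parameters in the table allow, is handy in tests and CI where writing into the current working directory is undesirable.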
LeaderboardEntry
| Property | Type | Description |
|---|---|---|
| `defense` | string | Defense name |
| `version` | string | Defense version |
| `score` | number | Composite weighted score |
| `tier` | string | S, A, B, C, or D |
| `submittedAt` | string | ISO timestamp |
Usage Patterns
Persist to Disk
typescript
import {
  createLeaderboardManager,
  loadLeaderboardEntries,
  saveLeaderboardEntries,
} from "@reaatech/pi-bench-leaderboard";
import type { DefenseScore } from "@reaatech/pi-bench-core";

const manager = createLeaderboardManager({ maxEntries: 50 });
// Load existing entries
const existing = loadLeaderboardEntries();
for (const entry of existing) manager.addEntry(entry.score);
// Run a new benchmark, add the result
declare const score: DefenseScore; // produced by the benchmark run (not shown)
manager.addEntry(score);
// Persist
saveLeaderboardEntries(manager.getRankings());
Check Ranking
typescript
const rankings = manager.getRankings();
const myDefense = rankings.find((e) => e.defense === "my-defense");
if (myDefense) {
  console.log(
    `My defense is ranked #${rankings.indexOf(myDefense) + 1} ` +
      `(Tier ${myDefense.tier}, Score: ${myDefense.score.toFixed(3)})`
  );
}
Comparison
typescript
const comparison = manager.compare("rebuff", "lakera");
if (comparison) {
  console.log(`Winner: ${comparison.winner} by ${comparison.margin.toFixed(3)}`);
  console.log(`Effect size: ${comparison.effectSize}`);
}
Related Packages
- `@reaatech/pi-bench-core` — Core types
- `@reaatech/pi-bench-scoring` — Scoring engine
- `prompt-injection-bench` — CLI and umbrella package
