@reaatech/llm-judge-consensus

npm v0.1.0

Aggregates multiple LLM evaluation scores into a single consensus result using strategies like majority voting, weighted voting, or cost-optimized tiebreaking. It provides a set of classes implementing a shared `execute` method that returns a normalized score and an agreement metric.

Status: Pre-1.0 — APIs may change in minor versions. Pin to a specific version in production.

Multi-judge consensus strategies for combining individual judgment scores into a final evaluation. Includes majority voting, weighted voting, and a cheap-first tiebreaker strategy to minimize API costs.

Installation

terminal
npm install @reaatech/llm-judge-consensus
# or
pnpm add @reaatech/llm-judge-consensus

Feature Overview

  • Three consensus strategies implementing a shared interface
  • Confidence-weighted score aggregation
  • Automatic agreement score computation using a variance-based formula
  • Cheap-first pattern: compares N cheap-model judgments first and consults the more expensive judges only when they disagree
  • Zero external dependencies beyond types
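
To make the aggregation features above concrete, here is a minimal sketch of a confidence-weighted mean and a variance-based agreement score. It does not reproduce the package's internals: the `SketchJudgment` shape, the function names, and the exact formula (1 minus the variance divided by 0.25, clamped to the 0–1 range) are assumptions chosen for illustration.

```typescript
// Hypothetical judgment shape; the real Judgment type comes from the package's types.
interface SketchJudgment {
  score: number;      // normalized to the 0..1 range
  confidence: number; // 0..1, used as the weight
}

// Confidence-weighted mean of the scores.
function weightedMean(judgments: SketchJudgment[]): number {
  const totalWeight = judgments.reduce((sum, j) => sum + j.confidence, 0);
  if (totalWeight === 0) {
    throw new Error('at least one judgment needs a non-zero confidence');
  }
  return judgments.reduce((sum, j) => sum + j.score * j.confidence, 0) / totalWeight;
}

// Assumed agreement formula: 1 - variance / 0.25, clamped to [0, 1].
// 0.25 is the largest possible population variance for values in [0, 1].
function varianceAgreement(judgments: SketchJudgment[]): number {
  if (judgments.length === 0) {
    throw new Error('at least one judgment is required');
  }
  const scores = judgments.map((j) => j.score);
  const mean = scores.reduce((a, b) => a + b, 0) / scores.length;
  const variance =
    scores.reduce((sum, s) => sum + (s - mean) * (s - mean), 0) / scores.length;
  return Math.min(1, Math.max(0, 1 - variance / 0.25));
}
```

Under this assumed formula, identical scores give an agreement of 1, and two judges at opposite ends of the scale (0 and 1) give an agreement of 0.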

Quick Start

typescript
import { MajorityVoting, CheapFirstTiebreaker } from '@reaatech/llm-judge-consensus';
 
const strategy = new MajorityVoting();
const result = strategy.execute([judgment1, judgment2, judgment3]);
 
console.log(result.finalScore, result.agreementScore);
// e.g. 0.82 0.94

typescript
const cheapFirst = new CheapFirstTiebreaker(2);
 
const result = cheapFirst.execute([
  gpt4oMiniJudgment,
  gpt4oMiniJudgment2,
  gpt4oJudgment,    // tiebreaker — only used if cheap judges disagree
]);
 
console.log(result.tiebreakerUsed);
// false (or true if cheap judges disagreed)

API Reference

MajorityVoting

| Property | Description |
| --- | --- |
| `strategy` | `majority-voting`: confidence-weighted average with agreement computation |
| `execute(judgments)` | Weight scores by confidence, return consensus |

CheapFirstTiebreaker

| Property | Description |
| --- | --- |
| `strategy` | `cheap-first-tiebreaker` |
| `constructor(cheapCount)` | Number of fast/cheap judgments to compare first (default 2) |
| `agreementThreshold` | 0.8; escalate to the remaining judges if agreement among the cheap judgments is below this value |
| `execute(judgments)` | Compare the cheap judgments; escalate to the remaining judges if agreement < threshold |
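
The escalation rule in the table can be sketched as follows. This is an illustrative stand-in, not the package's implementation: the `CheapFirstJudgment` shape, the `scoreAgreement` formula, the plain (unweighted) mean, and the choice to average over all judgments after escalation are assumptions.

```typescript
// Hypothetical judgment shape, for illustration only.
interface CheapFirstJudgment {
  judge: string;
  score: number; // normalized to the 0..1 range
}

// Assumed agreement measure: 1 - variance / 0.25, clamped to [0, 1].
function scoreAgreement(scores: number[]): number {
  const mean = scores.reduce((a, b) => a + b, 0) / scores.length;
  const variance =
    scores.reduce((sum, s) => sum + (s - mean) * (s - mean), 0) / scores.length;
  return Math.min(1, Math.max(0, 1 - variance / 0.25));
}

function cheapFirstSketch(
  judgments: CheapFirstJudgment[],
  cheapCount = 2,
  agreementThreshold = 0.8,
) {
  if (cheapCount < 1 || judgments.length < cheapCount) {
    throw new Error('need at least cheapCount judgments');
  }
  // The first cheapCount entries are treated as the cheap judges.
  const cheap = judgments.slice(0, cheapCount);
  const cheapAgreement = scoreAgreement(cheap.map((j) => j.score));

  // Escalate only when the cheap judges disagree and extra judges exist.
  const tiebreakerUsed =
    cheapAgreement < agreementThreshold && judgments.length > cheapCount;
  const used = tiebreakerUsed ? judgments : cheap;
  const scores = used.map((j) => j.score);

  return {
    finalScore: scores.reduce((a, b) => a + b, 0) / scores.length,
    agreementScore: scoreAgreement(scores),
    tiebreakerUsed,
  };
}
```

With this assumed agreement measure, two cheap scores fall below the 0.8 threshold once they differ by more than roughly 0.45, so small disagreements never trigger the tiebreaker.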

WeightedVoting

| Property | Description |
| --- | --- |
| `strategy` | `weighted-voting` |
| `constructor(weights)` | User-defined weights array (must match `judgments.length`) |
| `execute(judgments)` | Weight scores by provided weights, return consensus |
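
As a rough illustration of weighted voting, the sketch below computes a weighted average and rejects a weights array whose length does not match the scores. The function name `weightedVote` is hypothetical, and dividing by the sum of the weights is an assumption; the package may normalize differently.

```typescript
// Weighted average of scores; weights[i] applies to scores[i].
function weightedVote(scores: number[], weights: number[]): number {
  if (weights.length !== scores.length) {
    throw new Error(
      `expected ${scores.length} weights, received ${weights.length}`,
    );
  }
  const totalWeight = weights.reduce((sum, w) => sum + w, 0);
  if (totalWeight <= 0) {
    throw new Error('weights must sum to a positive number');
  }
  // Assumption: weights are normalized by their sum.
  const weightedSum = scores.reduce((sum, s, i) => sum + s * weights[i], 0);
  return weightedSum / totalWeight;
}
```

For example, scores of 0.9, 0.6, and 0.3 with weights 2, 1, and 1 pull the result toward the first judge, giving about 0.675 rather than the unweighted mean of 0.6.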

ConsensusStrategy Interface

| Member | Type | Description |
| --- | --- | --- |
| `name` | `string` | Strategy identifier |
| `execute(judgments)` | `(judgments: Judgment[]) => ConsensusResult` | Execute consensus on input judgments |

ConsensusResult

| Field | Type | Description |
| --- | --- | --- |
| `finalScore` | `number` | Consensus score (0–1) |
| `agreementScore` | `number` | Inter-judge agreement (0–1) |
| `method` | `string` | Strategy name used |
| `individualJudgments` | `Judgment[]` | Input judgments |
| `tiebreakerUsed` | `boolean` | Whether escalation happened (CheapFirstTiebreaker) |
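
The interface and result shape above suggest how a custom strategy might look. The sketch below declares local stand-ins for the package types (`SketchJudgment`, `SketchConsensusResult`, and `SketchConsensusStrategy` are hypothetical names mirroring the tables, not the exported types) and implements a median-based strategy. The range-based agreement score is a placeholder, not the package's variance-based formula.

```typescript
// Local stand-ins mirroring the tables above; not the package's exported types.
interface SketchJudgment {
  score: number; // normalized to the 0..1 range
}

interface SketchConsensusResult {
  finalScore: number;
  agreementScore: number;
  method: string;
  individualJudgments: SketchJudgment[];
  tiebreakerUsed: boolean;
}

interface SketchConsensusStrategy {
  name: string;
  execute(judgments: SketchJudgment[]): SketchConsensusResult;
}

// Hypothetical custom strategy: take the median score.
class MedianVoting implements SketchConsensusStrategy {
  readonly name = 'median-voting';

  execute(judgments: SketchJudgment[]): SketchConsensusResult {
    if (judgments.length === 0) {
      throw new Error('at least one judgment is required');
    }
    const sorted = judgments.map((j) => j.score).sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    const median =
      sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;

    // Placeholder agreement: 1 minus the spread between highest and lowest score.
    const spread = sorted[sorted.length - 1] - sorted[0];

    return {
      finalScore: median,
      agreementScore: Math.min(1, Math.max(0, 1 - spread)),
      method: this.name,
      individualJudgments: judgments,
      tiebreakerUsed: false,
    };
  }
}
```

A median is less sensitive to a single outlier judge than a mean, which is one reason a caller might want a strategy beyond the three shipped ones.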

License

MIT