# @reaatech/llm-judge-bias


Status: Pre-1.0 — APIs may change in minor versions. Pin to a specific version in production.

Bias detection suite for identifying systematic biases in LLM judgments. Detects position bias (order effects), length bias (verbosity preference), and style bias (formatting/register effects). Each detector scores responses through an LLM judge interface that you supply.

## Installation

```sh
npm install @reaatech/llm-judge-bias
# or
pnpm add @reaatech/llm-judge-bias
```
## Feature Overview

- Position bias detection with original/swapped order comparison
- Length bias detection via Pearson correlation between response length and score
- Style bias detection comparing formal/casual/bullet-point transformations
- Automatic debiasing by averaging scores from both orders
- `ComprehensiveBiasDetector` orchestrates all three detectors in one pass
- Configurable thresholds for each bias dimension
## Quick Start

```typescript
import { PositionBiasDetector } from '@reaatech/llm-judge-bias';

const detector = new PositionBiasDetector(0.1);

// `engine` is your LLM judge implementation
const report = await detector.detect(engine, [
  { id: 'a', content: 'Response A...' },
  { id: 'b', content: 'Response B...' },
]);

console.log(report.hasBias, report.recommendation);
```

```typescript
import { ComprehensiveBiasDetector } from '@reaatech/llm-judge-bias';

const detector = new ComprehensiveBiasDetector({
  positionThreshold: 0.1,
  lengthThreshold: 0.3,
  styleThreshold: 0.1,
});

const report = await detector.runAll(engine, {
  candidates: [{ id: 'a', content: '...' }, { id: 'b', content: '...' }],
  responses: [{ id: '1', content: '...' }],
  styleBaseResponse: 'Some response...',
  styleContext: { query: '...', response: '...', context: '...' },
});

console.log(report.hasBias, report.recommendation);
```

## API Reference

### PositionBiasDetector

| Export | Description |
| --- | --- |
| `constructor(threshold)` | Create a detector with a sensitivity threshold (e.g. `0.1`) |
| `detect(judge, candidates, context?)` | Compare original vs. swapped order scores |
| `debias(judge, candidates, context?)` | Return averaged judgment from both orders |
### LengthBiasDetector

| Export | Description |
| --- | --- |
| `constructor(threshold)` | Create a detector with a sensitivity threshold (e.g. `0.3`) |
| `detect(judge, responses[])` | Measure Pearson correlation between response length and score |
### StyleBiasDetector

| Export | Description |
| --- | --- |
| `constructor(threshold)` | Create a detector with a sensitivity threshold (e.g. `0.1`) |
| `detect(judge, baseResponse, context, styles?)` | Compare original vs. style-transformed scores |
### ComprehensiveBiasDetector

| Export | Description |
| --- | --- |
| `constructor(options)` | Create with per-dimension thresholds |
| `detectPosition()` | Run position bias detection |
| `detectLength()` | Run length bias detection |
| `detectStyle()` | Run style bias detection |
| `runAll()` | Run all three detectors in one pass and return a `ComprehensiveBiasReport` |

## Report Types

| Export | Fields |
| --- | --- |
| `PositionBiasReport` | `hasBias`, `averageBias`, `biasByPosition`, `recommendation` |
| `LengthBiasReport` | `hasBias`, `correlation`, `details[]` |
| `StyleBiasReport` | `hasBias`, `details[]` |
| `ComprehensiveBiasReport` | `hasBias`, `positionBias?`, `lengthBias?`, `styleBias?` |
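A natural way to consume these reports is to treat the optional per-dimension reports as sub-results and derive an overall flag. The interfaces below only mirror the field names listed above; the exact type definitions are assumptions, not the package's declarations.

```typescript
// Minimal shapes mirroring the report fields above (assumed, not the real types).
interface DimReport {
  hasBias: boolean;
}

interface ComprehensiveLike {
  hasBias: boolean;
  positionBias?: DimReport;
  lengthBias?: DimReport;
  styleBias?: DimReport;
}

// Overall bias: true if any dimension that actually ran flagged bias.
function anyBias(r: Omit<ComprehensiveLike, 'hasBias'>): boolean {
  return [r.positionBias, r.lengthBias, r.styleBias].some(
    (d) => d?.hasBias === true,
  );
}
```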

## Default Style Transforms

| Style | Description |
| --- | --- |
| `formal` | Rewrite response in formal/technical register |
| `casual` | Rewrite response in casual/conversational register |
| `bullet-points` | Rewrite response as a bullet-point list |

## License

MIT