reaatech/llm-judge-toolkit
These packages provide a modular framework for automating LLM-based evaluation, including prompt templates, consensus strategies, and bias detection. You would adopt them to standardize how you score model outputs while managing costs, caching, and calibration against human-labeled datasets. The system is built on a decoupled architecture where a central `JudgmentEngine` orchestrates pluggable providers, cache backends, and statistical analysis tools through a shared set of TypeScript interfaces.
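The orchestration described above can be sketched in TypeScript. This is a hypothetical illustration, not the toolkit's actual API: the `JudgeProvider` and `CacheBackend` interfaces and the `JudgmentEngine` class below are invented names showing how a central engine might coordinate a pluggable provider and a cache backend behind shared interfaces.

```typescript
// Hypothetical sketch of the decoupled architecture: none of these
// names are confirmed parts of the @reaatech packages.

interface JudgeProvider {
  name: string;
  // Score a model output for a prompt, e.g. in the range 0..1.
  score(prompt: string, output: string): Promise<number>;
}

interface CacheBackend {
  get(key: string): number | undefined;
  set(key: string, value: number): void;
}

class JudgmentEngine {
  constructor(
    private provider: JudgeProvider,
    private cache: CacheBackend,
  ) {}

  async judge(prompt: string, output: string): Promise<number> {
    const key = `${this.provider.name}:${prompt}:${output}`;
    const cached = this.cache.get(key);
    if (cached !== undefined) return cached; // skip a paid provider call
    const score = await this.provider.score(prompt, output);
    this.cache.set(key, score);
    return score;
  }
}

// In-memory stand-ins for demonstration only.
const memoryCache: CacheBackend = (() => {
  const m = new Map<string, number>();
  return { get: (k) => m.get(k), set: (k, v) => void m.set(k, v) };
})();

const fakeProvider: JudgeProvider = {
  name: "fake",
  score: async (_prompt, output) => (output.length > 0 ? 1 : 0),
};

async function main() {
  const engine = new JudgmentEngine(fakeProvider, memoryCache);
  console.log(await engine.judge("Summarize X", "A short summary."));
  // Second identical call is served from the cache backend.
  console.log(await engine.judge("Summarize X", "A short summary."));
}
main();
```

Because the engine depends only on interfaces, a caching backend (e.g. Redis instead of the in-memory map) or a different judge provider can be swapped in without changing evaluation code.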
Packages (10)
- @reaatech/llm-judge-bias — published 1 day ago
- @reaatech/llm-judge-cache — published 1 day ago
- @reaatech/llm-judge-calibration — published 1 day ago
- @reaatech/llm-judge-cli — published 1 day ago
- @reaatech/llm-judge-consensus — published 1 day ago
- @reaatech/llm-judge-engine — published 1 day ago
- @reaatech/llm-judge-infra — published 1 day ago
- @reaatech/llm-judge-providers — published 1 day ago
- @reaatech/llm-judge-templates — published 1 day ago
- @reaatech/llm-judge-types — published 1 day ago