
reaatech/classifier-evals

Last commit: May 17, 2026

These packages provide a comprehensive evaluation harness for testing and monitoring intent classification systems. They allow you to automate dataset validation, calculate classification metrics, run LLM-as-judge assessments, and enforce regression quality gates within CI/CD pipelines. The suite is built around a shared set of Zod schemas and TypeScript types, ensuring consistent data structures across the entire evaluation lifecycle from CLI execution to observability exports.

Packages

8 packages

@reaatech/classifier-evals

v0.1.0
Provides a shared library of TypeScript types, Zod schemas, and observability utilities for classification evaluation workflows. It includes pre-configured Pino logging, OpenTelemetry instrumentation, and PII redaction helpers to standardize data handling across the classifier-evals ecosystem.
Status: published 7 days ago

@reaatech/classifier-evals-cli

v0.1.0
Provides a CLI for executing classifier evaluations, comparing model performance, enforcing regression gates, and running LLM-as-judge workflows. It outputs results in JSON, HTML, or JUnit formats and is designed for integration into CI pipelines.
Status: published 7 days ago

@reaatech/classifier-evals-dataset

v0.1.0
Provides utilities for loading, validating, and partitioning classifier evaluation datasets from CSV, JSON, or JSONL files. It exports a set of functions for performing stratified splits, K-fold cross-validation, and label normalization on standardized dataset objects.
Status: published 7 days ago
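
To make the idea of a stratified split concrete, here is a minimal sketch in plain TypeScript. It does not use the package's API: the `Sample` shape, `stratifiedSplit`, and the seeded `mulberry32` generator are all hypothetical names chosen for illustration, and the real dataset objects may look different.

```typescript
// Hypothetical sample shape; the package's real dataset type may differ.
interface Sample {
  text: string;
  label: string;
}

// Small seeded PRNG (mulberry32) so the split is reproducible.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Split samples so each label keeps roughly the same share in train and test.
function stratifiedSplit(
  samples: Sample[],
  testFraction: number,
  seed = 42
): { train: Sample[]; test: Sample[] } {
  const rand = mulberry32(seed);

  // Group samples by label.
  const byLabel = new Map<string, Sample[]>();
  for (const s of samples) {
    const group = byLabel.get(s.label) ?? [];
    group.push(s);
    byLabel.set(s.label, group);
  }

  const train: Sample[] = [];
  const test: Sample[] = [];
  Array.from(byLabel.values()).forEach((group) => {
    // Fisher-Yates shuffle within the label group.
    const shuffled = group.slice();
    for (let i = shuffled.length - 1; i > 0; i--) {
      const j = Math.floor(rand() * (i + 1));
      const tmp = shuffled[i];
      shuffled[i] = shuffled[j];
      shuffled[j] = tmp;
    }
    // Take the same fraction of every label for the test set.
    const nTest = Math.round(shuffled.length * testFraction);
    test.push(...shuffled.slice(0, nTest));
    train.push(...shuffled.slice(nTest));
  });

  return { train, test };
}
```

Splitting per label rather than over the whole dataset keeps rare intents represented in both partitions, which matters when some classes have only a handful of examples.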

@reaatech/classifier-evals-exporters

v0.1.0
Exports classifier evaluation results into JSON, interactive HTML reports, or observability traces for Arize Phoenix and Langfuse. It provides a set of utility functions that transform `EvalRun` objects into these formats for reporting and analysis.
Status: published 7 days ago

@reaatech/classifier-evals-gates

v0.1.0
Evaluates classification model performance against threshold, baseline, and distribution gates using a configurable engine. It provides a `GateEngine` instance that processes metrics and exports results into GitHub Actions annotations, JUnit XML, or PR comment markdown.
Status: published 7 days ago
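
As a rough illustration of what threshold and baseline gates check, the sketch below implements both in plain TypeScript. It is not the `GateEngine` API: `Metrics`, `GateResult`, `thresholdGate`, and `baselineGate` are hypothetical names, and the real engine's configuration format is not shown on this page.

```typescript
// Hypothetical shapes for illustration; not the package's real types.
type Metrics = Record<string, number>;

interface GateResult {
  gate: string;
  passed: boolean;
  message: string;
}

// Threshold gate: the metric must be present and at least `min`.
function thresholdGate(metrics: Metrics, metric: string, min: number): GateResult {
  const value: number | undefined = metrics[metric];
  if (value === undefined) {
    return { gate: `threshold:${metric}`, passed: false, message: `${metric} is missing` };
  }
  const passed = value >= min;
  return {
    gate: `threshold:${metric}`,
    passed,
    message: `${metric}=${value} (minimum ${min})`,
  };
}

// Baseline gate: the metric may not drop more than `maxDrop` below the baseline run.
function baselineGate(
  current: Metrics,
  baseline: Metrics,
  metric: string,
  maxDrop: number
): GateResult {
  const cur: number | undefined = current[metric];
  const base: number | undefined = baseline[metric];
  if (cur === undefined || base === undefined) {
    return { gate: `baseline:${metric}`, passed: false, message: `${metric} is missing` };
  }
  const drop = base - cur;
  return {
    gate: `baseline:${metric}`,
    passed: drop <= maxDrop,
    message: `${metric} changed by ${(-drop).toFixed(4)} (allowed drop ${maxDrop})`,
  };
}
```

In a CI job, a wrapper would typically run every configured gate and exit non-zero if any `passed` flag is false. Treating a missing metric as a failure, as this sketch does, avoids silently passing when an evaluation step did not run.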

@reaatech/classifier-evals-judge

v0.1.0
Evaluates classification model outputs using LLM-as-a-judge with support for consensus voting, real-time cost tracking, and PII redaction. It provides a `createJudgeEngine` factory function that returns an engine instance for executing batch evaluations against OpenAI or Anthropic APIs.
Status: published 7 days ago
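
Consensus voting can be sketched without calling any LLM API: collect one verdict per judge and take the majority. The example below is a minimal, assumed version of that idea; `Verdict`, `ConsensusResult`, and `majorityVote` are hypothetical names, not what `createJudgeEngine` returns, and the package may use a different voting rule.

```typescript
// Hypothetical verdict type; a real judge may return richer output.
type Verdict = "correct" | "incorrect";

interface ConsensusResult {
  verdict: Verdict | "no_consensus";
  // Share of votes that agree with the winning verdict (0 when there are no votes).
  agreement: number;
}

// Simple majority vote over independent judge verdicts.
function majorityVote(votes: Verdict[]): ConsensusResult {
  if (votes.length === 0) {
    return { verdict: "no_consensus", agreement: 0 };
  }
  const correct = votes.filter((v) => v === "correct").length;
  const incorrect = votes.length - correct;

  // A tie is reported explicitly instead of being resolved arbitrarily.
  if (correct === incorrect) {
    return { verdict: "no_consensus", agreement: 0.5 };
  }
  const winner: Verdict = correct > incorrect ? "correct" : "incorrect";
  return {
    verdict: winner,
    agreement: Math.max(correct, incorrect) / votes.length,
  };
}
```

Using an odd number of judges avoids ties in this binary setup, and the `agreement` value gives a cheap signal for which samples deserve human review.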

@reaatech/classifier-evals-mcp-server

v0.1.0
Exposes classifier evaluation workflows—including running evaluations, checking regression gates, and performing LLM-as-judge comparisons—as a set of Model Context Protocol (MCP) tools. It provides a CLI executable and a `startMCPServer` function that runs over stdio, requiring the `@modelcontextprotocol/sdk` at runtime.
Status: published 7 days ago

@reaatech/classifier-evals-metrics

v0.1.0
Calculates classification performance metrics, including confusion matrices, multi-class F1 scores, and statistical model comparisons. It provides a collection of utility functions that operate on arrays of classification result objects defined by the `@reaatech/classifier-evals` core package.
Status: published 7 days ago
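
For readers unfamiliar with the metrics named above, the following sketch computes a confusion matrix and a macro-averaged F1 score in plain TypeScript. The `ClassificationResult` shape with `expected` and `predicted` fields is an assumption made for this example; the actual result type exported by `@reaatech/classifier-evals` is not shown on this page, and the function names here are hypothetical.

```typescript
// Assumed result shape for illustration; the core package's type may differ.
interface ClassificationResult {
  expected: string;
  predicted: string;
}

type ConfusionMatrix = Record<string, Record<string, number>>;

// matrix[expected][predicted] = number of samples with that pair of labels.
function confusionMatrix(results: ClassificationResult[]): ConfusionMatrix {
  const labelSet = new Set<string>();
  for (const r of results) {
    labelSet.add(r.expected);
    labelSet.add(r.predicted);
  }
  const labels = Array.from(labelSet).sort();

  const matrix: ConfusionMatrix = {};
  for (const row of labels) {
    matrix[row] = {};
    for (const col of labels) {
      matrix[row][col] = 0;
    }
  }
  for (const r of results) {
    matrix[r.expected][r.predicted] += 1;
  }
  return matrix;
}

// Macro F1: the unweighted mean of per-label F1 scores.
function macroF1(results: ClassificationResult[]): number {
  const matrix = confusionMatrix(results);
  const labels = Object.keys(matrix);
  if (labels.length === 0) return 0;

  let sum = 0;
  for (const label of labels) {
    const tp = matrix[label][label];
    let fp = 0;
    let fn = 0;
    for (const other of labels) {
      if (other === label) continue;
      fp += matrix[other][label]; // predicted as `label`, but expected `other`
      fn += matrix[label][other]; // expected `label`, but predicted `other`
    }
    // F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never appears.
    const denom = 2 * tp + fp + fn;
    sum += denom === 0 ? 0 : (2 * tp) / denom;
  }
  return sum / labels.length;
}
```

Macro averaging weights every intent equally, so a rare intent that the classifier always misses lowers the score as much as a common one would. A micro or weighted average would be the alternative when class frequency should count.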
