reaatech

classifier-evals · packages

Every package shipped from reaatech/classifier-evals, published or pending.

8 packages

@reaatech/classifier-evals

v0.1.1
Canonical TypeScript types, Zod schemas, and shared utilities (structured logging, OpenTelemetry tracing/metrics, PII redaction, hashing) for the classifier-evals evaluation ecosystem. Exports 40+ Zod-validated types and schemas covering classification results, datasets, confusion matrices, metrics, and evaluation runs, plus a Pino-based logger and OpenTelemetry instrumentation.
status: published
published: 12 days ago

@reaatech/classifier-evals-cli

v0.1.1
A CLI for running classifier evaluations, comparing models, checking regression gates, and exporting results, built on Commander.js and the `@reaatech/classifier-evals-*` ecosystem.
status: published
published: 12 days ago

@reaatech/classifier-evals-dataset

v0.1.1
A dataset loading and validation utility for classifier evaluation, supporting CSV, JSON, and JSONL formats. Provides functions (`loadDataset`, `validateDataset`, `splitDataset`) for loading, schema validation, stratified train/test splitting, K-fold cross-validation, label normalization, alias resolution, and hierarchical label handling.
status: published
published: 12 days ago
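
To illustrate what stratified train/test splitting means here, the following is a minimal, self-contained sketch. It does not use the package and is not its implementation: the `LabeledExample` shape, the `stratifiedSplit` name, and its parameters are hypothetical, chosen only for illustration. The idea is to group examples by label, shuffle each group with a seeded generator, and take the same fraction of every group for the test set.

```typescript
// Hypothetical record shape for illustration; the package's real types may differ.
interface LabeledExample {
  text: string;
  label: string;
}

// Small deterministic PRNG (mulberry32) so the split is reproducible for a given seed.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Split so that each label contributes roughly `testRatio` of its examples to the test set.
function stratifiedSplit(
  examples: LabeledExample[],
  testRatio: number,
  seed: number
): { train: LabeledExample[]; test: LabeledExample[] } {
  const rand = mulberry32(seed);

  // Group examples by label.
  const byLabel = new Map<string, LabeledExample[]>();
  examples.forEach((ex) => {
    const bucket = byLabel.get(ex.label) ?? [];
    bucket.push(ex);
    byLabel.set(ex.label, bucket);
  });

  const train: LabeledExample[] = [];
  const test: LabeledExample[] = [];
  byLabel.forEach((bucket) => {
    // Fisher-Yates shuffle of a copy, leaving the input untouched.
    const shuffled = [...bucket];
    for (let i = shuffled.length - 1; i > 0; i--) {
      const j = Math.floor(rand() * (i + 1));
      [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
    }
    const testCount = Math.round(shuffled.length * testRatio);
    test.push(...shuffled.slice(0, testCount));
    train.push(...shuffled.slice(testCount));
  });
  return { train, test };
}
```

With 10 examples per label and `testRatio` of 0.2, each label contributes 2 examples to the test set, so class proportions are preserved in both halves.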

@reaatech/classifier-evals-exporters

v0.1.1
Export classifier evaluation results as JSON, HTML, Arize Phoenix traces, or Langfuse traces. Provides four functions (`exportToJson`, `exportToHtml`, `exportToPhoenix`, `exportToLangfuse`) that accept an `EvalRun` object and format-specific options.
status: published
published: 12 days ago

@reaatech/classifier-evals-gates

v0.1.1
A gate evaluation engine that checks classifier metrics (accuracy, F1, precision, recall) against threshold, baseline-comparison, and distribution gates, returning pass/fail results and CI output formats (GitHub Actions annotations, JUnit XML, PR comment markdown). It provides a `createGateEngine()` function that returns an object with `evaluateGates()`, `formatForGitHubActions()`, and `formatAsJUnit()` methods, and pairs with `@reaatech/classifier-evals-metrics` for metric calculation.
status: published
published: 12 days ago
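
As a rough illustration of a threshold gate and a GitHub Actions annotation, here is a self-contained sketch. It deliberately does not call `createGateEngine()`, whose configuration shape is not described on this page; the `ThresholdGate` and `GateResult` types and both function names below are hypothetical. The only external convention relied on is the GitHub Actions workflow command syntax (`::error title=...::message`).

```typescript
// Hypothetical gate and result shapes for illustration only.
interface ThresholdGate {
  name: string;
  metric: string;
  min: number;
}

interface GateResult {
  gate: string;
  passed: boolean;
  actual: number;
  threshold: number;
}

// Check each metric against its minimum threshold.
function evaluateThresholdGates(
  metrics: Record<string, number>,
  gates: ThresholdGate[]
): GateResult[] {
  return gates.map((g) => {
    const actual: number | undefined = metrics[g.metric];
    // A missing metric fails the gate rather than silently passing.
    const passed = actual !== undefined && actual >= g.min;
    return { gate: g.name, passed, actual: actual ?? NaN, threshold: g.min };
  });
}

// Render results as GitHub Actions workflow commands, one line per gate.
function toGitHubAnnotations(results: GateResult[]): string[] {
  return results.map((r) => {
    const level = r.passed ? "notice" : "error";
    const message = `actual=${r.actual.toFixed(3)} threshold=${r.threshold.toFixed(3)}`;
    return `::${level} title=${r.gate}::${message}`;
  });
}
```

A CI step could print these lines to stdout and exit non-zero when any result has `passed === false`, which is the usual way a regression gate blocks a merge.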

@reaatech/classifier-evals-judge

v0.1.1
A function that creates an LLM-as-judge engine for evaluating classifier outputs, supporting Anthropic and OpenAI models with configurable consensus voting, real-time cost tracking, and built-in prompt templates for classification evaluation, ambiguity detection, and error categorization.
status: published
published: 12 days ago
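
Consensus voting can be sketched without any LLM calls. The example below is an assumption about what such a scheme might look like, not the package's algorithm: the `Verdict` values, the `majorityConsensus` name, and the `minAgreement` parameter are all hypothetical. It takes verdicts that several judge calls have already returned and accepts the most common one only if its share of the votes exceeds a threshold.

```typescript
// Hypothetical verdict values; a real judge engine may use a different set.
type Verdict = "correct" | "incorrect" | "ambiguous";

interface Consensus {
  verdict: Verdict | null; // null when no verdict reaches the agreement threshold
  agreement: number; // share of votes held by the most common verdict
}

// Accept the most common verdict only if its share is strictly above `minAgreement`.
function majorityConsensus(votes: Verdict[], minAgreement = 0.5): Consensus {
  if (votes.length === 0) {
    return { verdict: null, agreement: 0 };
  }

  // Tally votes per verdict.
  const counts = new Map<Verdict, number>();
  votes.forEach((v) => counts.set(v, (counts.get(v) ?? 0) + 1));

  // Find the verdict with the highest count.
  let best: Verdict | null = null;
  let bestCount = 0;
  for (const [verdict, count] of Array.from(counts)) {
    if (count > bestCount) {
      best = verdict;
      bestCount = count;
    }
  }

  const agreement = bestCount / votes.length;
  return { verdict: agreement > minAgreement ? best : null, agreement };
}
```

With the default threshold, a 2-to-1 vote yields a verdict while a 1-to-1 tie yields `null`, which a caller could treat as a signal to escalate or re-run the judge.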

@reaatech/classifier-evals-mcp-server

v0.1.1
An MCP server that exposes five tools (`run_eval`, `check_gates`, `compare_models`, `llm_judge`, `generate_report`) for running classifier evaluation pipelines, checking regression gates, comparing models, and generating reports, communicating over stdio transport with any MCP-compatible client.
status: published
published: 12 days ago

@reaatech/classifier-evals-metrics

v0.1.1
Computes confusion matrices and 14 classification metrics (including accuracy, macro/micro/weighted precision/recall/F1, MCC, and Cohen's Kappa), compares models with McNemar's test and Cohen's d, and constructs evaluation runs from classification results.
status: published
published: 12 days ago
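
For readers unfamiliar with the metrics named above, here is a minimal sketch of a confusion matrix, accuracy, and macro F1. It is written from the standard definitions and does not use the package; the `Prediction` shape and the function names are hypothetical. Rows index the expected label and columns the predicted label, and macro F1 is taken as the unweighted mean of per-class F1 scores.

```typescript
// Hypothetical input shape for illustration; the package's result type may differ.
interface Prediction {
  expected: string;
  predicted: string;
}

// Build a matrix where m[i][j] counts examples with expected label i predicted as label j.
function confusionMatrix(preds: Prediction[], labels: string[]): number[][] {
  const index = new Map<string, number>();
  labels.forEach((label, i) => index.set(label, i));
  const m = labels.map(() => labels.map(() => 0));
  preds.forEach((p) => {
    const row = index.get(p.expected);
    const col = index.get(p.predicted);
    if (row === undefined || col === undefined) {
      throw new Error(`Unknown label in prediction: ${p.expected} -> ${p.predicted}`);
    }
    m[row][col] += 1;
  });
  return m;
}

// Accuracy is the diagonal sum divided by the total count.
function accuracy(m: number[][]): number {
  let correct = 0;
  let total = 0;
  m.forEach((row, i) =>
    row.forEach((count, j) => {
      total += count;
      if (i === j) correct += count;
    })
  );
  return total === 0 ? 0 : correct / total;
}

// Macro F1: unweighted mean of per-class F1, treating 0/0 as 0.
function macroF1(m: number[][]): number {
  if (m.length === 0) return 0;
  const perClass = m.map((row, k) => {
    const tp = row[k];
    const fn = row.reduce((sum, c) => sum + c, 0) - tp;
    const fp = m.reduce((sum, r) => sum + r[k], 0) - tp;
    const precision = tp + fp === 0 ? 0 : tp / (tp + fp);
    const recall = tp + fn === 0 ? 0 : tp / (tp + fn);
    return precision + recall === 0 ? 0 : (2 * precision * recall) / (precision + recall);
  });
  return perClass.reduce((sum, f) => sum + f, 0) / perClass.length;
}
```

Macro averaging gives every class equal weight regardless of how many examples it has, which is why it is often reported alongside accuracy on imbalanced datasets.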