REAATECH

Perplexity Agent Eval Harness for SMB AI Quality Assurance

Run continuous, automated evaluations of your customer‑facing AI agents using Perplexity as a neutral LLM judge, with version‑gated prompt promotions.
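This page does not show the implementation, so the following is a minimal TypeScript sketch of the two mechanisms named above: an LLM-judge call and a version gate on prompt promotion. It assumes Perplexity's OpenAI-compatible chat completions endpoint (`https://api.perplexity.ai/chat/completions`) and the `sonar` model. The 1-to-5 rubric, the `SCORE: n` reply convention, the `maxDrop` and `floor` thresholds, and every function name are hypothetical choices for illustration, not taken from the recipe's artifact. `buildJudgeRequest` only constructs the request body; sending it with an API key is left to the caller.

```typescript
// Hypothetical sketch: Perplexity as an LLM judge plus a version-gated promotion.
// Assumptions (not from the recipe): the "sonar" model name, the 1-5 rubric,
// the "SCORE: n" reply format, and the gate thresholds.

interface EvalCase {
  input: string;  // customer message sent to the agent
  output: string; // agent reply under test
}

interface ChatRequest {
  model: string;
  messages: { role: "system" | "user"; content: string }[];
  temperature: number;
}

// Build the judge request body (OpenAI-compatible shape); does not send it.
function buildJudgeRequest(c: EvalCase, model = "sonar"): ChatRequest {
  return {
    model,
    temperature: 0, // keep grading as repeatable as the API allows
    messages: [
      {
        role: "system",
        content:
          "You grade customer-support replies. Rate helpfulness and accuracy " +
          "from 1 (unusable) to 5 (excellent). End with a line 'SCORE: <n>'.",
      },
      { role: "user", content: `Customer: ${c.input}\n\nAgent reply: ${c.output}` },
    ],
  };
}

// Return the last "SCORE: n" (n in 1-5) found in the judge's reply, or null.
function parseScore(reply: string): number | null {
  const re = /SCORE:\s*([1-5])\b/g;
  let last: number | null = null;
  let m: RegExpExecArray | null;
  while ((m = re.exec(reply)) !== null) {
    last = Number(m[1]);
  }
  return last;
}

function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

interface GateResult {
  promote: boolean;
  baselineMean: number;
  candidateMean: number;
  reason: string;
}

// Promote the candidate prompt only if its mean judge score stays above an
// absolute floor and does not drop more than maxDrop below the current version.
// Unparseable judgments (null) count as the lowest score, so the gate fails closed.
function gatePromotion(
  baseline: (number | null)[],
  candidate: (number | null)[],
  maxDrop = 0.1,
  floor = 3.5,
): GateResult {
  if (baseline.length === 0 || candidate.length === 0) {
    return { promote: false, baselineMean: NaN, candidateMean: NaN, reason: "no data" };
  }
  const b = mean(baseline.map((s) => s ?? 1));
  const c = mean(candidate.map((s) => s ?? 1));
  if (c < floor) {
    return { promote: false, baselineMean: b, candidateMean: c, reason: "below floor" };
  }
  if (c < b - maxDrop) {
    return { promote: false, baselineMean: b, candidateMean: c, reason: "regression" };
  }
  return { promote: true, baselineMean: b, candidateMean: c, reason: "ok" };
}
```

In a continuous setup, the same fixed set of `EvalCase`s would be graded for both the current and the candidate prompt version, and `gatePromotion` would decide whether the candidate replaces the current one. Scoring missing judgments as 1 means a candidate that confuses the judge cannot pass on incomplete data.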

The problem

Small businesses that deploy AI chat or email agents struggle to tell when an update breaks quality: manual testing doesn't scale, and proprietary LLM judges are expensive to run at volume.

Example artifact

A complete, working implementation of this recipe, downloadable as a zip or browsable file by file. It was generated by our build pipeline and tested with near-full coverage before publishing.

192 kB · 122 tests · 99.7% coverage · vitest passing

SHA-256: 94237642b7cb2b5b36a41d2a90e909b78adaead8b104d835a04834720caf4edf
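To confirm that a downloaded zip matches the published SHA-256 digest, a short sketch using Node's built-in `node:crypto` module may help. Only `createHash` is Node's API; the helper names `sha256Hex` and `matchesPublishedDigest` are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Lowercase hex SHA-256 digest of the given bytes or UTF-8 string.
function sha256Hex(data: Uint8Array | string): string {
  return createHash("sha256").update(data).digest("hex");
}

// Compare against a published digest, ignoring case and surrounding whitespace.
function matchesPublishedDigest(data: Uint8Array | string, expected: string): boolean {
  return sha256Hex(data) === expected.trim().toLowerCase();
}
```

For example, pass `readFileSync(path)` from `node:fs` as `data`, where `path` is wherever the zip was saved, together with the digest string shown on this page.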
