REAATECH

Daily recap for May 26, 2026

A new tutorial for running automated QA evals on customer-facing AI agents using Perplexity as a neutral LLM judge.

RecapBot · 1 min read

Today we shipped a step-by-step tutorial that lets small businesses evaluate their customer-facing AI agents automatically. It uses Perplexity as the judge to score agent responses against golden test cases. Only prompt versions that pass are promoted, so you need neither a proprietary judge nor manual QA.

New tutorials

Perplexity Agent Eval Harness for SMB AI Quality Assurance

This tutorial walks you through building a CLI evaluation pipeline for customer-facing AI chat or email agents. The pipeline pulls golden test cases from your git repository and uses Perplexity as a neutral LLM judge to score the agent's responses. It then gates prompt-version promotion on quality thresholds, so only versions that pass the evals reach production. It also streams telemetry to Langfuse for observability dashboards. The tutorial is built for small businesses that need production-grade AI QA without paying for proprietary evaluation tools.
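The promotion gate described above comes down to a pass/fail decision over the judge's per-case scores. The TypeScript below sketches one way to make that decision. It assumes the judge returns a score normalized to the range 0 to 1. It also assumes a version must clear both an average-score threshold and a per-case floor. The `JudgedCase`, `GateConfig`, and `GateResult` types, the `evaluateGate` function, and the threshold values are hypothetical illustrations. They are not the API of @reaatech/agent-eval-harness-suite or @reaatech/prompt-version-control.

```typescript
// Hypothetical sketch of threshold-gated prompt promotion.
// None of these names come from the @reaatech packages.

interface JudgedCase {
  caseId: string; // ID of the golden test case
  score: number;  // judge score, assumed normalized to [0, 1]
}

interface GateConfig {
  minMeanScore: number; // average score the candidate version must reach
  minCaseScore: number; // no single case may score below this floor
}

interface GateResult {
  promote: boolean;
  meanScore: number;
  failingCases: string[];
}

function evaluateGate(results: JudgedCase[], config: GateConfig): GateResult {
  // With no judged cases there is no evidence, so refuse to promote.
  if (results.length === 0) {
    return { promote: false, meanScore: 0, failingCases: [] };
  }

  // Average the judge scores across all golden cases.
  const meanScore =
    results.reduce((sum, r) => sum + r.score, 0) / results.length;

  // Collect every case that falls below the per-case floor.
  const failingCases = results
    .filter((r) => r.score < config.minCaseScore)
    .map((r) => r.caseId);

  // Promote only if the mean clears its threshold AND no case fails the floor.
  const promote =
    meanScore >= config.minMeanScore && failingCases.length === 0;

  return { promote, meanScore, failingCases };
}

// Example: judge scores for a candidate prompt version (illustrative data).
const candidate: JudgedCase[] = [
  { caseId: "refund-policy", score: 0.9 },
  { caseId: "shipping-delay", score: 0.8 },
  { caseId: "angry-customer", score: 0.4 },
];

const result = evaluateGate(candidate, { minMeanScore: 0.75, minCaseScore: 0.5 });
console.log(result);
```

In this example the candidate version is blocked. Its mean score is about 0.7, which is below the 0.75 threshold, and the "angry-customer" case falls under the 0.5 floor. Checking a per-case floor as well as the mean keeps strong results on some cases from masking a regression on another.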

Read the tutorial → Download the code (zip)

Built with @reaatech/agent-eval-harness-suite, @reaatech/agent-eval-harness-judge, @reaatech/agent-eval-harness-golden, @reaatech/prompt-version-control, @reaatech/classifier-evals, @reaatech/agents-markdown-linter, on Perplexity · 122 tests, 99.68% coverage.

