Today we shipped a step-by-step solution that lets small businesses automatically evaluate their customer-facing AI agents. It uses Perplexity as the judge to score responses against golden test cases, and automatically promotes only passing prompt versions—no proprietary judges or manual QA required.
New tutorials
Perplexity Agent Eval Harness for SMB AI Quality Assurance
This tutorial walks you through building a CLI evaluation pipeline for your customer-facing AI chat or email agents. It pulls golden test cases from your git repository, uses Perplexity as a neutral LLM judge to score responses, and gates prompt-version promotions based on quality thresholds—so only versions that pass the evals go to production. You'll also stream telemetry to Langfuse for observability dashboards. Built for small businesses that need production-grade AI QA without the cost of proprietary evaluation tools.
Read the tutorial → · Download the code (zip)
Built with @reaatech/agent-eval-harness-suite, @reaatech/agent-eval-harness-judge, @reaatech/agent-eval-harness-golden, @reaatech/prompt-version-control, @reaatech/classifier-evals, and @reaatech/agents-markdown-linter on Perplexity · 122 tests, 99.68% coverage.
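To make the promotion gate in the tutorial summary above concrete, here is a minimal sketch of the decision step, assuming the judge has already returned a score between 0 and 1 for each golden test case. The type names, the `evaluateGate` function, and both thresholds are hypothetical illustrations. They are not the API of the @reaatech packages, and the tutorial's actual gating rules may differ.

```typescript
// Hypothetical types for illustration only; not taken from the @reaatech packages.
interface JudgedCase {
  id: string; // golden test case identifier
  score: number; // judge score, assumed normalized to the range 0..1
}

interface GateConfig {
  minMeanScore: number; // minimum average score across the whole suite
  minCaseScore: number; // floor that every individual case must reach
}

interface GateResult {
  promote: boolean;
  meanScore: number;
  failedCaseIds: string[];
}

// Decide whether a prompt version may be promoted, given judged golden cases.
function evaluateGate(cases: JudgedCase[], config: GateConfig): GateResult {
  // An empty suite proves nothing, so refuse to promote.
  if (cases.length === 0) {
    return { promote: false, meanScore: 0, failedCaseIds: [] };
  }

  const total = cases.reduce((sum, c) => sum + c.score, 0);
  const meanScore = total / cases.length;

  // Collect every case that falls below the per-case floor.
  const failedCaseIds = cases
    .filter((c) => c.score < config.minCaseScore)
    .map((c) => c.id);

  // Promote only if the average is high enough and no single case failed.
  const promote =
    meanScore >= config.minMeanScore && failedCaseIds.length === 0;

  return { promote, meanScore, failedCaseIds };
}

// Example: one weak case blocks promotion even though the other scored well.
const result = evaluateGate(
  [
    { id: "refund-policy", score: 0.9 },
    { id: "shipping-delay", score: 0.5 },
  ],
  { minMeanScore: 0.8, minCaseScore: 0.6 },
);
console.log(result.promote ? "promote" : "hold", result.failedCaseIds);
```

The sketch combines a suite-wide average with a per-case floor so that a single badly handled customer scenario cannot be hidden by high scores elsewhere. A CLI wrapper could map `promote` to its exit code to block a release.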
