# Automated triage of customer feedback into product roadmap items
 
> Turn support tickets and NPS comments into prioritized product suggestions.
 
A tutorial-style reference solution showing how to wire 6 **REAA agent-eval-harness packages** into a **Hono API** stack, using customer feedback triage as the example evaluation domain.
 
## Architecture
 
This service exposes an evaluation harness API that scores AI agent trajectories, tracks their cost, and gates them on quality, cost, and latency thresholds. The evaluation pipeline has 8 stages, orchestrated by `eval-orchestrator`:
 
1. **Observability init** — tracing + logging via `@reaatech/agent-eval-harness-observability`
2. **Judge creation** — provider-agnostic LLM-as-judge via `@reaatech/agent-eval-harness-judge`
3. **Batch evaluation** — `SuiteRunner` from `@reaatech/agent-eval-harness-suite` runs a YAML-configured evaluation suite
4. **Results aggregation** — `createResultsAggregator` computes metric breakdowns
5. **Cost tracking** — `calculateTrajectoryCost` + `enforceBudget` from `@reaatech/agent-eval-harness-cost`
6. **Gate evaluation** — `createGateEngine` from `@reaatech/agent-eval-harness-gate` checks quality/cost/latency thresholds
7. **Summarization** — `generateText` from the Vercel AI SDK produces human-readable summaries
8. **Dashboard recording** — in-memory dashboard with trend analysis
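Stages 4 through 6 reduce per-task results to a single pass/fail decision. The sketch below illustrates that reduction in plain TypeScript. It assumes a hypothetical `TaskResult` shape and threshold names, and it does not use or mirror the real `createResultsAggregator`, `enforceBudget`, or `createGateEngine` APIs. For brevity it also folds the budget check into the gate thresholds, although the pipeline above treats cost tracking as its own stage.

```typescript
// Illustrative shapes only: these names are hypothetical and are NOT the
// @reaatech/agent-eval-harness-* APIs.
interface TaskResult {
  taskId: string;
  score: number; // judge score, assumed to lie in [0, 1]
  passed: boolean;
  latencyMs: number;
  costUsd: number;
}

interface Aggregate {
  overallScore: number;
  passRate: number;
  totalCostUsd: number;
  p95LatencyMs: number;
}

interface GateThresholds {
  minOverallScore: number;
  minPassRate: number;
  maxTotalCostUsd: number; // stands in for the run budget in this sketch
  maxP95LatencyMs: number;
}

interface GateOutcome {
  passed: boolean;
  failures: string[];
}

// Aggregation: mean score, pass rate, summed cost, nearest-rank p95 latency.
function aggregate(results: TaskResult[]): Aggregate {
  const n = results.length;
  if (n === 0) throw new Error("no results to aggregate");
  const sum = (xs: number[]) => xs.reduce((acc, x) => acc + x, 0);
  const latencies = results.map((r) => r.latencyMs).sort((a, b) => a - b);
  return {
    overallScore: sum(results.map((r) => r.score)) / n,
    passRate: results.filter((r) => r.passed).length / n,
    totalCostUsd: sum(results.map((r) => r.costUsd)),
    p95LatencyMs: latencies[Math.ceil(0.95 * n) - 1],
  };
}

// Gating: check every threshold and report all failures, not just the first.
function evaluateGates(agg: Aggregate, t: GateThresholds): GateOutcome {
  const failures: string[] = [];
  if (agg.overallScore < t.minOverallScore)
    failures.push(`overallScore ${agg.overallScore.toFixed(2)} < ${t.minOverallScore}`);
  if (agg.passRate < t.minPassRate)
    failures.push(`passRate ${agg.passRate.toFixed(2)} < ${t.minPassRate}`);
  if (agg.totalCostUsd > t.maxTotalCostUsd)
    failures.push(`totalCostUsd ${agg.totalCostUsd.toFixed(4)} > ${t.maxTotalCostUsd}`);
  if (agg.p95LatencyMs > t.maxP95LatencyMs)
    failures.push(`p95LatencyMs ${agg.p95LatencyMs} > ${t.maxP95LatencyMs}`);
  return { passed: failures.length === 0, failures };
}

// Made-up numbers, purely to exercise the functions.
const results: TaskResult[] = [
  { taskId: "t1", score: 0.9, passed: true, latencyMs: 1200, costUsd: 0.004 },
  { taskId: "t2", score: 0.6, passed: false, latencyMs: 900, costUsd: 0.003 },
  { taskId: "t3", score: 0.8, passed: true, latencyMs: 400, costUsd: 0.001 },
];
const thresholds: GateThresholds = {
  minOverallScore: 0.7,
  minPassRate: 0.8,
  maxTotalCostUsd: 0.01,
  maxP95LatencyMs: 2000,
};
const agg = aggregate(results);
const outcome = evaluateGates(agg, thresholds);
console.log(outcome.passed, outcome.failures);
```

Collecting every failed threshold, rather than stopping at the first, gives a CI gate a complete report from a single run.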
 
## REAA Packages
 
| Package | Role |
|---------|------|
| `@reaatech/agent-eval-harness-types` | Domain types (Trajectory, Turn, EvalResult) and Zod schemas |
| `@reaatech/agent-eval-harness-suite` | Batch evaluation runner with YAML config and results aggregation |
| `@reaatech/agent-eval-harness-judge` | LLM-as-judge with 4 judgment types and calibration |
| `@reaatech/agent-eval-harness-cost` | Per-task LLM cost calculation and budget enforcement |
| `@reaatech/agent-eval-harness-gate` | CI/CD regression gates with JUnit/GitHub output |
| `@reaatech/agent-eval-harness-observability` | OTel tracing, metrics, Pino logging, and in-memory dashboards |
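The request body for `POST /api/eval/run` (under API Endpoints) indicates roughly what a `Trajectory` and its `Turn`s look like. Below is a minimal, dependency-free sketch of validating that shape. The interfaces are inferred from this README rather than copied from `@reaatech/agent-eval-harness-types`, whose Zod schemas fill this role in the actual service. The rule that `turn_id` strictly increases is an assumption.

```typescript
// Hypothetical shapes inferred from the POST /api/eval/run request body in this
// README; the real types in @reaatech/agent-eval-harness-types may differ.
interface Turn {
  turn_id: number;
  role: string; // allowed role values are not listed in the README, so any string passes
  content: string;
  timestamp: string; // assumed to be a Date.parse-able string such as ISO-8601
}

interface Trajectory {
  turns: Turn[];
}

function isTurn(value: unknown): value is Turn {
  if (typeof value !== "object" || value === null) return false;
  const { turn_id, role, content, timestamp } = value as Record<string, unknown>;
  return (
    typeof turn_id === "number" &&
    Number.isInteger(turn_id) &&
    typeof role === "string" &&
    typeof content === "string" &&
    typeof timestamp === "string" &&
    !Number.isNaN(Date.parse(timestamp))
  );
}

// Throws on the first problem found; otherwise returns a typed Trajectory.
function validateTrajectory(value: unknown): Trajectory {
  if (typeof value !== "object" || value === null) {
    throw new Error("trajectory must be an object");
  }
  const turns = (value as { turns?: unknown }).turns;
  if (!Array.isArray(turns) || turns.length === 0) {
    throw new Error("trajectory.turns must be a non-empty array");
  }
  const checked: Turn[] = [];
  for (let i = 0; i < turns.length; i++) {
    const turn: unknown = turns[i];
    if (!isTurn(turn)) throw new Error(`turns[${i}] is malformed`);
    // Assumption: turn_id must strictly increase within a trajectory.
    const prev = checked[checked.length - 1];
    if (prev !== undefined && turn.turn_id <= prev.turn_id) {
      throw new Error(`turns[${i}].turn_id must be greater than ${prev.turn_id}`);
    }
    checked.push(turn);
  }
  return { turns: checked };
}

const parsed = validateTrajectory({
  turns: [
    { turn_id: 1, role: "user", content: "CSV export drops non-ASCII names", timestamp: "2024-05-01T10:00:00Z" },
    { turn_id: 2, role: "assistant", content: "Filed under data export issues.", timestamp: "2024-05-01T10:00:05Z" },
  ],
});
console.log(parsed.turns.length);
```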
 
## API Endpoints
 
All routes are mounted under `/api/eval/` via a Hono app wrapped in a Next.js catch-all route handler.
 
### POST /api/eval/run
Run a full evaluation suite on one or more trajectories.
 
**Request body:**
```json
{
  "trajectories": [{ "turns": [{ "turn_id": 1, "role": "user", "content": "...", "timestamp": "..." }] }],
  "judgeModel": "gpt-5.2",
  "provider": "openai"
}
```
 
**Response:** an `EvalApiResponse` containing `overallScore`, `passRate`, `gateResult`, `costBreakdown`, and `summary`.
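A hypothetical client-side helper for this endpoint is sketched below. The field names come from the request and response described above. The value types (numeric `overallScore` and `passRate`, string `summary`) are assumptions, and `gateResult` and `costBreakdown` are left as `unknown` because their shapes belong to the gate and cost packages. `buildRunEvalInit` and `parseEvalApiResponse` are illustrative names, not part of the service.

```typescript
// Hypothetical client helper. Field names come from this README; value types are assumptions.
interface Turn {
  turn_id: number;
  role: string;
  content: string;
  timestamp: string;
}

interface RunEvalRequest {
  trajectories: { turns: Turn[] }[];
  judgeModel: string;
  provider: string;
}

interface EvalApiResponse {
  overallScore: number; // assumed numeric
  passRate: number; // assumed numeric
  gateResult: unknown; // shape owned by the gate package, not assumed here
  costBreakdown: unknown; // shape owned by the cost package, not assumed here
  summary: string; // assumed plain text
}

// Builds an object usable as the second argument to fetch().
function buildRunEvalInit(body: RunEvalRequest) {
  return {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(body),
  };
}

// Narrows an untrusted JSON value to EvalApiResponse, or throws.
function parseEvalApiResponse(json: unknown): EvalApiResponse {
  if (typeof json !== "object" || json === null) throw new Error("response is not an object");
  const r = json as Record<string, unknown>;
  const { overallScore, passRate, summary } = r;
  if (typeof overallScore !== "number") throw new Error("overallScore must be a number");
  if (typeof passRate !== "number") throw new Error("passRate must be a number");
  if (typeof summary !== "string") throw new Error("summary must be a string");
  if (!("gateResult" in r) || !("costBreakdown" in r)) {
    throw new Error("response is missing gateResult or costBreakdown");
  }
  return { overallScore, passRate, gateResult: r.gateResult, costBreakdown: r.costBreakdown, summary };
}

const init = buildRunEvalInit({
  trajectories: [
    { turns: [{ turn_id: 1, role: "user", content: "Please add SSO support", timestamp: "2024-05-01T10:00:00Z" }] },
  ],
  judgeModel: "gpt-5.2",
  provider: "openai",
});
// Inside an async function, with baseUrl pointing at the running service:
//   const res = await fetch(`${baseUrl}/api/eval/run`, init);
//   const result = parseEvalApiResponse(await res.json());
```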
 
### POST /api/eval/judge
Judge a single trajectory.
 
### POST /api/eval/gates
Evaluate gates on aggregated results.
 
### GET /api/eval/report
Get the current dashboard summary.
 
## Example Trajectories
 
The repo includes 3 golden trajectories demonstrating customer feedback triage scenarios:
- **Happy path** — 8 support tickets clustered into 3 roadmap items
- **Error path** — malformed NPS data causing partial ingestion failure
- **Boundary** — single positive feedback item
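The error-path scenario describes partial ingestion failure, where some feedback is ingested and the rest is not. Below is a minimal sketch of that behavior: valid rows are kept, and malformed rows are reported instead of failing the whole batch. It assumes a hypothetical `{ score, comment }` row shape, because the golden trajectory's actual NPS payload is not shown here. `ingestNps` is an illustrative name.

```typescript
// Hypothetical row shape: the README does not show the golden trajectory's NPS payload.
interface NpsResponse {
  score: number; // NPS answers are integers from 0 to 10
  comment: string;
}

interface IngestResult {
  accepted: NpsResponse[];
  rejected: { index: number; reason: string }[];
}

// Partial ingestion: keep valid rows, record why each bad row was skipped,
// and never abort the whole batch because of one malformed entry.
function ingestNps(rows: unknown[]): IngestResult {
  const result: IngestResult = { accepted: [], rejected: [] };
  rows.forEach((row, index) => {
    if (typeof row !== "object" || row === null) {
      result.rejected.push({ index, reason: "not an object" });
      return;
    }
    const { score, comment } = row as Record<string, unknown>;
    if (typeof score !== "number" || !Number.isInteger(score) || score < 0 || score > 10) {
      result.rejected.push({ index, reason: "score must be an integer from 0 to 10" });
    } else if (typeof comment !== "string") {
      result.rejected.push({ index, reason: "comment must be a string" });
    } else {
      result.accepted.push({ score, comment });
    }
  });
  return result;
}

const batch = ingestNps([
  { score: 9, comment: "Love the new dashboard" },
  { score: 11, comment: "Out-of-range score" },
  { score: 3, comment: "Search is too slow" },
  "garbage",
]);
console.log(batch.accepted.length, batch.rejected);
```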
 
## Quickstart
 
```bash
pnpm install
# Set required env vars (see .env.example)
pnpm dev              # start dev server
pnpm test             # run tests with coverage
pnpm typecheck        # TypeScript check
pnpm lint             # ESLint check
```
 
## License
 
MIT — see [LICENSE](./LICENSE).