AWS Bedrock RAG Eval Harness for SMB Customer Support Bots

Automatically score RAG answer quality, track evaluation costs, and block deployments when your AI support bot’s accuracy dips.

aws-bedrock rag eval-harness langfuse nextjs typescript ci-cd quality-gate customer-support

The problem

SMB support teams rely on RAG chatbots to handle customer questions, but hallucinated or irrelevant answers slip through unnoticed and damage trust. These teams have no systematic way to continuously measure answer quality and catch regressions before customers do.

Intro

This tutorial walks you through building an automated RAG evaluation harness for a customer support chatbot. You’ll create a Next.js API that scores RAG answer quality on four metrics (faithfulness, relevance, context precision, context recall) using AWS Bedrock as a judge LLM, tracks evaluation spend with configurable budgets, and gates CI/CD deployments when quality dips below defined thresholds. Evaluation traces and scores are pushed to Langfuse for dashboarding and alerting.
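
The gating idea described above can be sketched in a few lines. This is a minimal illustration, not the tutorial's actual implementation: the names MetricScores, GateResult and evaluateGate are hypothetical, and the threshold values are assumptions you would calibrate against your own baseline runs.

```typescript
// Hypothetical sketch: the names below are illustrative, not the tutorial's API.
type MetricName = "faithfulness" | "relevance" | "contextPrecision" | "contextRecall";
type MetricScores = Record<MetricName, number>;

interface GateFailure {
  metric: MetricName;
  score: number;
  threshold: number;
}

interface GateResult {
  passed: boolean;
  averages: MetricScores;
  failures: GateFailure[];
}

const METRICS: MetricName[] = ["faithfulness", "relevance", "contextPrecision", "contextRecall"];

// Assumed thresholds on a 0..1 scale; calibrate against your own baseline runs.
const DEFAULT_THRESHOLDS: MetricScores = {
  faithfulness: 0.8,
  relevance: 0.75,
  contextPrecision: 0.7,
  contextRecall: 0.7,
};

// Averages each metric over all scored samples and fails the gate
// if any average falls below its threshold.
function evaluateGate(
  samples: MetricScores[],
  thresholds: MetricScores = DEFAULT_THRESHOLDS,
): GateResult {
  if (samples.length === 0) {
    throw new Error("Cannot evaluate a quality gate with no scored samples");
  }
  const averages = {} as MetricScores;
  const failures: GateFailure[] = [];
  for (const metric of METRICS) {
    const total = samples.reduce((sum, sample) => sum + sample[metric], 0);
    const score = total / samples.length;
    averages[metric] = score;
    if (score < thresholds[metric]) {
      failures.push({ metric, score, threshold: thresholds[metric] });
    }
  }
  return { passed: failures.length === 0, averages, failures };
}
```

In a CI job, a runner could call evaluateGate on the scored samples and exit with a non-zero status when passed is false, which is what blocks the deployment.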

This is for developers who run customer-facing RAG chatbots and need a systematic way to catch answer quality regressions before they reach users.

Prerequisites

Node.js >= 22 and pnpm 10 installed
An AWS account with Bedrock access (Claude Sonnet 4 or another compatible model enabled)
A Langfuse account (the free tier works) with a public and secret key
Basic familiarity with TypeScript, the Next.js App Router, and the AWS SDK
AWS credentials configured in your environment (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION)
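
Before running anything, it can help to fail fast when a required variable is missing. The sketch below is one way to do that. The three AWS names come from the list above; the two Langfuse variable names and the helper findMissingEnvVars are assumptions made for illustration.

```typescript
// The AWS names come from the prerequisites; the Langfuse names are assumed.
const REQUIRED_ENV_VARS = [
  "AWS_ACCESS_KEY_ID",
  "AWS_SECRET_ACCESS_KEY",
  "AWS_REGION",
  "LANGFUSE_PUBLIC_KEY", // assumed variable name
  "LANGFUSE_SECRET_KEY", // assumed variable name
] as const;

// Returns the names of required variables that are unset or blank.
// Taking the environment as a parameter keeps the function easy to test.
function findMissingEnvVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

At startup you could pass process.env to findMissingEnvVars and abort with a clear message if the returned array is not empty.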

Step 1: Scaffold the Next.js project

Create the project directory and initialize a Next.js 16+ App Router project with the required dependencies:

terminal

Example artifact

A complete, working implementation of this recipe — downloadable as a zip or browsable file by file. Generated by our build pipeline; tested with full coverage before publishing.

176 kB·79 tests·100.0% coverage·vitest passing

SHA-256: da11a746e915830e411cda33eb2699322fad8620b5eaaf49e926b52300171294

import { CostTracker, BudgetManager, CostReporter, type BudgetAlert } from "@reaatech/rag-eval-cost";
import { type CostBreakdown } from "@reaatech/rag-eval-core";

// Wraps the cost tracker, budget manager, and reporter behind one interface.
export class EvalCostManager {
  private readonly tracker: CostTracker;
  private readonly manager: BudgetManager;
  private readonly reporter: CostReporter;

  constructor(budgetLimit: number, hardLimit: boolean) {
    // Both components share the same budget and alert at 50%, 75%, and 90% usage.
    this.tracker = new CostTracker({
      budgetLimit,
      hardLimit,
      alertThresholds: [0.5, 0.75, 0.9],
    });
    this.manager = new BudgetManager({
      budgetLimit,
      hardLimit,
      alertThresholds: [0.5, 0.75, 0.9],
    });
    this.reporter = new CostReporter();
  }

  // Records the cost of one judge call and reports whether evaluation should stop.
  recordCost(
    sampleId: string,
    inputTokens: number,
    outputTokens: number,
    metric: string,
    provider: string,
  ): { shouldStop: boolean } {
    const totalTokens = inputTokens + outputTokens;
    // Flat rate applied to input and output tokens alike.
    const costPerToken = 0.000003;
    const cost = totalTokens * costPerToken;
    const trackerResult = this.tracker.recordCost(
      sampleId,
      cost,
      { input: inputTokens, output: outputTokens, total: totalTokens },
      metric,
      provider,
    );
    this.manager.recordSpend(cost);
    return { shouldStop: trackerResult.shouldStop };
  }

  getTotalCost(): number {
    return this.tracker.getTotalCost();
  }

  getCostBreakdown(): CostBreakdown {
    return this.tracker.getBreakdown();
  }

  getBudgetUsage(): number {
    return this.manager.getBudgetUsage();
  }

  getActiveAlerts(): BudgetAlert[] {
    return this.manager.getActiveAlerts();
  }

  generateReport(): { totalCost: number; costPerSample: number; trend: string } {
    return this.reporter.generateReport(this.tracker.getBreakdown());
  }

  generateJUnitXml(): string {
    return this.reporter.generateJUnitXml(this.tracker.getBreakdown());
  }
}

// Factory helper; hardLimit defaults to false when omitted.
export function createEvalCostManager(
  budgetLimit: number,
  hardLimit?: boolean,
): EvalCostManager {
  return new EvalCostManager(budgetLimit, hardLimit ?? false);
}
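
The recordCost method above prices input and output tokens at one flat rate (0.000003 per token). Many hosted models bill the two differently, so a possible refinement is to price them separately and to work out which alert thresholds a new charge has just crossed. The sketch below is standalone TypeScript and does not use the @reaatech packages; TokenRates, estimateCost and crossedThresholds are hypothetical names, and the rates in the usage note are placeholders rather than quoted Bedrock prices.

```typescript
// Hypothetical per-direction pricing; the values you pass in are your own.
interface TokenRates {
  inputPerToken: number;
  outputPerToken: number;
}

// Cost of one judge call with separate input and output rates.
function estimateCost(inputTokens: number, outputTokens: number, rates: TokenRates): number {
  return inputTokens * rates.inputPerToken + outputTokens * rates.outputPerToken;
}

// Returns the thresholds (as fractions of the budget) that were crossed
// when spend moved from previousSpend to newSpend.
function crossedThresholds(
  previousSpend: number,
  newSpend: number,
  budgetLimit: number,
  thresholds: number[],
): number[] {
  if (budgetLimit <= 0) {
    throw new Error("budgetLimit must be positive");
  }
  const before = previousSpend / budgetLimit;
  const after = newSpend / budgetLimit;
  return thresholds.filter((t) => before < t && after >= t);
}
```

For example, with placeholder rates of 0.000003 per input token and 0.000015 per output token, estimateCost(1000, 500, rates) comes to roughly 0.0105, and a jump from 40% to 80% of the budget crosses the 0.5 and 0.75 thresholds but not 0.9.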

{"query":"What is the return policy?","context":["Our return policy allows returns within 30 days of purchase.","Items must be in original condition with receipt."],"ground_truth":"Returns are accepted within 30 days with original receipt and condition.","generated_answer":"You can return items within 30 days as long as you have the receipt."}
{"query":"How do I track my order?","context":["Order tracking is available in your account dashboard under 'My Orders'.","You will receive a tracking number via email once your order ships."],"ground_truth":"Track your order through the account dashboard or via the tracking number emailed after shipment.","generated_answer":"You can track your order in your account dashboard."}
{"query":"Do you offer international shipping?","context":["We ship to over 50 countries worldwide.","International shipping rates vary by destination and are calculated at checkout.","Delivery times range from 5-14 business days depending on the destination."],"ground_truth":"We ship to 50+ countries with rates calculated at checkout and delivery in 5-14 business days.","generated_answer":"International shipping is available and rates are shown at checkout."}
{"query":"Can I change my subscription plan?","context":["You can upgrade or downgrade your subscription at any time from the Billing settings.","Changes take effect at the start of the next billing cycle.","Price differences are prorated for the remainder of the current cycle."],"ground_truth":"Subscription changes are made in Billing settings, take effect next billing cycle, and are prorated.","generated_answer":"Yes, you can change your plan anytime in Billing settings."}
{"query":"What happens when my free trial ends?","context":["After your 14-day free trial, your account will be downgraded to a free tier with limited features.","You can upgrade to a paid plan at any time to regain full access.","Your data is retained for 30 days after the trial ends before being archived."],"ground_truth":"After the 14-day trial, accounts downgrade to a limited free tier; data is retained for 30 days before archiving.","generated_answer":"Your free trial ends after 14 days and you will need to upgrade to continue using all features."}
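
Each record in the dataset above carries four fields: query, context, ground_truth, and generated_answer. A loader that validates those fields before any judge call is made avoids spending evaluation budget on malformed samples. The following is a minimal sketch, assuming one JSON object per line; EvalSample, isEvalSample and parseJsonl are hypothetical names, not the tutorial's dataset manager.

```typescript
// Field names mirror the sample records; the type name itself is hypothetical.
interface EvalSample {
  query: string;
  context: string[];
  ground_truth: string;
  generated_answer: string;
}

// Type guard that checks every expected field and its type.
function isEvalSample(value: unknown): value is EvalSample {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.query === "string" &&
    Array.isArray(v.context) &&
    v.context.every((c) => typeof c === "string") &&
    typeof v.ground_truth === "string" &&
    typeof v.generated_answer === "string"
  );
}

// Parses JSONL text, skipping blank lines and reporting the 1-based
// line number of the first invalid record.
function parseJsonl(text: string): EvalSample[] {
  const samples: EvalSample[] = [];
  text.split(/\r?\n/).forEach((line, index) => {
    if (line.trim() === "") return;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      throw new Error(`Line ${index + 1}: invalid JSON`);
    }
    if (!isEvalSample(parsed)) {
      throw new Error(`Line ${index + 1}: missing or mistyped field`);
    }
    samples.push(parsed);
  });
  return samples;
}
```

Failing on the first bad line, rather than silently skipping it, keeps the sample count in a quality report honest.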
