A boutique marketing agency uses an AI agent to generate ad copy. The agent often misses the brand voice, requiring manual edits. The agency wants to capture these corrections and use them to fine-tune a smaller, cheaper model that better matches their style. They need a system that logs agent outputs, captures user feedback (accept/reject/edit), and periodically exports a clean dataset for fine-tuning. This reduces reliance on expensive API calls and improves quality over time.
In this tutorial you’ll build an agent feedback loop that logs AI-generated ad copy, captures user corrections (accept, reject, or edit), and exports a clean fine-tuning dataset. You’ll wire up six REAA (Recording, Evaluation, and Analysis for Agents) packages — trace recording, SDK interception, evaluation harnesses, and a replay CLI — into a Next.js 16 App Router project with a PostgreSQL database via Drizzle ORM. By the end, you’ll have a running API that lets a marketing team improve their AI agent over time by collecting real human feedback.
A PostgreSQL database, either running locally or reachable through a remote connection string
An OpenAI API key (the provider-agnostic wrapper uses @ai-sdk/openai by default)
A Langfuse account (free tier at langfuse.com) for LLM observability — you’ll need public key, secret key, and base URL
Familiarity with TypeScript, Next.js App Router route handlers, and basic SQL concepts
Step 1: Scaffold the project
Create a new Next.js 16 App Router project. The create-next-app CLI generates the shell — package.json, tsconfig.json, config files, and the app/ directory with a default layout and page.
Expected output: a new directory with your project name, pnpm install runs automatically, and you land inside the project folder. You’ll see files like package.json, next.config.ts, tsconfig.json, app/layout.tsx, and app/page.tsx.
Step 2: Install all dependencies
Open package.json and add the following exact-pinned dependencies and devDependencies. The project uses a provider-agnostic LLM interface (ai + @ai-sdk/openai), six REAA packages for trace recording and evaluation, Drizzle ORM for the database, Langfuse for observability, and Zod for schema validation.
Replace each <...> placeholder with your real credentials. Never commit .env.local — it’s already in .gitignore from the scaffold.
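A missing credential tends to surface later as an opaque connection or authentication error, so it can help to check the environment once at startup. The sketch below is an illustration rather than part of the recipe: missingEnvVars is a hypothetical helper, and the three Langfuse variable names are guesses based on the values listed in the prerequisites, so match them to whatever your .env.local actually uses. DATABASE_URL, OPENAI_API_KEY, and AGENT_MODEL are the names that appear elsewhere in this tutorial; treating all of them as required is also an assumption.

```typescript
// Hypothetical startup check: reports which required variables are unset.
// The LANGFUSE_* names are assumptions; adjust them to match .env.local.
const REQUIRED_ENV_VARS = [
  "DATABASE_URL",
  "OPENAI_API_KEY",
  "AGENT_MODEL",
  "LANGFUSE_PUBLIC_KEY",
  "LANGFUSE_SECRET_KEY",
  "LANGFUSE_BASE_URL",
] as const;

type Env = Record<string, string | undefined>;

function missingEnvVars(
  env: Env,
  required: readonly string[] = REQUIRED_ENV_VARS,
): string[] {
  return required.filter((name) => {
    const value = env[name];
    if (value === undefined) return true;
    const trimmed = value.trim();
    // Blank values and unreplaced "<...>" placeholders count as missing.
    return trimmed === "" || /^<.*>$/.test(trimmed);
  });
}
```

Calling missingEnvVars(process.env) at the top of a server-only module and throwing when the result is non-empty gives a single readable error instead of a failure deep inside a request.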
Step 4: Create the database schema and connection
You need four PostgreSQL tables: agent_runs (stores each LLM call), feedback (stores user accept/reject/edit decisions), fine_tune_datasets (groups of examples), and dataset_examples (individual training pairs). Drizzle ORM maps these to type-safe query builders.
Create src/services/db.ts to initialize the Postgres connection:
ts
import postgres from "postgres";

const sql = postgres(process.env.DATABASE_URL as string);

export default sql;
Create src/services/schema.ts with all four table definitions:
Create src/services/drizzle.ts to wire Drizzle to the Postgres client:
ts
import { drizzle } from "drizzle-orm/postgres-js";
import sql from "./db";
import * as schema from "./schema";

export const db = drizzle(sql, { schema });
Create src/services/errors.ts with custom error classes:
Expected output: four new files in src/services/ — db.ts, schema.ts, drizzle.ts, and errors.ts. Run pnpm typecheck to confirm the types compile.
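As a rough idea of what errors.ts might contain, the sketch below defines the three error classes that later steps import (ValidationError, NotFoundError, AgentServiceError). The class names come from this tutorial; the bodies are an assumption, namely that each class simply extends Error and sets a stable name so route handlers can branch on instanceof.

```typescript
// Assumed shape: thin Error subclasses that differ only by name.
class ValidationError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ValidationError";
    // Keeps instanceof working if the code is compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class NotFoundError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "NotFoundError";
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class AgentServiceError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "AgentServiceError";
    Object.setPrototypeOf(this, new.target.prototype);
  }
}
```

Keeping the classes this small means the HTTP status mapping lives in the route handlers, not in the errors themselves.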
Step 5: Define shared types
The recipe uses a set of TypeScript interfaces that flow through every service. These give you end-to-end type safety when passing data between the LLM provider, the recording service, the feedback manager, and the dataset exporter.
Expected output: src/services/types.ts with all interfaces. Run pnpm typecheck to confirm there are no errors.
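To make the data flow concrete, here is a partial sketch of a few of those interfaces. The field names are taken from how the services and Zod schemas in later steps use them (agentRunId, decision, correctedOutput, userNotes, and so on); the exact definitions in types.ts are an assumption, and FeedbackDecision and isFeedbackDecision are hypothetical additions for illustration.

```typescript
// The three decisions the feedback service accepts.
const FEEDBACK_DECISIONS = ["accept", "reject", "edit"] as const;
type FeedbackDecision = (typeof FEEDBACK_DECISIONS)[number];

// Assumed to mirror the body of POST /api/agent/generate.
interface AdCopyRequest {
  prompt: string;
  brandVoice?: string;
  tone?: string;
  sessionId?: string;
}

// Assumed to mirror the body of POST /api/agent/feedback.
interface FeedbackInput {
  agentRunId: string;
  decision: FeedbackDecision;
  correctedOutput?: string;
  userNotes?: string;
}

// What the feedback service returns after persisting a decision.
interface FeedbackRecord extends FeedbackInput {
  id: string;
  createdAt: Date;
}

// One training pair in the OpenAI chat format.
interface FineTuneExample {
  messages: Array<{ role: "system" | "user" | "assistant"; content: string }>;
}

// Hypothetical runtime guard for values that arrive as untyped JSON.
function isFeedbackDecision(value: unknown): value is FeedbackDecision {
  return (
    typeof value === "string" &&
    (FEEDBACK_DECISIONS as readonly string[]).includes(value)
  );
}
```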
Step 6: Create the provider-agnostic LLM module
The LLM provider layer wraps the Vercel AI SDK generateText function. By importing openai from @ai-sdk/openai, the recipe works with any OpenAI-compatible API. Swap the model string in AGENT_MODEL to point at a different provider without changing any call site.
Expected output: src/lib/provider.ts exports generate and ModelConfig. Run pnpm typecheck to verify the import of generateText from ai resolves.
Step 7: Wire up Langfuse observability
Langfuse records every LLM call as a trace with a generation span. This lets you inspect latency, token usage, and model versions on the Langfuse dashboard. The traceAgentCall function is called after each ad copy generation.
Expected output: src/lib/observability.ts connects to Langfuse at module load time. The flushObservability call ensures traces are sent before the response is returned.
Step 8: Build the recording service
The recording service wraps three REAA packages — @reaatech/agent-replay-core provides the RecordingEngine and LocalFileStorage for span-structured traces, @reaatech/agent-replay-interceptors provides OpenAIInterceptor to transparently capture SDK calls, and @reaatech/agent-replay-integrations provides state adapters for LangChain/LangGraph framework capture.
Create src/services/interceptor-manager.ts for transparent SDK recording:
ts
import {
  OpenAIInterceptor,
  InterceptorRegistry,
} from "@reaatech/agent-replay-interceptors";
import { RecordingEngine } from "@reaatech/agent-replay-core";

export class InterceptorManager {
  private registry: InterceptorRegistry;
  private interceptor: OpenAIInterceptor;
  private engine: RecordingEngine;

  constructor() {
    this.engine = new RecordingEngine();
    this.registry = new InterceptorRegistry();
    this.interceptor = new OpenAIInterceptor(this.engine);
  }

  async enable(): Promise<void> {
    this.registry.register("openai", this.interceptor);
    await this.registry.enable(["openai"]);
  }

  async disable(): Promise<void> {
    await this.registry.disable();
  }
}
Expected output: two files under src/services/. RecordingSessionManager wraps the full lifecycle of a trace — start, stop, serialize to .artrace.json, and save via LocalFileStorage. InterceptorManager lets you transparently monkey-patch the OpenAI SDK client.
Step 9: Build the agent service
The AdCopyAgent class is the core of the recipe. It constructs a system prompt from brand voice guidance, opens a recording session, calls the LLM via the provider-agnostic generate function, measures duration, captures request/response events as trace spans, persists the trace, records observability via Langfuse, and inserts a row into the agent_runs table.
Create src/services/agent-service.ts:
ts
import { type ModelConfig, generate } from "../lib/provider";
import { type RecordingSessionManager, type TraceRecorder } from "./recording-service";
import { type db as DrizzleDb } from "./drizzle";
import { agentRuns } from "./schema";
import { type AdCopyRequest, type AdCopyResult } from "./types";
import { AgentServiceError } from "./errors";

interface ChatMessage {
  role: "user" | "assistant" | "system" | "tool";
  content
Expected output: src/services/agent-service.ts depends on the provider, recording service, drizzle, schema, types, and errors modules. Run pnpm typecheck to confirm.
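The first thing the agent does, turning brand voice guidance into a system prompt, can be sketched in isolation. This is a minimal illustration under stated assumptions: buildSystemPrompt and buildMessages are hypothetical names, the prompt wording is a placeholder and not the recipe's actual text, and ChatMessage.content is assumed to be a string.

```typescript
interface AdCopyRequest {
  prompt: string;
  brandVoice?: string;
  tone?: string;
}

interface ChatMessage {
  role: "user" | "assistant" | "system" | "tool";
  content: string; // assumed type
}

// Hypothetical helper: folds optional brand guidance into one system prompt.
function buildSystemPrompt(request: AdCopyRequest): string {
  const lines = ["You are an advertising copywriter."];
  if (request.brandVoice) lines.push(`Brand voice: ${request.brandVoice}.`);
  if (request.tone) lines.push(`Tone: ${request.tone}.`);
  return lines.join(" ");
}

// The message list handed to the LLM: guidance first, then the user's brief.
function buildMessages(request: AdCopyRequest): ChatMessage[] {
  return [
    { role: "system", content: buildSystemPrompt(request) },
    { role: "user", content: request.prompt },
  ];
}
```

Keeping prompt construction in a pure function like this makes it easy to unit test without mocking the provider, the recording session, or the database.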
Step 10: Build the feedback service
The FeedbackManager records user decisions (accept, reject, or edit) on agent runs. It validates that the referenced agent run exists, that the decision is one of the three allowed values, and that edits include a corrected output.
Create src/services/feedback-service.ts:
ts
import { eq, desc, inArray } from "drizzle-orm";
import { type db as DrizzleDb } from "./drizzle";
import { feedback as feedbackTable, agentRuns } from "./schema";
import { type FeedbackInput, type FeedbackRecord } from "./types";
import { ValidationError, NotFoundError } from "./errors";

export class FeedbackManager {
  constructor(private db: typeof DrizzleDb) {}

  async recordFeedback(input: FeedbackInput): Promise<FeedbackRecord> {
    const run = await this.db
      .select()
      .from(agentRuns)
      .where(eq(agentRuns.id, input.agentRunId))
      .limit(1);

    if (run.length === 0) {
      throw new NotFoundError(`Agent run ${input.agentRunId} not found`);
    }

    if (!["accept", "reject", "edit"].includes(input.decision)) {
      throw new ValidationError(
        `Invalid decision: ${input.decision}. Must be one of: accept, reject, edit`
      );
    }

    if (input.decision === "edit" && !input.correctedOutput) {
      throw new ValidationError("correctedOutput is required for edit decisions");
    }

    const createdId = crypto.randomUUID();
    const now = new Date();

    await this.db.insert(feedbackTable).values({
      id: createdId,
      agentRunId: input.agentRunId,
      decision: input.decision,
      correctedOutput: input.correctedOutput,
      userNotes: input.userNotes,
    });

    return {
      id: createdId,
      agentRunId: input.agentRunId,
      decision: input.decision,
      correctedOutput: input.correctedOutput,
      userNotes: input.userNotes,
      createdAt: now,
    };
  }

  async getFeedbackForRun(agentRunId: string): Promise<FeedbackRecord[]> {
    const rows = await this.db
      .select()
      .from(feedbackTable)
      .where(eq(feedbackTable.agentRunId, agentRunId))
      .orderBy(desc(feedbackTable.createdAt));

    return rows.map((row) => ({
      id: row.id,
      agentRunId: row.agentRunId,
      decision: row.decision as "accept" | "reject" | "edit",
      correctedOutput: row.correctedOutput ?? undefined,
      userNotes: row.userNotes ?? undefined,
      createdAt: row.createdAt ?? new Date(),
    }));
  }

  async getAcceptedExamples(options?: { limit?: number }): Promise<
    Array<{
      prompt: string;
      output: string;
      correctedOutput?: string;
      decision: string;
    }>
  > {
    const rows = await this.db
      .select({
        prompt: agentRuns.prompt,
        output: agentRuns.output,
        correctedOutput: feedbackTable.correctedOutput,
        decision: feedbackTable.decision,
      })
      .from(feedbackTable)
      .innerJoin(agentRuns, eq(feedbackTable.agentRunId, agentRuns.id))
      .where(inArray(feedbackTable.decision, ["accept", "edit"]))
      .orderBy(desc(feedbackTable.createdAt))
      .limit(options?.limit ?? 100);

    return rows.map((row) => ({
      prompt: row.prompt,
      output: row.output ?? "",
      correctedOutput: row.correctedOutput ?? undefined,
      decision: row.decision,
    }));
  }
}
Expected output: src/services/feedback-service.ts with three methods. getAcceptedExamples is what the dataset exporter calls — it only pulls records where the user accepted the output or provided an edit.
Step 11: Build the dataset export service
The FineTuneDatasetService creates named datasets, populates them from accepted/corrected feedback records, and exports them as fine-tuning examples in the OpenAI chat format ({ "messages": [{ "role": "user", ... }, { "role": "assistant", ... }] }).
Create src/services/dataset-export-service.ts:
ts
import { eq, desc, inArray } from "drizzle-orm";
import { type db as DrizzleDb } from "./drizzle";
import {
  feedback as feedbackTable,
  agentRuns,
  fineTuneDatasets,
  datasetExamples,
} from "./schema";
import { type DatasetConfig, type DatasetRecord, type FineTuneExample } from "./types";
import { NotFoundError } from "./errors";

export class FineTuneDatasetService {
  constructor(private db: typeof DrizzleDb) {}

  async
Expected output: src/services/dataset-export-service.ts. The generateDataset method uses correctedOutput when the user edited the output, and falls back to the original output for accepted runs. Rejected runs are excluded.
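The selection rule described here (edited runs contribute the corrected text, accepted runs contribute the original output, rejected runs are dropped) can be sketched as a pure function over the rows that getAcceptedExamples returns. This is a minimal illustration, not the service's actual code: toFineTuneExample and toJsonl are hypothetical names, and skipping rows with an empty completion is an added assumption.

```typescript
interface AcceptedExample {
  prompt: string;
  output: string;
  correctedOutput?: string;
  decision: string;
}

interface FineTuneExample {
  messages: Array<{ role: "user" | "assistant"; content: string }>;
}

// Maps one feedback row to a training pair, or null if it should be skipped.
function toFineTuneExample(row: AcceptedExample): FineTuneExample | null {
  if (row.decision !== "accept" && row.decision !== "edit") return null;
  // Edits train on the human correction; accepts train on the original output.
  const completion =
    row.decision === "edit" && row.correctedOutput
      ? row.correctedOutput
      : row.output;
  // Assumption: an empty completion is not useful training data.
  if (completion.trim() === "") return null;
  return {
    messages: [
      { role: "user", content: row.prompt },
      { role: "assistant", content: completion },
    ],
  };
}

// Serializes the surviving examples as JSONL: one JSON object per line.
function toJsonl(rows: AcceptedExample[]): string {
  return rows
    .map(toFineTuneExample)
    .filter((example): example is FineTuneExample => example !== null)
    .map((example) => JSON.stringify(example))
    .join("\n");
}
```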
Step 12: Build the evaluation service
The evaluation service wraps the remaining three REAA packages — @reaatech/agent-eval-harness-golden for golden trajectory comparison, @reaatech/agent-eval-harness-suite for batch evaluation suites, and @reaatech/agent-replay-cli for programmatic trace replay.
Create src/services/evaluation-service.ts:
ts
import {
  compareAgainstGolden,
  quickCreateGolden,
} from "@reaatech/agent-eval-harness-golden";
import {
  SuiteRunner,
  parseConfig,
  createResultsAggregator,
  RunComparator,
} from "@reaatech/agent-eval-harness-suite";
import { replay } from "@reaatech/agent-replay-cli";
import { type EvaluateOptions, type EvaluationResult } from "./types";

export class AgentEvaluationService {
  createGoldenFromTrace(
    tracePath: string,
    description: string,
    tags: string[]
  ): Promise<unknown> {
    const trajectory = { id: tracePath, steps: [] };
    return Promise.resolve(quickCreateGolden(trajectory as never, description, tags));
  }

  evaluateAgainstGolden(
    golden: unknown,
    candidate: unknown,
    options?: EvaluateOptions
  ): EvaluationResult {
    const result = compareAgainstGolden(golden as never, candidate as never, {
      similarityThreshold: options?.similarityThreshold,
    });
    return {
      similarity: result.similarity,
      regressions: result.regressions.length,
      passes: result.passesThreshold,
      details: result.diffSummary,
    };
  }

  replayTrace(
    tracePath: string,
    mode: "stubbed" | "live" | "partial" | "diff"
  ): Promise<void> {
    return Promise.resolve(replay({ tracePath, mode } as never));
  }

  runSuite(configYaml: string): Promise<unknown> {
    const config = parseConfig(configYaml);
    const runner = new SuiteRunner(config as never);
    return Promise.resolve(runner.run([] as never, {} as never));
  }

  exportReport(
    runResult: unknown,
    format: "json" | "markdown"
  ): Promise<string> {
    const config = parseConfig("metrics: []\njudge_model: default\n");
    const aggregator = createResultsAggregator(config);
    return Promise.resolve(aggregator.export(runResult as never, format));
  }

  compareRuns(
    baseline: unknown,
    candidate: unknown
  ): Promise<unknown> {
    const comparator = new RunComparator();
    return Promise.resolve(comparator.compare(baseline as never, candidate as never));
  }
}
Expected output: src/services/evaluation-service.ts. The evaluateAgainstGolden method compares a candidate trace against a golden reference and returns a similarity score, regression count, pass/fail status, and a diff summary.
Step 13: Create the API routes
The recipe exposes five route handlers through the Next.js App Router. Each validates the request body with Zod, delegates to the appropriate service class, and returns structured JSON responses with proper HTTP status codes.
Create app/api/agent/generate/route.ts:
ts
import { type NextRequest, NextResponse } from "next/server";
import { z } from "zod";
import { generate } from "../../../../src/lib/provider";
import { AdCopyAgent } from "../../../../src/services/agent-service";
import { RecordingSessionManager, TraceRecorder } from "../../../../src/services/recording-service";
import { RecordingEngine } from "@reaatech/agent-replay-core";
import { db } from "../../../../src/services/drizzle";
import { traceAgentCall, flushObservability } from "../../../../src/lib/observability";
import { ValidationError } from "../../../../src/services/errors";

const generateRequestSchema = z.object({
  prompt: z.string().min(1, "Prompt is required"),
  brandVoice: z.string().optional(),
  tone: z.string().optional(),
  sessionId: z.string().optional(),
});

const recordingEngine = new RecordingEngine();
const agentSessionManager = new RecordingSessionManager();
const agentTraceRecorder = new TraceRecorder(recordingEngine);

const agent = new AdCopyAgent({
  generate,
  sessionManager: agentSessionManager,
  traceRecorder: agentTraceRecorder,
  traceAgentCall: async (...args: Parameters<typeof traceAgentCall>) => {
    traceAgentCall(...args);
    await flushObservability();
  },
  db,
});

export async function POST(req: NextRequest): Promise<NextResponse> {
  try {
    const body: unknown = await req.json();
    const parsed = generateRequestSchema.safeParse(body);
    if (!parsed.success) {
      const messages = parsed.error.issues.map((e: { message: string }) => e.message).join(", ");
      return NextResponse.json({ error: messages }, { status: 400 });
    }
    const result = await agent.generateAdCopy(parsed.data);
    return NextResponse.json({ data: result }, { status: 200 });
  } catch (error: unknown) {
    if (error instanceof ValidationError) {
      return NextResponse.json({ error: error.message }, { status: 400 });
    }
    const message = error instanceof Error ? error.message : "Internal server error";
    return NextResponse.json({ error: message }, { status: 500 });
  }
}
Create app/api/agent/feedback/route.ts:
ts
import { type NextRequest, NextResponse } from "next/server";
import { z } from "zod";
import { FeedbackManager } from "../../../../src/services/feedback-service";
import { db } from "../../../../src/services/drizzle";
import { ValidationError, NotFoundError } from "../../../../src/services/errors";

const feedbackRequestSchema = z.object({
  agentRunId: z.uuid("agentRunId must be a valid UUID"),
  decision: z.enum(["accept", "reject", "edit"]),
  correctedOutput: z.string().optional(),
  userNotes: z.string().optional(),
});

const feedbackManager = new FeedbackManager(db);

export async function POST(req: NextRequest): Promise<NextResponse> {
  try {
    const body: unknown = await req.json();
    const parsed = feedbackRequestSchema.safeParse(body);
    if (!parsed.success) {
      const messages = parsed.error.issues.map((e: { message: string }) => e.message).join(", ");
      return NextResponse.json({ error: messages }, { status: 400 });
    }
    const result = await feedbackManager.recordFeedback(parsed.data);
    return NextResponse.json({ data: result }, { status: 201 });
  } catch (error: unknown) {
    if (error instanceof ValidationError) {
      return NextResponse.json({ error: error.message }, { status: 400 });
    }
    if (error instanceof NotFoundError) {
      return NextResponse.json({ error: error.message }, { status: 404 });
    }
    const message = error instanceof Error ? error.message : "Internal server error";
    return NextResponse.json({ error: message }, { status: 500 });
  }
}
Create app/api/datasets/route.ts:
ts
import { type NextRequest, NextResponse } from "next/server";
import { z } from "zod";
import { FineTuneDatasetService } from "../../../src/services/dataset-export-service";
import { db } from "../../../src/services/drizzle";
import { ValidationError } from "../../../src/services/errors";

const createDatasetSchema = z.object({
  name: z.string().min(1, "Dataset name is required"),
  description: z.string().optional(),
  maxExamples: z.number().optional(),
  exportFormat: z.enum(["jsonl"]).optional(),
});

const datasetService = new FineTuneDatasetService(db);

export async function GET(): Promise<NextResponse> {
  try {
    const datasets = await datasetService.listDatasets();
    return NextResponse.json({ data: datasets }, { status: 200 });
  } catch {
    return NextResponse.json({ error: "Failed to list datasets" }, { status: 500 });
  }
}

export async function POST(req: NextRequest): Promise<NextResponse> {
  try {
    const body: unknown = await req.json();
    const parsed = createDatasetSchema.safeParse(body);
    if (!parsed.success) {
      const messages = parsed.error.issues.map((e: { message: string }) => e.message).join(", ");
      return NextResponse.json({ error: messages }, { status: 400 });
    }
    const result = await datasetService.createDataset(parsed.data);
    return NextResponse.json({ data: result }, { status: 201 });
  } catch (error: unknown) {
    if (error instanceof ValidationError) {
      return NextResponse.json({ error: error.message }, { status: 400 });
    }
    const message = error instanceof Error ? error.message : "Internal server error";
    return NextResponse.json({ error: message }, { status: 500 });
  }
}
Create the dynamic routes by adding the directory structure app/api/datasets/[id]/. First, app/api/datasets/[id]/export/route.ts:
ts
import { type NextRequest, NextResponse } from "next/server";
import { z } from "zod";
import { FineTuneDatasetService } from "../../../../../src/services/dataset-export-service";
import { db } from "../../../../../src/services/drizzle";
import { NotFoundError } from "../../../../../src/services/errors";

const datasetService = new FineTuneDatasetService(db);

export async function GET(
  req: NextRequest,
  { params }: { params: Promise<{ id: string }> }
): Promise<NextResponse> {
  void req;
  try {
    const { id } = await params;
    const uuidResult = z.uuid().safeParse(id);
    if (!uuidResult.success) {
      return NextResponse.json({ error: "Invalid dataset ID format" }, { status: 400 });
    }
    const result = await datasetService.exportDataset(id);
    return NextResponse.json(
      { data: result.data, recordCount: result.recordCount },
      { status: 200 }
    );
  } catch (error: unknown) {
    if (error instanceof NotFoundError) {
      return NextResponse.json({ error: error.message }, { status: 404 });
    }
    const message = error instanceof Error ? error.message : "Internal server error";
    return NextResponse.json({ error: message }, { status: 500 });
  }
}
Then app/api/datasets/[id]/generate/route.ts:
ts
import { type NextRequest, NextResponse } from "next/server";
import { z } from "zod";
import { FineTuneDatasetService } from "../../../../../src/services/dataset-export-service";
import { db } from "../../../../../src/services/drizzle";
import { NotFoundError } from "../../../../../src/services/errors";

const datasetService = new FineTuneDatasetService(db);

export async function POST(
  req: NextRequest,
  { params }: { params: Promise<{ id: string }> }
): Promise<NextResponse> {
  void req;
  try {
    const { id } = await params;
    const uuidResult = z.uuid().safeParse(id);
    if (!uuidResult.success) {
      return NextResponse.json({ error: "Invalid dataset ID format" }, { status: 400 });
    }
    const result = await datasetService.generateDataset(id);
    return NextResponse.json(
      { data: { recordCount: result.recordCount, format: result.format } },
      { status: 200 }
    );
  } catch (error: unknown) {
    if (error instanceof NotFoundError) {
      return NextResponse.json({ error: error.message }, { status: 404 });
    }
    const message = error instanceof Error ? error.message : "Internal server error";
    return NextResponse.json({ error: message }, { status: 500 });
  }
}
Expected output: five route handler files under app/api/. Note that dynamic route params in Next.js 16 are a Promise<{ id: string }> — you must await params to get the value.
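All five handlers repeat the same catch ladder, so the mapping from error class to HTTP status could be pulled into one place. The sketch below is an optional refactor, not something the recipe ships: toErrorResponse is a hypothetical helper, and the two error classes are redefined locally only so the example stands alone. The status codes match the handlers above (400 for validation, 404 for not found, 500 otherwise).

```typescript
// Local stand-ins for the classes in src/services/errors.ts.
class ValidationError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ValidationError";
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class NotFoundError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "NotFoundError";
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

interface ErrorResponse {
  status: number;
  body: { error: string };
}

// Hypothetical helper: one place to decide status code and message.
function toErrorResponse(error: unknown): ErrorResponse {
  if (error instanceof ValidationError) {
    return { status: 400, body: { error: error.message } };
  }
  if (error instanceof NotFoundError) {
    return { status: 404, body: { error: error.message } };
  }
  // Anything else is treated as a server fault, as in the handlers above.
  const message = error instanceof Error ? error.message : "Internal server error";
  return { status: 500, body: { error: message } };
}
```

Inside a route's catch block this would reduce to two lines: destructure status and body from toErrorResponse(error), then return NextResponse.json(body, { status }).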
Step 14: Create the source entry point
The src/index.ts re-exports every public class and function so other modules or external consumers can import from a single entry:
ts
export { AdCopyAgent } from "./services/agent-service";
export { FeedbackManager } from "./services/feedback-service";
export { FineTuneDatasetService } from "./services/dataset-export-service";
export { AgentEvaluationService } from "./services/evaluation-service";
export { createRecordingService, RecordingSessionManager, TraceRecorder } from "./services/recording-service";
export { InterceptorManager } from "./services/interceptor-manager";
export { ValidationError, NotFoundError, AgentServiceError } from "./services/errors";
export { generate, getModel, type ModelConfig } from "./lib/provider";
export { traceAgentCall, flushObservability } from "./lib/observability";
Expected output: src/index.ts re-exports every service class, error type, and utility function. Run pnpm typecheck to verify.
Step 15: Run the tests
The project includes a test suite with numerous test cases across service, lib, API route, and integration tests. All external dependencies (LLM provider, database, recording engine) are mocked so no live network calls are needed.
Run the full test suite with coverage:
terminal
pnpm test
Expected output: vitest runs all test files and prints a summary. The coverage threshold enforces at least 90% across all four metrics (lines, branches, functions, statements) on runtime code (src/**/*.ts and app/**/route.ts). UI files like page.tsx and layout.tsx are excluded from coverage by design.
Step 16: Try the full flow
Start the Next.js dev server:
terminal
pnpm dev
Expected output: the terminal prints a message indicating the server is ready on http://localhost:3000.
Now walk through the full agent feedback loop with curl. First, generate ad copy:
terminal
curl -X POST http://localhost:3000/api/agent/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Write a short ad for running shoes", "brandVoice": "energetic", "tone": "motivational"}'
Expected output: a JSON response with data.id, data.output containing the generated ad copy, data.modelUsed, token counts, duration, and a tracePath pointing to the .artrace.json file.
Copy the id from the response. Record feedback on that run (accept it):
Expected output: status 200 with data as an array of { messages: [{ "role": "user", "content": "..." }, { "role": "assistant", "content": "..." }] } objects — ready to use with OpenAI’s fine-tuning API.
List all datasets:
terminal
curl http://localhost:3000/api/datasets
Expected output: status 200 with data as an array containing the dataset you just created, with status: "ready".
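The same loop can be scripted instead of typed by hand. The sketch below builds a request descriptor for each of the routes from Step 13, in the order the flow uses them: generate, feedback, create dataset, populate it, export it, list. The paths and body fields come from the route handlers; ApiRequest and feedbackLoopApi are hypothetical names, and the idea that the create-dataset response exposes the new id as data.id is an assumption the code shown here does not confirm.

```typescript
// Hypothetical request descriptor; hand it to fetch or any HTTP client.
interface ApiRequest {
  method: "GET" | "POST";
  url: string;
  body?: Record<string, unknown>;
}

type Decision = "accept" | "reject" | "edit";

function feedbackLoopApi(baseUrl: string) {
  const post = (path: string, body?: Record<string, unknown>): ApiRequest => ({
    method: "POST",
    url: baseUrl + path,
    body,
  });
  const get = (path: string): ApiRequest => ({ method: "GET", url: baseUrl + path });

  return {
    // 1. Ask the agent for ad copy.
    generate: (prompt: string, brandVoice?: string, tone?: string) =>
      post("/api/agent/generate", { prompt, brandVoice, tone }),
    // 2. Record the reviewer's decision on that run.
    feedback: (agentRunId: string, decision: Decision, correctedOutput?: string) => {
      // The feedback service rejects an edit that has no corrected text.
      if (decision === "edit" && !correctedOutput) {
        throw new Error("correctedOutput is required for edit decisions");
      }
      return post("/api/agent/feedback", { agentRunId, decision, correctedOutput });
    },
    // 3. Create a named dataset, then 4. fill it from accepted and edited runs.
    createDataset: (name: string) => post("/api/datasets", { name }),
    populateDataset: (datasetId: string) => post(`/api/datasets/${datasetId}/generate`),
    // 5. Export the training examples, and 6. list every dataset.
    exportDataset: (datasetId: string) => get(`/api/datasets/${datasetId}/export`),
    listDatasets: () => get("/api/datasets"),
  };
}
```

Each descriptor maps directly onto a fetch call: pass the url, the method, a Content-Type: application/json header, and JSON.stringify(body) when a body is present. Undefined fields such as an omitted tone are dropped during serialization.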
Next steps
Swap the LLM provider: Change OPENAI_API_KEY and AGENT_MODEL to use Anthropic, Google, or a local model via a compatible endpoint — the recipe uses @ai-sdk/openai but the generate wrapper is provider-agnostic.
Add a retention policy: Implement a cron job or scheduled task that archives or deletes old traces and feedback records after a configurable number of days.
Build a web UI: Create React pages that let users view agent runs, submit feedback through a form, and trigger dataset exports with one click, using the existing API routes as the backend.
Automate evaluation: Use AgentEvaluationService to run a golden-trajectory comparison on every new agent deployment, catching regressions before they reach production.