A paralegal at a small plaintiff litigation firm spends countless hours reading through deposition transcripts to extract key facts, contradictions, and important testimony for trial prep. This manual summarization is not billable and often delays case strategy meetings. The paralegal feels overwhelmed by the volume of transcripts and fears missing critical details. They need a tool that can automatically generate accurate, organized summaries with citations to the original transcript.
This recipe builds a Deposition Prep Summarizer — a full-stack Next.js app with a CLI that ingests deposition transcript PDFs, extracts and chunks the text, stores it in semantic memory, plans the context window, generates an AI summary via OpenAI or Anthropic, tracks costs, and supports evaluation and session replay. If you’re a paralegal at a small plaintiff litigation firm, this tool turns hours of deposition reading into structured, citable summaries in minutes.
You’ll wire up six REAA (Reaatech Experimental Agent Architecture) packages — hybrid-rag, agent-memory, context-window-planner, llm-cost-telemetry, agent-eval-harness-suite, and agent-replay — with the Vercel AI SDK and Next.js 16 App Router. The architecture is provider-agnostic: you can flip between OpenAI and Anthropic with an environment variable.
Prerequisites
Node.js >= 22 and pnpm 10 installed
An OpenAI API key (for the default provider and for memory embeddings)
An Anthropic API key (optional — used when LLM_PROVIDER=anthropic)
Basic familiarity with TypeScript, Next.js App Router, and the vitest test runner
The project scaffold already exists — package.json, tsconfig.json, next.config.ts, and config files are in place. You only need to add the feature code.
Step 1: Configure environment variables
Copy .env.example to .env and fill in your API keys. The file lists every variable the application reads at runtime.
env
# .env
NODE_ENV=development
OPENAI_API_KEY=<your-openai-key>
ANTHROPIC_API_KEY=<your-anthropic-key>
LLM_PROVIDER=openai
LLM_MODEL=gpt-5.2
COST_BUDGET_DAILY=10.00
TELEMETRY_ENABLED=false
Expected output: LLM_PROVIDER and LLM_MODEL control which AI SDK provider is dynamically imported (defaulting to OpenAI with gpt-5.2). COST_BUDGET_DAILY caps daily spend in the cost tracker. OPENAI_API_KEY powers both the LLM calls and memory embeddings. ANTHROPIC_API_KEY is needed when you switch the provider. TELEMETRY_ENABLED is reserved for gating cost-span recording in a future iteration.
Step 2: Define the domain types
Create the core interfaces that model a deposition and its summary. These types flow through the entire pipeline.
src/types/index.ts — re-exports everything plus the LLMProvider union:
ts
export type {
  DepositionMetadata,
  Contradiction,
  ImportantTestimony,
  DepositionSummary,
  SummaryConfig,
} from "./deposition.js";

export type LLMProvider = "openai" | "anthropic";
Expected output: Five interfaces and a union type, exported from src/types/index.ts. Every downstream service imports from this barrel.
Step 3: Create Zod validation schemas
The API boundary validates incoming requests with Zod. Schemas also provide defaults for the summarization configuration.
src/validation/schemas.ts:
ts
import { z } from "zod";

export const SummarizeRequestSchema = z.object({
  caseId: z.string().min(1),
  deponentName: z.string().min(1),
});

export type SummarizeRequest = z.infer<typeof SummarizeRequestSchema>;

export const SummaryConfigSchema = z.object({
  maxTokens: z.number().default(2048),
  temperature: z.number().default(0.3),
  includeTimeline: z.boolean().default(true),
  includeContradictions: z.boolean().default(true),
});

export type SummaryConfigParsed = z.infer<typeof SummaryConfigSchema>;
Expected output: Two Zod schemas and their inferred types. SummarizeRequestSchema validates caseId and deponentName at the API boundary. SummaryConfigSchema provides sensible defaults you can override.
Step 4: Set up constants
Pull default values into a single module so every service references the same numbers.
src/constants.ts:
ts
import { SummaryConfigSchema } from "./validation/schemas.js";
import type { SummaryConfig } from "./types/deposition.js";

export const DEFAULT_SUMMARY_CONFIG: SummaryConfig = SummaryConfigSchema.parse({});
export const DEPOSITION_CHUNK_SIZE = 1024;
export const DEPOSITION_CHUNK_OVERLAP = 100;
export const MAX_RETRIEVAL_MEMORIES = 20;
Expected output: Four constants. DEFAULT_SUMMARY_CONFIG is the parsed default (2048 max tokens, 0.3 temperature, both booleans true). The chunking constants drive chunkText() in the PDF ingestion service.
Step 5: Build the PDF ingestion service
This service extracts text from a deposition PDF using unpdf, maps PDF metadata to DepositionMetadata, and provides a custom chunking function that splits long text into Chunk[] objects compatible with @reaatech/hybrid-rag.
Expected output: ingestDepositionPdf(buffer) returns { metadata, rawText, totalPages }. chunkText(text, 1024, 100) splits a long transcript into sliding-window chunks of up to 1024 characters with 100-character overlap, estimating token counts as Math.ceil(length / 4). On a corrupt PDF the function throws IngestionError.
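The sliding-window behavior described above can be sketched as follows. This is a minimal illustration, not the recipe's actual chunkText: the SketchChunk shape and the name chunkTextSketch are hypothetical, and the real Chunk type comes from @reaatech/hybrid-rag. Only the window size, the overlap, and the Math.ceil(length / 4) token estimate are taken from the description.

```typescript
// Hypothetical chunk shape for this sketch only; the real Chunk type
// is defined by @reaatech/hybrid-rag and may carry different fields.
interface SketchChunk {
  index: number;
  content: string;
  startOffset: number;
  estimatedTokens: number;
}

// Splits text into windows of at most `chunkSize` characters, where each
// window starts `chunkSize - overlap` characters after the previous one.
function chunkTextSketch(
  text: string,
  chunkSize: number,
  overlap: number,
): SketchChunk[] {
  if (chunkSize <= 0) {
    throw new RangeError("chunkSize must be positive");
  }
  if (overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("overlap must be in [0, chunkSize)");
  }

  const chunks: SketchChunk[] = [];
  const step = chunkSize - overlap;

  for (let start = 0; start < text.length; start += step) {
    const content = text.slice(start, start + chunkSize);
    chunks.push({
      index: chunks.length,
      content,
      startOffset: start,
      // Rough estimate: about four characters per token.
      estimatedTokens: Math.ceil(content.length / 4),
    });
    // Stop once this window has reached the end of the text,
    // so no trailing chunk consists only of overlap.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

With a 1024-character window and 100-character overlap, each chunk repeats the last 100 characters of the previous one, which reduces the chance that a question and its answer are split across a chunk boundary.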
Step 6: Build the deposition memory service
Wraps @reaatech/agent-memory’s AgentMemory class to store deposition chunks and retrieve relevant memories for summarization context.
src/services/deposition-memory.ts:
ts
import {
  AgentMemory,
  OpenAILLMProvider,
  MemoryType,
  type Memory,
} from "@reaatech/agent-memory";
import type { ConversationTurn } from "@reaatech/agent-memory-core";
import type { Chunk } from "@reaatech/hybrid-rag";
import { MAX_RETRIEVAL_MEMORIES } from "../constants.js";

export class DepositionMemory {
  private agentMemory: AgentMemory;

  constructor(config: { apiKey: string; model?: string }) {
    const model = config.model ?? "gpt-4o-mini";
    const llmProvider = new OpenAILLMProvider({ apiKey: config.apiKey, model });
    this.agentMemory = new AgentMemory({
      storage: { provider: "memory" },
      embedding: {
        provider: "openai",
        model: "text-embedding-3-small",
        apiKey: config.apiKey,
      },
      extraction: {
        llmProvider,
        enabledTypes: [MemoryType.FACT],
        batchSize: 10,
        confidenceThreshold: 0.7,
      },
    });
  }

  async storeChunks(chunks: Chunk[]): Promise<number> {
    const turns: ConversationTurn[] = chunks.map((chunk) => ({
      speaker: "user" as const,
      content: chunk.content,
      timestamp: new Date(),
    }));
    const stored = await this.agentMemory.extractAndStore(turns);
    return stored.length;
  }

  async queryMemory(query: string, limit?: number): Promise<Memory[]> {
    return this.agentMemory.retrieve(query, {
      limit: limit ?? MAX_RETRIEVAL_MEMORIES,
    });
  }

  async runMaintenance(): Promise<void> {
    await this.agentMemory.runMaintenance();
  }

  async close(): Promise<void> {
    await this.agentMemory.close();
  }
}

export function createDepositionMemory(): DepositionMemory {
  const apiKey = process.env.OPENAI_API_KEY;
  if (!apiKey) {
    throw new Error("OPENAI_API_KEY environment variable is required");
  }
  return new DepositionMemory({ apiKey });
}
Expected output: DepositionMemory wraps AgentMemory with in-memory storage, OpenAI embeddings (text-embedding-3-small), and FACT extraction at 0.7 confidence. storeChunks maps chunks to ConversationTurn[] and calls extractAndStore. queryMemory calls retrieve with a default limit of 20. The createDepositionMemory() factory reads OPENAI_API_KEY from the environment.
Step 7: Build the context planner
Uses @reaatech/context-window-planner to fit deposition chunks and a system prompt into a token budget, dropping low-priority chunks when the window is too small.
Expected output: DepositionContextPlanner builds a planner with an 8000-token budget (configurable), 500 reserved tokens for overhead, and a priority-greedy packing strategy. planDepositionContext assigns decreasing relevance scores to later chunks so earlier testimony is prioritized when the budget is tight. On overflow, pack() returns an included list and a dropped list; the summarizer uses only included.
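To make the priority-greedy idea concrete, here is a small self-contained sketch. It does not use the @reaatech/context-window-planner API; the PlannerItem and PackSketchResult shapes, the packGreedy and scoreByPosition names, and the linear scoring formula are all assumptions made for illustration. Only the 8000-token budget, the 500 reserved tokens, and the included/dropped split come from the description above.

```typescript
// Hypothetical shapes for this sketch; the real planner's types may differ.
interface PlannerItem {
  id: string;
  tokens: number;
  priority: number;
}

interface PackSketchResult {
  included: PlannerItem[];
  dropped: PlannerItem[];
  usedTokens: number;
}

// Greedy packing: visit items from highest to lowest priority and keep
// each one that still fits in the budget left after the reserved overhead.
function packGreedy(
  items: PlannerItem[],
  budget: number,
  reserved: number,
): PackSketchResult {
  const available = Math.max(0, budget - reserved);
  const ordered = [...items].sort((a, b) => b.priority - a.priority);

  const included: PlannerItem[] = [];
  const dropped: PlannerItem[] = [];
  let usedTokens = 0;

  for (const item of ordered) {
    if (usedTokens + item.tokens <= available) {
      included.push(item);
      usedTokens += item.tokens;
    } else {
      dropped.push(item);
    }
  }
  return { included, dropped, usedTokens };
}

// Assumed scoring rule: relevance falls linearly with chunk position,
// so earlier testimony outranks later testimony.
function scoreByPosition(count: number): number[] {
  return Array.from({ length: count }, (_, i) => 1 - i / Math.max(1, count));
}
```

For example, with a 300-token system prompt at the highest priority and four 3000-token chunks scored 1, 0.75, 0.5, and 0.25, an 8000-token budget with 500 reserved leaves 7500 usable tokens: the prompt and the first two chunks are included (6300 tokens) and the last two are dropped.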
Step 8: Build the provider-agnostic LLM client
This is the bridge between the application and the Vercel AI SDK. It dynamically imports the correct provider based on the LLM_PROVIDER environment variable.
src/services/llm-client.ts:
ts
import { generateText } from "ai";
import type { SummaryConfig } from "../types/deposition.js";

export class SummarizationError extends Error {
  constructor(message: string, public cause?: unknown) {
    super(message);
    this.name = "SummarizationError";
  }
}

export async function generateSummary(
  prompt: string,
  config: SummaryConfig,
): Promise<{ content: string; inputTokens: number; outputTokens: number }> {
  const provider = process.env.LLM_PROVIDER ?? "openai";
  const modelId =
    process.env.LLM_MODEL ??
    (provider === "openai" ? "gpt-5.2" : "claude-sonnet-4-6");

  async function getModel() {
    if (provider === "openai") {
      const m = await import("@ai-sdk/openai");
      return m.openai(modelId);
    }
    if (provider === "anthropic") {
      const m = await import("@ai-sdk/anthropic");
      return m.anthropic(modelId);
    }
    throw new SummarizationError(`Unknown LLM provider: ${provider}`);
  }

  const model = await getModel();
  try {
    const result = await generateText({
      model,
      prompt,
      maxOutputTokens: config.maxTokens,
    });
    return {
      content: result.text,
      inputTokens: result.usage.inputTokens ?? 0,
      outputTokens: result.usage.outputTokens ?? 0,
    };
  } catch (err) {
    throw new SummarizationError("Failed to generate summary", err);
  }
}
Expected output: generateSummary(prompt, config) calls generateText from the Vercel AI SDK using whichever provider LLM_PROVIDER selects. It uses dynamic import() so the unused provider never loads. Returns the generated text plus token counts. On failure it throws SummarizationError with the original error as cause.
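The provider and model defaults can also be isolated into a pure function, which makes the environment-variable switch easy to unit test without loading either SDK. The sketch below is one possible refactoring, not part of the recipe: resolveModelSelection and ModelSelection are hypothetical names, while the default model IDs are the ones used in llm-client.ts above.

```typescript
type SketchProvider = "openai" | "anthropic";

interface ModelSelection {
  provider: SketchProvider;
  modelId: string;
}

// Resolves provider and model from an env-like record, mirroring the
// defaults in llm-client.ts: OpenAI unless LLM_PROVIDER says otherwise,
// and a per-provider default model unless LLM_MODEL is set.
function resolveModelSelection(
  env: Record<string, string | undefined>,
): ModelSelection {
  const raw = env.LLM_PROVIDER ?? "openai";
  if (raw === "openai") {
    return { provider: "openai", modelId: env.LLM_MODEL ?? "gpt-5.2" };
  }
  if (raw === "anthropic") {
    return {
      provider: "anthropic",
      modelId: env.LLM_MODEL ?? "claude-sonnet-4-6",
    };
  }
  throw new Error(`Unknown LLM provider: ${raw}`);
}
```

Calling resolveModelSelection({ LLM_PROVIDER: "anthropic" }) yields the Anthropic provider with its default model, and an empty record falls back to OpenAI with gpt-5.2.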
Step 9: Build the cost telemetry service
Tracks every LLM call’s token usage and computes cost using @reaatech/llm-cost-telemetry helpers.
Expected output: trackCall computes input/output costs from token counts using calculateCostFromTokens and stores a CostSpan. getTotalCost sums all spans, isOverBudget compares against COST_BUDGET_DAILY. Costs are rounded to 6 decimal places.
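The bookkeeping described above can be approximated without the telemetry package. The following sketch is illustrative only: it does not call calculateCostFromTokens, the SketchCostTracker and SketchCostSpan names are hypothetical, the per-million-token prices are made-up placeholders rather than real provider pricing, and treating "over budget" as strictly greater than the cap is an assumption.

```typescript
// Hypothetical shapes for this sketch; the real CostSpan type comes from
// @reaatech/llm-cost-telemetry and may differ.
interface SketchPrice {
  inputPerMillion: number;
  outputPerMillion: number;
}

interface SketchCostSpan {
  model: string;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

// Rounds to 6 decimal places, as the cost tracker description specifies.
function round6(value: number): number {
  return Math.round(value * 1e6) / 1e6;
}

class SketchCostTracker {
  private spans: SketchCostSpan[] = [];
  private dailyBudgetUsd: number;
  private prices: Record<string, SketchPrice>;

  constructor(dailyBudgetUsd: number, prices: Record<string, SketchPrice>) {
    this.dailyBudgetUsd = dailyBudgetUsd;
    this.prices = prices;
  }

  // Computes the cost of one call from its token counts and records it.
  trackCall(model: string, inputTokens: number, outputTokens: number): SketchCostSpan {
    const price = this.prices[model];
    if (!price) {
      throw new Error(`No price configured for model: ${model}`);
    }
    const costUsd = round6(
      (inputTokens / 1e6) * price.inputPerMillion +
        (outputTokens / 1e6) * price.outputPerMillion,
    );
    const span: SketchCostSpan = { model, inputTokens, outputTokens, costUsd };
    this.spans.push(span);
    return span;
  }

  getTotalCost(): number {
    return round6(this.spans.reduce((sum, span) => sum + span.costUsd, 0));
  }

  // Assumption: the budget is exceeded only when the total is strictly above it.
  isOverBudget(): boolean {
    return this.getTotalCost() > this.dailyBudgetUsd;
  }
}
```

In the recipe, the budget passed to the tracker would come from COST_BUDGET_DAILY, and the per-call cost would come from the telemetry package's own pricing data rather than a hand-written table.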
Step 10: Wire the orchestrator summarizer
This is the central pipeline class that ties PDF ingestion, chunking, memory storage, context planning, LLM summarization, and cost tracking into a single summarize() call.
src/services/summarization.ts:
ts
import type {
  SummaryConfig,
  DepositionSummary,
  Contradiction,
} from "../types/deposition.js";
import type { DepositionMemory } from "./deposition-memory.js";
import type { DepositionContextPlanner } from "./context-planner.js";
import type { DepositionCostTracker } from "./cost-telemetry.js";
import { ingestDepositionPdf, chunkText } from "./pdf-ingestion.js";
import { generateSummary } from "./llm-client.js";
import {
  DEFAULT_SUMMARY_CONFIG,
  DEPOSITION_CHUNK_SIZE,
  DEPOSITION_CHUNK_OVERLAP,
} from "../constants.js";
import type { Chunk } from "@reaatech/hybrid-rag";
import type { PackingResult } from "@reaatech/context-window-planner";
Expected output: DepositionSummarizer runs an eight-stage pipeline: ingest PDF → chunk text → store in memory → run maintenance → query memory → plan context → generate LLM summary → track cost. Every stage is wrapped in a try/catch that throws PipelineError with a named stage so you can pinpoint failures. The returned DepositionSummary includes token usage and computed cost.
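The stage-wrapping pattern can be expressed as a small helper. This is a sketch of the general technique, not the recipe's PipelineError: the SketchPipelineError class, the runStage helper, and the stage names used in the demo are all hypothetical.

```typescript
// Hypothetical error type for this sketch; the recipe's PipelineError
// may carry different fields.
class SketchPipelineError extends Error {
  readonly stage: string;
  readonly original: unknown;

  constructor(stage: string, original: unknown) {
    super(`Pipeline stage "${stage}" failed`);
    this.name = "SketchPipelineError";
    this.stage = stage;
    this.original = original;
  }
}

// Runs one stage and rethrows any failure tagged with the stage name,
// so callers can tell which step of the pipeline broke.
async function runStage<T>(stage: string, work: () => Promise<T>): Promise<T> {
  try {
    return await work();
  } catch (err) {
    throw new SketchPipelineError(stage, err);
  }
}

// Usage: each stage's result feeds the next. The stage bodies here are
// stand-ins, not the real ingestion or summarization calls.
async function demoPipeline(): Promise<string> {
  const text = await runStage("ingest", async () => "raw transcript text");
  const chunks = await runStage("chunk", async () => [text]);
  return runStage("summarize", async () => `summary of ${chunks.length} chunk(s)`);
}
```

A caller can then inspect the stage property on a caught error to report, for example, that context planning failed rather than the LLM call.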
Step 11: Build the evaluation harness and replay service
These two services give you quality measurement and debugging/traceability for the summarization pipeline.
export { SummarizationError, generateSummary } from "./llm-client.js";
export { IngestionError, ingestDepositionPdf, chunkText } from "./pdf-ingestion.js";
export type { IngestionResult } from "./pdf-ingestion.js";
export { DepositionMemory, createDepositionMemory } from "./deposition-memory.js";
export {
  DepositionContextPlanner,
  DepositionContextPlannerError,
} from "./context-planner.js";
export { DepositionCostTracker } from "./cost-telemetry.js";
export {
  PipelineError,
  DepositionSummarizer,
} from "./summarization.js";
export type { DepositionSummarizerDeps } from "./summarization.js";
export { DepositionEvalHarness } from "./evaluation.js";
export { DepositionReplayService } from "./replay-service.js";
Expected output: Two service classes plus a barrel index. DepositionEvalHarness evaluates summaries against five metrics (faithfulness, relevance, cost, latency, coherence) using a Claude judge model. DepositionReplayService records, saves, replays, and lists summarization traces on disk.
Step 12: Wire the API routes
Three App Router route handlers expose the pipeline as HTTP endpoints.
app/api/health/route.ts:
ts
import { NextResponse } from "next/server";

export function GET() {
  return NextResponse.json({
    status: "ok",
    timestamp: new Date().toISOString(),
  });
}
app/api/summarize/[id]/route.ts — stores and retrieves summaries by ID:
ts
import { type NextRequest, NextResponse } from "next/server";

export const summaryStore = new Map<string, unknown>();

export async function GET(
  _req: NextRequest,
  { params }: { params: Promise<{ id: string }> },
) {
  const { id } = await params;
  const summary = summaryStore.get(id);
  if (!summary) {
    return NextResponse.json({ error: "Not found" }, { status: 404 });
  }
  return NextResponse.json({ summary });
}
app/api/summarize/route.ts — the main POST endpoint:
Expected output: Three routes. GET /api/health returns { status: "ok", timestamp }. POST /api/summarize accepts a multipart form with a PDF file, caseId, and deponentName, runs the pipeline, stores the result, and returns { summary }. GET /api/summarize/[id] retrieves a previously created summary by its id. The [id] route uses Next 16’s async params pattern.
Step 13: Build the CLI
A command-line interface lets you run the same pipeline without the Next.js server.
Expected output: Three subcommands. summarize ./deposition.pdf --case-id CASE-1 --deponent "Jane Doe" reads the PDF, runs the pipeline, and prints JSON to stdout. replay <trace-path> replays a recorded session. list-traces lists saved traces. Missing flags print usage to stderr and exit 1.
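The flag handling for the summarize subcommand might look like the sketch below. It is a hand-rolled parser written for illustration; the recipe's CLI may use a different parsing approach, and the parseSummarizeArgs function, the ParsedSummarize type, and the exact usage string are hypothetical. The flag names (--case-id, --deponent) and the positional PDF path come from the command shown above.

```typescript
type ParsedSummarize =
  | { ok: true; pdfPath: string; caseId: string; deponent: string }
  | { ok: false; usage: string };

// Hypothetical usage text; the real CLI's message may differ.
const SUMMARIZE_USAGE =
  'Usage: summarize <pdf-path> --case-id <id> --deponent "<name>"';

// Parses the arguments that follow the `summarize` subcommand.
// The first non-flag argument is taken as the PDF path.
function parseSummarizeArgs(argv: string[]): ParsedSummarize {
  let pdfPath: string | undefined;
  let caseId: string | undefined;
  let deponent: string | undefined;

  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--case-id") {
      caseId = argv[++i]; // consume the flag's value
    } else if (arg === "--deponent") {
      deponent = argv[++i];
    } else if (pdfPath === undefined) {
      pdfPath = arg;
    }
  }

  if (!pdfPath || !caseId || !deponent) {
    return { ok: false, usage: SUMMARIZE_USAGE };
  }
  return { ok: true, pdfPath, caseId, deponent };
}
```

To match the behavior described above, the caller would write the usage string to stderr and exit with code 1 whenever ok is false, and otherwise pass the parsed values to the summarization pipeline.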
Step 14: Run the tests
The test suite covers every service and every API route, mocking external dependencies with vi.mock and vi.hoisted.
Run the full suite:
terminal
pnpm test
Or run directly:
terminal
pnpm vitest run --coverage --reporter=json --outputFile=vitest-report.json
Expected output: All 14 test files pass with zero failures, covering 50 tests. Coverage metrics (lines, branches, functions, statements) meet the 90% threshold configured in vitest.config.ts.
You can also run TypeScript type checking and linting:
terminal
pnpm typecheck
pnpm lint
Expected output: pnpm typecheck exits with zero TypeScript errors. pnpm lint exits with zero warnings.
Next steps
Add a web UI — Replace the app/page.tsx scaffold placeholder with a "use client" form that accepts a PDF upload, calls POST /api/summarize, and renders the returned DepositionSummary (key facts list, contradictions table, important testimony sections, cost display, and an error banner).
Implement findContradictions — The current stub returns an empty array. Wire it to an LLM call that parses the transcript for conflicting statements and returns structured Contradiction[] results with citations.
Add prompt version management — Store the summarization system prompt in a Phoenix prompt registry so you can version, tag, and A/B test different prompt strategies.
Deploy as a serverless function — The Next.js build produces a standalone output. Deploy to Vercel or a Docker container and set the env vars for a production instance your firm can use.