A complete, working implementation of this recipe — downloadable as a zip or browsable file by file. Generated by our build pipeline; tested with full coverage before publishing.
You’ll build a legal research engine for small law firms that indexes documents into a Qdrant vector store using Cohere embeddings and answers natural-language legal questions with citations. When a lawyer asks “What is the statute of limitations for breach of contract?”, the system retrieves relevant case law from Qdrant and generates a cited answer through Cohere command-a-03-2025. Complex multi-step queries — like comparing rulings across jurisdictions — are automatically decomposed into sub-questions, answered independently with per-query budget checks, and synthesized into a single response. By the end you’ll have a working Next.js 15 application with two API routes (POST /api/chat and POST /api/ingest), a query router that classifies complexity, a budget engine that caps per-conversation spending at $0.50, and a full test suite.
Prerequisites
Node.js >= 22 and pnpm 10 (the package.json specifies "engines": { "node": ">=22" } and "packageManager": "pnpm@10.0.0")
A running Qdrant instance — start one with Docker: docker run -p 6333:6333 qdrant/qdrant
Familiarity with TypeScript, Next.js App Router, and basic REST API concepts
The Cohere command-a-03-2025 and embed-english-v3.0 models available on your Cohere account
Step 1: Scaffold the project
Create an empty directory and set up the project skeleton. You’ll write a package.json that pins every dependency, then add TypeScript, ESLint, Next.js, and test configuration.
Step 2: Install dependencies
Install everything with pnpm. This pulls in Next.js 15, Cohere’s TypeScript SDK, the Qdrant REST client, LlamaIndex for text chunking, Zod for validation, four REAA agent packages, and all dev tooling.
terminal
pnpm install
Expected output: pnpm resolves all packages and creates pnpm-lock.yaml and node_modules/. You should see the install complete without errors.
Step 3: Set environment variables
The application reads three environment variables. Copy the example file and fill in your real keys.
COHERE_API_KEY: your Cohere API key; required, the ingestion and retrieval modules throw if it is not set
QDRANT_URL: the HTTP endpoint of your Qdrant instance (default: http://localhost:6333)
QDRANT_API_KEY: optional; only needed if your Qdrant cluster requires authentication
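One way to fail fast on a missing variable is to validate all three in one place at startup. The sketch below is illustrative only: `readConfig` and `AppConfig` are hypothetical names that do not appear in the recipe. It mirrors the behaviour visible in the ingestion code, where `COHERE_API_KEY` and `QDRANT_URL` are required and `QDRANT_API_KEY` is optional.

```typescript
// Hypothetical helper, not part of the recipe: gathers the three variables.
type AppConfig = {
  cohereApiKey: string;
  qdrantUrl: string;
  qdrantApiKey?: string;
};

function readConfig(env: Record<string, string | undefined>): AppConfig {
  const cohereApiKey = env["COHERE_API_KEY"];
  if (!cohereApiKey) {
    throw new Error("COHERE_API_KEY environment variable is not set");
  }
  const qdrantUrl = env["QDRANT_URL"];
  if (!qdrantUrl) {
    throw new Error("QDRANT_URL environment variable is not set");
  }
  const qdrantApiKey = env["QDRANT_API_KEY"];
  // Only include the optional key when it is actually present.
  return {
    cohereApiKey,
    qdrantUrl,
    ...(qdrantApiKey ? { qdrantApiKey } : {}),
  };
}
```

In the application you would call it as `readConfig(process.env)`; taking the environment as a parameter keeps the function easy to test.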
Step 4: Create shared types and the Cohere pricing provider
All request/response shapes are validated with Zod. This module also defines CoherePricingProvider, which implements the PricingProvider interface expected by the budget engine, and InMemorySpendStore, a local adapter for tracking per-conversation spend.
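To make the two roles concrete, here is a minimal sketch of a pricing lookup and an in-memory spend store. It makes several assumptions: the class names, method names, and constructor shape are hypothetical and are not the `PricingProvider` interface the budget engine expects, and prices are passed in by the caller rather than hard-coded, because the recipe's actual rates are not shown here.

```typescript
// Hypothetical shapes: the real PricingProvider interface may differ.
type TokenPrices = { inputPerMillion: number; outputPerMillion: number };

class SketchPricingProvider {
  private readonly prices: Map<string, TokenPrices>;

  constructor(prices: Record<string, TokenPrices>) {
    this.prices = new Map(Object.entries(prices));
  }

  // Dollar cost of one call. Unknown models cost 0 in this sketch.
  estimateCost(modelId: string, inputTokens: number, outputTokens: number): number {
    const p = this.prices.get(modelId);
    if (!p) {
      return 0;
    }
    return (
      (inputTokens * p.inputPerMillion + outputTokens * p.outputPerMillion) /
      1_000_000
    );
  }
}

class SketchSpendStore {
  private readonly totals = new Map<string, number>();

  // Adds a cost to the running total for one conversation.
  add(conversationId: string, cost: number): void {
    this.totals.set(conversationId, this.total(conversationId) + cost);
  }

  // Running total in dollars; 0 for conversations never seen.
  total(conversationId: string): number {
    return this.totals.get(conversationId) ?? 0;
  }
}
```

Because the store lives in process memory, totals reset whenever the server restarts.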
Step 5: Build the Qdrant storage adapter
The QdrantStorageAdapter wraps the @qdrant/js-client-rest client and implements the MemoryStorage interface from @reaatech/agent-memory-storage. This lets the retrieval module treat Qdrant as a generic memory store: the adapter handles point CRUD, semantic vector search, metadata scanning, and health checks. The adapter is over 200 lines; the full file is in the downloadable artifact. Below are the constructor and the core search methods.
Create src/lib/qdrant-storage.ts:
ts
import {
  type MemoryStorage,
  type SearchOptions,
  type MetadataFilter,
  type MemorySearchResult,
  type BackupData,
  type BatchUpdate,
  MemoryQuery,
} from "@reaatech/agent-memory-storage";
import {
  type Memory,
  type MemoryId,
  type HealthStatus,
  MemoryType,
  MemoryImportance,
  MemorySource,
  MemoryLifecycle,
} from "@reaatech/agent-memory-core";
import { QdrantClient } from "@qdrant/js-client-rest";

function qdrantPointToMemory(
  id: string,
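To illustrate what "semantic vector search" means, the sketch below ranks stored vectors by cosine similarity with a brute-force scan in memory. It is not the adapter's code: `StoredPoint`, `ScoredPoint`, `cosineSimilarity`, and `searchTopK` are hypothetical names. Qdrant itself answers such queries with an approximate index rather than a linear scan.

```typescript
// Hypothetical in-memory stand-in for a vector collection.
type StoredPoint = { id: string; vector: number[]; text: string };
type ScoredPoint = { id: string; text: string; score: number };

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  // A zero vector has no direction, so treat its similarity as 0.
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Scores every point against the query and keeps the k best matches.
function searchTopK(points: StoredPoint[], query: number[], k: number): ScoredPoint[] {
  return points
    .map((p) => ({ id: p.id, text: p.text, score: cosineSimilarity(p.vector, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```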
Step 6: Build the document ingestion pipeline
The ingestion module takes a raw legal document string, splits it into overlapping chunks using LlamaIndex’s SentenceSplitter, generates Cohere embeddings for each chunk, and upserts the resulting vectors into the Qdrant legal-docs collection. A collection management function (ensureCollection) creates the collection on first use with 1024-dimensional cosine-distance vectors.
Create src/lib/ingestion.ts:
ts
import {
  CohereEmbeddingProvider,
  CachedEmbeddingProvider,
  InMemoryEmbeddingCache,
} from "@reaatech/agent-memory-embedding";
import { SentenceSplitter } from "llamaindex";
import { QdrantClient } from "@qdrant/js-client-rest";
import { randomUUID } from "node:crypto";
import type { DocumentChunk } from "./types.js";

function getEmbedder(): CachedEmbeddingProvider {
  const apiKey = process.env.COHERE_API_KEY;
  if (!apiKey) {
    throw new Error("COHERE_API_KEY environment variable is not set");
  }
  const base = new CohereEmbeddingProvider({
    apiKey,
    model: "embed-english-v3.0",
    dimensions: 1024,
  });
  return new CachedEmbeddingProvider(
    base,
    new InMemoryEmbeddingCache({ maxSize: 1000, ttlMs: 60000 }),
  );
}

function getQdrantClient(): QdrantClient {
  const url = process.env.QDRANT_URL;
  if (!url) {
    throw new Error("QDRANT_URL environment variable is not set");
  }
  const apiKey = process.env.QDRANT_API_KEY;
  return new QdrantClient({
    url,
    ...(apiKey ? { apiKey } : {}),
  });
}

const DEFAULT_COLLECTION = "legal-docs";

export async function ensureCollection(
  name: string = DEFAULT_COLLECTION,
): Promise<void> {
  const client = getQdrantClient();
  const collections = await client.getCollections();
  const exists = collections.collections.some((c) => c.name === name);
  if (exists) {
    return;
  }
  await client.createCollection(name, {
    vectors: { size: 1024, distance: "Cosine" },
  });
}

export async function ingestDocument(
  document: string,
): Promise<DocumentChunk[]> {
  if (document.length === 0) {
    return [];
  }
  const embedder = getEmbedder();
  const splitter = new SentenceSplitter({
    chunkSize: 512,
    chunkOverlap: 50,
  });
  const nodes = splitter.splitText(document);
  const texts: string[] = (nodes as Array<{ text?: string } | string>).map((n) => {
    if (typeof n === "string") {
      return n;
    }
    return n.text ?? JSON.stringify(n);
  });
  if (texts.length === 0) {
    return [];
  }
  const vectors = await embedder.embedBatch(texts);
  const client = getQdrantClient();
  const collectionName = DEFAULT_COLLECTION;
  const chunks: DocumentChunk[] = texts.map((text, index) => ({
    id: randomUUID(),
    text,
    metadata: { chunkIndex: index, source: "legal-document" },
  }));
  await client.upsert(collectionName, {
    points: chunks.map((chunk, index) => ({
      id: chunk.id,
      vector: vectors[index] ?? [],
      payload: {
        text: chunk.text,
        metadata: chunk.metadata,
      },
    })),
  });
  return chunks;
}
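The idea of overlapping chunks can be shown with a much simpler splitter. The sketch below slides a fixed-size window over whitespace-separated words; `chunkWords` is a hypothetical helper and only a rough stand-in for LlamaIndex's SentenceSplitter, which works on tokens and respects sentence boundaries.

```typescript
// Simplified illustration of overlapping chunks (word windows, not tokens).
function chunkWords(text: string, chunkSize: number, overlap: number): string[] {
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be non-negative and smaller than chunkSize");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  const step = chunkSize - overlap; // how far each window advances
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(" "));
    if (start + chunkSize >= words.length) {
      break; // this window already reached the end of the text
    }
  }
  return chunks;
}
```

With a chunk size of 4 and an overlap of 1, the last word of each chunk is repeated as the first word of the next, so a sentence that straddles a boundary still appears intact in at least one chunk more often than with disjoint windows.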
Step 7: Build the retrieval module
The retrieval module wires together the Cohere embedding provider, the Qdrant storage adapter, and the Cohere chat client into a single performRetrieval function. It uses MemoryRetriever from @reaatech/agent-memory-retrieval to search Qdrant for the top 5 semantically similar documents, formats them into an LLM context with ContextInjector, and sends them to Cohere command-a-03-2025 along with a legal system prompt. If no documents match, it still calls the LLM to give the lawyer an honest “no relevant precedents found” answer.
Create src/lib/retrieval.ts:
ts
import {
  MemoryRetriever,
  ContextInjector,
  RetrievalStrategy,
} from "@reaatech/agent-memory-retrieval";
import {
  CohereEmbeddingProvider,
  CachedEmbeddingProvider,
  InMemoryEmbeddingCache,
} from "@reaatech/agent-memory-embedding";
import { CohereClientV2 } from "cohere-ai";
import { QdrantClient } from "@qdrant/js-client-rest";
import { QdrantStorageAdapter } from "./qdrant-storage.js";

function getEmbedder(): CachedEmbeddingProvider {
  const apiKey = process.env.COHERE_API_KEY;
  if (!apiKey) {
    throw new
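The step of turning retrieved documents into an LLM context can be sketched without any of the REAA packages. Everything below is an assumption made for illustration: `RetrievedDoc`, `buildContext`, `buildMessages`, and the wording of the system prompt are hypothetical, and the recipe's ContextInjector may format its output differently.

```typescript
// Hypothetical shape for a retrieved document.
type RetrievedDoc = { id: string; text: string; score: number };
type ChatMessage = { role: "system" | "user"; content: string };

// Numbers each source so the model can cite it as [1], [2], and so on.
function buildContext(docs: RetrievedDoc[], maxDocs = 5): string {
  const top = docs.slice(0, maxDocs);
  if (top.length === 0) {
    return "No relevant documents were retrieved.";
  }
  return top.map((d, i) => `[${i + 1}] ${d.text.trim()}`).join("\n\n");
}

// Assembles the system prompt and the user turn for a chat request.
function buildMessages(question: string, docs: RetrievedDoc[]): ChatMessage[] {
  const system =
    "You are a legal research assistant. Answer only from the numbered " +
    "sources and cite them as [n]. If the sources do not cover the " +
    "question, say that no relevant precedents were found.";
  return [
    { role: "system", content: system },
    {
      role: "user",
      content: `Sources:\n${buildContext(docs)}\n\nQuestion: ${question}`,
    },
  ];
}
```

Passing an empty document list still produces a valid message pair, which matches the behaviour described above: the LLM is called even when nothing matches.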
Step 8: Set up query routing
The routing module uses @reaatech/agent-handoff-routing to classify each incoming query. Two agents are registered — simple-qa for short factual lookups and complex-research for multi-step jurisdictional analysis. A keyword-based complexity detector scans for markers like “compare”, “across jurisdictions”, and “trend analysis” to decide which skills are required, then the CapabilityBasedRouter picks the best agent. The route decision also handles clarification and fallback cases gracefully.
Create src/lib/routing.ts:
ts
import {
  CapabilityBasedRouter,
  AgentRegistry,
} from "@reaatech/agent-handoff-routing";
import type {
  AgentCapabilities,
  HandoffPayload,
  Specialization,
  AvailabilityStatus,
} from "@reaatech/agent-handoff";
import { randomUUID } from "node:crypto";

const registry = new AgentRegistry();

const simpleQaSkills: Specialization[] = [];
const simpleQaAvail: AvailabilityStatus = "available";

registry.register({
  agentId: "simple-qa",
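The keyword scan on its own is small enough to sketch in isolation. The version below is a simplification: `COMPLEXITY_MARKERS` and `classifyQuery` are hypothetical names, the marker list contains only the three examples named above, and the result maps straight to an agent ID instead of going through the CapabilityBasedRouter as the recipe does.

```typescript
// Markers taken from the examples in the text; the real list may be longer.
const COMPLEXITY_MARKERS = ["compare", "across jurisdictions", "trend analysis"];

type Classification = {
  agentId: "simple-qa" | "complex-research";
  matched: string[];
};

// Case-insensitive substring scan for complexity markers.
function classifyQuery(query: string): Classification {
  const normalized = query.toLowerCase();
  const matched = COMPLEXITY_MARKERS.filter((m) => normalized.includes(m));
  return {
    agentId: matched.length > 0 ? "complex-research" : "simple-qa",
    matched,
  };
}
```

Substring matching is cheap but blunt: a query that merely mentions the word "compare" is routed to the complex path even if it is a simple lookup.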
Step 9: Add the budget engine
The budget module wraps @reaatech/agent-budget-engine in three helper functions. createConversationBudget registers a spending limit for a conversation (this recipe uses $0.50), with a soft cap at 80% of the limit and a hard cap at 100%. preCheckBudget checks whether an estimated cost fits within the remaining budget before an API call is made. recordSpend logs the actual cost and token usage after each call. Together they keep multi-step queries from running up unbounded costs.
Create src/lib/budget.ts:
ts
import { BudgetController } from "@reaatech/agent-budget-engine";
import { SpendStore } from "@reaatech/agent-budget-spend-tracker";
import { CoherePricingProvider } from "./types.js";

const spendStore = new SpendStore();

export const controller = new BudgetController({
  spendTracker: spendStore,
  pricing: new CoherePricingProvider(),
});

export function createConversationBudget(
  conversationId: string,
  limitDollars: number,
): void {
  controller.defineBudget({
    scopeType: "conversation" as never,
    scopeKey: conversationId,
    limit: limitDollars,
    policy: {
      softCap: 0.8,
      hardCap: 1.0,
      autoDowngrade: [],
      disableTools: [],
    },
  });
}

export function preCheckBudget(
  conversationId: string,
  estimatedCost: number,
  modelId: string,
): { allowed: boolean; action?: string; remaining?: number } {
  const result = controller.check({
    scopeType: "conversation" as never,
    scopeKey: conversationId,
    estimatedCost,
    modelId,
    tools: [],
  });
  return {
    allowed: result.allowed,
    action: result.action,
    remaining: result.remaining,
  };
}

export function recordSpend(
  conversationId: string,
  cost: number,
  inputTokens: number,
  outputTokens: number,
  modelId: string,
): void {
  controller.record({
    requestId: crypto.randomUUID(),
    scopeType: "conversation" as never,
    scopeKey: conversationId,
    cost,
    inputTokens,
    outputTokens,
    modelId,
    provider: "cohere",
    timestamp: new Date(),
  });
}
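A pre-check against soft and hard caps might work roughly as sketched below. This is an assumption about the semantics, not the budget engine's actual logic: `checkCaps`, `CapPolicy`, `CapDecision`, and the action names "allow", "warn", and "block" are all hypothetical. The caps are treated as fractions of the limit, as in the policy above (0.8 and 1.0).

```typescript
type CapPolicy = { softCap: number; hardCap: number };
type CapDecision = {
  allowed: boolean;
  action: "allow" | "warn" | "block";
  remaining: number;
};

// Decides whether a call with the given estimated cost may proceed.
function checkCaps(
  limit: number,
  spent: number,
  estimatedCost: number,
  policy: CapPolicy,
): CapDecision {
  const projected = spent + estimatedCost;
  const remaining = Math.max(0, limit - spent);
  if (projected > limit * policy.hardCap) {
    // The call would push spending past the hard cap: refuse it.
    return { allowed: false, action: "block", remaining };
  }
  if (projected >= limit * policy.softCap) {
    // Still permitted, but the conversation is close to its limit.
    return { allowed: true, action: "warn", remaining };
  }
  return { allowed: true, action: "allow", remaining };
}
```

With a $0.50 limit, a conversation that has spent $0.45 is blocked from a call estimated at $0.10, because the projected total of $0.55 exceeds the hard cap.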
Step 10: Create the API routes
You need two route handlers under the App Router: POST /api/chat for legal Q&A and POST /api/ingest for document indexing. Create the directory structure first:
terminal
mkdir -p app/api/chat app/api/ingest
Create app/api/ingest/route.ts — this validates the incoming JSON, ensures the Qdrant collection exists, runs the ingestion pipeline, and returns the created chunks:
Create app/api/chat/route.ts — this is the main entry point. It validates the request, creates a conversation budget, routes the query, and dispatches to either simple Q&A or complex multi-step research:
ts
import { NextRequest, NextResponse } from "next/server.js";
import { ChatRequestSchema, CoherePricingProvider } from "../../../src/lib/types.js";
import { routeQuery } from "../../../src/lib/routing.js";
import {
  createConversationBudget,
  preCheckBudget,
  recordSpend,
} from "../../../src/lib/budget.js";
import { performRetrieval } from "../../../src/lib/retrieval.js";
import { CohereClientV2 } from "cohere-ai";

const pricingProvider = new CoherePricingProvider();

function extractText(response: unknown): string {
  if (
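For the complex-research path, the decomposition prompt in this recipe asks the model to return 2-3 sub-questions, each on its own line starting with a dash. Parsing that reply could look like the sketch below. `parseSubQuestions` is a hypothetical helper, and the cap of three is an assumption based on the prompt wording.

```typescript
// Extracts dash-prefixed lines from an LLM reply and drops everything else.
function parseSubQuestions(llmText: string, max = 3): string[] {
  return llmText
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.startsWith("-"))
    .map((line) => line.replace(/^-+\s*/, "")) // strip the leading dash(es)
    .filter((q) => q.length > 0)
    .slice(0, max);
}
```

Each returned sub-question can then be answered independently, with a budget pre-check before every call, and the partial answers combined into one response.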
Step 11: Add the frontend pages
Next.js App Router requires at least a root layout. Add a simple homepage that describes the two available API endpoints.
Create app/layout.tsx:
tsx
import type { Metadata } from "next";
import type { ReactNode } from "react";

export const metadata: Metadata = {
  title: "Cohere RAG Legal Research",
  description:
    "Instant case law Q&A with citations, powered by Cohere embeddings and retrieval.",
};

export default function RootLayout({
  children,
}: {
  children: ReactNode;
}): React.ReactElement {
  return (
    <html lang="en">
      <body>{children}</body>
    </html>
  );
}
Create app/page.tsx:
tsx
export default function HomePage(): React.ReactElement {
  return (
    <main>
      <h1>Cohere RAG Legal Research</h1>
      <p>
        Instant case law Q&A with citations, powered by Cohere embeddings and retrieval.
      </p>
      <ul>
        <li>POST /api/chat — send a legal query and get an answer with citations</li>
        <li>POST /api/ingest — ingest a legal document for indexing</li>
      </ul>
    </main>
  );
}
Step 12: Write and run the tests
The test suite uses Vitest with MSW (Mock Service Worker) to intercept HTTP calls to Cohere and Qdrant, plus vi.mock to stub REAA packages. This keeps tests fast and deterministic — no real API calls or running Qdrant needed.
Create tests/setup.ts — this configures MSW with handlers for the Cohere chat endpoint, the Cohere embed endpoint, and Qdrant’s collections endpoint:
ts
import { setupServer } from "msw/node";
import { http, HttpResponse } from "msw";
import { beforeAll, afterEach, afterAll, vi } from "vitest";

export const handlers = [
  http.post("https://api.cohere.com/v2/chat", () =>
    HttpResponse.json({
      id: "cmpl_test",
      message: {
        role: "assistant",
        content: [
          {
            type: "text",
            text: "Based on the retrieved documents, the statute of limitations for breach of contract is typically 4-6 years depending on jurisdiction.",
          },
        ],
      },
      usage: { input_tokens: 50, output_tokens: 30 },
    }),
  ),
  http.post("https://api.cohere.com/v1/embed", () =>
    HttpResponse.json({
      id: "embed_test",
      texts: ["legal text"],
      embeddings: [[0.1, 0.2, 0.3]],
      meta: { billed_units: { input_tokens: 10 } },
    }),
  ),
  http.get("http://localhost:6333/collections", () =>
    HttpResponse.json({
      result: { collections: [{ name: "legal-docs" }] },
      status: "ok",
      time: 0.001,
    }),
  ),
];

export const server = setupServer(...handlers);

beforeAll(() => {
  server.listen({ onUnhandledRequest: "error" });
});

afterEach(() => {
  server.resetHandlers();
  vi.unstubAllEnvs();
});

afterAll(() => {
  server.close();
});
Run the full test suite with coverage:
terminal
pnpm test
Expected output: All tests pass with coverage percentages meeting the thresholds (lines >= 90%, branches >= 89%, functions >= 90%, statements >= 90%). The JSON report is written to vitest-report.json and the text coverage table prints to the terminal.
For a quick smoke test, start the dev server and hit the endpoints with curl:
terminal
pnpm dev
In another terminal, ingest a sample document:
terminal
curl -X POST http://localhost:3000/api/ingest \
  -H "Content-Type: application/json" \
  -d '{"document": "Under the Uniform Commercial Code § 2-725, an action for breach of any contract for sale must be commenced within four years after the cause of action has accrued."}'
Expected output: HTTP 201 with a JSON body containing a chunks array — each chunk has an id, text, and metadata.
Now ask a question:
terminal
curl -X POST http://localhost:3000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"query": "What is the statute of limitations for breach of contract?", "conversationId": "conv-1"}'
Expected output: HTTP 200 with an answer string and a citations array. The answer should reference the UCC statute and the citations should include snippets from the ingested document.
Next steps
Extend the keyword-based complexity detector in routing.ts with a lightweight local classifier (e.g., a fine-tuned BERT model) for better accuracy on edge-case legal queries
Replace the ephemeral in-memory spend store with a persistent database (PostgreSQL or DynamoDB) so budget state survives server restarts
Add a /api/health endpoint that pings both the Cohere API and Qdrant, returning the HealthStatus from the storage adapter
Fragment of src/lib/qdrant-storage.ts, continuing the qdrantPointToMemory signature (truncated in the source):
ts
  payload: Record<string, unknown> | undefined,
  score?: number,
): Memory {
  return {
    id,
    tenantId: "default",
    ownerId: "default",
    content: (payload?.text as string | undefined) ?? "",
    type: MemoryType.FACT,
    source: MemorySource.SYSTEM_EVENT,
    importance: MemoryImportance.MEDIUM,
    confidence: score ?? 1.0,
    tags: [],
    lifecycle: MemoryLifecycle.ACTIVE,
    createdAt: new Date(
      (payload?.createdAt as string | undefined) ?? Date.now(),
Fragment of the query-decomposition system prompt (its enclosing code is missing from the source):
ts
"You are a legal research assistant. Break down the user's complex legal query into 2-3 specific sub-questions. Return each sub-question on a separate line, starting with a dash.",