# Cohere RAG Legal Research for SMB Law Firms
Instant case law Q&A with citations, powered by Cohere embeddings and retrieval.
## Problem
Small law firms often cannot afford the paralegal hours that manual legal research demands, so important precedents get missed and the firm's risk increases. This system indexes firm documents and public case law into Qdrant using Cohere Embed, then answers queries through retrieval-augmented generation with Cohere Command A (`command-a-03-2025`).
## Architecture
```
POST /api/ingest → Ingestion pipeline → Qdrant vector store
↓
POST /api/chat → Route query → Simple QA or Complex Research
↓
MemoryRetriever → Qdrant search → Context injection
↓
Cohere Command A → Answer with citations
```
The system uses Next.js App Router for API routes and REAA agent packages for routing, retrieval, and budget control.
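The context-injection step in the diagram can be pictured with a minimal sketch. The `RetrievedChunk` shape, the `buildPrompt` helper, and the prompt wording are all hypothetical; the project's actual prompt assembly inside the REAA packages may differ.

```typescript
// Hypothetical shape of a retrieved chunk; the field names are assumptions.
interface RetrievedChunk {
  text: string;
  source: string; // label used when citing, e.g. a case name
  score: number; // similarity score returned by the vector search
}

// Keep the highest-scoring chunks and number them so the model can cite by index.
function buildPrompt(query: string, chunks: RetrievedChunk[], maxChunks = 3): string {
  const top = chunks
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => b.score - a.score)
    .slice(0, maxChunks);
  const context = top
    .map((chunk, i) => `[${i + 1}] (${chunk.source}) ${chunk.text}`)
    .join("\n");
  return (
    "Answer using only the numbered context and cite sources by number.\n\n" +
    `Context:\n${context}\n\nQuestion: ${query}`
  );
}
```

Numbering the chunks gives the model a stable handle for each source, which makes it easier to map an answer back to the `citations` array.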
## Tech Stack
- **Framework:** Next.js 15 (App Router)
- **LLM:** Cohere Command A (`command-a-03-2025`)
- **Embeddings:** Cohere (`embed-english-v3.0`, 1024 dimensions)
- **Vector Store:** Qdrant (REST)
- **Orchestration:** `@reaatech/agent-memory-retrieval`, `@reaatech/agent-memory-embedding`, `@reaatech/agent-handoff-routing`, `@reaatech/agent-budget-engine`
- **Chunking:** LlamaIndex (`SentenceSplitter`)
- **Validation:** Zod 4
## Prerequisites
- Node.js >= 22
- pnpm 10
- A running Qdrant instance (see below)
- A Cohere API key
### Qdrant (Docker)
```bash
docker run -p 6333:6333 qdrant/qdrant
```
## Setup
```bash
pnpm install
cp .env.example .env
# Edit .env with your keys
pnpm dev
```
## Environment Variables
| Variable | Description |
|---|---|
| `COHERE_API_KEY` | Cohere API key (from https://dashboard.cohere.com/api-keys) |
| `QDRANT_URL` | Qdrant HTTP URL (default: http://localhost:6333) |
| `QDRANT_API_KEY` | Optional Qdrant API key |
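As a rough illustration of how these variables might be read, the sketch below validates the required key and applies the documented Qdrant default. `AppConfig` and `loadConfig` are hypothetical names, not part of this repository's documented API.

```typescript
// Hypothetical config shape; only the variable names come from the table above.
interface AppConfig {
  cohereApiKey: string;
  qdrantUrl: string;
  qdrantApiKey?: string;
}

// Takes the environment as a parameter so it can be exercised without touching process.env.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const cohereApiKey = (env["COHERE_API_KEY"] ?? "").trim();
  if (cohereApiKey === "") {
    throw new Error("COHERE_API_KEY is required");
  }
  const qdrantUrl = (env["QDRANT_URL"] ?? "").trim() || "http://localhost:6333";
  const qdrantApiKey = (env["QDRANT_API_KEY"] ?? "").trim() || undefined;
  return { cohereApiKey, qdrantUrl, qdrantApiKey };
}
```

In a Node.js process this would typically be called once at startup as `loadConfig(process.env)`.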
## API Endpoints
### POST /api/chat
Send a legal query and receive an answer with citations.
**Request:**
```json
{
  "query": "What is the statute of limitations for breach of contract?",
  "conversationId": "conv-1"
}
```
**Response (200):**
```json
{
  "answer": "Based on the retrieved case law...",
  "citations": ["Smith v. Jones (2020)", "Contract Law § 123"]
}
```
**Error responses:**
- `400` — Invalid request body or missing fields
- `402` — Budget exceeded (payment required)
- `500` — Internal server error (Cohere API, Qdrant, etc.)
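The `400` case can be sketched with a small hand-rolled validator. This is a stand-in for illustration only: the project lists Zod for validation, and whether `conversationId` is strictly required is an assumption here. `validateChatRequest` and the error messages are hypothetical.

```typescript
interface ChatRequest {
  query: string;
  conversationId: string;
}

type ChatValidation =
  | { ok: true; value: ChatRequest }
  | { ok: false; status: 400; error: string };

// Returns a 400-style result for malformed bodies instead of throwing.
function validateChatRequest(body: unknown): ChatValidation {
  if (typeof body !== "object" || body === null) {
    return { ok: false, status: 400, error: "Body must be a JSON object" };
  }
  const { query, conversationId } = body as Record<string, unknown>;
  if (typeof query !== "string" || query.trim() === "") {
    return { ok: false, status: 400, error: "query must be a non-empty string" };
  }
  // Assumption: conversationId is required, matching the example request.
  if (typeof conversationId !== "string" || conversationId.trim() === "") {
    return { ok: false, status: 400, error: "conversationId must be a non-empty string" };
  }
  return { ok: true, value: { query: query.trim(), conversationId } };
}
```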
### POST /api/ingest
Ingest a legal document for indexing.
**Request:**
```json
{
  "document": "The parties agree that the statute of limitations..."
}
```
**Response (201):**
```json
{
  "chunks": [
    { "id": "uuid-1", "text": "The parties agree...", "metadata": { "chunkIndex": 0, "source": "legal-document" } }
  ]
}
```
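To make the response shape concrete, here is a deliberately naive sentence-based chunker. It is not LlamaIndex's `SentenceSplitter`, which the project actually uses; `chunkDocument`, `maxChars`, and the `chunk-N` ids are hypothetical stand-ins (the example response above suggests UUID-style ids).

```typescript
interface Chunk {
  id: string;
  text: string;
  metadata: { chunkIndex: number; source: string };
}

// Split on sentence-ending punctuation, then pack sentences into chunks of at most maxChars.
// A single sentence longer than maxChars is kept whole rather than cut mid-sentence.
function chunkDocument(document: string, maxChars = 200, source = "legal-document"): Chunk[] {
  const sentences = (document.match(/[^.!?]+[.!?]*/g) ?? [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  const texts: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    const candidate = current === "" ? sentence : `${current} ${sentence}`;
    if (current !== "" && candidate.length > maxChars) {
      texts.push(current); // current chunk is full; start a new one
      current = sentence;
    } else {
      current = candidate;
    }
  }
  if (current !== "") texts.push(current);

  return texts.map((text, chunkIndex) => ({
    id: `chunk-${chunkIndex}`, // placeholder id, not a UUID
    text,
    metadata: { chunkIndex, source },
  }));
}
```

A punctuation-based split like this breaks on abbreviations such as "v." in case names, which is one reason a purpose-built splitter is preferable for legal text.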
## Cost Estimation
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| `command-a-03-2025` | $0.015 | $0.06 |
| `embed-english-v3.0` | $0.0001 | $0 |
The per-conversation budget defaults to $0.50. Before each API call, the system checks the estimated cost against the budget and blocks any request that would exceed the limit; `/api/chat` then responds with `402`.
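The arithmetic behind the pre-check can be sketched as follows, using the per-1M-token prices from the table. `estimateCost` and `canAfford` are hypothetical helpers; the real check is handled by `@reaatech/agent-budget-engine`, whose API is not shown here.

```typescript
// Prices copied from the table above, in USD per 1M tokens.
const PRICES: Record<string, { input: number; output: number }> = {
  "command-a-03-2025": { input: 0.015, output: 0.06 },
  "embed-english-v3.0": { input: 0.0001, output: 0 },
};

const DEFAULT_BUDGET_USD = 0.5;

// Estimated cost in USD for a call with the given token counts.
function estimateCost(model: string, inputTokens: number, outputTokens: number): number {
  const price = PRICES[model];
  if (price === undefined) {
    throw new Error(`Unknown model: ${model}`);
  }
  return (inputTokens * price.input + outputTokens * price.output) / 1e6;
}

// True when the next call still fits within the conversation budget.
function canAfford(spentUsd: number, nextCallUsd: number, budgetUsd = DEFAULT_BUDGET_USD): boolean {
  return spentUsd + nextCallUsd <= budgetUsd;
}
```

A route handler would call `canAfford` with the running total for the conversation and return `402` when it yields `false`.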
## Query Routing
The system automatically classifies queries:
- **Simple QA** (`simple-qa` agent): Short, factual queries like "What is a tort?" — single retrieval + answer
- **Complex Research** (`complex-research` agent): Multi-step queries like "Compare recent data privacy rulings across California and EU jurisdictions" — decomposes into sub-questions, retrieves for each, synthesizes final answer
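One simple way to picture the classification is a keyword-and-length heuristic. This is only an illustrative sketch: the actual routing lives in `@reaatech/agent-handoff-routing`, and the hint list, word threshold, and `classifyQuery` name are assumptions.

```typescript
type AgentId = "simple-qa" | "complex-research";

// Hypothetical cue words suggesting a multi-step, comparative query.
const COMPLEX_HINTS = ["compare", "across", "versus", "contrast", "analyze"];

// Route to complex research when the query contains a cue word or is long.
function classifyQuery(query: string, maxSimpleWords = 12): AgentId {
  const normalized = query.toLowerCase();
  const wordCount = normalized.split(/\s+/).filter((w) => w.length > 0).length;
  const hasHint = COMPLEX_HINTS.some((hint) => normalized.includes(hint));
  return (hasHint || wordCount > maxSimpleWords) ? "complex-research" : "simple-qa";
}
```

With these assumptions, "What is a tort?" routes to `simple-qa`, while the data privacy comparison above routes to `complex-research` because it contains "compare" and "across".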
## Development
```bash
pnpm dev # Start Next.js dev server
pnpm typecheck # Run TypeScript type checking
pnpm lint # Run ESLint
pnpm test # Run vitest with coverage
```
## Test Architecture
Tests use:
- **MSW** (Mock Service Worker) for HTTP mocking of Cohere and Qdrant endpoints
- **vi.stubEnv** for environment variable mocking
- Direct module mocking for REAA packages
- In-memory storage for Qdrant adapter tests
Run tests with: `pnpm test`
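For a sense of what in-memory storage for adapter tests can look like, here is a minimal sketch of a vector store ranked by cosine similarity. `InMemoryVectorStore` and its methods are hypothetical and do not mirror the Qdrant client API or the project's actual test doubles.

```typescript
interface StoredPoint {
  id: string;
  vector: number[];
  payload: Record<string, unknown>;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vector dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom; // treat zero vectors as dissimilar
}

class InMemoryVectorStore {
  private points = new Map<string, StoredPoint>();

  // Insert or overwrite a point by id.
  upsert(point: StoredPoint): void {
    this.points.set(point.id, point);
  }

  // Return the ids of the closest points, best match first.
  search(query: number[], limit = 3): Array<{ id: string; score: number }> {
    return Array.from(this.points.values())
      .map((p) => ({ id: p.id, score: cosineSimilarity(query, p.vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, limit);
  }
}
```

A store like this lets retrieval logic be tested deterministically with tiny hand-written vectors instead of real 1024-dimension embeddings.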
## License
MIT — see LICENSE