# Mistral AI Lead Intake for Clio Legal Client Onboarding
> Capture new client leads via web chat and documents, detect duplicates with hybrid-RAG, and automatically create contacts and matters in Clio through its REST API.
## What it does
A Mistral-powered intake pipeline for legal leads, in four stages:
1. **Conversational intake** — a chat interface powered by `@mistralai/mistralai` that asks structured questions and extracts lead fields (name, email, case type, description).
2. **Hybrid-RAG deduplication** — incoming leads are compared against previously indexed leads using hybrid vector + BM25 retrieval to flag potential duplicates.
3. **Document ingestion** — upload PDFs or images; text is extracted via `unpdf` or `tesseract.js` OCR, chunked, and analyzed for lead fields.
4. **Clio sync** — validated leads are pushed to Clio as contacts and matters via the Clio REST API, authenticated with OAuth2.
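Step 2 does not say how the vector and BM25 rankings are combined. One common choice is reciprocal rank fusion (RRF); the sketch below assumes that technique and is not taken from the `@reaatech/*` packages. The names `reciprocalRankFusion` and `RankedHit`, the lead IDs, and the constant `k = 60` are all illustrative.

```typescript
// Hypothetical sketch: fuse two ranked result lists with reciprocal rank fusion (RRF).
// This is an assumed technique, not the project's confirmed implementation.

interface RankedHit {
  leadId: string;
  score: number; // fused RRF score, higher is better
}

/**
 * Combine ranked lists of lead IDs (best match first) into one ranking.
 * Each list contributes 1 / (k + rank) per lead, with rank starting at 1.
 */
function reciprocalRankFusion(rankings: string[][], k = 60): RankedHit[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((leadId, index) => {
      const contribution = 1 / (k + index + 1);
      scores.set(leadId, (scores.get(leadId) ?? 0) + contribution);
    });
  }
  return [...scores.entries()]
    .map(([leadId, score]) => ({ leadId, score }))
    .sort((a, b) => b.score - a.score);
}

// Made-up IDs standing in for a vector search result and a BM25 search result.
const vectorHits = ["lead-7", "lead-2", "lead-9"];
const bm25Hits = ["lead-2", "lead-4", "lead-7"];
const fused = reciprocalRankFusion([vectorHits, bm25Hits]);
console.log(fused.map((hit) => hit.leadId));
```

Leads that rank well in both lists rise to the top, so a candidate duplicate has to match on meaning and on exact terms (names, email addresses) to score highly. A threshold on the fused score, or on the underlying similarity, would then decide whether to flag the lead.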
## Architecture
```
┌─────────────┐     ┌──────────────┐     ┌────────────────┐
│ Next.js UI  │────▶│ API Routes   │────▶│ Service Layer  │
│ (page.tsx)  │     │ /chat        │     │ MistralChat    │
│             │     │ /upload      │     │ DedupService   │
│             │     │ /clio/*      │     │ Ingestion      │
│             │     │ /dedup       │     │ ClioService    │
└─────────────┘     └──────────────┘     └───────┬────────┘
                                                 │
                                    ┌────────────┴────────────┐
                                    │    @reaatech/* Stack    │
                                    │  (hybrid-rag, embedding,│
                                    │   ingestion, retrieval) │
                                    └────────────┬────────────┘
                                                 │
                                    ┌────────────┴────────────┐
                                    │    Langfuse (tracing)   │
                                    └─────────────────────────┘
```
## Prerequisites
- **Node.js** >= 22
- **pnpm** (see `packageManager` in `package.json`)
- API keys for: Mistral AI, OpenAI, Langfuse, Qdrant, Clio (OAuth2)
## Quick start
```bash
cp .env.example .env
```
Populate the following in `.env`:
| Variable | Description |
|---|---|
| `MISTRAL_API_KEY` | Mistral AI API key |
| `OPENAI_API_KEY` | OpenAI API key (for embeddings) |
| `LANGFUSE_PUBLIC_KEY` | Langfuse public key |
| `LANGFUSE_SECRET_KEY` | Langfuse secret key |
| `LANGFUSE_HOST` | Langfuse host URL |
| `QDRANT_URL` | Qdrant vector database URL |
| `QDRANT_COLLECTION_NAME` | Qdrant collection name (default: `leads`) |
| `CLIO_CLIENT_ID` | Clio OAuth2 client ID |
| `CLIO_CLIENT_SECRET` | Clio OAuth2 client secret |
| `CLIO_REDIRECT_URI` | Clio OAuth2 redirect URI (`http://localhost:3000/api/clio/callback`) |
```bash
pnpm install
pnpm dev
```
Open [http://localhost:3000](http://localhost:3000).
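A startup check can catch a missing key before the first request fails. The sketch below assumes every variable in the table is required except `QDRANT_COLLECTION_NAME`, which falls back to `leads` as the table states. `loadConfig`, `REQUIRED_VARS`, and `AppConfig` are hypothetical names, not part of this repository.

```typescript
// Hypothetical sketch: validate the documented environment variables at startup.
// Treating every listed variable as required is an assumption.

const REQUIRED_VARS = [
  "MISTRAL_API_KEY",
  "OPENAI_API_KEY",
  "LANGFUSE_PUBLIC_KEY",
  "LANGFUSE_SECRET_KEY",
  "LANGFUSE_HOST",
  "QDRANT_URL",
  "CLIO_CLIENT_ID",
  "CLIO_CLIENT_SECRET",
  "CLIO_REDIRECT_URI",
] as const;

type RequiredVar = (typeof REQUIRED_VARS)[number];
type AppConfig = Record<RequiredVar, string> & { QDRANT_COLLECTION_NAME: string };

/** Return a typed config object, or throw with the names of all missing variables. */
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  // The collection name is optional and defaults to "leads".
  const config = {
    QDRANT_COLLECTION_NAME: env["QDRANT_COLLECTION_NAME"] || "leads",
  } as AppConfig;
  for (const name of REQUIRED_VARS) {
    config[name] = env[name] as string;
  }
  return config;
}
```

In a Node.js process this would be called once as `loadConfig(process.env)`, so that a misconfigured `.env` fails fast with a single message listing every missing name.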
## API routes
| Method | Path | Purpose |
|---|---|---|
| `POST` | `/api/chat` | Send a chat message; returns assistant response, lead data, and dedup result |
| `GET` | `/api/chat` | Health check |
| `POST` | `/api/upload` | Upload a document (PDF/image); returns extracted text, chunks, lead fields, and dedup result |
| `POST` | `/api/dedup` | Standalone duplicate check; returns `{ isDuplicate, matchedLeadId, similarityScore, matchedChunks }` |
| `POST` | `/api/clio/auth` | Generate Clio OAuth2 authorization URL; returns `{ authUrl }` |
| `GET` | `/api/clio/callback` | Clio OAuth2 callback; exchanges code for token; returns `{ token }` |
| `POST` | `/api/clio/push-lead` | Push a validated lead to Clio as a contact + matter; requires `Authorization: Bearer <token>` |
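As a rough illustration of how a client might chain these routes, the sketch below checks a lead with `/api/dedup` and pushes it to Clio only when no duplicate is reported. Only the dedup response keys and the `Authorization: Bearer <token>` header come from the table. The request body shape, the field types in `DedupResult`, and the helper names (`isDedupResult`, `buildPushLeadRequest`, `pushIfNew`) are assumptions.

```typescript
// Hypothetical client helpers for the routes listed above.
// Request bodies and field types are assumed, not documented by the project.

interface DedupResult {
  isDuplicate: boolean;
  matchedLeadId: string | null; // assumed to be null when nothing matches
  similarityScore: number;
  matchedChunks: unknown[];
}

/** Runtime check that a parsed JSON value carries the documented dedup keys. */
function isDedupResult(value: unknown): value is DedupResult {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v["isDuplicate"] === "boolean" &&
    (typeof v["matchedLeadId"] === "string" || v["matchedLeadId"] === null) &&
    typeof v["similarityScore"] === "number" &&
    Array.isArray(v["matchedChunks"])
  );
}

/** Build fetch options for POST /api/clio/push-lead with the required Bearer token. */
function buildPushLeadRequest(token: string, lead: Record<string, unknown>) {
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${token}`,
    },
    body: JSON.stringify(lead),
  };
}

/** Check a lead for duplicates, then push it to Clio only if none is reported. */
async function pushIfNew(
  baseUrl: string,
  token: string,
  lead: Record<string, unknown>,
): Promise<boolean> {
  const dedupRes = await fetch(`${baseUrl}/api/dedup`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(lead),
  });
  const dedup: unknown = await dedupRes.json();
  if (!isDedupResult(dedup)) throw new Error("Unexpected /api/dedup response");
  if (dedup.isDuplicate) return false; // leave the decision to a human reviewer
  const pushRes = await fetch(
    `${baseUrl}/api/clio/push-lead`,
    buildPushLeadRequest(token, lead),
  );
  return pushRes.ok;
}
```

A call such as `pushIfNew("http://localhost:3000", token, lead)` would use the token obtained from the `/api/clio/callback` step. Validating the dedup response at runtime keeps a changed or malformed payload from being silently treated as "not a duplicate".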
## Testing
```bash
pnpm test
```
Runs Vitest with coverage reporting.
## License
MIT — see [LICENSE](./LICENSE).