# Anthropic RAG Pipeline for SharePoint Knowledge Search
> Enable SMB teams to find answers across SharePoint document libraries using a hybrid RAG system with human escalation.
A production-grade hybrid retrieval-augmented generation (RAG) pipeline that indexes SharePoint document libraries and combines BM25 keyword search with vector similarity search, so answers are grounded in the indexed documents. Built with the `@reaatech/*` package family.
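To illustrate what combining a BM25 ranking with a vector ranking can look like, here is a minimal sketch using reciprocal rank fusion (RRF). This is an assumption for illustration only: the README does not say which fusion method `@reaatech/hybrid-rag` uses, and `reciprocalRankFusion` is a hypothetical helper, not part of any package listed here.

```typescript
// Reciprocal rank fusion (RRF): one common way to merge several rankings.
// Assumption: the real pipeline may fuse BM25 and vector results differently.
function reciprocalRankFusion(
  rankings: string[][], // each inner array holds document IDs, best match first
  k = 60, // damping constant; 60 is a conventional choice
): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((docId, index) => {
      // Ranks are 1-based, so the top hit contributes 1 / (k + 1).
      const contribution = 1 / (k + index + 1);
      scores.set(docId, (scores.get(docId) ?? 0) + contribution);
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([docId]) => docId);
}
```

A document that ranks well in both lists rises above one that ranks first in only a single list, which is the behavior a hybrid search is usually after.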
## Prerequisites
- Node.js >= 22
- pnpm 10+
- Azure AD app registration (for SharePoint Graph API access)
- Anthropic API key
- VoyageAI API key
- Langfuse account (for observability)
## Setup
```bash
pnpm install
cp .env.example .env
# Fill in your environment variables
pnpm dev
```
## Architecture
```
User Query → POST /api/chat
  → Cache check (@reaatech/llm-cache)
  → Budget pre-check (@reaatech/agent-budget-middleware)
  → Hybrid search (LanceDB + BM25/vector)
  → Prompt building
  → Claude answer generation (@anthropic-ai/sdk)
  → Confidence routing (@reaatech/confidence-router)
      → ROUTE: Return answer with sources
      → CLARIFY: Request more details
      → FALLBACK: Slack alert via webhook

SharePoint Sync → POST /api/ingest
  → SharePoint Connector (@microsoft/microsoft-graph-client)
  → Document parsing (pdfjs-dist, mammoth)
  → Text chunking (@reaatech/hybrid-rag)
  → Embedding (VoyageAI)
  → Vector store (LanceDB)
```
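The three-way confidence routing at the end of the chat flow can be sketched as a simple threshold check. This is a hedged illustration, not the `@reaatech/confidence-router` API: `decideRoute` and `RouterThresholds` are hypothetical names, the `route` value of 0.8 mirrors the documented `CONFIDENCE_ROUTE_THRESHOLD` default, and the `clarify` value of 0.5 is an assumed placeholder.

```typescript
type RouteDecision = "ROUTE" | "CLARIFY" | "FALLBACK";

interface RouterThresholds {
  route: number; // at or above: return the answer with sources
  clarify: number; // at or above (but below route): ask the user for details
}

// route matches the README default; clarify is an assumed value.
const DEFAULT_THRESHOLDS: RouterThresholds = { route: 0.8, clarify: 0.5 };

function decideRoute(
  confidence: number,
  thresholds: RouterThresholds = DEFAULT_THRESHOLDS,
): RouteDecision {
  // Treat NaN or infinite scores as untrustworthy and escalate to a human.
  if (!Number.isFinite(confidence)) return "FALLBACK";
  if (confidence >= thresholds.route) return "ROUTE";
  if (confidence >= thresholds.clarify) return "CLARIFY";
  return "FALLBACK";
}
```

In this sketch, a `FALLBACK` decision is the point where the pipeline would post to the Slack webhook for human escalation.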
## API
### `POST /api/chat`
```json
{ "query": "What is the Q3 forecast?", "useCase": "finance", "temperature": 0.3 }
```
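A minimal sketch of how the request body above might be validated before it reaches the pipeline. The field names come from the example; everything else is an assumption: that `query` is required, that `useCase` and `temperature` are optional, and that `temperature` must fall in the 0 to 1 range Anthropic's API accepts. `parseChatRequest` is a hypothetical helper, not the route's actual code.

```typescript
interface ChatRequest {
  query: string;
  useCase?: string;
  temperature?: number;
}

// Hypothetical validator; which fields are required is an assumption.
function parseChatRequest(body: unknown): ChatRequest {
  if (typeof body !== "object" || body === null) {
    throw new Error("Request body must be a JSON object");
  }
  const { query, useCase, temperature } = body as Record<string, unknown>;

  if (typeof query !== "string" || query.trim() === "") {
    throw new Error("`query` must be a non-empty string");
  }
  const result: ChatRequest = { query: query.trim() };

  if (useCase !== undefined) {
    if (typeof useCase !== "string") {
      throw new Error("`useCase` must be a string when provided");
    }
    result.useCase = useCase;
  }

  if (temperature !== undefined) {
    if (typeof temperature !== "number" || temperature < 0 || temperature > 1) {
      throw new Error("`temperature` must be a number between 0 and 1");
    }
    result.temperature = temperature;
  }
  return result;
}
```

Rejecting malformed bodies up front keeps invalid requests from consuming cache lookups or budget.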
### `POST /api/ingest`
```json
{ "since": "2024-01-01T00:00:00Z" }
```
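The `since` field suggests an incremental sync that only re-ingests documents changed after a cutoff. A minimal sketch under that assumption follows. `parseSince` and `selectChangedItems` are hypothetical helpers. `lastModifiedDateTime` is the timestamp property Microsoft Graph exposes on drive items, but whether the connector filters on it this way is not confirmed by the README.

```typescript
interface DriveItemSummary {
  id: string;
  lastModifiedDateTime: string; // ISO 8601, as on Microsoft Graph drive items
}

// Parse and validate the `since` value from the request body.
function parseSince(since: unknown): Date {
  if (typeof since !== "string") {
    throw new Error("`since` must be an ISO 8601 string");
  }
  const parsed = new Date(since);
  if (Number.isNaN(parsed.getTime())) {
    throw new Error("`since` is not a valid timestamp");
  }
  return parsed;
}

// Keep only items modified at or after the cutoff.
function selectChangedItems(
  items: DriveItemSummary[],
  since: Date,
): DriveItemSummary[] {
  const cutoff = since.getTime();
  return items.filter(
    (item) => new Date(item.lastModifiedDateTime).getTime() >= cutoff,
  );
}
```

Comparing epoch milliseconds rather than raw strings avoids mistakes when timestamps use different UTC offsets or precision.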
## Environment Variables
See `.env.example` for all required variables.
| Variable | Description |
|----------|-------------|
| `ANTHROPIC_API_KEY` | Claude API key |
| `VOYAGE_API_KEY` | VoyageAI embedding API key |
| `AZURE_TENANT_ID` | Azure AD tenant ID |
| `AZURE_CLIENT_ID` | Azure AD app client ID |
| `AZURE_CLIENT_SECRET` | Azure AD client secret |
| `SHAREPOINT_SITE_ID` | SharePoint site ID |
| `SHAREPOINT_DRIVE_ID` | SharePoint drive/library ID |
| `SLACK_WEBHOOK_URL` | Slack incoming webhook URL |
| `LANGFUSE_SECRET_KEY` | Langfuse secret key |
| `LANCE_DB_DIR` | LanceDB storage directory |
| `CONFIDENCE_ROUTE_THRESHOLD` | Router threshold (default: 0.8) |
| `BUDGET_DAILY_LIMIT_USD` | Daily budget limit |
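
A startup check along these lines can surface missing variables early. This is a sketch under assumptions: `findMissingEnvVars` and `readRouteThreshold` are hypothetical helpers, the list treats every table entry except `CONFIDENCE_ROUTE_THRESHOLD` as required (`.env.example` is the authority), and the 0 to 1 range for the threshold is assumed.

```typescript
type Env = Record<string, string | undefined>;

// Assumption: all table entries without a documented default are required.
const REQUIRED_ENV_VARS: readonly string[] = [
  "ANTHROPIC_API_KEY",
  "VOYAGE_API_KEY",
  "AZURE_TENANT_ID",
  "AZURE_CLIENT_ID",
  "AZURE_CLIENT_SECRET",
  "SHAREPOINT_SITE_ID",
  "SHAREPOINT_DRIVE_ID",
  "SLACK_WEBHOOK_URL",
  "LANGFUSE_SECRET_KEY",
  "LANCE_DB_DIR",
  "BUDGET_DAILY_LIMIT_USD",
];

// Return the names of variables that are unset or blank.
function findMissingEnvVars(
  env: Env,
  names: readonly string[] = REQUIRED_ENV_VARS,
): string[] {
  return names.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Read the router threshold, falling back to the documented default of 0.8.
function readRouteThreshold(env: Env): number {
  const raw = env["CONFIDENCE_ROUTE_THRESHOLD"];
  if (raw === undefined || raw.trim() === "") return 0.8;
  const value = Number(raw);
  if (!Number.isFinite(value) || value < 0 || value > 1) {
    throw new Error("CONFIDENCE_ROUTE_THRESHOLD must be a number between 0 and 1");
  }
  return value;
}
```

Calling `findMissingEnvVars(process.env)` at startup and failing fast on a non-empty result avoids a half-configured server that only errors on the first request.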
## Test
```bash
pnpm test # vitest run with coverage
pnpm typecheck # TypeScript type checking
pnpm lint # ESLint
pnpm dev # Next.js dev server
```
## License
MIT — see [LICENSE](./LICENSE).