# Anthropic RAG Pipeline for SharePoint Knowledge Search
 
> Enable SMB teams to find answers across SharePoint document libraries using a hybrid RAG system with human escalation.
 
This is a production-grade hybrid retrieval-augmented generation (RAG) pipeline that indexes SharePoint document libraries and combines BM25 keyword search with vector similarity search to retrieve the passages used for each answer. It is built with the `@reaatech/*` package family.
 
## Prerequisites
 
- Node.js >= 22
- pnpm 10+
- Azure AD app registration (for SharePoint Graph API access)
- Anthropic API key
- VoyageAI API key
- Langfuse account (for observability)
 
## Setup
 
```bash
pnpm install
cp .env.example .env
# Fill in your environment variables
pnpm dev
```
 
## Architecture
 
```text
User Query → POST /api/chat
  → Cache check (@reaatech/llm-cache)
  → Budget pre-check (@reaatech/agent-budget-middleware)
  → Hybrid search (LanceDB + BM25/vector)
  → Prompt building
  → Claude answer generation (@anthropic-ai/sdk)
  → Confidence routing (@reaatech/confidence-router)
    → ROUTE: Return answer with sources
    → CLARIFY: Request more details
    → FALLBACK: Slack alert via webhook
 
SharePoint Sync → POST /api/ingest
  → SharePoint Connector (@microsoft/microsoft-graph-client)
  → Document parsing (pdfjs-dist, mammoth)
  → Text chunking (@reaatech/hybrid-rag)
  → Embedding (VoyageAI)
  → Vector store (LanceDB)
```
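
The three outcomes of the confidence-routing step can be illustrated with a small threshold function. This is a minimal sketch, not the `@reaatech/confidence-router` API: the name `decide`, the `clarify` threshold of 0.5, and the assumption that a single numeric score drives the decision are all hypothetical. Only the 0.8 route default is taken from the environment variable table in this README.

```typescript
type Decision = "ROUTE" | "CLARIFY" | "FALLBACK";

interface RoutingThresholds {
  route: number;   // at or above this score: return the answer with sources
  clarify: number; // at or above this score (but below `route`): ask for details
}

// Hypothetical defaults: 0.8 mirrors CONFIDENCE_ROUTE_THRESHOLD, 0.5 is assumed.
const DEFAULT_THRESHOLDS: RoutingThresholds = { route: 0.8, clarify: 0.5 };

function decide(
  confidence: number,
  thresholds: RoutingThresholds = DEFAULT_THRESHOLDS,
): Decision {
  if (confidence >= thresholds.route) return "ROUTE";
  if (confidence >= thresholds.clarify) return "CLARIFY";
  // Anything lower is escalated to a human, e.g. via the Slack webhook.
  return "FALLBACK";
}
```

Keeping the decision in a pure function like this makes the thresholds easy to unit test separately from the Claude call and the Slack alert.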
 
## API
 
### `POST /api/chat`
 
```json
{ "query": "What is the Q3 forecast?", "useCase": "finance", "temperature": 0.3 }
```
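
A client could assemble this request as sketched below. The helper name `buildChatRequest`, the base URL, and the treatment of `useCase` and `temperature` as optional are assumptions; the README documents only the example body above and does not describe the response shape.

```typescript
interface ChatRequestBody {
  query: string;
  useCase?: string;     // assumed optional
  temperature?: number; // assumed optional
}

interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Builds the URL and options for a POST to /api/chat without sending it.
function buildChatRequest(baseUrl: string, body: ChatRequestBody): PreparedRequest {
  const trimmedBase = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    url: `${trimmedBase}/api/chat`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}
```

With the dev server running (assumed here to listen on `http://localhost:3000`), the result can be passed straight to `fetch(url, init)`.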
 
### `POST /api/ingest`
 
```json
{ "since": "2024-01-01T00:00:00Z" }
```
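
The `since` value in the example is an ISO 8601 UTC timestamp without milliseconds. A minimal sketch of producing that body from a `Date` follows; the helper name `buildIngestBody` is hypothetical, and whether the endpoint also accepts timestamps with milliseconds is not stated in this README.

```typescript
// Serializes a Date into the { "since": ... } body shown above.
function buildIngestBody(since: Date): string {
  if (Number.isNaN(since.getTime())) {
    throw new RangeError("since must be a valid date");
  }
  // toISOString() always emits milliseconds; strip them to match the example.
  const iso = since.toISOString().replace(/\.\d{3}Z$/, "Z");
  return JSON.stringify({ since: iso });
}
```

For example, `buildIngestBody(new Date(Date.UTC(2024, 0, 1)))` yields the same JSON as the request body above.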
 
## Environment Variables
 
See `.env.example` for all required variables.
 
| Variable | Description |
|----------|-------------|
| `ANTHROPIC_API_KEY` | Claude API key |
| `VOYAGE_API_KEY` | VoyageAI embedding API key |
| `AZURE_TENANT_ID` | Azure AD tenant ID |
| `AZURE_CLIENT_ID` | Azure AD app client ID |
| `AZURE_CLIENT_SECRET` | Azure AD client secret |
| `SHAREPOINT_SITE_ID` | SharePoint site ID |
| `SHAREPOINT_DRIVE_ID` | SharePoint drive/library ID |
| `SLACK_WEBHOOK_URL` | Slack incoming webhook URL |
| `LANGFUSE_SECRET_KEY` | Langfuse secret key |
| `LANCE_DB_DIR` | LanceDB storage directory |
| `CONFIDENCE_ROUTE_THRESHOLD` | Router threshold (default: 0.8) |
| `BUDGET_DAILY_LIMIT_USD` | Daily budget limit |
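
Checking these variables once at startup tends to produce clearer errors than failing midway through a request. The sketch below is one way to do that and is not taken from the project source: the function name `readNumericSettings`, the choice to treat every variable except `CONFIDENCE_ROUTE_THRESHOLD` as required, and the 0 to 1 range check are assumptions. `.env.example` remains the authoritative list.

```typescript
// Variables treated as required in this sketch (an assumption).
const REQUIRED_VARS = [
  "ANTHROPIC_API_KEY",
  "VOYAGE_API_KEY",
  "AZURE_TENANT_ID",
  "AZURE_CLIENT_ID",
  "AZURE_CLIENT_SECRET",
  "SHAREPOINT_SITE_ID",
  "SHAREPOINT_DRIVE_ID",
  "SLACK_WEBHOOK_URL",
  "LANGFUSE_SECRET_KEY",
  "LANCE_DB_DIR",
  "BUDGET_DAILY_LIMIT_USD",
] as const;

type Env = Record<string, string | undefined>;

interface NumericSettings {
  routeThreshold: number;
  dailyBudgetUsd: number;
}

function readNumericSettings(env: Env): NumericSettings {
  // Report every missing variable at once rather than one per restart.
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }

  // Fall back to the documented default of 0.8 when the variable is unset.
  const rawThreshold = env["CONFIDENCE_ROUTE_THRESHOLD"];
  const routeThreshold = rawThreshold ? Number(rawThreshold) : 0.8;
  if (!Number.isFinite(routeThreshold) || routeThreshold < 0 || routeThreshold > 1) {
    throw new Error("CONFIDENCE_ROUTE_THRESHOLD must be a number between 0 and 1");
  }

  const dailyBudgetUsd = Number(env["BUDGET_DAILY_LIMIT_USD"]);
  if (!Number.isFinite(dailyBudgetUsd) || dailyBudgetUsd < 0) {
    throw new Error("BUDGET_DAILY_LIMIT_USD must be a non-negative number");
  }

  return { routeThreshold, dailyBudgetUsd };
}
```

In a Node.js process this would typically be called once as `readNumericSettings(process.env)` before the server starts handling requests.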
 
## Test
 
```bash
pnpm test            # vitest run with coverage
pnpm typecheck       # TypeScript type checking
pnpm lint            # ESLint
pnpm dev             # Next.js dev server
```
 
## License
 
MIT — see [LICENSE](./LICENSE).