# OpenAI Knowledge Agent for SMB Employee Onboarding
A persistent AI memory system for employee onboarding: it ingests onboarding documents, stores them as searchable memory, and answers new hires' questions about company norms in natural language using OpenAI models. Built with the Next.js App Router and REAA's agent-memory stack.
## Architecture
```text
Client (Browser)
       │
       ▼
Next.js App Router
 ├── /api/onboarding/ingest ──> processUpload() ──> AgentMemory storage
 └── /api/onboarding/chat   ──> searchForQuestion()
                                  └─> formatAsContext()
                                        └─> askQuestion() (OpenAI)
```
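The chat path in the diagram is a three-step chain. The sketch below shows one way a Next.js App Router `POST` handler could wire it together. The `ChatPipeline` interface, the `createChatHandler` factory, the `{ question }` request body and the `{ answer, sources }` response are all hypothetical: the real signatures of `searchForQuestion()`, `formatAsContext()` and `askQuestion()` are not documented here. Injecting the stages keeps the handler testable without a database or API key.

```typescript
// Hypothetical stage signatures; the project's real functions may differ.
interface RetrievedChunk {
  source: string;
  text: string;
}

interface ChatPipeline {
  searchForQuestion(question: string): Promise<RetrievedChunk[]>;
  formatAsContext(chunks: RetrievedChunk[]): string;
  askQuestion(question: string, context: string): Promise<string>;
}

// Builds a handler with the App Router signature: (Request) => Promise<Response>.
function createChatHandler(pipeline: ChatPipeline) {
  return async function POST(request: Request): Promise<Response> {
    // Treat a malformed JSON body the same as a missing question.
    const body = (await request.json().catch(() => null)) as { question?: unknown } | null;
    const raw = body?.question;
    const question = typeof raw === "string" ? raw.trim() : "";
    if (question === "") {
      return Response.json({ error: "question is required" }, { status: 400 });
    }

    const chunks = await pipeline.searchForQuestion(question); // semantic search
    const context = pipeline.formatAsContext(chunks); // citation-ready context
    const answer = await pipeline.askQuestion(question, context); // OpenAI call
    return Response.json({ answer, sources: chunks.map((c) => c.source) });
  };
}
```

In a `route.ts` file under `app/api/onboarding/chat/`, the result would be exported as `export const POST = createChatHandler(pipeline);`.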
The knowledge engine is built on REAA's `agent-memory`, `agent-memory-core`, `agent-memory-embedding`, `agent-memory-retrieval`, and `agent-memory-storage` packages (published under the `@reaatech` scope) and works in four stages:
1. **Ingestion**: Onboarding documents (`.md`, `.txt`) are uploaded via the `/api/onboarding/ingest` endpoint, chunked into segments of ~500 tokens with overlap, and stored as `Memory` objects in a Postgres/pgvector database.
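2. **Embedding**: Text is embedded with either OpenAI (`text-embedding-3-small`, 1536 dimensions) or `fastembed` (`BAAI/bge-small-en-v1.5`, 384 dimensions), selected with `EMBEDDING_PROVIDER` and fronted by a transparent in-memory cache. Because the two providers produce vectors of different sizes, stored documents must be re-embedded after switching providers.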
3. **Retrieval**: Semantic search combines embedding similarity with recency ranking to surface the most relevant document chunks.
4. **Answering**: `ContextInjector` formats the retrieved chunks into a context block with source citations, which is injected into a `gpt-4o-mini` prompt.
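Steps 1 and 2 can be sketched in plain TypeScript. This is not the REAA packages' implementation: whitespace-separated words stand in for model tokens, the 50-word overlap is an assumed default (the README only says "with overlap"), and `chunkText`, `CachedEmbedder` and `EmbedFn` are hypothetical names. The cache stores promises rather than finished vectors, so concurrent requests for the same text share a single provider call.

```typescript
// Hypothetical sketch of ingestion steps 1-2; not the REAA packages' code.

// Split text into windows of `chunkSize` tokens where consecutive windows share
// `overlap` tokens. Whitespace-separated words stand in for model tokens.
function chunkText(text: string, chunkSize = 500, overlap = 50): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("need chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const step = chunkSize - overlap;
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(" "));
    // This window already reaches the end; another would only repeat the tail.
    if (start + chunkSize >= words.length) break;
  }
  return chunks;
}

type EmbedFn = (text: string) => Promise<number[]>;

// Transparent in-memory cache in front of any embedding provider. Keys include
// the model name so vectors from different models never mix.
class CachedEmbedder {
  private readonly cache = new Map<string, Promise<number[]>>();
  private readonly model: string;
  private readonly embed: EmbedFn;

  constructor(model: string, embed: EmbedFn) {
    this.model = model;
    this.embed = embed;
  }

  embedText(text: string): Promise<number[]> {
    const key = `${this.model}\u0000${text}`;
    let pending = this.cache.get(key);
    if (pending === undefined) {
      // Caching the promise lets concurrent callers share one provider call.
      pending = this.embed(text);
      this.cache.set(key, pending);
      // Evict failures so a transient API error is retried on the next call.
      pending.catch(() => this.cache.delete(key));
    }
    return pending;
  }
}
```

With `chunkSize = 4` and `overlap = 1`, the ten words `a` to `j` become `a b c d`, `d e f g` and `g h i j`: each chunk repeats the last word of the previous one, so a sentence cut at a boundary still appears intact in at least one chunk when the overlap is large enough.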
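Steps 3 and 4 can be illustrated the same way. In this sketch the ranking score blends cosine similarity with an exponential recency decay; the 0.2 recency weight and 90-day half-life are assumptions, and `StoredChunk`, `rankChunks`, `formatContext` and `buildMessages` are hypothetical stand-ins for whatever `agent-memory-retrieval` and `ContextInjector` actually do. Numbering each chunk gives the model a stable label to cite.

```typescript
// Hypothetical sketch of retrieval and answering; weights and fields are assumptions.
interface StoredChunk {
  source: string; // document the chunk came from, used for citations
  text: string;
  embedding: number[];
  createdAt: Date;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// score = (1 - w) * similarity + w * 0.5^(ageDays / halfLifeDays)
function rankChunks(
  query: number[],
  chunks: StoredChunk[],
  now: Date,
  topK = 5,
  recencyWeight = 0.2,
  halfLifeDays = 90,
): StoredChunk[] {
  const msPerDay = 86_400_000;
  return chunks
    .map((chunk) => {
      const ageDays = Math.max(0, (now.getTime() - chunk.createdAt.getTime()) / msPerDay);
      const recency = Math.pow(0.5, ageDays / halfLifeDays);
      const similarity = cosineSimilarity(query, chunk.embedding);
      return { chunk, score: (1 - recencyWeight) * similarity + recencyWeight * recency };
    })
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((scored) => scored.chunk);
}

// Number each chunk so the model can cite it as [1], [2], ...
function formatContext(chunks: StoredChunk[]): string {
  return chunks.map((c, i) => `[${i + 1}] (source: ${c.source})\n${c.text}`).join("\n\n");
}

function buildMessages(question: string, context: string) {
  const system =
    "Answer the new hire's question using only the numbered context. " +
    "Cite sources as [n]. If the context does not cover it, say so.\n\n" +
    context;
  return [
    { role: "system" as const, content: system },
    { role: "user" as const, content: question },
  ];
}
```

The returned array has the `role`/`content` shape that the `openai` SDK's `client.chat.completions.create({ model: "gpt-4o-mini", messages })` accepts. With equal similarity, a chunk from last week outranks one from two years ago, which is the intended effect of the recency term when handbooks are revised.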
## Setup
### Prerequisites
- Node.js >= 22
- pnpm >= 10
- PostgreSQL with pgvector extension
### Environment Variables
Copy `.env.example` to `.env` and configure:
| Variable | Description |
|----------|-------------|
| `OPENAI_API_KEY` | OpenAI API key for chat completions, and for embeddings when `EMBEDDING_PROVIDER=openai` |
| `DATABASE_URL` | PostgreSQL connection URL (alternative to the individual `DB_*` variables) |
| `DB_HOST` | Database host (default: localhost) |
| `DB_PORT` | Database port (default: 5432) |
| `DB_USER` | Database user |
| `DB_PASSWORD` | Database password |
| `DB_NAME` | Database name |
| `EMBEDDING_PROVIDER` | `openai` (default) or `fastembed` |
| `EMBEDDING_MODEL` | Model name (default: `text-embedding-3-small`) |
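The table implies a small configuration step at startup. The sketch below resolves it; the precedence of `DATABASE_URL` over the `DB_*` variables, the fastembed fallback model, and the `loadConfig` name are assumptions rather than documented behavior. Resolving the embedding dimension up front is useful because a pgvector column has a fixed size (1536 for `text-embedding-3-small`, 384 for `BAAI/bge-small-en-v1.5`).

```typescript
// Hypothetical startup config loader for the environment variables listed above.
type EmbeddingProvider = "openai" | "fastembed";

interface AppConfig {
  openaiApiKey: string;
  databaseUrl: string;
  embeddingProvider: EmbeddingProvider;
  embeddingModel: string;
  embeddingDimensions: number;
}

// Vector sizes as documented in the Architecture section.
const DIMENSIONS: Record<string, number> = {
  "text-embedding-3-small": 1536,
  "BAAI/bge-small-en-v1.5": 384,
};

// Assumption: fastembed falls back to its own model instead of the OpenAI default.
const DEFAULT_MODEL: Record<EmbeddingProvider, string> = {
  openai: "text-embedding-3-small",
  fastembed: "BAAI/bge-small-en-v1.5",
};

function loadConfig(env: Record<string, string | undefined>): AppConfig {
  // The key is needed for chat completions regardless of the embedding provider.
  const openaiApiKey = env.OPENAI_API_KEY;
  if (!openaiApiKey) throw new Error("OPENAI_API_KEY is required");

  const provider = env.EMBEDDING_PROVIDER ?? "openai";
  if (provider !== "openai" && provider !== "fastembed") {
    throw new Error(`Unsupported EMBEDDING_PROVIDER: ${provider}`);
  }
  const embeddingModel = env.EMBEDDING_MODEL ?? DEFAULT_MODEL[provider];
  const embeddingDimensions = DIMENSIONS[embeddingModel];
  if (embeddingDimensions === undefined) {
    throw new Error(`No known dimension for EMBEDDING_MODEL: ${embeddingModel}`);
  }

  // Assumption: DATABASE_URL takes precedence; otherwise build a URL from DB_*.
  let databaseUrl = env.DATABASE_URL;
  if (!databaseUrl) {
    const name = env.DB_NAME;
    if (!name) throw new Error("Set DATABASE_URL or DB_NAME");
    const host = env.DB_HOST ?? "localhost";
    const port = env.DB_PORT ?? "5432";
    const user = env.DB_USER ? encodeURIComponent(env.DB_USER) : "";
    const password = env.DB_PASSWORD ? `:${encodeURIComponent(env.DB_PASSWORD)}` : "";
    const auth = user ? `${user}${password}@` : "";
    databaseUrl = `postgres://${auth}${host}:${port}/${encodeURIComponent(name)}`;
  }

  return { openaiApiKey, databaseUrl, embeddingProvider: provider, embeddingModel, embeddingDimensions };
}
```

In the app this would be called once as `loadConfig(process.env)`, failing fast on a missing key or an unknown model instead of at the first query.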
### Install & Run
```bash
pnpm install
pnpm dev # Start Next.js dev server
pnpm typecheck # TypeScript check
pnpm lint # ESLint
pnpm test # Run tests with coverage
```
## Dependencies
### REAA Packages (exact versions)
- `@reaatech/agent-memory@0.1.0`
- `@reaatech/agent-memory-core@0.1.0`
- `@reaatech/agent-memory-embedding@0.1.0`
- `@reaatech/agent-memory-retrieval@0.1.0`
- `@reaatech/agent-memory-storage@0.1.0`
### Third-party (exact versions)
- `openai@6.37.0`
- `pgvector@0.2.1`
- `fastembed@2.1.0`
- `postgres@3.4.9`
- `next@16.2.6` · `react@19.2.6` · `react-dom@19.2.6`