# OpenAI Knowledge Agent for SMB Employee Onboarding
 
A persistent AI memory system that ingests onboarding documents, learns company norms, and answers new hires' questions in natural language, powered by OpenAI. Built with Next.js App Router and REAA's agent-memory stack.
 
## Architecture
 
```
Client (Browser)
       │
       ▼
Next.js App Router
  ├── /api/onboarding/ingest  ──>  processUpload()  ──>  AgentMemory storage
  └── /api/onboarding/chat    ──>  searchForQuestion()
                                   ├─> formatAsContext()
                                   └─> askQuestion() (OpenAI)
```
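
The chat path in the diagram (retrieve, format, ask) can be sketched roughly as below. This is a minimal illustration, not the project's code: `RetrievedChunk`, `formatChunks`, and `buildMessages` are hypothetical names, and the real `searchForQuestion()`, `formatAsContext()`, and `askQuestion()` signatures are not shown in this README.

```typescript
// Hypothetical shape of one retrieved document chunk (assumption, not the agent-memory type).
interface RetrievedChunk {
  source: string; // e.g. the uploaded file name
  text: string;
}

// Role/content pairs, the message shape used by OpenAI chat completions.
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Number each chunk so the model can cite it as [1], [2], ...
function formatChunks(chunks: RetrievedChunk[]): string {
  return chunks
    .map((chunk, i) => `[${i + 1}] (${chunk.source})\n${chunk.text}`)
    .join("\n\n");
}

// Build the prompt: instructions plus retrieved context, then the new hire's question.
function buildMessages(question: string, chunks: RetrievedChunk[]): ChatMessage[] {
  const context = chunks.length > 0 ? formatChunks(chunks) : "(no matching documents)";
  const instructions =
    "Answer using only the onboarding context below. Cite sources as [n]. " +
    "If the context does not cover the question, say so.";
  return [
    { role: "system", content: `${instructions}\n\n${context}` },
    { role: "user", content: question },
  ];
}
```

The resulting array could then be passed as `messages` to a chat completion request. Numbering the chunks is one simple way to let the model point back at its sources.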
 
The system uses REAA's `agent-memory`, `agent-memory-core`, `agent-memory-embedding`, `agent-memory-retrieval` and `agent-memory-storage` packages as a purpose-built knowledge engine:
 
1. **Ingestion**: Onboarding documents (`.md`, `.txt`) are uploaded via the `/api/onboarding/ingest` endpoint, chunked into segments of ~500 tokens with overlap, and stored as `Memory` objects in a Postgres/pgvector database.
2. **Embedding**: Two providers are supported: OpenAI (`text-embedding-3-small`, 1536 dimensions) and `fastembed` (`BAAI/bge-small-en-v1.5`, 384 dimensions). Embeddings are cached in memory transparently.
3. **Retrieval**: Semantic search combines embedding similarity with recency ranking to surface the most relevant document chunks.
4. **Answering**: Retrieved context is formatted with source citations via `ContextInjector` and injected into a GPT-4o-mini prompt.
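
The ingestion and retrieval steps above can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration: `chunkText`, `rankChunks`, the 50-word overlap, the 0.2 recency weight, and the 30-day half-life are hypothetical and are not taken from the `agent-memory` packages. Whitespace-separated words also stand in for tokens.

```typescript
interface Chunk {
  index: number;
  text: string;
}

// Split text into overlapping windows of `size` words, advancing by size - overlap.
function chunkText(text: string, size = 500, overlap = 50): Chunk[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("expected size > 0 and 0 <= overlap < size");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: Chunk[] = [];
  const step = size - overlap;
  for (let start = 0; start < words.length; start += step) {
    chunks.push({ index: chunks.length, text: words.slice(start, start + size).join(" ") });
    if (start + size >= words.length) break; // this window already reached the end
  }
  return chunks;
}

interface StoredChunk {
  text: string;
  embedding: number[];
  createdAt: Date;
}

// Cosine similarity between two vectors of equal length; 0 if either is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Blend similarity with an exponential recency decay (assumed weighting scheme).
function rankChunks(
  query: number[],
  chunks: StoredChunk[],
  now: Date,
  topK = 3,
  recencyWeight = 0.2,
  halfLifeDays = 30,
): { chunk: StoredChunk; score: number }[] {
  const msPerDay = 24 * 60 * 60 * 1000;
  return chunks
    .map((chunk) => {
      const ageDays = Math.max(0, (now.getTime() - chunk.createdAt.getTime()) / msPerDay);
      const recency = Math.pow(0.5, ageDays / halfLifeDays);
      const similarity = cosineSimilarity(query, chunk.embedding);
      return { chunk, score: (1 - recencyWeight) * similarity + recencyWeight * recency };
    })
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

With these assumed weights, an older chunk that matches the query still outranks a fresh but unrelated one. Recency mainly breaks ties between similarly relevant chunks.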
 
## Setup
 
### Prerequisites
 
- Node.js >= 22
- pnpm >= 10
- PostgreSQL with pgvector extension
 
### Environment Variables
 
Copy `.env.example` to `.env` and configure:
 
| Variable | Description |
|----------|-------------|
| `OPENAI_API_KEY` | OpenAI API key for chat completions and embeddings |
| `DATABASE_URL` | PostgreSQL connection URL (alternative to the individual `DB_*` variables below) |
| `DB_HOST` | Database host (default: localhost) |
| `DB_PORT` | Database port (default: 5432) |
| `DB_USER` | Database user |
| `DB_PASSWORD` | Database password |
| `DB_NAME` | Database name |
| `EMBEDDING_PROVIDER` | `openai` (default) or `fastembed` |
| `EMBEDDING_MODEL` | Model name (default: `text-embedding-3-small`) |
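
As a rough sketch of how these variables might be resolved at startup, the helpers below apply the defaults from the table. The helper names, the precedence of `DATABASE_URL` over the individual variables, the optional password, and the `fastembed` default model are assumptions for illustration and may not match the project's code. The dimension lookup covers only the two models named in this README.

```typescript
type Env = Record<string, string | undefined>;

interface EmbeddingConfig {
  provider: "openai" | "fastembed";
  model: string;
  dimensions: number | undefined; // undefined when the model is not one of the two known ones
}

// Dimensions stated in this README for the two documented models.
const KNOWN_DIMENSIONS: Record<string, number> = {
  "text-embedding-3-small": 1536,
  "BAAI/bge-small-en-v1.5": 384,
};

// Assumption: DATABASE_URL wins when set; otherwise build a URL from the DB_* variables.
function resolveDatabaseUrl(env: Env): string {
  const url = env["DATABASE_URL"];
  if (url) return url;
  const host = env["DB_HOST"] ?? "localhost";
  const port = env["DB_PORT"] ?? "5432";
  const user = env["DB_USER"];
  const password = env["DB_PASSWORD"];
  const name = env["DB_NAME"];
  if (!user || !name) {
    throw new Error("Set DATABASE_URL, or at least DB_USER and DB_NAME");
  }
  const auth = password
    ? `${encodeURIComponent(user)}:${encodeURIComponent(password)}`
    : encodeURIComponent(user);
  return `postgres://${auth}@${host}:${port}/${name}`;
}

// Pick the embedding provider and model, falling back to the documented defaults.
function resolveEmbedding(env: Env): EmbeddingConfig {
  const provider = env["EMBEDDING_PROVIDER"] ?? "openai";
  if (provider === "openai") {
    const model = env["EMBEDDING_MODEL"] ?? "text-embedding-3-small";
    return { provider: "openai", model, dimensions: KNOWN_DIMENSIONS[model] };
  }
  if (provider === "fastembed") {
    // Assumed default for fastembed, taken from the Embedding step above.
    const model = env["EMBEDDING_MODEL"] ?? "BAAI/bge-small-en-v1.5";
    return { provider: "fastembed", model, dimensions: KNOWN_DIMENSIONS[model] };
  }
  throw new Error(`Unsupported EMBEDDING_PROVIDER: ${provider}`);
}
```

In a Node.js process, both helpers could be called with `process.env`. Taking the environment as a parameter keeps them easy to test.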
 
### Install & Run
 
```bash
pnpm install
pnpm dev          # Start Next.js dev server
pnpm typecheck    # TypeScript check
pnpm lint         # ESLint
pnpm test         # Run tests with coverage
```
 
## Dependencies
 
### REAA Packages (exact versions)
- `@reaatech/agent-memory@0.1.0`
- `@reaatech/agent-memory-core@0.1.0`
- `@reaatech/agent-memory-embedding@0.1.0`
- `@reaatech/agent-memory-retrieval@0.1.0`
- `@reaatech/agent-memory-storage@0.1.0`
 
### Third-party (exact versions)
- `openai@6.37.0`
- `pgvector@0.2.1`
- `fastembed@2.1.0`
- `postgres@3.4.9`
- `next@16.2.6` · `react@19.2.6` · `react-dom@19.2.6`