OpenAI Knowledge Agent for SMB Employee Onboarding
A persistent AI memory system that ingests onboarding docs, learns company norms, and answers new hires' questions in natural language, powered by OpenAI.
SMBs rely on a handful of people to onboard new hires, but that knowledge is trapped in scattered PDFs, Slack messages, and tribal memory. New employees waste weeks hunting for answers while senior staff are pulled from revenue work.
You’ll build an AI onboarding assistant that ingests company documents, stores them as searchable vector embeddings, and answers new-hire questions with cited sources using OpenAI. By the end, you’ll have a working Next.js application with document upload, semantic search, and a chat interface — all backed by REAA’s agent-memory stack for persistent knowledge storage.
Prerequisites
Node.js >= 22
pnpm >= 10
PostgreSQL with the pgvector extension installed
An OpenAI API key (for chat completions and embeddings)
Familiarity with TypeScript and the Next.js App Router
Step 1: Scaffold the project and install dependencies
Create an empty directory and set up the package.json with all required dependencies. The project uses Next.js 16, the REAA agent-memory packages, the OpenAI SDK, fastembed for local embeddings, and pgvector for vector storage.
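The exact dependency list lives in package.json; as a rough sketch (the REAA package names are taken from the imports used later in this guide, and version ranges are left to pnpm), the install looks like:
terminal
pnpm init
pnpm add next react react-dom openai fastembed pg pgvector \
  @reaatech/agent-memory-core @reaatech/agent-memory-storage \
  @reaatech/agent-memory-embedding @reaatech/agent-memory-retrieval
pnpm add -D typescript @types/node @types/react @types/react-dom \
  eslint vitest @vitest/coverage-v8 msw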
Expected output: pnpm downloads all packages and creates node_modules/ and pnpm-lock.yaml. No errors.
Step 2: Configure TypeScript, Next.js, ESLint, and Vitest
Set up the TypeScript compiler, Next.js config, ESLint with strict type-checked rules, and Vitest for testing. The project targets ES2022 with NodeNext module resolution.
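The recipe ships its own config files; a minimal tsconfig.json consistent with the ES2022/NodeNext setup described above might look like this (the real project's options may differ):
json
{
  "compilerOptions": {
    "target": "ES2022",
    "module": "NodeNext",
    "moduleResolution": "NodeNext",
    "lib": ["ES2022", "DOM"],
    "jsx": "preserve",
    "strict": true,
    "noEmit": true,
    "esModuleInterop": true,
    "skipLibCheck": true,
    "plugins": [{ "name": "next" }]
  },
  "include": ["next-env.d.ts", "**/*.ts", "**/*.tsx"],
  "exclude": ["node_modules"]
}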
Expected output: tsc --noEmit exits with code 0 and no errors.
Step 3: Set up environment variables
The application reads its API key, database connection, and embedding configuration from environment variables. Create a template file with all required keys.
Create .env.example:
env
# OpenAI API key for chat completions and embeddings
OPENAI_API_KEY=<your-openai-api-key>

# PostgreSQL connection URL (alternative to individual connection vars)
DATABASE_URL=postgres://postgres:***@localhost:5432/agent_memory

# Database connection (individual vars)
DB_HOST=localhost
DB_PORT=5432
DB_USER=postgres
DB_PASSWORD=<your-db-password>
DB_NAME=agent_memory

# Embedding provider: "openai" or "fastembed"
EMBEDDING_PROVIDER=openai

# Embedding model name (used when EMBEDDING_PROVIDER=openai)
# For openai: "text-embedding-3-small" (default, 1536 dims) or "text-embedding-3-large"
# For fastembed: ignored (uses BAAI/bge-small-en-v1.5, 384 dims)
EMBEDDING_MODEL=text-embedding-3-small
Copy it to .env.local and fill in your values:
terminal
cp .env.example .env.local
Open .env.local and replace <your-openai-api-key> with your real OpenAI API key, and <your-db-password> with your PostgreSQL password.
Before moving on, make sure your PostgreSQL database has the pgvector extension enabled:
terminal
psql -U postgres -d agent_memory -c "CREATE EXTENSION IF NOT EXISTS vector;"
Expected output: CREATE EXTENSION if the extension was newly created, or a notice that it already exists.
Step 4: Build the AgentMemory singleton
This module creates and manages a singleton AgentMemory instance backed by PostgreSQL with pgvector. It reads connection parameters from environment variables and sets up the extraction layer to process facts and preferences.
getMemory() returns the same instance on every call. createAgentMemory() reads env vars for database host, port, credentials, and the OpenAI embedding model, then wires them into the AgentMemory constructor.
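Create src/lib/memory.ts. The sketch below shows the singleton pattern; the option names passed to the AgentMemory constructor are illustrative assumptions, not the documented @reaatech/agent-memory-core API, so check the package's types for the real shape:
ts
// src/lib/memory.ts (sketch): one AgentMemory instance per server process.
// NOTE: the constructor options below are assumptions for illustration.
import { AgentMemory } from "@reaatech/agent-memory-core";

let memoryInstance: AgentMemory | null = null;

export function createAgentMemory(): AgentMemory {
  return new AgentMemory({
    // PostgreSQL + pgvector connection, read from the env vars in Step 3.
    storage: {
      host: process.env["DB_HOST"] ?? "localhost",
      port: Number(process.env["DB_PORT"] ?? 5432),
      user: process.env["DB_USER"] ?? "postgres",
      password: process.env["DB_PASSWORD"] ?? "",
      database: process.env["DB_NAME"] ?? "agent_memory",
    },
    // Extraction layer that turns raw text into facts and preferences.
    extraction: {
      embeddingModel: process.env["EMBEDDING_MODEL"] ?? "text-embedding-3-small",
    },
  });
}

export function getMemory(): AgentMemory {
  if (!memoryInstance) {
    memoryInstance = createAgentMemory();
  }
  return memoryInstance;
}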
Step 5: Build the embedding providers
The embedding module supports two providers: OpenAI’s text-embedding-3-small (1536 dimensions) and fastembed’s BAAI/bge-small-en-v1.5 (384 dimensions, runs locally). Both are wrapped in a CachedEmbeddingProvider with an in-memory cache to avoid redundant API calls.
getEmbeddingProvider() reads EMBEDDING_PROVIDER from the environment and instantiates either OpenAI or fastembed, then wraps it with a 1,000-entry in-memory cache that expires entries after 60 seconds. The FastembedProvider lazily initializes the model on first use.
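Create src/lib/embedding.ts. The sketch below simplifies two things: it shows only the OpenAI path (the fastembed provider is analogous but lazy-loads the local model), and it swaps REAA's CachedEmbeddingProvider for a hand-rolled Map cache. The embed()/getModelInfo() shape matches how the provider is consumed in Step 6, but the real module implements the EmbeddingProvider interface from @reaatech/agent-memory-embedding rather than the local one declared here:
ts
// src/lib/embedding.ts (simplified sketch): OpenAI embeddings with a small TTL cache.
import OpenAI from "openai";

export interface EmbeddingProvider {
  embed(text: string): Promise<number[]>;
  getModelInfo(): { name: string; dimensions: number };
}

const CACHE_TTL_MS = 60_000; // entries expire after 60 seconds
const CACHE_MAX_ENTRIES = 1000; // cap on in-memory cache size

class OpenAIEmbeddingProvider implements EmbeddingProvider {
  private client = new OpenAI({ apiKey: process.env["OPENAI_API_KEY"] });
  private model = process.env["EMBEDDING_MODEL"] ?? "text-embedding-3-small";
  private cache = new Map<string, { vector: number[]; expiresAt: number }>();

  getModelInfo(): { name: string; dimensions: number } {
    return {
      name: this.model,
      dimensions: this.model === "text-embedding-3-large" ? 3072 : 1536,
    };
  }

  async embed(text: string): Promise<number[]> {
    const cached = this.cache.get(text);
    if (cached && cached.expiresAt > Date.now()) {
      return cached.vector;
    }
    const response = await this.client.embeddings.create({
      model: this.model,
      input: text,
    });
    const vector = response.data[0]?.embedding ?? [];
    if (this.cache.size >= CACHE_MAX_ENTRIES) {
      this.cache.clear(); // crude eviction; REAA's cache wrapper is smarter
    }
    this.cache.set(text, { vector, expiresAt: Date.now() + CACHE_TTL_MS });
    return vector;
  }
}

let providerInstance: EmbeddingProvider | null = null;

export function getEmbeddingProvider(): EmbeddingProvider {
  if (!providerInstance) {
    // EMBEDDING_PROVIDER=fastembed would instantiate the local provider here.
    providerInstance = new OpenAIEmbeddingProvider();
  }
  return providerInstance;
}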
Step 6: Build the document ingestion pipeline
The ingestion module takes a file buffer, splits it into overlapping chunks at paragraph or sentence boundaries, embeds each chunk, and stores the resulting Memory objects in PostgreSQL. It accepts .md, .txt, and .markdown files.
Create src/lib/ingestion.ts:
ts
import { randomUUID } from "node:crypto";
import {
  MemoryType,
  MemorySource,
  MemoryImportance,
  MemoryLifecycle,
  type Memory,
} from "@reaatech/agent-memory-core";
import { getMemory } from "./memory.js";
import { getEmbeddingProvider } from "./embedding.js";

const CHUNK_SIZE = 500;
const CHUNK_OVERLAP = 50;
const SUPPORTED_EXTENSIONS = new Set([".md", ".txt", ".markdown"]);

function splitTextIntoChunks(text: string): string[] {
  if (text.length === 0) {
    return [];
  }
  const chunks: string[] = [];
  let startIndex = 0;
  while (startIndex < text.length) {
    let endIndex = startIndex + CHUNK_SIZE;
    if (endIndex >= text.length) {
      chunks.push(text.slice(startIndex));
      break;
    }
    // Try to break at a paragraph boundary
    const searchWindow = text.slice(startIndex, endIndex + CHUNK_OVERLAP);
    const paragraphBreak = searchWindow.lastIndexOf("\n\n");
    if (
      paragraphBreak >= CHUNK_SIZE - 100 &&
      paragraphBreak <= CHUNK_SIZE + CHUNK_OVERLAP
    ) {
      endIndex = startIndex + paragraphBreak;
    } else {
      // Try to break at a sentence boundary
      const sentenceBreak = searchWindow.lastIndexOf(". ");
      if (
        sentenceBreak >= CHUNK_SIZE - 100 &&
        sentenceBreak <= CHUNK_SIZE + CHUNK_OVERLAP
      ) {
        endIndex = startIndex + sentenceBreak + 1;
      }
    }
    chunks.push(text.slice(startIndex, endIndex));
    startIndex = endIndex - CHUNK_OVERLAP;
  }
  return chunks;
}

export async function processUpload(
  fileBuffer: Buffer,
  fileName: string,
): Promise<number> {
  // Validate file extension
  const ext = fileName.toLowerCase().slice(fileName.lastIndexOf("."));
  if (!SUPPORTED_EXTENSIONS.has(ext)) {
    throw new Error(
      `Unsupported file extension "${ext}". Supported: ${Array.from(SUPPORTED_EXTENSIONS).join(", ")}`,
    );
  }
  // Validate non-empty
  if (fileBuffer.length === 0) {
    return 0;
  }
  const text = fileBuffer.toString("utf-8");
  const chunks = splitTextIntoChunks(text);
  const tenantId = "default";
  const ownerId = "default";
  const memory = getMemory();
  const storage = memory.getStorage();
  const embedder = getEmbeddingProvider();
  const modelInfo = embedder.getModelInfo();
  for (const chunkText of chunks) {
    // Embed the chunk content to get a real vector
    const vector = await embedder.embed(chunkText);
    const memoryObj: Memory = {
      id: randomUUID(),
      tenantId,
      ownerId,
      content: chunkText,
      type: MemoryType.FACT,
      category: "onboarding",
      source: MemorySource.USER_STATEMENT,
      importance: MemoryImportance.MEDIUM,
      confidence: 1.0,
      tags: ["onboarding", fileName],
      lifecycle: MemoryLifecycle.ACTIVE,
      createdAt: new Date(),
      updatedAt: new Date(),
      lastAccessedAt: new Date(),
      embeddings: {
        vector,
        model: modelInfo.name,
        dimensions: modelInfo.dimensions,
      },
      version: 1,
      history: [],
    };
    await storage.create(memoryObj);
  }
  return chunks.length;
}
Each chunk is 500 characters with a 50-character overlap, ensuring context continuity across chunk boundaries. The chunker prefers paragraph breaks (\n\n) and falls back to sentence breaks (. ). Each Memory object stores the embedded vector, source filename as a tag, and lifecycle metadata.
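As a quick sanity check, processUpload can be called from a one-off script (the file path and script location here are hypothetical):
ts
// ingest-handbook.mts (hypothetical script): ingest one Markdown file and report the chunk count.
import { readFile } from "node:fs/promises";
import { processUpload } from "./src/lib/ingestion.js";

const buffer = await readFile("docs/employee-handbook.md");
const chunkCount = await processUpload(buffer, "employee-handbook.md");
console.log(`Stored ${chunkCount} chunks`);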
Step 7: Build the memory retrieval pipeline
The retriever module wraps REAA’s MemoryRetriever to search stored memories using semantic similarity and recency ranking. A ContextInjector formats retrieved memories into a structured context string for LLM prompts.
Create src/lib/retriever.ts:
ts
import {
  MemoryRetriever,
  ContextInjector,
  RetrievalStrategy,
} from "@reaatech/agent-memory-retrieval";
import type { Memory } from "@reaatech/agent-memory-core";
import type { MemoryStorage } from "@reaatech/agent-memory-storage";
import type { EmbeddingProvider } from "@reaatech/agent-memory-embedding";
import { getMemory } from "./memory.js";
import { getEmbeddingProvider } from "./embedding.js";

let retrieverInstance: MemoryRetriever | null = null;

export function createRetriever(
  storage?: MemoryStorage,
  embedder?: EmbeddingProvider,
): MemoryRetriever {
  const resolvedStorage: MemoryStorage = storage ?? getMemory().getStorage();
  const resolvedEmbedder = embedder ?? getEmbeddingProvider();
  const retriever = new MemoryRetriever(resolvedStorage, resolvedEmbedder, {
    defaultLimit: 5,
    useCrossEncoder: false,
    diversityFactor: 0.3,
    strategies: [RetrievalStrategy.SEMANTIC, RetrievalStrategy.RECENCY],
  });
  return retriever;
}

export function getRetriever(): MemoryRetriever {
  if (!retrieverInstance) {
    retrieverInstance = createRetriever();
  }
  return retrieverInstance;
}

export async function searchForQuestion(
  question: string,
  topK: number = 5,
): Promise<Memory[]> {
  if (!question || question.trim().length === 0) {
    return [];
  }
  const retriever = getRetriever();
  const memories = await retriever.retrieve(question, { limit: topK });
  return memories;
}

export async function formatAsContext(memories: Memory[]): Promise<string> {
  if (memories.length === 0) {
    return "No relevant memories found.";
  }
  const injector = new ContextInjector();
  const context = await injector.injectMemoriesIntoContext([], memories);
  return context;
}
The retriever combines semantic similarity (vector search via pgvector) with recency ranking and returns up to 5 results by default. formatAsContext uses REAA’s ContextInjector to format memories as tagged entries with confidence scores and dates.
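A typical call sequence combining the two helpers (the question text is just an example):
ts
// Retrieve the top-5 memories for a question and build an LLM-ready context block.
import { searchForQuestion, formatAsContext } from "./src/lib/retriever.js";

const memories = await searchForQuestion("How do I request PTO?");
const context = await formatAsContext(memories);
console.log(context); // tagged entries with confidence scores and dates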
Step 8: Build the chat module
The chat module wraps the OpenAI chat completions API with a system prompt that instructs the model to answer only from provided context and cite sources. It includes a non-streaming askQuestion function and a streaming askQuestionStream generator.
Create src/lib/chat.ts:
ts
import OpenAI from "openai";

export class ChatError extends Error {
  statusCode: number;

  constructor(message: string, statusCode: number = 500) {
    super(message);
    this.name = "ChatError";
    this.statusCode = statusCode;
  }
}

const SYSTEM_PROMPT =
  "You are an onboarding assistant for new employees. Answer questions using ONLY the provided context below. If the context does not contain relevant information, politely say you don't have that information yet. Cite sources by referencing [Source: filename] when available.";

const MAX_CONTEXT_CHARS = 8000;

function getClient(): OpenAI {
  const apiKey = process.env["OPENAI_API_KEY"];
  if (!apiKey) {
    throw new ChatError("OPENAI_API_KEY is not set", 500);
  }
  return new OpenAI({ apiKey });
}

function truncateContext(context: string): string {
  if (context.length > MAX_CONTEXT_CHARS) {
    return context.slice(0, MAX_CONTEXT_CHARS) + "...[truncated]";
  }
  return context;
}

export async function askQuestion(
  question: string,
  context: string,
): Promise<string> {
  if (!question || question.trim().length === 0) {
    throw new ChatError("Question cannot be empty", 400);
  }
  const client = getClient();
  const truncatedContext = truncateContext(context);
  try {
    const completion = await client.chat.completions.create({
      model: "gpt-4o-mini",
      temperature: 0.2,
      messages: [
        { role: "system", content: SYSTEM_PROMPT },
        {
          role: "user",
          content: `Context:\n${truncatedContext}\n\nQuestion: ${question}`,
        },
      ],
    });
    const choice = completion.choices[0];
    if (!choice) {
      return "";
    }
    return choice.message.content ?? "";
  } catch (err: unknown) {
    if (err instanceof OpenAI.APIError) {
      const statusCode: number = (err.status as number | undefined) ?? 500;
      throw new ChatError(
        `OpenAI API error (${String(err.status)}): ${err.message}`,
        statusCode,
      );
    }
    throw err;
  }
}

export async function* askQuestionStream(
  question: string,
  context: string,
): AsyncGenerator<string> {
  if (!question || question.trim().length === 0) {
    throw new ChatError("Question cannot be empty", 400);
  }
  const client = getClient();
  const truncatedContext = truncateContext(context);
  const stream = await client.chat.completions.create({
    model: "gpt-4o-mini",
    temperature: 0.2,
    messages: [
      { role: "system", content: SYSTEM_PROMPT },
      {
        role: "user",
        content: `Context:\n${truncatedContext}\n\nQuestion: ${question}`,
      },
    ],
    stream: true,
  });
  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta;
    if (delta?.content) {
      yield delta.content;
    }
  }
}
The ChatError class carries an HTTP status code so route handlers can map errors to appropriate responses. The temperature is set to 0.2 for consistent, factual answers, and context is truncated at 8,000 characters to stay within token limits.
Step 9: Create the API routes
Two route handlers power the backend. The ingest route accepts file uploads and pushes them through the document ingestion pipeline. The chat route accepts a question, retrieves relevant memories, formats them as context, and calls OpenAI.
Both routes validate input and return structured JSON error objects. The ingest route validates file type and limits file size to 10 MB before calling processUpload. The chat route creates a pipeline: search for relevant memories, format them as context, ask OpenAI, and return the answer with source snippets.
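A sketch of the chat route, wired to the modules from Steps 7 and 8, could look like the following; the route path, response shape, and import paths are assumptions, so adjust them to your layout (a tsconfig path alias works too). The ingest route follows the same pattern: parse request.formData(), check the extension and the 10 MB limit, call processUpload, and return the chunk count.
ts
// app/api/chat/route.ts (sketch): retrieve context, ask OpenAI, return answer + sources.
import { NextResponse } from "next/server";
import { searchForQuestion, formatAsContext } from "../../../src/lib/retriever.js";
import { askQuestion, ChatError } from "../../../src/lib/chat.js";

export async function POST(request: Request): Promise<NextResponse> {
  try {
    const body = (await request.json()) as { question?: string };
    const question = body.question?.trim();
    if (!question) {
      return NextResponse.json({ error: "Question is required" }, { status: 400 });
    }
    const memories = await searchForQuestion(question, 5);
    const context = await formatAsContext(memories);
    const answer = await askQuestion(question, context);
    // Short source snippets let the UI show which chunks backed the answer.
    const sources = memories.map((m) => ({
      snippet: m.content.slice(0, 120),
      tags: m.tags,
    }));
    return NextResponse.json({ answer, sources });
  } catch (err: unknown) {
    if (err instanceof ChatError) {
      return NextResponse.json({ error: err.message }, { status: err.statusCode });
    }
    return NextResponse.json({ error: "Internal server error" }, { status: 500 });
  }
}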
Step 10: Add server instrumentation
The instrumentation hook runs at startup in the Node.js runtime. On Next.js 15 and later, src/instrumentation.ts is picked up automatically; only older releases need the experimental instrumentationHook flag in next.config.ts before the register() function will fire.
Update next.config.ts:
ts
import type { NextConfig } from "next";

const nextConfig: NextConfig = {
  // Next.js 15+ loads src/instrumentation.ts by default.
  // On older versions, enable the flag below instead:
  // experimental: {
  //   instrumentationHook: true,
  // },
};

export default nextConfig;
Now create the observability stub and instrumentation module. The instrumentation initializes observability and eagerly creates the AgentMemory singleton so the database connection is ready before the first request arrives.
Create src/observability.ts:
ts
export function initObservability(): void {
  // Stub: designed for OpenTelemetry or console logging later
  console.log("[observability] Agent memory observability initialized");
}
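Then create src/instrumentation.ts. A minimal version consistent with the behavior described below (guard on the Node.js runtime, dynamic imports, eager singleton creation) is:
ts
// src/instrumentation.ts (sketch): runs once when the server starts.
export async function register(): Promise<void> {
  // Only run in the Node.js runtime; the Edge runtime cannot load pg/pgvector.
  if (process.env.NEXT_RUNTIME === "nodejs") {
    const { initObservability } = await import("./observability.js");
    const { getMemory } = await import("./lib/memory.js");

    initObservability();
    // Eagerly create the AgentMemory singleton so the Postgres connection
    // pool is warm before the first request arrives.
    getMemory();
  }
}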
The dynamic import() calls ensure that Node-only dependencies are only loaded in the Node.js runtime, avoiding errors in the Edge runtime. The getMemory() call at startup warms up the database connection pool so the first user request doesn’t incur a cold-start penalty.
Step 11: Build the chat UI
The frontend is a single-page React client component with a chat window, a message input, and a file upload area for onboarding documents. It communicates with both API routes.
Create app/layout.tsx:
tsx
import type { Metadata } from "next";

export const metadata: Metadata = {
  title: "Onboarding Assistant",
  description: "AI-powered onboarding assistant for new employees",
};

export default function RootLayout({
  children,
}: {
  children: React.ReactNode;
}) {
  return (
    <html lang="en">
      <body>{children}</body>
    </html>
  );
}
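The recipe's full page component adds styling and streaming; a bare-bones sketch of app/page.tsx, with fetch payloads matching the API sketch from Step 9 (both are assumptions), looks like:
tsx
"use client";

// app/page.tsx (sketch): file upload plus a simple question/answer list.
import { useState } from "react";

interface Source {
  snippet: string;
  tags: string[];
}

interface Message {
  role: "user" | "assistant";
  text: string;
  sources?: Source[];
}

export default function Home() {
  const [messages, setMessages] = useState<Message[]>([]);
  const [question, setQuestion] = useState("");

  async function uploadFile(file: File) {
    const formData = new FormData();
    formData.append("file", file);
    await fetch("/api/ingest", { method: "POST", body: formData });
  }

  async function ask() {
    const q = question.trim();
    if (!q) return;
    setMessages((prev) => [...prev, { role: "user", text: q }]);
    setQuestion("");
    const res = await fetch("/api/chat", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ question: q }),
    });
    const data = (await res.json()) as { answer: string; sources?: Source[] };
    setMessages((prev) => [...prev, { role: "assistant", text: data.answer, sources: data.sources }]);
  }

  return (
    <main>
      <h1>Onboarding Assistant</h1>
      <input
        type="file"
        accept=".md,.txt,.markdown"
        onChange={(e) => {
          const file = e.target.files?.[0];
          if (file) void uploadFile(file);
        }}
      />
      <ul>
        {messages.map((m, i) => (
          <li key={i}>
            <strong>{m.role}:</strong> {m.text}
            {m.sources?.map((s, j) => (
              <span key={j}> [{s.tags.join(", ")}]</span>
            ))}
          </li>
        ))}
      </ul>
      <input
        value={question}
        onChange={(e) => setQuestion(e.target.value)}
        placeholder="Ask a question about onboarding"
      />
      <button onClick={() => void ask()}>Send</button>
    </main>
  );
}

Start the dev server with pnpm dev (or pnpm next dev if no script is defined).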
Expected output: Next.js starts on http://localhost:3000. Open it in your browser — you’ll see the Onboarding Assistant chat interface with a file upload widget and a text input. Upload a Markdown file with onboarding content, then ask a question. The assistant responds with answers and green source tags showing which document chunks were used.
Step 12: Run the tests
The test suite uses Vitest with MSW to mock OpenAI API calls. A setup file configures MSW handlers that return fixed responses for chat completions and embeddings, so tests run without hitting the real API.
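A sketch of that setup file is below; the handler URLs follow OpenAI's public endpoints, while the fixture payloads and the test script name are assumptions:
ts
// tests/setup.ts (sketch): intercept OpenAI HTTP calls so tests never hit the network.
import { afterAll, afterEach, beforeAll } from "vitest";
import { http, HttpResponse } from "msw";
import { setupServer } from "msw/node";

const handlers = [
  http.post("https://api.openai.com/v1/chat/completions", () =>
    HttpResponse.json({
      choices: [
        { message: { role: "assistant", content: "Mocked answer [Source: handbook.md]" } },
      ],
    }),
  ),
  http.post("https://api.openai.com/v1/embeddings", () =>
    HttpResponse.json({
      data: [{ embedding: Array.from({ length: 1536 }, () => 0.1) }],
    }),
  ),
];

export const server = setupServer(...handlers);

beforeAll(() => server.listen());
afterEach(() => server.resetHandlers());
afterAll(() => server.close());

Run the suite with coverage enabled, for example pnpm vitest run --coverage.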
Expected output: All 47 tests across 7 test suites pass, and the coverage report shows at least 90% coverage across lines, branches, functions, and statements.
Next steps
Add multi-tenancy support by making tenantId dynamic per organization, isolating onboarding documents between different SMBs.
Extend file format support to .pdf and .docx by adding extraction libraries and adjusting the chunker to handle structured documents.
Deploy the agent to production with connection pooling, rate limiting on the chat endpoint, and persistent conversation history for multi-turn dialogue.