Small business HR teams spend hours answering repetitive policy questions via email or Slack. A simple chatbot would help, but it must retain nuance across many policy documents and understand who is asking to give per‑role answers.
This recipe builds an HR policy Q&A agent for small and medium-sized businesses. Employees ask questions like “How much PTO do I have?” or “What’s the parental leave policy?” and Grok answers them using policy documents and employee context that persists across sessions.
The agent uses @reaatech/agent-memory-core to maintain employee memory across conversations, @reaatech/agent-memory-storage for PostgreSQL-backed persistence, @reaatech/agent-memory-retrieval to fetch relevant policy paragraphs from a Qdrant vector store, and fastembed for local embedding generation. xAI’s Grok handles reasoning via an OpenAI-compatible API endpoint. All responses are validated against a Zod schema before being returned.
Prerequisites
Node.js >= 22 and pnpm 10
Docker — for running Qdrant vector store and PostgreSQL with pgvector extension
An xAI API key (get one at x.ai) — the Grok endpoint is OpenAI-compatible at https://api.x.ai/v1
A PostgreSQL 15+ instance with pgvector extension enabled
Basic familiarity with Next.js App Router and TypeScript
Step 1: Install dependencies
Start from the scaffolded project and add every required package. The REAA packages are pinned to 0.1.0 exactly — do not use ^ or ~.
terminal
pnpm add @reaatech/agent-memory-core@0.1.0 \
  @reaatech/agent-memory-storage@0.1.0 \
  @reaatech/agent-memory-retrieval@0.1.0 \
  @reaatech/agent-memory-embedding@0.1.0 \
  @qdrant/js-client-rest@1.18.0 \
  fastembed@2.1.0 \
  openai@4.100.0 \
  jose@6.0.10 \
  zod@4.4.3 \
  pdf-parse@2.4.5 \
  pg@8.20.0

pnpm add -D @types/pg@8.20.0 msw@2.14.6
Then run pnpm install to lock everything down. You should see all four REAA packages and every third-party package listed in package.json with exact semver pins.
Expected output: pnpm-lock.yaml contains entries for all packages with no ^, ~, or latest ranges.
Step 2: Configure environment variables
Copy the example file and fill in your credentials. Every secret stays out of source code — the app reads everything from process.env.
terminal
cp .env.example .env
Open .env and set each variable:
env
# xAI — Grok via OpenAI-compatible endpoint
XAI_API_KEY=<your-xai-api-key>

# Qdrant — vector store for policy chunks
QDRANT_URL=http://localhost:6333
QDRANT_API_KEY=<your-qdrant-api-key>

# PostgreSQL — memory persistence (pgvector extension required)
DB_HOST=localhost
DB_PORT=5432
DB_NAME=agent_memory
DB_USER=postgres
DB_PASSWORD=<your-db-password>

# PostgreSQL schema (defaults to public)
DB_SCHEMA=public

# JWT — HMAC secret for verifying employee session tokens
JWT_SECRET=<your-jwt-secret>

# OpenAI — optional, only needed for OpenAIEmbeddingProvider
OPENAI_API_KEY=<your-openai-key>

# Policies — collection name and PDF path
COLLECTION_NAME=hr_policies
POLICY_PDF_PATH=./policies/handbook.pdf
The NEXT_RUNTIME check in src/instrumentation.ts guards Node-only imports so the app can still run in Edge environments. The instrumentationHook: true flag in next.config.ts activates the hook.
Expected output: .env contains all 13 variables. No real secrets appear in any source file.
Step 3: Define domain types
src/types.ts exports all the TypeScript interfaces and Zod schemas the rest of the codebase relies on. Every type is explicit — no : any.
PolicyAnswerSchema validates Grok’s JSON response. ChatMessageSchema validates incoming requests at the route handler before they reach the service layer.
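As a reference point, here is a minimal sketch of those two schemas. The fields shown are inferred from how later steps use them (the route handler reads employeeId, message, and sessionId; Grok is asked to return answer and sources), not copied from the repo, so adjust them to your actual definitions.
ts
// src/types.ts (sketch): shapes inferred from later steps, not the repo's exact definitions.
import { z } from "zod";

// Validates Grok's JSON reply before it is returned to the client.
export const PolicyAnswerSchema = z.object({
  answer: z.string(),
  sources: z.array(z.string()),
});
export type PolicyAnswer = z.infer<typeof PolicyAnswerSchema>;

// Validates the incoming request body at the route handler.
export const ChatMessageSchema = z.object({
  employeeId: z.string(),
  message: z.string().min(1),
  sessionId: z.string().optional(), // omitted on the first message of a session
});
export type ChatMessage = z.infer<typeof ChatMessageSchema>;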
Expected output: pnpm typecheck shows zero errors after this file is saved.
Step 4: Build the embedding service
src/services/embedding.ts wraps fastembed’s FlagEmbedding in a class that implements the EmbeddingProvider interface from @reaatech/agent-memory-embedding. The CachingEmbeddingDecorator adds a bounded in-memory cache (the oldest entry is evicted once it fills) so repeated embeddings are served without calling the model again.
ts
import { EmbeddingModel, FlagEmbedding } from "fastembed";
import { type EmbeddingProvider, type ModelInfo } from "@reaatech/agent-memory-embedding";

let modelInstance: FlagEmbedding | null = null;

async function getEmbeddingModel(): Promise<FlagEmbedding> {
  if (!modelInstance) {
    modelInstance = await FlagEmbedding.init({ model: EmbeddingModel.BGESmallENV15 });
  }
  return modelInstance;
}

export class LocalEmbeddingService implements EmbeddingProvider {
  async embed(text: string): Promise<number[]> {
    const model = await getEmbeddingModel();
    return model.queryEmbed("query: " + text);
  }

  async embedBatch(texts: string[]): Promise<number[][]> {
    const model = await getEmbeddingModel();
    const results: number[][] = [];
    const generator = model.passageEmbed(texts.map(t => "passage: " + t));
    for await (const batch of generator) {
      for (const vec of batch) {
        results.push(Array.from(vec));
      }
    }
    return results;
  }

  getModelInfo(): ModelInfo {
    return { name: "BAAI/bge-small-en-v1.5", dimensions: 384, maxInputLength: 512 };
  }
}

export class CachingEmbeddingDecorator implements EmbeddingProvider {
  private cache: Map<string, number[]>;
  private maxSize: number;
  private inner: EmbeddingProvider;

  constructor(inner: EmbeddingProvider, maxSize = 1000) {
    this.inner = inner;
    this.cache = new Map();
    this.maxSize = maxSize;
  }

  async embed(text: string): Promise<number[]> {
    const cached = this.cache.get(text);
    if (cached !== undefined) return cached;
    const result = await this.inner.embed(text);
    this.setCache(text, result);
    return result;
  }

  private setCache(key: string, value: number[]): void {
    if (this.cache.size >= this.maxSize) {
      const firstKey = this.cache.keys().next().value;
      if (firstKey !== undefined) {
        this.cache.delete(firstKey);
      }
    }
    this.cache.set(key, value);
  }

  async embedBatch(texts: string[]): Promise<number[][]> {
    const results = await this.inner.embedBatch(texts);
    for (let i = 0; i < texts.length; i++) {
      this.cache.set(texts[i]!, results[i]!);
    }
    return results;
  }

  getModelInfo(): ModelInfo {
    return this.inner.getModelInfo();
  }
}

export class MockEmbeddingService implements EmbeddingProvider {
  async embed(_text: string): Promise<number[]> {
    return new Array(384).fill(0);
  }

  async embedBatch(texts: string[]): Promise<number[][]> {
    return texts.map(() => new Array(384).fill(0));
  }

  getModelInfo(): ModelInfo {
    return { name: "mock", dimensions: 384, maxInputLength: 512 };
  }
}

export function createEmbedder(): EmbeddingProvider {
  if (process.env.NODE_ENV === "test") {
    return new MockEmbeddingService();
  }
  return new CachingEmbeddingDecorator(new LocalEmbeddingService());
}
The factory createEmbedder() returns MockEmbeddingService in test mode and the cached real embedder in all other environments. Prefix query: for single-text embeddings and passage: for batch embeddings — this is how fastembed distinguishes query-type vs passage-type embeddings.
Expected output: pnpm typecheck passes; pnpm test shows 100% line coverage on this file.
Step 5: Build the Grok client
src/lib/grok.ts talks to xAI’s Grok through the OpenAI-compatible endpoint. The openai package handles the HTTP transport — you configure it with baseURL: "https://api.x.ai/v1" and call client.chat.completions.create() with model: "grok-3".
The role: "developer" message carries the system prompt, not role: "system" — this is the current convention for Chat Completions. chatWithContext() injects memory context inside <relevant_memories> tags so Grok sees relevant employee history alongside the current question.
Expected output: Tests mock the openai package and verify the developer role, grok-3 model name, and max_tokens: 2048 appear in every request.
Step 6: Build the policies service
src/services/policies.ts handles PDF ingestion and Qdrant vector operations. The parsePDF() function reads a PDF file as a buffer and instantiates PDFParse (the named export from pdf-parse) to extract text. chunkText() splits the text into overlapping segments with UUIDs assigned to each chunk.
ts
import { readFile } from "fs/promises";
import { randomUUID } from "crypto";
import { PDFParse } from "pdf-parse";
import { QdrantClient } from "@qdrant/js-client-rest";
import type { EmbeddingProvider } from "@reaatech/agent-memory-embedding";
import type { PolicyChunk, IngestResult } from "../types.js";

export async function parsePDF(filePath: string): Promise<string> {
  const buffer = await readFile(filePath);
  const doc = new PDFParse({ data: buffer });
  const result = await doc.getText();
  return result.text;
}

export function chunkText(
  text: string,
  chunkSize = 500,
  overlap = 50,
  policyName = "Unknown"
): PolicyChunk[] {
  if (!text || chunkSize <= 0) return [];
  const chunks: PolicyChunk[] = [];
  let start = 0;
  while (start < text.length) {
    const end = Math.min(start + chunkSize, text.length);
    // PolicyChunk field names assumed here; use the definition in src/types.ts.
    chunks.push({
      id: randomUUID(),
      text: text.slice(start, end),
      metadata: { policyName },
    });
    if (end === text.length) break;
    start = end - overlap; // overlapping windows
  }
  return chunks;
}

// Qdrant collection setup, embedding, and upsert logic follow in the full file.
The collection uses Cosine distance with 384 dimensions — matching the BAAI/bge-small-en-v1.5 model’s output. A 409 error on collection creation is silently swallowed since the collection may already exist from a previous run.
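A sketch of that bootstrap step; the helper name ensureCollection and the way the 409 is read off the thrown error are assumptions, not the repo's code.
ts
import { QdrantClient } from "@qdrant/js-client-rest";

// Creates the policy collection if it does not exist yet.
export async function ensureCollection(client: QdrantClient, name: string): Promise<void> {
  try {
    await client.createCollection(name, {
      vectors: { size: 384, distance: "Cosine" }, // matches BAAI/bge-small-en-v1.5 output
    });
  } catch (err) {
    // Swallow "collection already exists" (HTTP 409); rethrow anything else.
    if ((err as { status?: number }).status !== 409) throw err;
  }
}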
Expected output: chunkText("short") returns one chunk with a defined UUID and metadata.policyName === "Unknown". chunkText("") returns an empty array.
Step 7: Build the memory service
src/services/memory.ts wraps @reaatech/agent-memory-storage for persistence and @reaatech/agent-memory-retrieval for semantic search. In development, it falls back to InMemoryMemoryStorage so the app runs without a live PostgreSQL instance.
getRelevantMemories() uses both SEMANTIC and RECENCY strategies with a diversityFactor of 0.3 to balance relevance with variety in retrieved memories. injectContext() formats the retrieved memories as a context block that gets injected into Grok’s system prompt.
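The formatting half of injectContext() is mostly string work. A sketch, assuming Memory exposes content and type fields; check the actual interface in @reaatech/agent-memory-core before relying on it.
ts
import type { Memory } from "@reaatech/agent-memory-core";

// Formats retrieved memories into the block injected into Grok's system prompt.
export function injectContext(memories: Memory[]): string {
  if (memories.length === 0) return "";
  const lines = memories.map((m) => `- [${String(m.type)}] ${m.content}`); // field names assumed
  return `<relevant_memories>\n${lines.join("\n")}\n</relevant_memories>`;
}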
Expected output: getMemories("tenant-1", "nonexistent-owner") returns an empty array. getRelevantMemories() without an embedder throws "EmbeddingProvider required for retrieval".
Step 8: Build the chat service
src/services/chat.ts orchestrates the full pipeline. On each question, it retrieves relevant memories, formats them as context, calls Grok with the system prompt plus context, and validates the response with PolicyAnswerSchema.parse().
ts
import { randomUUID } from "crypto";
import { PolicyAnswerSchema } from "../types.js";
import type {
  EmployeeContext,
  ChatResponse,
  IngestResult,
} from "../types.js";
import {
  MemoryType,
  MemoryImportance,
  MemorySource,
  MemoryLifecycle,
} from "@reaatech/agent-memory-core";
import type { Memory } from "@reaatech/agent-memory-core";
import type { MemoryService } from "./memory.js";
import type { GrokClient } from "../lib/grok.js";
import type { PoliciesService } from "./policies.js";

// System prompt sent to Grok on every question (constant name assumed).
const SYSTEM_PROMPT =
  "You are an HR policy assistant. Answer based ONLY on the provided policy context. If the answer is not in the context, say 'I don't have that information.' Be concise and reference specific policy sections.";
The catch block returns a safe fallback on any error — whether that’s a Grok API timeout, malformed JSON, or a Zod validation failure. Grok is asked to return JSON matching PolicyAnswerSchema, so the parsing step validates the structure before the response goes back to the client.
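A sketch of that flow, reusing the imports from the excerpt above; the helper name, the ChatResponse field names, and the fallback copy are illustrative rather than the repo's literal code.
ts
// Call Grok, validate the reply, and fall back safely on any failure.
async function answerSafely(
  grok: GrokClient,
  systemPrompt: string,
  context: string,
  question: string,
  sessionId: string
): Promise<ChatResponse> {
  try {
    const raw = await grok.chatWithContext(systemPrompt, context, question); // may throw on timeout
    const parsed = PolicyAnswerSchema.parse(JSON.parse(raw)); // throws on bad JSON or schema mismatch
    return { sessionId, answer: parsed.answer, sources: parsed.sources };
  } catch {
    return {
      sessionId,
      answer: "I wasn't able to answer that right now. Please try again or contact HR.",
      sources: [],
    };
  }
}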
Expected output: When Grok returns { "answer": "...", "sources": [...] } as a JSON string, answer() returns it with the correct sessionId. When Grok throws or returns non-JSON, answer() returns the safe fallback string and an empty sources array.
Step 9: Wire up JWT authentication
src/lib/auth.ts verifies employee session tokens using jose. Tokens are signed with HMAC-SHA256 using the secret from JWT_SECRET.
ts
import { jwtVerify } from "jose";
import type { EmployeeContext } from "../types.js";

export async function verifyToken(token: string): Promise<EmployeeContext | null> {
  try {
    const secret = process.env.JWT_SECRET;
    if (!secret) return null;
    const { payload } = await jwtVerify(token, new TextEncoder().encode(secret));
    const employeeId = payload.employeeId as string | undefined;
    const tenantId = payload.tenantId as string | undefined;
    const role = payload.role as string | undefined;
    const tenure = payload.tenure as string | undefined;
    const name = payload.name as string | undefined;
    if (!employeeId || !tenantId || !role || !tenure || !name) return null;
    return { employeeId, tenantId, role, tenure, name };
  } catch {
    return null;
  }
}

export function extractBearerToken(authHeader: string | null): string | null {
  if (!authHeader) return null;
  const parts = authHeader.split(" ");
  if (parts.length !== 2 || parts[0] !== "Bearer") return null;
  return parts[1] || null;
}
verifyToken() returns null on any failure — expired token, wrong signature, malformed payload — and the route handlers treat null as a 401 response. The five required claims (employeeId, tenantId, role, tenure, name) are all validated before the function returns an EmployeeContext.
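For local testing it helps to mint a token that verifyToken() will accept. A sketch using jose's SignJWT; this helper is not part of the recipe's codebase, and the claim values are examples.
ts
import { SignJWT } from "jose";

// Mints an HS256 token carrying the five claims verifyToken() requires.
export async function mintTestToken(secret: string): Promise<string> {
  return new SignJWT({
    employeeId: "emp-123",
    tenantId: "tenant-1",
    role: "engineer",
    tenure: "2 years",
    name: "Alex Example",
  })
    .setProtectedHeader({ alg: "HS256" })
    .setIssuedAt()
    .setExpirationTime("1h")
    .sign(new TextEncoder().encode(secret));
}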
Expected output: A valid JWT with all five claims returns a full EmployeeContext. An expired or malformed token returns null. A missing JWT_SECRET env variable returns null without calling jwtVerify.
Step 10: Write the API route handler
Create the route handler at src/app/api/hr/chat/route.ts. It validates the request body with ChatMessageSchema, extracts and verifies the bearer token, and delegates to ChatService.answer(). Note that the relative import paths go up four levels from src/app/api/hr/chat/ to reach src/.
ts
import { NextRequest, NextResponse } from "next/server";
import { ChatMessageSchema } from "../../../../types.js";
import { extractBearerToken, verifyToken } from "../../../../lib/auth.js";
import { ChatService } from "../../../../services/chat.js";
import { MemoryService } from "../../../../services/memory.js";
import { GrokClient } from "../../../../lib/grok.js";
import { PoliciesService } from "../../../../services/policies.js";
import { createEmbedder } from "../../../../services/embedding.js";

const embedder = createEmbedder();
const memoryService = new MemoryService(embedder);
const grokClient = new GrokClient();
const policiesService = new PoliciesService(embedder);
const chatService = new ChatService(memoryService, grokClient, policiesService);

export async function POST(request: NextRequest): Promise<NextResponse> {
  try {
    const body = await request.json();
    const parsed = ChatMessageSchema.safeParse(body);
    if (!parsed.success) {
      return NextResponse.json({ error: "Invalid request body" }, { status: 400 });
    }
    const token = extractBearerToken(request.headers.get("Authorization"));
    if (!token) {
      return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
    }
    const employeeContext = await verifyToken(token);
    if (!employeeContext) {
      return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
    }
    const response = await chatService.answer(employeeContext, parsed.data.message, parsed.data.sessionId);
    return NextResponse.json(response, { status: 200 });
  } catch {
    return NextResponse.json({ error: "Internal error" }, { status: 500 });
  }
}
Services are instantiated at module level so they persist across requests in the Next.js server runtime. The handler uses named exports (POST) and returns NextResponse.json(...) for every response — this sets the correct content-type: application/json header automatically.
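Once the dev server from the final step is running, a quick smoke test of this route can look like the following. The JWT is one you minted yourself (for example with the helper from Step 9), and the exact answer depends on what you have ingested.
terminal
curl -X POST http://localhost:3000/api/hr/chat \
  -H "Authorization: Bearer $TEST_JWT" \
  -H "Content-Type: application/json" \
  -d '{"employeeId": "emp-123", "message": "How much PTO do I accrue per year?"}'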
Expected output: A valid POST to /api/hr/chat with a valid JWT and body { employeeId, message } returns 200 with a ChatResponse. Missing or invalid JWT returns 401. Malformed body returns 400. Internal errors return 500 with no leaked details.
Step 11: Add the observability hook
src/lib/observability.ts initializes the console logger for the memory stack. src/instrumentation.ts is the Next.js instrumentation hook — it runs once on server startup and guards Node-only imports with NEXT_RUNTIME === "nodejs".
ts
// src/lib/observability.ts
import { setLogger, getLogger } from "@reaatech/agent-memory-core";

export function initLogger(): void {
  setLogger(console);
}

export { getLogger };
experimental.instrumentationHook: true in next.config.ts activates this hook. The dynamic import() inside register() ensures Edge runtime code paths never attempt to load Node-only modules.
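A minimal sketch of the hook itself, following the guard described above; the only assumption is that initLogger() is all the startup work you need.
ts
// src/instrumentation.ts (sketch): runs once when the server starts.
export async function register(): Promise<void> {
  if (process.env.NEXT_RUNTIME === "nodejs") {
    // The dynamic import keeps Node-only modules out of Edge bundles.
    const { initLogger } = await import("./lib/observability.js");
    initLogger();
  }
}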
Expected output: pnpm build succeeds. The instrumentation hook initializes logging without throwing in a production build.
Step 12: Run the tests
The test suite lives in tests/ and mirrors the source layout — one test file per module, with happy path, error path, and boundary cases for each.
terminal
pnpm test
All external HTTP calls (xAI, OpenAI embeddings, Qdrant) are intercepted by MSW handlers in tests/setup.ts. The test setup configures onUnhandledRequest: "error" so any accidental real network call fails the test immediately.
Key test patterns:
vi.mock("openai") — intercepts the Grok client’s HTTP calls
vi.mock("@qdrant/js-client-rest") — provides a fake QdrantClient that responds without network I/O
vi.mock("fastembed") — provides a mocked FlagEmbedding that resolves immediately
Expected output: pnpm test produces vitest-report.json with numFailedTests === 0.
Step 13: Run it locally
Connect to PostgreSQL and enable the pgvector extension:
terminal
psql -h localhost -U postgres -d agent_memory -c "CREATE EXTENSION IF NOT EXISTS vector;"
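If Qdrant or PostgreSQL is not running yet, one way to start both locally before the psql step above is with Docker. The image tags here are suggestions, not something this recipe pins.
terminal
docker run -d --name qdrant -p 6333:6333 qdrant/qdrant
docker run -d --name hr-postgres -p 5432:5432 \
  -e POSTGRES_PASSWORD=<your-db-password> -e POSTGRES_DB=agent_memory \
  pgvector/pgvector:pg16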
Then start the Next.js dev server:
terminal
pnpm dev
Expected output: Next.js starts on port 3000. Qdrant responds at http://localhost:6333. PostgreSQL accepts connections on port 5432.
Next steps
Ingest a handbook: Place a PDF at policies/handbook.pdf and call POST /api/hr/chat to index it into Qdrant via the chat service’s ingestHandbook() method.
Add streaming responses: Use GrokClient.streamChat() to yield tokens as they arrive from Grok instead of waiting for the full response.
Switch to Postgres storage: Set usePostgres = true in MemoryService constructor and configure DB_HOST, DB_PORT, DB_NAME, DB_USER, DB_PASSWORD in .env for persistent cross-session employee memory.
Add role-based filtering: Use the role field in PolicyChunk.metadata to filter retrieved policy chunks by the employee’s role before injecting them into Grok’s context.