Small businesses store SOPs, policies, and tribal knowledge in Confluence, but employees waste hours searching across spaces. The built‑in search is keyword‑based and misses the context of real questions.
This tutorial walks you through building an OpenAI-powered knowledge agent that answers questions about your Confluence wiki content. You’ll create a Next.js API that crawls Confluence spaces, converts pages to Markdown, embeds them into a Qdrant vector store, and responds to natural-language questions using OpenAI’s GPT models — with semantic caching, session continuity, confidence-based routing, and Langfuse observability along the way.
It uses the REAA stack (confidence-router, llm-cache, session-continuity, agent-memory-core, agent-handoff, agents-markdown) plus Qdrant for vector search, Zod for config validation, and MSW for testing.
This tutorial is for TypeScript developers familiar with Next.js App Router and basic RAG concepts.
Prerequisites
Node.js 22+ and pnpm 10
An OpenAI API key with access to gpt-5.2 and text-embedding-3-small
A Confluence instance (Cloud or Server) with an API token
A Qdrant instance (local via Docker, or cloud at cloud.qdrant.io)
A Langfuse account (optional — observability is a no-op when credentials are absent)
A Next.js App Router project with these dependencies already installed:
Step 1: Configure environment variables
Create a .env.example with placeholder values for every integration. Each variable maps to an external service the agent talks to.
```env
# Env vars used by openai-knowledge-agent-for-confluence-smb-internal-wiki.
# The builder adds entries here as it wires up each integration.
# Keep placeholders only; never commit real values.
NODE_ENV=development
OPENAI_API_KEY=<your-openai-api-key>
CONFLUENCE_BASE_URL=<https://your-instance.atlassian.net/wiki>
CONFLUENCE_USERNAME=<your-confluence-username>
CONFLUENCE_API_TOKEN=<your-confluence-api-token>
CONFLUENCE_SPACE_KEYS=<comma-separated-space-keys>
QDRANT_URL=<http://127.0.0.1:6333>
QDRANT_API_KEY=<your-qdrant-api-key>
LANGFUSE_PUBLIC_KEY=<your-langfuse-public-key>
LANGFUSE_SECRET_KEY=<your-langfuse-secret-key>
LANGFUSE_HOST=<https://cloud.langfuse.com>
```
Copy this to a local .env file and fill in real values. The Confluence API token is generated from your Atlassian account settings (not your password). The CONFLUENCE_SPACE_KEYS field accepts comma-separated values like ENG,HR,OPS.
Step 2: Create the typed configuration with Zod
Create src/lib/config.ts — a Zod schema that validates every environment variable at import time. This catches misconfiguration immediately rather than failing mid-request.
The schema uses z.url() for URLs (rejects strings like localhost without a protocol) and transforms the comma-separated space keys into a cleaned array. Any missing or invalid variable throws a ConfigError with a descriptive message listing every issue.
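The sketch below shows the same fail-fast behaviour without any dependencies. It checks every variable, collects every issue, and throws a single ConfigError. It is an illustration under assumptions, not the Zod schema from the recipe. loadConfig, AppConfig, EnvLike, and isHttpUrl are hypothetical names, and only a subset of the variables is covered. It is also a little stricter than a plain URL check: it accepts only http and https URLs.

```typescript
// Hypothetical, dependency-free stand-in for the Zod schema in src/lib/config.ts.
// Same idea: check every variable, collect every issue, throw one ConfigError.
// Only a subset of the variables is covered here.

export class ConfigError extends Error {
  readonly issues: string[];

  constructor(issues: string[]) {
    super(`Invalid configuration:\n${issues.map((issue) => `  - ${issue}`).join("\n")}`);
    this.name = "ConfigError";
    this.issues = issues;
  }
}

export interface AppConfig {
  openaiApiKey: string;
  confluenceBaseUrl: string;
  confluenceSpaceKeys: string[];
  qdrantUrl: string;
}

type EnvLike = Record<string, string | undefined>;

// Accept only absolute http(s) URLs, so "localhost:6333" is rejected.
function isHttpUrl(value: string): boolean {
  try {
    const url = new URL(value);
    return url.protocol === "http:" || url.protocol === "https:";
  } catch {
    return false;
  }
}

// "ENG, HR,,OPS " -> ["ENG", "HR", "OPS"]
export function parseSpaceKeys(raw: string): string[] {
  return raw
    .split(",")
    .map((key) => key.trim())
    .filter((key) => key.length > 0);
}

export function loadConfig(env: EnvLike): AppConfig {
  const issues: string[] = [];

  const required = (name: string): string => {
    const value = env[name]?.trim() ?? "";
    if (value === "") issues.push(`${name} is required`);
    return value;
  };

  const httpUrl = (name: string): string => {
    const value = required(name);
    if (value !== "" && !isHttpUrl(value)) {
      issues.push(`${name} must be an http(s) URL, got "${value}"`);
    }
    return value;
  };

  const openaiApiKey = required("OPENAI_API_KEY");
  const confluenceBaseUrl = httpUrl("CONFLUENCE_BASE_URL");
  const qdrantUrl = httpUrl("QDRANT_URL");
  const rawSpaceKeys = required("CONFLUENCE_SPACE_KEYS");
  const confluenceSpaceKeys = parseSpaceKeys(rawSpaceKeys);
  if (rawSpaceKeys !== "" && confluenceSpaceKeys.length === 0) {
    issues.push("CONFLUENCE_SPACE_KEYS must list at least one space key");
  }

  // Report every problem at once instead of stopping at the first.
  if (issues.length > 0) throw new ConfigError(issues);

  return { openaiApiKey, confluenceBaseUrl, confluenceSpaceKeys, qdrantUrl };
}
```

Calling loadConfig once at module scope (for example with process.env) reproduces the "fail at import time" behaviour that the Zod version gives you.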
Step 3: Build the OpenAI client
Create src/lib/openai-client.ts — the gateway for generating answers (using the Responses API) and embedding text. Both functions are wrapped with withRetry from @reaatech/agent-memory-core to handle transient failures.
Expected output: An OpenAI client singleton with two retry-safe functions. generateAnswer uses the Responses API (gpt-5.2) and generateEmbedding produces 1536-dimensional vectors.
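The sketch below shows those two functions, assuming the official openai Node SDK. In that SDK, client.responses.create({ model, instructions, input }) resolves to a response with an output_text field, and client.embeddings.create({ model, input }) returns vectors under data[i].embedding.

To keep the sketch self-contained and testable:

- The client is passed in as a parameter typed by a small structural interface, OpenAILike (hypothetical).
- A simple local retry helper with exponential backoff stands in for withRetry from @reaatech/agent-memory-core, whose options are not shown here.

```typescript
// Structural subset of the openai SDK client used below (hypothetical interface).
export interface OpenAILike {
  responses: {
    create(args: {
      model: string;
      instructions?: string;
      input: string;
    }): Promise<{ output_text: string }>;
  };
  embeddings: {
    create(args: {
      model: string;
      input: string;
    }): Promise<{ data: Array<{ embedding: number[] }> }>;
  };
}

// Hypothetical stand-in for withRetry: retry any error with exponential backoff.
export async function retry<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 200,
): Promise<T> {
  let attempt = 0;
  for (;;) {
    try {
      return await fn();
    } catch (error) {
      if (attempt >= maxRetries) throw error;
      const delay = baseDelayMs * 2 ** attempt; // 200, 400, 800 ms...
      attempt += 1;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

export async function generateAnswer(
  client: OpenAILike,
  instructions: string,
  input: string,
): Promise<string> {
  const response = await retry(() =>
    client.responses.create({ model: "gpt-5.2", instructions, input }),
  );
  return response.output_text;
}

export async function generateEmbedding(
  client: OpenAILike,
  text: string,
): Promise<number[]> {
  const response = await retry(() =>
    client.embeddings.create({ model: "text-embedding-3-small", input: text }),
  );
  const first = response.data[0];
  if (first === undefined) throw new Error("No embedding returned");
  // text-embedding-3-small returns 1536 dimensions by default.
  return first.embedding;
}
```

In the real module the SDK client (created once as a singleton) would fill the OpenAILike role. Depending on SDK typings, you may need a thin adapter.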
Step 4: Build the Confluence crawler
Create src/lib/confluence-client.ts — a REST client that iterates every page in the configured Confluence spaces using basic auth and the body.storage expand to get HTML content.
```ts
// src/lib/confluence-client.ts
import { NodeHtmlMarkdown } from "node-html-markdown";
import { config } from "./config.js";

// Basic auth header value: base64("username:apiToken").
const auth = Buffer.from(
  `${config.confluenceUsername}:${config.confluenceApiToken}`,
).toString("base64");

export interface ConfluencePage {
  id: string;
  title: string;
  spaceKey: string;
  body: string;
}

export class ConfluenceAuthError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ConfluenceAuthError";
  }
}

// One converter instance, reused for every page.
const nhm = new NodeHtmlMarkdown();

export function htmlToMarkdown(html: string): string {
  return nhm.translate(html);
}

export async function fetchAllPages(
  spaceKeys: string[],
): Promise<ConfluencePage[]> {
  const pages: ConfluencePage[] = [];

  for (const spaceKey of spaceKeys) {
    let nextUrl: string | null =
      `${config.confluenceBaseUrl}/rest/api/content?spaceKey=${encodeURIComponent(spaceKey)}&expand=body.storage&limit=50`;

    while (nextUrl) {
      const response = await fetch(nextUrl, {
        headers: {
          Authorization: `Basic ${auth}`,
          Accept: "application/json",
        },
      });

      if (response.status === 401 || response.status === 403) {
        throw new ConfluenceAuthError(
          `Authentication failed for Confluence at ${config.confluenceBaseUrl}`,
        );
      }
      if (!response.ok) {
        throw new Error(
          `Confluence API error: ${String(response.status)} ${response.statusText}`,
        );
      }

      const data = (await response.json()) as {
        results: Array<{
          id: string;
          title: string;
          space?: { key: string };
          body?: { storage?: { value: string } };
        }>;
        _links?: { next?: string };
      };

      for (const result of data.results) {
        pages.push({
          id: result.id,
          title: result.title,
          spaceKey,
          body: result.body?.storage?.value ?? "",
        });
      }

      // Follow the pagination link until Confluence stops returning one.
      nextUrl = data._links?.next
        ? `${config.confluenceBaseUrl}${data._links.next}`
        : null;
    }
  }

  return pages;
}
```
The NodeHtmlMarkdown instance is created once at module scope and reused to convert HTML page bodies into clean Markdown — the format that the validation and chunking steps expect.
Step 5: Validate and chunk markdown content
Create two modules that prepare ingested content for vector storage.
src/lib/markdown-validator.ts — validates that a page’s content is usable before you spend tokens on embedding it:
```ts
// src/lib/markdown-validator.ts
import type { ValidationResult, Finding } from "@reaatech/agents-markdown";

export function validatePageContent(
  markdown: string,
  pageId: string,
): ValidationResult {
  const errors: Finding[] = [];
  const warnings: Finding[] = [];
  const suggestions: Finding[] = [];

  // Hard failure: nothing worth embedding.
  if (!markdown || markdown.trim().length === 0) {
    errors.push({
      rule: "empty-content",
      severity: "error",
      message: "Page content is empty or contains only whitespace",
      autoFixable: false,
    });
  }

  // Soft checks: flag the page but still allow ingestion.
  if (!markdown.startsWith("---\n")) {
    warnings.push({
      rule: "missing-frontmatter",
      severity: "warning",
      message: "Page content is missing frontmatter (does not start with ---\\n)",
      autoFixable: false,
    });
  }
  if (markdown.trim().length < 10) {
    warnings.push({
      rule: "very-short-content",
      severity: "warning",
      message: "Page content is very short (less than 10 characters)",
      autoFixable: false,
    });
  }

  return {
    valid: errors.length === 0,
    type: "skill",
    path: pageId,
    errors,
    warnings,
    suggestions,
  };
}
```
src/lib/chunker.ts — splits validated markdown into smaller pieces that fit within embedding context windows:
```ts
// src/lib/chunker.ts
export function chunkDocument(
  markdown: string,
  maxTokens: number = 512,
): string[] {
  const paragraphs = markdown.split("\n\n");
  const chunks: string[] = [];
  let currentChunk: string[] = [];
  let currentTokens = 0;

  for (const paragraph of paragraphs) {
    // Rough estimate: about 4 characters per token.
    const paragraphTokens = Math.ceil(paragraph.length / 4);

    // Flush the current chunk before adding this paragraph would exceed the budget.
    if (currentTokens + paragraphTokens > maxTokens && currentChunk.length > 0) {
      chunks.push(currentChunk.join("\n\n"));
      currentChunk = [];
      currentTokens = 0;
    }

    currentChunk.push(paragraph);
    currentTokens += paragraphTokens;
  }

  if (currentChunk.length > 0) {
    chunks.push(currentChunk.join("\n\n"));
  }

  // Drop chunks that contain only whitespace.
  return chunks.filter((chunk) => chunk.trim().length > 0);
}
```
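The chunker splits on paragraph boundaries (\n\n) and uses approximate token counting (text.length / 4). It recombines paragraphs until the next one would push the chunk past the maxTokens limit. A single paragraph longer than maxTokens is never split, so it becomes one oversized chunk on its own.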
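Because chunkDocument only breaks between paragraphs, one very long paragraph can produce a chunk above maxTokens. If you want a hard cap, one option is a post-processing pass like the hypothetical enforceMaxTokens below; it is not part of the recipe.

- Chunks that already fit are kept unchanged.
- Oversized chunks are re-packed word by word, using the same length / 4 estimate.
- Words are rejoined with single spaces, so line breaks inside an oversized chunk are lost.

```typescript
// Hypothetical post-processing step for chunkDocument output.
// Uses the chunker's rough estimate: 1 token ~= 4 characters.
export function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

export function enforceMaxTokens(chunks: string[], maxTokens = 512): string[] {
  const maxChars = maxTokens * 4;
  const result: string[] = [];

  for (const chunk of chunks) {
    if (estimateTokens(chunk) <= maxTokens) {
      result.push(chunk);
      continue;
    }

    // Greedily pack words into pieces of at most maxChars characters.
    let piece = "";
    for (const word of chunk.split(/\s+/)) {
      if (word === "") continue;
      const candidate = piece === "" ? word : `${piece} ${word}`;
      if (candidate.length <= maxChars) {
        piece = candidate;
        continue;
      }
      if (piece !== "") result.push(piece);

      // A single word longer than maxChars is cut into fixed-size slices.
      let rest = word;
      while (rest.length > maxChars) {
        result.push(rest.slice(0, maxChars));
        rest = rest.slice(maxChars);
      }
      piece = rest;
    }
    if (piece !== "") result.push(piece);
  }

  return result;
}
```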
Step 6: Build the Qdrant vector store adapter
Create src/lib/vector-store.ts — the adapter that manages a Qdrant collection for storing and searching document chunks by their embeddings.
The adapter provides four operations: creating a client, ensuring a collection exists (creates it if missing), upserting chunk embeddings with metadata, and searching by vector similarity. It uses randomId() from @reaatech/agents-markdown to generate point IDs.
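Here is a sketch of those operations, assuming the collectionExists, createCollection, upsert, and search methods of @qdrant/js-client-rest.

Several parts are assumptions:

- The client is typed through a narrow structural interface, QdrantLike (hypothetical), so the logic can be exercised with a fake.
- The ChunkPayload shape and the function signatures are hypothetical.
- Point IDs come from crypto.randomUUID() instead of randomId(). Qdrant only accepts unsigned integers or UUIDs as point IDs, so whatever generator you use must produce one of those.

```typescript
// Hypothetical metadata stored alongside each chunk.
export interface ChunkPayload {
  pageId: string;
  title: string;
  spaceKey: string;
  text: string;
}

// Narrow structural view of the @qdrant/js-client-rest methods used here.
export interface QdrantLike {
  collectionExists(name: string): Promise<{ exists: boolean }>;
  createCollection(
    name: string,
    opts: { vectors: { size: number; distance: "Cosine" } },
  ): Promise<unknown>;
  upsert(
    name: string,
    opts: {
      wait: boolean;
      points: Array<{ id: string; vector: number[]; payload: ChunkPayload }>;
    },
  ): Promise<unknown>;
  search(
    name: string,
    opts: { vector: number[]; limit: number; with_payload: boolean },
  ): Promise<Array<{ id: string | number; score: number; payload?: unknown }>>;
}

const VECTOR_SIZE = 1536; // text-embedding-3-small default dimensions

// Create the collection only if it does not exist yet.
export async function ensureCollection(client: QdrantLike, name: string): Promise<void> {
  const { exists } = await client.collectionExists(name);
  if (!exists) {
    await client.createCollection(name, {
      vectors: { size: VECTOR_SIZE, distance: "Cosine" },
    });
  }
}

export async function upsertChunks(
  client: QdrantLike,
  name: string,
  chunks: Array<{ vector: number[]; payload: ChunkPayload }>,
): Promise<void> {
  if (chunks.length === 0) return;
  await client.upsert(name, {
    wait: true,
    points: chunks.map((c) => ({
      id: crypto.randomUUID(),
      vector: c.vector,
      payload: c.payload,
    })),
  });
}

// Return the best-matching chunks with their similarity scores.
export async function searchChunks(
  client: QdrantLike,
  name: string,
  vector: number[],
  limit = 5,
): Promise<Array<{ score: number; payload: ChunkPayload }>> {
  const hits = await client.search(name, { vector, limit, with_payload: true });
  return hits
    .filter((hit) => hit.payload != null)
    .map((hit) => ({ score: hit.score, payload: hit.payload as ChunkPayload }));
}
```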
Step 7: Wire up LLM response caching
Create src/lib/cache.ts — an in-memory semantic cache that avoids regenerating answers for queries the agent has already seen.
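The cache uses the same embedding model as your ingestion pipeline. The semantic similarity threshold of 0.8 is meant to let conceptually similar questions (like “What is the time-off policy?” vs “How many PTO days do we get?”) match the same cached entry. Whether a given pair actually clears 0.8 depends on the embedding model, so tune the threshold against real queries from your team.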
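A dependency-free sketch of how a semantic cache lookup works is below. It assumes an injected embed function and cosine similarity. The SemanticCache class is hypothetical and is not the API of @reaatech/llm-cache. It only illustrates the threshold check: a cached answer is reused when the new query's embedding is at least 0.8 cosine-similar to a stored query.

```typescript
export type Embed = (text: string) => Promise<number[]>;

export function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface CacheEntry {
  query: string;
  embedding: number[];
  answer: string;
}

// Hypothetical in-memory semantic cache using a linear scan over entries.
export class SemanticCache {
  private entries: CacheEntry[] = [];
  private readonly embed: Embed;
  private readonly threshold: number;

  constructor(embed: Embed, threshold = 0.8) {
    this.embed = embed;
    this.threshold = threshold;
  }

  // Return the most similar cached answer if it meets the threshold, else null.
  async lookup(query: string): Promise<string | null> {
    const embedding = await this.embed(query);
    let best: CacheEntry | null = null;
    let bestScore = -Infinity;
    for (const entry of this.entries) {
      const score = cosineSimilarity(embedding, entry.embedding);
      if (score > bestScore) {
        best = entry;
        bestScore = score;
      }
    }
    return best !== null && bestScore >= this.threshold ? best.answer : null;
  }

  async store(query: string, answer: string): Promise<void> {
    const embedding = await this.embed(query);
    this.entries.push({ query, embedding, answer });
  }
}
```

This sketch embeds the query again on store. A production version would reuse the embedding computed during the lookup and would bound the number of entries.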
Step 8: Implement session storage and management
The agent needs conversation memory. Create an in-memory adapter that implements the IStorageAdapter interface from @reaatech/session-continuity.
src/lib/session-store.ts — a full IStorageAdapter implementation using Map<string, Session> and Map<string, Message[]>. This file is ~270 lines with 13 methods; the excerpt below shows the core structure. The complete file includes full implementations of updateSession (with optimistic concurrency), listSessions (with filtering), getMessages (with ordering and pagination), updateMessage, deleteMessage, deleteAllMessages, getExpiredSessions, health, and close:
```ts
// src/lib/session-store.ts
import {
  type IStorageAdapter,
  type TokenCounter,
  type Session,
  type Message,
  type UpdateSessionOptions,
  ConcurrencyError,
} from "@reaatech/session-continuity";

export class InMemoryStorageAdapter implements IStorageAdapter {
  private sessions: Map<string, Session> = new Map();
  private messages: Map<string, Message[]> = new Map();

  createSession(
    session: Omit<Session, "id" | "createdAt" | "lastActivityAt">,
  ): Promise<Session> {
    const id = crypto.randomUUID();
    const now = new Date();
    const newSession: Session = {
      ...session,
      id,
      createdAt: now,
      lastActivityAt: now,
      version: 1,
    };
    this.sessions.set(id, newSession);
    this.messages.set(id, []);
    return Promise.resolve(newSession);
  }

  getSession(id: string): Promise<Session | null> {
    return Promise.resolve(this.sessions.get(id) ?? null);
  }

  updateSession(
    id: string,
    updates: Partial<Session>,
    options?: UpdateSessionOptions,
  ): Promise<Session> {
    const existing = this.sessions.get(id);
    if (!existing) {
      // Reject rather than throw, so callers always get a rejected promise.
      return Promise.reject(new Error(`Session not found: ${id}`));
    }
    // Optimistic concurrency: fail if another writer already bumped the version.
    if (
      options?.expectedVersion !== undefined &&
      existing.version !== undefined &&
      existing.version !== options.expectedVersion
    ) {
      return Promise.reject(
        new ConcurrencyError(id, options.expectedVersion, existing.version),
      );
    }
    const updated: Session = {
      ...existing,
      ...updates,
      // Identity fields cannot be overwritten by updates.
      id: existing.id,
      createdAt: existing.createdAt,
      version: (existing.version ?? 0) + 1,
    };
    this.sessions.set(id, updated);
    return Promise.resolve(updated);
  }

  // ... deleteSession, listSessions, addMessage, getMessages,
  // updateMessage, deleteMessage, deleteAllMessages,
  // getExpiredSessions, health, close
}

export class SimpleTokenCounter implements TokenCounter {
  readonly model = "simple";
  readonly tokenizer = "character-count";

  // About 4 characters per token, the same heuristic as the chunker.
  count(text: string): number {
    return Math.ceil(text.length / 4);
  }

  countMessages(messages: Message[]): number {
    let total = 0;
    for (const msg of messages) {
      if (typeof msg.content === "string") {
        total += this.count(msg.content);
      } else if (Array.isArray(msg.content)) {
        // Multi-part content: count only the text parts.
        for (const part of msg.content) {
          if (part.type === "text") {
            total += this.count(part.text);
          }
        }
      }
    }
    return total;
  }
}
```
src/lib/session-manager.ts — wraps SessionManager from @reaatech/session-continuity with the in-memory adapter and token counter:
```ts
// src/lib/session-manager.ts
import {
  SessionManager,
  type Session,
  type Message,
} from "@reaatech/session-continuity";
import {
  InMemoryStorageAdapter,
  SimpleTokenCounter,
} from "./session-store.js";

const storage = new InMemoryStorageAdapter();
const tokenCounter = new SimpleTokenCounter();

export const sessionManager = new SessionManager({
  storage,
  tokenCounter,
  tokenBudget: {
    maxTokens: 4096,
    reserveTokens: 500,
    overflowStrategy: "compress",
  },
  compression: {
    strategy: "sliding_window",
    targetTokens: 3500,
  },
});

// Reuse the given session if it exists, else the user's first listed session,
// else create a new one.
export async function getOrCreateSession(
  userId: string,
  sessionId?: string,
): Promise<Session> {
  if (sessionId) {
    const existing = await storage.getSession(sessionId);
    if (existing) return existing;
  }
  const sessions = await storage.listSessions({ userId });
  if (sessions.length > 0) {
    return sessions[0];
  }
  return sessionManager.createSession({ userId });
}

export async function addMessage(
  sessionId: string,
  role: string,
  content: string,
): Promise<Message> {
  return sessionManager.addMessage(sessionId, {
    role: role as "user" | "assistant" | "system" | "tool",
    content,
  });
}

export async function getContext(sessionId: string): Promise<Message[]> {
  return sessionManager.getConversationContext(sessionId);
}
```
The session manager compresses conversation history when it exceeds 4096 tokens using a sliding window strategy that keeps the most recent 3500 tokens.
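To make the sliding-window idea concrete, the sketch below keeps the newest messages until a token target is reached. It uses the same length / 4 estimate as SimpleTokenCounter. The slidingWindow and fitToBudget functions and the ChatMessage type are hypothetical. The compression in @reaatech/session-continuity may behave differently, for example in how it treats system messages or uses the 500 reserved tokens.

```typescript
export interface ChatMessage {
  role: "user" | "assistant" | "system" | "tool";
  content: string;
}

const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Hypothetical sliding window: walk backwards from the newest message and keep
// messages while the running total stays within targetTokens.
export function slidingWindow(
  messages: ChatMessage[],
  targetTokens = 3500,
): ChatMessage[] {
  const kept: ChatMessage[] = [];
  let total = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i].content);
    if (total + cost > targetTokens) break;
    kept.push(messages[i]);
    total += cost;
  }
  return kept.reverse(); // restore chronological order
}

// Only compress once the whole history exceeds the budget.
export function fitToBudget(
  messages: ChatMessage[],
  maxTokens = 4096,
  targetTokens = 3500,
): ChatMessage[] {
  const total = messages.reduce((sum, m) => sum + estimateTokens(m.content), 0);
  return total > maxTokens ? slidingWindow(messages, targetTokens) : messages;
}
```

For example, ten messages of 500 estimated tokens each (5000 total) exceed 4096. The window then keeps the seven newest messages, because 7 × 500 = 3500 fits the target and an eighth would not.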
src/lib/confidence-router.ts — routes queries based on intent confidence. High-confidence queries go to the RAG pipeline; mid-confidence triggers a clarification prompt; low-confidence falls back to human search escalation:
```ts
// src/lib/confidence-router.ts
import {
  ConfidenceRouter,
  type RoutingDecision,
} from "@reaatech/confidence-router";

// High confidence routes to RAG, low confidence falls back,
// and the band in between triggers a clarification prompt.
export const router = new ConfidenceRouter({
  routeThreshold: 0.8,
  fallbackThreshold: 0.3,
  clarificationEnabled: true,
});

export function routeQuery(
  predictions: Array<{ label: string; confidence: number }>,
): RoutingDecision {
  return router.decide({ predictions });
}

export async function processQuery(query: string): Promise<RoutingDecision> {
  return router.process(query);
}
```
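The three-way split described above can be sketched as a pure function. This is not the ConfidenceRouter implementation. The decide function, the Decision and Prediction types, and the ROUTE label are hypothetical, and the boundary handling (>= vs >) is an assumption.

```typescript
export interface Prediction {
  label: string;
  confidence: number;
}

export type Decision =
  | { type: "ROUTE"; label: string; confidence: number }
  | { type: "CLARIFY"; candidates: string[]; confidence: number }
  | { type: "FALLBACK"; confidence: number };

// Hypothetical threshold logic mirroring routeThreshold / fallbackThreshold.
export function decide(
  predictions: Prediction[],
  routeThreshold = 0.8,
  fallbackThreshold = 0.3,
  clarificationEnabled = true,
): Decision {
  const sorted = [...predictions].sort((a, b) => b.confidence - a.confidence);
  const top = sorted[0];
  if (!top) return { type: "FALLBACK", confidence: 0 };

  if (top.confidence >= routeThreshold) {
    return { type: "ROUTE", label: top.label, confidence: top.confidence };
  }
  if (top.confidence < fallbackThreshold || !clarificationEnabled) {
    return { type: "FALLBACK", confidence: top.confidence };
  }

  // Middle band: offer the plausible intents back to the user.
  const candidates = sorted
    .filter((p) => p.confidence >= fallbackThreshold)
    .map((p) => p.label);
  return { type: "CLARIFY", candidates, confidence: top.confidence };
}
```

In this sketch, disabling clarification sends the middle band straight to FALLBACK.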
src/lib/handoff.ts — constructs the escalation payload when the agent cannot answer:
```ts
// src/lib/handoff.ts
import {
  createHandoffConfig,
  HandoffError,
  withRetry,
  type HandoffPayload,
} from "@reaatech/agent-handoff";

export function buildHandoffPayload(context: string): HandoffPayload {
  const config = createHandoffConfig({
    routing: { minConfidenceThreshold: 0.6 },
  });

  return {
    handoffId: crypto.randomUUID(),
    sessionId: "",
    conversationId: "",
    sessionHistory: [],
    compressedContext: {
      summary: context,
      keyFacts: [],
      intents: [],
      entities: [],
      openItems: [],
      compressionMethod: "none",
      originalTokenCount: 0,
      compressedTokenCount: 0,
      compressionRatio: 1,
    },
    handoffReason: {
      type: "confidence_too_low",
      currentConfidence: 0,
      threshold: config.routing.minConfidenceThreshold,
      message: context,
    },
    userMetadata: { userId: "" },
    conversationState: {
      resolvedEntities: {},
      openQuestions: [],
      contextVariables: {},
    },
    createdAt: new Date(),
  };
}

export async function escalateToHumanSearch(query: string): Promise<string> {
  try {
    return await withRetry(
      () => {
        // Placeholder escalation: swap in a real ticket, chat, or email call.
        return Promise.resolve(
          `Your question "${query}" has been escalated to a human search agent. They will follow up with you soon.`,
        );
      },
      {
        maxRetries: 3,
        backoff: "exponential",
        baseDelayMs: 100,
        maxDelayMs: 5000,
        shouldRetry: () => false,
      },
    );
  } catch (error) {
    throw new HandoffError(
      error instanceof Error ? error.message : "Unknown error",
      "routing_error",
    );
  }
}
```
src/lib/fallback.ts — orchestrates the low-confidence decision with proper CLARIFY/FALLBACK branching:
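The branching can be sketched as follows. The sketch assumes a decision object tagged CLARIFY or FALLBACK, plus a hypothetical ROUTE tag for the happy path. handleDecision, RoutingOutcome, and the clarification wording are hypothetical. The escalate function is injected, so in the app it could be escalateToHumanSearch from handoff.ts.

```typescript
type RouteDecision =
  | { type: "ROUTE" }
  | { type: "CLARIFY"; candidates: string[] }
  | { type: "FALLBACK" };

export type RoutingOutcome =
  | { kind: "answer" } // continue to the cache lookup and RAG pipeline
  | { kind: "clarify"; message: string }
  | { kind: "escalated"; message: string };

export async function handleDecision(
  query: string,
  decision: RouteDecision,
  escalate: (query: string) => Promise<string>,
): Promise<RoutingOutcome> {
  if (decision.type === "ROUTE") {
    return { kind: "answer" };
  }

  if (decision.type === "CLARIFY") {
    const options =
      decision.candidates.length > 0
        ? ` Did you mean: ${decision.candidates.join(", ")}?`
        : "";
    return {
      kind: "clarify",
      message: `I'm not sure what you're asking about.${options}`,
    };
  }

  // FALLBACK: hand the question to a human.
  return { kind: "escalated", message: await escalate(query) };
}
```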
Create src/jobs/ingest.ts — the pipeline that crawls your Confluence spaces, converts pages to Markdown, validates and chunks them, embeds each chunk, and stores everything in Qdrant.
The pipeline handles failures gracefully: failed pages are collected in the errors array rather than aborting the entire run. If a single chunk fails to embed, it’s skipped but the rest of the page continues.
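The control flow described above can be sketched without dependencies. Per-page failures are recorded in an errors array, and a chunk whose embedding fails is skipped without abandoning its page.

The wiring is an assumption:

- Every dependency is injected through a hypothetical IngestDeps interface. You could wire in fetchAllPages, htmlToMarkdown, validatePageContent, chunkDocument, generateEmbedding, and the Qdrant upsert.
- The result shape and the decision to treat an invalid page as an error are also assumptions.

```typescript
export interface PageInput {
  id: string;
  title: string;
  spaceKey: string;
  body: string; // Confluence storage-format HTML
}

export interface ChunkPoint {
  vector: number[];
  pageId: string;
  title: string;
  spaceKey: string;
  text: string;
}

// Hypothetical dependency bundle; each field maps to a module from earlier steps.
export interface IngestDeps {
  fetchPages(spaceKeys: string[]): Promise<PageInput[]>;
  toMarkdown(html: string): string;
  validate(markdown: string, pageId: string): { valid: boolean };
  chunk(markdown: string): string[];
  embed(text: string): Promise<number[]>;
  upsert(points: ChunkPoint[]): Promise<void>;
}

export interface IngestResult {
  pagesProcessed: number;
  chunksStored: number;
  chunksSkipped: number;
  errors: Array<{ pageId: string; message: string }>;
}

export async function runIngestionSketch(
  spaceKeys: string[],
  deps: IngestDeps,
): Promise<IngestResult> {
  const result: IngestResult = { pagesProcessed: 0, chunksStored: 0, chunksSkipped: 0, errors: [] };
  // A failure here (e.g. bad credentials) still aborts the whole run.
  const pages = await deps.fetchPages(spaceKeys);

  for (const page of pages) {
    try {
      const markdown = deps.toMarkdown(page.body);
      if (!deps.validate(markdown, page.id).valid) {
        result.errors.push({ pageId: page.id, message: "validation failed" });
        continue;
      }

      const points: ChunkPoint[] = [];
      for (const text of deps.chunk(markdown)) {
        try {
          const vector = await deps.embed(text);
          points.push({ vector, pageId: page.id, title: page.title, spaceKey: page.spaceKey, text });
        } catch {
          result.chunksSkipped += 1; // skip this chunk, keep the rest of the page
        }
      }

      if (points.length > 0) await deps.upsert(points);
      result.chunksStored += points.length;
      result.pagesProcessed += 1;
    } catch (error) {
      // Record the failed page and move on to the next one.
      result.errors.push({
        pageId: page.id,
        message: error instanceof Error ? error.message : String(error),
      });
    }
  }

  return result;
}
```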
Step 12: Create the chat API route and health check
Health route at app/api/health/route.ts:
```ts
// app/api/health/route.ts
import { NextResponse } from "next/server";

// Liveness probe: returns ok without checking any dependencies.
export function GET(): NextResponse {
  return NextResponse.json({ status: "ok" });
}
```
Chat route at app/api/chat/route.ts — the main question-answering endpoint:
```ts
// app/api/chat/route.ts
import { type NextRequest, NextResponse } from "next/server";
import { config } from "../../../src/lib/config.js";
import { generateAnswer, generateEmbedding } from "../../../src/lib/openai-client.js";
import { createQdrantClient, searchChunks } from "../../../src/lib/vector-store.js";
import { routeQuery } from "../../../src/lib/confidence-router.js";
import { escalateToHumanSearch } from "../../../src/lib/handoff.js";
import { createCacheEngine, checkCache, storeCache } from "../../../src/lib/cache.js";
import { getOrCreateSession, addMessage, getContext } from "../../../src/lib/session-manager.js";
import { traceQuery } from "../../../src/lib/telemetry.js";

const qdrant = createQdrantClient();

let cacheEngine: ReturnType<typeof createCacheEngine> | null = null;

function getCacheEngine(): ReturnType<typeof createCacheEngine> {
  // ...
}

// ...

const systemInstruction = `You are a helpful assistant for ${config.confluenceBaseUrl}. Answer the question based on the following context. If you cannot answer, say so.`;

// ...
```
The route handler runs a fixed pipeline. It parses and validates the JSON body, gets or creates a session, and routes the query through the confidence router, returning early with a CLARIFY or FALLBACK response when needed. It then checks the semantic cache and returns the cached answer on a hit. On a miss, it embeds the query, searches Qdrant for relevant chunks, retrieves the conversation context, builds the augmented prompt, and generates the answer via OpenAI. Finally, it caches the response, stores the message, traces the request to Langfuse, and returns the answer with source references.
Step 13: Create the barrel export
Replace the placeholder src/index.ts with clean barrel exports:
```ts
// src/index.ts
export { runIngestion } from "./jobs/ingest.js";
export { router, routeQuery, processQuery } from "./lib/confidence-router.js";
export { createCacheEngine, checkCache, storeCache } from "./lib/cache.js";
export {
  sessionManager,
  getOrCreateSession,
  addMessage,
  getContext,
} from "./lib/session-manager.js";
```
Step 14: Set up MSW test infrastructure
Create tests/setup.ts — Mock Service Worker handlers for OpenAI, Confluence, Qdrant, and Langfuse. Every test imports this, so external HTTP calls never leave the process.
Both should exit with zero errors and zero warnings.
Next steps
Swap in a persistent backend — replace InMemoryStorageAdapter and InMemoryAdapter with Redis or SQLite for durable session and cache storage across server restarts
Add a web UI — build a chat interface using server components that calls POST /api/chat and renders the answer with source badges
Improve the confidence router — replace the hardcoded { label: "qa", confidence: 0.92 } with a real intent classifier that provides actual predictions
Schedule ingestion — wrap runIngestion() in a cron job or Next.js server action to keep the vector store in sync with Confluence changes
Add re-ranking — insert a cross-encoder (like Cohere Rerank) between Qdrant search and the LLM prompt to improve retrieval quality