A conversational AI that answers employee and customer questions by searching Notion workspaces, using Perplexity's search and REAA's hybrid RAG for context-rich, cited responses.
Small businesses store institutional knowledge in Notion but struggle to find answers quickly across scattered pages, leading to repetitive questions and lost productivity.
This recipe builds a conversational AI that answers employee and customer questions by searching your Notion workspace. When a question comes in, the pipeline retrieves the most relevant Notion page chunks using Qdrant vector search, then hands the query and context to Perplexity’s pplx-70b-online model to generate a cited answer — all while keeping your combined Perplexity and embedding costs under a configurable budget.
Perplexity API key — sign up at https://perplexity.ai and create a key from your dashboard
Qdrant instance — use the cloud service or run locally with docker run -p 6333:6333 qdrant/qdrant
Step 1: Clone the scaffold and install dependencies
The project starts from a Next.js scaffold with all packages pre-installed. Before you begin, verify that the dependencies in your package.json match the versions pinned in the scaffold.
Step 2: Configure environment variables

Copy .env.example to .env.local and fill in the values. The four required variables are NOTION_TOKEN, NOTION_DATABASE_ID, PERPLEXITY_API_KEY, and QDRANT_URL. If any required variable is absent, the config loader in src/lib/config.ts throws a descriptive error naming the missing variable. Optional variables (prefixed with <optional-...> in .env.example) are safe to leave blank.
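The loader itself is not shown in this step, so here is a minimal sketch of the pattern, assuming a `loadConfig` helper and the variable names above (the real src/lib/config.ts may differ in shape):

```typescript
// Hypothetical sketch of src/lib/config.ts: validate required env vars up front.
const REQUIRED = [
  "NOTION_TOKEN",
  "NOTION_DATABASE_ID",
  "PERPLEXITY_API_KEY",
  "QDRANT_URL",
] as const;

export interface EnvConfig {
  NOTION_TOKEN: string;
  NOTION_DATABASE_ID: string;
  PERPLEXITY_API_KEY: string;
  QDRANT_URL: string;
  QDRANT_API_KEY?: string; // optional
  BUDGET_LIMIT_USD?: number; // optional
}

export function loadConfig(env: Record<string, string | undefined> = process.env): EnvConfig {
  for (const key of REQUIRED) {
    if (!env[key]) {
      // Name the missing variable so the failure is actionable.
      throw new Error(`Missing required environment variable: ${key}`);
    }
  }
  return {
    NOTION_TOKEN: env.NOTION_TOKEN!,
    NOTION_DATABASE_ID: env.NOTION_DATABASE_ID!,
    PERPLEXITY_API_KEY: env.PERPLEXITY_API_KEY!,
    QDRANT_URL: env.QDRANT_URL!,
    QDRANT_API_KEY: env.QDRANT_API_KEY,
    BUDGET_LIMIT_USD: env.BUDGET_LIMIT_USD ? Number(env.BUDGET_LIMIT_USD) : undefined,
  };
}
```

Failing fast at startup, rather than on first use, means a missing key surfaces as one clear error instead of a confusing runtime failure deep in the pipeline.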
Step 3: Create the Notion indexer
The src/lib/notion-indexer.ts module fetches pages from your Notion database, extracts their text content, and splits them into overlapping chunks for embedding.
Import the Notion client and the pagination utilities:
ts
import {
  Client,
  collectPaginatedAPI,
  iteratePaginatedAPI,
  isFullPageOrDataSource,
  APIResponseError,
} from "@notionhq/client";
import type { RichTextItemResponse } from "@notionhq/client";
The createNotionClient factory returns a configured Client instance:
ts
export function createNotionClient(token: string): Client {
  return new Client({ auth: token });
}
fetchAllDataSourcePages collects all pages from your database using collectPaginatedAPI:
extractPageContent walks the block tree of each page and concatenates plain text from paragraphs, headings, lists, and other text-bearing blocks:
ts
const TEXT_BLOCK_TYPES = new Set([
  "paragraph",
  "heading_1",
  "heading_2",
  "heading_3",
  "bulleted_list_item",
  "numbered_list_item",
  "to_do",
  "toggle",
  "quote",
  "callout",
  "code",
]);

export async function extractPageContent(client: Client, pageId: string): Promise<string> {
  const textParts: string[] = [];
  for await (const block of iteratePaginatedAPI(client.blocks.children.list, { block_id: pageId })) {
    const blockData = block as Record<string, unknown>;
    const blockType = blockData.type as string | undefined;
    if (!blockType || !TEXT_BLOCK_TYPES.has(blockType)) continue;
    const content = blockData[blockType] as Record<string, unknown> | undefined;
    if (!content) continue;
    const richText = content.rich_text as Array<Record<string, unknown>> | undefined;
    if (!richText) continue;
    const text = richText
      .map((r) => {
        const pt = r.plain_text;
        return typeof pt === "string" ? pt : "";
      })
      .join("");
    if (text) textParts.push(text);
  }
  return textParts.join("\n\n");
}
splitIntoChunks splits page text by paragraph boundaries and merges paragraphs until the chunk reaches chunkSize characters (default 1000), keeping overlap characters from the previous chunk’s tail for context continuity:
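The function itself is not shown above, so here is a minimal sketch consistent with that description (paragraph-boundary splitting, a default chunkSize of 1000, and a tail carried forward as overlap); the real implementation may differ in detail:

```typescript
// Hypothetical sketch of splitIntoChunks: merge paragraphs up to chunkSize
// characters, carrying `overlap` trailing characters into the next chunk.
export function splitIntoChunks(
  text: string,
  chunkSize = 1000,
  overlap = 100,
): string[] {
  const paragraphs = text.split(/\n\n+/).filter((p) => p.trim().length > 0);
  const chunks: string[] = [];
  let current = "";
  for (const para of paragraphs) {
    // Note: a single paragraph longer than chunkSize passes through uncut.
    if (current && current.length + para.length + 2 > chunkSize) {
      chunks.push(current);
      // Seed the next chunk with the tail of the previous one for continuity.
      current = current.slice(-overlap) + "\n\n" + para;
    } else {
      current = current ? current + "\n\n" + para : para;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

The overlap matters at retrieval time: a sentence that straddles a chunk boundary still appears intact in at least one chunk, so its embedding is not split across two vectors.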
Step 4: Create the embedding module

The src/lib/embedding.ts module wraps FastEmbed’s FlagEmbedding class with the BGEBaseEN model (768-dimensional embeddings). The model downloads once on first call and is cached in a module-level variable.
Import and initialize the model:
ts
import { EmbeddingModel, FlagEmbedding } from "fastembed";

let model: FlagEmbedding | null = null;

export async function createEmbeddingModel(): Promise<FlagEmbedding> {
  try {
    const m = await FlagEmbedding.init({ model: EmbeddingModel.BGEBaseEN });
    model = m;
    return m;
  } catch (error) {
    const msg = error instanceof Error ? error.message : String(error);
    console.error("[embedding] FlagEmbedding.init failed:", msg);
    throw error;
  }
}

export function getEmbeddingModel(): FlagEmbedding {
  if (!model) throw new Error("Embedding model not initialized. Call createEmbeddingModel() first.");
  return model;
}
generateEmbedding produces a single embedding for a query string. generateChunkEmbeddings batches passage embeddings through an async generator:
ts
export async function generateEmbedding(text: string): Promise<number[]> {
  const m = model ?? (await createEmbeddingModel());
  return m.queryEmbed(text);
}

export async function generateChunkEmbeddings(chunks: string[], batchSize?: number): Promise<number[][]> {
  const m = model ?? (await createEmbeddingModel());
  const bs = batchSize ?? 256;
  const generator = m.passageEmbed(chunks, bs);
  const result: number[][] = [];
  for await (const batch of generator) {
    result.push(...batch);
  }
  return result;
}
Step 5: Create the Qdrant vector store wrapper
The src/lib/qdrant-store.ts module wraps @reaatech/hybrid-rag-qdrant’s QdrantClientWrapper to handle collection creation, batch upserts, and vector search.
Create the wrapper with Cosine distance (recommended for text embeddings) and 768-dimensional vectors matching BGEBaseEN:
ts
import { QdrantClientWrapper } from "@reaatech/hybrid-rag-qdrant";
import type { RetrievalResult as HybridRetrievalResult } from "@reaatech/hybrid-rag";
import type { EnvConfig, NotionChunk, RetrievalResult } from "./types.js";

export async function createQdrantWrapper(config: EnvConfig): Promise<QdrantClientWrapper> {
  const wrapper = new QdrantClientWrapper({
    url: config.QDRANT_URL,
    apiKey: config.QDRANT_API_KEY,
    collectionName: "notion_chunks",
    vectorSize: 768,
    distance: "Cosine",
  });
  try {
    await wrapper.initialize();
  } catch (err) {
    throw new Error(`Qdrant initialization failed: ${String(err)}`);
  }
  return wrapper;
}
upsertChunksToQdrant maps each NotionChunk to a Qdrant point and calls upsertBatch. Empty arrays are a no-op:
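The mapping can be sketched as follows. The chunk fields and the upsertBatch point shape are assumptions inferred from the surrounding steps, not the library's documented API, so structural types stand in for NotionChunk and QdrantClientWrapper:

```typescript
// Hypothetical sketch of upsertChunksToQdrant.
interface ChunkLike {
  id: string;
  pageId: string;
  pageTitle: string;
  content: string;
  chunkIndex: number;
}

interface BatchUpserter {
  upsertBatch(
    points: Array<{ id: string; vector: number[]; payload: Record<string, unknown> }>,
  ): Promise<void>;
}

export async function upsertChunksToQdrant(
  wrapper: BatchUpserter,
  chunks: ChunkLike[],
  embeddings: number[][],
): Promise<void> {
  if (chunks.length === 0) return; // empty input is a no-op
  const points = chunks.map((chunk, i) => ({
    id: chunk.id,
    vector: embeddings[i],
    payload: {
      pageId: chunk.pageId,
      pageTitle: chunk.pageTitle,
      content: chunk.content,
      chunkIndex: chunk.chunkIndex,
    },
  }));
  await wrapper.upsertBatch(points);
}
```

Storing the chunk text and page metadata in the payload means a search hit carries everything needed to build a citation without a second lookup.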
Step 6: Create the session store

The src/lib/session-store.ts module provides the InMemorySessionStore that the answer generator uses to keep per-session conversation history. The Memory.content field stores a JSON string encoding the speaker and text, which the Perplexity client parses back out when building conversation history.
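A minimal in-memory store consistent with that description might look like this; the Memory shape and method names are assumptions beyond the JSON-encoded content field described above:

```typescript
// Hypothetical sketch of src/lib/session-store.ts.
import { randomUUID } from "node:crypto";

export interface Memory {
  content: string; // JSON: { speaker: "user" | "assistant", text: string }
  createdAt: number;
}

export class InMemorySessionStore {
  private sessions = new Map<string, Memory[]>();

  createSession(): string {
    const id = randomUUID();
    this.sessions.set(id, []);
    return id;
  }

  addMemory(sessionId: string, speaker: "user" | "assistant", text: string): void {
    const memories = this.sessions.get(sessionId) ?? [];
    memories.push({ content: JSON.stringify({ speaker, text }), createdAt: Date.now() });
    this.sessions.set(sessionId, memories);
  }

  getHistory(sessionId: string): Memory[] {
    return this.sessions.get(sessionId) ?? [];
  }
}
```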
Step 7: Create the budget guard
The src/services/budget-guard.ts module sets up the BudgetController from @reaatech/agent-budget-engine with a custom PerplexityPricingProvider that maps model names to USD-per-million-token rates.
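The pricing provider described above can be sketched as a small rate table plus a cost function. The interface shape and the per-million-token rates below are placeholders, not Perplexity's actual price list; substitute current pricing:

```typescript
// Hypothetical sketch of PerplexityPricingProvider. Rates are placeholders.
interface ModelRate {
  inputUsdPerMillionTokens: number;
  outputUsdPerMillionTokens: number;
}

const RATES: Record<string, ModelRate> = {
  "pplx-70b-online": { inputUsdPerMillionTokens: 1.0, outputUsdPerMillionTokens: 1.0 },
};

export class PerplexityPricingProvider {
  // Returns the USD cost of one call given the model and token counts.
  getCost(model: string, inputTokens: number, outputTokens: number): number {
    const rate = RATES[model];
    if (!rate) throw new Error(`Unknown model for pricing: ${model}`);
    return (
      (inputTokens / 1_000_000) * rate.inputUsdPerMillionTokens +
      (outputTokens / 1_000_000) * rate.outputUsdPerMillionTokens
    );
  }
}
```

Throwing on an unknown model is deliberate: a silent zero-cost default would let unpriced calls slip past the budget entirely.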
createBudgetController wires the pricing provider and a SpendStore into a BudgetController and logs hard-stop events; defineDefaultBudget registers the global spending limit:
ts
import { BudgetController } from "@reaatech/agent-budget-engine";
import type { PricingProvider as EnginePricingProvider } from "@reaatech/agent-budget-engine";
import { BudgetScope } from "@reaatech/agent-budget-types";
import { SpendStore } from "@reaatech/agent-budget-spend-tracker";
import { randomUUID } from "node:crypto";

export { BudgetScope, SpendStore };

export function createBudgetController(pricingProvider: EnginePricingProvider): BudgetController {
  const controller = new BudgetController({
    spendTracker: new SpendStore(),
    pricing: pricingProvider,
  });
  controller.on("hard-stop", (event: unknown) => {
    console.warn("[budget] hard-stop triggered", event);
  });
  return controller;
}

export function defineDefaultBudget(controller: BudgetController, limitUsd: number): void {
  controller.defineBudget({
    scopeType: BudgetScope.User,
    scopeKey: "*",
    limit: limitUsd,
    policy: { softCap: 0.8, hardCap: 1.0, autoDowngrade: [], disableTools: [] },
  });
}
checkAndRecordBudget performs a pre-flight check using controller.check() and conditionally records the actual spend with controller.record():
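That check-then-record flow can be sketched as follows, with structural types standing in for the engine's real check() and record() signatures (which are assumptions here):

```typescript
// Hypothetical sketch of checkAndRecordBudget.
interface BudgetControllerLike {
  check(req: { scopeKey: string; estimatedCostUsd: number }): Promise<{ allowed: boolean }>;
  record(req: { scopeKey: string; costUsd: number }): Promise<void>;
}

export async function checkAndRecordBudget(
  controller: BudgetControllerLike,
  scopeKey: string,
  estimatedCostUsd: number,
  runCall: () => Promise<{ actualCostUsd: number }>,
): Promise<{ ok: boolean; actualCostUsd?: number }> {
  // Pre-flight: refuse the call if the estimated spend would breach the budget.
  const verdict = await controller.check({ scopeKey, estimatedCostUsd });
  if (!verdict.allowed) return { ok: false };
  const { actualCostUsd } = await runCall();
  // Record what was actually spent, not the estimate.
  await controller.record({ scopeKey, costUsd: actualCostUsd });
  return { ok: true, actualCostUsd };
}
```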
Step 8: Create the Perplexity client

The src/lib/perplexity-client.ts module calls the Perplexity chat completions endpoint with the retrieved context chunks and session history injected as messages.
Create the client:
ts
import Perplexity, { ChatCompletionsPostRequestModelEnum, ChatCompletionsPostRequest } from "perplexity-sdk";
import type { RetrievalResult } from "./types.js";

export function createPerplexityClient(apiKey: string) {
  return new Perplexity({ apiKey }).client();
}
generateAnswer builds a system prompt with the context chunks, prepends the parsed session history, and appends the user’s query:
ts
export async function generateAnswer(
  client: ReturnType<typeof createPerplexityClient>,
  query: string,
  contextChunks: RetrievalResult[],
  sessionHistory?: Array<{ content: string }>,
): Promise<{
  answer: string;
  citations: Array<{ sourceTitle: string; excerpt: string; relevanceScore: number }>;
  usage: { inputTokens: number; outputTokens: number };
}> {
  const contextBlock =
    contextChunks.length > 0
      ? "Context chunks:\n" + contextChunks.map((c, i) => `${String(i + 1)}. ${c.content}`).join("\n")
      : "";
  const systemPrompt =
    "You are a knowledge agent. Answer the user's question using the context chunks below. " +
    "Cite each source by its title and include a short excerpt. Output valid JSON with an " +
    "'answer' field and a 'citations' array of {sourceTitle, excerpt, relevanceScore}.";
  const systemContent = contextBlock ? `${systemPrompt}\n\n${contextBlock}` : systemPrompt;
  const rawMessages: Array<{ role: string; content: string }> = [
    { role: "system", content: systemContent },
  ];
  if (sessionHistory) {
    for (const mem of sessionHistory) {
      try {
        const parsed = JSON.parse(mem.content) as { speaker: string; text: string };
        rawMessages.push({
          role: parsed.speaker === "user" ? "user" : "assistant",
          content: parsed.text,
        });
      } catch {
        rawMessages.push({ role: "user", content: mem.content });
      }
    }
  }
  rawMessages.push({ role: "user", content: query });

  const request = new ChatCompletionsPostRequest();
  request.model = ChatCompletionsPostRequestModelEnum.Pplx70bOnline;
  request.messages = rawMessages;

  try {
    const result = await client.chatCompletionsPost(request);
    const inputTokens = result.usage?.promptTokens ?? 0;
    const outputTokens = result.usage?.completionTokens ?? 0;
    const rawContent = result.choices?.[0]?.message?.content ?? "";
    let answer: string;
    let citations: Array<{ sourceTitle: string; excerpt: string; relevanceScore: number }> = [];
    try {
      const parsed: unknown = JSON.parse(rawContent);
      const parsedObj = parsed as Record<string, unknown>;
      // Prefer the parsed answer field; fall back to the raw content if absent.
      answer = typeof parsedObj.answer === "string" ? parsedObj.answer : rawContent;
      if (Array.isArray(parsedObj.citations)) {
        citations = parsedObj.citations.map((c: unknown) => {
          const cit = c as Record<string, unknown>;
          const st = cit.sourceTitle;
          const ex = cit.excerpt;
          const rs = cit.relevanceScore;
          return {
            sourceTitle: typeof st === "string" ? st : "",
            excerpt: typeof ex === "string" ? ex : "",
            relevanceScore: typeof rs === "number" ? rs : 0,
          };
        });
      }
    } catch {
      answer = rawContent;
    }
    return { answer, citations, usage: { inputTokens, outputTokens } };
  } catch (err) {
    throw new Error(`Perplexity API error: ${String(err)}`);
  }
}
Step 9: Add Zod validation and JSON repair
The src/lib/schemas.ts module defines request/response schemas using Zod and provides a validateAndRepairJson function that handles malformed JSON from the model.
Define the schemas:
ts
import { z } from "zod";
import type { Citation } from "./types.js";

export const CitationSchema = z.object({
  sourceTitle: z.string(),
  excerpt: z.string(),
  relevanceScore: z.number().min(0).max(1),
});

export const PerplexityJsonResponseSchema = z.object({
  answer: z.string(),
  citations: z.array(CitationSchema),
});

export const ChatRequestSchema = z.object({
  query: z.string().min(1).max(4000),
  sessionId: z.string().optional(),
  maxTokens: z.number().int().min(1).max(4096).optional(),
});
validateAndRepairJson attempts direct parse, then applies three repair strategies — strip trailing commas, escape unescaped quotes, and close open braces — before falling back to regex extraction of the answer field:
ts
function tryParse(candidate: string): { answer: string; citations: Citation[] } | null {
  try {
    const parsed: unknown = JSON.parse(candidate);
    return PerplexityJsonResponseSchema.parse(parsed);
  } catch {
    return null;
  }
}

// Each strategy produces an independent candidate; chaining them would let one
// strategy (e.g. quote escaping) corrupt another's otherwise-valid repair.
function repairCandidates(raw: string): string[] {
  const candidates: string[] = [];
  // Strategy 1: strip trailing commas before ] or }.
  const noTrailingCommas = raw.replace(/,(\s*[\]}])/g, "$1");
  candidates.push(noTrailingCommas);
  // Strategy 2: escape quotes that leave the string with unbalanced pairing.
  candidates.push(noTrailingCommas.replace(/(?<!\\)"(?=(?:[^"]*"[^"]*")*[^"]*$)/g, '\\"'));
  // Strategy 3: close any unclosed braces.
  let openBraces = 0;
  for (const ch of noTrailingCommas) {
    if (ch === "{") openBraces++;
    if (ch === "}") openBraces--;
  }
  if (openBraces > 0) candidates.push(noTrailingCommas + "}".repeat(openBraces));
  return candidates;
}

export function validateAndRepairJson(raw: string): {
  answer: string;
  citations: Citation[];
} {
  const direct = tryParse(raw);
  if (direct) return direct;
  for (const candidate of repairCandidates(raw)) {
    const parsed = tryParse(candidate);
    if (parsed) return parsed;
  }
  // Last resort: pull the answer field out with a regex.
  const match = raw.match(/"answer"\s*:\s*"([^"]*)"/);
  return { answer: match ? match[1] : raw, citations: [] };
}
Step 10: Assemble the answer generator service
The src/services/answer-generator.ts module orchestrates the full pipeline: validation, session loading, embedding, retrieval, budget pre-flight, Perplexity call, JSON repair, spend recording, and response assembly.
The AnswerGenerator class holds all dependencies as constructor arguments:
ts
import type { QdrantClientWrapper } from "@reaatech/hybrid-rag-qdrant";
import { config } from "../lib/config.js";
import { ChatRequestSchema, validateAndRepairJson } from "../lib/schemas.js";
import { generateEmbedding } from "../lib/embedding.js";
import { createQdrantWrapper, searchSimilarChunks } from "../lib/qdrant-store.js";
import { InMemorySessionStore } from "../lib/session-store.js";
import type { BudgetController } from "@reaatech/agent-budget-engine";
import { createPerplexityClient, generateAnswer } from "../lib/perplexity-client.js";
import {
  createBudgetController,
  defineDefaultBudget,
  checkAndRecordBudget,
  PerplexityPricingProvider,
  DEFAULT_PERPLEXITY_MODEL,
} from "./budget-guard.js";
import { createTrace } from "./observability.js";
import type { ChatResponse } from "../lib/types.js";

export class AnswerGenerator {
  private qdrantWrapper: QdrantClientWrapper;
  private sessionStore: InMemorySessionStore;
  private perplexityClient: ReturnType<typeof createPerplexityClient>;
  private pricingProvider: PerplexityPricingProvider;
  private budgetController: BudgetController;
  private initialized: boolean = false;

  constructor(deps: {
    qdrantWrapper: QdrantClientWrapper;
    sessionStore: InMemorySessionStore;
    perplexityClient: ReturnType<typeof createPerplexityClient>;
    pricingProvider: PerplexityPricingProvider;
    budgetController: BudgetController;
  }) {
    this.qdrantWrapper = deps.qdrantWrapper;
    this.sessionStore = deps.sessionStore;
    this.perplexityClient = deps.perplexityClient;
    this.pricingProvider = deps.pricingProvider;
    this.budgetController = deps.budgetController;
  }
The generateAnswer method runs the full pipeline. On budget exhaustion, it returns early with a specific message rather than hitting the API:
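The control flow can be condensed into a sketch where every dependency is injected as a function, which makes the budget early-return visible on its own. All names, shapes, and the placeholder cost estimate below are assumptions, not the module's real signatures:

```typescript
// Hypothetical, condensed sketch of the pipeline inside generateAnswer.
export async function runPipeline(
  deps: {
    validate: (body: unknown) => { query: string; sessionId?: string };
    loadHistory: (sessionId: string) => Array<{ content: string }>;
    embed: (q: string) => Promise<number[]>;
    retrieve: (v: number[]) => Promise<Array<{ content: string }>>;
    budgetCheck: (estimatedUsd: number) => Promise<boolean>;
    callModel: (
      q: string,
      ctx: Array<{ content: string }>,
      history: Array<{ content: string }>,
    ) => Promise<{ answer: string; costUsd: number }>;
    recordSpend: (usd: number) => Promise<void>;
  },
  body: unknown,
): Promise<{ answer: string }> {
  const { query, sessionId } = deps.validate(body);
  const history = sessionId ? deps.loadHistory(sessionId) : [];
  const vector = await deps.embed(query);
  const context = await deps.retrieve(vector);
  // Budget pre-flight: return early instead of hitting the API.
  // 0.01 is a placeholder per-call estimate.
  if (!(await deps.budgetCheck(0.01))) {
    return { answer: "Budget exhausted. Please try again later." };
  }
  const { answer, costUsd } = await deps.callModel(query, context, history);
  await deps.recordSpend(costUsd);
  return { answer };
}
```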
The module exports an async singleton getter that lazy-initializes all dependencies on first use:
ts
let singleton: AnswerGenerator | null = null;

export async function getAnswerGenerator(): Promise<AnswerGenerator> {
  if (!singleton) {
    const pricingProvider = new PerplexityPricingProvider();
    const budgetController = createBudgetController(pricingProvider);
    const qdrantWrapper = await createQdrantWrapper(config);
    const sessionStore = new InMemorySessionStore();
    const perplexityClient = createPerplexityClient(config.PERPLEXITY_API_KEY);
    singleton = new AnswerGenerator({
      qdrantWrapper,
      sessionStore,
      perplexityClient,
      pricingProvider,
      budgetController,
    });
  }
  return singleton;
}
Step 11: Create the chat API route
The src/app/api/chat/route.ts module exposes the POST /api/chat endpoint using Next.js App Router conventions (route handlers live under the app directory). It validates the request body with Zod, calls the AnswerGenerator, and returns the response with an optional sessionId.
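A sketch of the handler's logic follows. The generator call is injected and validation is inlined so the example stands alone; the real route imports ChatRequestSchema and getAnswerGenerator from the earlier steps, and App Router handlers may return standard Response objects as shown:

```typescript
// Hypothetical sketch of the POST handler for the chat route. In the real
// route file, `generate` is `(await getAnswerGenerator()).generateAnswer`.
export function makeChatHandler(
  generate: (body: { query: string; sessionId?: string }) => Promise<unknown>,
) {
  return async function POST(request: Request): Promise<Response> {
    let body: unknown;
    try {
      body = await request.json();
    } catch {
      return Response.json({ error: "Invalid JSON body" }, { status: 400 });
    }
    const { query, sessionId } = body as { query?: unknown; sessionId?: unknown };
    // Mirrors the Zod constraints from Step 9: non-empty, at most 4000 chars.
    if (typeof query !== "string" || query.length === 0 || query.length > 4000) {
      return Response.json({ error: "query must be a non-empty string" }, { status: 400 });
    }
    const result = await generate({
      query,
      sessionId: typeof sessionId === "string" ? sessionId : undefined,
    });
    return Response.json(result);
  };
}
```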
Step 12: Initialize observability

The src/services/observability.ts module initializes Langfuse if the public and secret keys are present in the environment, otherwise it stores a no-op stub:
Step 13: Create the Notion indexing job

The src/cron/index.ts module exports runNotionIndexingJob, which fetches all Notion pages, generates embeddings, and upserts them into Qdrant. The job runs on server startup via instrumentation.ts:
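The job's shape can be sketched with its dependencies injected, which keeps the loop visible without the real modules; in src/cron/index.ts the dependencies are the functions built in Steps 3 to 5, and the point id format here is an illustration only:

```typescript
// Hypothetical, dependency-injected sketch of runNotionIndexingJob.
export async function runNotionIndexingJob(deps: {
  listPageIds: () => Promise<string[]>;
  extractText: (pageId: string) => Promise<string>;
  split: (text: string) => string[];
  embed: (chunks: string[]) => Promise<number[][]>;
  upsert: (
    points: Array<{ id: string; pageId: string; content: string; chunkIndex: number }>,
    embeddings: number[][],
  ) => Promise<void>;
}): Promise<number> {
  let indexed = 0;
  for (const pageId of await deps.listPageIds()) {
    const text = await deps.extractText(pageId);
    const chunks = deps.split(text);
    if (chunks.length === 0) continue; // nothing to index on empty pages
    const embeddings = await deps.embed(chunks);
    await deps.upsert(
      chunks.map((content, i) => ({ id: `${pageId}:${i}`, pageId, content, chunkIndex: i })),
      embeddings,
    );
    indexed += chunks.length;
  }
  console.info(`[cron] indexing complete: ${indexed} chunks`);
  return indexed;
}
```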
Step 14: Configure Next.js instrumentation for startup
The src/instrumentation.ts file runs once when the Next.js server starts in Node.js environments. It guards against Edge runtime and uses dynamic imports to keep Node-only modules out of the Edge bundle:
next.config.ts must have experimental.instrumentationHook: true for this file to be picked up. The scaffold already has this configured:
ts
import type { NextConfig } from "next";

const nextConfig = {
  experimental: {
    instrumentationHook: true,
  },
} as NextConfig;

export default nextConfig;
Step 15: Run the application
Start the development server:
terminal
pnpm dev
On boot, instrumentation.ts runs initObservability() and then calls runNotionIndexingJob() to perform the initial indexing pass — your Notion pages are fetched, chunked, embedded, and stored in Qdrant before the server begins accepting requests.
Send a question to the chat API:
terminal
curl -X POST http://localhost:3000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"query": "What is our PTO policy?"}'
Expected output: A JSON response with answer, citations, confidence, sessionId, and usage fields:
json
{
  "answer": "According to the HR Handbook...",
  "citations": [
    {
      "sourceTitle": "HR Handbook 2024",
      "excerpt": "All full-time employees receive 20 days of paid time off per year...",
      "relevanceScore": 0.89
    }
  ],
  "confidence": 0.89,
  "sessionId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "usage": {
    "perplexityTokens": 324,
    "embeddingTokens": 12,
    "totalCost": 0.000324
  }
}
Next steps
Add a recurring cron schedule using the CRON_SCHEDULE env var to re-index pages daily as your Notion workspace changes.
Connect the observability module to a hosted Langfuse instance for production tracing by setting LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY.
Replace the in-memory session store with a Redis-backed adapter from @reaatech/agent-memory-storage to persist conversations across server restarts.
Tune the budget limit by adjusting BUDGET_LIMIT_USD — the default is $5.00 with an 80% soft cap and a hard cap at 100%.