Property managers spend excessive time answering routine tenant questions via email and phone, reducing time for higher-value tasks like leasing and maintenance coordination.
This tutorial walks you through building a self-service AI knowledge base that answers tenant questions about policies, maintenance, and lease terms using data from AppFolio. You’ll index AppFolio lease agreements, property rules, and FAQs into a Qdrant vector store with hybrid search, answer questions through AWS Bedrock Claude with source citations, track LLM spend per tenant with a budget engine, and validate answer quality with golden trajectory testing. By the end you’ll have a working Next.js chat API, a health check endpoint, and a basic chat widget UI.
AWS account with Bedrock access enabled (Claude Sonnet 4 model)
Qdrant — a local instance (docker run -p 6333:6333 qdrant/qdrant) or a cloud cluster
Voyage AI API key — for embedding generation
AppFolio API key and property ID — the target data source
(Optional) Langfuse account — for observability tracing
Step 1: Scaffold the project
The project shell is already on disk with a Next.js 16 App Router setup, pnpm install completed, and all root configs in place. Start by verifying the scaffold.
terminal
ls package.json next.config.ts tsconfig.json vitest.config.ts .env.example
Expected output: All five files exist. The package.json pins every dependency to an exact semver, with no ^ or ~ ranges anywhere.
Expected output: The config module reads process.env and either returns a typed object or throws with a message like Missing required env vars: AWS_REGION, APPFOLIO_API_KEY.
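The fail-fast behavior described above can be sketched roughly as follows. This is a minimal illustration, not the tutorial's actual config module: the function name loadConfig, the REQUIRED_VARS list, and the VOYAGE_API_KEY variable name are assumptions, and the environment is passed in as an argument so the function can be exercised without touching process.env.

```typescript
// Hypothetical list of required variables; the real project may require more or different names.
const REQUIRED_VARS = ["AWS_REGION", "APPFOLIO_API_KEY", "VOYAGE_API_KEY"] as const;

type RequiredVar = (typeof REQUIRED_VARS)[number];
type AppConfig = Record<RequiredVar, string>;

// Validates the given environment and reports every missing variable in a single error.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required env vars: ${missing.join(", ")}`);
  }
  const result = {} as AppConfig;
  for (const name of REQUIRED_VARS) {
    result[name] = env[name] as string;
  }
  return result;
}
```

Collecting all missing names before throwing means a misconfigured deployment surfaces every problem in one run instead of one variable at a time.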
Step 4: Build the AppFolio API client
Create src/appfolio/client.ts — this fetches lease documents, property rules, FAQs, tenant info, and maintenance tickets from AppFolio’s REST API.
ts
import { config } from "../config/index.js";
import {
  type AppFolioDocument,
  type TenantContext,
  type MaintenanceTicket,
  AppFolioError,
} from "../types/index.js";

function safeString(v: unknown): string {
  if (typeof v === 'string') return v;
  if (typeof v === 'number' || typeof v === 'boolean') return String(v);
  return '';
}

export
Expected output: The request helper catches network errors and non-2xx status codes, wrapping both as AppFolioError. The getPropertyRules method loops over paginated responses until nextPageToken is absent.
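The pagination and error-wrapping behavior can be sketched as below. Only nextPageToken and AppFolioError come from the tutorial text; the helper name fetchAllPages, the items field, and the injected fetchPage callback are assumptions made so the loop can run without a real AppFolio endpoint.

```typescript
// Stand-in for the AppFolioError class that the tutorial imports from ../types.
class AppFolioError extends Error {
  status: number | undefined;
  constructor(message: string, status?: number) {
    super(message);
    this.name = "AppFolioError";
    this.status = status;
  }
}

// Assumed page shape: only nextPageToken is named in the tutorial text.
interface Page<T> {
  items: T[];
  nextPageToken?: string;
}

type FetchPage<T> = (pageToken?: string) => Promise<Page<T>>;

// Follows nextPageToken until it is absent and wraps any failure as AppFolioError.
async function fetchAllPages<T>(fetchPage: FetchPage<T>): Promise<T[]> {
  const all: T[] = [];
  let token: string | undefined;
  do {
    let page: Page<T>;
    try {
      page = await fetchPage(token);
    } catch (err) {
      if (err instanceof AppFolioError) throw err;
      const reason = err instanceof Error ? err.message : String(err);
      throw new AppFolioError(`AppFolio request failed: ${reason}`);
    }
    all.push(...page.items);
    token = page.nextPageToken;
  } while (token);
  return all;
}
```

Wrapping every failure in one error type lets callers handle network faults and non-2xx responses with a single instanceof check.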
Step 5: Create the Voyage AI embedding service
Create src/rag/embedding.ts — it extends EmbeddingService from @reaatech/hybrid-rag-embedding and wraps the Voyage AI API:
Expected output: embedBatch uses p-limit to limit concurrent API calls to 5. An empty array returns [] without making any API call.
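To make the concurrency cap concrete, here is a dependency-free sketch of the same idea. It deliberately swaps p-limit for a small hand-written worker pool so it runs on its own; the embedOne callback is a hypothetical stand-in for the Voyage AI call, and the real embedBatch signature may differ.

```typescript
type EmbedFn = (text: string) => Promise<number[]>;

// Runs embedOne over texts with at most `concurrency` calls in flight, preserving input order.
async function embedBatch(
  texts: string[],
  embedOne: EmbedFn,
  concurrency = 5,
): Promise<number[][]> {
  if (texts.length === 0) return []; // empty input: no API call at all
  const results: number[][] = new Array<number[]>(texts.length);
  let next = 0;
  // Each worker repeatedly claims the next unprocessed index until none remain.
  async function worker(): Promise<void> {
    while (next < texts.length) {
      const index = next++;
      results[index] = await embedOne(texts[index] as string);
    }
  }
  const workerCount = Math.min(concurrency, texts.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Because each worker writes to results[index], the output order matches the input order even when calls finish out of sequence.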
Step 6: Build the Qdrant vector store adapter
Create src/rag/vector-store.ts — it wraps @qdrant/js-client-rest with retry logic via p-retry and implements hybrid search through reciprocal rank fusion:
ts
import { QdrantClient } from "@qdrant/js-client-rest";
import pRetry from "p-retry";
import {
  ChunkingStrategy,
  type RetrievalResult,
} from "@reaatech/hybrid-rag";
import { config } from "../config/index.js";

interface HybridSearchQuery {
  vector: number[];
  text: string;
  topK: number;
  vectorWeight?: number;
}

interface ChunkInput {
  id: string;
  vector
Expected output: Every Qdrant call is wrapped with p-retry (3 retries, 1s backoff). hybridSearch runs vector search, then fuses results by weighted reciprocal rank. The ensureCollection method creates the collection only if it doesn’t exist.
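Weighted reciprocal rank fusion can be illustrated with a short standalone function. This is a sketch under assumptions rather than the adapter's actual code: it fuses two ranked ID lists (one from vector search, one from text matching), and the smoothing constant k = 60 and the default weight of 0.5 are conventional choices, not values taken from the tutorial.

```typescript
interface FusedResult {
  id: string;
  score: number;
}

// Each list contributes weight / (k + rank) per document, with rank starting at 1.
function weightedRrf(
  vectorRanked: string[],
  textRanked: string[],
  vectorWeight = 0.5,
  k = 60,
): FusedResult[] {
  const scores = new Map<string, number>();
  const add = (ids: string[], weight: number): void => {
    ids.forEach((id, i) => {
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + i + 1));
    });
  };
  add(vectorRanked, vectorWeight);
  add(textRanked, 1 - vectorWeight);
  // Highest fused score first.
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score,
  );
}
```

A document that appears in both lists accumulates score from each, so it tends to outrank a document that is first in only one list.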
Step 7: Build the RAG ingestion pipeline
Create src/rag/ingestion.ts — it pulls all documents from AppFolio, chunks them, embeds each chunk, and upserts into Qdrant:
ts
import { AppFolioClient } from "../appfolio/client.js";
import { VoyageEmbeddingService } from "./embedding.js";
import { QdrantStore } from "./vector-store.js";
import type { AppFolioDocument } from "../types/index.js";
import { config } from "../config/index.js";

interface ChunkResult {
  id: string;
  documentId: string;
  content: string;
  position: number;
}

interface IngestionStats {
  ingested: number;
Expected output: Documents are split by paragraph boundaries, then chunked at 512 characters with 50-character overlap between chunks. If embedding or upsert fails for one document, the pipeline logs the error and continues with the next — partial success is the default.
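The chunking rule (paragraph split, then 512-character windows with a 50-character overlap) might look like the sketch below. The function name chunkText and the Chunk shape are assumptions for illustration; the pipeline's real ChunkResult also carries id and documentId fields.

```typescript
interface Chunk {
  content: string;
  position: number;
}

// Splits on blank lines, then slides a window of `size` characters forward by size - overlap.
function chunkText(text: string, size = 512, overlap = 50): Chunk[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
  const step = size - overlap;
  const chunks: Chunk[] = [];
  for (const paragraph of paragraphs) {
    for (let start = 0; start < paragraph.length; start += step) {
      chunks.push({
        content: paragraph.slice(start, start + size),
        position: chunks.length,
      });
      // Stop once this window has reached the end of the paragraph.
      if (start + size >= paragraph.length) break;
    }
  }
  return chunks;
}
```

The overlap means a sentence that straddles a window boundary still appears whole in at least one chunk, which helps retrieval when the answer sits near a split point.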
Step 8: Build the Bedrock Claude LLM wrapper
Create src/llm/bedrock.ts — it wraps the AWS SDK’s ConverseCommand and ConverseStreamCommand:
Expected output: generate sends a ConverseCommand, safely extracts text and token counts, and wraps SDK errors in BedrockError. generateStream is an async generator that yields text deltas as they arrive from the stream.
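The streaming half can be pictured as an async generator that filters a stream of events down to their text deltas. The sketch below works on any AsyncIterable, so it runs without the AWS SDK; the StreamEvent interface is an assumed, simplified subset of the Converse stream event shape, and textDeltas is a hypothetical helper name.

```typescript
// Assumed subset of a Converse stream event; only text deltas matter here.
interface StreamEvent {
  contentBlockDelta?: { delta?: { text?: string } };
}

// Yields each non-empty text delta as soon as its event arrives, skipping all other events.
async function* textDeltas(
  events: AsyncIterable<StreamEvent>,
): AsyncGenerator<string> {
  for await (const event of events) {
    const text = event.contentBlockDelta?.delta?.text;
    if (text) yield text;
  }
}
```

Optional chaining keeps the generator tolerant of events that carry no text, such as message start, stop, or metadata events.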
Step 9: Wire up the budget engine
Create three files under src/budget/:
src/budget/store.ts — an in-memory spend tracker extending SpendStore:
ts
import { SpendStore } from "@reaatech/agent-budget-spend-tracker";

export class InMemorySpendStore extends SpendStore {
  private limits = new Map<string, number>();

  setLimit(scopeType: string, scopeKey: string, limit: number): void {
    this.limits.set(`${scopeType}:${scopeKey}`, limit);
  }

  getLimit(scopeType: string, scopeKey: string): number {
    return this.limits.get(`${scopeType}:${scopeKey}`) ?? 0;
  }
}
src/budget/pricing.ts — cost estimation for the budget controller:
src/budget/controller.ts — instantiates the BudgetController, registers event listeners, and exports helper functions:
ts
import { BudgetController } from "@reaatech/agent-budget-engine";
import { BudgetScope } from "@reaatech/agent-budget-types";
import { config } from "../config/index.js";
import { InMemorySpendStore } from "./store.js";
import { pricingProvider } from "./pricing.js";

const store = new InMemorySpendStore();

export const budgetController = new BudgetController({
  spendTracker: store,
  pricing: pricingProvider,
  defaultEstimateTokens: 1000,
});

export function defineTenantBudget(
  tenantId: string,
  limit: number,
): void {
  store.setLimit("user", tenantId, limit);
  budgetController.defineBudget({
    scopeType: BudgetScope.User,
    scopeKey: tenantId,
    limit,
    policy: {
      softCap: config.BUDGET_SOFT_CAP,
      hardCap: config.BUDGET_HARD_CAP,
      autoDowngrade: [
        {
          from: ["anthropic.claude-opus-4-v1:0"],
          to: "anthropic.claude-sonnet-4-v1:0",
        },
      ],
      disableTools: [],
    },
  });
}

export function defineAnonymousBudget(): void {
  const limit = config.BUDGET_DEFAULT_LIMIT;
  store.setLimit("user", "anonymous", limit);
  budgetController.defineBudget({
    scopeType: BudgetScope.User,
    scopeKey: "anonymous",
    limit,
    policy: {
      softCap: 0.8,
      hardCap: 1.0,
      autoDowngrade: [],
      disableTools: [],
    },
  });
}

budgetController.on("threshold-breach", (e) => {
  console.warn("Budget threshold breached:", e);
});

budgetController.on("hard-stop", (e) => {
  console.error("Budget hard stop:", e);
});
Expected output: The singleton budgetController is shared across the application. defineTenantBudget sets a per-tenant spend limit with auto-downgrade policy, and defineAnonymousBudget sets a default limit for unauthenticated queries. Event listeners log warnings and errors for budget threshold and hard-stop events.
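One way to read the softCap and hardCap values is as fractions of the spend limit. The sketch below encodes that reading. It is an assumption about the semantics, not the budget engine's actual algorithm, and classifySpend and BudgetState are hypothetical names introduced only for illustration.

```typescript
type BudgetState = "ok" | "soft-cap" | "hard-cap";

// Assumes caps are fractions of the limit: 0.8 means 80% of the budget has been used.
function classifySpend(
  spent: number,
  limit: number,
  softCap = 0.8,
  hardCap = 1.0,
): BudgetState {
  if (limit <= 0) return "hard-cap"; // no budget defined: treat as exhausted
  const used = spent / limit;
  if (used >= hardCap) return "hard-cap";
  if (used >= softCap) return "soft-cap";
  return "ok";
}
```

Under this reading, a tenant with a limit of 10 would trigger the soft-cap warning at a spend of 8 and the hard stop at 10.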
Step 10: Build the RAG query pipeline
Create src/rag/pipeline.ts — this is the core RAG-answer flow: embed the question, retrieve documents, check budget, augment the prompt, generate via Bedrock, and record spend:
ts
import { BudgetScope } from "@reaatech/agent-budget-types";
import { VoyageEmbeddingService } from "./embedding.js";
import { QdrantStore } from "./vector-store.js";
import { BedrockClaude } from "../llm/bedrock.js";
import { budgetController } from "../budget/controller.js";
import { config } from "../config/index.js";
import type {
  TenantContext,
  ChatResponse,
  SourceCitation,
} from "../types/index.js";

export class RAGQueryPipeline {
  constructor(
    private embedding: VoyageEmbeddingService,
Expected output: The pipeline follows this call order: embed → hybridSearch → budgetController.check → bedrock.generate → budgetController.record. If the budget check returns allowed: false, the pipeline returns an early error response without calling Bedrock.
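The call order and the early return on a denied budget can be sketched with injected dependencies. Everything here is illustrative: the PipelineDeps interface, the answerQuestion function, the prompt layout, and the costUsd field are assumptions, and the real pipeline calls the embedding service, QdrantStore, budgetController, and BedrockClaude directly.

```typescript
// Hypothetical dependency bundle so the flow can run without any external service.
interface PipelineDeps {
  embed: (question: string) => Promise<number[]>;
  search: (vector: number[], question: string) => Promise<string[]>;
  checkBudget: (tenantId: string) => Promise<{ allowed: boolean }>;
  generate: (prompt: string) => Promise<{ text: string; costUsd: number }>;
  recordSpend: (tenantId: string, costUsd: number) => Promise<void>;
}

interface Answer {
  answer: string;
  error?: string;
}

// embed -> search -> budget check -> generate -> record spend.
async function answerQuestion(
  deps: PipelineDeps,
  question: string,
  tenantId: string,
): Promise<Answer> {
  const vector = await deps.embed(question);
  const passages = await deps.search(vector, question);
  const budget = await deps.checkBudget(tenantId);
  if (!budget.allowed) {
    // Denied: return before any LLM call is made, so no spend is incurred.
    return { answer: "", error: "Budget exceeded" };
  }
  const prompt = `Context:\n${passages.join("\n---\n")}\n\nQuestion: ${question}`;
  const result = await deps.generate(prompt);
  await deps.recordSpend(tenantId, result.costUsd);
  return { answer: result.text };
}
```

Checking the budget after retrieval but before generation keeps the expensive LLM call behind the gate while still letting cheap retrieval work proceed.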
Step 11: Create the chat API route
Create app/api/chat/route.ts — the Next.js App Router route handler that accepts chat messages and returns answers:
ts
import { type NextRequest, NextResponse } from "next/server";
import { z } from "zod";
import { AppFolioClient } from "../../../src/appfolio/client.js";
import { createEmbeddingService } from "../../../src/rag/embedding.js";
import { QdrantStore } from "../../../src/rag/vector-store.js";
import { BedrockClaude } from "../../../src/llm/bedrock.js";
import { RAGQueryPipeline } from "../../../src/rag/pipeline.js";

const chatSchema = z.object({
  message: z.string().min(1).max(2000),
  tenantId: z.string().optional(),
  sessionId: z.string().optional(),
});

export async function POST(
  req: NextRequest,
): Promise<NextResponse> {
  let body: unknown;
  try {
    const text = await req.text();
    body = JSON.parse(text);
  } catch {
    return new NextResponse(
      JSON.stringify({ error: "Invalid request body" }),
      { status: 400, headers: { "Content-Type": "application/json" } },
    );
  }

  const parsed = chatSchema.safeParse(body);
  if (!parsed.success) {
    return new NextResponse(
      JSON.stringify({ error: "Invalid request", details: parsed.error.issues }),
      { status: 400, headers: { "Content-Type": "application/json" } },
    );
  }

  try {
    const appfolio = new AppFolioClient();
    const embedding = createEmbeddingService();
    const vectorStore = new QdrantStore();
    const bedrock = new BedrockClaude();
    const pipeline = new RAGQueryPipeline(embedding, vectorStore, bedrock);

    let tenantCtx;
    if (parsed.data.tenantId) {
      try {
        tenantCtx = await appfolio.getTenantInfo(parsed.data.tenantId);
      } catch {
        tenantCtx = undefined;
      }
    }

    const result = await pipeline.query(parsed.data.message, tenantCtx);
    return new NextResponse(
      JSON.stringify({
        answer: result.answer,
        citations: result.citations,
        usage: result.usage,
        metadata: result.metadata,
      }),
      { status: 200, headers: { "Content-Type": "application/json" } },
    );
  } catch (err) {
    const message = err instanceof Error ? err.message : "Unknown error";
    return new NextResponse(
      JSON.stringify({ error: "Query failed", message }),
      { status: 500, headers: { "Content-Type": "application/json" } },
    );
  }
}
Expected output: The route validates the request body with Zod (message required, max 2000 chars), wires up all pipeline dependencies, optionally fetches tenant context (with graceful degradation if the API call fails), and returns a structured response with answer, citations, usage, and metadata.
Step 12: Create the health check route
Create app/api/health/route.ts — it tests connectivity to Qdrant and Bedrock:
Expected output: A GET /api/health returns {"status":"ok","checks":{"qdrant":"ok","bedrock":"ok"},"uptime":...,"version":"0.1.0"} when both services are reachable. If either fails, the status becomes "degraded" and the HTTP status is 503.
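The ok/degraded logic can be sketched as a small aggregator over named probes. This is an illustration under assumptions, not the route's actual code: runHealthChecks and the probe callbacks are hypothetical, the "error" value for a failed check is a guess (the tutorial only shows "ok"), and the uptime and version fields are left out.

```typescript
type CheckStatus = "ok" | "error";

interface HealthReport {
  httpStatus: 200 | 503;
  body: { status: "ok" | "degraded"; checks: Record<string, CheckStatus> };
}

// Runs every probe in parallel; a probe that throws marks its check as failed.
async function runHealthChecks(
  probes: Record<string, () => Promise<void>>,
): Promise<HealthReport> {
  const checks: Record<string, CheckStatus> = {};
  await Promise.all(
    Object.entries(probes).map(async ([name, probe]) => {
      try {
        await probe();
        checks[name] = "ok";
      } catch {
        checks[name] = "error";
      }
    }),
  );
  const healthy = Object.values(checks).every((status) => status === "ok");
  return {
    httpStatus: healthy ? 200 : 503,
    body: { status: healthy ? "ok" : "degraded", checks },
  };
}
```

In a route handler, the returned httpStatus and body would be passed to the JSON response, with one probe per dependency such as Qdrant and Bedrock.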
Step 13: Create instrumentation and observability
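Create src/instrumentation.ts. Next.js calls register() once at server startup; since Next.js 15 the instrumentation file is stable, so the experimental.instrumentationHook flag that older versions required is no longer needed: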
ts
export async function register() {
  if (process.env.NEXT_RUNTIME === "nodejs") {
    const { initObservability } = await import("./observability/langfuse.js");
    initObservability();
  }
}
And the observability module at src/observability/langfuse.ts:
ts
import { config } from "../config/index.js";
import type { SourceCitation } from "../types/index.js";

export function initObservability(): void {
  if (config.LANGFUSE_PUBLIC_KEY && config.LANGFUSE_SECRET_KEY) {
    console.info("Langfuse observability configured");
  }
}

export function traceChat(
  query: string,
  answer: string,
  tenantId: string,
  citations: SourceCitation[],
  usage: { inputTokens: number; outputTokens: number },
  latencyMs: number,
): void {
  console.info(
    JSON.stringify({
      event: "chat",
      query,
      answer,
      tenantId,
      citations: citations.length,
      usage,
      latencyMs,
      timestamp: new Date().toISOString(),
    }),
  );
}

export async function flushObservability(): Promise<void> {
  // no-op
}
Step 14: Update the public API entry point
Replace src/index.ts to re-export the public API:
ts
export const SCAFFOLD_VERSION = "0.1.0" as const;

export { ChunkingStrategy } from "@reaatech/hybrid-rag";
export type { RetrievalResult } from "@reaatech/hybrid-rag";
export { RAGQueryPipeline } from "./rag/pipeline.js";
export { AppFolioClient } from "./appfolio/client.js";
export { BedrockClaude } from "./llm/bedrock.js";
export {
  defineTenantBudget,
  defineAnonymousBudget,
  budgetController,
} from "./budget/controller.js";
export { runGoldenEvalSuite } from "./eval/golden.js";
export { runRetrievalEval } from "./eval/retrieval.js";
export { createEmbeddingService } from "./rag/embedding.js";
Step 15: Run the tests
The project includes a test suite. Run it:
terminal
pnpm typecheck
pnpm lint
pnpm test
Expected output:
typecheck exits with zero type errors.
lint exits with zero lint errors.
test runs vitest with coverage. You should see numFailedTests=0 and coverage thresholds of at least 90% across lines, branches, functions, and statements.
A few key test patterns used throughout the suite:
Mocking the Bedrock client with aws-sdk-client-mock:
Add streaming to the chat API — wire generateStream from BedrockClaude into a streaming Next.js route handler using ReadableStream.
Deploy the ingestion pipeline as a cron job — schedule IngestionPipeline.ingestAll() to run nightly so the knowledge base stays current with AppFolio updates.
Replace the in-memory spend store — swap InMemorySpendStore for a Redis or SQLite-backed store so budget state survives server restarts.
Integrate Langfuse tracing — replace the console.info stubs in src/observability/langfuse.ts with the real Langfuse SDK to get full trace visibility.
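For the streaming suggestion in the list above, one possible starting point is a helper that adapts any async iterable of text chunks into a web ReadableStream. The name toReadableStream is hypothetical, and the sketch assumes generateStream yields plain strings, as described in Step 8.

```typescript
// Adapts an async iterable of strings into a byte stream suitable for a Response body.
function toReadableStream(
  chunks: AsyncIterable<string>,
): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  const iterator = chunks[Symbol.asyncIterator]();
  return new ReadableStream<Uint8Array>({
    // pull is called whenever the consumer is ready for more data (built-in backpressure).
    async pull(controller) {
      try {
        const { value, done } = await iterator.next();
        if (done) controller.close();
        else controller.enqueue(encoder.encode(value));
      } catch (err) {
        controller.error(err);
      }
    },
    // If the client disconnects, let the source generator clean up.
    async cancel() {
      await iterator.return?.();
    },
  });
}
```

A route handler could then return new Response(toReadableStream(stream), { headers: { "Content-Type": "text/plain; charset=utf-8" } }), where stream is whatever generateStream returns for the augmented prompt.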