Databricks RAG Pipeline for Insurance Policy Analysis

A retrieval‑augmented generation service that lets small insurance agencies query policy documents with natural language, backed by Databricks LLMs and pgvector.

databricks rag insurance pgvector voyageai express typescript

The problem

Insurance brokers routinely waste hours manually searching through lengthy policy PDFs to answer coverage questions. Ambiguous phrasing and inconsistent document formats make keyword search unreliable.

Intro

This tutorial walks you through building a retrieval-augmented generation pipeline for insurance policy analysis using Next.js 16 and the App Router. You’ll build a service that lets insurance brokers query policy PDFs with natural language: upload a document, have it chunked and embedded via VoyageAI, stored in pgvector (PostgreSQL), and then ask questions answered by a Databricks-hosted LLM with semantic caching, context-window planning, structured output repair, and per-tenant cost telemetry.
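
The chunking step mentioned above can be illustrated with a minimal sketch. It assumes fixed-size character windows with a small overlap; `chunkText`, `TextChunk`, and the default sizes are hypothetical names and values chosen for illustration, not the chunker the project actually ships.

```typescript
// Hypothetical shape for one chunk of a policy document.
interface TextChunk {
  index: number;
  text: string;
}

// Splits text into overlapping character windows so that a clause cut at a
// chunk boundary still appears whole in the neighbouring chunk.
function chunkText(text: string, chunkSize = 800, overlap = 100): TextChunk[] {
  if (chunkSize <= 0) {
    throw new RangeError("chunkSize must be positive");
  }
  if (overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("overlap must be in [0, chunkSize)");
  }
  const chunks: TextChunk[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push({ index: chunks.length, text: text.slice(start, start + chunkSize) });
    // Stop once the current window has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

A production chunker would more likely split on sentence or section boundaries; character windows are only the simplest way to show the idea.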

Prerequisites

Node.js 22+ and pnpm 10 installed on your machine
A Databricks workspace with a model serving endpoint (e.g., databricks-dbrx-instruct) and a personal access token (PAT)
A VoyageAI API key for embedding generation
A PostgreSQL database with the pgvector extension installed
A Langfuse account (for LLM observability) — optional but recommended
Basic familiarity with TypeScript and Next.js App Router conventions

Step 1: Scaffold the project and configure environment variables

The scaffold provides Next.js 16 with the App Router, all dependencies pinned to exact versions in package.json, and a src/ directory for your service code. The key packages you’ll use are:

@databricks/sdk-experimental for Databricks authentication and API calls
@reaatech/agent-memory-storage for pgvector-backed chunk persistence
@reaatech/agent-memory-retrieval for hybrid retrieval strategies
@reaatech/context-window-planner for packing chunks within token budgets
@reaatech/llm-cache for semantic caching of LLM responses
@reaatech/structured-repair-core for repairing malformed JSON output
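
To make "packing chunks within token budgets" concrete, here is a minimal sketch of a greedy packer. It is not the @reaatech/context-window-planner API: `packChunks`, `ScoredChunk`, and the four-characters-per-token estimate are assumptions made for illustration.

```typescript
// Hypothetical shape for a retrieved chunk with its relevance score.
interface ScoredChunk {
  id: string;
  text: string;
  score: number;
}

// Rough heuristic (assumption): about four characters per token for English.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Greedy packing: take chunks in descending score order and keep each one
// that still fits in the remaining budget.
function packChunks(chunks: readonly ScoredChunk[], tokenBudget: number): ScoredChunk[] {
  const ranked = [...chunks].sort((a, b) => b.score - a.score);
  const selected: ScoredChunk[] = [];
  let used = 0;
  for (const chunk of ranked) {
    const cost = estimateTokens(chunk.text);
    if (used + cost > tokenBudget) continue; // too large, try the next chunk
    selected.push(chunk);
    used += cost;
  }
  return selected;
}
```

Skipping an oversized chunk instead of stopping lets smaller, lower-ranked chunks fill the remaining space. Whether that trade-off suits policy documents is a judgment call.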

Example artifact

A complete, working implementation of this recipe, downloadable as a zip or browsable file by file. Generated by our build pipeline and tested before publishing.

183 kB · 111 tests · 96.5% coverage · vitest passing

SHA-256: e12370b780ff7dd24cd1efcfef3c4918c001246d7510f4e3a40d3cf13f682e93

Step 2: Define shared types and configuration

// src/lib/config.ts
import { z } from "zod";
import { loadConfig as loadTelemetryConfig } from "@reaatech/llm-cost-telemetry";

// Schema for every environment variable the service reads.
const envSchema = z.object({
  DATABRICKS_HOST: z.string(),
  DATABRICKS_TOKEN: z.string().min(1),
  DATABRICKS_CLIENT_ID: z.string().optional().default(""),
  DATABRICKS_CLIENT_SECRET: z.string().optional().default(""),
  DATABRICKS_SERVING_ENDPOINT: z.string().default("databricks-dbrx-instruct"),
  VOYAGE_API_KEY: z.string().min(1),
  PGHOST: z.string().default("localhost"),
  PGPORT: z.coerce.number().default(5432),
  PGDATABASE: z.string().default("insurance_policies"),
  PGUSER: z.string().default("postgres"),
  PGPASSWORD: z.string().default(""),
  LANGFUSE_PUBLIC_KEY: z.string().optional().default(""),
  LANGFUSE_SECRET_KEY: z.string().optional().default(""),
  LANGFUSE_HOST: z.string().default("https://cloud.langfuse.com"),
  LLM_CACHE_SIMILARITY_THRESHOLD: z.coerce.number().default(0.8),
  LLM_CACHE_TTL_SECONDS: z.coerce.number().default(3600),
  DEFAULT_DAILY_BUDGET: z.coerce.number().default(100.0),
});

export type Env = z.infer<typeof envSchema>;

function loadEnv(): Env {
  const result = envSchema.safeParse(process.env);
  if (!result.success) {
    console.error("Invalid environment variables:", result.error.message);
    // Fall back to placeholders for the required variables. The spread comes
    // first so an empty value in process.env cannot overwrite a placeholder.
    return envSchema.parse({
      ...process.env,
      DATABRICKS_HOST: process.env.DATABRICKS_HOST || "https://placeholder.databricks.com",
      DATABRICKS_TOKEN: process.env.DATABRICKS_TOKEN || "placeholder",
      VOYAGE_API_KEY: process.env.VOYAGE_API_KEY || "placeholder",
    });
  }
  return result.data;
}

export const env = loadEnv();

const telemetryConfig = loadTelemetryConfig();

// Grouped view of the validated environment, consumed by the services.
export const config = {
  databricks: {
    host: env.DATABRICKS_HOST,
    token: env.DATABRICKS_TOKEN,
    clientId: env.DATABRICKS_CLIENT_ID,
    clientSecret: env.DATABRICKS_CLIENT_SECRET,
    servingEndpoint: env.DATABRICKS_SERVING_ENDPOINT,
  },
  voyage: {
    apiKey: env.VOYAGE_API_KEY,
  },
  database: {
    host: env.PGHOST,
    port: env.PGPORT,
    database: env.PGDATABASE,
    user: env.PGUSER,
    password: env.PGPASSWORD,
  },
  langfuse: {
    publicKey: env.LANGFUSE_PUBLIC_KEY,
    secretKey: env.LANGFUSE_SECRET_KEY,
    baseUrl: env.LANGFUSE_HOST,
  },
  cache: {
    similarityThreshold: env.LLM_CACHE_SIMILARITY_THRESHOLD,
    ttlSeconds: env.LLM_CACHE_TTL_SECONDS,
  },
  telemetry: {
    defaultDailyBudget: env.DEFAULT_DAILY_BUDGET,
    config: telemetryConfig,
  },
};
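
The loader above falls back to placeholder values when validation fails. That is convenient for builds and tests, but it can hide a missing secret at runtime. A minimal sketch of a stricter check follows; `findMissing` and `assertRequiredEnv` are hypothetical helpers, not part of the original project.

```typescript
type EnvSource = Record<string, string | undefined>;

// Returns the required keys that are unset or blank.
function findMissing(env: EnvSource, required: readonly string[]): string[] {
  return required.filter((key) => !env[key]?.trim());
}

// Throws when required variables are missing, unless placeholders are
// explicitly allowed (for example during a build or a test run).
function assertRequiredEnv(
  env: EnvSource,
  required: readonly string[],
  allowPlaceholders: boolean,
): void {
  const missing = findMissing(env, required);
  if (missing.length > 0 && !allowPlaceholders) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
}
```

One way to use it would be to call `assertRequiredEnv(process.env, ["DATABRICKS_HOST", "DATABRICKS_TOKEN", "VOYAGE_API_KEY"], process.env.NODE_ENV !== "production")` before `loadEnv()`, so production fails fast while local builds keep the placeholder behaviour.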

Step 4: Build the embedding service with VoyageAI and fastembed fallback

// src/services/embedding-service.ts
import { VoyageAIClient, VoyageAIError } from "voyageai";
import { FlagEmbedding, EmbeddingModel } from "fastembed";
import { config } from "../lib/config.js";

type FastEmbedModel = Awaited<ReturnType<typeof FlagEmbedding.init>>;

// Retries fn with exponential backoff (2 s, then 4 s with the default of 3 attempts).
async function withRetry<T>(fn: () => Promise<T>, maxRetries = 3): Promise<T> {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt === maxRetries) throw err;
      await new Promise((r) => setTimeout(r, Math.pow(2, attempt) * 1000));
    }
  }
  throw new Error("Unreachable");
}

export class EmbeddingService {
  private voyageClient: VoyageAIClient;
  private fastEmbedModel: FastEmbedModel | null = null;

  constructor() {
    this.voyageClient = new VoyageAIClient({ apiKey: config.voyage.apiKey });
  }

  // Loads the local fastembed model used as a fallback; failure is non-fatal.
  async initialize(): Promise<void> {
    try {
      this.fastEmbedModel = await FlagEmbedding.init({
        model: EmbeddingModel.BGEBaseEN,
      });
    } catch {
      console.warn("fastembed init failed; using VoyageAI only");
    }
  }

  async embedText(text: string): Promise<number[]> {
    try {
      const response = await withRetry(() =>
        this.voyageClient.embed({
          input: text,
          model: "voyage-3",
        }),
      );
      return response.data?.[0]?.embedding ?? [];
    } catch (err) {
      // Only VoyageAI errors trigger the local fallback.
      if (err instanceof VoyageAIError && this.fastEmbedModel) {
        return this.fallbackEmbed(text);
      }
      throw err;
    }
  }

  async embedBatch(texts: string[]): Promise<number[][]> {
    try {
      const response = await withRetry(() =>
        this.voyageClient.embed({
          input: texts,
          model: "voyage-3",
        }),
      );
      return (response.data ?? []).map(
        (d: { embedding?: number[] }) => d.embedding ?? [],
      );
    } catch (err) {
      if (err instanceof VoyageAIError && this.fastEmbedModel) {
        return this.fallbackEmbedBatch(texts);
      }
      throw err;
    }
  }

  private async fallbackEmbed(text: string): Promise<number[]> {
    if (!this.fastEmbedModel) return [];
    const embeddings = this.fastEmbedModel.embed([text], 1);
    for await (const batch of embeddings) {
      return batch[0] ?? [];
    }
    return [];
  }

  private async fallbackEmbedBatch(texts: string[]): Promise<number[][]> {
    if (!this.fastEmbedModel) return texts.map(() => []);
    const results: number[][] = [];
    const generator = this.fastEmbedModel.embed(texts, texts.length);
    for await (const batch of generator) {
      results.push(...batch);
    }
    return results;
  }
}
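
The service above can return an empty array when the API response carries no embedding. Its fallback model also likely produces vectors of a different length than voyage-3 (1024 versus 768 dimensions, if the published model cards are current; treat both numbers as assumptions to verify). A pgvector column has a fixed dimension, so a guard before storage may be worthwhile. The sketch below uses a hypothetical helper, `assertDimension`, that is not part of the original code.

```typescript
// Throws if the vector does not have the expected number of dimensions.
// `source` is a free-form label used only in the error message.
function assertDimension(
  vector: readonly number[],
  expected: number,
  source: string,
): readonly number[] {
  if (vector.length !== expected) {
    throw new Error(
      `Embedding from ${source} has ${vector.length} dimensions, expected ${expected}`,
    );
  }
  return vector;
}

// Applies the same check to every vector in a batch.
function assertBatchDimension(
  vectors: readonly (readonly number[])[],
  expected: number,
  source: string,
): void {
  vectors.forEach((v, i) => assertDimension(v, expected, `${source}[${i}]`));
}
```

Calling the guard right after `embedText` or `embedBatch` turns a silent mismatch into an explicit error at ingestion time, before it can surface as a database error or a poor retrieval result.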

Step 7: Build the Databricks LLM service

// src/services/databricks-service.ts
import { WorkspaceClient, Config, ApiError } from "@databricks/sdk-experimental";
import { config } from "../lib/config.js";
import { DatabricksApiError } from "../types/index.js";
import { recordLlmCall } from "../lib/cost-telemetry.js";

export class DatabricksService {
  private client: WorkspaceClient;

  constructor() {
    const cfg = new Config({
      host: config.databricks.host,
      token: config.databricks.token,
    });
    this.client = new WorkspaceClient(cfg);
  }

  // Sends the prompt to the serving endpoint and returns the answer with token usage.
  async generateAnswer(
    prompt: string,
  ): Promise<{ text: string; usage: { inputTokens: number; outputTokens: number } }> {
    if (!prompt.trim()) {
      throw new Error("Prompt cannot be empty");
    }

    const url = `${config.databricks.host}/serving-endpoints/${config.databricks.servingEndpoint}/invocations`;
    const response = await fetch(url, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${config.databricks.token}`,
      },
      body: JSON.stringify({
        messages: [{ role: "user", content: prompt }],
        max_tokens: 1024,
        temperature: 0.2,
      }),
    });

    if (!response.ok) {
      const errorBody = await response.text();
      const apiErr = new ApiError(
        `Databricks API error: ${String(response.status)} ${errorBody}`,
        String(response.status),
        response.status,
        response,
        [],
      );
      throw new DatabricksApiError(apiErr.message, apiErr.statusCode);
    }

    // The endpoint responds in the OpenAI-style chat completion shape.
    const body = (await response.json()) as {
      choices?: Array<{ message?: { content?: string } }>;
      usage?: { prompt_tokens?: number; completion_tokens?: number };
    };
    const text = body.choices?.[0]?.message?.content ?? "";
    const usage = {
      inputTokens: body.usage?.prompt_tokens ?? 0,
      outputTokens: body.usage?.completion_tokens ?? 0,
    };

    // Record token usage for cost telemetry.
    recordLlmCall(
      "openai",
      config.databricks.servingEndpoint,
      usage.inputTokens,
      usage.outputTokens,
      "default",
      "generateAnswer",
    );

    return { text, usage };
  }
}
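
Two details of the service above are worth guarding. A trailing slash in DATABRICKS_HOST yields a double slash in the invocation URL, and a response without message content becomes an empty answer. The sketch below shows one way to handle both. `buildInvocationUrl` and `extractAnswerText` are hypothetical helpers, and the response shape is the one the service code already assumes.

```typescript
// Builds the invocation URL, tolerating trailing slashes on the host.
function buildInvocationUrl(host: string, endpoint: string): string {
  const base = host.trim().replace(/\/+$/, "");
  if (!/^https?:\/\//.test(base)) {
    throw new Error(`Host must start with http:// or https://, got "${host}"`);
  }
  return `${base}/serving-endpoints/${encodeURIComponent(endpoint)}/invocations`;
}

// Response shape as assumed by the service code above.
interface ChatCompletionBody {
  choices?: Array<{ message?: { content?: string } }>;
}

// Returns the first message's trimmed content, or throws if there is none.
function extractAnswerText(body: ChatCompletionBody): string {
  const text = body.choices?.[0]?.message?.content?.trim();
  if (!text) {
    throw new Error("Serving endpoint returned no message content");
  }
  return text;
}
```

Whether an empty answer should throw or be passed through to the repair step is a design choice; throwing simply makes the failure visible earlier.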

Step 8: Build the LLM cache service

// src/services/cache-service.ts
import {
  CacheEngine,
  InMemoryAdapter,
  OpenAIEmbedder,
} from "@reaatech/llm-cache";
import { config } from "../lib/config.js";

export class CacheService {
  private engine: CacheEngine;

  constructor() {
    this.engine = new CacheEngine({
      storage: new InMemoryAdapter(),
      vectorStorage: new InMemoryAdapter(),
      // Note: the cache embedder reads OPENAI_API_KEY directly from the environment.
      embedder: new OpenAIEmbedder({
        provider: "openai",
        model: "text-embedding-3-small",
        dimensions: 1536,
        apiKey: process.env.OPENAI_API_KEY ?? "",
      }),
      config: {
        storage: { adapter: "memory" },
        vectorStorage: { adapter: "memory" },
        embedding: {
          provider: "openai",
          model: "text-embedding-3-small",
          dimensions: 1536,
          batchSize: 100,
          maxRetries: 3,
        },
        similarity: {
          threshold: config.cache.similarityThreshold,
          metric: "cosine",
          maxResults: 10,
        },
        ttl: {
          default: config.cache.ttlSeconds,
          factual: 1800,
          creative: 7200,
          analytical: 3600,
          sensitive: 600,
          byUseCase: {},
        },
        segmentation: { enabled: true, defaultUseCase: "insurance-rag" },
        cost: { enabled: true, currency: "USD" },
        observability: { metrics: true, tracing: false, logging: "info" },
      },
    });
  }

  // Looks up a cached response; the tenant ID is used as the cache use case.
  async get(
    prompt: string,
    tenantId: string,
  ): Promise<{ hit: boolean; type?: "exact" | "semantic"; entry?: unknown; reason?: string }> {
    const result = await this.engine.get(prompt, {
      useCase: tenantId,
      model: "databricks",
    });
    if (result.hit) {
      return { hit: true, type: result.type, entry: result.entry };
    }
    return { hit: false, reason: result.reason };
  }

  async set(
    prompt: string,
    response: string,
    tenantId: string,
    usage: { prompt: number; completion: number },
  ): Promise<void> {
    await this.engine.set(
      prompt,
      response,
      { useCase: tenantId, model: "databricks" },
      { tokens: usage },
    );
  }

  // Drops every cached entry for one tenant.
  async invalidateTenant(tenantId: string): Promise<void> {
    await this.engine.invalidate({ useCase: tenantId });
  }
}
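
The `similarity.threshold` setting decides when a new question counts as "close enough" to a cached one. The sketch below shows the general idea of a cosine-threshold lookup. It illustrates the technique only and does not describe how @reaatech/llm-cache works internally; `cosineSimilarity`, `findSemanticHit`, and `CachedAnswer` are hypothetical names.

```typescript
// Cosine similarity between two vectors of equal, non-zero length.
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new RangeError("Vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector matches nothing
  return dot / Math.sqrt(normA * normB);
}

// Hypothetical cache entry: the embedded prompt and its stored response.
interface CachedAnswer {
  embedding: number[];
  response: string;
}

// Returns the most similar entry at or above the threshold, if any.
function findSemanticHit(
  query: readonly number[],
  entries: readonly CachedAnswer[],
  threshold = 0.8,
): CachedAnswer | undefined {
  let best: CachedAnswer | undefined;
  let bestScore = threshold;
  for (const entry of entries) {
    const score = cosineSimilarity(query, entry.embedding);
    if (score >= bestScore) {
      best = entry;
      bestScore = score;
    }
  }
  return best;
}
```

The default of 0.8 mirrors LLM_CACHE_SIMILARITY_THRESHOLD. For coverage questions, where a one-word difference can change the answer, a stricter threshold may be safer.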

Step 12: Create the API routes

import { type NextRequest, NextResponse } from "next/server";
import { z } from "zod";
import { QueryService } from "../../../src/services/query-service.js";
import { CacheService } from "../../../src/services/cache-service.js";
import { EmbeddingService } from "../../../src/services/embedding-service.js";
import { StorageService } from "../../../src/services/storage-service.js";
import { DatabricksService } from "../../../src/services/databricks-service.js";
import { ContextPlanner } from "../../../src/lib/context-planner.js";
import { repairLlmOutput } from "../../../src/lib/repair.js";
import { recordLlmCall } from "../../../src/lib/cost-telemetry.js";
import { traceLlmCall } from "../../../src/lib/observability.js";

// Request body: a natural-language question scoped to one tenant.
const querySchema = z.object({
  question: z.string().min(1),
  tenantId: z.string().min(1),
});

// Lazily created singleton so the services are wired up once per server process.
let queryService: QueryService | null = null;

async function getQueryService(): Promise<QueryService> {
  if (!queryService) {
    const storageService = new StorageService(false);
    storageService.initialize();
    const cacheService = new CacheService();
    const embeddingService = new EmbeddingService();
    await embeddingService.initialize();
    const contextPlanner = new ContextPlanner();
    const databricksService = new DatabricksService();
    queryService = new QueryService(
      cacheService,
      embeddingService,
      storageService,
      contextPlanner,
      databricksService,
      repairLlmOutput,
      recordLlmCall,
      traceLlmCall,
    );
  }
  return queryService;
}

export async function POST(req: NextRequest): Promise<NextResponse> {
  try {
    const body = await req.json() as Record<string, unknown>;
    const parsed = querySchema.safeParse(body);
    if (!parsed.success) {
      return NextResponse.json(
        { error: "Invalid request", details: parsed.error.issues },
        { status: 400 },
      );
    }
    const { question, tenantId } = parsed.data;
    const service = await getQueryService();
    const result = await service.query(question, tenantId);
    return NextResponse.json(result, { status: 200 });
  } catch (err) {
    const message = err instanceof Error ? err.message : "Query failed";
    return NextResponse.json({ error: message }, { status: 500 });
  }
}
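
In the route above, `getQueryService` awaits `embeddingService.initialize()` before assigning the singleton. Two requests arriving at the same time can therefore both see `null` and both run the initialization. One common remedy is to cache the promise instead of the resolved value. The sketch below uses a hypothetical helper, `memoizeAsync`, that is not part of the original project.

```typescript
// Wraps an async factory so concurrent callers share a single in-flight
// promise. A failed initialization clears the cache so a later call can retry.
function memoizeAsync<T>(factory: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | null = null;
  return () => {
    if (pending === null) {
      pending = factory().catch((err: unknown) => {
        pending = null;
        throw err;
      });
    }
    return pending;
  };
}
```

With this helper, the route could define `const getQueryService = memoizeAsync(async () => { /* build the services as above */ })` and keep the `await getQueryService()` call in `POST` unchanged.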
