Vertex AI Knowledge Agent for BigQuery SMB Data Queries

Index business data from BigQuery into a vector knowledge base and let users ask natural language questions, with budget-aware caching and step-by-step answer generation.

vertex-ai bigquery rag knowledge-agent gemini pgvector nextjs typescript semantic-cache budget-enforcement

The problem

Small businesses store critical data in BigQuery but non-technical staff can't write SQL; they wait days for reports or make decisions with stale information.

Intro

This recipe builds a RAG-powered knowledge agent that syncs business data from Google BigQuery into a pgvector knowledge base and answers natural language questions using Vertex AI (Gemini). It includes semantic caching to avoid redundant LLM calls, per-user budget enforcement, session continuity for multi-turn conversations, and confidence routing that escalates low-certainty answers for human review. You’ll use six REAA (Rapid Enterprise AI Architecture) packages and several third-party tools, all wired together in a Next.js 16 App Router project.
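The decision logic described above can be sketched as one pure function. This is an illustration only: the `QueryState` shape, the `routeQuery` name, the order of the checks, and the 0.6 review threshold are assumptions, not the REAA packages' API. In a real pipeline the confidence score exists only after the model has answered, so the checks would run at different points in the request.

```typescript
// Hypothetical sketch: names, check order, and thresholds are illustrative.
type Route = "cache_hit" | "budget_exceeded" | "human_review" | "answer";

interface QueryState {
  cacheHit: boolean; // a semantically similar question was already answered
  spentUsd: number; // what this user has spent in the current period
  estimatedCostUsd: number; // projected cost of one more LLM call
  budgetUsd: number; // per-user spending cap
  confidence: number; // 0..1 score attached to the generated answer
}

const REVIEW_THRESHOLD = 0.6; // assumed cutoff for escalating to a human

export function routeQuery(state: QueryState): Route {
  // A cache hit costs nothing, so it is served before the budget is consulted.
  if (state.cacheHit) return "cache_hit";
  // Refuse the call if it would push the user over their cap.
  if (state.spentUsd + state.estimatedCostUsd > state.budgetUsd) {
    return "budget_exceeded";
  }
  // Low-certainty answers are escalated instead of being returned directly.
  if (state.confidence < REVIEW_THRESHOLD) return "human_review";
  return "answer";
}
```

Keeping the routing in a pure function like this makes each branch easy to unit test without touching the LLM or the database.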

Prerequisites

Node.js >= 22 and pnpm 10 — the project uses pnpm workspaces and ES modules
Google Cloud Platform project with BigQuery and Vertex AI API enabled
PostgreSQL instance with the pgvector extension (local or cloud, e.g. postgres://user:***@localhost:5432/knowledge_base)
VoyageAI API key — for generating embeddings (VOYAGE_API_KEY)
OpenAI API key — for semantic cache embeddings (OPENAI_API_KEY)
Langfuse account (free tier works) — for observability (LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY)
Familiarity with TypeScript, Next.js App Router, and basic SQL
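A quick way to confirm the keys listed above are present before starting the app is a small startup check. The recipe validates its configuration with Zod in Step 2; the dependency-free sketch below only illustrates the idea. The `missingEnvVars` helper and the `DATABASE_URL` variable name are assumptions, while the other four names come from the prerequisites.

```typescript
// Names from the prerequisites, plus DATABASE_URL, which is an assumed name
// for the PostgreSQL connection string.
const REQUIRED_VARS = [
  "VOYAGE_API_KEY",
  "OPENAI_API_KEY",
  "LANGFUSE_PUBLIC_KEY",
  "LANGFUSE_SECRET_KEY",
  "DATABASE_URL",
] as const;

type EnvLike = Record<string, string | undefined>;

// Returns the names of required variables that are unset or blank.
export function missingEnvVars(env: EnvLike): string[] {
  return REQUIRED_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

Calling `missingEnvVars(process.env)` at startup and failing fast when the result is non-empty surfaces configuration problems before the first request arrives.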

Step 1: Scaffold the project and install dependencies

Create a new Next.js project with TypeScript and App Router. The recipe pins Next.js 16 and React 19 — using --use-pnpm with create-next-app ensures the project starts with the right package manager:

terminal

Example artifact

A complete, working implementation of this recipe — downloadable as a zip or browsable file by file. Generated by our build pipeline; tested with full coverage before publishing.

187 kB · 115 tests · 95.9% coverage · vitest passing

SHA-256: 0cc3670087120a5bda1c3b20369e9fe28cc3570f81e1c1fbb07bb0750a7df5d8

Step 4: Build the BigQuery sync pipeline

BigQuery client (src/sync/bigquery-client.ts)

import { google } from "googleapis";
import type { BigQueryRow } from "../types.js";

// Wraps a failed BigQuery call, keeping the API's status code and error details.
export class BigQueryError extends Error {
  code: number;
  errors: unknown[];

  constructor(message: string, code: number, errors: unknown[] = []) {
    super(message);
    this.name = "BigQueryError";
    this.code = code;
    this.errors = errors;
  }
}

export class BigQueryClient {
  private bigquery: ReturnType<typeof google.bigquery>;
  private projectId: string;

  constructor(projectId: string) {
    this.projectId = projectId;
    // No key file is passed, so GoogleAuth falls back to Application Default Credentials.
    const auth = new google.auth.GoogleAuth({
      scopes: ["https://www.googleapis.com/auth/bigquery"],
    });
    this.bigquery = google.bigquery({ version: "v2", auth });
  }

  // Runs a standard SQL query and returns each row as a column-name to string map.
  async query(sql: string): Promise<BigQueryRow[]> {
    try {
      const res = await this.bigquery.jobs.query({
        projectId: this.projectId,
        requestBody: { query: sql, useLegacySql: false },
      });
      const rows = res.data.rows ?? [];
      const fields = res.data.schema?.fields ?? [];
      return rows.map((row) => {
        const mapped: BigQueryRow = {};
        const values = row.f ?? [];
        fields.forEach((field, i: number) => {
          // Columns without a name fall back to a positional key.
          const name = (field as { name?: string }).name ?? `col_${String(i)}`;
          mapped[name] = String(values[i]?.v ?? "");
        });
        return mapped;
      });
    } catch (err: unknown) {
      const bqErr = err as { code?: number; errors?: unknown[] };
      throw new BigQueryError(
        err instanceof Error ? err.message : String(err),
        bqErr.code ?? 500,
        bqErr.errors ?? [],
      );
    }
  }

  // Pages through a table with tabledata.list and yields one mapped batch per page.
  async *streamRows(
    datasetId: string,
    tableId: string,
    batchSize: number = 500,
  ): AsyncGenerator<BigQueryRow[]> {
    // Fetch the schema once; tabledata.list returns values without column names.
    const tableRes = await this.bigquery.tables.get({
      projectId: this.projectId,
      datasetId,
      tableId,
    });
    const fields = tableRes.data.schema?.fields ?? [];

    let pageToken: string | undefined;
    do {
      const res = await this.bigquery.tabledata.list({
        projectId: this.projectId,
        datasetId,
        tableId,
        maxResults: batchSize,
        pageToken,
      });
      const rows = res.data.rows ?? [];
      const mapped: BigQueryRow[] = rows.map((row) => {
        const item: BigQueryRow = {};
        const values = row.f ?? [];
        fields.forEach((field, i: number) => {
          const name = (field as { name?: string }).name ?? `col_${String(i)}`;
          item[name] = String(values[i]?.v ?? "");
        });
        return item;
      });
      if (mapped.length > 0) yield mapped;
      pageToken = res.data.pageToken ?? undefined;
    } while (pageToken);
  }
}

export function createBigQueryClient(projectId: string): BigQueryClient {
  return new BigQueryClient(projectId);
}
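Both `query` and `streamRows` in the client above rely on the same mapping from BigQuery's positional row format (`{ f: [{ v }] }`) to a keyed object. The sketch below isolates that step so it can be tried without credentials. The `mapRow` helper, the simplified `TableField` and `TableRow` types, and the sample data are illustrative assumptions rather than part of the recipe.

```typescript
// Simplified stand-ins for the BigQuery REST schema and row shapes.
interface TableField {
  name?: string;
}
interface TableRow {
  f?: Array<{ v?: unknown }>;
}

// Pairs each schema field with the value at the same position in the row.
export function mapRow(fields: TableField[], row: TableRow): Record<string, string> {
  const values = row.f ?? [];
  const out: Record<string, string> = {};
  fields.forEach((field, i) => {
    const name = field.name ?? `col_${String(i)}`; // positional fallback
    out[name] = String(values[i]?.v ?? ""); // null or missing becomes ""
  });
  return out;
}

// Made-up sample data for illustration.
const exampleFields: TableField[] = [{ name: "region" }, { name: "revenue" }, {}];
const exampleRow: TableRow = { f: [{ v: "EMEA" }, { v: "1250.50" }, { v: null }] };

console.log(mapRow(exampleFields, exampleRow));
// { region: "EMEA", revenue: "1250.50", col_2: "" }
```

Every value is stringified, so numeric columns arrive as text; that suits the embedding step, which only needs readable content.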

Embedder (src/sync/embedder.ts)

import { VoyageAIClient, VoyageAIError } from "voyageai";

export class Embedder {
  private client: VoyageAIClient;
  // In-process cache keyed by the exact input text.
  private cache: Map<string, number[]>;

  constructor(apiKey: string) {
    this.client = new VoyageAIClient({ apiKey });
    this.cache = new Map();
  }

  async embed(text: string): Promise<number[]> {
    const cached = this.cache.get(text);
    if (cached) return cached;

    return this.withRetry(async () => {
      const result = await this.client.embed({
        input: text,
        model: "voyage-3",
      });
      const embedding = result.data?.[0]?.embedding ?? [];
      // Only cache real vectors, so an empty response is retried on the next call.
      if (embedding.length > 0) {
        this.cache.set(text, embedding);
      }
      return embedding;
    });
  }

  async embedBatch(texts: string[]): Promise<number[][]> {
    // Resolve cache hits first and collect the misses for a single API call.
    const uncached: string[] = [];
    const results: (number[] | undefined)[] = texts.map((t) => {
      const cached = this.cache.get(t);
      if (cached) return cached;
      uncached.push(t);
      return undefined;
    });

    if (uncached.length > 0) {
      const batchResults = await this.withRetry(async () => {
        const result = await this.client.embed({
          input: uncached,
          model: "voyage-3",
        });
        return result.data?.map((d) => d.embedding ?? []) ?? [];
      });

      // Fill the gaps in input order; misses were pushed in that same order.
      let idx = 0;
      for (let i = 0; i < texts.length; i++) {
        if (results[i] === undefined) {
          const emb = batchResults[idx++] ?? [];
          results[i] = emb;
          // Match embed(): do not cache empty vectors.
          if (emb.length > 0) {
            this.cache.set(texts[i], emb);
          }
        }
      }
    }
    return results as number[][];
  }

  // Retries VoyageAIError failures with exponential backoff (1 s, 2 s, 4 s; capped at 8 s).
  private async withRetry<T>(fn: () => Promise<T>, maxRetries = 3): Promise<T> {
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        return await fn();
      } catch (err) {
        if (err instanceof VoyageAIError && attempt < maxRetries) {
          const delay = Math.min(1000 * Math.pow(2, attempt), 8000);
          await new Promise((resolve) => setTimeout(resolve, delay));
          continue;
        }
        throw err;
      }
    }
    throw new Error("Unreachable");
  }
}

export function createEmbedder(apiKey: string): Embedder {
  return new Embedder(apiKey);
}
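To see what the retry loop in the embedder actually waits for, the delay formula can be pulled out into a pure function. This is a sketch: `backoffDelays` is a hypothetical helper, and only the formula `min(1000 * 2^attempt, 8000)` and the default of three retries come from the code above.

```typescript
// Lists the waits (in ms) that precede each retry, using the embedder's formula.
export function backoffDelays(maxRetries = 3, baseMs = 1000, capMs = 8000): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(Math.min(baseMs * Math.pow(2, attempt), capMs));
  }
  return delays;
}

console.log(backoffDelays()); // [1000, 2000, 4000]
console.log(backoffDelays(6)); // [1000, 2000, 4000, 8000, 8000, 8000]
```

With the default settings the cap is never reached, so a call that fails every time gives up after roughly seven seconds of waiting plus the time of four requests.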

Step 5: Wire up the sync orchestrator

import crypto from "node:crypto";
import { loadConfig } from "../config.js";
import type { SyncResult } from "../types.js";
import { createBigQueryClient } from "./bigquery-client.js";
import { createEmbedder } from "./embedder.js";
import { createVectorStore } from "./vector-store.js";

export async function runSync(
  options?: {
    dataset?: string;
    table?: string;
    limit?: number; // accepted, but not applied anywhere in this function
  },
): Promise<SyncResult> {
  const start = performance.now();
  const config = loadConfig();
  const dataset = options?.dataset ?? config.bigqueryDataset;
  const table = options?.table ?? config.bigqueryTable;

  const bqClient = createBigQueryClient(config.projectId);
  const embedder = createEmbedder(config.voyageApiKey);
  const vectorStore = createVectorStore(config.databaseUrl);
  await vectorStore.ensureSchema();

  let rowsProcessed = 0;
  let embeddingsGenerated = 0;
  let upsertedCount = 0;

  try {
    const stream = bqClient.streamRows(dataset, table, 500);
    for await (const batch of stream) {
      rowsProcessed += batch.length;
      const texts: string[] = [];
      const ids: string[] = [];
      const metadataList: Record<string, unknown>[] = [];

      for (const row of batch) {
        // Flatten the row into "column: value" lines for embedding.
        const text = Object.entries(row)
          .map(([k, v]) => `${k}: ${String(v ?? "")}`)
          .join("\n");
        texts.push(text);

        // Content-derived ID: re-syncing an unchanged row targets the same record.
        const hash = crypto
          .createHash("sha256")
          .update(text)
          .digest("hex")
          .slice(0, 32);
        ids.push(hash);

        metadataList.push({
          // Note: this records the dataset name; the table name is not stored.
          sourceTable: dataset,
          // The first column is treated as the row identifier.
          sourceRowId: String(row[Object.keys(row)[0]] ?? hash),
        });
      }

      try {
        const embeddings = await embedder.embedBatch(texts);
        embeddingsGenerated += embeddings.length;
        const items = ids.map((id, i) => ({
          id,
          content: texts[i],
          embedding: embeddings[i],
          metadata: metadataList[i],
        }));
        await vectorStore.upsertBatch(items);
        upsertedCount += items.length;
      } catch {
        // A failed batch is skipped silently; it still counts toward rowsProcessed.
        continue;
      }
    }
  } catch {
    // BigQuery error during streaming - return partial results
  } finally {
    await vectorStore.close();
  }

  const durationMs = Math.round(performance.now() - start);
  return {
    rowsProcessed,
    embeddingsGenerated,
    upsertedCount,
    durationMs,
  };
}
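The orchestrator derives each record's ID from the row's flattened text, which is what makes repeated syncs idempotent for unchanged rows. A minimal sketch of that step in isolation follows; `rowToDocument` is a hypothetical helper name and the sample rows are made up.

```typescript
import { createHash } from "node:crypto";

// Flattens a row to "column: value" lines and derives a 32-hex-character ID
// from the SHA-256 digest of that text, as the sync step does.
export function rowToDocument(row: Record<string, string>): { id: string; content: string } {
  const content = Object.entries(row)
    .map(([key, value]) => `${key}: ${value}`)
    .join("\n");
  const id = createHash("sha256").update(content).digest("hex").slice(0, 32);
  return { id, content };
}

const first = rowToDocument({ region: "EMEA", revenue: "1250.50" });
const again = rowToDocument({ region: "EMEA", revenue: "1250.50" });
const changed = rowToDocument({ region: "EMEA", revenue: "1300.00" });

console.log(first.id === again.id); // true: same content, same ID
console.log(first.id === changed.id); // false: an edited row gets a new ID
```

One consequence of hashing content rather than a primary key is that an edited row is stored as a new record and the old version stays in the vector store unless something else removes it.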

Step 6: Create the Vertex AI LLM service

import { VertexAI } from "@google-cloud/vertexai";

const SYSTEM_PROMPT =
  "You are a business data analyst. Answer questions using the provided context rows from a BigQuery table. Each row contains business metrics. When numbers are present, cite them. When confidence is low, say so. Keep answers concise.";

// Thrown when Gemini stops generation for safety reasons.
export class ContentBlockedError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ContentBlockedError";
  }
}

export class LlmService {
  private model: ReturnType<VertexAI["getGenerativeModel"]>;

  constructor(project: string, location: string) {
    const vertexAI = new VertexAI({ project, location });
    this.model = vertexAI.getGenerativeModel({
      model: "gemini-1.5-flash",
      systemInstruction: {
        role: "system" as const,
        parts: [{ text: SYSTEM_PROMPT }],
      },
    });
  }

  async generateAnswer(
    query: string,
    context: string[],
    conversationHistory?: Array<{ role: string; content: string }>,
  ): Promise<{ text: string; usage: { inputTokens: number; outputTokens: number } }> {
    // Retrieved rows are passed as a bulleted block ahead of the question.
    const contextBlock = context.map((c) => `- ${c}`).join("\n");
    const userMessage = `Context rows:\n${contextBlock}\n\nQuestion: ${query}`;

    // Prior turns go first; history roles are passed through unchanged.
    const contents: Array<{ role: string; parts: Array<{ text: string }> }> = [];
    if (conversationHistory) {
      for (const msg of conversationHistory) {
        contents.push({ role: msg.role, parts: [{ text: msg.content }] });
      }
    }
    contents.push({ role: "user", parts: [{ text: userMessage }] });

    const result = await this.withRetry(async () => {
      const res = await this.model.generateContent({ contents });
      return res.response;
    });

    if (String(result.candidates?.[0]?.finishReason) === "SAFETY") {
      throw new ContentBlockedError("Response blocked due to safety concerns");
    }

    const text = result.candidates?.[0]?.content?.parts?.[0]?.text ?? "";
    const usage = result.usageMetadata ?? { promptTokenCount: 0, candidatesTokenCount: 0 };
    return {
      text,
      usage: {
        inputTokens: usage.promptTokenCount ?? 0,
        outputTokens: usage.candidatesTokenCount ?? 0,
      },
    };
  }

  // Retries every error, transient or not, with exponential backoff capped at 8 s.
  private async withRetry<T>(
    fn: () => Promise<T>,
    maxRetries = 3,
  ): Promise<T> {
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        return await fn();
      } catch (err) {
        if (attempt < maxRetries) {
          const delay = Math.min(1000 * Math.pow(2, attempt), 8000);
          await new Promise((resolve) => setTimeout(resolve, delay));
          continue;
        }
        throw err;
      }
    }
    throw new Error("Unreachable");
  }
}

export function createLlmService(
  project: string,
  location: string,
): LlmService {
  return new LlmService(project, location);
}
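The service above forwards conversation history roles as they are. Gemini's `contents` array uses the roles `user` and `model`, so if a session store records assistant turns under another label (for example `assistant`, which is an assumption about how history might be stored), a small normalizing step helps. The `toGeminiContents` helper below is a hypothetical sketch of that step, not part of the recipe.

```typescript
type ChatMessage = { role: string; content: string };
type GeminiContent = { role: "user" | "model"; parts: Array<{ text: string }> };

// Converts stored history plus the new user message into Gemini's contents
// shape, treating any non-"user" role as a model turn.
export function toGeminiContents(history: ChatMessage[], userMessage: string): GeminiContent[] {
  const contents = history.map(
    (msg): GeminiContent => ({
      role: msg.role === "user" ? "user" : "model",
      parts: [{ text: msg.content }],
    }),
  );
  contents.push({ role: "user", parts: [{ text: userMessage }] });
  return contents;
}

const contents = toGeminiContents(
  [
    { role: "user", content: "What was Q1 revenue?" },
    { role: "assistant", content: "Q1 revenue was 1250.50." },
  ],
  "And Q2?",
);
console.log(contents.map((c) => c.role)); // ["user", "model", "user"]
```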

Step 7: Wire up the REAA service layer

Cache service (src/api/cache-service.ts)

import { CacheEngine, InMemoryAdapter, OpenAIEmbedder } from "@reaatech/llm-cache";

export type CacheResult =
  | {
      hit: true;
      type: "exact" | "semantic";
      entry: { response: object };
      confidence?: number;
      cachedAt: Date;
    }
  | { hit: false; reason: string };

export class CacheService {
  private engine: CacheEngine;

  constructor(apiKey: string, threshold: number, ttlSeconds: number) {
    this.engine = new CacheEngine({
      // In-memory adapters: cached entries do not survive a process restart.
      storage: new InMemoryAdapter(),
      vectorStorage: new InMemoryAdapter(),
      embedder: new OpenAIEmbedder({
        provider: "openai",
        model: "text-embedding-3-small",
        dimensions: 1536,
        apiKey,
      }),
      config: {
        storage: { adapter: "memory" },
        vectorStorage: { adapter: "memory" },
        // A semantic hit requires cosine similarity at or above the threshold.
        similarity: { threshold, metric: "cosine", maxResults: 10 },
        // One TTL is applied to every content category.
        ttl: {
          default: ttlSeconds,
          factual: ttlSeconds,
          creative: ttlSeconds,
          analytical: ttlSeconds,
          sensitive: ttlSeconds,
          byUseCase: {},
        },
        segmentation: { enabled: true, defaultUseCase: "general" },
        embedding: {
          provider: "openai",
          model: "text-embedding-3-small",
          dimensions: 1536,
          batchSize: 100,
          maxRetries: 3,
        },
        cost: { enabled: false, currency: "USD" },
        observability: { metrics: false, tracing: false, logging: "info" },
      },
    });
  }

  async getCached(prompt: string): Promise<CacheResult> {
    const result = await this.engine.get(prompt, {
      model: "gemini-1.5-flash",
      modelVersion: "gemini-1.5-flash",
      useCase: "query",
    });
    if (result.hit) {
      return {
        hit: true,
        type: result.type,
        entry: { response: result.entry.response as object },
        confidence: result.confidence,
        cachedAt: result.cachedAt,
      };
    }
    return { hit: false, reason: result.reason };
  }

  async setCache(prompt: string, response: object): Promise<void> {
    // Uses the same model and use-case keys as getCached so lookups match.
    await this.engine.set(prompt, response, {
      model: "gemini-1.5-flash",
      modelVersion: "gemini-1.5-flash",
      useCase: "query",
    });
  }
}

export function createCacheService(
  apiKey: string,
  threshold: number,
  ttlSeconds: number,
): CacheService {
  return new CacheService(apiKey, threshold, ttlSeconds);
}
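The cache configuration above names a cosine metric and a similarity threshold. To make concrete what a semantic hit means, here is a minimal sketch of a threshold lookup over stored embeddings. It illustrates the general technique, not how `@reaatech/llm-cache` is implemented; `cosineSimilarity`, `findSemanticHit`, and the two-dimensional vectors are assumptions chosen for readability.

```typescript
// Cosine similarity: dot product divided by the product of the vector norms.
export function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("Vectors must be non-empty and the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

interface CachedEntry {
  embedding: number[];
  response: string;
}

// Returns the most similar entry at or above the threshold, if there is one.
export function findSemanticHit(
  query: number[],
  entries: CachedEntry[],
  threshold: number,
): CachedEntry | undefined {
  let best: CachedEntry | undefined;
  let bestScore = threshold;
  for (const entry of entries) {
    const score = cosineSimilarity(query, entry.embedding);
    if (score >= bestScore) {
      best = entry;
      bestScore = score;
    }
  }
  return best;
}
```

A higher threshold trades cache hit rate for safety: near 1.0 only close paraphrases reuse an answer, while lower values risk returning an answer to a different question.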

Step 9: Expose API routes

Query endpoint (app/api/query/route.ts)

import { z } from "zod";
import { NextRequest, NextResponse } from "next/server";
import { Langfuse } from "langfuse";
import { loadConfig } from "../../../src/config.js";
import { createLlmService } from "../../../src/api/llm-service.js";
import { createVectorStore } from "../../../src/sync/vector-store.js";
import { createCacheService } from "../../../src/api/cache-service.js";
import { createBudgetService } from "../../../src/api/budget-service.js";
import { createConfidenceService } from "../../../src/api/confidence-service.js";
import { createSessionService } from "../../../src/api/session-service.js";
import { createCostTelemetryService } from "../../../src/api/cost-telemetry.js";
import { createEmbedder } from "../../../src/sync/embedder.js";
import { createQueryService } from "../../../src/api/query-service.js";

// Request contract: a non-empty query of at most 2000 characters plus a user ID.
const bodySchema = z.object({
  query: z.string().min(1).max(2000),
  userId: z.string().min(1),
  sessionId: z.string().optional(),
});

// Services are created once at module load and shared across requests.
const config = loadConfig();
const llm = createLlmService(config.projectId, config.location);
const vectorStore = createVectorStore(config.databaseUrl);
const cacheService = createCacheService(
  config.openaiApiKey,
  config.cacheThreshold,
  config.cacheTtl,
);

const langfuse = new Langfuse({
  publicKey: config.langfusePublicKey,
  secretKey: config.langfuseSecretKey,
  baseUrl: config.langfuseHost,
});

// Budget warnings and errors are recorded as Langfuse traces.
const budgetLogger = {
  warn: (msg: string, data?: unknown) => {
    langfuse.trace({ name: msg, metadata: data as Record<string, unknown> | undefined });
  },
  error: (msg: string, data?: unknown) => {
    langfuse.trace({ name: msg, metadata: data as Record<string, unknown> | undefined });
  },
};

const budgetService = createBudgetService(budgetLogger);
const confidenceService = createConfidenceService();
const sessionService = createSessionService();
const costTelemetry = createCostTelemetryService();
const embedder = createEmbedder(config.voyageApiKey);

const queryService = createQueryService({
  llm,
  vectorStore,
  cacheService,
  budgetService,
  confidenceService,
  sessionService,
  costTelemetry,
  embedder,
});

export async function POST(req: NextRequest): Promise<NextResponse> {
  try {
    const raw: unknown = await req.json();
    const parsed = bodySchema.safeParse(raw);
    if (!parsed.success) {
      return NextResponse.json(
        { error: "invalid_request", details: parsed.error.issues },
        { status: 400 },
      );
    }

    const data = parsed.data;
    const result = await queryService.handleQuery(data);

    if (result.error) {
      // Budget exhaustion maps to 429; every other handled error maps to 400.
      const status = result.code === "BUDGET_EXCEEDED" ? 429 : 400;
      return NextResponse.json({ error: result.error, code: result.code }, { status });
    }
    return NextResponse.json(result);
  } catch (err: unknown) {
    const message = err instanceof Error ? err.message : String(err);
    return NextResponse.json(
      { error: "internal_error", message },
      { status: 500 },
    );
  }
}
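The route's handled-result branch reduces to a small mapping from the query service's result to an HTTP status. A sketch of that mapping as a standalone function follows. `statusFor` and the `QueryResult` shape (in particular the `answer` field) are illustrative assumptions; the `BUDGET_EXCEEDED` code and the 429/400 split come from the handler above. Invalid bodies (400) and thrown exceptions (500) are handled separately in the route and are not covered here.

```typescript
// Assumed minimal shape of what queryService.handleQuery returns.
interface QueryResult {
  error?: string;
  code?: string;
  answer?: string; // hypothetical success field
}

// Mirrors the handler: success is 200, budget exhaustion is 429,
// and any other handled error is 400.
export function statusFor(result: QueryResult): number {
  if (!result.error) return 200;
  return result.code === "BUDGET_EXCEEDED" ? 429 : 400;
}

console.log(statusFor({ answer: "Q1 revenue was 1250.50." })); // 200
console.log(statusFor({ error: "Budget exhausted", code: "BUDGET_EXCEEDED" })); // 429
console.log(statusFor({ error: "No documents indexed", code: "EMPTY_INDEX" })); // 400
```

Returning 429 for budget exhaustion lets clients treat it like rate limiting and back off, rather than surfacing it as a malformed request.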

Metric	Covered / Total	Percentage
Lines	304 / 317	95.89%
Branches	137 / 149	91.94%
Functions	82 / 89	92.13%
Statements	320 / 334	95.80%

Intro

Prerequisites

Step 1: Scaffold the project and install dependencies

Step 2: Configure environment variables with Zod

Step 3: Set up shared types

Step 4: Build the BigQuery sync pipeline

BigQuery client (`src/sync/bigquery-client.ts`)

Embedder (`src/sync/embedder.ts`)

Vector store (`src/sync/vector-store.ts`)

Step 5: Wire up the sync orchestrator

Step 6: Create the Vertex AI LLM service

Step 7: Wire up the REAA service layer

Cache service (`src/api/cache-service.ts`)

Budget service (`src/api/budget-service.ts`)

Confidence service (`src/api/confidence-service.ts`)

Session service (`src/api/session-service.ts`)

Cost telemetry (`src/api/cost-telemetry.ts`)

Step 8: Build the query orchestrator

Step 9: Expose API routes

Health check (`app/api/health/route.ts`)

Query endpoint (`app/api/query/route.ts`)

Sync endpoint (`app/api/bigquery-sync/route.ts`)

Langfuse instrumentation (`src/instrumentation.ts`)

Step 10: Run the tests

Next steps
