Support teams waste time digging through scattered Confluence pages, PDFs, and SharePoint folders to answer customer tickets. Answers are slow and inconsistent, eroding trust and raising costs.
You’ll build a customer support knowledge base that answers questions by searching your help articles with AI. When a customer asks “How do I reset my password?”, the app embeds the query, finds the closest matching help articles in Pinecone, and uses Vercel AI Gateway to generate a plain-English answer with citations. Along the way you’ll add per-query budget controls, multi-turn conversation history, and an automated evaluation harness that measures retrieval quality. By the end you’ll have a working API route handler, an ingestion worker that processes markdown files, and a test suite you can run with one command.
Prerequisites
Node.js >= 22 — check with node --version
pnpm 10.x — check with pnpm --version. If you don’t have it, run corepack enable && corepack prepare pnpm@10.0.0 --activate
A Pinecone account with an API key and an existing index
An OpenAI API key (used for embeddings and generation)
Vercel AI Gateway access (a gateway URL — defaults to https://api.vercel.ai)
Familiarity with TypeScript, Next.js App Router, and environment variables
Step 1: Scaffold the project
Create an empty directory and add the three root config files your project needs. These pin every dependency version, configure TypeScript for Next.js, and wire environment variables from the shell into your Next.js server-side code.
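For reference, here is a minimal sketch of the next.config.js piece, assuming the standard @pinecone-database/pinecone package name (depending on your Next.js version, the setting may live outside experimental):
code
/** @type {import('next').NextConfig} */
const nextConfig = {
  experimental: {
    // Keep server-only packages (native modules, API keys) out of the client bundle.
    serverComponentsExternalPackages: ['@pinecone-database/pinecone'],
  },
};

module.exports = nextConfig;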
The serverComponentsExternalPackages setting tells Next.js not to bundle the Pinecone and embedding libraries for the client — they stay on the server where native modules and API keys belong. Finally, create the source directory structure and a minimal Next.js App Router shell:
Create src/app/layout.tsx:
tsx
import type { Metadata } from 'next';
import type { ReactNode } from 'react';

export const metadata: Metadata = {
  title: 'Customer Support RAG',
  description:
    'AI-powered customer support knowledge base using Vercel AI Gateway and Pinecone vector search.',
};

export default function RootLayout({ children }: { children: ReactNode }) {
  return (
    <html lang="en">
      <body>{children}</body>
    </html>
  );
}
Create src/app/page.tsx:
tsx
export default function Page() {
  return (
    <main style={{ padding: '2rem', fontFamily: 'system-ui' }}>
      <h1>Customer Support RAG</h1>
      <p>
        AI-powered customer support knowledge base using Vercel AI Gateway and
        Pinecone vector search.
      </p>
      <h2>API Endpoints</h2>
      <ul>
        <li>
          <code>POST /api/rag</code> — Query the knowledge base
        </li>
        <li>
          <code>GET /health</code> — Health check
        </li>
      </ul>
    </main>
  );
}
Step 2: Install dependencies
With the package.json in place, install every dependency in one shot. This pulls in the Vercel AI SDK (ai@6.0.177), the Pinecone TypeScript client, and the full set of REAA agent packages that handle embedding, retrieval, session management, budget enforcement, and evaluation.
terminal
pnpm install
Expected output: pnpm resolves the lockfile, downloads packages, and prints a summary of installed dependencies. You should see no errors.
Step 3: Set environment variables
Create .env.example first as a template, then copy it to .env.local and fill in your real values. The app reads from .env.local automatically in development.
Create .env.example:
code
# Pinecone vector database configuration
PINECONE_API_KEY=
PINECONE_ENVIRONMENT=
PINECONE_INDEX_NAME=
# Embedding provider configuration (openai, cohere, or huggingface)
EMBEDDING_PROVIDER=openai
OPENAI_API_KEY=
# Vercel AI Gateway (model routing)
VERCEL_AI_GATEWAY_URL=
# Budget engine: maximum cost allowed per query (USD)
BUDGET_PER_QUERY_MAX=0.05
# Session management (Firestore)
SESSION_FIRESTORE_COLLECTION=sessions
# Evaluation: path to golden dataset YAML/JSON file
EVAL_DATASET_PATH=./data/golden-dataset.yaml
Now copy and fill in your keys:
terminal
cp .env.example .env.local
Open .env.local and replace the empty values:
PINECONE_API_KEY — your Pinecone API key
PINECONE_ENVIRONMENT — your Pinecone environment (e.g. us-east-1-aws)
PINECONE_INDEX_NAME — your Pinecone index host string (e.g. my-index-abc123.svc.aped-1234.pinecone.io)
OPENAI_API_KEY — your OpenAI API key (starts with sk-)
VERCEL_AI_GATEWAY_URL — your Vercel AI Gateway endpoint (or leave it at the default)
The other fields have sensible defaults: embedding provider defaults to openai, budget cap defaults to $0.05 per query, and sessions default to a Firestore collection named sessions.
Step 4: Define TypeScript types and the environment loader
Every interface your app uses goes into a single src/types.ts file. This keeps request shapes, response shapes, chunk metadata, and eval result structures consistent across the API route, ingestion worker, and eval runner.
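As a sketch, the core shapes might look like this; field names beyond those used elsewhere in the recipe are assumptions inferred from the Step 10 response payload:
ts
// src/types.ts — shared shapes used by the route, worker, and eval runner.
export interface ChunkMetadata {
  source: string; // e.g. 'help/password-reset.md'
  title: string;
  section: string;
}

export interface Chunk {
  id: string; // e.g. 'help/password-reset.md:0'
  text: string;
  metadata: ChunkMetadata;
}

export interface Citation {
  source: string;
  title: string;
  section: string;
  excerpt: string;
}

export interface SessionTurn {
  role: 'user' | 'assistant';
  content: string;
}

export interface RagRequest {
  query: string;
  sessionId?: string;
}

export interface RagResponse {
  answer: string;
  citations: Citation[];
  sessionId: string;
  messageCount: number;
}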
Next, create src/env.ts — a small module that reads and validates environment variables. Required variables throw if missing; optional ones fall back to sensible defaults.
Create src/env.ts:
ts
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Optional variables fall back to the defaults described in Step 3.
function optionalEnv(name: string, fallback: string): string {
  return process.env[name] ?? fallback;
}
Step 5: Build the utility modules
Two utilities power the RAG pipeline: the chunker splits long documents into overlapping segments for embedding, and the formatter turns retrieved chunks and session history into prompts the model can consume.
Create src/utils/chunker.ts:
ts
import type { Chunk, ChunkMetadata } from '@/types';

const DEFAULT_CHUNK_SIZE = 1000;
const DEFAULT_CHUNK_OVERLAP = 200;
The chunker respects paragraph boundaries and uses a 200-character overlap between chunks so retrieval doesn’t lose context at chunk boundaries. Each chunk gets an ID like help/password-reset.md:0 and carries its source, title, and section metadata through to Pinecone.
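To make that concrete, here is a minimal sketch of the chunking loop. It is a simplified stand-in for the real implementation, which also scans for paragraph boundaries near each split point:
ts
// Continues src/utils/chunker.ts, using the imports and constants above.
export function chunkDocument(text: string, metadata: ChunkMetadata): Chunk[] {
  const chunks: Chunk[] = [];
  let start = 0;
  let index = 0;
  while (start < text.length) {
    const end = Math.min(start + DEFAULT_CHUNK_SIZE, text.length);
    chunks.push({
      id: `${metadata.source}:${index}`, // e.g. 'help/password-reset.md:0'
      text: text.slice(start, end),
      metadata,
    });
    if (end === text.length) break;
    start = end - DEFAULT_CHUNK_OVERLAP; // 200-char overlap between chunks
    index += 1;
  }
  return chunks;
}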
Create src/utils/formatter.ts:
ts
import type { Citation, SessionTurn } from '@/types';

export function formatContextFromCitations(citations: Citation[]): string {
  if (citations.length === 0) {
    return '';
  }
  const lines = citations.map(
    (c, i) => `[Source ${i + 1}: ${c.title} (${c.source})]\n${c.excerpt}`,
  );
  return lines.join('\n\n');
}

// The file also exports a helper that renders SessionTurn[] history into the
// prompt for multi-turn conversations (see Step 6).
Step 6: Create the RAG API route
This is the heart of the app — a route handler at POST /api/rag. It accepts a JSON body with a query field and an optional sessionId, runs a budget check, embeds the query, searches Pinecone for the top 5 matching chunks, formats the context and conversation history into a prompt, calls Vercel AI Gateway to generate an answer, persists the turn to the session, and returns the answer with citations.
Create src/app/api/rag/route.ts:
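A simplified sketch of the handler's flow is below. The helper modules (@/lib/retrieval, @/lib/gateway, @/lib/budget, @/lib/session) are hypothetical stand-ins for the project's real imports, not the recipe's actual APIs:
ts
import { NextResponse } from 'next/server';
// Hypothetical helpers standing in for the real embedding, search,
// gateway, budget, and session modules.
import { embedQuery, searchTopK } from '@/lib/retrieval';
import { generateAnswer } from '@/lib/gateway';
import { checkBudget, recordSpend } from '@/lib/budget';
import { loadSession, appendTurn } from '@/lib/session';
import { formatContextFromCitations } from '@/utils/formatter';
import type { RagRequest, RagResponse } from '@/types';

export async function POST(request: Request) {
  const { query, sessionId } = (await request.json()) as RagRequest;
  if (!query) {
    return NextResponse.json({ error: 'query is required' }, { status: 400 });
  }

  await checkBudget(); // rejects the query if the per-query cap is exhausted
  const vector = await embedQuery(query); // served from cache when possible
  const citations = await searchTopK(vector, 5); // top 5 Pinecone matches
  const session = await loadSession(sessionId); // best-effort; may be local-only
  const context = formatContextFromCitations(citations);

  const answer = await generateAnswer({ query, context, history: session.turns });
  await appendTurn(session.id, query, answer); // best-effort persistence
  await recordSpend(); // throttles later queries once the limit is reached

  const body: RagResponse = {
    answer,
    citations,
    sessionId: session.id,
    messageCount: session.turns.length + 2,
  };
  return NextResponse.json(body);
}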
A few details worth noting: the embedding provider is cached in memory (5,000 entries, 5-minute TTL) so repeated queries don’t re-embed the same text. Session creation and turn persistence are best-effort — if Firestore is down the route still returns a valid answer with a locally-generated session ID. The budget controller records spend after the answer is generated so subsequent queries can be throttled if the limit is reached.
Step 7: Create the ingestion worker
The ingestion worker scans a ./knowledge-base/ directory for markdown files, parses their YAML frontmatter for metadata, chunks the body text, embeds each chunk, and upserts everything into Pinecone. It runs as a standalone script with tsx.
Create src/workers/ingest.ts:
The worker reads the INGEST_DIR environment variable (defaulting to ./knowledge-base), discovers all .md files recursively, parses each one’s YAML frontmatter for title, section, and source fields, chunks the body text, embeds each chunk with OpenAI’s text-embedding-3-small model, and upserts the vectors into Pinecone with trimmed metadata. Errors at the file and chunk level are captured in the progress report without stopping the entire batch.
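A condensed sketch of that flow follows. The gray-matter package for frontmatter parsing is an assumption, and the embedding/upsert step is elided since it depends on the project's Pinecone helpers:
ts
// src/workers/ingest.ts (sketch) — run with: pnpm tsx src/workers/ingest.ts
import { readdir, readFile } from 'node:fs/promises';
import { join } from 'node:path';
import matter from 'gray-matter';
import { chunkDocument } from '@/utils/chunker';

const INGEST_DIR = process.env.INGEST_DIR ?? './knowledge-base';

async function findMarkdownFiles(dir: string): Promise<string[]> {
  const entries = await readdir(dir, { withFileTypes: true });
  const files: string[] = [];
  for (const entry of entries) {
    const path = join(dir, entry.name);
    if (entry.isDirectory()) files.push(...(await findMarkdownFiles(path)));
    else if (entry.name.endsWith('.md')) files.push(path);
  }
  return files;
}

async function main() {
  for (const file of await findMarkdownFiles(INGEST_DIR)) {
    try {
      const raw = await readFile(file, 'utf8');
      const { data, content } = matter(raw); // YAML frontmatter -> data
      const chunks = chunkDocument(content, {
        source: data.source,
        title: data.title,
        section: data.section,
      });
      // Embed each chunk and upsert the vectors into Pinecone here (omitted).
      console.log(JSON.stringify({ file, chunks: chunks.length }));
    } catch (error) {
      // File-level errors are reported without stopping the batch.
      console.error(JSON.stringify({ file, error: String(error) }));
    }
  }
}

main();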
Step 8: Create the eval runner
The eval runner loads a golden dataset of test questions with expected answers and sources, runs each through a simulated RAG endpoint, scores the results with a word-overlap comparison, and feeds everything into the agent-eval-harness-suite for metric aggregation and statistical comparison against a baseline.
Create src/eval/run.ts:
The runner uses a mock knowledge base so evaluation runs offline — no Pinecone or API keys needed. The compareAnswers function scores with word overlap (Jaccard-like over words longer than 2 characters). buildEvalResult maps the trajectory result into the format the eval harness expects. runEvaluation orchestrates the full pipeline: load dataset, run tests, aggregate metrics, compare against a synthetic baseline, and return a structured output with pass/fail counts and overall scores.
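As a sketch of the scoring described above (an assumption of the exact implementation), a Jaccard-style word-overlap comparison looks like this:
ts
// Jaccard similarity over lowercase words longer than 2 characters.
function toWordSet(text: string): Set<string> {
  return new Set(
    text
      .toLowerCase()
      .split(/\W+/)
      .filter((w) => w.length > 2),
  );
}

export function compareAnswers(expected: string, actual: string): number {
  const a = toWordSet(expected);
  const b = toWordSet(actual);
  if (a.size === 0 && b.size === 0) return 1;
  let overlap = 0;
  for (const w of a) if (b.has(w)) overlap += 1;
  const unionSize = a.size + b.size - overlap;
  return unionSize === 0 ? 0 : overlap / unionSize;
}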
Step 9: Write and run the test suite
The project uses Vitest with happy-dom as the test environment. All external dependencies are mocked so tests run without network access. First, create the Vitest configuration and global setup.
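The exact configuration in the recipe may differ, but a vitest.config.ts matching this description looks like the following (the setup file path is an assumption):
ts
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    environment: 'happy-dom',
    setupFiles: ['./vitest.setup.ts'], // global mocks for Pinecone, OpenAI, etc.
    reporters: ['default', 'json'],
    outputFile: 'vitest-report.json',
    coverage: {
      provider: 'v8',
      thresholds: { lines: 90, branches: 89, functions: 90, statements: 90 },
    },
  },
});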
This produces a vitest-report.json with merged coverage data and enforces thresholds of 90% lines, 89% branches, 90% functions, and 90% statements.
Step 10: Try it end to end
You need some markdown help articles to ingest. Create a sample knowledge base:
terminal
mkdir -p knowledge-base/help
Create knowledge-base/help/password-reset.md:
code
---
title: Password Reset Guide
section: account
source: help/password-reset.md
---
## Password Reset
You can reset your password via the Settings page.
Follow these steps:
1. Go to Settings
2. Click Security
3. Choose Reset Password
If you don't remember your current password, use the "Forgot Password" link on the login page.
Create knowledge-base/help/billing.md:
code
---
title: Billing FAQ
section: billing
source: help/billing.md
---
## Billing
Billing cycles run from the 1st to the last day of each month.
Invoices are generated on the 3rd of the following month. You can view past invoices
in the Billing section of your account settings.
Now ingest everything into Pinecone:
terminal
pnpm ingest
Expected output: JSON progress lines describing each file processed, chunk counts, and any errors.
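Next, start the dev server (assuming the standard dev script in the pinned package.json):
terminal
pnpm dev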
Expected output: Next.js starts and prints Ready in ...s with a http://localhost:3000 URL.
In another terminal, send a query to the RAG endpoint:
terminal
curl -X POST http://localhost:3000/api/rag \
  -H "Content-Type: application/json" \
  -d '{"query": "How do I reset my password?"}'
Expected output: a JSON response with an answer, citations, and a session ID:
json
{ "answer": "You can reset your password via the Settings page. ...", "citations": [ { "source": "help/password-reset.md", "title": "Password Reset Guide", "section": "account", "excerpt": "You can reset your password via the Settings page." } ], "sessionId": "abc-123-def", "messageCount": 2}
Send a follow-up query with the same sessionId to test multi-turn conversation — the system prompt will include the previous exchange so the model can maintain context.
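For example, reusing the sessionId returned above:
terminal
curl -X POST http://localhost:3000/api/rag \
  -H "Content-Type: application/json" \
  -d '{"query": "What if I forgot my current password?", "sessionId": "abc-123-def"}'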
Finally, run the evaluation harness to measure retrieval quality:
terminal
pnpm eval
Expected output: the runner loads the golden dataset, scores each test case, exports the aggregated results as JSON, and prints a comparison summary:
code
Eval complete: 3/3 passed
Next steps
Write your own golden dataset in data/golden-dataset.yaml with real support questions and expected answers to track retrieval quality over time as your knowledge base grows (a sample sketch follows this list)
Expand the page at src/app/page.tsx with a chat UI that calls the /api/rag endpoint — the route handler already returns sessionId so you can thread conversations
Replace the OpenAI embedding provider with Cohere or HuggingFace by changing the import in src/app/api/rag/route.ts and updating EMBEDDING_PROVIDER in .env.local
Deploy to Vercel and set your environment variables in the project dashboard — the serverComponentsExternalPackages in next.config.js already knows which packages stay on the server
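Here is a sample golden dataset to start from; the exact field names the eval runner expects are defined in src/eval/run.ts, so treat these as assumptions:
code
# data/golden-dataset.yaml (sketch) — questions drawn from the sample articles.
- question: "How do I reset my password?"
  expected_answer: "You can reset your password via the Settings page."
  expected_sources:
    - help/password-reset.md
- question: "When are invoices generated?"
  expected_answer: "Invoices are generated on the 3rd of the following month."
  expected_sources:
    - help/billing.md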