Construction SMBs store technical specs in scattered PDFs and in employees' heads, forcing builders to pause work and call the office. A simple RAG system would give them hands‑free answers, but off‑the‑shelf tools return shallow results from dense technical documents.
You will build a hybrid RAG (Retrieval-Augmented Generation) pipeline that lets construction field crews query building specifications, codes, and submittals stored as PDFs in S3. The system uses Voyage AI embeddings, Qdrant as the vector store, BM25 keyword search, a Cohere cross-encoder reranker, and Claude Haiku to generate answers constrained to the source documents.
By the end you will have a Next.js API with two endpoints: one to ingest PDFs from S3 into Qdrant, and one to query the knowledge base with natural language.
Prerequisites
Node.js 22+ installed
pnpm 10+ installed (the project uses pnpm)
A Qdrant instance running (easiest with Docker: docker run -p 6333:6333 qdrant/qdrant)
An S3 bucket with at least one PDF file (or a bucket you can upload test PDFs to)
API keys for: Anthropic, Voyage AI, Cohere, and AWS S3
A Langfuse account for observability (optional — the code degrades gracefully without it)
Step 1: Clone the project and install dependencies
The project already has a scaffolded Next.js 16 (App Router) structure. You do not need to create it from scratch.
terminal
cd /home/rick/solutions-worker/builds/24186da5-ecbc-4687-b235-c3dac0a40bf3
pnpm install
Expected output: pnpm lists all installed packages and finishes without errors. Peer dependency warnings, if any, are safe to ignore.
Step 2: Configure environment variables
Copy the example env file and fill in your API keys.
terminal
cp .env.example .env
Open .env and set the values. Here is what each variable controls:
env
# Qdrant connection
QDRANT_URL=http://localhost:6333
QDRANT_API_KEY=   # leave blank if Qdrant has no auth

# Anthropic Claude
ANTHROPIC_API_KEY=

# Voyage AI embeddings
VOYAGE_API_KEY=

# Cohere reranker
COHERE_API_KEY=

# AWS S3
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
S3_BUCKET_NAME=   # your bucket name, e.g. my-construction-specs

# Protect the ingest endpoint
INGEST_API_KEY=change-me-to-a-secret-value

# Langfuse observability (optional — leave blank to disable)
LANGFUSE_PUBLIC_KEY=
LANGFUSE_SECRET_KEY=
Step 3: Explore the source layout
Here is what was built for you:
text
app/
  api/
    chat/route.ts       # POST /api/chat — query the knowledge base
    ingest/route.ts     # POST /api/ingest — trigger PDF ingestion
src/
  lib/
    embedding.ts        # VoyageEmbeddingAdapter wraps voyageai client
    ingestion.ts        # IngestionService: S3 → pdf-parse → chunk → Qdrant
    retrieval.ts        # HybridRetriever wiring + VoyageEmbeddingBridge
    generation.ts       # Claude Haiku answer generation (sync + streaming)
    observability.ts    # Langfuse tracing (no-op when keys are absent)
  scripts/
    ingest.ts           # CLI: pnpm ingest [prefix]
  types/
    index.ts            # ChatRequest, ChatResponse, IngestionResult types
Step 4: Run the test suite
The project ships with a full test suite covering every module. Run it to verify the build is healthy.
terminal
pnpm test
Expected output: vitest reports all tests passing, with a vitest-report.json showing numFailedTests: 0 and coverage thresholds above 90% for lines, branches, functions, and statements on the src/ and app/api/ paths.
Step 5: Type-check and lint
Two independent quality gates guard the codebase.
terminal
pnpm typecheck
pnpm lint
Expected output: both commands exit 0 with no errors or warnings.
Step 6: Ingest PDF specs from S3
With your S3 bucket set up and credentials in .env, run the CLI ingestion script.
terminal
pnpm ingest
You can optionally narrow the scope to a prefix:
terminal
pnpm ingest "specs/2024/"
Expected output: the script prints a summary such as Ingested 3 files, 47 chunks created, 0 errors.
What happens inside the script (src/scripts/ingest.ts): the IngestionService class (src/lib/ingestion.ts) lists PDF objects from the bucket, downloads each one, extracts text with pdf-parse, splits the text into overlapping chunks, generates embeddings, and upserts each chunk into Qdrant.
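The chunking step can be sketched as a small pure function. This is an illustration of the overlapping-chunk idea, not the project's actual implementation; the chunk size and overlap values are assumptions.

```typescript
// Illustrative overlapping chunker. The 800/200 defaults are assumptions,
// not the values IngestionService actually uses.
export function chunkText(text: string, chunkSize = 800, overlap = 200): string[] {
  if (overlap >= chunkSize) throw new Error("overlap must be smaller than chunkSize");
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    const end = Math.min(start + chunkSize, text.length);
    chunks.push(text.slice(start, end));
    if (end === text.length) break;
    start = end - overlap; // step back so consecutive chunks share `overlap` characters
  }
  return chunks;
}
```

Overlap matters for retrieval quality: a spec clause split across a chunk boundary still appears intact in at least one chunk.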
Step 7: Query the knowledge base
Start the dev server in one terminal:
terminal
pnpm dev
In another terminal, send a natural-language query to the chat endpoint:
terminal
curl -X POST http://localhost:3000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"query": "What is the concrete compressive strength requirement?"}'
Expected output:
json
{
  "answer": "The concrete compressive strength is 3000 psi at 28 days (per Section 3.2 of doc-001).",
  "sources": [
    {
      "content": "Concrete compressive strength: 3000 psi minimum at 28 days per ASTM C39.",
      "documentId": "specs/2024/section-3.pdf",
      "score": 0.94,
      "source": "vector"
    }
  ]
}
The POST handler in app/api/chat/route.ts validates the request body against a Zod schema, calls getRetriever() to get the lazy singleton HybridRetriever, invokes retriever.retrieve(query, { retrievalMode, topK }), maps the results, and passes them to generateAnswer. If the Accept header includes text/event-stream, the route returns a streaming response instead.
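In simplified form, the handler's two gatekeeping checks look like this. A plain type guard stands in for the actual Zod schema, and the optional retrievalMode/topK fields are assumptions about the request shape.

```typescript
// Assumed request shape; the real Zod schema in app/api/chat/route.ts may differ.
type ChatRequest = {
  query: string;
  retrievalMode?: "hybrid" | "vector" | "keyword";
  topK?: number;
};

// Hand-rolled stand-in for the route's Zod validation: the body must be an
// object with a non-empty string `query`.
export function isChatRequest(body: unknown): body is ChatRequest {
  if (typeof body !== "object" || body === null) return false;
  const b = body as Record<string, unknown>;
  return typeof b.query === "string" && b.query.length > 0;
}

// The streaming branch keys off the Accept header.
export function wantsStream(accept: string | null): boolean {
  return accept !== null && accept.includes("text/event-stream");
}
```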
Step 8: Inspect the embedding adapter
The VoyageEmbeddingAdapter in src/lib/embedding.ts wraps the raw VoyageAIClient from the voyageai package. It adds cost tracking and automatic batching with 429 retry handling.
Expected output: calling adapter.embed("concrete specs") returns a 1024-dimensional vector for voyage-3-lite (or 2048 for voyage-3). Calling adapter.embedBatch with 250 texts makes exactly 3 API calls (100 + 100 + 50).
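The batching arithmetic behind that 100 + 100 + 50 split can be sketched as a standalone helper (retry and backoff handling omitted; the 100-item batch size comes from the step text):

```typescript
// Split a list into fixed-size batches: 250 items at batchSize 100
// yields batches of 100, 100, and 50, i.e. three API calls.
export function toBatches<T>(items: T[], batchSize = 100): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```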
Step 9: Understand the hybrid retriever
The VoyageEmbeddingBridge in src/lib/retrieval.ts extends the abstract EmbeddingService from @reaatech/hybrid-rag-embedding. The provider: 'vertex' config is the extension point for plugging in a custom embedder.
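In outline, the bridge is a thin delegation layer. The abstract class below is a local stand-in that only mirrors the shape of the library's EmbeddingService; the real class from @reaatech/hybrid-rag-embedding is richer, and the method name embedQuery is an assumption.

```typescript
// Local stand-in for the abstract class exported by @reaatech/hybrid-rag-embedding.
abstract class EmbeddingService {
  abstract embedQuery(text: string): Promise<number[]>;
}

// The bridge delegates to whatever embed function it is constructed with,
// which is how the VoyageEmbeddingAdapter plugs into the hybrid retriever.
class VoyageEmbeddingBridge extends EmbeddingService {
  constructor(private readonly embed: (text: string) => Promise<number[]>) {
    super();
  }
  embedQuery(text: string): Promise<number[]> {
    return this.embed(text);
  }
}
```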
To receive answers token-by-token instead of waiting for the full response, send the Accept: text/event-stream header:
terminal
curl -X POST http://localhost:3000/api/chat \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{"query": "What is the rebar spacing spec?"}' \
  --no-buffer
Expected output: the response streams back as Server-Sent Events. Each chunk is a plain text fragment that accumulates into the final answer.
The streaming implementation in src/lib/generation.ts uses the Anthropic SDK's messages.stream() and forwards only content_block_delta events whose delta type is text_delta.
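That event filter can be sketched with local stand-in types. These simplified shapes only mirror the SDK's stream events; the real types in @anthropic-ai/sdk carry more fields (block indexes, usage, and so on).

```typescript
// Simplified stand-ins for the Anthropic SDK's stream event shapes.
type TextDeltaEvent = {
  type: "content_block_delta";
  delta: { type: "text_delta"; text: string };
};
type StreamEvent = TextDeltaEvent | { type: "message_start" } | { type: "message_stop" };

// Yield only the text fragments, mirroring the filter the streaming
// implementation applies to messages.stream() output.
export async function* textDeltas(events: AsyncIterable<StreamEvent>): AsyncGenerator<string> {
  for await (const event of events) {
    if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
      yield event.delta.text;
    }
  }
}
```

Each yielded fragment is what the route writes out as a Server-Sent Event, so the client accumulates the answer token by token.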