Small business owners manually read every signed contract to identify renewal dates, liability clauses, and payment terms, a process prone to missed deadlines and compliance gaps.
In this tutorial you’ll build a contract review pipeline that automatically extracts, summarizes, and validates key clauses from incoming DocuSign contracts. You’ll connect DocuSign’s eSignature API, extract PDF text (with optional OCR preprocessing for scanned documents), chunk and embed the content in a hybrid RAG store, query contracts with natural language using Google Gemini, and run regression evaluations against golden trajectory datasets. The pipeline is exposed as a set of Next.js API routes, making it easy to hook into DocuSign Connect webhooks or call programmatically.
This recipe is for developers who want a reference implementation of a document AI pipeline — from ingestion through hybrid retrieval to LLM-powered clause extraction — using the @reaatech/* package family alongside Google’s Vertex AI (Gemini) and DocuSign APIs.
Prerequisites
Node.js >= 22 and pnpm 10.x
A GCP project with the Vertex AI API enabled (for Gemini model access)
Now create a .env.example that captures every environment variable the pipeline needs:
```env
# Env vars used by vertex-ai-document-pipeline-for-docusign-smb-contract-review.
# Keep placeholders only; never commit real values.
NODE_ENV=development

# Vertex AI (Gemini)
GOOGLE_CLOUD_PROJECT=<your-gcp-project-id>
GOOGLE_CLOUD_LOCATION=us-central1
GOOGLE_GENAI_USE_VERTEXAI=true
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json

# DocuSign eSignature API
DOCUSIGN_ACCESS_TOKEN=<your-docusign-access-token>
DOCUSIGN_ACCOUNT_ID=<your-docusign-account-id>
DOCUSIGN_BASE_URL=https://demo.docusign.net/restapi
DOCUSIGN_HMAC_SECRET=<your-webhook-hmac-key>

# Embeddings
OPENAI_API_KEY=<your-openai-key>

# API authentication
API_KEY=<your-api-key-for-eval-endpoint>
```
Expected output: A working Next.js project with all dependencies installed. Running pnpm typecheck should pass with no errors.
Step 2: Define the domain types
Create src/types/contract.ts with the core domain interfaces that represent contracts, clauses, ingestion results, review responses, and evaluation runs. These types are used throughout the entire pipeline.
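The file itself is not shown, so here is a sketch of src/types/contract.ts under stated assumptions. The seven clause types and the ContractClause fields (type, text, optional pageRef, confidence) come from the extraction prompt and golden set used later in this recipe, ContractDocument mirrors the object DocuSignService.fetchEnvelope returns, and ContractReviewRequest mirrors the /review route's schema. The sources field of ContractReviewResponse and the CLAUSE_TYPES/isClauseType helpers are assumptions; IngestionResult and EvaluationRun are omitted because their fields are not described.

```ts
// Sketch of src/types/contract.ts. Anything marked "assumed" is not specified by the recipe.

// Runtime list of clause categories (assumed helper), so prompts and validators can share it.
export const CLAUSE_TYPES = [
  "termination",
  "liability",
  "payment",
  "renewal",
  "confidentiality",
  "indemnification",
  "other",
] as const;

export type ClauseType = (typeof CLAUSE_TYPES)[number];

export interface ContractClause {
  type: ClauseType;
  text: string;
  pageRef?: number; // optional page number
  confidence: number; // 0.0 to 1.0
}

export interface ContractDocument {
  id: string;
  title: string;
  parties: string[]; // element type assumed; fetchEnvelope currently returns []
  envelopeId: string;
}

export interface ContractReviewRequest {
  query: string;
  topK?: number;
  envelopeId?: string;
}

export interface ContractReviewResponse {
  answer: string;
  clauses: ContractClause[];
  sources: string[]; // assumed: IDs of the chunks the answer was grounded in
}

// Narrow an untrusted value (for example, model output) to a known ClauseType.
export function isClauseType(value: unknown): value is ClauseType {
  return typeof value === "string" && (CLAUSE_TYPES as readonly string[]).includes(value);
}
```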
Create src/types/index.ts to re-export your types plus the ones from @reaatech/hybrid-rag that you’ll use across services:
```ts
export type {
  ClauseType,
  ContractClause,
  ContractDocument,
  IngestionResult,
  ContractReviewRequest,
  ContractReviewResponse,
  EvaluationRun,
} from "./contract.js";

export type {
  Document,
  Chunk,
  ChunkingConfig,
  ChunkingStrategy,
  RetrievalResult,
  HybridResult,
  EvaluationSample,
  EvaluationResult,
} from "@reaatech/hybrid-rag";
```
Expected output: TypeScript types for the full domain model. The ClauseType union covers the seven clause categories the pipeline recognizes: termination, liability, payment, renewal, confidentiality, indemnification, and other.
Step 3: Build the error hierarchy
Create src/lib/errors.ts with a typed PipelineError base class and four concrete subclasses. Each error carries a machine-readable code and an HTTP status code that the route handlers use when returning error responses.
Expected output: Five error classes you can throw and catch in the route handlers. DocumentParseError maps to 400, ExternalServiceError to 502, and the others to 500.
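The recipe does not show errors.ts. A minimal sketch of the hierarchy follows: PipelineError, DocumentParseError, ExternalServiceError, and ReviewError are the names used elsewhere in this recipe, and the status codes match the mapping above. The fourth subclass is not named here, so it is left out, and the code strings and the details field are assumptions.

```ts
// Sketch of src/lib/errors.ts; the machine-readable code strings are assumed.
export class PipelineError extends Error {
  readonly code: string;
  readonly statusCode: number;
  readonly details?: unknown; // underlying error or offending payload

  constructor(message: string, code: string, statusCode: number, details?: unknown) {
    super(message);
    this.name = new.target.name;
    this.code = code;
    this.statusCode = statusCode;
    this.details = details;
    // Keep instanceof checks working even when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Unreadable or malformed input document: a client-side problem.
export class DocumentParseError extends PipelineError {
  constructor(message: string, details?: unknown) {
    super(message, "DOCUMENT_PARSE_ERROR", 400, details);
  }
}

// DocuSign or another upstream API failed: a bad-gateway condition.
export class ExternalServiceError extends PipelineError {
  constructor(message: string, details?: unknown) {
    super(message, "EXTERNAL_SERVICE_ERROR", 502, details);
  }
}

// The LLM call or its output could not be turned into a review.
export class ReviewError extends PipelineError {
  constructor(message: string, details?: unknown) {
    super(message, "REVIEW_ERROR", 500, details);
  }
}
```

Because route handlers only check instanceof PipelineError and then read code and statusCode, new subclasses need no changes in the routes.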
Step 4: Implement the BM25 keyword searcher
Create src/lib/bm25.ts with an in-memory BM25 scorer. This provides the keyword-based arm of the hybrid retrieval system — no external search index required.
Expected output: A self-contained BM25 implementation using the classic Okapi BM25 formula. The source field on each result is set to "bm25" so the fusion step can identify which arm produced it.
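bm25.ts itself is not shown. The following is a minimal sketch of an in-memory Okapi BM25 scorer under a few assumptions: a lowercase alphanumeric tokenizer, the common defaults k1 = 1.2 and b = 0.75, and the Lucene-style IDF ln(1 + (N - n + 0.5)/(n + 0.5)), which differs from the classic Okapi IDF only in staying positive for terms that appear in more than half the chunks. BM25Hit is a local stand-in for RetrievalResult, and the method names are illustrative.

```ts
// Sketch of an in-memory Okapi BM25 index (names and result shape are illustrative).
export interface BM25Hit {
  chunkId: string;
  score: number;
  source: "bm25"; // lets the fusion step tell which arm produced the hit
}

// Lowercase and split on anything that is not a letter or digit.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

export class BM25Searcher {
  private readonly termFreqs = new Map<string, Map<string, number>>(); // chunkId -> term -> count
  private readonly lengths = new Map<string, number>(); // chunkId -> token count
  private readonly docFreq = new Map<string, number>(); // term -> number of chunks containing it
  private totalLength = 0;
  private readonly k1: number;
  private readonly b: number;

  // k1 controls term-frequency saturation; b controls document-length normalization.
  constructor(k1 = 1.2, b = 0.75) {
    this.k1 = k1;
    this.b = b;
  }

  addDocument(chunkId: string, text: string): void {
    if (this.termFreqs.has(chunkId)) throw new Error(`Duplicate chunk id: ${chunkId}`);
    const tokens = tokenize(text);
    const tf = new Map<string, number>();
    for (const token of tokens) tf.set(token, (tf.get(token) ?? 0) + 1);
    this.termFreqs.set(chunkId, tf);
    this.lengths.set(chunkId, tokens.length);
    this.totalLength += tokens.length;
    for (const term of tf.keys()) this.docFreq.set(term, (this.docFreq.get(term) ?? 0) + 1);
  }

  search(query: string, topK: number): BM25Hit[] {
    const n = this.termFreqs.size;
    if (n === 0) return [];
    const avgLength = this.totalLength / n;
    const queryTerms = tokenize(query);
    const hits: BM25Hit[] = [];
    for (const [chunkId, tf] of this.termFreqs) {
      const length = this.lengths.get(chunkId) ?? 0;
      let score = 0;
      for (const term of queryTerms) {
        const f = tf.get(term) ?? 0;
        if (f === 0) continue;
        const df = this.docFreq.get(term) ?? 0;
        const idf = Math.log(1 + (n - df + 0.5) / (df + 0.5));
        score += (idf * f * (this.k1 + 1)) / (f + this.k1 * (1 - this.b + (this.b * length) / avgLength));
      }
      if (score > 0) hits.push({ chunkId, score, source: "bm25" });
    }
    return hits.sort((x, y) => y.score - x.score).slice(0, topK);
  }
}
```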
Step 5: Implement vector search and reciprocal rank fusion
Create src/lib/vector-search.ts with cosine similarity, a vector search function, and a reciprocal rank fusion (RRF) function that combines vector and BM25 results into a single ranked list.
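```ts
import type { RetrievalResult, HybridResult } from "@reaatech/hybrid-rag";

// Cosine similarity of two equal-length vectors; returns 0 if either vector is all zeros.
export function cosineSimilarity(a: number[], b: number[]): number {
  const aNorm = Math.sqrt(a.reduce((s, v) => s + v * v, 0));
  const bNorm = Math.sqrt(b.reduce((s, v) => s + v * v, 0));
  if (aNorm === 0 || bNorm === 0) return 0;
  let dot = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
  }
  return dot / (aNorm * bNorm);
}

// searchByVector(queryVector, embeddings, topK) and fuseHybridResults(...) follow; not reproduced here.
```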
Expected output: A complete RRF implementation with constant k=60 and alpha=0.5 balance. Results that only appear in one arm get a default rank of 60 for the missing arm, ensuring they still appear in the combined output.
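The remaining two functions are not reproduced above, so here is a sketch of functions playing the roles of searchByVector and fuseHybridResults. It assumes both arms return lists sorted best-first and uses local VectorHit/FusedHit types in place of the @reaatech/hybrid-rag ones; the names bruteForceVectorSearch and fuseWithRrf are illustrative. The fused score is the weighted RRF sum alpha/(k + r_vector) + (1 - alpha)/(k + r_bm25) with k = 60, alpha = 0.5, and rank 60 for an arm that missed the chunk.

```ts
// Sketch only: VectorHit and FusedHit stand in for RetrievalResult and HybridResult.
export interface VectorHit {
  chunkId: string;
  score: number;
}

export interface FusedHit {
  chunkId: string;
  score: number;
  vectorRank: number; // 1-based; equals missingRank if the vector arm missed the chunk
  bm25Rank: number; // 1-based; equals missingRank if the BM25 arm missed the chunk
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let aa = 0;
  let bb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    aa += a[i] * a[i];
    bb += b[i] * b[i];
  }
  return aa === 0 || bb === 0 ? 0 : dot / Math.sqrt(aa * bb);
}

// Brute-force search: score every stored embedding and keep the best topK.
export function bruteForceVectorSearch(
  queryVector: number[],
  embeddings: Map<string, number[]>,
  topK: number,
): VectorHit[] {
  const scored: VectorHit[] = [];
  for (const [chunkId, vector] of embeddings) {
    scored.push({ chunkId, score: cosine(queryVector, vector) });
  }
  return scored.sort((x, y) => y.score - x.score).slice(0, topK);
}

// Weighted reciprocal rank fusion over two best-first result lists.
export function fuseWithRrf(
  vectorHits: { chunkId: string }[],
  bm25Hits: { chunkId: string }[],
  k = 60,
  alpha = 0.5,
  missingRank = 60,
): FusedHit[] {
  const vectorRanks = new Map<string, number>();
  vectorHits.forEach((hit, i) => vectorRanks.set(hit.chunkId, i + 1));
  const bm25Ranks = new Map<string, number>();
  bm25Hits.forEach((hit, i) => bm25Ranks.set(hit.chunkId, i + 1));

  const ids = new Set<string>([...vectorRanks.keys(), ...bm25Ranks.keys()]);
  const fused: FusedHit[] = [];
  for (const chunkId of ids) {
    const vectorRank = vectorRanks.get(chunkId) ?? missingRank;
    const bm25Rank = bm25Ranks.get(chunkId) ?? missingRank;
    const score = alpha / (k + vectorRank) + (1 - alpha) / (k + bm25Rank);
    fused.push({ chunkId, score, vectorRank, bm25Rank });
  }
  return fused.sort((x, y) => y.score - x.score);
}
```

One consequence of a fixed missing rank: if an arm returns more than 60 results, a chunk absent from that arm earns more credit from it than a chunk the arm ranked below 60, so keeping each arm's topK at or below the missing rank avoids that inversion.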
Step 6: Build the DocuSign API service
Create src/services/docusign-service.ts. This service wraps the docusign-esign SDK to fetch envelopes and documents, and parses webhook events.
```ts
import docusign from "docusign-esign";
import pRetry from "p-retry";
import type { ContractDocument } from "../types/contract.js";
import { ExternalServiceError } from "../lib/errors.js";

const { ApiClient, EnvelopesApi } = docusign;

// Retry a DocuSign call up to 3 times, then wrap the last failure in an
// ExternalServiceError. Throwing from p-retry's onFailedAttempt aborts all
// remaining retries, so the wrapping happens only after retries are exhausted.
async function withRetry<T>(label: string, call: () => Promise<T>): Promise<T> {
  try {
    return await pRetry(call, { retries: 3 });
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    throw new ExternalServiceError(`${label}: ${message}`, error);
  }
}

export class DocuSignService {
  private readonly apiClient: InstanceType<typeof ApiClient>;
  private readonly envelopesApi: InstanceType<typeof EnvelopesApi>;
  private readonly accountId: string;

  constructor(accessToken: string, accountId: string, baseUrl: string) {
    this.accountId = accountId;
    this.apiClient = new ApiClient();
    this.apiClient.setBasePath(baseUrl);
    this.apiClient.addDefaultHeader("Authorization", `Bearer ${accessToken}`);
    this.envelopesApi = new EnvelopesApi(this.apiClient);
  }

  async fetchEnvelope(envelopeId: string): Promise<ContractDocument> {
    const envelope = await withRetry(`Failed to fetch envelope ${envelopeId}`, () =>
      this.envelopesApi.getEnvelope(this.accountId, envelopeId),
    );
    return {
      id: envelope.envelopeId ?? envelopeId,
      title: envelope.emailSubject ?? "",
      parties: [],
      envelopeId: envelope.envelopeId ?? envelopeId,
    };
  }

  async fetchDocument(envelopeId: string): Promise<Buffer> {
    // "combined" returns every document in the envelope merged into one PDF.
    const response = await withRetry(`Failed to fetch document for envelope ${envelopeId}`, () =>
      this.envelopesApi.getDocument(this.accountId, envelopeId, "combined"),
    );
    return Buffer.from(response);
  }

  // Accepts either { envelopeId, status } or { envelope: { envelopeId, status } }.
  processWebhookEvent(payload: unknown): { envelopeId: string; status: string } {
    if (!payload || typeof payload !== "object") {
      throw new ExternalServiceError("Invalid webhook payload: expected an object", payload);
    }
    const data = payload as Record<string, unknown>;
    const nested = data.envelope as Record<string, unknown> | undefined;
    const envelopeId = (data.envelopeId ?? nested?.envelopeId) as string | undefined;
    const status = (data.status ?? nested?.status) as string | undefined;
    if (!envelopeId || !status) {
      throw new ExternalServiceError(
        "Invalid webhook payload: missing envelopeId or status",
        payload,
      );
    }
    return { envelopeId, status };
  }
}
```
Create src/types/docusign.d.ts to provide type declarations for the docusign-esign CommonJS module:
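The declaration file itself is not shown. A minimal sketch, assuming you only need the members DocuSignService calls, might look like the following; the Envelope fields are declared as plain strings for simplicity even though the REST API can omit them, and typing getDocument as resolving to a Buffer is an assumption about how the SDK returns binary content.

```ts
// Minimal ambient declaration for the CommonJS docusign-esign package,
// covering only what DocuSignService uses. Field and return types are assumptions.
declare module "docusign-esign" {
  interface Envelope {
    envelopeId: string;
    emailSubject: string;
    status?: string;
  }

  class ApiClient {
    setBasePath(basePath: string): void;
    addDefaultHeader(header: string, value: string): void;
  }

  class EnvelopesApi {
    constructor(apiClient: ApiClient);
    getEnvelope(accountId: string, envelopeId: string): Promise<Envelope>;
    getDocument(accountId: string, envelopeId: string, documentId: string): Promise<Buffer>;
  }

  // With esModuleInterop, the CommonJS module.exports object arrives as the default export.
  const docusign: {
    ApiClient: typeof ApiClient;
    EnvelopesApi: typeof EnvelopesApi;
  };
  export default docusign;
}
```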
Expected output: A service that can fetch DocuSign envelope metadata, download the signed PDF as a Buffer, and parse a DocuSign Connect webhook event. All API calls are wrapped with p-retry for resilience. The CJS interop module declaration handles the docusign-esign CommonJS default export.
Step 7: Build the document processor
Create src/services/document-processor.ts. This service extracts text from PDFs (using pdf-parse), normalizes scanned documents with sharp, and validates output with @reaatech/hybrid-rag-ingestion’s DocumentValidator.
Expected output: A processor that handles both digital PDFs (direct text extraction) and scanned documents (a sharp grayscale → normalise → threshold pipeline). Each document gets a nanoid ID and is validated for size and content length before being returned as a Document from @reaatech/hybrid-rag.
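The final checks are delegated to DocumentValidator, whose limits are not given here. As an illustration only, the sketch below shows the kind of checks involved: a PDF magic-byte test, a size cap, a minimum amount of extracted text, and light whitespace cleanup. The 10 MB and 50-character limits and every name in it are hypothetical. The image cleanup for scans is a sharp chain along the lines of sharp(input).grayscale().normalise().threshold(), which is not repeated here.

```ts
// Hypothetical stand-in for DocumentValidator-style checks; limits are assumptions.
export interface ValidationLimits {
  maxBytes: number;
  minTextLength: number;
}

export class DocumentValidationError extends Error {}

const DEFAULT_LIMITS: ValidationLimits = { maxBytes: 10 * 1024 * 1024, minTextLength: 50 };

// Well-formed PDFs begin with the ASCII header "%PDF-".
export function looksLikePdf(bytes: Uint8Array): boolean {
  const magic = [0x25, 0x50, 0x44, 0x46, 0x2d];
  return bytes.length >= magic.length && magic.every((byte, i) => bytes[i] === byte);
}

// Validate the raw upload and the extracted text, returning tidied text.
export function validateExtraction(
  bytes: Uint8Array,
  text: string,
  limits: ValidationLimits = DEFAULT_LIMITS,
): string {
  if (!looksLikePdf(bytes)) throw new DocumentValidationError("Input is not a PDF");
  if (bytes.byteLength > limits.maxBytes) {
    throw new DocumentValidationError(`PDF exceeds ${limits.maxBytes} bytes`);
  }
  // Normalize line endings, collapse runs of spaces/tabs, and cap blank lines,
  // keeping paragraph breaks that the chunker can use.
  const tidied = text
    .replace(/\r\n?/g, "\n")
    .replace(/[ \t]+/g, " ")
    .replace(/\n{3,}/g, "\n\n")
    .trim();
  if (tidied.length < limits.minTextLength) {
    throw new DocumentValidationError("Too little text extracted; the document may be a scan");
  }
  return tidied;
}
```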
Step 8: Wire the hybrid RAG store
Create src/services/rag-store.ts. This is the in-memory hybrid store that uses @reaatech/hybrid-rag-ingestion’s ChunkingEngine to split documents, @reaatech/hybrid-rag-embedding’s EmbeddingService to generate vectors, and your custom BM25 + vector search to support both keyword and semantic retrieval.
```ts
import type { Document, Chunk, ChunkingConfig, RetrievalResult, HybridResult } from "@reaatech/hybrid-rag";
import { ChunkingStrategy } from "@reaatech/hybrid-rag";
import { ChunkingEngine } from "@reaatech/hybrid-rag-ingestion";
import type { EmbeddingService } from "@reaatech/hybrid-rag-embedding";
import { searchByVector, fuseHybridResults } from "../lib/vector-search.js";
import { BM25Searcher } from "../lib/bm25.js";

export class RagStore {
  private readonly embeddingService: EmbeddingService;
  private readonly documents = new Map<string, Document>();
  // … (remaining fields and methods are not reproduced here)
```
Expected output: A complete in-memory hybrid RAG store. ingestDocument chunks via ChunkingEngine, embeds each chunk via EmbeddingService, and stores both. hybridSearch runs vector cosine similarity AND BM25 in parallel, then fuses them with RRF.
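A compact sketch of the ingestion half of such a store follows. It splits text into overlapping fixed-size character windows (one common chunking strategy; which ChunkingStrategy the recipe configures is not shown), embeds all chunks in one batch, and keeps text and vectors in two maps keyed by chunk ID, along with the documentCount and chunkCount the /health route reports. Embed, chunkText, and MiniHybridStore are hypothetical stand-ins, not the @reaatech APIs. A hybridSearch method would then run the vector and BM25 arms from Steps 4 and 5 over these maps, for example with Promise.all, and fuse the two rankings.

```ts
// Sketch only: Embed stands in for EmbeddingService and chunkText for ChunkingEngine.
export type Embed = (texts: string[]) => Promise<number[][]>;

export interface StoredChunk {
  id: string;
  documentId: string;
  text: string;
}

// Fixed-size character windows that overlap by `overlap` characters.
export function chunkText(text: string, size: number, overlap: number): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new RangeError("Expected size > overlap >= 0");
  }
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // this window already reached the end
  }
  return chunks;
}

export class MiniHybridStore {
  private readonly chunks = new Map<string, StoredChunk>();
  private readonly embeddings = new Map<string, number[]>(); // chunkId -> vector
  private readonly documentIds = new Set<string>();
  private readonly embed: Embed;
  private readonly size: number;
  private readonly overlap: number;

  constructor(embed: Embed, size = 800, overlap = 100) {
    this.embed = embed;
    this.size = size;
    this.overlap = overlap;
  }

  get documentCount(): number {
    return this.documentIds.size;
  }

  get chunkCount(): number {
    return this.chunks.size;
  }

  // Chunk, embed every chunk in one batch, then store text and vectors side by side.
  async ingestDocument(documentId: string, text: string): Promise<number> {
    const pieces = chunkText(text, this.size, this.overlap);
    const vectors = await this.embed(pieces);
    if (vectors.length !== pieces.length) throw new Error("Embedding count mismatch");
    // Re-ingesting a document replaces its earlier chunks (only after embedding succeeded).
    for (const [id, chunk] of this.chunks) {
      if (chunk.documentId === documentId) {
        this.chunks.delete(id);
        this.embeddings.delete(id);
      }
    }
    pieces.forEach((piece, i) => {
      const id = `${documentId}#${i}`;
      this.chunks.set(id, { id, documentId, text: piece });
      this.embeddings.set(id, vectors[i]);
    });
    this.documentIds.add(documentId);
    return pieces.length;
  }
}
```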
Step 9: Build the Gemini clause extractor
Create src/services/clause-extractor.ts. This service uses Google’s @google/genai SDK to call Gemini 2.5 Flash for extracting clauses, answering review queries, and summarizing documents. It uses jsonrepair to fix common LLM JSON errors (trailing commas, unquoted keys).
```ts
import { GoogleGenAI } from "@google/genai";
import { jsonrepair } from "jsonrepair";
import type { ContractClause, ClauseType } from "../types/contract.js";
import { ReviewError } from "../lib/errors.js";

// Type guard used to keep only well-formed clauses from the model's parsed JSON.
function isValidClause(value: unknown): value is ContractClause {
  if (typeof value !== "object" || value === null) return false;
  const obj = value as Record<string, unknown>;
  // … (remaining field checks and the ClauseExtractor class are not reproduced here)
```
Expected output: A clause extractor that sends structured prompts to Gemini 2.5 Flash, repairs and validates the JSON response, and returns typed ContractClause[]. The answerReviewQuery method uses a similar pattern with a context-aware review prompt. summarizeDocument produces a single-paragraph summary of any contract.
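The sketch below shows the parse-and-validate half of the extractor under stated assumptions. The prompt wording is the recipe's own; the fence stripping, validation rules, and the names buildExtractionPrompt and parseClauses are illustrative. The repair parameter defaults to the identity function so the snippet has no dependencies; in the service you would pass jsonrepair, which takes a JSON string and returns repaired JSON text. The model call itself would go through the @google/genai client, for example ai.models.generateContent({ model: "gemini-2.5-flash", contents: prompt }) on a client created with new GoogleGenAI({ vertexai: true, project, location }), reading the reply from response.text.

```ts
// Sketch of the parse-and-validate path; names are illustrative.
const CLAUSE_TYPES = new Set<string>([
  "termination",
  "liability",
  "payment",
  "renewal",
  "confidentiality",
  "indemnification",
  "other",
]);

export interface ExtractedClause {
  type: string;
  text: string;
  pageRef?: number;
  confidence: number;
}

// Instruction text taken from the recipe's extraction prompt, followed by the contract.
export function buildExtractionPrompt(contractText: string): string {
  return [
    "Extract all legal clauses from the following contract text. Return ONLY a valid JSON array of objects with fields: " +
      '"type" (one of: termination, liability, payment, renewal, confidentiality, indemnification, other), ' +
      '"text" (the clause text), "pageRef" (optional page number), and "confidence" (0.0 to 1.0).',
    "",
    contractText,
  ].join("\n");
}

function isValidClause(value: unknown): value is ExtractedClause {
  if (typeof value !== "object" || value === null) return false;
  const obj = value as Record<string, unknown>;
  return (
    typeof obj.type === "string" &&
    CLAUSE_TYPES.has(obj.type) &&
    typeof obj.text === "string" &&
    obj.text.length > 0 &&
    typeof obj.confidence === "number" &&
    obj.confidence >= 0 &&
    obj.confidence <= 1 &&
    (obj.pageRef === undefined || typeof obj.pageRef === "number")
  );
}

// Strip an optional Markdown code fence, repair, parse, and keep only valid clauses.
export function parseClauses(
  raw: string,
  repair: (json: string) => string = (json) => json,
): ExtractedClause[] {
  const unfenced = raw
    .trim()
    .replace(/^`{3}(?:json)?\s*/i, "")
    .replace(/\s*`{3}$/, "");
  const parsed: unknown = JSON.parse(repair(unfenced));
  if (!Array.isArray(parsed)) throw new Error("Expected a JSON array of clauses");
  return parsed.filter(isValidClause);
}
```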
Step 10: Build the evaluation service
Create src/services/evaluation-service.ts. This service loads golden trajectory datasets and runs regression checks using @reaatech/agent-eval-harness-golden.
Now create a golden trajectory dataset at ./golden-set.jsonl with three sample entries covering common clause types:
```jsonl
{"id":"golden-001","metadata":{"version":"1.0","tags":["termination","payment"],"description":"Contract with termination and payment clauses"},"trajectory":[{"turnId":"1","role":"user","content":"Extract clauses from this contract."},{"turnId":"2","role":"assistant","content":"I found a termination clause and a payment clause.","clauses":[{"type":"termination","text":"Either party may terminate this Agreement upon 30 days notice.","confidence":0.95},{"type":"payment","text":"All payments are due within 30 days of invoice.","confidence":0.92}]}]}
{"id":"golden-002","metadata":{"version":"1.0","tags":["liability","indemnification"],"description":"Contract with liability and indemnification clauses"},"trajectory":[{"turnId":"1","role":"user","content":"Review liability terms."},{"turnId":"2","role":"assistant","content":"The contract limits liability to fees paid in the preceding 12 months.","clauses":[{"type":"liability","text":"Neither party shall be liable for indirect damages.","confidence":0.94},{"type":"indemnification",
{"id":"golden-003","metadata":{"version":"1.0","tags":["confidentiality","renewal"],"description":"Contract with confidentiality and renewal provisions"},"trajectory":[{"turnId":"1","role":"user","content":"Summarize this agreement."},{"turnId":"2","role":"assistant","content":"This is a master services agreement with confidentiality obligations lasting 3 years and automatic renewal.","clauses":[{"type":"confidentiality","text":"Each party agrees to maintain confidentiality for 3 years.","confidence":0.96},{"type":"renewal"
```
Expected output: An evaluation service that loads .jsonl golden trajectories, scores pipeline output against them, detects regressions below a 0.85 similarity threshold, and generates formatted text reports.
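The similarity metric inside @reaatech/agent-eval-harness-golden is not shown, so the sketch below uses a swapped-in stand-in: token-level Jaccard overlap between each expected clause and the best actual clause of the same type, averaged over the expected clauses, with the 0.85 regression threshold from above. The types follow the golden-set JSON; the function names are hypothetical.

```ts
// Stand-in scorer: token Jaccard overlap, not the harness's own metric.
export interface GoldenClause {
  type: string;
  text: string;
  confidence?: number;
}

export interface GoldenSample {
  id: string;
  metadata: { version: string; tags: string[]; description: string };
  trajectory: { turnId: string; role: string; content: string; clauses?: GoldenClause[] }[];
}

// One JSON object per non-empty line.
export function parseGoldenJsonl(raw: string): GoldenSample[] {
  return raw
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as GoldenSample);
}

function tokenSet(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0));
}

export function jaccard(a: string, b: string): number {
  const sa = tokenSet(a);
  const sb = tokenSet(b);
  if (sa.size === 0 && sb.size === 0) return 1;
  let shared = 0;
  for (const t of sa) if (sb.has(t)) shared++;
  return shared / (sa.size + sb.size - shared);
}

// Mean, over expected clauses, of the best same-type match among the actual clauses.
export function scoreSample(sample: GoldenSample, actual: GoldenClause[]): number {
  const expected = sample.trajectory.flatMap((turn) => turn.clauses ?? []);
  if (expected.length === 0) return 1;
  let total = 0;
  for (const exp of expected) {
    let best = 0;
    for (const act of actual) {
      if (act.type === exp.type) best = Math.max(best, jaccard(exp.text, act.text));
    }
    total += best;
  }
  return total / expected.length;
}

// Tab-separated report; anything under the threshold is flagged as a regression.
export function formatReport(scores: { id: string; score: number }[], threshold = 0.85): string {
  return scores
    .map(({ id, score }) => `${score < threshold ? "REGRESSION" : "ok"}\t${id}\t${score.toFixed(3)}`)
    .join("\n");
}
```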
Step 11: Create the pipeline orchestrator and wire it together
Create src/services/pipeline-orchestrator.ts that orchestrates the end-to-end pipeline — from fetching a DocuSign envelope through processing, chunking, embedding, and storing, to reviewing and evaluating.
```ts
import type { IngestionResult, ContractReviewResponse, EvaluationRun } from "../types/contract.js";
import type { DocuSignService } from "./docusign-service.js";
import type { DocumentProcessor } from "./document-processor.js";
import type { RagStore } from "./rag-store.js";
import type { ClauseExtractor } from "./clause-extractor.js";
import type { EvaluationService } from "./evaluation-service.js";
import type { EmbeddingService } from "@reaatech/hybrid-rag-embedding";

export class ContractPipelineOrchestrator {
  constructor(
    private readonly deps: {
      // … (the dependency fields and the rest of the class are not reproduced here)
```
Now create src/lib/orchestrator-instance.ts as a singleton factory that wires all the services together:
```ts
import { EmbeddingService } from "@reaatech/hybrid-rag-embedding";
import { DocuSignService } from "../services/docusign-service.js";
import { DocumentProcessor } from "../services/document-processor.js";
import { RagStore } from "../services/rag-store.js";
import { ClauseExtractor } from "../services/clause-extractor.js";
import { EvaluationService } from "../services/evaluation-service.js";
import { ContractPipelineOrchestrator } from "../services/pipeline-orchestrator.js";

// Module-level singleton: route handlers in the same server process share one pipeline
// (and therefore one in-memory RAG store).
let instance: ContractPipelineOrchestrator | null = null;

export function getOrchestrator(): ContractPipelineOrchestrator {
  if (instance) return instance;

  const embeddingService = new EmbeddingService({
    provider: "openai",
    model: "text-embedding-3-small",
    apiKey: process.env.OPENAI_API_KEY ?? "",
  });
  const docusign = new DocuSignService(
    process.env.DOCUSIGN_ACCESS_TOKEN ?? "",
    process.env.DOCUSIGN_ACCOUNT_ID ?? "",
    process.env.DOCUSIGN_BASE_URL ?? "https://demo.docusign.net/restapi",
  );
  const documentProcessor = new DocumentProcessor();
  const ragStore = new RagStore(embeddingService);
  const clauseExtractor = new ClauseExtractor(
    process.env.GOOGLE_CLOUD_PROJECT ?? "",
    process.env.GOOGLE_CLOUD_LOCATION ?? "us-central1",
  );
  // Relative to the working directory the server is started from.
  const evaluationService = new EvaluationService("./golden-set.jsonl");

  instance = new ContractPipelineOrchestrator({
    docusign,
    documentProcessor,
    ragStore,
    clauseExtractor,
    evaluationService,
    embeddingService,
  });
  return instance;
}

export function getRagStore(): RagStore {
  return getOrchestrator().getRagStore();
}
```
Expected output: A fully wired pipeline. getOrchestrator() returns a singleton with all services instantiated from environment variables. It’s called by the route handlers to perform ingestion, review, and evaluation.
Step 12: Add the route handlers
Create four API routes under app/api/contracts/. Start with the ingestion endpoint.
app/api/contracts/ingest/route.ts — Accepts an envelopeId or webhookPayload. Validates with Zod, calls the orchestrator, returns 202 with the IngestionResult.
```ts
import { type NextRequest, NextResponse } from "next/server";
import { z } from "zod";
import { getOrchestrator } from "@/src/lib/orchestrator-instance";
import { PipelineError } from "@/src/lib/errors";

const ingestSchema = z.object({
  envelopeId: z.string().optional(),
  webhookPayload: z.unknown().optional(),
});

export async function POST(req: NextRequest) {
  try {
    const body: unknown = await req.json();
    const parsed = ingestSchema.parse(body);
    const orchestrator = getOrchestrator();

    let result;
    if (parsed.webhookPayload !== undefined) {
      result = await orchestrator.ingestFromWebhook(parsed.webhookPayload);
    } else if (parsed.envelopeId) {
      result = await orchestrator.ingestContract(parsed.envelopeId);
    } else {
      return NextResponse.json(
        { error: "Must provide envelopeId or webhookPayload" },
        { status: 400 },
      );
    }
    return NextResponse.json(result, { status: 202 });
  } catch (e) {
    if (e instanceof SyntaxError) {
      return NextResponse.json({ error: "Invalid JSON in request body" }, { status: 400 });
    }
    if (e instanceof z.ZodError) {
      return NextResponse.json(
        { error: "Invalid request body", details: e.issues },
        { status: 400 },
      );
    }
    if (e instanceof PipelineError) {
      return NextResponse.json({ error: e.message, code: e.code }, { status: e.statusCode });
    }
    throw e; // unexpected errors surface as a 500
  }
}
```
app/api/contracts/review/route.ts — Accepts a query string and optional topK/envelopeId. Returns the review answer, clauses, and sources.
```ts
import { type NextRequest, NextResponse } from "next/server";
import { z } from "zod";
import { getOrchestrator } from "@/src/lib/orchestrator-instance";
import { PipelineError } from "@/src/lib/errors";

const reviewSchema = z.object({
  query: z.string().min(1),
  topK: z.number().int().positive().optional(), // reject 0, negatives, and fractions
  envelopeId: z.string().optional(),
});

export async function POST(req: NextRequest) {
  try {
    const body: unknown = await req.json();
    const parsed = reviewSchema.parse(body);
    const orchestrator = getOrchestrator();
    const result = await orchestrator.reviewContract(
      parsed.query,
      parsed.topK,
      parsed.envelopeId,
    );
    return NextResponse.json(result);
  } catch (e) {
    if (e instanceof SyntaxError) {
      return NextResponse.json({ error: "Invalid JSON in request body" }, { status: 400 });
    }
    if (e instanceof z.ZodError) {
      return NextResponse.json(
        { error: "Invalid request body", details: e.issues },
        { status: 400 },
      );
    }
    if (e instanceof PipelineError) {
      return NextResponse.json({ error: e.message, code: e.code }, { status: e.statusCode });
    }
    throw e;
  }
}
```
app/api/contracts/evaluate/route.ts — Protected by X-API-Key header. Runs the full golden evaluation batch.
```ts
import { type NextRequest, NextResponse } from "next/server";
import { getOrchestrator } from "@/src/lib/orchestrator-instance";
import { PipelineError } from "@/src/lib/errors";

export async function POST(req: NextRequest) {
  try {
    // Fail closed: if API_KEY is unset or empty, reject every request.
    const expectedKey = process.env.API_KEY;
    const apiKey = req.headers.get("x-api-key");
    if (!expectedKey || apiKey !== expectedKey) {
      return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
    }
    const orchestrator = getOrchestrator();
    const result = await orchestrator.runEvaluationBatch();
    return NextResponse.json(result);
  } catch (e) {
    if (e instanceof PipelineError) {
      return NextResponse.json({ error: e.message, code: e.code }, { status: e.statusCode });
    }
    throw e;
  }
}
```
app/api/contracts/health/route.ts — Returns pipeline status including document and chunk counts.
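```ts
import { NextResponse } from "next/server";
import { getRagStore } from "@/src/lib/orchestrator-instance";

// Opt out of route caching: older Next.js releases statically cache GET handlers
// that never read the request, which would freeze the timestamp and counts.
export const dynamic = "force-dynamic";

export function GET() {
  const ragStore = getRagStore();
  return NextResponse.json({
    status: "ok",
    timestamp: new Date().toISOString(),
    stats: {
      documentCount: ragStore.documentCount,
      chunkCount: ragStore.chunkCount,
    },
  });
}
```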
Expected output: Four API routes. /ingest POST, /review POST, /evaluate POST (auth-protected), and /health GET. All use NextResponse.json() for proper Content-Type headers and return typed error responses for PipelineError subclasses.
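The three POST routes repeat the same catch block. One way to consolidate it is a small helper like the hypothetical toErrorResponse below. To keep the sketch dependency-free it duck-types PipelineError (a code plus a numeric statusCode) and Zod errors (name "ZodError" plus issues) and returns a standard Response; inside the app you would keep the instanceof checks against PipelineError and z.ZodError instead.

```ts
// Hypothetical helper; returns null for errors it does not recognize.
export function toErrorResponse(e: unknown): Response | null {
  if (e instanceof SyntaxError) {
    return Response.json({ error: "Invalid JSON in request body" }, { status: 400 });
  }
  if (e instanceof Error && e.name === "ZodError" && "issues" in e) {
    return Response.json({ error: "Invalid request body", details: e.issues }, { status: 400 });
  }
  if (e instanceof Error && "code" in e && "statusCode" in e && typeof e.statusCode === "number") {
    return Response.json({ error: e.message, code: e.code }, { status: e.statusCode });
  }
  return null; // unknown error: the caller rethrows so Next.js responds with a 500
}
```

Each route's catch block then shrinks to `const res = toErrorResponse(e); if (res) return res; throw e;`. Route handlers may return a plain Response, and NextResponse extends Response, so the two can be mixed.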
Step 13: Add the instrumentation
Create src/instrumentation.ts to warm up the Gemini connection on server startup. Next.js calls register() in every runtime, so guard the dynamic import() inside it with a process.env.NEXT_RUNTIME === "nodejs" check; that way the module only loads in the Node.js runtime, not Edge.
Expected output: The pipeline pre-warms the Gemini connection during server startup, reducing cold-start latency on the first real request.
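The file is not shown. A minimal sketch follows, assuming that warming simply means constructing the pipeline singleton (and with it the Gemini client) before the first request; it does not open a network connection, so any extra warm-up call the recipe makes is not reflected. According to the Next.js documentation, instrumentation.ts belongs in the project root, or inside src/ when the project uses a src directory; since this recipe's routes live under app/ at the root, check that the file location matches your layout.

```ts
// Sketch of src/instrumentation.ts. Next.js calls register() once per server instance.
export async function register(): Promise<void> {
  // NEXT_RUNTIME is "nodejs" or "edge". Only the Node.js runtime can load sharp,
  // pdf-parse, and docusign-esign, so bail out before importing anything else.
  if (process.env.NEXT_RUNTIME !== "nodejs") return;
  const { getOrchestrator } = await import("./lib/orchestrator-instance");
  // Build every service, including the Gemini client, ahead of the first request.
  getOrchestrator();
}
```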
Step 14: Add tests and verify
Create a test fixture at tests/fixtures/sample-contract.txt with a realistic mock contract covering all seven clause types. Here’s a representative test for the BM25 searcher:
Expected output: All tests passing with numFailedTests: 0. Coverage thresholds of 90%+ on lines, branches, functions, and statements for runtime code (files under src/ and app/**/route.ts).
Next steps
Add a vector database backend — Replace the in-memory Map<string, number[]> with Qdrant or Pinecone for production-scale document stores. The @reaatech/hybrid-rag-qdrant adapter is a drop-in replacement.
Add DocuSign Connect webhook verification — Validate incoming webhooks using DOCUSIGN_HMAC_SECRET to verify HMAC signatures before processing.
Extend the evaluation harness — Add more golden trajectory samples covering edge cases (multi-page contracts, non-standard clause language, scanned PDFs) and automate nightly evaluation runs with a cron schedule.
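For the webhook-verification step listed above, DocuSign's Connect HMAC scheme sends a base64-encoded HMAC-SHA256 of the raw request body in the X-DocuSign-Signature-1 header. A sketch of the check using Node's built-in crypto module follows; the function names are hypothetical.

```ts
import { createHmac, timingSafeEqual } from "node:crypto";

// base64(HMAC-SHA256(secret, raw request body)), as computed by DocuSign Connect.
export function computeConnectSignature(rawBody: string, secret: string): string {
  return createHmac("sha256", secret).update(rawBody, "utf8").digest("base64");
}

// Constant-time comparison; lengths are checked first because
// timingSafeEqual throws when the buffers differ in length.
export function isValidConnectSignature(
  rawBody: string,
  signatureHeader: string | null,
  secret: string,
): boolean {
  if (!signatureHeader || !secret) return false;
  const expected = Buffer.from(computeConnectSignature(rawBody, secret), "utf8");
  const received = Buffer.from(signatureHeader, "utf8");
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

In the ingest route, read the body with await req.text() first, check it against req.headers.get("x-docusign-signature-1") and process.env.DOCUSIGN_HMAC_SECRET, and only then JSON.parse it; re-serializing already-parsed JSON would not reproduce the exact bytes DocuSign signed.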