AWS Bedrock Contract Clause Extraction for SMB Legal

Automatically extract key clauses, dates, and parties from contracts using AWS Textract and Bedrock, with structured output and cost tracking.

aws-bedrock aws-textract contract-clause-extraction document-pipeline structured-output cost-tracking express typescript

The problem

Small law firms and contract-heavy SMBs manually review every agreement to find renewal dates, liability caps, and termination clauses. Missing a deadline or misreading a clause can lead to unbilled work and client disputes.


Intro

This tutorial walks through building a contract clause extraction pipeline for small law firms and SMBs. You’ll build a Next.js application that ingests PDF contracts, runs OCR via AWS Textract, extracts 12 standard clause types using AWS Bedrock (Claude Sonnet 4), repairs malformed LLM JSON output automatically, and tracks per-contract cost against a daily budget. By the end you’ll have a fully tested API that produces structured contract summaries.
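One of the pipeline stages above is automatic repair of malformed LLM JSON. This section does not show how the recipe implements it, so the code below is only a minimal sketch of the idea.

It assumes the common failure modes are:

- Markdown code fences around the object.
- Explanatory prose before or after the object.
- Trailing commas.

The function tries a strict parse first. If that fails, it applies cheap fixes in order and records each one. `repairJson` and its step names are hypothetical and are not the recipe's API.

```typescript
// Hypothetical sketch of a JSON repair pass for LLM output; not the recipe's own code.
export interface RepairResult {
  value: unknown;  // the parsed JSON value
  steps: string[]; // names of the fixes that were needed, in order
}

export function repairJson(raw: string): RepairResult {
  const steps: string[] = [];
  let text = raw.trim();

  // Fast path: the model already returned valid JSON.
  try {
    return { value: JSON.parse(text), steps };
  } catch {
    // Fall through to the repair passes below.
  }

  // 1. Strip a Markdown code fence (three backticks, optionally tagged "json").
  const fenced = text.match(/^`{3}(?:json)?\s*([\s\S]*?)\s*`{3}$/);
  if (fenced) {
    text = fenced[1] ?? "";
    steps.push("strip-fences");
  }

  // 2. Keep only the outermost {...} span, dropping prose around it.
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end < start) {
    throw new Error("No JSON object found in model output");
  }
  if (start > 0 || end < text.length - 1) {
    text = text.slice(start, end + 1);
    steps.push("trim-prose");
  }

  // 3. Remove trailing commas before } or ]. Caution: this regex ignores
  //    string literals, so it would also rewrite ",}" inside a string value.
  const withoutTrailing = text.replace(/,\s*([}\]])/g, "$1");
  if (withoutTrailing !== text) {
    text = withoutTrailing;
    steps.push("remove-trailing-commas");
  }

  // Still invalid? JSON.parse throws, so the caller can retry or fail the contract.
  return { value: JSON.parse(text), steps };
}
```

Recording the applied steps parallels the `repairApplied` and `repairSteps` fields that the API route persists for each contract. This section does not show how the recipe fills those fields.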

Prerequisites

Node.js 22+ and pnpm 10 (corepack enable && corepack prepare pnpm@10 --activate)
AWS account with access to Amazon Bedrock (Claude Sonnet 4) and Amazon Textract
AWS credentials configured via environment variables or ~/.aws/credentials
PostgreSQL database (local or remote)
Basic familiarity with Next.js App Router, TypeScript, and AWS SDK concepts
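A quick preflight script can catch missing prerequisites before the first run. This is a hypothetical sketch that checks three things:

- The Node.js major version.
- Whether AWS credentials are available from the two sources listed above.
- Whether a PostgreSQL connection string is set.

`DATABASE_URL` is an assumed variable name. The AWS SDK's default credential chain also supports other sources, such as SSO and instance roles, which this check ignores.

```typescript
// Hypothetical preflight check for the prerequisites; run it before starting the app.
import { existsSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

type Env = Record<string, string | undefined>;

export function preflight(
  env: Env = process.env,
  nodeVersion: string = process.versions.node,
  fileExists: (path: string) => boolean = existsSync,
): string[] {
  const problems: string[] = [];

  // Node.js 22+ (process.versions.node looks like "22.12.0").
  const major = Number.parseInt(nodeVersion.split(".")[0] ?? "0", 10);
  if (!(major >= 22)) {
    problems.push(`Node.js 22+ required, found ${nodeVersion}`);
  }

  // AWS credentials from environment variables or the shared credentials file.
  const envCreds = Boolean(env.AWS_ACCESS_KEY_ID && env.AWS_SECRET_ACCESS_KEY);
  const credsFile = env.AWS_SHARED_CREDENTIALS_FILE ?? join(homedir(), ".aws", "credentials");
  if (!envCreds && !fileExists(credsFile)) {
    problems.push(`No AWS credentials in environment or ${credsFile}`);
  }

  // PostgreSQL connection string (DATABASE_URL is an assumed name).
  if (!env.DATABASE_URL) {
    problems.push("DATABASE_URL is not set");
  }
  return problems;
}
```

To use it, call `preflight()` with no arguments and exit non-zero if the returned list is not empty. The parameters exist only so the check can be exercised without touching the real environment.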

Step 1: Scaffold the Next.js project

Create a new Next.js project with TypeScript and the App Router. Pin all dependencies to exact versions.
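To keep the exact-version rule from drifting as packages are added, a small check can scan package.json after installation. This is a hypothetical helper. It assumes that "exact" means a plain `major.minor.patch` version, optionally with a prerelease tag.

It also checks the `"type": "module"` setting that this step adds. With pnpm, `pnpm add -E <pkg>` (`--save-exact`) writes exact versions in the first place.

```typescript
// Hypothetical package.json check: exact version pins plus "type": "module".
import { readFileSync } from "node:fs";

interface PackageJson {
  type?: string;
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// "15.1.0" or "1.0.0-beta.2" pass; "^15.1.0", "~1.2.3", "latest" or "workspace:*" fail.
const EXACT_VERSION = /^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?$/;

export function checkPackageJson(pkg: PackageJson): string[] {
  const problems: string[] = [];
  if (pkg.type !== "module") {
    problems.push('package.json should declare "type": "module"');
  }
  for (const field of ["dependencies", "devDependencies"] as const) {
    for (const [name, spec] of Object.entries(pkg[field] ?? {})) {
      if (!EXACT_VERSION.test(spec)) {
        problems.push(`${field}.${name} is not pinned exactly: "${spec}"`);
      }
    }
  }
  return problems;
}

// Convenience wrapper that reads package.json from disk.
export function checkPackageJsonFile(path = "package.json"): string[] {
  return checkPackageJson(JSON.parse(readFileSync(path, "utf8")) as PackageJson);
}
```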

First create the project directory and initialize it:

terminal

mkdir aws-bedrock-contract-clause-extraction
cd aws-bedrock-contract-clause-extraction
pnpm init

Add "type": "module" to your package.json, then install the core dependencies:

Example artifact

A complete, working implementation of this recipe, downloadable as a zip or browsable file by file. It was generated by our build pipeline and tested before publishing.


166 kB · 127 tests · 99.6% coverage · vitest passing

SHA-256: 978c10dfacba31a10b67d5d56d5facbeb39c63744bc07de981b4087203fbe372
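The published SHA-256 lets you confirm that a downloaded zip is the published artifact. Below is a minimal sketch that uses only Node's built-in `node:crypto` and `node:fs/promises`.

```typescript
// Verify a downloaded file against a published SHA-256 digest (Node.js built-ins only).
import { createHash } from "node:crypto";
import { readFile } from "node:fs/promises";

// Digest published for the example artifact.
export const EXPECTED_SHA256 =
  "978c10dfacba31a10b67d5d56d5facbeb39c63744bc07de981b4087203fbe372";

// Lowercase hex SHA-256 of the given bytes.
export function sha256Hex(data: Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// Reads the whole file into memory, which is fine for a zip of this size.
export async function verifyDownload(path: string, expected = EXPECTED_SHA256): Promise<boolean> {
  const bytes = await readFile(path);
  return sha256Hex(bytes) === expected.trim().toLowerCase();
}
```

For example, `await verifyDownload("./example.zip")` returns `true` only if the bytes match. The path is a placeholder for wherever you saved the download.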


// src/lib/artifact-store.ts
import { ArtifactRegistry } from "@reaatech/media-pipeline-mcp-core";
import type {
  ArtifactMeta,
  ArtifactStore,
  StorageResult,
} from "@reaatech/media-pipeline-mcp-storage";

export class InMemoryArtifactRegistry extends ArtifactRegistry {}

// Process-local ArtifactStore for tests and local development. Contents are
// lost on restart and are not shared between server instances.
export class InMemoryArtifactStore implements ArtifactStore {
  private store = new Map<string, Buffer>();

  async put(
    id: string,
    data: Buffer | NodeJS.ReadableStream | string,
    _meta: ArtifactMeta,
  ): Promise<string> {
    void _meta; // metadata is not persisted by this store
    if (Buffer.isBuffer(data)) {
      this.store.set(id, data);
    } else if (typeof data === "string") {
      this.store.set(id, Buffer.from(data));
    } else {
      // Every Node.js readable stream is async-iterable; drain it into one Buffer.
      const chunks: Buffer[] = [];
      for await (const chunk of data) {
        chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(String(chunk)));
      }
      this.store.set(id, Buffer.concat(chunks));
    }
    return `in-memory://artifacts/${id}`;
  }

  // The returned metadata is fixed because put() does not keep the metadata it receives.
  get(id: string): Promise<StorageResult> {
    const data = this.store.get(id);
    if (!data) {
      return Promise.reject(new Error(`Artifact not found: ${id}`));
    }
    return Promise.resolve({
      data,
      meta: { id, type: "document", mimeType: "application/octet-stream" },
    });
  }

  // There is no URL to sign for data held in process memory.
  getSignedUrl(_id: string, _expiresIn?: number): Promise<string> {
    void _id;
    void _expiresIn;
    return Promise.reject(new Error("getSignedUrl not supported for in-memory store"));
  }

  delete(id: string): Promise<void> {
    this.store.delete(id);
    return Promise.resolve();
  }

  // Lists stored ids, optionally filtered by prefix.
  list(prefix?: string): Promise<ArtifactMeta[]> {
    const entries: ArtifactMeta[] = [];
    for (const key of this.store.keys()) {
      if (!prefix || key.startsWith(prefix)) {
        entries.push({ id: key, type: "document", mimeType: "application/octet-stream" });
      }
    }
    return Promise.resolve(entries);
  }

  healthCheck(): Promise<boolean> {
    return Promise.resolve(true);
  }
}
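`put()` drains streaming input into a single Buffer before storing it. The same technique works as a standalone helper for any async-iterable source. Node's readable streams implement `Symbol.asyncIterator`, so they can be passed in directly. `collectToBuffer` is a hypothetical name and is not part of the `@reaatech` packages.

```typescript
// Drain an async-iterable byte source (for example a Node.js Readable) into one Buffer.
export async function collectToBuffer(
  source: AsyncIterable<Buffer | Uint8Array | string>,
): Promise<Buffer> {
  const chunks: Buffer[] = [];
  for await (const chunk of source) {
    // Strings are encoded as UTF-8; binary chunks are copied into Buffers as-is.
    chunks.push(typeof chunk === "string" ? Buffer.from(chunk, "utf8") : Buffer.from(chunk));
  }
  return Buffer.concat(chunks);
}
```

For example, `await collectToBuffer(createReadStream("contract.pdf"))` (with `createReadStream` from `node:fs` and a placeholder filename) loads a whole PDF into memory. That is acceptable here, since the in-memory store keeps every artifact in memory anyway.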

// src/telemetry/wrapper.ts
import {
  generateId,
  now,
  loadConfig,
  calculateCostFromTokens,
  getWindowStart,
  getWindowEnd,
  type CostSpan,
  type Provider,
} from "@reaatech/llm-cost-telemetry";

// Claude Sonnet 4 on-demand pricing, USD per million tokens.
const INPUT_PRICE_PER_MILLION = 3.0;
const OUTPUT_PRICE_PER_MILLION = 15.0;

// Records one CostSpan per model call. Spans live in process memory, so
// daily totals reset on restart and are not shared across instances.
export class CostTracker {
  private spans: CostSpan[] = [];
  private config: ReturnType<typeof loadConfig>;

  constructor() {
    this.config = loadConfig();
  }

  trackCall(args: {
    provider: string;
    model: string;
    inputTokens: number;
    outputTokens: number;
    tenant?: string;
    feature?: string;
  }): CostSpan {
    const inputCost = calculateCostFromTokens(args.inputTokens, INPUT_PRICE_PER_MILLION);
    const outputCost = calculateCostFromTokens(args.outputTokens, OUTPUT_PRICE_PER_MILLION);
    const span: CostSpan = {
      id: generateId(),
      provider: args.provider as Provider,
      model: args.model,
      inputTokens: args.inputTokens,
      outputTokens: args.outputTokens,
      costUsd: inputCost + outputCost,
      tenant: args.tenant,
      feature: args.feature,
      timestamp: now(),
    };
    this.spans.push(span);
    return span;
  }

  getLastSpan(): CostSpan | undefined {
    return this.spans[this.spans.length - 1];
  }

  // Sum of costUsd for spans whose timestamp falls inside today's window.
  getDailySpend(): number {
    const todayStart = getWindowStart(new Date(), "day");
    const todayEnd = getWindowEnd(new Date(), "day");
    return this.spans
      .filter((s) => s.timestamp && s.timestamp >= todayStart && s.timestamp < todayEnd)
      .reduce((sum, s) => sum + s.costUsd, 0);
  }

  // The limit comes from config.budget.global.daily. If it is not configured,
  // the limit falls back to 0, so any recorded spend counts as exceeded.
  checkBudget(): { exceeded: boolean; dailySpend: number; limit: number } {
    const dailySpend = this.getDailySpend();
    const limit = this.config.budget.global?.daily ?? 0;
    return { exceeded: dailySpend > limit, dailySpend, limit };
  }
}

export class BudgetExceededError extends Error {
  code = "BUDGET_EXCEEDED";
}

// Runs fn() only while today's spend is within budget, then records the token
// usage fn() reports. The check happens before the call, so the call that
// crosses the limit still runs; later calls are rejected.
export async function withCostTracking<T>(
  tracker: CostTracker,
  label: string,
  fn: () => Promise<{ result: T; tokens: { input: number; output: number } }>,
): Promise<T> {
  const budget = tracker.checkBudget();
  if (budget.exceeded) {
    throw new BudgetExceededError(`Daily budget of $${budget.limit} exceeded`);
  }
  const outcome = await fn();
  tracker.trackCall({
    provider: "bedrock",
    model: label,
    inputTokens: outcome.tokens.input,
    outputTokens: outcome.tokens.output,
    feature: label,
  });
  return outcome.result;
}
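To see what the tracker records per contract, here is the same arithmetic as a standalone sketch. It assumes that `calculateCostFromTokens` computes tokens × price ÷ 1,000,000. The constant names suggest this, but the library's internals are not shown.

The following are illustrative values, not figures from the recipe:

- The `contractsLeftToday` helper.
- The token counts.
- The $5 daily limit.

```typescript
// Standalone version of the CostTracker arithmetic (assumed: tokens * price / 1,000,000).
const INPUT_PRICE_PER_MILLION = 3.0;   // USD per 1M input tokens
const OUTPUT_PRICE_PER_MILLION = 15.0; // USD per 1M output tokens

export function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens * INPUT_PRICE_PER_MILLION + outputTokens * OUTPUT_PRICE_PER_MILLION) /
    1_000_000
  );
}

// Hypothetical helper: how many more contracts of a given cost fit under today's limit.
export function contractsLeftToday(
  dailyLimitUsd: number,
  spentTodayUsd: number,
  costPerContractUsd: number,
): number {
  if (costPerContractUsd <= 0) return Number.POSITIVE_INFINITY;
  return Math.max(0, Math.floor((dailyLimitUsd - spentTodayUsd) / costPerContractUsd));
}

// Example contract: 20,000 input tokens and 2,000 output tokens.
// 20,000 * $3 / 1M = $0.06 input, 2,000 * $15 / 1M = $0.03 output, $0.09 total.
const perContract = estimateCostUsd(20_000, 2_000);
console.log(perContract.toFixed(2)); // "0.09"
console.log(contractsLeftToday(5, 0.9, perContract)); // 45
```

Output tokens cost five times as much as input tokens. Long OCR text therefore dominates only when the extracted summary is short.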

// app/api/extract/route.ts
import { type NextRequest, NextResponse } from "next/server";
import { extractFromPdf, extractFromText } from "../../../src/pipeline/contract.js";
import { insertContract } from "../../../src/db/index.js";
import { BudgetExceededError } from "../../../src/telemetry/wrapper.js";

type ExtractionResult =
  | Awaited<ReturnType<typeof extractFromPdf>>
  | Awaited<ReturnType<typeof extractFromText>>;

const NO_INPUT = { error: "No PDF file or text provided" };

// Persist every attempt, including failed ones, then answer
// 201 on success or 422 when extraction finished with an error.
async function saveAndRespond(result: ExtractionResult) {
  await insertContract({
    id: result.id,
    filename: result.filename,
    rawText: result.rawText,
    summaryJson: result.summary,
    costUsd: String(result.costUsd),
    totalTokens: String(result.totalTokens),
    repairApplied: result.repairApplied,
    repairSteps: result.repairSteps !== undefined ? String(result.repairSteps) : null,
    errorMessage: result.error ?? null,
    createdAt: new Date(result.createdAt),
  });
  if (result.error) {
    return NextResponse.json({ error: result.error, extractionResult: result }, { status: 422 });
  }
  return NextResponse.json(result, { status: 201 });
}

export async function POST(req: NextRequest) {
  try {
    const contentType = req.headers.get("content-type") ?? "";

    if (contentType.includes("multipart/form-data")) {
      // PDF upload in the "file" field.
      const formData = await req.formData();
      const file = formData.get("file");
      if (!(file instanceof File)) {
        return NextResponse.json(NO_INPUT, { status: 400 });
      }
      if (!file.name.toLowerCase().endsWith(".pdf")) {
        return NextResponse.json({ error: "Only PDF files are accepted" }, { status: 400 });
      }
      const buffer = new Uint8Array(await file.arrayBuffer());
      return await saveAndRespond(await extractFromPdf(buffer, file.name));
    }

    // Otherwise expect JSON: { "text": "<contract text>" }.
    // An unparseable body is treated as missing input (400) rather than a 500.
    const body = (await req.json().catch(() => null)) as { text?: unknown } | null;
    const text = body?.text;
    if (typeof text !== "string" || text.length === 0) {
      return NextResponse.json(NO_INPUT, { status: 400 });
    }
    return await saveAndRespond(await extractFromText(text));
  } catch (error) {
    // withCostTracking throws this once the daily budget is spent.
    if (error instanceof BudgetExceededError) {
      return NextResponse.json({ error: "Daily extraction budget exceeded" }, { status: 429 });
    }
    throw error;
  }
}
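The route accepts either a multipart upload with a `file` field or a JSON body with a `text` field. It answers with 201, 400, 422, or 429.

Below is a client-side sketch. It assumes the route is served at `/api/extract`, following the App Router file path. It also assumes a runtime with the standard `fetch`, `Request`, `FormData`, and `Blob` globals, such as Node.js 18+ or a browser. The helper names are hypothetical.

```typescript
// Hypothetical client helpers for the /api/extract route.
export function buildTextRequest(baseUrl: string, text: string): Request {
  return new Request(new URL("/api/extract", baseUrl), {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ text }),
  });
}

// The route rejects uploads whose filename does not end in ".pdf".
export function buildPdfRequest(baseUrl: string, pdf: Blob, filename: string): Request {
  const form = new FormData();
  form.append("file", pdf, filename);
  // No explicit content-type: the runtime sets multipart/form-data with a boundary.
  return new Request(new URL("/api/extract", baseUrl), { method: "POST", body: form });
}

// Status codes returned by the route handler.
export function describeStatus(status: number): string {
  switch (status) {
    case 201:
      return "extracted";
    case 400:
      return "bad request (missing input or not a PDF)";
    case 422:
      return "extraction failed; see extractionResult";
    case 429:
      return "daily budget exceeded";
    default:
      return `unexpected status ${status}`;
  }
}
```

To upload a local file from Node.js, pass the result of `openAsBlob("nda.pdf", { type: "application/pdf" })` from `node:fs` (Node.js 20 and later) to `buildPdfRequest`, then send it with `fetch`. The filename is a placeholder.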

