This recipe builds a document pipeline that accepts lease PDFs and scanned images via a Next.js API, extracts text using unpdf or Tesseract OCR, sends the text to AWS Bedrock (Anthropic Claude) for structured data extraction, then pushes the result to the AppFolio property management platform. You’ll wire up a budget-enforcement layer via @reaatech/agent-budget-engine, a human-approval gate via @reaatech/tool-use-firewall-core, a JSON repair fallback via @reaatech/structured-repair-core, and cost telemetry via @reaatech/llm-cost-telemetry. The result replaces hours of manual data entry with an automated extraction flow.
Prerequisites
Node.js 22+ and pnpm 10 installed
An AWS account with Bedrock access and Claude Sonnet 4 enabled in your region
An AppFolio instance with an API key and base URL
Familiarity with Next.js App Router route handlers and TypeScript
Step 1: Scaffold the project and install dependencies
Create a Next.js project with TypeScript, then install all dependencies with exact versions:
Open .env.example and add the following entries — these are the environment variables the pipeline reads at runtime:
env
# Env vars used by aws-bedrock-document-pipeline-for-appfolio-lease-data-extraction.
# The builder adds entries here as it wires up each integration.
# Keep placeholders only; never commit real values.
NODE_ENV=development
AWS_REGION=<your-aws-region>
AWS_ACCESS_KEY_ID=<your-access-key>
AWS_SECRET_ACCESS_KEY=<your-secret>
BEDROCK_MODEL_ID=anthropic.claude-sonnet-4-v1:0
APPFOLIO_API_KEY=<your-appfolio-api-key>
APPFOLIO_BASE_URL=<your-appfolio-instance-url>
MAX_UPLOAD_SIZE_MB=20
DEFAULT_DAILY_BUDGET=10.0
Expected output: A Next.js project with all dependencies installed. Running pnpm typecheck should succeed with no errors before you’ve added any source code.
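Later steps import a loadConfig helper from src/config, which this walkthrough does not show. As a rough sketch of how the environment variables above might map onto a typed config object: the field names awsRegion, bedrockModelId, and dailyBudget are the ones later steps read, while readConfig, maxUploadSizeMb, and the two helper functions are hypothetical names chosen for illustration. The fallback values mirror the placeholders in .env.example.

```typescript
// Hypothetical config shape. Only awsRegion, bedrockModelId, and dailyBudget
// are referenced by the recipe's later steps; maxUploadSizeMb is an assumption.
interface PipelineConfig {
  awsRegion: string;
  bedrockModelId: string;
  dailyBudget: number;
  maxUploadSizeMb: number;
}

type Env = Record<string, string | undefined>;

// Read a required variable or fail fast with a clear message.
function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Parse a positive number, using the fallback when the variable is unset.
function positiveNumber(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") {
    return fallback;
  }
  const parsed = Number(raw);
  if (!Number.isFinite(parsed) || parsed <= 0) {
    throw new Error(`${name} must be a positive number, got "${raw}"`);
  }
  return parsed;
}

// Build the config from an env-like object (pass process.env at runtime).
function readConfig(env: Env): PipelineConfig {
  return {
    awsRegion: requireVar(env, "AWS_REGION"),
    bedrockModelId: requireVar(env, "BEDROCK_MODEL_ID"),
    dailyBudget: positiveNumber(env, "DEFAULT_DAILY_BUDGET", 10),
    maxUploadSizeMb: positiveNumber(env, "MAX_UPLOAD_SIZE_MB", 20),
  };
}
```

Taking the environment as a parameter rather than reading process.env directly keeps the loader easy to unit test with a plain object.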
Step 2: Define the LeaseData schema with Zod
The central data structure is a lease record that matches the AppFolio API shape. Create src/types/lease.ts:
Expected output: Error classes with HTTP-friendly statusCode properties and file-type detection from raw bytes.
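The error classes and file-type helpers are not reproduced in this walkthrough. A minimal sketch of what they could look like, assuming detection by leading byte signatures: the names PipelineError, DocumentProcessingError, detectMimeType, isPdf, and isImage match the imports used in later steps, but the bodies, the SIGNATURES table, and the 500/422 status codes are assumptions.

```typescript
// Base error with an HTTP-friendly status code (the default of 500 is an assumption).
class PipelineError extends Error {
  statusCode: number;
  constructor(message: string, statusCode = 500) {
    super(message);
    this.name = "PipelineError";
    this.statusCode = statusCode;
  }
}

// Raised when a document cannot be read; 422 is an assumed mapping.
class DocumentProcessingError extends PipelineError {
  constructor(message: string) {
    super(message, 422);
    this.name = "DocumentProcessingError";
  }
}

// Leading byte signatures ("magic numbers") for the supported formats.
const SIGNATURES: ReadonlyArray<{ mime: string; bytes: number[] }> = [
  { mime: "application/pdf", bytes: [0x25, 0x50, 0x44, 0x46] }, // "%PDF"
  { mime: "image/png", bytes: [0x89, 0x50, 0x4e, 0x47] },
  { mime: "image/jpeg", bytes: [0xff, 0xd8, 0xff] },
  { mime: "image/tiff", bytes: [0x49, 0x49, 0x2a, 0x00] }, // little-endian TIFF
  { mime: "image/tiff", bytes: [0x4d, 0x4d, 0x00, 0x2a] }, // big-endian TIFF
  { mime: "image/bmp", bytes: [0x42, 0x4d] }, // "BM"
];

// Return the MIME type whose signature matches the start of the buffer.
function detectMimeType(buf: Uint8Array): string | undefined {
  const match = SIGNATURES.find(({ bytes }) =>
    bytes.every((byte, index) => buf[index] === byte),
  );
  return match?.mime;
}

function isPdf(buf: Uint8Array): boolean {
  return detectMimeType(buf) === "application/pdf";
}

function isImage(buf: Uint8Array): boolean {
  return detectMimeType(buf)?.startsWith("image/") ?? false;
}
```

Checking the bytes rather than trusting the filename or the client-supplied MIME type means a mislabeled upload is still routed by what it actually contains.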
Step 5: Build the document text extractors
Create three extraction services. First, the PDF extractor using unpdf at src/services/pdf-extractor.ts:
ts
import { extractText, getDocumentProxy } from "unpdf";
import { DocumentProcessingError } from "../lib/errors";

export async function extractTextFromPdf(buffer: Uint8Array): Promise<string> {
  try {
    const pdf = await getDocumentProxy(new Uint8Array(buffer));
    const result = await extractText(pdf, { mergePages: true });
    return result.text;
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    throw new DocumentProcessingError(`PDF extraction failed: ${message}`);
  }
}
Next, the image preprocessor using sharp at src/services/image-preprocessor.ts:
ts
import sharp from "sharp";

export async function preprocessImage(buffer: Buffer): Promise<Buffer> {
  return sharp(buffer).grayscale().normalize().toBuffer();
}
Then the OCR extractor using Tesseract at src/services/ocr-extractor.ts:
ts
import { createWorker } from "tesseract.js";
import { DocumentProcessingError } from "../lib/errors";

export async function extractTextFromImage(buffer: Buffer, language = "eng"): Promise<string> {
  let worker;
  try {
    worker = await createWorker(language);
    const ret = await worker.recognize(buffer);
    return ret.data.text;
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    throw new DocumentProcessingError(`OCR extraction failed: ${message}`);
  } finally {
    if (worker) {
      await worker.terminate();
    }
  }
}
Expected output: Three files — one for PDF text extraction, one for image preprocessing, one for OCR — each wrapping its library and converting errors to your DocumentProcessingError.
Step 6: Route documents to the right extractor
Create src/pipeline/extract.ts, which inspects the buffer and file metadata to decide between PDF and OCR processing:
ts
import { isPdf, isImage } from "../lib/file-utils";
import { extractTextFromPdf } from "../services/pdf-extractor";
import { preprocessImage } from "../services/image-preprocessor";
import { extractTextFromImage } from "../services/ocr-extractor";
import type { ProcessingResult } from "../types/pipeline";
import { DocumentProcessingError } from "../lib/errors";

export async function processDocument(
  buffer: Buffer,
  filename: string,
  mimeType?: string,
): Promise<ProcessingResult> {
  const buf = new Uint8Array(buffer);

  if (isPdf(buf) || filename.endsWith(".pdf") || mimeType === "application/pdf") {
    const text = await extractTextFromPdf(buf);
    return { text, method: "pdf" };
  }

  if (
    isImage(buf) ||
    /\.(png|jpg|jpeg|tiff|bmp)$/i.test(filename) ||
    (mimeType && mimeType.startsWith("image/"))
  ) {
    const preprocessed = await preprocessImage(buffer);
    const text = await extractTextFromImage(preprocessed);
    return { text, method: "ocr" };
  }

  const ext = filename.includes(".") ? filename.split(".").pop() : "unknown";
  throw new DocumentProcessingError(`unsupported file type: ${ext ?? "unknown"}`);
}
Expected output: A processDocument function that returns { text, method } — method is "pdf" or "ocr" so downstream code can log the extraction path.
Step 7: Extract lease data with AWS Bedrock
Create the Bedrock client singleton at src/services/bedrock.ts:
ts
import { BedrockRuntimeClient } from "@aws-sdk/client-bedrock-runtime";
import { loadConfig } from "../config";

let client: BedrockRuntimeClient | null = null;

export function getClient(): BedrockRuntimeClient {
  if (!client) {
    client = new BedrockRuntimeClient({ region: loadConfig().awsRegion });
  }
  return client;
}
Now create the lease extraction service at src/services/lease-extractor.ts. This sends the extracted text to Claude via the Bedrock Converse API with a prompt that asks for structured JSON output:
ts
import { ConverseCommand } from "@aws-sdk/client-bedrock-runtime";
import { getClient } from "./bedrock";
import { loadConfig } from "../config";
import { ExtractionError } from "../lib/errors";
import { verifyBudget } from "./budget-provider";
import { recordBedrockCall } from "./cost-tracker";

export async function extractLeaseFromText(rawText: string): Promise<string> {
  const config = loadConfig();
  const client = getClient();

  verifyBudget(0.05, config.bedrockModelId);

  const systemPrompt =
    "You are a document data extraction assistant. Extract the following fields from the lease document text: " +
    "tenantName, propertyAddress, leaseStart (ISO date), leaseEnd (ISO date), monthlyRent (number), " +
    "securityDeposit (number), lateFee (number, optional), utilitiesIncluded (string array, optional), " +
    "notes (string, optional). Return ONLY a valid JSON object matching these fields. " +
    "Do not include markdown fences or explanatory text.";
  const userMessage = rawText.length > 100000 ? rawText.slice(0, 100000) : rawText;

  const command = new ConverseCommand({
    modelId: config.bedrockModelId,
    messages: [
      {
        role: "user",
        content: [{ text: systemPrompt + "\n\n" + userMessage }],
      },
    ],
    inferenceConfig: {
      maxTokens: 4096,
      temperature: 0.1,
    },
  });

  let response;
  try {
    response = await client.send(command);
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    throw new ExtractionError(`Bedrock request failed: ${message}`);
  }

  const content = response.output?.message?.content;
  if (Array.isArray(content) && content.length === 0) {
    throw new ExtractionError("Bedrock returned empty content");
  }

  const text = content?.[0]?.text ?? "";
  if (Array.isArray(content) && content.length > 0 && !text) {
    return "";
  }

  const inputTokens = response.usage?.inputTokens ?? 0;
  const outputTokens = response.usage?.outputTokens ?? 0;
  recordBedrockCall(config.bedrockModelId, inputTokens, outputTokens, "bedrock");

  return text;
}
Expected output: A function that takes raw text, calls Bedrock, and returns a raw JSON string. It also checks the budget before the call and records cost telemetry after.
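The extractor passes a fixed 0.05 estimate to verifyBudget. If you would rather derive the estimate from the request size, one possible approach is sketched here. Everything in it is an assumption for illustration: the four-characters-per-token heuristic, the function names, and the TokenPricing rates, which you would supply from your own Bedrock pricing rather than take from this example.

```typescript
// Per-million-token rates in USD; supply real values from your own pricing.
interface TokenPricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

// Rough heuristic: about four characters per token for English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Cost in USD for a given token usage.
function costUsd(inputTokens: number, outputTokens: number, pricing: TokenPricing): number {
  return (
    (inputTokens * pricing.inputPerMillion + outputTokens * pricing.outputPerMillion) / 1e6
  );
}

// Upper-bound estimate before the call: assume every allowed output token is used.
function estimateCallCost(text: string, maxOutputTokens: number, pricing: TokenPricing): number {
  return costUsd(estimateTokens(text), maxOutputTokens, pricing);
}
```

With placeholder rates of 3 and 15 USD per million input and output tokens, a full 100,000-character input that uses all 4,096 output tokens comes to about 0.136 USD under this heuristic, which is above the fixed 0.05 estimate. After the call, the same costUsd function can be applied to the real inputTokens and outputTokens from response.usage.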
Step 8: Add structured repair with @reaatech/structured-repair-core
Bedrock sometimes returns malformed JSON (trailing commas, truncated output, unquoted keys). The @reaatech/structured-repair-core package handles repair against a Zod schema. Create src/services/repair-service.ts:
ts
import { repair, repairOutput, UnrepairableError } from "@reaatech/structured-repair-core";
import { LeaseSchema, type LeaseData } from "../types/lease";
import { RepairError } from "../lib/errors";

export async function repairLeaseOutput(rawJson: string): Promise<LeaseData> {
  try {
    const data = await repair(LeaseSchema, rawJson);
    return data;
  } catch (error) {
    if (error instanceof UnrepairableError) {
      throw new RepairError(error.message);
    }
    throw error;
  }
}

export function repairWithDiagnostics(rawJson: string) {
  const result = repairOutput({
    schema: LeaseSchema,
    input: rawJson,
    debug: true,
    onFailure: (ctx) => {
      console.error("Repair failed", ctx.errors);
    },
  });
  return result;
}
Expected output: repairLeaseOutput takes the raw JSON string from Bedrock and returns validated LeaseData, or throws RepairError if the JSON is beyond repair.
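To make the failure modes concrete, here is a small standalone sketch of two of the simpler fixes: removing a wrapping markdown code fence and dropping trailing commas before parsing. It illustrates the kind of problem being repaired and is not how @reaatech/structured-repair-core works internally; the function names are hypothetical, and the comma handling is deliberately naive.

```typescript
// Three backticks, built at runtime so this example stays fence-safe.
const FENCE = "`".repeat(3);
const FENCED_BLOCK = new RegExp(
  "^" + FENCE + "(?:json)?\\s*([\\s\\S]*?)\\s*" + FENCE + "$",
  "i",
);

// Remove a wrapping code fence if the model added one despite the prompt.
function stripCodeFence(raw: string): string {
  const trimmed = raw.trim();
  const match = FENCED_BLOCK.exec(trimmed);
  return match?.[1] ?? trimmed;
}

// Drop commas that directly precede a closing brace or bracket.
// Naive: it would also alter a string value that contains ",}" or ",]".
function removeTrailingCommas(json: string): string {
  return json.replace(/,\s*([}\]])/g, "$1");
}

// Apply both cleanups, then parse. Throws SyntaxError if the text is still invalid.
function lenientParse(raw: string): unknown {
  return JSON.parse(removeTrailingCommas(stripCodeFence(raw)));
}
```

A cleanup like this only gets you to syntactically valid JSON. Checking the result against LeaseSchema is still needed to catch missing or mistyped fields.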
Step 9: Set up budget enforcement and cost telemetry
The budget engine prevents runaway costs. Create src/services/budget-provider.ts:
ts
import { BudgetController } from "@reaatech/agent-budget-engine";
import { SpendStore } from "@reaatech/agent-budget-spend-tracker";
import { BudgetScope } from "@reaatech/agent-budget-types";
import { loadConfig } from "../config";
import { BudgetExceededError } from "../lib/errors";

let _controller: BudgetController | null = null;
let _initialized = false;

export function getController(): BudgetController {
  if (!_controller) {
    const store = new SpendStore();
    _controller = new BudgetController({ spendTracker: store });
  }
  return _controller;
}

export function initBudget(): BudgetController {
  const ctrl = getController();
  if (!_initialized) {
    const config = loadConfig();
    ctrl.defineBudget({
      scopeType: BudgetScope.User,
      scopeKey: "*",
      limit: config.dailyBudget,
      policy: { softCap: 0.8, hardCap: 1.0, autoDowngrade: [], disableTools: [] },
    });
    _initialized = true;
  }
  return ctrl;
}

export function verifyBudget(
  estimatedCost: number,
  modelId: string,
): { allowed: boolean; action: string } {
  const ctrl = getController();
  if (!_initialized) {
    initBudget();
  }
  const result = ctrl.check({
    scopeType: BudgetScope.User,
    scopeKey: "*",
    estimatedCost,
    modelId,
    tools: [],
  });
  if (!result.allowed) {
    throw new BudgetExceededError(
      "Budget exceeded: estimated cost " + String(estimatedCost) + " for model " + modelId,
    );
  }
  return { allowed: result.allowed, action: result.action };
}
Now create src/services/cost-tracker.ts to record each Bedrock call back to the budget controller:
Expected output: A budget engine that rejects calls that would exceed the daily budget (configurable via DEFAULT_DAILY_BUDGET), and a cost tracker that logs every Bedrock call’s token usage.
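The softCap: 0.8 and hardCap: 1.0 values in the policy read naturally as fractions of the limit. Assuming that interpretation, the following standalone sketch shows the check-then-record cycle with a plain in-memory counter. It is not the @reaatech/agent-budget-engine implementation: the DailyBudget class, the action names, and the threshold comparisons are all hypothetical, and daily resets are left out.

```typescript
type BudgetAction = "allow" | "warn" | "block";

// Hypothetical in-memory budget with soft and hard caps as fractions of the limit.
class DailyBudget {
  private spent = 0;
  private readonly limit: number;
  private readonly softCap: number;
  private readonly hardCap: number;

  constructor(limit: number, softCap = 0.8, hardCap = 1.0) {
    this.limit = limit;
    this.softCap = softCap;
    this.hardCap = hardCap;
  }

  // Decide what to do before a call, based on projected spend.
  check(estimatedCost: number): BudgetAction {
    const projected = (this.spent + estimatedCost) / this.limit;
    if (projected > this.hardCap) return "block";
    if (projected >= this.softCap) return "warn";
    return "allow";
  }

  // Record the actual cost after a call completes.
  record(actualCost: number): void {
    this.spent += actualCost;
  }
}
```

The split between check and record matters: the check uses an estimate made before the Bedrock call, while the recorded amount should come from the token usage the call actually reports.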
Step 10: Build the approval store and firewall
Destructive operations (like deleting a lease) require human approval. The @reaatech/tool-use-firewall-core package provides the validation layer. Create src/services/approval-store.ts:
ts
import { randomUUID } from "crypto";
import type { AppFolioAction, ApprovalRequest } from "../types/approval";
import { ApprovalError } from "../lib/errors";

const approvals = new Map<string, ApprovalRequest>();

export function createApprovalRequest(
  jobId: string,
  action: AppFolioAction,
  payload: unknown,
): ApprovalRequest {
  const request: ApprovalRequest = {
    id: randomUUID(),
    jobId,
    action,
    payload,
    requestedAt: new Date(),
  };
  approvals.set(request.id, request);
  return request;
}

export function getPendingApprovals(): ApprovalRequest[] {
  return Array.from(approvals.values()).filter((a) => !a.approvedAt);
}

export function approveApproval(id: string): ApprovalRequest {
  const request = approvals.get(id);
  if (!request) {
    throw new ApprovalError(`Approval request ${id} not found`);
  }
  request.approvedAt = new Date();
  request.approvedBy = "system";
  return request;
}

export function rejectApproval(id: string): ApprovalRequest {
  const request = approvals.get(id);
  if (!request) {
    throw new ApprovalError(`Approval request ${id} not found`);
  }
  approvals.delete(id);
  return request;
}
Create the firewall service at src/services/firewall-service.ts:
ts
import { ValidationError, ApprovalRequiredError, Logger } from "@reaatech/tool-use-firewall-core";
import { createApprovalRequest } from "./approval-store";

const log = new Logger("AppFolioFirewall");

const ALLOWED_ACTIONS = ["create_lease", "update_lease", "get_lease", "delete_lease"] as const;

export function validateAppfolioAction(action: string, preApproved = false): void {
  if (!ALLOWED_ACTIONS.includes(action as typeof ALLOWED_ACTIONS[number])) {
    log.warn("Unknown action", { action });
    throw new ValidationError({ message: `unknown action: ${action}` });
  }

  if (action === "delete_lease" && !preApproved) {
    const request = createApprovalRequest("*", "delete_lease", {});
    log.warn("Approval required", { action, approvalId: request.id });
    throw new ApprovalRequiredError({
      message: "Destructive action requires human approval",
      approvalId: request.id,
    });
  }

  log.info("Action allowed", { action });
}
Expected output: An in-memory approval store and a firewall that blocks unknown actions and requires explicit pre-approval for destructive operations like delete_lease.
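From the caller's side, the firewall implies a two-pass flow: the first attempt at a destructive action fails with an approval ID, and a later attempt succeeds once a human has approved it and the caller passes preApproved. The sketch below shows that flow with local stand-ins so it runs on its own. ApprovalNeeded, guardAction, and runGuarded are hypothetical names, and the error shape (an approvalId property) is an assumption based on how ApprovalRequiredError is constructed above.

```typescript
// Stand-in for the firewall's approval error; the real class comes from
// @reaatech/tool-use-firewall-core and its exact shape is assumed here.
class ApprovalNeeded extends Error {
  approvalId: string;
  constructor(approvalId: string) {
    super("Destructive action requires human approval");
    this.name = "ApprovalNeeded";
    this.approvalId = approvalId;
  }
}

let nextApprovalNumber = 1;

// Simplified guard: only delete_lease needs approval in this sketch.
function guardAction(action: string, preApproved: boolean): void {
  if (action === "delete_lease" && !preApproved) {
    throw new ApprovalNeeded(`approval-${nextApprovalNumber++}`);
  }
}

type Outcome = { status: "done" } | { status: "pending"; approvalId: string };

// Duck-typed check so the caller does not depend on a specific error class.
function hasApprovalId(error: unknown): error is { approvalId: string } {
  return (
    typeof error === "object" &&
    error !== null &&
    typeof (error as { approvalId?: unknown }).approvalId === "string"
  );
}

// Run `work` only if the guard allows it; otherwise report the pending approval.
function runGuarded(action: string, work: () => void, preApproved = false): Outcome {
  try {
    guardAction(action, preApproved);
  } catch (error) {
    if (hasApprovalId(error)) {
      return { status: "pending", approvalId: error.approvalId };
    }
    throw error;
  }
  work();
  return { status: "done" };
}
```

A route handler could return the pending outcome to the client as an HTTP 202 with the approval ID, then call runGuarded again with preApproved set to true after the request has been approved.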
Step 11: Build the AppFolio API client
Create src/api/appfolio-client.ts to communicate with the AppFolio REST API:
Expected output: A singleton AppfolioClient with typed methods for CRUD operations, each gated by the firewall service.
Step 12: Wire the pipeline orchestrator
The orchestrator ties all the services together into a linear pipeline. Create src/services/pipeline-orchestrator.ts:
ts
import { randomUUID } from "crypto";
import { processDocument } from "../pipeline/extract";
import { extractLeaseFromText } from "./lease-extractor";
import { repairLeaseOutput } from "./repair-service";
import { getAppfolioClient } from "../api/appfolio-client";
import type { PipelineJob } from "../types/pipeline";

export class PipelineOrchestrator {
  private jobs: Map<string, PipelineJob> = new Map();

  async submitJob(
    buffer: Buffer,
    filename: string,
    mimeType: string,
    autoSubmit = false,
  ): Promise<string> {
    const id = randomUUID();
    const job: PipelineJob = {
      id,
      status: "pending",
      filename,
      createdAt: new Date(),
    };
    this.jobs.set(id, job);

    try {
      job.status = "processing";
      const { text } = await processDocument(buffer, filename, mimeType);
      job.extractedText = text;

      const rawJson = await extractLeaseFromText(text);
      const structuredData = await repairLeaseOutput(rawJson);
      job.structuredData = structuredData;

      if (autoSubmit) {
        job.status = "awaiting_approval";
      } else {
        job.status = "completed";
        job.completedAt = new Date();
      }
    } catch (error) {
      job.status = "failed";
      job.error = error instanceof Error ? error.message : String(error);
    }

    return id;
  }

  getJob(id: string): PipelineJob | undefined {
    return this.jobs.get(id);
  }

  listJobs(): PipelineJob[] {
    return Array.from(this.jobs.values());
  }

  async submitApprovedJob(jobId: string): Promise<void> {
    const job = this.jobs.get(jobId);
    if (!job) {
      throw new Error(`Job ${jobId} not found`);
    }
    if (!job.structuredData) {
      throw new Error(`Job ${jobId} has no structured data`);
    }
    await getAppfolioClient().createLease(job.structuredData);
    job.status = "completed";
    job.completedAt = new Date();
  }
}

export const orchestrator = new PipelineOrchestrator();
Expected output: The orchestrator accepts a file buffer, runs it through the full pipeline (extract → Bedrock → repair), and stores the result. If autoSubmit is true, it halts at awaiting_approval until submitApprovedJob is called.
Step 13: Create the Next.js API routes
Create three route handlers under app/api/. Start with the health check at app/api/health/route.ts:
ts
import { NextResponse } from "next/server";

export function GET(): NextResponse {
  return NextResponse.json({ status: "ok", timestamp: new Date().toISOString() });
}
Create the upload endpoint at app/api/upload/route.ts:
Expected output: Three route handlers — GET /api/health returns status, POST /api/upload accepts a multipart file and returns a jobId, and GET/POST /api/approvals lists and resolves approval requests.
Step 14: Create the barrel export
Create src/index.ts so consumers can import from a single path:
ts
export { loadConfig, type Config } from "./config";
export { LeaseSchema, type LeaseData } from "./types/lease";
export type { ProcessingStatus, PipelineJob, ProcessingResult } from "./types/pipeline";
export type { AppFolioAction, ApprovalRequest } from "./types/approval";
export { detectMimeType, isPdf, isImage } from "./lib/file-utils";
export {
  PipelineError,
  DocumentProcessingError,
  ExtractionError,
  RepairError,
  BudgetExceededError,
  ApprovalError,
  ApiError,
} from "./lib/errors";
export { extractTextFromPdf } from "./services/pdf-extractor";
export { preprocessImage } from "./services/image-preprocessor";
export { extractTextFromImage } from "./services/ocr-extractor";
export { processDocument } from "./pipeline/extract";
export { getClient } from "./services/bedrock";
export { extractLeaseFromText } from "./services/lease-extractor";
export { repairLeaseOutput, repairWithDiagnostics } from "./services/repair-service";
export { getController, initBudget, verifyBudget } from "./services/budget-provider";
export { recordBedrockCall } from "./services/cost-tracker";
export {
  createApprovalRequest,
  getPendingApprovals,
  approveApproval,
  rejectApproval,
} from "./services/approval-store";
export { validateAppfolioAction } from "./services/firewall-service";
export { PipelineOrchestrator, orchestrator } from "./services/pipeline-orchestrator";
export { AppfolioClient, getAppfolioClient } from "./api/appfolio-client";
Step 15: Run the test suite
The project ships with 97 tests covering every service, route handler, and utility. Run them with:
terminal
pnpm vitest run --coverage
Expected output: All 97 tests pass, with 100% line coverage, ~99% statement coverage, ~98% function coverage, and ~97% branch coverage across runtime code.
You now have a complete document pipeline that property management teams can deploy. The pipeline accepts PDFs and scanned images, extracts text, calls Bedrock for structured data, repairs the output, enforces daily budgets, gates destructive operations behind human approval, and pushes the final record to AppFolio.
Next steps
Add a polling endpoint — expose GET /api/jobs/:id so frontends can poll for job completion status rather than receiving it inline
Persist jobs to a database — swap the in-memory Map for SQLite or PostgreSQL so jobs survive restarts and can be audited
Add webhook notifications — call a configurable webhook URL when a job completes or requires approval, using the pipeline’s approval store events
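As a starting point for the polling endpoint, the core of the handler can be a pure function that turns a job lookup into a status code and a JSON-friendly body. This is a sketch under assumptions: JobView lists only the PipelineJob fields the response needs, and buildPollResponse, PollResponse, and the done flag are hypothetical names rather than part of the published code.

```typescript
type ProcessingStatus =
  | "pending"
  | "processing"
  | "awaiting_approval"
  | "completed"
  | "failed";

// Subset of the job record that a polling client needs to see.
interface JobView {
  id: string;
  status: ProcessingStatus;
  filename: string;
  error?: string;
  completedAt?: Date;
}

interface PollResponse {
  httpStatus: number;
  body: Record<string, unknown>;
}

// Map a job lookup result to an HTTP status and response body.
function buildPollResponse(job: JobView | undefined): PollResponse {
  if (!job) {
    return { httpStatus: 404, body: { error: "job not found" } };
  }
  // Terminal states tell the client it can stop polling.
  const done = job.status === "completed" || job.status === "failed";
  return {
    httpStatus: 200,
    body: {
      id: job.id,
      status: job.status,
      filename: job.filename,
      done,
      error: job.error ?? null,
      completedAt: job.completedAt ? job.completedAt.toISOString() : null,
    },
  };
}
```

In a route handler you would pass it the result of orchestrator.getJob(id) and return the body with the given status. Leaving extractedText and structuredData out of the response keeps lease contents from being exposed to anyone who can guess a job ID.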