A superintendent walks a job site with a clipboard, snapping photos and scribbling notes. Back at the trailer, they manually type punch items into Procore or Buildertrend. Items get lost, photos are mislabeled, and the owner's rep waits days for a consolidated list.
In this tutorial, you’ll build a Punch-List Field Capture Agent — a voice-powered web app that lets construction superintendents snap photos, dictate observations, and auto-sync structured punch items to project management software. The app uses @reaatech/voice-agent-stt (backed by Deepgram) for speech-to-text, GPT-4o vision via the media pipeline for photo analysis, an intent classifier to route observations to the right trade category, Instructor for structured data extraction, and ElevenLabs for text-to-speech read-back. Everything is wired through Hono routes mounted inside the Next.js App Router.
This tutorial is for TypeScript developers comfortable with Node.js 22+, Next.js App Router patterns, and cloud APIs (OpenAI, Deepgram, ElevenLabs). You’ll end up with a fully tested voice agent you can extend for your own job sites.
Prerequisites
Node.js 22+ and pnpm 10 installed
A terminal with access to an empty project directory
OpenAI API key — set as OPENAI_API_KEY (used by the AI service, media pipeline, and Instructor)
Deepgram API key — set as DEEPGRAM_API_KEY (speech-to-text transcription)
ElevenLabs API key — set as ELEVENLABS_API_KEY (text-to-speech synthesis)
Vercel Blob token — set as BLOB_READ_WRITE_TOKEN (media storage)
Langfuse credentials (LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, LANGFUSE_BASE_URL) for LLM observability (optional — the app runs without them)
A PM software API URL and key (PM_SOFTWARE_API_URL, PM_SOFTWARE_API_KEY) for the sync adapter (optional for local testing)
All env vars are documented in .env.example with placeholder values. Copy it to .env and fill in your keys.
Step 1: Scaffold the Next.js project and install dependencies
Start by creating the Next.js project with TypeScript and the App Router, then pin every dependency to an exact version.
Expected output: All packages install cleanly. Your package.json has exact versions (no ^ or ~ prefixes) for every dependency.
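If you’d rather check the “exact versions” expectation with a script than by eye, a few lines of TypeScript can scan package.json for range specifiers. This is a minimal sketch and not part of the tutorial’s code. findUnpinned is a hypothetical helper, and it only looks at dependencies and devDependencies.

```typescript
// The only package.json fields this check reads.
interface PackageJson {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// Exact semver: MAJOR.MINOR.PATCH with optional prerelease/build suffixes.
const EXACT_VERSION = /^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$/;

// Returns "name@spec" for every dependency that is not pinned to an exact version
// (caret/tilde ranges, wildcards, tags like "latest", workspace: specs, etc.).
function findUnpinned(pkg: PackageJson): string[] {
  const all = { ...pkg.dependencies, ...pkg.devDependencies };
  return Object.entries(all)
    .filter(([, spec]) => !EXACT_VERSION.test(spec))
    .map(([name, spec]) => `${name}@${spec}`);
}

// Illustrative input only; these package names and versions are placeholders.
const samplePkg: PackageJson = {
  dependencies: { "lib-a": "1.2.3", "lib-b": "^4.0.0" },
  devDependencies: { "lib-c": "~5.0.0", "lib-d": "2.0.1" },
};
console.log(findUnpinned(samplePkg)); // [ 'lib-b@^4.0.0', 'lib-c@~5.0.0' ]
```

To check the real file, parse it with JSON.parse(readFileSync("package.json", "utf8")) and fail CI when the result is non-empty. Setting save-exact=true in .npmrc makes pnpm add write exact versions in the first place.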
Next, copy .env.example to .env:
terminal
cp .env.example .env
The .env.example should contain at minimum these entries:
env
NODE_ENV=development
OPENAI_API_KEY=<your-openai-key>
DEEPGRAM_API_KEY=<your-deepgram-key>
ELEVENLABS_API_KEY=<your-elevenlabs-key>
BLOB_READ_WRITE_TOKEN=<your-vercel-blob-token>
LANGFUSE_PUBLIC_KEY=<your-langfuse-public-key>
LANGFUSE_SECRET_KEY=<your-langfuse-secret-key>
LANGFUSE_BASE_URL=<your-langfuse-base-url>
PM_SOFTWARE_API_URL=<your-pm-software-api-url>
PM_SOFTWARE_API_KEY=<your-pm-software-api-key>
OTLP_ENDPOINT=http://localhost:4318/v1/traces
NEXT_PUBLIC_APP_NAME=Punch-List Field Capture Agent
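Langfuse and the PM adapter are optional, but the other keys are required, so it helps to fail fast at startup when a required key is missing or still holds its placeholder. This is a minimal sketch. checkEnv and the required/optional split are hypothetical, taken from the Prerequisites list rather than from the tutorial’s code.

```typescript
// Keys the app cannot run without (per the Prerequisites list).
const REQUIRED_KEYS = [
  "OPENAI_API_KEY",
  "DEEPGRAM_API_KEY",
  "ELEVENLABS_API_KEY",
  "BLOB_READ_WRITE_TOKEN",
] as const;

// Keys that only enable optional features (observability, PM sync).
const OPTIONAL_KEYS = [
  "LANGFUSE_PUBLIC_KEY",
  "LANGFUSE_SECRET_KEY",
  "LANGFUSE_BASE_URL",
  "PM_SOFTWARE_API_URL",
  "PM_SOFTWARE_API_KEY",
] as const;

interface EnvReport {
  missingRequired: string[];
  missingOptional: string[];
}

// A key counts as missing if it is unset, blank, or still a <placeholder> from .env.example.
function checkEnv(env: Record<string, string | undefined>): EnvReport {
  const isMissing = (key: string): boolean => {
    const value = env[key]?.trim();
    return !value || /^<.*>$/.test(value);
  };
  return {
    missingRequired: REQUIRED_KEYS.filter(isMissing),
    missingOptional: OPTIONAL_KEYS.filter(isMissing),
  };
}
```

At startup you could call checkEnv(process.env), throw if missingRequired is non-empty, and only log a warning for missingOptional.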
Step 2: Define domain types, validation schemas, and constants
The domain revolves around PunchItem — a construction defect or incomplete task. Create src/lib/types.ts with the core interfaces:
Then define the classifier’s category definitions and latency budget in src/lib/constants.ts. Each AgentConfig represents one punch-list trade category, along with the examples the classifier uses for matching:
ts
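import type { AgentConfig } from "@reaatech/agent-mesh";

// Trade categories a punch item can be routed to.
export const PUNCH_CATEGORIES = [
  "structural",
  "electrical",
  "plumbing",
  "finish",
  "safety",
  "hvac",
  "other",
] as const;

// Priority levels, from most to least urgent.
export const PUNCH_PRIORITIES = [
  "critical",
  "high",
  "medium",
  "low",
] as const;

export const DEFAULT_SYNC_PROVIDER = "generic-pm";

// PUNCH_CATEGORY_AGENTS (the AgentConfig[] with one agent per category) and the
// latency budget also live in this file; they are not reproduced in this excerpt.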
Expected output: pnpm typecheck passes with no errors. Each agent config has three examples the classifier compares against incoming transcripts.
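Because PUNCH_CATEGORIES and PUNCH_PRIORITIES are declared with as const, the union types and the runtime guards can both be derived from them, so the lists and the types never drift apart. This is a minimal sketch: whether the tutorial’s src/lib/types.ts derives PunchCategory this way is an assumption, and isPunchCategory, toPunchCategory, and comparePriority are hypothetical helpers.

```typescript
// Same literals as src/lib/constants.ts, repeated so the sketch runs standalone.
const PUNCH_CATEGORIES = [
  "structural", "electrical", "plumbing", "finish", "safety", "hvac", "other",
] as const;
const PUNCH_PRIORITIES = ["critical", "high", "medium", "low"] as const;

// Union types derived from the arrays: adding a category updates the type too.
type PunchCategory = (typeof PUNCH_CATEGORIES)[number];
type PunchPriority = (typeof PUNCH_PRIORITIES)[number];

// Runtime guard for untrusted input such as a request body or model output.
function isPunchCategory(value: unknown): value is PunchCategory {
  return typeof value === "string" && (PUNCH_CATEGORIES as readonly string[]).includes(value);
}

// Normalizes a free-form label, falling back to "other" for anything unknown.
function toPunchCategory(label: string): PunchCategory {
  const normalized = label.trim().toLowerCase();
  return isPunchCategory(normalized) ? normalized : "other";
}

// Negative when a is more urgent than b; usable directly with Array.prototype.sort.
function comparePriority(a: PunchPriority, b: PunchPriority): number {
  return PUNCH_PRIORITIES.indexOf(a) - PUNCH_PRIORITIES.indexOf(b);
}
```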
Step 3: Build the voice capture service
The VoiceCaptureService wraps @reaatech/voice-agent-core and @reaatech/voice-agent-stt to transcribe audio via Deepgram. Create src/services/voice-capture-service.ts:
ts
import {
  initializeSessionManager,
  LatencyBudgetEnforcer,
  createCostTracker,
  createRecordingManager,
  type Session,
  type AudioChunk,
  type Utterance,
} from "@reaatech/voice-agent-core";
import {
  createSTTProvider,
  STTProviderInterface,
} from "@reaatech/voice-agent-stt";
import type { VoiceCaptureResult } from "../lib/types.js";

export class VoiceCaptureService {
  private sessionManager: ReturnType<typeof initializeSessionManager>;
  private latencyEnforcer: LatencyBudgetEnforcer;
  private costTracker
Expected output: The service handles the full STT lifecycle — session creation, streaming audio through Deepgram’s Nova-2 model, collecting final utterances, tracking latency and cost, and closing the provider cleanly.
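To make the latency budget concrete, here is a per-stage check using the same values that route.ts passes to createLatencyBudget (stt 200, mcp 400, tts 200, target 800, hard cap 1200). This is an illustrative sketch, not the logic inside LatencyBudgetEnforcer. checkLatency and its verdict shape are hypothetical, and the values are assumed to be milliseconds.

```typescript
// Mirrors the budget passed to createLatencyBudget in route.ts (assumed to be ms).
interface LatencyBudget {
  stt: number;
  mcp: number;
  tts: number;
  target: number;
  hardCap: number;
}

type StageTimings = { stt: number; mcp: number; tts: number };

interface LatencyVerdict {
  totalMs: number;
  overStages: (keyof StageTimings)[]; // stages that exceeded their own allowance
  status: "ok" | "over-target" | "over-hard-cap";
}

// Hypothetical checker: sums stage timings and compares them to the budget.
function checkLatency(t: StageTimings, budget: LatencyBudget): LatencyVerdict {
  const stages: (keyof StageTimings)[] = ["stt", "mcp", "tts"];
  const totalMs = stages.reduce((sum, s) => sum + t[s], 0);
  const overStages = stages.filter((s) => t[s] > budget[s]);
  const status =
    totalMs > budget.hardCap ? "over-hard-cap" : totalMs > budget.target ? "over-target" : "ok";
  return { totalMs, overStages, status };
}

const budget: LatencyBudget = { stt: 200, mcp: 400, tts: 200, target: 800, hardCap: 1200 };
console.log(checkLatency({ stt: 180, mcp: 520, tts: 150 }, budget));
// { totalMs: 850, overStages: [ 'mcp' ], status: 'over-target' }
```

Note that the three stage allowances sum exactly to the 800 target, so any single stage running over its allowance eats into the slack between target and hard cap.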
While you’re here, create the text-to-speech service at src/services/tts-service.ts. It uses the ElevenLabs client to convert text to spoken audio:
Expected output: pnpm typecheck passes. The TTS service streams ElevenLabs audio into a Buffer that the API routes can return as an HTTP response.
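The buffering step is a generic pattern worth seeing on its own. In recent Node versions both web ReadableStreams and Node Readable streams are async-iterable, so it can be written once against AsyncIterable&lt;Uint8Array&gt;. This is a minimal sketch that assumes the ElevenLabs SDK call you use yields Uint8Array (or Buffer) chunks; check the SDK version you install. collectAudio is a hypothetical helper.

```typescript
// Concatenates every chunk from an async byte stream into one contiguous array.
// Node Buffers are Uint8Array subclasses, so Buffer chunks work too.
async function collectAudio(stream: AsyncIterable<Uint8Array>): Promise<Uint8Array> {
  const chunks: Uint8Array[] = [];
  let total = 0;
  for await (const chunk of stream) {
    chunks.push(chunk);
    total += chunk.byteLength;
  }
  // Single allocation, then copy each chunk into place.
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return out;
}

// Usage with a fake stream standing in for the TTS response.
async function* fakeAudio(): AsyncGenerator<Uint8Array> {
  yield new Uint8Array([1, 2]);
  yield new Uint8Array([3]);
}
collectAudio(fakeAudio()).then((bytes) => console.log(bytes.length)); // 3
```

A route can then return new Response(bytes, { headers: { "Content-Type": "audio/mpeg" } }), assuming ElevenLabs’ default MP3 output format. In Node, Buffer.from(bytes) converts the result if a Buffer is required.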
Step 4: Build the photo analysis pipeline
The PhotoService uses @reaatech/media-pipeline-mcp-core and @reaatech/media-pipeline-mcp-openai to run an image through GPT-4o vision. Create src/services/photo-service.ts:
ts
import {
  PipelineExecutor,
  ArtifactRegistry,
  createEventBus,
  type Provider,
  type PipelineDefinition,
} from "@reaatech/media-pipeline-mcp-core";
import {
  OpenAIProvider,
  createOpenAIProvider,
} from "@reaatech/media-pipeline-mcp-openai";
import type { PhotoAnalysisResult } from "../lib/types.js";

class OpenAIProviderAdapter implements Provider {
  readonly name = "openai";
  readonly supportedOperations = ["image.describe"];
  private inner: OpenAIProvider;

  constructor(apiKey
Expected output: The PhotoService builds a one-step pipeline that sends the image (as base64) to GPT-4o vision, extracts a text description, and scans for construction-relevant keywords like “crack” or “leak”.
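The keyword scan is the easiest part of the photo pipeline to reason about in isolation. This is a minimal sketch, assuming the GPT-4o description arrives as plain text. The keyword list and findDefectKeywords are hypothetical, and whole-word matching is a design choice that avoids false hits such as “leak” inside “bleak”.

```typescript
// Hypothetical construction-defect vocabulary; tune it for your trades.
const DEFECT_KEYWORDS = ["crack", "leak", "stain", "gap", "chip", "rust", "exposed wire", "missing"];

// Escapes regex metacharacters so keywords are matched literally.
const escapeRegExp = (s: string): string => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Returns the keywords that appear as whole words (plural "s"/"es" allowed), in list order.
function findDefectKeywords(
  description: string,
  keywords: readonly string[] = DEFECT_KEYWORDS,
): string[] {
  const text = description.toLowerCase();
  return keywords.filter((kw) =>
    new RegExp(`\\b${escapeRegExp(kw.toLowerCase())}(?:e?s)?\\b`).test(text),
  );
}

console.log(findDefectKeywords("Hairline cracks near the window; a small leak under the sink."));
// [ 'crack', 'leak' ]
```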
Step 5: Build the classification and AI extraction services
The ClassificationService uses @reaatech/agent-mesh-classifier to route a transcript to the correct punch-list trade category. Create src/services/classification-service.ts:
ts
import {
  classifierService,
  detectLanguage,
  isRateLimitError,
} from "@reaatech/agent-mesh-classifier";
import {
  AgentConfigSchema,
  ClassifierOutputSchema,
  IncomingRequestSchema,
  type ClassifierOutput,
} from "@reaatech/agent-mesh";
import { PUNCH_CATEGORY_AGENTS } from "../lib/constants.js";

export class ClassificationService {
  async classifyIntent(
    transcript: string,
    priorLanguage?: string,
  ): Promise<ClassifierOutput> {
    // Empty input: skip the classifier and fall back to the catch-all category.
    if (!transcript || transcript.trim().length === 0) {
      return {
        agent_id: "other-punch",
        confidence: 0.5,
        ambiguous: false,
        detected_language: "en",
        intent_summary: "Unspecified punch list item",
        entities: {},
      };
    }

    // Fail fast if any category agent config is malformed.
    for (const agent of PUNCH_CATEGORY_AGENTS) {
      AgentConfigSchema.parse(agent);
    }

    try {
      IncomingRequestSchema.parse({ input: transcript });
      const result = await classifierService.classify(
        transcript,
        PUNCH_CATEGORY_AGENTS,
        priorLanguage,
      );
      return ClassifierOutputSchema.parse(result);
    } catch (err) {
      // On a rate-limit error, wait one second and retry exactly once.
      if (isRateLimitError(err)) {
        await new Promise((resolve) => setTimeout(resolve, 1000));
        const result = await classifierService.classify(
          transcript,
          PUNCH_CATEGORY_AGENTS,
          priorLanguage,
        );
        return ClassifierOutputSchema.parse(result);
      }
      throw err;
    }
  }

  detectInputLanguage(text: string): string {
    return detectLanguage(text);
  }
}
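The catch block in classifyIntent implements a fixed policy: on a rate-limit error, wait one second and retry once. If you want the same policy elsewhere (around the Instructor call, for example), it can be factored into a generic helper. This is a minimal sketch. withRetry, RetryOptions, and the injectable sleep are hypothetical; isRetryable is where you would pass isRateLimitError.

```typescript
interface RetryOptions {
  retries: number; // extra attempts after the first call
  delayMs: number; // fixed wait between attempts
  isRetryable: (err: unknown) => boolean; // e.g. isRateLimitError
  sleep?: (ms: number) => Promise<void>; // injectable so tests don't really wait
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls fn; on a retryable error, waits delayMs and tries again, up to `retries` times.
// Non-retryable errors, and the last retryable one, are rethrown unchanged.
async function withRetry<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  const sleep = opts.sleep ?? defaultSleep;
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= opts.retries || !opts.isRetryable(err)) throw err;
      await sleep(opts.delayMs);
    }
  }
}
```

With retries: 1, delayMs: 1000, and isRetryable: isRateLimitError, this reproduces the behavior of the catch block above while keeping the happy path free of duplicated classify calls.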
Now create the AI extraction service using Instructor + OpenAI. It takes the transcript, optional photo description, and category, then returns a structured PunchItem via a Zod schema. Create src/services/ai-service.ts:
ts
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";
import Instructor from "@instructor-ai/instructor";
import OpenAI from "openai";
import { PunchItemSchema } from "../lib/validation.js";
import type { PunchItem } from "../lib/types.js";

// Builds the extraction prompt from the transcript plus optional photo and category context.
function buildExtractionPrompt(
  transcript: string,
  photoDescription?: string,
  category?: string,
): string {
  let prompt = `Extract a punch list item from the following transcript as JSON:\n\n${transcript}`;
  if (photoDescription) {
    prompt += `\n\nPhoto description: ${photoDescription}`;
  }
  if (category) {
    prompt += `\n\nCategory: ${category}`;
  }
  prompt += `\n\nRespond with a JSON object only.`;
  return prompt;
}

function buildSummaryPrompt(item: PunchItem): string {
  return `Generate a concise summary for punch item: ${item.title} - ${item.description}`;
}

export class AIService {
  private instructor: ReturnType<typeof Instructor>;

  constructor() {
    // Instructor wraps the OpenAI client; TOOLS mode uses tool calling for structured output.
    this.instructor = Instructor({
      client: new OpenAI({ apiKey: process.env.OPENAI_API_KEY }),
      mode: "TOOLS",
    });
  }

  async extractPunchItem(
    transcript: string,
    photoDescription?: string,
    category?: string,
  ): Promise<PunchItem> {
    const prompt = buildExtractionPrompt(transcript, photoDescription, category);
    // Instructor validates the response against PunchItemSchema and re-asks up to twice on failure.
    const result = await this.instructor.chat.completions.create({
      model: "gpt-5.2-nano",
      messages: [{ role: "user", content: prompt }],
      response_model: { schema: PunchItemSchema, name: "PunchItem" },
      max_retries: 2,
    });
    return result;
  }

  async generateSummary(item: PunchItem): Promise<string> {
    const prompt = buildSummaryPrompt(item);
    const result = await generateText({
      model: openai("gpt-5.2-nano"),
      prompt,
    });
    return result.text;
  }
}
Expected output: pnpm typecheck passes. The classifier routes transcripts to the correct category agent, and Instructor extracts structured punch items from natural-language descriptions.
Step 6: Build the sync service with a PM adapter
The sync layer pushes pending punch items to external project management software. Create src/services/sync-service.ts:
Expected output: The SyncService manages a pending queue. Each item is pushed through registered adapters; failed items remain pending for retry. The GenericPMSyncAdapter calls your PM software’s API with a Bearer token.
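The queue semantics (push through every adapter, keep failures pending) are easy to get subtly wrong, so here is one way to express them. This is a minimal sketch. The ISyncAdapter shape below, SyncQueue, and the result fields are hypothetical and are not the tutorial’s sync-service.ts. An item leaves the queue only when every registered adapter accepts it, which is a design choice.

```typescript
// Hypothetical adapter contract; the tutorial's ISyncAdapter may look different.
interface ISyncAdapter {
  readonly name: string;
  push(item: { id: string }): Promise<void>; // should throw on failure
}

interface SyncFailure {
  id: string;
  adapter: string;
  error: string;
}

interface SyncResult {
  synced: string[]; // ids accepted by every adapter and removed from the queue
  failed: SyncFailure[]; // one entry per failing (item, adapter) pair
}

class SyncQueue<T extends { id: string }> {
  private readonly pending = new Map<string, T>();
  private readonly adapters: ISyncAdapter[];

  constructor(adapters: ISyncAdapter[]) {
    this.adapters = adapters;
  }

  // Adds an item; enqueueing the same id again replaces the earlier version.
  enqueue(item: T): void {
    this.pending.set(item.id, item);
  }

  pendingIds(): string[] {
    return [...this.pending.keys()];
  }

  // Pushes every pending item to every adapter. An item leaves the queue only
  // when all adapters succeed; otherwise it stays pending for the next flush.
  async flush(): Promise<SyncResult> {
    const result: SyncResult = { synced: [], failed: [] };
    for (const item of [...this.pending.values()]) {
      let allOk = true;
      for (const adapter of this.adapters) {
        try {
          await adapter.push(item);
        } catch (err) {
          allOk = false;
          const message = err instanceof Error ? err.message : String(err);
          result.failed.push({ id: item.id, adapter: adapter.name, error: message });
        }
      }
      if (allOk) {
        this.pending.delete(item.id);
        result.synced.push(item.id);
      }
    }
    return result;
  }
}
```

Because a partially failed item is retried against every adapter, including ones that already accepted it, adapters should treat push as idempotent, for example by upserting on item.id in the PM system.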
Step 7: Build the storage and observability services
Create src/services/storage-service.ts for Vercel Blob uploads:
Expected output: pnpm typecheck passes. The storage service uploads photos and recordings to Vercel Blob under a punch-list-capture/ prefix, and the observability service traces key events (item creation, sync, audio processing, classification) to Langfuse.
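Keeping every upload under the punch-list-capture/ prefix is simplest when one function owns the naming scheme. This is a minimal sketch. Only the prefix comes from this tutorial; the layout after it (item id, then media kind, then a timestamped file name) and buildBlobPath are hypothetical. The resulting string would be passed as the pathname argument to put() from @vercel/blob.

```typescript
type MediaKind = "photos" | "recordings";

// Builds "punch-list-capture/<itemId>/<kind>/<timestamp>-<safe-filename>".
function buildBlobPath(
  itemId: string,
  kind: MediaKind,
  filename: string,
  now: Date = new Date(),
): string {
  const safeName = filename
    .toLowerCase()
    .replace(/[^a-z0-9.]+/g, "-") // collapse spaces and symbols into dashes
    .replace(/^-+|-+$/g, ""); // trim leading/trailing dashes
  // ISO timestamps sort chronologically; ":" and "." are swapped out for path safety.
  const stamp = now.toISOString().replace(/[:.]/g, "-");
  return `punch-list-capture/${encodeURIComponent(itemId)}/${kind}/${stamp}-${safeName || "file"}`;
}

console.log(buildBlobPath("PI-042", "photos", "North Wall Crack.JPG", new Date("2025-01-15T10:30:00Z")));
// punch-list-capture/PI-042/photos/2025-01-15T10-30-00-000Z-north-wall-crack.jpg
```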
Step 8: Build the core punch list orchestration service
PunchListService is the central orchestrator. It accepts a transcript and photos, classifies the intent, analyzes photos, extracts a structured punch item, uploads media, enqueues the item for sync, and traces the event. Create src/services/punch-list-service.ts:
ts
import type { PunchItem, PunchCategory, PunchPriority, PunchStatus } from "../lib/types.js";
import type { VoiceCaptureService } from "./voice-capture-service.js";
import type { PhotoService } from "./photo-service.js";
import type { ClassificationService } from "./classification-service.js";
import type { AIService } from "./ai-service.js";
import type { StorageService } from "./storage-service.js";
import type { SyncService } from "./sync-service.js";
import type { ObservabilityService } from "./observability-service.js";

export class PunchListService {
  private items
Expected output: The service connects the full pipeline: classify, analyze photos, extract, upload, persist, queue sync, trace. You’ll wire it into HTTP routes next.
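The pipeline order described above can be sketched with injected dependencies, which also makes the ordering easy to test with fakes. This is a minimal sketch under stated assumptions: PipelineDeps, its method names, the trimmed item shapes, and the choice to run classification and photo analysis concurrently are all hypothetical, not the tutorial’s PunchListService.

```typescript
// Hypothetical shapes; the tutorial's PunchItem carries more fields.
interface DraftItem {
  title: string;
  description: string;
}
interface SavedItem extends DraftItem {
  id: string;
  category: string;
  photoUrls: string[];
}

// Hypothetical dependency surface standing in for the real services.
interface PipelineDeps {
  newId(): string;
  classify(transcript: string): Promise<{ category: string }>;
  describePhoto(photo: Uint8Array): Promise<string>;
  extract(transcript: string, photoNotes: string, category: string): Promise<DraftItem>;
  upload(itemId: string, photo: Uint8Array, index: number): Promise<string>;
  save(item: SavedItem): void;
  enqueueSync(itemId: string): void;
  trace(event: string, data: Record<string, unknown>): void;
}

async function capturePunchItem(
  deps: PipelineDeps,
  transcript: string,
  photos: Uint8Array[],
): Promise<SavedItem> {
  // Classification uses only the transcript, so it can run alongside photo analysis.
  const [{ category }, descriptions] = await Promise.all([
    deps.classify(transcript),
    Promise.all(photos.map((photo) => deps.describePhoto(photo))),
  ]);
  // Extraction needs both the category and the photo descriptions.
  const draft = await deps.extract(transcript, descriptions.join("\n"), category);
  const id = deps.newId();
  const photoUrls = await Promise.all(photos.map((photo, i) => deps.upload(id, photo, i)));
  const item: SavedItem = { ...draft, id, category, photoUrls };
  // Persist first, then queue for sync, then record the trace event.
  deps.save(item);
  deps.enqueueSync(id);
  deps.trace("punch_item.created", { id, category, photoCount: photoUrls.length });
  return item;
}
```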
Step 9: Wire up Hono API routes
The API uses Hono to expose each service as REST endpoints. Create the route files under src/api/.
Start with src/api/punch-item-routes.ts — the main CRUD endpoint for punch items:
Build the remaining route modules (voice, photo, sync, and TTS) the same way, then create src/api/index.ts to compose all routes into a single Hono app:
ts
import { Hono } from "hono";
import { createSTTProvider } from "@reaatech/voice-agent-stt";
import { createPunchItemRoutes, type PunchListServiceShape } from "./punch-item-routes.js";
import { createVoiceRoutes, type VoiceServiceShape } from "./voice-routes.js";
import { createPhotoRoutes, type PhotoServiceShape } from "./photo-routes.js";
import { createSyncRoutes, type SyncServiceShape } from "./sync-routes.js";
import { createTtsRoutes, type TtsServiceShape } from "./tts-routes.js";

export function createApp(
  punchListService: PunchListServiceShape,
  voiceCaptureService: VoiceServiceShape,
  photoService: PhotoServiceShape,
  syncService: SyncServiceShape,
  ttsService: TtsServiceShape,
  sttProvider: ReturnType<typeof createSTTProvider>,
) {
  const app = new Hono();
  app.route("/api/voice", createVoiceRoutes(voiceCaptureService, sttProvider));
  app.route("/api/photo", createPhotoRoutes(photoService));
  app.route("/api/punch-items", createPunchItemRoutes(punchListService));
  app.route("/api/sync", createSyncRoutes(syncService));
  app.route("/api/tts", createTtsRoutes(ttsService));
  return app;
}
Expected output: The five route modules each export a factory function that takes a typed service shape and returns a Hono router. createApp mounts them all under their respective path prefixes.
Step 10: Mount the API in Next.js and build the home page
An optional catch-all route segment ([[...route]]) in the Next.js App Router forwards every API request to Hono. Create app/api/[[...route]]/route.ts:
ts
import { type NextRequest } from "next/server";
import {
  initializeSessionManager,
  createLatencyBudget,
  LatencyBudgetEnforcer,
  createCostTracker,
  createRecordingManager,
} from "@reaatech/voice-agent-core";
import { createSTTProvider } from "@reaatech/voice-agent-stt";
import { createApp } from "../../../src/api/index.js";
import { PunchListService } from "../../../src/services/punch-list-service.js";
import { VoiceCaptureService } from "../../../src/services/voice-capture-service.js";
import { PhotoService } from "../../../src/services/photo-service.js";
import { SyncService, GenericPMSyncAdapter } from "../../../src/services/sync-service.js";
import { TTSService } from "../../../src/services/tts-service.js";
import { ClassificationService } from "../../../src/services/classification-service.js";
import { AIService } from "../../../src/services/ai-service.js";
import { StorageService } from "../../../src/services/storage-service.js";
import { ObservabilityService } from "../../../src/services/observability-service.js";

// Services are created once at module scope and reused across requests.
const syncAdapter = new GenericPMSyncAdapter();
const syncService = new SyncService([syncAdapter]);
const classificationService = new ClassificationService();
const aiService = new AIService();
const storageService = new StorageService();
const observabilityService = new ObservabilityService();

// Per-stage allowances (stt + mcp + tts) add up to the 800 target; hardCap is the upper limit.
const budget = createLatencyBudget({
  stt: 200,
  mcp: 400,
  tts: 200,
  target: 800,
  hardCap: 1200,
});
const sessionManager = initializeSessionManager({
  defaultTTL: 3600,
  maxTurns: 20,
  maxTokens: 4000,
});
const latencyEnforcer = new LatencyBudgetEnforcer(budget);
const costTracker = createCostTracker({
  enabled: true,
  currency: "USD",
  providers: {
    deepgram: {
      stt: { pricePerMinute: 0.0059 },
      tts: { pricePerCharacter: 0.000015 },
    },
  },
});
const recordingManager = createRecordingManager({ enabled: true, storage: "memory" });

const sttProvider = createSTTProvider({
  provider: "deepgram",
  config: {
    provider: "deepgram",
    apiKey: process.env.DEEPGRAM_API_KEY ?? "",
    model: "nova-2",
    language: "en",
    // 8 kHz mu-law is a telephony format; it must match the audio the client actually sends.
    sampleRate: 8000,
    encoding: "mulaw",
  },
});

const voiceCaptureService = new VoiceCaptureService(
  sessionManager,
  latencyEnforcer,
  costTracker,
  recordingManager,
);
const photoService = new PhotoService();
const punchListService = new PunchListService(
  voiceCaptureService,
  photoService,
  classificationService,
  aiService,
  storageService,
  syncService,
  observabilityService,
);
const ttsService = new TTSService();

const app = createApp(
  punchListService,
  voiceCaptureService,
  photoService,
  syncService,
  ttsService,
  sttProvider,
);

// Forward every HTTP method to the Hono app.
export const GET = (req: NextRequest) => app.fetch(req);
export const POST = (req: NextRequest) => app.fetch(req);
export const PUT = (req: NextRequest) => app.fetch(req);
export const PATCH = (req: NextRequest) => app.fetch(req);
export const DELETE = (req: NextRequest) => app.fetch(req);
Expected output: The API routes are reachable at /api/voice/transcribe, /api/photo/analyze, /api/punch-items, /api/sync, and /api/tts. The catch-all handler instantiates every service once at module scope, so a running server process reuses the same instances across requests. A restart, or a fresh serverless instance, starts over with new instances and empty in-memory state.
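In development, Next.js can re-evaluate a route module after you edit it, which would construct a second set of services (and a second, empty in-memory punch-item store). A common workaround is to cache the instances on globalThis. This is a minimal sketch. getOrCreate and the __punchListServices key are hypothetical names, and it does not change production behavior, where each server or serverless instance still builds its own copies.

```typescript
// Cache factory results on globalThis so module re-evaluation (e.g. dev hot reload)
// reuses existing instances instead of creating duplicates.
const registry = globalThis as typeof globalThis & {
  __punchListServices?: Map<string, unknown>;
};

function getOrCreate<T>(key: string, factory: () => T): T {
  registry.__punchListServices ??= new Map();
  const cache = registry.__punchListServices;
  if (!cache.has(key)) cache.set(key, factory());
  return cache.get(key) as T;
}

// Each call with the same key returns the same instance; the factory runs once.
let constructed = 0;
const first = getOrCreate("syncService", () => ({ id: ++constructed }));
const second = getOrCreate("syncService", () => ({ id: ++constructed }));
console.log(first === second, constructed); // true 1
```

In route.ts this would look like getOrCreate("syncService", () => new SyncService([new GenericPMSyncAdapter()])), with one key per service.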
Now create the home page at app/page.tsx — a client component that lets you transcribe audio, analyze photos, create punch items, and view sync status:
Expected output: pnpm dev starts the server. Open http://localhost:3000 to see the punch-list capture dashboard. Type a transcript, upload a photo, and create punch items; the app classifies the intent by trade category and extracts structured data.
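On the dashboard, a selected photo has to be turned into something the API can forward. Because the PhotoService sends images to GPT-4o as base64, one option is to encode on the client. This is a sketch under that assumption. bytesToBase64 and toDataUrl are hypothetical helpers, and the tutorial’s page may instead upload the raw file with FormData.

```typescript
// Encodes raw bytes as base64 with btoa, which browsers and Node 16+ both provide.
// Works in chunks so large photos don't overflow String.fromCharCode's argument limit.
function bytesToBase64(bytes: Uint8Array): string {
  const CHUNK = 0x8000;
  let binary = "";
  for (let i = 0; i < bytes.length; i += CHUNK) {
    binary += String.fromCharCode(...bytes.subarray(i, i + CHUNK));
  }
  return btoa(binary);
}

// Wraps the payload in a data URL, a form OpenAI's vision input accepts as an image URL.
function toDataUrl(bytes: Uint8Array, mimeType: string): string {
  return `data:${mimeType};base64,${bytesToBase64(bytes)}`;
}

console.log(toDataUrl(new TextEncoder().encode("Hi"), "text/plain")); // data:text/plain;base64,SGk=
```

In the browser, new Uint8Array(await file.arrayBuffer()) gives the bytes of a selected File, and file.type supplies the MIME type.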
Step 11: Run the tests
The project includes a test suite covering all services and API routes. Tests mock external providers (Deepgram, OpenAI, ElevenLabs, Vercel Blob) so they run without live API keys:
terminal
pnpm test
Expected output: All tests pass with zero failures. The test suite validates happy paths (creating punch items, transcribing audio, analyzing photos), error paths (missing fields, not-found IDs, classification failures), and boundary cases (empty transcripts, rate-limit retries).
Next steps
Add more sync adapters — implement ISyncAdapter for Procore, Buildertrend, or your own PM software API. Register them in the SyncService constructor for multi-provider push.
Persist punch items to a database — replace the in-memory Map in PunchListService with SQLite, Postgres, or Vercel KV so items survive server restarts.
Add multi-language support — the ClassificationService.detectInputLanguage() method already detects language. Extend the STT provider config and classification prompts for Spanish, French, or Japanese job sites.
Enable real-time audio streaming — the @reaatech/voice-agent-core package supports WebSocket-based streaming. Replace the POST /transcribe endpoint with a WebSocket handler for live dictation.
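For the database step, one low-friction approach is to put storage behind a small repository interface first, so replacing the in-memory Map with Postgres, SQLite, or another store touches a single class. This is a minimal sketch. PunchItemRepository and InMemoryPunchItemRepository are hypothetical names, and StoredPunchItem is trimmed to two fields for brevity.

```typescript
// Trimmed-down item shape for the sketch; the real PunchItem has more fields.
interface StoredPunchItem {
  id: string;
  title: string;
}

// Hypothetical storage seam: PunchListService would depend on this interface
// instead of holding a Map directly.
interface PunchItemRepository {
  get(id: string): Promise<StoredPunchItem | undefined>;
  save(item: StoredPunchItem): Promise<void>;
  list(): Promise<StoredPunchItem[]>;
}

// In-memory implementation with the same lifetime as a Map (lost on restart).
// Copies on read and write so callers cannot mutate stored state by accident.
class InMemoryPunchItemRepository implements PunchItemRepository {
  private readonly items = new Map<string, StoredPunchItem>();

  async get(id: string): Promise<StoredPunchItem | undefined> {
    const item = this.items.get(id);
    return item ? { ...item } : undefined;
  }

  async save(item: StoredPunchItem): Promise<void> {
    this.items.set(item.id, { ...item });
  }

  async list(): Promise<StoredPunchItem[]> {
    return [...this.items.values()].map((item) => ({ ...item }));
  }
}
```

Because every method is already async, a database-backed class can implement the same interface without changing any caller.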