Punch-List Field Capture Agent for Superintendents

Snap photos, record voice memos, and auto-sync punch items to your PM software in real time.

voice-agent construction nextjs hono deepgram openai elevenlabs punch-list field-capture

The problem

A superintendent walks a job site with a clipboard, snapping photos and scribbling notes. Back at the trailer, they manually type punch items into Procore or Buildertrend. Items get lost, photos are mislabeled, and the owner's rep waits days for a consolidated list.

Intro

In this tutorial, you’ll build a Punch-List Field Capture Agent — a voice-powered web app that lets construction superintendents snap photos, dictate observations, and auto-sync structured punch items to project management software. The app uses @reaatech/voice-agent-stt (backed by Deepgram) for speech-to-text, GPT-4o vision via the media pipeline for photo analysis, an intent classifier to route observations to the right trade category, Instructor for structured data extraction, and ElevenLabs for text-to-speech read-back. Everything is wired through Hono routes mounted inside the Next.js App Router.

This tutorial is for TypeScript developers comfortable with Node.js 22+, Next.js App Router patterns, and cloud APIs (OpenAI, Deepgram, ElevenLabs). You’ll end up with a fully tested voice agent you can extend for your own job sites.

Prerequisites

Node.js 22+ and pnpm 10 installed
A terminal with access to an empty project directory
OpenAI API key — set as OPENAI_API_KEY (used by the AI service, media pipeline, and Instructor)
Deepgram API key — set as DEEPGRAM_API_KEY (speech-to-text transcription)
ElevenLabs API key — set as ELEVENLABS_API_KEY (text-to-speech synthesis)
Vercel Blob token — set as BLOB_READ_WRITE_TOKEN (media storage)
Langfuse credentials (LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, LANGFUSE_BASE_URL) for LLM observability (optional — the app runs without them)
A PM software API URL and key (PM_SOFTWARE_API_URL, PM_SOFTWARE_API_KEY) for the sync adapter (optional for local testing)

All env vars are documented in .env.example with placeholder values. Copy it to .env and fill in your keys.
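
The split between required and optional variables above can be checked once at startup. The following is a minimal sketch, assuming a hypothetical helper named checkEnv (it is not part of the example artifact) that takes a plain record so it can be exercised without touching process.env. The grouping follows the list above: only the Langfuse and PM software variables are treated as optional.

```typescript
// Hypothetical startup check; names here are illustrative, not from the example artifact.
const REQUIRED_KEYS = [
  "OPENAI_API_KEY",
  "DEEPGRAM_API_KEY",
  "ELEVENLABS_API_KEY",
  "BLOB_READ_WRITE_TOKEN",
] as const;

const OPTIONAL_KEYS = [
  "LANGFUSE_PUBLIC_KEY",
  "LANGFUSE_SECRET_KEY",
  "LANGFUSE_BASE_URL",
  "PM_SOFTWARE_API_URL",
  "PM_SOFTWARE_API_KEY",
] as const;

type EnvRecord = Record<string, string | undefined>;

interface EnvReport {
  missingRequired: string[];
  missingOptional: string[];
}

// Treats unset, empty, and whitespace-only values as missing.
function checkEnv(env: EnvRecord): EnvReport {
  const isMissing = (key: string): boolean => (env[key] ?? "").trim() === "";
  return {
    missingRequired: REQUIRED_KEYS.filter(isMissing),
    missingOptional: OPTIONAL_KEYS.filter(isMissing),
  };
}

// Usage: only two of the four required keys are set.
const report = checkEnv({ OPENAI_API_KEY: "sk-test", DEEPGRAM_API_KEY: "dg-test" });
// report.missingRequired → ["ELEVENLABS_API_KEY", "BLOB_READ_WRITE_TOKEN"]
```

In the app itself you would pass process.env and decide whether a non-empty missingRequired list should stop the server or only log a warning.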

Step 1: Scaffold the Next.js project and install dependencies

Example artifact

A complete, working implementation of this recipe — downloadable as a zip or browsable file by file. Generated by our build pipeline; tested with full coverage before publishing.

212 kB · 163 tests · 100.0% coverage · vitest passing

SHA-256: 806683787c954ecb1c34a136c2d442f68909153af1ccdf4ad01c67704b418723

import type { PunchItem, SyncStatus } from "../lib/types.js";
import { DEFAULT_SYNC_PROVIDER } from "../lib/constants.js";

export interface ISyncAdapter {
  readonly providerName: string;
  syncItem(item: PunchItem): Promise<{ success: boolean; externalId?: string }>;
}

export class GenericPMSyncAdapter implements ISyncAdapter {
  readonly providerName = "generic-pm";

  async syncItem(
    item: PunchItem,
  ): Promise<{ success: boolean; externalId?: string }> {
    const apiUrl = process.env.PM_SOFTWARE_API_URL ?? "";
    const apiKey = process.env.PM_SOFTWARE_API_KEY ?? "";
    const response = await fetch(apiUrl, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify(item),
    });
    if (!response.ok) {
      throw new Error(`Sync failed with status ${String(response.status)}`);
    }
    const data = (await response.json()) as { id?: string };
    return { success: true, externalId: data.id };
  }
}

export class SyncService {
  private adapters: ISyncAdapter[];
  private pendingItems: PunchItem[] = [];
  private syncedCount = 0;
  private failedCount = 0;

  constructor(adapters: ISyncAdapter[]) {
    this.adapters = adapters;
  }

  async syncPunchItem(item: PunchItem): Promise<{ success: boolean }> {
    for (const adapter of this.adapters) {
      try {
        const result = await adapter.syncItem(item);
        if (result.success) {
          item.syncedAt = new Date();
          this.syncedCount++;
          return { success: true };
        }
      } catch {
        continue;
      }
    }
    this.failedCount++;
    return { success: false };
  }

  async syncAllPending(): Promise<SyncStatus> {
    const pending = [...this.pendingItems];
    this.pendingItems = [];
    for (const item of pending) {
      const result = await this.syncPunchItem(item);
      if (!result.success) {
        this.pendingItems.push(item);
      }
    }
    return {
      lastSyncedAt: new Date(),
      pendingCount: this.pendingItems.length,
      syncedCount: this.syncedCount,
      failedCount: this.failedCount,
      provider: DEFAULT_SYNC_PROVIDER,
    };
  }

  addPendingItem(item: PunchItem): void {
    this.pendingItems.push(item);
  }

  getSyncStatus(): SyncStatus {
    return {
      lastSyncedAt: null,
      pendingCount: this.pendingItems.length,
      syncedCount: this.syncedCount,
      failedCount: this.failedCount,
      provider: DEFAULT_SYNC_PROVIDER,
    };
  }
}
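
Because the PM software credentials are optional for local testing, it can help to have an adapter that never touches the network. The sketch below is one way to do that, not the artifact's own code: InMemorySyncAdapter, PunchItemLike, SyncAdapterLike, and syncWithFallback are hypothetical names, and the trimmed types stand in for the real ones in ../lib/types.js. It mirrors the adapter contract and the try-each-adapter-in-order behavior shown in the sync service.

```typescript
// Trimmed stand-ins so the sketch compiles on its own; the real types live in ../lib/types.js.
interface PunchItemLike {
  id: string;
  title: string;
}

interface SyncAdapterLike {
  readonly providerName: string;
  syncItem(item: PunchItemLike): Promise<{ success: boolean; externalId?: string }>;
}

// Hypothetical adapter for local runs: records items in memory instead of calling a PM API.
class InMemorySyncAdapter implements SyncAdapterLike {
  readonly providerName = "in-memory";
  readonly received: PunchItemLike[] = [];

  async syncItem(item: PunchItemLike): Promise<{ success: boolean; externalId?: string }> {
    this.received.push(item);
    return { success: true, externalId: `local-${String(this.received.length)}` };
  }
}

// Same fallback idea as syncPunchItem: try each adapter in order, first success wins.
async function syncWithFallback(
  adapters: SyncAdapterLike[],
  item: PunchItemLike,
): Promise<{ success: boolean; provider?: string }> {
  for (const adapter of adapters) {
    try {
      const result = await adapter.syncItem(item);
      if (result.success) {
        return { success: true, provider: adapter.providerName };
      }
    } catch {
      // A throwing adapter is skipped so the next one gets a chance.
    }
  }
  return { success: false };
}
```

Placing an adapter like this after the real one in the adapter list would keep items flowing during local development even when the PM endpoint is unreachable; whether that is desirable outside of development is a design decision for your own project.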

import Langfuse from "langfuse";
import { initializeObservability } from "@reaatech/voice-agent-core";
import type { PunchItem } from "../lib/types.js";

export class ObservabilityService {
  private langfuse: Langfuse;
  private otel: ReturnType<typeof initializeObservability>;

  constructor() {
    this.langfuse = new Langfuse({
      publicKey: process.env.LANGFUSE_PUBLIC_KEY ?? "",
      secretKey: process.env.LANGFUSE_SECRET_KEY ?? "",
      baseUrl: process.env.LANGFUSE_BASE_URL ?? "https://cloud.langfuse.com",
    });
    this.otel = initializeObservability({
      serviceName: "punch-list-field-capture",
      serviceVersion: "1.0.0",
      enabled: true,
      otlpEndpoint: process.env.OTLP_ENDPOINT ?? "http://localhost:4318/v1/traces",
    });
  }

  tracePunchItemCreation(item: PunchItem): void {
    try {
      this.langfuse.trace({
        name: "punch-item-creation",
        input: { itemId: item.id, jobSiteId: item.jobSiteId, category: item.category },
        output: { title: item.title, status: item.status },
      });
    } catch {
      // Langfuse errors are non-critical
    }
  }

  traceSyncEvent(
    punchItemId: string,
    provider: string,
    success: boolean,
    durationMs: number,
  ): void {
    try {
      this.langfuse.trace({
        name: "sync-event",
        input: { punchItemId, provider },
        output: { success, durationMs },
      });
    } catch {
      // Langfuse errors are non-critical
    }
  }

  traceAudioProcessing(sessionId: string, durationMs: number): void {
    try {
      this.langfuse.trace({
        name: "audio-processing",
        input: { sessionId },
        output: { durationMs },
      });
    } catch {
      // Langfuse errors are non-critical
    }
  }

  traceClassification(input: string, output: string, latencyMs: number): void {
    try {
      this.langfuse.trace({
        name: "classification",
        input: { input },
        output: { output, latencyMs },
      });
    } catch {
      // Langfuse errors are non-critical
    }
  }
}
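
Each trace method in the observability service repeats the same try/catch so that a tracing failure never interrupts punch-item handling. The pattern can be sketched in isolation as below. This is an illustration under stated assumptions, not the artifact's code: safely and TraceSink are hypothetical names, and TraceSink is a stand-in interface where a real Langfuse client would normally go.

```typescript
// Hypothetical helper: runs a tracing callback and reports whether it completed,
// so a tracing failure never propagates to the caller.
function safely(trace: () => void): boolean {
  try {
    trace();
    return true;
  } catch {
    return false;
  }
}

// Stand-in for a tracing client; a real client instance would take its place.
interface TraceSink {
  trace(event: { name: string; input: unknown; output: unknown }): void;
}

// Same event shape as the sync-event trace, with the try/catch factored out.
function traceSyncEvent(
  sink: TraceSink,
  punchItemId: string,
  provider: string,
  success: boolean,
  durationMs: number,
): boolean {
  return safely(() =>
    sink.trace({
      name: "sync-event",
      input: { punchItemId, provider },
      output: { success, durationMs },
    }),
  );
}
```

Returning a boolean is a choice made for this sketch so the behavior is easy to test; the methods in the service return void.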

import { Hono } from "hono";
import type { PunchStatus, PunchCategory, PunchPriority, PunchItem } from "../lib/types.js";

export type PunchListServiceShape = {
  createPunchItem(input: {
    transcript: string;
    photos: Buffer[];
    jobSiteId: string;
  }): Promise<PunchItem>;
  listPunchItems(
    jobSiteId: string,
    filters?: { status?: PunchStatus; category?: PunchCategory; priority?: PunchPriority },
  ): Promise<PunchItem[]>;
  getPunchItem(id: string): Promise<PunchItem | undefined>;
  updatePunchItem(id: string, updates: Partial<PunchItem>): Promise<PunchItem>;
  deletePunchItem(id: string): Promise<boolean>;
};

export function createPunchItemRoutes(punchListService: PunchListServiceShape) {
  const app = new Hono();

  app.post("/", async (c) => {
    const body = await c.req.json<{
      transcript?: string;
      photos?: string[];
      jobSiteId: string;
    }>();
    const { transcript, photos, jobSiteId } = body;
    if (!jobSiteId) {
      return c.json({ error: "Missing jobSiteId" }, 400);
    }
    const photoBuffers: Buffer[] = (photos ?? []).map(
      (p: string) => Buffer.from(p, "base64"),
    );
    const item = await punchListService.createPunchItem({
      transcript: transcript ?? "",
      photos: photoBuffers,
      jobSiteId,
    });
    return c.json(item, 201);
  });

  app.get("/", async (c) => {
    const jobSiteId = c.req.query("jobSiteId") ?? "";
    const status = c.req.query("status") as PunchStatus | undefined;
    const category = c.req.query("category") as PunchCategory | undefined;
    const priority = c.req.query("priority") as PunchPriority | undefined;
    const filters: {
      status?: PunchStatus;
      category?: PunchCategory;
      priority?: PunchPriority;
    } = {};
    if (status) filters.status = status;
    if (category) filters.category = category;
    if (priority) filters.priority = priority;
    const items = await punchListService.listPunchItems(jobSiteId, filters);
    return c.json(items);
  });

  app.get("/:id", async (c) => {
    const id = c.req.param("id");
    const item = await punchListService.getPunchItem(id);
    if (!item) {
      return c.json({ error: "Not found" }, 404);
    }
    return c.json(item);
  });

  app.patch("/:id", async (c) => {
    const id = c.req.param("id");
    const updates = await c.req.json<Partial<PunchItem>>();
    const item = await punchListService.updatePunchItem(id, updates);
    return c.json(item);
  });

  app.delete("/:id", async (c) => {
    const id = c.req.param("id");
    const deleted = await punchListService.deletePunchItem(id);
    if (!deleted) {
      return c.json({ error: "Not found" }, 404);
    }
    return c.body(null, 204);
  });

  return app;
}

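The list route above casts the status, category, and priority query strings straight to their union types, so an unexpected value would pass through unchecked. One possible hardening step is a small guard that only returns values from an allowed list. This is a sketch, not the artifact's code: parseEnum is a hypothetical helper, and the STATUS_VALUES list is a placeholder because the real PunchStatus union is defined in ../lib/types.js and may differ.

```typescript
// Placeholder value list: the real PunchStatus union lives in ../lib/types.js and may differ.
const STATUS_VALUES = ["open", "in_progress", "closed"] as const;
type StatusLike = (typeof STATUS_VALUES)[number];

// Generic guard: returns the raw value only if it is one of the allowed literals.
function parseEnum<T extends string>(
  allowed: readonly T[],
  raw: string | undefined,
): T | undefined {
  return allowed.find((value) => value === raw);
}

// Usage with values as they would arrive from a query string.
const valid: StatusLike | undefined = parseEnum(STATUS_VALUES, "open"); // "open"
const invalid: StatusLike | undefined = parseEnum(STATUS_VALUES, "bogus"); // undefined
const absent: StatusLike | undefined = parseEnum(STATUS_VALUES, undefined); // undefined
```

Inside the handler, the result of a guard like this could replace the cast, leaving a filter unset whenever the query value is not recognized.
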
import { type NextRequest } from "next/server";
import {
  initializeSessionManager,
  createLatencyBudget,
  LatencyBudgetEnforcer,
  createCostTracker,
  createRecordingManager,
} from "@reaatech/voice-agent-core";
import { createSTTProvider } from "@reaatech/voice-agent-stt";
import { createApp } from "../../../src/api/index.js";
import { PunchListService } from "../../../src/services/punch-list-service.js";
import { VoiceCaptureService } from "../../../src/services/voice-capture-service.js";
import { PhotoService } from "../../../src/services/photo-service.js";
import { SyncService, GenericPMSyncAdapter } from "../../../src/services/sync-service.js";
import { TTSService } from "../../../src/services/tts-service.js";
import { ClassificationService } from "../../../src/services/classification-service.js";
import { AIService } from "../../../src/services/ai-service.js";
import { StorageService } from "../../../src/services/storage-service.js";
import { ObservabilityService } from "../../../src/services/observability-service.js";

const syncAdapter = new GenericPMSyncAdapter();
const syncService = new SyncService([syncAdapter]);
const classificationService = new ClassificationService();
const aiService = new AIService();
const storageService = new StorageService();
const observabilityService = new ObservabilityService();

const budget = createLatencyBudget({
  stt: 200,
  mcp: 400,
  tts: 200,
  target: 800,
  hardCap: 1200,
});

const sessionManager = initializeSessionManager({
  defaultTTL: 3600,
  maxTurns: 20,
  maxTokens: 4000,
});

const latencyEnforcer = new LatencyBudgetEnforcer(budget);

const costTracker = createCostTracker({
  enabled: true,
  currency: "USD",
  providers: {
    deepgram: {
      stt: { pricePerMinute: 0.0059 },
      tts: { pricePerCharacter: 0.000015 },
    },
  },
});

const recordingManager = createRecordingManager({ enabled: true, storage: "memory" });

const sttProvider = createSTTProvider({
  provider: "deepgram",
  config: {
    provider: "deepgram",
    apiKey: process.env.DEEPGRAM_API_KEY ?? "",
    model: "nova-2",
    language: "en",
    sampleRate: 8000,
    encoding: "mulaw",
  },
});

const voiceCaptureService = new VoiceCaptureService(
  sessionManager,
  latencyEnforcer,
  costTracker,
  recordingManager,
);
const photoService = new PhotoService();
const punchListService = new PunchListService(
  voiceCaptureService,
  photoService,
  classificationService,
  aiService,
  storageService,
  syncService,
  observabilityService,
);
const ttsService = new TTSService();

const app = createApp(
  punchListService,
  voiceCaptureService,
  photoService,
  syncService,
  ttsService,
  sttProvider,
);

export const GET = (req: NextRequest) => app.fetch(req);
export const POST = (req: NextRequest) => app.fetch(req);
export const PUT = (req: NextRequest) => app.fetch(req);
export const PATCH = (req: NextRequest) => app.fetch(req);
export const DELETE = (req: NextRequest) => app.fetch(req);

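The latency budget passed to createLatencyBudget gives each stage its own allowance (stt 200, mcp 400, tts 200) that adds up to the 800 target, with 1200 as the hard cap. The values are presumably milliseconds, although the source does not state the unit. The arithmetic can be sketched with a plain function, as below. This illustrates the budget numbers only: classifyTurn and its types are hypothetical, and nothing here describes how LatencyBudgetEnforcer works internally.

```typescript
// Hypothetical shapes for illustration; field names mirror the budget object above.
interface Budget {
  stt: number;
  mcp: number;
  tts: number;
  target: number;
  hardCap: number;
}

type Verdict = "within-target" | "over-target" | "over-hard-cap";

interface TurnReport {
  totalMs: number;
  verdict: Verdict;
  overStages: string[];
}

const STAGES = ["stt", "mcp", "tts"] as const;

// Sums the measured stage timings and compares them with the target and the hard cap.
function classifyTurn(
  budget: Budget,
  measured: { stt: number; mcp: number; tts: number },
): TurnReport {
  const totalMs = STAGES.reduce((sum, stage) => sum + measured[stage], 0);
  const overStages = STAGES.filter((stage) => measured[stage] > budget[stage]);
  const verdict: Verdict =
    totalMs > budget.hardCap
      ? "over-hard-cap"
      : totalMs > budget.target
        ? "over-target"
        : "within-target";
  return { totalMs, verdict, overStages };
}

const budget: Budget = { stt: 200, mcp: 400, tts: 200, target: 800, hardCap: 1200 };

// 180 + 520 + 190 = 890: above the 800 target, below the 1200 hard cap, and only mcp is over.
const turn = classifyTurn(budget, { stt: 180, mcp: 520, tts: 190 });
// turn → { totalMs: 890, verdict: "over-target", overStages: ["mcp"] }
```

A per-stage breakdown like overStages makes it easier to see which provider to tune when a turn misses the target.
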
Intro

Prerequisites

Step 1: Scaffold the Next.js project and install dependencies

Step 2: Define domain types, validation schemas, and constants

Step 3: Build the voice capture service

Step 4: Build the photo analysis pipeline

Step 5: Build the classification and AI extraction services

Step 6: Build the sync service with a PM adapter

Step 7: Build the storage and observability services

Step 8: Build the core punch list orchestration service

Step 9: Wire up Hono API routes

Step 10: Mount the API in Next.js and build the home page

Step 11: Run the tests

Next steps