A superintendent on a jobsite finds defects but has to photograph them, write notes by hand, and later transcribe everything into project management software. Items get lost or delayed, causing rework and owner frustration. The superintendent needs a mobile-first agent that ingests photos and voice memos, extracts actionable items, and syncs them to the PM system with status tracking.
This tutorial walks you through building a Field Punch-List Agent for general contractor superintendents. You’ll create a Next.js 16 + Hono voice-agent that accepts audio recordings and photos from a jobsite, transcribes them with Deepgram, extracts actionable punch items using an LLM, runs them through content guardrails, stores them in agent memory, and syncs them to an external project management tool via webhook. By the end, you’ll have a fully tested API that a mobile field app can call to capture and track construction defects in real time.
Prerequisites
Node.js 22+ and pnpm 10 installed on your machine
A Deepgram API key for speech-to-text transcription (Nova-2 model)
An OpenAI API key for punch-item extraction via generateText from the Vercel AI SDK
A Langfuse account (free tier works) for observability tracing
A webhook URL from your project management tool (or a placeholder for testing)
Familiarity with TypeScript, Next.js App Router, and basic Express/Hono patterns
Step 1: Scaffold the Next.js project and install dependencies
Start with a fresh Next.js 16 project using the App Router. Create the directory and initialize it:
Expected output: Your project root has the Next.js scaffold plus all deps installed. The src/ and tests/ directories are ready for the source files you’ll create in the next steps.
Step 2: Configure environment variables with Zod validation
Create src/config.ts to read every process.env.* value through a Zod schema. This gives you typed, validated configuration with sensible defaults:
Expected output: Importing config anywhere gives you a typed object where every field is either the env var value or its default. If DEEPGRAM_API_KEY is missing, Zod throws at import time — you catch misconfiguration immediately.
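To make the fail-fast behavior concrete, here is a dependency-free sketch of the same idea. It is not the tutorial's src/config.ts, which uses Zod. The names loadConfig and requireVar, the choice of which variables are required, and the LANGFUSE_HOST default are all assumptions made for illustration.

```typescript
// Illustrative sketch only: a hand-rolled stand-in for the Zod schema in src/config.ts.
// Which variables are required, and the default below, are assumptions.
type Env = Record<string, string | undefined>;

interface AppConfig {
  DEEPGRAM_API_KEY: string;
  OPENAI_API_KEY: string;
  LANGFUSE_HOST: string;
}

function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    // Fail fast, mirroring a schema that throws at import time.
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

function loadConfig(env: Env): AppConfig {
  return {
    DEEPGRAM_API_KEY: requireVar(env, "DEEPGRAM_API_KEY"),
    OPENAI_API_KEY: requireVar(env, "OPENAI_API_KEY"),
    // Optional variable with an assumed default.
    LANGFUSE_HOST: env.LANGFUSE_HOST ?? "https://cloud.langfuse.com",
  };
}
```

In a real module you would call loadConfig(process.env) once at import time and export the result, so a missing key stops the server before it accepts traffic.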
Now create .env.example with all required environment variables so the next developer knows what to configure:
Step 3: Define the error classes
Create src/lib/errors.ts with an AppError base class and the subclasses the services import (ValidationError, TranscriptionError, PipelineError, AuthError, WebhookSyncError):
Expected output: The AppError base class and its subclasses let every service throw errors that carry the right HTTP status code and a machine-readable code string — route handlers map these directly to JSON responses.
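An error hierarchy of the kind described above might look like the following sketch. The class names match the imports used by the services, but the specific status codes, the code strings, and the toErrorResponse helper are assumptions, not the recipe's actual values.

```typescript
// Sketch of an AppError hierarchy. Status codes and code strings are assumed.
class AppError extends Error {
  readonly statusCode: number;
  readonly code: string;

  constructor(message: string, statusCode: number, code: string) {
    super(message);
    this.name = new.target.name;
    this.statusCode = statusCode;
    this.code = code;
  }
}

class ValidationError extends AppError {
  constructor(message: string) {
    super(message, 400, "VALIDATION_ERROR");
  }
}

class AuthError extends AppError {
  constructor(message: string) {
    super(message, 401, "AUTH_ERROR");
  }
}

class TranscriptionError extends AppError {
  constructor(message: string) {
    super(message, 502, "TRANSCRIPTION_ERROR");
  }
}

// Hypothetical helper: map any thrown value to a JSON-ready response shape.
function toErrorResponse(err: unknown): {
  status: number;
  body: { error: string; code: string };
} {
  if (err instanceof AppError) {
    return { status: err.statusCode, body: { error: err.message, code: err.code } };
  }
  // Unknown errors are reported generically so internals are not leaked.
  return { status: 500, body: { error: "Internal server error", code: "INTERNAL_ERROR" } };
}
```

A route handler can then wrap its body in try/catch and return the mapped status and body, which keeps error formatting in one place.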
Step 4: Create the Deepgram transcription service
This service wraps @reaatech/media-pipeline-mcp-deepgram to provide speech-to-text and diarization. Create src/services/transcription-service.ts:
ts
import { DeepgramProvider } from "@reaatech/media-pipeline-mcp-deepgram";
import { ValidationError, TranscriptionError } from "../lib/errors.js";

let provider: DeepgramProvider | null = null;

export function createTranscriptionProvider(): DeepgramProvider {
  const apiKey = process.env.DEEPGRAM_API_KEY;
  if (!apiKey) {
    throw new ValidationError("DEEPGRAM_API_KEY is required");
  }
  provider = new DeepgramProvider({ apiKey, models: { stt: "nova-2" } });
  return provider;
}

export function getTranscriptionProvider(): DeepgramProvider {
  if (!provider) {
    return createTranscriptionProvider();
  }
  return provider;
}

interface DeepgramSttParsed {
  transcript?: string;
  confidence?: number;
  segments?: unknown[];
  error?: string;
}

interface TranscriptResult {
  transcript: string;
  confidence: number;
  segments: unknown[];
}

export async function transcribeAudio(
  audioBuffer: Buffer,
  language = "en",
): Promise<TranscriptResult> {
  if (audioBuffer.length === 0) {
    throw new ValidationError("Audio buffer is empty");
  }
  const prov = getTranscriptionProvider();
  const result = await prov.execute({
    operation: "audio.stt",
    params: { audio_data: audioBuffer, language, diarize: true },
    config: {},
  });
  if (!Buffer.isBuffer(result.data)) {
    throw new TranscriptionError("Expected Buffer response from Deepgram");
  }
  const parsed = JSON.parse(result.data.toString()) as DeepgramSttParsed;
  if (parsed.error) {
    throw new TranscriptionError(`Deepgram API error: ${parsed.error}`);
  }
  return {
    transcript: parsed.transcript ?? "",
    confidence: parsed.confidence ?? 0,
    segments: parsed.segments ?? [],
  };
}

interface DeepgramDiarizationParsed {
  speakers?: number;
  segments?: Array<{
    speaker: number;
    text: string;
    start: number;
    end: number;
    confidence: number;
  }>;
}

interface DiarizationResult {
  speakers: number;
  segments: Array<{
    speaker: number;
    text: string;
    start: number;
    end: number;
    confidence: number;
  }>;
}

export async function transcribeWithDiarization(
  audioBuffer: Buffer,
): Promise<DiarizationResult> {
  const prov = getTranscriptionProvider();
  const result = await prov.execute({
    operation: "audio.diarize",
    params: { audio_data: audioBuffer, language: "en" },
    config: {},
  });
  if (!Buffer.isBuffer(result.data)) {
    throw new TranscriptionError("Expected Buffer response from Deepgram");
  }
  const parsed = JSON.parse(result.data.toString()) as DeepgramDiarizationParsed;
  return {
    speakers: parsed.speakers ?? 0,
    segments: parsed.segments ?? [],
  };
}

export async function checkProviderHealth(): Promise<{ healthy: boolean; latency: number }> {
  const prov = getTranscriptionProvider();
  const health = await prov.healthCheck();
  return { healthy: health.healthy, latency: health.latency ?? 0 };
}
Expected output: transcribeAudio(buffer) sends a WAV/MP3 buffer to Deepgram’s Nova-2 model and returns the transcript with a confidence score. Empty buffers throw ValidationError. API errors throw TranscriptionError.
Step 5: Build the LLM-powered punch-list extractor
This service uses Vercel AI SDK’s generateText with structured output to extract defect items from transcripts and photo descriptions. Create src/services/punch-list-extractor.ts:
ts
import { generateText, Output } from "ai";
import { z } from "zod";
import { PipelineError } from "../lib/errors.js";
import type { PunchItem } from "../types.js";
import { tracePunchItemExtraction } from "./observability-service.js";

const PunchItemSchema = z.object({
  title: z.string().min(3),
  description: z.string().min(5),
  location: z.string().optional(),
  severity: z.enum(["low", "medium",
Then add the photo description variant and severity classifier at the bottom of the same file:
Expected output: Passing a transcript like “Crack in foundation wall at northeast corner” returns an array of structured PunchItem objects with severity classifications. Empty input returns { items: [], summary: "" } without calling the LLM. Malformed items in the model output are logged and filtered out rather than crashing.
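The two defensive behaviors described above (skipping the model call on empty input and dropping malformed items) can be sketched without any dependencies. This is not the recipe's extractor: isValidItem is a hand-written stand-in for PunchItemSchema.safeParse, callModel is a hypothetical synchronous placeholder for the generateText call, and the summary string is invented for the example.

```typescript
// Sketch: short-circuit on empty input and filter malformed items instead of throwing.
type Severity = "low" | "medium" | "high" | "critical";

interface ExtractedItem {
  title: string;
  description: string;
  location?: string;
  severity: Severity;
}

const SEVERITIES: readonly string[] = ["low", "medium", "high", "critical"];

// Mirrors the schema constraints shown earlier: title >= 3 chars, description >= 5 chars.
function isValidItem(value: unknown): value is ExtractedItem {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.title === "string" &&
    v.title.length >= 3 &&
    typeof v.description === "string" &&
    v.description.length >= 5 &&
    (v.location === undefined || typeof v.location === "string") &&
    typeof v.severity === "string" &&
    SEVERITIES.includes(v.severity)
  );
}

function extractItems(
  transcript: string,
  callModel: (text: string) => unknown[], // hypothetical stand-in for the LLM call
): { items: ExtractedItem[]; summary: string; rejected: number } {
  // Empty input never reaches the model.
  if (transcript.trim() === "") return { items: [], summary: "", rejected: 0 };

  const raw = callModel(transcript);
  const items = raw.filter(isValidItem);
  // The caller can log the rejected count rather than crash on bad model output.
  return {
    items,
    summary: `${items.length} item(s) extracted`,
    rejected: raw.length - items.length,
  };
}
```

Filtering per item, rather than validating the whole array at once, means one malformed entry does not discard the valid items captured in the same memo.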
Step 6: Set up the agent memory service
Create src/services/memory-service.ts to store and retrieve punch items via @reaatech/agent-memory. The memory service uses OpenAI embeddings for semantic retrieval:
ts
import { AgentMemory, MemoryType, OpenAILLMProvider } from "@reaatech/agent-memory";
import { config } from "../config.js";
import type { PunchItem } from "../types.js";
import type { ConversationTurn } from "@reaatech/agent-memory-core";

let activeMemory: AgentMemory | null = null;

export function createMemoryService(): AgentMemory {
  const memory = new AgentMemory({
    storage: { provider: "memory" },
    embedding: {
      provider: "openai",
      model: "text-embedding-3-small",
      apiKey: config.OPENAI_API_KEY,
    },
    extraction: {
      llmProvider: new OpenAILLMProvider({
        apiKey: config.OPENAI_API_KEY,
        model: "gpt-4o-mini",
      }),
      enabledTypes: [MemoryType.FACT, MemoryType.PREFERENCE],
      batchSize: 10,
      confidenceThreshold: 0.7,
    },
  });
  memory.events.on("memory:stored", () => {
    // logged via Langfuse observability
  });
  activeMemory = memory;
  return memory;
}

export async function storePunchItemMemory(sessionId: string, punchItem: PunchItem): Promise<void> {
  if (!activeMemory) throw new Error("Memory service not initialized");
  const conversation: ConversationTurn[] = [
    {
      speaker: "user",
      content: punchItem.description,
      timestamp: new Date(),
    },
  ];
  await activeMemory.extractAndStore(conversation);
}

export async function retrieveContextForProject(projectId: string, query: string): Promise<unknown[]> {
  if (!activeMemory) throw new Error("Memory service not initialized");
  return activeMemory.retrieve(query, { limit: 5 });
}

export async function runMemoryMaintenance(): Promise<void> {
  if (!activeMemory) throw new Error("Memory service not initialized");
  await activeMemory.runMaintenance();
}

export async function closeMemoryService(): Promise<void> {
  if (!activeMemory) throw new Error("Memory service not initialized");
  await activeMemory.close();
  activeMemory = null;
}
Expected output: createMemoryService() instantiates an in-memory agent memory backed by OpenAI embeddings. Calling storePunchItemMemory stores each punch item description as a conversation turn; retrieveContextForProject returns up to five semantically similar past items. As written, the sessionId and projectId parameters are accepted but not used to scope storage or retrieval.
Step 7: Implement content guardrails and API auth
Create src/services/guardrail-service.ts to validate transcript content and extracted punch items before they reach the PM tool:
Now create src/services/auth-service.ts to authenticate incoming API requests using @reaatech/a2a-reference-auth:
ts
import { ApiKeyStrategy, NoneStrategy, extractScopes } from "@reaatech/a2a-reference-auth";
import type { MiddlewareHandler } from "hono";
import { config } from "../config.js";
import { AuthError } from "../lib/errors.js";

export function createAuthStrategy(): ApiKeyStrategy | NoneStrategy {
  if (process.env.NODE_ENV === "development") {
    return new NoneStrategy();
  }
  return new ApiKeyStrategy({ keys: new Set([config.PUNCH_LIST_AUTH_API_KEY]) });
}

// Import referenced for scope extraction capability
void extractScopes;

export async function authenticateRequest(
  headers: Record<string, string>,
): Promise<{ authenticated: boolean; identity: string }> {
  const authHeader = headers.authorization;
  if (!authHeader || !authHeader.startsWith("Bearer ")) {
    throw new AuthError("Missing or invalid API key");
  }
  // Strip the "Bearer " prefix (7 characters) to get the raw key.
  const token = authHeader.slice(7);
  const strategy = createAuthStrategy();
  const result = await strategy.authenticate({ headers: { "x-api-key": token } });
  if (!result.authenticated) {
    throw new AuthError(result.reason ?? "Missing or invalid API key");
  }
  return { authenticated: true, identity: result.principal ?? "unknown" };
}

export const authMiddleware: MiddlewareHandler = async (c, next) => {
  try {
    const headers: Record<string, string> = {};
    for (const [key, value] of Object.entries(c.req.header())) {
      if (typeof value === "string") {
        headers[key] = value;
      }
    }
    const result = await authenticateRequest(headers);
    c.set("identity", result.identity);
    await next();
  } catch {
    return c.json({ error: "unauthorized" }, 401);
  }
};
Expected output: Guardrails run every transcript and extracted item through a budget-controlled chain that fails open on timeout and fails closed on validation errors. The auth service validates Bearer tokens against the configured API key. In development mode it uses NoneStrategy, which allows any request through the strategy check, although authenticateRequest still rejects requests that lack a Bearer Authorization header.
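The fail-open versus fail-closed policy can be illustrated with a small sketch. The two error classes below are local stand-ins for the ones exported by @reaatech/guardrail-chain, defined here only so the example runs on its own. The decisionForError and runGuarded helpers, and the choice to rethrow unexpected errors, are assumptions rather than the recipe's guardrail-service code.

```typescript
// Local stand-ins for the library's error classes (assumption: same names, no extra fields).
class TimeoutError extends Error {}
class ValidationError extends Error {}

interface GuardrailDecision {
  allowed: boolean;
  failure?: string;
}

// Decide what to do when the guardrail chain itself throws.
function decisionForError(err: unknown): GuardrailDecision {
  if (err instanceof TimeoutError) {
    // Fail open: a slow guardrail lets the content through.
    return { allowed: true };
  }
  if (err instanceof ValidationError) {
    // Fail closed: content that fails validation is blocked, with the reason attached.
    return { allowed: false, failure: err.message };
  }
  // Anything unexpected is rethrown (an assumption; other policies are possible).
  throw err;
}

// Wrap any async check so thrown errors are converted into decisions.
async function runGuarded(
  check: () => Promise<GuardrailDecision>,
): Promise<GuardrailDecision> {
  try {
    return await check();
  } catch (err) {
    return decisionForError(err);
  }
}
```

Keeping the policy in one pure function makes it easy to unit test each branch without mocking the chain, which is the same shape the guardrail tests in Step 14 exercise.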
Step 8: Build the webhook sync service with retry and dedup
Create src/services/webhook-sync-service.ts to sync punch items to your PM tool. It includes exponential backoff retry, HMAC signature verification, and in-flight deduplication:
ts
import { HMACSignatureValidator } from "@reaatech/webhook-relay-webhooks";
import { config } from "../config.js";
import { ValidationError, WebhookSyncError } from "../lib/errors.js";
import type { PunchItem, PmSyncResult } from "../types.js";

const dedupSet = new Set<string>();
const validator = new HMACSignatureValidator();

export async function syncPunchItemToPm(punchItem: PunchItem): Promise<PmSyncResult> {
  if (!config.PUNCH_LIST_PM_WEBHOOK_URL) {
    throw new ValidationError(
Expected output: syncPunchItemToPm POSTs the item as JSON to the configured webhook URL. On failure, syncWithRetry retries with 1s/2s/4s exponential backoff (capped at 30s). Duplicate webhook callbacks are silently skipped.
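The retry schedule and deduplication described above can be sketched as follows. The delays match the stated 1s/2s/4s progression with a 30s cap; the function names, the injected sleep parameter, and the use of a plain Set keyed by ID are illustrative assumptions, not the recipe's syncWithRetry implementation.

```typescript
// Sketch of exponential backoff with a cap, plus in-memory deduplication.
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 30000;

// attempt 0 -> 1000 ms, attempt 1 -> 2000 ms, attempt 2 -> 4000 ms, capped at 30000 ms.
function backoffDelayMs(attempt: number): number {
  return Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
}

// Runs the operation up to maxRetries + 1 times, sleeping between failed attempts.
async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  maxRetries: number,
  sleep: (ms: number) => Promise<void>,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (attempt < maxRetries) {
        await sleep(backoffDelayMs(attempt));
      }
    }
  }
  throw lastError;
}

// Returns true the first time an ID is seen and false for every repeat.
const seenIds = new Set<string>();
function markIfNew(id: string): boolean {
  if (seenIds.has(id)) return false;
  seenIds.add(id);
  return true;
}
```

Passing sleep as a parameter keeps the retry loop testable without real timers; in production it could be `(ms) => new Promise((resolve) => setTimeout(resolve, ms))`. Note that a process-local Set forgets its contents on restart, so it only guards against duplicates within one server lifetime.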
Step 9: Wire the voice pipeline
Create src/services/voice-pipeline.ts as the orchestration layer that connects @reaatech/voice-agent-core with your custom services:
ts
import {
  createPipeline,
  createLatencyBudget,
  initializeSessionManager,
  defineConfig,
  LatencyBudgetEnforcer,
  createRecordingManager,
  initializeObservability,
  MockSTTProvider,
  MockTTSProvider,
  MockMCPClient,
  type AudioChunk,
  type BargeInConfig,
} from "@reaatech/voice-agent-core";
import { extractPunchItemsFromTranscript } from "./punch-list-extractor.js";

let currentPipeline: ReturnType<typeof createPipeline> | null = null;

export function createVoiceSessionManager() {
  return initializeSessionManager({
    defaultTTL: 3600,
    maxTurns:
Expected output: createVoicePipeline sets up a complete voice pipeline with a 2-second latency budget, 50-turn session history, and Deepgram STT/TTS providers. The pipeline fires punch-item extraction on every stt:final event automatically.
Step 10: Create the observability service
Create src/services/observability-service.ts to track extraction costs and performance with Langfuse:
ts
import Langfuse from "langfuse";
import { config } from "../config.js";
import { DeepgramPricingProvider } from "../lib/pricing-provider.js";
import type { PunchListExtraction } from "../types.js";

let langfuseInstance: Langfuse | null = null;

export function getLangfuse(): Langfuse {
  if (!langfuseInstance) {
    langfuseInstance = new Langfuse({
      publicKey: config.LANGFUSE_PUBLIC_KEY,
      secretKey: config.LANGFUSE_SECRET_KEY,
      baseUrl: config.LANGFUSE_HOST,
    });
  }
  return langfuseInstance;
}

const pricingProvider = new DeepgramPricingProvider();

export function tracePunchItemExtraction(
  transcript: string,
  result: PunchListExtraction,
  usage: { inputTokens: number; outputTokens: number },
): void {
  const cost = pricingProvider.estimateCost({
    model: "nova-2-stt",
    inputTokens: usage.inputTokens,
    outputTokens: usage.outputTokens,
  });
  const trace = getLangfuse().trace({
    name: "punch-item-extraction",
    metadata: {
      recipe: "punch-list-capture",
      itemCount: result.items.length,
      inputTokens: usage.inputTokens,
      outputTokens: usage.outputTokens,
      estimatedCost: cost,
    },
  });
  trace.span({
    name: "extract-punch-items",
    input: transcript,
    output: result,
  });
}
You’ll also need the pricing provider it depends on. Create src/lib/pricing-provider.ts:
Expected output: Every punch-item extraction creates a Langfuse trace with a span containing the transcript, extracted items, token counts, and estimated cost.
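The cost figure attached to each trace comes from an estimateCost call that takes a model name and token counts. A minimal sketch of that arithmetic is shown below. SketchPricingProvider is a hypothetical class, not the recipe's DeepgramPricingProvider, and it takes its rate table as a constructor argument because no real prices are assumed here.

```typescript
// Sketch of a token-based cost estimator. Rates are supplied by the caller;
// nothing in this block encodes a real provider price.
interface Usage {
  model: string;
  inputTokens: number;
  outputTokens: number;
}

interface Rate {
  inputPerMillion: number; // currency units per 1,000,000 input tokens
  outputPerMillion: number; // currency units per 1,000,000 output tokens
}

class SketchPricingProvider {
  private readonly rates: Map<string, Rate>;

  constructor(rates: Record<string, Rate>) {
    this.rates = new Map(Object.entries(rates));
  }

  estimateCost(usage: Usage): number {
    const rate = this.rates.get(usage.model);
    // Unknown model: report zero rather than guess at a price.
    if (!rate) return 0;
    return (
      (usage.inputTokens / 1000000) * rate.inputPerMillion +
      (usage.outputTokens / 1000000) * rate.outputPerMillion
    );
  }
}
```

For example, with a placeholder rate of 2 per million input tokens and 4 per million output tokens, 1,000,000 input tokens and 500,000 output tokens cost 2 + 2 = 4. Returning 0 for an unknown model keeps tracing from failing, at the price of silently under-reporting, so a real implementation may prefer to log that case.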
Step 11: Create the Hono mobile API server
Create src/hono-app.ts — this is the mobile-facing API that Hono runs inside the Next.js process:
ts
import { Hono } from "hono";
import { HTTPException } from "hono/http-exception";
import { cors } from "hono/cors";
import { authMiddleware } from "./services/auth-service.js";
import { transcribeAudio } from "./services/transcription-service.js";
import {
  extractPunchItemsFromTranscript,
  extractPunchItemsFromPhotoDescription,
} from "./services/punch-list-extractor.js";
import { createMemoryService, storePunchItemMemory } from "./services/memory-service.js";
import { createVoiceSessionManager } from "./services/voice-pipeline.js";

const app = new Hono();

app.use("*", cors({
Expected output: The Hono app handles three routes — POST /voice/upload (audio file → transcript → items), POST /photo/upload (image → data URI → items), and GET /session/:id/status (session state). All routes are protected by authMiddleware and CORS is open for mobile field devices.
Step 12: Add the Next.js REST API routes for item management
Create app/api/punch-items/route.ts for listing and creating punch items through the standard Next.js App Router:
Create the simple in-memory store that backs these routes at src/lib/punch-item-store.ts:
ts
import type { PunchItem } from "../types.js";

const store = new Map<string, PunchItem>();

export function getAllPunchItems(): PunchItem[] {
  return Array.from(store.values());
}

export function getPunchItemById(id: string): PunchItem | undefined {
  return store.get(id);
}

export function createPunchItem(item: PunchItem): PunchItem {
  store.set(item.id, item);
  return item;
}

export function updatePunchItem(id: string, updates: Partial<PunchItem>): PunchItem | undefined {
  const existing = store.get(id);
  if (!existing) return undefined;
  const updated = { ...existing, ...updates, updatedAt: new Date().toISOString() };
  store.set(id, updated);
  return updated;
}
Expected output: These routes give you a CRUD API for punch items: GET /api/punch-items?projectId=X&status=open, POST /api/punch-items (creates and returns 201), GET /api/punch-items/:id, and PATCH /api/punch-items/:id. All use NextRequest/NextResponse and return structured error JSON.
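The query filtering behind GET /api/punch-items?projectId=X&status=open can be sketched as a pure function over the store's contents. The PunchItem fields below mirror the object used in the guardrail test; the filterPunchItems helper and the ListFilter type are hypothetical, and status is typed as a plain string because only "open" appears in the tutorial.

```typescript
// Sketch of list filtering for the punch-items route. Field names follow the
// PunchItem object shown in the tests; the helper itself is an assumption.
interface PunchItem {
  id: string;
  title: string;
  description: string;
  severity: "low" | "medium" | "high" | "critical";
  status: string;
  photoUrls: string[];
  projectId: string;
  createdAt: string;
  updatedAt: string;
}

interface ListFilter {
  projectId?: string;
  status?: string;
}

// An omitted filter field matches every item; provided fields must match exactly.
function filterPunchItems(items: PunchItem[], filter: ListFilter): PunchItem[] {
  return items.filter(
    (item) =>
      (filter.projectId === undefined || item.projectId === filter.projectId) &&
      (filter.status === undefined || item.status === filter.status),
  );
}
```

A route handler could read projectId and status from the request's search params, pass them to this function together with getAllPunchItems(), and return the result as JSON. Keeping the filter separate from the handler also makes it straightforward to swap the in-memory store for a database query later.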
Step 13: Set up Next.js instrumentation
Create src/instrumentation.ts to initialize Langfuse and OpenTelemetry at server startup:
Verify that next.config.ts has the correct instrumentation hook set:
ts
import type { NextConfig } from "next";

const nextConfig = {
  experimental: {
    instrumentationHook: true,
  },
} as NextConfig;

export default nextConfig;
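Expected output: When Next.js starts in Node.js mode, register() initializes Langfuse and OpenTelemetry observability before any requests are served. The NEXT_RUNTIME guard prevents failures in the Edge runtime. On Next.js 14 and earlier, the experimental flag must be spelled exactly instrumentationHook: true (not instrumentation, not clientInstrumentationHook); without it, register() is dead code. From Next.js 15 onward the instrumentation file is a stable convention and runs without the flag, so on the Next.js 16 setup used here the option is not required.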
Step 14: Run the tests
The test suite covers every service with MSW-mocked HTTP, direct function invocation, and a full integration test. Here’s how a representative service test looks — tests/services/guardrail-service.test.ts tests content validation:
ts
import { describe, it, expect, vi } from "vitest";
import { TimeoutError, BudgetExceededError, ValidationError } from "@reaatech/guardrail-chain";
import type { PunchItem } from "../../src/types.js";

describe("GuardrailService", () => {
  async function getGuardrailService() {
    return await import("../../src/services/guardrail-service.js");
  }

  it("validateTranscriptContent passes on safe text", async () => {
    const { buildPunchListGuardrails, validateTranscriptContent } = await getGuardrailService();
    const chain = buildPunchListGuardrails();
    const result = await validateTranscriptContent("Nail pop at drywall joint in unit 302", chain);
    expect(result.allowed).toBe(true);
  });

  it("handles TimeoutError with fail-open", async () => {
    const { buildPunchListGuardrails, validateTranscriptContent } = await getGuardrailService();
    const mockChain = buildPunchListGuardrails();
    vi.spyOn(mockChain, "execute").mockRejectedValue(new TimeoutError("Execution timed out"));
    const result = await validateTranscriptContent("test input", mockChain);
    expect(result.allowed).toBe(true);
  });

  it("handles ValidationError with fail-closed", async () => {
    const { buildPunchListGuardrails, validateExtractedItem } = await getGuardrailService();
    const mockChain = buildPunchListGuardrails();
    vi.spyOn(mockChain, "execute").mockRejectedValue(new ValidationError("Invalid input"));
    const item: PunchItem = {
      id: "test",
      title: "Test",
      description: "Test description",
      severity: "low",
      status: "open",
      photoUrls: [],
      projectId: "p1",
      createdAt: new Date().toISOString(),
      updatedAt: new Date().toISOString(),
    };
    const result = await validateExtractedItem(item, mockChain);
    expect(result.allowed).toBe(false);
    expect(result.failure).toBe("Invalid input");
  });
});
Run the full suite with:
terminal
pnpm test
Then run the type checker and linter:
terminal
pnpm typecheck
pnpm lint
Expected output: All tests pass, TypeScript compiles without errors, and ESLint reports zero violations. Coverage thresholds of 90% across lines, branches, functions, and statements are met.
Next steps
Add a database store — replace the in-memory PunchItemStore Map with PostgreSQL or SQLite using Prisma or Drizzle, so items survive restarts
Deploy beyond Next.js — the Hono server (src/hono-app.ts) can run as a standalone Node process or be mounted in a serverless function independently
Add a mobile app client — build a React Native or Swift/Kotlin app that hits the Hono endpoints for real-time field capture with offline queue support
Integrate real vision — the photo endpoint currently sends base64 data URIs as text; wire a multimodal model directly for true visual defect detection
Add user authentication — replace the single API key with JWT-based auth and per-project access controls using the extractScopes utility already imported
"You are a construction punch-list extraction agent. Extract actionable defect items from jobsite voice memo transcripts. For each item: title the defect, describe its condition, estimate severity (low/medium/high/critical) based on safety and rework impact, and suggest a trade category.";
const PHOTO_SYSTEM_PROMPT =
"You are a construction punch-list extraction agent. Extract actionable defect items from visual inspection descriptions. For each item: title the defect, describe its visible condition, estimate severity (low/medium/high/critical) based on safety and rework impact, and suggest a trade category.";
const SEVERITY_SYSTEM_PROMPT = "Classify construction defect severity.";
export async function extractPunchItemsFromTranscript(
const photoAnalysisText = `A construction site image is provided at the following data URI: ${dataUri}. Identify any visible defects, damage, safety hazards, or punch-list items. For each defect, describe its location, estimated severity, and suggested trade category.`;