# Punch-List Field Capture Agent for Superintendents
 
A Next.js voice agent that lets construction superintendents snap photos, record voice memos, and sync punch items to their project-management (PM) software in real time. Built with `@reaatech/voice-agent-core`, `@reaatech/voice-agent-stt`, `@reaatech/media-pipeline-mcp-core`, `@reaatech/media-pipeline-mcp-openai`, `@reaatech/agent-mesh`, and `@reaatech/agent-mesh-classifier`.
 
## Features
 
- Voice transcription via Deepgram STT
- Photo analysis via GPT-4o vision (media pipeline)
- Intent classification for punch-list categories
- Structured extraction of punch items via Instructor
- TTS read-back via ElevenLabs
- Vercel Blob storage for photos and recordings
- Langfuse + OpenTelemetry observability
- Modular sync adapter for PM software integration
 
## Architecture
 
The request pipeline runs in six stages: Capture -> STT -> Classify -> Extract -> Persist -> Sync. Audio and photo data enter through a Hono API layer mounted in the Next.js App Router via a catch-all route. From there, the media pipeline analyzes images with GPT-4o vision, Deepgram transcribes voice memos, the classifier routes each transcript to the correct punch-list category, and Instructor extracts structured punch items. Results are persisted to Vercel Blob, and a sync adapter pushes them to the configured PM software.
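
The stage order above can be sketched as a chain of functions that pass a shared context along. This is a minimal illustration, not the project's implementation: every type, function name, and stub value below is hypothetical, and each stub stands in for a real call (Deepgram, the classifier, Instructor, Vercel Blob, the sync adapter).

```typescript
// Hypothetical names throughout; nothing here is taken from the project's source.
type StageName = "capture" | "stt" | "classify" | "extract" | "persist" | "sync";

interface PipelineContext {
  audio?: Uint8Array;
  transcript?: string;
  category?: string;
  item?: { description: string; category: string };
  blobUrl?: string;
  synced?: boolean;
  trace: StageName[]; // records the order in which stages ran
}

type Stage = (ctx: PipelineContext) => Promise<PipelineContext>;

// Wrap a stage body so its output is merged into the context and its name is traced.
const stage = (
  name: StageName,
  fn: (ctx: PipelineContext) => Partial<PipelineContext>,
): Stage =>
  async (ctx) => ({ ...ctx, ...fn(ctx), trace: [...ctx.trace, name] });

// Stubbed stages with made-up values, listed in pipeline order.
const stages: Stage[] = [
  stage("capture", () => ({ audio: new Uint8Array([1, 2, 3]) })),
  stage("stt", () => ({ transcript: "cracked tile in unit 204 bathroom" })),
  stage("classify", () => ({ category: "finishes" })),
  stage("extract", (ctx) => ({
    item: {
      description: ctx.transcript ?? "",
      category: ctx.category ?? "unknown",
    },
  })),
  stage("persist", () => ({ blobUrl: "https://example.invalid/punch-items/1.json" })),
  stage("sync", () => ({ synced: true })),
];

// Run the stages sequentially, threading the context through each one.
async function runPipeline(
  initial: PipelineContext = { trace: [] },
): Promise<PipelineContext> {
  let ctx = initial;
  for (const run of stages) {
    ctx = await run(ctx);
  }
  return ctx;
}
```

Running the stages sequentially keeps the data dependencies explicit: Extract needs the outputs of STT and Classify, and Sync only makes sense once Persist has succeeded.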
 
## Tech Stack
 
| Package | Purpose |
|---|---|
| `@reaatech/voice-agent-core` | Voice agent lifecycle and orchestration |
| `@reaatech/voice-agent-stt` | Speech-to-text via Deepgram |
| `@reaatech/media-pipeline-mcp-core` | Media processing pipeline framework |
| `@reaatech/media-pipeline-mcp-openai` | OpenAI integration (GPT-4o vision) |
| `@reaatech/agent-mesh` | Multi-agent coordination |
| `@reaatech/agent-mesh-classifier` | Intent classification for punch-list categories |
| `@ai-sdk/openai` | OpenAI SDK for AI calls |
| `@instructor-ai/instructor` | Structured data extraction |
| `@elevenlabs/elevenlabs-js` | Text-to-speech via ElevenLabs |
| `@vercel/blob` | Cloud storage for media |
| Hono | Lightweight HTTP framework for API routes |
| Next.js | Full-stack framework with App Router |
| Langfuse | LLM observability and tracing |
| OpenTelemetry | Distributed tracing exporter |
 
## Getting Started
 
```bash
pnpm install
cp .env.example .env
pnpm dev
```
 
## Environment Variables
 
| Variable | Description |
|---|---|
| `OPENAI_API_KEY` | OpenAI API key for GPT-4o and embeddings |
| `DEEPGRAM_API_KEY` | Deepgram API key for speech-to-text |
| `ELEVENLABS_API_KEY` | ElevenLabs API key for text-to-speech |
| `BLOB_READ_WRITE_TOKEN` | Vercel Blob storage token for media uploads |
| `LANGFUSE_PUBLIC_KEY` | Langfuse public key for observability |
| `LANGFUSE_SECRET_KEY` | Langfuse secret key for observability |
| `LANGFUSE_BASE_URL` | Langfuse API base URL |
| `PM_SOFTWARE_API_URL` | PM software API endpoint |
| `PM_SOFTWARE_API_KEY` | PM software API key |
| `OTLP_ENDPOINT` | OpenTelemetry OTLP exporter endpoint |
| `NEXT_PUBLIC_APP_NAME` | Public-facing app name |
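
A startup check can catch a missing key before the first request fails. The sketch below is an assumption-laden example: the variable names come from the table above, but the helper `findMissingEnvVars` is hypothetical, and treating every variable as required is a guess (some, such as `OTLP_ENDPOINT`, may be optional in practice).

```typescript
// Names are copied from the table above. Treating all of them as required is an assumption.
const REQUIRED_ENV_VARS = [
  "OPENAI_API_KEY",
  "DEEPGRAM_API_KEY",
  "ELEVENLABS_API_KEY",
  "BLOB_READ_WRITE_TOKEN",
  "LANGFUSE_PUBLIC_KEY",
  "LANGFUSE_SECRET_KEY",
  "LANGFUSE_BASE_URL",
  "PM_SOFTWARE_API_URL",
  "PM_SOFTWARE_API_KEY",
  "OTLP_ENDPOINT",
  "NEXT_PUBLIC_APP_NAME",
] as const;

type Env = Record<string, string | undefined>;

// Hypothetical helper: returns the names that are unset or blank, in table order.
function findMissingEnvVars(env: Env): string[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

In a Node.js process this could be called as `findMissingEnvVars(process.env)` during startup, logging or throwing if the returned list is non-empty.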
 
## API Reference
 
| Method | Path | Description | Request Body |
|---|---|---|---|
| POST | `/api/voice/transcribe` | Process a voice recording and return the transcription result | `{ audio: Blob }` |
| POST | `/api/photo/analyze` | Analyze a photo for punch items and return the analysis result | `{ image: Blob }` |
| POST | `/api/punch-items` | Create a punch item from transcript + photos | `{ transcript, photos, jobSiteId }` |
| GET | `/api/punch-items` | List punch items, optionally filtered by the query params `jobSiteId`, `status`, `category`, and `priority` | -- |
| GET | `/api/punch-items/:id` | Get a specific punch item by ID | -- |
| PATCH | `/api/punch-items/:id` | Update a punch item (partial update) | `{ category?, description?, location?, status?, priority? }` |
| DELETE | `/api/punch-items/:id` | Delete a punch item | -- |
| POST | `/api/sync` | Trigger sync of pending items to PM software | -- |
| GET | `/api/sync/status` | Get current sync status | -- |
| POST | `/api/tts` | Convert text to speech audio | `{ text: string, voiceId?: string }` |
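
As an example of the list endpoint's optional filters, the sketch below builds the request path for `GET /api/punch-items`. The filter names come from the table above. The helper `buildPunchItemsPath`, the `string` value types, and sample values such as `"open"` are assumptions, since the accepted values are not documented here.

```typescript
// Filter names are from the API table. Value types are assumed to be plain strings.
interface PunchItemFilters {
  jobSiteId?: string;
  status?: string;
  category?: string;
  priority?: string;
}

const FILTER_KEYS = ["jobSiteId", "status", "category", "priority"] as const;

// Hypothetical helper: builds the path and query string, skipping unset or empty filters.
function buildPunchItemsPath(filters: PunchItemFilters = {}): string {
  const pairs: string[] = [];
  for (const key of FILTER_KEYS) {
    const value = filters[key];
    if (value !== undefined && value !== "") {
      // Encode the value so spaces and reserved characters survive the query string.
      pairs.push(`${key}=${encodeURIComponent(value)}`);
    }
  }
  return pairs.length > 0
    ? `/api/punch-items?${pairs.join("&")}`
    : "/api/punch-items";
}
```

The result can be passed to `fetch`, for example `fetch(buildPunchItemsPath({ jobSiteId: "site-12", status: "open" }))`. Iterating over a fixed key list keeps the parameter order stable, which makes the generated paths easy to compare in tests and logs.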
 
## Testing
 
```bash
pnpm test
```
 
## License
 
MIT