# Vertex AI Multi-Agent Handoff for SMB Field Service Dispatch
> Routes incoming field service requests to scheduling, inventory, or billing specialist AI agents with confidence-based human fallback and budget-aware model selection on Vertex AI.
A tutorialized reference solution from [reaatech.com](https://reaatech.com), demonstrating how to build production-grade multi-agent handoff systems with the `@reaatech/*` package family.
## What it does
This system provides an intelligent dispatch layer that:
- Accepts natural-language field service requests (e.g., "Schedule a repair for Tuesday", "Check stock for part #A100", "What's the invoice for job 451?")
- Classifies the intent using a `ConfidenceRouter` that maps utterances to one of three specialist agents: **scheduling**, **inventory**, or **billing**
- Routes the request to the appropriate agent with full session context preservation
- Selects a cost-optimized Vertex AI Gemini model (`gemini-2.5-pro`, `gemini-2.5-flash`, or `gemini-2.0-flash`) via the LLM Router Engine
- Enforces per-agent budget limits (`$5.00` scheduling, `$3.00` inventory, `$7.00` billing) with automatic model downgrade once spend exceeds the soft cap
- Returns a clarification prompt when confidence is medium (0.3–0.8), or a fallback message suggesting human intervention when confidence is below the fallback threshold (0.3)
- Logs every handoff event to an external webhook and Langfuse for observability
## How it works
```
User Message → POST /api/dispatch → ConfidenceClassifier
  ├─ low confidence (< 0.3) → FALLBACK response
  ├─ medium confidence (0.3–0.8) → CLARIFY prompt with agent options
  └─ high confidence (>= 0.8) → HandoffService → Specialist Agent
       ├─ ModelRouterService selects cheapest capable Gemini model
       ├─ BudgetController checks spend against per-agent limit
       │    ├─ under soft cap → allow
       │    ├─ over soft cap → downgrade to cheaper model
       │    └─ over hard cap → block with budget-exceeded message
       ├─ SessionManager preserves conversation context
       ├─ Vertex AI Gemini generates the response
       └─ WebhookLogger POSTs event to external webhook + Langfuse
```
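The "cheapest capable model" step in the flow above can be sketched roughly as follows. This is an illustration only, not the `@reaatech/llm-router-engine` API: the `ModelOption` type, the `selectModel` function, and the `costRank` and `capability` values are hypothetical. Only the three model names come from this README.

```typescript
// Hypothetical catalogue: ranks are illustrative, not real pricing or benchmarks.
type ModelOption = { name: string; costRank: number; capability: number };

const MODELS: ModelOption[] = [
  { name: "gemini-2.0-flash", costRank: 1, capability: 1 },
  { name: "gemini-2.5-flash", costRank: 2, capability: 2 },
  { name: "gemini-2.5-pro", costRank: 3, capability: 3 },
];

// Pick the cheapest model whose capability meets the requirement.
// Returns undefined when no model is capable enough.
function selectModel(
  requiredCapability: number,
  models: ModelOption[] = MODELS,
): ModelOption | undefined {
  return models
    .filter((m) => m.capability >= requiredCapability) // filter() copies, so sort() does not mutate the input
    .sort((a, b) => a.costRank - b.costRank)[0];
}
```

Filtering before sorting keeps the "capable" and "cheapest" concerns separate, so either rule can change without touching the other.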
## Prerequisites
- **Node.js** >= 22
- **pnpm** 10.x
- **GCP project** with Vertex AI API enabled
- **Langfuse account** (optional — for observability tracing)
## Getting Started
```bash
# Install dependencies
pnpm install
# Configure environment
cp .env.example .env
# Edit .env: set GOOGLE_CLOUD_PROJECT, GOOGLE_CLOUD_LOCATION,
# and optional LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, DISPATCH_WEBHOOK_URL
# Start the development server
pnpm dev
```
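A startup check for the variables named above might look like the sketch below. The `DispatchConfig` shape and the `loadConfig` helper are hypothetical and not part of this repository. Treating the two `GOOGLE_CLOUD_*` variables as required follows the comment in the setup steps, which marks only the other three as optional.

```typescript
// Settings named in the .env comments above.
interface DispatchConfig {
  project: string;
  location: string;
  langfusePublicKey: string | undefined;
  langfuseSecretKey: string | undefined;
  webhookUrl: string | undefined;
}

// Hypothetical helper: validates a plain key/value map such as process.env.
function loadConfig(env: Record<string, string | undefined>): DispatchConfig {
  const required = ["GOOGLE_CLOUD_PROJECT", "GOOGLE_CLOUD_LOCATION"];
  const missing = required.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return {
    project: env["GOOGLE_CLOUD_PROJECT"] as string,
    location: env["GOOGLE_CLOUD_LOCATION"] as string,
    // Optional settings stay undefined when not configured.
    langfusePublicKey: env["LANGFUSE_PUBLIC_KEY"],
    langfuseSecretKey: env["LANGFUSE_SECRET_KEY"],
    webhookUrl: env["DISPATCH_WEBHOOK_URL"],
  };
}
```

Calling `loadConfig(process.env)` once at startup would surface a missing project or location before the first request reaches Vertex AI.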
## Packages Used
| Package | Version | Purpose |
|---|---|---|
| `@google-cloud/vertexai` | 1.12.0 | Vertex AI Gemini model access |
| `@langchain/core` | 1.1.49 | LangChain message types |
| `@langchain/langgraph` | 1.4.2 | Multi-agent state graph |
| `@reaatech/agent-budget-engine` | 0.1.1 | Per-agent budget enforcement |
| `@reaatech/agent-handoff` | 0.1.0 | Typed handoff protocol with retry |
| `@reaatech/confidence-router` | 0.1.1 | Intent classification and routing |
| `@reaatech/llm-router-engine` | 1.0.1 | Cost-optimized model selection |
| `@reaatech/session-continuity` | 0.1.0 | Session context and sliding-window compression |
| `dotenv` | 17.4.2 | Environment variable loading |
| `express` | 5.2.1 | Webhook listener server |
| `langfuse-langchain` | 3.38.20 | Langfuse observability tracing |
| `next` | 16.2.9 | Next.js framework |
| `react` | 19.2.4 | UI library |
| `zod` | 4.4.3 | Request validation |
| `msw` (dev) | 2.14.6 | HTTP mocking in tests |
| `vitest` (dev) | 4.1.9 | Test runner |
| `typescript` (dev) | 5.9.3 | Type checking |
## API Reference
### `POST /api/dispatch`
Routes a field service request to the appropriate specialist agent.
**Request body:**
```json
{
  "sessionId": "optional-existing-session-uuid",
  "message": "Schedule a repair for customer ABC at 3pm Tuesday",
  "userId": "user-123"
}
```
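The project lists `zod` for request validation. The dependency-free sketch below shows the kind of check that implies. The `parseDispatchRequest` function and its result type are hypothetical, and treating `message` and `userId` as required (with `sessionId` optional) is an assumption based on the example body, not a documented contract.

```typescript
interface DispatchRequest {
  sessionId: string | undefined;
  message: string;
  userId: string;
}

type ParseResult =
  | { ok: true; value: DispatchRequest }
  | { ok: false; error: string };

// Hypothetical validator for the POST /api/dispatch body.
function parseDispatchRequest(body: unknown): ParseResult {
  if (typeof body !== "object" || body === null) {
    return { ok: false, error: "body must be a JSON object" };
  }
  const record = body as Record<string, unknown>;

  const message = record["message"];
  if (typeof message !== "string" || message.trim() === "") {
    return { ok: false, error: "message must be a non-empty string" };
  }

  const userId = record["userId"];
  if (typeof userId !== "string") {
    return { ok: false, error: "userId must be a string" }; // assumed required
  }

  // sessionId may be absent (new session) but must be a string when present.
  let sessionId: string | undefined = undefined;
  const rawSessionId = record["sessionId"];
  if (typeof rawSessionId === "string") {
    sessionId = rawSessionId;
  } else if (rawSessionId !== undefined) {
    return { ok: false, error: "sessionId must be a string when provided" };
  }

  return { ok: true, value: { sessionId, message, userId } };
}
```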
**Response — ROUTE:**
```json
{
  "type": "ROUTE",
  "target": "scheduling",
  "sessionId": "uuid",
  "response": "I've scheduled a repair for customer ABC at 3pm on Tuesday."
}
```
**Response — CLARIFY:**
```json
{
  "type": "CLARIFY",
  "prompt": "I'm not sure which area this relates to. Can you clarify?",
  "options": ["scheduling", "inventory", "billing"]
}
```
**Response — FALLBACK:**
```json
{
  "type": "FALLBACK",
  "message": "I couldn't determine how to help. Please rephrase your request."
}
```
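A client can treat the three response shapes above as a discriminated union on `type`. The type names and the `renderResponse` helper below are hypothetical. The field names are taken from the JSON examples.

```typescript
type AgentType = "scheduling" | "inventory" | "billing";

// One variant per response shape documented above.
type DispatchResponse =
  | { type: "ROUTE"; target: AgentType; sessionId: string; response: string }
  | { type: "CLARIFY"; prompt: string; options: AgentType[] }
  | { type: "FALLBACK"; message: string };

// Hypothetical client-side helper: turn any response into text for the user.
function renderResponse(res: DispatchResponse): string {
  switch (res.type) {
    case "ROUTE":
      return `[${res.target}] ${res.response}`;
    case "CLARIFY":
      return `${res.prompt} (${res.options.join(" / ")})`;
    case "FALLBACK":
      return res.message;
  }
}
```

Switching on `type` lets the compiler narrow each branch, so accessing `target` on a `FALLBACK` response is a compile-time error rather than a runtime surprise.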
### `GET /api/health`
Returns the health status and available agents.
```json
{
  "status": "ok",
  "agents": ["scheduling", "inventory", "billing"]
}
```
## Architecture
### AgentType routing
The `ConfidenceClassifier` wraps `@reaatech/confidence-router` to classify incoming messages against three agent types. When confidence for a candidate agent is at least `0.8`, the request is routed directly to that agent. From `0.3` up to (but not including) `0.8`, the system asks the user to clarify their intent. Below `0.3`, a fallback message is returned suggesting human intervention.
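The threshold logic can be sketched as a pure function. This is not the `@reaatech/confidence-router` API: the `decide` function, the `Decision` type, and the per-agent score map are hypothetical, and only the `0.8` and `0.3` thresholds come from this README.

```typescript
type AgentType = "scheduling" | "inventory" | "billing";

type Decision =
  | { type: "ROUTE"; target: AgentType }
  | { type: "CLARIFY"; options: AgentType[] }
  | { type: "FALLBACK" };

const ROUTE_THRESHOLD = 0.8;
const FALLBACK_THRESHOLD = 0.3;

// Hypothetical input: one confidence score in [0, 1] per agent.
function decide(scores: Record<AgentType, number>): Decision {
  const agents: AgentType[] = ["scheduling", "inventory", "billing"];

  // Find the best-scoring candidate agent.
  let best: AgentType = "scheduling";
  for (const agent of agents) {
    if (scores[agent] > scores[best]) best = agent;
  }
  const confidence = scores[best];

  if (confidence >= ROUTE_THRESHOLD) return { type: "ROUTE", target: best };
  if (confidence >= FALLBACK_THRESHOLD) return { type: "CLARIFY", options: agents };
  return { type: "FALLBACK" };
}
```

For example, `decide({ scheduling: 0.9, inventory: 0.1, billing: 0.2 })` routes to `scheduling`, while a top score of `0.5` yields a `CLARIFY` decision listing all three agents.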
### Budget enforcement state machine
The `BudgetController` (`@reaatech/agent-budget-engine`) manages per-agent spend through four states:
- **Active**: spend is under the soft cap (80% of limit); all models and tools are available
- **Warned**: spend crosses the soft cap (80% of limit); the `threshold-breach` event fires and a warning is logged
- **Degraded**: spend remains above the soft cap but under the hard cap; expensive models are auto-downgraded (e.g., `gemini-2.5-pro` → `gemini-2.5-flash`)
- **Stopped**: spend hits the hard cap (100% of limit); the `hard-stop` event fires and all further requests are blocked with a budget-exceeded message
This ensures no single agent can exhaust the project Vertex AI budget.
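The per-request outcome of these states (allow, downgrade, or block) can be sketched as below. This is not the `@reaatech/agent-budget-engine` API: `checkBudget`, `BudgetDecision`, and the downgrade table are hypothetical, and event emission is omitted. The `gemini-2.5-pro` → `gemini-2.5-flash` step comes from the example above, while the further step to `gemini-2.0-flash` is an assumption.

```typescript
type BudgetDecision =
  | { action: "allow"; model: string }
  | { action: "downgrade"; model: string }
  | { action: "block"; message: string };

const SOFT_CAP_RATIO = 0.8; // soft cap = 80% of the per-agent limit

// Hypothetical downgrade chain; the second entry is an assumption.
const DOWNGRADE: Record<string, string> = {
  "gemini-2.5-pro": "gemini-2.5-flash",
  "gemini-2.5-flash": "gemini-2.0-flash",
};

function checkBudget(spentUsd: number, limitUsd: number, model: string): BudgetDecision {
  // Hard cap: block everything once the limit is reached.
  if (spentUsd >= limitUsd) {
    return { action: "block", message: "Budget exceeded for this agent." };
  }
  // Between soft and hard cap: swap in a cheaper model when one exists.
  if (spentUsd > limitUsd * SOFT_CAP_RATIO) {
    const cheaper = DOWNGRADE[model];
    return cheaper ? { action: "downgrade", model: cheaper } : { action: "allow", model };
  }
  // Under the soft cap: use the requested model as-is.
  return { action: "allow", model };
}
```

With the `$5.00` scheduling limit, the soft cap sits at `$4.00`, so a request at `$4.50` spent would be served by `gemini-2.5-flash` instead of `gemini-2.5-pro`.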
### Session continuity
The `SessionManager` (`@reaatech/session-continuity`) preserves conversation context across agent handoffs. When a session exceeds the token budget (4,096 tokens with 500 reserved), a sliding-window compression strategy targets 3,500 tokens by trimming the oldest messages. Handoff events (`agent:handoff`) and compression runs (`compression:applied`) are emitted for observability.
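A minimal sketch of the sliding-window idea follows. It is not the `@reaatech/session-continuity` API: `compress`, `ChatMessage`, and the four-characters-per-token estimate are hypothetical, and reading "4,096 tokens with 500 reserved" as a trigger at 3,596 tokens is an assumption.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

const MAX_TOKENS = 4096;
const RESERVED_TOKENS = 500;
const TARGET_TOKENS = 3500;

// Rough estimate (about 4 characters per token); not a real tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function totalTokens(messages: ChatMessage[]): number {
  return messages.reduce((sum, m) => sum + estimateTokens(m.content), 0);
}

// Trim the oldest messages once the session exceeds its usable budget.
function compress(
  messages: ChatMessage[],
  budget: number = MAX_TOKENS - RESERVED_TOKENS,
  target: number = TARGET_TOKENS,
): ChatMessage[] {
  if (totalTokens(messages) <= budget) return messages;
  const kept = messages.slice(); // copy so the caller's array is untouched
  // Always keep at least the most recent message.
  while (kept.length > 1 && totalTokens(kept) > target) {
    kept.shift();
  }
  return kept;
}
```

Compressing down to a target below the trigger point leaves headroom, so the next message does not immediately trigger another compression run.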
## Testing
```bash
# Run all tests with coverage
pnpm test
# Run tests with verbose output
pnpm vitest run --reporter=verbose
# Run type checker
pnpm typecheck
# Run linter
pnpm lint
```
## License
MIT — see [LICENSE](./LICENSE).