# Vertex AI Multi-Agent Handoff for SMB Field Service Dispatch
 
> Routes incoming field service requests to scheduling, inventory, or billing specialist AI agents with confidence-based human fallback and budget-aware model selection on Vertex AI.
 
A tutorial-style reference solution from [reaatech.com](https://reaatech.com) that demonstrates how to build production-grade multi-agent handoff systems with the `@reaatech/*` package family.
 
## What it does
 
This system provides an intelligent dispatch layer that:
 
- Accepts natural-language field service requests (e.g., "Schedule a repair for Tuesday", "Check stock for part #A100", "What's the invoice for job 451?")
- Classifies the intent with a `ConfidenceClassifier` (built on `@reaatech/confidence-router`) that maps utterances to one of three specialist agents: **scheduling**, **inventory**, or **billing**
- Routes the request to the appropriate agent with full session context preservation
- Selects a cost-optimized Vertex AI Gemini model (`gemini-2.5-pro`, `gemini-2.5-flash`, or `gemini-2.0-flash`) via the LLM Router Engine
- Enforces per-agent budget limits (`$5.00` scheduling, `$3.00` inventory, `$7.00` billing) with automatic model downgrade once spend passes the soft cap
- Returns a clarification prompt when confidence is medium (`0.3`–`0.8`), or a fallback message suggesting human intervention when confidence is below `0.3`
- Logs every handoff event to an external webhook and Langfuse for observability
 
## How it works
 
```
User Message → POST /api/dispatch → ConfidenceClassifier
    ├─ low confidence (< 0.3) → FALLBACK response
    ├─ medium confidence (0.3–0.8) → CLARIFY prompt with agent options
    └─ high confidence (>= 0.8) → HandoffService → Specialist Agent
                        ├─ ModelRouterService selects cheapest capable Gemini model
                        ├─ BudgetController checks spend against per-agent limit
                        │   ├─ under soft cap → allow
                        │   ├─ over soft cap → downgrade to cheaper model
                        │   └─ over hard cap → block with budget-exceeded message
                        ├─ SessionManager preserves conversation context
                        ├─ Vertex AI Gemini generates the response
                        └─ WebhookLogger POSTs event to external webhook + Langfuse
```
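The budget branch of the flow above can be sketched as a small pure function. This is an illustration only, not the `@reaatech/agent-budget-engine` API: the names `decideModel`, `MODEL_LADDER`, and `BudgetDecision` are hypothetical, the most-to-least-expensive ordering of the three models is an assumption, and the soft cap is taken as 80% of the limit as described in the Architecture section.

```typescript
// Hypothetical sketch; none of these names come from the @reaatech/* packages.
type GeminiModel = "gemini-2.5-pro" | "gemini-2.5-flash" | "gemini-2.0-flash";

// Assumption: models ordered from most to least expensive.
const MODEL_LADDER: readonly GeminiModel[] = [
  "gemini-2.5-pro",
  "gemini-2.5-flash",
  "gemini-2.0-flash",
];

// Soft cap at 80% of the per-agent limit (from the Architecture section).
const SOFT_CAP_RATIO = 0.8;

type BudgetDecision =
  | { action: "allow"; model: GeminiModel }
  | { action: "downgrade"; model: GeminiModel }
  | { action: "block"; message: string };

function decideModel(
  spendUsd: number,
  limitUsd: number,
  requested: GeminiModel,
): BudgetDecision {
  // At or over the hard cap: block the request entirely.
  if (spendUsd >= limitUsd) {
    return { action: "block", message: "Budget exceeded for this agent." };
  }
  // Over the soft cap: step one rung down the ladder, if a cheaper rung exists.
  if (spendUsd > limitUsd * SOFT_CAP_RATIO) {
    const next = MODEL_LADDER[MODEL_LADDER.indexOf(requested) + 1];
    if (next !== undefined) {
      return { action: "downgrade", model: next };
    }
  }
  // Under the soft cap, or already on the cheapest model.
  return { action: "allow", model: requested };
}
```

For example, with the `$5.00` scheduling limit, `decideModel(4.5, 5, "gemini-2.5-pro")` yields a downgrade to `gemini-2.5-flash`, while `decideModel(5, 5, "gemini-2.5-pro")` yields a block.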
 
## Prerequisites
 
- **Node.js** >= 22
- **pnpm** 10.x
- **GCP project** with Vertex AI API enabled
- **Langfuse account** (optional — for observability tracing)
 
## Getting Started
 
```bash
# Install dependencies
pnpm install
 
# Configure environment
cp .env.example .env
# Edit .env: set GOOGLE_CLOUD_PROJECT, GOOGLE_CLOUD_LOCATION,
# and optional LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, DISPATCH_WEBHOOK_URL
 
# Start the development server
pnpm dev
```
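If you want the server to fail fast on a misconfigured `.env`, a startup check along these lines may help. It is a minimal sketch: `loadConfig` and `DispatchConfig` are hypothetical names, not part of this repository, and the only facts taken from the steps above are which variables are required and which are optional.

```typescript
// Hypothetical helper; the repository may validate its environment differently.
interface DispatchConfig {
  project: string;
  location: string;
  langfuse?: { publicKey: string; secretKey: string };
  webhookUrl?: string;
}

function loadConfig(env: Record<string, string | undefined>): DispatchConfig {
  const project = env["GOOGLE_CLOUD_PROJECT"];
  const location = env["GOOGLE_CLOUD_LOCATION"];

  // The two GCP variables are required; report every missing one at once.
  if (!project || !location) {
    const missing = ["GOOGLE_CLOUD_PROJECT", "GOOGLE_CLOUD_LOCATION"].filter(
      (name) => !env[name],
    );
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }

  const config: DispatchConfig = { project, location };

  // Langfuse tracing is optional and only enabled when both keys are present.
  const publicKey = env["LANGFUSE_PUBLIC_KEY"];
  const secretKey = env["LANGFUSE_SECRET_KEY"];
  if (publicKey && secretKey) {
    config.langfuse = { publicKey, secretKey };
  }

  // The external webhook is optional as well.
  const webhookUrl = env["DISPATCH_WEBHOOK_URL"];
  if (webhookUrl) {
    config.webhookUrl = webhookUrl;
  }

  return config;
}
```

In a Node.js entry point this would typically be called as `loadConfig(process.env)` after `dotenv` has loaded the file.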
 
## Packages Used
 
| Package | Version | Purpose |
|---|---|---|
| `@google-cloud/vertexai` | 1.12.0 | Vertex AI Gemini model access |
| `@langchain/core` | 1.1.49 | LangChain message types |
| `@langchain/langgraph` | 1.4.2 | Multi-agent state graph |
| `@reaatech/agent-budget-engine` | 0.1.1 | Per-agent budget enforcement |
| `@reaatech/agent-handoff` | 0.1.0 | Typed handoff protocol with retry |
| `@reaatech/confidence-router` | 0.1.1 | Intent classification and routing |
| `@reaatech/llm-router-engine` | 1.0.1 | Cost-optimized model selection |
| `@reaatech/session-continuity` | 0.1.0 | Session context and sliding-window compression |
| `dotenv` | 17.4.2 | Environment variable loading |
| `express` | 5.2.1 | Webhook listener server |
| `langfuse-langchain` | 3.38.20 | Langfuse observability tracing |
| `next` | 16.2.9 | Next.js framework |
| `react` | 19.2.4 | UI library |
| `zod` | 4.4.3 | Request validation |
| `msw` (dev) | 2.14.6 | HTTP mocking in tests |
| `vitest` (dev) | 4.1.9 | Test runner |
| `typescript` (dev) | 5.9.3 | Type checking |
 
## API Reference
 
### `POST /api/dispatch`
 
Routes a field service request to the appropriate specialist agent.
 
**Request body:**
 
```json
{
  "sessionId": "optional-existing-session-uuid",
  "message": "Schedule a repair for customer ABC at 3pm Tuesday",
  "userId": "user-123"
}
```
 
**Response — ROUTE:**
 
```json
{
  "type": "ROUTE",
  "target": "scheduling",
  "sessionId": "uuid",
  "response": "I've scheduled a repair for customer ABC at 3pm on Tuesday."
}
```
 
**Response — CLARIFY:**
 
```json
{
  "type": "CLARIFY",
  "prompt": "I'm not sure which area this relates to. Can you clarify?",
  "options": ["scheduling", "inventory", "billing"]
}
```
 
**Response — FALLBACK:**
 
```json
{
  "type": "FALLBACK",
  "message": "I couldn't determine how to help. Please rephrase your request."
}
```
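On the client side, the three response shapes above map naturally onto a discriminated union keyed on `type`. The sketch below is one way a caller might model them; `DispatchResponse` and `renderReply` are hypothetical names chosen for illustration, and the field types are inferred from the JSON examples rather than taken from the project's source.

```typescript
// Hypothetical client-side types, inferred from the JSON examples above.
type AgentType = "scheduling" | "inventory" | "billing";

type DispatchResponse =
  | { type: "ROUTE"; target: AgentType; sessionId: string; response: string }
  | { type: "CLARIFY"; prompt: string; options: AgentType[] }
  | { type: "FALLBACK"; message: string };

// Turn any dispatch response into a single line of text for display.
function renderReply(res: DispatchResponse): string {
  switch (res.type) {
    case "ROUTE":
      // A specialist agent answered; show which one.
      return `[${res.target}] ${res.response}`;
    case "CLARIFY":
      // Medium confidence; list the options the user can pick from.
      return `${res.prompt} Options: ${res.options.join(", ")}`;
    case "FALLBACK":
      // Low confidence; pass the fallback message through unchanged.
      return res.message;
  }
}
```

Because the `switch` covers every member of the union, the compiler flags any new response type that is added without a matching case.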
 
### `GET /api/health`
 
Returns the health status and available agents.
 
```json
{
  "status": "ok",
  "agents": ["scheduling", "inventory", "billing"]
}
```
 
## Architecture
 
### AgentType routing
 
The `ConfidenceClassifier` wraps `@reaatech/confidence-router` to classify incoming messages against three agent types. When confidence for a candidate agent is `0.8` or higher, the request is routed directly. From `0.3` up to (but not including) `0.8`, the system asks the user to clarify their intent. Below `0.3`, a fallback message is returned suggesting human intervention.
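The threshold logic can be illustrated with a short sketch. It does not use the `@reaatech/confidence-router` API; `decide`, `Classification`, and `RoutingDecision` are hypothetical names, and the sketch assumes the classifier has already produced one confidence score per candidate agent. A score of exactly `0.8` is treated as high confidence, matching the `>= 0.8` branch in the flow diagram.

```typescript
// Hypothetical sketch of the threshold decision; not the library's API.
type AgentType = "scheduling" | "inventory" | "billing";

interface Classification {
  agent: AgentType;
  confidence: number; // assumed to lie in [0, 1]
}

type RoutingDecision =
  | { type: "ROUTE"; target: AgentType }
  | { type: "CLARIFY"; options: AgentType[] }
  | { type: "FALLBACK" };

const ROUTE_THRESHOLD = 0.8;
const FALLBACK_THRESHOLD = 0.3;

function decide(candidates: Classification[]): RoutingDecision {
  // Rank candidates from most to least confident without mutating the input.
  const ranked = [...candidates].sort((a, b) => b.confidence - a.confidence);
  const best = ranked[0];

  // No candidates, or the best one is too weak: fall back.
  if (best === undefined || best.confidence < FALLBACK_THRESHOLD) {
    return { type: "FALLBACK" };
  }
  // Strong match: route directly to that agent.
  if (best.confidence >= ROUTE_THRESHOLD) {
    return { type: "ROUTE", target: best.agent };
  }
  // In between: ask the user, offering agents in order of confidence.
  return { type: "CLARIFY", options: ranked.map((c) => c.agent) };
}
```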
 
### Budget enforcement state machine
 
The `BudgetController` (`@reaatech/agent-budget-engine`) manages per-agent spend through four states:
 
- **Active**: spend is under the soft cap (80% of limit); all models and tools are available
- **Warned**: spend crosses the soft cap; a `threshold-breach` event fires and a warning is logged
- **Degraded**: spend stays above the soft cap but under the hard cap; expensive models are auto-downgraded (e.g., `gemini-2.5-pro` → `gemini-2.5-flash`)
- **Stopped**: spend hits the hard cap (100% of limit); a `hard-stop` event fires and all further requests are blocked with a budget-exceeded message
 
This ensures that no single agent can exhaust the project's Vertex AI budget.
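A rough model of these transitions is shown below. It is a sketch under stated assumptions, not the `@reaatech/agent-budget-engine` implementation: the `AgentBudget` class and its members are hypothetical, and because Warned and Degraded share the same 80% threshold, the sketch folds Warned into the moment the `threshold-breach` event fires and tracks only three persistent states.

```typescript
// Hypothetical model of the budget states; not the library's API.
type BudgetState = "active" | "degraded" | "stopped";
type BudgetEvent = "threshold-breach" | "hard-stop";

class AgentBudget {
  private readonly limitUsd: number;
  private readonly softCapUsd: number;
  private spentUsd = 0;
  // Events are collected in order so a caller can forward them to a logger.
  readonly events: BudgetEvent[] = [];

  constructor(limitUsd: number, softCapRatio = 0.8) {
    this.limitUsd = limitUsd;
    this.softCapUsd = limitUsd * softCapRatio;
  }

  get state(): BudgetState {
    if (this.spentUsd >= this.limitUsd) return "stopped"; // hits the hard cap
    if (this.spentUsd > this.softCapUsd) return "degraded"; // exceeds the soft cap
    return "active";
  }

  // Record the cost of one request and fire each event once, on the crossing.
  record(costUsd: number): BudgetState {
    const before = this.state;
    this.spentUsd += costUsd;
    const after = this.state;
    if (before === "active" && after !== "active") {
      this.events.push("threshold-breach");
    }
    if (before !== "stopped" && after === "stopped") {
      this.events.push("hard-stop");
    }
    return after;
  }
}
```

With the `$5.00` scheduling limit, recording `$3.00`, then `$1.50`, then `$1.00` moves the budget through `active`, `degraded`, and `stopped`, firing `threshold-breach` on the second call and `hard-stop` on the third.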
 
### Session continuity
 
The `SessionManager` (`@reaatech/session-continuity`) preserves conversation context across agent handoffs. When a session exceeds the token budget (4,096 tokens with 500 reserved), a sliding-window compression strategy targets 3,500 tokens by trimming the oldest messages. Handoff events (`agent:handoff`) and compression runs (`compression:applied`) are emitted for observability.
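The trimming step can be sketched as follows. This is not the `@reaatech/session-continuity` API: `compress` and `Message` are hypothetical names, each message is assumed to carry a precomputed token count, and "4,096 tokens with 500 reserved" is interpreted as a usable budget of 4,096 − 500 = 3,596 tokens, which is an assumption about how the reserve is applied.

```typescript
// Hypothetical sliding-window sketch; token counts are assumed to be precomputed.
interface Message {
  role: "user" | "assistant";
  content: string;
  tokens: number;
}

const TOKEN_BUDGET = 4096;
const RESERVED_TOKENS = 500; // assumption: subtracted from the budget before comparing
const TARGET_TOKENS = 3500;

function totalTokens(messages: Message[]): number {
  return messages.reduce((sum, m) => sum + m.tokens, 0);
}

function compress(messages: Message[]): Message[] {
  // Within the usable budget: leave the history untouched.
  if (totalTokens(messages) <= TOKEN_BUDGET - RESERVED_TOKENS) {
    return messages;
  }
  // Walk from newest to oldest, keeping messages while they fit the target.
  // The newest message is always kept, even if it alone exceeds the target.
  const kept: Message[] = [];
  let total = 0;
  for (const message of [...messages].reverse()) {
    if (kept.length > 0 && total + message.tokens > TARGET_TOKENS) break;
    kept.unshift(message);
    total += message.tokens;
  }
  return kept;
}
```

Keeping the newest contiguous run of messages is equivalent to dropping the oldest ones until the total falls to the target, so recent context survives a handoff intact.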
 
## Testing
 
```bash
# Run all tests with coverage
pnpm test
 
# Run tests with verbose output
pnpm vitest run --reporter=verbose
 
# Run type checker
pnpm typecheck
 
# Run linter
pnpm lint
```
 
## License
 
MIT — see [LICENSE](./LICENSE).