Vertex AI Multi-Agent Handoff for SMB Field Service Dispatch
Route incoming field service requests to specialist AI agents for scheduling, inventory, and billing, with confidence-based human fallback and spend-aware model selection on Vertex AI.
SMB field service dispatchers juggle multiple systems (booking, parts lookup, invoicing) and often lose context when transferring a customer. A single generic chatbot cannot handle domain-specific logic, leading to misrouted requests and missed up-sells.
A complete, working implementation of this recipe is available as a companion artifact: download it as a zip or browse it file by file. It is generated by our build pipeline and tested with full coverage before publishing.
This tutorial walks you through building a multi-agent handoff system for SMB field service dispatch on Vertex AI. You’ll create a Next.js application that routes incoming customer requests to specialist AI agents for scheduling, inventory, and billing, with confidence-based human fallback and spend-aware model selection. The dispatch orchestrator uses the REAA handoff protocol to transfer conversations between agents, preserves session history across handoffs via session continuity, and caps per-agent spend with budget-aware model downgrades. By the end, you’ll have a working dispatch API backed by Gemini models on Vertex AI.
Prerequisites
Node.js 22+ and pnpm 10 installed.
A Google Cloud project with the Vertex AI API enabled. You need your project ID and a default location (usually us-central1).
A Langfuse account (optional) for LLM observability tracing.
Basic familiarity with TypeScript, Next.js App Router, and REST APIs.
Step 1: Scaffold the project and install dependencies
Create a new Next.js project with the App Router and install all required dependencies. The scaffold provides the correct tsconfig.json, next.config.ts, vitest.config.ts, and linting configs, so you won’t touch root configs.
Expected output: pnpm install completes without errors, and pnpm typecheck exits 0.
Step 2: Configure environment variables
Create a .env.example file with placeholders for every service the application connects to. The application reads these at runtime via process.env.
env
# Env vars used by vertex-ai-multi-agent-handoff-for-smb-field-service-dispatch.
# The builder adds entries here as it wires up each integration.
# Keep placeholders only; never commit real values.
NODE_ENV=development

# GCP Vertex AI
GOOGLE_CLOUD_PROJECT=<your-gcp-project-id>
GOOGLE_CLOUD_LOCATION=us-central1

# Langfuse (optional, for external LLM observability)
LANGFUSE_PUBLIC_KEY=<your-langfuse-public-key>
LANGFUSE_SECRET_KEY=<your-langfuse-secret-key>

# External webhook for streaming handoff events (optional)
DISPATCH_WEBHOOK_URL=<optional-webhook-url>
Copy it to .env and fill in your real values:
terminal
cp .env.example .env
Expected output: The .env file exists with your GCP project ID and location set.
Step 3: Define the core types
Create src/lib/types.ts with the discriminated unions and interfaces that model the dispatch domain. These types are shared across every service.
DispatchResponse is a discriminated union — the type field tells callers whether the system routed the request, needs clarification, or fell back to a human. AgentType is a string union of exactly three specialist roles. Zod schemas provide runtime validation for API inputs.
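To make the discriminated-union idea concrete, here is a minimal sketch of how such types might look. The variant name "ROUTED" and every field name are assumptions for illustration (only CLARIFY and FALLBACK are named elsewhere in this tutorial), and the hand-written guard stands in for the Zod schemas, which are not reproduced here.

```typescript
// Hypothetical shapes: the real definitions live in src/lib/types.ts.
type AgentType = "scheduling" | "inventory" | "billing";

type DispatchResponse =
  | { type: "ROUTED"; agent: AgentType; reply: string; confidence: number }
  | { type: "CLARIFY"; question: string }
  | { type: "FALLBACK"; reason: string };

// Switching on `type` narrows the union, so each branch sees only its own fields.
function describeResponse(res: DispatchResponse): string {
  switch (res.type) {
    case "ROUTED":
      return `Routed to ${res.agent} (confidence ${res.confidence.toFixed(2)})`;
    case "CLARIFY":
      return `Need more detail: ${res.question}`;
    case "FALLBACK":
      return `Escalated to a human: ${res.reason}`;
    default: {
      // Compile-time exhaustiveness check: a new variant must be handled above.
      const unreachable: never = res;
      return unreachable;
    }
  }
}

// Minimal runtime guard, a stand-in for schema-based validation of API input.
function isAgentType(value: unknown): value is AgentType {
  return value === "scheduling" || value === "inventory" || value === "billing";
}
```

Callers that switch on type get a compile error when a new variant is added but not handled, which is the main reason to prefer a discriminated union over optional fields.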
Expected output: pnpm typecheck exits 0 with the new file.
Step 4: Create the in-memory adapters for spend tracking and session storage
The REAA packages expect concrete implementations for storage and token counting. You’ll write lightweight in-memory adapters that implement the same interfaces.
Spend store (src/lib/spend-store.ts)
This adapter tracks cumulative spend per scope key and is consumed by BudgetController from @reaatech/agent-budget-engine.
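A spend store of this kind can be very small. The sketch below assumes a hypothetical interface (record, getSpend, reset) and a USD unit; the actual method names must match whatever BudgetController expects, which this tutorial does not show.

```typescript
// Hypothetical adapter shape: method names and units are assumptions.
class InMemorySpendStore {
  private totals = new Map<string, number>();

  // Add a cost to the running total for a scope key such as "agent:billing".
  record(scopeKey: string, costUsd: number): number {
    if (!Number.isFinite(costUsd) || costUsd < 0) {
      throw new RangeError("costUsd must be a non-negative finite number");
    }
    const next = (this.totals.get(scopeKey) ?? 0) + costUsd;
    this.totals.set(scopeKey, next);
    return next;
  }

  // Unknown scopes report zero spend rather than throwing.
  getSpend(scopeKey: string): number {
    return this.totals.get(scopeKey) ?? 0;
  }

  reset(scopeKey: string): void {
    this.totals.delete(scopeKey);
  }
}

// Factory function, mirroring the factory pattern the other adapters use.
function createSpendStore(): InMemorySpendStore {
  return new InMemorySpendStore();
}
```

Because the totals live in process memory, they reset on every restart and are not shared across instances, which is fine for local development only.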
Session storage (src/lib/session-storage.ts)
This adapter implements the IStorageAdapter interface from @reaatech/session-continuity using Map-backed in-memory storage. It supports the full CRUD lifecycle for sessions and messages, including version conflict checking on updates. The code below shows only the start of the class; the full implementation with all methods (updateMessage, deleteMessage, deleteAllMessages, getExpiredSessions, health, close, and getNextSequence) is included in the companion artifact.
ts
import type {
  IStorageAdapter,
  Session,
  SessionId,
  Message,
  MessageId,
  MessageQueryOptions,
  SessionFilters,
  UpdateSessionOptions,
  HealthStatus,
} from "@reaatech/session-continuity";

export class InMemorySessionStorage implements IStorageAdapter {
  private sessions = new Map<string, Session>();
  private messages = new Map<string, Message[]>();

  createSession(session: Omit<Session, "id" | "createdAt" |
Token counter (src/lib/token-counter.ts)
SimpleTokenCounter implements the TokenCounter interface with character-based estimation, roughly 1 token per 4 characters, for session budget management.
ts
import type { TokenCounter, Message } from "@reaatech/session-continuity";

class SimpleTokenCounter implements TokenCounter {
  readonly model = "simple-char-based";
  readonly tokenizer = "character";

  count(text: string): number {
    return Math.ceil(text.length / 4);
  }

  countTokens(text: string): number {
    return this.count(text);
  }

  countMessages(messages: Message[]): number {
    let total = 0;
    for (const msg of messages) {
      if (typeof msg.content === "string") {
        total += this.countTokens(msg.content);
      } else {
        for (const part of msg.content) {
          if (part.type === "text") total += this.countTokens(part.text);
        }
      }
    }
    return total;
  }
}

export function createSimpleTokenizer(): TokenCounter {
  return new SimpleTokenCounter();
}
Each adapter exports a factory function so services can instantiate them without depending on constructor internals.
Expected output: All three files compile under pnpm typecheck, with no type errors from the REAA interfaces.
Step 5: Build the Vertex AI client
Create src/lib/vertex-client.ts — the thin wrapper around @google-cloud/vertexai that your application uses to call Gemini models. It exposes a VertexClient interface with generateContent (non-streaming) and generateContentStream methods. Helper functions extract text and token counts from Vertex AI responses.
ts
// NOTE: @google-cloud/vertexai is deprecated since June 2025.
// The recipe pins it explicitly per the package list. For new projects,
// use @google/genai instead.
import { VertexAI } from "@google-cloud/vertexai";

export class VertexApiError extends Error {
  constructor(message: string, public readonly statusCode?: number) {
    super(message);
    this.name = "VertexApiError";
  }
}

export interface GenerateContentResult {
  text: string;
  inputTokens: number;
  outputTokens: number;
}
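The helper functions that extract text and token counts are not shown above, so here is a sketch of what they might look like. The response fields read below (candidates, content.parts, usageMetadata.promptTokenCount, usageMetadata.candidatesTokenCount) follow the Vertex AI response shape; the function names extractText and toResult are assumptions, and the local VertexResponseLike type is a structural stand-in so the sketch runs without the SDK.

```typescript
// Structural subset of a Vertex AI generateContent response: only the fields read below.
interface VertexResponseLike {
  candidates?: Array<{ content?: { parts?: Array<{ text?: string }> } }>;
  usageMetadata?: { promptTokenCount?: number; candidatesTokenCount?: number };
}

interface GenerateContentResult {
  text: string;
  inputTokens: number;
  outputTokens: number;
}

// Concatenate the text parts of the first candidate; non-text parts contribute nothing.
function extractText(res: VertexResponseLike): string {
  const parts = res.candidates?.[0]?.content?.parts ?? [];
  return parts.map((part) => part.text ?? "").join("");
}

// Missing usage metadata falls back to zero instead of throwing.
function toResult(res: VertexResponseLike): GenerateContentResult {
  return {
    text: extractText(res),
    inputTokens: res.usageMetadata?.promptTokenCount ?? 0,
    outputTokens: res.usageMetadata?.candidatesTokenCount ?? 0,
  };
}
```

Defaulting to zero keeps the caller simple, but it also means a response without usage metadata records no spend, so a production version may prefer to estimate tokens in that case.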
Model executor for LLM Router
Create src/lib/vertex-model-executor.ts — the callback that LLMRouter calls to actually execute a model after selecting it. This adapter bridges the VertexClient interface to the executeModel signature expected by LLMRouter.fromConfig().
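The adapter itself is a thin function. The sketch below is one way it might look; both the ExecuteModel callback shape and the generateContent(model, prompt) signature are assumptions made for illustration, so check the typings of LLMRouter.fromConfig() and your own VertexClient before copying it.

```typescript
interface GenerateContentResult {
  text: string;
  inputTokens: number;
  outputTokens: number;
}

// Hypothetical client surface: the real VertexClient may take different arguments.
interface VertexClientLike {
  generateContent(model: string, prompt: string): Promise<GenerateContentResult>;
}

// Hypothetical callback signature expected by the router.
type ExecuteModel = (req: { modelId: string; prompt: string }) => Promise<{
  output: string;
  usage: { inputTokens: number; outputTokens: number };
}>;

// Bridge: the router picks modelId, the client performs the call,
// and the result is reshaped into what the router expects back.
function createVertexModelExecutor(client: VertexClientLike): ExecuteModel {
  return async ({ modelId, prompt }) => {
    const result = await client.generateContent(modelId, prompt);
    return {
      output: result.text,
      usage: { inputTokens: result.inputTokens, outputTokens: result.outputTokens },
    };
  };
}
```

Keeping the adapter free of routing logic means it can be tested with a fake client and no network access.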
Expected output: pnpm typecheck passes. The VertexClient interface is used throughout the service layer.
Step 6: Configure the handoff protocol
Create src/lib/handoff-config.ts — the central handoff configuration using @reaatech/agent-handoff. This defines default routing thresholds.
ts
import { createHandoffConfig, defaultHandoffConfig } from "@reaatech/agent-handoff";
import type { HandoffConfig } from "@reaatech/agent-handoff";

export { defaultHandoffConfig };

export const handoffConfig: HandoffConfig = createHandoffConfig({
  routing: { minConfidenceThreshold: 0.6 },
});
The handoffConfig object sets the minimum confidence level for automatic routing.
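To illustrate how a threshold like this can gate routing, here is a small standalone sketch. It does not use the library's own routing logic: the decideRouting function, the Classification shape, and especially the 0.3 clarify floor are assumptions, since the tutorial only specifies the 0.6 minimum.

```typescript
type AgentType = "scheduling" | "inventory" | "billing";

interface Classification {
  agent: AgentType | null;
  confidence: number; // expected in the range 0..1
}

type RoutingDecision =
  | { kind: "ROUTE"; agent: AgentType }
  | { kind: "CLARIFY" }
  | { kind: "FALLBACK" };

const MIN_CONFIDENCE = 0.6; // mirrors routing.minConfidenceThreshold
const CLARIFY_FLOOR = 0.3; // hypothetical: not specified by the tutorial

function decideRouting(
  c: Classification,
  minConfidence: number = MIN_CONFIDENCE,
  clarifyFloor: number = CLARIFY_FLOOR,
): RoutingDecision {
  // Confident and attributable to a specialist: route automatically.
  if (c.agent !== null && c.confidence >= minConfidence) {
    return { kind: "ROUTE", agent: c.agent };
  }
  // Plausible but uncertain: ask the customer a follow-up question.
  if (c.confidence >= clarifyFloor) return { kind: "CLARIFY" };
  // Too uncertain to guess: hand the conversation to a human.
  return { kind: "FALLBACK" };
}
```

Raising the minimum sends more traffic to clarification or human fallback; lowering it increases automatic routing at the cost of more misroutes.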
Expected output: pnpm typecheck passes without errors.
Step 7: Build the service layer
The service layer contains six independent services, each wrapping a REAA package. They’re composed together in the DispatchService orchestrator in the next step.
Budget Service (src/services/budget-service.ts)
Budgets are defined with a soft cap at 80% (triggers a warning) and a hard cap at 100% (blocks further requests). The auto-downgrade policy swaps gemini-2.5-pro for gemini-2.5-flash when a budget approaches its limit.
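The cap arithmetic can be sketched in a few lines. This is not the BudgetController implementation: budgetStatus and selectModel are hypothetical helpers, and the sketch assumes that the downgrade triggers exactly at the soft cap, which the tutorial does not state.

```typescript
type BudgetStatus = "ok" | "warning" | "blocked";

const SOFT_CAP_RATIO = 0.8; // warning threshold
const HARD_CAP_RATIO = 1.0; // blocking threshold

function budgetStatus(spentUsd: number, limitUsd: number): BudgetStatus {
  if (limitUsd <= 0) return "blocked"; // a zero or negative limit allows nothing
  const ratio = spentUsd / limitUsd;
  if (ratio >= HARD_CAP_RATIO) return "blocked";
  if (ratio >= SOFT_CAP_RATIO) return "warning";
  return "ok";
}

// Returns the model to use, or null when the hard cap blocks the request.
// Assumption: the downgrade kicks in as soon as the soft cap is reached.
function selectModel(preferred: string, spentUsd: number, limitUsd: number): string | null {
  const status = budgetStatus(spentUsd, limitUsd);
  if (status === "blocked") return null;
  if (status === "warning" && preferred === "gemini-2.5-pro") return "gemini-2.5-flash";
  return preferred;
}
```

For example, with a 10.00 USD limit, spend of 7.99 keeps gemini-2.5-pro, spend of 8.00 downgrades to gemini-2.5-flash, and spend of 10.00 blocks the request.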
Session Service (src/services/session-service.ts)
Wraps SessionManager from @reaatech/session-continuity to manage conversation state, compression, and handoff transitions.
The compression strategy uses a sliding window that keeps the conversation under 3,500 tokens while retaining at least the 5 most recent messages. When the budget overflows, older messages are compressed rather than dropped.
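The sliding-window idea can be illustrated without the library. The sketch below is an assumption-laden stand-in for what SessionManager does internally: compressHistory is a hypothetical function, token counts use the 4-characters-per-token estimate, and "compression" is reduced to a placeholder summary message where a real system would generate an actual summary.

```typescript
interface ChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
}

const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

const totalTokens = (msgs: ChatMessage[]): number =>
  msgs.reduce((sum, m) => sum + estimateTokens(m.content), 0);

function compressHistory(
  messages: ChatMessage[],
  maxTokens: number = 3500,
  minRecent: number = 5,
): ChatMessage[] {
  if (totalTokens(messages) <= maxTokens) return messages;

  // Always keep the most recent `minRecent` messages, even if they alone exceed the budget.
  let start = Math.max(0, messages.length - minRecent);
  let windowTokens = totalTokens(messages.slice(start));

  // Grow the window backwards while the next older message still fits.
  while (start > 0) {
    const candidate = messages[start - 1];
    if (!candidate) break;
    const cost = estimateTokens(candidate.content);
    if (windowTokens + cost > maxTokens) break;
    windowTokens += cost;
    start -= 1;
  }
  if (start === 0) return messages;

  // Replace everything before the window with one summary message instead of dropping it.
  // Note: the summary's own tokens are not counted against the budget in this sketch.
  const older = messages.slice(0, start);
  const summary: ChatMessage = {
    role: "system",
    content: `[Summary of ${older.length} earlier messages]`,
  };
  return [summary, ...messages.slice(start)];
}
```

With ten messages of about 500 tokens each, the window keeps the last seven (3,500 tokens) and folds the first three into the summary line.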
Handoff Service (src/services/handoff-service.ts)
Wraps TypedEventEmitter, withRetry, and pickDefined from @reaatech/agent-handoff to orchestrate agent-to-agent handoffs with retry logic and event emission.
withRetry wraps the handoff operation with exponential backoff (up to 3 retries, doubling delay from 100ms to 5 seconds). pickDefined constructs the handoff payload with only non-undefined keys. Three lifecycle events are emitted so external loggers can observe transitions.
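For readers who want to see the mechanics, the following sketch re-implements both ideas from scratch. It is not the code of withRetry or pickDefined from @reaatech/agent-handoff; the names backoffDelays, retryWithBackoff, and pickDefinedSketch are hypothetical, and the defaults simply mirror the numbers quoted above (3 retries, 100 ms base, 5 second cap).

```typescript
// Delay before each retry: base * 2^i, capped. Defaults give [100, 200, 400].
function backoffDelays(maxRetries: number = 3, baseMs: number = 100, capMs: number = 5000): number[] {
  return Array.from({ length: maxRetries }, (_, i) => Math.min(baseMs * 2 ** i, capMs));
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// One initial attempt plus one retry per delay; rethrows the last error when exhausted.
async function retryWithBackoff<T>(
  op: () => Promise<T>,
  delays: number[] = backoffDelays(),
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= delays.length; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      const delay = delays[attempt];
      if (delay === undefined) break; // no retries left
      await sleep(delay);
    }
  }
  throw lastError;
}

// Copy only the keys whose values are not undefined (null is kept).
function pickDefinedSketch<T extends Record<string, unknown>>(obj: T): Partial<T> {
  const out: Partial<T> = {};
  for (const key of Object.keys(obj) as Array<keyof T>) {
    if (obj[key] !== undefined) out[key] = obj[key];
  }
  return out;
}
```

Injecting sleep as a parameter lets tests run the retry loop instantly, and stripping undefined keys keeps the handoff payload free of fields that would otherwise serialize inconsistently.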
Webhook Logger (src/services/webhook-logger.ts)
Streams handoff events and dispatch requests to an external webhook and optional Langfuse tracing.
Expected output: Each service file type-checks independently. Run pnpm typecheck to confirm.
Step 8: Build the dispatch orchestrator
Create src/services/dispatch-service.ts — the main orchestrator that composes all six services and implements the end-to-end request lifecycle.
ts
import { ConfidenceClassifier } from "./confidence-classifier.js";
import { ModelRouterService, ModelSelectionError } from "./model-router-service.js";
import { BudgetService, BudgetExceededError } from "./budget-service.js";
import { SessionService, SessionNotFoundError } from "./session-service.js";
import { HandoffService } from "./handoff-service.js";
import { WebhookLoggerService } from "./webhook-logger.js";
import type { VertexClient } from "../lib/vertex-client.js";
import type { AgentType, DispatchRequest, DispatchResponse, HandoffEvent } from "../lib/types.js";

export class DispatchService {
  private confidenceClassifier: ConfidenceClassifier;
The processDispatch method implements a ten-step pipeline: session resolution, message storage, intent classification, CLARIFY/FALLBACK early returns, model selection, budget check, agent handoff, LLM execution, response storage and spend recording, and event logging. Every error path returns a typed FALLBACK response.
Expected output: pnpm typecheck passes for all service files.
Step 9: Create the Next.js API routes
Wire the DispatchService into Next.js App Router route handlers at app/api/dispatch/route.ts and app/api/health/route.ts.
Dispatch route (app/api/dispatch/route.ts)
ts
import { type NextRequest, NextResponse } from "next/server";
import { DispatchRequestSchema } from "../../../src/lib/types.js";
import { ConfidenceClassifier } from "../../../src/services/confidence-classifier.js";
import { ModelRouterService } from "../../../src/services/model-router-service.js";
import { BudgetService } from "../../../src/services/budget-service.js";
import { SessionService } from "../../../src/services/session-service.js";
import { HandoffService } from "../../../src/services/handoff-service.js";
import { WebhookLoggerService } from "../../../src/services/webhook-logger.js";
import { DispatchService } from "../../../src/services/dispatch-service.js";
import { createSessionStorage } from "../../../src/lib/session-storage.js";
import { createSimpleTokenizer } from "../../../src/lib/token-counter.js";
import { createVertexClient } from "../../../src/lib/vertex-client.js";

const sessionStorage = createSessionStorage();
const tokenizer = createSimpleTokenizer();
const vertexClient = createVertexClient();
const confidenceClassifier = new ConfidenceClassifier();
const modelRouterService = new ModelRouterService();
const budgetService = await BudgetService.create();
const sessionService = new SessionService(sessionStorage, tokenizer);
const handoffService = new HandoffService();
const webhookLoggerService = new WebhookLoggerService();

const dispatchService = new DispatchService(
  confidenceClassifier,
  modelRouterService,
  budgetService,
  sessionService,
  handoffService,
  webhookLoggerService,
  vertexClient,
);

export async function POST(req: NextRequest) {
  try {
    const body = await req.json() as Record<string, unknown>;
    const parsed = DispatchRequestSchema.safeParse(body);
    if (!parsed.success) {
      return NextResponse.json({ error: "Invalid request", details: parsed.error.issues }, { status: 400 });
    }
    const result = await dispatchService.processDispatch(parsed.data);
    return NextResponse.json(result);
  } catch (err) {
    return NextResponse.json({ error: err instanceof Error ? err.message : "Unknown error" }, { status: 500 });
  }
}

export function GET() {
  return NextResponse.json({ status: "dispatch API ready" });
}
Health route (app/api/health/route.ts)
ts
import { NextResponse } from "next/server";

export function GET() {
  return NextResponse.json({ status: "ok", agents: ["scheduling", "inventory", "billing"] });
}
Route handlers use NextRequest and NextResponse.json() rather than bare Request/Response, so every response is sent with a Content-Type: application/json header.
Expected output: pnpm dev starts without errors. curl http://localhost:3000/api/health returns {"status":"ok","agents":["scheduling","inventory","billing"]}.
Step 10: Create the LangGraph state machine
Create src/graph/agent-state-graph.ts — a LangGraph-based multi-agent state machine that provides structural orchestration alongside the REAA handoff protocol.
ts
import { StateGraph, MessagesAnnotation, Annotation } from "@langchain/langgraph";
import { AIMessage } from "@langchain/core/messages";
import type { AgentType } from "../lib/types.js";

const AgentState = Annotation.Root({
  messages: MessagesAnnotation.spec.messages,
  currentAgent: Annotation<AgentType | null>({
    reducer: (left?: AgentType | null, right?: AgentType | null) => right ?? left ?? null,
    default: () => null,
  }),
});
The graph routes the __start__ node to the dispatch node, which classifies the intent and sets currentAgent. A conditional edge then routes to the matching specialist node (scheduling, inventory, or billing) or ends the graph. Each specialist node uses dynamic import() so Node-only modules don’t cause Edge runtime failures. The safe state.messages.length > 0 guard prevents undefined access when the state starts empty.
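The routing logic described here boils down to two small pure functions, sketched below without importing LangGraph. The names GraphStateLike, latestMessage, and routeAfterDispatch are hypothetical, and messages are simplified to strings; the "__end__" string matches the value of LangGraph's END constant.

```typescript
type AgentType = "scheduling" | "inventory" | "billing";

// Simplified stand-in for the graph state; real messages are LangChain message objects.
interface GraphStateLike {
  messages: string[];
  currentAgent: AgentType | null;
}

const END_NODE = "__end__"; // same string value as LangGraph's END constant

// Same reducer as in the state definition: the newest defined value wins.
// Note that a null or undefined update keeps the previous agent rather than clearing it.
const reduceCurrentAgent = (
  left?: AgentType | null,
  right?: AgentType | null,
): AgentType | null => right ?? left ?? null;

// Guarded access: returns undefined instead of indexing into an empty array.
function latestMessage(state: GraphStateLike): string | undefined {
  return state.messages.length > 0 ? state.messages[state.messages.length - 1] : undefined;
}

// Conditional-edge function: go to the specialist node named by currentAgent, else end.
function routeAfterDispatch(state: GraphStateLike): AgentType | typeof END_NODE {
  return state.currentAgent ?? END_NODE;
}
```

In the real graph, a function like routeAfterDispatch would be passed to addConditionalEdges on the dispatch node, so the returned string selects the next node by name.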
Expected output: pnpm typecheck passes.
Step 11: Create the Express webhook server and instrumentation
Express server (src/server.ts)
A standalone Express server on port 3001 that receives external webhook payloads and forwards them to the logger service.
ts
import express from "express";
import type { Request, Response } from "express";
import { WebhookPayloadSchema } from "./lib/types.js";
import { WebhookLoggerService } from "./services/webhook-logger.js";

const expressApp = express();
expressApp.use(express.json());

expressApp.post("/webhook", (req: Request, res: Response) => {
  const parsed = WebhookPayloadSchema.safeParse(req.body);
  if (!parsed.success) {
    res.status(400).json({ error: "Invalid webhook payload" });
    return;
  }
  const logger = new WebhookLoggerService();
  // Fire-and-forget: the response does not wait for logging to finish.
  void logger.logHandoffEvent({
    type: "handoff_completed",
    sessionId: parsed.data.sessionId,
    fromAgent: null,
    toAgent: "scheduling" as const,
    timestamp: parsed.data.timestamp,
    metadata: { event: parsed.data.event, data: parsed.data.data },
  });
  res.json({ received: true });
});

expressApp.get("/health", (_req: Request, res: Response) => {
  res.json({ status: "ok", uptime: process.uptime() });
});

export { expressApp as app };

export function startWebhookServer(port?: number): Promise<void> {
  // Resolve the port once so the log line reports the port actually bound
  // (the original logged 3001 even when PORT overrode it).
  const resolvedPort = port ?? parseInt(process.env.PORT ?? "3001", 10);
  return new Promise((resolve) => {
    expressApp.listen(resolvedPort, () => {
      console.log(`Webhook server listening on port ${String(resolvedPort)}`);
      resolve();
    });
  });
}
Instrumentation (src/instrumentation.ts)
The Next.js register() function starts the webhook server and initializes services at boot.
ts
export async function register() {
  if (process.env.NEXT_RUNTIME === "nodejs") {
    const { initializeApp } = await import("./init.js");
    await initializeApp();
  }
}
Init module (src/init.ts)
ts
import { CallbackHandler } from "langfuse-langchain";

let initialized = false;

export async function initializeApp(): Promise<void> {
  if (initialized) return;
  initialized = true;

  if (process.env.DISPATCH_WEBHOOK_URL) {
    const { startWebhookServer } = await import("./server.js");
    await startWebhookServer();
  }

  if (process.env.LANGFUSE_PUBLIC_KEY && process.env.LANGFUSE_SECRET_KEY) {
    try {
      new CallbackHandler({
        publicKey: process.env.LANGFUSE_PUBLIC_KEY,
        secretKey: process.env.LANGFUSE_SECRET_KEY,
      });
      console.log("[init] Langfuse tracing configured via CallbackHandler");
    } catch (err) {
      console.warn("[init] Failed to initialize Langfuse CallbackHandler:", err instanceof Error ? err.message : String(err));
    }
  }

  const { BudgetService } = await import("./services/budget-service.js");
  await BudgetService.create();
  console.log("[init] Vertex AI Multi-Agent Handoff initialized");
}
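On Next.js 14 and earlier, the experimental instrumentationHook flag in the Next.js config is mandatory: without it, register() in src/instrumentation.ts is dead code and never executes. The exact key is instrumentationHook, not clientInstrumentationHook or instrumentation. Next.js 15 made instrumentation stable and no longer needs the flag, so check the Next.js version pinned in package.json.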
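For reference, the flag sits under the experimental key of the Next.js config. The fragment below is a sketch of how that might look, assuming a Next.js release that still treats instrumentation as experimental (14 and earlier); Next.js 15 made instrumentation stable, so verify against the version the scaffold pins before adding it. The scaffold's own next.config.ts may contain other settings not shown here.

```typescript
// next.config.ts (sketch): enables register() in src/instrumentation.ts
// on Next.js releases that gate instrumentation behind an experimental flag.
const nextConfig = {
  experimental: {
    instrumentationHook: true,
  },
};

export default nextConfig;
```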
Step 12: Run the tests
The test suite uses Vitest with MSW for HTTP mocking. The test setup at tests/setup.ts provides an MSW server that intercepts Vertex AI API calls and returns mock responses.
Run the full test suite using the script defined in package.json:
terminal
pnpm test
The test suite covers:
Unit tests for each in-memory adapter (spend-store, session-storage, token-counter).
Service tests for each REAA wrapper (confidence-classifier, model-router-service, budget-service, session-service, handoff-service).
Integration tests for the dispatch service orchestrator.
API route tests for both POST and GET endpoints with validation and error paths.
Each test file uses vi.mock() to mock external REAA packages and @google-cloud/vertexai so no live HTTP calls are made.
Expected output: All tests pass with numFailedTests=0 and coverage thresholds of 90% or higher on lines, branches, functions, and statements for runtime code in src/ and app/**/route.ts.
Next steps
Replace the in-memory adapters with production storage: swap InMemorySessionStorage for Redis or PostgreSQL via the IStorageAdapter interface, and replace SpendStore with a persistent budget tracker.
Extend the agent roster — add a parts-ordering or customer-followup agent by defining new AgentCapability entries and adding corresponding nodes to the LangGraph state machine.
Add real-time streaming — wire generateContentStream through a Server-Sent Events endpoint so the dispatch UI can show token-by-token responses during handoff.
Deploy to Google Cloud Run — containerize the Next.js application, configure Vertex AI service account credentials via Workload Identity, and scale based on dispatch request volume.