E-commerce support teams hosting cost-effective vLLM models find it hard to coordinate multiple specialist agents; misrouted questions cause customer frustration and agent loops.
This tutorial walks you through building a multi-agent e-commerce support routing system using vLLM for model serving, LangGraph for state machine orchestration, and the REAA agent-handoff protocol for compression and routing. You will create three specialist agents — Product, Order, and Returns — that share a single vLLM endpoint. Incoming customer messages go through a LangGraph workflow that routes to the best-fit agent, compresses context when it exceeds token limits, and persists conversation state in Upstash Redis with automatic retry on transient failures.
Prerequisites
Node.js 22+ and pnpm 10+ installed
An Upstash Redis account (free tier works) — you will need the REST URL and token
A running vLLM server with an OpenAI-compatible endpoint (or access to one) hosting at least one chat model
Basic familiarity with Next.js App Router, TypeScript, and LangGraph concepts
Step 1: Scaffold the project and install dependencies
Start from an empty directory and create the package.json with all dependencies exact-pinned. The project uses Next.js 16 with the App Router, Zod for configuration validation, LangGraph for orchestration, the AI SDK for vLLM communication, Upstash Redis for session persistence, p-retry for resilience, and four REAA agent-handoff packages for routing, compression, and protocol orchestration.
Expected output: pnpm resolves all dependencies and creates pnpm-lock.yaml and node_modules/. Every version string is bare semver — no ^ or ~ prefixes.
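The exact-pinning rule can be checked mechanically. The sketch below is a hypothetical helper (`findUnpinned` is not part of the recipe, and the package names and versions are placeholders) that scans a dependency map and reports any version that is not bare semver.

```typescript
// Hypothetical helper, not part of the recipe: flags dependency versions
// that are not bare semver (for example "^1.2.3" or "~1.2.3").
type DependencyMap = Record<string, string>;

// MAJOR.MINOR.PATCH with optional pre-release and build suffixes.
const EXACT_SEMVER = /^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$/;

export function findUnpinned(deps: DependencyMap): string[] {
  return Object.entries(deps)
    .filter(([, version]) => !EXACT_SEMVER.test(version))
    .map(([name, version]) => `${name}@${version}`);
}

// Placeholder input (illustrative names and versions, not the recipe's real pins).
const sampleDeps: DependencyMap = {
  "pkg-a": "1.2.3",
  "pkg-b": "^4.0.0",
  "pkg-c": "~0.9.1",
};

// Only the two range-prefixed entries are reported.
export const unpinned = findUnpinned(sampleDeps);
```

Running a check like this against the `dependencies` and `devDependencies` objects of package.json would catch a stray `^` before it reaches the lockfile.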
Step 2: Create the configuration schema and shared types
Create src/lib/config.ts — a Zod-validated config object that pulls every setting from process.env. This is the single source of truth for all environment variables in the application.
ts
import { z } from "zod";

export const configSchema = z.object({
  VLLM_BASE_URL: z.string().min(1),
  VLLM_API_KEY: z.string().min(1),
  PRODUCT_AGENT_MODEL: z.string().min(1),
  ORDER_AGENT_MODEL: z.string().min(1),
  RETURNS_AGENT_MODEL: z.string().min(1),
  UPSTASH_REDIS_REST_URL: z.string().min(1),
  UPSTASH_REDIS_REST_TOKEN: z.string().min(1),
  HANDOFF_CONFIDENCE_THRESHOLD: z.coerce.number().min(0).max(1).default(0.7),
  HANDOFF_AMBIGUITY_THRESHOLD: z.coerce.number().min(0).max(1).default(0.15),
  HANDOFF_MAX_ALTERNATIVES: z.coerce.number().int().min(1).default(3),
  CONTEXT_TOKEN_BUDGET: z.coerce.number().int().min(0).default(4000),
  PRESERVE_RECENT_MESSAGES: z.coerce.number().int().min(0).default(3),
});

export type AppConfig = z.infer<typeof configSchema>;

export function loadConfig(): AppConfig {
  const result = configSchema.safeParse(process.env);
  if (!result.success) {
    const missing = result.error.issues.map((i) => i.path.join(".")).join(", ");
    throw new Error(`Configuration validation failed: ${missing}`);
  }
  return result.data;
}
Now create src/lib/types.ts with the shared Zod schemas and TypeScript types used across all modules:
Expected output: Both files compile without type errors. ChatRequestSchema enforces that message is a non-empty string, and configSchema coerces numeric env vars from strings with sensible defaults.
Step 3: Set up environment variables
Create .env.example with placeholder entries for every variable your code reads. Never commit real values.
env
# Env vars used by vllm-multi-agent-handoff-for-e-commerce-support-routing.
# The builder adds entries here as it wires up each integration.
# Keep placeholders only; never commit real values.
NODE_ENV=development

# vLLM endpoint (OpenAI-compatible API)
VLLM_BASE_URL=<https://your-vllm-server.example.com/v1>
VLLM_API_KEY=<your-api-key>

# Per-agent model IDs hosted on vLLM
PRODUCT_AGENT_MODEL=<model-name>
ORDER_AGENT_MODEL=<model-name>
RETURNS_AGENT_MODEL=<model-name>

# Upstash Redis for session persistence
UPSTASH_REDIS_REST_URL=<your-upstash-redis-url>
UPSTASH_REDIS_REST_TOKEN=<your-upstash-redis-token>

# Handoff routing thresholds
HANDOFF_CONFIDENCE_THRESHOLD=0.7
HANDOFF_AMBIGUITY_THRESHOLD=0.15
HANDOFF_MAX_ALTERNATIVES=3

# Context compression
CONTEXT_TOKEN_BUDGET=4000
PRESERVE_RECENT_MESSAGES=3
Copy this to .env.local and fill in your actual values:
terminal
cp .env.example .env.local
Expected output: Every process.env.X referenced in your source code has a corresponding entry here. The optional numeric variables have defaults in the Zod schema, so you only need to provide the seven required vars (endpoints, keys, and model names).
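To see which of the seven required variables are still unset before starting the app, a small pre-flight check can help. This is a sketch only: `missingRequired` is a hypothetical helper that sits beside the Zod schema rather than replacing it, and it takes the environment as a plain object so it can be exercised without touching `process.env`.

```typescript
// The seven variables that have no default in the configuration schema.
const REQUIRED_VARS = [
  "VLLM_BASE_URL",
  "VLLM_API_KEY",
  "PRODUCT_AGENT_MODEL",
  "ORDER_AGENT_MODEL",
  "RETURNS_AGENT_MODEL",
  "UPSTASH_REDIS_REST_URL",
  "UPSTASH_REDIS_REST_TOKEN",
] as const;

type Env = Record<string, string | undefined>;

// Hypothetical helper: returns the names that are unset or empty,
// which are the cases a `z.string().min(1)` rule would also reject.
export function missingRequired(env: Env): string[] {
  return REQUIRED_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.length === 0;
  });
}

// With only the base URL set, the other six names are reported.
export const stillMissing = missingRequired({
  VLLM_BASE_URL: "http://localhost:8000/v1",
});
```

In the application itself you would pass `process.env` to the helper, for example from a startup script, and print the returned names.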
Step 4: Register the three e-commerce specialist agents
Create src/handoff/agents.ts — an agent registry powered by AgentRegistry from @reaatech/agent-handoff-routing. This module defines the capabilities of your Product, Order, and Returns specialists and exports functions to query them.
Expected output: getRegisteredAgents() returns an array of three agents. Each agent object has all ten required AgentCapabilities fields (agentId, agentName, skills, domains, maxConcurrentSessions, currentLoad, languages, specializations, availability, version).
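The registry module is not reproduced here, so the following is a dependency-free sketch of its shape rather than the real file. The ten field names come from the list above; the field types, the skill and domain strings, and the use of a plain `Map` in place of the library's `AgentRegistry` are all assumptions made for illustration.

```typescript
// Assumed shape: field names come from the tutorial, field types are guesses.
interface AgentCapabilities {
  agentId: string;
  agentName: string;
  skills: string[];
  domains: string[];
  maxConcurrentSessions: number;
  currentLoad: number;
  languages: string[];
  specializations: string[];
  availability: "available" | "busy" | "offline";
  version: string;
}

// Stand-in for the library's AgentRegistry: a plain Map keyed by agentId.
const registry = new Map<string, AgentCapabilities>();

function register(agent: AgentCapabilities): void {
  registry.set(agent.agentId, agent);
}

// Shared values so each registration only states what differs (illustrative numbers).
const shared = {
  maxConcurrentSessions: 10,
  currentLoad: 0,
  languages: ["en"],
  specializations: [] as string[],
  availability: "available" as const,
  version: "1.0.0",
};

register({ ...shared, agentId: "product-agent", agentName: "Product Specialist",
  skills: ["product-search", "pricing", "availability"], domains: ["products"] });
register({ ...shared, agentId: "order-agent", agentName: "Order Specialist",
  skills: ["order-status", "shipping-tracking", "payment-issues"], domains: ["orders"] });
register({ ...shared, agentId: "returns-agent", agentName: "Returns Specialist",
  skills: ["returns", "refunds", "exchanges"], domains: ["returns"] });

export function getRegisteredAgents(): AgentCapabilities[] {
  return Array.from(registry.values());
}

export function getAgentById(agentId: string): AgentCapabilities | undefined {
  return registry.get(agentId);
}
```

Keeping the registry behind two small query functions means the router never depends on how agents are stored.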
Step 5: Create the vLLM client adapter
Create src/handoff/vllm-client.ts — a thin wrapper around @ai-sdk/openai-compatible and the Vercel AI SDK. This module creates chat model instances pointing to your vLLM server and calls generateText to produce responses.
Expected output: chatWithAgent("product-model", [...], "You are a product specialist") calls the vLLM endpoint at VLLM_BASE_URL and returns an object with text (the model’s reply) and usage (token counts).
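To make the wire format concrete, here is a sketch that talks to an OpenAI-compatible `/chat/completions` endpoint directly instead of going through the AI SDK. It is a swapped-in illustration, not the recipe's adapter: `buildChatBody`, `parseChatResponse`, `chatOverHttp`, and the `FetchLike` type are hypothetical names, and the fetch function is injected so the code can be exercised with a fake.

```typescript
type Role = "system" | "user" | "assistant";
interface ChatMessage { role: Role; content: string }
interface ChatResult {
  text: string;
  usage: { promptTokens: number; completionTokens: number; totalTokens: number };
}

// Minimal structural type; the global fetch in Node 18+ satisfies it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// The system prompt is sent as the first message, ahead of the conversation.
export function buildChatBody(model: string, messages: ChatMessage[], systemPrompt: string): string {
  const system: ChatMessage = { role: "system", content: systemPrompt };
  return JSON.stringify({ model, messages: [system, ...messages] });
}

// Pulls the reply text and token counts out of a chat completion payload.
export function parseChatResponse(payload: unknown): ChatResult {
  const data = payload as {
    choices?: Array<{ message?: { content?: unknown } }>;
    usage?: { prompt_tokens?: number; completion_tokens?: number; total_tokens?: number };
  };
  const text = data.choices?.[0]?.message?.content;
  if (typeof text !== "string") throw new Error("Unexpected chat completion payload");
  return {
    text,
    usage: {
      promptTokens: data.usage?.prompt_tokens ?? 0,
      completionTokens: data.usage?.completion_tokens ?? 0,
      totalTokens: data.usage?.total_tokens ?? 0,
    },
  };
}

// baseUrl is expected to end in /v1, as in the .env example.
export async function chatOverHttp(
  fetchImpl: FetchLike, baseUrl: string, apiKey: string,
  model: string, messages: ChatMessage[], systemPrompt: string,
): Promise<ChatResult> {
  const response = await fetchImpl(`${baseUrl}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
    body: buildChatBody(model, messages, systemPrompt),
  });
  if (!response.ok) throw new Error(`vLLM request failed with status ${response.status}`);
  return parseChatResponse(await response.json());
}
```

Splitting request building and response parsing into pure functions keeps most of the adapter testable without a running server.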
Step 6: Build the compression service
Create src/handoff/compression.ts — uses HybridCompressor from @reaatech/agent-handoff-compression with a SimpleTokenCounter to compress conversation history before handoff. This keeps context within vLLM token limits.
Expected output: compressBeforeHandoff(messages, 2000) returns a CompressedContext with summary, keyFacts, intents, entities, openItems, originalTokenCount, compressedTokenCount, and compressionRatio. If compression fails with a CompressionError, it falls back to a degraded summary that preserves all messages as-is.
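The idea behind budget-driven compression can be shown without the library. The sketch below is not `HybridCompressor`: it assumes roughly four characters per token, treats truncation of older messages as a crude stand-in for summarization, and defines the compression ratio as compressed tokens divided by original tokens. `compressSketch` and `CompressedSketch` are hypothetical names.

```typescript
interface ChatMessage { role: "user" | "assistant" | "system"; content: string }

interface CompressedSketch {
  summary: string;
  recent: ChatMessage[];
  originalTokenCount: number;
  compressedTokenCount: number;
  compressionRatio: number;
}

// Assumption: about four characters per token. Real tokenizers differ.
export function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function countTokens(messages: ChatMessage[]): number {
  return messages.reduce((total, m) => total + estimateTokens(m.content), 0);
}

export function compressSketch(
  messages: ChatMessage[], tokenBudget: number, preserveRecent: number,
): CompressedSketch {
  const originalTokenCount = countTokens(messages);

  // Within budget: pass everything through untouched.
  if (originalTokenCount <= tokenBudget) {
    return { summary: "", recent: messages, originalTokenCount,
      compressedTokenCount: originalTokenCount, compressionRatio: 1 };
  }

  // Over budget: keep the last `preserveRecent` messages verbatim
  // and collapse everything older into a short summary string.
  const cut = Math.max(0, messages.length - preserveRecent);
  const older = messages.slice(0, cut);
  const recent = messages.slice(cut);
  const summary = older.map((m) => `${m.role}: ${m.content.slice(0, 40)}`).join(" | ");

  const compressedTokenCount = estimateTokens(summary) + countTokens(recent);
  return { summary, recent, originalTokenCount, compressedTokenCount,
    compressionRatio: compressedTokenCount / originalTokenCount };
}
```

Because recent messages are kept verbatim, the result can still exceed the budget when those messages are long; a production compressor would need a second pass for that case.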
Step 7: Implement the Redis-backed session store
Create src/handoff/session-store.ts — persists conversation state in Upstash Redis with p-retry for transient-failure resilience. Each session has a 30-minute TTL.
ts
import { Redis } from "@upstash/redis";
import pRetry, { AbortError } from "p-retry";
import type { SessionState } from "../lib/types.js";

const redisUrl = process.env.UPSTASH_REDIS_REST_URL;
const redisToken = process.env.UPSTASH_REDIS_REST_TOKEN;

const redis = new Redis({
  url: redisUrl ?? "",
  token: redisToken ?? "",
});

// Sessions expire after 30 minutes.
const SESSION_TTL_SECONDS = 1800;

export function isNonRetryableError(error: unknown): boolean {
  return error instanceof AbortError;
}

export async function createSession(sessionId: string, state: SessionState): Promise<void> {
  await pRetry(
    async () => {
      const result: unknown = await redis.setex(sessionId, SESSION_TTL_SECONDS, JSON.stringify(state));
      if (result !== "OK") {
        // AbortError tells p-retry to stop immediately instead of retrying.
        throw new AbortError(new Error("Non-retryable Redis error: " + JSON.stringify(result)));
      }
    },
    { retries: 3 },
  );
}

export async function getSession(sessionId: string): Promise<SessionState | null> {
  return pRetry(
    async () => {
      const raw: unknown = await redis.get(sessionId);
      if (raw === null || raw === undefined) return null;
      // The Upstash client deserializes JSON values by default, so `raw` may already be an object.
      return (typeof raw === "string" ? JSON.parse(raw) : raw) as SessionState;
    },
    { retries: 3 },
  );
}

export async function updateSession(sessionId: string, state: SessionState): Promise<void> {
  await pRetry(
    async () => {
      // Pass `ex` so the update refreshes the TTL; a plain SET would clear it.
      await redis.set(sessionId, JSON.stringify(state), { ex: SESSION_TTL_SECONDS });
    },
    { retries: 3 },
  );
}

export async function deleteSession(sessionId: string): Promise<void> {
  await redis.del(sessionId);
}
Expected output: createSession("s1", state) writes to Redis with a 1800-second TTL. getSession("s1") retrieves and parses it. If the Redis setex call returns something other than "OK", the AbortError stops retries immediately instead of wasting retry budget.
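The difference between a retryable failure and an abort is easier to see in a stripped-down loop. This sketch is not p-retry: `retry` and `abortRetry` are hypothetical stand-ins, and the loop omits the backoff delay that a real retry library applies between attempts.

```typescript
// Marks an error as non-retryable, playing the role that AbortError plays in p-retry.
export function abortRetry(message: string): Error {
  const error = new Error(message);
  error.name = "AbortRetry";
  return error;
}

function isAbort(error: unknown): boolean {
  return error instanceof Error && error.name === "AbortRetry";
}

// Runs `task` once, then up to `retries` more times while it keeps failing.
export async function retry<T>(
  task: (attempt: number) => Promise<T>,
  retries: number,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= retries + 1; attempt++) {
    try {
      return await task(attempt);
    } catch (error) {
      // An abort skips the remaining attempts and surfaces immediately.
      if (isAbort(error)) throw error;
      lastError = error;
    }
  }
  throw lastError;
}
```

With `retries: 3`, a task that fails twice with an ordinary error and then succeeds runs three times in total, while a task that throws an abort on its first attempt runs exactly once.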
Step 8: Wire the confidence-based handoff router
Create src/handoff/router.ts — builds a CapabilityBasedRouter that scores agents on skill match (40%), domain match (30%), load factor (20%), and language match (10%).
Expected output: routeMessage(messages, registeredAgents, config) returns a RoutingDecision discriminated union with type "primary" (best agent found), "clarification" (ambiguous between two agents), or "fallback" (no suitable agent).
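The weighting can be worked through by hand. The sketch below applies the 40/30/20/10 split to a simplified agent shape and then turns the ranked scores into one of the three decision types. How the library actually applies the confidence and ambiguity thresholds is an assumption here, and `scoreAgent`, `decide`, `RoutingRequest`, and the sample data are hypothetical.

```typescript
interface Agent {
  agentId: string;
  skills: string[];
  domains: string[];
  languages: string[];
  currentLoad: number;
  maxConcurrentSessions: number;
}

interface RoutingRequest { skills: string[]; domain: string; language: string }

type Decision =
  | { type: "primary"; agentId: string; confidence: number }
  | { type: "clarification"; candidates: string[] }
  | { type: "fallback"; reason: string };

// Fraction of the required skills that the agent offers.
function overlap(required: string[], offered: string[]): number {
  if (required.length === 0) return 0;
  return required.filter((s) => offered.indexOf(s) !== -1).length / required.length;
}

export function scoreAgent(agent: Agent, request: RoutingRequest): number {
  const skill = overlap(request.skills, agent.skills);
  const domain = agent.domains.indexOf(request.domain) !== -1 ? 1 : 0;
  const load = agent.maxConcurrentSessions > 0
    ? Math.max(0, 1 - agent.currentLoad / agent.maxConcurrentSessions)
    : 0;
  const language = agent.languages.indexOf(request.language) !== -1 ? 1 : 0;
  return 0.4 * skill + 0.3 * domain + 0.2 * load + 0.1 * language;
}

// Assumed threshold logic: a low top score falls back, a narrow lead asks for clarification.
export function decide(
  agents: Agent[], request: RoutingRequest,
  confidenceThreshold = 0.7, ambiguityThreshold = 0.15,
): Decision {
  const ranked = agents
    .map((a) => ({ agentId: a.agentId, score: scoreAgent(a, request) }))
    .sort((a, b) => b.score - a.score);
  const best = ranked[0];
  if (best === undefined || best.score < confidenceThreshold) {
    return { type: "fallback", reason: "no agent met the confidence threshold" };
  }
  const runnerUp = ranked[1];
  if (runnerUp !== undefined && best.score - runnerUp.score < ambiguityThreshold) {
    return { type: "clarification", candidates: [best.agentId, runnerUp.agentId] };
  }
  return { type: "primary", agentId: best.agentId, confidence: best.score };
}

const idle = { languages: ["en"], currentLoad: 0, maxConcurrentSessions: 10 };
export const sampleAgents: Agent[] = [
  { ...idle, agentId: "product-agent", skills: ["pricing"], domains: ["products"] },
  { ...idle, agentId: "order-agent", skills: ["order-status"], domains: ["orders"] },
];

// The order agent matches on every factor; the product agent only on load and language.
export const orderDecision = decide(sampleAgents,
  { skills: ["order-status"], domain: "orders", language: "en" });
```

In the sample, the order agent scores close to 1.0 and the product agent close to 0.3 (load plus language only), so the gap is far wider than the ambiguity threshold and the decision is "primary".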
Step 9: Build the HandoffManager orchestrator
Create src/handoff/manager.ts — wires together the HandoffManager from @reaatech/agent-handoff-protocol with routing, compression, and a pass-through transport. This layer handles the full handoff lifecycle: compress → route → validate → transport → accept/reject → fallback.
ts
import { HandoffManager, createHandoffConfig, CapabilityBasedRouter, TransportFactory } from "@reaatech/agent-handoff-protocol";
import { HybridCompressor, SimpleTokenCounter } from "@reaatech/agent-handoff-compression";
import type { HandoffContext, HandoffResult, TransportLayer, HandoffResponse, TransportCapabilities } from "@reaatech/agent-handoff";

export function onHandoffStart({ handoffId, trigger }: { handoffId: string; trigger: { type: string } }) {
  console.log(`Handoff ${handoffId} started (${trigger.type})`);
}

export function onHandoffComplete({ handoffId, duration, receivingAgent }: { handoffId: string; duration: number; receivingAgent
Expected output: executeAgentHandoff(validContext) runs through the full handoff lifecycle and returns a HandoffResult. The four lifecycle listeners log start, completion, rejection, and error events to the console.
Step 10: Create the LangGraph state machine
Create src/handoff/graph.ts — the core orchestration layer. A six-node LangGraph workflow routes the user’s message, compresses context if needed, dispatches to the correct specialist agent, and persists the result back to Redis.
ts
import { Annotation, StateGraph, MessagesAnnotation } from "@langchain/langgraph";
import { AIMessage, type BaseMessage } from "@langchain/core/messages";
import { loadConfig } from "../lib/config.js";
import { chatWithAgent } from "./vllm-client.js";
import { routeMessage } from "./router.js";
import { getRegisteredAgents } from "./agents.js";
import { compressBeforeHandoff, estimateTokens } from "./compression.js";
import { getSession, updateSession } from "./session-store.js";
import type { ChatMessage } from "../lib/types.js";
import type { RoutingDecision } from "@reaatech/agent-handoff";
Expected output: The graph compiles with six nodes (routing → compressor → conditional dispatch to productAgent, orderAgent, returnsAgent, or fallback). runConversation("s1", "Where is my order?") runs the full workflow and returns a reply from the order specialist.
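The conditional dispatch step boils down to mapping a routing decision onto one of the four agent node names. Here is a sketch of that mapping in isolation. The `RoutingDecision` shape is simplified and assumed (the real type from the handoff package may carry more fields), `selectAgentNode` is a hypothetical name, and sending "clarification" to the product agent reflects the current default described under Next steps.

```typescript
// Simplified, assumed shape of the routing decision.
type RoutingDecision =
  | { type: "primary"; agentId: string }
  | { type: "clarification" }
  | { type: "fallback" };

type AgentNode = "productAgent" | "orderAgent" | "returnsAgent" | "fallback";

// Agent IDs on the left, graph node names on the right.
const NODE_BY_AGENT: Record<string, AgentNode | undefined> = {
  "product-agent": "productAgent",
  "order-agent": "orderAgent",
  "returns-agent": "returnsAgent",
};

export function selectAgentNode(decision: RoutingDecision): AgentNode {
  switch (decision.type) {
    case "primary":
      // An unknown agent ID is treated as "no suitable agent".
      return NODE_BY_AGENT[decision.agentId] ?? "fallback";
    case "clarification":
      // Current behaviour: ambiguous cases default to the product agent.
      return "productAgent";
    case "fallback":
      return "fallback";
  }
}
```

A function of this kind is what you would hand to the graph's `addConditionalEdges` call after the compressor node, so that the branch taken depends only on the decision stored in state.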
Step 11: Write the API route handler
Create app/api/chat/route.ts — a Next.js App Router POST endpoint that accepts chat messages, validates them against the Zod schema, and delegates to the LangGraph workflow.
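The handler's logic can be sketched independently of Next.js: validate the body, return a 400 on bad input, otherwise delegate to the workflow. Everything below is illustrative. The hand-written `validateChatRequest` stands in for the Zod schema, the optional `sessionId` and `language` fields and the `"anonymous"` default are assumptions, and `run` is an injected stand-in for `runConversation` that is assumed to resolve to the reply text.

```typescript
interface ChatRequest { message: string; sessionId?: string; language?: string }

type Validation =
  | { kind: "valid"; value: ChatRequest }
  | { kind: "invalid"; error: string };

// Hand-written stand-in for the Zod schema: message must be a non-empty string.
export function validateChatRequest(body: unknown): Validation {
  if (typeof body !== "object" || body === null) {
    return { kind: "invalid", error: "body must be a JSON object" };
  }
  const record = body as Record<string, unknown>;
  const message = record["message"];
  if (typeof message !== "string" || message.length === 0) {
    return { kind: "invalid", error: "message must be a non-empty string" };
  }
  const sessionId = record["sessionId"];
  const language = record["language"];
  return {
    kind: "valid",
    value: {
      message,
      sessionId: typeof sessionId === "string" ? sessionId : undefined,
      language: typeof language === "string" ? language : undefined,
    },
  };
}

interface HandlerResult { status: number; body: Record<string, unknown> }

// Framework-free core of the POST handler; `run` stands in for runConversation.
export async function handleChat(
  body: unknown,
  run: (sessionId: string, message: string) => Promise<string>,
): Promise<HandlerResult> {
  const parsed = validateChatRequest(body);
  if (parsed.kind === "invalid") {
    return { status: 400, body: { error: parsed.error } };
  }
  try {
    const reply = await run(parsed.value.sessionId ?? "anonymous", parsed.value.message);
    return { status: 200, body: { reply } };
  } catch (_error) {
    // Do not leak internal details to the client.
    return { status: 500, body: { error: "internal error" } };
  }
}
```

In the route file, the exported `POST` function would read the JSON body from the incoming request, call a core function like this, and wrap the result in a JSON response with the matching status code.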
pnpm typecheck
pnpm lint
pnpm vitest run --coverage --reporter=json --outputFile=vitest-report.json
Expected output: TypeScript compiles cleanly. ESLint reports zero warnings. Vitest reports numFailedTests: 0 and at least 50 total tests. Coverage thresholds hit at least 90% on lines, functions, and statements, and 70% on branches, for runtime code under src/**/*.ts and app/**/route.ts.
Next steps
Add language support — The language parameter is already threaded through runConversation and the ChatRequest schema. Wire it into the agent system prompts to enable multilingual responses.
Implement real vLLM transport — Replace the pass-through transport in manager.ts with an actual MCP or A2A transport for true agent-to-agent communication across network boundaries.
Add clarification handling — The CapabilityBasedRouter can return a "clarification" routing decision when two agents score similarly. Extend the graph to prompt the user for disambiguation instead of defaulting to the product agent.
Add human-in-the-loop escalation — When the "fallback" node fires, route the session to a human agent queue with the full compressed context as a briefing.
export async function executeAgentHandoff(context: HandoffContext): Promise<HandoffResult> {
  return globalManager.executeHandoff(context);
}

const SYSTEM_PROMPTS: Record<string, string> = {
  "product-agent": "You are an e-commerce product specialist. Help customers find products, compare items, check pricing and availability, and provide product details and recommendations.",
  "order-agent": "You are an e-commerce order specialist. Help customers with order status, shipping tracking, payment issues, and order modifications.",
  "returns-agent": "You are an e-commerce returns specialist. Help customers with returns, refunds, exchanges, RMA requests, and warranty claims.",
  fallback: "I'm sorry, I couldn't find a suitable specialist for your request. Please contact human support for further assistance.",
};

function mapLangChainRole(role: string): "user" | "assistant" | "system" {
  if (role === "human") return "user";
  if (role === "ai") return "assistant";
  return "system";
}

function baseMessageContent(m: BaseMessage): string {