A complete, working implementation of this recipe — downloadable as a zip or browsable file by file. Generated by our build pipeline; tested with full coverage before publishing.
You will build a security gateway that sits in front of a self-hosted vLLM instance. The gateway intercepts every request, runs it through a configurable guardrail pipeline (PII redaction, prompt injection screening, cost pre-check, and content moderation), then forwards safe requests to vLLM using the OpenAI-compatible API. A Next.js admin dashboard on port 3000 lets non-technical users toggle guardrails on and off, view live metrics, and browse the request log.
Prerequisites
Node.js 22 or later
pnpm 10.x
A running vLLM instance with an OpenAI-compatible endpoint (or a mock server for local development)
Familiarity with TypeScript, Express middleware, and Next.js App Router
Step 1: Initialize the project
Create a directory for your project and set up the package manifest. The gateway is a Node.js ESM package; the admin dashboard is a Next.js application in the same repo.
Create next.config.ts:
ts
import type { NextConfig } from "next";

const nextConfig: NextConfig = {};

export default nextConfig;
Step 2: Install dependencies
Install the production dependencies. The @reaatech/guardrail-chain* packages provide the guardrail orchestrator and its observability interface. The @presidio-dev/hai-guardrails package provides the PII detection logic. Express, helmet, and cors form the gateway. The ai and @ai-sdk/openai-compatible packages handle the vLLM client. Pino is the logging adapter.
react and react-dom are peer dependencies of Next.js, so pnpm does not pull them in automatically; install them explicitly alongside next.
Expected output: pnpm resolves and links all packages. No peer-dependency warnings should appear if your Node version matches the engine constraint.
Step 3: Configure environment variables
The gateway reads connection settings and budget limits from environment variables. Copy the example file and fill in your values.
Create .env:
env
# vLLM endpoint
VLLM_BASE_URL=http://localhost:8000

# Optional: API key for vLLM authentication
VLLM_API_KEY=

# Express gateway port
EXPRESS_PORT=4000

# Comma-separated list of CORS-allowed origins
CORS_ORIGINS=http://localhost:3000

# Guardrail chain budget overrides (optional)
GUARDRAIL_CHAIN_BUDGET_MAX_LATENCY_MS=1000
GUARDRAIL_CHAIN_BUDGET_MAX_TOKENS=8000
The VLLM_BASE_URL must point at your running vLLM instance. If vLLM requires an API key, set VLLM_API_KEY here. The CORS_ORIGINS list should include the origin of your admin dashboard.
Step 4: Define shared types
The entire codebase shares a small set of interfaces. Keeping them in one file avoids circular imports.
Create src/types.ts:
ts
import type {
  Guardrail,
  GuardrailResult,
  ChainContext,
  ChainResult,
} from "@reaatech/guardrail-chain";

/**
 * Gateway-level configuration: merges vLLM connection settings
 * with guardrail-chain budget and per-guardrail overrides.
 */
export interface GatewayConfig {
  vllmBaseUrl: string;
  vllmApiKey: string;
  port: number;
  corsOrigins: string[];
}

/** A runtime toggle for a single guardrail. */
export interface PolicyToggle {
  id: string;
  enabled: boolean;
}

/**
 * Latency histogram snapshot for the admin dashboard.
 * p50/p95/p99 are computed at query time from the raw array.
 */
export interface LatencySnapshot {
  values: number[];
  p50: number;
  p95: number;
  p99: number;
}

/** Aggregated metrics snapshot returned by the /admin/metrics endpoint. */
export interface MetricsSnapshot {
  requests: number;
  passed: number;
  blocked: number;
  latencies: LatencySnapshot;
}

/** Thrown by the gateway when vLLM calls fail. */
export class GatewayError extends Error {
  public readonly correlationId: string;
  public readonly statusCode: number;
  public readonly code: string;

  constructor(
    message: string,
    correlationId: string,
    statusCode: number,
    code: string
  ) {
    super(message);
    this.name = "GatewayError";
    this.correlationId = correlationId;
    this.statusCode = statusCode;
    this.code = code;
  }
}

/** Thrown specifically when a vLLM call fails. */
export class VLLMError extends GatewayError {
  constructor(message: string, correlationId: string, statusCode: number) {
    super(message, correlationId, statusCode, "UPSTREAM_ERROR");
    this.name = "VLLMError";
  }
}

/** Re-export guardrail-chain types for use across the codebase. */
export type { Guardrail, GuardrailResult, ChainContext, ChainResult };
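The LatencySnapshot comment says the percentiles are computed at query time from the raw array. One way to do that is the nearest-rank method, sketched below. The helper names `percentile` and `toLatencySnapshot` are illustrative and do not come from the recipe's source, and returning 0 for an empty array is an assumption.

```typescript
// Local copy of the LatencySnapshot shape from src/types.ts so the sketch is self-contained.
interface LatencySnapshot {
  values: number[];
  p50: number;
  p95: number;
  p99: number;
}

/** Nearest-rank percentile: the smallest sample with at least p% of samples at or below it. */
function percentile(values: number[], p: number): number {
  if (values.length === 0) return 0; // assumption: report 0 when there are no samples
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index];
}

/** Builds a snapshot from raw latencies without mutating the input array. */
function toLatencySnapshot(values: number[]): LatencySnapshot {
  return {
    values: [...values],
    p50: percentile(values, 50),
    p95: percentile(values, 95),
    p99: percentile(values, 99),
  };
}
```

Sorting on every query is fine for a dashboard that polls occasionally; a busier deployment would likely want a bounded buffer or a streaming estimator instead.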
Step 5: Build the observability layer
The @reaatech/guardrail-chain-observability package defines interfaces for Logger, MetricsCollector, and Tracer. You need to provide concrete implementations so the guardrail chain can emit structured logs and record metrics.
Create src/observability.ts:
ts
import type { Logger, MetricsCollector, Tracer, Span } from "@reaatech/guardrail-chain-observability";
import { setLogger, setMetrics, setTracer } from "@reaatech/guardrail-chain-observability";
import pino from "pino";
import type { MetricsSnapshot, LatencySnapshot } from "./types.js";

// ── Pino logger adapter ────────────────────────────────────────────────────────

/**
 * Adapter that maps the `Logger` interface from `@reaatech/guardrail-chain-observability`
 * to pino's API.
 */
export class PinoLoggerAdapter implements Logger {
  private readonly logger: pino.Logger;

  constructor(logger: pino.Logger) {
The InMemoryMetricsCollector exposes a getSnapshot() method that the /admin/metrics endpoint calls. In production you could swap this for a Prometheus-compatible collector.
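To make the getSnapshot() idea concrete, here is a minimal sketch of an in-memory collector with a bounded latency buffer. It is not the recipe's InMemoryMetricsCollector and does not implement the library's MetricsCollector interface, whose methods are not shown in this section; the class name `SimpleMetrics`, its `record` method, and the snapshot shape are all hypothetical.

```typescript
// Hypothetical snapshot shape for this sketch only.
interface SimpleSnapshot {
  requests: number;
  passed: number;
  blocked: number;
  latenciesMs: number[];
}

class SimpleMetrics {
  private requests = 0;
  private passed = 0;
  private blocked = 0;
  private latenciesMs: number[] = [];
  private readonly maxSamples: number;

  constructor(maxSamples = 1000) {
    this.maxSamples = maxSamples;
  }

  /** Records one finished request and keeps only the newest maxSamples latencies. */
  record(outcome: "passed" | "blocked", latencyMs: number): void {
    this.requests += 1;
    if (outcome === "passed") this.passed += 1;
    else this.blocked += 1;
    this.latenciesMs.push(latencyMs);
    if (this.latenciesMs.length > this.maxSamples) this.latenciesMs.shift();
  }

  /** Returns a copy so callers cannot mutate the collector's internal state. */
  getSnapshot(): SimpleSnapshot {
    return {
      requests: this.requests,
      passed: this.passed,
      blocked: this.blocked,
      latenciesMs: [...this.latenciesMs],
    };
  }
}
```

Capping the buffer keeps memory flat on a long-running gateway, at the cost of percentiles that only reflect recent traffic.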
Step 6: Write the vLLM client and config loader
The vLLM client forwards validated requests to the upstream endpoint using the native fetch API. It handles retries via withRetry from @reaatech/guardrail-chain, which shares retry logic with the guardrail pipeline.
Create src/config.ts:
ts
import {
  loadConfig,
  loadConfigFromEnv,
} from "@reaatech/guardrail-chain-config";
import { z } from "zod";
import type { GatewayConfig } from "./types.js";

export function loadGatewayConfig(): GatewayConfig {
  void loadConfigFromEnv("GUARDRAIL_CHAIN");

  const vllmBaseUrl = process.env["VLLM_BASE_URL"] ?? "http://localhost:8000";
  const vllmApiKey = process.env["VLLM_API_KEY"] ?? "";
  const expressPort = parseInt(process.env["EXPRESS_PORT"] ?? "4000", 10);
  const corsOriginsStr = process.env["CORS_ORIGINS"] ?? "http://localhost:3000";
  const corsOrigins = corsOriginsStr
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  return {
    vllmBaseUrl,
    vllmApiKey,
    port: expressPort,
    corsOrigins,
  };
}

const gatewayConfigSchema = z.object({
  vllmBaseUrl: z.string().min(1, "vllmBaseUrl is required and must be a non-empty string"),
  vllmApiKey: z.string(),
  port: z.number().positive("port must be a positive number").int("port must be an integer"),
  corsOrigins: z.array(z.string()),
});

export function validateGatewayConfig(
  config: unknown
): { success: true; config: GatewayConfig } | { success: false; error: z.ZodError } {
  const result = gatewayConfigSchema.safeParse(config);
  if (result.success) {
    return { success: true, config: result.data };
  }
  return { success: false, error: result.error };
}
The loadConfig function from @reaatech/guardrail-chain-config handles merging the budget settings (max latency, max tokens) from environment variables.
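The vLLM client described at the start of this step is not listed in this section. As a rough sketch of the idea, the code below posts a chat completion to the OpenAI-compatible /v1/chat/completions path and retries on failure. `retryWithBackoff` is a hand-rolled stand-in for withRetry, whose signature is not shown here, and `forwardToVllm` and `FetchLike` are hypothetical names. The fetch implementation is injected so the sketch can run without a network.

```typescript
/** Minimal fetch-like signature; the global fetch in Node 22 satisfies it. */
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

/** Hypothetical stand-in for withRetry: retries every failure with exponential backoff. */
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 100,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < attempts - 1) {
        // Wait 100 ms, 200 ms, ... before the next attempt.
        await new Promise<void>((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}

async function forwardToVllm(
  fetchImpl: FetchLike,
  baseUrl: string,
  apiKey: string,
  model: string,
  messages: ChatMessage[],
): Promise<unknown> {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (apiKey) headers["Authorization"] = `Bearer ${apiKey}`;

  return retryWithBackoff(async () => {
    const res = await fetchImpl(`${baseUrl}/v1/chat/completions`, {
      method: "POST",
      headers,
      body: JSON.stringify({ model, messages }),
    });
    // Simplification: any non-2xx status is retried, including client errors.
    if (!res.ok) throw new Error(`vLLM responded with status ${res.status}`);
    return res.json();
  });
}
```

In the gateway you would call it as `forwardToVllm(fetch, config.vllmBaseUrl, config.vllmApiKey, model, messages)`. A production client would usually retry only on network errors and 5xx responses rather than on every failure.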
Step 7: Wire up the guardrail chain
The guardrail chain is the heart of the gateway. It runs four guardrails in sequence on the last user message before forwarding to vLLM. Each guardrail implements the Guardrail interface from @reaatech/guardrail-chain. The module exposes toggleGuardrail so the admin dashboard can enable or disable individual guardrails at runtime.
Create src/guardrail-chain.ts:
ts
import {
  GuardrailChain,
  createChainContext,
  withRetry,
  DEFAULT_RETRY_CONFIG,
  defaultRetryPredicate,
  type Guardrail,
  type ChainResult,
  type BudgetConfig,
} from "@reaatech/guardrail-chain";
import {
  PIIRedaction,
  PromptInjection,
  CostPrecheck,
  ContentModeration,
} from "@reaatech/guardrail-chain-guardrails";
import { getLogger } from "@reaatech/guardrail-chain-observability";

// Module-level singleton guardrail chain instance.
let _chain: GuardrailChain | null = null;

// Stored budget for use in executeChain (avoids calling loadConfig per-request).
The four guardrails are:
pii-redaction: masks emails, phone numbers, and SSNs before the request leaves the gateway
prompt-injection: detects jailbreak and prompt injection patterns
cost-precheck: rejects requests that would exceed the token budget
content moderation (the ContentModeration guardrail): moderates the message content before it is forwarded
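The run-in-sequence and toggle behavior described in this step can be sketched without the library. The `MiniGuardrail` interface below is hypothetical and much simpler than the real Guardrail interface from @reaatech/guardrail-chain. The regexes and the four-characters-per-token estimate are rough assumptions for illustration only.

```typescript
interface MiniResult {
  passed: boolean;
  output: string;
  reason?: string;
}

interface MiniGuardrail {
  id: string;
  enabled: boolean;
  check(input: string): MiniResult;
}

const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const SSN = /\b\d{3}-\d{2}-\d{4}\b/g;

// Rewrites the text instead of blocking it.
const piiRedaction: MiniGuardrail = {
  id: "pii-redaction",
  enabled: true,
  check: (input) => ({
    passed: true,
    output: input.replace(EMAIL, "[EMAIL]").replace(SSN, "[SSN]"),
  }),
};

// Blocks the request when a crude token estimate exceeds the budget.
const costPrecheck: MiniGuardrail = {
  id: "cost-precheck",
  enabled: true,
  check: (input) => {
    const estimatedTokens = Math.ceil(input.length / 4); // assumption: about 4 chars per token
    return estimatedTokens > 8000
      ? { passed: false, output: input, reason: "token budget exceeded" }
      : { passed: true, output: input };
  },
};

/** Runs enabled guardrails in order; each one sees the previous one's output. */
function runChain(guardrails: MiniGuardrail[], input: string): MiniResult {
  let current = input;
  for (const guardrail of guardrails) {
    if (!guardrail.enabled) continue;
    const result = guardrail.check(current);
    if (!result.passed) return result; // stop at the first failure
    current = result.output;
  }
  return { passed: true, output: current };
}

/** Flips a guardrail at runtime; returns false when the id is unknown. */
function toggle(guardrails: MiniGuardrail[], id: string, enabled: boolean): boolean {
  const target = guardrails.find((g) => g.id === id);
  if (!target) return false;
  target.enabled = enabled;
  return true;
}
```

Putting redaction first means later guardrails, and the upstream model, only ever see the masked text.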
Step 8: Build the Express gateway
The gateway is an Express app that runs on port 4000. It validates every incoming chat completions request, runs it through the guardrail chain, then forwards it to vLLM. It also exposes admin endpoints for metrics and logs.
Create src/gateway.ts:
ts
import express, { type Application, type Request, type Response, type NextFunction } from "express";
import helmet from "helmet";
import cors from "cors";
import { generateCorrelationId } from "@reaatech/guardrail-chain";
import { executeChain } from "./guardrail-chain.js";
import { vllmGenerate } from "./vllm-client.js";
import { getLogger } from "@reaatech/guardrail-chain-observability";
import { metricsCollector } from "./observability.js";
import type { GatewayConfig } from "./types.js";
import { GatewayError }
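The validation step the gateway performs before running the guardrail chain might look like the sketch below. It checks the request body by hand and extracts the last user message, which is the text the chain inspects. The function names and error messages are hypothetical and do not come from the recipe's gateway.ts.

```typescript
interface ChatMessage {
  role: string;
  content: string;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
}

type Validation =
  | { ok: true; request: ChatRequest }
  | { ok: false; message: string };

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

/** Accepts an untrusted JSON body and returns a typed request or a reason for rejection. */
function validateChatRequest(body: unknown): Validation {
  if (!isRecord(body)) return { ok: false, message: "body must be a JSON object" };

  const model = body["model"];
  if (typeof model !== "string" || model.length === 0) {
    return { ok: false, message: "model must be a non-empty string" };
  }

  const messages = body["messages"];
  if (!Array.isArray(messages) || messages.length === 0) {
    return { ok: false, message: "messages must be a non-empty array" };
  }

  const parsed: ChatMessage[] = [];
  for (const entry of messages) {
    if (!isRecord(entry)) return { ok: false, message: "each message must be an object" };
    const role = entry["role"];
    const content = entry["content"];
    if (typeof role !== "string" || typeof content !== "string") {
      return { ok: false, message: "each message needs a string role and content" };
    }
    parsed.push({ role, content });
  }
  return { ok: true, request: { model, messages: parsed } };
}

/** Returns the content of the most recent user message, if there is one. */
function lastUserMessage(request: ChatRequest): string | undefined {
  for (let i = request.messages.length - 1; i >= 0; i--) {
    const message = request.messages[i];
    if (message.role === "user") return message.content;
  }
  return undefined;
}
```

An Express handler could return a 400 with the `message` when `ok` is false, and pass `lastUserMessage(request)` to the chain otherwise.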
Step 9: Write the Presidio sync worker and the entry point
The background worker periodically reads a local presidio-rules.json file and loads custom PII patterns. The entry point bootstraps every subsystem and wires up graceful shutdown handlers.
Create src/worker.ts:
ts
import { getLogger } from "@reaatech/guardrail-chain-observability";

interface PresidioSyncResult {
  phase: string;
  outcome: "success" | "error";
  rulesLoaded: number;
  error?: string;
}

// Path to the local Presidio rules file.
const DEFAULT_RULES_PATH = "./presidio-rules.json";

/**
 * Synchronize custom PII detection patterns from a local `presidio-rules.json` file.
 * Runs on an interval so rule updates are picked up without a restart.
 */
export function startPresidioSyncLoop(intervalMs = 3_600_000): { stop: () => void } {
  const logger = getLogger();
  let stopped = false;
  let intervalId: ReturnType<typeof setInterval>;

  async function syncOnce(): Promise<PresidioSyncResult> {
    try {
      const fs = await import("node:fs/promises");
      const content = await fs.readFile(DEFAULT_RULES_PATH, "utf-8");
      const rules = JSON.parse(content) as { patterns?: unknown[] };
      const rulesLoaded = Array.isArray(rules.patterns) ? rules.patterns.length : 0;
      logger.info({ rulesLoaded, path: DEFAULT_RULES_PATH }, "Presidio sync completed");
      return { phase: "presidio-sync", outcome: "success", rulesLoaded };
    } catch (err) {
      const error = err as Error;
      logger.warn({ error: error.message }, "Presidio sync failed — next interval will retry");
      return { phase: "presidio-sync", outcome: "error", rulesLoaded: 0, error: error.message };
    }
  }

  // Run once immediately.
  void syncOnce();

  // Then on the interval.
  intervalId = setInterval(() => {
    if (stopped) return;
    void syncOnce();
  }, intervalMs);

  return {
    stop(): void {
      stopped = true;
      clearInterval(intervalId);
    },
  };
}
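The worker above only counts the entries under `patterns`; the schema of each entry is not specified in this section. Purely as an illustration, the sketch below assumes each entry looks like `{ "name": "...", "regex": "..." }` and shows how such rules could be compiled and applied. The entry shape and the helper names `compileRules` and `redactWith` are assumptions, not part of the recipe.

```typescript
interface CompiledRule {
  name: string;
  pattern: RegExp;
}

/** Compiles rule entries, silently skipping any that are malformed. */
function compileRules(raw: unknown): CompiledRule[] {
  if (typeof raw !== "object" || raw === null) return [];
  const patterns = (raw as { patterns?: unknown }).patterns;
  if (!Array.isArray(patterns)) return [];

  const compiled: CompiledRule[] = [];
  for (const entry of patterns) {
    if (typeof entry !== "object" || entry === null) continue;
    // Assumed entry shape: { name: string; regex: string }.
    const { name, regex } = entry as { name?: unknown; regex?: unknown };
    if (typeof name !== "string" || typeof regex !== "string") continue;
    try {
      compiled.push({ name, pattern: new RegExp(regex, "g") });
    } catch {
      // The regex source did not compile; skip this rule.
    }
  }
  return compiled;
}

/** Replaces every match of every rule with a [NAME] placeholder. */
function redactWith(rules: CompiledRule[], text: string): string {
  return rules.reduce(
    (acc, rule) => acc.replace(rule.pattern, `[${rule.name}]`),
    text,
  );
}
```

Skipping bad entries instead of throwing keeps one malformed rule from discarding the whole file on the next sync.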
Create src/index.ts:
ts
import "dotenv/config";
import { loadConfig } from "@reaatech/guardrail-chain-config";
import { initObservability } from "./observability.js";
import { loadGatewayConfig, validateGatewayConfig } from "./config.js";
import { initGuardrailChain } from "./guardrail-chain.js";
import { createVLLMClient } from "./vllm-client.js";
import { createGatewayApp, startGateway } from "./gateway.js";
import { startPresidioSyncLoop } from "./worker.js";

// ── Startup sequence ──────────────────────────────────────────────────────────
initObservability();

const rawConfig = loadGatewayConfig();
const validation = validateGatewayConfig(rawConfig);
if (!validation.success) {
  console.error("Invalid gateway configuration:", validation.error.format());
  process.exit(1);
}
const config = validation.config;

// Initialize guardrail chain with budget from config (defaults if env not set).
const chainConfig = await loadConfig({ useEnv: true, envPrefix: "GUARDRAIL_CHAIN" });
initGuardrailChain(chainConfig.budget);

// Initialize vLLM client.
createVLLMClient(config);

// Start Express gateway.
const app = createGatewayApp(config);
const server = await startGateway(app, config.port);

// Start background Presidio sync worker.
const { stop: stopWorker } = startPresidioSyncLoop(3_600_000);

// ── Graceful shutdown ─────────────────────────────────────────────────────────
function shutdown(signal: string): void {
  console.log(`\n${signal} received — shutting down gracefully`);
  stopWorker();
  server.close(() => {
    console.log("Express server closed");
    process.exit(0);
  });
  // Force exit if graceful shutdown stalls.
  setTimeout(() => {
    console.error("Forced exit after timeout");
    process.exit(1);
  }, 5000);
}

process.on("SIGTERM", () => shutdown("SIGTERM"));
process.on("SIGINT", () => shutdown("SIGINT"));
process.on("uncaughtException", (err: Error) => {
  console.error("Uncaught exception:", err.message);
  process.exit(1);
});
process.on("unhandledRejection", (reason: unknown) => {
  console.error("Unhandled rejection:", reason);
  process.exit(1);
});

export { app };
Step 10: Build the admin API routes
The Next.js admin dashboard needs two server-side API routes. One toggles guardrail policies via POST; the other fetches logs from the gateway via GET.
Create app/api/policies/route.ts:
ts
import { type NextRequest, NextResponse } from "next/server";
import { toggleGuardrail, getGuardrailInstances } from "../../../src/guardrail-chain.js";

interface PolicyBody {
  id: string;
  enabled: boolean;
}

// The minimal contract the POST handler actually reads from the request.
type RequestBody = { json(): Promise<unknown> };

export async function POST(req: RequestBody) {
  let body: PolicyBody;
  try {
    body = (await req.json()) as PolicyBody;
  } catch {
    return NextResponse.json(
      { error: { message: "invalid JSON", code: "INVALID_REQUEST" } },
      { status: 400 }
    );
  }

  if (typeof body.id !== "string" || typeof body.enabled !== "boolean") {
    return NextResponse.json(
      { error: { message: "id (string) and enabled (boolean) are required", code: "INVALID_REQUEST" } },
      { status: 400 }
    );
  }

  const guardrails = getGuardrailInstances();
  if (!guardrails.has(body.id)) {
    return NextResponse.json(
      { error: { message: `Guardrail '${body.id}' not found`, code: "NOT_FOUND" } },
      { status: 404 }
    );
  }

  const success = toggleGuardrail(body.id, body.enabled);
  return NextResponse.json({ success, id: body.id, enabled: body.enabled });
}
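From the caller's side, toggling a guardrail is a single POST to /api/policies. The sketch below follows the request and response shapes of the handler above. The helper name `setGuardrailEnabled`, the `baseUrl` parameter, and the injected `fetchImpl` are illustrative choices, not part of the recipe's dashboard code.

```typescript
/** Minimal fetch-like signature; the global fetch satisfies it. */
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface ToggleResponse {
  success: boolean;
  id: string;
  enabled: boolean;
}

/** Calls the policies route and throws with the server's message on any non-2xx status. */
async function setGuardrailEnabled(
  fetchImpl: FetchLike,
  baseUrl: string,
  id: string,
  enabled: boolean,
): Promise<ToggleResponse> {
  const res = await fetchImpl(`${baseUrl}/api/policies`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ id, enabled }),
  });
  const payload = await res.json();
  if (!res.ok) {
    // Error bodies from the handler look like { error: { message, code } }.
    const message =
      (payload as { error?: { message?: string } }).error?.message ?? `HTTP ${res.status}`;
    throw new Error(message);
  }
  return payload as ToggleResponse;
}
```

For example, `await setGuardrailEnabled(fetch, "http://localhost:3000", "pii-redaction", false)` would disable PII redaction, assuming the dashboard is running on its default port.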
Step 11: Build the admin dashboard
The admin dashboard is a Next.js App Router application with three pages: Dashboard (metrics), Policies (guardrail toggles), and Logs. The root page redirects to /admin.
Create app/page.tsx:
ts
import { redirect } from "next/navigation";

export default function HomePage() {
  redirect("/admin");
}
Create app/layout.tsx:
ts
import type { ReactNode } from "react";

export const metadata = {
  title: "Recipe",
  description: "Tutorialized reference solution from reaatech.com",
};

export default function RootLayout({ children }: { children: ReactNode }) {
  return (
    <html lang="en">
      <body>{children}</body>
    </html>
  );
}
Step 12: Run the tests
The project ships with integration tests that verify the gateway’s routing, guardrail enforcement, metrics collection, and policy toggling. The test runner is vitest with v8 coverage.
Expected output: The vitest summary shows all test suites passing. The coverage report confirms the source files meet the 90% threshold across lines, branches, functions, and statements. The JSON report is written to vitest-report.json.
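The recipe's vitest configuration is not shown in this section. As a sketch, a vitest.config.ts along the following lines would produce the behavior described: v8 coverage, 90% thresholds, and a JSON report. It assumes Vitest 1.0 or later, where thresholds live under `coverage.thresholds`; the project's real settings may differ.

```typescript
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    // Print the usual summary and also write a machine-readable report.
    reporters: ["default", "json"],
    outputFile: { json: "vitest-report.json" },
    coverage: {
      provider: "v8",
      // The run fails if any of these drops below 90%.
      thresholds: {
        lines: 90,
        branches: 90,
        functions: 90,
        statements: 90,
      },
    },
  },
});
```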
Step 13: Start the gateway and dashboard
You need two processes running at the same time: the Express gateway on port 4000 and the Next.js admin dashboard on port 3000.
Start the gateway first. In the project root, run:
terminal
node --import tsx src/index.ts
Expected output: The terminal prints “Express gateway started” with the port number, and pino logs show the GuardrailChain initialization message.
In a second terminal, start the Next.js dev server:
terminal
pnpm dev
Expected output: Next.js prints “ready” and the local URL http://localhost:3000. The root page redirects to /admin, which displays the Dashboard view showing zero requests.
Make a test request through the gateway:
terminal
curl -X POST http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-User-ID: user-123" \
  -H "X-Session-ID: session-456" \
  -d '{
    "model": "llama-3",
    "messages": [{"role": "user", "content": "Hello, what is 2+2?"}]
  }'
If your vLLM instance is reachable at VLLM_BASE_URL, the response contains a chat completion object with the model’s answer. If vLLM is not running, the gateway returns a 502 with UPSTREAM_ERROR.
Open http://localhost:3000/admin/policies to see the four guardrails and their enable/disable buttons. Click “Disable” next to PII Redaction, then resend a request containing an email address — it will no longer be masked. Click “Enable” to re-engage it.
Next steps
Replace the InMemoryMetricsCollector with a Prometheus-compatible collector and expose a /metrics endpoint for scraping by Prometheus and Grafana.
Add authentication to the /admin/* routes using Next.js middleware so only authorized users can view metrics or toggle guardrails.
Extend the guardrail chain by adding a custom guardrail class that implements the Guardrail interface — for example, a domain-restriction guardrail that blocks requests containing competitor brand names.
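As a starting point for the domain-restriction idea, the sketch below blocks input that mentions any term on a blocklist. The real Guardrail interface from @reaatech/guardrail-chain is not reproduced in this section, so the class uses a hypothetical `check` method and result shape; adapt both to the library's actual interface before wiring it into the chain.

```typescript
// Hypothetical result shape, not the library's GuardrailResult.
interface SketchResult {
  passed: boolean;
  reason?: string;
}

class DomainRestrictionGuardrail {
  readonly id = "domain-restriction";
  private readonly pattern: RegExp | null;

  constructor(blockedTerms: string[]) {
    const escaped = blockedTerms
      .map((term) => term.trim())
      .filter((term) => term.length > 0)
      // Escape regex metacharacters so terms are matched literally.
      .map((term) => term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
    // Whole-word, case-insensitive match. Terms that start or end with
    // punctuation will not match reliably because of the \b anchors.
    this.pattern =
      escaped.length > 0 ? new RegExp(`\\b(?:${escaped.join("|")})\\b`, "i") : null;
  }

  /** Fails the request when any blocked term appears in the input. */
  check(input: string): SketchResult {
    const match = this.pattern?.exec(input);
    return match
      ? { passed: false, reason: `blocked term: ${match[0]}` }
      : { passed: true };
  }
}
```

For example, `new DomainRestrictionGuardrail(["Globex"]).check("How does Globex price this?")` reports a failure, while text that merely contains a similar substring, such as "global expansion", passes.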