vLLM Security Guardrails for SMB API Gateways

A drop-in API proxy that adds PII redaction, prompt injection defense, and content safety checks to any vLLM endpoint.

vllm security-guardrails express nextjs typescript pii-redaction prompt-injection

The problem

SMBs that expose self-hosted vLLM APIs risk leaking sensitive data and having the endpoint misused, but often lack the security expertise to build custom guardrails.

Intro

You will build a security gateway that sits in front of a self-hosted vLLM instance. The gateway intercepts every request, runs it through a configurable guardrail pipeline (PII redaction, prompt injection screening, cost pre-check, and content moderation), then forwards safe requests to vLLM using the OpenAI-compatible API. A Next.js admin dashboard on port 3000 lets non-technical users toggle guardrails on and off, view live metrics, and browse the request log.
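The request flow described above can be sketched as an ordered list of checks, each of which either passes (possibly rewritten) text along or blocks the request. This is a minimal illustration only: the `Guardrail` interface, the `runPipeline` function, and the regular expressions are assumptions made for this sketch, not the API of the recipe's guardrail-chain package. It is kept synchronous for brevity; real checks would likely be async.

```typescript
// Hypothetical types for illustration; the real pipeline may look different.
type GuardrailVerdict =
  | { action: "allow"; text: string }
  | { action: "block"; reason: string };

interface Guardrail {
  name: string;
  enabled: boolean; // in the real gateway this would be toggled from the dashboard
  check(text: string): GuardrailVerdict;
}

// Deliberately rough patterns, for demonstration only.
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const SSN = /\b\d{3}-\d{2}-\d{4}\b/g;

const piiRedaction: Guardrail = {
  name: "pii-redaction",
  enabled: true,
  // Redaction never blocks; it rewrites the text and lets it continue.
  check: (text) => ({
    action: "allow",
    text: text.replace(EMAIL, "[EMAIL]").replace(SSN, "[SSN]"),
  }),
};

const promptInjection: Guardrail = {
  name: "prompt-injection",
  enabled: true,
  // A single keyword heuristic; real screening needs far more than this.
  check: (text) =>
    /ignore (all )?(previous|prior) instructions/i.test(text)
      ? { action: "block", reason: "possible prompt injection" }
      : { action: "allow", text },
};

// Runs each enabled guardrail in order and stops at the first block.
function runPipeline(guardrails: Guardrail[], input: string): GuardrailVerdict {
  let text = input;
  for (const g of guardrails) {
    if (!g.enabled) continue;
    const verdict = g.check(text);
    if (verdict.action === "block") {
      return { action: "block", reason: `${g.name}: ${verdict.reason}` };
    }
    text = verdict.text;
  }
  return { action: "allow", text };
}
```

Only an `allow` verdict would be forwarded to vLLM, and it is the rewritten `text` that gets forwarded, not the original input. Running redaction first means later checks and logs never see the raw PII.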

Prerequisites

Node.js 22 or later
pnpm 10.x
A running vLLM instance with an OpenAI-compatible endpoint (or a mock server for local development)
Familiarity with TypeScript, Express middleware, and Next.js App Router
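If no vLLM instance is available, a small mock can stand in during local development. The sketch below uses only Node's built-in `http` module and echoes the last message back in an OpenAI-style chat completion shape. The function names, the `echo:` reply, the whitespace-based token count, and the port in the final comment are all assumptions for illustration.

```typescript
import { createServer, type Server } from "node:http";

interface ChatMessage {
  role: string;
  content: string;
}

// Builds a response shaped like an OpenAI-style chat completion.
function buildMockCompletion(model: string, messages: ChatMessage[]) {
  const last = messages[messages.length - 1]?.content ?? "";
  const reply = `echo: ${last}`;
  // Crude whitespace token count; real tokenizers give different numbers.
  const count = (s: string) => s.split(/\s+/).filter(Boolean).length;
  const prompt_tokens = messages.reduce((n, m) => n + count(m.content), 0);
  const completion_tokens = count(reply);
  return {
    id: "mock-1",
    object: "chat.completion",
    model,
    choices: [
      { index: 0, message: { role: "assistant", content: reply }, finish_reason: "stop" },
    ],
    usage: {
      prompt_tokens,
      completion_tokens,
      total_tokens: prompt_tokens + completion_tokens,
    },
  };
}

// Returns a server that answers POST .../chat/completions and 404s otherwise.
function createMockVLLM(): Server {
  return createServer((req, res) => {
    if (req.method !== "POST" || !req.url?.endsWith("/chat/completions")) {
      res.writeHead(404).end();
      return;
    }
    let raw = "";
    req.on("data", (chunk) => (raw += chunk));
    req.on("end", () => {
      try {
        const body = JSON.parse(raw) as { model?: string; messages?: ChatMessage[] };
        const payload = buildMockCompletion(body.model ?? "mock-model", body.messages ?? []);
        res.writeHead(200, { "Content-Type": "application/json" });
        res.end(JSON.stringify(payload));
      } catch {
        res.writeHead(400).end(); // malformed JSON body
      }
    });
  });
}

// Example (port chosen arbitrarily): createMockVLLM().listen(8000);
```

Pointing the gateway's vLLM base URL at this server lets you exercise the guardrails without a GPU.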

Step 1: Initialize the project

Create a directory for your project and set up the package manifest. The gateway is a Node.js ESM package; the admin dashboard is a Next.js application in the same repo.

Create package.json:

json

{
  "name": "vllm-security-guardrails-for-smb-api-gateways",
  "version": "0.1.0",
  "private"

Example artifact

A complete, working implementation of this recipe, downloadable as a zip or browsable file by file. Generated by our build pipeline and tested before publishing.

186 tests · 98.0% coverage · vitest passing

import type { GatewayConfig } from "./types.js";
import { VLLMError } from "./types.js";
import { withRetry, DEFAULT_RETRY_CONFIG, defaultRetryPredicate } from "@reaatech/guardrail-chain";

interface Message {
  role: string;
  content: string;
}

// Module-level config.
let _vllmBaseUrl = "";
let _vllmApiKey = "";

export function createVLLMClient(config: GatewayConfig): void {
  _vllmBaseUrl = config.vllmBaseUrl;
  _vllmApiKey = config.vllmApiKey;
}

export async function vllmGenerate(
  _modelName: string,
  messages: Message[],
  correlationId: string,
  options?: { maxTokens?: number; temperature?: number }
): Promise<{
  text: string;
  usage: { prompt_tokens: number; completion_tokens: number; total_tokens: number };
}> {
  const maxTokens = options?.maxTokens ?? 1024;
  const temperature = options?.temperature ?? 0.7;

  const callApi = async () => {
    const url = `${_vllmBaseUrl}/chat/completions`;
    const headers: Record<string, string> = {
      "Content-Type": "application/json",
    };
    if (_vllmApiKey) {
      headers["Authorization"] = `Bearer ${_vllmApiKey}`;
    }

    const body = {
      model: _modelName,
      messages: messages.map((m) => ({ role: m.role, content: m.content })),
      max_tokens: maxTokens,
      temperature,
    };

    const response = await fetch(url, {
      method: "POST",
      headers,
      body: JSON.stringify(body),
    });

    if (!response.ok) {
      const text = await response.text().catch(() => "");
      throw new VLLMError(
        `vLLM returned ${response.status}: ${text}`,
        correlationId,
        response.status >= 500 ? 502 : response.status
      );
    }

    const data = (await response.json()) as {
      choices?: Array<{ message?: { content?: string } }>;
      usage?: { prompt_tokens?: number; completion_tokens?: number; total_tokens?: number };
    };

    const text = data.choices?.[0]?.message?.content ?? "";
    const usage = data.usage ?? { prompt_tokens: 0, completion_tokens: 0, total_tokens: 0 };

    return {
      text,
      usage: {
        prompt_tokens: usage.prompt_tokens ?? 0,
        completion_tokens: usage.completion_tokens ?? 0,
        total_tokens: usage.total_tokens ?? 0,
      },
    };
  };

  try {
    const result = await withRetry(callApi, defaultRetryPredicate, DEFAULT_RETRY_CONFIG);
    return result;
  } catch (err: unknown) {
    if (err instanceof VLLMError) throw err;
    let msg = "vLLM call failed";
    if (typeof err === "object" && err !== null && "message" in err) {
      const m = (err as Error).message;
      if (typeof m === "string" && m.trim() !== "") {
        msg = m;
      }
    }
    throw new VLLMError(msg, correlationId, 502);
  }
}

export function getVLLMBaseUrl(): string {
  return _vllmBaseUrl;
}
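The client above delegates retries to `withRetry` from `@reaatech/guardrail-chain`, whose internals are not shown here. As a rough mental model only, a retry helper with exponential backoff might look like the sketch below. `retryWithBackoff`, `RetryOptions`, and its fields are hypothetical names, and this is not the library's implementation.

```typescript
// Hypothetical options; the library's DEFAULT_RETRY_CONFIG may differ.
interface RetryOptions {
  maxAttempts: number; // total attempts, including the first call
  baseDelayMs: number; // delay before the second attempt; doubles each time
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls fn until it succeeds, the predicate rejects the error,
// or maxAttempts is reached. Rethrows the last error on failure.
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  shouldRetry: (err: unknown) => boolean,
  opts: RetryOptions
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === opts.maxAttempts || !shouldRetry(err)) break;
      // Exponential backoff: base, 2x base, 4x base, ...
      await sleep(opts.baseDelayMs * 2 ** (attempt - 1));
    }
  }
  throw lastError;
}
```

A predicate that retries only transient failures (network errors and 5xx responses, for example) avoids re-sending requests that vLLM rejected with a 4xx, which would fail the same way on every attempt.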

import "dotenv/config";
import { loadConfig } from "@reaatech/guardrail-chain-config";
import { initObservability } from "./observability.js";
import { loadGatewayConfig, validateGatewayConfig } from "./config.js";
import { initGuardrailChain } from "./guardrail-chain.js";
import { createVLLMClient } from "./vllm-client.js";
import { createGatewayApp, startGateway } from "./gateway.js";
import { startPresidioSyncLoop } from "./worker.js";

// ── Startup sequence ──────────────────────────────────────────────────────────
initObservability();

const rawConfig = loadGatewayConfig();
const validation = validateGatewayConfig(rawConfig);
if (!validation.success) {
  console.error("Invalid gateway configuration:", validation.error.format());
  process.exit(1);
}
const config = validation.config;

// Initialize guardrail chain with budget from config (defaults if env not set).
const chainConfig = await loadConfig({ useEnv: true, envPrefix: "GUARDRAIL_CHAIN" });
initGuardrailChain(chainConfig.budget);

// Initialize vLLM client.
createVLLMClient(config);

// Start Express gateway.
const app = createGatewayApp(config);
const server = await startGateway(app, config.port);

// Start background Presidio sync worker.
const { stop: stopWorker } = startPresidioSyncLoop(3_600_000);

// ── Graceful shutdown ─────────────────────────────────────────────────────────
function shutdown(signal: string): void {
  console.log(`\n${signal} received — shutting down gracefully`);
  stopWorker();
  server.close(() => {
    console.log("Express server closed");
    process.exit(0);
  });
  // Force exit if graceful shutdown stalls.
  setTimeout(() => {
    console.error("Forced exit after timeout");
    process.exit(1);
  }, 5000);
}

process.on("SIGTERM", () => shutdown("SIGTERM"));
process.on("SIGINT", () => shutdown("SIGINT"));

process.on("uncaughtException", (err: Error) => {
  console.error("Uncaught exception:", err.message);
  process.exit(1);
});

process.on("unhandledRejection", (reason: unknown) => {
  console.error("Unhandled rejection:", reason);
  process.exit(1);
});

export { app };
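The entry point refuses to start when `validateGatewayConfig` reports a failure. That function's body is not shown here, so the sketch below is only a hand-rolled stand-in for the same fail-fast idea. The field names come from the code above; `checkGatewayConfig`, the `ValidationResult` shape, and the specific rules are assumptions for illustration (the real validator appears to expose an `error.format()` method instead of a plain string list).

```typescript
interface GatewayConfig {
  vllmBaseUrl: string;
  vllmApiKey: string;
  port: number;
}

// Hypothetical result shape for this sketch.
type ValidationResult =
  | { success: true; config: GatewayConfig }
  | { success: false; errors: string[] };

// Collects every problem before returning, so one run reports them all.
function checkGatewayConfig(raw: Record<string, unknown>): ValidationResult {
  const errors: string[] = [];
  const { vllmBaseUrl, vllmApiKey, port } = raw;

  if (typeof vllmBaseUrl !== "string" || !/^https?:\/\//.test(vllmBaseUrl)) {
    errors.push("vllmBaseUrl must be an http(s) URL");
  }
  // The API key is optional; the client only sends it when non-empty.
  if (vllmApiKey !== undefined && typeof vllmApiKey !== "string") {
    errors.push("vllmApiKey must be a string when set");
  }
  // Environment variables arrive as strings, so coerce before checking.
  const portNum = Number(port);
  if (!Number.isInteger(portNum) || portNum < 1 || portNum > 65535) {
    errors.push("port must be an integer between 1 and 65535");
  }

  if (errors.length > 0) return { success: false, errors };
  return {
    success: true,
    config: {
      vllmBaseUrl: vllmBaseUrl as string,
      vllmApiKey: (vllmApiKey as string | undefined) ?? "",
      port: portNum,
    },
  };
}
```

Validating once at startup means a typo in an environment variable surfaces as a clear message at boot instead of as a confusing 502 on the first request.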

import type { Metadata } from "next";

export const metadata: Metadata = {
  title: "Logs — Shield Gateway Admin",
};

interface LogEntry {
  timestamp: string;
  correlationId: string;
  level: string;
  message: string;
}

async function fetchLogs(): Promise<LogEntry[]> {
  try {
    const res = await fetch("http://localhost:4000/admin/logs", { cache: "no-store" });
    if (!res.ok) return [];
    return (await res.json()) as LogEntry[];
  } catch {
    return [];
  }
}

export default async function LogsPage() {
  const logs = await fetchLogs();

  if (logs.length === 0) {
    return (
      <div>
        <h2 style={{ margin: "0 0 24px 0", fontSize: "24px", fontWeight: 600 }}>Logs</h2>
        <div style={{ background: "#161b22", border: "1px solid #30363d", borderRadius: "8px", padding: "40px", textAlign: "center", color: "#8b949e", fontSize: "14px" }}>
          No log entries yet. Make a request to the gateway to see entries here.
        </div>
      </div>
    );
  }

  return (
    <div>
      <h2 style={{ margin: "0 0 24px 0", fontSize: "24px", fontWeight: 600 }}>Logs</h2>
      <div style={{ background: "#161b22", border: "1px solid #30363d", borderRadius: "8px", overflow: "hidden", maxHeight: "600px", overflowY: "auto" }}>
        <table style={{ width: "100%", borderCollapse: "collapse", fontSize: "12px" }}>
          <thead style={{ position: "sticky", top: 0, background: "#0d1117" }}>
            <tr style={{ borderBottom: "1px solid #30363d" }}>
              <th style={{ padding: "8px 12px", color: "#8b949e", textAlign: "left" }}>Timestamp</th>
              <th style={{ padding: "8px 12px", color: "#8b949e", textAlign: "left" }}>Request ID</th>
              <th style={{ padding: "8px 12px", color: "#8b949e", textAlign: "left" }}>Level</th>
              <th style={{ padding: "8px 12px", color: "#8b949e", textAlign: "left" }}>Message</th>
            </tr>
          </thead>
          <tbody>
            {logs.map((entry, i) => {
              const levelColor =
                entry.level === "warn" ? "#f85149" : entry.level === "error" ? "#f85149" : "#8b949e";
              return (
                <tr key={i} style={{ borderBottom: "1px solid #21262d" }}>
                  <td style={{ padding: "8px 12px", color: "#8b949e", fontFamily: "monospace" }}>{entry.timestamp}</td>
                  <td style={{ padding: "8px 12px", color: "#58a6ff", fontFamily: "monospace", fontSize: "11px" }}>{entry.correlationId.slice(0, 8)}</td>
                  <td style={{ padding: "8px 12px", color: levelColor, fontWeight: 600 }}>{entry.level}</td>
                  <td style={{ padding: "8px 12px", color: "#e6edf3" }}>{entry.message}</td>
                </tr>
              );
            })}
          </tbody>
        </table>
      </div>
    </div>
  );
}

Intro

Prerequisites

Step 1: Initialize the project

Step 2: Install dependencies

Step 3: Configure environment variables

Step 4: Define shared types

Step 5: Build the observability layer

Step 6: Write the vLLM client and config loader

Step 7: Wire up the guardrail chain

Step 8: Build the Express gateway

Step 9: Write the Presidio sync worker and the entry point

Step 10: Build the admin API routes

Step 11: Write the admin dashboard pages

Step 12: Run the tests

Step 13: Start the gateway and dashboard

Next steps