# vLLM Security Guardrails for SMB API Gateways
A drop-in API proxy that adds PII redaction, prompt injection defense, and content safety checks to any vLLM endpoint, with a web admin dashboard for non-technical users to manage policies.
## Architecture
```
Client
|
v
+---------------------------+
| Express Gateway (4000) |
| - PII redaction |
| - Prompt injection block |
| - Cost pre-check |
| - Content moderation |
+---------------------------+
|
v
+---------------------------+
| vLLM (OpenAI-compatible) |
+---------------------------+
Browser
|
v
+---------------------------+
| Next.js Admin (3000) |
| - Dashboard (metrics) |
| - Policies (toggle) |
| - Logs viewer |
+---------------------------+
```
## Quick Start
### Prerequisites
- Node.js 22+
- pnpm 10.x
- A running vLLM instance (OpenAI-compatible API)
### 1. Install dependencies
```bash
pnpm install
```
### 2. Configure environment
```bash
cp .env.example .env
# Edit .env with your vLLM endpoint and settings
```
### 3. Start the gateway and dashboard
In two separate terminals:
```bash
# Terminal 1: Express API gateway on port 4000
pnpm run start:gateway
# Terminal 2: Next.js admin dashboard on port 3000
pnpm run dev
```
Or start both together (requires `concurrently`):
```bash
pnpm run dev:all
```
### 4. Make a request
```bash
curl -X POST http://localhost:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-User-ID: user-123" \
-H "X-Session-ID: session-456" \
-d '{
"model": "llama-3",
"messages": [{"role": "user", "content": "Hello, what is 2+2?"}]
}'
```
## Environment Variables
| Variable | Default | Description |
|---|---|---|
| `VLLM_BASE_URL` | `http://localhost:8000` | Base URL of your vLLM instance |
| `VLLM_API_KEY` | *(unset)* | API key for vLLM (optional) |
| `EXPRESS_PORT` | `4000` | Port for the Express gateway |
| `CORS_ORIGINS` | `http://localhost:3000` | Comma-separated CORS allowed origins |
| `ADMIN_DASHBOARD_PORT` | `3000` | Port for Next.js admin (via `pnpm run dev`) |
| `GUARDRAIL_CHAIN_BUDGET_MAX_LATENCY_MS` | `1000` | Max guardrail chain latency (ms) |
| `GUARDRAIL_CHAIN_BUDGET_MAX_TOKENS` | `8000` | Max tokens allowed in requests |
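Putting the table together, a minimal `.env` could look like the following (values mirror the defaults above; the actual `.env.example` in the repo may include more settings):

```bash
# vLLM upstream
VLLM_BASE_URL=http://localhost:8000
VLLM_API_KEY=

# Gateway
EXPRESS_PORT=4000
CORS_ORIGINS=http://localhost:3000

# Guardrail budgets
GUARDRAIL_CHAIN_BUDGET_MAX_LATENCY_MS=1000
GUARDRAIL_CHAIN_BUDGET_MAX_TOKENS=8000
```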
## Guardrails
The gateway ships with four built-in guardrails:
| Guardrail | Description |
|---|---|
| `pii-redaction` | Detects and masks PII (emails, phone numbers, SSNs) before forwarding |
| `prompt-injection` | Detects prompt injection and jailbreak attempts |
| `cost-precheck` | Validates request fits within the token budget |
| `content-moderation` | Checks for unsafe or inappropriate content patterns |
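To illustrate the kind of masking `pii-redaction` performs, here is a minimal sketch. The patterns, placeholder format, and function name are illustrative only; the gateway's actual implementation may differ.

```typescript
// Hypothetical illustration of PII masking; not the gateway's actual code.
const PII_PATTERNS: Array<{ label: string; pattern: RegExp }> = [
  { label: "EMAIL", pattern: /[\w.+-]+@[\w-]+\.[\w.-]+/g },
  { label: "SSN", pattern: /\b\d{3}-\d{2}-\d{4}\b/g },
  { label: "PHONE", pattern: /\b\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b/g },
];

// Replace each match with a typed placeholder so the forwarded prompt
// stays readable while the sensitive value never reaches vLLM.
function redactPII(text: string): string {
  return PII_PATTERNS.reduce(
    (acc, { label, pattern }) => acc.replace(pattern, `[${label}]`),
    text,
  );
}
```

Running the SSN pattern before the phone pattern avoids a 3-2-4 digit group ever being half-consumed as a phone number.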
## Admin Dashboard
- **Dashboard** (`http://localhost:3000/admin`) — Live metrics: total requests, pass/block rates, latency percentiles
- **Policies** (`http://localhost:3000/admin/policies`) — Toggle guardrails on/off at runtime
- **Logs** (`http://localhost:3000/admin/logs`) — Real-time request log with correlation IDs
## API Reference
### POST /v1/chat/completions
Proxies the request to vLLM once all enabled guardrails pass. Blocked requests receive HTTP `400` with error code `GUARDRAIL_BLOCKED`.
**Headers:**
- `X-Request-ID` (optional) — Correlation ID for tracing
- `X-User-ID` (optional) — User identifier
- `X-Session-ID` (optional) — Session identifier
**Request body:** OpenAI chat completions format.
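When `X-Request-ID` is omitted, a gateway typically mints one so every log line can still be correlated. A framework-agnostic sketch of that fallback (the helper name is illustrative, not the gateway's actual code):

```typescript
import { randomUUID } from "node:crypto";

// Illustrative helper: reuse the caller's X-Request-ID when present,
// otherwise generate a fresh UUID so logs and responses stay correlatable.
function ensureRequestId(headers: Record<string, string | undefined>): string {
  const incoming = headers["x-request-id"];
  return incoming && incoming.trim().length > 0 ? incoming : randomUUID();
}
```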
### GET /health
Returns `{ "status": "ok", "timestamp": "..." }`.
### GET /admin/metrics
Returns the metrics snapshot (requests, passed, blocked, latencies).
### GET /admin/logs
Returns the in-memory request log array.
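The snapshot served by `/admin/metrics` can be pictured as a small in-memory accumulator. A sketch of the idea (field names mirror the description above; the project's real `InMemoryMetricsCollector` may differ):

```typescript
// Illustrative collector; the project's InMemoryMetricsCollector may differ.
interface MetricsSnapshot {
  requests: number;
  passed: number;
  blocked: number;
  latenciesMs: number[];
}

class MetricsCollector {
  private snapshot: MetricsSnapshot = { requests: 0, passed: 0, blocked: 0, latenciesMs: [] };

  record(passed: boolean, latencyMs: number): void {
    this.snapshot.requests += 1;
    if (passed) {
      this.snapshot.passed += 1;
    } else {
      this.snapshot.blocked += 1;
    }
    this.snapshot.latenciesMs.push(latencyMs);
  }

  // Return a copy so callers (e.g. the /admin/metrics handler)
  // cannot mutate the collector's internal state.
  read(): MetricsSnapshot {
    return { ...this.snapshot, latenciesMs: [...this.snapshot.latenciesMs] };
  }
}
```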
## Development
```bash
pnpm typecheck # TypeScript type check
pnpm lint # ESLint
pnpm test # Vitest unit + integration tests
pnpm build # Next.js production build
```
## Architecture Decisions
- **Express + Next.js dual process**: The Express gateway runs as a standalone process (port 4000) to keep the API isolated and production-ready. The Next.js admin dashboard runs separately (port 3000) for developer convenience.
- **In-memory observability**: Metrics and logs are stored in-memory to avoid external dependencies. For production, swap `InMemoryMetricsCollector` for a Prometheus-compatible collector.
- **Guardrail chain framework**: Uses `@reaatech/guardrail-chain` to orchestrate PIIRedaction, PromptInjection, CostPrecheck, and ContentModeration in a configurable, observable pipeline.
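Conceptually, a guardrail chain is an ordered list of checks that short-circuits on the first block. A toy model of that control flow (this is not `@reaatech/guardrail-chain`'s actual API):

```typescript
// Toy model of a guardrail pipeline; the real library's API differs.
type GuardrailResult = { pass: true } | { pass: false; reason: string };
type Guardrail = { name: string; check: (prompt: string) => GuardrailResult };

function runChain(guardrails: Guardrail[], prompt: string): { blockedBy?: string } {
  for (const g of guardrails) {
    const result = g.check(prompt);
    if (!result.pass) return { blockedBy: g.name }; // short-circuit on first block
  }
  return {}; // all guardrails passed; safe to forward to vLLM
}
```

Short-circuiting keeps cheap checks (like the token pre-check) useful as early filters before more expensive ones run.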