SMB SaaS platforms deploying AI chat features across multiple customer tenants risk exposing PII, generating harmful content, or violating compliance policies. Building and maintaining a separate guardrail layer for each tenant is complex without a composable, configurable safety stack.
This tutorial walks you through building a multi-tenant security guardrail pipeline for an xAI Grok-powered chatbot using Next.js 16 (App Router), the @reaatech/guardrail-chain framework, Microsoft Presidio PII detection, and pino/Langfuse observability. By the end, you’ll have a working POST /api/chat endpoint that runs PII redaction, prompt injection detection, toxicity filtering, rate limiting, and topic boundary enforcement — all configurable per tenant via YAML files, with budget-aware scheduling that skips non-essential guardrails under latency pressure. This is for developers building SMB SaaS platforms who need to offer safe AI chat across multiple customer tenants without exposing PII, generating harmful content, or violating compliance policies.
Prerequisites
Node.js >= 22 (with corepack enabled for pnpm)
pnpm (this project uses pnpm@10.0.0)
An xAI API key — set as XAI_API_KEY in your environment
Basic knowledge of TypeScript, Next.js App Router, and guardrail-chain concepts
Langfuse credentials (optional) — if you want Langfuse metrics export, set LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY
All versions are exact-pinned (no ^ or ~). Now install:
terminal
pnpm install
Verify the project structure:
terminal
ls next.config.ts tsconfig.json vitest.config.ts app/ src/
Expected output: All five files/directories exist. Your scaffold is ready.
Step 2: Configure environment variables
Create .env.example with all the environment variables the application reads at runtime:
env
# Env vars used by xai-grok-security-guardrails-for-multi-tenant-smb-chat.
# The builder adds entries here as it wires up each integration.
# Keep placeholders only; never commit real values.
NODE_ENV=development
XAI_API_KEY=<your-xai-api-key>
XAI_API_BASE_URL=https://api.x.ai/v1
XAI_MODEL=grok-2
GUARDRAIL_CHAIN_BUDGET_MAX_LATENCY_MS=1000
GUARDRAIL_CHAIN_BUDGET_MAX_TOKENS=8000
GUARDRAIL_CHAIN_BUDGET_SKIP_SLOW=false
# Langfuse (optional): omitting these disables Langfuse export
LANGFUSE_PUBLIC_KEY=<your-langfuse-public-key>
LANGFUSE_SECRET_KEY=<your-langfuse-secret-key>
LANGFUSE_BASE_URL=https://cloud.langfuse.com
LOG_LEVEL=info
TENANT_CONFIG_DIR=./config/tenants
Copy it and fill in your xAI API key:
terminal
cp .env.example .env
# Edit .env and set XAI_API_KEY=<your-real-key>
Expected output: The .env file exists and your key is set.
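As a rough illustration of how the three GUARDRAIL_CHAIN_BUDGET_* variables could be turned into a typed budget object, here is a minimal sketch. The BudgetSettings shape, the helper names, and the fallback values (taken from the placeholders above) are assumptions for illustration, not the framework's actual API.

```typescript
// Hypothetical budget shape; the real BudgetConfig in @reaatech/guardrail-chain may differ.
interface BudgetSettings {
  maxLatencyMs: number;
  maxTokens: number;
  skipSlowGuardrailsUnderPressure: boolean;
}

type Env = Record<string, string | undefined>;

// Parse a positive integer, falling back when the variable is unset or invalid.
function readPositiveInt(raw: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(raw ?? "", 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

// Only the string "true" (case-insensitive) enables a flag; unset keeps the fallback.
function readBoolean(raw: string | undefined, fallback: boolean): boolean {
  if (raw === undefined || raw.trim() === "") return fallback;
  return raw.trim().toLowerCase() === "true";
}

// Build the budget from an env-like object (pass process.env in a Node runtime).
function readBudgetFromEnv(env: Env): BudgetSettings {
  return {
    maxLatencyMs: readPositiveInt(env.GUARDRAIL_CHAIN_BUDGET_MAX_LATENCY_MS, 1000),
    maxTokens: readPositiveInt(env.GUARDRAIL_CHAIN_BUDGET_MAX_TOKENS, 8000),
    skipSlowGuardrailsUnderPressure: readBoolean(env.GUARDRAIL_CHAIN_BUDGET_SKIP_SLOW, false),
  };
}
```

Taking the environment as a parameter rather than reading process.env directly keeps the parser easy to unit test.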
Step 3: Enable the instrumentation hook in Next.js config
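The project uses src/instrumentation.ts to boot observability at startup. The recipe sets experimental.instrumentationHook: true in next.config.ts. Next.js releases before 15 need this flag, and without it register() is never called. From Next.js 15 onward the instrumentation file is stable and loaded without the flag, so on Next.js 16 the setting should be redundant.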
ts
// next.config.ts
import type { NextConfig } from "next";

const nextConfig: NextConfig = {
  experimental: {
    instrumentationHook: true,
  } as NextConfig["experimental"],
};

export default nextConfig;
Expected output: The file uses the exact key instrumentationHook (not clientInstrumentationHook or just instrumentation).
Step 4: Create tenant configuration YAML files
Per-tenant YAML files live under config/tenants/. Each file defines the tenant’s budget limits, guardrail list, and observability preferences.
Create config/tenants/default.yaml — the baseline config that most tenants would use:
Expected output: Three YAML files in config/tenants/.
Step 5: Define TypeScript interfaces
Create src/types.ts with the shared interfaces for tenant configuration, chat requests, and chat responses. These types are used across every module you’ll write below.
Notice maxRequests and windowMs on TenantConfig — these drive the per-tenant RateLimiter parameters. They aren’t part of the base BudgetConfig but are added as custom fields.
Expected output: A clean src/types.ts with three exported interfaces and a module sentinel constant.
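For orientation, the three interfaces might look roughly like the sketch below. Only maxRequests, windowMs, allowedTopics, blockedTopics, and the budget are named in this tutorial; every other field, the local BudgetConfig stand-in, and the isChatRequest helper are assumptions made for illustration.

```typescript
// Local stand-in for the framework's BudgetConfig (assumed shape).
interface BudgetConfig {
  maxLatencyMs: number;
  maxTokens: number;
  skipSlowGuardrailsUnderPressure: boolean;
}

interface TenantConfig {
  tenantId: string; // assumed field
  budget: BudgetConfig;
  maxRequests: number; // custom field: requests allowed per window
  windowMs: number; // custom field: rate-limit window length
  allowedTopics?: string[];
  blockedTopics?: string[];
}

interface ChatRequest {
  tenantId: string; // assumed field
  message: string; // assumed field
}

interface ChatResponse {
  content: string; // assumed field
  blocked: boolean; // assumed field
  reason?: string; // assumed field
}

// Hypothetical runtime check for untrusted request bodies.
function isChatRequest(value: unknown): value is ChatRequest {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return typeof candidate.tenantId === "string" && typeof candidate.message === "string";
}

const exampleTenant: TenantConfig = {
  tenantId: "acme",
  budget: { maxLatencyMs: 1000, maxTokens: 8000, skipSlowGuardrailsUnderPressure: false },
  maxRequests: 60,
  windowMs: 60_000,
  blockedTopics: ["medical advice"],
};
```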
Step 6: Set up observability with pino and Langfuse
Create src/lib/observability.ts. This module wraps pino as a guardrail-chain Logger, creates a MetricsCollector that logs structured JSON, and optionally pipes metrics to Langfuse.
initObservability() sets the global logger and metrics collector. If Langfuse env vars are present, it also async-initializes a Langfuse client and pipes metrics there.
closeObservability() flushes pino and shuts down Langfuse.
Langfuse import is dynamic — register() runs in both Node and Edge runtimes, and langfuse is a Node-only package.
Expected output: src/lib/observability.ts exports initObservability, closeObservability, getLogger, and getMetrics.
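The idea of a metrics collector that always logs structured JSON and only forwards to Langfuse when credentials are present can be sketched without either library. In this sketch MetricSink, createMetrics, and buildSinks are hypothetical names, and the Langfuse sink is passed in as a plain callback rather than a real client.

```typescript
interface MetricRecord {
  name: string;
  value: number;
  tags: Record<string, string>;
  timestamp: number;
}

type MetricSink = (record: MetricRecord) => void;

// Sink that serializes each record as one JSON line (pino would play this role).
function jsonLineSink(write: (line: string) => void): MetricSink {
  return (record) => write(JSON.stringify(record));
}

// Always log JSON; add the Langfuse sink only when both credentials are set.
function buildSinks(
  env: Record<string, string | undefined>,
  write: (line: string) => void,
  langfuseSink: MetricSink,
): MetricSink[] {
  const sinks: MetricSink[] = [jsonLineSink(write)];
  if (env.LANGFUSE_PUBLIC_KEY && env.LANGFUSE_SECRET_KEY) {
    sinks.push(langfuseSink);
  }
  return sinks;
}

function createMetrics(sinks: MetricSink[]) {
  return {
    record(name: string, value: number, tags: Record<string, string> = {}): void {
      const entry: MetricRecord = { name, value, tags, timestamp: Date.now() };
      for (const sink of sinks) {
        try {
          sink(entry);
        } catch {
          // A failing exporter must not break request handling.
        }
      }
    },
  };
}
```

Swallowing sink errors is a deliberate choice here: observability is treated as best-effort so an exporter outage cannot take down the chat endpoint.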
Step 7: Implement tenant config loading with LRU caching
Create src/services/tenant-config.ts. This module loads YAML config files per tenant, validates them, falls back to environment-only config, and caches results in an LRU cache with a 5-minute TTL.
The fallback chain: try file-based config with env overrides → if file not found, try env-only config → if that also fails, return hardcoded defaults. This means a tenant with no YAML file still gets safe defaults.
Expected output: src/services/tenant-config.ts with loadTenantConfig and getDefaultConfig.
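The caching and fallback behavior described above can be sketched in plain TypeScript. TtlLruCache and resolveWithFallback are hypothetical helpers written for this illustration; the actual module may rely on a library cache and a different loader signature.

```typescript
// Minimal LRU cache with a per-entry TTL, built on Map insertion order.
class TtlLruCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly maxEntries: number;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(maxEntries: number, ttlMs: number, now: () => number = () => Date.now()) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop and report a miss
      return undefined;
    }
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    if (this.entries.size >= this.maxEntries) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Try each loader in order; a throw or undefined result moves on to the next one.
function resolveWithFallback<T>(loaders: Array<() => T | undefined>, hardcoded: T): T {
  for (const load of loaders) {
    try {
      const result = load();
      if (result !== undefined) return result;
    } catch {
      // Fall through to the next source.
    }
  }
  return hardcoded;
}
```

With a 5-minute TTL the cache would be created as new TtlLruCache(100, 5 * 60 * 1000), where the capacity of 100 is an arbitrary example value.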
Step 8: Create the Presidio guardrail adapter
Create src/middleware/presidio-guardrail.ts. This class wraps Microsoft Presidio’s GuardrailsEngine as a @reaatech/guardrail-chain Guardrail<string, string>, running injection detection and PII detection on every input message.
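The guardrail never crashes the chain: if the engine throws (network error, timeout), it catches the error and returns { passed: false, ... } with recoverable: true. This is a recoverable failure rather than a strict fail-open, because the result is still passed: false. The caller (chain or route handler) decides whether to block, retry, or let the message through.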
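The error-handling pattern described above can be sketched as follows. The DetectionEngine interface, the GuardrailResult shape, and the shouldAllow policy helper are all assumptions for illustration; they stand in for the real engine and the framework's Guardrail contract, whose exact signatures are not shown in this tutorial.

```typescript
// Assumed result shape; the framework's real result type may carry more fields.
interface GuardrailResult {
  passed: boolean;
  output?: string;
  reason?: string;
  recoverable?: boolean;
}

// Hypothetical stand-in for the detection engine being wrapped.
interface DetectionEngine {
  analyze(text: string): { injection: boolean; piiFound: boolean };
}

class EngineGuardrail {
  readonly id = "engine-guard";
  private readonly engine: DetectionEngine;

  constructor(engine: DetectionEngine) {
    this.engine = engine;
  }

  execute(input: string): GuardrailResult {
    try {
      const findings = this.engine.analyze(input);
      if (findings.injection) {
        return { passed: false, reason: "prompt injection detected", recoverable: false };
      }
      if (findings.piiFound) {
        return { passed: false, reason: "PII detected", recoverable: false };
      }
      return { passed: true, output: input };
    } catch (err) {
      // Engine failure: report a recoverable failure instead of throwing.
      const message = err instanceof Error ? err.message : String(err);
      return { passed: false, reason: `engine unavailable: ${message}`, recoverable: true };
    }
  }
}

// Caller-side policy: decide what a recoverable failure means for this tenant.
function shouldAllow(result: GuardrailResult, policy: "fail-open" | "fail-closed"): boolean {
  if (result.passed) return true;
  return result.recoverable === true && policy === "fail-open";
}
```

Keeping the policy decision outside the guardrail lets a strict tenant block on engine outages while a lenient tenant lets traffic through.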
Step 9: Build the guardrail chain
Create src/middleware/guardrails.ts, the heart of the recipe. It builds a GuardrailChain from a tenant’s config, with input guardrails (rate limiter, cost precheck, PII redaction, Presidio guard, prompt injection, topic boundary) and output guardrails (PII scan, toxicity filter, hallucination check).
ts
import {
  GuardrailChain,
  type ChainResult,
  type ExecutionOptions,
  CircuitBreaker,
  type Guardrail,
} from "@reaatech/guardrail-chain";
import {
  PIIRedaction,
  PromptInjection,
  ToxicityFilter,
  PIIScan,
  TopicBoundary,
  CostPrecheck,
  RateLimiter,
  CachedGuardrail,
  HallucinationCheck,
} from "@reaatech/guardrail-chain-guardrails";
import { PresidioGuardrail } from "./presidio-guardrail.js";
import type { TenantConfig } from "../types.js";
import { getLogger } from "../lib/observability.js";
What makes this per-tenant:
RateLimiter reads config.maxRequests and config.windowMs — each tenant can have a different rate limit.
TopicBoundary is only added if the tenant config specifies allowedTopics or blockedTopics.
BudgetConfig drives the chain’s budget, which enforces skipSlowGuardrailsUnderPressure — non-essential guardrails are skipped when latency is tight.
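To make the per-tenant rate limit concrete, here is a minimal sliding-window limiter driven by maxRequests and windowMs. It is a stand-alone sketch of the idea, not the RateLimiter class from @reaatech/guardrail-chain-guardrails, and the class and method names are hypothetical.

```typescript
// In-memory sliding-window limiter keyed by tenant ID (illustration only).
class SlidingWindowRateLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly maxRequests: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(maxRequests: number, windowMs: number, now: () => number = () => Date.now()) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Returns true and records the request if the key is under its limit.
  tryAcquire(key: string): boolean {
    const current = this.now();
    const cutoff = current - this.windowMs;
    // Keep only timestamps that are still inside the window.
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.maxRequests;
    if (allowed) recent.push(current);
    this.hits.set(key, recent);
    return allowed;
  }
}
```

A single process-local map like this does not coordinate across server instances; a multi-instance deployment would need a shared store.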
Expected output: src/middleware/guardrails.ts exports buildGuardrailChain, executeGuardrails, and executeOutputGuardrails.
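Budget-aware scheduling, where non-essential guardrails are skipped once the latency budget is nearly spent, can be illustrated with a small runner. The Step shape, the essential and estimatedMs fields, and runWithBudget are assumptions for this sketch; the framework's own scheduler is likely more involved.

```typescript
interface Step {
  id: string;
  essential: boolean; // essential steps always run
  estimatedMs: number; // assumed per-step latency estimate
  run(text: string): { passed: boolean; output: string };
}

interface Budget {
  maxLatencyMs: number;
  skipSlowGuardrailsUnderPressure: boolean;
}

interface RunOutcome {
  passed: boolean;
  output: string;
  skipped: string[];
  failedBy?: string;
}

function runWithBudget(
  steps: Step[],
  input: string,
  budget: Budget,
  now: () => number = () => Date.now(),
): RunOutcome {
  const startedAt = now();
  const skipped: string[] = [];
  let text = input;
  for (const step of steps) {
    const remainingMs = budget.maxLatencyMs - (now() - startedAt);
    const underPressure = step.estimatedMs > remainingMs;
    if (underPressure && budget.skipSlowGuardrailsUnderPressure && !step.essential) {
      skipped.push(step.id); // non-essential and too slow for what is left
      continue;
    }
    const result = step.run(text);
    if (!result.passed) {
      return { passed: false, output: text, skipped, failedBy: step.id };
    }
    text = result.output; // later steps see the transformed text
  }
  return { passed: true, output: text, skipped };
}
```

Essential steps such as PII redaction run even when the budget is exhausted, so pressure only ever removes optional checks.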
Step 10: Create the xAI Grok client
Create src/services/xai-client.ts. This wraps the OpenAI SDK (xAI's API is OpenAI-compatible) in a simple client that logs every API call and rethrows API errors as GuardrailError.
ts
import OpenAI from "openai";
import {
  GuardrailError,
  GuardrailErrorType,
} from "@reaatech/guardrail-chain";
import { getLogger } from "../lib/observability.js";

// Create an OpenAI SDK client pointed at the xAI endpoint.
export function createXaiClient(
  baseURL?: string,
): OpenAI {
  return new OpenAI({
    baseURL: baseURL ?? process.env.XAI_API_BASE_URL ?? "https://api.x.ai/v1",
    apiKey: process.env.XAI_API_KEY,
  });
}

export async function sendChatMessage(params: {
  client: OpenAI;
  message: string;
  systemPrompt?: string;
  model?: string;
}): Promise<{ content: string; inputTokens: number; outputTokens: number }> {
  const model = params.model ?? process.env.XAI_MODEL ?? "grok-2";
  const systemPrompt =
    params.systemPrompt ??
    "You are a helpful assistant for a multi-tenant SMB platform. Respond safely and avoid generating harmful content.";

  try {
    const completion = await params.client.chat.completions.create({
      model,
      messages: [
        { role: "system", content: systemPrompt },
        { role: "user", content: params.message },
      ],
    });

    // Treat a missing or non-string completion as an execution failure.
    const content = completion.choices[0]?.message?.content;
    if (typeof content !== "string") {
      throw new GuardrailError(
        "xAI returned no content",
        GuardrailErrorType.EXECUTION_FAILED,
        "xai-client",
      );
    }

    const inputTokens = completion.usage?.prompt_tokens ?? 0;
    const outputTokens = completion.usage?.completion_tokens ?? 0;
    getLogger().info(
      { model, inputTokens, outputTokens },
      "xAI API call",
    );
    return { content, inputTokens, outputTokens };
  } catch (err) {
    // Errors raised above are already GuardrailError instances; pass them through.
    if (err instanceof GuardrailError) {
      throw err;
    }
    if (err instanceof OpenAI.APIError) {
      // Rate limits (429) and server errors (5xx) are marked recoverable.
      const isRecoverable =
        err.status === 429 ||
        (err.status !== undefined && err.status >= 500);
      throw new GuardrailError(
        `xAI API error: ${err.message}`,
        GuardrailErrorType.EXECUTION_FAILED,
        "xai-client",
        isRecoverable,
      );
    }
    throw err;
  }
}
Expected output: src/services/xai-client.ts exports createXaiClient and sendChatMessage.
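One way a caller might use the recoverable flag is to retry only recoverable failures with exponential backoff. The recipe itself does not show retries, so treat this as an optional sketch: withRetry, its defaults, and the duck-typed recoverable check are assumptions, and isRecoverableStatus simply mirrors the 429 and 5xx rule from the client code.

```typescript
// Same classification rule as the client: rate limits and server errors.
function isRecoverableStatus(status: number | undefined): boolean {
  return status === 429 || (status !== undefined && status >= 500);
}

// Duck-typed check for an error object carrying recoverable: true (assumed property name).
function isRecoverable(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { recoverable?: unknown }).recoverable === true
  );
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry recoverable failures with exponential backoff; rethrow everything else.
async function withRetry<T>(
  call: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 200,
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<T> {
  let attempt = 0;
  for (;;) {
    attempt += 1;
    try {
      return await call();
    } catch (err) {
      if (!isRecoverable(err) || attempt >= maxAttempts) throw err;
      await sleep(baseDelayMs * 2 ** (attempt - 1)); // 200 ms, 400 ms, ...
    }
  }
}
```

A route handler could wrap the model call as withRetry(() => sendChatMessage({ client, message })), keeping in mind that each retry adds latency against the tenant's budget.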
Step 11: Wire up the API route handler
Create app/api/chat/route.ts — the main chat endpoint. It ties everything together: parse the request, load tenant config, run input guardrails, call xAI, run output guardrails, and return the result.
Expected output: app/api/chat/route.ts exists. Its imports use .js extensions for ESM resolution, and the route handler uses NextRequest and NextResponse.json() (never bare Request/Response).
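The request flow described above (parse, load config, input guardrails, model call, output guardrails) can be sketched as a framework-free function with injected dependencies. The ChatDeps interface, the handleChat name, and the 400 and 403 status codes are assumptions for illustration; the real handler works with NextRequest and NextResponse.json().

```typescript
interface GuardOutcome {
  passed: boolean;
  output: string;
  reason?: string;
}

// Hypothetical dependency bundle so the flow can be tested without Next.js or xAI.
interface ChatDeps {
  loadTenantConfig(tenantId: string): Promise<{ tenantId: string }>;
  runInputGuardrails(text: string, config: { tenantId: string }): Promise<GuardOutcome>;
  callModel(text: string): Promise<string>;
  runOutputGuardrails(text: string, config: { tenantId: string }): Promise<GuardOutcome>;
}

interface ChatOutcome {
  status: number;
  body: { content?: string; error?: string };
}

async function handleChat(body: unknown, deps: ChatDeps): Promise<ChatOutcome> {
  // 1. Parse and validate the request body.
  if (typeof body !== "object" || body === null) {
    return { status: 400, body: { error: "invalid request body" } };
  }
  const { tenantId, message } = body as { tenantId?: unknown; message?: unknown };
  if (typeof tenantId !== "string" || typeof message !== "string" || message.trim() === "") {
    return { status: 400, body: { error: "tenantId and message are required" } };
  }

  // 2. Load the tenant's configuration.
  const config = await deps.loadTenantConfig(tenantId);

  // 3. Input guardrails: a blocked message never reaches the model.
  const input = await deps.runInputGuardrails(message, config);
  if (!input.passed) {
    return { status: 403, body: { error: input.reason ?? "blocked by input guardrails" } };
  }

  // 4. Call the model with the (possibly redacted) guardrail output.
  const reply = await deps.callModel(input.output);

  // 5. Output guardrails run on the model's reply before it is returned.
  const output = await deps.runOutputGuardrails(reply, config);
  if (!output.passed) {
    return { status: 403, body: { error: output.reason ?? "blocked by output guardrails" } };
  }
  return { status: 200, body: { content: output.output } };
}
```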
Step 12: Add Next.js instrumentation
Create src/instrumentation.ts. Next.js calls the exported register() function once at server startup. It dynamically imports the observability module and initializes pino + Langfuse.
ts
export async function register(): Promise<void> {
  if (process.env.NEXT_RUNTIME === "nodejs") {
    const { initObservability } = await import("./lib/observability.js");
    initObservability();
  }
}
The NEXT_RUNTIME === "nodejs" guard is critical: register() runs in both Node and Edge runtimes by default, but pino and Langfuse are Node-only packages. The dynamic import() inside the guard ensures Edge doesn’t try to load Node built-ins.
Expected output: src/instrumentation.ts with the register() function. Confirm that the instrumentationHook flag in next.config.ts is set (see Step 3).
Step 13: Re-export the public API
Update src/index.ts to re-export every public API so consumers can import from a single location:
ts
export { loadTenantConfig, getDefaultConfig } from "./services/tenant-config.js";
export { buildGuardrailChain, executeGuardrails, executeOutputGuardrails } from "./middleware/guardrails.js";
export { PresidioGuardrail } from "./middleware/presidio-guardrail.js";
export { createXaiClient, sendChatMessage } from "./services/xai-client.js";
export { initObservability, closeObservability } from "./lib/observability.js";
export type { TenantConfig, ChatRequest, ChatResponse } from "./types.js";
Expected output: src/index.ts re-exports the public API of all six modules.
Step 14: Run the tests
The test suite covers every module. Here is a sample from tests/middleware/guardrails.test.ts — it verifies input guardrails pass clean text, block prompt injection, redact PII, enforce rate limits, handle large inputs, and survive tight budgets.
terminal
pnpm typecheck
pnpm lint
pnpm vitest run --coverage --reporter=json --outputFile=vitest-report.json
Expected output: pnpm typecheck exits 0, pnpm lint exits 0, and vitest reports numFailedTests: 0 with coverage >= 90% on src/**/*.ts and app/**/route.ts.
Next steps
Add a Langfuse trace per request — Extend the POST handler to create a Langfuse trace for each API call, adding latency breakdowns per guardrail phase for deeper observability.
Extend tenant configs with custom guardrail parameters — The YAML format supports extra fields. Add a customPatterns key to tenant configs and wire them into PresidioGuardrail for per-tenant PII pattern overrides.
Build a tenant management dashboard — Create a Next.js page at /admin/tenants that lists all cached configs, shows guardrail metrics per tenant, and allows toggling guardrails on/off at runtime via a config-reload endpoint.
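As a starting point for the customPatterns idea in the next steps above, here is a minimal sketch of applying per-tenant regex patterns to redact text. The CustomPattern shape and the redactWithCustomPatterns helper are hypothetical; how such patterns would actually be wired into PresidioGuardrail is left open.

```typescript
// Hypothetical shape for one entry under a tenant's customPatterns key.
interface CustomPattern {
  label: string; // placeholder inserted in place of each match
  regex: string; // pattern source, compiled with the global flag
}

// Apply each pattern in order and report which labels matched.
function redactWithCustomPatterns(
  text: string,
  patterns: CustomPattern[],
): { text: string; hits: string[] } {
  let redacted = text;
  const hits: string[] = [];
  for (const pattern of patterns) {
    const matcher = new RegExp(pattern.regex, "g");
    // A replacer function avoids "$" sequences in labels being interpreted.
    const next = redacted.replace(matcher, () => `[${pattern.label}]`);
    if (next !== redacted) hits.push(pattern.label);
    redacted = next;
  }
  return { text: redacted, hits };
}
```

Because tenant-supplied regular expressions can be slow or malformed, a production version would want to validate them when the config is loaded.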