# OpenAI Guardrail Layer for SMB Customer Chat Safety
 
> Add a pluggable guardrail layer to your OpenAI chatbot that blocks prompt injection and redacts PII before a message reaches the model, then filters unsafe content before the response reaches your users.
 
A tutorial-style reference solution from [reaatech.com](https://reaatech.com), demonstrating how to build production-grade AI systems with the `@reaatech/*` package family.
 
## Problem
 
Small businesses deploying AI chatbots for customer support face three critical risks:
- **Prompt injection**: Users trick the model into ignoring its instructions or revealing system prompts.
- **PII disclosure**: Chatbot responses accidentally leak customer personal information (emails, phone numbers, SSNs, credit cards).
- **Toxic content**: The model generates offensive, harmful, or brand-damaging responses.
 
SMBs often lack the security engineering resources to build custom guardrails. This recipe provides a drop-in shim over the OpenAI SDK that passes every prompt and response through a configurable chain of guardrails.
 
## How It Works
 
The guardrail layer wraps the OpenAI `chat.completions.create` call with two guardrail chains:
 
```
User message → Input chain (PII redaction + Presidio PII + prompt injection) → OpenAI → Output chain (toxicity filter + PII scan) → Safe response
```
 
1. **Input chain** runs before the LLM call: redacts PII, detects prompt injection, and blocks malicious input.
2. If input passes, the (redacted) message is sent to OpenAI.
3. **Output chain** runs on the LLM response: filters toxic content and scans for leaked PII.
4. If any guardrail fails, a configurable safe fallback message is returned instead.
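The four steps above can be sketched as a small pipeline. This is a minimal illustration, not the actual `@reaatech/guardrail-chain` API: the `Guardrail`, `GuardrailResult`, `runChain`, and `guardedCompletion` names are hypothetical, and the two toy guardrails stand in for the real ones.

```typescript
// Hypothetical types for illustration; the real package API may differ.
type GuardrailResult =
  | { status: "pass"; text: string } // passed, possibly with rewritten text
  | { status: "block"; reason: string }; // failed, stop the chain

type Guardrail = (text: string) => Promise<GuardrailResult>;
type LlmCall = (prompt: string) => Promise<string>;

// Run guardrails in order, threading the (possibly redacted) text through each one.
async function runChain(chain: Guardrail[], text: string): Promise<GuardrailResult> {
  let current = text;
  for (const guardrail of chain) {
    const result = await guardrail(current);
    if (result.status === "block") return result; // stop at the first failure
    current = result.text;
  }
  return { status: "pass", text: current };
}

// Input chain -> LLM -> output chain, with a fallback whenever a chain blocks.
async function guardedCompletion(
  userMessage: string,
  inputChain: Guardrail[],
  outputChain: Guardrail[],
  callLlm: LlmCall,
  safeFallbackMessage = "Sorry, I can't help with that request.",
): Promise<string> {
  const input = await runChain(inputChain, userMessage);
  if (input.status === "block") return safeFallbackMessage;

  const reply = await callLlm(input.text); // only the redacted text is sent

  const output = await runChain(outputChain, reply);
  return output.status === "block" ? safeFallbackMessage : output.text;
}

// Toy guardrails used only to exercise the flow.
const redactEmail: Guardrail = async (text) => ({
  status: "pass",
  text: text.replace(/[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g, "[EMAIL]"),
});

const blockInjection: Guardrail = async (text) =>
  /ignore (all )?previous instructions/i.test(text)
    ? { status: "block", reason: "prompt-injection" }
    : { status: "pass", text };
```

Passing the LLM call in as a function keeps the sketch testable without network access. In the recipe itself, that role is played by the proxied `chat.completions.create` method.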
 
## Quick Start
 
```typescript
import { createGuardrailedOpenAI } from "./src/guard.js";
 
const client = await createGuardrailedOpenAI({
  apiKey: process.env.OPENAI_API_KEY!,
});
 
const result = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "My email is john@example.com" }],
});
 
console.log(result.choices[0].message.content);
// The input guardrail chain redacts the email before the message is sent to OpenAI
```
 
## Architecture
 
### Guardrails
 
| Guardrail | Phase | Source | Purpose |
|-----------|-------|--------|---------|
| `PIIRedaction` | Input | `@reaatech/guardrail-chain-guardrails` | Regex-based PII detection and masking (email, phone, SSN, credit card) |
| `PresidioPIIGuardrail` | Input | `@presidio-dev/hai-guardrails` | ML-based PII detection via Microsoft Presidio |
| `PromptInjection` | Input | `@reaatech/guardrail-chain-guardrails` | 300+ attack patterns for jailbreak and instruction injection |
| `ToxicityFilter` | Output | `@reaatech/guardrail-chain-guardrails` | Regex-based toxicity detection (insults, violence, hate speech) |
| `PIIScan` | Output | `@reaatech/guardrail-chain-guardrails` | Scans LLM responses for PII leakage |
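To make the regex-based approach in the table concrete, here is a minimal sketch of pattern-driven PII masking. The patterns and the `redactPii` helper are assumptions for illustration only. They are deliberately simple and are not the patterns shipped in `@reaatech/guardrail-chain-guardrails`.

```typescript
// Illustrative patterns only; real-world PII detection needs broader coverage.
const PII_PATTERNS: Array<{ label: string; pattern: RegExp }> = [
  { label: "EMAIL", pattern: /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g },
  { label: "SSN", pattern: /\b\d{3}-\d{2}-\d{4}\b/g },
  // 13 to 16 digits, optionally separated by single spaces or hyphens
  { label: "CREDIT_CARD", pattern: /\b\d(?:[ -]?\d){12,15}\b/g },
  { label: "PHONE", pattern: /\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b/g },
];

// Replace each match with a "[LABEL]" placeholder and record what was found.
function redactPii(text: string): { text: string; found: string[] } {
  const found: string[] = [];
  let redacted = text;
  for (const { label, pattern } of PII_PATTERNS) {
    redacted = redacted.replace(pattern, () => {
      found.push(label);
      return `[${label}]`;
    });
  }
  return { text: redacted, found };
}
```

Regexes like these are fast and predictable but miss free-form PII such as names and addresses, which is one reason to pair them with an ML-based detector like the Presidio guardrail listed above.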
 
### Configuration
 
Guardrails are configured in `guardrail.yaml` at the project root:
 
```yaml
budget:
  maxLatencyMs: 1000
  maxTokens: 8000
 
guardrails:
  - id: pii-redaction
    type: input
    enabled: true
    timeout: 500
    essential: true
```
 
Environment variables override file settings (e.g., `GUARDRAIL_CHAIN_BUDGET_MAX_LATENCY_MS=2000`).
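The override behavior can be sketched as a merge of environment values over the budget loaded from the file. This is a hedged illustration rather than the loader in `@reaatech/guardrail-chain-config`: `applyEnvOverrides` and `parsePositiveInt` are hypothetical helpers, the `skipSlow` key name is assumed from the `GUARDRAIL_CHAIN_BUDGET_SKIP_SLOW` variable, and falling back to the file value on unparseable input is a design choice of this sketch.

```typescript
interface Budget {
  maxLatencyMs: number;
  maxTokens: number;
  skipSlow: boolean; // assumed key name; only the env variable is documented
}

type Env = Record<string, string | undefined>;

// Parse a positive integer, returning undefined for missing or invalid input.
function parsePositiveInt(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === "") return undefined;
  const value = Number(raw);
  return Number.isInteger(value) && value > 0 ? value : undefined;
}

// Environment values win when present and valid; otherwise the file value is kept.
function applyEnvOverrides(fileBudget: Budget, env: Env): Budget {
  const skipSlowRaw = env["GUARDRAIL_CHAIN_BUDGET_SKIP_SLOW"];
  return {
    maxLatencyMs:
      parsePositiveInt(env["GUARDRAIL_CHAIN_BUDGET_MAX_LATENCY_MS"]) ?? fileBudget.maxLatencyMs,
    maxTokens:
      parsePositiveInt(env["GUARDRAIL_CHAIN_BUDGET_MAX_TOKENS"]) ?? fileBudget.maxTokens,
    skipSlow:
      skipSlowRaw === undefined ? fileBudget.skipSlow : skipSlowRaw.toLowerCase() === "true",
  };
}
```

In a Node.js process, the second argument would typically be `process.env`. Taking it as a parameter keeps the function easy to test.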
 
### API
 
#### `createGuardrailedOpenAI(config): Promise<GuardrailedOpenAI>`
 
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `apiKey` | `string` | — | OpenAI API key |
| `guardrailConfigPath` | `string` | `"./guardrail.yaml"` | Path to YAML config |
| `maxRetries` | `number` | — | OpenAI client retries (passed to SDK) |
| `timeout` | `number` | — | OpenAI client timeout in ms |
| `safeFallbackMessage` | `string` | Default fallback | Message returned when guardrails block |
 
Returns an object with:
- `client` — the underlying `OpenAI` instance
- `chains` — `{ inputChain, outputChain }` exposed for inspection
- `chat.completions.create` — the proxied method (same signature as OpenAI SDK)
 
## Project Layout
 
```
app/
  api/chat/route.ts       POST /api/chat route handler
src/
  guard.ts                createGuardrailedOpenAI factory
  guardrails.ts           Guardrail chain wiring
  config.ts               Config loader
  observability.ts        Logging and metrics setup
  errors.ts               Custom error classes
  types.ts                TypeScript types and Zod schemas
tests/                    vitest suite (mirrors src/)
guardrail.yaml            Default guardrail configuration
```
 
## Dependencies
 
### REAA packages
- `@reaatech/guardrail-chain@0.1.0` — core chain orchestration
- `@reaatech/guardrail-chain-config@0.1.0` — YAML/env config loader
- `@reaatech/guardrail-chain-guardrails@0.1.0` — built-in guardrails (PII, injection, toxicity)
- `@reaatech/guardrail-chain-observability@0.1.0` — logging/metrics interfaces
 
### Third-party packages
- `openai@6.42.0` — OpenAI SDK
- `@presidio-dev/hai-guardrails@1.12.0` — Microsoft Presidio PII detection via hai-guardrails
- `zod@4.4.3` — request body validation
 
## Running Locally
 
```bash
pnpm install
pnpm test            # vitest run with coverage
pnpm dev             # next dev
```
 
## Environment Variables
 
| Variable | Required | Description |
|----------|----------|-------------|
| `OPENAI_API_KEY` | Yes | OpenAI API key |
| `SAFE_FALLBACK_MESSAGE` | No | Custom fallback message on guardrail block |
| `GUARDRAIL_CHAIN_BUDGET_MAX_LATENCY_MS` | No | Max latency budget in ms (default: 1000) |
| `GUARDRAIL_CHAIN_BUDGET_MAX_TOKENS` | No | Max token budget (default: 8000) |
| `GUARDRAIL_CHAIN_BUDGET_SKIP_SLOW` | No | Skip slow guardrails under pressure (default: true) |
 
## License
 
MIT — see [LICENSE](./LICENSE).