# Anthropic Security Guardrails for SMB AI Chatbots
 
Pluggable Express middleware and a Next.js API route handler that scrub PII, block prompt injections, and enforce content policies on Anthropic-powered chatbots, with no vendor lock-in.
 
## Features
 
- **PII Redaction**: Automatic detection and masking of emails, phone numbers, SSNs, and credit card numbers using regex patterns and Luhn validation.
- **Prompt Injection Detection**: Blocks jailbreak patterns, role-reversal attempts, and instruction-injection using heuristic pattern matching.
- **Topic Boundary Enforcement**: Configurable allowlists and blocklists to restrict the chatbot to approved topics.
- **Content Moderation**: Regex-based profanity and hate-speech filtering.
- **Output Guardrails**: `PIIScan`, `ToxicityFilter`, and `HallucinationCheck` run on LLM responses.
- **Structured Observability**: Console logging with `[LEVEL]` prefix, JSON metrics to stdout, and optional Sentry error tracking.
- **Dual Server Modes**: Standalone Express server or Next.js App Router handler.
- **Budget-Aware Execution**: Latency and token budgets, with optional skipping of slow guardrails when a budget is at risk of being exceeded.
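
To make the PII redaction feature above more concrete, here is a minimal sketch of regex matching combined with a Luhn check for card numbers. The function names (`luhnValid`, `redactPII`), the regex patterns, and the placeholder tokens are illustrative assumptions, not the project's actual implementation.

```typescript
// Hypothetical helpers; the project's real patterns and names may differ.

/** Luhn (mod 10) check on a string that contains only digits. */
function luhnValid(digits: string): boolean {
  let sum = 0;
  let doubleNext = false;
  // Walk from the rightmost digit, doubling every second digit.
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48;
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return digits.length > 0 && sum % 10 === 0;
}

// Deliberately simple example patterns (US-style SSN and phone formats).
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const CARD = /\b(?:\d[ -]?){12,18}\d\b/g; // 13 to 19 digits, optional separators
const SSN = /\b\d{3}-\d{2}-\d{4}\b/g;
const PHONE = /(?:\(\d{3}\)\s?|\b\d{3}[ .-]?)\d{3}[ .-]?\d{4}\b/g;

/** Replace detected PII with placeholder tokens. */
function redactPII(text: string): string {
  return text
    .replace(EMAIL, "[EMAIL]")
    // Only mask digit runs that pass the Luhn check, to cut false positives.
    .replace(CARD, (m) => (luhnValid(m.replace(/\D/g, "")) ? "[CARD]" : m))
    .replace(SSN, "[SSN]")
    .replace(PHONE, "[PHONE]");
}
```

Card candidates are checked before phone numbers so that a valid card number is masked as a whole rather than partially matched by the phone pattern.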
 
## Architecture
 
The `GuardrailChain` orchestrates three phases in order: input guardrails, the LLM call, and output guardrails. The Express middleware and the Next.js route handler share the same chain instance. All guardrails are regex-based, with no external API dependencies.
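
The phase structure can be sketched roughly as follows. Only the `GuardrailChain` name comes from this README; the `Guardrail` interface, `runPhase`, and the `checkInput`/`checkOutput` methods are assumptions made for illustration. The LLM call is left to the caller here so the sketch stays synchronous, whereas the real chain may well make that call itself.

```typescript
// Illustrative types and names only; the project's real interfaces may differ.
interface GuardrailResult {
  passed: boolean;
  text: string; // text to hand to the next step (possibly rewritten, e.g. redacted)
}

interface Guardrail {
  name: string;
  check(text: string): GuardrailResult;
}

interface PhaseResult extends GuardrailResult {
  failedBy?: string; // name of the guardrail that rejected the text
}

/** Run guardrails in order and stop at the first failure. */
function runPhase(guardrails: Guardrail[], text: string): PhaseResult {
  let current = text;
  for (const g of guardrails) {
    const result = g.check(current);
    if (!result.passed) {
      return { passed: false, text: current, failedBy: g.name };
    }
    current = result.text;
  }
  return { passed: true, text: current };
}

class GuardrailChain {
  private readonly inputGuardrails: Guardrail[];
  private readonly outputGuardrails: Guardrail[];

  constructor(inputGuardrails: Guardrail[], outputGuardrails: Guardrail[]) {
    this.inputGuardrails = inputGuardrails;
    this.outputGuardrails = outputGuardrails;
  }

  /** Phase 1: screen and rewrite the user message before it reaches the model. */
  checkInput(userText: string): PhaseResult {
    return runPhase(this.inputGuardrails, userText);
  }

  /** Phase 3: screen the model reply before it is returned to the client. */
  checkOutput(modelText: string): PhaseResult {
    return runPhase(this.outputGuardrails, modelText);
  }
}
```

A handler built on this sketch would call `checkInput`, send `text` to the model only if `passed` is true, and then run `checkOutput` on the reply.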
 
## Quick Start
 
1. Copy `.env.example` to `.env` and fill in your values:
   ```
   ANTHROPIC_API_KEY=<your-anthropic-key>
   SENTRY_DSN=<your-sentry-dsn>  # optional
   ```
 
2. Install dependencies:
   ```bash
   pnpm install
   ```
 
3. Run the Next.js API handler:
   ```bash
   pnpm dev
   ```
 
4. Or run the standalone Express server:
   ```bash
   pnpm start:express
   ```
 
## Configuration
 
Edit `guardrail.config.yaml` to enable/disable guardrails and configure topic allowlists/blocklists.
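
As a rough illustration of how topic allowlists and blocklists could be applied once loaded, consider the sketch below. The `TopicConfig` shape and the `checkTopic` function are hypothetical: the actual keys in `guardrail.config.yaml` are not documented here, and the real matching logic may differ.

```typescript
// Hypothetical config shape; not the documented schema of guardrail.config.yaml.
interface TopicConfig {
  allow: string[]; // if non-empty, at least one keyword must appear
  block: string[]; // any match rejects the message
}

/** Case-insensitive whole-word match, with regex metacharacters escaped. */
function containsWord(text: string, word: string): boolean {
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("\\b" + escaped + "\\b", "i").test(text);
}

function checkTopic(text: string, config: TopicConfig): { passed: boolean; reason?: string } {
  // The blocklist takes precedence over the allowlist.
  for (const word of config.block) {
    if (containsWord(text, word)) {
      return { passed: false, reason: "blocked topic: " + word };
    }
  }
  if (config.allow.length > 0 && !config.allow.some((w) => containsWord(text, w))) {
    return { passed: false, reason: "no allowed topic matched" };
  }
  return { passed: true };
}
```

Giving the blocklist precedence is a design choice of this sketch: a message that mentions both an approved and a blocked topic is rejected.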
 
Environment variable overrides:
```
GUARDRAIL_CHAIN_BUDGET_MAX_LATENCY_MS=2000
GUARDRAIL_CHAIN_BUDGET_MAX_TOKENS=8000
GUARDRAIL_CHAIN_BUDGET_SKIP_SLOW=true
```
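
One way these overrides might be read and applied is sketched below. The variable names come from the list above; everything else (`ChainBudget`, `loadBudget`, `shouldSkip`, the fallback values, and the meaning given to `SKIP_SLOW`) is an assumption for illustration.

```typescript
interface ChainBudget {
  maxLatencyMs: number;
  maxTokens: number;
  skipSlow: boolean;
}

// Assumed fallbacks: the README shows these values as examples, not as documented defaults.
const DEFAULT_BUDGET: ChainBudget = { maxLatencyMs: 2000, maxTokens: 8000, skipSlow: true };

/** Parse a positive integer, falling back when the value is missing or invalid. */
function parsePositiveInt(raw: string | undefined, fallback: number): number {
  if (raw === undefined) return fallback;
  const n = parseInt(raw, 10);
  return isFinite(n) && n > 0 ? n : fallback;
}

function loadBudget(env: Record<string, string | undefined>): ChainBudget {
  const skip = env["GUARDRAIL_CHAIN_BUDGET_SKIP_SLOW"];
  return {
    maxLatencyMs: parsePositiveInt(env["GUARDRAIL_CHAIN_BUDGET_MAX_LATENCY_MS"], DEFAULT_BUDGET.maxLatencyMs),
    maxTokens: parsePositiveInt(env["GUARDRAIL_CHAIN_BUDGET_MAX_TOKENS"], DEFAULT_BUDGET.maxTokens),
    skipSlow: skip === undefined ? DEFAULT_BUDGET.skipSlow : skip.toLowerCase() === "true",
  };
}

/** Assumed semantics: skip a guardrail if running it would exceed the latency budget. */
function shouldSkip(budget: ChainBudget, elapsedMs: number, estimatedCostMs: number): boolean {
  return budget.skipSlow && elapsedMs + estimatedCostMs > budget.maxLatencyMs;
}
```

In a Node.js process this would typically be called as `loadBudget(process.env)` at startup.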
 
## Testing
 
```bash
pnpm test
```
 
Example `curl` call against the Express endpoint (for the Next.js handler, use `/api/v1/chat` instead):
```bash
curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-sonnet-4-6","messages":[{"role":"user","content":"Hi"}]}'
```
 
## API
 
### `POST /v1/chat/completions` (Express) or `POST /api/v1/chat` (Next.js)
 
**Request:**
```json
{
  "model": "claude-sonnet-4-6",
  "messages": [{ "role": "user", "content": "Hi" }]
}
```
 
**Response headers:**
- `X-Guardrail-Passed: true|false`
- `X-Guardrail-Chain-Version: 0.1.0`
 
**Safety response** (returned when an output guardrail fails):
```json
{
  "choices": [{ "message": { "content": "[response filtered for safety]" } }]
}
```
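
A handler could assemble the headers and body described above along these lines. `buildResponse` and `GuardedResponse` are hypothetical names. The sketch also assumes that successful responses use the same `choices` shape and that `X-Guardrail-Passed` is `false` whenever the safety response is returned; neither is stated explicitly in this README.

```typescript
const CHAIN_VERSION = "0.1.0"; // value shown in the response headers above
const SAFETY_MESSAGE = "[response filtered for safety]";

interface GuardedResponse {
  headers: Record<string, string>;
  body: { choices: { message: { content: string } }[] };
}

/** Build the headers and JSON body for a guarded chat response. */
function buildResponse(outputPassed: boolean, modelContent: string): GuardedResponse {
  return {
    headers: {
      "X-Guardrail-Passed": outputPassed ? "true" : "false",
      "X-Guardrail-Chain-Version": CHAIN_VERSION,
    },
    body: {
      // Never leak the original model text when an output guardrail failed.
      choices: [{ message: { content: outputPassed ? modelContent : SAFETY_MESSAGE } }],
    },
  };
}
```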
 
## Guardrail Configuration
 
- Input guardrails: `PIIRedaction`, `PromptInjection`, `TopicBoundary`, `ContentModeration`
- Output guardrails: `PIIScan`, `ToxicityFilter`, `HallucinationCheck`
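
To illustrate what a heuristic input guardrail such as `PromptInjection` might look like, here is a small pattern-matching sketch. The patterns, labels, and the `detectInjection` function are examples invented for this illustration; the project's actual rule set is not shown in this README.

```typescript
// Example patterns only; a production list would be longer and tuned on real traffic.
const INJECTION_PATTERNS: { label: string; pattern: RegExp }[] = [
  {
    label: "instruction-override",
    pattern: /\b(?:ignore|disregard|forget)\b.{0,40}\b(?:previous|prior|above)\b.{0,20}\b(?:instructions?|rules?|prompts?)\b/i,
  },
  {
    label: "role-reversal",
    pattern: /\byou are now\b|\bpretend (?:to be|you are)\b|\bact as (?:an?|the) (?:system|developer|admin)\b/i,
  },
  {
    label: "system-prompt-probe",
    pattern: /\b(?:reveal|show|print|repeat)\b.{0,40}\bsystem prompt\b/i,
  },
];

/** Return the labels of all patterns that match; an empty array means no hit. */
function detectInjection(text: string): string[] {
  return INJECTION_PATTERNS.filter((p) => p.pattern.test(text)).map((p) => p.label);
}
```

Returning labels rather than a bare boolean makes it easier to log which heuristic fired. Pattern matching of this kind can miss paraphrased attacks and can flag benign text, so thresholds and patterns usually need tuning.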