reaatech

Testing & Security

Chaos testing, prompt injection benches, tool-use firewalls.

8 repos

reaatech/agent-auth-proxy

0
These packages give you an identity-aware reverse proxy that sits between AI agents and downstream APIs like Google or GitHub, handling OAuth2 token management, API key vaulting, and scope enforcement. You would adopt them to authenticate and authorize agent-to-service requests securely, without embedding long-lived credentials in agent code or managing per-user OAuth flows yourself. The most distinctive thing is that the proxy is stateful and built as a Fastify plugin with a typed client SDK, so the server, client, and shared schemas are versioned together in a monorepo, and the entire request pipeline (authentication, scope validation, credential decryption, and injection) runs in a single proxy hop.
packages: 3 | updated: 7 days ago

reaatech/agent-chaos

1
These packages give you a middleware-based fault injection engine that sits between your agent and its tools, injecting failures like latency spikes, rate limits, and malformed output to test whether your circuit breakers, confidence gates, and fallback trees actually work. You'd adopt them to validate agent reliability under realistic failure conditions before your agents reach production. The most distinctive thing is the transparent interceptor pattern: adapters for LangChain, LlamaIndex, and Vercel AI SDK wrap your existing tool calls without modifying agent code, while a declarative YAML/JSON scenario system with probability-based fault selection and hot reloading lets you change failure modes at runtime.
packages: 6 | updated: 13 hours ago
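
As an illustration of probability-based fault selection behind a transparent interceptor, here is a minimal TypeScript sketch. It is not the agent-chaos API: `Fault`, `FaultRule`, `selectFault`, and `withChaos` are hypothetical names, and the real scenario schema and adapters may look quite different.

```typescript
// Hypothetical fault and rule shapes; not taken from the agent-chaos packages.
type Fault =
  | { kind: "latency"; ms: number }
  | { kind: "rate_limit" }
  | { kind: "malformed_output" };

interface FaultRule {
  fault: Fault;
  probability: number; // chance in [0, 1] that this rule fires on a call
}

// Give each rule its own slice of [0, 1) in declaration order.
// Returns null when the draw lands outside every slice (no fault injected).
function selectFault(rules: FaultRule[], draw: number): Fault | null {
  let cumulative = 0;
  for (const rule of rules) {
    cumulative += rule.probability;
    if (draw < cumulative) return rule.fault;
  }
  return null;
}

type Tool<A, R> = (args: A) => Promise<R>;

// Wrap an existing tool so callers keep the same signature.
// The random source is injectable so tests can be deterministic.
function withChaos<A, R>(
  tool: Tool<A, R>,
  rules: FaultRule[],
  random: () => number = Math.random,
): Tool<A, R> {
  return async (args: A): Promise<R> => {
    const fault = selectFault(rules, random());
    if (fault === null) return tool(args);
    switch (fault.kind) {
      case "latency":
        // Delay, then let the real call through.
        await new Promise<void>((resolve) => setTimeout(resolve, fault.ms));
        return tool(args);
      case "rate_limit":
        throw new Error("429 Too Many Requests (injected)");
      case "malformed_output":
        // Deliberately violate the declared return type to mimic bad output.
        return "{not valid json" as unknown as R;
    }
  };
}
```

Because the wrapper returns a function with the same signature as the original tool, the agent code that calls the tool does not need to change, which is the property the description above attributes to the interceptor pattern.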

reaatech/guardrail-chain

0
These packages give you a composable pipeline of input and output guardrails for LLM calls, with built-in budget management that can skip non-essential checks under latency or token pressure. You'd adopt them to add safety layers—PII redaction, prompt injection detection, toxicity filtering, hallucination detection, and others—without wiring each guardrail from scratch or guessing how they interact under load. The most distinctive thing is that guardrails are scheduled and prioritized by a budget-aware orchestrator, so the chain can short-circuit on failure and degrade gracefully when resources are tight, rather than running every check unconditionally.
packages: 4 | updated: 7 days ago
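
To make the budget-aware behaviour concrete, here is a small TypeScript sketch of one way such an orchestrator could work. It is an assumption-laden illustration, not the guardrail-chain implementation: `Guardrail`, `ChainResult`, and `runChain` are hypothetical, and the budget is charged using a declared cost estimate rather than measured latency or tokens.

```typescript
// Hypothetical guardrail shape; not the guardrail-chain API.
interface Guardrail {
  name: string;
  essential: boolean; // essential checks are never skipped
  estimatedMs: number; // declared cost charged against the budget
  check: (text: string) => boolean; // true means the text passes
}

interface ChainResult {
  passed: boolean;
  ran: string[];
  skipped: string[];
  failedAt?: string;
}

function runChain(guardrails: Guardrail[], text: string, budgetMs: number): ChainResult {
  // Essential checks run first; the sort is stable, so ties keep input order.
  const ordered = [...guardrails].sort(
    (a, b) => Number(b.essential) - Number(a.essential),
  );
  const result: ChainResult = { passed: true, ran: [], skipped: [] };
  let remaining = budgetMs;

  for (const guardrail of ordered) {
    // Degrade gracefully: drop non-essential checks that no longer fit.
    if (!guardrail.essential && guardrail.estimatedMs > remaining) {
      result.skipped.push(guardrail.name);
      continue;
    }
    remaining -= guardrail.estimatedMs;
    result.ran.push(guardrail.name);

    // Short-circuit: the first failure ends the chain.
    if (!guardrail.check(text)) {
      result.passed = false;
      result.failedAt = guardrail.name;
      break;
    }
  }
  return result;
}
```

In this sketch a tight budget shows up in `skipped`, while a failing check shows up in `failedAt` and stops any later guardrails from running.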

reaatech/mcp-contract-kit

0
These packages give you a CLI tool and programmatic API for testing Model Context Protocol (MCP) servers against the MCP specification. You'd adopt them to check an MCP server's protocol conformance, registry YAML schemas, routing contracts, security posture, and performance baselines before deploying it. The packages are designed as composable layers (core types, a client SDK, validators, reporters, and observability) that you can use individually or together through the CLI, with all validators sharing the same typed test report format.
packages: 6 | updated: 12 days ago

reaatech/mcp-load-test

0
These packages give you a purpose-built load testing framework for MCP (Model Context Protocol) servers, with a CLI, orchestration engine, transport clients, and analysis tooling. You would adopt them to stress-test MCP servers under realistic concurrent workloads—modeling user behavior with weighted tool-call patterns, detecting breaking points, and producing latency histograms with letter grades. The framework is transport-aware, meaning it accounts for the different concurrency profiles of StreamableHTTP, SSE, and stdio, and uses session-based closed-loop concurrency where long-lived sessions continuously execute patterns with think-time delays and stateful context.
packages: 9 | updated: 14 days ago

reaatech/mcp-server-doctor

0
These packages give you a CLI tool and programmatic library that runs eight health checks against any MCP server endpoint, grades the results A–F, and outputs reports in console, JSON, markdown, or HTML formats. You would adopt them to catch MCP server issues before deployment—transport negotiation failures, latency spikes, auth problems, or concurrency limits—and to enforce quality gates in CI. The engine, transport client, reporters, and observability instrumentation are separate packages that share core types and grading logic, so you can use just the CLI or compose the pieces programmatically (for example, running the engine with a custom transport and piping results into your own monitoring pipeline).
packages: 6 | updated: 7 days ago

reaatech/prompt-injection-bench

0
These packages give you a reproducible benchmark for evaluating prompt-injection defenses in AI agent systems, including an attack corpus with 300+ templates across 8 categories, pluggable defense adapters (Rebuff, Lakera Guard, LLM Guard, Garak, OpenAI/Azure/Anthropic/Cohere Moderation, Custom HTTP), a scoring engine with statistical analysis, a public leaderboard, and an MCP server. You would adopt them to objectively measure and compare how well different defenses detect and block prompt injection attacks, using a standardized methodology with deterministic seeds and SHA-256 proofs for reproducibility. The packages are designed as independent modules—core types, corpus builder, adapters, runner, scoring, leaderboard, observability, and MCP server—that share canonical Zod schemas and a common `DefenseAdapter` interface, so you can mix and match only the pieces you need or run the full benchmark via the umbrella CLI.
packages: 9 | updated: 18 days ago
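
The description names a common `DefenseAdapter` interface but does not show its members, so the following TypeScript sketch invents a minimal one purely for illustration. The `detect` method, the `Sample` and `Score` types, the `scoreAdapter` function, and the keyword baseline are all assumptions; real adapters would most likely be asynchronous and return richer results.

```typescript
// Hypothetical member list; only the interface name comes from the description.
interface DefenseAdapter {
  name: string;
  detect(prompt: string): boolean; // true means "flagged as injection"
}

interface Sample {
  prompt: string;
  isAttack: boolean; // ground-truth label from the corpus
}

interface Score {
  detectionRate: number; // flagged attacks / all attacks
  falsePositiveRate: number; // flagged benign prompts / all benign prompts
}

// Run one adapter over a labeled sample set and compute two basic rates.
function scoreAdapter(adapter: DefenseAdapter, samples: Sample[]): Score {
  let attacks = 0;
  let caught = 0;
  let benign = 0;
  let falseAlarms = 0;

  for (const sample of samples) {
    const flagged = adapter.detect(sample.prompt);
    if (sample.isAttack) {
      attacks++;
      if (flagged) caught++;
    } else {
      benign++;
      if (flagged) falseAlarms++;
    }
  }

  return {
    detectionRate: attacks === 0 ? 0 : caught / attacks,
    falsePositiveRate: benign === 0 ? 0 : falseAlarms / benign,
  };
}

// A deliberately naive baseline, useful only as a stand-in defense.
const keywordAdapter: DefenseAdapter = {
  name: "keyword-baseline",
  detect: (prompt) => /ignore (all )?previous instructions/i.test(prompt),
};
```

Any defense that satisfies the same interface can be passed to the same scoring function, which is the kind of mix-and-match the shared interface is meant to allow.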

reaatech/tool-use-firewall

0
These packages give you a policy enforcement layer that sits between an AI agent and its MCP servers, intercepting every `tools/call` to validate, rate-limit, and audit tool invocations before they reach the upstream server. You'd adopt them to prevent an agent from accidentally or maliciously executing destructive operations like `DROP TABLE` or `rm -rf /`, and to enforce budgets, approval workflows, and read-only modes across database, filesystem, or network MCP servers. The system is built as a pluggable middleware pipeline: the rate limiter, cost tracker, secret scanner, argument validator, policy engine, anomaly detector, and approval workflow each implement the same `Middleware` interface and run in strict order, with stages registered only when enabled in the policy configuration.
packages: 7 | updated: 6 days ago
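
The pipeline described above can be sketched in TypeScript as follows. Only the `Middleware` name comes from the description; its `handle` method, the `ToolCall`, `Verdict`, and `Policy` shapes, and the `buildPipeline` and `evaluate` functions are hypothetical, and just two of the seven stages are shown.

```typescript
interface ToolCall {
  tool: string;
  args: Record<string, unknown>;
}

type Verdict = { allow: true } | { allow: false; reason: string };

// Hypothetical member list; the real interface may differ.
interface Middleware {
  name: string;
  handle(call: ToolCall): Verdict;
}

// Hypothetical policy shape: a stage is enabled when its section is present.
interface Policy {
  rateLimiter?: { maxCalls: number };
  argumentValidator?: { denyPatterns: string[] };
}

// Register stages in a fixed order, and only when the policy enables them.
function buildPipeline(policy: Policy): Middleware[] {
  const stages: Middleware[] = [];

  if (policy.rateLimiter) {
    const { maxCalls } = policy.rateLimiter;
    let seen = 0;
    stages.push({
      name: "rate-limiter",
      handle: (): Verdict =>
        ++seen <= maxCalls
          ? { allow: true }
          : { allow: false, reason: "rate limit exceeded" },
    });
  }

  if (policy.argumentValidator) {
    const patterns = policy.argumentValidator.denyPatterns.map(
      (pattern) => new RegExp(pattern, "i"),
    );
    stages.push({
      name: "argument-validator",
      handle: (call: ToolCall): Verdict => {
        const serialized = JSON.stringify(call.args);
        const hit = patterns.find((pattern) => pattern.test(serialized));
        return hit
          ? { allow: false, reason: `argument matches ${hit.source}` }
          : { allow: true };
      },
    });
  }

  return stages;
}

// Run stages in order; the first denial stops the call before the upstream.
function evaluate(
  stages: Middleware[],
  call: ToolCall,
): { verdict: Verdict; stage?: string } {
  for (const stage of stages) {
    const verdict = stage.handle(call);
    if (!verdict.allow) return { verdict, stage: stage.name };
  }
  return { verdict: { allow: true } };
}
```

A policy that omits a section simply yields a shorter pipeline, so disabled stages add no per-call overhead in this sketch.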