Weekly recap, May 4, 2026 – May 10, 2026

41 new open-source building blocks landed this week — gateways, evals, load testers, and more for production AI agent systems.

RecapBot · May 4, 2026 · 7 min read · Updated June 8, 2026

No new tutorials shipped this week, but we published 41 new open-source repositories — the building blocks for production AI agent systems. These packages handle gateways, evaluation, security, observability, and orchestration, ready for you to compose into your own infrastructure.

New repos

MCP Infrastructure

mcp-gateway

Production MCP gateway with authentication, rate limiting, schema enforcement, tool allowlists, audit trail, fan-out routing, and response caching. You'd adopt it to add production middleware in front of your MCP servers without building each piece from scratch. Browse the code · Catalog page

webhook-relay-mcp

An MCP server that receives webhooks from services like Stripe, GitHub, and Twilio, normalizes them, and exposes events to AI agents through subscription-based polling. You'd use it to bridge third-party webhooks into agent workflows without custom ingestion code. Browse the code · Catalog page

mcp-schema-evolution

Diff engine and CI policy for MCP tool schemas that classifies changes as breaking or non-breaking before a release. You'd adopt it to prevent accidentally breaking MCP consumers when tool fields change. Browse the code · Catalog page
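Here is a minimal Python sketch of the kind of rule a schema diff applies when it labels a change "breaking" or "non-breaking". It is not mcp-schema-evolution's API or rule set. The function name `breaking_changes` and its three rules (a removed property, a newly required property, a changed `type`) are assumptions chosen for illustration.

```python
from typing import Any

Schema = dict[str, Any]


def breaking_changes(old: Schema, new: Schema) -> list[str]:
    """List reasons a tool input-schema change would break existing callers.

    Assumed rules (a simplification of what a real policy would check):
      - removing a property breaks callers that still send it;
      - making a property required breaks callers that omit it;
      - changing a property's "type" breaks callers sending the old type.
    An empty list means the change is classified as non-breaking.
    """
    old_props, new_props = old.get("properties", {}), new.get("properties", {})
    old_req, new_req = set(old.get("required", [])), set(new.get("required", []))

    reasons = [f"removed property '{n}'" for n in sorted(old_props.keys() - new_props.keys())]
    reasons += [f"'{n}' is now required" for n in sorted(new_req - old_req)]
    for n in sorted(old_props.keys() & new_props.keys()):
        if old_props[n].get("type") != new_props[n].get("type"):
            reasons.append(f"type of '{n}' changed")
    return reasons


# Usage: v1 -> v2 adds an optional field; v2 -> v3 makes that field required.
v1 = {"properties": {"city": {"type": "string"}}, "required": ["city"]}
v2 = {"properties": {"city": {"type": "string"}, "units": {"type": "string"}}, "required": ["city"]}
v3 = {**v2, "required": ["city", "units"]}
print(breaking_changes(v1, v2))  # []
print(breaking_changes(v2, v3))  # ["'units' is now required"]
```

In a CI policy, a non-empty list would fail the build unless the release bumps a major version.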

multi-tenant-mcp

Primitives for serving multiple tenants from a single MCP server, with per-tenant rate limits, tool visibility, cost tracking, and isolated storage. You'd use them to avoid standing up separate MCP servers per customer while enforcing boundaries. Browse the code · Catalog page

mcp-server-starter-ts

Production-grade MCP server template in TypeScript with Express 5, composable middleware, dual transports, and built-in observability. You'd adopt it to build a secure, instrumented MCP server without wiring up auth, rate limiting, and logging yourself. Browse the code · Catalog page

mcp-catalog

Internal registry where MCP servers can register themselves and be discovered by capability, with automatic health monitoring. It's also exposed as an MCP server itself, so clients can query it using the same protocol. Browse the code · Catalog page

Orchestration Protocols

confidence-router

A decision engine that turns classifier confidence scores into route/clarify/fallback actions. You'd plug it in to handle ambiguous predictions without hard-coding every edge case. Browse the code · Catalog page

agent-mesh

Multi-agent orchestrator that routes requests based on intent confidence, manages sessions, and isolates failing agents with circuit breakers. You'd use it to run multiple specialized agents behind a single API with automatic fallback. Browse the code · Catalog page

llm-router

Config-driven LLM routing engine that selects models based on cost, latency, or capability, with circuit breakers and fallback chains. You'd adopt it to manage multi-provider costs and add structured degradation paths. Browse the code · Catalog page

agent-handoff-protocol

A library for transferring conversations between AI agents mid-session, including context compression, target scoring, and transport via MCP or A2A. You'd use it to route multi-turn conversations between specialized agents without losing context. Browse the code · Catalog page

a2a-reference-ts

Complete TypeScript implementation of the Agent-to-Agent protocol with server, client, CLI, and an A2A↔MCP bridge. You'd adopt it to get production-grade infrastructure for multi-agent discovery and messaging out of the box. Browse the code · Catalog page

Evals & Quality

agent-eval-harness

Full evaluation pipeline for AI agent trajectories, scoring quality, tool correctness, cost, and latency, with CI/CD regression gates. You'd use it to catch regressions before deploying agents. Browse the code · Catalog page

classifier-evals

Offline evaluation harness for intent classification systems, with confusion matrices, LLM-as-judge, and CI regression gates. You'd adopt it to run repeatable classifier evaluations in production pipelines. Browse the code · Catalog page
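The core of an offline classifier eval is small. This sketch uses hypothetical names and is not classifier-evals' API. It builds a confusion matrix from gold and predicted labels, derives per-label precision and recall, and applies a simple CI gate that fails when accuracy drops more than an assumed tolerance below a stored baseline.

```python
from collections import Counter


def confusion_matrix(gold: list[str], predicted: list[str]) -> Counter:
    """Count (gold, predicted) label pairs; off-diagonal pairs are errors."""
    return Counter(zip(gold, predicted))


def precision_recall(gold: list[str], predicted: list[str]) -> dict[str, tuple[float, float]]:
    cm = confusion_matrix(gold, predicted)
    result = {}
    for label in sorted(set(gold) | set(predicted)):
        tp = cm[(label, label)]
        fp = sum(n for (g, p), n in cm.items() if p == label and g != label)
        fn = sum(n for (g, p), n in cm.items() if g == label and p != label)
        result[label] = (tp / (tp + fp) if tp + fp else 0.0,
                         tp / (tp + fn) if tp + fn else 0.0)
    return result


def passes_gate(gold: list[str], predicted: list[str],
                baseline_accuracy: float, max_drop: float = 0.02) -> bool:
    """CI gate: fail if accuracy falls more than max_drop below the baseline."""
    accuracy = sum(g == p for g, p in zip(gold, predicted)) / len(gold)
    return accuracy >= baseline_accuracy - max_drop


gold = ["billing", "billing", "refund", "refund"]
pred = ["billing", "refund", "refund", "refund"]
print(precision_recall(gold, pred))  # billing: (1.0, 0.5); refund: (about 0.67, 1.0)
print(passes_gate(gold, pred, baseline_accuracy=0.80))  # False: 0.75 < 0.78
```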

prompt-version-control

Git-like versioning for AI prompts with eval-gated promotion. You'd use it to manage prompt iterations across dev, staging, and production without manual copy-pasting. Browse the code · Catalog page

agent-replay

Deterministic recording and replay system for agent interactions, so debugging sessions and regression tests can re-run recorded traffic instead of calling the model again. You'd adopt it to debug agent behavior without burning LLM tokens on every run. Browse the code · Catalog page

rag-eval-pack

RAG evaluation toolkit with heuristic and LLM-as-judge metrics at three fidelity levels, plus CI quality gates. You'd use it to catch regressions in retrieval-augmented generation systems. Browse the code · Catalog page

agents-md-kit

Linter, validator, and scaffolder for AGENTS.md and SKILL.md files, ensuring consistent, machine-readable agent definitions. You'd adopt it to enforce structure across a multi-agent system's documentation. Browse the code · Catalog page

llm-judge-toolkit

Calibrated LLM-as-judge library with multi-judge consensus, bias detection, and cost tracking. You'd use it to replace ad-hoc evaluation scripts with a structured, statistically grounded pipeline. Browse the code · Catalog page

context-window-planner

Engine that allocates tokens within an LLM context window by deciding what to include, summarize, or drop. You'd adopt it to keep prompts within a model's token budget using configurable strategies. Browse the code · Catalog page

Testing & Security

guardrail-chain

Composable, budget-aware guardrail pipeline for LLM calls, with input/output checks like PII redaction and prompt injection detection. You'd plug it in to add safety layers without wiring each guardrail from scratch. Browse the code · Catalog page

mcp-contract-kit

Conformance test suite for MCP servers that validates spec compliance, contract schemas, and security posture. You'd adopt it to test MCP servers before deployment. Browse the code · Catalog page

prompt-injection-bench

Reproducible benchmark for prompt-injection defenses, with an attack corpus of 300+ templates and pluggable defense adapters. You'd use it to measure and compare defense effectiveness objectively. Browse the code · Catalog page

mcp-server-doctor

CLI diagnostic tool that runs health checks against MCP servers — latency, auth, payload limits — and grades results A–F. You'd adopt it to catch issues before deployment or in CI. Browse the code · Catalog page

agent-auth-proxy

Identity-aware reverse proxy that handles OAuth2 token management and API key vaulting for agent-to-service communication. You'd use it to secure agent access to downstream APIs without embedding long-lived credentials. Browse the code · Catalog page

mcp-load-test

Purpose-built load testing framework for MCP servers that models concurrent user behavior and produces latency histograms. You'd adopt it to stress-test MCP servers under realistic workloads. Browse the code · Catalog page

agent-chaos

Fault injection toolkit for agent systems that injects failures like latency and malformed output to validate circuit breakers and fallback logic. You'd use it to ensure agent reliability under failure conditions. Browse the code · Catalog page

tool-use-firewall

Policy enforcement layer that intercepts MCP tool calls to validate, rate-limit, and audit before they hit upstream servers. You'd adopt it to prevent destructive agent actions and enforce budgets. Browse the code · Catalog page
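The core of a tool-call firewall is a check that runs before each call is forwarded. This minimal sketch assumes an allowlist of tool names plus a per-tool sliding-window rate limit, and it appends every verdict to an in-memory audit list. `ToolFirewall` and its parameters are hypothetical and are not tool-use-firewall's API. The injectable `clock` lets tests control time.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class ToolFirewall:
    def __init__(self, allowed: set[str], max_calls: int, per_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.allowed = allowed
        self.max_calls = max_calls
        self.per_seconds = per_seconds
        self.clock = clock
        self.recent: dict[str, deque] = defaultdict(deque)  # tool -> timestamps of allowed calls
        self.audit: list[tuple[float, str, str]] = []        # (time, tool, verdict)

    def check(self, tool: str) -> str:
        now = self.clock()
        if tool not in self.allowed:
            verdict = "deny: not allowlisted"
        else:
            window = self.recent[tool]
            # Drop timestamps that have aged out of the sliding window.
            while window and now - window[0] >= self.per_seconds:
                window.popleft()
            if len(window) >= self.max_calls:
                verdict = "deny: rate limited"
            else:
                window.append(now)
                verdict = "allow"
        self.audit.append((now, tool, verdict))  # every decision is recorded
        return verdict


t = [0.0]  # fake clock for the demo
fw = ToolFirewall(allowed={"search"}, max_calls=2, per_seconds=60, clock=lambda: t[0])
print(fw.check("delete_database"))              # deny: not allowlisted
print([fw.check("search") for _ in range(3)])   # ['allow', 'allow', 'deny: rate limited']
t[0] = 61.0
print(fw.check("search"))                       # allow (earlier calls aged out)
```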

Observability & Cost

llm-cost-telemetry

Drop-in wrappers for OpenAI, Anthropic, and Google AI SDKs that capture token usage and cost, plus aggregation and budget enforcement. You'd adopt them to track LLM spend across providers and tenants without building your own telemetry. Browse the code · Catalog page

otel-cost-exporter

OpenTelemetry-native exporter that converts GenAI semantic convention spans into real-time cost metrics per model and provider. You'd use it to track LLM spend without manually maintaining pricing tables. Browse the code · Catalog page
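The conversion itself is straightforward arithmetic over span attributes. This sketch reads the token-count attributes defined by the OpenTelemetry GenAI semantic conventions (`gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`) plus `gen_ai.request.model`. The model names and per-million-token prices are made up, and spans are modeled as plain dicts rather than real OpenTelemetry span objects. The price table is hard-coded only to show the math the exporter is described as handling for you.

```python
from collections import defaultdict

# Hypothetical models and (input, output) USD prices per million tokens.
PRICES_PER_MTOK = {
    "example-large": (3.00, 15.00),
    "example-small": (0.25, 1.25),
}


def span_cost_usd(attributes: dict) -> float:
    """Cost of one LLM call from its GenAI semantic-convention attributes."""
    input_price, output_price = PRICES_PER_MTOK[attributes["gen_ai.request.model"]]
    return (attributes["gen_ai.usage.input_tokens"] * input_price
            + attributes["gen_ai.usage.output_tokens"] * output_price) / 1_000_000


def cost_by_model(spans: list[dict]) -> dict[str, float]:
    """Aggregate cost per model, as a cost metric would be labeled."""
    totals: dict[str, float] = defaultdict(float)
    for attributes in spans:
        totals[attributes["gen_ai.request.model"]] += span_cost_usd(attributes)
    return dict(totals)


spans = [
    {"gen_ai.request.model": "example-large",
     "gen_ai.usage.input_tokens": 1000, "gen_ai.usage.output_tokens": 500},
    {"gen_ai.request.model": "example-small",
     "gen_ai.usage.input_tokens": 4000, "gen_ai.usage.output_tokens": 800},
]
print(cost_by_model(spans))  # roughly {'example-large': 0.0105, 'example-small': 0.002}
```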

agent-budget-controller

Real-time cost budget enforcement for agents, checking every request against spend limits and degrading gracefully when exceeded. You'd adopt it to prevent runaway agent loops from exhausting your budget. Browse the code · Catalog page
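A minimal sketch of a per-request budget check with graceful degradation, assuming each call's cost can be estimated before it is made. `BudgetController`, the 80% `degrade_at` soft limit, and the three verdicts are assumptions for illustration. agent-budget-controller's actual policy may differ.

```python
from dataclasses import dataclass


@dataclass
class BudgetController:
    limit_usd: float
    degrade_at: float = 0.8   # fraction of the limit where cheaper behavior kicks in
    spent_usd: float = 0.0

    def admit(self, estimated_cost_usd: float) -> str:
        """Decide before a call: "allow", "degrade", or "reject"."""
        projected = self.spent_usd + estimated_cost_usd
        if projected > self.limit_usd:
            return "reject"
        if projected > self.degrade_at * self.limit_usd:
            return "degrade"
        return "allow"

    def record(self, actual_cost_usd: float) -> None:
        """Call after the request completes, with its measured cost."""
        self.spent_usd += actual_cost_usd


budget = BudgetController(limit_usd=1.00)
print(budget.admit(0.50))  # allow
budget.record(0.50)
print(budget.admit(0.40))  # degrade: 0.90 would pass the 0.80 soft limit
budget.record(0.45)
print(budget.admit(0.10))  # reject: 1.05 would exceed the hard limit
```

What "degrade" means (a cheaper model, a shorter answer, skipping optional tool calls) is a policy choice left to the caller in this sketch. The hard "reject" is what stops a runaway loop.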

otel-genai-semconv

Instrumented wrappers for major LLM providers that emit OpenTelemetry GenAI semantic convention spans, plus deployable dashboards. You'd adopt it to get spec-compliant observability across providers without writing instrumentation code. Browse the code · Catalog page

llm-cache

Semantic caching layer for LLM calls that returns a cached response for an exact prompt match or for a semantically similar prompt whose similarity exceeds a configured threshold. You'd use it to reduce API costs and latency. Browse the code · Catalog page

Reliability & Ops

idempotency-middleware

Framework-agnostic idempotency middleware for POST, PUT, and PATCH requests that replays the cached response when a request repeats an idempotency key. You'd adopt it to make retries of payment charges or webhook deliveries safe. Browse the code · Catalog page
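The mechanism is easy to sketch. This minimal, single-process version uses hypothetical names and is not idempotency-middleware's API. It stores each response under its idempotency key alongside a hash of the request body, replays it for duplicates, and refuses reuse of a key with a different body. The 422 status for that case is an assumption that matches the IETF Idempotency-Key header draft's recommendation. Real middleware also needs key expiry and a lock for concurrent requests with the same key, which this sketch omits.

```python
import hashlib
import json
from typing import Callable


class IdempotencyStore:
    def __init__(self) -> None:
        # key -> (fingerprint of the original request body, stored response)
        self._responses: dict[str, tuple[str, dict]] = {}

    def handle(self, key: str, body: dict, handler: Callable[[dict], dict]) -> dict:
        fingerprint = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if key in self._responses:
            stored_fingerprint, response = self._responses[key]
            if stored_fingerprint != fingerprint:
                # Same key, different payload: refuse rather than replay.
                return {"status": 422, "error": "idempotency key reused with a different body"}
            return response  # duplicate: replay without running the handler again
        response = handler(body)
        self._responses[key] = (fingerprint, response)
        return response


charges: list[int] = []


def create_charge(body: dict) -> dict:
    charges.append(body["amount"])  # the side effect that must not repeat
    return {"status": 201, "charge_id": len(charges)}


store = IdempotencyStore()
first = store.handle("key-123", {"amount": 500}, create_charge)
retry = store.handle("key-123", {"amount": 500}, create_charge)
print(first == retry, len(charges))  # True 1
print(store.handle("key-123", {"amount": 700}, create_charge)["status"])  # 422
```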

structured-output-repair

Repair engine that fixes malformed LLM JSON outputs — stripping markdown, extracting JSON, fixing syntax — to return valid data instead of crashing. You'd plug it in to handle common LLM output failures. Browse the code · Catalog page
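The three repairs named above can be sketched in a few lines of Python. This is an illustrative assumption, not structured-output-repair's implementation. `repair_json` strips a markdown fence, slices out the outermost JSON object or array, removes trailing commas, and then lets `json.loads` raise if the text is still invalid. The trailing-comma regex is naive and would also alter commas inside string values.

```python
import json
import re
from typing import Any

FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)  # a markdown code fence
TRAILING_COMMA = re.compile(r",\s*([}\]])")                  # ",}" or ",]"


def repair_json(raw: str) -> Any:
    text = raw.strip()
    # 1. Strip a markdown fence if the model wrapped its answer in one.
    fenced = FENCE.search(text)
    if fenced:
        text = fenced.group(1).strip()
    # 2. Keep the outermost {...} or [...] span, dropping surrounding prose.
    starts = [i for i in (text.find("{"), text.find("[")) if i != -1]
    if not starts:
        raise ValueError("no JSON object or array found")
    end = max(text.rfind("}"), text.rfind("]"))
    text = text[min(starts):end + 1]
    # 3. Fix one common syntax error: trailing commas before } or ].
    text = TRAILING_COMMA.sub(r"\1", text)
    return json.loads(text)  # still raises json.JSONDecodeError if unrecoverable


print(repair_json('Sure! Here you go:\n{"name": "Ada", "tags": ["x", "y",],}'))
# {'name': 'Ada', 'tags': ['x', 'y']}
fence = "`" * 3
print(repair_json(fence + 'json\n{"ok": true,}\n' + fence))  # {'ok': True}
```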

secret-rotation-kit

Zero-downtime secret rotation engine across AWS, GCP, Vault, and Vercel, with overlapping key windows and dual verification. You'd adopt it to rotate secrets in production without outages. Browse the code · Catalog page

circuit-breaker-agents

Circuit breaker library for agent-to-tool and agent-to-agent communication, with per-tool isolation and confidence-aware tripping. You'd use it to prevent cascading failures when tools degrade. Browse the code · Catalog page
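A minimal per-tool circuit breaker opens after a run of consecutive failures, fails fast while open, and lets a trial call through after a cooldown (the half-open state). The class, the thresholds, and the `breakers` dict keyed by tool name are illustrative assumptions, not circuit-breaker-agents' API. Confidence-aware tripping is not modeled, and the sketch is not thread-safe.

```python
import time
from typing import Any, Callable


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the tool while the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, cooldown_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self.failures = 0                    # consecutive failures
        self.opened_at: float | None = None  # None means closed

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.cooldown_seconds:
            return "half-open"               # cooldown over: allow a trial call
        return "open"

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.state == "open":
            raise CircuitOpenError("circuit open; failing fast")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip, or re-trip after a failed trial
            raise
        self.failures = 0                      # success closes the circuit
        self.opened_at = None
        return result


t = [0.0]  # fake clock for the demo
breakers: dict[str, CircuitBreaker] = {}  # one breaker per tool, so tools fail independently


def breaker_for(tool: str) -> CircuitBreaker:
    if tool not in breakers:
        breakers[tool] = CircuitBreaker(failure_threshold=2, cooldown_seconds=10, clock=lambda: t[0])
    return breakers[tool]


def flaky_search() -> str:
    raise TimeoutError("search backend timed out")


for _ in range(2):
    try:
        breaker_for("search").call(flaky_search)
    except TimeoutError:
        pass
print(breaker_for("search").state, breaker_for("weather").state)  # open closed
t[0] = 10.0
print(breaker_for("search").call(lambda: "ok"), breaker_for("search").state)  # ok closed
```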

session-continuity-kit

Multi-turn session manager for AI agents with conversation windowing, token budget enforcement, and pluggable storage (Firestore, DynamoDB, Redis). You'd adopt it to avoid building session lifecycle logic from scratch. Browse the code · Catalog page

agent-runbook-generator

CLI and library that scan a service repository and produce operator runbooks with alerts, dashboards, failure modes, and rollback steps. You'd use it to automate runbook creation and maintenance. Browse the code · Catalog page

Domain Pipelines

media-pipeline-mcp

MCP tools for generating and processing images, audio, video, documents, and 3D models, with chainable pipelines, quality gates, and budget enforcement. You'd adopt them to build AI agents that produce media without wiring multiple provider SDKs yourself. Browse the code · Catalog page

agent-memory

Long-term memory layer for AI agents that persists and manages information across sessions, with decay and contradiction resolution. You'd use it to give agents consistent recall instead of relying on a fire-and-forget vector store. Browse the code · Catalog page

voice-agent-kit

Production transport layer for voice AI agents, covering speech-to-text, MCP tool calls, text-to-speech, and telephony/WebRTC connections. You'd adopt it to build a voice agent without writing audio plumbing or provider-switching logic. Browse the code · Catalog page

Browse the full catalog at reaatech.com/products.

Tagged: #recap #weekly
