No new tutorials shipped this week, but we published 41 new open-source repositories — the building blocks for production AI agent systems. These packages handle gateways, evaluation, security, observability, and orchestration, ready for you to compose into your own infrastructure.
New repos
MCP Infrastructure
mcp-gateway
Production MCP gateway with authentication, rate limiting, schema enforcement, tool allowlists, audit trail, fan-out routing, and response caching. You'd adopt it to add production middleware in front of your MCP servers without building each piece from scratch. Browse the code · Catalog page
webhook-relay-mcp
An MCP server that receives webhooks from services like Stripe, GitHub, and Twilio, normalizes them, and exposes events to AI agents through subscription-based polling. You'd use it to bridge third-party webhooks into agent workflows without custom ingestion code. Browse the code · Catalog page
mcp-schema-evolution
Diff engine and CI policy for MCP tool schemas that classifies changes as breaking or non-breaking before a release. You'd adopt it to prevent accidentally breaking MCP consumers when tool fields change. Browse the code · Catalog page
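To make the breaking versus non-breaking distinction concrete, here is a minimal sketch of how such a diff could classify changes between two JSON-Schema-style tool input schemas. The function name `classify_schema_change` and the policy itself (removals, type changes, and new required fields count as breaking; new optional fields do not) are assumptions for illustration, not the package's actual API or rules.

```python
from typing import Any


def classify_schema_change(old: dict[str, Any], new: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (severity, description) pairs for changes between two schemas.

    Hypothetical policy: removed fields, changed types, and newly required
    fields are "breaking"; newly added optional fields are "non-breaking".
    """
    changes: list[tuple[str, str]] = []
    old_props = old.get("properties", {})
    new_props = new.get("properties", {})
    old_required = set(old.get("required", []))
    new_required = set(new.get("required", []))

    for name in sorted(old_props.keys() - new_props.keys()):
        changes.append(("breaking", f"removed field '{name}'"))

    for name in sorted(new_props.keys() - old_props.keys()):
        if name in new_required:
            changes.append(("breaking", f"added required field '{name}'"))
        else:
            changes.append(("non-breaking", f"added optional field '{name}'"))

    for name in sorted(old_props.keys() & new_props.keys()):
        if old_props[name].get("type") != new_props[name].get("type"):
            changes.append(("breaking", f"changed type of '{name}'"))
        if name in new_required and name not in old_required:
            changes.append(("breaking", f"field '{name}' became required"))

    return changes
```

A CI gate built on this idea would fail the release whenever any returned pair is marked "breaking".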
multi-tenant-mcp
Primitives for serving multiple tenants from a single MCP server, with per-tenant rate limits, tool visibility, cost tracking, and isolated storage. You'd use them to avoid standing up a separate MCP server per customer while still enforcing tenant boundaries. Browse the code · Catalog page
mcp-server-starter-ts
Production-grade MCP server template in TypeScript with Express 5, composable middleware, dual transports, and built-in observability. You'd adopt it to build a secure, instrumented MCP server without wiring up auth, rate limiting, and logging yourself. Browse the code · Catalog page
mcp-catalog
Internal registry where MCP servers can register themselves and be discovered by capability, with automatic health monitoring. It's also exposed as an MCP server itself, so clients can query it using the same protocol. Browse the code · Catalog page
Orchestration Protocols
confidence-router
A decision engine that turns classifier confidence scores into route/clarify/fallback actions. You'd plug it in to handle ambiguous predictions without hard-coding every edge case. Browse the code · Catalog page
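The route/clarify/fallback idea can be sketched with two thresholds over the classifier's top score. Everything here is an assumption for illustration: the `Decision` type, the `decide` function, and the default thresholds of 0.8 and 0.5 are hypothetical, not the package's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    action: str                   # "route", "clarify", or "fallback"
    intent: Optional[str] = None  # best-scoring intent, when one is usable


def decide(scores: dict[str, float], route_at: float = 0.8, clarify_at: float = 0.5) -> Decision:
    """Map classifier confidence scores to an action (hypothetical thresholds)."""
    if not scores:
        return Decision("fallback")
    intent, top = max(scores.items(), key=lambda kv: kv[1])
    if top >= route_at:
        return Decision("route", intent)    # confident: send to the intent's handler
    if top >= clarify_at:
        return Decision("clarify", intent)  # unsure: ask the user to confirm
    return Decision("fallback")             # too ambiguous: use the default path
```

For example, `decide({"billing": 0.92, "support": 0.05})` routes to `billing`, while a top score of 0.6 yields a clarify action instead.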
agent-mesh
Multi-agent orchestrator that routes requests based on intent confidence, manages sessions, and isolates failing agents with circuit breakers. You'd use it to run multiple specialized agents behind a single API with automatic fallback. Browse the code · Catalog page
llm-router
Config-driven LLM routing engine that selects models based on cost, latency, or capability, with circuit breakers and fallback chains. You'd adopt it to manage multi-provider costs and add structured degradation paths. Browse the code · Catalog page
agent-handoff-protocol
A library for transferring conversations between AI agents mid-session, including context compression, target scoring, and transport via MCP or A2A. You'd use it to route multi-turn conversations between specialized agents without losing context. Browse the code · Catalog page
a2a-reference-ts
Complete TypeScript implementation of the Agent-to-Agent protocol with server, client, CLI, and an A2A↔MCP bridge. You'd adopt it to get production-grade infrastructure for multi-agent discovery and messaging out of the box. Browse the code · Catalog page
Evals & Quality
agent-eval-harness
Full evaluation pipeline for AI agent trajectories, scoring quality, tool correctness, cost, and latency, with CI/CD regression gates. You'd use it to catch regressions before deploying agents. Browse the code · Catalog page
classifier-evals
Offline evaluation harness for intent classification systems, with confusion matrices, LLM-as-judge, and CI regression gates. You'd adopt it to run repeatable classifier evaluations in production pipelines. Browse the code · Catalog page
prompt-version-control
Git-like versioning for AI prompts with eval-gated promotion. You'd use it to manage prompt iterations across dev, staging, and production without manual copy-pasting. Browse the code · Catalog page
agent-replay
Deterministic recording and replay system for agent interactions, enabling zero-cost debugging and regression testing. You'd adopt it to debug agent behavior without burning LLM tokens on every run. Browse the code · Catalog page
rag-eval-pack
RAG evaluation toolkit with heuristic and LLM-as-judge metrics at three fidelity levels, plus CI quality gates. You'd use it to catch regressions in retrieval-augmented generation systems. Browse the code · Catalog page
agents-md-kit
Linter, validator, and scaffolder for AGENTS.md and SKILL.md files, ensuring consistent, machine-readable agent definitions. You'd adopt it to enforce structure across a multi-agent system's documentation. Browse the code · Catalog page
llm-judge-toolkit
Calibrated LLM-as-judge library with multi-judge consensus, bias detection, and cost tracking. You'd use it to replace ad-hoc evaluation scripts with a structured, statistically grounded pipeline. Browse the code · Catalog page
context-window-planner
Engine that optimizes token allocation within LLM context windows by deciding what to include, summarize, or drop. You'd adopt it to keep prompts within a model's token budget using configurable strategies. Browse the code · Catalog page
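One simple strategy for the include/summarize/drop decision is a greedy pass in priority order. The sketch below assumes each item already knows its full and summarized token counts; the `Item` fields and the `plan` function are hypothetical names, and a real planner would likely offer more strategies than this one.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    tokens: int          # size if included verbatim
    summary_tokens: int  # size if replaced by a summary
    priority: int        # higher means more important


def plan(items: list[Item], budget: int) -> dict[str, str]:
    """Greedy allocation: the highest-priority items claim budget first."""
    decisions: dict[str, str] = {}
    remaining = budget
    for item in sorted(items, key=lambda i: i.priority, reverse=True):
        if item.tokens <= remaining:
            decisions[item.name] = "include"
            remaining -= item.tokens
        elif item.summary_tokens <= remaining:
            decisions[item.name] = "summarize"
            remaining -= item.summary_tokens
        else:
            decisions[item.name] = "drop"
    return decisions
```

With a 400-token budget, a 100-token system prompt is included, an 800-token history that summarizes to 200 tokens is summarized, and lower-priority documents that no longer fit are dropped.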
Testing & Security
guardrail-chain
Composable, budget-aware guardrail pipeline for LLM calls, with input/output checks like PII redaction and prompt injection detection. You'd plug it in to add safety layers without wiring each guardrail from scratch. Browse the code · Catalog page
mcp-contract-kit
Conformance test suite for MCP servers that validates spec compliance, contract schemas, and security posture. You'd adopt it to test MCP servers before deployment. Browse the code · Catalog page
prompt-injection-bench
Reproducible benchmark for prompt-injection defenses, with a 300+ template attack corpus and pluggable defense adapters. You'd use it to objectively measure and compare defense effectiveness. Browse the code · Catalog page
mcp-server-doctor
CLI diagnostic tool that runs health checks against MCP servers — latency, auth, payload limits — and grades results A–F. You'd adopt it to catch issues before deployment or in CI. Browse the code · Catalog page
agent-auth-proxy
Identity-aware reverse proxy that handles OAuth2 token management and API key vaulting for agent-to-service communication. You'd use it to secure agent access to downstream APIs without embedding long-lived credentials. Browse the code · Catalog page
mcp-load-test
Purpose-built load testing framework for MCP servers that models concurrent user behavior and produces latency histograms. You'd adopt it to stress-test MCP servers under realistic workloads. Browse the code · Catalog page
agent-chaos
Fault injection toolkit for agent systems that injects failures like latency and malformed output to validate circuit breakers and fallback logic. You'd use it to ensure agent reliability under failure conditions. Browse the code · Catalog page
tool-use-firewall
Policy enforcement layer that intercepts MCP tool calls to validate, rate-limit, and audit before they hit upstream servers. You'd adopt it to prevent destructive agent actions and enforce budgets. Browse the code · Catalog page
Observability & Cost
llm-cost-telemetry
Drop-in wrappers for OpenAI, Anthropic, and Google AI SDKs that capture token usage and cost, plus aggregation and budget enforcement. You'd adopt them to track LLM spend across providers and tenants without building your own telemetry. Browse the code · Catalog page
otel-cost-exporter
OpenTelemetry-native exporter that converts GenAI semantic convention spans into real-time cost metrics per model and provider. You'd use it to track LLM spend without manually maintaining pricing tables. Browse the code · Catalog page
agent-budget-controller
Real-time cost budget enforcement for agents, checking every request against spend limits and degrading gracefully when exceeded. You'd adopt it to prevent runaway agent loops from exhausting your budget. Browse the code · Catalog page
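A per-request budget check can be sketched as a small class that compares projected spend against limits. The two-tier design here (a soft limit that triggers degradation and a hard limit that denies the request) is an assumption for illustration, as are the `BudgetController` name and its methods; the package may model limits differently.

```python
class BudgetController:
    """Hypothetical spend tracker with a soft (degrade) and hard (deny) limit."""

    def __init__(self, soft_limit_usd: float, hard_limit_usd: float) -> None:
        self.soft_limit = soft_limit_usd
        self.hard_limit = hard_limit_usd
        self.spent = 0.0

    def check(self, estimated_cost_usd: float) -> str:
        """Return "allow", "degrade", or "deny" for a request about to be made."""
        projected = self.spent + estimated_cost_usd
        if projected > self.hard_limit:
            return "deny"     # stop runaway loops outright
        if projected > self.soft_limit:
            return "degrade"  # e.g. switch to a cheaper model or shorter output
        return "allow"

    def record(self, actual_cost_usd: float) -> None:
        """Add the real cost of a completed request to the running total."""
        self.spent += actual_cost_usd
```

The caller runs `check` before each LLM call and `record` after it, so a looping agent hits "degrade" and then "deny" instead of spending without bound.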
otel-genai-semconv
Instrumented wrappers for major LLM providers that emit OpenTelemetry GenAI semantic convention spans, plus deployable dashboards. You'd adopt it to get spec-compliant observability across providers without writing instrumentation code. Browse the code · Catalog page
llm-cache
Semantic caching layer for LLM calls that returns cached responses for exact-match prompts and for semantically similar prompts whose similarity score clears a threshold. You'd use it to reduce API costs and latency. Browse the code · Catalog page
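The exact-then-semantic lookup can be sketched as a cache that first checks the literal prompt and then falls back to cosine similarity over embeddings. This is a minimal illustration under stated assumptions: `SemanticCache` is a hypothetical class, the caller supplies the `embed` function, the 0.9 threshold is arbitrary, and the linear scan stands in for a real vector index.

```python
import math
from typing import Callable, Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, embed: Callable[[str], Vector], threshold: float = 0.9) -> None:
        self._embed = embed          # caller-supplied embedding function (assumed)
        self._threshold = threshold
        self._exact: dict[str, str] = {}
        self._entries: list[tuple[Vector, str]] = []

    def put(self, prompt: str, response: str) -> None:
        self._exact[prompt] = response
        self._entries.append((self._embed(prompt), response))

    def get(self, prompt: str) -> Optional[str]:
        if prompt in self._exact:    # exact hit: no embedding call needed
            return self._exact[prompt]
        query = self._embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self._entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self._threshold else None
```

A miss returns `None`, at which point the caller would invoke the model and `put` the fresh response.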
Reliability & Ops
idempotency-middleware
Framework-agnostic idempotency middleware for POST, PUT, and PATCH requests that replays the cached response when it sees a duplicate idempotency key. You'd adopt it to make retries of payment charges or webhook deliveries safe. Browse the code · Catalog page
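The core of the replay behavior fits in a few lines: run the handler once per key and return the stored response for any repeat. The `IdempotencyStore` class below is a hypothetical in-memory sketch; it deliberately leaves out what a production middleware would need, such as key expiry, request fingerprinting, and locking for concurrent duplicates.

```python
from typing import Any, Callable


class IdempotencyStore:
    """In-memory sketch: one stored response per idempotency key."""

    def __init__(self) -> None:
        self._responses: dict[str, Any] = {}

    def run(self, key: str, handler: Callable[[], Any]) -> Any:
        if key in self._responses:
            # Duplicate key: replay the stored response, skip the side effect.
            return self._responses[key]
        response = handler()
        self._responses[key] = response
        return response
```

Retrying a charge with the same key calls the handler only once, so the customer is not billed twice.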
structured-output-repair
Repair engine that fixes malformed LLM JSON outputs — stripping markdown, extracting JSON, fixing syntax — to return valid data instead of crashing. You'd plug it in to handle common LLM output failures. Browse the code · Catalog page
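The three repair steps named above (strip markdown, extract JSON, fix syntax) can be sketched with the standard library. The `repair_json` function is a hypothetical illustration, and its syntax fix covers only trailing commas; the regex is naive and would also alter a string value that happens to contain a comma followed by a closing bracket.

```python
import json
import re
from typing import Any

FENCE = "`" * 3  # markdown code-fence marker


def repair_json(text: str) -> Any:
    """Best-effort recovery of a JSON value from noisy LLM output."""
    # 1. Strip markdown code fences, with or without a "json" language tag.
    text = text.replace(FENCE + "json", "").replace(FENCE, "")

    # 2. Extract the span from the first opening bracket to the last closing one.
    starts = [i for i in (text.find("{"), text.find("[")) if i != -1]
    end = max(text.rfind("}"), text.rfind("]"))
    if not starts or end < min(starts):
        raise ValueError("no JSON object or array found")
    candidate = text[min(starts):end + 1]

    # 3. Fix a common syntax error: trailing commas before } or ].
    candidate = re.sub(r",\s*([}\]])", r"\1", candidate)

    return json.loads(candidate)
```

Input that still fails to parse after these steps raises `json.JSONDecodeError` (a `ValueError` subclass), so callers can catch one exception type and decide whether to retry the model.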
secret-rotation-kit
Zero-downtime secret rotation engine across AWS, GCP, Vault, and Vercel, with overlapping key windows and dual verification. You'd adopt it to rotate secrets in production without outages. Browse the code · Catalog page
circuit-breaker-agents
Circuit breaker library for agent-to-tool and agent-to-agent communication, with per-tool isolation and confidence-aware tripping. You'd use it to prevent cascading failures when tools degrade. Browse the code · Catalog page
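Per-tool isolation means each tool gets its own breaker, so one failing tool does not block calls to the others. The sketch below shows a basic count-based breaker with a cool-down, plus a registry keyed by tool name. `CircuitBreaker`, `BreakerRegistry`, and their parameters are hypothetical names, and the confidence-aware tripping mentioned above is not modeled here.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # seconds to wait before a trial call
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self._opened_at is None:
            return True  # closed: calls flow normally
        # Open: block until the cool-down passes, then permit a trial call.
        return self._clock() - self._opened_at >= self.reset_after

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()


class BreakerRegistry:
    """One independent breaker per tool name."""

    def __init__(self, **breaker_kwargs) -> None:
        self._kwargs = breaker_kwargs
        self._breakers: dict[str, CircuitBreaker] = {}

    def for_tool(self, name: str) -> CircuitBreaker:
        if name not in self._breakers:
            self._breakers[name] = CircuitBreaker(**self._kwargs)
        return self._breakers[name]
```

An agent would call `for_tool(name).allow()` before invoking a tool and report the outcome with `record_success` or `record_failure` afterward.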
session-continuity-kit
Multi-turn session manager for AI agents with conversation windowing, token budget enforcement, and pluggable storage (Firestore, DynamoDB, Redis). You'd adopt it to avoid building session lifecycle logic from scratch. Browse the code · Catalog page
agent-runbook-generator
CLI and library that scan a service repository and produce operator runbooks with alerts, dashboards, failure modes, and rollback steps. You'd use it to automate runbook creation and maintenance. Browse the code · Catalog page
Domain Pipelines
media-pipeline-mcp
MCP tools for generating and processing images, audio, video, documents, and 3D models, with chainable pipelines, quality gates, and budget enforcement. You'd adopt them to build AI agents that produce media without wiring multiple provider SDKs yourself. Browse the code · Catalog page
agent-memory
Long-term memory layer for AI agents that persists and manages information across sessions, with decay and contradiction resolution. You'd use it to give agents consistent recall instead of relying on a write-once vector store that never revisits what it holds. Browse the code · Catalog page
voice-agent-kit
Production voice AI agent transport layer with speech-to-text, MCP tool calls, text-to-speech, and telephony/WebRTC transport. You'd adopt it to build a voice agent without writing audio plumbing or provider-switching logic yourself. Browse the code · Catalog page
Browse the full catalog at reaatech.com/products.
