Weekly recap, May 4, 2026 – May 10, 2026

51 new open-source repos and 57 npm packages for building, securing, and observing AI agents — all free and open source.

RecapBot · May 24, 2026 · 7 min read

This week we published 51 new open-source repositories and 57 npm packages for production AI agent systems. The work spans MCP infrastructure, agent orchestration, observability, security, and deployment runtimes — all free and open source.

New repos

mcp-gateway

Production MCP gateway with authentication, rate limiting, schema enforcement, tool allowlists, audit trail, fan-out routing, and response caching. It lets you secure and scale connections to upstream MCP servers with composable middleware.

Browse the code · Product page

guardrail-chain

Composable, budget-aware input/output guardrail pipeline for LLM applications. Fluent chain builder for PII redaction, prompt injection detection, and content safety while enforcing latency and token budgets.

Browse the code · Product page

agent-eval-harness

Enterprise-grade evaluation harness for AI agents: trajectory scoring, tool-use validation, cost tracking, latency budgets, golden trajectories, LLM-as-judge, CI regression gates, and MCP integration.

Browse the code · Product page

llm-cost-telemetry

Multi-tenant LLM cost telemetry with provider SDK wrappers (OpenAI, Anthropic, Google) and observability export to Prometheus, OTLP, and Phoenix.

Browse the code · Product page

confidence-router

Decision engine for route/clarify/fallback patterns using confidence-gated intent routing with configurable thresholds and pluggable classifiers.

Browse the code · Product page
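The route/clarify/fallback pattern described for confidence-router can be sketched in a few lines. This is only an illustration of the idea, not the package's API: the `decide` function, the `Thresholds` shape, and the toy keyword classifier are all hypothetical names chosen for the example.

```typescript
// Hypothetical names throughout; this is not the confidence-router API.
type Decision =
  | { action: "route"; intent: string }
  | { action: "clarify"; candidates: string[] }
  | { action: "fallback" };

interface Thresholds {
  route: number;   // at or above this score, route directly
  clarify: number; // at or above this (but below `route`), ask the user
}

// A classifier returns a confidence score in [0, 1] for each intent.
type Classifier = (text: string) => Record<string, number>;

function decide(text: string, classify: Classifier, t: Thresholds): Decision {
  const ranked = Object.entries(classify(text)).sort((a, b) => b[1] - a[1]);
  const top = ranked[0];
  if (top === undefined || top[1] < t.clarify) return { action: "fallback" };
  if (top[1] >= t.route) return { action: "route", intent: top[0] };
  // Middle band: offer every intent that clears the clarify threshold.
  const candidates = ranked.filter(([, s]) => s >= t.clarify).map(([name]) => name);
  return { action: "clarify", candidates };
}

// Toy keyword classifier, standing in for a pluggable model.
const keywordClassifier: Classifier = (text) => ({
  billing: text.includes("invoice") ? 0.9 : text.includes("charge") ? 0.6 : 0.1,
  support: text.includes("broken") ? 0.85 : text.includes("charge") ? 0.55 : 0.1,
});

const thresholds: Thresholds = { route: 0.8, clarify: 0.5 };
```

With these thresholds, a message that scores high on one intent is routed, a message that scores in the middle band on several intents triggers a clarifying question, and anything below the lower threshold falls back.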

mcp-contract-kit

Conformance test suite for MCP servers with Zod schemas, validators, reporters, and CLI. Automate spec compliance and security validation in CI.

Browse the code · Product page

hybrid-rag-qdrant

Production hybrid RAG with vector + BM25 + reranker, benchmarked chunking strategies, and evaluation frameworks. Pairs with rag-eval-pack.

Browse the code · Product page

classifier-evals

Enterprise classifier evaluation suite with confusion matrices, LLM-as-judge, regression gates, and Phoenix/Langfuse exporters.

Browse the code · Product page

prompt-version-control

Git-like versioning for prompts with eval-gated promotion. API server, SDK, CLI, and MCP server to manage prompt lifecycles.

Browse the code · Product page

prompt-injection-bench

Reproducible benchmark and test corpus for prompt-injection defenses. Swappable defense adapters, parallelized benchmarks, and statistical scoring.

Browse the code · Product page

idempotency-middleware

Framework-agnostic idempotency cache for HTTP APIs. Pluggable storage (in-memory, Redis, DynamoDB, Firestore) with distributed locking and Express/Koa handlers.

Browse the code · Product page
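As a rough sketch of what an idempotency cache with pluggable storage does, the example below runs a handler at most once per idempotency key and replays the stored response afterwards. The `IdempotencyStore` interface, `InMemoryStore`, and `runOnce` are hypothetical names for illustration and are not taken from idempotency-middleware. The in-process `inFlight` map only stands in for the distributed locking the package describes.

```typescript
// Hypothetical sketch; names are illustrative, not the idempotency-middleware API.
interface StoredResponse {
  status: number;
  body: string;
}

interface IdempotencyStore {
  get(key: string): Promise<StoredResponse | undefined>;
  set(key: string, value: StoredResponse): Promise<void>;
}

class InMemoryStore implements IdempotencyStore {
  private readonly entries = new Map<string, StoredResponse>();
  async get(key: string) { return this.entries.get(key); }
  async set(key: string, value: StoredResponse) { this.entries.set(key, value); }
}

// De-duplicates concurrent requests inside one process only.
// A multi-instance deployment would need a real lock (for example in Redis).
const inFlight = new Map<string, Promise<StoredResponse>>();

async function runOnce(
  store: IdempotencyStore,
  key: string,
  handler: () => Promise<StoredResponse>,
): Promise<StoredResponse> {
  const cached = await store.get(key);
  if (cached) return cached; // replay the earlier response

  const pending = inFlight.get(key);
  if (pending) return pending; // join the request already running

  const work = (async () => {
    const response = await handler();
    await store.set(key, response);
    return response;
  })().finally(() => inFlight.delete(key));

  inFlight.set(key, work);
  return work;
}
```

Swapping `InMemoryStore` for a Redis- or DynamoDB-backed implementation of the same interface is the kind of pluggability the description refers to.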

webhook-relay-mcp

MCP server that receives webhooks from Stripe, GitHub, Twilio, normalizes them, and exposes them to agents as subscription-based tools.

Browse the code · Product page

agent-replay

Record and deterministically replay agent interactions. Decouples debugging from live LLM calls, supports diff-mode and step-through debugging.

Browse the code · Product page

otel-cost-exporter

OpenTelemetry-native LLM cost exporter with multi-provider pricing. Converts GenAI semantic spans into USD metrics for Prometheus/OTLP.

Browse the code · Product page

structured-output-repair

Catch and fix malformed LLM structured outputs: strips fences, coerces types, fuzzy-matches keys, and re-prompts if unrepairable.

Browse the code · Product page
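To make the repair steps for structured-output-repair concrete, here is a minimal sketch that strips a Markdown fence, parses the JSON, matches keys loosely, and coerces simple types. It is an assumption-laden illustration, not the package's implementation: `repair`, `Schema`, and the helpers are hypothetical, and plain key normalization stands in for real fuzzy matching. A `null` result marks the case where a caller would re-prompt the model.

```typescript
// Hypothetical sketch; not the structured-output-repair API.
type FieldType = "string" | "number" | "boolean";
type Schema = Record<string, FieldType>;

// Remove a leading triple-backtick fence (with optional language tag) and the trailing fence.
function stripFences(raw: string): string {
  return raw.trim().replace(/^`{3}[a-zA-Z]*\s*/, "").replace(/\s*`{3}$/, "");
}

// Compare keys ignoring case, underscores, hyphens, and spaces.
const normalize = (key: string): string => key.toLowerCase().replace(/[_\-\s]/g, "");

function coerce(value: unknown, type: FieldType): unknown {
  if (type === "number" && typeof value === "string" && value.trim() !== "" && !Number.isNaN(Number(value))) {
    return Number(value);
  }
  if (type === "boolean" && (value === "true" || value === "false")) return value === "true";
  if (type === "string" && (typeof value === "number" || typeof value === "boolean")) return String(value);
  return value;
}

// Returns the repaired object, or null when the caller should re-prompt.
function repair(raw: string, schema: Schema): Record<string, unknown> | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(stripFences(raw));
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) return null;

  const byNormalizedKey = new Map(
    Object.entries(parsed as Record<string, unknown>).map(([k, v]) => [normalize(k), v] as const),
  );
  const result: Record<string, unknown> = {};
  for (const [field, type] of Object.entries(schema)) {
    const value = coerce(byNormalizedKey.get(normalize(field)), type);
    if (typeof value !== type) return null; // missing or uncoercible field
    result[field] = value;
  }
  return result;
}
```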

agent-budget-controller

Real-time cost budget enforcement for agent systems. Pre-flight cost checks, model downgrades, and per-scope blocking with observability.

Browse the code · Product page
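A pre-flight cost check with model downgrade, as described for agent-budget-controller, might look roughly like the sketch below. Everything here is hypothetical: the `preflight` function, the model names, and the per-token prices are made-up placeholders, not values or APIs from the package.

```typescript
// Hypothetical sketch; model names and prices are placeholders.
interface ModelPrice {
  name: string;
  usdPer1kTokens: number;
}

// Ordered from most to least capable.
const ladder: ModelPrice[] = [
  { name: "large", usdPer1kTokens: 0.03 },
  { name: "medium", usdPer1kTokens: 0.01 },
  { name: "small", usdPer1kTokens: 0.002 },
];

type Verdict =
  | { allowed: true; model: string; estimatedUsd: number }
  | { allowed: false };

// Walk down the ladder until a model fits the remaining budget; block if none does.
function preflight(estimatedTokens: number, remainingUsd: number): Verdict {
  for (const model of ladder) {
    const estimatedUsd = (estimatedTokens / 1000) * model.usdPer1kTokens;
    if (estimatedUsd <= remainingUsd) {
      return { allowed: true, model: model.name, estimatedUsd };
    }
  }
  return { allowed: false };
}
```

The check runs before the LLM call, so a request that would exceed the budget is either downgraded to a cheaper model or blocked without spending anything.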

mcp-schema-evolution

Tooling for safely evolving MCP tool schemas: diffing, change classification (breaking/non-breaking), CI policy enforcement, and migration guidance.

Browse the code · Product page
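The breaking versus non-breaking classification that mcp-schema-evolution describes can be illustrated with a small diff over a simplified tool input schema. The `ToolSchema` shape, the `diffSchemas` function, and the classification rules below are assumptions made for this sketch. The package's actual rules may differ.

```typescript
// Hypothetical, simplified model of a tool's input schema.
interface ToolSchema {
  properties: Record<string, { type: string }>;
  required: string[];
}

interface Change {
  kind: "breaking" | "non-breaking";
  detail: string;
}

// Assumed rules: removals, type changes, and new required inputs break callers;
// new optional inputs do not.
function diffSchemas(prev: ToolSchema, next: ToolSchema): Change[] {
  const changes: Change[] = [];
  for (const [name, prop] of Object.entries(prev.properties)) {
    const updated = next.properties[name];
    if (updated === undefined) {
      changes.push({ kind: "breaking", detail: `removed property "${name}"` });
    } else if (updated.type !== prop.type) {
      changes.push({ kind: "breaking", detail: `"${name}" changed type ${prop.type} -> ${updated.type}` });
    }
  }
  for (const name of Object.keys(next.properties)) {
    if (name in prev.properties) continue;
    const kind = next.required.includes(name) ? "breaking" : "non-breaking";
    changes.push({ kind, detail: `added ${kind === "breaking" ? "required" : "optional"} property "${name}"` });
  }
  // An existing optional property that becomes required also breaks callers.
  for (const name of next.required) {
    if (name in prev.properties && !prev.required.includes(name)) {
      changes.push({ kind: "breaking", detail: `"${name}" became required` });
    }
  }
  return changes;
}
```

A CI policy could then fail the build whenever the returned list contains a `breaking` entry without a matching major version bump.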

media-pipeline-mcp

Chainable media operations (image, audio, video, documents, 3D) as MCP tools with quality gates, cost tracking, and caching.

Browse the code · Product page

invoicing-app

Personal desktop invoicing application: customers, products, invoices, PDF generation, and email via Electron + SQLite.

Browse the code · Product page

agent-mesh

Multi-agent orchestration mesh: intent classification, confidence-gated routing, session management, circuit breaking, and YAML-configured agents.

Browse the code · Product page

rag-eval-pack

RAG evaluation toolkit: faithfulness, answer relevance, context precision/recall, cost tracking, CI gates. Pairs with hybrid-rag-qdrant.

Browse the code · Product page

multi-tenant-mcp

Primitives for serving multiple tenants from a single MCP server: tenant resolution, rate limiting, tool visibility, cost accounting.

Browse the code · Product page

agents-md-kit

Linter, validator, and scaffolding tool for AGENTS.md and SKILL.md files. Enforces consistent structure with 18 lint rules and Zod validation.

Browse the code · Product page

secret-rotation-kit

Zero-downtime multi-key secret rotation: overlapping validity windows, propagation verification, revocation. Adapters for AWS, GCP, Vault, Vercel.

Browse the code · Product page

agentic-arch-patterns

A reference book as a repo: runnable TypeScript patterns for agent systems — circuit breakers, orchestrator-worker, idempotency caches, and more.

Browse the code · Product page

mcp-server-doctor

CLI diagnostic and profiling tool for MCP servers — transport negotiation, latency profiling, concurrency testing, and graded report cards.

Browse the code · Product page

agent-memory

Long-term memory layer for AI agents: fact extraction, semantic retrieval, decay policies, contradiction resolution, and pluggable storage backends.

Browse the code · Product page

faas-hot-runtime

Kubernetes-based FaaS runtime with warm pod pools for sub-100ms invocations. Functions exposed as MCP tools for agent integration.

Browse the code · Product page

agent-auth-proxy

Identity-aware proxy for agent-to-service communication. Handles OAuth2 token management, API key vaulting, and scope enforcement.

Browse the code · Product page

llm-judge-toolkit

Calibrated LLM-as-judge library: multi-judge consensus, position bias detection, human calibration, cost tracking, and caching.

Browse the code · Product page

mcp-server-starter-ts

Production MCP server template in TypeScript: pluggable middleware, dual transports, tool auto-discovery, observability baked in.

Browse the code · Product page

voice-agent-kit

Real-time voice agent pipeline: Twilio audio in, speech-to-text (STT), an MCP agent with vector retrieval, text-to-speech (TTS), and audio back out through Twilio. Latency budgets, barge-in, session continuity.

Browse the code · Product page

mcp-load-test

Load testing framework for MCP servers: concurrent user simulation, breakpoint identification, real-time metrics, transport-aware clients.

Browse the code · Product page

agent-chaos

Fault injection toolkit for agent systems: declarative scenarios (YAML/JSON), transparent interceptors, hot reload. Test resilience patterns.

Browse the code · Product page

terraform-mcp-amazon-eks

Drop-in Terraform module to deploy MCP workloads on Amazon EKS with FaaS-style warm pods, sub-100ms invoke, Redis, SQS, and KEDA autoscaling.

Browse the code · Product page

llm-router

Intelligent LLM routing: cost/latency/quality-based selection, fallback chains, budget enforcement, provider-agnostic with MCP integration.

Browse the code · Product page

agent-handoff-protocol

Standardized lifecycle for transferring AI agent conversations: context compression, capability-based routing, transport delivery (MCP/HTTP).

Browse the code · Product page

funcdock

Lightweight serverless platform: run multiple Node.js functions in a single Docker container, each route auto-exposed as an MCP tool.

Browse the code · Product page

circuit-breaker-agents

Circuit breakers for agent-to-tool/agent communication: per-tool isolation, confidence- and cost-based tripping, gradual recovery.

Browse the code · Product page
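For readers new to the pattern behind circuit-breaker-agents, the sketch below shows a basic per-tool circuit breaker with closed, open, and half-open states. It is a simplified, synchronous illustration with hypothetical names (`ToolBreaker`, `breakerFor`). It trips on consecutive failures only and does not attempt the confidence- or cost-based tripping the package describes.

```typescript
// Hypothetical sketch; not the circuit-breaker-agents API.
type State = "closed" | "open" | "half-open";

class ToolBreaker {
  private failures = 0;
  private openedAt = 0;
  private state: State = "closed";
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number, cooldownMs: number, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now; // injectable clock keeps the breaker testable
  }

  current(): State {
    // After the cooldown, allow a trial call through.
    if (this.state === "open" && this.now() - this.openedAt >= this.cooldownMs) {
      this.state = "half-open";
    }
    return this.state;
  }

  call<T>(fn: () => T): T {
    if (this.current() === "open") throw new Error("circuit open");
    try {
      const result = fn();
      this.failures = 0;
      this.state = "closed";
      return result;
    } catch (err) {
      this.failures += 1;
      // A failed trial call, or too many consecutive failures, opens the circuit.
      if (this.state === "half-open" || this.failures >= this.failureThreshold) {
        this.state = "open";
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}

// One breaker per tool, so a failing tool cannot trip calls to healthy ones.
const breakers = new Map<string, ToolBreaker>();
function breakerFor(tool: string): ToolBreaker {
  let breaker = breakers.get(tool);
  if (!breaker) {
    breaker = new ToolBreaker(3, 30000);
    breakers.set(tool, breaker);
  }
  return breaker;
}
```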

session-continuity-kit

Multi-turn session management: conversation windowing, token budgets, compression, handoff. Adapters for Firestore, DynamoDB, Redis.

Browse the code · Product page

tool-use-firewall

Policy enforcement proxy between agents and MCP tools: cost caps, rate limits, argument validation, human approval for destructive ops.

Browse the code · Product page

agent-runbook-generator

CLI that ingests a service repo and produces an operator runbook: alerts, dashboards, failure modes, rollback steps, dependency maps.

Browse the code · Product page

otel-genai-semconv

OpenTelemetry semantic conventions for GenAI observability: instrumented wrappers for OpenAI, Anthropic, Vertex AI, Bedrock.

Browse the code · Product page

llm-cache

Semantic caching layer for LLM calls: embedding-based similarity, model-aware fingerprinting, cost tracking. Supports Redis, DynamoDB, Qdrant.

Browse the code · Product page
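The idea behind embedding-based cache lookup in llm-cache can be sketched as a similarity search over stored prompts. The example below is a minimal in-memory illustration with hypothetical names (`SemanticCache`, `CacheEntry`) and an assumed similarity threshold of 0.95. It expects the caller to supply embeddings of equal dimension and does not reflect the package's actual API or storage adapters.

```typescript
// Hypothetical sketch; not the llm-cache API. Embeddings are supplied by the caller.
interface CacheEntry {
  model: string;
  embedding: number[];
  response: string;
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

class SemanticCache {
  private readonly entries: CacheEntry[] = [];
  private readonly threshold: number;

  constructor(threshold = 0.95) {
    this.threshold = threshold;
  }

  put(entry: CacheEntry): void {
    this.entries.push(entry);
  }

  // Linear scan; a vector store would replace this at scale.
  get(model: string, embedding: number[]): string | undefined {
    let best: CacheEntry | undefined;
    let bestScore = this.threshold;
    for (const entry of this.entries) {
      if (entry.model !== model) continue; // never reuse answers across models
      const score = cosineSimilarity(entry.embedding, embedding);
      if (score >= bestScore) {
        best = entry;
        bestScore = score;
      }
    }
    return best?.response;
  }
}
```

Keying on the model as well as the embedding is one simple reading of "model-aware": a near-identical prompt sent to a different model is treated as a miss.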

context-window-planner

Optimize token allocation within LLM context windows: decides what to include, summarize, or drop based on configurable packing strategies.

Browse the code · Product page
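One simple packing strategy of the kind context-window-planner describes is a greedy pass by priority: include an item if it fits, fall back to its summary if only that fits, and drop it otherwise. The sketch below assumes hypothetical names (`Item`, `plan`) and precomputed token counts. The package's configurable strategies are not shown here.

```typescript
// Hypothetical sketch; not the context-window-planner API.
interface Item {
  id: string;
  priority: number;       // higher is more important
  tokens: number;         // cost of including the item verbatim
  summaryTokens?: number; // cost of a summary, when one is available
}

type Placement = { id: string; action: "include" | "summarize" | "drop" };

// Greedy by priority: spend the budget on the most important items first.
function plan(items: Item[], budget: number): Placement[] {
  let remaining = budget;
  const byPriority = [...items].sort((a, b) => b.priority - a.priority);
  return byPriority.map((item): Placement => {
    if (item.tokens <= remaining) {
      remaining -= item.tokens;
      return { id: item.id, action: "include" };
    }
    if (item.summaryTokens !== undefined && item.summaryTokens <= remaining) {
      remaining -= item.summaryTokens;
      return { id: item.id, action: "summarize" };
    }
    return { id: item.id, action: "drop" };
  });
}
```

Greedy packing is not optimal in general, since a large high-priority item can crowd out several smaller ones, but it is predictable and easy to reason about.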

a2a-reference-ts

Enterprise TypeScript implementation of Google’s Agent-to-Agent (A2A) protocol with a bidirectional A2A↔MCP bridge, OAuth2/JWT, SSE streaming.

Browse the code · Product page

terraform-mcp-gcp-cloudrun

Drop-in Terraform module to deploy MCP servers on GCP Cloud Run with Firestore, Secret Manager, Pub/Sub, and OTel.

Browse the code · Product page

mcp-catalog

Registry server for MCP server discovery across teams. Register, search, browse, and health-check organizational MCP servers — exposed as an MCP server itself.

Browse the code · Product page

mcp-changelog

Automated changelog and migration guide generator for MCP servers: diffs tool schemas between git tags, generates breaking-change summaries, CI-friendly.

Browse the code · Product page

bicycle-brands-models

Structured JSON dataset of bicycle brands, models, and rider height specs. Useful for eCommerce catalogs, sizing recommenders, or data projects.

Browse the code · Product page

terraform-mcp-observability

Drop-in Terraform module for complete observability of MCP agent systems: traces (Phoenix/Langfuse), metrics (Prometheus), alerts, log aggregation.

Browse the code · Product page

Building blocks shipped

agent-auth-proxy

Identity-aware proxy components for agent-to-service auth. The shared schemas in @reaatech/agent-auth-proxy-core, the typed HTTP client @reaatech/agent-auth-proxy-client, and the Fastify server @reaatech/agent-auth-proxy-server give you a complete OAuth2 proxy for agents.

Browse the family

agent-budget-controller

Real-time cost enforcement for LLM agents. @reaatech/agent-budget-types defines the schemas; agent-budget-engine enforces limits; agent-budget-pricing calculates costs; and agent-budget-middleware plugs into Express/Fastify. Seven packages released, all at 0.1.0.

Browse the family

agent-chaos

Fault injection toolkit for agent systems. @reaatech/agent-chaos-core provides the middleware engine, agent-chaos-scenarios handles declarative config, and agent-chaos-cli runs chaos experiments.

Browse the family

agent-eval-harness

Complete agent evaluation suite spread over 12 packages. Key ones: @reaatech/agent-eval-harness-types for shared schemas, agent-eval-harness-judge for LLM-as-judge, agent-eval-harness-golden for regression references, and agent-eval-harness-cli as the single entrypoint. All packages work together to score agent trajectories, enforce latency budgets, and gate CI promotions.

Browse the family

agent-handoff-protocol

Standardized agent handoff library. @reaatech/agent-handoff provides the core types, agent-handoff-compression handles context reduction, agent-handoff-routing does capability-based routing, and agent-handoff-transport delivers over MCP/HTTP.

Browse the family

agent-memory

Long-term memory layer for AI agents. Ten packages: @reaatech/agent-memory-core provides the interfaces; agent-memory-storage persists data in PostgreSQL/pgvector; agent-memory-policies manages decay and contradictions; agent-memory-retrieval handles semantic search. The unified entrypoint is @reaatech/agent-memory.

Browse the family

agent-mesh

Multi-agent orchestration mesh: 10 packages at 1.0.0. @reaatech/agent-mesh defines the domain types, agent-mesh-classifier routes intents, agent-mesh-router dispatches to MCP agents, and agent-mesh-gateway exposes the REST API. The entire mesh runs on Cloud Run with Firestore sessions and hot-reloaded YAML config.

Browse the family

agent-replay

Record and replay agent interactions deterministically. @reaatech/agent-replay-core is the recording/replay engine; agent-replay-interceptors patches OpenAI/Anthropic SDKs; agent-replay-integrations hooks into LangChain/LangGraph; agent-replay-cli provides the command line. All released at 0.1.0.
