This week we published 51 new open-source repositories and 57 npm packages for production AI agent systems. The work spans MCP infrastructure, agent orchestration, observability, security, and deployment runtimes — all free and open source.
New repos
mcp-gateway
Production MCP gateway with authentication, rate limiting, schema enforcement, tool allowlists, audit trail, fan-out routing, and response caching. It lets you secure and scale connections to upstream MCP servers with composable middleware.
Browse the code · Product page
guardrail-chain
Composable, budget-aware input/output guardrail pipeline for LLM applications. Fluent chain builder for PII redaction, prompt injection detection, and content safety while enforcing latency and token budgets.
Browse the code · Product page
agent-eval-harness
Enterprise-grade evaluation harness for AI agents: trajectory scoring, tool-use validation, cost tracking, latency budgets, golden trajectories, LLM-as-judge, CI regression gates, and MCP integration.
Browse the code · Product page
llm-cost-telemetry
Multi-tenant LLM cost telemetry with provider SDK wrappers (OpenAI, Anthropic, Google) and observability export to Prometheus, OTLP, and Phoenix.
Browse the code · Product page
confidence-router
Decision engine for route/clarify/fallback patterns using confidence-gated intent routing with configurable thresholds and pluggable classifiers.
Browse the code · Product page
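To make the route/clarify/fallback pattern concrete, here is a minimal sketch of confidence-gated routing with two thresholds. The names `decide`, `Thresholds`, and `Classification` are assumptions made for this example and are not taken from the confidence-router API.

```typescript
// Hypothetical sketch of confidence-gated routing; not the confidence-router API.
type Classification = { intent: string; confidence: number };

type Decision =
  | { action: "route"; intent: string }
  | { action: "clarify"; candidate: string }
  | { action: "fallback" };

interface Thresholds {
  route: number;   // at or above this, dispatch directly
  clarify: number; // at or above this (but below `route`), ask the user to confirm
}

function decide(c: Classification, t: Thresholds): Decision {
  if (c.confidence >= t.route) return { action: "route", intent: c.intent };
  if (c.confidence >= t.clarify) return { action: "clarify", candidate: c.intent };
  // Anything below the clarify threshold goes to a fallback handler.
  return { action: "fallback" };
}
```

With thresholds of 0.8 and 0.5, a classification at 0.9 is routed, one at 0.6 triggers a clarifying question, and one at 0.2 falls back.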
mcp-contract-kit
Conformance test suite for MCP servers with Zod schemas, validators, reporters, and CLI. Automate spec compliance and security validation in CI.
Browse the code · Product page
hybrid-rag-qdrant
Production hybrid RAG with vector + BM25 + reranker, benchmarked chunking strategies, and evaluation frameworks. Pairs with rag-eval-pack.
Browse the code · Product page
classifier-evals
Enterprise classifier evaluation suite with confusion matrices, LLM-as-judge, regression gates, and Phoenix/Langfuse exporters.
Browse the code · Product page
prompt-version-control
Git-like versioning for prompts with eval-gated promotion. API server, SDK, CLI, and MCP server to manage prompt lifecycles.
Browse the code · Product page
prompt-injection-bench
Reproducible benchmark and test corpus for prompt-injection defenses. Swappable defense adapters, parallelized benchmarks, and statistical scoring.
Browse the code · Product page
idempotency-middleware
Framework-agnostic idempotency cache for HTTP APIs. Pluggable storage (in-memory, Redis, DynamoDB, Firestore) with distributed locking and Express/Koa handlers.
Browse the code · Product page
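The core idea behind an idempotency cache can be sketched in a few lines: the first request for a key runs the handler, and repeats of the same key receive the stored result. The class below is an in-memory illustration only. `IdempotencyCache` and `run` are hypothetical names, and the sketch leaves out the distributed locking and storage adapters the package describes.

```typescript
// Hypothetical in-memory idempotency cache; a real deployment would use shared storage.
class IdempotencyCache<T> {
  private readonly entries = new Map<string, Promise<T>>();

  // Runs `handler` at most once per key; repeat calls receive the first call's promise.
  run(key: string, handler: () => Promise<T>): Promise<T> {
    const existing = this.entries.get(key);
    if (existing) return existing;

    const pending = handler().catch((err: unknown): never => {
      // Failed attempts are not kept, so the client can retry with the same key.
      this.entries.delete(key);
      throw err;
    });
    this.entries.set(key, pending);
    return pending;
  }
}
```

Storing the promise rather than the resolved value means two concurrent requests with the same key share one execution instead of racing.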
webhook-relay-mcp
MCP server that receives webhooks from Stripe, GitHub, Twilio, normalizes them, and exposes them to agents as subscription-based tools.
Browse the code · Product page
agent-replay
Record and deterministically replay agent interactions. Decouples debugging from live LLM calls and supports diff mode and step-through debugging.
Browse the code · Product page
otel-cost-exporter
OpenTelemetry-native LLM cost exporter with multi-provider pricing. Converts GenAI semantic spans into USD metrics for Prometheus/OTLP.
Browse the code · Product page
structured-output-repair
Catch and fix malformed LLM structured outputs: strips fences, coerces types, fuzzy-matches keys, and re-prompts if unrepairable.
Browse the code · Product page
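The first three repair steps (strip fences, fuzzy-match keys, coerce types) can be sketched as below. This is an illustration under stated assumptions, not the structured-output-repair API: `repair`, `FieldType`, and the key-normalization rule are made up for the example, key matching here is a simple case- and punctuation-insensitive comparison, and the re-prompt step is left out.

```typescript
// Hypothetical repair pass; names and rules are illustrative only.
type FieldType = "string" | "number" | "boolean";

const FENCE = "`".repeat(3); // built at runtime so this sample can sit inside a fence

// Remove a surrounding code fence (with optional language tag) if the model added one.
function stripFences(raw: string): string {
  const pattern = new RegExp("^" + FENCE + "[a-zA-Z]*\\s*([\\s\\S]*?)" + FENCE + "$");
  const match = raw.trim().match(pattern);
  return (match?.[1] ?? raw).trim();
}

// "User_Name", "userName", and "USERNAME" all normalize to "username".
function normalizeKey(key: string): string {
  return key.toLowerCase().replace(/[^a-z0-9]/g, "");
}

function coerce(value: unknown, type: FieldType): unknown {
  if (type === "number" && typeof value === "string" && value.trim() !== "") {
    const n = Number(value);
    return Number.isFinite(n) ? n : value;
  }
  if (type === "boolean" && typeof value === "string") {
    if (value.toLowerCase() === "true") return true;
    if (value.toLowerCase() === "false") return false;
  }
  if (type === "string" && (typeof value === "number" || typeof value === "boolean")) {
    return String(value);
  }
  return value; // leave anything we cannot coerce safely
}

function repair(raw: string, schema: Record<string, FieldType>): Record<string, unknown> {
  const parsed = JSON.parse(stripFences(raw)) as Record<string, unknown>;
  const wanted = new Map<string, [string, FieldType]>();
  for (const [key, type] of Object.entries(schema)) wanted.set(normalizeKey(key), [key, type]);

  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(parsed)) {
    const hit = wanted.get(normalizeKey(key));
    if (hit) out[hit[0]] = coerce(value, hit[1]); // unknown keys are dropped
  }
  return out;
}
```

`JSON.parse` still throws on input that is not valid JSON after the fence is removed; that is the point where a re-prompt would take over.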
agent-budget-controller
Real-time cost budget enforcement for agent systems. Pre-flight cost checks, model downgrades, and per-scope blocking with observability.
Browse the code · Product page
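A pre-flight check with model downgrade can be sketched as follows: estimate the cost of the call against each candidate model, in order of preference, and block the call if none fits the remaining budget. The function name `preflight`, the `ModelPrice` shape, and any prices passed in are assumptions for illustration, not the agent-budget-controller API or real provider pricing.

```typescript
// Hypothetical pre-flight budget check; model names and prices are supplied by the caller.
interface ModelPrice {
  name: string;
  usdPer1kTokens: number;
}

type Verdict =
  | { allowed: true; model: string; estimatedUsd: number }
  | { allowed: false; reason: string };

function preflight(
  estimatedTokens: number,
  remainingUsd: number,
  preferenceOrder: ModelPrice[], // preferred model first, cheaper downgrades after it
): Verdict {
  for (const model of preferenceOrder) {
    const estimatedUsd = (estimatedTokens / 1000) * model.usdPer1kTokens;
    // Take the first model in preference order whose estimate fits the budget.
    if (estimatedUsd <= remainingUsd) {
      return { allowed: true, model: model.name, estimatedUsd };
    }
  }
  return { allowed: false, reason: "estimated cost exceeds remaining budget" };
}
```

The check runs before any tokens are spent, so a request that would blow the budget is downgraded or blocked instead of being billed and reported afterwards.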
mcp-schema-evolution
Tooling for safely evolving MCP tool schemas: diffing, change classification (breaking/non-breaking), CI policy enforcement, and migration guidance.
Browse the code · Product page
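Change classification can be illustrated with a small diff over two versions of a tool's input schema. The rules below (removed fields, type changes, and newly required fields are breaking; new optional fields are not) are assumptions chosen for the example, and `diffSchemas` and `ToolSchema` are hypothetical names rather than the mcp-schema-evolution API.

```typescript
// Hypothetical schema diff; the breaking/non-breaking rules are an assumption.
interface ToolSchema {
  properties: Record<string, { type: string }>;
  required: string[];
}

interface Change {
  field: string;
  kind: "removed" | "type-changed" | "added-required" | "added-optional" | "became-required";
  breaking: boolean;
}

function diffSchemas(prev: ToolSchema, next: ToolSchema): Change[] {
  const changes: Change[] = [];

  // Fields that existed before: detect removals and type changes.
  for (const [field, def] of Object.entries(prev.properties)) {
    const after = next.properties[field];
    if (after === undefined) {
      changes.push({ field, kind: "removed", breaking: true });
    } else if (after.type !== def.type) {
      changes.push({ field, kind: "type-changed", breaking: true });
    }
  }

  // New fields: only breaking if existing callers must now supply them.
  for (const field of Object.keys(next.properties)) {
    if (field in prev.properties) continue;
    const required = next.required.includes(field);
    changes.push({ field, kind: required ? "added-required" : "added-optional", breaking: required });
  }

  // Existing optional fields that became required.
  for (const field of next.required) {
    if (field in prev.properties && !prev.required.includes(field)) {
      changes.push({ field, kind: "became-required", breaking: true });
    }
  }
  return changes;
}
```

A CI policy could then fail the build when any change has `breaking: true` and the version bump is not a major one.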
media-pipeline-mcp
Chainable media operations (image, audio, video, documents, 3D) as MCP tools with quality gates, cost tracking, and caching.
Browse the code · Product page
invoicing-app
Personal desktop invoicing application: customers, products, invoices, PDF generation, and email via Electron + SQLite.
Browse the code · Product page
agent-mesh
Multi-agent orchestration mesh: intent classification, confidence-gated routing, session management, circuit breaking, and YAML-configured agents.
Browse the code · Product page
rag-eval-pack
RAG evaluation toolkit: faithfulness, answer relevance, context precision/recall, cost tracking, CI gates. Pairs with hybrid-rag-qdrant.
Browse the code · Product page
multi-tenant-mcp
Primitives for serving multiple tenants from a single MCP server: tenant resolution, rate limiting, tool visibility, cost accounting.
Browse the code · Product page
agents-md-kit
Linter, validator, and scaffolding tool for AGENTS.md and SKILL.md files. Enforces consistent structure with 18 lint rules and Zod validation.
Browse the code · Product page
secret-rotation-kit
Zero-downtime multi-key secret rotation: overlapping validity windows, propagation verification, revocation. Adapters for AWS, GCP, Vault, Vercel.
Browse the code · Product page
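Overlapping validity windows are what make rotation zero-downtime: for a period, both the outgoing and the incoming key are accepted. A minimal sketch, assuming each key version carries its own window; `KeyVersion`, `activeKeys`, and `accepts` are hypothetical names, not the secret-rotation-kit API.

```typescript
// Hypothetical multi-key verifier with overlapping validity windows.
interface KeyVersion {
  id: string;
  secret: string;
  notBefore: number; // epoch ms, inclusive
  notAfter: number;  // epoch ms, exclusive
}

function activeKeys(keys: KeyVersion[], now: number): KeyVersion[] {
  return keys.filter((k) => k.notBefore <= now && now < k.notAfter);
}

// Accept a presented secret if it matches any key that is valid right now.
// Note: production code should compare secrets in constant time.
function accepts(keys: KeyVersion[], presented: string, now: number): boolean {
  return activeKeys(keys, now).some((k) => k.secret === presented);
}
```

During the overlap, clients that have not yet picked up the new key keep working; once the old window closes, only the new key is accepted.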
agentic-arch-patterns
A reference book as a repo: runnable TypeScript patterns for agent systems — circuit breakers, orchestrator-worker, idempotency caches, and more.
Browse the code · Product page
mcp-server-doctor
CLI diagnostic and profiling tool for MCP servers — transport negotiation, latency profiling, concurrency testing, and graded report cards.
Browse the code · Product page
agent-memory
Long-term memory layer for AI agents: fact extraction, semantic retrieval, decay policies, contradiction resolution, and pluggable storage backends.
Browse the code · Product page
faas-hot-runtime
Kubernetes-based FaaS runtime with warm pod pools for sub-100ms invocations. Functions exposed as MCP tools for agent integration.
Browse the code · Product page
agent-auth-proxy
Identity-aware proxy for agent-to-service communication. Handles OAuth2 token management, API key vaulting, and scope enforcement.
Browse the code · Product page
llm-judge-toolkit
Calibrated LLM-as-judge library: multi-judge consensus, position bias detection, human calibration, cost tracking, and caching.
Browse the code · Product page
mcp-server-starter-ts
Production MCP server template in TypeScript: pluggable middleware, dual transports, tool auto-discovery, observability baked in.
Browse the code · Product page
voice-agent-kit
Real-time voice agent pipeline: Twilio audio goes through STT, then an MCP agent with vector retrieval, then TTS, and back to Twilio. Latency budgets, barge-in, session continuity.
Browse the code · Product page
mcp-load-test
Load testing framework for MCP servers: concurrent user simulation, breakpoint identification, real-time metrics, transport-aware clients.
Browse the code · Product page
agent-chaos
Fault injection toolkit for agent systems: declarative scenarios (YAML/JSON), transparent interceptors, hot reload. Test resilience patterns.
Browse the code · Product page
terraform-mcp-amazon-eks
Drop-in Terraform module to deploy MCP workloads on Amazon EKS with FaaS-style warm pods, sub-100ms invoke, Redis, SQS, and KEDA autoscaling.
Browse the code · Product page
llm-router
Intelligent LLM routing: cost/latency/quality-based selection, fallback chains, budget enforcement, provider-agnostic with MCP integration.
Browse the code · Product page
agent-handoff-protocol
Standardized lifecycle for transferring AI agent conversations: context compression, capability-based routing, transport delivery (MCP/HTTP).
Browse the code · Product page
funcdock
Lightweight serverless platform: run multiple Node.js functions in a single Docker container, each route auto-exposed as an MCP tool.
Browse the code · Product page
circuit-breaker-agents
Circuit breakers for agent-to-tool/agent communication: per-tool isolation, confidence- and cost-based tripping, gradual recovery.
Browse the code · Product page
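Per-tool isolation means each tool gets its own breaker, so one failing tool does not block calls to the others. The sketch below shows a plain failure-count breaker with a half-open probe state. It is a simplified illustration: `ToolBreaker` and `breakerFor` are hypothetical names, the limits are arbitrary, and the confidence- and cost-based tripping described above is not modeled.

```typescript
// Hypothetical per-tool circuit breaker; thresholds and names are illustrative.
type BreakerState = "closed" | "open" | "half-open";

class ToolBreaker {
  private failures = 0;
  private openedAt = 0;
  private state: BreakerState = "closed";
  private readonly maxFailures: number;
  private readonly cooldownMs: number;

  constructor(maxFailures: number, cooldownMs: number) {
    this.maxFailures = maxFailures;
    this.cooldownMs = cooldownMs;
  }

  canCall(now: number): boolean {
    if (this.state === "open" && now - this.openedAt >= this.cooldownMs) {
      this.state = "half-open"; // cooldown elapsed: let probe calls through
    }
    return this.state !== "open";
  }

  record(success: boolean, now: number): void {
    if (success) {
      this.failures = 0;
      this.state = "closed";
      return;
    }
    this.failures += 1;
    // A failed probe re-opens immediately; otherwise open once the limit is hit.
    if (this.state === "half-open" || this.failures >= this.maxFailures) {
      this.state = "open";
      this.openedAt = now;
    }
  }
}

// One breaker per tool name, created on first use.
const breakers = new Map<string, ToolBreaker>();

function breakerFor(tool: string): ToolBreaker {
  let breaker = breakers.get(tool);
  if (!breaker) {
    breaker = new ToolBreaker(3, 30000);
    breakers.set(tool, breaker);
  }
  return breaker;
}
```

Passing `now` in explicitly, instead of reading the clock inside the class, keeps the breaker easy to test.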
session-continuity-kit
Multi-turn session management: conversation windowing, token budgets, compression, handoff. Adapters for Firestore, DynamoDB, Redis.
Browse the code · Product page
tool-use-firewall
Policy enforcement proxy between agents and MCP tools: cost caps, rate limits, argument validation, human approval for destructive ops.
Browse the code · Product page
agent-runbook-generator
CLI that ingests a service repo and produces an operator runbook: alerts, dashboards, failure modes, rollback steps, dependency maps.
Browse the code · Product page
otel-genai-semconv
OpenTelemetry semantic conventions for GenAI observability: instrumented wrappers for OpenAI, Anthropic, Vertex AI, Bedrock.
Browse the code · Product page
llm-cache
Semantic caching layer for LLM calls: embedding-based similarity, model-aware fingerprinting, cost tracking. Supports Redis, DynamoDB, Qdrant.
Browse the code · Product page
context-window-planner
Optimize token allocation within LLM context windows: decides what to include, summarize, or drop based on configurable packing strategies.
Browse the code · Product page
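One simple packing strategy is greedy by priority: walk the items from most to least important, include each in full if it fits, fall back to its summary if only that fits, and drop it otherwise. The sketch below assumes token counts are already known. `planContext` and `ContextItem` are hypothetical names, not the context-window-planner API.

```typescript
// Hypothetical greedy packing strategy; token counts are assumed to be precomputed.
interface ContextItem {
  id: string;
  tokens: number;        // size if included in full
  summaryTokens: number; // size if replaced by a summary
  priority: number;      // higher is more important
}

type Placement = "include" | "summarize" | "drop";

function planContext(items: ContextItem[], budget: number): Record<string, Placement> {
  const plan: Record<string, Placement> = {};
  let remaining = budget;

  // Copy before sorting so the caller's array is not reordered.
  const byPriority = [...items].sort((a, b) => b.priority - a.priority);
  for (const item of byPriority) {
    if (item.tokens <= remaining) {
      plan[item.id] = "include";
      remaining -= item.tokens;
    } else if (item.summaryTokens <= remaining) {
      plan[item.id] = "summarize";
      remaining -= item.summaryTokens;
    } else {
      plan[item.id] = "drop";
    }
  }
  return plan;
}
```

Greedy packing is not optimal in general, but it is predictable and cheap, which matters when the plan is recomputed on every turn.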
a2a-reference-ts
Enterprise TypeScript implementation of Google’s Agent-to-Agent (A2A) protocol with a bidirectional A2A↔MCP bridge, OAuth2/JWT, SSE streaming.
Browse the code · Product page
terraform-mcp-gcp-cloudrun
Drop-in Terraform module to deploy MCP servers on GCP Cloud Run with Firestore, Secret Manager, Pub/Sub, and OTel.
Browse the code · Product page
mcp-catalog
Registry server for MCP server discovery across teams. Register, search, browse, and health-check organizational MCP servers — exposed as an MCP server itself.
Browse the code · Product page
mcp-changelog
Automated changelog and migration guide generator for MCP servers: diffs tool schemas between git tags, generates breaking-change summaries, CI-friendly.
Browse the code · Product page
bicycle-brands-models
Structured JSON dataset of bicycle brands, models, and rider height specs. Useful for eCommerce catalogs, sizing recommenders, or data projects.
Browse the code · Product page
terraform-mcp-observability
Drop-in Terraform module for complete observability of MCP agent systems: traces (Phoenix/Langfuse), metrics (Prometheus), alerts, log aggregation.
Browse the code · Product page
Building blocks shipped
agent-auth-proxy
Identity-aware proxy components for agent-to-service auth. The shared schemas in @reaatech/agent-auth-proxy-core, the typed HTTP client @reaatech/agent-auth-proxy-client, and the Fastify server @reaatech/agent-auth-proxy-server give you a complete OAuth2 proxy for agents.
agent-budget-controller
Real-time cost enforcement for LLM agents. @reaatech/agent-budget-types defines the schemas; agent-budget-engine enforces limits; agent-budget-pricing calculates costs; and agent-budget-middleware plugs into Express/Fastify. Seven packages released, all at 0.1.0.
agent-chaos
Fault injection toolkit for agent systems. @reaatech/agent-chaos-core is the middleware engine, agent-chaos-scenarios handles declarative config, and agent-chaos-cli runs chaos experiments.
agent-eval-harness
Complete agent evaluation suite spread over 12 packages. Key ones: @reaatech/agent-eval-harness-types for shared schemas, agent-eval-harness-judge for LLM-as-judge, agent-eval-harness-golden for regression references, and agent-eval-harness-cli as the single entrypoint. All packages work together to score agent trajectories, enforce latency budgets, and gate CI promotions.
agent-handoff-protocol
Standardized agent handoff library. @reaatech/agent-handoff provides the core types, agent-handoff-compression handles context reduction, agent-handoff-routing does capability-based routing, and agent-handoff-transport delivers over MCP/HTTP.
agent-memory
Long-term memory layer for AI agents. Ten packages: @reaatech/agent-memory-core provides the interfaces; agent-memory-storage persists data in PostgreSQL/pgvector; agent-memory-policies manages decay and contradictions; agent-memory-retrieval handles semantic search. The unified entrypoint is @reaatech/agent-memory.
agent-mesh
Multi-agent orchestration mesh: 10 packages at 1.0.0. @reaatech/agent-mesh defines the domain types, agent-mesh-classifier routes intents, agent-mesh-router dispatches to MCP agents, and agent-mesh-gateway exposes the REST API. The entire mesh runs on Cloud Run with Firestore sessions and hot-reloaded YAML config.
agent-replay
Record and replay agent interactions deterministically. @reaatech/agent-replay-core is the recording/replay engine; agent-replay-interceptors patches OpenAI/Anthropic SDKs; agent-replay-integrations hooks into LangChain/LangGraph; agent-replay-cli provides the command line. All released at 0.1.0.
