@reaatech/agent-mesh-utils

npm v1.0.0

Provides a three-state circuit breaker for managing agent health, featuring Firestore-backed persistence and leader election for cross-instance state synchronization. It exposes a singleton `circuitBreaker` object for traffic control and lifecycle functions to manage distributed state across Cloud Run instances.

Status: Pre-1.0 — APIs may change in minor versions. Pin to a specific version in production.

Per-agent circuit breaker implementation with Firestore persistence and cross-instance leader election. Prevents cascading failures by isolating unhealthy agents and provides distributed state synchronization across Cloud Run instances.

Installation

terminal
npm install @reaatech/agent-mesh-utils
# or
pnpm add @reaatech/agent-mesh-utils

Feature Overview

  • Three-state circuit breaker — CLOSED (normal), OPEN (reject requests), HALF_OPEN (probing recovery)
  • Exponential backoff — backoff multiplier doubles on each OPEN transition, up to 32×
  • Leader-elected persistence — Firestore-based leader election ensures single-writer sync across instances
  • Automatic state restoration — persisted circuit state survives Cloud Run restarts
  • Configurable thresholds — failure count, reset timeout, and half-open max calls are all set through environment variables
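
The backoff rule above can be made concrete with a little arithmetic. The sketch below is illustrative and not the package's internal code: `backoffMultiplier` and `openDurationMs` are hypothetical helpers, and it assumes the multiplier is 1× on the first OPEN transition and doubles on each later one until it reaches the documented 32× cap.

```typescript
// Illustrative only: not the package's implementation.
const RESET_TIMEOUT_MS = 30_000; // documented default for CIRCUIT_BREAKER_RESET_TIMEOUT_MS
const MAX_BACKOFF_MULTIPLIER = 32; // documented cap

/** Multiplier in effect after `openCount` OPEN transitions (assumes 1x on the first). */
function backoffMultiplier(openCount: number): number {
  if (openCount < 1) return 1;
  return Math.min(2 ** (openCount - 1), MAX_BACKOFF_MULTIPLIER);
}

/** How long the circuit stays OPEN before it may move to HALF_OPEN. */
function openDurationMs(openCount: number, resetTimeoutMs: number = RESET_TIMEOUT_MS): number {
  return resetTimeoutMs * backoffMultiplier(openCount);
}
```

Under these assumptions and the default timeout, the wait grows from 30 seconds on the first OPEN transition to a ceiling of 16 minutes (30000 ms × 32) from the sixth transition onward.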

Quick Start

typescript
import { circuitBreaker } from "@reaatech/agent-mesh-utils";
 
// Check if an agent can receive traffic.
// dispatchToAgent, servalAgent, mcpClient, and context are application-defined placeholders.
if (circuitBreaker.canCall("serval")) {
  await dispatchToAgent(servalAgent);
}
 
// Record success or failure after dispatch
try {
  await mcpClient.sendMessage(context);
  circuitBreaker.recordSuccess("serval");
} catch (error) {
  circuitBreaker.recordFailure("serval");
}

API Reference

Circuit Breaker

circuitBreaker (singleton)

The global CircuitBreaker instance. All methods are synchronous and thread-safe.

| Method | Description |
| --- | --- |
| getState(agentId) | Returns the current CircuitBreakerState, auto-transitioning if timeouts have elapsed |
| canCall(agentId) | Returns true if the circuit is CLOSED or HALF_OPEN with available slots |
| recordSuccess(agentId) | Records a successful call; closes the circuit if HALF_OPEN with enough successes |
| recordFailure(agentId) | Records a failure; opens the circuit if the threshold is reached |
| forceState(agentId, newState) | Forces the circuit to a specific state (testing/admin) |
| getAllStates() | Returns a snapshot of all circuit states |
| setState(state) | Sets a single circuit state (used for restoring from persistence) |
| setStates(states) | Sets multiple circuit states at once |
| clear() | Clears all circuit states (testing) |

Circuit States

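| State | Behavior |
| --- | --- |
| CLOSED | Normal operation — requests pass through |
| OPEN | Entered when failures >= CIRCUIT_BREAKER_FAILURE_THRESHOLD — requests are rejected; auto-transitions to HALF_OPEN after CIRCUIT_BREAKER_RESET_TIMEOUT_MS * backoff multiplier |
| HALF_OPEN | Testing recovery — up to CIRCUIT_BREAKER_HALF_OPEN_MAX_CALLS test calls allowed; reverts to OPEN if any fail or if CIRCUIT_BREAKER_HALF_OPEN_TIMEOUT_MS elapses |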
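
The transitions in the table can be traced with a small self-contained model. This is a toy sketch, not the package's implementation: `ToyBreaker` and its options are hypothetical, it tracks a single agent, and it leaves out backoff and the HALF_OPEN timeout. It also assumes the circuit closes once every allowed probe call has succeeded, which the text above describes only as "enough successes".

```typescript
// Toy model only: NOT the package's implementation.
type CircuitState = "CLOSED" | "OPEN" | "HALF_OPEN";

interface ToyOptions {
  failureThreshold: number; // failures before opening
  resetTimeoutMs: number; // time spent OPEN before probing
  halfOpenMaxCalls: number; // probe calls allowed in HALF_OPEN
}

class ToyBreaker {
  private state: CircuitState = "CLOSED";
  private failures = 0;
  private openedAt = 0;
  private probesStarted = 0;
  private probesSucceeded = 0;
  private readonly opts: ToyOptions;
  private readonly now: () => number;

  // `now` is injectable so the timeline can be driven by hand.
  constructor(opts: ToyOptions, now: () => number = Date.now) {
    this.opts = opts;
    this.now = now;
  }

  // Reading the state applies the OPEN -> HALF_OPEN transition lazily.
  getState(): CircuitState {
    if (this.state === "OPEN" && this.now() - this.openedAt >= this.opts.resetTimeoutMs) {
      this.state = "HALF_OPEN";
      this.probesStarted = 0;
      this.probesSucceeded = 0;
    }
    return this.state;
  }

  canCall(): boolean {
    const s = this.getState();
    if (s === "CLOSED") return true;
    if (s === "OPEN") return false;
    if (this.probesStarted >= this.opts.halfOpenMaxCalls) return false;
    this.probesStarted += 1; // assumption: each allowed call consumes a probe slot
    return true;
  }

  recordSuccess(): void {
    if (this.getState() === "HALF_OPEN") {
      this.probesSucceeded += 1;
      if (this.probesSucceeded >= this.opts.halfOpenMaxCalls) this.close();
    } else {
      this.failures = 0; // assumption: a success resets the failure count
    }
  }

  recordFailure(): void {
    const s = this.getState();
    if (s === "OPEN") return;
    if (s === "HALF_OPEN") {
      this.open(); // any failed probe reverts to OPEN
      return;
    }
    this.failures += 1;
    if (this.failures >= this.opts.failureThreshold) this.open();
  }

  private open(): void {
    this.state = "OPEN";
    this.openedAt = this.now();
    this.failures = 0;
  }

  private close(): void {
    this.state = "CLOSED";
    this.failures = 0;
  }
}
```

Injecting the clock keeps the model deterministic: a test can advance time by assigning to a variable instead of waiting for real timeouts.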

Persistence Layer

startCircuitBreakerPersistence(): Promise<void>

Initializes leader election, restores states from Firestore, and starts periodic sync (leader only).

typescript
import { startCircuitBreakerPersistence, stopCircuitBreakerPersistence } from "@reaatech/agent-mesh-utils";
 
await startCircuitBreakerPersistence();
 
// On shutdown
stopCircuitBreakerPersistence();

stopCircuitBreakerPersistence(): void

Stops the sync interval and cleans up.

isLeader(): boolean

Returns true if this instance currently holds the leader lease.

getLeaderId(): string | null

Returns the current leader’s instance ID.

restoreCircuitBreakerStates(maxRetries?): Promise<void>

Loads all circuit breaker states from Firestore with exponential-backoff retries.
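
The retry schedule itself is not documented here. As a rough illustration of the general pattern only, the sketch below retries an async operation with doubling delays. `backoffDelays`, `retryWithBackoff`, and the `baseDelayMs` parameter are hypothetical names, not part of this package's API.

```typescript
// Generic exponential-backoff retry: a sketch, not the package's internals.

/** Delays before each retry: base, 2x base, 4x base, ... (hypothetical schedule). */
function backoffDelays(maxRetries: number, baseDelayMs: number): number[] {
  return Array.from({ length: maxRetries }, (_, attempt) => baseDelayMs * 2 ** attempt);
}

/** Runs `op` once, then retries up to `maxRetries` times, sleeping between attempts. */
async function retryWithBackoff<T>(
  op: () => Promise<T>,
  maxRetries: number,
  baseDelayMs: number,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await op();
    } catch (error) {
      lastError = error;
      // Sleep only if another attempt will follow.
      if (attempt < maxRetries) await sleep(baseDelayMs * 2 ** attempt);
    }
  }
  throw lastError;
}
```

With `maxRetries` of 3 and a base delay of 200 ms, this sketch would wait 200, 400, and 800 ms between its four attempts before rethrowing the last error.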

updateCircuitBreakerState(state): Promise<void>

Updates local state and persists to Firestore.

getLocalCircuitBreakerState(agentId)

Returns the local circuit state for a given agent (no Firestore read).

Configuration

All thresholds are configured via environment variables (validated by @reaatech/agent-mesh):

| Variable | Default | Description |
| --- | --- | --- |
| CIRCUIT_BREAKER_FAILURE_THRESHOLD | 5 | Failures before opening circuit |
| CIRCUIT_BREAKER_RESET_TIMEOUT_MS | 30000 | Time before attempting recovery |
| CIRCUIT_BREAKER_HALF_OPEN_MAX_CALLS | 3 | Test calls allowed in HALF_OPEN |
| CIRCUIT_BREAKER_HALF_OPEN_TIMEOUT_MS | 60000 | Max time in HALF_OPEN before reverting to OPEN |
| CB_SYNC_INTERVAL_MS | 5000 | Leader sync interval |
| CB_LEADER_LEASE_MS | 15000 | Leader lease duration |
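
To show how these variables and defaults fit together, here is a minimal sketch of reading them from an environment object. It is not the validator in @reaatech/agent-mesh: `BreakerConfig`, `readPositiveInt`, and `loadBreakerConfig` are hypothetical names, and the positive-integer rule is an assumption.

```typescript
// Hypothetical loader: illustrates the documented defaults only.
interface BreakerConfig {
  failureThreshold: number;
  resetTimeoutMs: number;
  halfOpenMaxCalls: number;
  halfOpenTimeoutMs: number;
  syncIntervalMs: number;
  leaderLeaseMs: number;
}

type Env = Record<string, string | undefined>;

/** Returns the variable as a positive integer, or `fallback` when unset or blank. */
function readPositiveInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return value;
}

function loadBreakerConfig(env: Env): BreakerConfig {
  return {
    failureThreshold: readPositiveInt(env, "CIRCUIT_BREAKER_FAILURE_THRESHOLD", 5),
    resetTimeoutMs: readPositiveInt(env, "CIRCUIT_BREAKER_RESET_TIMEOUT_MS", 30000),
    halfOpenMaxCalls: readPositiveInt(env, "CIRCUIT_BREAKER_HALF_OPEN_MAX_CALLS", 3),
    halfOpenTimeoutMs: readPositiveInt(env, "CIRCUIT_BREAKER_HALF_OPEN_TIMEOUT_MS", 60000),
    syncIntervalMs: readPositiveInt(env, "CB_SYNC_INTERVAL_MS", 5000),
    leaderLeaseMs: readPositiveInt(env, "CB_LEADER_LEASE_MS", 15000),
  };
}
```

In a Node.js service this would be called as `loadBreakerConfig(process.env)`. With the defaults, the leader lease (15000 ms) spans three sync intervals (5000 ms), so it seems sensible to keep the lease comfortably longer than the interval if either value is overridden.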

Usage Patterns

With the Router

typescript
import { circuitBreaker } from "@reaatech/agent-mesh-utils";
import { env } from "@reaatech/agent-mesh";
 
// AgentConfig, mcpClient, and context are supplied by the host application.
async function dispatchToAgent(agent: AgentConfig) {
  if (env.ENABLE_CIRCUIT_BREAKER && !circuitBreaker.canCall(agent.agent_id)) {
    throw new Error(`Circuit breaker OPEN for agent ${agent.agent_id}`);
  }
 
  try {
    const response = await mcpClient.sendMessage(context);
    circuitBreaker.recordSuccess(agent.agent_id);
    return response;
  } catch (error) {
    circuitBreaker.recordFailure(agent.agent_id);
    throw error;
  }
}

Admin Operations

typescript
import { circuitBreaker } from "@reaatech/agent-mesh-utils";
 
// Force-close a circuit (e.g., after fixing a downstream agent)
circuitBreaker.forceState("serval", "CLOSED");
 
// Inspect all circuits
for (const [agentId, state] of circuitBreaker.getAllStates()) {
  console.log(`${agentId}: ${state.state} (failures: ${state.failure_count})`);
}

License

MIT