Small businesses rely on AI support agents, but LLM provider outages, API rate limits, and configuration drift can bring the agent offline for hours, costing sales and trust. They need a robust reliability layer they don’t have to build themselves.
This tutorial builds a full reliability suite for AI support agents using LangChain, circuit breakers, idempotent workflows, automatic API key rotation, and self-generating runbooks. You’ll end up with a Next.js app that routes LLM calls through a circuit breaker that fails over between OpenAI and Anthropic, wraps every request in idempotency middleware backed by DynamoDB, rate-limits requests with Upstash Redis, persists conversation sessions to DynamoDB, and runs a Temporal maintenance workflow that rotates API keys and generates incident runbooks automatically.
Prerequisites
Node.js 22+ and pnpm 10 installed on your machine
A Temporal server running (or a cloud account) for the maintenance workflow
Upstash Redis account (free tier works) for rate limiting
AWS credentials with DynamoDB access (or local DynamoDB via Docker)
OpenAI API key and Anthropic API key for the two LLM providers
Basic familiarity with TypeScript, Next.js App Router, and LangChain concepts
Step 1: Scaffold the Next.js project
Create the project with the Next.js App Router and TypeScript, then install all dependencies. Run these commands in an empty directory:
Expected output: pnpm install resolves all packages and writes pnpm-lock.yaml. Verify that no ^ or ~ appears in package.json dependencies.
Step 2: Set up environment variables
Create .env.example with placeholders for every configuration value the app reads. The app reads these at startup through a Zod-validated config function.
env
# Env vars used by langchain-reliability-suite-for-smb-support-agents-with-auto-runbooks.
# The builder adds entries here as it wires up each integration.
# Keep placeholders only; never commit real values.
NODE_ENV=development

# LLM providers
OPENAI_API_KEY=<your-openai-key>
ANTHROPIC_API_KEY=<your-anthropic-key>

# AWS DynamoDB
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=<your-access-key>
AWS_SECRET_ACCESS_KEY=<your-secret>
DYNAMODB_SESSIONS_TABLE=sessions
DYNAMODB_IDEMPOTENCY_TABLE=idempotency

# Temporal
TEMPORAL_HOST=<temporal-server-address>
TEMPORAL_TASK_QUEUE=reliability-tasks

# Upstash Redis (rate limiting)
UPSTASH_REDIS_URL=<your-upstash-url>
UPSTASH_REDIS_TOKEN=<your-upstash-token>

# Langfuse observability
LANGFUSE_PUBLIC_KEY=<your-langfuse-key>
LANGFUSE_SECRET_KEY=<your-langfuse-secret>
LANGFUSE_BASE_URL=https://cloud.langfuse.com

# Circuit breaker defaults
CIRCUIT_BREAKER_FAILURE_THRESHOLD=5
CIRCUIT_BREAKER_RECOVERY_TIMEOUT_MS=30000

# Secret rotation
SECRET_ROTATION_INTERVAL_MS=86400000

# Rate limiting
RATE_LIMIT_MAX_REQUESTS=100
RATE_LIMIT_WINDOW_MS=60000

# Admin
ADMIN_TOKEN=<your-admin-token>
Copy this to .env.local and fill in real values for local development.
Expected output: The .env.example file exists at the project root with all 21 environment variable placeholders.
Step 3: Define shared types and custom error classes
Create src/lib/types.ts with the core domain types used across the codebase:
Expected output: Running loadConfig() with all required env vars returns a typed object with nested sections (config.llm.openaiKey, config.circuitBreaker.failureThreshold, etc.). Missing OPENAI_API_KEY throws immediately.
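The loader's behaviour can be illustrated with a minimal sketch. The recipe validates with Zod; to stay dependency-free, this version swaps in hand-written checks, and the helper names (required, numberOr) and the AppConfig shape beyond the two sections named above are assumptions made for illustration.

```typescript
// Hypothetical, dependency-free stand-in for the Zod-validated loader.
type Env = Record<string, string | undefined>;

interface AppConfig {
  llm: { openaiKey: string; anthropicKey: string };
  circuitBreaker: { failureThreshold: number; recoveryTimeoutMs: number };
}

// Throws as soon as a required variable is missing or blank.
function required(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Coerces a numeric variable, falling back to a default when it is unset.
function numberOr(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  if (!Number.isFinite(parsed)) {
    throw new Error(`Environment variable ${name} must be numeric, got "${raw}"`);
  }
  return parsed;
}

function loadConfig(env: Env): AppConfig {
  return {
    llm: {
      openaiKey: required(env, "OPENAI_API_KEY"),
      anthropicKey: required(env, "ANTHROPIC_API_KEY"),
    },
    circuitBreaker: {
      // Defaults mirror the values in .env.example.
      failureThreshold: numberOr(env, "CIRCUIT_BREAKER_FAILURE_THRESHOLD", 5),
      recoveryTimeoutMs: numberOr(env, "CIRCUIT_BREAKER_RECOVERY_TIMEOUT_MS", 30000),
    },
  };
}
```

Passing the environment in as an argument, rather than reading process.env inside the function, keeps the loader easy to exercise in tests with partial or invalid input.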
Step 5: Build the ReliableChain with circuit breaker, fallback, pRetry, and Langfuse tracing
This is the core of the reliability layer. Create src/services/reliable-chain.ts:
ts
import {
  CircuitBreaker,
  CircuitOpenError,
  InMemoryAdapter,
  DefaultMetricsCollector,
} from "@reaatech/circuit-breaker-agents";
import { ChatOpenAI } from "@langchain/openai";
import { ChatAnthropic } from "@langchain/anthropic";
import { type BaseMessage, AIMessage } from "@langchain/core/messages";
import pRetry from "p-retry";
import { ProviderUnavailableError } from "../lib/errors.js";
import type { HealthStatus, LLMProvider } from "../lib/types.js";

let langfuseHandlerSingleton: Record<string, unknown> |
Expected output: The class is a singleton (ReliableChain.instance). When invoke() is called, it wraps the primary ChatOpenAI call in pRetry (3 retries, 1s backoff) inside a CircuitBreaker.execute(). If the circuit is open (CircuitOpenError), it falls back to ChatAnthropic. The getHealth() method returns the active provider, circuit state, and success/failure metrics.
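The control flow described above (retry inside a breaker, fallback only when the breaker is open) can be sketched without any libraries. Everything here is hypothetical: SimpleBreaker, BreakerOpenError, retry, and invokeWithFallback are illustrative stand-ins, not the API of @reaatech/circuit-breaker-agents or p-retry.

```typescript
type Provider = (prompt: string) => Promise<string>;

class BreakerOpenError extends Error {
  constructor() {
    super("circuit is open");
    this.name = "BreakerOpenError";
    Object.setPrototypeOf(this, BreakerOpenError.prototype);
  }
}

// Minimal breaker: opens after N consecutive failures, allows a probe after the timeout.
class SimpleBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private failureThreshold: number;
  private recoveryTimeoutMs: number;

  constructor(failureThreshold: number, recoveryTimeoutMs: number) {
    this.failureThreshold = failureThreshold;
    this.recoveryTimeoutMs = recoveryTimeoutMs;
  }

  get state(): "closed" | "open" | "half-open" {
    if (this.openedAt === null) return "closed";
    return Date.now() - this.openedAt >= this.recoveryTimeoutMs ? "half-open" : "open";
  }

  async execute<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === "open") throw new BreakerOpenError();
    try {
      const result = await fn();
      this.failures = 0;
      this.openedAt = null;
      return result;
    } catch (err) {
      this.failures += 1;
      // A failed half-open probe also lands here and re-opens the circuit.
      if (this.failures >= this.failureThreshold) this.openedAt = Date.now();
      throw err;
    }
  }
}

// Retries fn up to `retries` extra times with a fixed delay between attempts.
async function retry<T>(fn: () => Promise<T>, retries: number, delayMs: number): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < retries) await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}

async function invokeWithFallback(
  breaker: SimpleBreaker,
  primary: Provider,
  fallback: Provider,
  prompt: string,
  retries = 3,
  delayMs = 1000,
): Promise<string> {
  try {
    return await breaker.execute(() => retry(() => primary(prompt), retries, delayMs));
  } catch (err) {
    // Only an open circuit triggers the fallback; other errors propagate.
    if (err instanceof BreakerOpenError) return fallback(prompt);
    throw err;
  }
}
```

One consequence of this design is that the failures which trip the breaker still surface to the caller; only requests arriving while the circuit is open are served by the fallback provider.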
Step 6: Add idempotency middleware with DynamoDB adapter
Create src/middleware/idempotency.ts. This implements a DynamoDBStorageAdapter that satisfies the StorageAdapter interface from @reaatech/idempotency-middleware:
ts
import {
  IdempotencyMiddleware,
  type StorageAdapter,
  type IdempotencyRecord,
  IdempotencyError,
  IdempotencyErrorCode,
} from "@reaatech/idempotency-middleware";
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import {
  DynamoDBDocumentClient,
  GetCommand,
  PutCommand,
  DeleteCommand,
} from "@aws-sdk/lib-dynamodb";

class DynamoDBStorageAdapter implements StorageAdapter {
  private client: DynamoDBDocumentClient;
  private tableName: string;
  private inMemoryStore = new Map<string
Expected output: The DynamoDBStorageAdapter implements all 8 methods (get, set, delete, connect, disconnect, acquireLock, releaseLock, waitForLock). Lock acquisition uses DynamoDB PutItem with ConditionExpression: "attribute_not_exists(PK)". The withIdempotency() function delegates to middlewareInstance.execute().
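To make the idempotency guarantee concrete, here is a minimal in-memory sketch of the same idea. The InMemoryIdempotency class is hypothetical and is not the @reaatech/idempotency-middleware API; the Map lookup plays the role that the attribute_not_exists(PK) condition plays in DynamoDB, so only the first caller for a key runs the handler.

```typescript
type Stored<T> =
  | { status: "pending"; promise: Promise<T> }
  | { status: "done"; value: T };

class InMemoryIdempotency {
  private records = new Map<string, Stored<unknown>>();

  async execute<T>(key: string, handler: () => Promise<T>): Promise<T> {
    const existing = this.records.get(key) as Stored<T> | undefined;
    if (existing) {
      // Replay the stored result, or join the in-flight call for this key.
      return existing.status === "done" ? existing.value : existing.promise;
    }
    // First caller "acquires the lock" by recording a pending entry.
    const promise = handler();
    this.records.set(key, { status: "pending", promise });
    try {
      const value = await promise;
      this.records.set(key, { status: "done", value });
      return value;
    } catch (err) {
      // Drop the record so a later retry can run the handler again.
      this.records.delete(key);
      throw err;
    }
  }
}
```

A process-local Map only deduplicates within one server instance, which is why the recipe backs the adapter with DynamoDB for multi-instance deployments.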
Step 7: Add rate limiting with Upstash Redis
Create src/middleware/rate-limit.ts:
ts
import { Redis } from "@upstash/redis";
import { Ratelimit } from "@upstash/ratelimit";
import type { Duration } from "@upstash/ratelimit";

// Module-level singleton so every route shares one limiter.
let ratelimitInstance: Ratelimit | null = null;

export function createRateLimiter(config: {
  url: string;
  token: string;
  maxRequests: number;
  windowMs: number;
}): Ratelimit {
  if (ratelimitInstance) return ratelimitInstance;
  const redis = new Redis({ url: config.url, token: config.token });
  // Assumes windowMs is a whole number of seconds (e.g. 60000 -> "60 s").
  const seconds = String(config.windowMs / 1000) + " s";
  ratelimitInstance = new Ratelimit({
    redis,
    limiter: Ratelimit.slidingWindow(config.maxRequests, seconds as Duration),
  });
  return ratelimitInstance;
}

export async function checkRateLimit(
  userId: string,
): Promise<{ allowed: boolean; remaining: number; resetMs: number }> {
  // Fail open when no limiter has been created yet.
  if (!ratelimitInstance) {
    return { allowed: true, remaining: 999, resetMs: 0 };
  }
  const result = await ratelimitInstance.limit(userId);
  return { allowed: result.success, remaining: result.remaining, resetMs: result.reset };
}
Expected output: checkRateLimit("user-1") returns { allowed: true, remaining: 99, resetMs: ... }. When remaining reaches 0, allowed becomes false. Different users have independent counters.
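The per-user behaviour can be modelled locally with a small in-memory limiter. This is a hypothetical sliding-log sketch for illustration only; it is not how Upstash implements its slidingWindow algorithm, and the SlidingWindowLimiter name and its resetMs interpretation are assumptions.

```typescript
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>();
  private maxRequests: number;
  private windowMs: number;

  constructor(maxRequests: number, windowMs: number) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
  }

  check(userId: string, now: number = Date.now()): { allowed: boolean; remaining: number; resetMs: number } {
    // Keep only the timestamps that still fall inside the window.
    const cutoff = now - this.windowMs;
    const recent = (this.hits.get(userId) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.maxRequests;
    if (allowed) recent.push(now);
    this.hits.set(userId, recent);
    // In this sketch, resetMs is the time at which the oldest recorded hit expires.
    const oldest = recent[0] ?? now;
    return {
      allowed,
      remaining: Math.max(0, this.maxRequests - recent.length),
      resetMs: oldest + this.windowMs,
    };
  }
}
```

Accepting the current time as a parameter makes the window deterministic in tests, so no timers or sleeps are needed to check expiry.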
Step 8: Set up the DynamoDB session store
Create src/services/session-store.ts:
ts
import { DynamoDBAdapter } from "@reaatech/session-continuity-storage-dynamodb";
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient } from "@aws-sdk/lib-dynamodb";

let adapter: DynamoDBAdapter | null = null;

function getAdapter(): DynamoDBAdapter {
  if (!adapter) {
    const client = DynamoDBDocumentClient.from(
      new DynamoDBClient({ region: process.env.AWS_REGION ?? "us-east-1" }),
    );
    adapter = new DynamoDBAdapter({
      client,
      tableName: process.env.DYNAMODB_SESSIONS_TABLE ?? "sessions",
    });
  }
  return adapter;
}

export async function createSession(userId: string, agentId?: string) {
  const a = getAdapter();
  return a.createSession({
    userId,
    status: "active" as const,
    activeAgentId: agentId,
    metadata: {},
    participants: [],
    schemaVersion: 1,
  });
}

export async function getSession(sessionId: string) {
  const a = getAdapter();
  return a.getSession(sessionId);
}

export async function addMessage(
  sessionId: string,
  message: { role: "user" | "assistant" | "system" | "tool"; content: string },
) {
  const a = getAdapter();
  return a.addMessage(sessionId, message);
}

export async function getMessages(sessionId: string) {
  const a = getAdapter();
  return a.getMessages(sessionId);
}

export async function deleteSession(sessionId: string) {
  const a = getAdapter();
  return a.deleteSession(sessionId);
}
Expected output: The DynamoDBAdapter from @reaatech/session-continuity-storage-dynamodb implements a single-table DynamoDB design. The adapter manages sessions and messages using composite keys. createSession() returns a session with .id; getMessages() returns messages in insertion order.
Step 9: Build the secret rotation service with a mock provider
Create src/services/rotation-service.ts:
ts
import { RotationManager } from "@reaatech/secret-rotation-core";
import type {
  SecretProvider,
  SecretValue,
  ProviderHealth as PHealth,
  ProviderCapabilities as PCaps,
  RotationSession,
} from "@reaatech/secret-rotation-types";
import * as crypto from "node:crypto";

class MockSecretProvider implements SecretProvider {
  name = "mock";
  priority = 100;
  private secrets = new Map<string, string>();
  private versions = new Map<string, Map<string, string>>();
Expected output: The RotationManager orchestrates the full rotation lifecycle. The MockSecretProvider implements all methods of the SecretProvider interface with an in-memory Map. Subscribe to key_activated and rotation_failed events for logging.
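The rotate-then-notify pattern can be sketched with Node's built-in EventEmitter. The InMemoryRotator class and the event payload shapes are hypothetical; only the event names key_activated and rotation_failed come from the description above, and this is not the RotationManager API.

```typescript
import { EventEmitter } from "node:events";
import { randomBytes } from "node:crypto";

class InMemoryRotator {
  readonly events = new EventEmitter();
  // Every version ever issued for a secret, oldest first.
  private versions = new Map<string, string[]>();

  current(name: string): string | undefined {
    const all = this.versions.get(name) ?? [];
    return all[all.length - 1];
  }

  rotate(
    name: string,
    generate: () => string = () => randomBytes(24).toString("hex"),
  ): string | undefined {
    let value = "";
    try {
      value = generate();
      if (value.length === 0) throw new Error("generator returned an empty secret");
    } catch (err) {
      // The previous version stays active when generation fails.
      const message = err instanceof Error ? err.message : String(err);
      this.events.emit("rotation_failed", { name, error: message });
      return undefined;
    }
    const all = this.versions.get(name) ?? [];
    all.push(value);
    this.versions.set(name, all);
    this.events.emit("key_activated", { name, version: all.length });
    return value;
  }
}

// Usage: log both outcomes, as the step above suggests.
const demoRotator = new InMemoryRotator();
demoRotator.events.on("key_activated", (e) => console.log("activated", e));
demoRotator.events.on("rotation_failed", (e) => console.log("failed", e));
```

Keeping old versions in the list, instead of overwriting, leaves room for a grace period in which requests signed with the previous key are still accepted.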
Step 10: Build the auto-runbook generator
Create src/services/runbook-generator.ts:
ts
import {
  type AnalysisContext,
  type Runbook,
  generateId,
  validateInput,
  AnalysisContextSchema,
  RunbookSectionSchema,
} from "@reaatech/agent-runbook";
import { ChatOpenAI } from "@langchain/openai";
import { HumanMessage, SystemMessage } from "@langchain/core/messages";

interface BreakerMetrics {
  name: string;
  failureCount: number;
  lastFailures?: unknown[];
}

export class RunbookGenerator {
  private static _instance: RunbookGenerator;
Expected output: When a circuit breaker trips, generateFromCircuitBreaker() builds an AnalysisContext, validates it, sends a prompt to the LLM, and parses the response into a Runbook object. Runbooks are stored in-memory and retrieved by ID via getRunbook().
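The parse-and-store step might look like the following sketch. It assumes the LLM replies with Markdown-style headings, one per runbook section, which the source does not confirm; parseRunbook, SimpleRunbook, and RunbookStore are hypothetical names and do not reflect the @reaatech/agent-runbook types.

```typescript
interface RunbookSection {
  title: string;
  body: string;
}

interface SimpleRunbook {
  id: string;
  sections: RunbookSection[];
}

// Splits heading-delimited text into sections; text before the first heading is ignored.
function parseRunbook(id: string, text: string): SimpleRunbook {
  const sections: RunbookSection[] = [];
  let current: RunbookSection | null = null;
  for (const line of text.split(/\r?\n/)) {
    const heading = /^#{1,6}\s+(.*)$/.exec(line);
    if (heading) {
      current = { title: (heading[1] ?? "").trim(), body: "" };
      sections.push(current);
    } else if (current) {
      current.body += (current.body ? "\n" : "") + line;
    }
  }
  for (const section of sections) section.body = section.body.trim();
  return { id, sections };
}

// In-memory store keyed by runbook ID, mirroring the getRunbook() lookup described above.
class RunbookStore {
  private byId = new Map<string, SimpleRunbook>();

  save(runbook: SimpleRunbook): void {
    this.byId.set(runbook.id, runbook);
  }

  getRunbook(id: string): SimpleRunbook | undefined {
    return this.byId.get(id);
  }
}
```

Because the store lives in process memory, runbooks disappear on restart; persisting them alongside sessions would be a natural extension.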
Step 11: Create the Temporal maintenance workflow
Create three files under src/workflows/. Start with src/workflows/reliability.ts — the workflow function that orchestrates the daily maintenance run:
import { fileURLToPath } from "node:url";
import { Worker, NativeConnection } from "@temporalio/worker";
import * as activities from "./activities.js";

export async function startWorker(config: { host: string; taskQueue: string }): Promise<Worker> {
  const connection = await NativeConnection.connect({ address: config.host });
  const worker = await Worker.create({
    connection,
    namespace: "default",
    taskQueue: config.taskQueue,
    // fileURLToPath handles Windows paths and percent-encoded characters, which .pathname does not.
    workflowsPath: fileURLToPath(new URL("./reliability.js", import.meta.url)),
    activities,
  });
  return worker;
}
Expected output: The workflow runs in order: rotate secrets, check circuit breakers, generate runbooks for any open circuits. Each activity reports heartbeats via Context.current().heartbeat().
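The ordering can be sketched as a plain async function with the activities injected. This is a stand-in that deliberately avoids the Temporal SDK; the MaintenanceActivities interface, the BreakerStatus shape, and runMaintenance are all hypothetical names used only to show the sequence.

```typescript
interface BreakerStatus {
  name: string;
  state: "closed" | "open" | "half-open";
  failureCount: number;
}

interface MaintenanceActivities {
  rotateSecrets(): Promise<void>;
  checkCircuitBreakers(): Promise<BreakerStatus[]>;
  generateRunbook(breaker: BreakerStatus): Promise<string>;
}

async function runMaintenance(
  activities: MaintenanceActivities,
): Promise<{ runbookIds: string[] }> {
  // 1. Rotate secrets first so later checks run against fresh keys.
  await activities.rotateSecrets();
  // 2. Collect the current state of every breaker.
  const breakers = await activities.checkCircuitBreakers();
  // 3. Generate one runbook per open circuit, sequentially.
  const runbookIds: string[] = [];
  for (const breaker of breakers) {
    if (breaker.state === "open") {
      runbookIds.push(await activities.generateRunbook(breaker));
    }
  }
  return { runbookIds };
}
```

Injecting the activities keeps the orchestration testable with fakes, which is the same separation Temporal enforces between workflow code and activity code.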
Step 12: Wire up the Next.js API routes
Create the health route at app/api/health/route.ts:
ts
import { NextRequest, NextResponse } from "next/server";
import { ReliableChain } from "../../../src/services/reliable-chain.js";

export function GET(_req: NextRequest) {
  try {
    const health = ReliableChain.instance.getHealth();
    return NextResponse.json(health);
  } catch {
    return NextResponse.json(
      { error: "reliable chain not initialised" },
      { status: 503 },
    );
  }
}
Expected output: The chat route loads session history, appends the new message, invokes ReliableChain, stores the assistant reply, and returns usage info — all wrapped in idempotency and rate-limit checks. The health endpoint returns circuit state and uptime. The rotate endpoint requires Authorization: Bearer <token>.
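The bearer-token check on the rotate endpoint could be implemented along these lines. The isAuthorized helper is hypothetical and not taken from the recipe's code; it uses Node's crypto.timingSafeEqual, hashing both values first so the buffers always have equal length.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Returns true only for "Authorization: Bearer <token>" headers matching adminToken.
function isAuthorized(authorizationHeader: string | null, adminToken: string): boolean {
  if (!authorizationHeader || adminToken.length === 0) return false;
  // The auth scheme name is case-insensitive.
  const match = /^Bearer\s+(.+)$/i.exec(authorizationHeader);
  if (!match) return false;
  const presented = match[1] ?? "";
  // Compare fixed-length digests in constant time to avoid leaking the token via timing.
  const a = createHash("sha256").update(presented).digest();
  const b = createHash("sha256").update(adminToken).digest();
  return timingSafeEqual(a, b);
}
```

In a route handler this would typically be called with req.headers.get("authorization") and the configured ADMIN_TOKEN, returning a 401 response when it yields false.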
Step 13: Replace the placeholder entry point
Update src/index.ts to re-export the full public API surface:
ts
export { ReliableChain } from "./services/reliable-chain.js";
export { withIdempotency, createIdempotencyMiddleware } from "./middleware/idempotency.js";
export { checkRateLimit, createRateLimiter } from "./middleware/rate-limit.js";
export {
  createSession,
  getSession,
  addMessage,
  getMessages,
  deleteSession,
} from "./services/session-store.js";
export { RotationService } from "./services/rotation-service.js";
export { RunbookGenerator } from "./services/runbook-generator.js";
export {
  RecipeError,
  CircuitBreakerTrippedError,
  IdempotencyConflictError,
  RateLimitExceededError,
  RotationInProgressError,
  ProviderUnavailableError,
} from "./lib/errors.js";
export type {
  LLMProvider,
  HealthStatus,
  ChatRequest,
  ChatResponse,
  RotationStatus,
} from "./lib/types.js";
Expected output: Every @reaatech/* package can be traced to at least one import under src/. Run grep -r "@reaatech/" src/ to verify.
Step 14: Run the tests
Create the test setup at tests/setup.ts with MSW handlers for mocked LLM endpoints, Upstash Redis, and DynamoDB:
pnpm vitest run --coverage --reporter=json --outputFile=vitest-report.json
Expected output: All tests pass (numFailedTests: 0). Coverage thresholds are at least 90% on src/**/*.ts and app/**/route.ts. The test suite covers the ReliableChain (primary invoke, fallback on circuit open, both providers unavailable, getHealth states), the config loader (happy path, missing keys, numeric coercion), and the health API route (200 response, uptimeMs value).
Next steps
Replace MockSecretProvider with a real AWS Secrets Manager or Vault provider adapter from @reaatech/secret-rotation-provider-aws for production key rotation
Add a Temporal cron schedule to run dailyReliabilityMaintenance() every 24 hours instead of triggering it manually via the /api/rotate endpoint
Extend the runbook generator to pull real error logs from your observability platform and include actual stack traces in the generated runbooks
Add a status dashboard page at /dashboard that shows live circuit breaker state, active provider, and recent runbooks
Deploy the DynamoDB tables with aws dynamodb create-table using the composite key schemas (PK: IDEMPOTENCY#{key}, SK: META for idempotency; session composite keys for the session store)
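// Excerpt, apparently from src/services/reliable-chain.ts (Step 5): optional Langfuse tracing setup.
if (config.langfuse.publicKey && config.langfuse.secretKey) {
  const mod = await import("langfuse-langchain");
  // Accept either a default export or a named CallbackHandler export.
  const CallbackHandler = ((mod as Record<string, unknown>).default ??
    (mod as Record<string, unknown>).CallbackHandler) as new (
    ...args: unknown[]
  ) => Record<string, unknown>;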
// Excerpt, apparently from src/services/runbook-generator.ts (Step 10): building the LLM prompt.
if (!validated.success) return Promise.reject(new Error("Invalid analysis context"));
const runbookId = generateId("rb");
try {
  const systemPrompt = new SystemMessage(
    "You are a runbook generator. Given a circuit breaker failure analysis, produce a structured runbook with sections: root cause analysis, failure mode description, detection method, mitigation steps, rollback procedure.",
  );
  const userPrompt = new HumanMessage(
    `Circuit breaker "${metrics.name}" opened after ${String(metrics.failureCount)} consecutive failures. Generate a structured runbook.`,