SMBs that rely on Auth0 for customer login can suffer silent outages or lockouts caused by misconfigured secrets, quota spikes, or expired certificates, with no automated recovery.
The Auth0 Management API is a critical path for SMB login flows — but it can fail silently. Expired client secrets, quota spikes, and certificate issues cause lockouts with no automated recovery. This recipe builds a reliability suite that wraps every Auth0 API call with a circuit breaker to halt failing operations, applies idempotency middleware to safely retry mutations, rotates client secrets on a schedule using AWS Secrets Manager, and auto-runs incident runbooks via Amazon Bedrock when authentication failure rates spike. You’ll wire all of this into a Next.js 16 App Router project with Trigger.dev durable workflows and Slack alerts.
Prerequisites
Node.js 22+ with pnpm (v10+) installed
An Auth0 tenant with a Machine-to-Machine (M2M) application and an RS256 private key for JWT client assertions
An AWS account with Bedrock access (Claude Sonnet 4 model) and Secrets Manager
A Slack workspace with a Bot token (xoxb-...) and a channel to receive alerts
An Anthropic API key (required by the @reaatech/agent-runbook-agent package for direct Claude calls)
A Langfuse account (optional, for LLM observability)
A Trigger.dev account (optional, for durable workflow scheduling)
Familiarity with TypeScript and Next.js App Router conventions
Step 1: Scaffold the Next.js 16 project
Create the project directory and initialize it with Next.js 16 and the App Router:
After scaffolding with create next-app, package.json should contain "type": "module" and the compiler options in tsconfig.json should contain "moduleResolution": "bundler". Ensure your scripts block includes dev, build, typecheck, lint, and test.
Expected output: You can run pnpm typecheck and pnpm lint with zero errors.
Step 2: Configure environment variables
Create .env.example at the project root:
env
# Env vars used by aws-bedrock-reliability-suite-for-auth0-smb-auth-operations.
# The builder adds entries here as it wires up each integration.
# Keep placeholders only; never commit real values.
NODE_ENV=development
AUTH0_DOMAIN=<your-tenant.auth0.com>
AUTH0_CLIENT_ID=<your-m2m-client-id>
AUTH0_PRIVATE_KEY=<your-private-key-pem>
AUTH0_AUDIENCE=https://<your-tenant>.auth0.com/api/v2/
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=<your-access-key>
AWS_SECRET_ACCESS_KEY=<your-secret-key>
SLACK_BOT_TOKEN=xoxb-<your-slack-bot-token>
SLACK_CHANNEL=<your-alert-channel-id>
ANTHROPIC_API_KEY=<your-anthropic-key>
LANGFUSE_PUBLIC_KEY=<your-langfuse-public-key>
LANGFUSE_SECRET_KEY=<your-langfuse-secret-key>
TRIGGER_API_KEY=<your-trigger-dev-api-key>
SECRET_ROTATION_INTERVAL_MS=86400000
Copy it to .env and fill in real values for local development.
Expected output: cat .env.example shows the 15 environment variables above.
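A startup check can catch placeholders that were copied into .env but never filled in. The following is a minimal sketch rather than part of the recipe's code: the helper names findMissingEnv and rotationIntervalMs are hypothetical, and the choice of which variables count as required is an assumption based on the prerequisites list.

```typescript
// Hypothetical helper, not part of the recipe. Variable names mirror .env.example;
// treating these eight as "required" is an assumption.
const REQUIRED_VARS = [
  'AUTH0_DOMAIN', 'AUTH0_CLIENT_ID', 'AUTH0_PRIVATE_KEY', 'AUTH0_AUDIENCE',
  'AWS_REGION', 'SLACK_BOT_TOKEN', 'SLACK_CHANNEL', 'ANTHROPIC_API_KEY',
] as const;

type Env = Record<string, string | undefined>;

// Returns the names of required variables that are unset, empty,
// or still contain a "<...>" placeholder from .env.example.
function findMissingEnv(env: Env): string[] {
  return REQUIRED_VARS.filter((name) => {
    const value = env[name]?.trim();
    return !value || value.includes('<');
  });
}

// Parses SECRET_ROTATION_INTERVAL_MS, falling back to 24 hours
// (the value shown in .env.example) when it is missing or invalid.
function rotationIntervalMs(env: Env): number {
  const parsed = Number(env.SECRET_ROTATION_INTERVAL_MS);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : 86_400_000;
}
```

Calling findMissingEnv(process.env) during startup and logging the returned names gives a clear failure message instead of an opaque authentication error later.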
Step 3: Create the Auth0 API types
Create src/api/auth0/types.ts to define the data structures that flow through the system:
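Auth0LogType uses Auth0 log event type codes. The recipe maps s to a successful login, seacft to a failed login, and cls to a client secret change. Auth0's log event type reference defines seacft as a successful exchange of an authorization code for an access token and cls as a passwordless code or link being sent, and records failed logins under f, fp, and fu, so confirm the codes against your tenant's logs before relying on this mapping.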
Expected output: pnpm typecheck still passes with zero errors.
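To give a feel for the shapes involved, here is a rough sketch of what such a types file might contain. Only the type names come from the recipe; every field shown is an assumption made for illustration, apart from the token response fields, which follow the standard OAuth 2.0 token response.

```typescript
// Log type codes named in the recipe text.
type Auth0LogType = 's' | 'seacft' | 'cls';

// Field names below are assumptions, not the recipe's actual definitions.
interface Auth0MgmtConfig {
  domain: string;        // e.g. your-tenant.auth0.com
  clientId: string;
  privateKeyPem: string; // RS256 private key used for the client assertion
  audience: string;      // e.g. https://your-tenant.auth0.com/api/v2/
}

// Standard OAuth 2.0 token response fields.
interface Auth0TokenResponse {
  access_token: string;
  token_type: string;
  expires_in: number;    // lifetime in seconds
}

interface Auth0Log {
  log_id: string;
  type: Auth0LogType;
  date: string;          // ISO 8601 timestamp
  client_id?: string;
}

// Error carrying the HTTP status so callers can branch on it.
class Auth0ApiError extends Error {
  readonly status: number;
  readonly code?: string;

  constructor(message: string, status: number, code?: string) {
    super(message);
    this.name = 'Auth0ApiError';
    this.status = status;
    this.code = code;
  }
}
```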
Step 4: Build the Auth0 API wrapper with circuit breaker and idempotency
Create src/api/auth0/mgmt.ts — the core API client. It uses jose to generate a signed JWT assertion for the OAuth client credentials grant, then wraps every downstream call with a circuit breaker and idempotency middleware.
ts
import { CircuitBreaker, CircuitOpenError } from '@reaatech/circuit-breaker-core';
import { MemoryAdapter, IdempotencyMiddleware } from '@reaatech/idempotency-middleware';
import * as jose from 'jose';
import {
  Auth0ApiError,
  type Auth0MgmtConfig,
  type Auth0TokenResponse,
  type Auth0User,
  type Auth0Log,
} from './types.js';

interface TokenCacheEntry {
  token: string;
  expiresAt: number;
}

let tokenCache: TokenCacheEntry | null = null
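The JWT assertion step can be illustrated without any library. The sketch below only builds the claim set, the token request body, and a cache freshness check; the function names are hypothetical, and the 60-second assertion lifetime and 30-second cache skew are assumptions. The claim layout (client ID as iss and sub, tenant URL with a trailing slash as aud) follows Auth0's private key JWT convention.

```typescript
interface AssertionClaims {
  iss: string;
  sub: string;
  aud: string;
  iat: number;
  exp: number;
  jti: string;
}

// Builds the claim set for a private_key_jwt client assertion.
// iss and sub are the client ID; aud is the tenant URL with a trailing slash.
function buildAssertionClaims(domain: string, clientId: string, nowMs: number, jti: string): AssertionClaims {
  const iat = Math.floor(nowMs / 1000);
  // A 60-second lifetime is an assumption; keep assertions short-lived.
  return { iss: clientId, sub: clientId, aud: `https://${domain}/`, iat, exp: iat + 60, jti };
}

// Form body for the client credentials grant sent to https://<domain>/oauth/token.
function buildTokenRequest(signedAssertion: string, apiAudience: string): URLSearchParams {
  return new URLSearchParams({
    grant_type: 'client_credentials',
    client_assertion_type: 'urn:ietf:params:oauth:client-assertion-type:jwt-bearer',
    client_assertion: signedAssertion,
    audience: apiAudience,
  });
}

// A cached token is reused only while more than skewMs of its lifetime remains.
function isTokenFresh(
  entry: { token: string; expiresAt: number } | null,
  nowMs: number,
  skewMs = 30_000,
): boolean {
  return entry !== null && entry.expiresAt - nowMs > skewMs;
}
```

With jose, the claims would typically be signed by importing the PEM with importPKCS8(pem, 'RS256') and calling new SignJWT(claims).setProtectedHeader({ alg: 'RS256' }).sign(key); the result is the signedAssertion passed to buildTokenRequest.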
Key details about the circuit breaker configuration:
failureThreshold: 5 — after 5 failed calls, the circuit opens
recoveryTimeoutMs: 30000 — the circuit stays open for 30 seconds before transitioning to HALF_OPEN
recoveryStrategy: 'gradual' — test calls ramp up exponentially (1, 2, 4, 8…)
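The state machine behind these settings can be sketched in a few lines. This is a simplified stand-in written for illustration, not the @reaatech/circuit-breaker-core implementation: the class name SketchBreaker is hypothetical, the gradual ramp-up is not modeled (a single successful HALF_OPEN call closes the circuit), and retryAfterMs reports the remaining open time rather than a fixed value.

```typescript
type State = 'CLOSED' | 'OPEN' | 'HALF_OPEN';
type CircuitOpen = { error: 'circuit_open'; circuitId: string; retryAfterMs: number };

class SketchBreaker {
  private failures = 0;
  private openedAt = 0;
  private current: State = 'CLOSED';
  private readonly circuitId: string;
  private readonly failureThreshold: number;
  private readonly recoveryTimeoutMs: number;
  private readonly now: () => number;

  constructor(circuitId: string, failureThreshold = 5, recoveryTimeoutMs = 30_000, now: () => number = Date.now) {
    this.circuitId = circuitId;
    this.failureThreshold = failureThreshold;
    this.recoveryTimeoutMs = recoveryTimeoutMs;
    this.now = now;
  }

  // OPEN becomes HALF_OPEN once the recovery timeout has elapsed.
  state(): State {
    if (this.current === 'OPEN' && this.now() - this.openedAt >= this.recoveryTimeoutMs) {
      this.current = 'HALF_OPEN';
    }
    return this.current;
  }

  // Runs fn unless the circuit is open, in which case a result object is returned instead of throwing.
  async call<T>(fn: () => Promise<T>): Promise<T | CircuitOpen> {
    if (this.state() === 'OPEN') {
      const retryAfterMs = this.recoveryTimeoutMs - (this.now() - this.openedAt);
      return { error: 'circuit_open', circuitId: this.circuitId, retryAfterMs };
    }
    try {
      const result = await fn();
      this.failures = 0;
      this.current = 'CLOSED';
      return result;
    } catch (err) {
      this.failures += 1;
      // A failed probe in HALF_OPEN, or reaching the threshold, (re)opens the circuit.
      if (this.current === 'HALF_OPEN' || this.failures >= this.failureThreshold) {
        this.current = 'OPEN';
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

Injecting the clock through the now parameter keeps the timing behavior testable without real delays.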
The idempotency middleware uses a MemoryAdapter with a 24-hour TTL and 30-second lock timeout. When the circuit is open, every method returns { error: 'circuit_open', circuitId: 'auth0-mgmt', retryAfterMs: 30000 } instead of throwing — letting callers decide whether to retry.
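The idea behind idempotent retries is that the same key never triggers the same mutation twice within the TTL. The following sketch shows that idea with a plain in-memory map; it is not the @reaatech/idempotency-middleware API, the class name IdempotencyCache is hypothetical, and the 30-second lock timeout is not modeled (concurrent callers simply share the in-flight promise).

```typescript
interface Entry<T> {
  value: T;
  expiresAt: number;
}

class IdempotencyCache<T> {
  private readonly entries = new Map<string, Entry<T>>();
  private readonly inFlight = new Map<string, Promise<T>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // Default TTL of 24 hours matches the value described in the text.
  constructor(ttlMs = 24 * 60 * 60 * 1000, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Runs op at most once per key within the TTL; repeats get the stored result.
  async run(key: string, op: () => Promise<T>): Promise<T> {
    const cached = this.entries.get(key);
    if (cached && cached.expiresAt > this.now()) return cached.value;

    // A concurrent call with the same key waits on the operation already running.
    const pending = this.inFlight.get(key);
    if (pending) return pending;

    const promise = op()
      .then((value) => {
        this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
        return value;
      })
      .finally(() => this.inFlight.delete(key));
    this.inFlight.set(key, promise);
    return promise;
  }
}
```

Failed operations are not cached in this sketch, so a retry with the same key runs the operation again.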
Expected output: pnpm typecheck passes. The module exports getManagementApiToken, Auth0Client, and createAuth0Client.
Step 5: Implement secret rotation with AWS Secrets Manager
Create src/secrets/rotation.ts to orchestrate automatic client secret rotation using @reaatech/secret-rotation-core with the AWSProvider:
The AWSProvider handles SecretsManagerClient internally — you pass it a config with type: 'aws' and the region. The RotationManager orchestrates the full lifecycle: generate, store in AWS, verify propagation, activate the new key, revoke the old one. The rollback: { enabled: true } option ensures that if activation fails, the previous secret is restored automatically. The rotateAuth0ClientSecret convenience function ties rotation directly to an Auth0 client update — it rotates the secret in AWS, then pushes the new value to Auth0 via the management API.
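The lifecycle described above can be sketched as a single function with injected steps. This is an illustration only, not the @reaatech/secret-rotation-core API: the RotationSteps interface and rotateWithRollback are hypothetical names, and rolling back on a failed store or verify step (not just a failed activation) is an assumption.

```typescript
// Hypothetical step interface; a real implementation would call AWS Secrets Manager and Auth0.
interface RotationSteps {
  generate(): Promise<string>;
  store(secret: string): Promise<void>;
  verify(secret: string): Promise<boolean>;
  activate(secret: string): Promise<void>;
  revoke(secret: string): Promise<void>;
}

type RotationResult =
  | { ok: true; active: string }
  | { ok: false; active: string; reason: string };

// generate -> store -> verify -> activate -> revoke, restoring the previous secret on failure.
async function rotateWithRollback(previous: string, steps: RotationSteps): Promise<RotationResult> {
  const next = await steps.generate();
  try {
    await steps.store(next);
    if (!(await steps.verify(next))) throw new Error('propagation check failed');
    await steps.activate(next);
  } catch (err) {
    // Rollback: make sure the previous secret is the active one again.
    await steps.activate(previous);
    const reason = err instanceof Error ? err.message : String(err);
    return { ok: false, active: previous, reason };
  }
  // The old secret is revoked only after the new one is confirmed active.
  await steps.revoke(previous);
  return { ok: true, active: next };
}
```

Revoking last means a failure at any earlier step leaves the previous secret usable.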
Expected output: pnpm typecheck passes.
Step 6: Create the spike detection and AI runbook system
Create src/alerts/runbook.ts — the monitoring and incident response module. It detects login failure spikes in Auth0 logs, generates runbooks via Amazon Bedrock (using Claude Sonnet 4), executes rollback steps, and sends Slack alerts.
The spike detector counts seacft-type (failed login) events within a sliding time window. If the failure rate exceeds the configured threshold (e.g., 25%), it returns a SpikeAlert with severity:
failure rate ≤ 25% = low
25% < failure rate ≤ 50% = medium
failure rate > 50% = high
Values at or below the threshold parameter are filtered out entirely (the function returns null), so only values above threshold ever reach the severity calculation.
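The thresholding rules above can be written as a small pure function. This is a sketch under stated assumptions rather than the recipe's own detector: the names detectSpike and severityFor, the field names on the log and alert objects, and the definition of failure rate as failures divided by all events in the window are illustrative. The set of log type codes that count as failures is passed in, so it can follow whatever codes your tenant emits.

```typescript
type Severity = 'low' | 'medium' | 'high';

interface LogEvent {
  type: string;
  date: string; // ISO 8601 timestamp
}

interface SpikeAlert {
  failureRate: number;
  failures: number;
  total: number;
  severity: Severity;
}

// Maps a failure rate (0..1) to the severity bands listed above.
function severityFor(rate: number): Severity {
  if (rate > 0.5) return 'high';
  if (rate > 0.25) return 'medium';
  return 'low';
}

// Returns null unless the failure rate inside the window is strictly above the threshold.
function detectSpike(
  logs: LogEvent[],
  failureTypes: ReadonlySet<string>,
  windowMs: number,
  threshold: number,
  nowMs: number,
): SpikeAlert | null {
  const recent = logs.filter((log) => {
    const age = nowMs - Date.parse(log.date);
    return age >= 0 && age <= windowMs;
  });
  if (recent.length === 0) return null;

  const failures = recent.filter((log) => failureTypes.has(log.type)).length;
  const failureRate = failures / recent.length;
  if (failureRate <= threshold) return null;

  return { failureRate, failures, total: recent.length, severity: severityFor(failureRate) };
}
```

With a threshold below 0.25, the low band becomes reachable; with a threshold of 0.25 or higher, every alert is at least medium.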
Next, add the Bedrock client and runbook generator to the same file:
The generateRunbook function uses a two-layer approach: first it calls Bedrock directly with a structured prompt to get a raw runbook, then it calls the createAnalysisAgent from @reaatech/agent-runbook-agent to enrich the response with structured failure mode identification and incident-response section generation. If the Bedrock call fails, the agent’s response is used on its own; if the agent returns an empty response, the function falls back to the Bedrock text.
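The fallback order can be made explicit with injected callers. The sketch below is illustrative only: RunbookSources and generateRunbookText are hypothetical names, the two callbacks stand in for the Bedrock and agent calls, and treating a thrown agent error the same as an empty agent response is an assumption.

```typescript
// Hypothetical stand-ins for the Bedrock call and the analysis agent.
interface RunbookSources {
  callBedrock(prompt: string): Promise<string>;
  callAgent(prompt: string, draft: string): Promise<string>;
}

// Prefers the agent's enriched text, falls back to the Bedrock draft,
// and fails only when both layers produce nothing.
async function generateRunbookText(prompt: string, sources: RunbookSources): Promise<string> {
  let draft = '';
  try {
    draft = await sources.callBedrock(prompt);
  } catch {
    // Bedrock failed: continue with an empty draft so the agent can still respond.
  }

  let enriched = '';
  try {
    enriched = await sources.callAgent(prompt, draft);
  } catch {
    // Agent failed: fall through to the Bedrock draft.
  }

  const text = enriched.trim() || draft.trim();
  if (!text) throw new Error('No runbook could be generated');
  return text;
}
```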
Now add the step execution, Slack alerting, and the main handleSpike orchestrator:
ts
export function executeRunbookStep(step: RunbookStep): Promise<{ success: boolean; error?: string }> {
  try {
    if (step.action === 'rollback_secret') {
      if (!step.clientId || !step.previousSecret) {
        return Promise.resolve({ success: false, error: 'Missing clientId or previousSecret for rollback_secret' });
      }
      return Promise.resolve({ success: true });
    }
    if (step.action === 'revoke_tokens') {
      return Promise.resolve({ success:
The Slack message uses Block Kit with severity fields and interactive buttons (Acknowledge / Escalate) so on-call engineers can respond directly from Slack.
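A Block Kit payload of this kind might be assembled as follows. The block and element types (header, section with fields, actions with button elements) are standard Block Kit; the function name buildAlertBlocks, the action_id values, and the wording of the message are assumptions for illustration.

```typescript
type Severity = 'low' | 'medium' | 'high';

interface SpikeSummary {
  severity: Severity;
  failureRate: number;   // 0..1
  windowMinutes: number;
}

// Builds the blocks array for a Slack alert with severity fields and two buttons.
function buildAlertBlocks(alert: SpikeSummary): Array<Record<string, unknown>> {
  const pct = (alert.failureRate * 100).toFixed(1);
  return [
    { type: 'header', text: { type: 'plain_text', text: 'Auth0 login failure spike' } },
    {
      type: 'section',
      fields: [
        { type: 'mrkdwn', text: `*Severity:*\n${alert.severity}` },
        { type: 'mrkdwn', text: `*Failure rate:*\n${pct}% over ${alert.windowMinutes} min` },
      ],
    },
    {
      type: 'actions',
      elements: [
        {
          type: 'button',
          style: 'primary',
          action_id: 'spike_acknowledge', // hypothetical id
          text: { type: 'plain_text', text: 'Acknowledge' },
          value: 'ack',
        },
        {
          type: 'button',
          style: 'danger',
          action_id: 'spike_escalate', // hypothetical id
          text: { type: 'plain_text', text: 'Escalate' },
          value: 'escalate',
        },
      ],
    },
  ];
}
```

The returned array would be passed as the blocks argument of chat.postMessage, together with the channel ID and a plain text fallback for notifications.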
Expected output: pnpm typecheck passes.
Step 7: Wire up the glue module
Create src/glue/auth0-reliability-glue.ts as the orchestrator that initializes all subsystems and provides a unified status endpoint:
The glue module subscribes to both CircuitBreaker events (stateChange, failure) and RotationManager events (key_activated, rotation_failed) to provide observability into the system’s health.
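The event wiring might look roughly like the sketch below. The four event names come from the text; everything else is an assumption, including that both objects expose a Node-style on method, the payload passed with each event, and the HealthSnapshot shape. Plain EventEmitter instances stand in for the real CircuitBreaker and RotationManager.

```typescript
import { EventEmitter } from 'node:events';

// Hypothetical status shape; the recipe's getStatus output may differ.
interface HealthSnapshot {
  circuitState: string;
  failureCount: number;
  lastRotationAt: string | null;
  lastRotationError: string | null;
}

// Subscribes to breaker and rotation events and returns a function that reads the current health.
function wireHealthEvents(breaker: EventEmitter, rotation: EventEmitter): () => HealthSnapshot {
  const health: HealthSnapshot = {
    circuitState: 'CLOSED',
    failureCount: 0,
    lastRotationAt: null,
    lastRotationError: null,
  };

  breaker.on('stateChange', (next: string) => {
    health.circuitState = next;
  });
  breaker.on('failure', () => {
    health.failureCount += 1;
  });
  rotation.on('key_activated', () => {
    health.lastRotationAt = new Date().toISOString();
    health.lastRotationError = null;
  });
  rotation.on('rotation_failed', (err: unknown) => {
    health.lastRotationError = err instanceof Error ? err.message : String(err);
  });

  // Return a copy so callers cannot mutate the tracked state.
  return () => ({ ...health });
}
```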
Expected output: pnpm typecheck passes.
Step 8: Create the shared state module and instrumentation
The glue module’s initialized client and rotation manager need to be accessible from API routes. Create src/lib/state.ts to hold these on globalThis:
ts
export function setState(client: unknown, rotationManager: unknown): void {
  (globalThis as Record<string, unknown>).__reliabilityClient = client;
  (globalThis as Record<string, unknown>).__reliabilityRotationManager = rotationManager;
}

export function getState(): { client: unknown; rotationManager: unknown } {
  return {
    client: (globalThis as Record<string, unknown>).__reliabilityClient,
    rotationManager: (globalThis as Record<string, unknown>).__reliabilityRotationManager,
  };
}
Now create src/instrumentation.ts — Next.js 16’s startup hook that reads environment variables, initializes all subsystems, and sets up Langfuse observability if configured:
The NEXT_RUNTIME guard ensures the server-only initialization code never runs in Edge runtime. The dynamic import() calls inside register() prevent Node-only modules from being loaded in Edge contexts.
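The guard-plus-dynamic-import pattern can be isolated into a small helper. This is a sketch, not the recipe's instrumentation.ts: registerForRuntime and the Initializer type are hypothetical, and the loader callback stands in for the dynamic import() of the Node-only module.

```typescript
type EnvVars = Record<string, string | undefined>;
type Initializer = (env: EnvVars) => Promise<void>;

// Runs the initializer only in the Node.js runtime. The loader is invoked lazily,
// so Node-only modules are never imported when running in Edge.
async function registerForRuntime(env: EnvVars, loadInitializer: () => Promise<Initializer>): Promise<boolean> {
  if (env.NEXT_RUNTIME !== 'nodejs') return false;
  const init = await loadInitializer();
  await init(env);
  return true;
}

// In src/instrumentation.ts this could be used roughly as follows
// (the init export and its signature are assumptions):
//
// export async function register() {
//   await registerForRuntime(process.env, async () => {
//     const glue = await import('./glue/auth0-reliability-glue.js');
//     return glue.init;
//   });
// }
```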
Expected output: pnpm typecheck passes.
Step 9: Define Trigger.dev durable jobs
Create src/jobs/reliability-jobs.ts to define three Trigger.dev task definitions for spike handling, secret rotation, and log monitoring:
Each task is a thin orchestrator that delegates to the tested runbook and rotation modules. The task() function from @trigger.dev/sdk@4.4.6 creates a durable, retryable workflow definition that Trigger.dev can schedule or trigger from events.
Expected output: pnpm typecheck passes.
Step 10: Create the Next.js API routes
Create three App Router route handlers under app/api/reliability/.
app/api/reliability/status/route.ts — GET endpoint for system health:
ts
import { type NextRequest, NextResponse } from 'next/server';
import type { Auth0Client } from '../../../../src/api/auth0/mgmt.js';
import type { RotationManager } from '../../../../src/secrets/rotation.js';

export async function GET(_req: NextRequest) {
  try {
    const { getState } = await import('../../../../src/lib/state.js');
    const { getStatus } = await import('../../../../src/glue/auth0-reliability-glue.js');
    const state = getState();
    if (!state.client) {
      return NextResponse.json({ ok: false, error: 'not initialized' }, { status: 503 });
    }
    const status = getStatus(state.client as Auth0Client, state.rotationManager as RotationManager);
    return NextResponse.json({ ok: true, ...status, timestamp: new Date().toISOString() });
  } catch {
    return NextResponse.json({ ok: false, error: 'not initialized' }, { status: 503 });
  }
}
app/api/reliability/rotate/route.ts — POST endpoint to trigger manual secret rotation:
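A handler of this kind might be structured as in the sketch below, which uses only the standard Request and Response objects that App Router route handlers accept. It is not the recipe's route: handleRotate and RotateFn are hypothetical names, the clientId body field is an assumption, and the 200/400/500 split simply mirrors the status codes the rotate route tests cover.

```typescript
// Hypothetical rotation callback; the real route would call the rotation module.
type RotateFn = (clientId: string) => Promise<{ rotatedAt: string }>;

// 400 for a malformed or incomplete body, 500 when rotation fails, 200 otherwise.
async function handleRotate(req: Request, rotate: RotateFn): Promise<Response> {
  let body: unknown;
  try {
    body = await req.json();
  } catch {
    return Response.json({ ok: false, error: 'invalid JSON body' }, { status: 400 });
  }

  const clientId = (body as { clientId?: unknown } | null)?.clientId;
  if (typeof clientId !== 'string' || clientId.length === 0) {
    return Response.json({ ok: false, error: 'clientId is required' }, { status: 400 });
  }

  try {
    const result = await rotate(clientId);
    return Response.json({ ok: true, ...result });
  } catch (err) {
    // Non-Error throwables get a generic message.
    const message = err instanceof Error ? err.message : 'rotation failed';
    return Response.json({ ok: false, error: message }, { status: 500 });
  }
}
```

In route.ts, an exported POST function would delegate to handleRotate, passing the request and a callback bound to the shared rotation manager.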
Bedrock runbook generation, step execution, Slack alerts, full handleSpike flow
tests/glue/auth0-reliability-glue.test.ts: init, getStatus, shutdown, event wiring
tests/jobs/reliability-jobs.test.ts: Task id validation and run handler execution
tests/app/api/reliability/status.test.ts: GET 200/503 behavior
tests/app/api/reliability/rotate.test.ts: POST 200/400/500
tests/app/api/reliability/runbook.test.ts: POST 200/400/500 with non-Error fallback
tests/coverage.test.ts: Instrumentation coverage (register, Langfuse, process handlers) and job handler exercises
Run the full quality gate:
terminal
pnpm typecheck
pnpm lint
pnpm test
Expected output: pnpm typecheck exits 0, pnpm lint exits 0 with zero warnings, and vitest shows all tests passing. Coverage for lines, branches, functions, and statements is at or above 90%.
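One way to enforce the 90% floor is through Vitest's coverage thresholds, so the test run fails when coverage drops. The configuration below is a sketch, not the recipe's actual file: the include glob and the choice of the v8 provider (which needs the @vitest/coverage-v8 package) are assumptions.

```typescript
// vitest.config.ts (illustrative; adjust paths to your project layout)
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    include: ['tests/**/*.test.ts'],
    coverage: {
      provider: 'v8',
      reporter: ['text', 'lcov'],
      // The run fails if any metric falls below 90%.
      thresholds: { lines: 90, branches: 90, functions: 90, statements: 90 },
    },
  },
});
```

Thresholds are only checked when coverage is enabled, for example by running vitest with the --coverage flag.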
Next steps
Add a Redis adapter for idempotency — replace the in-memory MemoryAdapter with the Redis adapter from @reaatech/idempotency-middleware-adapter-redis for horizontal scaling across multiple instances.
Extend the runbook step library — add more actions like rotate_signing_key, update_rate_limit, or notify_tenant_admin with actual Auth0 API calls.
Deploy with Trigger.dev — connect the jobs to real Trigger.dev triggers (cron schedules for monitoring, webhook events for spike detection) instead of running them inline.
Add a dashboard page — create app/dashboard/page.tsx that calls GET /api/reliability/status and renders the circuit breaker state, rotation status, and recent alerts for ops engineers.
Integrate SLO burn-rate alerts — wire the calculateSloThresholds output into a real Prometheus alerting rule and deploy with your monitoring stack.
;
interface ErrorBody {
error?: string;
error_description?: string;
message?: string;
}
async function readErrorBody(res: Response): Promise<ErrorBody> {
try {
const body = await res.json() as Record<string, string | undefined>;