Small teams deploying AI agents (e.g., customer support bots, lead intake) face catastrophic outages like database deletions or cost spikes, but lack DevOps staff to craft and run reliable incident playbooks. Manual fixes waste hours and erode customer trust.
In this tutorial, you’ll build an AI-powered incident runbook system that monitors your services, detects failures, and generates plain-language triage reports using Amazon Bedrock — all without a dedicated DevOps team. You’ll wire up a Fastify server that ingests webhook alerts, classifies them against known failure modes, calls Bedrock to produce summaries and remediation plans, triggers automatic rollbacks for critical incidents, and sends Slack notifications. On the front end, a Next.js dashboard gives you a real-time service map and alert view, while an admin panel lets you customize runbooks in natural language. By the end, you’ll have a working reliability system you can point at your own AI agents.
Prerequisites
Node.js >= 22 (the project’s package.json specifies "node": ">=22")
pnpm 10.x (the project’s package.json specifies "packageManager": "pnpm@10.21.0")
An AWS account with Bedrock access — you’ll need the model ID for the model you’ve enabled (the default is amazon.nova-micro-v1:0)
A Slack webhook URL (optional but recommended) for incident notifications
Familiarity with TypeScript, Next.js pages routing, and Fastify
Step 1: Scaffold the project
Create a new directory and scaffold the project configuration files. These define your build, linting, and test tooling.
Step 2: Install dependencies

Run the package manager to install everything declared in package.json:
terminal
pnpm install
Expected output: pnpm resolves and installs all dependencies, then prints a summary of the number of packages added.
Step 3: Configure environment variables
Create .env.local with the credentials and settings your runbook system needs. Every variable here is referenced by the source code — leave none empty.
BEDROCK_REGION — the AWS region where Bedrock is enabled (e.g. us-east-1)
BEDROCK_MODEL_ID — the Bedrock model you’ve requested access to (the default Nova Micro model works well for incident triage)
SLACK_WEBHOOK_URL — an incoming webhook URL from your Slack workspace (the server sends critical alerts here)
FASTIFY_PORT — port for the Fastify incident API (defaults to 3001 if unset)
NEXT_PUBLIC_API_BASE — the Next.js dashboard calls this base URL to reach the Fastify server
These environment variables are loaded by the config validator you’ll build in the next step; if any required variable is missing, the server refuses to start with a clear error message.
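For reference, a filled-in .env.local might look like this (every value below is a placeholder; substitute your own):

```shell
# .env.local: placeholder values, substitute your own
BEDROCK_REGION=us-east-1
BEDROCK_MODEL_ID=amazon.nova-micro-v1:0
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/T000/B000/XXXX
FASTIFY_PORT=3001
NEXT_PUBLIC_API_BASE=http://localhost:3001
```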
Step 4: Create the config loader and observability init
The config module validates environment variables at startup using Zod schemas (from @reaatech/agent-runbook). The init module bootstraps logging, tracing, and metrics.
Create the directories:
terminal
mkdir -p src/lib
Create src/lib/config.ts:
ts
import { ConfigurationError } from "@reaatech/agent-runbook";
import { z } from "zod";

const configSchema = z.object({
  region: z.string().min(1, "BEDROCK_REGION is required"),
  modelId: z.string().min(1, "BEDROCK_MODEL_ID is required"),
  slackWebhook: z.string().min(1, "SLACK_WEBHOOK_URL is required"),
  fastifyPort: z.coerce.number().int().positive().default(3001),
});

export interface AppConfig {
  region: string;
  modelId: string;
  slackWebhook: string;
  fastifyPort: number;
}

export function loadConfig(): AppConfig {
  const parsed = configSchema.safeParse({
    region: process.env.BEDROCK_REGION ?? "",
    modelId: process.env.BEDROCK_MODEL_ID ?? "",
    slackWebhook: process.env.SLACK_WEBHOOK_URL ?? "",
    fastifyPort: process.env.FASTIFY_PORT ?? "3001",
  });
  if (!parsed.success) {
    const msg = parsed.error.issues[0]?.message ?? "Invalid configuration";
    throw new ConfigurationError(msg);
  }
  return parsed.data as AppConfig;
}
These two modules are the first things loaded when the application starts — they validate your environment and set up the observability pipeline before any other code runs.
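To see the fail-fast contract in isolation, here is a dependency-free sketch of the same behavior (the helper name requireEnv is ours for illustration, not part of the package):

```typescript
// Hypothetical, dependency-free stand-in for the Zod-backed loader above:
// same fail-fast contract, shown in isolation.
function requireEnv(
  env: Record<string, string | undefined>,
  key: string,
): string {
  const value = env[key];
  if (!value) {
    // Mirrors the ConfigurationError thrown by loadConfig()
    throw new Error(`${key} is required`);
  }
  return value;
}

// A present variable passes through; a missing one throws before startup continues.
const demo: Record<string, string | undefined> = { BEDROCK_REGION: "us-east-1" };
console.log(requireEnv(demo, "BEDROCK_REGION")); // us-east-1
```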
Step 5: Create the Bedrock AI integration
This is the core of the recipe. The Bedrock module creates a client for the Bedrock Runtime, builds prompts from the incident context, and calls the Converse API (via ConverseCommand) to generate plain-language summaries and structured remediation plans.
Create src/lib/bedrock.ts:
ts
import {
  BedrockRuntimeClient,
  ConverseCommand,
} from "@aws-sdk/client-bedrock-runtime";
import { type FailureMode, LLMError } from "@reaatech/agent-runbook";
import { error } from "@reaatech/agent-runbook-observability";

export interface IncidentContext {
  serviceName: string;
  failureModes: FailureMode[];
  alertPayload: Record<string, unknown>;
  timestamp: string;
}

export function createBedrockClient(): BedrockRuntimeClient {
  const region = process.env.BEDROCK_REGION ?? "";
  return new BedrockRuntimeClient({ region });
}

function buildPrompt(context: IncidentContext): string {
  return [
    `You are an AI incident triage assistant for the service "${context.serviceName}".`,
    "",
    "The following failure modes have been identified:",
    ...context.failureModes.map((fm) => `- ${fm.name}: ${fm.description}`),
    "",
    "Alert payload:",
    JSON.stringify(context.alertPayload, null, 2),
    "",
    "Provide a concise incident summary and triage analysis.",
  ].join("\n");
}

function buildRemediationPrompt(context: IncidentContext): string {
  return [
    `You are an AI incident remediation planner for the service "${context.serviceName}".`,
    "",
    "The following failure modes have been identified:",
    ...context.failureModes.map((fm) => `- ${fm.name}: ${fm.description}`),
    "",
    "Alert payload:",
    JSON.stringify(context.alertPayload, null, 2),
    "",
    "Provide a structured remediation plan as a numbered list of steps to resolve the incident.",
    "Each step should include the action, the target system, and expected outcome.",
  ].join("\n");
}

// Sends a prompt through the Bedrock Converse API and returns the model's text.
// Any transport or model failure is wrapped in a typed LLMError.
async function converse(
  client: BedrockRuntimeClient,
  prompt: string,
): Promise<string> {
  try {
    const response = await client.send(
      new ConverseCommand({
        modelId: process.env.BEDROCK_MODEL_ID ?? "amazon.nova-micro-v1:0",
        messages: [{ role: "user", content: [{ text: prompt }] }],
      }),
    );
    return response.output?.message?.content?.[0]?.text ?? "";
  } catch (err) {
    error(`Bedrock Converse call failed: ${String(err)}`);
    throw new LLMError("Bedrock request failed");
  }
}

export async function generateIncidentSummary(
  client: BedrockRuntimeClient,
  context: IncidentContext,
): Promise<string> {
  return converse(client, buildPrompt(context));
}

export async function proposeRemediation(
  client: BedrockRuntimeClient,
  context: IncidentContext,
): Promise<string> {
  return converse(client, buildRemediationPrompt(context));
}
The module exports two main functions: generateIncidentSummary asks Bedrock to describe what went wrong, and proposeRemediation asks for a step-by-step fix plan. Both wrap the Bedrock Converse API call in proper error handling — if Bedrock is unreachable, they throw a typed LLMError that the server catches downstream.
Step 6: Create incident classification
When an alert arrives, the system needs to classify it against known failure modes. This module extracts a keyword from the alert payload, looks it up in the failure mode catalog (provided by @reaatech/agent-runbook-failure-modes), and returns a severity level. Unrecognized alerts fall back to the “application” category at warning severity.
Create src/lib/classify.ts:
ts
import {
  type AlertSeverity,
  type FailureMode,
  ValidationError,
} from "@reaatech/agent-runbook";
import {
  findFailureMode,
  getFailureModesByCategory,
} from "@reaatech/agent-runbook-failure-modes";

export interface ClassificationResult {
  matchedMode: FailureMode | undefined;
  severity: AlertSeverity | undefined;
}

export function classifyIncident(
  alertPayload: Record<string, unknown>,
): ClassificationResult {
  if (
    typeof alertPayload !== "object" ||
    Object.keys(alertPayload).length === 0
  ) {
    throw new ValidationError("Alert payload is empty or invalid");
  }
  const keyword = extractKeyword(alertPayload);
  const matchedMode = findFailureMode(keyword);
  if (matchedMode) {
    const severity = mapFailureSeverityToAlertSeverity(matchedMode.severity);
    return { matchedMode, severity };
  }
  const fallbackModes = getFailureModesByCategory("application");
  const firstFallback = fallbackModes[0];
  return {
    matchedMode: firstFallback,
    severity: "warning",
  };
}

function extractKeyword(payload: Record<string, unknown>): string {
  const possibleFields = ["type", "error", "name", "kind", "failure", "mode"];
  for (const field of possibleFields) {
    const value = payload[field];
    if (typeof value === "string") {
      return value;
    }
  }
  return "unknown";
}

function mapFailureSeverityToAlertSeverity(
  failureSeverity: string | undefined,
): AlertSeverity | undefined {
  if (failureSeverity === "critical" || failureSeverity === "high") {
    return "critical";
  }
  if (failureSeverity === "medium") {
    return "warning";
  }
  return "info";
}
The severity mapping drives the rest of the pipeline: critical incidents trigger automatic rollback procedures in the next step.
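To make the mapping concrete, here is a self-contained walk-through of the two pure helpers from classify.ts applied to a sample alert (the payload values are invented for illustration):

```typescript
// Copies of the two pure helpers from classify.ts, applied to a sample alert.
type AlertSeverity = "critical" | "warning" | "info";

function extractKeyword(payload: Record<string, unknown>): string {
  const possibleFields = ["type", "error", "name", "kind", "failure", "mode"];
  for (const field of possibleFields) {
    const value = payload[field];
    if (typeof value === "string") return value;
  }
  return "unknown";
}

function mapFailureSeverityToAlertSeverity(
  failureSeverity: string | undefined,
): AlertSeverity | undefined {
  if (failureSeverity === "critical" || failureSeverity === "high") return "critical";
  if (failureSeverity === "medium") return "warning";
  return "info";
}

// "type" wins because it is checked first; a "high" failure severity
// maps to a "critical" alert, which later triggers automatic rollback.
const payload = { type: "database_deletion", error: "FATAL: dropped table" };
console.log(extractKeyword(payload));                   // database_deletion
console.log(mapFailureSeverityToAlertSeverity("high")); // critical
```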
Step 7: Create the Fastify server
The Fastify server is the operational backbone. It exposes three endpoints — /health for uptime monitoring, /alerts for generated alert definitions, and POST /api/incidents for processing incoming alerts. It bootstraps observability, registers the incidents plugin, and handles graceful shutdown.
The server is structured as a buildServer factory so tests can create isolated instances. The isMainModule guard prevents the server from auto-starting when imported by tests or the Next.js build.
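The guard itself can be as small as a URL comparison. A sketch of the idea (the helper name and exact comparison are assumptions about how server.ts is written):

```typescript
import { pathToFileURL } from "node:url";

// Returns true only when the given module URL is the process entry point,
// so `node dist/server.js` starts the server but importing the module
// (from tests or the Next.js build) does not.
function isMainModule(
  metaUrl: string,
  entryPath: string | undefined,
): boolean {
  if (!entryPath) return false;
  return metaUrl === pathToFileURL(entryPath).href;
}

// In server.ts you would call: isMainModule(import.meta.url, process.argv[1])
console.log(isMainModule("file:///tmp/server.js", "/tmp/server.js")); // true
```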
Step 8: Create the incidents API plugin
This is the heart of the runbook engine. When POST /api/incidents receives a webhook payload, it validates the body with Zod, classifies the incident, calls Bedrock for a summary and remediation plan, generates automatic rollback procedures for critical incidents, and optionally sends a Slack alert.
Create the API directory:
terminal
mkdir -p src/api/incidents
Create src/api/incidents/route.ts:
ts
import {
  type FastifyInstance,
  type FastifyRequest,
  type FastifyReply,
} from "fastify";
import { IncomingWebhook } from "@slack/webhook";
import {
  type AnalysisContext,
  generateId,
  LLMError,
  type FailureMode,
  ValidationError,
} from "@reaatech/agent-runbook";
import {
  generateRollbackProcedures,
  generateVerificationChecklist,
} from "@reaatech/agent-runbook-rollback";
import {
  recordGeneration,
  recordAgentCall,
  startGenerationSpan,
  endSpanSuccess,
  endSpanError,
  info,
} from "@reaatech/agent-runbook-observability";
The handler follows a clear pipeline: validate, classify, summarize with Bedrock, propose remediation, optionally rollback, optionally notify Slack, and return a structured response. The observability spans and record calls feed metrics so you can track every incident through the pipeline.
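Stripped of its integrations, the handler's control flow can be sketched as a single function with injected dependencies. Everything below is an illustrative stand-in: the names, the response shape, and the stubs where the real route calls Bedrock, the rollback package, and Slack:

```typescript
// Illustrative skeleton of the POST /api/incidents control flow.
// Stub dependencies stand in for Bedrock, rollback generation, and Slack.
type Severity = "critical" | "warning" | "info";

interface IncidentResponse {
  incidentId: string;
  status: "processed";
  summary: string;
  remediation: string;
  rollback: string | null;
}

async function processIncident(
  payload: Record<string, unknown>,
  deps: {
    classify: (p: Record<string, unknown>) => Severity;
    summarize: (p: Record<string, unknown>) => Promise<string>;
    remediate: (p: Record<string, unknown>) => Promise<string>;
    rollback: () => string;
    notify: (summary: string) => Promise<void>;
  },
): Promise<IncidentResponse> {
  if (Object.keys(payload).length === 0) throw new Error("empty payload"); // validate
  const severity = deps.classify(payload);                                 // classify
  const summary = await deps.summarize(payload);                           // Bedrock summary
  const remediation = await deps.remediate(payload);                       // Bedrock plan
  const rollback = severity === "critical" ? deps.rollback() : null;       // auto-rollback
  if (severity === "critical") await deps.notify(summary);                 // Slack alert
  return { incidentId: "inc-1", status: "processed", summary, remediation, rollback };
}

// Example run with stub dependencies: a critical incident gets a rollback plan.
const result = await processIncident(
  { type: "database_deletion" },
  {
    classify: () => "critical",
    summarize: async () => "Database deleted",
    remediate: async () => "1. Restore from snapshot",
    rollback: () => "restore-latest-snapshot",
    notify: async () => {},
  },
);
console.log(result.rollback); // restore-latest-snapshot
```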
Step 9: Create the Next.js admin UI and dashboard
The Next.js front end uses the pages router with three pages: _app (the root wrapper), a dashboard for service health and alert monitoring, and an admin panel for natural-language runbook customization.
The dashboard shows four states — loading, error, empty, and loaded — so the UI never leaves the user guessing. The health bar at the top turns green when the Fastify server responds with "ok" and red otherwise.
Create src/pages/admin.tsx. This page accepts natural-language prompts to customize your runbooks, sending them to Bedrock via the Next.js API route you’ll build next:
Step 10: Create the Next.js API routes

Two API routes power the dashboard and admin panel: /api/service-map generates a Mermaid-format dependency graph, and /api/admin/interpret sends custom prompts to Bedrock.
The interpret route reuses generateIncidentSummary from the Bedrock module — the same function that powers incident triage also interprets admin customization prompts. It validates the input with Zod, calls Bedrock, and returns the result.
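The validation half of that route is simple enough to sketch without Zod (the body field name prompt is an assumption about the route's request shape):

```typescript
// Dependency-free stand-in for the interpret route's Zod validation:
// reject non-object bodies and empty prompts before calling Bedrock.
function parseInterpretBody(body: unknown): { prompt: string } {
  if (typeof body !== "object" || body === null) {
    throw new Error("Request body must be a JSON object");
  }
  const prompt = (body as Record<string, unknown>).prompt;
  if (typeof prompt !== "string" || prompt.trim() === "") {
    throw new Error("prompt is required");
  }
  return { prompt };
}

console.log(parseInterpretBody({ prompt: "Escalate all database alerts" }).prompt);
```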
Step 11: Create the entry point and test setup
The entry point bootstraps observability and validates configuration at startup. The test setup imports the jest-dom matchers for DOM-based tests.
Create the entry point src/index.ts:
ts
import { initRunbookSystem } from "./lib/init.js";
import { loadConfig } from "./lib/config.js";

// Bootstrap observability before any other module that may log
await initRunbookSystem();

// Validate configuration at startup
loadConfig();
Additional test files — for the server, the incidents API endpoint, classification, configuration, and the admin panel — are included in the full downloadable artifact. The test suite covers every route, the Bedrock integration (through mocks), classification logic, and server lifecycle.
Step 12: Build, test, and run
First, compile the TypeScript to JavaScript:
terminal
pnpm build
Expected output: TypeScript compiles to dist/ with no errors. If you see type errors, double-check that all source files match exactly and that pnpm install completed without issues.
Run the full test suite with coverage:
terminal
pnpm test
Expected output: vitest runs all tests, each marked with a green checkmark. The coverage report prints at the end showing >= 90% across lines, branches, functions, and statements. The command also writes vitest-report.json and coverage/ with machine-readable results.
Start the Fastify incident API server:
terminal
pnpm start
Expected output: the terminal prints a log line indicating the server started on port 3001 (or your configured FASTIFY_PORT).
In a second terminal, start the Next.js admin dashboard:
terminal
pnpm dev
Expected output: Next.js starts on http://localhost:3000. Open that URL in your browser to see the dashboard. Visit http://localhost:3000/admin to try natural-language runbook customization.
To send a test incident, use curl against the Fastify server:
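For example (the payload fields here are illustrative; any JSON body with a recognizable type field will do):

```shell
curl -X POST http://localhost:3001/api/incidents \
  -H "Content-Type: application/json" \
  -d '{"type": "database_deletion", "service": "checkout-bot"}'
```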
Expected output: a JSON response containing incidentId, status: "processed", a Bedrock-generated summary, a structured remediation plan, and rollback / postRollbackChecklist fields (non-null when the incident is critical). Check your Slack channel — if you configured the webhook URL, you’ll see a notification arrive.
Next steps

Customize the failure mode catalog. The @reaatech/agent-runbook-failure-modes package includes common patterns, but you should register your own by adding entries that match the alert types your AI agents actually produce — prompt injection, runaway spend, tool misuse, and model latency are good starting points.
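The exact registration API depends on the package version, but an entry generally needs at least a name, a description, and a severity, since those are the fields the prompt builder and the severity mapping consume. A hypothetical entry:

```typescript
// Hypothetical failure-mode entry. The field names mirror what this tutorial's
// code reads: name and description feed the Bedrock prompts, and severity
// drives the alert-severity mapping ("high" maps to a "critical" alert).
interface FailureModeEntry {
  name: string;
  description: string;
  severity: "critical" | "high" | "medium" | "low";
}

const runawaySpend: FailureModeEntry = {
  name: "runaway-spend",
  description: "Agent token or API spend exceeds its per-hour budget",
  severity: "high",
};

console.log(runawaySpend.name); // runaway-spend
```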
Deploy to production. The server includes graceful shutdown and SIGTERM handling, so it’s ready for a platform like ECS, Kubernetes, or a simple systemd service. Point your agent monitoring tools at the /api/incidents webhook endpoint.
Extend the dashboard. The current dashboard renders Mermaid markup as plain text — integrate mermaid.js in the browser for a live-rendered service map, or add a history view that shows past incidents, their classifications, and remediation outcomes over time.