AWS Bedrock MCP Server for Small Business DevOps Runbooks
An MCP server that exposes AWS Bedrock‑powered runbook automation to any AI assistant, giving SMBs self‑healing infrastructure without a dedicated DevOps team.
Small businesses can't afford 24/7 DevOps expertise. Their AI assistants have no secure, structured way to diagnose incidents, trigger rollbacks, or query service health — every minor outage becomes a panic call to an expensive contractor.
A complete, working implementation of this recipe — downloadable as a zip or browsable file by file. Generated by our build pipeline; tested with full coverage before publishing.
In this tutorial you’ll build an MCP (Model Context Protocol) server that exposes AWS Bedrock–powered DevOps runbook tools to any AI assistant. By the end you’ll have a working Express server that registers four typed JSON-RPC tools — health checks, incident triage, rollback procedures, and service dependency maps — all backed by REAA agent-runbook packages and AWS Bedrock for generative remediation suggestions. You can connect this server to Claude Desktop, ChatGPT, or any MCP-compatible client and give a small business self‑healing infrastructure without a dedicated DevOps team.
Prerequisites
Node.js >= 22 (the package.json engines field requires it)
pnpm 10.x (the packageManager field pins pnpm@10.0.0)
AWS credentials with access to Amazon Bedrock — you’ll need an AWS access key ID, secret access key, and a region where Bedrock is available
A Bedrock model enabled in your AWS account (the default is anthropic.claude-v2; you can change it with the BEDROCK_MODEL_ID env var)
Familiarity with TypeScript, Express, and running terminal commands
Step 1: Scaffold the project
Create a new directory and add the two files that define the project: package.json and tsconfig.json. The package manifest pins every dependency at exact versions, declares the project as ESM ("type": "module"), and defines five scripts: typecheck, lint, test, build, and dev.
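A minimal package.json along these lines satisfies that description — the project name and script bodies here are placeholders (the tutorial doesn't show the exact manifest), but the ESM flag, engine constraint, pinned package manager, and five script names match what's described above:

```json
{
  "name": "bedrock-mcp-runbooks",
  "version": "1.0.0",
  "type": "module",
  "engines": { "node": ">=22" },
  "packageManager": "pnpm@10.0.0",
  "scripts": {
    "typecheck": "tsc --noEmit",
    "lint": "eslint src tests",
    "test": "vitest run --coverage",
    "build": "tsc",
    "dev": "tsx watch src/index.ts"
  }
}
```

Remember to pin your dependencies at exact versions rather than ranges, as the manifest in the downloadable artifact does.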
Step 2: Install dependencies
Run pnpm install from the project root to pull down every dependency and dev dependency at the pinned versions.
terminal
pnpm install
Expected output: pnpm resolves the lockfile and installs all packages into node_modules/. You should see a message like Done in Xs with no errors.
Step 3: Set environment variables
The server reads six environment variables. Three are required (the AWS access key ID, secret access key, and region) and three have defaults (BEDROCK_MODEL_ID defaults to anthropic.claude-v2, PORT defaults to 3000, LOG_LEVEL defaults to info). Create a .env file with your own values:
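A .env along these lines works — the variable names match the code in the later steps, and the last three lines simply spell out the defaults:

```
AWS_ACCESS_KEY_ID=<your-access-key-id>
AWS_SECRET_ACCESS_KEY=<your-secret-access-key>
AWS_REGION=<your-region>
BEDROCK_MODEL_ID=anthropic.claude-v2
PORT=3000
LOG_LEVEL=info
```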
Replace the three <your-…> placeholders with your real AWS credentials. The LOG_LEVEL can be debug, info, warn, or error. The logger automatically redacts AWS_SECRET_ACCESS_KEY from its output, so you can safely run with LOG_LEVEL=debug during development.
Step 4: Write the utility modules
Two small utility files live under src/utils/: a Pino logger and a set of custom error classes for MCP protocol errors.
Create src/utils/logger.ts:
ts
import { pino } from "pino";
import type { Logger, LoggerOptions } from "pino";

const options: LoggerOptions = {
  level: process.env.LOG_LEVEL ?? "info",
  redact: ["AWS_SECRET_ACCESS_KEY"],
};

export const logger = pino(options) as Logger;
The redact option ensures the secret key never appears in logs.
BedrockAPIError always uses JSON-RPC error code -32603 (internal error). ToolExecutionError lets callers pass an arbitrary code — the tool handler wrapper later uses -32602 for Zod validation failures. The safeExecute helper wraps any async function with a catch that logs the error and rethrows through a factory, keeping error handling consistent across the codebase.
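The errors file itself isn't shown above; a sketch consistent with that description might look like the following (using console.error in place of the project's Pino logger so the example is self-contained — the real module imports the logger from Step 4):

```typescript
// Custom error classes carrying JSON-RPC error codes, plus a helper that
// keeps async error handling consistent across the codebase.
export class BedrockAPIError extends Error {
  readonly code = -32603; // JSON-RPC internal error

  constructor(message: string) {
    super(message);
    this.name = "BedrockAPIError";
  }
}

export class ToolExecutionError extends Error {
  constructor(
    message: string,
    readonly code: number, // e.g. -32602 for invalid params
  ) {
    super(message);
    this.name = "ToolExecutionError";
  }
}

// Runs an async function; on failure, logs the error and rethrows
// whatever the factory produces, so every caller fails the same way.
export async function safeExecute<T>(
  fn: () => Promise<T>,
  wrap: (err: unknown) => Error,
): Promise<T> {
  try {
    return await fn();
  } catch (err) {
    console.error("operation failed:", err);
    throw wrap(err);
  }
}
```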
Step 5: Write the config module
The config module defines all shared TypeScript types for JSON-RPC messages, tool handlers, and Bedrock analysis results, then uses Zod to validate and freeze the environment configuration.
Calling loadConfig() at startup validates every required env var and returns a frozen object. If any required variable is missing or invalid (e.g. LOG_LEVEL=verbose), the function throws immediately with a message listing every failed field.
Step 6: Write the glue layer — analysis context and Bedrock client
The glue layer in src/glue/ has three files. Start with the analysis context factory, which builds the AnalysisContext object that every REAA package expects.
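The real AnalysisContext shape is defined by the REAA packages and isn't shown in this tutorial; purely as an illustration, a single-function factory with hard-coded values (the ones "Next steps" suggests swapping out) could look like:

```typescript
// Illustrative only: field names here (platform, monitoring, timestamp)
// are assumptions for the sketch, not the real REAA contract.
export interface AnalysisContext {
  serviceName: string;
  platform: string;   // deployment platform
  monitoring: string; // monitoring stack
  timestamp: string;  // ISO-8601 creation time
}

export function createAnalysisContext(serviceName: string): AnalysisContext {
  return {
    serviceName,
    platform: "aws-ecs",      // hard-coded; replace with an
    monitoring: "cloudwatch", // environment-aware lookup later
    timestamp: new Date().toISOString(),
  };
}
```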
Next, the Bedrock client wraps @aws-sdk/client-bedrock-runtime with a retry loop that handles throttling, parses Claude-style JSON responses, and logs latency per invocation.
Create src/glue/bedrock-client.ts:
ts
import {
  BedrockRuntimeClient,
  InvokeModelCommand,
} from "@aws-sdk/client-bedrock-runtime";
import { logger } from "../utils/logger.js";
import { BedrockAPIError } from "../utils/errors.js";

export function createBedrockClient(): BedrockRuntimeClient {
  const region = process.env.AWS_REGION ?? "us-east-1";
  return new BedrockRuntimeClient({ region });
}

export async function invokeModel(
  prompt: string,
  modelId?: string,
): Promise<string> {
  const bedrockClient = createBedrockClient();
  const resolvedModelId =
    modelId ?? process.env.BEDROCK_MODEL_ID ?? "anthropic.claude-v2";
  const body = JSON.stringify({
    prompt,
    max_tokens_to_sample: 2048,
    temperature: 0.3,
  });
  const command = new InvokeModelCommand({
    modelId: resolvedModelId,
    contentType: "application/json",
    accept: "application/json",
    body: new TextEncoder().encode(body),
  });

  const startTime = Date.now();
  const delays = [1000, 2000, 4000];

  for (let attempt = 0; attempt <= delays.length; attempt++) {
    try {
      const response = await bedrockClient.send(command);
      const latencyMs = Date.now() - startTime;
      logger.debug(
        {
          modelId: resolvedModelId,
          promptLength: prompt.length,
          latencyMs,
        },
        "Bedrock model invocation completed",
      );
      const decoder = new TextDecoder();
      const responseText = decoder.decode(response.body);
      try {
        const parsed = JSON.parse(responseText) as Record<string, unknown>;
        const completion = parsed.completion;
        if (typeof completion === "string") {
          return completion;
        }
        return responseText;
      } catch {
        if (responseText.trim().length > 0) {
          return responseText;
        }
        throw new BedrockAPIError("unparseable model response");
      }
    } catch (err) {
      const error = err as Error & {
        name?: string;
        $metadata?: { requestId?: string };
      };
      if (error.name === "ThrottlingException" && attempt < delays.length) {
        const delay = delays[attempt];
        logger.debug(
          { attempt: attempt + 1, delay },
          "Bedrock throttled, retrying",
        );
        await new Promise((resolve) => setTimeout(resolve, delay));
        continue;
      }
      if (error.$metadata) {
        const requestId = error.$metadata.requestId ?? "unknown";
        throw new BedrockAPIError(
          `Bedrock API error: ${error.message ?? "unknown"} (requestId: ${requestId})`,
        );
      }
      throw new BedrockAPIError(error.message ?? "unknown Bedrock error");
    }
  }
  throw new BedrockAPIError("maximum retries exceeded");
}

export async function analyzeWithBedrock(prompt: string): Promise<string> {
  const systemContext =
    "You are a DevOps runbook automation assistant. " +
    "Analyze the following infrastructure context and provide remediation steps.\n\n";
  return invokeModel(systemContext + prompt);
}
The retry strategy uses exponential-ish backoff: 1 s, 2 s, 4 s. If all retries are exhausted or the error is not a ThrottlingException, a BedrockAPIError is thrown with the AWS request ID for traceability.
Step 7: Write the tool handlers
The tool-handlers.ts file is the heart of the server. It imports from four REAA packages, defines Zod input schemas for each tool, and wires them into four exported handlers. A wrapHandler helper catches any thrown error and converts it to a JSON-RPC error response — mapping Zod validation failures to code -32602 (invalid params) and everything else to -32603 (internal error). The file also exports toolMap (a lookup table keyed by tool name) and toolDefinitions (the JSON Schema definitions the MCP client sees when it calls tools/list).
Create src/glue/tool-handlers.ts:
ts
import { z } from "zod";
import {
  identifyHealthChecks as reaIdentifyHealthChecks,
  generateHealthChecks as reaGenerateHealthChecks,
} from "@reaatech/agent-runbook-health-checks";
import {
  generateIncidentWorkflows as reaGenerateIncidentWorkflows,
  generateEscalationPolicy as reaGenerateEscalationPolicy,
} from "@reaatech/agent-runbook-incident";
import {
  analyzeDeployment as reaAnalyzeDeployment,
  generateRollbackProcedures as reaGenerateRollbackProcedures,
} from "@reaatech/agent-runbook-rollback";
import {
  analyzeDependencies as reaAnalyzeDependencies,
  generateServiceMap as reaGenerateServiceMap,
  exportToMermaid as reaExportToMermaid,
  exportToJson as reaExportToJson,
} from "@reaatech/agent-runbook-service-map";
import { createAnalysisContext } from "./analysis-context.js";
import { logger } from "../utils/logger.js";
import { analyzeWithBedrock } from "./bedrock-client.js";
import type { McpResponse, ToolHandler } from "../config.js";

// ---- Zod Schemas ----
const HealthCheckInput = z.object({
  serviceName: z.string().min(1),
  url: z.string().url().optional(),
});

const IncidentTriageInput = z.object({
  description: z.string().min(1, "description is required"),
  serviceName: z.string().min(1, "serviceName is required"),
});

const RollbackInput = z.object({
  serviceName: z.string().min(1, "serviceName is required"),
  // … (the remaining schemas, handlers, wrapHandler, toolMap, and
  // toolDefinitions are in the downloadable artifact)
});
Each handler follows the same pattern: parse input with Zod, create an analysis context, delegate to the REAA package, optionally call Bedrock, and return a typed JSON-RPC response.
Step 8: Write the service map enricher
The service-map-enricher.ts module provides a helper that enriches raw Bedrock results with dependency graph data from @reaatech/agent-runbook-service-map. Tool handlers call it when they need to correlate AI-generated remediation steps with a dependency map.
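The enricher's real contract depends on @reaatech/agent-runbook-service-map, which isn't shown here; the following is only a shape sketch, with generateServiceMap stubbed to an assumed nodes-and-edges structure:

```typescript
// Illustrative enricher: attaches a dependency graph to an AI-generated
// remediation result. The stub below stands in for the REAA package call.
type ServiceMap = { nodes: string[]; edges: [string, string][] };

const generateServiceMap = async (serviceName: string): Promise<ServiceMap> => ({
  nodes: [serviceName, "db"],
  edges: [[serviceName, "db"]],
});

export async function enrichWithServiceMap(
  serviceName: string,
  bedrockResult: string,
): Promise<{ remediation: string; dependencies: ServiceMap }> {
  const dependencies = await generateServiceMap(serviceName);
  return { remediation: bedrockResult, dependencies };
}
```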
Step 9: Write the server entry point
The server sets up an Express app with three routes: an SSE endpoint at GET /mcp/sse, a message endpoint at POST /mcp/messages, and a health check at GET /health. It uses @reaatech/agent-mesh-mcp-server middleware and handlers to bridge the MCP transport layer. The start() function listens on the configured port and registers graceful shutdown handlers for SIGTERM and SIGINT.
Create src/index.ts:
ts
import express from "express";
import {
  mcpMiddleware,
  sseHandler,
  messageHandler,
} from "@reaatech/agent-mesh-mcp-server";
import { logger } from "./utils/logger.js";

export function createApp(): express.Application {
  const app = express();
  app.use(express.json());
  app.use(mcpMiddleware);
  app.get("/mcp/sse", sseHandler);
  app.post("/mcp/messages", messageHandler);
  app.get("/health", (_req, res) => {
    res.json({ status: "ok", uptime: process.uptime() });
  });
  return app;
}

export function start(): void {
  const app = createApp();
  const port = Number(process.env.PORT) || 3000;
  const server = app.listen(port, () => {
    logger.info({ port }, "MCP server listening");
  });

  function gracefulShutdown(signal: string): void {
    logger.info({ signal }, "Received shutdown signal");
    server.close(() => {
      logger.info("Server closed");
      process.exit(0);
    });
  }

  process.on("SIGTERM", () => gracefulShutdown("SIGTERM"));
  process.on("SIGINT", () => gracefulShutdown("SIGINT"));
}

const app = createApp();
export { app };

// Start the server unless the module is imported by the test suite.
if (process.env.NODE_ENV !== "test") {
  start();
}
The file both exports the app for tests and provides start() for the pnpm dev script. The dev script runs tsx watch src/index.ts, which starts the server and auto-reloads on file changes.
Step 10: Write tests and run them
The project uses Vitest with v8 coverage at 90% thresholds. Tests live under tests/ with unit tests in tests/unit/ and integration tests in tests/integration/. A shared setup file mocks all external packages (AWS SDK, REAA packages, and the logger) so tests never make real network calls. A fixtures file provides reusable test data.
Create tests/helpers/fixtures.ts — this file defines the valid input shapes and expected mock return values for every tool (the full file is available in the downloadable artifact; here are the key fixtures):
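The names and shapes below are illustrative stand-ins; the authoritative fixtures are in the artifact:

```typescript
// Illustrative fixtures: valid tool inputs plus a mock Bedrock completion
// for the mocked AWS SDK to return.
export const validHealthCheckInput = {
  serviceName: "api-gateway",
  url: "https://api.example.com/health",
};

export const validIncidentTriageInput = {
  description: "5xx spike on checkout",
  serviceName: "checkout",
};

export const validRollbackInput = {
  serviceName: "checkout",
};

export const mockBedrockCompletion =
  "1. Roll back to the previous task definition.\n2. Verify /health returns 200.";
```

With the fixtures and setup file in place, run the suite with `pnpm test` (the test script from Step 1).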
Expected output: Vitest discovers tests in tests/unit/ and tests/integration/, all mocks are loaded from tests/setup.ts, and every test passes. The coverage report prints a table with four metrics (lines, branches, functions, statements) all at or above 90%. A vitest-report.json file is written to the project root, and a coverage/ directory is created with JSON summary files.
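With the server running, you can exercise a tool manually by POSTing an MCP tools/call message to the message endpoint. The tool name health_check below is an assumption — use whatever names your tools/list response reports:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "health_check",
    "arguments": { "serviceName": "api-gateway" }
  }
}
```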
The response includes a healthy status and the count of generated and existing checks. Because this handler calls analyzeWithBedrock, you’ll also see debug-level logs from the Bedrock client if LOG_LEVEL=debug is set. If Bedrock is unavailable, the health check still succeeds — the Bedrock call is wrapped in a .catch() that swallows the error, so tool execution is never blocked by the LLM.
Next steps
Connect to Claude Desktop or ChatGPT — point any MCP-compatible client at http://your-host:3000/mcp/sse and your four runbook tools will appear in the tool palette. Try saying “check the health of api-gateway” or “what are the dependencies of my auth service?”
Customise the analysis context — edit src/glue/analysis-context.ts to match your real services, deployment platforms, and monitoring stack. The factory is a single function; swap the hard-coded values for environment-aware lookups.
Add a workflow orchestration tool — the toolMap lookup table and the toolDefinitions list are designed for extension. Import another REAA package, write a handler following the same Zod‑parse → context → delegate → respond pattern, and register it in both.