Small DevOps teams using PagerDuty rarely have written runbooks because creating and maintaining them takes too much time. When an incident hits, responders waste precious minutes guessing recovery steps instead of following a documented plan. This recipe builds a CLI-powered automation that pulls PagerDuty incident and service metadata, uses Google Gemini to generate narrative content, and writes complete markdown runbooks — all with duplicate detection, Langfuse tracing, and optional Inngest scheduling.
You’ll wire up six REAA packages (@reaatech/agent-runbook, @reaatech/agent-runbook-analyzer, @reaatech/agent-runbook-alerts, @reaatech/agent-runbook-health-checks, @reaatech/agents-markdown, @reaatech/confidence-router-core), a PagerDuty REST client, a Google Gemini LLM service with Langfuse observability, and an Inngest integration for automated generation.
Prerequisites
Node.js >= 22 — the project uses pnpm and ESM throughout
Expected output: pnpm resolves and installs all 14 dependencies plus devDependencies. No peer-dependency warnings.
Step 2: Set up environment variables
Create the .env.example file with placeholder values for every variable the project needs:
env
# Env vars used by google-gemini-runbook-automation-for-pagerduty-smb-incidents.
# Keep placeholders only; never commit real values.
NODE_ENV=development
GEMINI_API_KEY=<your-google-gemini-api-key>
PAGERDUTY_API_KEY=<your-pagerduty-api-token>
PAGERDUTY_USER_EMAIL=<your-pagerduty-user-email>
INNGEST_EVENT_KEY=<your-inngest-event-key>
INNGEST_SIGNING_KEY=<your-inngest-signing-key>
LANGFUSE_PUBLIC_KEY=<your-langfuse-public-key>
LANGFUSE_SECRET_KEY=<your-langfuse-secret-key>
LANGFUSE_HOST=https://cloud.langfuse.com
RUNBOOKS_OUTPUT_DIR=./src/runbooks
CONFIDENCE_ROUTE_THRESHOLD=0.8
CONFIDENCE_FALLBACK_THRESHOLD=0.3
Copy it to .env and fill in your real keys:
terminal
cp .env.example .env
Expected output: .env exists with your API keys populated. The project reads these at runtime, and none are committed to git.
Step 3: Create the configuration and types module
Create src/config.ts. This file defines the core TypeScript interfaces for PagerDuty API responses and exposes a loadConfig() function that reads environment variables with runtime validation.
Expected output: The loadConfig() function reads process.env, validates the two required keys, coerces thresholds from strings to numbers, and falls back to sensible defaults. If GEMINI_API_KEY or PAGERDUTY_API_KEY is missing, it throws immediately — no silent failures during runbook generation.
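The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual src/config.ts: the field names are inferred from how config is used later in this recipe, the defaults are taken from .env.example, and loadConfigSketch, toNumber, and the Env type are hypothetical names introduced here.

```typescript
// Sketch only: field names mirror how `config` is used elsewhere in this
// recipe; the real src/config.ts may differ.
interface RunbookGenConfig {
  geminiApiKey: string;
  pagerdutyApiKey: string;
  outputDir: string;
  routeThreshold: number;
  fallbackThreshold: number;
}

type Env = Record<string, string | undefined>;

// Coerce a numeric env var, falling back when it is unset or not a number.
function toNumber(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  return Number.isFinite(parsed) ? parsed : fallback;
}

// Hypothetical stand-in for loadConfig(); takes the environment as an
// argument so it can be exercised without touching process.env.
function loadConfigSketch(env: Env): RunbookGenConfig {
  const geminiApiKey = env.GEMINI_API_KEY;
  const pagerdutyApiKey = env.PAGERDUTY_API_KEY;
  // Fail fast on the two required keys.
  if (!geminiApiKey) throw new Error("GEMINI_API_KEY is required");
  if (!pagerdutyApiKey) throw new Error("PAGERDUTY_API_KEY is required");
  return {
    geminiApiKey,
    pagerdutyApiKey,
    outputDir: env.RUNBOOKS_OUTPUT_DIR ?? "./src/runbooks",
    routeThreshold: toNumber(env.CONFIDENCE_ROUTE_THRESHOLD, 0.8),
    fallbackThreshold: toNumber(env.CONFIDENCE_FALLBACK_THRESHOLD, 0.3),
  };
}
```

In a Node process this would be called as loadConfigSketch(process.env). Passing the environment in explicitly keeps the validation easy to unit test.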
Step 4: Build the PagerDuty REST client
Create src/lib/pagerduty-client.ts. This class wraps the @pagerduty/pdjs API library with retry logic from @reaatech/agent-runbook. Every call to the PagerDuty API goes through retry() to handle rate-limit (429) errors automatically.
ts
import { api } from "@pagerduty/pdjs";
import {
  type EscalationPolicy,
  NotFoundError,
  retry,
} from "@reaatech/agent-runbook";
import type { PagerDutyIncident, PagerDutyService } from "../config.js";

export class PagerDutyClient {
  private pd: ReturnType<typeof api>;

  constructor(token: string) {
    this.pd = api({ token });
  }

  async getIncidents(
    params?: { limit?: number; since?: string },
  ): Promise<PagerDutyIncident[]> {
    let endpoint = "/incidents";
    const qs: string[] = [];
    if (params?.limit) qs.push(`limit=${String(params.limit)}`);
    if (params?.since) qs.push(`since=${params.since}`);
    if (qs.length > 0) endpoint += `?${qs.join("&")}`;
    const result = (await retry(
      () => this.pd.get(endpoint),
      3,
      1000,
    )) as { resource: PagerDutyIncident[] | undefined };
    return result.resource ?? [];
  }

  async getServices(): Promise<PagerDutyService[]> {
    const result = (await retry(
      () => this.pd.get("/services"),
      3,
      1000,
    )) as { resource: PagerDutyService[] | undefined };
    return result.resource ?? [];
  }

  async getEscalationPolicy(id: string): Promise<EscalationPolicy> {
    const result = (await retry(
      () => this.pd.get(`/escalation_policies/${id}`),
      3,
      1000,
    )) as { resource: EscalationPolicy | undefined };
    if (!result.resource) {
      throw new NotFoundError(`Escalation policy ${id} not found`);
    }
    return result.resource;
  }

  async getServiceIncidents(
    serviceId: string,
  ): Promise<PagerDutyIncident[]> {
    const result = (await retry(
      () => this.pd.get(`/incidents?service_ids[]=${serviceId}`),
      3,
      1000,
    )) as {
      resource: PagerDutyIncident[] | undefined;
    };
    return result.resource ?? [];
  }
}
Expected output: PagerDutyClient exposes four methods: getIncidents, getServices, getEscalationPolicy, and getServiceIncidents. Each uses retry(fn, 3, 1000) so transient 429 rate-limit errors are retried up to 3 times with 1-second backoff. A missing escalation policy throws NotFoundError instead of returning undefined.
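To make the retry(fn, 3, 1000) behavior concrete, here is a minimal sketch of a helper with the same call shape. It is a hypothetical stand-in, not the implementation shipped in @reaatech/agent-runbook: it assumes the second argument is the total number of attempts and that the delay between attempts is fixed, either of which may differ in the real package.

```typescript
// Hypothetical stand-in for retry(); the real helper from
// @reaatech/agent-runbook may count attempts or back off differently.
async function retrySketch<T>(
  fn: () => Promise<T>,
  maxAttempts: number,
  delayMs: number,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // Return as soon as one attempt succeeds.
      return await fn();
    } catch (err) {
      lastError = err;
      // Wait before the next attempt, but not after the final one.
      if (attempt < maxAttempts) {
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}
```

A call such as retrySketch(() => fetchSomething(), 3, 1000) would then try at most three times, pausing one second between tries, where fetchSomething is whatever async call needs protecting.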
Step 5: Build the Gemini LLM service with Langfuse tracing
Create src/lib/llm.ts. This service wraps the Google Gemini SDK with Langfuse observability. It offers three methods: a simple section generator, a runbook summarizer, and a fully traced generation that logs prompt/completion token counts.
Expected output: LlmService creates a Gemini client using gemini-2.5-flash (fast and economical for structured generation). The generateContentWithTracing method creates a Langfuse trace and generation span before calling Gemini, then records token usage from response.usageMetadata and marks the generation as complete. On error, it records the failure and rethrows so callers can handle it.
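The trace-then-call-then-record flow described above can be sketched without either SDK. The interfaces below (Usage, ModelResult, GenerationSpan, Tracer) are hypothetical minimal shapes invented for this illustration. They are not the Gemini or Langfuse SDK types, and the real LlmService will look different.

```typescript
// Hypothetical minimal shapes; these are NOT the Gemini or Langfuse SDK types.
interface Usage {
  promptTokens: number;
  completionTokens: number;
}
interface ModelResult {
  text: string;
  usage: Usage;
}
interface GenerationSpan {
  end(result: { output?: string; usage?: Usage; error?: string }): void;
}
interface Tracer {
  startGeneration(name: string, input: string): GenerationSpan;
}

// Open a span, call the model, record usage on success or the error on
// failure, and rethrow so the caller can decide what to do.
async function generateWithTracingSketch(
  tracer: Tracer,
  callModel: (prompt: string) => Promise<ModelResult>,
  prompt: string,
): Promise<string> {
  const span = tracer.startGeneration("runbook-section", prompt);
  try {
    const result = await callModel(prompt);
    span.end({ output: result.text, usage: result.usage });
    return result.text;
  } catch (err) {
    span.end({ error: err instanceof Error ? err.message : String(err) });
    throw err;
  }
}
```

Injecting the tracer and the model call as parameters is a design choice made for this sketch: it lets the success and failure paths be tested with simple fakes.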
Step 6: Build the runbook generator CLI
Create src/cli/runbook-gen.ts. This is the heart of the project — a RunbookGenerator class that orchestrates the full pipeline: fetching PagerDuty incidents, computing duplicate-detection fingerprints, scanning repository code, generating alerts and health checks via REAA packages, producing narrative content through Gemini, and writing markdown files.
ts
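import "dotenv/config";
import {
  type AnalysisContext,
  AnalysisContextSchema,
  type AlertDefinition,
  ensureDirectory,
  escapeMarkdown,
  generateId,
  type HealthCheck,
  type Runbook,
  type RunbookSection,
  writeFile,
} from "@reaatech/agent-runbook";
import { scanRepository, analyzeCode } from "@reaatech/agent-runbook-analyzer";
import { generateAlerts } from "@reaatech/agent-runbook-alerts";
import { generateHealthChecks } from "@reaatech/agent-runbook-health-checks";
import { sanitizePath, normalizeLineEndings } from "@reaatech/agents-markdown";
import {
  DecisionEngine,
  mergeConfig,
  type ClassificationResult,
  type RoutingDecision,
} from "@reaatech/confidence-router-core";
import { loadConfig } from "../config.js";
import { LlmService } from "../lib/llm.js";
import { PagerDutyClient } from "../lib/pagerduty-client.js";
import type { PagerDutyIncident } from "../config.js";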
Expected output: The CLI can be invoked as pnpm tsx src/cli/runbook-gen.ts --service-id S1. It produces a markdown file at ./src/runbooks/rb-<id>.md with four sections: Incident Overview, Alerts, Health Checks, and Recovery Procedures. The --force flag skips duplicate detection; --repo-path adds repository analysis.
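The flags mentioned in this step could be parsed along these lines. This is a sketch using Node's built-in util.parseArgs. The CliOptions shape and the parseCliArgs name are hypothetical, and the actual CLI may parse its arguments another way.

```typescript
import { parseArgs } from "node:util";

// Hypothetical option shape for illustration; the real CLI may differ.
interface CliOptions {
  serviceId: string;
  repoPath?: string;
  outputDir?: string;
  force: boolean;
}

function parseCliArgs(argv: string[]): CliOptions {
  // parseArgs is strict by default, so unknown flags raise an error.
  const { values } = parseArgs({
    args: argv,
    options: {
      "service-id": { type: "string" },
      "repo-path": { type: "string" },
      "output-dir": { type: "string" },
      force: { type: "boolean", default: false },
    },
  });
  const serviceId = values["service-id"];
  if (!serviceId) throw new Error("--service-id is required");
  return {
    serviceId,
    repoPath: values["repo-path"],
    outputDir: values["output-dir"],
    force: values.force ?? false,
  };
}
```

In a script entry point this would typically be called as parseCliArgs(process.argv.slice(2)).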
Step 7: Create the Inngest integration for scheduled and webhook-driven generation
Create src/inngest/trigger-runbook-gen.ts. This module exports two Inngest functions and two orchestrator helpers. The scheduledRunbookGen function runs every 6 hours (cron 0 */6 * * *), fetching all PagerDuty services and generating runbooks for each. The pagerdutyWebhook function listens for pagerduty/incident.triggered events and generates a runbook for the affected service.
ts
import { Inngest } from "inngest";
import { RunbookGenerator } from "../cli/runbook-gen.js";
import { loadConfig } from "../config.js";
import { LlmService } from "../lib/llm.js";
import { PagerDutyClient } from "../lib/pagerduty-client.js";
import { DecisionEngine, mergeConfig } from "@reaatech/confidence-router-core";

export const inngest = new Inngest({
  id: "runbook-gen",
  name: "Runbook Generator",
});

function buildGenerator(): RunbookGenerator {
  const config = loadConfig();
  const pd = new PagerDutyClient(config.pagerdutyApiKey);
  const llm = new LlmService(config.geminiApiKey, {
    publicKey: process.env.LANGFUSE_PUBLIC_KEY ?? "",
    secretKey: process.env.LANGFUSE_SECRET_KEY ?? "",
  });
  const engine = new DecisionEngine(
    mergeConfig({
      routeThreshold: config.routeThreshold,
      fallbackThreshold: config.fallbackThreshold,
    }),
  );
  return new RunbookGenerator(pd, llm, engine, config.outputDir);
}

export async function runScheduledGeneration(): Promise<void> {
  const config = loadConfig();
  const pd = new PagerDutyClient(config.pagerdutyApiKey);
  const services = await pd.getServices();
  const gen = buildGenerator();
  for (const service of services) {
    await gen.generateRunbook(service.id, undefined, true);
  }
}

export async function runWebhookGeneration(
  data: Record<string, string>,
): Promise<void> {
  const serviceId = data.serviceId;
  if (!serviceId) return;
  const gen = buildGenerator();
  await gen.generateRunbook(serviceId, undefined, true);
}

export const scheduledRunbookGen = inngest.createFunction(
  {
    id: "scheduled-runbook-gen",
    triggers: [{ cron: "0 */6 * * *" }],
  },
  async () => {
    await runScheduledGeneration();
  },
);

export const pagerdutyWebhook = inngest.createFunction(
  {
    id: "pagerduty-webhook",
    triggers: [{ event: "pagerduty/incident.triggered" }],
  },
  async ({ event }) => {
    const data = event.data as Record<string, string> | undefined;
    if (data?.serviceId) {
      await runWebhookGeneration(data);
    }
  },
);
Expected output: Two Inngest functions are exported as named createFunction calls. The scheduledRunbookGen function has cron 0 */6 * * * and fetches all services before generating. The pagerdutyWebhook expects a JSON payload with { "data": { "serviceId": "XYZ" } }. Both use buildGenerator() which wires PagerDuty, Gemini, and the DecisionEngine together.
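The webhook handler casts event.data to Record<string, string> and then checks for serviceId. If you would rather validate the payload at runtime, a small narrowing helper along these lines is one option. The name extractServiceId is hypothetical and not part of the recipe's code.

```typescript
// Narrow an unknown event payload to the serviceId the webhook handler needs.
// Returns undefined for anything that is not an object with a non-empty
// string serviceId.
function extractServiceId(data: unknown): string | undefined {
  if (typeof data !== "object" || data === null) return undefined;
  const value = (data as Record<string, unknown>).serviceId;
  return typeof value === "string" && value.length > 0 ? value : undefined;
}
```

With a helper like this, the handler could call extractServiceId(event.data) and skip generation when the result is undefined, instead of relying on the type assertion.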
Step 8: Create Next.js route handlers
Create the Inngest serve handler at app/api/inngest/route.ts to expose the Inngest functions as Next.js App Router endpoints:
ts
import { serve } from "inngest/next";
import {
  inngest,
  scheduledRunbookGen,
  pagerdutyWebhook,
} from "../../../src/inngest/trigger-runbook-gen.js";

const handler = serve({
  client: inngest,
  functions: [scheduledRunbookGen, pagerdutyWebhook],
});

export const GET = handler.GET;
export const POST = handler.POST;
export const PUT = handler.PUT;
Create the runbooks listing endpoint at app/api/runbooks/route.ts:
Create the single-runbook endpoint at app/api/runbooks/[slug]/route.ts:
ts
import { NextResponse } from "next/server";
import type { NextRequest } from "next/server";
import { readFile } from "@reaatech/agent-runbook";

const outputDir = process.env.RUNBOOKS_OUTPUT_DIR ?? "./src/runbooks";

export async function GET(
  _req: NextRequest,
  { params }: { params: Promise<{ slug: string }> },
) {
  const { slug } = await params;
  const content = readFile(`${outputDir}/${slug}.md`);
  if (!content) {
    return NextResponse.json(
      { error: "Runbook not found" },
      { status: 404 },
    );
  }
  return NextResponse.json({ content });
}
Create the health check endpoint at app/api/health/route.ts:
ts
import { NextResponse } from "next/server";

export function GET() {
  return NextResponse.json({
    status: "ok",
    timestamp: new Date().toISOString(),
  });
}
Update app/layout.tsx with the project metadata:
tsx
import type { Metadata } from "next";

export const metadata: Metadata = {
  title: "Google Gemini Runbook Automation",
  description:
    "Generate up-to-date incident runbooks for every PagerDuty-monitored service so your small team always knows how to respond during an outage.",
};

export default function RootLayout({
  children,
}: Readonly<{
  children: React.ReactNode;
}>) {
  return (
    <html lang="en">
      <body>{children}</body>
    </html>
  );
}
Update app/page.tsx with a status homepage:
tsx
import Link from "next/link";

export default function Home() {
  return (
    <main>
      <h1>Google Gemini Runbook Automation</h1>
      <p>
        Generate up-to-date incident runbooks for every PagerDuty-monitored
        service so your small team always knows how to respond during an
        outage.
      </p>
      <ul>
        <li>
          <Link href="/api/runbooks">View Generated Runbooks</Link>
        </li>
        <li>
          <Link href="/api/health">Health Check</Link>
        </li>
      </ul>
      <h2>CLI Usage</h2>
      <pre>
        {/* A string expression keeps <id>, <path>, and <dir> from being parsed as JSX tags. */}
        {"pnpm tsx src/cli/runbook-gen.ts --service-id <id> [--repo-path <path>] [--force] [--output-dir <dir>]"}
      </pre>
    </main>
  );
}
Expected output: Four route handlers are registered in the App Router. GET /api/health returns {"status":"ok","timestamp":"..."}. GET /api/runbooks lists all generated .md files. GET /api/runbooks/<slug> returns a single runbook or 404. GET, POST, and PUT on /api/inngest expose the registered functions to Inngest, including the local dev server.
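For the listing behavior described above ("lists all generated .md files"), the core logic might look something like the following. This is a sketch built on Node's fs module, not the recipe's route handler. The helper names toRunbookSlugs and listRunbookSlugs are hypothetical, and the real endpoint may return a different shape.

```typescript
import { existsSync, readdirSync } from "node:fs";

// Keep only markdown files and strip the extension to get URL slugs.
function toRunbookSlugs(fileNames: string[]): string[] {
  return fileNames
    .filter((name) => name.endsWith(".md"))
    .map((name) => name.slice(0, -".md".length))
    .sort();
}

// Read the output directory; an absent directory simply means no runbooks yet.
function listRunbookSlugs(outputDir: string): string[] {
  if (!existsSync(outputDir)) return [];
  return toRunbookSlugs(readdirSync(outputDir));
}
```

A route handler could then return NextResponse.json({ runbooks: listRunbookSlugs(outputDir) }), where the runbooks key is likewise an assumption.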
Step 9: Create the public entry point
Create src/index.ts with re-exports for all public classes and types so consumers can import from a single module:
ts
export const VERSION = "0.1.0";
export { PagerDutyClient } from "./lib/pagerduty-client.js";
export { LlmService } from "./lib/llm.js";
export { RunbookGenerator, main } from "./cli/runbook-gen.js";
export { loadConfig } from "./config.js";
export type {
  PagerDutyIncident,
  PagerDutyService,
  RunbookGenConfig,
} from "./config.js";
Step 10: Run the tests
The project ships with tests covering every module. Run them with:
terminal
pnpm test
Expected output: vitest runs the full suite. With the coverage threshold set to 90% across lines, branches, functions, and statements, the output should show something like:
Expected output: The CLI prints Runbook written to ./src/runbooks/rb-<id>.md. Open that file to see a complete markdown runbook with incident overview, alerts, health checks, and AI-generated recovery procedures.
Next steps
Wire up Inngest in production: deploy the app/api/inngest/route.ts handler to Vercel or your own Node server, register the deployed endpoint with Inngest using INNGEST_EVENT_KEY and INNGEST_SIGNING_KEY (the dev server is intended for local testing), and watch runbooks generate automatically every 6 hours.
Tune duplicate detection — adjust CONFIDENCE_ROUTE_THRESHOLD and CONFIDENCE_FALLBACK_THRESHOLD in .env based on your incident patterns. Higher thresholds produce fewer but more confident new runbooks.
Add repository scanning — pass --repo-path with a git checkout path to include code-level analysis in the runbook output. The @reaatech/agent-runbook-analyzer package inspects service types, frameworks, and external dependencies.
Extend the runbook format — add custom sections (database schemas, environment-specific notes, compliance checklists) by composing additional RunbookSection objects in the generator.
Integrate with incident response tools — modify the Inngest webhook to post generated runbooks into Slack channels or link them in PagerDuty incident notes automatically.
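For the threshold tuning mentioned in the list above, it can help to see how two thresholds split a confidence score into bands. The sketch below is an assumption about the general pattern, not the DecisionEngine's actual logic: the band names, the bandFor helper, and the boundary handling (inclusive at the route threshold, exclusive at the fallback threshold) are all hypothetical. How the generator maps each band to "skip as duplicate" or "write a new runbook" is not shown in this recipe.

```typescript
// Hypothetical bands; @reaatech/confidence-router-core may name or compute
// these differently.
type Band = "route" | "clarify" | "fallback";

// Split a confidence score in [0, 1] into three bands using two thresholds.
// Defaults match CONFIDENCE_ROUTE_THRESHOLD and CONFIDENCE_FALLBACK_THRESHOLD
// from .env.example.
function bandFor(
  confidence: number,
  routeThreshold: number = 0.8,
  fallbackThreshold: number = 0.3,
): Band {
  if (confidence >= routeThreshold) return "route";
  if (confidence < fallbackThreshold) return "fallback";
  return "clarify";
}
```

Under these assumptions, raising the route threshold shrinks the top band, and raising the fallback threshold grows the bottom one, so the middle band narrows from both sides.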