# Anthropic AI Runbook Automation for SMB DevOps Incident Recovery
> Automatically generate and test agent incident runbooks from your service repositories, then trigger them via durable workflows.
A reference solution from [reaatech.com](https://reaatech.com) demonstrating how to build production-grade AI systems for reliability-ops using the `@reaatech/*` agent-runbook ecosystem, Anthropic Claude, and Trigger.dev durable workflows.
## Architecture
```
POST /api/runbooks/sync ──► Trigger.dev workflow ──► agent-runbook-cli (generate)
                                                             │
                                                     agent-runbook-analyzer (scan)
                                                             │
                                                     agent-runbook-alerts (extract + generate)
                                                             │
                                                     agent-chaos-scenarios (validate)
                                                             │
                                                     Anthropic Claude (summary)
                                                             │
                                                     DynamoDB (persist)
                                                             │
                                                     Slack (notify)
```
A background freshness job (`src/services/freshness.ts`, started from `src/instrumentation.ts`) polls for stale runbooks at a configurable interval (`RUNBOOK_SYNC_INTERVAL_MS`) and re-triggers the workflow for them.
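The freshness job can be pictured as a timer that filters runbooks by age and re-queues the old ones. The following is a minimal sketch under stated assumptions, not the contents of `src/services/freshness.ts`: `RunbookRecord`, `findStaleRunbooks`, `startFreshnessPoller`, `listRunbooks`, and `retrigger` are hypothetical names, and treating a runbook as stale once it is older than one poll interval is an assumption.

```typescript
// Hypothetical record shape; `runbookId`, `repoUrl`, and `timestamp` are the
// field names used in the API responses documented in this README.
interface RunbookRecord {
  runbookId: string;
  repoUrl: string;
  timestamp: string; // ISO 8601
}

// Return the runbooks whose last sync is older than `maxAgeMs`.
function findStaleRunbooks(
  records: RunbookRecord[],
  nowMs: number,
  maxAgeMs: number,
): RunbookRecord[] {
  return records.filter((r) => nowMs - Date.parse(r.timestamp) > maxAgeMs);
}

// Poll on a fixed interval and hand each stale runbook to `retrigger`.
// Returns a function that stops the poller.
function startFreshnessPoller(
  listRunbooks: () => Promise<RunbookRecord[]>,
  retrigger: (record: RunbookRecord) => Promise<void>,
  intervalMs: number = 3_600_000, // mirrors the RUNBOOK_SYNC_INTERVAL_MS default
): () => void {
  const timer = setInterval(async () => {
    try {
      // Assumption: a runbook is stale once it is older than one interval.
      const stale = findStaleRunbooks(await listRunbooks(), Date.now(), intervalMs);
      for (const record of stale) await retrigger(record);
    } catch (err) {
      // Log and continue so a single failed poll does not stop later polls.
      console.error("freshness poll failed", err);
    }
  }, intervalMs);
  return () => clearInterval(timer);
}
```

Keeping the staleness check in a pure function makes it testable without timers; the poller only wires it to the clock.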
## Setup
```bash
pnpm install
pnpm dev # Next.js dev server
pnpm test # vitest run with coverage
pnpm typecheck # TypeScript type checking
pnpm lint # ESLint
```
### Environment variables
| Variable | Description |
|---|---|
| `ANTHROPIC_API_KEY` | Claude API key |
| `SLACK_TOKEN` | Slack bot token |
| `SLACK_CHANNEL` | Slack channel ID for notifications |
| `TRIGGER_API_KEY` | Trigger.dev API key |
| `TRIGGER_API_ENDPOINT` | Trigger.dev API endpoint |
| `AWS_REGION` | AWS region for DynamoDB |
| `AWS_ACCESS_KEY_ID` | AWS access key |
| `AWS_SECRET_ACCESS_KEY` | AWS secret key |
| `DYNAMODB_TABLE_NAME` | DynamoDB table (default: `sessions`) |
| `RUNBOOK_SYNC_INTERVAL_MS` | Freshness poll interval (default: `3600000` = 1 hour) |
| `LOG_LEVEL` | Pino log level (default: `info`) |
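As an illustration of how the defaults in the table might be applied at startup, here is a dependency-free sketch. `AppConfig`, `requireVar`, and `loadConfig` are hypothetical names, and which variables are treated as required is an assumption; the project itself may validate its environment differently (for example with Zod).

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical config shape covering a subset of the documented variables.
interface AppConfig {
  anthropicApiKey: string;
  slackToken: string;
  slackChannel: string;
  triggerApiKey: string;
  dynamoTableName: string;
  syncIntervalMs: number;
  logLevel: string;
}

// Read a variable and fail fast if it is missing or empty.
function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (!value) throw new Error(`Missing environment variable: ${name}`);
  return value;
}

function loadConfig(env: Env): AppConfig {
  // Defaults below are the ones documented in the table above.
  const rawInterval = env["RUNBOOK_SYNC_INTERVAL_MS"] ?? "3600000";
  const syncIntervalMs = Number(rawInterval);
  if (!Number.isFinite(syncIntervalMs) || syncIntervalMs <= 0) {
    throw new Error(`RUNBOOK_SYNC_INTERVAL_MS must be a positive number, got "${rawInterval}"`);
  }
  return {
    // Assumption: these four variables are required.
    anthropicApiKey: requireVar(env, "ANTHROPIC_API_KEY"),
    slackToken: requireVar(env, "SLACK_TOKEN"),
    slackChannel: requireVar(env, "SLACK_CHANNEL"),
    triggerApiKey: requireVar(env, "TRIGGER_API_KEY"),
    dynamoTableName: env["DYNAMODB_TABLE_NAME"] ?? "sessions",
    syncIntervalMs,
    logLevel: env["LOG_LEVEL"] ?? "info",
  };
}
```

In a Node.js process this would typically be called once as `loadConfig(process.env)`.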
## API
### POST /api/runbooks/sync
Trigger a new runbook sync workflow.
**Request body:**
```json
{
  "repoUrl": "https://github.com/org/repo",
  "repoPath": "/optional/local/path",
  "provider": "claude",
  "model": "claude-sonnet-4-6"
}
```
**Response (202):**
```json
{
  "runbookId": "rb_abc123",
  "status": "queued"
}
```
**Response (400):**
```json
{
  "error": "Validation failed",
  "details": { ... }
}
```
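To make the 400 response concrete, the sketch below validates a request body and produces the same `error`/`details` shape. It is an illustration only: the project lists Zod for schema validation, whereas this version is hand-written to stay dependency-free. `SyncRequest`, `ValidationResult`, and `validateSyncRequest` are hypothetical names, and the rules (required `repoUrl`, optional string fields) and the `details` messages are assumptions rather than the documented schema.

```typescript
interface SyncRequest {
  repoUrl: string;
  repoPath?: string;
  provider?: string;
  model?: string;
}

type ValidationResult =
  | { ok: true; value: SyncRequest }
  | { ok: false; error: "Validation failed"; details: Record<string, string> };

function validateSyncRequest(body: unknown): ValidationResult {
  if (typeof body !== "object" || body === null || Array.isArray(body)) {
    return { ok: false, error: "Validation failed", details: { body: "expected a JSON object" } };
  }
  const input = body as Record<string, unknown>;
  const details: Record<string, string> = {};

  // Assumption: repoUrl is required and must parse as an absolute URL.
  const repoUrl = input["repoUrl"];
  if (typeof repoUrl !== "string") {
    details["repoUrl"] = "required string";
  } else {
    try {
      new URL(repoUrl);
    } catch {
      details["repoUrl"] = "must be a valid URL";
    }
  }

  // Assumption: the remaining fields are optional strings.
  for (const key of ["repoPath", "provider", "model"]) {
    if (input[key] !== undefined && typeof input[key] !== "string") {
      details[key] = "must be a string";
    }
  }

  if (Object.keys(details).length > 0) {
    return { ok: false, error: "Validation failed", details };
  }
  // Unknown extra keys are passed through unchanged in this sketch.
  return { ok: true, value: input as unknown as SyncRequest };
}
```

A route handler could return the failing result with status 400 and otherwise enqueue the workflow and respond with 202.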
### GET /api/runbooks/[id]
Retrieve the status and result of a runbook sync.
**Response (200):**
```json
{
  "runbookId": "rb_abc123",
  "status": "completed",
  "alertsGenerated": 5,
  "chaosScenariosValidated": 3,
  "summary": "...",
  "repoUrl": "https://github.com/org/repo",
  "timestamp": "2026-01-01T00:00:00.000Z"
}
```
**Response (404):** `{ "error": "Runbook not found" }`
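Because the sync endpoint returns 202 and the work completes asynchronously, a client usually polls the status endpoint. The following is a rough sketch of such a client: `runbookUrl`, `waitForRunbook`, `FetchLike`, and the `attempts`/`delayMs` options are hypothetical, and it assumes `"completed"` is the only status that ends polling, since this README shows no other terminal status.

```typescript
interface RunbookStatus {
  runbookId: string;
  status: string;
  [extra: string]: unknown;
}

// Minimal subset of the fetch API, so a fake can be injected in tests.
type FetchLike = (url: string) => Promise<{ status: number; json(): Promise<unknown> }>;

function runbookUrl(baseUrl: string, runbookId: string): string {
  return `${baseUrl}/api/runbooks/${encodeURIComponent(runbookId)}`;
}

// Poll GET /api/runbooks/[id] until the runbook reports "completed".
async function waitForRunbook(
  baseUrl: string,
  runbookId: string,
  fetchFn: FetchLike,
  { attempts = 10, delayMs = 2000 } = {},
): Promise<RunbookStatus> {
  const url = runbookUrl(baseUrl, runbookId);
  for (let i = 0; i < attempts; i++) {
    const res = await fetchFn(url);
    if (res.status === 404) throw new Error("Runbook not found");
    if (res.status !== 200) throw new Error(`Unexpected HTTP status ${res.status}`);
    const body = (await res.json()) as RunbookStatus;
    if (body.status === "completed") return body;
    // Not finished yet: wait before asking again.
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error(`Runbook ${runbookId} did not complete after ${attempts} attempts`);
}
```

With a runtime that provides a global `fetch`, a call might look like `waitForRunbook("http://localhost:3000", "rb_abc123", (url) => fetch(url))`, where the base URL is an assumption about a local dev server.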
## Tech stack
### REAA packages
- `@reaatech/agent-runbook@0.1.0` — core types, Zod schemas, utilities
- `@reaatech/agent-runbook-cli@0.1.0` — CLI + programmatic runbook generation
- `@reaatech/agent-runbook-alerts@0.1.0` — alert extraction and generation
- `@reaatech/agent-runbook-analyzer@0.1.0` — service repository analysis
- `@reaatech/agent-chaos-scenarios@0.1.0` — chaos scenario loading and validation
- `@reaatech/session-continuity-storage-dynamodb@0.1.0` — DynamoDB session storage
### Third-party packages
- `@anthropic-ai/sdk@0.104.1` — Claude API
- `@slack/web-api@7.17.0` — Slack notifications
- `@trigger.dev/sdk@4.4.6` — durable workflows
- `pino@10.3.1` — structured logging
- `zod@4.4.3` — schema validation
- `@aws-sdk/client-dynamodb` + `@aws-sdk/lib-dynamodb` — DynamoDB client
## Project layout
```
app/api/runbooks/sync/route.ts    POST endpoint to trigger workflows
app/api/runbooks/[id]/route.ts    GET endpoint to check status
src/services/anthropic.ts         Claude summary generation
src/services/slack.ts             Slack notification dispatch
src/services/storage.ts           DynamoDB persistence
src/services/analyzer.ts          Repository analysis adapter
src/services/alerts.ts            Alert generation adapter
src/services/chaos.ts             Chaos scenario validator
src/services/workflow.ts          Trigger.dev workflow definition
src/services/freshness.ts         Background freshness polling
src/instrumentation.ts            Next.js instrumentation hook
tests/                            Vitest test suite (74 tests)
```
## License
MIT — see [LICENSE](./LICENSE).