Small-business AI agents regularly fail because of downstream tool outages, LLM hallucinations, and retry storms. Without a dedicated SRE team, these failures cause customer-facing disruptions and uncontrolled costs.
In this tutorial you’ll build a Vertex AI reliability suite that gives small-business AI agents production-grade fault tolerance without a dedicated SRE team. You’ll layer automated circuit breakers over Vertex AI model calls to isolate failing tools, add idempotency middleware to prevent duplicate side effects, wire up structured output repair that fixes malformed LLM responses against Zod schemas, and connect it all to runbook incident workflows with severity-based escalation. By the end you’ll have a reusable reliability middleware, a webhook endpoint that reacts to circuit-breaker state changes, and an Inngest durable workflow that orchestrates backoff, health-check recovery, and escalation — all backed by Gemini models and fully tested.
Prerequisites
Node.js >=22 and pnpm 10.x (the project uses pnpm workspaces)
A Google Cloud project with the Vertex AI API enabled
A Supabase project (stores incident records and recovery actions)
A Langfuse account for LLM telemetry
An Inngest account for durable workflow orchestration
An OpenAI API key (used by @instructor-ai/instructor for structured output repair)
Familiarity with TypeScript, Next.js App Router route handlers, and basic Zod schema definitions
Step 1: Scaffold the project and install dependencies
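Start with a fresh Next.js project using the App Router. The scaffold provides the TypeScript config, ESLint, and a clean lockfile. Note that create-next-app does not set up Vitest, so the test runner and its coverage provider come from the dependencies you install afterwards. Once the shell is in place, install all the reliability packages.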
terminal
pnpm create next-app vertex-ai-reliability-suite --ts --eslint --app --src-dir --import-alias "@/*" --use-pnpm --no-tailwind --no-turbopack
cd vertex-ai-reliability-suite
Next, add the REAA reliability packages and third-party dependencies. Every version is pinned exactly — no ^ or ~.
Expected output: pnpm install resolves the lockfile with no warnings, and your package.json dependencies section shows every package at the exact version above.
Now add these scripts to package.json under the "scripts" key:
json
"typecheck": "tsc --noEmit",
"lint": "eslint .",
"test": "vitest run --coverage --reporter=json --outputFile=vitest-report.json"
Step 2: Configure environment variables
The recipe reads seventeen environment variables spanning Vertex AI, Supabase, Langfuse, Inngest, OpenAI, and reliability tuning knobs. Create or update .env.example with the entries below. Never commit real credentials — use angle-bracket placeholders.
env
# Env vars used by vertex-ai-reliability-suite-for-smb-agent-operations.
# The builder adds entries here as it wires up each integration.
# Keep placeholders only; never commit real values.
NODE_ENV=development

# Google Cloud / Vertex AI
GOOGLE_CLOUD_PROJECT=<your-gcp-project-id>
GOOGLE_CLOUD_LOCATION=us-central1
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json

# Supabase
SUPABASE_URL=<your-supabase-url>
SUPABASE_ANON_KEY=<your-supabase-anon-key>

# Langfuse telemetry
LANGFUSE_PUBLIC_KEY=<your-langfuse-public-key>
LANGFUSE_SECRET_KEY=<your-langfuse-secret-key>
LANGFUSE_HOST=https://cloud.langfuse.com

# Inngest durable workflows
INNGEST_EVENT_KEY=<your-inngest-event-key>
INNGEST_SIGNING_KEY=<your-inngest-signing-key>

# OpenAI (peer dep of @instructor-ai/instructor, used for structured output repair)
OPENAI_API_KEY=<your-openai-key>

# Reliability configuration
RELIABILITY_CIRCUIT_BREAKER_THRESHOLD=5
RELIABILITY_CIRCUIT_BREAKER_WINDOW_MS=60000
RELIABILITY_IDEMPOTENCY_TTL_MS=86400000
RELIABILITY_MAX_RETRIES=3
RELIABILITY_CONCURRENCY_LIMIT=5
Expected output: Your .env.example contains every process.env.X that the source files reference later — seventeen variables across six integration groups.
Step 3: Create shared types with Zod
The types layer drives schema validation across all the services. Create src/types/index.ts with Zod schemas for tool calls, incident severity, reliability config, and TypeScript interfaces for the webhook payload and execution results.
Expected output: TypeScript picks up the Zod schemas without errors. The ReliabilityConfigSchema.default() calls mean you can pass a partial config and get sensible defaults (5 failures, 60s window, 24h TTL, 3 retries, 5 concurrency).
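The default-filling behaviour described above can be illustrated without Zod. This is a minimal plain-TypeScript sketch, not the tutorial's schema: `circuitBreakerThreshold` and `circuitBreakerWindowMs` are the field names used by the circuit breaker service in Step 5, while `idempotencyTtlMs`, `maxRetries`, `concurrencyLimit`, and the helper `resolveReliabilityConfig` are hypothetical names chosen for illustration.

```typescript
// Plain-TypeScript stand-in for the defaults that ReliabilityConfigSchema supplies.
interface ReliabilityConfig {
  circuitBreakerThreshold: number;
  circuitBreakerWindowMs: number;
  idempotencyTtlMs: number; // assumed field name
  maxRetries: number; // assumed field name
  concurrencyLimit: number; // assumed field name
}

// Defaults quoted in the tutorial: 5 failures, 60s window, 24h TTL, 3 retries, 5 concurrency.
const DEFAULTS: ReliabilityConfig = {
  circuitBreakerThreshold: 5,
  circuitBreakerWindowMs: 60_000,
  idempotencyTtlMs: 86_400_000,
  maxRetries: 3,
  concurrencyLimit: 5,
};

// Hypothetical helper: merge a partial config over the defaults and reject bad values.
function resolveReliabilityConfig(
  partial: Partial<ReliabilityConfig> = {},
): ReliabilityConfig {
  const merged: ReliabilityConfig = { ...DEFAULTS };
  for (const key of Object.keys(DEFAULTS) as (keyof ReliabilityConfig)[]) {
    const value = partial[key];
    if (value === undefined) continue; // keep the default
    if (!Number.isInteger(value) || value <= 0) {
      throw new RangeError(`${key} must be a positive integer`);
    }
    merged[key] = value;
  }
  return merged;
}
```

Calling `resolveReliabilityConfig({ maxRetries: 7 })` overrides only the retry count and leaves the other four values at their defaults, which is the same effect the schema-level defaults are meant to give.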
Step 4: Create the Vertex AI client
The Vertex AI client wraps the @google-cloud/vertexai SDK (version 1.12.0) in a typed singleton. It provides methods for basic text generation, tool-augmented generation, streaming, and token counting. Each method rethrows failures as a custom VertexAIError with code and originalError fields, so the reliability middleware can distinguish SDK failures from business logic errors.
Expected output: pnpm typecheck passes. The vertexClient singleton is ready to call generateContent('gemini-2.5-flash', 'Hello') against your GCP project. The private extractText() method safely unwraps the nested candidates[0].content.parts[0].text chain and throws VertexAIError with code EMPTY_RESPONSE when any level is undefined.
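The safe unwrap described above can be sketched as follows. This is an illustration under assumptions, not the tutorial's client: `GenerateContentResponseLike` is a hypothetical type that models only the fields the unwrap touches, and the error message text is invented. The `code` and `originalError` fields and the `EMPTY_RESPONSE` code come from the description above.

```typescript
// Hypothetical minimal response shape; the real SDK response has more fields.
interface GenerateContentResponseLike {
  candidates?: Array<{
    content?: { parts?: Array<{ text?: string }> };
  }>;
}

// Error carrying a machine-readable code plus the underlying failure, if any.
class VertexAIError extends Error {
  code: string;
  originalError?: unknown;

  constructor(message: string, code: string, originalError?: unknown) {
    super(message);
    this.name = "VertexAIError";
    this.code = code;
    this.originalError = originalError;
  }
}

// Optional chaining short-circuits to undefined if any level is missing.
function extractText(response: GenerateContentResponseLike): string {
  const text = response.candidates?.[0]?.content?.parts?.[0]?.text;
  if (text === undefined) {
    throw new VertexAIError("Model returned no text", "EMPTY_RESPONSE");
  }
  return text;
}
```

Because every step of the chain uses `?.`, an empty `candidates` array, a missing `content`, and a missing `text` all end up in the same `EMPTY_RESPONSE` branch, so callers only need to check one error code.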
Step 5: Create the circuit breaker service
The circuit breaker service manages named breakers backed by @reaatech/circuit-breaker-agents. Each breaker tracks failures up to a configurable threshold, then opens to stop further calls until a recovery timeout elapses. The service stores breakers in a Map and creates them lazily with in-memory persistence.
Create src/services/circuit-breaker-service.ts:
ts
import { CircuitBreaker, CircuitOpenError, InMemoryAdapter } from "@reaatech/circuit-breaker-agents"
import type { ReliabilityConfig } from "../types/index.js"
import { ReliabilityConfigSchema } from "../types/index.js"

export class CircuitBreakerService {
  private breakers: Map<string, CircuitBreaker> = new Map()
  private config: ReliabilityConfig

  constructor(config?: Partial<ReliabilityConfig>) {
    // Missing fields fall back to the schema defaults.
    this.config = ReliabilityConfigSchema.parse(config ?? {})
  }

  // Breakers are created lazily, one per name, and reused afterwards.
  getOrCreateBreaker(name: string): CircuitBreaker {
    const existing = this.breakers.get(name)
    if (existing) return existing
    const breaker = new CircuitBreaker({
      name,
      failureThreshold: this.config.circuitBreakerThreshold,
      recoveryTimeoutMs: this.config.circuitBreakerWindowMs,
      persistence: new InMemoryAdapter(),
    })
    this.breakers.set(name, breaker)
    return breaker
  }

  async executeWithBreaker<T>(name: string, fn: () => Promise<T>): Promise<T> {
    const breaker = this.getOrCreateBreaker(name)
    try {
      return await breaker.execute(fn)
    } catch (error) {
      // Re-throw open-circuit failures with the breaker name in the message.
      if (error instanceof CircuitOpenError) {
        const enriched = new Error(`Circuit breaker "${name}" is open`)
        enriched.name = "CircuitOpenError"
        throw enriched
      }
      throw error
    }
  }
}

export const circuitBreakerService = new CircuitBreakerService()
Expected output: When a breaker’s failureThreshold is 5 and the handler fails five times, the sixth call throws CircuitOpenError immediately without invoking the handler.
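To make the threshold behaviour concrete, here is a simplified, dependency-free breaker. It is a sketch of the general closed / open / half-open pattern and is not the @reaatech/circuit-breaker-agents implementation; `SimpleBreaker`, its constructor parameters, and the injectable `now` clock are hypothetical names used only for illustration.

```typescript
type BreakerState = "CLOSED" | "OPEN" | "HALF_OPEN";

class CircuitOpenError extends Error {
  constructor(breakerName: string) {
    super(`Circuit breaker "${breakerName}" is open`);
    this.name = "CircuitOpenError";
  }
}

// Simplified breaker: counts consecutive failures and fails fast while open.
class SimpleBreaker {
  private state: BreakerState = "CLOSED";
  private failures = 0;
  private openedAt = 0;
  private readonly name: string;
  private readonly failureThreshold: number;
  private readonly recoveryTimeoutMs: number;
  private readonly now: () => number;

  constructor(
    name: string,
    failureThreshold: number,
    recoveryTimeoutMs: number,
    now: () => number = () => Date.now(), // injectable clock for tests
  ) {
    this.name = name;
    this.failureThreshold = failureThreshold;
    this.recoveryTimeoutMs = recoveryTimeoutMs;
    this.now = now;
  }

  async execute<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === "OPEN") {
      if (this.now() - this.openedAt < this.recoveryTimeoutMs) {
        throw new CircuitOpenError(this.name); // fail fast; fn is never invoked
      }
      this.state = "HALF_OPEN"; // timeout elapsed: let one probe call through
    }
    try {
      const result = await fn();
      this.state = "CLOSED"; // any success resets the breaker
      this.failures = 0;
      return result;
    } catch (error) {
      this.failures += 1;
      if (this.state === "HALF_OPEN" || this.failures >= this.failureThreshold) {
        this.state = "OPEN";
        this.openedAt = this.now();
      }
      throw error;
    }
  }
}
```

With a threshold of 5, five failing calls open the breaker and the sixth call is rejected without running the handler. Once the recovery timeout has elapsed, the next call acts as a probe: success closes the breaker, failure reopens it.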
Step 6: Create the idempotency service
The idempotency service prevents duplicate execution of side-effecting operations. It wraps @reaatech/idempotency-middleware with a lazy-initialised MemoryAdapter and IdempotencyMiddleware. The first call with a given key runs the handler and caches the result; subsequent identical keys return the cached value. On IdempotencyError, the service retries once if isRecoverable() returns true.
Expected output: Two identical calls with the same key invoke the handler only once; the second call returns the cached result. A recoverable error (like a storage timeout) triggers a retry automatically.
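The caching and retry-once behaviour can be sketched without the library. The code below is an assumption-laden illustration, not the @reaatech/idempotency-middleware API: `IdempotencyCache` and its parameters are hypothetical, and recoverability is detected by duck-typing an `isRecoverable()` method on the thrown value, mirroring the description above.

```typescript
// True when the thrown value exposes isRecoverable() and it returns true.
function isRecoverable(error: unknown): boolean {
  if (typeof error !== "object" || error === null) return false;
  const probe = (error as { isRecoverable?: unknown }).isRecoverable;
  return typeof probe === "function" && probe.call(error) === true;
}

// Hypothetical in-memory idempotency cache with a per-entry TTL.
class IdempotencyCache {
  private readonly entries = new Map<string, { value: unknown; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  async execute<T>(key: string, handler: () => Promise<T>): Promise<T> {
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > this.now()) {
      return hit.value as T; // duplicate request: return the cached result
    }
    let value: T;
    try {
      value = await handler();
    } catch (error) {
      if (!isRecoverable(error)) throw error;
      value = await handler(); // exactly one retry for recoverable errors
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}
```

This sketch does not guard against two concurrent calls with the same key, which would both run the handler. A production middleware would typically also track in-flight keys or take a lock in its storage adapter.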
Step 7: Create structured output repair and runbook services
Two more supporting services complete the reliability puzzle. The structured output repair service parses raw LLM output against a Zod schema; if parsing fails, it hands the output to @instructor-ai/instructor using gpt-5.2 to repair it. The runbook service wraps @reaatech/agent-runbook-incident to generate incident workflows, escalation policies, and notification templates — plus a determineSeverity() method that maps failure counts to SEV1–SEV4.
Create src/services/structured-output.ts:
ts
import createInstructor from "@instructor-ai/instructor"
import OpenAI from "openai"
import { z } from "zod"
import pRetry from "p-retry"

type AnyZodObject = z.ZodObject

// Minimal structural type for the Instructor-wrapped client used below.
type InstructorClient = {
  chat: { completions: { create: (p: unknown) => Promise<unknown> } }
}

export class StructuredOutputRepair {
  private client: InstructorClient

  constructor() {
    const oai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY ?? "" })
    this.client = createInstructor({ client: oai, mode: "TOOLS" })
  }

  async repair<T>(rawOutput: string, schema: AnyZodObject): Promise<T> {
    // Fast path: output that is valid JSON and satisfies the schema needs no repair.
    try {
      const parsed: unknown = JSON.parse(rawOutput)
      const result = schema.safeParse(parsed)
      if (result.success) {
        return result.data as T
      }
    } catch {
      // JSON parse failed; fall through to Instructor repair
    }
    // Slow path: ask the model to re-emit the output in the schema's shape.
    const completion = await this.client.chat.completions.create({
      messages: [{ role: "user", content: rawOutput }],
      model: "gpt-5.2",
      response_model: { schema, name: "RepairedOutput" },
    })
    return completion as T
  }

  async repairWithRetries<T>(
    rawOutput: string,
    schema: AnyZodObject,
    maxRetries?: number,
  ): Promise<T> {
    return pRetry(() => this.repair<T>(rawOutput, schema), {
      retries: maxRetries ?? 3,
    })
  }
}

export const outputRepair = new StructuredOutputRepair()
Expected output: outputRepair.repair('{"text":"hello"}', z.object({ text: z.string() })) resolves to { text: "hello" } without calling OpenAI. Malformed JSON, or valid JSON that fails schema validation, falls through to the Instructor repair call.
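Step 7 also mentions a determineSeverity() method that maps failure counts to SEV1 through SEV4. The tutorial text does not state the cut-off values, so the thresholds below are purely hypothetical and exist only to show the shape of such a mapping. The real runbook service may use different numbers or additional inputs.

```typescript
type Severity = "SEV1" | "SEV2" | "SEV3" | "SEV4";

// Hypothetical thresholds, ordered from most to least severe.
const SEVERITY_THRESHOLDS: ReadonlyArray<{ minFailures: number; severity: Severity }> = [
  { minFailures: 20, severity: "SEV1" },
  { minFailures: 10, severity: "SEV2" },
  { minFailures: 5, severity: "SEV3" },
];

// Return the first (most severe) level whose threshold the count reaches.
function determineSeverity(failureCount: number): Severity {
  for (const { minFailures, severity } of SEVERITY_THRESHOLDS) {
    if (failureCount >= minFailures) return severity;
  }
  return "SEV4"; // below every threshold: lowest severity
}
```

Keeping the thresholds in a single ordered table makes it easy to tune them per deployment without touching the lookup logic.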
Step 8: Create supporting libraries
Three library modules wire together the infrastructure integrations: Supabase for incident records, Langfuse for LLM telemetry, and a pricing calculator for Gemini cost estimation.
Create src/lib/supabase.ts:
ts
import { createClient } from "@supabase/supabase-js";

export const supabase = createClient(
  process.env.SUPABASE_URL as string,
  process.env.SUPABASE_ANON_KEY as string,
);

export const incidents = () => supabase.from("incidents");
export const recoveryActions = () => supabase.from("recovery_actions");
export * from "./types/index.js";
export * from "./lib/supabase.js";
export * from "./lib/langfuse.js";
export * from "./lib/pricing.js";
Expected output: pnpm typecheck passes. The traceCall utility wraps any async function with Langfuse observability, and VertexPricingProvider computes Gemini Flash at $0.00015/1K input tokens and $0.0006/1K output tokens.
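The per-1K rates quoted above translate into a simple cost formula. The function below is a minimal sketch of that arithmetic and is not the tutorial's VertexPricingProvider; `estimateCostUsd` and the constant names are hypothetical.

```typescript
// Rates quoted in the tutorial for Gemini Flash, in USD per 1,000 tokens.
const INPUT_USD_PER_1K = 0.00015;
const OUTPUT_USD_PER_1K = 0.0006;

// Hypothetical helper: estimated USD cost of one call from its token counts.
function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  if (inputTokens < 0 || outputTokens < 0) {
    throw new RangeError("token counts must be non-negative");
  }
  return (
    (inputTokens / 1000) * INPUT_USD_PER_1K +
    (outputTokens / 1000) * OUTPUT_USD_PER_1K
  );
}
```

For example, a call with 10,000 input tokens and 2,000 output tokens costs about 10 × $0.00015 + 2 × $0.0006 = $0.0027. Floating-point rounding means the computed value should be compared with a tolerance rather than with strict equality.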
Step 9: Compose the reliability middleware
The ReliabilityMiddleware class chains all four reliability layers into a single callWithReliability method. The chain runs: idempotency check → circuit breaker → Vertex AI call → structured output repair. It also exposes callWithRetry (powered by p-retry) and callWithConcurrencyLimit (powered by p-limit).
Expected output: Calling reliabilityMiddleware.callWithReliability(...) runs the full chain. The first call with a given idempotency key invokes Vertex AI; the second returns the cached result. If the circuit breaker is open, the error is not thrown to the caller but returned as { success: false, error: ... }.
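The ordering of the chain can be sketched with the four layers passed in as plain functions. This is an illustrative composition, not the tutorial's ReliabilityMiddleware: the `Layers` interface, its member names, and the `data` field on the success result are hypothetical, while the `{ success: false, error }` failure shape follows the description above.

```typescript
type ReliabilityResult<T> =
  | { success: true; data: T }
  | { success: false; error: string };

// Hypothetical layer signatures; each real service would be adapted to these.
interface Layers<T> {
  idempotent: (key: string, fn: () => Promise<T>) => Promise<T>;
  withBreaker: (name: string, fn: () => Promise<string>) => Promise<string>;
  callModel: (prompt: string) => Promise<string>;
  repair: (raw: string) => Promise<T>;
}

// Chain: idempotency check -> circuit breaker -> model call -> output repair.
async function callWithReliability<T>(
  layers: Layers<T>,
  idempotencyKey: string,
  breakerName: string,
  prompt: string,
): Promise<ReliabilityResult<T>> {
  try {
    const data = await layers.idempotent(idempotencyKey, async () => {
      const raw = await layers.withBreaker(breakerName, () => layers.callModel(prompt));
      return layers.repair(raw);
    });
    return { success: true, data };
  } catch (error) {
    // Failures from any layer, including an open breaker, become a result value.
    const message = error instanceof Error ? error.message : String(error);
    return { success: false, error: message };
  }
}
```

Putting idempotency outermost means a cached result is returned without touching the breaker or the model, and only fully repaired outputs are ever cached.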
Step 10: Create the Inngest retry orchestrator
The Inngest workflow handles reliability/circuit-breaker-tripped events sent by the webhook. It runs four durable steps:
assess-severity — evaluates the incident via runbookService.evaluateIncident() and writes a record to Supabase
backoff — sleeps for a severity-based duration (SEV1=5s, SEV2=15s, SEV3=30s, SEV4=60s)
attempt-recovery — executes a health check through the circuit breaker
notify-resolution or escalate — on success, applies the resolution notification template; on failure, calls getEscalationPolicy() and updates Supabase with the escalation status
Expected output: The inngestClient is a configured Inngest instance, and handleCircuitBreakerTripped is registered as a function that listens for reliability/circuit-breaker-tripped events.
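The backoff, recovery, and resolution-or-escalation steps listed above can be sketched against a small stand-in for Inngest's step tools. The `StepTools` interface below is a hypothetical local type that mirrors only the `run` and `sleep` calls used here, and `recoverFromTrip` is an invented name. The assess-severity step is left out, so severity is passed in directly. The backoff durations are the ones listed in the steps.

```typescript
type Severity = "SEV1" | "SEV2" | "SEV3" | "SEV4";
type Outcome = "resolved" | "escalated";

// Local stand-in for the subset of durable step tools this sketch needs.
interface StepTools {
  run<T>(id: string, fn: () => Promise<T>): Promise<T>;
  sleep(id: string, ms: number): Promise<void>;
}

// Severity-based backoff from the workflow description, in milliseconds.
const BACKOFF_MS: Record<Severity, number> = {
  SEV1: 5_000,
  SEV2: 15_000,
  SEV3: 30_000,
  SEV4: 60_000,
};

async function recoverFromTrip(
  step: StepTools,
  severity: Severity,
  healthCheck: () => Promise<boolean>,
): Promise<Outcome> {
  await step.sleep("backoff", BACKOFF_MS[severity]);

  // A throwing health check counts as "still unhealthy".
  const healthy = await step.run("attempt-recovery", async () => {
    try {
      return await healthCheck();
    } catch {
      return false;
    }
  });

  // Placeholder bodies: the real steps render templates or update Supabase.
  const outcome: Outcome = healthy ? "resolved" : "escalated";
  return step.run(healthy ? "notify-resolution" : "escalate", async () => outcome);
}
```

More severe incidents wait less before the recovery attempt, so a SEV1 trip is probed after 5 seconds while a SEV4 trip waits a full minute.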
Step 11: Wire up the webhook API route
The webhook route at POST /api/runbook/webhook receives circuit-breaker state change alerts. When a breaker trips to OPEN state, the handler:
Evaluates the incident using runbookService.evaluateIncident()
Renders notification templates via runbookService.applyTemplate()
Sends an Inngest event to kick off the durable recovery workflow
Expected output: A GET to http://localhost:3000/api/runbook/webhook returns {"status":"ok"}. A POST with a valid payload returns {"received":true,"severity":"SEV3","incidentId":"...","templates":[...]}.
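The decision at the heart of the webhook, reacting only when a breaker trips to OPEN, can be isolated as a pure function. The payload shape below is hypothetical (the tutorial's real payload interface lives in src/types/index.ts and may differ), as are the helper names. The event name reliability/circuit-breaker-tripped is the one the Inngest workflow in Step 10 listens for.

```typescript
// Hypothetical webhook payload; field names are assumptions for illustration.
interface BreakerWebhookPayload {
  breakerName: string;
  state: "CLOSED" | "OPEN" | "HALF_OPEN";
  failureCount: number;
}

interface RecoveryEvent {
  name: "reliability/circuit-breaker-tripped";
  data: { breakerName: string; failureCount: number };
}

// Runtime guard: untrusted request bodies must be checked before use.
function isBreakerPayload(body: unknown): body is BreakerWebhookPayload {
  if (typeof body !== "object" || body === null) return false;
  const candidate = body as Record<string, unknown>;
  const state = candidate["state"];
  return (
    typeof candidate["breakerName"] === "string" &&
    (state === "CLOSED" || state === "OPEN" || state === "HALF_OPEN") &&
    typeof candidate["failureCount"] === "number"
  );
}

// Build the recovery event only for a valid payload whose state is OPEN.
function toRecoveryEvent(body: unknown): RecoveryEvent | null {
  if (!isBreakerPayload(body) || body.state !== "OPEN") return null;
  return {
    name: "reliability/circuit-breaker-tripped",
    data: { breakerName: body.breakerName, failureCount: body.failureCount },
  };
}
```

Keeping this logic separate from the route handler makes it straightforward to unit-test without mocking the HTTP layer. The handler would then send the returned event to Inngest and respond with the severity and templates.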
Step 12: Run the tests
The test suite covers all services, the middleware, the webhook route, the Inngest workflow, and the three library modules. Every external dependency is mocked with vi.mock so tests never hit a live network.
terminal
pnpm test
Expected output: All 71 tests pass with code coverage above 90% across all metrics.
Next steps
Swap in-memory storage for distributed persistence: the circuit breaker and idempotency layers currently use in-memory adapters. For horizontal scaling, replace them with Redis or DynamoDB adapters from @reaatech/circuit-breaker-agents and @reaatech/idempotency-middleware.
Add a Slack or PagerDuty notification channel: extend the webhook handler to send the rendered notification templates to a real messaging platform when a SEV1 or SEV2 incident is detected.
Deploy the Inngest workflow to production: set INNGEST_EVENT_KEY and INNGEST_SIGNING_KEY in your hosting environment. Before deploying, run the Inngest dev server alongside Next.js locally to watch the durable workflow execute with retries and escalation.