Small businesses using multiple AI agents on Vertex AI often exceed their monthly budget due to unpredictable model calls and no per-agent cost controls.
You’ll build an Express server that acts as a budget guard for any AI agent calling Vertex AI. Every LLM request passes through a middleware that checks per-scope spending limits, attaches budget headers to the response, and blocks requests when the limit is hit. By the end you’ll have a working server you can start with pnpm dev, a /api/llm endpoint that enforces budget rules, a /metrics endpoint for real-time cost dashboards, and a full test suite with 90%+ coverage.
Prerequisites
Node.js >= 22 — the project uses ES2022 and "type": "module"
pnpm 10.x — the lockfile was generated with pnpm 10.9.0
A Google Cloud project with the Vertex AI API enabled — you’ll need your project ID and a service account with Vertex AI permissions
Application Default Credentials — run gcloud auth application-default login so the Vertex AI SDK can authenticate
Familiarity with TypeScript, Express, and environment variables
Step 1: Scaffold the project
Create a new directory and a package.json that declares the project as an ES module with the exact scripts and metadata the recipe needs.
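A minimal package.json along these lines would work. Treat it as a sketch: the script commands (tsx, vitest flags) and the "latest" version placeholders are assumptions; pin versions to match the published REAA packages and the recipe's lockfile.

```json
{
  "name": "vertex-budget-guardrails",
  "version": "1.0.0",
  "type": "module",
  "scripts": {
    "dev": "tsx watch src/index.ts",
    "build": "tsc",
    "start": "node dist/index.js",
    "test": "vitest run --coverage"
  },
  "dependencies": {
    "@reaatech/agent-budget-engine": "latest",
    "@reaatech/agent-budget-pricing": "latest",
    "@reaatech/agent-budget-spend-tracker": "latest",
    "@reaatech/agent-budget-llm-router-plugin": "latest",
    "@reaatech/agent-budget-middleware": "latest",
    "@reaatech/agent-eval-harness-cost": "latest",
    "@google-cloud/vertexai": "latest",
    "express": "^5.0.0"
  },
  "devDependencies": {
    "typescript": "latest",
    "vitest": "latest",
    "@vitest/coverage-v8": "latest",
    "msw": "latest",
    "supertest": "latest"
  }
}
```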
Step 2: Install dependencies
Run pnpm install from the project root. This pulls in everything: the REAA budget packages (@reaatech/agent-budget-engine, agent-budget-pricing, agent-budget-spend-tracker, agent-budget-llm-router-plugin, agent-budget-middleware, agent-eval-harness-cost), the Vertex AI SDK, Express 5, and all the dev tooling (TypeScript, Vitest, ESLint, Prettier, MSW, supertest).
terminal
pnpm install
Expected output: pnpm resolves and installs all dependencies, then prints a summary like Done in Xs. You’ll see node_modules/ appear, along with a pnpm-lock.yaml lockfile at the project root.
Step 3: Configure TypeScript
The project targets ES2022 with NodeNext module resolution and strict mode enabled. Create tsconfig.json at the project root:
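A tsconfig.json consistent with those settings looks roughly like this; the outDir, rootDir, and include globs are assumptions, so adjust them to your layout:

```json
{
  "compilerOptions": {
    "target": "ES2022",
    "module": "NodeNext",
    "moduleResolution": "NodeNext",
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true,
    "outDir": "dist",
    "rootDir": "src"
  },
  "include": ["src/**/*.ts"]
}
```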
Step 4: Configure environment variables
The server reads its configuration from the environment. Create .env at the project root with every variable the app expects, and fill in your actual GCP project ID.
env
# Server
PORT=3000

# Vertex AI (required — get from GCP console)
PROJECT_ID=your-gcp-project-id
LOCATION=us-central1
DEFAULT_MODEL_ID=gemini-2.0-flash

# Budget
DEFAULT_BUDGET_LIMIT=100.00
BUDGET_SCOPE_TYPE_HEADER=x-budget-scope-type
BUDGET_SCOPE_KEY_HEADER=x-budget-scope-key

# Admin
ADMIN_TOKEN=change-me-to-a-secure-random-string
LOG_LEVEL=info
Don’t skip PROJECT_ID — the server throws a descriptive error on startup if it’s missing or empty. ADMIN_TOKEN protects the /metrics endpoint; replace it with a secure random string before deploying.
Step 5: Create the configuration loader
The config module reads every environment variable, applies defaults, validates required fields, and returns a frozen typed object. Create src/config.ts:
parseIntSafe and parseFloatSafe guard against garbage values — a non-numeric PORT silently falls back to 3000, and a non-numeric DEFAULT_BUDGET_LIMIT falls back to 100.0.
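Since the recipe does not reproduce the full file, here is a minimal sketch of what src/config.ts could look like. The field names (port, projectId, defaultModelId, defaultBudgetLimit, adminToken) are inferred from how the rest of the recipe uses the config object; treat this as a starting point, not the canonical implementation.

```typescript
// Sketch of src/config.ts: read env vars, apply defaults, validate required
// fields, and return a frozen typed object. Field names are assumptions
// based on usage elsewhere in the recipe.
export interface AppConfig {
  port: number;
  projectId: string;
  location: string;
  defaultModelId: string;
  defaultBudgetLimit: number;
  adminToken: string;
}

export function parseIntSafe(raw: string | undefined, fallback: number): number {
  const n = Number.parseInt(raw ?? '', 10);
  return Number.isNaN(n) ? fallback : n;
}

export function parseFloatSafe(raw: string | undefined, fallback: number): number {
  const n = Number.parseFloat(raw ?? '');
  return Number.isNaN(n) ? fallback : n;
}

export function loadConfig(
  env: Record<string, string | undefined> = process.env,
): Readonly<AppConfig> {
  const projectId = env.PROJECT_ID ?? '';
  if (projectId.trim() === '') {
    // Fail fast with a descriptive error, as described in Step 4
    throw new Error('PROJECT_ID is required: set it in .env to your GCP project ID');
  }
  return Object.freeze({
    port: parseIntSafe(env.PORT, 3000),
    projectId,
    location: env.LOCATION ?? 'us-central1',
    defaultModelId: env.DEFAULT_MODEL_ID ?? 'gemini-2.0-flash',
    defaultBudgetLimit: parseFloatSafe(env.DEFAULT_BUDGET_LIMIT, 100.0),
    adminToken: env.ADMIN_TOKEN ?? '',
  });
}
```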
Step 6: Create shared types
This module re-exports error types from @reaatech/agent-budget-types, declares the BudgetContext that rides on every Express request, defines the MetricsResponse shape, and creates a custom VertexApiError. Create src/types.ts:
ts
import type { BudgetScope } from '@reaatech/agent-budget-types';
export { BudgetScope } from '@reaatech/agent-budget-types';
export {
  BudgetExceededError,
  BudgetValidationError,
  BudgetError,
  BudgetErrorCode,
  type ScopeIdentifier,
  type BudgetState,
  type BudgetCheckResult,
} from '@reaatech/agent-budget-types';

export interface BudgetContext {
  scopeType: BudgetScope;
  scopeKey: string;
  remainingBudget: number;
  allowed: boolean;
  suggestedModel?: string;
  modelId?: string;
  disabledTools?: string[];
  warning?: string;
}

declare global {
  namespace Express {
    interface Request {
      budgetContext?: BudgetContext;
    }
  }
}

export type BudgetRequest = import('express').Request;

export interface MetricsResponse {
  totalSpent: number;
  perScopeBreakdown: Record<string, { spent: number; limit: number; state: string }>;
  efficiencyScores: Record<string, number>;
  trajectoryCount: number;
}

export class VertexApiError extends Error {
  public readonly statusCode: number;
  public readonly vertexCode: string | undefined;

  public constructor(message: string, statusCode: number, vertexCode?: string) {
    super(message);
    this.name = 'VertexApiError';
    this.statusCode = statusCode;
    this.vertexCode = vertexCode;
  }
}
The declare global block augments Express’s Request type so every route handler can access req.budgetContext without casting.
Step 7: Create the Vertex AI client wrapper
The VertexClient class wraps the @google-cloud/vertexai SDK. It initializes with your project and location, then exposes a generateContent method that handles response parsing and error normalization. Create src/lib/vertex-client.ts:
Any Vertex API error is caught and re-thrown as a VertexApiError with a known HTTP status code, which the Express error-handling chain can surface cleanly.
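The normalization step can be sketched as below. The VertexApiError class mirrors the one from src/types.ts; the message-to-status mapping is an illustrative assumption, not the SDK's actual error taxonomy.

```typescript
// Sketch of the error-normalization logic described above. The mapping from
// error messages to HTTP status codes is assumed for illustration.
class VertexApiError extends Error {
  constructor(
    message: string,
    public readonly statusCode: number,
    public readonly vertexCode?: string,
  ) {
    super(message);
    this.name = 'VertexApiError';
  }
}

function toVertexApiError(err: unknown): VertexApiError {
  if (err instanceof VertexApiError) return err;
  const message = err instanceof Error ? err.message : String(err);
  if (/permission|credential|unauthenticated/i.test(message)) {
    return new VertexApiError(message, 403, 'PERMISSION_DENIED');
  }
  if (/quota|rate limit/i.test(message)) {
    return new VertexApiError(message, 429, 'RESOURCE_EXHAUSTED');
  }
  // Anything else surfaces as a generic upstream failure
  return new VertexApiError(message, 502);
}
```

With this shape, the Express error handler can branch on err instanceof VertexApiError and use err.statusCode directly.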
Step 8: Create the budget middleware
This is the core of the recipe. The middleware factory wires together all the REAA packages — SpendStore for tracking, PricingEngine for cost calculation, BudgetController for policy enforcement, and BudgetAwareStrategy for model routing — into a pair of Express middleware functions. Create src/middleware/budget.ts:
ts
import { BudgetController } from '@reaatech/agent-budget-engine';
import { SpendStore } from '@reaatech/agent-budget-spend-tracker';
import { PricingEngine } from '@reaatech/agent-budget-pricing';
import { BudgetAwareStrategy } from '@reaatech/agent-budget-llm-router-plugin';
import { BudgetExceededError, BudgetScope, BudgetValidationError } from '@reaatech/agent-budget-types';
import type { BudgetCheckResult } from '@reaatech/agent-budget-types';
import type { Request, Response, NextFunction } from 'express';
import { VertexClient } from '../lib/vertex-client.js';
import type { AppConfig } from '../config.js';
import type { BudgetContext } from '../types.js';
Here’s what happens on each request:
beforeStep reads x-budget-scope-type and x-budget-scope-key from the request headers, maps them to a BudgetScope, runs a pre-flight cost check through the BudgetController, and sets response headers (X-Budget-Remaining, X-Budget-Limit, X-Budget-Status, and optionally X-Budget-Suggested-Model). If the budget is exceeded it returns HTTP 402 immediately — your route handler never runs.
BudgetAwareStrategy consults the pricing registry and suggests a cheaper model when the budget is constrained.
afterStep runs after your handler completes and records the spend (input tokens, output tokens, computed cost) into the SpendStore for future checks.
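The spend that afterStep records boils down to token counts multiplied by per-token prices. A sketch of that arithmetic follows; the per-million-token prices used in the usage note are placeholder numbers, not real Vertex AI rates.

```typescript
// Illustrative cost math behind afterStep: input/output tokens times
// per-million-token prices. Prices here are placeholders, not real rates.
interface ModelPricing {
  inputPerMillionTokens: number;  // USD per 1M input tokens
  outputPerMillionTokens: number; // USD per 1M output tokens
}

function computeCost(pricing: ModelPricing, inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * pricing.inputPerMillionTokens +
    (outputTokens / 1_000_000) * pricing.outputPerMillionTokens
  );
}
```

A pre-flight check in beforeStep can apply the same formula to an estimated token count and compare the result against the remaining budget before the request ever reaches Vertex.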
Step 9: Create the metrics endpoint
The /metrics route reads from the BudgetController and CostTracker to produce a real-time JSON dashboard. It’s protected by an admin token. Create src/lib/metrics.ts:
ts
import { Router, type Request, type Response } from 'express';
import { BudgetController } from '@reaatech/agent-budget-engine';
import { SpendStore } from '@reaatech/agent-budget-spend-tracker';
import { CostTracker } from '@reaatech/agent-eval-harness-cost';
import type { MetricsResponse } from '../types.js';
import type { AppConfig } from '../config.js';

export function createMetricsRouter(
  _store: SpendStore,
  controller: BudgetController,
  appConfig: Readonly<AppConfig>,
): Router {
  const router = Router();
  const costTracker = new CostTracker(appConfig.defaultBudgetLimit);

  router.get('/metrics', (req: Request, res: Response): void => {
    // Check admin authorization
    const authHeader = req.headers.authorization;
    if (!authHeader) {
      res.status(401).json({ error: 'Missing authorization header' });
      return;
    }
    const token = authHeader.startsWith('Bearer ') ? authHeader.slice(7) : authHeader;
    if (token !== appConfig.adminToken) {
      res.status(401).json({ error: 'Invalid admin token' });
      return;
    }

    try {
      // Query all budgets from controller
      const allBudgets = controller.listAll();
      let totalSpent = 0;
      const perScopeBreakdown: Record<string, { spent: number; limit: number; state: string }> = {};
      for (const entry of allBudgets) {
        const def = entry.definition;
        const state = entry.state;
        const spent = state?.spent ?? 0;
        const limitVal = def.limit;
        const stateStr = state?.state ?? 'active';
        totalSpent += spent;
        const key = `${def.scopeType}:${def.scopeKey}`;
        perScopeBreakdown[key] = { spent, limit: limitVal, state: stateStr };
      }

      // Get cost tracker data
      const trackedTotal = costTracker.getTotalCost();
      const trajectoryCount = costTracker.getTrajectoryCount();

      // Efficiency scores - simple heuristic based on spend vs limit
      const efficiencyScores: Record<string, number> = {};
      for (const entry of allBudgets) {
        const def = entry.definition;
        const state = entry.state;
        if (state !== undefined) {
          const efficiency =
            def.limit > 0 ? Math.min(100, Math.round((1 - state.spent / def.limit) * 100)) : 100;
          efficiencyScores[`${def.scopeType}:${def.scopeKey}`] = efficiency;
        }
      }

      const metrics: MetricsResponse = {
        totalSpent: trackedTotal > 0 ? trackedTotal : totalSpent,
        perScopeBreakdown,
        efficiencyScores,
        trajectoryCount,
      };
      res.json(metrics);
    } catch (error: unknown) {
      const message = error instanceof Error ? error.message : 'Internal server error';
      res.status(500).json({ error: message });
    }
  });

  return router;
}
The response is a MetricsResponse object with totalSpent, perScopeBreakdown (keyed by "scopeType:scopeKey"), efficiencyScores (0–100, computed as (1 - spent/limit) * 100), and trajectoryCount.
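Pulled out on its own, the efficiency heuristic is just the formula from the route, clamped to the 0–100 range:

```typescript
// Efficiency heuristic from the metrics route: 100 means nothing spent,
// 0 means the budget is fully consumed; a non-positive limit scores 100.
function efficiencyScore(spent: number, limit: number): number {
  if (limit <= 0) return 100;
  return Math.min(100, Math.round((1 - spent / limit) * 100));
}
```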
Step 10: Assemble the Express app and entry point
The app factory creates an Express application, mounts the budget middleware on POST /api/llm, mounts the metrics router, and adds a /health endpoint. Create src/app.ts:
ts
import express, { type Application, type Request, type Response, type NextFunction } from 'express';
import { createBudgetGuardMiddleware } from './middleware/budget.js';
import { createMetricsRouter } from './lib/metrics.js';
import { loadConfig } from './config.js';

export function createApp(): Application {
  const app = express();

  // JSON body parser
  app.use(express.json());

  // Request logging middleware
  app.use((req: Request, _res: Response, next: NextFunction) => {
    const now = new Date().toISOString();
    console.log(`[${now}] ${req.method} ${req.url}`);
    next();
  });

  const config = loadConfig();

  // Create budget middleware
  const { beforeStep, afterStep, controller, store } = createBudgetGuardMiddleware(config);

  // Mount budget middleware on /api/llm with proper chaining
  app.post(
    '/api/llm',
    beforeStep,
    (req: Request, res: Response, next: NextFunction) => {
      // The Vertex AI call would go here in production
      res.json({
        status: 'ok',
        model: req.budgetContext?.modelId ?? config.defaultModelId,
      });
      // Proceed to afterStep for spend recording
      next();
    },
    afterStep,
  );

  // Mount metrics router
  const metricsRouter = createMetricsRouter(store, controller, config);
  app.use(metricsRouter);

  // Health check
  app.get('/health', (_req: Request, res: Response) => {
    res.json({ status: 'ok', uptime: process.uptime() });
  });

  return app;
}
The entry point loads the config, creates the app, starts listening, and handles graceful shutdown on SIGTERM/SIGINT. Create src/index.ts:
ts
import { createApp } from './app.js';
import { loadConfig } from './config.js';

const config = loadConfig();
const app = createApp();

const server = app.listen(config.port, () => {
  console.log(
    `Vertex Budget Guardrails listening on port ${config.port} (model: ${config.defaultModelId})`,
  );
});

// Graceful shutdown
function shutdown(signal: string): void {
  console.log(`\n${signal} received — shutting down...`);
  server.close(() => {
    console.log('Server closed');
    process.exit(0);
  });
  // Force shutdown after 5 seconds
  setTimeout(() => {
    console.error('Forced shutdown after timeout');
    process.exit(1);
  }, 5_000).unref();
}

process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT', () => shutdown('SIGINT'));
Step 11: Run the test suite
The project ships with 40 tests across 14 suites covering the config loader, budget middleware (every scope type, budget-exceeded paths, edge cases with missing/invalid headers), the metrics endpoint (valid tokens, missing tokens, wrong tokens, empty store), and full integration flows. Tests use Vitest with V8 coverage and MSW to mock Vertex AI calls.
terminal
pnpm test
Expected output: all 40 tests pass and the terminal prints a V8 coverage summary showing 90%+ line coverage.
To see the budget guard block a request, repeat the /api/llm call many times (or set DEFAULT_BUDGET_LIMIT=0.01 in .env and restart). When the limit is reached, the server returns HTTP 402 with {"error": "Budget exceeded"}.
Next steps
Wire up the real Vertex AI call — replace the stub in src/app.ts with vertexClient.generateContent(prompt, req.budgetContext?.modelId) to connect live LLM calls through the budget guard.
Add per-agent scoped budgets — call controller.defineBudget() for each agent ID (e.g. BudgetScope.User with the agent’s key) so different agents have independent spending caps.
Schedule spend reporting — wrap the metrics endpoint in a cron job that posts daily spend summaries to Slack or email, so you catch runaway costs before they hit the hard cap.