Small businesses using ServiceTitan with AI-driven scheduling and dispatch agents risk runaway OpenAI costs if one tenant floods the system with requests. Without per-tenant budgets, a single bad day can exceed the entire month’s spend. This recipe builds a cost-control layer that enforces per-tenant token budgets using @reaatech/agent-budget-engine, auto-downgrades to cheaper models when caps approach via @reaatech/llm-router-engine, trips circuit breakers after repeated failures via @reaatech/circuit-breaker-core, and streams spend metrics to Helicone for real-time dashboards. You’ll wire everything into a Next.js 16+ App Router project with a Hono API layer.
Prerequisites
Node.js 22+ and pnpm 10+ installed
An OpenAI API key with billing enabled
A ServiceTitan developer account with client ID and secret (or you can skip the ServiceTitan integration and test with static tenant IDs)
Familiarity with TypeScript, basic Next.js App Router concepts, and terminal usage
Step 1: Scaffold the project and install dependencies
Create a Next.js 16+ project with the App Router and install all dependencies, pinning every version exactly.
Expected output: pnpm install completes without errors. The lockfile pins every package to the exact version above (no ^ or ~ prefixes). You can verify with pnpm ls --depth=0.
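To double-check the "no ^ or ~ prefixes" claim on the manifest itself, a small helper can scan a dependency map for anything that is not an exact version. This is a minimal sketch: findUnpinned is a hypothetical helper, and the sample versions are placeholders rather than the versions this recipe pins. When adding packages, pnpm add --save-exact writes exact versions in the first place.

```typescript
// Hypothetical helper: flags dependency specs that are not exact versions.
type DependencyMap = Record<string, string>;

// "Exact" here means a plain semver such as 1.2.3 (optionally with a
// prerelease or build suffix) and no range operators like ^, ~, >= or *.
const EXACT_VERSION = /^\d+\.\d+\.\d+(?:[-+][0-9A-Za-z.-]+)*$/;

function findUnpinned(deps: DependencyMap): string[] {
  return Object.entries(deps)
    .filter(([, spec]) => !EXACT_VERSION.test(spec))
    .map(([name]) => name)
    .sort();
}

// Placeholder versions for illustration only.
const sampleDeps: DependencyMap = {
  hono: "4.0.0",
  zod: "^3.0.0",
  openai: "~4.0.0",
};

console.log(findUnpinned(sampleDeps)); // lists the names that still carry a range prefix
```

In practice you would pass the dependencies and devDependencies objects parsed from package.json.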
Step 2: Configure environment variables
Create your .env file from the example. These variables wire up every integration — OpenAI, ServiceTitan, Postgres, and Helicone — plus budget defaults.
terminal
cp .env.example .env
The .env.example includes these entries:
env
# Env vars used by openai-cost-control-for-servicetitan-small-business-agent-spend.
# The builder adds entries here as it wires up each integration.
# Keep placeholders only — never commit real values.
NODE_ENV=development

# OpenAI
OPENAI_API_KEY=<your-openai-key>

# ServiceTitan OAuth2
SERVICETITAN_CLIENT_ID=<your-client-id>
SERVICETITAN_CLIENT_SECRET=<your-client-secret>
SERVICETITAN_TENANT_ID=<your-tenant-id>
SERVICETITAN_BASE_URL=https://api.servicetitan.com

# Postgres
DATABASE_URL=postgres://user:***@localhost:5432/costcontrol

# Helicone observability
HELICONE_API_KEY=<your-helicone-key>

# Budget defaults
DEFAULT_DAILY_BUDGET=100
DEFAULT_MONTHLY_BUDGET=2000
BUDGET_SOFT_CAP=0.8
CHEAPER_MODEL=gpt-4o-mini

# Telemetry config (used by @reaatech/llm-cost-telemetry)
OTEL_SERVICE_NAME=servicetitan-cost-control
Fill in real values for OPENAI_API_KEY, SERVICETITAN_CLIENT_ID, SERVICETITAN_CLIENT_SECRET, DATABASE_URL, and HELICONE_API_KEY. The budget defaults let you tune caps without redeploying.
Expected output: .env is populated with your keys. The .env.example stays in source control as a template.
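To make the budget defaults concrete, here is a minimal sketch of how a daily budget and a soft cap could combine into a routing decision. It assumes BUDGET_SOFT_CAP is a fraction of the budget and that the budget values are in US dollars; the recipe does not spell out either, and classifySpend is a hypothetical name.

```typescript
type SpendState = "ok" | "downgrade" | "blocked";

// Assumption: softCap is a fraction in [0, 1] of the budget at which the
// system should start preferring the cheaper model.
function classifySpend(spentUsd: number, budgetUsd: number, softCap: number): SpendState {
  if (budgetUsd <= 0) throw new RangeError("budgetUsd must be positive");
  if (spentUsd >= budgetUsd) return "blocked"; // hard cap reached
  if (spentUsd >= budgetUsd * softCap) return "downgrade"; // soft cap reached
  return "ok";
}

// With DEFAULT_DAILY_BUDGET=100 and BUDGET_SOFT_CAP=0.8, the soft cap sits
// at 80: spend below that is "ok", spend from 80 up to 100 triggers a
// downgrade, and spend at or above 100 is blocked.
console.log(classifySpend(79, 100, 0.8), classifySpend(85, 100, 0.8), classifySpend(100, 100, 0.8));
```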
Step 3: Create shared types and config
Create src/lib/types.ts with the core interfaces and Zod schemas used across all services:
Now create src/lib/config.ts — a validated environment-config loader using Zod that also merges telemetry defaults from @reaatech/llm-cost-telemetry:
ts
import { z } from "zod";
import { loadConfig } from "@reaatech/llm-cost-telemetry";

const envSchema = z.object({
  OPENAI_API_KEY: z.string().min(1, "OPENAI_API_KEY is required"),
  DATABASE_URL: z.string().min(1, "DATABASE_URL is required"),
  SERVICETITAN_CLIENT_ID: z.string().min(1, "SERVICETITAN_CLIENT_ID is required"),
  SERVICETITAN_CLIENT_SECRET: z.string().min(1, "SERVICETITAN_CLIENT_SECRET is required"),
  SERVICETITAN_TENANT_ID: z.string().min(1, "SERVICETITAN_TENANT_ID is required"),
  SERVICETITAN_BASE_URL: z.string().default("https://api.servicetitan.com"),
  HELICONE_API_KEY: z.string().optional(),
  DEFAULT_DAILY_BUDGET: z.coerce.number().positive().default(100),
  DEFAULT_MONTHLY_BUDGET: z.coerce.number().positive().default(2000),
  BUDGET_SOFT_CAP: z.coerce.number().min(0).max(1).default(0.8),
  CHEAPER_MODEL: z.string().default("gpt-4o-mini"),
});

export type AppConfig = z.infer<typeof envSchema>;

let cachedConfig: AppConfig | null = null;

export function loadAppConfig(): AppConfig {
  if (cachedConfig) return cachedConfig;
  const parsed = envSchema.safeParse(process.env);
  if (!parsed.success) {
    const missing = parsed.error.issues.map((i) => i.path.join(".")).join(", ");
    throw new Error(`Missing or invalid env vars: ${missing}`);
  }
  // Merge telemetry defaults from @reaatech/llm-cost-telemetry
  try {
    loadConfig();
  } catch {
    /* telemetry defaults optional */
  }
  cachedConfig = parsed.data;
  return parsed.data;
}
Expected output: pnpm typecheck passes. The config loader validates all required env vars at first call and caches the result for subsequent access.
Step 4: Set up the database layer
Create src/lib/db.ts — a singleton Postgres connection with schema initialization:
ts
import postgres from "postgres";
import { loadAppConfig } from "./config.js";

let sql: postgres.Sql | null = null;

export function getDb(): postgres.Sql {
  if (!sql) {
    const config = loadAppConfig();
    sql = postgres(config.DATABASE_URL);
  }
  return sql;
}

export async function initSchema(): Promise<void> {
  const db = getDb();
  await db`
    CREATE TABLE IF NOT EXISTS tenant_budgets (
      tenant_id TEXT PRIMARY KEY,
      daily_budget NUMERIC NOT NULL,
      monthly_budget NUMERIC NOT NULL,
      spent_today NUMERIC DEFAULT 0,
      spent_this_month NUMERIC DEFAULT 0,
      state TEXT DEFAULT 'Active',
      updated_at TIMESTAMPTZ DEFAULT NOW()
    )
  `;
  await db`
    CREATE TABLE IF NOT EXISTS cost_spans (
      id TEXT PRIMARY KEY,
      tenant_id TEXT NOT NULL,
      provider TEXT,
      model TEXT,
      input_tokens INT,
      output_tokens INT,
      cost_usd NUMERIC,
      feature TEXT,
      route TEXT,
      timestamp TIMESTAMPTZ DEFAULT NOW()
    )
  `;
}

export async function closeDb(): Promise<void> {
  if (sql) {
    await sql.end({ timeout: 5 });
    sql = null;
  }
}
The tenant_budgets table stores per-tenant caps and running totals. The cost_spans table records every LLM call for audit and aggregation.
Expected output: pnpm typecheck passes. The functions compile without errors.
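The relationship between the two tables can be illustrated in memory: rows shaped like cost_spans roll up into the spent_today and spent_this_month totals that tenant_budgets keeps. The sketch below assumes UTC day and month boundaries, which the recipe does not specify, and rollupSpend is a hypothetical helper rather than part of the published code.

```typescript
interface SpanRow {
  tenantId: string;
  costUsd: number;
  timestamp: Date;
}

interface RunningTotals {
  spentToday: number;
  spentThisMonth: number;
}

// Sums span costs per tenant for the current UTC day and month.
function rollupSpend(spans: SpanRow[], now: Date): Map<string, RunningTotals> {
  const totals = new Map<string, RunningTotals>();
  for (const span of spans) {
    const sameMonth =
      span.timestamp.getUTCFullYear() === now.getUTCFullYear() &&
      span.timestamp.getUTCMonth() === now.getUTCMonth();
    if (!sameMonth) continue; // older spans affect neither total
    const sameDay = span.timestamp.getUTCDate() === now.getUTCDate();
    const entry = totals.get(span.tenantId) ?? { spentToday: 0, spentThisMonth: 0 };
    entry.spentThisMonth += span.costUsd;
    if (sameDay) entry.spentToday += span.costUsd;
    totals.set(span.tenantId, entry);
  }
  return totals;
}
```

A production version would more likely do this aggregation in SQL, but the in-memory form shows which spans count toward which total.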
Step 5: Implement the ServiceTitan client
Create src/services/servicetitan.ts — an OAuth2-backed client that fetches tenant identity from job IDs:
ts
import { ServiceTitanJobSchema, type ServiceTitanConfig, type ServiceTitanJob } from "../lib/types.js";
import { loadAppConfig } from "../lib/config.js";

export class ServiceTitanError extends Error {
  constructor(
    public status: number,
    message: string,
  ) {
    super(message);
    this.name = "ServiceTitanError";
  }
}

interface TokenCache {
  accessToken: string;
  expiresAt: number;
}

export class ServiceTitanClient {
  private tokenCache: TokenCache | null = null;

  constructor(private config: ServiceTitanConfig) {}

  async getAccessToken(): Promise<string> {
    if (this.tokenCache && Date.now() < this.tokenCache.expiresAt) {
      return this.tokenCache.accessToken;
    }
    const resp = await fetch(`${this.config.baseUrl}/connect/token`, {
      method: "POST",
      headers: { "Content-Type": "application/x-www-form-urlencoded" },
      body: new URLSearchParams({
        grant_type: "client_credentials",
        client_id: this.config.clientId,
        client_secret: this.config.clientSecret,
      }),
    });
    if (!resp.ok) {
      throw new ServiceTitanError(resp.status, `OAuth2 token request failed: ${resp.statusText}`);
    }
    const data = (await resp.json()) as { access_token: string; expires_in: number };
    this.tokenCache = {
      accessToken: data.access_token,
      expiresAt: Date.now() + (data.expires_in - 60) * 1000,
    };
    return data.access_token;
  }

  async getTenantFromJob(jobId: string): Promise<string> {
    if (!jobId) throw new Error("jobId is required");
    const token = await this.getAccessToken();
    const resp = await fetch(`${this.config.baseUrl}/v1/jobs/${jobId}`, {
      headers: { Authorization: `Bearer ${token}` },
    });
    if (resp.status === 404) throw new ServiceTitanError(404, `Job ${jobId} not found`);
    if (!resp.ok) throw new ServiceTitanError(resp.status, `Job fetch failed: ${resp.statusText}`);
    const data = (await resp.json()) as { tenantId: string };
    return data.tenantId;
  }

  async getJobDetails(jobId: string): Promise<ServiceTitanJob> {
    if (!jobId) throw new Error("jobId is required");
    const token = await this.getAccessToken();
    const resp = await fetch(`${this.config.baseUrl}/v1/jobs/${jobId}`, {
      headers: { Authorization: `Bearer ${token}` },
    });
    if (resp.status === 404) throw new ServiceTitanError(404, `Job ${jobId} not found`);
    if (!resp.ok) throw new ServiceTitanError(resp.status, `Job fetch failed: ${resp.statusText}`);
    return ServiceTitanJobSchema.parse(await resp.json());
  }
}

export function createServiceTitanClient(config?: Partial<ServiceTitanConfig>): ServiceTitanClient {
  const appConfig = loadAppConfig();
  return new ServiceTitanClient({
    clientId: config?.clientId ?? appConfig.SERVICETITAN_CLIENT_ID,
    clientSecret: config?.clientSecret ?? appConfig.SERVICETITAN_CLIENT_SECRET,
    tenantId: config?.tenantId ?? appConfig.SERVICETITAN_TENANT_ID,
    baseUrl: config?.baseUrl ?? appConfig.SERVICETITAN_BASE_URL,
  });
}
The client caches OAuth2 tokens with a 60-second safety margin before expiry, auto-refreshing on subsequent calls.
Expected output: pnpm typecheck passes without errors.
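The 60-second safety margin is easiest to see in isolation. The sketch below extracts the expiry arithmetic from getAccessToken into two hypothetical pure functions (cacheToken and isFresh) with an injected clock value, so the boundary can be checked without any network call. The 3600-second lifetime is an illustrative value, not a documented ServiceTitan default.

```typescript
interface CachedToken {
  accessToken: string;
  expiresAt: number; // epoch milliseconds after which the token is treated as stale
}

// Mirrors the client's formula: now + (expires_in - margin) * 1000.
function cacheToken(accessToken: string, expiresInSec: number, nowMs: number, marginSec = 60): CachedToken {
  return { accessToken, expiresAt: nowMs + (expiresInSec - marginSec) * 1000 };
}

function isFresh(token: CachedToken | null, nowMs: number): boolean {
  return token !== null && nowMs < token.expiresAt;
}

// A token issued at t = 0 with expires_in = 3600 is reused for 3540 seconds,
// then refreshed one minute before the server would reject it.
const exampleToken = cacheToken("example-token", 3600, 0);
console.log(exampleToken.expiresAt, isFresh(exampleToken, 3_539_999), isFresh(exampleToken, 3_540_000));
```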
Step 6: Build the budget enforcement service
Create src/services/budget-service.ts — wraps BudgetController from @reaatech/agent-budget-engine and SpendStore from @reaatech/agent-budget-spend-tracker:
ts
import { BudgetController } from "@reaatech/agent-budget-engine";
import { SpendStore } from "@reaatech/agent-budget-spend-tracker";
import { BudgetScope } from "@reaatech/agent-budget-types";
import { type BudgetCheckResult } from "../lib/types";
import { loadAppConfig } from "../lib/config";

export class BudgetEnforcementService {
  private controller: BudgetController;
  private store: SpendStore;

  constructor() {
    this.store = new SpendStore();
    this.controller = new BudgetController({ spendTracker: this.store });
The service wires threshold-breach and hard-stop events for logging, accepts optional policy overrides per tenant, and falls back to env-configured defaults.
Expected output: pnpm typecheck passes.
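The behavior described for this service (env-configured defaults, optional per-tenant overrides, and threshold-breach and hard-stop events) can be sketched with a small in-memory gate. This is an illustration of the pattern only: SimpleBudgetGate and its methods are hypothetical and do not reflect the actual API of @reaatech/agent-budget-engine.

```typescript
interface BudgetPolicy {
  dailyLimitUsd: number;
  softCap: number; // fraction of the limit, e.g. 0.8
}

interface BudgetEvent {
  type: "threshold-breach" | "hard-stop";
  tenantId: string;
  projectedUsd: number;
  limitUsd: number;
}

class SimpleBudgetGate {
  private readonly defaults: BudgetPolicy;
  private readonly onEvent: (event: BudgetEvent) => void;
  private readonly overrides = new Map<string, BudgetPolicy>();
  private readonly spent = new Map<string, number>();

  constructor(defaults: BudgetPolicy, onEvent: (event: BudgetEvent) => void = () => {}) {
    this.defaults = defaults;
    this.onEvent = onEvent;
  }

  // Per-tenant override; unspecified fields fall back to the defaults.
  setPolicy(tenantId: string, override: Partial<BudgetPolicy>): void {
    this.overrides.set(tenantId, { ...this.defaults, ...override });
  }

  // Returns false (and emits hard-stop) if the request would exceed the limit.
  check(tenantId: string, estimatedCostUsd: number): boolean {
    const policy = this.overrides.get(tenantId) ?? this.defaults;
    const projectedUsd = (this.spent.get(tenantId) ?? 0) + estimatedCostUsd;
    const base = { tenantId, projectedUsd, limitUsd: policy.dailyLimitUsd };
    if (projectedUsd > policy.dailyLimitUsd) {
      this.onEvent({ type: "hard-stop", ...base });
      return false;
    }
    if (projectedUsd >= policy.dailyLimitUsd * policy.softCap) {
      this.onEvent({ type: "threshold-breach", ...base });
    }
    return true;
  }

  // Adds actual spend after a call completes.
  record(tenantId: string, costUsd: number): void {
    this.spent.set(tenantId, (this.spent.get(tenantId) ?? 0) + costUsd);
  }
}
```

Separating check from record keeps the pre-flight decision based on an estimate while the running total reflects actual cost.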
Step 7: Build the cost telemetry service
Create src/services/telemetry-service.ts — wires CostCollector, CostAggregator, and BudgetManager from @reaatech/llm-cost-telemetry-aggregation:
ts
import { CostCollector, CostAggregator, BudgetManager } from "@reaatech/llm-cost-telemetry-aggregation";
import { type CostSpan, type BudgetStatus } from "@reaatech/llm-cost-telemetry";
import { getDb } from "../lib/db";

export class CostTelemetryService {
  public collector: CostCollector;
  public aggregator: CostAggregator;
  public budgetManager: BudgetManager;

  constructor() {
    this.aggregator = new CostAggregator({
      dimensions: ["tenant", "provider", "model"],
      timeWindows: ["hour"
The CostCollector buffers spans in memory and flushes them to the aggregator, budget manager, and Postgres every 60 seconds or when the buffer hits 1,000 entries.
Expected output: pnpm typecheck passes. All three internal components (collector, aggregator, budgetManager) are created with the configured dimensions and alert thresholds.
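The flush policy (every 60 seconds or at 1,000 buffered entries, whichever comes first) follows a common buffering pattern. The sketch below shows that pattern with a hypothetical BufferedCollector class; it is not the CostCollector API, and the flush callback stands in for the aggregator, budget manager, and Postgres writes.

```typescript
class BufferedCollector<T> {
  private buffer: T[] = [];
  private timer: ReturnType<typeof setInterval> | null = null;
  private readonly flushFn: (batch: T[]) => void;
  private readonly maxSize: number;
  private readonly intervalMs: number;

  constructor(flushFn: (batch: T[]) => void, maxSize = 1000, intervalMs = 60_000) {
    this.flushFn = flushFn;
    this.maxSize = maxSize;
    this.intervalMs = intervalMs;
  }

  // Starts the time-based flush; call stop() on shutdown.
  start(): void {
    if (this.timer === null) {
      this.timer = setInterval(() => this.flush(), this.intervalMs);
    }
  }

  // Stops the timer and drains whatever is still buffered.
  stop(): void {
    if (this.timer !== null) {
      clearInterval(this.timer);
      this.timer = null;
    }
    this.flush();
  }

  // Size-based flush: triggers as soon as the buffer reaches maxSize.
  add(item: T): void {
    this.buffer.push(item);
    if (this.buffer.length >= this.maxSize) this.flush();
  }

  flush(): void {
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = []; // swap first so items added during flushFn are not lost
    this.flushFn(batch);
  }

  get pending(): number {
    return this.buffer.length;
  }
}
```

Draining on stop() matters here: spans still in memory at shutdown would otherwise never reach the audit table.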
Step 8: Build the circuit breaker service
Create src/services/circuit-breaker-service.ts — wraps CircuitBreaker from @reaatech/circuit-breaker-core with per-circuit tracking:
Each circuit is identified by a string like openai-{tenantId}. The breaker trips after 5 consecutive failures with a 30-second recovery timeout. State transitions (CLOSED → OPEN → HALF_OPEN → CLOSED) are logged.
Expected output: pnpm typecheck passes.
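The state machine described above (5 consecutive failures to trip, a 30-second recovery timeout, and CLOSED → OPEN → HALF_OPEN → CLOSED transitions) can be sketched as follows. SimpleCircuitBreaker and breakerFor are hypothetical names for illustration; the real @reaatech/circuit-breaker-core API may look different. The clock is injectable so the timing can be exercised without waiting.

```typescript
type CircuitState = "CLOSED" | "OPEN" | "HALF_OPEN";

class SimpleCircuitBreaker {
  private state: CircuitState = "CLOSED";
  private consecutiveFailures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly recoveryMs: number;
  private readonly clock: () => number;

  constructor(failureThreshold = 5, recoveryMs = 30_000, clock: () => number = () => Date.now()) {
    this.failureThreshold = failureThreshold;
    this.recoveryMs = recoveryMs;
    this.clock = clock;
  }

  // OPEN becomes HALF_OPEN once the recovery timeout has elapsed.
  getState(): CircuitState {
    if (this.state === "OPEN" && this.clock() - this.openedAt >= this.recoveryMs) {
      this.state = "HALF_OPEN";
    }
    return this.state;
  }

  allowRequest(): boolean {
    return this.getState() !== "OPEN";
  }

  onSuccess(): void {
    this.consecutiveFailures = 0;
    this.state = "CLOSED";
  }

  // A failed probe in HALF_OPEN reopens immediately; otherwise the breaker
  // trips once the consecutive-failure threshold is reached.
  onFailure(): void {
    const wasProbing = this.getState() === "HALF_OPEN";
    this.consecutiveFailures += 1;
    if (wasProbing || this.consecutiveFailures >= this.failureThreshold) {
      this.state = "OPEN";
      this.openedAt = this.clock();
    }
  }
}

// One breaker per circuit ID, following the openai-{tenantId} convention.
const breakers = new Map<string, SimpleCircuitBreaker>();

function breakerFor(tenantId: string): SimpleCircuitBreaker {
  const circuitId = `openai-${tenantId}`;
  let breaker = breakers.get(circuitId);
  if (!breaker) {
    breaker = new SimpleCircuitBreaker();
    breakers.set(circuitId, breaker);
  }
  return breaker;
}
```

Keying breakers by tenant means one tenant's failing requests do not block calls for every other tenant.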
Step 9: Build the OpenAI service with Helicone proxying
Create src/services/openai-service.ts — wraps the OpenAI SDK with budget checks and cost telemetry:
ts
import OpenAI from "openai";
import {
  generateId,
  now,
  calculateCostFromTokens,
  CostSpanSchema,
  type CostSpan,
  type BudgetStatus,
} from "@reaatech/llm-cost-telemetry";
import { BudgetExceededError, BudgetScope, EnforcementAction } from "@reaatech/agent-budget-types";
import { loadAppConfig } from "../lib/config";
import { type BudgetCheckResult } from "../lib/types";
import { HeliconeService } from "./helicone-service";

export interface IBudgetEnforcementService {
  checkRequest(tenantId: string, estimatedCost: number, modelId: string, tools?: string[]): BudgetCheckResult
The chatWithBudget method runs a two-stage pre-flight check (first the budget engine, then the telemetry budget manager) before making the API call, and records spend afterwards.
Now create src/services/helicone-service.ts — a proxy wrapper that routes OpenAI calls through Helicone’s observability proxy:
Expected output: pnpm typecheck passes. HeliconeService degrades gracefully when the HELICONE_API_KEY is absent, falling back to a direct OpenAI client.
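The graceful-degradation behavior comes down to which options the OpenAI client is constructed with. The sketch below builds those options as plain data: with a Helicone key it points the client at Helicone's OpenAI gateway and adds the Helicone-Auth header, and without one it returns options for a direct connection. buildClientOptions is a hypothetical helper, and tagging requests with a Helicone-Property-Tenant custom property is an assumption about how per-tenant dashboards might be keyed, not something the recipe confirms.

```typescript
interface ClientOptions {
  apiKey: string;
  baseURL?: string;
  defaultHeaders?: Record<string, string>;
}

// Returns options suitable for `new OpenAI(options)`.
function buildClientOptions(openaiKey: string, heliconeKey?: string, tenantId?: string): ClientOptions {
  // No Helicone key: fall back to a direct OpenAI client.
  if (!heliconeKey) return { apiKey: openaiKey };

  const headers: Record<string, string> = {
    "Helicone-Auth": `Bearer ${heliconeKey}`,
  };
  // Assumption: a custom property lets dashboards be filtered per tenant.
  if (tenantId) headers["Helicone-Property-Tenant"] = tenantId;

  return {
    apiKey: openaiKey,
    baseURL: "https://oai.helicone.ai/v1",
    defaultHeaders: headers,
  };
}
```

Keeping this as a pure function makes the fallback easy to unit test without constructing a real client.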
Step 10: Build the LLM router with auto-downgrade
Create src/services/router-service.ts — registers two model tiers (gpt-4o-mini as workhorse, gpt-4o as quality judge) and routes requests through @reaatech/llm-router-engine:
ts
import { ModelDefinitionSchema, RoutingRequestSchema } from "@reaatech/llm-router-core";
import {
  LLMRouter,
  parseRouterConfig,
  ProviderClientFactory,
  type LLMClient,
  type CompletionOptions,
  type CompletionResult,
  type RouterRouteSummary,
} from "@reaatech/llm-router-engine";
import OpenAI from "openai";
import { BudgetEnforcementService } from "./budget-service";
import { CostTelemetryService } from "./telemetry-service";
import { loadAppConfig } from "../lib/config";

const ROUTER_YAML = `models:
  workhorses:
    - id: gpt-4o-mini
      provider: openai
      cost_per_million_input: 0.15
      cost_per_million_output: 0.60
      max_tokens: 128000
The router uses two strategies: cost-optimized (always picks gpt-4o-mini) and quality (uses gpt-4o as a judge with gpt-4o-mini as the workhorse). Each route call validates the request shape against RoutingRequestSchema at the boundary.
Expected output: pnpm typecheck passes. The RouterService compiles and imports all required symbols from the REAA packages.
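Two pieces of arithmetic sit behind the router: estimating a request's cost from per-million-token prices, and swapping in the cheaper model once spend crosses the soft cap. The sketch below shows both as hypothetical pure functions. The gpt-4o-mini prices come from the router config above; the gpt-4o prices are illustrative assumptions, and the downgrade rule is a simplification that ignores the judge-based quality strategy.

```typescript
interface ModelTier {
  id: string;
  costPerMillionInput: number; // USD per 1M input tokens
  costPerMillionOutput: number; // USD per 1M output tokens
}

const WORKHORSE: ModelTier = { id: "gpt-4o-mini", costPerMillionInput: 0.15, costPerMillionOutput: 0.6 };
// Assumed prices, for illustration only.
const PREMIUM: ModelTier = { id: "gpt-4o", costPerMillionInput: 2.5, costPerMillionOutput: 10 };

function estimateCostUsd(model: ModelTier, inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * model.costPerMillionInput +
    (outputTokens / 1_000_000) * model.costPerMillionOutput
  );
}

// Auto-downgrade: once the tenant has used softCap of its budget, every
// request is served by the cheaper tier regardless of what was requested.
function chooseModel(requested: ModelTier, spentUsd: number, budgetUsd: number, softCap: number): ModelTier {
  return spentUsd >= budgetUsd * softCap ? WORKHORSE : requested;
}

// 2,000 input and 500 output tokens on gpt-4o-mini:
// 0.002 * 0.15 + 0.0005 * 0.60 = 0.0003 + 0.0003 = 0.0006 USD.
console.log(estimateCostUsd(WORKHORSE, 2_000, 500).toFixed(4));
console.log(chooseModel(PREMIUM, 85, 100, 0.8).id);
```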
Step 11: Wire the Hono API app
Create src/api/app.ts — a Hono application with middleware for auth and tenant context, plus endpoints for budget, cost, and chat:
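The decision logic inside such middleware can be sketched without any framework. The function below is a minimal, hypothetical version: it takes lower-cased request headers as a plain map, rejects requests without a valid bearer token, and extracts the tenant from the x-tenant-id header. The bearer-token scheme and the 400/401 status choices are assumptions for illustration; inside Hono the same checks would run in a middleware before the budget, cost, and chat handlers.

```typescript
type HeaderMap = Record<string, string | undefined>;

type TenantResolution =
  | { ok: true; tenantId: string }
  | { ok: false; status: 400 | 401; error: string };

// Assumes header names have already been lower-cased by the caller.
function resolveTenant(headers: HeaderMap, expectedToken: string): TenantResolution {
  const auth = headers["authorization"] ?? "";
  if (auth !== `Bearer ${expectedToken}`) {
    return { ok: false, status: 401, error: "invalid or missing bearer token" };
  }
  const tenantId = (headers["x-tenant-id"] ?? "").trim();
  if (tenantId === "") {
    return { ok: false, status: 400, error: "x-tenant-id header is required" };
  }
  return { ok: true, tenantId };
}
```

Returning a discriminated union lets the calling middleware either attach tenantId to the request context or short-circuit with the given status.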
Step 12: Create the Next.js catch-all route handler
Create app/api/[[...route]]/route.ts — the Next.js catch-all that delegates all HTTP methods to the Hono app:
ts
import type { NextRequest } from "next/server";
import { createApiApp } from "../../../src/api/app";
import { BudgetEnforcementService } from "../../../src/services/budget-service";
import { CostTelemetryService } from "../../../src/services/telemetry-service";
import { RouterService } from "../../../src/services/router-service";
import { OpenAIService } from "../../../src/services/openai-service";
import { HeliconeService } from "../../../src/services/helicone-service";

const budget = new BudgetEnforcementService();
const telemetry = new CostTelemetryService();
const helicone = new HeliconeService();
const router = new RouterService(budget, telemetry);
const openai = new OpenAIService(budget, telemetry, helicone);
const app = createApiApp({ budget, telemetry, router, openai });

export async function GET(req: NextRequest) {
  return app.fetch(req);
}

export async function POST(req: NextRequest) {
  return app.fetch(req);
}

export async function PUT(req: NextRequest) {
  return app.fetch(req);
}

export async function DELETE(req: NextRequest) {
  return app.fetch(req);
}
The catch-all route pattern [[...route]] captures every path under /api/ and forwards it to Hono. Because Hono’s app.fetch(request) returns a standard Response, no adapter is needed — it works directly with Next.js route handlers.
Expected output: pnpm typecheck passes.
Step 13: Write the orchestrator entry point
Create src/index.ts — the application orchestrator that initializes all services, loads tenant budgets from the database, and returns the Hono app:
ts
import { BudgetEnforcementService } from "./services/budget-service";
import { CostTelemetryService } from "./services/telemetry-service";
import { CircuitBreakerService } from "./services/circuit-breaker-service";
import { OpenAIService } from "./services/openai-service";
import { RouterService } from "./services/router-service";
import { HeliconeService } from "./services/helicone-service";
import { createApiApp } from "./api/app";
import { getDb, initSchema } from "./lib/db";

export async function createApp() {
  const budget = new BudgetEnforcementService();
  const telemetry = new CostTelemetryService();
  const circuitBreaker = new CircuitBreakerService();
  const helicone = new HeliconeService();
  const router = new RouterService(budget, telemetry);
  // Pass the Helicone wrapper so calls are proxied, matching the route handler in Step 12.
  const openai = new OpenAIService(budget, telemetry, helicone);

  await initSchema().catch(() => {});

  // Load persisted per-tenant budgets into the enforcement service.
  const db = getDb();
  const rows = await db`SELECT tenant_id, daily_budget FROM tenant_budgets`;
  const budgets = rows.map((r) => ({
    tenantId: String(r.tenant_id),
    dailyBudget: Number(r.daily_budget),
  }));
  budget.loadBudgetsFromDb(budgets);

  const app = createApiApp({ budget, telemetry, router, openai });
  return { app, budget, telemetry, router, openai, circuitBreaker, helicone, db };
}

export { BudgetEnforcementService } from "./services/budget-service";
export { CostTelemetryService } from "./services/telemetry-service";
export { CircuitBreakerService } from "./services/circuit-breaker-service";
export { OpenAIService } from "./services/openai-service";
export { RouterService } from "./services/router-service";
export { HeliconeService } from "./services/helicone-service";
export { ServiceTitanClient, createServiceTitanClient, ServiceTitanError } from "./services/servicetitan";
export { createApiApp } from "./api/app";
export { loadAppConfig } from "./lib/config";
export { getDb, initSchema, closeDb } from "./lib/db";
Expected output: pnpm typecheck passes. All six service classes are instantiated without error.
Step 14: Write tests and run the suite
Create tests/index.test.ts — a smoke test that verifies every service and export is available:
Expected output: All tests pass. Coverage metrics meet the 90% threshold across lines, branches, functions, and statements. The vitest-report.json is written to disk.
Next steps
Add ServiceTitan job resolution — Wire ServiceTitanClient.getTenantFromJob() into the middleware so tenant context is resolved automatically from a jobId query parameter, removing the need for clients to pass x-tenant-id directly.
Deploy Helicone dashboards — Configure Helicone alerts that fire when per-tenant spend exceeds configurable thresholds, sending webhook notifications to Slack or PagerDuty.
Extend the router — Add more model tiers (gpt-4.1-nano, gpt-4.1-mini) and strategies (latency-optimized, balanced) to the router YAML config so the cost-control layer can adapt to different latency and quality requirements.