# Google Gemini AI Spend Control for SMBs
> Real-time LLM cost tracking and budget enforcement for Google Gemini-powered SMB applications.
**Problem:** SMBs adopting Google Gemini for AI face unpredictable per-token costs and risk overspending without centralized visibility or automatic guardrails.
This reference recipe demonstrates budget enforcement using `@reaatech/agent-budget-engine`, cost telemetry via `@reaatech/llm-cost-telemetry-aggregation`, model fallback with `@reaatech/llm-router-core`, and OpenTelemetry span-based spend recording via `@reaatech/agent-budget-otel-bridge`.
## How it works
1. **Pre-flight budget check** — Before every Gemini API call, `BudgetController.check()` estimates cost and verifies remaining budget.
2. **Auto-downgrade** — If budget is constrained, the model is downgraded along the chain: `gemini-2.5-pro` → `gemini-2.5-flash` → `gemini-2.0-flash-lite`.
3. **Spend recording** — After each call, actual tokens and cost are recorded via `BudgetController.record()` and pushed into an aggregation pipeline.
4. **OTel bridge** — GenAI span attributes are converted to spend entries automatically by `SpanListener`.
5. **Dashboard** — `GET /api/spend` returns per-tenant spend summaries; `POST /api/spend` configures budgets.
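Steps 1 to 3 can be sketched as follows. This is a minimal, self-contained illustration and not the API of `@reaatech/agent-budget-engine`: `SimpleBudget`, `estimateCostUsd`, `pickModel`, and the per-token prices are hypothetical placeholders. Only the model names and the downgrade order come from this recipe.

```typescript
// Hypothetical sketch of the check -> downgrade -> record flow.
// None of these names come from the @reaatech packages.

type Model = "gemini-2.5-pro" | "gemini-2.5-flash" | "gemini-2.0-flash-lite";

// Downgrade order from this README, most to least expensive.
const DOWNGRADE_CHAIN: readonly Model[] = [
  "gemini-2.5-pro",
  "gemini-2.5-flash",
  "gemini-2.0-flash-lite",
];

// Placeholder blended prices in USD per 1M tokens.
// Assumed values for illustration only; check current Gemini pricing.
const PRICE_PER_MILLION_TOKENS: Record<Model, number> = {
  "gemini-2.5-pro": 10,
  "gemini-2.5-flash": 1,
  "gemini-2.0-flash-lite": 0.2,
};

function estimateCostUsd(model: Model, tokens: number): number {
  return (tokens / 1_000_000) * PRICE_PER_MILLION_TOKENS[model];
}

class SimpleBudget {
  private spentUsd = 0;
  private readonly limitUsd: number;

  constructor(limitUsd: number) {
    this.limitUsd = limitUsd;
  }

  get remainingUsd(): number {
    return Math.max(0, this.limitUsd - this.spentUsd);
  }

  // Pre-flight check: does the estimated cost fit in what is left?
  check(estimatedUsd: number): boolean {
    return estimatedUsd <= this.remainingUsd;
  }

  // Post-call recording of the actual cost.
  record(actualUsd: number): void {
    this.spentUsd += actualUsd;
  }
}

// Walk the chain from the requested model downward and return the first
// model whose estimated cost fits the budget, or null if none does.
function pickModel(
  budget: SimpleBudget,
  requested: Model,
  estimatedTokens: number,
): Model | null {
  const start = DOWNGRADE_CHAIN.indexOf(requested);
  for (const model of DOWNGRADE_CHAIN.slice(start)) {
    if (budget.check(estimateCostUsd(model, estimatedTokens))) return model;
  }
  return null;
}
```

A caller would invoke `pickModel` before the Gemini request, skip the call when it returns `null`, and pass the actual cost to `record` once the response reports token usage.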
## Environment Variables
| Variable | Description |
|---|---|
| `GOOGLE_API_KEY` | Gemini Developer API key (from Google AI Studio) |
| `GOOGLE_CLOUD_PROJECT` | GCP project ID for Enterprise Agent Platform |
| `GOOGLE_CLOUD_LOCATION` | GCP region |
| `GOOGLE_GENAI_USE_ENTERPRISE` | Set `true` for Vertex/Enterprise mode |
| `DEFAULT_DAILY_BUDGET_USD` | Default daily spend cap in USD (default: `100.0`) |
| `DEFAULT_MONTHLY_BUDGET_USD` | Default monthly spend cap in USD (default: `2000.0`) |
| `GEMINI_CONCURRENCY_LIMIT` | Max concurrent Gemini calls (default: `5`) |
| `OTEL_SERVICE_NAME` | OpenTelemetry service name (default: `gemini-spend-control`) |
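One way to read these variables into a typed config object is sketched below. It treats the parenthesized values in the table as defaults, which is an assumption. `loadConfig`, `SpendControlConfig`, and the validation rules are hypothetical names and choices, not part of the recipe's source.

```typescript
// Hypothetical config loader for the variables in the table above.
// The fallback values assume the parenthesized table values are defaults.

type Env = Record<string, string | undefined>;

interface SpendControlConfig {
  googleApiKey: string | undefined;
  googleCloudProject: string | undefined;
  googleCloudLocation: string | undefined;
  useEnterprise: boolean;
  defaultDailyBudgetUsd: number;
  defaultMonthlyBudgetUsd: number;
  geminiConcurrencyLimit: number;
  otelServiceName: string;
}

// Parse a non-negative number, falling back when the variable is unset or empty.
function readNumber(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isFinite(value) || value < 0) {
    throw new Error(`${name} must be a non-negative number, got "${raw}"`);
  }
  return value;
}

function loadConfig(env: Env): SpendControlConfig {
  return {
    googleApiKey: env["GOOGLE_API_KEY"],
    googleCloudProject: env["GOOGLE_CLOUD_PROJECT"],
    googleCloudLocation: env["GOOGLE_CLOUD_LOCATION"],
    useEnterprise: env["GOOGLE_GENAI_USE_ENTERPRISE"] === "true",
    defaultDailyBudgetUsd: readNumber(env, "DEFAULT_DAILY_BUDGET_USD", 100),
    defaultMonthlyBudgetUsd: readNumber(env, "DEFAULT_MONTHLY_BUDGET_USD", 2000),
    geminiConcurrencyLimit: readNumber(env, "GEMINI_CONCURRENCY_LIMIT", 5),
    otelServiceName: env["OTEL_SERVICE_NAME"] || "gemini-spend-control",
  };
}
```

In a Node.js process this would be called once at startup as `loadConfig(process.env)`, so a malformed value fails fast instead of surfacing on the first request.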
## API Endpoints
### `GET /api/spend`
Returns spend summaries. The optional query parameter `?tenant=<name>` restricts the response to one tenant; without it, all tenants are returned.
### `POST /api/spend`
Configures budget limits for one tenant. Body: `{ "tenant": "acme-corp", "daily": 100, "monthly": 2000 }`
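A handler for this endpoint would normally validate the body before applying it. The sketch below shows one possible check for the documented shape. `parseBudgetRequest` and its rules (non-empty `tenant`, positive finite `daily` and `monthly`) are assumptions for illustration; the recipe's actual validation may differ.

```typescript
// Hypothetical validator for the POST /api/spend body.
// The field names come from this README; the rules are assumed.

interface BudgetRequest {
  tenant: string;
  daily: number;
  monthly: number;
}

function isPositiveNumber(value: unknown): value is number {
  return typeof value === "number" && Number.isFinite(value) && value > 0;
}

function parseBudgetRequest(body: unknown): BudgetRequest {
  if (typeof body !== "object" || body === null) {
    throw new Error("body must be a JSON object");
  }
  const { tenant, daily, monthly } = body as Record<string, unknown>;
  if (typeof tenant !== "string" || tenant.trim() === "") {
    throw new Error("tenant must be a non-empty string");
  }
  if (!isPositiveNumber(daily)) {
    throw new Error("daily must be a positive number");
  }
  if (!isPositiveNumber(monthly)) {
    throw new Error("monthly must be a positive number");
  }
  return { tenant, daily, monthly };
}
```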
## Budget Response Headers
The root middleware injects these headers on every API response:
- `X-Budget-Remaining` — dollars remaining in the budget
- `X-Budget-Status` — budget state: `active`, `warned`, `degraded`, or `stopped`
- `X-Budget-Limit` — total budget limit in dollars
- `X-Budget-Spent` — dollars spent so far
- `X-Budget-Suggested-Model` — model to use if downgrade was applied
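On the client side, these headers can be read into a typed object. The sketch below assumes the dollar-valued headers carry plain decimal numbers, which the README does not specify. `parseBudgetHeaders`, `BudgetSnapshot`, and `HeaderSource` are hypothetical names.

```typescript
// Hypothetical client-side reader for the X-Budget-* response headers.
// Assumes dollar values are plain decimals such as "12.50".

type BudgetStatus = "active" | "warned" | "degraded" | "stopped";
const BUDGET_STATUSES: readonly string[] = ["active", "warned", "degraded", "stopped"];

// Structural type satisfied by the Fetch API's Headers object.
interface HeaderSource {
  get(name: string): string | null | undefined;
}

interface BudgetSnapshot {
  status: BudgetStatus;
  remainingUsd: number;
  limitUsd: number;
  spentUsd: number;
  suggestedModel: string | null;
}

function readUsd(headers: HeaderSource, name: string): number | null {
  const raw = headers.get(name);
  if (raw === null || raw === undefined || raw.trim() === "") return null;
  const value = Number(raw);
  return Number.isFinite(value) ? value : null;
}

// Returns null when a required header is missing or malformed.
function parseBudgetHeaders(headers: HeaderSource): BudgetSnapshot | null {
  const status = headers.get("X-Budget-Status");
  const remainingUsd = readUsd(headers, "X-Budget-Remaining");
  const limitUsd = readUsd(headers, "X-Budget-Limit");
  const spentUsd = readUsd(headers, "X-Budget-Spent");
  if (
    status === null || status === undefined ||
    !BUDGET_STATUSES.includes(status) ||
    remainingUsd === null || limitUsd === null || spentUsd === null
  ) {
    return null;
  }
  return {
    status: status as BudgetStatus,
    remainingUsd,
    limitUsd,
    spentUsd,
    suggestedModel: headers.get("X-Budget-Suggested-Model") ?? null,
  };
}
```

With `fetch`, the call would be `parseBudgetHeaders(response.headers)`. A caller could then switch to `suggestedModel` when `status` is `degraded`, or stop issuing requests when it is `stopped`.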
## Model Downgrade Chain
```
gemini-2.5-pro → gemini-2.5-flash → gemini-2.0-flash-lite
```
## Example Usage
```bash
# Spend dashboard
curl http://localhost:3000/api/spend
# Per-tenant
curl "http://localhost:3000/api/spend?tenant=acme-corp"
# Set budget
curl -X POST http://localhost:3000/api/spend \
-H "Content-Type: application/json" \
-d '{"tenant":"acme-corp","daily":50,"monthly":1000}'
```
## Running
```bash
pnpm install
pnpm dev # http://localhost:3000
pnpm test # vitest run --coverage
pnpm typecheck # tsc --noEmit
pnpm lint # eslint
```
## License
MIT — see [LICENSE](./LICENSE).