# Google Gemini AI Spend Control for SMBs
 
> Real-time LLM cost tracking and budget enforcement for Google Gemini-powered SMB applications.
 
**Problem:** SMBs adopting Google Gemini for AI face unpredictable per-token costs and risk overspending without centralized visibility or automatic guardrails.
 
This reference recipe demonstrates budget enforcement using `@reaatech/agent-budget-engine`, cost telemetry via `@reaatech/llm-cost-telemetry-aggregation`, model fallback with `@reaatech/llm-router-core`, and OpenTelemetry span-based spend recording via `@reaatech/agent-budget-otel-bridge`.
 
## How it works
 
1. **Pre-flight budget check** — Before every Gemini API call, `BudgetController.check()` estimates cost and verifies remaining budget.
2. **Auto-downgrade** — If budget is constrained, the model is downgraded along the chain: `gemini-2.5-pro` → `gemini-2.5-flash` → `gemini-2.0-flash-lite`.
3. **Spend recording** — After each call, actual tokens and cost are recorded via `BudgetController.record()` and pushed into an aggregation pipeline.
4. **OTel bridge** — GenAI span attributes are converted to spend entries automatically by `SpanListener`.
5. **Dashboard** — `GET /api/spend` returns per-tenant spend summaries; `POST /api/spend` configures budgets.
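
The check-then-record flow in steps 1 and 3 can be sketched with a small in-memory tracker. This is a minimal illustration, not the `@reaatech/agent-budget-engine` API: the `InMemoryBudget` class, the `estimateCostUsd` helper, the 80% and 95% thresholds, and the prices in the test values are all assumptions made for the example. Only the four status names come from this README.

```typescript
// Hypothetical in-memory tracker; NOT the @reaatech/agent-budget-engine API.
type BudgetStatus = "active" | "warned" | "degraded" | "stopped";

interface CheckResult {
  allowed: boolean;
  status: BudgetStatus;
  remainingUsd: number;
}

interface TokenPrice {
  inputPerMTok: number;  // USD per million input tokens (placeholder)
  outputPerMTok: number; // USD per million output tokens (placeholder)
}

// Estimate the cost of a call from token counts and per-million-token prices.
function estimateCostUsd(inputTokens: number, outputTokens: number, price: TokenPrice): number {
  return (inputTokens / 1_000_000) * price.inputPerMTok
       + (outputTokens / 1_000_000) * price.outputPerMTok;
}

class InMemoryBudget {
  private spentUsd = 0;
  private readonly limitUsd: number;

  constructor(limitUsd: number) {
    this.limitUsd = limitUsd;
  }

  // Pre-flight: does the estimated cost fit in what is left of the budget?
  check(estimatedCostUsd: number): CheckResult {
    const remainingUsd = Math.max(0, this.limitUsd - this.spentUsd);
    return {
      allowed: estimatedCostUsd <= remainingUsd,
      status: this.status(),
      remainingUsd,
    };
  }

  // Post-call: add the actual cost of the finished request to the running total.
  record(actualCostUsd: number): void {
    this.spentUsd += actualCostUsd;
  }

  // Thresholds are illustrative assumptions, not documented behavior.
  private status(): BudgetStatus {
    const used = this.spentUsd / this.limitUsd;
    if (used >= 1) return "stopped";
    if (used >= 0.95) return "degraded";
    if (used >= 0.8) return "warned";
    return "active";
  }
}
```

A caller would run `check()` with the estimate before the Gemini request and, once the response reports real token usage, pass the actual cost to `record()`.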
 
## Environment Variables
 
| Variable | Description |
|---|---|
| `GOOGLE_API_KEY` | Gemini Developer API key (from Google AI Studio) |
| `GOOGLE_CLOUD_PROJECT` | GCP project ID for Enterprise Agent Platform |
| `GOOGLE_CLOUD_LOCATION` | GCP region |
| `GOOGLE_GENAI_USE_ENTERPRISE` | Set to `true` to use Vertex/Enterprise mode |
| `DEFAULT_DAILY_BUDGET_USD` | Default daily spend cap in USD (default: `100.0`) |
| `DEFAULT_MONTHLY_BUDGET_USD` | Default monthly spend cap in USD (default: `2000.0`) |
| `GEMINI_CONCURRENCY_LIMIT` | Max concurrent Gemini calls (default: `5`) |
| `OTEL_SERVICE_NAME` | OpenTelemetry service name (default: `gemini-spend-control`) |
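
As a rough sketch of how the budget-related variables might be read, the loader below parses them with fallbacks. It assumes the parenthesized values in the table are the defaults; `loadSpendConfig`, `readNumber`, and the `SpendConfig` shape are hypothetical names for this example and may not match the recipe's actual config module.

```typescript
// Hypothetical config loader; names and validation rules are assumptions.
interface SpendConfig {
  dailyBudgetUsd: number;
  monthlyBudgetUsd: number;
  concurrencyLimit: number;
  otelServiceName: string;
}

type Env = Record<string, string | undefined>;

// Parse a non-negative number from the environment, or fall back when unset.
function readNumber(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isFinite(value) || value < 0) {
    throw new Error(`${key} must be a non-negative number, got "${raw}"`);
  }
  return value;
}

function loadSpendConfig(env: Env): SpendConfig {
  return {
    dailyBudgetUsd: readNumber(env, "DEFAULT_DAILY_BUDGET_USD", 100),
    monthlyBudgetUsd: readNumber(env, "DEFAULT_MONTHLY_BUDGET_USD", 2000),
    // At least one concurrent call, rounded down to a whole number.
    concurrencyLimit: Math.max(1, Math.floor(readNumber(env, "GEMINI_CONCURRENCY_LIMIT", 5))),
    otelServiceName: env["OTEL_SERVICE_NAME"] ?? "gemini-spend-control",
  };
}
```

In a Node.js process this would typically be called once at startup as `loadSpendConfig(process.env)`, so a malformed value fails fast instead of silently disabling a cap.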
 
## API Endpoints
 
### `GET /api/spend`
 
Returns spend data. The optional query parameter `?tenant=<name>` restricts the response to one tenant; without it, the response covers all tenants.
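
The filtering behavior can be illustrated with a small aggregation function. The response schema is not specified here, so the `SpendEntry` and `TenantSummary` shapes and the `summarizeSpend` name are assumptions for this sketch only.

```typescript
// Hypothetical shapes; the real /api/spend response schema may differ.
interface SpendEntry {
  tenant: string;
  model: string;
  costUsd: number;
}

interface TenantSummary {
  tenant: string;
  totalUsd: number;
  calls: number;
}

// Aggregate spend per tenant; when `tenant` is given, only that tenant is included.
function summarizeSpend(entries: SpendEntry[], tenant?: string): TenantSummary[] {
  const byTenant = new Map<string, TenantSummary>();
  for (const entry of entries) {
    if (tenant !== undefined && entry.tenant !== tenant) continue;
    const summary = byTenant.get(entry.tenant) ?? { tenant: entry.tenant, totalUsd: 0, calls: 0 };
    summary.totalUsd += entry.costUsd;
    summary.calls += 1;
    byTenant.set(entry.tenant, summary);
  }
  return Array.from(byTenant.values());
}
```

Calling `summarizeSpend(entries)` mirrors the unfiltered request, while `summarizeSpend(entries, "acme-corp")` mirrors `?tenant=acme-corp`.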
 
### `POST /api/spend`
 
Configures budget limits for a tenant. Body: `{ "tenant": "acme-corp", "daily": 100, "monthly": 2000 }`
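
A handler for this endpoint would likely validate the body before applying it. The sketch below shows one way to do that; `parseBudgetRequest` is a hypothetical helper, and the rules it enforces (positive finite numbers, daily not above monthly) are assumptions, not documented behavior of this recipe.

```typescript
// Field names follow the example body above; validation rules are assumptions.
interface BudgetRequest {
  tenant: string;
  daily: number;
  monthly: number;
}

function isPositiveNumber(value: unknown): value is number {
  return typeof value === "number" && Number.isFinite(value) && value > 0;
}

// Validate an already-parsed JSON body and return a typed request, or throw.
function parseBudgetRequest(body: unknown): BudgetRequest {
  if (typeof body !== "object" || body === null) {
    throw new Error("body must be a JSON object");
  }
  const { tenant, daily, monthly } = body as Record<string, unknown>;
  if (typeof tenant !== "string" || tenant.trim() === "") {
    throw new Error("tenant must be a non-empty string");
  }
  if (!isPositiveNumber(daily)) throw new Error("daily must be a positive number");
  if (!isPositiveNumber(monthly)) throw new Error("monthly must be a positive number");
  // Assumed sanity rule: a daily cap larger than the monthly cap is probably a mistake.
  if (daily > monthly) throw new Error("daily must not exceed monthly");
  return { tenant, daily, monthly };
}
```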
 
## Budget Response Headers
 
The root middleware injects these headers on every API response:
- `X-Budget-Remaining` — dollars remaining in the budget
- `X-Budget-Status` — budget state: `active`, `warned`, `degraded`, or `stopped`
- `X-Budget-Limit` — total budget limit in dollars
- `X-Budget-Spent` — dollars spent so far
- `X-Budget-Suggested-Model` — model to use if downgrade was applied
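
To make the header set concrete, here is a minimal sketch of a function that derives these headers from a budget snapshot. The header names come from the list above; the `BudgetState` shape, the two-decimal formatting, and omitting `X-Budget-Suggested-Model` when no downgrade happened are assumptions for illustration.

```typescript
type BudgetStatus = "active" | "warned" | "degraded" | "stopped";

// Hypothetical snapshot of a tenant's budget at response time.
interface BudgetState {
  limitUsd: number;
  spentUsd: number;
  status: BudgetStatus;
  suggestedModel?: string; // set only when a downgrade was applied (assumption)
}

// Build the X-Budget-* response headers from a budget snapshot.
function budgetHeaders(state: BudgetState): Record<string, string> {
  const remainingUsd = Math.max(0, state.limitUsd - state.spentUsd);
  const headers: Record<string, string> = {
    "X-Budget-Remaining": remainingUsd.toFixed(2),
    "X-Budget-Status": state.status,
    "X-Budget-Limit": state.limitUsd.toFixed(2),
    "X-Budget-Spent": state.spentUsd.toFixed(2),
  };
  if (state.suggestedModel !== undefined) {
    headers["X-Budget-Suggested-Model"] = state.suggestedModel;
  }
  return headers;
}
```

A client can read `X-Budget-Status` to back off or switch models before the budget reaches `stopped`.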
 
## Model Downgrade Chain
 
```
gemini-2.5-pro → gemini-2.5-flash → gemini-2.0-flash-lite
```
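
One plausible way to walk this chain is to start at the requested model and pick the first one whose estimated cost fits the remaining budget. The sketch below does not use `@reaatech/llm-router-core`; `pickModel` and the `estimate` callback are hypothetical, and only the model order is taken from this README.

```typescript
// Chain order comes from the README; everything else is an illustrative assumption.
const DOWNGRADE_CHAIN = ["gemini-2.5-pro", "gemini-2.5-flash", "gemini-2.0-flash-lite"] as const;
type GeminiModel = (typeof DOWNGRADE_CHAIN)[number];

// Return the first model at or below `requested` in the chain whose estimated
// cost fits within the remaining budget, or null if none fits.
function pickModel(
  requested: GeminiModel,
  remainingUsd: number,
  estimate: (model: GeminiModel) => number,
): GeminiModel | null {
  const start = DOWNGRADE_CHAIN.indexOf(requested);
  for (const model of DOWNGRADE_CHAIN.slice(start)) {
    if (estimate(model) <= remainingUsd) return model;
  }
  return null;
}
```

Starting the walk at the requested model means the function only ever downgrades; a request for `gemini-2.5-flash` is never upgraded to `gemini-2.5-pro`, even when budget is plentiful.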
 
## Example Usage
 
```bash
# Spend dashboard
curl http://localhost:3000/api/spend
 
# Per-tenant
curl "http://localhost:3000/api/spend?tenant=acme-corp"
 
# Set budget
curl -X POST http://localhost:3000/api/spend \
  -H "Content-Type: application/json" \
  -d '{"tenant":"acme-corp","daily":50,"monthly":1000}'
```
 
## Running
 
```bash
pnpm install
pnpm dev             # http://localhost:3000
pnpm test            # vitest run --coverage
pnpm typecheck       # tsc --noEmit
pnpm lint            # eslint
```
 
## License
 
MIT — see [LICENSE](./LICENSE).