# vLLM AI Spend Control for SMB Agent Workflows
This recipe routes every vLLM call through a cost interceptor. The interceptor passes token counts to `@reaatech/agent-budget-spend-tracker`, which accumulates spend using the `@reaatech/agent-budget-pricing` mappings for open-source models, while `@reaatech/llm-cost-telemetry-calculator` converts token usage into dollar amounts. `@reaatech/agent-budget-engine` enforces soft and hard caps per agent or tenant, and cost telemetry is exported to Langfuse and Helicone for real-time observability.
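As a rough illustration of the token-to-dollar step, the sketch below prices a single call from its token counts. The price table, the model name, and the `costUsd` function are hypothetical; they are not the API of `@reaatech/llm-cost-telemetry-calculator` or `@reaatech/agent-budget-pricing`. Self-hosted models have no list price, so the rates here stand in for whatever amortized cost you assign.

```typescript
// Hypothetical per-million-token rates in USD; not real pricing data.
interface ModelPrice {
  inputPerMillion: number;
  outputPerMillion: number;
}

const PRICES: Record<string, ModelPrice> = {
  "example-model": { inputPerMillion: 0.2, outputPerMillion: 0.6 },
};

interface TokenUsage {
  promptTokens: number;
  completionTokens: number;
}

// Convert token usage for one call into a dollar amount.
function costUsd(model: string, usage: TokenUsage): number {
  const price = PRICES[model];
  if (!price) throw new Error(`No pricing entry for model: ${model}`);
  return (
    (usage.promptTokens / 1_000_000) * price.inputPerMillion +
    (usage.completionTokens / 1_000_000) * price.outputPerMillion
  );
}

const cost = costUsd("example-model", {
  promptTokens: 1_000_000,
  completionTokens: 500_000,
});
console.log(cost.toFixed(4));
```

Keeping input and output rates separate matters because completion tokens are usually the more expensive side of a call.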
## Architecture
The system is organized around a single interception point:
```mermaid
graph LR
A[Client] --> B[POST /api/chat]
B --> C[Cost Interceptor]
C --> D["vLLM API<br/>(@ai-sdk/openai-compatible)"]
C --> E[BudgetController]
E --> F[SpendStore]
C --> G[TelemetryService]
G --> H[Langfuse]
G --> I[Helicone]
```
- **Cost Interceptor** (`src/interceptors/cost.interceptor.ts`) wraps every vLLM API call (via `@ai-sdk/openai-compatible`), checks budgets via `BudgetController`, records spend via `SpendStore`, and emits cost traces to Langfuse + Helicone via `TelemetryService`.
- **BudgetController** (`src/modules/budget/budget.service.ts`) enforces soft-cap warnings and hard-cap rejections per scope using `@reaatech/agent-budget-engine`.
- **SpendStore** (`src/modules/budget/spend-store.service.ts`) accumulates token counts and converts them to dollar amounts via `@reaatech/llm-cost-telemetry-calculator` with pricing data from `@reaatech/agent-budget-pricing`.
- **TelemetryService** (`src/modules/telemetry/telemetry.service.ts`) fans out cost events to the Langfuse (`src/modules/telemetry/langfuse.service.ts`) and Helicone (`src/modules/telemetry/helicone.service.ts`) backends.
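The flow through these components can be sketched as a single wrapper function. Everything in this sketch is an in-memory stand-in: `InMemoryBudget`, `withCostInterception`, `ModelCall`, and the flat per-token rate are hypothetical names chosen for illustration, not the classes under `src/` or the `@reaatech/*` APIs.

```typescript
// Result of one model call: text plus token usage (hypothetical shape).
interface ModelResult {
  text: string;
  promptTokens: number;
  completionTokens: number;
}
type ModelCall = () => Promise<ModelResult>;

interface CostEvent {
  scope: string;
  costUsd: number;
  totalSpentUsd: number;
}

class BudgetExceededError extends Error {}

// In-memory stand-in for BudgetController + SpendStore.
class InMemoryBudget {
  private spent = new Map<string, number>();
  private limitUsd: number;
  private usdPerToken: number;

  constructor(limitUsd: number, usdPerToken: number) {
    this.limitUsd = limitUsd;
    this.usdPerToken = usdPerToken;
  }

  // Throws when the scope has already used up its limit.
  assertWithinHardCap(scope: string): void {
    if ((this.spent.get(scope) ?? 0) >= this.limitUsd) {
      throw new BudgetExceededError(`Hard cap reached for scope ${scope}`);
    }
  }

  // Adds the cost of `tokens` to the scope's running total.
  record(scope: string, tokens: number): CostEvent {
    const costUsd = tokens * this.usdPerToken;
    const totalSpentUsd = (this.spent.get(scope) ?? 0) + costUsd;
    this.spent.set(scope, totalSpentUsd);
    return { scope, costUsd, totalSpentUsd };
  }
}

// Check the budget, call the model, record spend, then emit telemetry.
async function withCostInterception(
  scope: string,
  budget: InMemoryBudget,
  call: ModelCall,
  emit: (event: CostEvent) => void,
): Promise<ModelResult> {
  budget.assertWithinHardCap(scope); // reject before spending anything
  const result = await call();
  const event = budget.record(
    scope,
    result.promptTokens + result.completionTokens,
  );
  emit(event); // fan out to telemetry sinks
  return result;
}
```

Checking the cap before the call means an exhausted scope never reaches the model. The trade-off in this sketch is that a single call can still overshoot the limit, because its cost is only known afterwards.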
## Quick Start
```bash
cp .env.example .env.local
# Edit .env.local — set VLLM_BASE_URL to your vLLM server endpoint
pnpm install
pnpm dev
```
Send a test request:
```bash
curl -X POST http://localhost:3000/api/chat \
-H "Content-Type: application/json" \
-d '{
"messages": [{"role": "user", "content": "Hello"}],
"scope": "tenant-1"
}'
```
## Environment Variables
| Variable | Description |
|---|---|
| `VLLM_BASE_URL` | Base URL of the vLLM OpenAI-compatible API |
| `VLLM_MODEL` | Default model name used in requests |
| `LANGFUSE_PUBLIC_KEY` | Langfuse project public key |
| `LANGFUSE_SECRET_KEY` | Langfuse project secret key |
| `LANGFUSE_BASE_URL` | Langfuse API base URL |
| `HELICONE_API_KEY` | Helicone API key for usage telemetry |
| `DATABASE_URL` | PostgreSQL connection string for spend persistence |
| `BUDGET_DEFAULT_LIMIT` | Default dollar limit for any scope without an explicit budget |
| `BUDGET_SOFT_CAP` | Fraction of the limit that triggers a soft-cap warning (e.g., 0.8) |
| `BUDGET_HARD_CAP` | Fraction of the limit that triggers a hard-cap rejection (e.g., 1.0) |
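One way to read and validate the three budget variables is sketched below. The `loadBudgetConfig` and `readNumber` helpers are hypothetical, and the fallback values (10, 0.8, 1.0) are assumptions taken from the examples in this README rather than defaults confirmed by the codebase.

```typescript
interface BudgetConfig {
  defaultLimitUsd: number;
  softCapFraction: number;
  hardCapFraction: number;
}

type Env = Record<string, string | undefined>;

// Parse one numeric variable, falling back when it is unset or blank.
function readNumber(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isFinite(value) || value < 0) {
    throw new Error(`${key} must be a non-negative number, got "${raw}"`);
  }
  return value;
}

// Build the budget config and reject inconsistent cap settings.
function loadBudgetConfig(env: Env): BudgetConfig {
  const config: BudgetConfig = {
    defaultLimitUsd: readNumber(env, "BUDGET_DEFAULT_LIMIT", 10), // assumed fallback
    softCapFraction: readNumber(env, "BUDGET_SOFT_CAP", 0.8), // assumed fallback
    hardCapFraction: readNumber(env, "BUDGET_HARD_CAP", 1.0), // assumed fallback
  };
  if (config.softCapFraction > config.hardCapFraction) {
    throw new Error("BUDGET_SOFT_CAP must not exceed BUDGET_HARD_CAP");
  }
  return config;
}
```

In a Node.js process this would typically be called once at startup as `loadBudgetConfig(process.env)`, so that a malformed value fails fast instead of silently disabling a cap.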
## API Reference
### `POST /api/chat`
Send messages to a vLLM model with budget enforcement and cost telemetry.
**Request body:**
```json
{
"messages": [{ "role": "user", "content": "string" }],
"scope": "string",
"model": "string (optional)"
}
```
| Field | Description |
|---|---|
| `messages` | Array of chat messages in OpenAI format |
| `scope` | Budget scope identifier (e.g., `"agent-1"` or `"tenant-1"`) |
| `model` | Model override (defaults to `VLLM_MODEL`) |
**Response:** Server-sent events (SSE) stream of the vLLM chat completion.
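A client can consume the stream incrementally instead of waiting for the full completion. The sketch below is a minimal example under stated assumptions: `readStream` and `chat` are hypothetical helper names, the code treats the body as raw text because the exact event payload format is not documented here, and it assumes a runtime with global `fetch`, `ReadableStream`, and `TextDecoder` (for example Node.js 18+ or a browser).

```typescript
// Collect a streamed body into a string, calling onChunk as text arrives.
async function readStream(
  body: ReadableStream<Uint8Array>,
  onChunk: (text: string) => void,
): Promise<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let full = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters split across chunks intact
    const text = decoder.decode(value, { stream: true });
    full += text;
    onChunk(text);
  }
  const tail = decoder.decode(); // flush any buffered partial character
  if (tail) {
    full += tail;
    onChunk(tail);
  }
  return full;
}

// POST a single user message and stream the raw response text.
async function chat(
  baseUrl: string,
  scope: string,
  content: string,
): Promise<string> {
  const res = await fetch(`${baseUrl}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ messages: [{ role: "user", content }], scope }),
  });
  if (!res.ok || !res.body) {
    throw new Error(`Chat request failed with status ${res.status}`);
  }
  return readStream(res.body, (text) => console.log(text));
}
```

With the dev server from the Quick Start running, `chat("http://localhost:3000", "tenant-1", "Hello")` would mirror the `curl` request shown earlier.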
### `GET /api/budget`
Query the current budget state for a scope.
**Query parameters:** `?scope=<scope-id>`
**Response:**
```json
{
"scope": "string",
"limit": 10.0,
"spent": 0.0,
"remaining": 10.0,
"softCapReached": false,
"hardCapReached": false
}
```
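The relationship between these fields can be sketched as a pure function. The `toBudgetState` helper is hypothetical, and two details are assumptions rather than documented behavior: `remaining` is clamped at zero, and a cap counts as reached when spend is greater than or equal to the threshold.

```typescript
interface BudgetState {
  scope: string;
  limit: number;
  spent: number;
  remaining: number;
  softCapReached: boolean;
  hardCapReached: boolean;
}

// Derive the response fields from a limit, current spend, and cap fractions.
function toBudgetState(
  scope: string,
  limit: number,
  spent: number,
  softCapFraction = 0.8, // assumed, mirrors the BUDGET_SOFT_CAP example
  hardCapFraction = 1.0, // assumed, mirrors the BUDGET_HARD_CAP example
): BudgetState {
  return {
    scope,
    limit,
    spent,
    remaining: Math.max(0, limit - spent), // assumption: never negative
    softCapReached: spent >= limit * softCapFraction,
    hardCapReached: spent >= limit * hardCapFraction,
  };
}

console.log(toBudgetState("tenant-1", 10, 0));
```

A fresh scope with a limit of 10 and no spend yields the same values as the sample response above.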
### `POST /api/budget`
Define or update a budget for a scope.
**Request body:**
```json
{
"scope": "string",
"limit": 50.0
}
```
**Response:** `201 Created`
### `DELETE /api/budget`
Remove a budget for a scope.
**Query parameters:** `?scope=<scope-id>`
**Response:** `204 No Content`
### `GET /api/health`
Health check endpoint.
**Response:**
```json
{ "status": "ok" }
```
## License
MIT — see [LICENSE](./LICENSE).