# Cohere LLM Cost Observability for SMB Support Agents
> Wrap every Cohere API call with cost telemetry and OTel spans so SMBs can see exactly where their LLM budget goes and stop cost overruns.
## Problem
Small businesses running Cohere-powered support bots have no per-call cost visibility, so a single verbose handling loop can silently triple the monthly bill. This recipe layers three pieces onto the Cohere SDK: `@reaatech/llm-cost-telemetry` captures token counts and calculates cost in real time, `@reaatech/otel-genai-semconv-core` emits spec-compliant OpenTelemetry spans, and Langfuse provides the dashboards.
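
To make the per-call arithmetic concrete, here is a minimal sketch of turning token counts into a dollar figure. It does not use the `@reaatech` calculator API, which this README does not document. The `ModelPricing` shape, the `estimateCostUsd` helper, and the prices are all hypothetical and are not Cohere's published rates.

```typescript
// Hypothetical pricing shape: USD per one million tokens.
interface ModelPricing {
  inputPerMTok: number;
  outputPerMTok: number;
}

interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

// Illustrative numbers only; replace with the rates from your own pricing config.
const EXAMPLE_PRICING: ModelPricing = { inputPerMTok: 2.5, outputPerMTok: 10 };

// Cost is linear in tokens: each side is (tokens / 1M) * price per 1M.
function estimateCostUsd(usage: TokenUsage, pricing: ModelPricing): number {
  const inputCost = (usage.inputTokens / 1_000_000) * pricing.inputPerMTok;
  const outputCost = (usage.outputTokens / 1_000_000) * pricing.outputPerMTok;
  return inputCost + outputCost;
}

// One call versus a loop that consumes three times the tokens.
const singleCallCost = estimateCostUsd({ inputTokens: 1000, outputTokens: 500 }, EXAMPLE_PRICING);
const loopedCallCost = estimateCostUsd({ inputTokens: 3000, outputTokens: 1500 }, EXAMPLE_PRICING);
```

Because cost scales linearly with tokens, a loop that triples token usage triples the spend, which is the overrun described above.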
## Architecture
```
Cohere SDK → InstrumentedCohereClient → TracingManager / MetricsManager (OTLP) → Langfuse
                                      → CostLogger (Pino)
                                      → CostStore (in-memory) → app/api/dashboard API
                                      → BudgetWatcher (polling loop) → Pino alerts on threshold breach
```
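
The wrapping step in the diagram can be sketched roughly as follows. This is not the repository's `InstrumentedCohereClient`, whose source is not shown here. `ChatFn`, `CostFn`, `CostRecord`, and `instrument` are hypothetical names, and a stub stands in for the real Cohere call so the example runs without network access.

```typescript
// Hypothetical shapes for illustration only.
interface ChatResult {
  text: string;
  inputTokens: number;
  outputTokens: number;
}

interface CostRecord {
  model: string;
  tenant: string;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
  durationMs: number;
}

type ChatFn = (model: string, message: string) => Promise<ChatResult>;
type CostFn = (inputTokens: number, outputTokens: number) => number;

// Wraps a chat function so every successful call appends one record to the store.
function instrument(chat: ChatFn, cost: CostFn, store: CostRecord[], tenant: string): ChatFn {
  return async (model, message) => {
    const startedAt = Date.now();
    const result = await chat(model, message);
    store.push({
      model,
      tenant,
      inputTokens: result.inputTokens,
      outputTokens: result.outputTokens,
      costUsd: cost(result.inputTokens, result.outputTokens),
      durationMs: Date.now() - startedAt,
    });
    return result;
  };
}

// Stub standing in for a real Cohere call; token counts are made up.
const fakeChat: ChatFn = async (_model, message) => ({
  text: `echo: ${message}`,
  inputTokens: 100,
  outputTokens: 50,
});

const store: CostRecord[] = [];
const flatRate: CostFn = (input, output) => (input + output) * 0.00001; // assumed flat rate
const tracedChat = instrument(fakeChat, flatRate, store, "acme-corp");
```

Callers use `tracedChat` exactly like the unwrapped function. In the real pipeline the same hook point would presumably also emit the OTel span and the Pino log line.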
## Packages Used
| Package | Version | Role |
|---|---|---|
| `@reaatech/llm-cost-telemetry` | 0.2.0 | Foundation types, schemas, utilities |
| `@reaatech/llm-cost-telemetry-calculator` | 0.1.1 | Cost calculation with custom pricing |
| `@reaatech/llm-cost-telemetry-observability` | 0.1.1 | OTel tracing, metrics, Pino logger |
| `@reaatech/otel-genai-semconv-core` | 0.1.0 | GenAI semantic convention types |
| `cohere-ai` | 8.0.0 | Cohere TypeScript SDK |
| `helicone` | 1.0.7 | LLM proxy observability |
| `langfuse` | 3.38.20 | LLM observability and tracing |
| `pino` | 10.3.1 | Structured JSON logger |
| `zod` | 4.4.3 | Runtime schema validation |
## Quick Start
```bash
pnpm install
cp .env.example .env # fill in your COHERE_API_KEY and LANGFUSE keys
pnpm dev # start Next.js dev server
pnpm test # run vitest with coverage
```
## API Reference
### `POST /api/dashboard`
Ingest a cost span:
```json
{
  "id": "span-uuid",
  "provider": "cohere",
  "model": "command-a-03-2025",
  "inputTokens": 100,
  "outputTokens": 50,
  "costUsd": 0.001,
  "tenant": "acme-corp",
  "feature": "chat-support",
  "timestamp": "2026-06-20T00:00:00.000Z"
}
```
Returns `201 { stored: true, id }` on success, `400 { error, details }` on validation failure.
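
The 201/400 contract can be illustrated with a dependency-free sketch. The project lists `zod` for schema validation, but its actual schema is not shown here, so the rules below (non-empty strings, non-negative numbers, parseable timestamp) and the error messages are assumptions. Only the field names and response shapes come from this section.

```typescript
// Response shapes mirror the documented 201 and 400 bodies.
type IngestResponse =
  | { status: 201; body: { stored: true; id: string } }
  | { status: 400; body: { error: string; details: string[] } };

const STRING_FIELDS = ["id", "provider", "model", "tenant", "feature", "timestamp"] as const;
const NUMBER_FIELDS = ["inputTokens", "outputTokens", "costUsd"] as const;

// Hypothetical validator: collects every problem instead of stopping at the first.
function validateSpan(payload: unknown): IngestResponse {
  if (typeof payload !== "object" || payload === null) {
    return { status: 400, body: { error: "Validation failed", details: ["body must be a JSON object"] } };
  }
  const record = payload as Record<string, unknown>;
  const details: string[] = [];
  for (const field of STRING_FIELDS) {
    const value = record[field];
    if (typeof value !== "string" || value.length === 0) {
      details.push(`${field} must be a non-empty string`);
    }
  }
  for (const field of NUMBER_FIELDS) {
    const value = record[field];
    if (typeof value !== "number" || !Number.isFinite(value) || value < 0) {
      details.push(`${field} must be a non-negative number`);
    }
  }
  // Only checked when a non-empty string was supplied, to avoid a duplicate message.
  if (typeof record.timestamp === "string" && record.timestamp.length > 0
      && Number.isNaN(Date.parse(record.timestamp))) {
    details.push("timestamp must be a parseable date");
  }
  if (details.length > 0) {
    return { status: 400, body: { error: "Validation failed", details } };
  }
  return { status: 201, body: { stored: true, id: record.id as string } };
}
```

Returning all violations in `details` lets a client fix a malformed span in one round trip.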
### `GET /api/dashboard?tenantId=<id>&granularity=day&startDate=<iso>&endDate=<iso>`
Returns aggregated spend records bucketed by the selected granularity.
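
As a rough illustration of the bucketing, the sketch below groups spans by UTC day or hour. It is not the repository's `CostStore` implementation. `SpendSpan`, `SpendBucket`, `aggregateSpend`, the `"hour"` granularity, and the half-open `[startDate, endDate)` range are all assumptions, since only `granularity=day` appears in the example URL.

```typescript
type Granularity = "hour" | "day";

interface SpendSpan {
  tenant: string;
  costUsd: number;
  timestamp: string; // ISO 8601
}

interface SpendBucket {
  bucket: string; // e.g. "2026-06-20" for day, "2026-06-20T13" for hour
  costUsd: number;
  calls: number;
}

// Normalizes to UTC and truncates the ISO string to the bucket width.
function bucketKey(timestamp: string, granularity: Granularity): string {
  const iso = new Date(timestamp).toISOString();
  return granularity === "day" ? iso.slice(0, 10) : iso.slice(0, 13);
}

function aggregateSpend(
  spans: SpendSpan[],
  tenantId: string,
  granularity: Granularity,
  startDate: string,
  endDate: string,
): SpendBucket[] {
  const start = Date.parse(startDate);
  const end = Date.parse(endDate);
  const totals = new Map<string, SpendBucket>();
  for (const span of spans) {
    const time = Date.parse(span.timestamp);
    // Keep only this tenant's spans inside [start, end).
    if (span.tenant !== tenantId || time < start || time >= end) continue;
    const key = bucketKey(span.timestamp, granularity);
    const current = totals.get(key) ?? { bucket: key, costUsd: 0, calls: 0 };
    current.costUsd += span.costUsd;
    current.calls += 1;
    totals.set(key, current);
  }
  // ISO-derived keys sort chronologically as plain strings.
  return Array.from(totals.values()).sort((a, b) => a.bucket.localeCompare(b.bucket));
}
```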
## Environment Variables
| Variable | Default | Purpose |
|---|---|---|
| `COHERE_API_KEY` | — | Cohere API key (read by our config) |
| `CO_API_KEY` | — | Cohere API key (read by the Cohere SDK) |
| `LANGFUSE_PUBLIC_KEY` | — | Langfuse public key |
| `LANGFUSE_SECRET_KEY` | — | Langfuse secret key |
| `LANGFUSE_HOST` | `https://us.cloud.langfuse.com` | Langfuse host |
| `OTEL_SERVICE_NAME` | `cohere-cost-observability` | OTel service name |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | — | OTLP collector endpoint |
| `DEFAULT_DAILY_BUDGET` | `100` | Daily budget in USD |
| `TENANT_BUDGETS` | — | JSON override for per-tenant budgets |
| `POLL_INTERVAL_MS` | `60000` | Budget watcher polling interval |
| `BUDGET_WARN_THRESHOLD` | `80` | Warning threshold percentage |
| `BUDGET_CRIT_THRESHOLD` | `95` | Critical threshold percentage |
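
The budget variables above might combine along these lines. This is a sketch, not the repository's `BudgetWatcher`. `loadBudgetConfig` and `classifySpend` are hypothetical helpers, and the `TENANT_BUDGETS` format (a JSON object mapping tenant ID to a daily USD budget) is an assumption. The defaults match the table.

```typescript
type BudgetLevel = "ok" | "warn" | "critical";

interface BudgetConfig {
  defaultDailyBudget: number; // USD
  tenantBudgets: Record<string, number>; // assumed shape: { "tenant-id": dailyUsd }
  warnThreshold: number; // percent of budget
  critThreshold: number; // percent of budget
}

// Takes the environment as a plain object so the sketch has no Node dependency;
// an application would pass process.env.
function loadBudgetConfig(env: Record<string, string | undefined>): BudgetConfig {
  return {
    defaultDailyBudget: Number(env.DEFAULT_DAILY_BUDGET ?? "100"),
    tenantBudgets: env.TENANT_BUDGETS ? JSON.parse(env.TENANT_BUDGETS) : {},
    warnThreshold: Number(env.BUDGET_WARN_THRESHOLD ?? "80"),
    critThreshold: Number(env.BUDGET_CRIT_THRESHOLD ?? "95"),
  };
}

// Compares today's spend with the tenant's budget, falling back to the default.
function classifySpend(config: BudgetConfig, tenant: string, spentUsd: number): BudgetLevel {
  const budget = config.tenantBudgets[tenant] ?? config.defaultDailyBudget;
  const percentUsed = (spentUsd / budget) * 100;
  if (percentUsed >= config.critThreshold) return "critical";
  if (percentUsed >= config.warnThreshold) return "warn";
  return "ok";
}
```

With the defaults, a tenant on the 100 USD daily budget would be flagged `warn` at 90 USD of spend and `critical` at 96 USD. A watcher polling every `POLL_INTERVAL_MS` would run this check per tenant and log the non-`ok` results.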
## Testing
```bash
pnpm test # vitest run with coverage
pnpm typecheck # TypeScript type checking
pnpm lint # ESLint
```
## License
MIT — see [LICENSE](./LICENSE).