# Cohere LLM Cost Observability for SMB Support Agents
 
> Wrap every Cohere API call with cost telemetry and OTel spans so SMBs can see exactly where their LLM budget goes and stop cost overruns.
 
## Problem
 
Small businesses running Cohere-powered support bots typically have no per-call cost visibility, so a single overly verbose handling loop can silently triple the monthly bill. This recipe wraps the Cohere SDK with `@reaatech/llm-cost-telemetry` to capture token counts and calculate cost in real time, uses `@reaatech/otel-genai-semconv-core` to emit spec-compliant OpenTelemetry spans, and sends the results to Langfuse for dashboarding.
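To make the "token counts to cost" step concrete, here is a minimal sketch of a per-call cost estimate. It does not use the `@reaatech/llm-cost-telemetry-calculator` API; `ModelPricing`, `EXAMPLE_PRICING`, and `estimateCostUsd` are hypothetical names, and the prices are placeholders rather than Cohere's published rates.

```typescript
// Hypothetical pricing shape: USD per one million tokens.
interface ModelPricing {
  inputPerMillionUsd: number;
  outputPerMillionUsd: number;
}

// Placeholder prices for illustration only; look up current rates before use.
const EXAMPLE_PRICING: Record<string, ModelPricing> = {
  "command-a-03-2025": { inputPerMillionUsd: 2.5, outputPerMillionUsd: 10 },
};

// Cost of one call = input tokens * input rate + output tokens * output rate.
function estimateCostUsd(
  model: string,
  inputTokens: number,
  outputTokens: number,
  pricing: Record<string, ModelPricing> = EXAMPLE_PRICING,
): number {
  const p = pricing[model];
  if (!p) {
    // Failing loudly avoids silently recording a cost of zero.
    throw new Error(`No pricing configured for model: ${model}`);
  }
  return (
    (inputTokens * p.inputPerMillionUsd + outputTokens * p.outputPerMillionUsd) /
    1_000_000
  );
}
```

With the placeholder prices above, `estimateCostUsd("command-a-03-2025", 100, 50)` evaluates to `0.00075`.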
 
## Architecture
 
```
Cohere SDK → InstrumentedCohereClient → TracingManager / MetricsManager (OTLP) → Langfuse
                                        → CostLogger (Pino)
                                        → CostStore (in-memory) → app/api/dashboard API
                                        → BudgetWatcher (polling loop) → Pino alerts on threshold breach
```
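The fan-out in the diagram can be sketched roughly as follows. This is not the actual `InstrumentedCohereClient`; `instrument`, `fanOut`, `CostSpan`, and `SpanSink` are hypothetical names, and the sketch assumes each downstream consumer (tracing, logger, store) can be modeled as a function that receives a finished span.

```typescript
interface CallMeta {
  model: string;
  tenant: string;
  feature: string;
}

interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

interface CostSpan extends CallMeta, TokenUsage {
  costUsd: number;
  durationMs: number;
  timestamp: string;
}

// A sink stands in for a tracer, a logger, or an in-memory store.
type SpanSink = (span: CostSpan) => void;

// Deliver one span to every sink; a failing sink must not break the others.
function fanOut(span: CostSpan, sinks: SpanSink[]): number {
  let delivered = 0;
  for (const sink of sinks) {
    try {
      sink(span);
      delivered += 1;
    } catch {
      // Telemetry failures are swallowed so the user-facing call still succeeds.
    }
  }
  return delivered;
}

// Wrap any async LLM call: time it, read usage, price it, fan the span out.
async function instrument<T>(
  meta: CallMeta,
  call: () => Promise<T>,
  readUsage: (result: T) => TokenUsage,
  priceUsd: (usage: TokenUsage) => number,
  sinks: SpanSink[],
): Promise<T> {
  const startedAt = Date.now();
  const result = await call();
  const usage = readUsage(result);
  fanOut(
    {
      ...meta,
      ...usage,
      costUsd: priceUsd(usage),
      durationMs: Date.now() - startedAt,
      timestamp: new Date(startedAt).toISOString(),
    },
    sinks,
  );
  return result;
}
```

Passing `readUsage` as a callback keeps the wrapper independent of the SDK's response shape. Note that this sketch records nothing when `call()` rejects; a fuller version would likely emit an error span as well.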
 
## Packages Used
 
| Package | Version | Role |
|---|---|---|
| `@reaatech/llm-cost-telemetry` | 0.2.0 | Foundation types, schemas, utilities |
| `@reaatech/llm-cost-telemetry-calculator` | 0.1.1 | Cost calculation with custom pricing |
| `@reaatech/llm-cost-telemetry-observability` | 0.1.1 | OTel tracing, metrics, Pino logger |
| `@reaatech/otel-genai-semconv-core` | 0.1.0 | GenAI semantic convention types |
| `cohere-ai` | 8.0.0 | Cohere TypeScript SDK |
| `helicone` | 1.0.7 | LLM proxy observability |
| `langfuse` | 3.38.20 | LLM observability and tracing |
| `pino` | 10.3.1 | Structured JSON logger |
| `zod` | 4.4.3 | Runtime schema validation |
 
## Quick Start
 
```bash
pnpm install
cp .env.example .env   # fill in your COHERE_API_KEY and LANGFUSE keys
pnpm dev               # start Next.js dev server
pnpm test              # run vitest with coverage
```
 
## API Reference
 
### `POST /api/dashboard`
 
Ingests a cost span. Example request body:
```json
{
  "id": "span-uuid",
  "provider": "cohere",
  "model": "command-a-03-2025",
  "inputTokens": 100,
  "outputTokens": 50,
  "costUsd": 0.001,
  "tenant": "acme-corp",
  "feature": "chat-support",
  "timestamp": "2026-06-20T00:00:00.000Z"
}
```
 
Returns `201 { stored: true, id }` on success, `400 { error, details }` on validation failure.
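The success and failure responses above could be produced by a validator along these lines. The project lists `zod` for schema validation; this sketch uses plain TypeScript so it stays dependency-free. The specific field rules (non-empty strings, non-negative integer token counts, parseable timestamp) and the error text are assumptions, and `validateSpanPayload` is a hypothetical name.

```typescript
type IngestResult =
  | { status: 201; body: { stored: true; id: string } }
  | { status: 400; body: { error: string; details: string[] } };

const STRING_FIELDS = ["id", "provider", "model", "tenant", "feature", "timestamp"];
const TOKEN_FIELDS = ["inputTokens", "outputTokens"];

function validateSpanPayload(input: unknown): IngestResult {
  if (typeof input !== "object" || input === null) {
    return {
      status: 400,
      body: { error: "Validation failed", details: ["body must be a JSON object"] },
    };
  }
  const o = input as Record<string, unknown>;
  const details: string[] = [];

  for (const key of STRING_FIELDS) {
    if (typeof o[key] !== "string" || o[key] === "") {
      details.push(`${key} must be a non-empty string`);
    }
  }
  for (const key of TOKEN_FIELDS) {
    const v = o[key];
    if (typeof v !== "number" || !Number.isInteger(v) || v < 0) {
      details.push(`${key} must be a non-negative integer`);
    }
  }
  const cost = o["costUsd"];
  if (typeof cost !== "number" || !Number.isFinite(cost) || cost < 0) {
    details.push("costUsd must be a non-negative number");
  }
  const ts = o["timestamp"];
  if (typeof ts === "string" && ts !== "" && Number.isNaN(Date.parse(ts))) {
    details.push("timestamp must be a parseable date string");
  }

  if (details.length > 0) {
    return { status: 400, body: { error: "Validation failed", details } };
  }
  return { status: 201, body: { stored: true, id: o["id"] as string } };
}
```

Collecting every problem into `details` before returning lets a client fix all fields in one round trip instead of discovering them one at a time.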
 
### `GET /api/dashboard?tenantId=<id>&granularity=day&startDate=<iso>&endDate=<iso>`
 
Returns aggregated spend records for the given tenant between `startDate` and `endDate`, bucketed by the selected `granularity`.
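As a rough illustration of the bucketing, the sketch below groups stored spans by a UTC timestamp prefix. Only `day` appears in this README; the `hour` and `month` values, the inclusive end date, the output field names, and the `aggregateSpend` function itself are assumptions.

```typescript
type Granularity = "hour" | "day" | "month";

interface SpendSpan {
  tenant: string;
  costUsd: number;
  timestamp: string; // ISO 8601
}

interface SpendBucket {
  bucket: string; // e.g. "2026-06-20" for day granularity
  totalCostUsd: number;
  calls: number;
}

// Length of the ISO 8601 prefix that identifies a bucket (UTC).
const PREFIX_LENGTH: Record<Granularity, number> = { hour: 13, day: 10, month: 7 };

function aggregateSpend(
  spans: SpendSpan[],
  tenantId: string,
  granularity: Granularity,
  startDate: string,
  endDate: string,
): SpendBucket[] {
  const start = Date.parse(startDate);
  const end = Date.parse(endDate);
  const buckets = new Map<string, SpendBucket>();

  for (const span of spans) {
    const t = Date.parse(span.timestamp);
    // Skip other tenants, unparseable timestamps, and spans outside the window.
    if (span.tenant !== tenantId || Number.isNaN(t) || t < start || t > end) continue;

    // Normalize to UTC, then cut the string at the bucket boundary.
    const key = new Date(t).toISOString().slice(0, PREFIX_LENGTH[granularity]);
    const bucket = buckets.get(key) ?? { bucket: key, totalCostUsd: 0, calls: 0 };
    bucket.totalCostUsd += span.costUsd;
    bucket.calls += 1;
    buckets.set(key, bucket);
  }

  // ISO prefixes sort chronologically when compared as strings.
  return Array.from(buckets.values()).sort((a, b) => a.bucket.localeCompare(b.bucket));
}
```

Bucketing in UTC keeps results stable across server time zones, at the price of day boundaries that may not match a tenant's local business day.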
 
## Environment Variables
 
| Variable | Default | Purpose |
|---|---|---|
| `COHERE_API_KEY` | — | Cohere API key (read by our config) |
| `CO_API_KEY` | — | Cohere API key (read by the Cohere SDK) |
| `LANGFUSE_PUBLIC_KEY` | — | Langfuse public key |
| `LANGFUSE_SECRET_KEY` | — | Langfuse secret key |
| `LANGFUSE_HOST` | `https://us.cloud.langfuse.com` | Langfuse host |
| `OTEL_SERVICE_NAME` | `cohere-cost-observability` | OTel service name |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | — | OTLP collector endpoint |
| `DEFAULT_DAILY_BUDGET` | `100` | Daily budget in USD |
| `TENANT_BUDGETS` | — | JSON override for per-tenant budgets |
| `POLL_INTERVAL_MS` | `60000` | Budget watcher polling interval in milliseconds |
| `BUDGET_WARN_THRESHOLD` | `80` | Warning threshold, as a percentage of the daily budget |
| `BUDGET_CRIT_THRESHOLD` | `95` | Critical threshold, as a percentage of the daily budget |
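The budget variables can be read as inputs to a check like the one below. This is a sketch, not the recipe's `BudgetWatcher`: `loadBudgetConfig` and `classifySpend` are hypothetical names, the `TENANT_BUDGETS` shape (a JSON object mapping tenant ID to a daily USD amount, such as `{"acme-corp": 250}`) is an assumption, and so is treating thresholds as inclusive.

```typescript
type Env = Record<string, string | undefined>;

interface BudgetConfig {
  defaultDailyBudget: number;
  tenantBudgets: Record<string, number>;
  warnPct: number;
  critPct: number;
}

// Parse a numeric variable, falling back to the documented default.
function numberOr(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw === "") return fallback;
  const n = Number(raw);
  return Number.isFinite(n) ? n : fallback;
}

function loadBudgetConfig(env: Env): BudgetConfig {
  const rawTenants = env["TENANT_BUDGETS"];
  return {
    defaultDailyBudget: numberOr(env["DEFAULT_DAILY_BUDGET"], 100),
    // Assumed shape: {"tenant-id": dailyBudgetUsd}. Invalid JSON throws here.
    tenantBudgets: rawTenants ? (JSON.parse(rawTenants) as Record<string, number>) : {},
    warnPct: numberOr(env["BUDGET_WARN_THRESHOLD"], 80),
    critPct: numberOr(env["BUDGET_CRIT_THRESHOLD"], 95),
  };
}

type BudgetLevel = "ok" | "warn" | "critical";

// Compare today's spend with the tenant's budget (or the default budget).
function classifySpend(tenant: string, spentTodayUsd: number, cfg: BudgetConfig): BudgetLevel {
  const budget = cfg.tenantBudgets[tenant] ?? cfg.defaultDailyBudget;
  const pct = (spentTodayUsd * 100) / budget;
  if (pct >= cfg.critPct) return "critical";
  if (pct >= cfg.warnPct) return "warn";
  return "ok";
}
```

A polling loop would call `classifySpend` once per tenant every `POLL_INTERVAL_MS` milliseconds and log an alert whenever the result is not `"ok"`.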
 
## Testing
 
```bash
pnpm test        # vitest run with coverage
pnpm typecheck   # TypeScript type checking
pnpm lint        # ESLint
```
 
## License
 
MIT — see [LICENSE](./LICENSE).