# Ollama AI Observability with Cost Allocation for SMBs
 
> Gain OpenTelemetry tracing and per-department cost attribution for your Ollama LLM deployments running on-prem or at the edge.
 
A tutorial-style reference solution from [reaatech.com](https://reaatech.com), demonstrating how to build production-grade AI systems with the `@reaatech/*` package family.
 
## Problem
 
On-prem LLM deployments lack visibility: IT teams can't tell which departments are consuming tokens, how much each call costs in terms of compute or proxy fees, or where bottlenecks occur. Without observability, they can't optimize or perform internal chargebacks.
 
## Architecture
 
```
Ollama call → OTel spans (via @reaatech/otel-genai-semconv-instrumentation)
           → cost calculation (via @reaatech/llm-cost-telemetry-calculator)
           → aggregation (via @reaatech/llm-cost-telemetry-aggregation)
           → export to Langfuse + Traceloop
```
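
To make the first stage of the pipeline concrete, the sketch below shows the kind of attributes a span for one Ollama call might carry. The `gen_ai.*` names come from the OpenTelemetry GenAI semantic conventions; the `app.*` names, the `OllamaCall` type, and `toSpanAttributes` are hypothetical and are not confirmed to match what `@reaatech/otel-genai-semconv-instrumentation` emits.

```typescript
// Attribute values used in this sketch (OTel spans accept more types than this).
type SpanAttributes = Record<string, string | number>;

// Hypothetical record of one completed Ollama call.
interface OllamaCall {
  model: string;
  inputTokens: number;
  outputTokens: number;
  department: string;
  tenantId: string;
  costUsd: number;
}

// Maps a call record onto span attributes.
function toSpanAttributes(call: OllamaCall): SpanAttributes {
  return {
    // Names from the OpenTelemetry GenAI semantic conventions.
    "gen_ai.operation.name": "chat",
    "gen_ai.request.model": call.model,
    "gen_ai.usage.input_tokens": call.inputTokens,
    "gen_ai.usage.output_tokens": call.outputTokens,
    // HYPOTHETICAL names for the cost-allocation dimensions.
    "app.department": call.department,
    "app.tenant_id": call.tenantId,
    "app.cost_usd": call.costUsd,
  };
}
```

Keeping department and tenant on the span itself means any backend that receives the trace can group cost by those dimensions without a separate lookup.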
 
## API Reference
 
- **POST /api/chat** — Send a prompt with department metadata; receive the response with traceId and costUsd.
  Headers: `x-department`, `x-tenant-id`
  Body: `{ model: string, messages: Array<{role, content}>, stream?: boolean }`
- **GET /api/dash** — Get aggregated cost data and budget status.
  Query: `?period=day|month&tenant=acme-corp`
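
As an illustration of the `POST /api/chat` contract above, here is a minimal sketch of a typed request builder. The header names and body shape come from this README; `buildChatRequest` and the `ChatRequest` type are hypothetical helpers, not part of the project.

```typescript
interface ChatMessage {
  role: string;
  content: string;
}

interface ChatBody {
  model: string;
  messages: ChatMessage[];
  stream?: boolean;
}

// Plain description of the HTTP request; pass its fields to fetch() or any HTTP client.
interface ChatRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: attaches the cost-allocation headers to a chat request.
function buildChatRequest(
  baseUrl: string,
  department: string,
  tenantId: string,
  body: ChatBody,
): ChatRequest {
  return {
    // Strip trailing slashes so the path is not doubled up.
    url: `${baseUrl.replace(/\/+$/, "")}/api/chat`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-department": department,
      "x-tenant-id": tenantId,
    },
    body: JSON.stringify(body),
  };
}
```

A caller could send the result with `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })` and read `traceId` and `costUsd` from the JSON response.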
 
## Environment Variables
 
| Variable | Description |
|---|---|
| `OLLAMA_HOST` | Ollama server URL (default: http://127.0.0.1:11434) |
| `OLLAMA_DEFAULT_MODEL` | Default model to use |
| `LANGFUSE_PUBLIC_KEY` | Langfuse project public key |
| `LANGFUSE_SECRET_KEY` | Langfuse project secret key |
| `LANGFUSE_BASE_URL` | Langfuse API base URL |
| `TRACELOOP_API_KEY` | Traceloop API key |
| `DEFAULT_DEPARTMENT` | Department used when the `x-department` header is missing |
| `DEPARTMENT_BUDGETS` | JSON mapping of per-department daily/monthly budgets |
| `PINO_LOG_LEVEL` | Pino logger level (default: info) |
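
The table does not spell out the `DEPARTMENT_BUDGETS` schema, so the following is only a sketch of how such a value could be parsed and validated. It assumes a shape like `{"engineering": {"daily": 5, "monthly": 100}}` with limits in USD; that shape, `parseDepartmentBudgets`, and `readLimit` are all assumptions, and `.env.example` remains the source of truth.

```typescript
// ASSUMPTION: DEPARTMENT_BUDGETS looks like
//   {"engineering": {"daily": 5, "monthly": 100}}
// with limits in USD. Check .env.example for the real schema.
interface DepartmentBudget {
  daily: number | null;   // null means no daily limit configured
  monthly: number | null; // null means no monthly limit configured
}

// Reads one optional limit and rejects anything that is not a non-negative number.
function readLimit(entry: Record<string, unknown>, key: string, department: string): number | null {
  const limit = entry[key];
  if (limit === undefined || limit === null) return null;
  if (typeof limit !== "number" || !Number.isFinite(limit) || limit < 0) {
    throw new Error(`DEPARTMENT_BUDGETS: ${department}.${key} must be a non-negative number`);
  }
  return limit;
}

function parseDepartmentBudgets(raw: string | undefined): Record<string, DepartmentBudget> {
  if (raw === undefined || raw.trim() === "") return {}; // no budgets configured
  const parsed: unknown = JSON.parse(raw); // throws SyntaxError on malformed JSON
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("DEPARTMENT_BUDGETS must be a JSON object keyed by department");
  }
  const budgets: Record<string, DepartmentBudget> = {};
  for (const [department, entry] of Object.entries(parsed as Record<string, unknown>)) {
    if (typeof entry !== "object" || entry === null || Array.isArray(entry)) {
      throw new Error(`DEPARTMENT_BUDGETS: entry for "${department}" must be an object`);
    }
    const fields = entry as Record<string, unknown>;
    budgets[department] = {
      daily: readLimit(fields, "daily", department),
      monthly: readLimit(fields, "monthly", department),
    };
  }
  return budgets;
}
```

In a Node.js process this would typically be called once at startup as `parseDepartmentBudgets(process.env.DEPARTMENT_BUDGETS)`, so a malformed value fails fast instead of surfacing on the first request.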
 
## Cost Allocation
 
All API calls are tagged with department metadata via the `x-department` and `x-tenant-id` HTTP headers. Costs are calculated per call using a configurable pricing model and aggregated by department. Budget enforcement uses cascading thresholds on each department's budget: at 50% of the budget the event is logged, at 75% a notification is sent, and at 90% further calls are blocked.
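
The flow described above can be sketched in a few pure functions. This is not the API of the `@reaatech/llm-cost-telemetry-*` packages: the per-1K-token `Pricing` model, the function names, and the choice to treat each threshold as inclusive are assumptions made for illustration. Only the 50/75/90 percent levels and their actions come from this README.

```typescript
// ASSUMPTION: per-1K-token pricing is one possible "configurable pricing model".
interface Pricing {
  inputPer1kUsd: number;
  outputPer1kUsd: number;
}

interface CallUsage {
  department: string;
  inputTokens: number;
  outputTokens: number;
}

type BudgetAction = "ok" | "log" | "notify" | "block";

// Cost of a single call under the assumed pricing model.
function callCostUsd(usage: CallUsage, pricing: Pricing): number {
  return (
    (usage.inputTokens / 1000) * pricing.inputPer1kUsd +
    (usage.outputTokens / 1000) * pricing.outputPer1kUsd
  );
}

// Sums per-call cost into one running total per department.
function aggregateByDepartment(calls: CallUsage[], pricing: Pricing): Map<string, number> {
  const totals = new Map<string, number>();
  for (const call of calls) {
    totals.set(call.department, (totals.get(call.department) ?? 0) + callCostUsd(call, pricing));
  }
  return totals;
}

// Cascading thresholds: the highest threshold reached decides the action.
// ASSUMPTION: thresholds are inclusive (exactly 50% already triggers "log").
function budgetAction(spentUsd: number, budgetUsd: number): BudgetAction {
  if (budgetUsd <= 0) throw new RangeError("budgetUsd must be positive");
  const ratio = spentUsd / budgetUsd;
  if (ratio >= 0.9) return "block";
  if (ratio >= 0.75) return "notify";
  if (ratio >= 0.5) return "log";
  return "ok";
}
```

Checking the thresholds from highest to lowest keeps the cascade simple: a department at 92% of budget gets `block` only, rather than all three actions at once.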
 
## OpenTelemetry Export
 
Traces are exported to both backends: Langfuse for trace visualization and Traceloop for OpenLLMetry-based telemetry dashboards.
 
## Running locally
 
### Prerequisites
 
- **Node.js** >= 22
- **pnpm** >= 10
- **Ollama** running locally (default: http://127.0.0.1:11434) with a model pulled (e.g., `ollama pull llama3.1`)
- Langfuse account (for trace visualization) — optional, falls back to Traceloop
- Traceloop account (for OpenLLMetry observability) — optional
 
### Setup
 
1. Clone the repo and install dependencies:
   ```bash
   pnpm install
   ```
 
2. Copy `.env.example` to `.env.local` and fill in your configuration:
   ```bash
   cp .env.example .env.local
   ```
 
3. Start the development server:
   ```bash
   pnpm dev
   ```
 
### Example: Send a chat request
 
```bash
curl -X POST http://localhost:3000/api/chat \
  -H "Content-Type: application/json" \
  -H "x-department: engineering" \
  -H "x-tenant-id: acme-corp" \
  -d '{"model":"llama3.1","messages":[{"role":"user","content":"What is OpenTelemetry?"}]}'
```
 
### Example: View the cost dashboard
 
```bash
curl "http://localhost:3000/api/dash?period=day"
```
 
### Run tests
 
```bash
pnpm test            # vitest run with coverage
pnpm typecheck       # TypeScript type checking
pnpm lint            # ESLint
```
 
## Project layout
 
```
app/                  Next.js App Router pages + API routes
src/                  services, lib, adapters
tests/                vitest suite (mirrors src/)
packages/             API references for every dependency (read these first)
DEV_PLAN.md           build plan for this recipe
```
 
## License
 
MIT — see [LICENSE](./LICENSE).