Skip to content
reaatech

Files · Cohere Code Sandbox for Mixpanel Small Business Analytics

71 (1 binary, 568.8 kB total)attempt 1

README.md·6404 B·markdown
markdown
# Cohere Code Sandbox for Mixpanel Small Business Analytics
 
> Let SMBs ask natural-language questions about their Mixpanel data and get safe, budget-constrained Python code executed in a sandbox.
 
A tutorialized reference solution from [reaatech.com](https://reaatech.com), demonstrating how to build a self-service Mixpanel analyst powered by Cohere LLM, e2b sandboxed Python execution, and the `@reaatech/*` package family for budget enforcement, LLM caching, structured output repair, and cost telemetry.
 
## How it works
 
```
User query → Budget check → Cache lookup → Mixpanel events fetch → Cohere code generation → e2b sandbox execution → Structured repair → Cached result → Cost recording
```
 
1. A user submits a natural-language query about their Mixpanel data
2. `@reaatech/agent-budget-engine` checks the user's remaining budget — if exhausted, returns a capped response
3. `@reaatech/llm-cache` looks up the query hash — if a semantically similar query was answered before, returns the cached result
4. Mixpanel's REST API (`/api/2.0/events`) fetches recent event data
5. Cohere's chat API generates Python analysis code from the event data and user query
6. The generated Python code executes inside an e2b sandbox with a 30-second timeout
7. `@reaatech/structured-repair-core` fixes any malformed LLM output (markdown fences, type coercion, missing fields)
8. The result is cached for future queries; spend is recorded against the user's budget
9. `@reaatech/llm-cost-telemetry` records a `CostSpan` for observability
 
## Architecture
 
```
app/
  page.tsx                  Dashboard UI (query input, result display, cost metadata)
  api/analyze/route.ts      POST handler — validates input, calls analyzer, returns response
src/
  lib/
    types.ts                Shared types and Zod schemas
    constants.ts            Configuration constants (model IDs, pricing, env defaults)
    spend-store.ts          In-memory SpendStore for BudgetController
    mixpanel-client.ts      Mixpanel REST API client (fetch + retry)
    langfuse-client.ts      Langfuse observability tracing
  services/
    budget-service.ts       BudgetController wrapper (@reaatech/agent-budget-engine)
    telemetry-service.ts    CostSpan creation and recording (@reaatech/llm-cost-telemetry)
    cache-service.ts        CacheEngine wrapper (@reaatech/llm-cache)
    repair-service.ts       LLM output repair (@reaatech/structured-repair-core)
    cohere-service.ts       Cohere chat API client
    sandbox-service.ts      e2b sandbox code execution (concurrency-limited)
    analyzer-service.ts     Main orchestrator — ties all services together
tests/                      Vitest test suite (mirrors src/)
```
 
## Dependencies
 
### REAA packages
| Package | Purpose |
|---------|---------|
| `@reaatech/agent-budget-engine` | Budget enforcement with pre-flight checks, spend recording, and per-user state machine |
| `@reaatech/llm-cache` | Semantic and exact-match caching for LLM queries |
| `@reaatech/llm-cost-telemetry` | CostSpan types, Zod schemas, and cost calculation utilities |
| `@reaatech/structured-repair-core` | Zod schema–driven repair of malformed LLM output |
 
### Third-party packages
| Package | Purpose |
|---------|---------|
| `cohere-ai` | Cohere chat API for Python code generation |
| `e2b` | Sandboxed Python runtime for safe code execution |
| `langfuse` | LLM observability tracing |
| `zod` | Runtime schema validation |
| `p-retry` | Retry logic for API calls |
| `p-limit` | Concurrency limiting for sandbox executions |
| `nanoid` | Unique ID generation |
| `dotenv` | Environment variable loading |
 
## Prerequisites
 
- Node.js >= 22
- pnpm >= 10
 
## Setup
 
1. Install dependencies:
   ```bash
   pnpm install
   ```
 
2. Copy the environment file and fill in your API keys:
   ```bash
   cp .env.example .env
   ```
 
3. Required API keys:
   - `COHERE_API_KEY` — from [Cohere Dashboard](https://dashboard.cohere.com/)
   - `E2B_API_KEY` — from [E2B Dashboard](https://e2b.dev/dashboard)
   - `MIXPANEL_OAUTH_TOKEN` and `MIXPANEL_PROJECT_ID` — from Mixpanel settings
   - `OPENAI_API_KEY` — for the llm-cache embedding provider
   - `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY` — from [Langfuse](https://langfuse.com)
 
## Environment Variables
 
| Variable | Default | Description |
|----------|---------|-------------|
| `COHERE_API_KEY` | — | Cohere API key for LLM code generation |
| `E2B_API_KEY` | — | E2B API key for sandboxed Python execution |
| `MIXPANEL_OAUTH_TOKEN` | — | Mixpanel OAuth token for analytics API |
| `MIXPANEL_PROJECT_ID` | — | Mixpanel project identifier |
| `OPENAI_API_KEY` | — | OpenAI API key for LLM cache embedding provider |
| `LANGFUSE_PUBLIC_KEY` | — | Langfuse public key for observability |
| `LANGFUSE_SECRET_KEY` | — | Langfuse secret key for observability |
| `DEFAULT_USER_BUDGET_LIMIT` | `10.0` | Per-user daily budget in USD |
| `BUDGET_SOFT_CAP` | `0.8` | Fraction of budget that triggers warning state |
| `BUDGET_HARD_CAP` | `1.0` | Fraction of budget that blocks new requests |
| `CACHE_SIMILARITY_THRESHOLD` | `0.8` | Cosine similarity threshold for semantic cache hits |
| `CACHE_DEFAULT_TTL` | `3600` | Default cache TTL in seconds |
| `COHERE_MODEL` | `command-a-03-2025` | Cohere model identifier for code generation |
 
## Running
 
```bash
pnpm dev              # Start Next.js development server
pnpm build            # Production build
pnpm typecheck        # TypeScript type checking
pnpm lint             # ESLint
pnpm test             # Vitest with coverage
```
 
## API
 
### POST /api/analyze
 
Request:
```json
{
  "query": "What are the top 5 events this week?",
  "userId": "user-123"
}
```
 
Response (success):
```json
{
  "result": {
    "summary": "The top event is 'Page View' with 12,450 occurrences...",
    "dataPoints": [12450, 8300, 5200, 3100, 1800],
    "insights": ["Page View dominates at 40% of all events", "Sign Up dropped 15% week-over-week"]
  },
  "cached": false,
  "capped": false,
  "cost": 0.0023,
  "queryHash": "abc123def456",
  "inputTokens": 450,
  "outputTokens": 120
}
```
 
Response (capped):
```json
{
  "result": "Budget exhausted. Try again later or contact support.",
  "cached": false,
  "capped": true,
  "cost": 0,
  "queryHash": "...",
  "inputTokens": 0,
  "outputTokens": 0
}
```
 
## License
 
MIT — see [LICENSE](./LICENSE).