Files · Cohere Code Sandbox for Mixpanel Small Business Analytics
71 (1 binary, 568.8 kB total)attempt 1
README.md·6404 B·markdown
markdown
# Cohere Code Sandbox for Mixpanel Small Business Analytics
> Let SMBs ask natural-language questions about their Mixpanel data and get safe, budget-constrained Python code executed in a sandbox.
A tutorialized reference solution from [reaatech.com](https://reaatech.com), demonstrating how to build a self-service Mixpanel analyst powered by Cohere LLM, e2b sandboxed Python execution, and the `@reaatech/*` package family for budget enforcement, LLM caching, structured output repair, and cost telemetry.
## How it works
```
User query → Budget check → Cache lookup → Mixpanel events fetch → Cohere code generation → e2b sandbox execution → Structured repair → Cached result → Cost recording
```
1. A user submits a natural-language query about their Mixpanel data
2. `@reaatech/agent-budget-engine` checks the user's remaining budget — if exhausted, returns a capped response
3. `@reaatech/llm-cache` looks up the query hash — if a semantically similar query was answered before, returns the cached result
4. Mixpanel's REST API (`/api/2.0/events`) fetches recent event data
5. Cohere's chat API generates Python analysis code from the event data and user query
6. The generated Python code executes inside an e2b sandbox with a 30-second timeout
7. `@reaatech/structured-repair-core` fixes any malformed LLM output (markdown fences, type coercion, missing fields)
8. The result is cached for future queries; spend is recorded against the user's budget
9. `@reaatech/llm-cost-telemetry` records a `CostSpan` for observability
## Architecture
```
app/
page.tsx Dashboard UI (query input, result display, cost metadata)
api/analyze/route.ts POST handler — validates input, calls analyzer, returns response
src/
lib/
types.ts Shared types and Zod schemas
constants.ts Configuration constants (model IDs, pricing, env defaults)
spend-store.ts In-memory SpendStore for BudgetController
mixpanel-client.ts Mixpanel REST API client (fetch + retry)
langfuse-client.ts Langfuse observability tracing
services/
budget-service.ts BudgetController wrapper (@reaatech/agent-budget-engine)
telemetry-service.ts CostSpan creation and recording (@reaatech/llm-cost-telemetry)
cache-service.ts CacheEngine wrapper (@reaatech/llm-cache)
repair-service.ts LLM output repair (@reaatech/structured-repair-core)
cohere-service.ts Cohere chat API client
sandbox-service.ts e2b sandbox code execution (concurrency-limited)
analyzer-service.ts Main orchestrator — ties all services together
tests/ Vitest test suite (mirrors src/)
```
## Dependencies
### REAA packages
| Package | Purpose |
|---------|---------|
| `@reaatech/agent-budget-engine` | Budget enforcement with pre-flight checks, spend recording, and per-user state machine |
| `@reaatech/llm-cache` | Semantic and exact-match caching for LLM queries |
| `@reaatech/llm-cost-telemetry` | CostSpan types, Zod schemas, and cost calculation utilities |
| `@reaatech/structured-repair-core` | Zod schema–driven repair of malformed LLM output |
### Third-party packages
| Package | Purpose |
|---------|---------|
| `cohere-ai` | Cohere chat API for Python code generation |
| `e2b` | Sandboxed Python runtime for safe code execution |
| `langfuse` | LLM observability tracing |
| `zod` | Runtime schema validation |
| `p-retry` | Retry logic for API calls |
| `p-limit` | Concurrency limiting for sandbox executions |
| `nanoid` | Unique ID generation |
| `dotenv` | Environment variable loading |
## Prerequisites
- Node.js >= 22
- pnpm >= 10
## Setup
1. Install dependencies:
```bash
pnpm install
```
2. Copy the environment file and fill in your API keys:
```bash
cp .env.example .env
```
3. Required API keys:
- `COHERE_API_KEY` — from [Cohere Dashboard](https://dashboard.cohere.com/)
- `E2B_API_KEY` — from [E2B Dashboard](https://e2b.dev/dashboard)
- `MIXPANEL_OAUTH_TOKEN` and `MIXPANEL_PROJECT_ID` — from Mixpanel settings
- `OPENAI_API_KEY` — for the llm-cache embedding provider
- `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY` — from [Langfuse](https://langfuse.com)
## Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `COHERE_API_KEY` | — | Cohere API key for LLM code generation |
| `E2B_API_KEY` | — | E2B API key for sandboxed Python execution |
| `MIXPANEL_OAUTH_TOKEN` | — | Mixpanel OAuth token for analytics API |
| `MIXPANEL_PROJECT_ID` | — | Mixpanel project identifier |
| `OPENAI_API_KEY` | — | OpenAI API key for LLM cache embedding provider |
| `LANGFUSE_PUBLIC_KEY` | — | Langfuse public key for observability |
| `LANGFUSE_SECRET_KEY` | — | Langfuse secret key for observability |
| `DEFAULT_USER_BUDGET_LIMIT` | `10.0` | Per-user daily budget in USD |
| `BUDGET_SOFT_CAP` | `0.8` | Fraction of budget that triggers warning state |
| `BUDGET_HARD_CAP` | `1.0` | Fraction of budget that blocks new requests |
| `CACHE_SIMILARITY_THRESHOLD` | `0.8` | Cosine similarity threshold for semantic cache hits |
| `CACHE_DEFAULT_TTL` | `3600` | Default cache TTL in seconds |
| `COHERE_MODEL` | `command-a-03-2025` | Cohere model identifier for code generation |
## Running
```bash
pnpm dev # Start Next.js development server
pnpm build # Production build
pnpm typecheck # TypeScript type checking
pnpm lint # ESLint
pnpm test # Vitest with coverage
```
## API
### POST /api/analyze
Request:
```json
{
"query": "What are the top 5 events this week?",
"userId": "user-123"
}
```
Response (success):
```json
{
"result": {
"summary": "The top event is 'Page View' with 12,450 occurrences...",
"dataPoints": [12450, 8300, 5200, 3100, 1800],
"insights": ["Page View dominates at 40% of all events", "Sign Up dropped 15% week-over-week"]
},
"cached": false,
"capped": false,
"cost": 0.0023,
"queryHash": "abc123def456",
"inputTokens": 450,
"outputTokens": 120
}
```
Response (capped):
```json
{
"result": "Budget exhausted. Try again later or contact support.",
"cached": false,
"capped": true,
"cost": 0,
"queryHash": "...",
"inputTokens": 0,
"outputTokens": 0
}
```
## License
MIT — see [LICENSE](./LICENSE).