Small business analysts need custom reports but can’t write code. Asking an LLM to generate SQL or Python is one step — safely executing it against production data without destroying anything or blowing cloud budgets is the real challenge.
A complete, working implementation of this recipe — downloadable as a zip or browsable file by file. Generated by our build pipeline; tested with full coverage before publishing.
This tutorial builds a Perplexity Code Sandbox — a Next.js app that lets small business analysts ask natural-language questions and get back safe, sandboxed Python or SQL code execution. You’ll wire up Perplexity AI for code generation, the Daytona SDK for ephemeral sandboxes, the REAA circuit breaker and budget engine for safety controls, and an LLM cache with semantic similarity matching. By the end, you’ll have a working dashboard where you type “show me last month’s sales by region” and get back generated code, execution results, and a cost breakdown.
Prerequisites
Node.js >= 22
pnpm 10.x installed globally
A Perplexity API key for code generation
A Daytona API key for sandbox provisioning
An OpenAI API key for cache embeddings
A Langfuse account (optional — tracing is a graceful no-op if unset)
Familiarity with TypeScript and Next.js 16 App Router patterns
Step 1: Scaffold the Next.js project and install dependencies
Create the project with Next.js 16 (App Router) and TypeScript:
Expected output:pnpm install completes without peer-dep warnings. pnpm typecheck passes with zero errors.
Step 2: Configure environment variables
Create .env.example with all the variables the subsystems read:
env
# Env vars used by perplexity-code-sandbox-for-smb-data-reporting.# Keep placeholders only — never commit real values.NODE_ENV=development# API keysPERPLEXITY_API_KEY=<your-perplexity-key>DAYTONA_API_KEY=<your-daytona-key>OPENAI_API_KEY=<your-openai-key># Langfuse (optional — graceful no-op if unset)LANGFUSE_PUBLIC_KEY=<your-langfuse-public-key>LANGFUSE_SECRET_KEY=<your-langfuse-secret-key>LANGFUSE_HOST=https://cloud.langfuse.com# Budget defaultsBUDGET_DEFAULT_LIMIT=5.00# Cache defaultsCACHE_DEFAULT_TTL=3600SEMANTIC_CACHE_THRESHOLD=0.8# Circuit breaker defaultsCIRCUIT_BREAKER_FAILURE_THRESHOLD=5CIRCUIT_BREAKER_RECOVERY_TIMEOUT_MS=30000# Confidence router defaultsCONFIDENCE_ROUTE_THRESHOLD=0.8CONFIDENCE_FALLBACK_THRESHOLD=0.3
Expected output: File written. All values are placeholders — no real keys in committed code.
Step 3: Define shared types with Zod
Create src/lib/types.ts with the Zod schemas that the rest of the system shares:
Expected output: A module that reads PERPLEXITY_API_KEY from the environment and exposes generateCode() and generateRaw() methods. When the API response is malformed, it throws a typed PerplexityApiError.
Step 5: Build the Daytona sandbox adapter
Create src/lib/sandbox/daytona-client.ts. This provisions ephemeral sandboxes and runs Python or SQL code inside them:
ts
import { Daytona, Sandbox, DaytonaTimeoutError,} from "@daytonaio/sdk";import type { ExecutionResult } from "../types.js";export class SandboxTimeoutError extends Error { constructor(message: string) { super(message); this.name = "SandboxTimeoutError"; }}export interface SandboxSession { id: string; status: string;}export class DaytonaClient { daytona: Daytona; sandboxMap: Map<string, Sandbox>; readonly maxExecutionDuration: number; constructor(maxExecutionDuration: number = 30) { const apiKey = process.env.DAYTONA_API_KEY; if (!apiKey) { throw new Error( "DAYTONA_API_KEY is not set. Provide an apiKey or set the DAYTONA_API_KEY environment variable.", ); } this.daytona = new Daytona({ apiKey }); this.sandboxMap = new Map<string, Sandbox>(); this.maxExecutionDuration = maxExecutionDuration; } async createSandbox(): Promise<SandboxSession> { const sandbox = await this.daytona.create(); this.sandboxMap.set(sandbox.id, sandbox); return { id: sandbox.id, status: sandbox.state ?? "unknown" }; } async executeCode( session: SandboxSession, code: string, language: "python" | "sql", ): Promise<ExecutionResult> { const sandbox = this.sandboxMap.get(session.id); if (!sandbox) { throw new Error(`Sandbox ${session.id} not found`); } const startTime = Date.now(); try { if (language === "python") { const result = await sandbox.codeInterpreter.runCode(code, { timeout: this.maxExecutionDuration, }); return { stdout: result.stdout, stderr: result.stderr, exitCode: result.error ? 1 : 0, durationMs: Date.now() - startTime, }; } const result = await sandbox.process.codeRun( code, undefined, this.maxExecutionDuration, ); return { stdout: result.artifacts?.stdout ?? result.result, stderr: "", exitCode: result.exitCode, durationMs: Date.now() - startTime, }; } catch (err) { if (err instanceof DaytonaTimeoutError) { throw new SandboxTimeoutError( `Code execution timed out after ${String(this.maxExecutionDuration)}s`, ); } throw err; } } async destroySandbox(session: SandboxSession): Promise<void> { const sandbox = this.sandboxMap.get(session.id); if (!sandbox) { return; } await sandbox.delete(); this.sandboxMap.delete(session.id); }}let defaultClient: DaytonaClient | undefined;export function createDaytonaClient(maxExecutionDuration?: number): DaytonaClient { if (maxExecutionDuration !== undefined) { return new DaytonaClient(maxExecutionDuration); } if (defaultClient === undefined) { defaultClient = new DaytonaClient(); } return defaultClient;}
Expected output: A module that reads DAYTONA_API_KEY, provisions sandboxes with createSandbox(), runs code with executeCode(), and tears them down with destroySandbox(). Timeouts throw SandboxTimeoutError.
Step 6: Set up the circuit breaker
Create src/lib/circuit-breaker/circuit-breaker.ts. This wraps code execution with a circuit breaker that opens after N consecutive failures:
Expected output: A factory that reads CIRCUIT_BREAKER_FAILURE_THRESHOLD and CIRCUIT_BREAKER_RECOVERY_TIMEOUT_MS from the environment. When the circuit is open, withCircuitBreaker returns a fallback object instead of throwing.
Step 7: Wire up the budget controller
Create src/lib/budget/budget-controller.ts. This enforces per-scope spend limits with soft and hard caps:
Expected output: A budget controller that wraps BudgetController+SpendStore from the REAA packages. It exposes defineScope, checkBudget, recordSpend, getState, and listAll. The default budget limit comes from BUDGET_DEFAULT_LIMIT.
Step 8: Implement the confidence router
Create src/lib/confidence/confidence-router.ts. This classifies natural-language queries to decide whether to route, clarify, or fall back:
Expected output:initLangfuse() returns a Langfuse instance when all env vars are set, or null when keys are missing. withTrace() creates a trace, runs your function, and finalizes it — errors are logged, not re-thrown.
Step 12: Wire up the orchestration service
Create src/services/generate-service.ts. This is the central pipeline that coordinates all subsystems:
ts
import { nanoid } from "nanoid";import { z } from "zod";import type { PerplexityClient } from "../lib/llm/perplexity-client.js";import type { DaytonaClient } from "../lib/sandbox/daytona-client.js";import type { CircuitBreaker } from "@reaatech/circuit-breaker-agents";import type { BudgetControllerInstance } from "../lib/budget/budget-controller.js";import type { CacheEngine } from "@reaatech/llm-cache";type PerplexityClientLike = Pick<PerplexityClient, "generateCode" | "generateRaw">;type DaytonaClientLike = Pick<
import { type NextRequest, NextResponse } from "next/server";import { health, invalidateByModel } from "../../../src/lib/cache/cache-engine.js";export async function GET() { const result = await health(); return NextResponse.json(result, { status: 200 });}export async function DELETE(req: NextRequest) { try { const body = (await req.json()) as { modelVersion?: string }; if (!body.modelVersion) { return NextResponse.json( { error: "modelVersion is required" }, { status: 400 }, ); } const result = await invalidateByModel(body.modelVersion); return NextResponse.json(result, { status: 200 }); } catch { return NextResponse.json( { error: "Invalid request body" }, { status: 400 }, ); }}
Expected output: Three route files under app/api/. All use named function exports (POST, GET, DELETE) as required by the App Router. Invalid bodies return 400, the circuit breaker returns 503, budget violations return 429.
Step 14: Build the dashboard UI
Replace app/page.tsx with a “use client” component that lets the user type a query, submit it to the API, and see the result:
Expected output: A form-based dashboard at the root URL. Type a query, click “Generate”, and see the response, generated code, execution output, and cost breakdown.
Step 15: Run the tests
Create test files under tests/ mirroring the source structure. Here’s one example — the structured output repair test at tests/src/lib/structured-output/repair.test.ts:
ts
import { describe, it, expect } from "vitest";import { repairCodeBlock, repairJson, validateGeneratedCode, RepairError,} from "../../../../src/lib/structured-output/repair";describe("repairCodeBlock", () => { it("strips python fences", () => { const raw = "```python\nprint('hello')\n```"; expect(repairCodeBlock(raw)).toBe("print('hello')"); }); it("strips sql fences", () => { const raw = "```sql\nSELECT * FROM users;\n```"; expect(repairCodeBlock(raw)).toBe("SELECT * FROM users;"); }); it("strips generic fences", () => { const raw = "```\nplain code\n```"; expect(repairCodeBlock(raw)).toBe("plain code"); }); it("trims leading and trailing whitespace", () => { const raw = " \n const x = 1; \n "; expect(repairCodeBlock(raw)).toBe("const x = 1;"); }); it("returns empty string for empty input", () => { expect(repairCodeBlock("")).toBe(""); });});describe("repairJson", () => { it("fixes trailing comma", () => { const raw = '{"key": "value",}'; const result = repairJson(raw); expect(result).toEqual({ key: "value" }); }); it("repairs single-quoted keys", () => { const raw = "{'name': 'Alice'}"; const result = repairJson(raw); expect(result).toEqual({ name: "Alice" }); }); it("throws RepairError on unrecoverable input", () => { expect(() => repairJson("{key unquoted,}")).toThrow(RepairError); });});describe("validateGeneratedCode", () => { describe("python", () => { it("rejects os.system", () => { const result = validateGeneratedCode('import os\nos.system("ls")', "python"); expect(result.valid).toBe(false); expect(result.errors).toContain("Dangerous import: os.system"); }); it("passes clean python code", () => { const code = 'print("hello")'; const result = validateGeneratedCode(code, "python"); expect(result.valid).toBe(true); expect(result.errors).toHaveLength(0); }); }); describe("sql", () => { it("rejects DROP TABLE", () => { const code = "DROP TABLE users;"; const result = validateGeneratedCode(code, "sql"); expect(result.valid).toBe(false); expect(result.errors).toContain("DDL not allowed: DROP TABLE"); }); it("passes valid SELECT", () => { const code = "SELECT * FROM users WHERE id = 1;"; const result = validateGeneratedCode(code, "sql"); expect(result.valid).toBe(true); expect(result.errors).toHaveLength(0); }); });});
Now run the test suite:
terminal
pnpm vitest run --coverage --reporter=json --outputFile=vitest-report.json
Expected output: All tests pass (at least 60 total). Coverage report shows lines/branches/functions/statements each at 90% or higher on runtime code.
Next steps
Add Redis-backed persistence — swap the circuit breaker’s InMemoryAdapter for a RedisAdapter so the circuit state survives server restarts across a multi-replica deployment.
Extend budget scopes — replace the single “default” scope with per-user or per-organization scopes, read from a database instead of environment variables.
Add SQL execution support — the confidence router classifier already routes SQL-related queries; wire them through daytona.executeCode() with language: "sql" and add SQL-specific validation.
Deploy with Langfuse monitoring — set up Langfuse production mode with real keys to visualize trace spans and measure per-query cost across your analytics pipeline.