OpenRouter Code Sandbox for SMB Financial Modeling
A secure, budget‑aware AI code sandbox that lets non‑technical SMB analysts run Python financial models and get repaired, validated results without writing code.
Small business analysts need to run what‑if financial scenarios but lack coding skills or safe execution environments. Manual spreadsheets are error‑prone, and exposing LLM‑generated code to a live environment risks data loss or cost overruns.
This tutorial walks you through building a secure, budget-aware AI code sandbox for SMB financial modeling. You’ll create a Next.js API that accepts natural-language financial modeling queries, routes them through a confidence classifier, generates Python code via OpenRouter, checks budget limits, validates the generated code against dangerous system calls, executes it inside an E2B sandbox, repairs malformed output, records cost telemetry, and persists the conversation — all without writing a single line of Python.
You’ll wire up six REAA packages — confidence-router, agent-budget-engine, tool-use-firewall-core, structured-repair-core, llm-cost-telemetry, and session-continuity — along with OpenRouter’s provider, E2B’s code sandbox, and Upstash Redis for session storage.
Expected output: A fresh Next.js 16 project with a src/ directory, app/ directory, pnpm-lock.yaml, and all config files in place.
Step 2: Install the production dependencies
The recipe depends on REAA packages, the OpenRouter AI SDK provider, E2B, Upstash Redis, Zod, Langfuse, and the Vercel AI SDK. Pin every version exactly.
Expected output: pnpm resolves and installs every package plus their transitive dependencies. package.json now lists them under dependencies with exact version strings.
Step 3: Configure environment variables
Create a .env file with all the API keys and configuration values. The recipe reads these at runtime through a typed config loader.
```sh
cp .env.example .env
```
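Fill in your .env file with real values. The variables that loadAppConfig() checks at startup (see Step 4) are OPENROUTER_API_KEY, E2B_API_KEY, UPSTASH_REDIS_REST_URL, UPSTASH_REDIS_REST_TOKEN, LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, and LANGFUSE_HOST.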
LANGFUSE_HOST defaults to https://cloud.langfuse.com if not set; leave the placeholder if you use the cloud version. Set it to your self-hosted URL if needed.
Expected output: A .env file at the project root with all keys populated. The recipe’s loadAppConfig() function reads these — if any required variable is missing, it throws immediately at startup.
Step 4: Define shared types and the config loader
Start with the core types that flow through every pipeline stage. Create src/lib/types.ts:
Expected output: A clean TypeScript file with no errors. The CodeGenOutputSchema Zod schema is used later by structured-repair-core to validate LLM output.
Now create the config loader in src/lib/config.ts:
Expected output: loadAppConfig() returns a typed AppConfig object. If OPENROUTER_API_KEY, E2B_API_KEY, UPSTASH_REDIS_REST_URL, UPSTASH_REDIS_REST_TOKEN, LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, or LANGFUSE_HOST are missing, it throws immediately.
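The fail-fast behavior described above can be sketched without any dependencies. This is a minimal illustration, not the recipe's actual loader: the names SketchConfig, requireVar, and loadConfigFrom are hypothetical, and the sketch follows Step 3 in giving LANGFUSE_HOST a default, which the real loadAppConfig() may not do.

```typescript
// Hypothetical sketch: the type, field names, and function names are illustrative only.
interface SketchConfig {
  openRouterApiKey: string;
  e2bApiKey: string;
  upstashUrl: string;
  upstashToken: string;
  langfusePublicKey: string;
  langfuseSecretKey: string;
  langfuseHost: string;
}

type EnvMap = Record<string, string | undefined>;

// Return the variable's value, or throw an error that names the missing variable.
function requireVar(env: EnvMap, key: string): string {
  const value = env[key];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${key}`);
  }
  return value;
}

// Build the whole config in one place so a missing key fails at startup,
// not in the middle of a request.
function loadConfigFrom(env: EnvMap): SketchConfig {
  return {
    openRouterApiKey: requireVar(env, "OPENROUTER_API_KEY"),
    e2bApiKey: requireVar(env, "E2B_API_KEY"),
    upstashUrl: requireVar(env, "UPSTASH_REDIS_REST_URL"),
    upstashToken: requireVar(env, "UPSTASH_REDIS_REST_TOKEN"),
    langfusePublicKey: requireVar(env, "LANGFUSE_PUBLIC_KEY"),
    langfuseSecretKey: requireVar(env, "LANGFUSE_SECRET_KEY"),
    // Assumption taken from Step 3: fall back to the cloud host when unset.
    langfuseHost: env["LANGFUSE_HOST"] ?? "https://cloud.langfuse.com",
  };
}
```

In a Node process you would pass process.env as the argument. Taking the environment as a parameter also makes the loader easy to test with a plain object.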
Step 5: Wire up the confidence router for query classification
The @reaatech/confidence-router package decides whether a query is clear enough to execute, needs clarification, or should fall back. Create src/lib/classifier.ts:
Expected output: The ConfidenceRouter constructor accepts routeThreshold (above which it routes directly), fallbackThreshold (below which it refuses), and clarificationEnabled (mid-range queries ask a clarifying question). classifyQuery delegates to router.process(query) and returns the decision.
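To make the three thresholds concrete, here is a dependency-free sketch of the decision rule as the paragraph above describes it. It does not call @reaatech/confidence-router: the decide function, the ROUTE and FALLBACK labels, the boundary handling, and the behavior when clarification is disabled are all assumptions made for illustration.

```typescript
// Hypothetical decision labels; only CLARIFY is named in this tutorial.
type RoutingDecision = "ROUTE" | "CLARIFY" | "FALLBACK";

interface ThresholdOptions {
  routeThreshold: number;        // at or above this, execute directly
  fallbackThreshold: number;     // below this, refuse
  clarificationEnabled: boolean; // mid-range queries ask a question when true
}

// Map a confidence score in [0, 1] to a decision.
function decide(confidence: number, opts: ThresholdOptions): RoutingDecision {
  if (confidence >= opts.routeThreshold) return "ROUTE";
  if (confidence < opts.fallbackThreshold) return "FALLBACK";
  // Mid-range: clarify if allowed, otherwise treat as a fallback (an assumption).
  return opts.clarificationEnabled ? "CLARIFY" : "FALLBACK";
}
```

With routeThreshold 0.8 and fallbackThreshold 0.3, a score of 0.9 routes, 0.5 asks for clarification, and 0.1 falls back.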
Step 6: Build the budget controller
The @reaatech/agent-budget-engine package caps costs per tenant. Create src/lib/budget.ts:
Expected output: createBudgetController() instantiates a BudgetController backed by an in-memory SpendStore. checkBudget() tests whether an estimated cost fits within the tenant’s budget, and recordSpend() logs actual cost after execution.
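The check-then-record flow can be illustrated with a small in-memory ledger. This is a sketch under stated assumptions, not the @reaatech/agent-budget-engine API: TenantSpendLedger and its methods are hypothetical names, and a single flat limit per tenant is assumed.

```typescript
// Hypothetical in-memory ledger: one USD limit applied to every tenant.
class TenantSpendLedger {
  private readonly limitUsd: number;
  private readonly spentUsd = new Map<string, number>();

  constructor(limitUsd: number) {
    this.limitUsd = limitUsd;
  }

  // Budget left for a tenant; unknown tenants have spent nothing yet.
  remaining(tenant: string): number {
    return this.limitUsd - (this.spentUsd.get(tenant) ?? 0);
  }

  // Pre-flight check with an estimated cost, before any LLM or sandbox call.
  canAfford(tenant: string, estimatedUsd: number): boolean {
    return estimatedUsd <= this.remaining(tenant);
  }

  // Record the actual cost after execution.
  record(tenant: string, actualUsd: number): void {
    this.spentUsd.set(tenant, (this.spentUsd.get(tenant) ?? 0) + actualUsd);
  }
}
```

Because the ledger lives in process memory, spend resets on every restart and is not shared between server instances, which is why the Next steps section suggests a persistent store.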
Step 7: Implement the firewall for generated code
The @reaatech/tool-use-firewall-core package audits and restricts generated code. Create src/lib/firewall.ts:
```ts
import { Logger, redact, safeRegExp, PolicyViolationError } from "@reaatech/tool-use-firewall-core";

export function createAuditLogger(): Logger {
  return new Logger("CodeSandbox");
}

export function validateCodeInput(code: string): void {
  const dangerousPatterns = [
    safeRegExp("rm\\s+(-[rf]+\\s+)?[/~]"),
    safeRegExp("exec\\s*\\("),
    safeRegExp("os\\.system\\s*\\("),
    safeRegExp("subprocess\\.(call|Popen|run)\\s*\\("),
  ];
  for (const pattern of dangerousPatterns) {
    if (pattern.test(code)) {
      const err = new PolicyViolationError({
        message: "Dangerous system call blocked in generated code",
      });
      Object.assign(err, { code: "POLICY_VIOLATION" });
      throw err;
    }
  }
}

export function sanitizeLogData(
  data: Record<string, unknown>
): Record<string, unknown> {
  return redact(data) as Record<string, unknown>;
}
```
Expected output: validateCodeInput() checks generated Python code for rm aimed at / or ~ (including rm -rf), exec(, os.system(, and subprocess.call(, subprocess.Popen(, or subprocess.run(. If any pattern matches, it throws a PolicyViolationError. sanitizeLogData() wraps redact() to strip API keys from log entries.
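To see what these patterns do and do not catch, the sketch below re-creates them as plain RegExp literals and reports which rules a snippet trips. It assumes safeRegExp compiles the same expressions; the sketchRules table, its labels, and findViolations are hypothetical names used only here.

```typescript
// Plain-RegExp equivalents of the four patterns above (no flags, so test() is stateless).
const sketchRules: { label: string; pattern: RegExp }[] = [
  { label: "rm on / or ~", pattern: /rm\s+(-[rf]+\s+)?[\/~]/ },
  { label: "exec(", pattern: /exec\s*\(/ },
  { label: "os.system(", pattern: /os\.system\s*\(/ },
  { label: "subprocess call/Popen/run", pattern: /subprocess\.(call|Popen|run)\s*\(/ },
];

// Return the label of every rule that matches anywhere in the code string.
function findViolations(code: string): string[] {
  return sketchRules.filter((rule) => rule.pattern.test(code)).map((rule) => rule.label);
}

// A shell-out is caught:
const shellOut = findViolations('import os\nos.system("ls")');
// Substring matching over-matches: "term / 12" contains "rm /".
const falsePositive = findViolations("monthly_rate = term / 12");
// It also under-matches: check_output is not in the subprocess list.
const missed = findViolations('subprocess.check_output(["ls"])');
```

The second and third cases suggest treating a regex check like this as a first filter only. The E2B sandbox remains the actual isolation boundary.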
Step 8: Wire up the structured repair core
The @reaatech/structured-repair-core package fixes malformed LLM JSON output — it strips markdown fences, fixes trailing commas, and retries parse failures. Create src/lib/repair.ts:
```ts
import { repair, isValid as repairValid } from "@reaatech/structured-repair-core";
import { CodeGenOutputSchema, type CodeGenOutput } from "./types.js";

export async function repairCodeOutput(rawOutput: string): Promise<CodeGenOutput> {
  try {
    return await repair(CodeGenOutputSchema, rawOutput);
  } catch {
    return {
      code: "",
      explanation: "repair failed",
      language: "unknown",
    };
  }
}

export function isValidCodeOutput(raw: unknown): boolean {
  try {
    return repairValid(CodeGenOutputSchema, raw as string);
  } catch {
    return false;
  }
}
```
Expected output: repairCodeOutput() passes the raw LLM text and the CodeGenOutputSchema Zod schema to repair(). If the repair itself fails (unrecoverably malformed output), it returns a safe fallback.
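For intuition about what a repair pass involves, here is a dependency-free sketch of two of the fixes mentioned above: stripping a markdown fence and removing trailing commas before parsing. It is not how @reaatech/structured-repair-core is implemented; every name in it is hypothetical, and the comma fix is deliberately naive (it would also rewrite a comma followed by a bracket inside a string value).

```typescript
interface SketchCodeGenOutput {
  code: string;
  explanation: string;
  language: string;
}

// Built with repeat() so the fence marker can appear inside this listing.
const FENCE = "`".repeat(3);
const fencePattern = new RegExp(FENCE + "(?:json)?\\s*([\\s\\S]*?)" + FENCE);

// Keep only the text inside the first fenced block, if there is one.
function stripFences(raw: string): string {
  const match = raw.match(fencePattern);
  return (match?.[1] ?? raw).trim();
}

// Naive fix: drop a comma that directly precedes a closing brace or bracket.
function removeTrailingCommas(text: string): string {
  return text.replace(/,\s*([}\]])/g, "$1");
}

// Minimal shape check standing in for the Zod schema.
function isCodeGenOutput(value: unknown): value is SketchCodeGenOutput {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    typeof record["code"] === "string" &&
    typeof record["explanation"] === "string" &&
    typeof record["language"] === "string"
  );
}

// Clean, parse, validate; return null when the output cannot be recovered.
function repairSketch(raw: string): SketchCodeGenOutput | null {
  const cleaned = removeTrailingCommas(stripFences(raw));
  try {
    const parsed: unknown = JSON.parse(cleaned);
    return isCodeGenOutput(parsed) ? parsed : null;
  } catch (e) {
    return null;
  }
}
```

A fenced reply with a trailing comma after the last key parses successfully after both passes, while plain prose such as "Sorry, I cannot help" yields null.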
Step 9: Add cost telemetry with llm-cost-telemetry and Langfuse
The @reaatech/llm-cost-telemetry package generates cost spans from token and execution-time data. Create src/lib/telemetry.ts:
Expected output: createCostSpan() computes a combined cost from LLM tokens (via calculateCostFromTokens at a $30/Mtok rate) and sandbox execution time (at $0.001/second), then validates the span through CostSpanSchema.parse(). initLangfuse() is called once at service startup.
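The combined cost is simple arithmetic, worked through below with the two rates quoted above. The sketch assumes input and output tokens are billed at the same $30 per million rate and that execution time arrives in milliseconds; estimateCostUsd is a hypothetical helper, not part of @reaatech/llm-cost-telemetry.

```typescript
const USD_PER_MILLION_TOKENS = 30;    // rate quoted in this step
const USD_PER_SANDBOX_SECOND = 0.001; // rate quoted in this step

// LLM cost plus sandbox cost, in USD.
function estimateCostUsd(inputTokens: number, outputTokens: number, execMs: number): number {
  const llmUsd = ((inputTokens + outputTokens) / 1e6) * USD_PER_MILLION_TOKENS;
  const sandboxUsd = (execMs / 1000) * USD_PER_SANDBOX_SECOND;
  return llmUsd + sandboxUsd;
}

// 1,200 input + 800 output tokens = 2,000 tokens -> about $0.06
// 2,500 ms of sandbox time = 2.5 s            -> about $0.0025
// total                                        -> about $0.0625
const exampleCostUsd = estimateCostUsd(1200, 800, 2500);
```

At these rates the LLM call dominates: the sandbox would need to run for a full minute to cost as much as 2,000 tokens.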
Step 10: Build the LLM code generation module
This module calls OpenRouter through the Vercel AI SDK to generate Python financial models. Create src/lib/llm.ts:
```ts
import { generateText } from "ai";
import { openrouter } from "@openrouter/ai-sdk-provider";

export async function generateFinancialCode(
  query: string,
  modelId: string
): Promise<{ raw: string; usage: { inputTokens: number; outputTokens: number } }> {
  const result = await generateText({
    model: openrouter(modelId),
    system: "You are a financial modeling assistant. Generate Python code in response to the user query. Output MUST be JSON with keys: code, explanation, language. The code key contains Python source. Do NOT include markdown fences.",
    prompt: query,
  });
  return {
    raw: result.text,
    usage: {
      inputTokens: result.usage.inputTokens ?? 0,
      outputTokens: result.usage.outputTokens ?? 0,
    },
  };
}

export async function generateClarification(
  query: string,
  options: string[],
  modelId: string
): Promise<string> {
  const result = await generateText({
    model: openrouter(modelId),
    system: `The user query is ambiguous. Given these possible interpretations: ${options.join(", ")}, ask the user ONE clarifying question to disambiguate with these specific options.`,
    prompt: query,
  });
  return result.text;
}
```
Expected output: generateFinancialCode() creates an AI SDK model via openrouter(modelId), sends a system prompt instructing the model to output structured JSON, and returns both the raw text and token usage. generateClarification() is used when the confidence router returns a CLARIFY decision.
Step 11: Create the E2B sandbox wrapper
The E2B sandbox provides a secure, isolated environment to execute generated Python code. Create src/lib/sandbox.ts:
Expected output: withSandbox() creates an E2B sandbox, passes it to your callback, and always calls sandbox.kill() in the finally block. executePythonCode() writes the code to a file, runs it with python3, and returns stdout, stderr, exit code, and wall-clock time.
Step 12: Build the Upstash Redis storage adapter
The @reaatech/session-continuity package needs an IStorageAdapter to persist sessions and messages. Create src/lib/upstash-adapter.ts:
```ts
import { Redis } from "@upstash/redis";
import type { IStorageAdapter, Session, Message, HealthStatus } from "@reaatech/session-continuity";

export class UpstashStorageAdapter implements IStorageAdapter {
  constructor(private redis: Redis) {}

  async createSession(
    session: Omit<Session, "id" | "createdAt" | "lastActivityAt">
  ): Promise<Session> {
    const id = crypto.randomUUID();
    const nowDate = new Date();
    const …
```
Expected output: UpstashStorageAdapter implements all 12 methods of IStorageAdapter. Sessions are stored as Redis hashes (session:{id}), messages are stored in sorted sets for ordering (session:{id}:messages pointing to msg:{sessionId}:{msgId} hashes). The adapter handles optimistic concurrency via version numbers.
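The key layout and the version check can be illustrated in memory, without Redis. This is a sketch of the general optimistic-concurrency technique, not the adapter's code: VersionedSessionStore, UpdateResult, and the helper names are hypothetical, and a Map stands in for Redis hashes.

```typescript
interface VersionedSession {
  id: string;
  version: number;
  fields: Record<string, string>;
}

type UpdateResult =
  | { ok: true; session: VersionedSession }
  | { ok: false; reason: "missing" | "conflict" };

// Key helpers following the naming scheme described above.
const sessionKey = (id: string): string => `session:${id}`;
const messageIndexKey = (id: string): string => `session:${id}:messages`;
const messageKey = (sessionId: string, msgId: string): string => `msg:${sessionId}:${msgId}`;

class VersionedSessionStore {
  private readonly rows = new Map<string, VersionedSession>();

  // New sessions start at version 1.
  create(id: string, fields: Record<string, string>): VersionedSession {
    const session: VersionedSession = { id, version: 1, fields: { ...fields } };
    this.rows.set(sessionKey(id), session);
    return session;
  }

  // Apply changes only if the caller saw the latest version; bump it on success.
  update(id: string, expectedVersion: number, changes: Record<string, string>): UpdateResult {
    const current = this.rows.get(sessionKey(id));
    if (current === undefined) return { ok: false, reason: "missing" };
    if (current.version !== expectedVersion) return { ok: false, reason: "conflict" };
    const next: VersionedSession = {
      id,
      version: current.version + 1,
      fields: { ...current.fields, ...changes },
    };
    this.rows.set(sessionKey(id), next);
    return { ok: true, session: next };
  }
}
```

A writer holding a stale version gets a conflict and must re-read before retrying. Against a real Redis instance the compare and the write would need to happen atomically, for example inside a server-side script; the in-memory version gets that for free because it runs on a single thread.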
Step 13: Wire up session continuity
The @reaatech/session-continuity package manages conversation context. Create src/lib/session.ts:
```ts
import {
  SessionManager,
  type IStorageAdapter,
  type Session,
  type Message,
  type TokenCounter,
  TokenBudgetExceededError,
  SessionNotFoundError,
  type MessageMetadata,
} from "@reaatech/session-continuity";

export { SessionManager } from "@reaatech/session-continuity";

export class SimpleTokenCounter implements TokenCounter {
  readonly model = "simple";
  readonly tokenizer = "simple-char-estimate";

  count(text: string): number {
    return Math.ceil(text.length / 4);
  }

  countMessages(messages: Message[]): number {
    let total = 0;
    for (const msg of messages) {
      if (typeof msg.content === "string") {
        total += this.count(msg.content);
      }
    }
    return total;
  }
}

export function createSessionManager(
  adapter: IStorageAdapter,
  tokenBudget: number
): SessionManager {
  return new SessionManager({
    storage: adapter,
    tokenCounter: new SimpleTokenCounter(),
    tokenBudget: {
      maxTokens: tokenBudget,
      reserveTokens: 500,
      overflowStrategy: "compress",
    },
    compression: {
      strategy: "sliding_window",
      targetTokens: Math.floor(tokenBudget * 0.85),
      minMessages: 4,
    },
  });
}

export async function getOrCreateSession(
  manager: SessionManager,
  id?: string
): Promise<{ session: Session; isNew: boolean }> {
  if (id) {
    try {
      const session = await manager.getSession(id);
      return { session, isNew: false };
    } catch (error) {
      if (!(error instanceof SessionNotFoundError)) {
        throw error;
      }
    }
  }
  const session = await manager.createSession();
  return { session, isNew: true };
}

export async function appendConversationTurn(
  manager: SessionManager,
  sessionId: string,
  userQuery: string,
  assistantResponse: string,
  metadata?: Record<string, unknown>
): Promise<void> {
  const assembleMessageOptions = (
    content: string,
    meta?: Record<string, unknown>
  ): { metadata?: MessageMetadata } => {
    if (!meta) return {};
    return { metadata: { annotations: meta } };
  };
  try {
    await manager.addMessage(sessionId, {
      role: "user",
      content: userQuery,
    });
    await manager.addMessage(sessionId, {
      role: "assistant",
      content: assistantResponse,
      ...assembleMessageOptions(assistantResponse, metadata),
    });
  } catch (error) {
    if (error instanceof TokenBudgetExceededError) {
      await manager.compressContext(sessionId, "sliding_window");
      await manager.addMessage(sessionId, {
        role: "user",
        content: userQuery,
      });
      await manager.addMessage(sessionId, {
        role: "assistant",
        content: assistantResponse,
        ...assembleMessageOptions(assistantResponse, metadata),
      });
    } else {
      throw error;
    }
  }
}

export async function getSessionMessages(
  manager: SessionManager,
  sessionId: string
): Promise<Message[]> {
  return manager.getConversationContext(sessionId);
}
```
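Expected output: SimpleTokenCounter provides a rough character-based token estimate (one token per four characters, rounded up). createSessionManager() configures the token budget it is given (8000 tokens in this recipe) with sliding-window compression when the budget is exceeded. appendConversationTurn() handles TokenBudgetExceededError by compressing the context and retrying once. Note that the retry re-sends both messages, so if the user message was stored before the error was thrown on the assistant message, it may be stored twice.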
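The interaction between the character-based estimate and sliding-window compression can be sketched in a few lines. This illustrates the general technique only; how @reaatech/session-continuity actually selects messages is not shown in this tutorial, so slidingWindow, SketchMessage, and the exact cut-off rule are assumptions.

```typescript
interface SketchMessage {
  role: "user" | "assistant";
  content: string;
}

// Same rough estimate as SimpleTokenCounter: four characters per token, rounded up.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep the newest messages that fit within targetTokens,
// but never fewer than minMessages (assumed rule).
function slidingWindow(
  messages: SketchMessage[],
  targetTokens: number,
  minMessages: number
): SketchMessage[] {
  const kept: SketchMessage[] = [];
  let total = 0;
  // Walk from newest to oldest.
  for (const msg of [...messages].reverse()) {
    const cost = estimateTokens(msg.content);
    if (kept.length >= minMessages && total + cost > targetTokens) break;
    kept.unshift(msg); // restore chronological order
    total += cost;
  }
  return kept;
}
```

With five 40-character messages (10 estimated tokens each), a 25-token target and a minimum of one message keeps the last two. Raising the minimum to four keeps the last four even though they exceed the target.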
Step 14: Build the orchestration service
This is the core pipeline — the executeCode() function that ties every module together. Create src/services/code-execution.ts:
```ts
import type { CodeExecutionRequest, CodeExecutionResponse } from "../lib/types.js";
import { loadAppConfig } from "../lib/config.js";
import { createRouter, classifyQuery } from "../lib/classifier.js";
import { createBudgetController, checkBudget, recordSpend } from "../lib/budget.js";
import { validateCodeInput, createAuditLogger } from "../lib/firewall.js";
import { repairCodeOutput } from "../lib/repair.js";
import { createCostSpan, initLangfuse } from "../lib/telemetry.js";
import { getOrCreateSession, appendConversationTurn, createSessionManager } from "../lib/session.js";
import { UpstashStorageAdapter } from "../lib/upstash-adapter.js";
import { withSandbox, executePythonCode } from "../lib/sandbox.js";
```
Expected output: The pipeline runs in sequence: load config, classify the query, create or resume a session, check budget, generate Python code via OpenRouter, repair malformed LLM output, validate code against the firewall, execute in the E2B sandbox, create a cost telemetry span, record spend, append the conversation turn, and return the result. If any step throws, the catch block logs via createAuditLogger() and returns a structured error response with the session ID preserved.
Step 15: Create the API route handlers
The POST /api/code route receives the user query and delegates to executeCode(). Create app/api/code/route.ts:
```ts
import { type NextRequest, NextResponse } from "next/server";
import { executeCode } from "@/src/services/code-execution.js";

export async function POST(req: NextRequest): Promise<NextResponse> {
  try {
    const body = await req.json() as { query?: string; sessionId?: string; tenant?: string };
    if (typeof body.query !== "string") {
      return NextResponse.json(
        { error: "Invalid request body. 'query' is required." },
        { status: 400 }
      );
    }
    const result = await executeCode({
      query: body.query,
      sessionId: body.sessionId,
      tenant: body.tenant,
    });
    const status = result.status === "error" ? 500 : 200;
    return NextResponse.json(result, { status });
  } catch (error) {
    return NextResponse.json(
      { error: String(error) },
      { status: 400 }
    );
  }
}
```
Create app/api/health/route.ts for the health check:
```ts
import { NextResponse } from "next/server";

export function GET(): NextResponse {
  return NextResponse.json({
    status: "ok",
    timestamp: new Date().toISOString(),
  });
}
```
Expected output: POST /api/code accepts { query, sessionId?, tenant? } in the request body, validates that query is a string, delegates to the orchestration service, and returns the result with the appropriate HTTP status code. GET /api/health returns { status: "ok", timestamp }.
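The route above casts the parsed body and checks only that query is a string, so sessionId and tenant pass through unchecked. If you want stricter input handling, a small parser along these lines is one option. It is a hypothetical helper, not part of the recipe, and rejecting an empty query is an added assumption.

```typescript
interface CodeRequest {
  query: string;
  sessionId?: string;
  tenant?: string;
}

// Return a validated request, or null when the body has the wrong shape.
function parseCodeRequest(body: unknown): CodeRequest | null {
  if (typeof body !== "object" || body === null) return null;
  const record = body as Record<string, unknown>;

  const query = record["query"];
  if (typeof query !== "string" || query.trim() === "") return null;

  const sessionId = record["sessionId"];
  const tenant = record["tenant"];
  // Optional fields must be strings when present.
  if (sessionId !== undefined && typeof sessionId !== "string") return null;
  if (tenant !== undefined && typeof tenant !== "string") return null;

  const parsed: CodeRequest = { query };
  if (typeof sessionId === "string") parsed.sessionId = sessionId;
  if (typeof tenant === "string") parsed.tenant = tenant;
  return parsed;
}
```

In the handler you would call parseCodeRequest(await req.json()) and return the existing 400 response when it yields null.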
Step 16: Create the home page and entry exports
Replace the scaffold placeholder in app/page.tsx with the recipe’s home page:
```tsx
export default function Home() {
  return (
    <div style={{ maxWidth: 640, margin: "80px auto", padding: "0 24px", fontFamily: "system-ui, sans-serif" }}>
      <h1>OpenRouter Code Sandbox</h1>
      <p style={{ fontSize: 18, lineHeight: 1.6 }}>
        A secure, budget‑aware AI code sandbox for SMB financial modeling. Submit a query
        to <code>POST /api/code</code> to generate and execute Python financial models
        via OpenRouter + E2B.
      </p>
      <h2>API Endpoints</h2>
      <ul>
        <li><code>POST /api/code</code> &mdash; body: <code>{"{ query, sessionId?, tenant? }"}</code></li>
        <li><code>GET /api/health</code> &mdash; health check</li>
      </ul>
      <p>See README.md for full documentation.</p>
    </div>
  );
}
```
Set up the programmatic entry point in src/index.ts:
```ts
export { executeCode } from "./services/code-execution.js";
```
Expected output: The home page at / shows the API documentation. The package’s programmatic entry exports executeCode for external consumers.
Step 17: Write the tests
The recipe includes unit tests for every module and integration tests for the route handlers. Create them under tests/ mirroring the src/ structure.
Start with the test setup file at tests/setup.ts to initialize MSW (Mock Service Worker) for HTTP interception:
```ts
import { setupServer } from "msw/node";
import { handlers } from "./mocks/handlers.js";
import { afterAll, afterEach, beforeAll } from "vitest";

const server = setupServer(...handlers);

beforeAll(() => {
  server.listen({ onUnhandledRequest: "error" });
});

afterEach(() => {
  server.resetHandlers();
});

afterAll(() => {
  server.close();
});
```
Create MSW request handlers in tests/mocks/handlers.ts:
Expected output: All tests pass. The LLM, budget, and firewall module tests mock their external dependencies. The orchestration service test mocks all 11 injected modules and tests 7 paths — happy, clarify, fallback, budget exceeded, LLM error, sandbox error, and firewall blocked. The route handler tests invoke route functions directly — no HTTP server needed.
Step 18: Run the verification gates
Run type checking, linting, and tests with coverage:
pnpm test shows numFailedTests: 0 and all coverage thresholds at or above 90%
node .../preflight.js exits 0 and prints { "ok": true, ... }
Next steps
Add a frontend UI: Replace the minimal home page with a chat-style interface that lets users type financial queries and see the generated code, execution output, and cost in real time.
Extend the budget engine: Implement tenant-level budget persistence by wiring SpendStore to your database instead of using the in-memory store.
Add a rate limiter: Protect the /api/code endpoint with rate limiting per tenant using Upstash.
Add streamed execution: Replace the synchronous POST /api/code with a streaming response that shows code as it’s being generated.