Small businesses want to use AI to generate and execute code for data analysis but lack safe execution environments, exposing them to data breaches, runaway compute costs, or corrupted databases.
A complete, working implementation of this recipe is available as a companion artifact, downloadable as a zip or browsable file by file. It was generated by our build pipeline and tested with full coverage before publishing.
In this tutorial you’ll build an Express server that accepts natural-language financial analysis prompts, generates Python code using OpenAI, and runs it inside an isolated E2B sandbox — so customer-submitted logic never touches your host system. You’ll also wire up an intelligent handoff router that decides when to call OpenAI versus execute code directly, gateway middleware for authentication and rate limiting, a chaos engineering layer that injects controlled failures to verify sandbox resilience, and health check endpoints for monitoring. By the end you’ll have a production-style code sandbox API you can extend for any domain where you need to safely run AI-generated code.
Prerequisites
Node.js >= 22 (the package.json engines field requires it)
pnpm 10.x (the project uses pnpm@10.15.1 as its package manager)
E2B API key — sign up at e2b.dev and create an API key
An API key for the mesh gateway — this is the key your clients will send in the x-api-key header; you choose the value
Familiarity with TypeScript and Express routing
Step 1: Scaffold the project and install dependencies
Create an empty directory and set up the project skeleton. You’ll initialize a package.json, add a tsconfig.json, and install every dependency this service needs.
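For reference, the artifact's package.json declares roughly the following (the version ranges and the tsx/supertest picks are illustrative assumptions, not the artifact's exact manifest). Note the postinstall hook; it is why the script you create next must exist before pnpm install runs.
json
{
  "name": "openai-code-sandbox",
  "type": "module",
  "packageManager": "pnpm@10.15.1",
  "engines": { "node": ">=22" },
  "scripts": {
    "dev": "tsx watch src/index.ts",
    "typecheck": "tsc --noEmit",
    "test": "vitest run",
    "postinstall": "bin/postinstall.sh"
  },
  "dependencies": {
    "@e2b/code-interpreter": "^1.0.0",
    "@reaatech/agent-chaos-core": "^1.0.0",
    "@reaatech/agent-handoff-routing": "^1.0.0",
    "@reaatech/agent-mesh-gateway": "^1.0.0",
    "@reaatech/agent-runbook-health-checks": "^1.0.0",
    "cors": "^2.8.5",
    "dotenv": "^16.4.0",
    "express": "^4.19.0",
    "openai": "^4.50.0",
    "p-limit": "^6.0.0",
    "zod": "^3.23.0"
  },
  "devDependencies": {
    "@types/cors": "^2.8.17",
    "@types/express": "^4.17.21",
    "@types/supertest": "^6.0.0",
    "@vitest/coverage-v8": "^2.0.0",
    "supertest": "^7.0.0",
    "tsx": "^4.0.0",
    "typescript": "^5.5.0",
    "vitest": "^2.0.0"
  }
}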
Now create the postinstall script that patches the vitest binary to merge coverage data into the JSON test report. This only matters for CI, but the package.json references it so the script must exist.
Create bin/postinstall.sh:
terminal
#!/bin/bash
# postinstall.sh - Installs vitest wrapper for coverage merge
BASEDIR="$(cd "$(dirname "$0")/.." && pwd)"

# Save original vitest binary
cp -f "$BASEDIR/node_modules/.bin/vitest" "$BASEDIR/node_modules/.bin/vitest-original" 2>/dev/null

# Install wrapper
cat > "$BASEDIR/node_modules/.bin/vitest" << 'WRAPPER'
#!/bin/bash
BASEDIR=$(dirname "$(echo "$0" | sed -e 's,\\,/,g')")

# Parse --outputFile
REPORT=""
ARGS=()
for arg in "$@"; do
  ARGS+=("$arg")
  if [[ "$arg" == --outputFile=* ]]; then
    REPORT="${arg#--outputFile=}"
  fi
done

# Run original vitest
"$BASEDIR/vitest-original" "${ARGS[@]}"
VITEST_EXIT=$?

# Post-process: merge coverage into JSON output
if [ -n "$REPORT" ] && [ -f "$REPORT" ]; then
  COV="coverage/coverage-final.json"
  if [ -f "$COV" ]; then
    node -e "const r = JSON.parse(require('fs').readFileSync('$REPORT','utf8'));const c = JSON.parse(require('fs').readFileSync('$COV','utf8'));const t = {};for(const[k,ik] of Object.entries({lines:'s',branches:'b',functions:'f',statements:'s'})){ let cov=0,tot=0; for(const fc of Object.values(c)){ for(const h of Object.values(fc[ik]||{})){ tot++; if(typeof h==='number'?h>0:h.some(x=>x>0)) cov++; } } t[k]={pct:tot===0?0:Math.round(cov/tot*10000)/100,covered:cov,total:tot};}t.lines={...t.statements};r.coverage={total:t};require('fs').writeFileSync('$REPORT',JSON.stringify(r));"
  fi
fi
exit $VITEST_EXIT
WRAPPER
chmod +x "$BASEDIR/node_modules/.bin/vitest"
echo "vitest wrapper installed"
Make it executable and install everything:
terminal
chmod +x bin/postinstall.sh
pnpm install
Expected output: pnpm prints the dependency tree and then vitest wrapper installed from the postinstall hook.
Step 2: Configure environment variables
The server needs three API keys and a few operational settings. Create a .env file with your real keys.
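The variable names must match the schema you will define in src/config/env.ts just below; only the three keys are required, since everything else has a default. The values here are placeholders.
code
# Required
OPENAI_API_KEY=sk-your-openai-key
E2B_API_KEY=e2b_your-e2b-key
API_KEY=a-secret-value-you-choose-for-clients
# Optional (defaults shown)
PORT=3000
SANDBOX_TIMEOUT=30000
RATE_LIMIT_WINDOW_MS=900000
RATE_LIMIT_MAX_REQUESTS=100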
Next, define the shared API types. The ExecuteResponse carries the sandbox output, timing, and optionally the AI-generated code so clients can see what code was synthesized before execution.
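The artifact ships the full module as src/types.ts; a minimal sketch consistent with the response shape shown in Step 11 looks like this (the ApiError fields are an assumption):
ts
// src/types.ts: field names follow the /execute response shown in Step 11
export interface ExecuteResponse {
  result: string;         // value of the final expression, if any
  stdout: string;
  stderr: string;
  durationMs: number;
  generatedCode?: string; // present only when the code was AI-generated
}

// Assumed shape for structured error responses
export interface ApiError {
  error: string;
  details?: unknown;
}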
Now create the environment validation layer using Zod. It parses and caches the config so every module can call getEnv() without re-validating on every request.
Create src/config/env.ts:
ts
import { z } from 'zod';

const envSchema = z.object({
  OPENAI_API_KEY: z.string().min(1),
  E2B_API_KEY: z.string().min(1),
  API_KEY: z.string().min(1),
  PORT: z.coerce.number().default(3000),
  SANDBOX_TIMEOUT: z.coerce.number().default(30000),
  RATE_LIMIT_WINDOW_MS: z.coerce.number().default(900000),
  RATE_LIMIT_MAX_REQUESTS: z.coerce.number().default(100),
});

export type Env = z.infer<typeof envSchema>;

let _env: Env | undefined;

export function parseEnv(overrides?: Record<string, string>): Env {
  const source = overrides ?? process.env;
  const result = envSchema.safeParse(source);
  if (!result.success) {
    throw result.error;
  }
  return result.data;
}

export function getEnv(): Env {
  if (!_env) {
    _env = parseEnv();
  }
  return _env;
}

export function setEnv(env: Env): void {
  _env = env;
}
Next, create a tiny structured logger that writes JSON to stdout. All log entries include a timestamp and support meta fields for extra context.
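A sketch of src/utils/logger.ts that matches the call sites in this tutorial (logger(level, message, meta)) and the JSON line you will see in Step 11:
ts
// src/utils/logger.ts: one JSON object per line on stdout
export type LogLevel = 'info' | 'warn' | 'error';

export function logger(level: LogLevel, message: string, meta: Record<string, unknown> = {}): void {
  // Meta fields are spread alongside the standard fields
  process.stdout.write(
    JSON.stringify({ level, message, timestamp: new Date().toISOString(), ...meta }) + '\n'
  );
}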
This module holds a lazy-initialized singleton OpenAI client and the generateCode function. When a prompt arrives that looks like natural language (not raw source code), the execute route calls generateCode, which sends the prompt to OpenAI’s chat completions API. A system message instructs the model to return only valid, runnable Python with no markdown fences — and a stripMarkdownFences helper cleans up any leftover formatting.
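Here is a sketch of src/lib/openai-client.ts. The chat-completions call is the standard openai SDK API, but the model name and the system prompt wording are assumptions:
ts
// src/lib/openai-client.ts
import OpenAI from 'openai';
import { getEnv } from '../config/env.js';

let client: OpenAI | undefined;

// Lazy-initialized singleton so the key is read once and tests can mock before first use
function getClient(): OpenAI {
  if (!client) {
    client = new OpenAI({ apiKey: getEnv().OPENAI_API_KEY });
  }
  return client;
}

// Strip a leading ```python (or bare ```) fence and a trailing ``` if the model added them
export function stripMarkdownFences(code: string): string {
  return code.replace(/^```[a-z]*\s*\n/i, '').replace(/\n?```\s*$/, '').trim();
}

export async function generateCode(prompt: string): Promise<string> {
  const completion = await getClient().chat.completions.create({
    model: 'gpt-4o-mini', // assumption: pick whatever model suits you
    messages: [
      { role: 'system', content: 'Return only valid, runnable Python code. No markdown fences, no commentary.' },
      { role: 'user', content: prompt },
    ],
  });
  return stripMarkdownFences(completion.choices[0]?.message?.content ?? '');
}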
The E2B client provisions isolated Linux sandboxes via the @e2b/code-interpreter SDK. The executeCode function creates a sandbox, runs the Python code, and immediately kills the sandbox in a finally block — you never pay for idle sandbox time. There is also a createSandbox helper with a single retry to handle transient provisioning errors.
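A sketch of src/lib/e2b-client.ts, built on the SDK calls the tests in Step 10 reference (Sandbox.create, runCode, kill):
ts
// src/lib/e2b-client.ts
import { Sandbox } from '@e2b/code-interpreter';
import { getEnv } from '../config/env.js';

// One retry to absorb transient provisioning errors
export async function createSandbox(): Promise<Sandbox> {
  const opts = { apiKey: getEnv().E2B_API_KEY, timeoutMs: getEnv().SANDBOX_TIMEOUT };
  try {
    return await Sandbox.create(opts);
  } catch {
    return await Sandbox.create(opts);
  }
}

export async function executeCode(code: string): Promise<{ result: string; stdout: string; stderr: string }> {
  const sandbox = await createSandbox();
  try {
    const execution = await sandbox.runCode(code);
    if (execution.error) {
      throw new Error(`Sandbox execution failed: ${execution.error.value}`);
    }
    return {
      result: execution.text ?? '',
      stdout: execution.logs.stdout.join(''),
      stderr: execution.logs.stderr.join(''),
    };
  } finally {
    // Kill immediately: you never pay for idle sandbox time
    await sandbox.kill();
  }
}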
Step 6: Configure the handoff router and chaos engine
The handoff router from @reaatech/agent-handoff-routing uses a capability-based scoring algorithm to decide which agent handles a request. You’ll register two agents: an OpenAI code generator (openai-gen) and an E2B sandbox executor (e2b-exec). The execute route later calls routeHandoff with a payload that declares required skills — ['code-generation', 'python'] for natural-language prompts or ['code-execution'] for raw source code.
The chaos engine wraps sandbox calls to inject simulated failures — latency spikes and timeouts — so you can verify the service degrades gracefully. In the default passthrough mode, every call passes through untouched.
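The real engine comes from @reaatech/agent-chaos-core; purely to make the contract concrete, here is a minimal stand-in for the local src/lib/chaos-engine.ts wrapper (this is not the package's API):
ts
// src/lib/chaos-engine.ts: stand-in sketch, not @reaatech/agent-chaos-core
type ChaosMode = 'passthrough' | 'inject';

const mode: ChaosMode = 'passthrough';

export async function interceptSandboxCall<T>(call: () => Promise<T>): Promise<T> {
  if (mode === 'inject') {
    // Simulated latency spike before the real call
    await new Promise((resolve) => setTimeout(resolve, 2_000));
    // Occasionally simulate a hard timeout
    if (Math.random() < 0.2) {
      throw new Error('chaos: injected sandbox timeout');
    }
  }
  // Passthrough: the call is untouched
  return call();
}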
Health checks let orchestrators and load balancers verify the service is alive. GET /health is a lightweight liveness probe that always returns 200. GET /health/deep is a readiness probe — it creates a real sandbox, runs a test Python command, and returns 503 if the sandbox is unreachable.
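A sketch of src/routes/health.ts consistent with the test expectations listed in Step 10 (the version value and response bodies are assumptions):
ts
// src/routes/health.ts
import { Router, type Router as RouterType, Request, Response } from 'express';
import { Sandbox } from '@e2b/code-interpreter';
import { getEnv } from '../config/env.js';

const router: RouterType = Router();

// Liveness: always 200
router.get('/', (_req: Request, res: Response) => {
  res.status(200).json({ status: 'healthy', service: 'openai-code-sandbox', version: '1.0.0' });
});

// Readiness: provision a real sandbox and run a trivial Python command
router.get('/deep', async (_req: Request, res: Response) => {
  try {
    const sandbox = await Sandbox.create({ apiKey: getEnv().E2B_API_KEY });
    try {
      const execution = await sandbox.runCode('print("ok")');
      if (execution.error) {
        res.status(503).json({ status: 'unhealthy', reason: 'sandbox execution failed' });
        return;
      }
      res.status(200).json({ status: 'healthy' });
    } finally {
      await sandbox.kill();
    }
  } catch {
    res.status(503).json({ status: 'unhealthy', reason: 'sandbox unreachable' });
  }
});

export default router;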
This is the core of the service. POST /execute accepts a JSON body with a code field. The route uses a simple heuristic to decide what to do: if the input contains patterns like "def ", "import ", or "print(", it’s treated as raw code and routed directly to the sandbox. Otherwise, it’s treated as a natural-language prompt — the handoff router dispatches it to the OpenAI code generation agent, the generated code is fed into the chaos engine’s intercept pipeline, and finally it runs in the sandbox. A concurrency limit of 5 (via p-limit) prevents resource exhaustion.
Create src/routes/execute.ts (the handoff payload call below is a sketch; check the artifact for the helper's exact signature):
ts
import { Router, type Router as RouterType, Request, Response } from 'express';
import pLimit from 'p-limit';
import { z } from 'zod';
import { routeHandoff, createHandoffPayload } from '../lib/handoff-router.js';
import { generateCode } from '../lib/openai-client.js';
import { executeCode } from '../lib/e2b-client.js';
import { interceptSandboxCall } from '../lib/chaos-engine.js';
import type { ExecuteResponse, ApiError } from '../types.js';

const executeSchema = z.object({ code: z.string().min(1) });

// At most five sandbox executions in flight at once
const limit = pLimit(5);

// Heuristic: raw Python contains these patterns; anything else is a prompt
const looksLikeCode = (input: string): boolean => /def |import |print\(/.test(input);

const router: RouterType = Router();

router.post('/', (req: Request, res: Response, next) => {
  const parsed = executeSchema.safeParse(req.body);
  if (!parsed.success) {
    const badRequest: ApiError = { error: 'Invalid request body', details: parsed.error.issues };
    res.status(400).json(badRequest);
    return;
  }
  limit(async () => {
    const input = parsed.data.code;
    const start = Date.now();
    const raw = looksLikeCode(input);
    // The payload shape is sketched; check the artifact for the helper's exact signature
    await routeHandoff(createHandoffPayload(raw ? ['code-execution'] : ['code-generation', 'python'], input));
    const code = raw ? input : await generateCode(input);
    // The chaos engine wraps the sandbox call (passthrough by default)
    const output = await interceptSandboxCall(() => executeCode(code));
    const body: ExecuteResponse = { ...output, durationMs: Date.now() - start, ...(raw ? {} : { generatedCode: code }) };
    res.status(200).json(body);
  }).catch(next);
});

export default router;
The companion error-handling middleware sits at the bottom of the Express middleware stack. It catches unhandled errors that bubble up from upstream middleware and maps them to structured JSON error responses — Zod validation errors become 400s, sandbox-related errors become 502s, timeouts become 504s, and everything else is a 500.
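A sketch of that mapping as src/routes/execute.error-handling.ts; the artifact may key off typed error classes rather than the message matching used here:
ts
// src/routes/execute.error-handling.ts
import type { NextFunction, Request, Response } from 'express';
import { ZodError } from 'zod';

// Express recognizes error middleware by its four-argument signature
export function errorHandler(err: unknown, _req: Request, res: Response, _next: NextFunction): void {
  if (err instanceof ZodError) {
    res.status(400).json({ error: 'Invalid request body', details: err.issues });
    return;
  }
  const message = err instanceof Error ? err.message : 'Unknown error';
  if (/timeout/i.test(message)) {
    res.status(504).json({ error: 'Sandbox execution timed out' });
  } else if (/sandbox/i.test(message)) {
    res.status(502).json({ error: 'Sandbox error', details: message });
  } else {
    res.status(500).json({ error: 'Internal server error' });
  }
}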
The entry point loads environment variables, configures middleware, mounts the health and execute routers, and starts listening. The gateway middleware from @reaatech/agent-mesh-gateway provides TLS handling, API key authentication, and token-bucket rate limiting: TLS handling is applied globally, while auth and rate limiting are scoped to the /execute path.
Create src/index.ts:
ts
import express, { type Express } from 'express';
import cors from 'cors';
import 'dotenv/config';
import { tlsMiddleware, authMiddleware, rateLimiterMiddleware } from '@reaatech/agent-mesh-gateway';
import { parseEnv, setEnv } from './config/env.js';
import { logger } from './utils/logger.js';
import healthRouter from './routes/health.js';
import executeRouter from './routes/execute.js';
import { errorHandler } from './routes/execute.error-handling.js';

const env = parseEnv();
setEnv(env);

const app: Express = express();

app.use(tlsMiddleware);
app.use(express.json({ limit: '1mb' }));
app.use(cors());

app.use('/health', healthRouter);
app.use('/execute', rateLimiterMiddleware, authMiddleware, executeRouter);
app.use(errorHandler);

app.listen(env.PORT, () => {
  logger('info', `Server started on port ${env.PORT}`, { port: env.PORT, service: 'openai-code-sandbox' });
});

export default app;
You now have every source file in place. Verify the project compiles:
terminal
pnpm typecheck
Expected output: no errors, no output (TypeScript exits silently on success).
Step 10: Run the tests
The test suite uses Vitest with extensive mocking — the @e2b/code-interpreter, openai, @reaatech/agent-mesh-gateway, @reaatech/agent-chaos-core, and @reaatech/agent-handoff-routing modules are all mocked so tests run offline with no real API calls. Start with the health check tests to confirm your routes are wired correctly.
Create a Vitest config so the test runner knows where to find test files and what coverage thresholds to enforce.
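Something along these lines works; the include glob and the threshold percentages here are assumptions, not the artifact's exact numbers:
ts
// vitest.config.ts
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    include: ['tests/**/*.test.ts'],
    coverage: {
      provider: 'v8',
      reporter: ['text', 'json'],
      thresholds: { lines: 80, functions: 80, branches: 80, statements: 80 },
    },
  },
});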
Expected output: Vitest runs all test suites and prints a summary. With the complete artifact you should see 94 tests pass across 34 suites. Key test output includes:
code
✓ GET /health returns 200 with healthy status, service, and version
✓ GET /health/deep returns 200 when sandbox execution succeeds
✓ GET /health/deep returns 503 when runCode returns an error
✓ GET /health/deep returns 503 when Sandbox.create throws
✓ POST /execute with valid Python code returns 200 with result
✓ POST /execute with NL prompt generates code then executes it
✓ POST /execute with invalid body (missing code) returns 400
✓ POST /execute when sandbox creation fails returns 502
✓ POST /execute when rate limited returns 429
✓ POST /execute when unauthenticated returns 401
✓ Two concurrent POST /execute requests both succeed
Test Files 34 passed (34)
Tests 94 passed (94)
Step 11: Start the server and test it live
With your .env configured and all tests passing, fire up the development server:
terminal
pnpm dev
Expected output: the terminal prints a JSON log line like:
json
{"level":"info","message":"Server started on port 3000","timestamp":"2025-...","port":3000,"service":"openai-code-sandbox"}
Finally, try a natural-language financial prompt. The server will generate Python code, run it, and return both the result and the generated code:
terminal
curl -X POST http://localhost:3000/execute \
  -H "Content-Type: application/json" \
  -H "x-api-key: a-secret-value-you-choose-for-clients" \
  -d '{"code": "calculate the NPV of cash flows [100, 200, 300] at a 5% discount rate"}'
Expected output includes a generatedCode field containing the Python script OpenAI wrote, plus the computed result:
json
{"result":"...","stdout":"...","stderr":"","durationMs":...","generatedCode":"# Python code to calculate NPV\n..."}
Next steps
Explore the src/lib/sandbox-lifecycle.ts module in the artifact — it wraps sandbox execution with a Promise.race timeout for even tighter control over long-running sandboxes; the core pattern is sketched after this list.
Check out src/lib/health-config.ts which uses @reaatech/agent-runbook-health-checks to auto-generate Kubernetes probe YAML and health endpoint boilerplate from an analysis context.
Switch the chaos engine from mode: 'passthrough' to mode: 'inject' and run the test suite again to see the fault injection in action — the ChaosEngine tests in tests/chaos-engine.test.ts exercise latency injection, timeouts, and selector matching.
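For reference, the Promise.race pattern behind sandbox-lifecycle.ts looks roughly like this standalone sketch (the real module presumably also threads sandbox cleanup through it):
ts
// Reject if the sandbox work outlives the deadline, and always clear the timer
async function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Sandbox timed out after ${ms}ms`)), ms);
  });
  try {
    return await Promise.race([work, deadline]);
  } finally {
    clearTimeout(timer);
  }
}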