Databricks Code Sandbox for Secure SMB Data Analysis

An AI agent that translates natural language into safe SQL and Python queries, runs them on Databricks, and returns results with cost tracking and guardrails.

databricks code-sandbox smb-data-analysis nextjs express openai e2b rag cost-tracking security-guardrails

The problem

Small businesses with data in Databricks need ad‑hoc reports and analyses, but hiring a data engineer for every query isn’t feasible. Non‑technical staff often write inefficient or unsafe code, risking runaway costs.

Built from

Intro

This tutorial walks you through building the Databricks Code Sandbox, a Next.js application that lets users ask analytical questions in plain English and get safe SQL or Python results from their Databricks warehouse. An AI agent classifies intent, generates code with an LLM, repairs malformed output, dry-runs the code in an E2B sandbox, enforces security policies through a firewall, tracks costs, and executes the approved query on Databricks, with session continuity for multi-turn conversations.
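To make the order of operations concrete, here is a minimal sketch of a staged pipeline in which each stage receives the state produced by the previous one. The stage names, the `PipelineState` shape, and the keyword checks are all hypothetical stand-ins for illustration; the REAA packages expose their own APIs, which this sketch does not reproduce.

```typescript
// Hypothetical types and stages: a sketch of the pipeline shape, not the REAA APIs.
type Intent = "sql" | "python";

interface PipelineState {
  query: string;
  intent?: Intent;
  code?: string;
  approved?: boolean;
  trace: string[]; // names of the stages that have run, in order
}

type Stage = (state: PipelineState) => Promise<PipelineState>;

// Run stages in order. A stage that throws stops the pipeline,
// so nothing after it (such as the Databricks call) is reached.
async function runPipeline(query: string, stages: Stage[]): Promise<PipelineState> {
  let state: PipelineState = { query, trace: [] };
  for (const stage of stages) {
    state = await stage(state);
  }
  return state;
}

// Stub classifier: a keyword check standing in for the real intent router.
const classify: Stage = async (s) => ({
  ...s,
  intent: /\b(plot|pandas|dataframe)\b/i.test(s.query) ? "python" : "sql",
  trace: [...s.trace, "classify"],
});

// Stub generator: returns fixed code instead of calling an LLM.
const generate: Stage = async (s) => ({
  ...s,
  code: s.intent === "sql" ? "SELECT 1" : "print(1)",
  trace: [...s.trace, "generate"],
});

// Stub firewall: rejects obviously destructive statements.
const firewall: Stage = async (s) => {
  if (/\b(drop|delete|truncate)\b/i.test(s.code ?? "")) {
    throw new Error("blocked by firewall");
  }
  return { ...s, approved: true, trace: [...s.trace, "firewall"] };
};
```

Calling `runPipeline("total sales by month", [classify, generate, firewall])` resolves to a state whose `trace` lists the three stages in order. Repair, dry-run, cost tracking, and execution would slot in as further stages under the same signature.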

You’ll use five REAA packages (confidence-router, structured-repair-core, tool-use-firewall-core, llm-cost-telemetry, session-continuity), the OpenAI SDK for code generation, the E2B sandbox for isolated dry-runs, and the Databricks SDK for query execution. The final app exposes three API routes (POST /api/analyze, POST+GET /api/session, GET /api/budget) and a single-page frontend.
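As a rough illustration of what one of these routes could look like, the sketch below shows a handler for GET /api/budget written against the standard `Response` API that App Router route handlers return. The `BudgetSnapshot` shape, the `getBudgetSnapshot` helper, and the dollar figures are hypothetical placeholders; the real route presumably reads its numbers from llm-cost-telemetry.

```typescript
// Sketch of a handler for app/api/budget/route.ts.
// Field names and figures are hypothetical placeholders.
interface BudgetSnapshot {
  limitUsd: number;
  spentUsd: number;
  remainingUsd: number;
}

// Derive the remaining budget, never reporting a negative value.
function getBudgetSnapshot(limitUsd: number, spentUsd: number): BudgetSnapshot {
  return { limitUsd, spentUsd, remainingUsd: Math.max(0, limitUsd - spentUsd) };
}

// In the actual route file, Next.js expects this function to be exported as GET.
async function GET(): Promise<Response> {
  const snapshot = getBudgetSnapshot(50, 12.5);
  return Response.json(snapshot);
}
```

Keeping the arithmetic in a plain function separate from the handler makes it easy to unit-test without starting a server.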

Prerequisites

Node.js >= 22 and pnpm >= 9
A Databricks workspace with a SQL warehouse — the DATABRICKS_HOST, DATABRICKS_TOKEN, and DATABRICKS_WAREHOUSE_ID env vars
An OpenAI API key (or compatible provider via custom baseURL)
An E2B API key for the sandbox environment
A Langfuse account (optional — the app runs without it)
Familiarity with TypeScript, Next.js App Router, and async/await patterns
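Since the app depends on several credentials, it can help to fail fast at startup when one is missing. The following is a minimal sketch of such a check. The Databricks variable names come from the list above and `OPENAI_API_KEY` from the generator code in this tutorial; `E2B_API_KEY`, `LANGFUSE_PUBLIC_KEY`, and `LANGFUSE_SECRET_KEY` are assumed names, and `loadConfig` is a hypothetical helper.

```typescript
// Assumption: E2B_API_KEY and the LANGFUSE_* names are not confirmed by the tutorial.
const REQUIRED_VARS = [
  "DATABRICKS_HOST",
  "DATABRICKS_TOKEN",
  "DATABRICKS_WAREHOUSE_ID",
  "OPENAI_API_KEY",
  "E2B_API_KEY",
] as const;

type RequiredVar = (typeof REQUIRED_VARS)[number];
type AppConfig = Record<RequiredVar, string> & { langfuseEnabled: boolean };

// Hypothetical helper: validates required variables and reports all missing ones at once.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const values = Object.fromEntries(
    REQUIRED_VARS.map((name) => [name, env[name] as string]),
  ) as Record<RequiredVar, string>;
  return {
    ...values,
    // Langfuse is optional, so it is enabled only when both keys are present.
    langfuseEnabled: Boolean(env.LANGFUSE_PUBLIC_KEY && env.LANGFUSE_SECRET_KEY),
  };
}
```

In the app this would be called once as `loadConfig(process.env)`. Taking the environment as a parameter keeps the function testable with plain objects.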

Step 1: Bootstrap the project and configure environment

Start by creating a Next.js project with the App Router:

terminal

npx create-next-app@latest databricks-code-sandbox \

Example artifact

A complete, working implementation of this recipe — downloadable as a zip or browsable file by file. Generated by our build pipeline; tested with full coverage before publishing.

169 kB · 103 tests · 96.5% coverage · vitest passing

SHA-256: 9e87fd39a6ab2bc1d6473c651444d8a884f49a126cb6818ee19a6b0f1a46d249

import OpenAI from "openai";
import type { CodeGenInput } from "../lib/types.js";

// Build an OpenAI client; pass baseURL to target a compatible provider.
export function createLlmClient(baseURL?: string): OpenAI {
  const apiKey = process.env.OPENAI_API_KEY;
  if (!apiKey) {
    throw new Error("OPENAI_API_KEY not set");
  }
  const config: { apiKey: string; baseURL?: string } = { apiKey };
  if (baseURL) {
    config.baseURL = baseURL;
  }
  return new OpenAI(config);
}

export function getModelName(): string {
  return process.env.OPENAI_MODEL ?? "gpt-5.2-mini";
}

// Error that carries the HTTP status and the pipeline step that failed.
export class LlmError extends Error {
  status: number;
  step: string;

  constructor(message: string, status: number, step: string) {
    super(message);
    this.name = "LlmError";
    this.status = status;
    this.step = step;
  }
}

export async function generateCode(
  input: CodeGenInput,
  baseURL?: string,
): Promise<{ code: string; inputTokens: number; outputTokens: number }> {
  // Reuse the shared factory rather than repeating the client setup.
  const client = createLlmClient(baseURL);
  const model = getModelName();
  const systemPrompt =
    input.intent === "sql"
      ? "You are a SQL expert. Return only valid SQL code without markdown fences or explanation."
      : "You are a Python expert. Return only valid Python code without markdown fences or explanation.";

  try {
    const completion = await client.chat.completions.create({
      model,
      messages: [
        { role: "system", content: systemPrompt },
        { role: "user", content: input.query },
      ],
      temperature: 0.1,
    });
    const code = completion.choices[0]?.message?.content ?? "";
    // Token counts feed the cost tracking; default to 0 if usage is absent.
    const inputTokens = completion.usage?.prompt_tokens ?? 0;
    const outputTokens = completion.usage?.completion_tokens ?? 0;
    return { code, inputTokens, outputTokens };
  } catch (err: unknown) {
    // Use the provider's HTTP status when it is a number; otherwise fall back to 500.
    const rawStatus =
      typeof err === "object" && err !== null && "status" in err
        ? (err as { status: unknown }).status
        : undefined;
    const status = typeof rawStatus === "number" ? rawStatus : 500;
    const message = err instanceof Error ? err.message : "OpenAI API error";
    throw new LlmError(message, status, "generate");
  }
}
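The system prompts above ask the model to return code without markdown fences, but a model may still wrap its answer in them. As a small illustration of the kind of cleanup a repair step might perform, here is a hypothetical `stripCodeFences` helper. It is an assumption made for this sketch and is not the structured-repair-core API.

```typescript
// Hypothetical helper, not part of structured-repair-core.
// The fence marker is built at runtime to keep this listing free of literal fences.
const FENCE = "`".repeat(3);

function stripCodeFences(raw: string): string {
  const trimmed = raw.trim();
  // Leave the text alone unless it is wrapped in fences on both ends.
  if (!trimmed.startsWith(FENCE) || !trimmed.endsWith(FENCE)) {
    return trimmed;
  }
  const lines = trimmed.split("\n");
  if (lines.length < 2) {
    // Single-line case: remove the marker from both ends.
    return trimmed.slice(FENCE.length, -FENCE.length).trim();
  }
  // Drop the opening fence line (which may carry a language tag) and the closing fence line.
  return lines.slice(1, -1).join("\n").trim();
}
```

A caller could apply this to the `code` field returned by `generateCode` before passing it to the sandbox dry-run. Unfenced input passes through unchanged apart from trimming.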

Intro

Prerequisites

Step 1: Bootstrap the project and configure environment

Step 2: Define shared types

Step 3: Build the intent classifier and LLM generator

Step 4: Create the code repair and sandbox services

Step 5: Build the firewall and Databricks services

Step 6: Implement cost telemetry, session management, and observability

Step 7: Wire the pipeline and API routes

Step 8: Build the frontend

Step 9: Write and run tests

Step 10: Type-check and lint

Next steps