Anthropic Code Sandbox for SMB Data Cleansing Pipelines

Safely run LLM-generated data transformation code in an isolated sandbox, with cost tracking and automatic output repair.

anthropic code-execution data-cleaning e2b express sandbox structured-output-repair cost-tracking idempotency

The problem

Small businesses often need to clean and transform CSV, JSON, or database exports but lack the infrastructure to safely execute LLM-generated code. Running it directly risks data corruption, runaway costs, or exposure of sensitive records.

Intro

This recipe builds an API service that accepts raw CSV, JSON, or SQL data, asks Claude to generate a data-cleaning transformation, and executes that code in an isolated E2B sandbox. Any malformed output is repaired automatically using structured repair before the cleaned result is returned. Each job tracks its own token costs, and idempotency support makes retries safe. The service is built on Next.js 16 (App Router) and integrates four REAA packages for repair, confidence-based routing, cost telemetry, and idempotency middleware.

Prerequisites

Node.js >= 22 and pnpm@10 installed
An Anthropic API key for Claude (set as ANTHROPIC_API_KEY)
An E2B API key for the code sandbox (set as E2B_API_KEY)
Basic familiarity with TypeScript, Next.js App Router, and Zod schemas

Step 1: Scaffold the project and configure environment

Start from the scaffold directory — it ships with Next.js 16, TypeScript, Vitest, ESLint, and all third-party dependencies already pinned in package.json. Verify the key files are in place:

terminal

ls package.json next.config.ts tsconfig.json vitest.config.ts .env.example

Open .env.example — it lists the four environment variables the service needs:
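The contents of .env.example are not reproduced in this text. As a minimal sketch, a startup check for the variables this recipe names elsewhere could look like the following. The helper name checkEnv is hypothetical, only ANTHROPIC_API_KEY and E2B_API_KEY are treated as required (ANTHROPIC_MODEL has a default in the recipe's code), and the fourth variable is not shown here, so it is left out.

```typescript
// Hypothetical helper: checks only the variables this recipe refers to by name.
// The real .env.example lists four variables; the fourth is not visible here.
type Env = Record<string, string | undefined>;

const REQUIRED_VARS = ["ANTHROPIC_API_KEY", "E2B_API_KEY"];

function checkEnv(env: Env): { ok: boolean; missing: string[] } {
  // A variable counts as missing when it is unset or blank.
  const missing = REQUIRED_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
  return { ok: missing.length === 0, missing };
}
```

In a Node.js process this would typically be called once at startup as checkEnv(process.env), failing fast when the missing list is not empty.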

Example artifact

A complete, working implementation of this recipe — downloadable as a zip or browsable file by file. Generated by our build pipeline; tested with full coverage before publishing.

161 kB · 89 tests · 100.0% coverage · vitest passing

SHA-256: 38a93e4d250eda476d59941d6fd86ec37e2233801520326c79361379407b487a

Step 2: Define the shared Zod schemas and error classes

import { z } from "zod";

export const JobRequestSchema = z.object({
  inputType: z.enum(["csv", "json", "sql"]),
  inputData: z.string().min(1),
  instructions: z.string().optional(),
  idempotencyKey: z.string().optional(),
});

export const JobResultSchema = z.object({
  id: z.string(),
  cleanedData: z.string(),
  originalFormat: z.string(),
  transformationsApplied: z.array(z.string()),
  executionTimeMs: z.number(),
  tokensUsed: z.object({
    input: z.number(),
    output: z.number(),
  }),
});

export const FormatClassificationSchema = z.object({
  predictions: z
    .array(
      z.object({
        label: z.string(),
        confidence: z.number().min(0).max(1),
      }),
    )
    .min(1),
});

export const CodeGenOutputSchema = z.object({
  code: z.string(),
  description: z.string(),
  expectedOutputShape: z.string(),
});

export const SandboxResultSchema = z.object({
  text: z.string(),
  error: z.string().nullable(),
});

export type JobRequest = z.infer<typeof JobRequestSchema>;
export type JobResult = z.infer<typeof JobResultSchema>;
export type FormatClassification = z.infer<typeof FormatClassificationSchema>;
export type CodeGenOutput = z.infer<typeof CodeGenOutputSchema>;
export type SandboxResult = z.infer<typeof SandboxResultSchema>;

export class FormatNotRecognizedError extends Error {
  readonly code = "FORMAT_NOT_RECOGNIZED";
  constructor(message: string) {
    super(message);
    this.name = "FormatNotRecognizedError";
  }
}

export class BudgetExceededError extends Error {
  readonly code = "BUDGET_EXCEEDED";
  constructor(message: string) {
    super(message);
    this.name = "BudgetExceededError";
  }
}

export class CodeExecutionError extends Error {
  readonly code = "CODE_EXECUTION_ERROR";
  constructor(message: string) {
    super(message);
    this.name = "CodeExecutionError";
  }
}

export class ValidationError extends Error {
  readonly code = "VALIDATION_ERROR";
  constructor(message: string) {
    super(message);
    this.name = "ValidationError";
  }
}
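Each error class in the schemas module carries a literal code field, so callers can branch on the code instead of chaining instanceof checks. The sketch below shows one way to map those codes to the HTTP statuses that the recipe's POST handler returns. The names CodedError, STATUS_BY_CODE, and statusFor are hypothetical, and the stand-in class exists only to keep the example self-contained.

```typescript
// Stand-in that mirrors the shape of the recipe's error classes.
class CodedError extends Error {
  readonly code: string;
  constructor(code: string, message: string) {
    super(message);
    this.code = code;
    this.name = "CodedError";
  }
}

// Status codes copied from the recipe's POST handler.
const STATUS_BY_CODE = new Map<string, number>([
  ["FORMAT_NOT_RECOGNIZED", 422],
  ["BUDGET_EXCEEDED", 402],
  ["CODE_EXECUTION_ERROR", 500],
  ["VALIDATION_ERROR", 422],
]);

// Returns the mapped status, or undefined for errors without a known code.
function statusFor(error: unknown): number | undefined {
  if (typeof error !== "object" || error === null) return undefined;
  const code = (error as { code?: unknown }).code;
  return typeof code === "string" ? STATUS_BY_CODE.get(code) : undefined;
}
```

An undefined result signals an unexpected error that should be rethrown, which matches the handler's final throw.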

Step 3: Build the format classifier

import Anthropic from "@anthropic-ai/sdk";
import { ConfidenceRouter } from "@reaatech/confidence-router";
import { repair, UnrepairableError } from "@reaatech/structured-repair-core";
import { FormatClassificationSchema, FormatNotRecognizedError } from "../lib/schemas.js";

export async function classifyInputFormat(
  data: string,
): Promise<{ label: string; predictions: Array<{ label: string; confidence: number }> }> {
  const client = new Anthropic({
    apiKey: process.env.ANTHROPIC_API_KEY,
  });

  const message = await client.messages.create({
    model: process.env.ANTHROPIC_MODEL ?? "claude-sonnet-4-6",
    max_tokens: 512,
    system:
      "You are a data format classifier. Classify the input data as one of: csv, json, sql, or unknown. Return only valid JSON with predictions array.",
    messages: [
      {
        role: "user",
        content: data,
      },
    ],
  });

  const block = message.content[0];
  if (block.type !== "text") {
    throw new FormatNotRecognizedError("No text block in response");
  }
  const text = block.text;

  let classification: { predictions: Array<{ label: string; confidence: number }> };
  try {
    classification = await repair(FormatClassificationSchema, text);
  } catch (error) {
    if (error instanceof UnrepairableError) {
      throw new FormatNotRecognizedError(`Failed to parse classification: ${error.message}`);
    }
    throw error;
  }

  const router = new ConfidenceRouter({
    routeThreshold: 0.7,
    fallbackThreshold: 0.3,
  });
  const decision = router.decide({
    predictions: classification.predictions,
  });

  if (decision.type === "ROUTE") {
    const target = decision.target;
    if (target === undefined) {
      throw new FormatNotRecognizedError("No target in routing decision");
    }
    return { label: target, predictions: classification.predictions };
  }

  if (decision.type === "CLARIFY") {
    return { label: "CLARIFY", predictions: classification.predictions };
  }

  throw new FormatNotRecognizedError("Format could not be determined");
}
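The classifier passes Claude's predictions to a ConfidenceRouter configured with a routeThreshold of 0.7 and a fallbackThreshold of 0.3. The sketch below illustrates one plausible reading of such a two-threshold scheme: route when the top confidence clears the upper threshold, ask for clarification in the middle band, and fall back otherwise. This is an assumption made for illustration, not the library's implementation; decideRoute, the FALLBACK decision name, and the return shape are hypothetical.

```typescript
interface Prediction {
  label: string;
  confidence: number;
}

type Decision =
  | { type: "ROUTE"; target: string }
  | { type: "CLARIFY" }
  | { type: "FALLBACK" };

// Hypothetical stand-in for a two-threshold router; the real package may
// weigh predictions differently.
function decideRoute(
  predictions: Prediction[],
  routeThreshold = 0.7,
  fallbackThreshold = 0.3,
): Decision {
  if (predictions.length === 0) return { type: "FALLBACK" };

  // Pick the highest-confidence prediction.
  const top = predictions.reduce((best, p) => (p.confidence > best.confidence ? p : best));

  if (top.confidence >= routeThreshold) return { type: "ROUTE", target: top.label };
  if (top.confidence >= fallbackThreshold) return { type: "CLARIFY" };
  return { type: "FALLBACK" };
}
```

Under this reading, a prediction set topped by csv at 0.9 routes to csv, while a near tie such as csv at 0.5 and json at 0.4 lands in the clarification band, which the pipeline treats as an unrecognized format.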

Step 5: Build the code generator

import Anthropic from "@anthropic-ai/sdk";
import { repair, UnrepairableError } from "@reaatech/structured-repair-core";
import { CodeGenOutputSchema, ValidationError } from "../lib/schemas.js";
import type { CostTracker } from "./cost-tracker.js";

export async function generateTransformationCode(
  costTracker: CostTracker,
  inputData: string,
  format: string,
  instructions?: string,
): Promise<{ code: string; description: string; tokensUsed: { input: number; output: number } }> {
  costTracker.enforceBudgetOrThrow("code-gen");

  const client = new Anthropic({
    apiKey: process.env.ANTHROPIC_API_KEY,
  });
  const model = process.env.ANTHROPIC_MODEL ?? "claude-sonnet-4-6";

  const message = await client.messages.create({
    model,
    max_tokens: 4096,
    system:
      "You are a data transformation expert. Output ONLY valid JSON with keys: code, description, expectedOutputShape. The code must be a self-contained JavaScript function that takes the input data string and returns cleaned output.",
    messages: [
      {
        role: "user",
        content: `Transform this ${format} data:\n\n${inputData}\n\nInstructions: ${instructions ?? "Clean and normalize"}`,
      },
    ],
  });

  const block = message.content[0];
  if (block.type !== "text") {
    throw new ValidationError("No text block in response");
  }
  const text = block.text;

  let output: { code: string; description: string; expectedOutputShape: string };
  try {
    output = await repair(CodeGenOutputSchema, text);
  } catch (error) {
    if (error instanceof UnrepairableError) {
      throw new ValidationError(`Failed to parse code generation output: ${error.message}`);
    }
    throw error;
  }

  costTracker.recordSpan(
    "anthropic",
    model,
    message.usage.input_tokens,
    message.usage.output_tokens,
    "code-gen",
  );

  return {
    code: output.code,
    description: output.description,
    tokensUsed: {
      input: message.usage.input_tokens,
      output: message.usage.output_tokens,
    },
  };
}
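The generator calls costTracker.enforceBudgetOrThrow and costTracker.recordSpan, but the tracker itself is not shown in this section. A minimal in-memory sketch of that interface follows. The class name, its internals, the budget figure, and the per-million-token prices are placeholders chosen for illustration; they are neither real pricing nor the recipe's implementation.

```typescript
class BudgetExceededError extends Error {
  readonly code = "BUDGET_EXCEEDED";
}

interface Span {
  provider: string;
  model: string;
  stage: string;
  costUsd: number;
}

// Hypothetical in-memory tracker; prices are caller-supplied placeholders.
class SketchCostTracker {
  private readonly spans: Span[] = [];
  private readonly budgetUsd: number;
  private readonly inputUsdPerMTok: number;
  private readonly outputUsdPerMTok: number;

  constructor(budgetUsd: number, inputUsdPerMTok: number, outputUsdPerMTok: number) {
    this.budgetUsd = budgetUsd;
    this.inputUsdPerMTok = inputUsdPerMTok;
    this.outputUsdPerMTok = outputUsdPerMTok;
  }

  get totalUsd(): number {
    return this.spans.reduce((sum, span) => sum + span.costUsd, 0);
  }

  // Refuse to start a stage once the recorded spend has reached the budget.
  enforceBudgetOrThrow(stage: string): void {
    if (this.totalUsd >= this.budgetUsd) {
      throw new BudgetExceededError(`Budget exhausted before stage "${stage}"`);
    }
  }

  // Convert token counts to dollars and remember the span.
  recordSpan(
    provider: string,
    model: string,
    inputTokens: number,
    outputTokens: number,
    stage: string,
  ): number {
    const costUsd =
      (inputTokens * this.inputUsdPerMTok + outputTokens * this.outputUsdPerMTok) / 1_000_000;
    this.spans.push({ provider, model, stage, costUsd });
    return costUsd;
  }
}
```

Note that a check-then-record design like this only blocks the next call after the budget is crossed; the call that crosses it still runs to completion.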

Step 9: Create the Job Manager orchestrator

import { generateId } from "@reaatech/llm-cost-telemetry";
import { classifyInputFormat } from "./format-classifier.js";
import { generateTransformationCode } from "./code-generator.js";
import { SandboxExecutor } from "./sandbox-executor.js";
import { validateResult } from "./result-validator.js";
import { CostTracker } from "./cost-tracker.js";
import {
  FormatNotRecognizedError,
  type JobRequest,
  type JobResult,
} from "../lib/schemas.js";

interface PipelineServices {
  classifier: { classifyInputFormat: typeof classifyInputFormat };
  costTracker: CostTracker;
  generator: { generateTransformationCode: typeof generateTransformationCode };
  executor: SandboxExecutor;
  validator: { validateResult: typeof validateResult };
}

export class JobManager {
  private jobs: Map<string, JobResult> = new Map();
  private classifier: PipelineServices["classifier"];
  private costTracker: PipelineServices["costTracker"];
  private generator: PipelineServices["generator"];
  private executor: PipelineServices["executor"];
  private validator: PipelineServices["validator"];

  constructor(services: PipelineServices) {
    this.classifier = services.classifier;
    this.costTracker = services.costTracker;
    this.generator = services.generator;
    this.executor = services.executor;
    this.validator = services.validator;
  }

  async runTransformationPipeline(request: JobRequest): Promise<JobResult> {
    const startTime = Date.now();

    const classification = await this.classifier.classifyInputFormat(request.inputData);
    if (classification.label === "CLARIFY") {
      throw new FormatNotRecognizedError("Input format could not be reliably classified");
    }

    const { code, tokensUsed } = await this.generator.generateTransformationCode(
      this.costTracker,
      request.inputData,
      classification.label,
      request.instructions,
    );

    const sandboxResult = await this.executor.executeCode(code, request.inputData);
    const validation = await this.validator.validateResult(sandboxResult.text);

    const result: JobResult = {
      id: generateId(),
      cleanedData: validation.valid
        ? typeof validation.data === "object" && validation.data !== null
          ? JSON.stringify(validation.data)
          : String(validation.data)
        : sandboxResult.text,
      originalFormat: classification.label,
      transformationsApplied: [code],
      executionTimeMs: Date.now() - startTime,
      tokensUsed,
    };

    this.jobs.set(result.id, result);
    return result;
  }

  getJob(id: string): JobResult | undefined {
    return this.jobs.get(id);
  }
}
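The nested ternary that builds cleanedData in the Job Manager can be hard to follow. Written as a standalone function, the same decision reads as below. The validation result shape ({ valid, data }) is inferred from how the pipeline uses it, and the names ValidationOutcome and toCleanedString are hypothetical.

```typescript
// Shape inferred from the pipeline's use of validateResult; an assumption.
interface ValidationOutcome {
  valid: boolean;
  data?: unknown;
}

function toCleanedString(validation: ValidationOutcome, rawSandboxText: string): string {
  // Invalid output: pass the raw sandbox text through unchanged.
  if (!validation.valid) return rawSandboxText;

  const { data } = validation;
  // Objects and arrays are serialized as JSON; everything else is stringified.
  if (typeof data === "object" && data !== null) return JSON.stringify(data);
  return String(data);
}
```

Extracting the logic this way also makes the fallback explicit: when validation fails, the job still returns the sandbox's text rather than raising an error.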

Step 11: Create the API route handlers

import { type NextRequest, NextResponse } from "next/server";
import { z } from "zod";
import { IdempotencyError } from "@reaatech/idempotency-middleware";
import {
  JobRequestSchema,
  FormatNotRecognizedError,
  BudgetExceededError,
  CodeExecutionError,
  ValidationError,
} from "@/src/lib/schemas.js";
import { classifyInputFormat } from "@/src/services/format-classifier.js";
import { CostTracker } from "@/src/services/cost-tracker.js";
import { generateTransformationCode } from "@/src/services/code-generator.js";
import { SandboxExecutor } from "@/src/services/sandbox-executor.js";
import { validateResult } from "@/src/services/result-validator.js";
import { JobManager } from "@/src/services/job-manager.js";
import { middleware } from "@/src/services/idempotency.js";

const costTracker = new CostTracker();
const sandboxExecutor = new SandboxExecutor();

export const jobManager = new JobManager({
  classifier: { classifyInputFormat },
  costTracker,
  generator: { generateTransformationCode },
  executor: sandboxExecutor,
  validator: { validateResult },
});

export async function POST(req: NextRequest) {
  try {
    const rawBody: unknown = await req.json();
    const parsed = JobRequestSchema.parse(rawBody);
    const idempotencyKey = req.headers.get("idempotency-key");

    const runPipeline = async () => {
      return jobManager.runTransformationPipeline(parsed);
    };

    let result;
    if (idempotencyKey) {
      result = await middleware.execute(
        idempotencyKey,
        { method: "POST", path: "/api/jobs", body: parsed },
        runPipeline,
      );
    } else {
      result = await runPipeline();
    }

    return NextResponse.json(result, { status: 200 });
  } catch (error) {
    if (error instanceof z.ZodError) {
      return NextResponse.json(
        { error: "Validation failed", details: error.issues },
        { status: 422 },
      );
    }
    if (error instanceof FormatNotRecognizedError) {
      return NextResponse.json({ error: error.message }, { status: 422 });
    }
    if (error instanceof BudgetExceededError) {
      return NextResponse.json({ error: error.message }, { status: 402 });
    }
    if (error instanceof CodeExecutionError) {
      return NextResponse.json({ error: error.message }, { status: 500 });
    }
    if (error instanceof ValidationError) {
      return NextResponse.json({ error: error.message }, { status: 422 });
    }
    if (error instanceof IdempotencyError) {
      return NextResponse.json(
        { error: error.message },
        { status: error.getStatusCode() },
      );
    }
    throw error;
  }
}
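The handler delegates to middleware.execute(key, request, fn) whenever an Idempotency-Key header is present. The idempotency module itself is not shown here, so the following is only a sketch of the general technique: store the result under the key, replay it on retry, and reject a reused key whose request payload differs. The class name, the JSON.stringify fingerprint, and the conflict behaviour are assumptions, not the @reaatech/idempotency-middleware implementation.

```typescript
interface StoredEntry<T> {
  fingerprint: string;
  result: Promise<T>;
}

// Hypothetical in-memory idempotency store, for illustration only.
class InMemoryIdempotency<T> {
  private readonly entries = new Map<string, StoredEntry<T>>();

  async execute(key: string, request: unknown, fn: () => Promise<T>): Promise<T> {
    // Naive fingerprint: sensitive to property order, fine for a sketch.
    const fingerprint = JSON.stringify(request);
    const existing = this.entries.get(key);

    if (existing) {
      if (existing.fingerprint !== fingerprint) {
        throw new Error(`Idempotency key "${key}" reused with a different request`);
      }
      // Replay the stored result; this also joins a call still in flight.
      return existing.result;
    }

    const result = fn();
    this.entries.set(key, { fingerprint, result });
    try {
      return await result;
    } catch (error) {
      // Forget failed attempts so that a retry can run the work again.
      this.entries.delete(key);
      throw error;
    }
  }
}
```

A process-local Map like this is lost on restart and is not shared between instances, so a deployed service would need a shared store with expiry.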

Intro

Prerequisites

Step 1: Scaffold the project and configure environment

Step 2: Define the shared Zod schemas and error classes

Step 3: Build the format classifier

Step 4: Create the cost tracker

Step 5: Build the code generator

Step 6: Build the sandbox executor

Step 7: Create the result validator

Step 8: Wire the idempotency middleware

Step 9: Create the Job Manager orchestrator

Step 10: Preload environment variables

Step 11: Create the API route handlers

Step 12: Update the entry point and run the tests

Step 13: Try the recipe

Next steps