Databricks Security Guardrails for SMB Data Pipelines

Add PII redaction, prompt injection defense, and content policy enforcement to your Databricks model-serving pipelines — no retraining required.

The problem

Small businesses feeding customer data into Databricks-hosted LLMs risk accidental PII exposure and prompt injection attacks, but lack the security engineering capacity to build custom guardrails for every model endpoint.

Intro

This tutorial walks you through building a pluggable security layer that sits between your users and a Databricks model-serving endpoint. Every incoming chat request passes through Presidio PII detection and a configurable sequence of guardrails (PII redaction, prompt injection detection, toxicity filtering, topic boundaries, cost pre-checks) before reaching Databricks. The model’s output is also scanned. All guardrail activity is logged, metered, and traced through Langfuse for audit trails. By the end you’ll have a working Next.js 16 API that you can point at any Databricks model endpoint.
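The "configurable sequence of guardrails" idea can be sketched in a few lines. This is a minimal illustration, not the recipe's implementation: the `Guardrail` and `ChainOutcome` types, the `runChain` function, and the two toy guardrails are hypothetical names chosen for this example. Each guardrail sees the text produced by the previous one, and the chain stops at the first rejection.

```typescript
// Hypothetical types for illustration only; the recipe defines its own in Step 2.
type Verdict = "pass" | "reject";

interface GuardrailResult {
  verdict: Verdict;
  text: string; // text to hand to the next guardrail (possibly redacted)
  reason?: string;
}

interface Guardrail {
  id: string;
  check: (text: string) => GuardrailResult;
}

interface ChainOutcome {
  verdict: Verdict;
  text: string;
  violations: Array<{ guardrailId: string; reason: string }>;
}

// Run guardrails in order; stop and report at the first rejection.
function runChain(guardrails: Guardrail[], input: string): ChainOutcome {
  let text = input;
  for (const guardrail of guardrails) {
    const result = guardrail.check(text);
    if (result.verdict === "reject") {
      return {
        verdict: "reject",
        text,
        violations: [
          { guardrailId: guardrail.id, reason: result.reason ?? "blocked" },
        ],
      };
    }
    text = result.text;
  }
  return { verdict: "pass", text, violations: [] };
}

// Toy guardrail: rewrites the text instead of rejecting it.
const redactEmails: Guardrail = {
  id: "pii-redaction",
  check: (text) => ({
    verdict: "pass",
    text: text.replace(
      /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g,
      "[REDACTED]",
    ),
  }),
};

// Toy guardrail: a single naive phrase match, far weaker than a real detector.
const blockInjection: Guardrail = {
  id: "prompt-injection",
  check: (text) =>
    /ignore (all )?previous instructions/i.test(text)
      ? { verdict: "reject", text, reason: "possible prompt injection" }
      : { verdict: "pass", text },
};
```

Calling `runChain([redactEmails, blockInjection], userText)` returns either the sanitized text to forward or the violation that stopped the request. The same function can be reused on the model's output with a different guardrail list.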

Prerequisites

Node.js >= 22
pnpm 10.x (the packageManager field in package.json specifies the exact version)
A Databricks workspace with at least one model-serving endpoint deployed (needed for the live proxy, but the test suite runs fully mocked)
A Langfuse account for observability (optional for local dev; observability skips gracefully when env vars are absent)
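One way to make observability "skip gracefully" is to treat missing credentials as a normal state rather than an error. The sketch below assumes the credentials arrive in environment variables named `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY`; those names and the `readLangfuseCredentials` helper are assumptions for illustration, since the tutorial text does not name them.

```typescript
interface LangfuseCredentials {
  publicKey: string;
  secretKey: string;
}

// Returns null when either variable is missing or empty, so callers can
// skip tracing instead of throwing at startup.
// The variable names are assumptions; check your own project configuration.
function readLangfuseCredentials(
  env: Record<string, string | undefined>,
): LangfuseCredentials | null {
  const publicKey = env["LANGFUSE_PUBLIC_KEY"];
  const secretKey = env["LANGFUSE_SECRET_KEY"];
  if (!publicKey || !secretKey) {
    return null;
  }
  return { publicKey, secretKey };
}
```

In application code you would pass `process.env` and only initialize the tracing client when the result is not `null`.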

Step 1: Scaffold the Project and Install Dependencies

The project shell is already on disk — package.json, tsconfig.json, vitest.config.ts, next.config.ts, and root config files are provided by the scaffold agent. Your job is to verify the dependencies and install them.

Open package.json and confirm it contains these dependencies:

json

Example artifact

A complete, working implementation of this recipe — downloadable as a zip or browsable file by file. Generated by our build pipeline; tested with full coverage before publishing.

158 kB · 111 tests · 100.0% coverage · vitest passing

SHA-256: c115d8c5e35fad47fadd4996e7f7d711e90053c29579ad6df16d08c24f68eacb

Step 4: Implement Presidio PII Detection

import {
  injectionGuard,
  piiGuard,
  secretGuard,
  GuardrailsEngine,
  SelectionType,
} from "@presidio-dev/hai-guardrails";

export class PresidioAnalyzer {
  private engine: GuardrailsEngine;

  constructor() {
    this.engine = new GuardrailsEngine({
      guards: [
        injectionGuard(
          { roles: ["user"] },
          { mode: "heuristic", threshold: 0.7 },
        ),
        piiGuard({ selection: SelectionType.All }),
        secretGuard({ selection: SelectionType.All }),
      ],
    });
  }

  // Wraps the input as a single user message, runs every guard over it,
  // and reports per message whether all guards passed.
  async run(
    input: string,
  ): Promise<Array<{ role: string; passed: boolean }>> {
    const messages: Array<{ role: string; content: string }> = [
      { role: "user", content: input },
    ];
    const results = await this.engine.run(messages);
    return messages.map((msg, index) => {
      const allPassed = results.messagesWithGuardResult.every(
        (guardResult) => {
          const result = guardResult.messages.find(
            (m) => m.index === index,
          );
          // A guard that reported nothing for this message counts as a pass.
          return result ? result.passed : true;
        },
      );
      return { role: msg.role, passed: allPassed };
    });
  }

  // Despite the name, this is true when any configured guard fails
  // (injection, PII, or secret), not only the PII guard.
  hasPIIViolation(
    messages: Array<{ role: string; passed: boolean }>,
  ): boolean {
    return messages.some((msg) => !msg.passed);
  }

  // Returns the input unchanged when no guard fails; otherwise applies
  // regex-based redaction for emails, card numbers, SSNs, and phone numbers.
  async sanitizedText(input: string): Promise<string> {
    const results = await this.run(input);
    if (!this.hasPIIViolation(results)) {
      return input;
    }
    const emailRegex = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g;
    // Optional +1 prefix, optional parentheses around the area code.
    const phoneRegex = /(\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}/g;
    const ssnRegex = /\d{3}-\d{2}-\d{4}/g;
    const ccRegex = /\b(?:\d{4}[-\s]?){3}\d{4}\b/g;
    let sanitized = input.replace(emailRegex, "[REDACTED]");
    // Card numbers go before phone numbers: the phone pattern would otherwise
    // consume the first ten digits of an unseparated card number and leave
    // the remaining digits in the text.
    sanitized = sanitized.replace(ccRegex, "[REDACTED]");
    sanitized = sanitized.replace(ssnRegex, "[REDACTED]");
    sanitized = sanitized.replace(phoneRegex, "[REDACTED]");
    return sanitized;
  }
}

export function createDefaultAnalyzer(): PresidioAnalyzer {
  return new PresidioAnalyzer();
}
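One subtlety with regex-based redaction of this kind is that the order of the replacements matters. The sketch below is a standalone illustration: `redactInOrder` is a hypothetical helper, and the two patterns are the phone and card expressions from the analyzer, with the area-code parentheses written out as `\(?` and `\)?`. Applied to an unseparated 16-digit card number, the phone pattern matches the first ten digits, so running it first leaves six digits behind.

```typescript
const phonePattern = /(\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}/g;
const cardPattern = /\b(?:\d{4}[-\s]?){3}\d{4}\b/g;

// Hypothetical helper: apply each pattern in turn to the running text.
function redactInOrder(input: string, patterns: RegExp[]): string {
  return patterns.reduce(
    (text, pattern) => text.replace(pattern, "[REDACTED]"),
    input,
  );
}

const sample = "card 4111111111111111";

// Phone first: the first ten digits are replaced and six digits leak.
const phoneFirst = redactInOrder(sample, [phonePattern, cardPattern]);
// phoneFirst === "card [REDACTED]111111"

// Card first: the whole number is replaced.
const cardFirst = redactInOrder(sample, [cardPattern, phonePattern]);
// cardFirst === "card [REDACTED]"
```

Running longer, more specific patterns before shorter ones avoids this kind of partial match. Regexes remain a fallback: they will miss formats they were not written for.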

Step 5: Create the Databricks Proxy Client

import type { DatabricksRequest, DatabricksResponse } from "../types.js";

export class DatabricksTimeoutError extends Error {
  constructor(message?: string) {
    super(message ?? "Databricks request timed out");
    this.name = "DatabricksTimeoutError";
  }
}

export class DatabricksApiError extends Error {
  status: number;
  body: unknown;

  constructor(status: number, body: unknown) {
    super(`Databricks API error: ${String(status)}`);
    this.name = "DatabricksApiError";
    this.status = status;
    this.body = body;
  }
}

export class DatabricksClient {
  private host: string;
  private token: string;

  constructor() {
    this.host = process.env.DATABRICKS_HOST ?? "";
    this.token = process.env.DATABRICKS_TOKEN ?? "";
  }

  // Forwards the request body (minus the endpoint name) to the serving
  // endpoint and aborts the call after timeoutMs.
  async forward(
    request: DatabricksRequest,
    timeoutMs: number = 30_000,
  ): Promise<DatabricksResponse> {
    const { endpoint, ...requestBody } = request;
    const url = `${this.host}/serving-endpoints/${endpoint}/invocations`;
    const controller = new AbortController();
    const timeoutId = setTimeout(() => {
      controller.abort();
    }, timeoutMs);
    try {
      const response = await fetch(url, {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          Authorization: `Bearer ${this.token}`,
        },
        body: JSON.stringify(requestBody),
        signal: controller.signal,
      });
      if (!response.ok) {
        let body: unknown = null;
        try {
          body = await response.json();
        } catch {
          // response body not JSON
        }
        throw new DatabricksApiError(response.status, body);
      }
      const data = (await response.json()) as DatabricksResponse;
      return data;
    } catch (error) {
      if (error instanceof DatabricksApiError) {
        throw error;
      }
      // fetch rejects with an AbortError when the controller fires.
      if (error instanceof Error && error.name === "AbortError") {
        throw new DatabricksTimeoutError("Databricks request timed out");
      }
      throw error;
    } finally {
      clearTimeout(timeoutId);
    }
  }

  // Reports true only when a GET on the workspace host returns HTTP 200.
  async healthCheck(): Promise<boolean> {
    try {
      const response = await fetch(this.host, { method: "GET" });
      return response.status === 200;
    } catch {
      return false;
    }
  }
}

Step 7: Create the API Route Handlers

import { type NextRequest, NextResponse } from 'next/server';
import type { DatabricksRequest, DatabricksResponse } from '../../../../src/types.js';
import { GuardrailService } from '../../../../src/services/guardrail-service.js';
import {
  DatabricksClient,
  DatabricksApiError,
  DatabricksTimeoutError,
} from '../../../../src/api/databricks-proxy.js';
import { getLogger } from '@reaatech/guardrail-chain-observability';

export async function POST(req: NextRequest): Promise<NextResponse> {
  try {
    let body: DatabricksRequest;
    try {
      body = (await req.json()) as DatabricksRequest;
    } catch {
      return NextResponse.json({ error: 'invalid_json' }, { status: 400 });
    }
    if (!body.endpoint) {
      return NextResponse.json({ error: 'missing endpoint' }, { status: 400 });
    }
    // Without this check a request lacking messages would throw further down
    // and surface as a 500 instead of a client error.
    if (!Array.isArray(body.messages)) {
      return NextResponse.json({ error: 'missing messages' }, { status: 400 });
    }

    // Input guardrails: reject outright, or continue with sanitized text.
    const service = new GuardrailService();
    const { outcome, sanitizedInput } = await service.processInput(body.endpoint, body.messages);
    if (outcome.verdict === 'reject') {
      const logger = getLogger();
      logger.warn({ guardrailId: outcome.violations[0]?.guardrailId }, 'guardrail blocked request');
      return NextResponse.json(
        { error: 'guardrail_blocked', violations: outcome.violations },
        { status: 403 },
      );
    }

    // Forward a single user message: the sanitized text if available,
    // otherwise the original message contents joined by newlines.
    const client = new DatabricksClient();
    const forwardedBody: DatabricksRequest = {
      ...body,
      messages: [
        {
          role: 'user',
          content: sanitizedInput ?? body.messages.map((m) => m.content).join('\n'),
        },
      ],
    };

    let databricksResponse: DatabricksResponse;
    try {
      databricksResponse = await client.forward(forwardedBody);
    } catch (error) {
      if (error instanceof DatabricksApiError) {
        return NextResponse.json(
          { error: 'upstream_error', status: error.status, detail: error.body },
          { status: 502 },
        );
      }
      if (error instanceof DatabricksTimeoutError) {
        return NextResponse.json(
          { error: 'upstream_timeout' },
          { status: 504 },
        );
      }
      throw error;
    }

    // Output guardrails: scan the first choice before returning it.
    const outputContent = databricksResponse.choices[0]?.message?.content ?? '';
    const outputOutcome = await service.processOutput(outputContent);
    if (outputOutcome.verdict === 'reject') {
      return NextResponse.json(
        { error: 'output_guardrail_blocked', violations: outputOutcome.violations },
        { status: 500 },
      );
    }
    return NextResponse.json(databricksResponse);
  } catch (error) {
    getLogger().error({ error: String(error) }, 'internal error');
    return NextResponse.json({ error: 'internal_error' }, { status: 500 });
  }
}
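A caller of this route has to look at the `error` field, not only the HTTP status, because `output_guardrail_blocked` and `internal_error` both come back as 500. The sketch below shows one way a client might branch on the documented error codes. The `ProxyFailure` shape, the `Advice` labels, and the `adviseOnFailure` function are hypothetical, and the retry policy is a suggestion rather than something the recipe prescribes.

```typescript
// Error strings are taken from the route handler's JSON responses.
interface ProxyFailure {
  status: number;
  error: string;
}

type Advice = "fix_request" | "rephrase_input" | "retry_later" | "report_bug";

function adviseOnFailure(failure: ProxyFailure): Advice {
  switch (failure.error) {
    case "invalid_json":
    case "missing endpoint":
      // 400: the request itself is malformed; retrying unchanged will not help.
      return "fix_request";
    case "guardrail_blocked":
    case "output_guardrail_blocked":
      // 403 or 500: a guardrail rejected the input or the model's output.
      return "rephrase_input";
    case "upstream_error":
    case "upstream_timeout":
      // 502 or 504: Databricks failed or was too slow; a later retry may succeed.
      return "retry_later";
    default:
      // Includes internal_error (500) and anything unrecognized.
      return "report_bug";
  }
}
```

For example, `adviseOnFailure({ status: 500, error: "output_guardrail_blocked" })` and `adviseOnFailure({ status: 500, error: "internal_error" })` share a status code but lead to different handling.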

Tutorial outline

Intro
Prerequisites
Step 1: Scaffold the Project and Install Dependencies
Step 2: Define the Shared Types
Step 3: Build the Configuration System
Step 4: Implement Presidio PII Detection
Step 5: Create the Databricks Proxy Client
Step 6: Build the Guardrail Service
Step 7: Create the API Route Handlers
Step 8: Wire Up Langfuse Observability
Step 9: Set Up Next.js Instrumentation
Step 10: Run the Tests
Next steps