SMBs using Databricks for business intelligence struggle to let AI agents interact with their data safely—each customer needs isolated access, rate limits, and audit trails, which are complex to build from scratch.
A complete, working implementation of this recipe — downloadable as a zip or browsable file by file. Generated by our build pipeline; tested with full coverage before publishing.
This tutorial walks you through building a multi-tenant MCP server that wraps Databricks SQL queries as tools for AI agents. You will use the @reaatech/mcp-server-* ecosystem to compose an Express server with tenant resolution, per-tenant rate limiting, tool visibility policies, row-level security, OpenTelemetry tracing, and Langfuse observability — all in one TypeScript project. By the end, you will have a working MCP server that an AI agent (Claude, or any MCP client) can connect to and run SQL queries against your Databricks warehouse, with each tenant seeing only its own data.
Prerequisites
Node.js >= 22 and pnpm 10 installed
A Databricks SQL warehouse with a hostname, HTTP path, and personal access token
A Langfuse account (optional — observability still works without it)
Familiarity with TypeScript and Express
Step 1: Scaffold the project and install dependencies
Create a package.json at the project root. The project uses Next.js 16 as its build and dev runner (next dev, next build, next start), while the actual application server is an Express app provided by @reaatech/mcp-server-engine. The scaffold also includes a tsconfig.json, vitest.config.ts, next.config.ts, and ESLint flat config (not shown here, but present in the finished project).
Every version is pinned exactly — no ^ or ~. With the package.json in place, install everything:
terminal
pnpm install
Expected output:pnpm resolves all dependencies and writes pnpm-lock.yaml. The node_modules/ directory now contains the @reaatech/* ecosystem, @databricks/sql, @modelcontextprotocol/sdk, and the rest of the stack.
Step 2: Configure environment variables
Create .env from the template below. Every process.env.X you will reference in source code must be listed here.
env
# Env vars used by databricks-mcp-server-for-smb-data-analytics.# Keep placeholders only -- never commit real values.NODE_ENV=developmentDATABRICKS_HOST=<your-databricks-workspace-url>DATABRICKS_PATH=/sql/2.0/warehouses/<warehouse-id>DATABRICKS_TOKEN=<your-databricks-personal-access-token>PORT=8080LOG_LEVEL=infoAPI_KEY=<your-mcp-server-shared-secret>AUTH_MODE=api-keyAUTH_BYPASS_IN_DEV=trueCORS_ORIGIN=*TENANT_RESOLVER_HEADER=x-tenant-idRATE_LIMIT_RPM=60RATE_LIMIT_TPM=10000LANGFUSE_PUBLIC_KEY=<your-langfuse-public-key>LANGFUSE_SECRET_KEY=<your-langfuse-secret-key>LANGFUSE_BASE_URL=https://cloud.langfuse.comOTEL_EXPORTER_OTLP_ENDPOINT=<your-otlp-endpoint>OTEL_SERVICE_NAME=databricks-mcp-server
Your Databricks credentials and server configuration now live in .env. Make sure this file is in your .gitignore so secrets stay local.
Step 3: Define the shared domain types
Create src/types/index.ts with the core types your server will use everywhere.
DatabricksError carries a machine-readable code (like DATABRICKS_QUERY_FAILED or NOT_CONNECTED) so callers can handle errors programmatically. QueryExecutor is an interface your session manager will implement.
Expected output: The file ./src/types/index.ts exports 8 interfaces, 1 class, and all the types needed across your modules.
Step 4: Create the validated config module
Create src/config.ts to read Databricks credentials from the environment and wrap the server config from @reaatech/mcp-server-core.
ts
import { envConfig, isProduction as coreIsProduction } from '@reaatech/mcp-server-core';import type { DatabricksConfig, ServerConfig } from './types/index.js';export function getDatabricksConfig(): DatabricksConfig { const host = process.env.DATABRICKS_HOST; const path = process.env.DATABRICKS_PATH; const token = process.env.DATABRICKS_TOKEN; const missing: string[] = []; if (!host) missing.push('DATABRICKS_HOST'); if (!path) missing.push('DATABRICKS_PATH'); if (!token) missing.push('DATABRICKS_TOKEN'); if (missing.length > 0) { throw new Error( `Missing required Databricks environment variables: ${missing.join(', ')}` ); } return { host, path, token } as DatabricksConfig;}export function getServerConfig(): ServerConfig { return { port: envConfig.PORT, logLevel: envConfig.LOG_LEVEL, corsOrigin: envConfig.CORS_ORIGIN, };}export function isProduction(): boolean { return coreIsProduction();}
The envConfig proxy from @reaatech/mcp-server-core lazily validates and caches environment variables at first access, giving you fail-fast startup semantics. The getDatabricksConfig() function adds its own validation for the three Databricks-specific variables.
Expected output:./src/config.ts exports getDatabricksConfig(), getServerConfig(), and isProduction(). Calling getDatabricksConfig() without the required env vars throws a descriptive error naming every missing variable.
Step 5: Build the Databricks SQL session manager
Create src/services/databricks.ts — a class that wraps the @databricks/sql Thrift driver and manages the connection lifecycle.
ts
import { DBSQLClient } from '@databricks/sql';import type { DatabricksConfig, QueryResult } from '../types/index.js';import { DatabricksError } from '../types/index.js';import type { QueryExecutor } from '../types/index.js';export class DatabricksSessionManager implements QueryExecutor { #client: DBSQLClient | null = null; async connect(config: DatabricksConfig): Promise<void> { const client = new DBSQLClient(); await client.connect({ host: config.host, path: config.path, token: config.token, }); this.#client = client; } async executeQuery(sql: string): Promise<QueryResult> { if (!this.#client) { throw new DatabricksError( 'NOT_CONNECTED', 'DatabricksSessionManager is not connected. Call connect() first.', ); } const client = this.#client; const session = await client.openSession(); try { const queryOperation = await session.executeStatement(sql); try { const data = await queryOperation.fetchAll(); let columns: string[] = []; const schema = await queryOperation.getSchema(); if (schema) { columns = schema.columns.map((col) => col.columnName); } return { columns, rows: data as Record<string, unknown>[], rowCount: data.length, }; } finally { await queryOperation.close(); } } catch (error) { throw new DatabricksError( 'DATABRICKS_QUERY_FAILED', error instanceof Error ? error.message : String(error), ); } finally { await session.close(); } } async close(): Promise<void> { if (this.#client) { await this.#client.close(); this.#client = null; } }}
Key design choices:
#client is an ECMAScript private field — the only way to interact is through connect(), executeQuery(), and close().
executeQuery() always cleans up: queryOperation.close() and session.close() run in finally blocks.
Errors from the Databricks driver are wrapped in DatabricksError with a machine-readable code.
close() is idempotent — calling it before connect() is a no-op.
Expected output:./src/services/databricks.ts exports DatabricksSessionManager. You can now connect to Databricks and run arbitrary SQL, getting back structured QueryResult objects.
Step 6: Compose the multi-tenant middleware
Create src/middleware/tenant.ts to wire together the tenant resolver, rate limiter, and tool visibility policies from the @reaatech multi-tenant packages.
TenantContextStore — AsyncLocalStorage-backed context propagation so any code downstream can call store.get() and see the current tenant.
HeaderTenantResolver — reads the x-tenant-id header from each request to identify the tenant.
MemoryRateLimitStore + DefaultRateLimiter — fixed-window rate limits per tenant (60 requests/min and 10,000 tokens/min by default).
toolVisibility — a basic plan sees only run_sql and list_tables; an enterprise plan sees all five tools.
Expected output:./src/middleware/tenant.ts exports createTenantMiddleware(), getTenantContext(), and resolveTenantAndRun(). The context store and resolver are module-level singletons.
Step 7: Implement the MCP tool handlers
Create src/tools/query.ts — this is where you define the five MCP tools and their execution logic.
ts
import { textContent, errorResponse } from '@reaatech/mcp-server-core';import { withSpan, recordToolInvocation, logger } from '@reaatech/mcp-server-observability';import { getTenantContext } from '../middleware/tenant.js';import type { QueryResult, ToolDefinition, ToolHandlerResult } from '../types/index.js';function applyRls(sql: string, tenantId: string): string { const trimmed = sql.trim(); if (/^SELECT\b/i.test(trimmed)) { return `SELECT * FROM (${trimmed}) AS user_query WHERE tenant_id = '${tenantId}'`;
The five tools are:
Tool
What it does
run_sql
Executes a SELECT query, applies RLS, formats results as a markdown table
list_tables
Queries information_schema.tables for a catalog of available tables
describe_table
Runs DESCRIBE TABLE with backtick-escaped table names
validate_sql
Runs EXPLAIN to check syntax without executing
estimate_rows
Wraps the query in COUNT(*) to estimate result size
Each handler wraps its logic in withSpan() for OpenTelemetry tracing and calls recordToolInvocation() for metrics. Row-level security (RLS) wraps every SELECT query in a subquery filtered by tenant_id, and large result sets are truncated at 1,000 rows.
Expected output:./src/tools/query.ts exports the createToolHandlers() factory which returns an object with tools (5 definitions) and handleToolCall() (a dispatcher).
Step 8: Wire up the Express MCP server
Create src/server.ts — the entry point that initializes observability, creates the Express app, connects to Databricks, and registers the MCP tools through the multi-tenant middleware.
initObservability() — bootstraps OpenTelemetry (no-op if OTEL_EXPORTER_OTLP_ENDPOINT is not set).
createApp() — builds the fully configured Express app with security headers, CORS, JSON body parsing, auth, rate limiting, idempotency, sanitization, health endpoints (/health, /ready, /live), and dual MCP transports (Streamable HTTP + SSE).
A raw MCP Server is created to hold the tool definitions.
DatabricksSessionManager connects to your Databricks warehouse.
createTenantMiddleware() wraps the MCP server with tenant resolution, rate limiting, and visibility filtering.
Tools are registered via middleware.handle() — the middleware filters tools/list results by the tenant’s visibility policy and enforces rate limits on tools/call before the handler runs.
A custom GET /databricks-health endpoint pings Databricks directly.
startServer() listens on the configured port and registers graceful shutdown handlers.
Expected output:./src/server.ts exports app and mcpServer. Running pnpm dev boots the server on port 8080 (or the PORT you configured).
Step 9: Write test fixtures and global test setup
Create tests/fixtures/databricks-responses.ts with reusable QueryResult objects:
Expected output: Three files under tests/fixtures/ and one tests/setup.ts that mocks all external packages. The setup file runs before every test suite.
Step 10: Write and run the test suite
Write tests/config.test.ts to cover the config module. This test verifies that getDatabricksConfig() parses environment variables correctly and throws when any are missing:
ts
import { vi, describe, it, expect, beforeEach } from 'vitest';const mockIsProduction = vi.hoisted(() => vi.fn(() => false));vi.mock('@reaatech/mcp-server-core', () => ({ envConfig: { PORT: 3000, LOG_LEVEL: 'info', CORS_ORIGIN: '*', }, isProduction: mockIsProduction,}));import { getDatabricksConfig, getServerConfig, isProduction } from '../src/config.js';describe('getDatabricksConfig', () => { beforeEach(() => { vi.unstubAllEnvs(); }); it('returns config when all three env vars are set', () => { vi.stubEnv('DATABRICKS_HOST', 'https://dbc-abc123.cloud.databricks.com'); vi.stubEnv('DATABRICKS_PATH', '/sql/1.0/warehouses/wh-abc'); vi.stubEnv('DATABRICKS_TOKEN', 'dapi-secret-token'); const config = getDatabricksConfig(); expect(config).toEqual({ host: 'https://dbc-abc123.cloud.databricks.com', path: '/sql/1.0/warehouses/wh-abc', token: 'dapi-secret-token', }); }); it('throws when DATABRICKS_HOST is missing', () => { vi.stubEnv('DATABRICKS_PATH', '/sql/1.0/warehouses/wh-abc'); vi.stubEnv('DATABRICKS_TOKEN', 'dapi-secret-token'); expect(() => getDatabricksConfig()).toThrow('DATABRICKS_HOST'); }); // Additional tests cover missing DATABRICKS_PATH, missing DATABRICKS_TOKEN, // all-three-missing, two-missing, and empty-string-as-missing for each variable. // See the full file: tests/config.test.ts in the finished project.});describe('getServerConfig', () => { it('returns default values from envConfig', () => { const config = getServerConfig(); expect(config).toEqual({ port: 3000, logLevel: 'info', corsOrigin: '*', }); });});
Write tests/databricks.test.ts for the session manager — it uses vi.hoisted() mocks to verify the full DBSQLClient lifecycle: connect, openSession, executeStatement, fetchAll, getSchema, operation close, and session close. It also validates error wrapping into DatabricksError with the DATABRICKS_QUERY_FAILED code, the NOT_CONNECTED code when connect() was never called, and that close() is safe to call multiple times or before connecting.
Write tests/middleware.test.ts for the tenant middleware — verifies that createTenantMiddleware() returns an object with a handle method, that getTenantContext() returns null when unset and { tenantId } when the context store has a value, and that resolveTenantAndRun() makes the tenant context visible inside the callback and propagates its return value and errors.
Write tests/query.test.ts for all five tool handlers — tests run_sql (formatted markdown output, 1000-row truncation, RLS wrapping, validation for empty/non-string/missing SQL), list_tables, describe_table (backtick escaping, validation), validate_sql (EXPLAIN prefixing), estimate_rows (COUNT(*) wrapping, RLS interaction), unknown tools (error response), and recordToolInvocation calls on every execution path.
Write tests/server.test.ts for the server module — dynamically imports src/server.js in beforeAll to verify that initObservability, createApp, and startServer are called at startup, that the MCP Server is created with name databricks-mcp-smb and version 0.1.0, and that GET /databricks-health returns 200 when the Databricks ping succeeds and 503 when it fails.
Write a tests/index.test.ts to verify the package’s public re-exports are present.
Now run the quality checks:
terminal
pnpm typecheck
Expected output: Zero TypeScript errors.
terminal
pnpm lint
Expected output: Zero ESLint errors and warnings.
terminal
pnpm test
Expected output: 86 tests pass across 6 test files with zero failures. Coverage thresholds of 90% across lines, branches, functions, and statements are met. The coverage report is written to vitest-report.json.
Next steps
Add JWT-based tenant resolution — swap HeaderTenantResolver for JWTTenantResolver from the same @reaatech/multi-tenant-mcp-tenant-resolver package. Set a JWT_SECRET and configure the claim field to extract the tenant ID from a signed token.
Distribute rate limiting with Redis — replace MemoryRateLimitStore with RedisRateLimitStore from @reaatech/multi-tenant-mcp-rate-limiter. This lets you run multiple server instances behind a load balancer with coordinated rate limits.
Deploy with Docker — write a multi-stage Dockerfile that runs the server on PORT=8080, passes Databricks and Langfuse credentials as environment variables, and uses the built-in /health endpoint for orchestrator probes.
}
return sql;
}
function sanitizeTableName(name: string): string {
return name.replace(/`/g, '``');
}
function formatMarkdownTable(columns: string[], rows: Record<string, unknown>[]): string {