@reaatech/media-pipeline-mcp-deepgram

Status: Pre-1.0 — APIs may change in minor versions. Pin to a specific version in production.

Deepgram provider for the media pipeline framework. Provides speech-to-text transcription with smart formatting and speaker diarization using the Nova-2 model. Supports native streaming via WebSocket frames and HMAC-signed webhook callbacks for async batch operations.

Installation

terminal

npm install @reaatech/media-pipeline-mcp-deepgram
# or
pnpm add @reaatech/media-pipeline-mcp-deepgram

Feature Overview

Speech-to-text transcription with Nova-2 (word-level timestamps, confidence scores)
Speaker diarization with labeled utterances and segment metadata
Smart formatting: auto-capitalization, punctuation, number/date normalization
Language detection and multi-language support
Streaming support for both operations (`supportsStreaming`)
Webhook support for async callbacks (`supportsWebhooks`)
SHA-256 hashing of raw audio in cache keys to avoid storing multi-megabyte buffers

Quick Start

typescript

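import { readFile } from "node:fs/promises";
import { DeepgramProvider } from "@reaatech/media-pipeline-mcp-deepgram";

const provider = new DeepgramProvider({ apiKey: process.env.DEEPGRAM_API_KEY! });

// Load the audio into Buffers (the file paths are placeholders)
const audioBuffer = await readFile("./audio.wav");
const meetingAudioBuffer = await readFile("./meeting.wav");
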
// Transcribe audio to text
const result = await provider.execute({
  operation: "audio.stt",
  params: { audio_data: audioBuffer, language: "en", diarize: true },
  config: {},
});
console.log(JSON.parse(result.data.toString()).transcript);
 
// Diarize speakers in an audio recording
const speakers = await provider.execute({
  operation: "audio.diarize",
  params: { audio_data: meetingAudioBuffer, language: "en" },
  config: {},
});
const output = JSON.parse(speakers.data.toString());
console.log(`Found ${output.speakers} speakers across ${output.segments.length} segments`);

Supported Operations

Operation	Default Model	Description	Output Format
`audio.stt`	`nova-2`	Speech-to-text with smart formatting, timestamps, and optional diarization	JSON with `transcript`, `confidence`, `segments`
`audio.diarize`	`nova-2`	Speaker identification with labeled utterances, start/end times, and confidence	JSON with `speakers` count and per-speaker `segments`

Configuration Parameters

`audio.stt`

Parameter	Type	Default	Description
`audio_data`	`Buffer`	required	Raw audio data buffer
`language`	`string`	`en`	BCP-47 language code
`model`	`string`	`nova-2`	Model ID (`nova-2`, `whisper`)
`diarize`	`boolean`	`false`	Enable speaker diarization in STT output

`audio.diarize`

Parameter	Type	Default	Description
`audio_data`	`Buffer`	required	Raw audio data buffer
`language`	`string`	`en`	BCP-47 language code
`model`	`string`	`nova-2`	Model ID

API Reference

`DeepgramProvider`

typescript

class DeepgramProvider extends MediaProvider {
  constructor(config: DeepgramProviderConfig)
 
  healthCheck(): Promise<ProviderHealth>
  estimateCost(input: ProviderInput): Promise<CostEstimate>
  execute(input: ProviderInput): Promise<ProviderOutput>
}

`DeepgramProviderConfig`

typescript

interface DeepgramProviderConfig {
  apiKey: string;
  models?: {
    stt?: string;      // Default: "nova-2"
    diarize?: string;  // Default: "nova-2"
  };
  timeout?: number;    // Request timeout in ms
}
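
The optional `models` fields fall back to `nova-2` when omitted. The sketch below illustrates that fallback with a local copy of the interface; the helper name `resolveModels` is hypothetical and is not part of the package's API, and no default is assumed for `timeout` because the documentation does not state one.

```typescript
// Local copy of the documented config shape, so the sketch is self-contained.
interface DeepgramProviderConfig {
  apiKey: string;
  models?: {
    stt?: string;
    diarize?: string;
  };
  timeout?: number;
}

// Hypothetical helper: fills in the documented "nova-2" defaults.
function resolveModels(config: DeepgramProviderConfig): { stt: string; diarize: string } {
  return {
    stt: config.models?.stt ?? "nova-2",
    diarize: config.models?.diarize ?? "nova-2",
  };
}

// Override only the STT model; diarization keeps its default.
const resolved = resolveModels({
  apiKey: "example-key",
  models: { stt: "whisper" },
  timeout: 30_000,
});
console.log(resolved); // { stt: 'whisper', diarize: 'nova-2' }
```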

Factory Function

typescript

import { defineDeepgramProvider } from "@reaatech/media-pipeline-mcp-deepgram";
 
const provider = defineDeepgramProvider({ apiKey: process.env.DEEPGRAM_API_KEY! });

Key Methods

Method	Returns	Description
`healthCheck()`	`Promise<ProviderHealth>`	Validates the API key by fetching project info from the Deepgram API
`estimateCost(input)`	`Promise<CostEstimate>`	Estimates cost from audio size (bytes / 960 KB per minute) and the model's per-minute rate
`execute(input)`	`Promise<ProviderOutput>`	Runs STT or diarization and returns JSON output with transcript and segment metadata

Non-Retryable Errors

The provider classifies these errors as non-retryable: authentication failed, invalid API key, permission denied, insufficient credits, unsupported model, invalid audio format.
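
A retry wrapper can use this classification to decide whether to try again. The following is a minimal sketch that assumes the classification is a case-insensitive substring match on the error message; the actual matching logic is not documented here, and the name `isRetryable` is hypothetical.

```typescript
// Phrases taken from the list above. Matching on substrings is an assumption.
const NON_RETRYABLE_PATTERNS: readonly string[] = [
  "authentication failed",
  "invalid api key",
  "permission denied",
  "insufficient credits",
  "unsupported model",
  "invalid audio format",
];

// Hypothetical helper: returns false when the message matches a known
// non-retryable phrase, true otherwise.
function isRetryable(error: unknown): boolean {
  const message = (error instanceof Error ? error.message : String(error)).toLowerCase();
  return !NON_RETRYABLE_PATTERNS.some((pattern) => message.includes(pattern));
}

console.log(isRetryable(new Error("Invalid API key supplied"))); // false
console.log(isRetryable(new Error("socket hang up"))); // true
```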

Cost Estimation

Per-Minute Pricing

Model	Operation	Cost / Minute
`nova-2`	`audio.stt`	$0.0059
`nova-2`	`audio.diarize`	$0.0079
`whisper`	`audio.stt`	$0.0040

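Cost is estimated by converting the audio buffer size to minutes, using 960 KB per minute as an approximation, and multiplying the result by the model's per-minute rate. Because the conversion depends only on byte size, the estimate is rough and varies with the audio's encoding and bitrate.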
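
As a worked example of this arithmetic, the sketch below recomputes the estimate from a byte count. It is not the package's implementation: the function name `estimateCostUsd`, the layout of the rate table, and the reading of 960 KB as 960 × 1024 bytes are assumptions made for illustration.

```typescript
// Hypothetical sketch of the size-based estimate; not the package's actual code.

// Assumption: "960 KB" is read as 960 * 1024 bytes of audio per minute.
const BYTES_PER_MINUTE = 960 * 1024;

// Per-minute rates copied from the pricing table, keyed by "model:operation".
const RATES_USD_PER_MINUTE: Record<string, number> = {
  "nova-2:audio.stt": 0.0059,
  "nova-2:audio.diarize": 0.0079,
  "whisper:audio.stt": 0.004,
};

function estimateCostUsd(byteLength: number, model: string, operation: string): number {
  const rate = RATES_USD_PER_MINUTE[`${model}:${operation}`];
  if (rate === undefined) {
    throw new Error(`No rate known for ${model} / ${operation}`);
  }
  // Convert bytes to approximate minutes, then apply the per-minute rate.
  const minutes = byteLength / BYTES_PER_MINUTE;
  return minutes * rate;
}

// A buffer holding exactly ten "minutes" of audio under the assumption above.
const tenMinutesOfBytes = 10 * BYTES_PER_MINUTE;
console.log(estimateCostUsd(tenMinutesOfBytes, "nova-2", "audio.stt").toFixed(4)); // 0.0590
```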

Cache Configuration

The provider exposes a static `cacheConfig` that separates deterministic from non-deterministic parameters.

Deterministic parameters: `audio_data` (SHA-256 hashed), `audio_url`, `model`, `language`, `diarize`, `punctuate`, `smart_format`, `utterances`, `detect_topics`, `detect_entities`, `redact`

Non-deterministic parameters: `request_id`

Raw audio bytes are hashed with SHA-256 during normalization so cache keys remain compact. All boolean-style feature flags are coerced to booleans for consistent matching.
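
The normalization described above can be sketched roughly as follows. This illustrates the idea and is not the package's code: the helper names `normalizeParams` and `cacheKey`, the set of flags treated as boolean-style, the coercion rule, and the JSON key format are all assumptions.

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch of cache-key normalization; not the package's actual code.
type CacheParams = Record<string, unknown>;

// Assumption: these are the flags treated as "boolean-style".
const BOOLEAN_FLAGS = new Set([
  "diarize", "punctuate", "smart_format", "utterances", "detect_topics", "detect_entities",
]);

// Non-deterministic parameters are dropped so they never affect the key.
const NON_DETERMINISTIC = new Set(["request_id"]);

function normalizeParams(params: CacheParams): CacheParams {
  const normalized: CacheParams = {};
  // Sort keys so that property order does not change the resulting key.
  for (const key of Object.keys(params).sort()) {
    if (NON_DETERMINISTIC.has(key)) continue;
    const value = params[key];
    if (key === "audio_data" && Buffer.isBuffer(value)) {
      // Replace the raw bytes with a 64-character hex digest.
      normalized[key] = createHash("sha256").update(value).digest("hex");
    } else if (BOOLEAN_FLAGS.has(key)) {
      // Assumption: true and "true" count as enabled; anything else is false.
      normalized[key] = value === true || value === "true";
    } else {
      normalized[key] = value;
    }
  }
  return normalized;
}

function cacheKey(params: CacheParams): string {
  return JSON.stringify(normalizeParams(params));
}

// Same audio and flags, different request IDs and key order: the keys match.
const keyA = cacheKey({ audio_data: Buffer.from("abc"), diarize: "true", request_id: "r1" });
const keyB = cacheKey({ request_id: "r2", diarize: true, audio_data: Buffer.from("abc") });
console.log(keyA === keyB); // true
```

Hashing keeps the key size constant regardless of the audio length, at the cost of one pass over the buffer per lookup.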

Health Check

The health check sends a GET request to `https://api.deepgram.com/v1/projects` using the configured API key. It returns `{ healthy: true, latency: <ms> }` if the API responds with a 2xx status, or `{ healthy: false, error: "<message>" }` on failure.
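
A minimal sketch of an equivalent check is shown below. It is not the provider's implementation. It assumes the key is sent with Deepgram's `Authorization: Token <key>` header scheme, that non-2xx responses are reported as `HTTP <status>`, and it takes the fetch function as a parameter so it can be exercised with a stub; `checkDeepgramHealth` and `FetchLike` are hypothetical names.

```typescript
// Hypothetical sketch of the health check; not the package's actual code.
type HealthResult =
  | { healthy: true; latency: number }
  | { healthy: false; error: string };

// Minimal fetch-like signature so the sketch can be run against a stub.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number }>;

async function checkDeepgramHealth(apiKey: string, fetchImpl: FetchLike): Promise<HealthResult> {
  const started = Date.now();
  try {
    const response = await fetchImpl("https://api.deepgram.com/v1/projects", {
      method: "GET",
      headers: { Authorization: `Token ${apiKey}` },
    });
    if (!response.ok) {
      // Assumption: the error message format is illustrative only.
      return { healthy: false, error: `HTTP ${response.status}` };
    }
    return { healthy: true, latency: Date.now() - started };
  } catch (err) {
    // Network failures and timeouts land here.
    return { healthy: false, error: err instanceof Error ? err.message : String(err) };
  }
}

// Exercise the sketch with a stub that simulates a rejected API key.
const rejectingFetch: FetchLike = async () => ({ ok: false, status: 401 });
checkDeepgramHealth("invalid-key", rejectingFetch).then((result) => console.log(result));
```

On Node 18 or later, the global `fetch` can be passed as `fetchImpl` to run the check against the real endpoint.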

Related Packages

@reaatech/media-pipeline-mcp-provider-core: Base provider class
@reaatech/media-pipeline-mcp-server: MCP server
@reaatech/media-pipeline-mcp-openai: Alternative STT provider (Whisper-1)

License

MIT
