
@reaatech/media-pipeline-mcp-deepgram

npm v0.3.0

A Deepgram provider for the media-pipeline framework that exposes `audio.stt` and `audio.diarize` operations via a `DeepgramProvider` class, using Nova-2 for speech-to-text transcription with smart formatting, speaker diarization, and WebSocket streaming support.

Status: Pre-1.0 — APIs may change in minor versions. Pin to a specific version in production.

Deepgram provider for the media pipeline framework. Provides speech-to-text transcription with smart formatting and speaker diarization using the Nova-2 model. Supports native streaming via WebSocket frames and HMAC-signed webhook callbacks for async batch operations.

Installation

terminal
npm install @reaatech/media-pipeline-mcp-deepgram
# or
pnpm add @reaatech/media-pipeline-mcp-deepgram

Feature Overview

  • Speech-to-text transcription with Nova-2 (word-level timestamps, confidence scores)
  • Speaker diarization with labeled utterances and segment metadata
  • Smart formatting: auto-capitalization, punctuation, number/date normalization
  • Language detection and multi-language support
  • Streaming support for both operations (supportsStreaming)
  • Webhook support for async callbacks (supportsWebhooks)
  • SHA-256 hashing of raw audio in cache keys to avoid storing multi-megabyte buffers

Quick Start

typescript
import { readFile } from "node:fs/promises";
import { DeepgramProvider } from "@reaatech/media-pipeline-mcp-deepgram";

const provider = new DeepgramProvider({ apiKey: process.env.DEEPGRAM_API_KEY! });

// Placeholder paths (not part of the package); point these at your own audio files.
const audioBuffer = await readFile("./sample.wav");
const meetingAudioBuffer = await readFile("./meeting.wav");

// Transcribe audio to text
const result = await provider.execute({
  operation: "audio.stt",
  params: { audio_data: audioBuffer, language: "en", diarize: true },
  config: {},
});
// result.data holds the JSON output as bytes; parse it to read the transcript
console.log(JSON.parse(result.data.toString()).transcript);

// Diarize speakers in an audio recording
const speakers = await provider.execute({
  operation: "audio.diarize",
  params: { audio_data: meetingAudioBuffer, language: "en" },
  config: {},
});
const output = JSON.parse(speakers.data.toString());
console.log(`Found ${output.speakers} speakers across ${output.segments.length} segments`);

Supported Operations

| Operation | Default Model | Description | Output Format |
| --- | --- | --- | --- |
| audio.stt | nova-2 | Speech-to-text with smart formatting, timestamps, and optional diarization | JSON with transcript, confidence, segments |
| audio.diarize | nova-2 | Speaker identification with labeled utterances, start/end times, and confidence | JSON with speakers count and per-speaker segments |

Configuration Parameters

audio.stt

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| audio_data | Buffer | required | Raw audio data buffer |
| language | string | en | BCP-47 language code |
| model | string | nova-2 | Model ID (nova-2, whisper) |
| diarize | boolean | false | Enable speaker diarization in STT output |

audio.diarize

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| audio_data | Buffer | required | Raw audio data buffer |
| language | string | en | BCP-47 language code |
| model | string | nova-2 | Model ID |

API Reference

DeepgramProvider

typescript
class DeepgramProvider extends MediaProvider {
  constructor(config: DeepgramProviderConfig)
 
  healthCheck(): Promise<ProviderHealth>
  estimateCost(input: ProviderInput): Promise<CostEstimate>
  execute(input: ProviderInput): Promise<ProviderOutput>
}

DeepgramProviderConfig

typescript
interface DeepgramProviderConfig {
  apiKey: string;
  models?: {
    stt?: string;      // Default: "nova-2"
    diarize?: string;  // Default: "nova-2"
  };
  timeout?: number;    // Request timeout in ms
}

Factory Function

typescript
import { defineDeepgramProvider } from "@reaatech/media-pipeline-mcp-deepgram";
 
const provider = defineDeepgramProvider({ apiKey: process.env.DEEPGRAM_API_KEY! });

Key Methods

| Method | Returns | Description |
| --- | --- | --- |
| healthCheck() | ProviderHealth | Validates API key by fetching project info from the Deepgram API |
| estimateCost(input) | CostEstimate | Estimates cost based on audio size (bytes / 960KB per minute) and model per-minute rate |
| execute(input) | ProviderOutput | Runs STT or diarization, returns JSON output with transcript/segments metadata |

Non-Retryable Errors

The provider classifies these errors as non-retryable: authentication failed, invalid API key, permission denied, insufficient credits, unsupported model, invalid audio format.
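
As a rough illustration of how a caller might act on this list, the sketch below classifies an error by matching its message against the phrases above. The substring matching, the `NON_RETRYABLE_PATTERNS` constant, and the `isRetryable` helper are assumptions made for this example; the provider's actual classification logic may differ.

```typescript
// Phrases taken from the non-retryable list above, lowercased for matching.
// Assumption: errors are identified by message text; the provider may use codes instead.
const NON_RETRYABLE_PATTERNS: readonly string[] = [
  "authentication failed",
  "invalid api key",
  "permission denied",
  "insufficient credits",
  "unsupported model",
  "invalid audio format",
];

// Hypothetical helper: returns false when the message contains any listed phrase.
function isRetryable(error: Error): boolean {
  const message = error.message.toLowerCase();
  return !NON_RETRYABLE_PATTERNS.some((pattern) => message.includes(pattern));
}

console.log(isRetryable(new Error("Invalid API key supplied"))); // false
console.log(isRetryable(new Error("socket hang up")));           // true
```

A retry loop would typically check `isRetryable` before scheduling another attempt, so that credential or input problems fail fast.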

Cost Estimation

Per-Minute Pricing

| Model | Operation | Cost / Minute |
| --- | --- | --- |
| nova-2 | audio.stt | $0.0059 |
| nova-2 | audio.diarize | $0.0079 |
| whisper | audio.stt | $0.0040 |

Cost is estimated by converting the audio buffer size to minutes (using 960KB/min as an approximation), then multiplying by the per-minute rate.
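
The arithmetic can be sketched as follows. This is an illustration of the formula described above, not the provider's source: `estimateCostUsd`, `RATES_PER_MINUTE`, and `BYTES_PER_MINUTE` are names invented for the example, and treating "960KB" as 960 × 1024 bytes is an assumption (the text does not say whether KB means 1000 or 1024 bytes).

```typescript
// Per-minute rates (USD) copied from the pricing table above.
const RATES_PER_MINUTE: Record<string, Record<string, number>> = {
  "nova-2": { "audio.stt": 0.0059, "audio.diarize": 0.0079 },
  whisper: { "audio.stt": 0.004 },
};

// Assumption: 960KB is interpreted as 960 * 1024 bytes.
const BYTES_PER_MINUTE = 960 * 1024;

// Hypothetical helper: converts buffer size to minutes, then applies the rate.
function estimateCostUsd(byteLength: number, model: string, operation: string): number {
  const rate = RATES_PER_MINUTE[model]?.[operation];
  if (rate === undefined) {
    throw new Error(`No rate known for ${model} / ${operation}`);
  }
  const minutes = byteLength / BYTES_PER_MINUTE;
  return minutes * rate;
}

// A buffer of 5 * 983,040 bytes counts as 5 minutes under this approximation.
const fiveMinutesOfAudio = 5 * BYTES_PER_MINUTE;
console.log(estimateCostUsd(fiveMinutesOfAudio, "nova-2", "audio.stt").toFixed(4));
```

Because the estimate depends only on byte count, heavily compressed formats will be underestimated and uncompressed formats overestimated relative to their real duration.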

Cache Configuration

The provider exposes a static cacheConfig property that lists deterministic and non-deterministic parameters.

Deterministic parameters: audio_data (SHA-256 hashed), audio_url, model, language, diarize, punctuate, smart_format, utterances, detect_topics, detect_entities, redact

Non-deterministic parameters: request_id

Raw audio bytes are hashed with SHA-256 during normalization so cache keys remain compact. All boolean-style feature flags are coerced to booleans for consistent matching.
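
A minimal sketch of this normalization, using Node's built-in `crypto` module, might look like the following. The `normalizeForCache` function and the `BOOLEAN_FLAGS` list are hypothetical: the text does not say which parameters count as boolean-style or how string values are coerced, so the choices here are assumptions for illustration only.

```typescript
import { createHash } from "node:crypto";

type Params = Record<string, unknown>;

// Assumption: these are the boolean-style flags; the real list may differ.
const BOOLEAN_FLAGS = ["diarize", "punctuate", "smart_format", "utterances"];

// Hypothetical helper: builds a compact, comparable view of the parameters.
function normalizeForCache(params: Params): Params {
  const normalized: Params = {};
  for (const [key, value] of Object.entries(params)) {
    if (key === "request_id") {
      continue; // non-deterministic, so it is left out of the key
    }
    if (key === "audio_data" && Buffer.isBuffer(value)) {
      // Replace the raw bytes with a 64-character hex digest.
      normalized[key] = createHash("sha256").update(value).digest("hex");
    } else if (BOOLEAN_FLAGS.includes(key)) {
      // Assumption: both true and the string "true" mean enabled.
      normalized[key] = value === true || value === "true";
    } else {
      normalized[key] = value;
    }
  }
  return normalized;
}

const key = normalizeForCache({
  audio_data: Buffer.from("abc"),
  diarize: "true",
  request_id: "req-1",
});
console.log(key);
```

Two requests with identical audio and flags then normalize to the same object regardless of `request_id`, which is what lets them share a cache entry.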

Health Check

The health check sends a GET request to https://api.deepgram.com/v1/projects using the configured API key. It returns { healthy: true, latency: <ms> } if the API responds with a 2xx status, or { healthy: false, error: "<message>" } on failure.
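
The behaviour described above could be approximated like this. It is a sketch rather than the provider's code: `checkHealth`, `HealthResult`, and the injectable `fetchImpl` parameter are assumptions introduced so the logic can be exercised without network access. The `Authorization: Token <key>` header follows Deepgram's documented API key scheme.

```typescript
interface HealthResult {
  healthy: boolean;
  latency?: number; // milliseconds
  error?: string;
}

// Narrow fetch signature so a stub can stand in for the global fetch.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number }>;

// Hypothetical helper mirroring the described health check.
async function checkHealth(apiKey: string, fetchImpl: FetchLike = fetch): Promise<HealthResult> {
  const started = Date.now();
  try {
    const res = await fetchImpl("https://api.deepgram.com/v1/projects", {
      method: "GET",
      headers: { Authorization: `Token ${apiKey}` },
    });
    if (!res.ok) {
      // Any non-2xx status is reported as unhealthy.
      return { healthy: false, error: `HTTP ${res.status}` };
    }
    return { healthy: true, latency: Date.now() - started };
  } catch (err) {
    // Network-level failures (DNS, timeout, refused connection) land here.
    return { healthy: false, error: err instanceof Error ? err.message : String(err) };
  }
}

// Example with a stubbed response instead of a real network call.
checkHealth("example-key", async () => ({ ok: true, status: 200 })).then((result) => {
  console.log(result.healthy);
});
```

Passing the fetch implementation as a parameter keeps the function testable; in normal use the default global `fetch` (Node 18+) is used.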

License

MIT