@reaatech/media-pipeline-mcp-elevenlabs

npm v0.3.0

An ElevenLabs provider for the media pipeline framework that exposes a `MediaProvider` class (`ElevenLabsProvider`) with `execute`, `healthCheck`, and `estimateCost` methods for generating text-to-speech audio with configurable voice, speed, model, and output format.

Status: Pre-1.0 — APIs may change in minor versions. Pin to a specific version in production.

ElevenLabs provider for the media pipeline framework. Delivers high-quality text-to-speech synthesis with configurable voice selection, speaking speed, voice stability tuning, similarity boost, and style exaggeration. Supports multiple output formats and native audio-byte streaming.

Installation

terminal
npm install @reaatech/media-pipeline-mcp-elevenlabs
# or
pnpm add @reaatech/media-pipeline-mcp-elevenlabs

Feature Overview

  • High-quality TTS with eleven_monolingual_v1, eleven_multilingual_v2, and eleven_turbo_v2 models
  • Named voice selection (Rachel, Josh, Daniel, Charlotte) plus custom voice IDs
  • Fine-grained voice tuning: stability (0-1), similarity boost (0-1), style exaggeration (0-1)
  • Speaking speed control via SSML prosody tags
  • Multiple output formats: MP3, WAV, OGG, FLAC, AAC
  • Streaming support for TTS audio bytes (supportsStreaming)
  • Character-count-based cost estimation

Quick Start

typescript
import { writeFileSync } from "node:fs";
import { ElevenLabsProvider } from "@reaatech/media-pipeline-mcp-elevenlabs";
 
const provider = new ElevenLabsProvider({ apiKey: process.env.ELEVENLABS_API_KEY! });
 
const audio = await provider.execute({
  operation: "audio.tts",
  params: {
    text: "Welcome to our media pipeline. This audio was generated with ElevenLabs.",
    voice: "Rachel",
    speed: 1.0,
    model: "eleven_turbo_v2",
  },
  config: {},
});
 
// Save or pipe the audio
writeFileSync("output.mp3", audio.data);
console.log(`Generated ${audio.metadata.characterCount} chars in ${audio.metadata.duration}s`);

Supported Operations

| Operation | Default Model | Description | Output Format |
| --- | --- | --- | --- |
| audio.tts | eleven_monolingual_v1 | Text-to-speech with voice and parameter control | Audio bytes in mp3, wav, ogg, flac, or aac |

Configuration Parameters

audio.tts

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| text | string | required | Text to convert to speech |
| voice | string | Rachel | Voice name (Rachel, Josh, Daniel, Charlotte) or custom voice ID |
| speed | number | 1.0 | Speaking rate multiplier (uses SSML prosody) |
| model | string | eleven_monolingual_v1 | TTS model ID |
| response_format | string | mp3 | Output audio format: mp3, wav, ogg, flac, aac |

Voice Tuning (internal defaults)

The provider applies these voice settings automatically on every request:

| Parameter | Default | Description |
| --- | --- | --- |
| stability | 0.5 | Voice stability (0 = more variable, 1 = more consistent) |
| similarity_boost | 0.75 | Speaker similarity to target voice (0-1) |
| style | 0.0 | Style exaggeration (0-1) |
| use_speaker_boost | true | Enhance speaker clarity |

API Reference

ElevenLabsProvider

typescript
class ElevenLabsProvider extends MediaProvider {
  constructor(config: ElevenLabsProviderConfig)
 
  healthCheck(): Promise<ProviderHealth>
  estimateCost(input: ProviderInput): Promise<CostEstimate>
  execute(input: ProviderInput): Promise<ProviderOutput>
}

ElevenLabsProviderConfig

typescript
interface ElevenLabsProviderConfig {
  apiKey: string;
  voices?: {
    default?: string;
    [voiceName: string]: string | undefined;
  };
  model?: string;    // Default model ID
  timeout?: number;  // Request timeout in ms
}

Factory Function

typescript
import { defineElevenLabsProvider } from "@reaatech/media-pipeline-mcp-elevenlabs";
 
const provider = defineElevenLabsProvider({ apiKey: process.env.ELEVENLABS_API_KEY! });

Voice Resolution Logic

Voice parameters are resolved in this order:

  1. If a custom voices map is configured, the name is looked up there first
  2. If the value starts with voice_ or is exactly 20 characters, it’s treated as a raw voice ID
  3. If the name matches a built-in preset, that voice ID is used
  4. Otherwise, the provider falls back to "Rachel"
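
The resolution order above can be sketched as a small standalone function. This is an illustration only, not the package's actual source: the function name resolveVoice, the preset table, and the placeholder IDs in it are all assumptions made for this example.

```typescript
// Hypothetical preset table. The IDs are placeholders, NOT real ElevenLabs voice IDs.
const FALLBACK_VOICE_ID = "preset-id-rachel";
const PRESET_VOICES: Record<string, string> = {
  Rachel: FALLBACK_VOICE_ID,
  Josh: "preset-id-josh",
  Daniel: "preset-id-daniel",
  Charlotte: "preset-id-charlotte",
};

type VoiceMap = { [voiceName: string]: string | undefined };

// Own-property lookup, so names like "toString" never match inherited members.
function lookup(map: VoiceMap, key: string): string | undefined {
  return Object.prototype.hasOwnProperty.call(map, key) ? map[key] : undefined;
}

function resolveVoice(voice: string, customVoices: VoiceMap = {}): string {
  // 1. A configured custom voices map wins.
  const custom = lookup(customVoices, voice);
  if (custom !== undefined) return custom;

  // 2. Values that look like raw voice IDs pass through unchanged.
  if (voice.startsWith("voice_") || voice.length === 20) return voice;

  // 3. Built-in preset names map to their voice IDs.
  const preset = lookup(PRESET_VOICES, voice);
  if (preset !== undefined) return preset;

  // 4. Anything else falls back to Rachel.
  return FALLBACK_VOICE_ID;
}
```

One consequence of this ordering is that a custom map entry named "Josh" would shadow the built-in Josh preset, since the custom map is consulted first.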

Key Methods

| Method | Returns | Description |
| --- | --- | --- |
| healthCheck() | ProviderHealth | Validates API key by fetching /v1/voices from the ElevenLabs API |
| estimateCost(input) | CostEstimate | Estimates cost based on text character count × per-character rate |
| execute(input) | ProviderOutput | Synthesizes audio and returns raw audio bytes with metadata |

Non-Retryable Errors

The provider classifies these errors as non-retryable: authentication failed, invalid API key, permission denied, insufficient credits, voice not found, invalid voice ID.
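
A retry wrapper around the provider could use this list to decide when to give up early. The sketch below assumes the classification is a case-insensitive substring match on the error message; the README does not say how the provider matches errors internally, and the names NON_RETRYABLE_MARKERS and isRetryable are hypothetical.

```typescript
// Phrases taken from the list above; matching by substring is an assumption.
const NON_RETRYABLE_MARKERS: string[] = [
  "authentication failed",
  "invalid api key",
  "permission denied",
  "insufficient credits",
  "voice not found",
  "invalid voice id",
];

// Returns false when the error message contains any non-retryable phrase.
function isRetryable(error: unknown): boolean {
  const message = (error instanceof Error ? error.message : String(error)).toLowerCase();
  return !NON_RETRYABLE_MARKERS.some((marker) => message.includes(marker));
}
```

With this shape, transient failures such as timeouts stay retryable by default, while credential and voice errors fail fast.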

Cost Estimation

Per-Character Pricing

| Model | Cost / Character |
| --- | --- |
| eleven_turbo_v2 | $0.0002 |
| eleven_monolingual_v1 | $0.0003 |
| eleven_multilingual_v2 | $0.0005 |

Example Estimates

| Text Length | Model | Est. Cost |
| --- | --- | --- |
| 100 chars | eleven_turbo_v2 | $0.02 |
| 100 chars | eleven_monolingual_v1 | $0.03 |
| 500 chars | eleven_multilingual_v2 | $0.25 |
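
The estimates above follow from multiplying the character count by the per-character rate. A minimal sketch of that arithmetic is shown below; it is not the provider's estimateCost implementation. The function name estimateTtsCost is hypothetical, counting characters with string length (UTF-16 code units) is an assumption, and throwing on an unknown model is a choice made for this example.

```typescript
// Per-character rates copied from the pricing table above (USD).
const COST_PER_CHARACTER: Record<string, number> = {
  eleven_turbo_v2: 0.0002,
  eleven_monolingual_v1: 0.0003,
  eleven_multilingual_v2: 0.0005,
};

// Estimated cost in USD = character count × per-character rate.
function estimateTtsCost(text: string, model: string = "eleven_monolingual_v1"): number {
  if (!Object.prototype.hasOwnProperty.call(COST_PER_CHARACTER, model)) {
    // Assumption: unknown models are rejected rather than priced at a default rate.
    throw new Error(`No per-character rate known for model: ${model}`);
  }
  return text.length * COST_PER_CHARACTER[model];
}
```

Because the result is a floating-point product, round it (for example with toFixed(2)) before displaying it as a dollar amount.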

Cache Configuration

The provider exposes a static cacheConfig that lists which parameters are deterministic and which are not.

Deterministic parameters: text, voice_id, voice, model, voice_settings

Non-deterministic parameters: (none)

The normalize() function trims and collapses whitespace in text, and preserves voice settings as-is. All parameters are deterministic, so identical text + voice + model combinations will produce matching cache keys.
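
To illustrate why whitespace-only differences in text do not change the cache key, here is a rough sketch of the normalization step. It is not the package's normalize() function: the names normalizeText and cacheKey, the parameter shape, and the JSON-based key format are assumptions made for this example.

```typescript
// Hypothetical parameter shape for the sketch.
interface TtsCacheParams {
  text: string;
  voice: string;
  model: string;
}

// Trim the ends and collapse every run of whitespace into one space.
function normalizeText(text: string): string {
  return text.trim().replace(/\s+/g, " ");
}

// Build a key from the normalized, deterministic parameters.
// A real cache would probably hash this string; that step is omitted here.
function cacheKey(params: TtsCacheParams): string {
  return JSON.stringify([normalizeText(params.text), params.voice, params.model]);
}
```

Under these assumptions, "  Hello   world\n" and "Hello world" with the same voice and model yield the same key, while changing the voice or model yields a different one.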

Health Check

The health check sends a GET request to https://api.elevenlabs.io/v1/voices using the xi-api-key header. It returns { healthy: true, latency: <ms> } on a 2xx response, or { healthy: false, error: "<message>" } on failure.
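
The same check can be reproduced outside the provider, for example in a deployment smoke test. The sketch below is an approximation built from the description above, not the provider's healthCheck() source. The names checkHealth, HealthResult, and FetchLike are hypothetical, and the fetch function is passed in explicitly so the example does not depend on any particular runtime's globals.

```typescript
interface HealthResult {
  healthy: boolean;
  latency?: number; // milliseconds
  error?: string;
}

// Minimal shape of the fetch function this sketch needs.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number }>;

async function checkHealth(apiKey: string, fetchFn: FetchLike): Promise<HealthResult> {
  const started = Date.now();
  try {
    const res = await fetchFn("https://api.elevenlabs.io/v1/voices", {
      method: "GET",
      headers: { "xi-api-key": apiKey },
    });
    // res.ok is true only for 2xx responses.
    if (!res.ok) return { healthy: false, error: `HTTP ${res.status}` };
    return { healthy: true, latency: Date.now() - started };
  } catch (err) {
    // Network failures and similar errors are reported, not thrown.
    return { healthy: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```

On runtimes that ship a global fetch (such as Node 18 or later), passing that function as the second argument should work; in tests, a stub that returns a fixed status keeps the check offline.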

License

MIT