@reaatech/voice-agent-tts

Status: Pre-1.0 — APIs may change in minor versions. Pin to a specific version in production.

A provider-agnostic text-to-speech interface with five adapter implementations: Deepgram Aura, AWS Polly, Google Cloud Text-to-Speech, ElevenLabs, and Cartesia. It provides streaming audio output via AsyncIterable<AudioChunk>, cancelable synthesis, and Twilio-ready audio formatting.

Installation

terminal

npm install @reaatech/voice-agent-tts
pnpm add @reaatech/voice-agent-tts

Provider SDKs (install only what you use)

The cloud adapters load their provider SDKs lazily and declare them as optional peer dependencies, so you only install the SDK for the provider you actually use. Deepgram needs no extra SDK.

terminal

# AWS Polly
npm install @aws-sdk/client-polly @aws-sdk/credential-provider-ini
 
# Google Cloud Text-to-Speech
npm install @google-cloud/text-to-speech
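
The lazy loading described above can be pictured with a dynamic `import()`. The sketch below illustrates the general pattern only and is not the package's actual loader; `loadOptionalSdk` is a hypothetical helper name.

```typescript
// Hypothetical helper: load an optional peer dependency only when it is needed.
async function loadOptionalSdk(moduleName: string): Promise<unknown> {
  try {
    // Dynamic import defers module resolution until the adapter is first used.
    return await import(moduleName);
  } catch {
    // Any load failure is reported with an actionable install hint.
    throw new Error(
      `Optional dependency "${moduleName}" could not be loaded. ` +
        `Run: npm install ${moduleName}`,
    );
  }
}
```

With this pattern, a project that only uses Deepgram never resolves the AWS or Google SDKs, so they do not need to be installed.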

Feature Overview

Unified TTS interface — TTSProvider with synthesize() returning AsyncIterable<AudioChunk>
Deepgram Aura adapter — Low-latency HTTP/2 streaming with voice selection and mulaw encoding
AWS Polly adapter — Neural engine with SSML support, multiple voice IDs, sample rate configuration
Google Cloud TTS adapter — 220+ voices, speaking rate, pitch, volume control, and SSML gender
ElevenLabs adapter — Streaming HTTP/2 with ultra-realistic voices (Turbo v2.5, Flash v2.5)
Cartesia adapter — Ultra-low latency streaming with Sonic model and emotion control
Cancelable synthesis — cancel() stops in-progress TTS immediately (barge-in support)
Twilio audio formatting — Automatic mulaw 8kHz conversion via formatAudioForTwilio()
Silence generation — createSilenceChunk() for injecting pauses between utterances
Text chunking — chunkTextForStreaming() to split long responses for streaming TTS
Provider factory — createTTSProvider() for runtime provider selection

Quick Start

typescript

import { DeepgramTTSProvider } from '@reaatech/voice-agent-tts';
 
const tts = new DeepgramTTSProvider();
 
for await (const chunk of tts.synthesize('Hello, how can I help you today?', {
  provider: 'deepgram',
  apiKey: process.env.DEEPGRAM_API_KEY,
  voice: 'asteria',
  model: 'aura',
  encoding: 'mulaw',
  sampleRate: 8000,
})) {
  // Forward each chunk to the Twilio Media Stream.
  // `twilioHandler` is your own telephony handler, e.g. from @reaatech/voice-agent-telephony.
  twilioHandler.sendAudio(chunk);
}

API Reference

TTSProvider Interface

typescript

interface TTSProvider {
  readonly name: string;
  synthesize(text: string, config: DeepgramTTSConfig | AWSPollyConfig | GoogleCloudTTSConfig): AsyncIterable<AudioChunk>;
  readonly supportsStreaming: boolean;
  readonly firstByteLatencyMs: number | null;
  cancel(): void;
  connect?(config: unknown): Promise<void>;
}
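
Because `synthesize()` returns an `AsyncIterable`, any consumer can measure time to first chunk itself, which is the quantity `firstByteLatencyMs` refers to. The following is a minimal sketch under assumptions: the `AudioChunk` shape is reduced to a single `buffer` field (the only field mentioned in the Quick Start), and `measureFirstChunk`, `fakeSynthesize`, and `collectBytes` are hypothetical names that stand in for a real provider.

```typescript
// Assumed minimal chunk shape; the real AudioChunk type comes from the package.
interface AudioChunk {
  buffer: Uint8Array;
}

// Wraps any chunk stream and reports the delay before the first chunk arrives.
async function* measureFirstChunk(
  stream: AsyncIterable<AudioChunk>,
  onFirstChunk: (latencyMs: number) => void,
): AsyncGenerator<AudioChunk> {
  const startedAt = Date.now();
  let seenFirst = false;
  for await (const chunk of stream) {
    if (!seenFirst) {
      seenFirst = true;
      onFirstChunk(Date.now() - startedAt);
    }
    yield chunk;
  }
}

// Stand-in for provider.synthesize(); yields two 160-byte chunks.
async function* fakeSynthesize(): AsyncGenerator<AudioChunk> {
  yield { buffer: new Uint8Array(160) };
  yield { buffer: new Uint8Array(160) };
}

// Drains the wrapped stream and returns the byte total plus the measured latency.
async function collectBytes(): Promise<{ bytes: number; latencyMs: number | null }> {
  const latencies: number[] = [];
  let bytes = 0;
  for await (const chunk of measureFirstChunk(fakeSynthesize(), (ms) => latencies.push(ms))) {
    bytes += chunk.buffer.length;
  }
  return { bytes, latencyMs: latencies.length > 0 ? latencies[0] ?? null : null };
}
```

The wrapper is provider-independent, so the same measurement can be applied to any adapter without relying on adapter-specific getters.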

TTSProviderInterface (Static Utilities)

typescript

class TTSProviderInterface {
  static formatAudioForTwilio(chunk: AudioChunk): AudioChunk;
  static createSilenceChunk(durationMs: number, sampleRate?: number): AudioChunk;
  static chunkTextForStreaming(text: string, maxChunkSize?: number): string[];
}

| Method | Description |
| --- | --- |
| `formatAudioForTwilio` | Converts any audio chunk to mulaw 8kHz for Twilio Media Streams |
| `createSilenceChunk` | Creates a mulaw silence buffer of the specified duration (sample rate defaults to 8kHz) |
| `chunkTextForStreaming` | Splits long text at sentence boundaries for sentence-by-sentence TTS |
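
To make "mulaw" concrete, here is a sketch of the standard G.711 mu-law companding step for 16-bit PCM samples. It is not the package's implementation of `formatAudioForTwilio`, and it leaves out resampling to 8kHz; `encodeMulawSample` and `linear16ToMulaw` are hypothetical names.

```typescript
const MULAW_BIAS = 0x84; // 132, added before the segment search
const MULAW_CLIP = 32635; // keeps magnitude + bias within 15 bits

// Encode one signed 16-bit PCM sample as an 8-bit G.711 mu-law byte.
function encodeMulawSample(sample: number): number {
  const sign = sample < 0 ? 0x80 : 0x00;
  const magnitude = Math.min(Math.abs(sample), MULAW_CLIP) + MULAW_BIAS;
  // Find the segment (exponent) from the highest set bit, scanning bit 14 downwards.
  let exponent = 7;
  for (let mask = 0x4000; (magnitude & mask) === 0 && exponent > 0; mask >>= 1) {
    exponent--;
  }
  const mantissa = (magnitude >> (exponent + 3)) & 0x0f;
  // Mu-law bytes are transmitted bit-inverted.
  return ~(sign | (exponent << 4) | mantissa) & 0xff;
}

// Convert 16-bit PCM samples to mu-law bytes (one output byte per sample).
function linear16ToMulaw(pcm: Int16Array): Uint8Array {
  return Uint8Array.from(pcm, (sample) => encodeMulawSample(sample));
}
```

A zero-amplitude sample encodes to `0xFF`, which is why mu-law silence buffers are filled with that byte rather than with zeros.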

DeepgramTTSProvider

typescript

class DeepgramTTSProvider implements TTSProvider {
  readonly name = 'deepgram';
  readonly supportsStreaming = true;
  constructor(options?: DeepgramTTSOptions);
  getLastFirstByteLatency(): number | null;
}
 
interface DeepgramTTSOptions {
  apiUrl?: string;   // default: 'api.deepgram.com'
  version?: string;  // default: 'v1'
}
 
interface DeepgramTTSConfig extends TTSConfig {
  model?: 'aura';
  voice?: string;        // e.g., 'asteria', 'luna', 'stella', 'arcas'
  encoding?: 'mulaw' | 'linear16' | 'pcm';
  sampleRate?: number;   // 8000, 16000, 24000, 48000
  container?: 'none' | 'wav';
}

AWSPollyProvider

typescript

class AWSPollyProvider extends EventEmitter implements TTSProvider {
  readonly name = 'aws-polly';
  readonly supportsStreaming = true;
  constructor(options?: AWSPollyOptions);
  connect(config: AWSPollyConfig): Promise<void>;
  onError(cb: (error: Error) => void): void;
  close(): Promise<void>;
  isConnected(): boolean;
}
 
interface AWSPollyOptions {
  region?: string;          // default: 'us-east-1'
  defaultVoiceId?: string;  // default: 'Joanna'
  defaultEngine?: Engine;   // default: NEURAL
}
 
interface AWSPollyConfig extends TTSConfig {
  region: string;
  voiceId?: string;          // Joanna, Matthew, Salli, etc.
  engine?: 'standard' | 'neural';
  languageCode?: string;
  sampleRate?: number;       // 8000, 16000, 22050
  textType?: 'text' | 'ssml';
}
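
When `textType` is `'ssml'`, the input must be well-formed SSML, so user-supplied text needs escaping first. The sketch below shows one way to build such input with the standard `<speak>` and `<break>` elements; `escapeXml` and `toSsml` are hypothetical helpers and not part of this package.

```typescript
// Escape the characters that are reserved in XML so user text cannot break the markup.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: wrap sentences in <speak> with a fixed pause between them.
function toSsml(sentences: string[], pauseMs = 300): string {
  const body = sentences
    .map((sentence) => escapeXml(sentence))
    .join(`<break time="${pauseMs}ms"/>`);
  return `<speak>${body}</speak>`;
}
```

The resulting string would be passed as the `text` argument of `synthesize()` together with `textType: 'ssml'` in the config.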

GoogleCloudTTSProvider

typescript

class GoogleCloudTTSProvider implements TTSProvider {
  readonly name = 'google-cloud-tts';
  readonly supportsStreaming = true;
  constructor(options?: GoogleCloudTTSOptions);
  getLastFirstByteLatency(): number | null;
}
 
interface GoogleCloudTTSOptions {
  projectId?: string;
  keyFilename?: string;
}
 
interface GoogleCloudTTSConfig extends TTSConfig {
  projectId: string;
  voiceName?: string;              // e.g., 'en-US-Standard-A'
  languageCode?: string;           // e.g., 'en-US'
  ssmlGender?: 'MALE' | 'FEMALE' | 'NEUTRAL';
  audioEncoding?: 'MP3' | 'LINEAR16' | 'OGG_OPUS' | 'MULAW' | 'ALAW';
  sampleRateHertz?: number;
  speakingRate?: number;           // 0.25–4.0
  pitch?: number;                  // -20.0–20.0
  volumeGainDb?: number;           // -96.0–16.0
}
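
The numeric ranges noted in the comments above can be enforced before a request is made. This is a minimal sketch, assuming out-of-range values should be clamped rather than rejected; `ProsodySettings` and `clampProsody` are hypothetical names, and the bounds are copied from the `GoogleCloudTTSConfig` comments.

```typescript
interface ProsodySettings {
  speakingRate?: number;
  pitch?: number;
  volumeGainDb?: number;
}

const clamp = (value: number, min: number, max: number): number =>
  Math.min(max, Math.max(min, value));

// Hypothetical helper: keep each provided value inside its documented range.
function clampProsody(settings: ProsodySettings): ProsodySettings {
  const result: ProsodySettings = {};
  if (settings.speakingRate !== undefined) {
    result.speakingRate = clamp(settings.speakingRate, 0.25, 4.0);
  }
  if (settings.pitch !== undefined) {
    result.pitch = clamp(settings.pitch, -20.0, 20.0);
  }
  if (settings.volumeGainDb !== undefined) {
    result.volumeGainDb = clamp(settings.volumeGainDb, -96.0, 16.0);
  }
  return result;
}
```

Fields that were not provided stay absent, so provider defaults still apply.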

ElevenLabsProvider

typescript

class ElevenLabsProvider implements TTSProvider {
  readonly name = 'elevenlabs';
  readonly supportsStreaming = true;
  constructor(options?: ElevenLabsOptions);
  getLastFirstByteLatency(): number | null;
}
 
interface ElevenLabsConfig extends TTSConfig {
  modelId?: 'eleven_turbo_v2_5' | 'eleven_flash_v2_5';
  voiceId?: string;
  stability?: number;
  similarityBoost?: number;
  optimizeStreamingLatency?: number;
  outputFormat?: 'mp3_44100' | 'pcm_8000' | 'mulaw_8000';
}

Streaming HTTP/2 adapter for ElevenLabs ultra-realistic voices. Supports latency optimization and multiple output formats.

CartesiaProvider

typescript

class CartesiaProvider implements TTSProvider {
  readonly name = 'cartesia';
  readonly supportsStreaming = true;
  constructor(options?: CartesiaOptions);
  getLastFirstByteLatency(): number | null;
}
 
interface CartesiaConfig extends TTSConfig {
  modelId?: 'sonic' | 'sonic-2';
  voiceId?: string;
  speed?: 'slowest' | 'slow' | 'normal' | 'fast' | 'fastest';
  emotion?: 'anger' | 'positivity' | 'surprise' | 'sadness' | 'curiosity' | 'neutral';
  language?: string;
  outputFormat?: 'raw' | 'wav' | 'mp3';
  sampleRate?: number;
}

Ultra-low latency streaming adapter with Sonic model and emotion control. Sub-100ms P50 latency for real-time use.

Provider Factory

typescript

import { createTTSProvider } from '@reaatech/voice-agent-tts';
 
const tts = createTTSProvider({
  provider: 'deepgram',             // 'deepgram' | 'aws-polly' | 'google-cloud-tts' | 'elevenlabs' | 'cartesia'
  config: { provider: 'deepgram', apiKey: '...' },
});
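
Runtime selection usually means the provider name arrives as an untrusted string, for example from configuration. The sketch below narrows such a string to the provider names listed in the comment above before it reaches the factory; `parseProviderName` and `isProviderName` are hypothetical helpers, and the `'deepgram'` fallback is an assumption.

```typescript
const PROVIDER_NAMES = [
  "deepgram",
  "aws-polly",
  "google-cloud-tts",
  "elevenlabs",
  "cartesia",
] as const;
type ProviderName = (typeof PROVIDER_NAMES)[number];

// Type guard: true only for the names the factory is documented to accept.
function isProviderName(value: string): value is ProviderName {
  return (PROVIDER_NAMES as readonly string[]).includes(value);
}

// Hypothetical helper: validate a configured name, falling back when none is set.
function parseProviderName(
  value: string | undefined,
  fallback: ProviderName = "deepgram",
): ProviderName {
  if (value === undefined || value === "") return fallback;
  if (!isProviderName(value)) {
    throw new Error(
      `Unknown TTS provider "${value}". Expected one of: ${PROVIDER_NAMES.join(", ")}`,
    );
  }
  return value;
}
```

The validated name can then be passed as the `provider` field, for instance `parseProviderName(process.env.TTS_PROVIDER)`, where `TTS_PROVIDER` is an illustrative variable name.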

Usage Patterns

Barge-In (Cancel In-Progress TTS)

typescript

// Start TTS
const ttsStream = tts.synthesize(text, config);
 
// User interrupts — cancel immediately
tts.cancel();
// The synthesize() generator will exit cleanly
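
The snippet above omits the loop that consumes the stream. The sketch below uses a stand-in provider to show how a cancel flag lets a `for await` loop finish without throwing. `FakeProvider` and `playUntilInterrupted` are hypothetical; a real adapter would presumably also abort its network request.

```typescript
// Hypothetical stand-in for a TTSProvider; it only models the cancel() contract.
class FakeProvider {
  private cancelled = false;

  async *synthesize(text: string): AsyncGenerator<Uint8Array> {
    this.cancelled = false;
    for (const word of text.split(" ")) {
      // Check the flag before each chunk so cancel() takes effect promptly.
      if (this.cancelled) return;
      yield new Uint8Array(word.length);
    }
  }

  cancel(): void {
    this.cancelled = true;
  }
}

// Plays chunks until the simulated caller interrupts, then cancels synthesis.
async function playUntilInterrupted(
  text: string,
  interruptAfter: number,
): Promise<{ chunks: number; bytes: number }> {
  const tts = new FakeProvider();
  let chunks = 0;
  let bytes = 0;
  for await (const chunk of tts.synthesize(text)) {
    chunks++;
    bytes += chunk.length; // a real handler would forward the chunk here
    if (chunks === interruptAfter) tts.cancel();
  }
  return { chunks, bytes };
}
```

The loop ends normally after the cancel, so no `try`/`catch` is needed around it in this model.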

Sentence-Level Streaming for Low Latency

typescript

import { TTSProviderInterface } from '@reaatech/voice-agent-tts';
 
const sentences = TTSProviderInterface.chunkTextForStreaming(longText, 200);
 
for (const sentence of sentences) {
  for await (const chunk of tts.synthesize(sentence, config)) {
    handler.sendAudio(chunk);
  }
}
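
For intuition, sentence-boundary splitting can be sketched as follows. The exact rules used by `chunkTextForStreaming` are not documented here, so this is an assumption: sentences end at `.`, `!` or `?`, and consecutive sentences are packed greedily up to `maxChunkSize` characters. `splitIntoSentenceChunks` is a hypothetical name.

```typescript
// Hypothetical sketch of sentence-boundary chunking with a size limit.
function splitIntoSentenceChunks(text: string, maxChunkSize = 200): string[] {
  // Match punctuated sentences, plus a trailing fragment without punctuation.
  const sentences = text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) ?? [];
  const chunks: string[] = [];
  let current = "";
  for (const raw of sentences) {
    const sentence = raw.trim();
    if (sentence === "") continue;
    const candidate = current === "" ? sentence : `${current} ${sentence}`;
    if (candidate.length <= maxChunkSize) {
      current = candidate; // still fits, keep packing
    } else {
      if (current !== "") chunks.push(current);
      current = sentence; // a single over-long sentence is kept whole
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}
```

This naive rule also splits at decimal points and abbreviations, so production text may need a more careful tokenizer.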

Silence Between Utterances

typescript

import { TTSProviderInterface } from '@reaatech/voice-agent-tts';
 
// 500ms silence gap
const silence = TTSProviderInterface.createSilenceChunk(500);
handler.sendAudio(silence);
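
Mu-law audio uses one byte per sample, so the size of a silence buffer follows directly from duration and sample rate. A minimal sketch of that arithmetic, assuming the G.711 zero-amplitude byte `0xFF` as fill value; `makeMulawSilence` is a hypothetical name and not the package's implementation.

```typescript
const MULAW_SILENCE_BYTE = 0xff; // G.711 mu-law code for a zero-amplitude sample

// Hypothetical helper: build a raw mu-law buffer containing only silence.
function makeMulawSilence(durationMs: number, sampleRate = 8000): Uint8Array {
  if (durationMs < 0) throw new RangeError("durationMs must be non-negative");
  // One byte per sample: 500 ms at 8000 Hz is 4000 bytes.
  const sampleCount = Math.round((durationMs / 1000) * sampleRate);
  return new Uint8Array(sampleCount).fill(MULAW_SILENCE_BYTE);
}
```

A 20 ms frame at 8kHz therefore holds 160 bytes.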

Related Packages

@reaatech/voice-agent-core — Core types, pipeline, config
@reaatech/voice-agent-stt — Speech-to-text providers
@reaatech/voice-agent-telephony — Twilio Media Streams handler

License

MIT
