@reaatech/voice-agent-core

npm v0.1.0

A Zod-validated configuration system and pipeline orchestrator for building voice-enabled AI agents, providing a `createPipeline()` function that coordinates STT, MCP, and TTS stages with latency enforcement, session management, and OpenTelemetry observability.

Status: Pre-1.0 — APIs may change in minor versions. Pin to a specific version in production.

Core pipeline orchestration, session management, latency enforcement, configuration, and types for building voice-enabled AI agents. Runtime dependencies are limited to zod and uuid; @opentelemetry/api is a peer dependency.

Installation

terminal
npm install @reaatech/voice-agent-core @opentelemetry/api
# or, with pnpm:
pnpm add @reaatech/voice-agent-core @opentelemetry/api

@opentelemetry/api is a required peer dependency — install it alongside core. It is a tiny, dependency-free package and acts as a no-op when no OpenTelemetry SDK is registered.

Feature Overview

  • Pipeline orchestrator: full STT → MCP → TTS pipeline with an event-driven lifecycle
  • Latency budget enforcer: per-stage timing with hard caps, overflow detection, and metrics
  • Session manager: multi-turn conversation state with TTL expiry and automatic cleanup
  • Transport abstraction: pluggable `Transport` interface for multi-provider telephony support
  • Speech-to-speech pipeline: `SpeechToSpeechPipeline` for single-hop OpenAI Realtime / Gemini Live mode
  • Provider failover: `CompositeSTTProvider`, `CompositeTTSProvider`, and `FailoverManager` with circuit breaking
  • VAD & endpointing: pluggable voice activity detection with energy-based and semantic detectors
  • DTMF input: keypad digit accumulation with inter-digit timeout and MCP integration
  • Thinking affordances: filler audio during MCP processing to avoid dead air
  • Call recording: `RecordingManager` with memory, filesystem, and S3 storage backends
  • Cost tracking: `CostTracker` with built-in pricing for 12 providers and OTel metrics
  • Zod-validated config: `defineConfig()` with full TypeScript IntelliSense and runtime validation
  • Observability: OpenTelemetry tracing spans, histograms, and counters for every stage
  • Mock providers: built-in `MockSTTProvider`, `MockTTSProvider`, and `MockMCPClient` for testing
  • 50+ exported types: `AudioChunk`, `Utterance`, `AgentResponse`, `Session`, `Turn`, and more
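The circuit-breaking failover listed above follows a common pattern, sketched here in plain TypeScript. This is not the package's `FailoverManager`: `Breaker`, `callWithFailover`, and the `failureThreshold` / `cooldownMs` parameters are hypothetical names for illustration. Providers are tried in priority order; a provider that fails `failureThreshold` times in a row is skipped until `cooldownMs` has elapsed, after which it is tried again.

```typescript
// Hypothetical sketch of circuit-breaking failover; not the package's FailoverManager API.
class Breaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number, cooldownMs: number, now: () => number = () => Date.now()) {
    this.failureThreshold = failureThreshold; // consecutive failures before the circuit opens
    this.cooldownMs = cooldownMs;             // how long an open circuit blocks calls
    this.now = now;
  }

  // Closed circuits allow calls; open ones allow a retry once the cooldown has passed.
  canAttempt(): boolean {
    if (this.openedAt === null) return true;
    return this.now() - this.openedAt >= this.cooldownMs;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure(): void {
    this.failures += 1;
    // (Re)open the circuit; a failed retry after cooldown restarts the cooldown.
    if (this.failures >= this.failureThreshold) this.openedAt = this.now();
  }
}

interface Provider<T> {
  name: string;
  call: () => Promise<T>;
}

// Try providers in priority order, skipping any whose circuit is open.
async function callWithFailover<T>(
  providers: Provider<T>[],
  breakers: Map<string, Breaker>,
): Promise<{ provider: string; result: T }> {
  let lastError: unknown = new Error('no provider available');
  for (const p of providers) {
    const breaker = breakers.get(p.name)!;
    if (!breaker.canAttempt()) continue;
    try {
      const result = await p.call();
      breaker.recordSuccess();
      return { provider: p.name, result };
    } catch (err) {
      breaker.recordFailure();
      lastError = err;
    }
  }
  throw lastError;
}
```

Injecting the clock (`now`) keeps the breaker deterministic in tests and lets a skipped provider be exercised without real waiting.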

Quick Start

typescript
import {
  createPipeline,
  createLatencyBudget,
  initializeSessionManager,
  LatencyBudgetEnforcer,
} from '@reaatech/voice-agent-core';

// mySTTProvider, myTTSProvider, myMCPClient, and myConfig are placeholders for your own
// provider instances and a config object built with defineConfig() (see Configuration).

const sessionManager = initializeSessionManager({
  defaultTTL: 3600, // seconds
  maxTurns: 20,
  maxTokens: 4000,
});

// All budget values are in milliseconds.
const latencyEnforcer = new LatencyBudgetEnforcer(
  createLatencyBudget({
    target: 800,
    hardCap: 1200,
    stt: 200,
    mcp: 400,
    tts: 200,
  })
);

const pipeline = createPipeline({
  sessionManager,
  latencyEnforcer,
  sttProvider: mySTTProvider,
  ttsProvider: myTTSProvider,
  mcpClient: myMCPClient,
  config: myConfig,
});

// Register listeners before starting the session so no early events are missed.
pipeline.on('pipeline:turn:end', (event) => {
  console.log('Turn complete:', event.data.metrics);
});

await pipeline.startSession({ sessionId: 'abc', status: 'active' });

API Reference

Types

| Type | Description |
| --- | --- |
| `AudioChunk` | Raw audio buffer with sample rate, encoding, channels, and timestamp |
| `Utterance` | Transcribed text with confidence, `isFinal` flag, and timestamp |
| `AgentResponse` | MCP agent output: text, tool calls, and latency |
| `Session` | Multi-turn session with ID, TTL, conversation turns, and status |
| `Turn` | Single conversation turn: user utterance, agent response, and latency |
| `PipelineEvent` | Typed event from the pipeline with `sessionId`, `turnId`, and `data` |
| `LatencyBudget` | Per-stage timing targets and hard caps |
| `VoiceAgentKitConfig` | Complete kit configuration (MCP, STT, TTS, latency, session, barge-in) |
| `Transport` | Pluggable transport layer interface |
| `S2SProvider` | Speech-to-speech provider interface |
| `VADProvider` | Voice activity detection provider interface |
| `RecordingConfig` | Call recording configuration |
| `CostTrackingConfig` | Per-call cost tracking configuration |
| `PipelineMode` | Pipeline mode: `'staged'` or `'speech-to-speech'` |

Pipeline

typescript
class Pipeline extends EventEmitter {
  constructor(dependencies: PipelineDependencies);
  startSession(session: { sessionId: string; status: string }): Promise<void>;
  processAudioChunk(sessionId: string, chunk: AudioChunk): Promise<void>;
  bargeIn(sessionId: string): void;
  endSession(sessionId: string): Promise<void>;
  destroy(): void;
}

Pipeline events:

| Event | Description |
| --- | --- |
| `pipeline:start` | Session started |
| `pipeline:stt:start` | STT processing started for a turn |
| `pipeline:stt:interim` | Interim (non-final) transcript received |
| `pipeline:stt:final` | Final transcript received |
| `pipeline:stt:eos` | End of speech detected |
| `pipeline:mcp:request` | Request sent to the MCP server |
| `pipeline:mcp:response` | Response received from the MCP server |
| `pipeline:tts:start` | TTS synthesis started |
| `pipeline:tts:first_byte` | First audio byte emitted by TTS |
| `pipeline:tts:chunk` | Audio chunk emitted |
| `pipeline:tts:complete` | TTS synthesis complete |
| `pipeline:turn:end` | Turn complete, with latency metrics |
| `pipeline:error` | Error at any stage |
| `pipeline:end` | Session ended |
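Because each stage emits an event, response delay can be measured from the events themselves. This sketch assumes events are timestamped on receipt (for example with `Date.now()` inside a `pipeline.on(...)` handler) and computes, per turn, the gap between `pipeline:stt:eos` and the first `pipeline:tts:first_byte`, roughly what a caller perceives as the agent's reaction time. `TimedEvent` and `responseDelays` are hypothetical names; the sketch relies only on the `turnId` field that the Types table lists for `PipelineEvent`.

```typescript
// Hypothetical event record built in your own listeners; not the package's PipelineEvent type.
interface TimedEvent {
  type: string;
  turnId: string;
  timestamp: number; // ms, taken when the listener fired
}

// Returns ms from end of speech to first TTS byte for each turn that has both events.
function responseDelays(events: TimedEvent[]): Map<string, number> {
  const eosAt = new Map<string, number>();
  const delays = new Map<string, number>();
  for (const e of events) {
    if (e.type === 'pipeline:stt:eos') {
      eosAt.set(e.turnId, e.timestamp);
    } else if (e.type === 'pipeline:tts:first_byte') {
      const start = eosAt.get(e.turnId);
      // Only the first audio byte after end of speech counts; repeats are ignored.
      if (start !== undefined && !delays.has(e.turnId)) {
        delays.set(e.turnId, e.timestamp - start);
      }
    }
  }
  return delays;
}
```

To collect the log, push `{ type, turnId: event.turnId, timestamp: Date.now() }` from handlers registered for those two event names.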

SpeechToSpeechPipeline

For speech-to-speech mode with providers like OpenAI Realtime or Gemini Live:

typescript
class SpeechToSpeechPipeline extends EventEmitter {
  startSession(session): Promise<void>;
  processAudioChunk(sessionId, chunk): Promise<void>;
  bargeIn(sessionId): void;
  endSession(sessionId): Promise<void>;
}

Instances are created via `createPipelineForMode(config)`, which automatically selects `SpeechToSpeechPipeline` when `config.mode === 'speech-to-speech'`.

SessionManager

typescript
class SessionManager {
  constructor(options: SessionManagerOptions);
  createSession(params: { callSid, mcpEndpoint, sttProvider, ttsProvider, metadata? }): Session;
  getSession(sessionId: string): Session | undefined;
  getSessionByCallSid(callSid: string): Session | undefined;
  updateSession(sessionId: string, updates: Partial<Session>): Session | undefined;
  addTurn(sessionId: string, turn: Omit<Turn, 'turnId'>): Turn | undefined;
  getConversationHistory(sessionId: string, maxTurns?: number): Turn[];
  closeSession(sessionId: string): boolean;
  getActiveSessionCount(): number;
  getAllSessions(): Session[];
  destroy(): void;
}
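
SessionManagerOptions:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `defaultTTL` | `number` | | Session time-to-live in seconds |
| `maxTurns` | `number` | | Maximum conversation turns retained per session |
| `maxTokens` | `number` | | Maximum token budget (for future use) |
| `cleanupInterval` | `number` | `60000` | Interval between expired-session cleanups, in ms |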
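The `defaultTTL` and `maxTurns` options can be pictured with a minimal in-memory store. This is a hypothetical sketch, not the package's `SessionManager`: `TinySessionStore`, its injectable clock, and `sweep()` are invented for illustration, and it assumes each new turn refreshes the TTL, which the table above does not specify.

```typescript
// Hypothetical sketch of TTL expiry and turn trimming; not the package's SessionManager.
interface SketchTurn {
  user: string;
  agent: string;
}

interface SketchSession {
  id: string;
  turns: SketchTurn[];
  lastActiveMs: number;
}

class TinySessionStore {
  private readonly sessions = new Map<string, SketchSession>();
  private readonly ttlMs: number;
  private readonly maxTurns: number;
  private readonly now: () => number;

  constructor(defaultTTLSeconds: number, maxTurns: number, now: () => number = () => Date.now()) {
    this.ttlMs = defaultTTLSeconds * 1000; // defaultTTL is in seconds; clock readings are in ms
    this.maxTurns = maxTurns;
    this.now = now;
  }

  create(id: string): SketchSession {
    const session: SketchSession = { id, turns: [], lastActiveMs: this.now() };
    this.sessions.set(id, session);
    return session;
  }

  // Expired sessions are dropped lazily on read.
  get(id: string): SketchSession | undefined {
    const session = this.sessions.get(id);
    if (session && this.isExpired(session)) {
      this.sessions.delete(id);
      return undefined;
    }
    return session;
  }

  addTurn(id: string, turn: SketchTurn): boolean {
    const session = this.get(id);
    if (!session) return false;
    session.turns.push(turn);
    // Retain only the most recent maxTurns turns.
    if (session.turns.length > this.maxTurns) {
      session.turns.splice(0, session.turns.length - this.maxTurns);
    }
    session.lastActiveMs = this.now(); // assumption: activity refreshes the TTL
    return true;
  }

  // What a periodic timer (cf. cleanupInterval) would call; returns the number removed.
  sweep(): number {
    let removed = 0;
    for (const [id, session] of this.sessions) {
      if (this.isExpired(session)) {
        this.sessions.delete(id);
        removed++;
      }
    }
    return removed;
  }

  private isExpired(session: SketchSession): boolean {
    return this.now() - session.lastActiveMs >= this.ttlMs;
  }
}
```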

LatencyBudgetEnforcer

typescript
class LatencyBudgetEnforcer extends EventEmitter {
  constructor(budget: LatencyBudget);
  startTurn(turnId: string): void;
  startStage(turnId: string, stage: string): void;
  endStage(turnId: string, stage: string): number;
  endTurn(turnId: string): LatencyMetrics;
  checkStageBudget(stage, elapsedMs): { withinBudget, remainingMs, exceeded };
  checkTotalBudget(elapsedMs): { withinTarget, withinHardCap, remainingTargetMs, remainingHardCapMs };
  getStageBudget(stage: 'stt' | 'mcp' | 'tts'): number;
  getTotalTargetBudget(): number;
  getTotalHardCap(): number;
}

Latency budget defaults:

| Stage | Target |
| --- | --- |
| STT | 200 ms |
| MCP | 400 ms |
| TTS | 200 ms |
| Total | 800 ms (hard cap 1200 ms) |
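The default stage targets add up exactly to the total target (200 + 400 + 200 = 800 ms), leaving 400 ms of slack before the hard cap. The sketch below mirrors the result shapes of `checkStageBudget` and `checkTotalBudget` with plain functions. The field names come from the signatures above; the semantics are assumptions: an elapsed time equal to the budget counts as within it, remaining values are clamped at zero, and `exceeded` is modeled as the overflow in milliseconds (the real field may be a boolean).

```typescript
// Hypothetical re-implementation for illustration; not LatencyBudgetEnforcer's code.
interface SketchBudget {
  target: number;  // total target, ms
  hardCap: number; // total hard cap, ms
  stt: number;
  mcp: number;
  tts: number;
}

type Stage = 'stt' | 'mcp' | 'tts';

const defaults: SketchBudget = { target: 800, hardCap: 1200, stt: 200, mcp: 400, tts: 200 };

function checkStage(b: SketchBudget, stage: Stage, elapsedMs: number) {
  const budget = b[stage];
  return {
    withinBudget: elapsedMs <= budget,
    remainingMs: Math.max(0, budget - elapsedMs),
    exceeded: Math.max(0, elapsedMs - budget), // overflow in ms (assumed meaning)
  };
}

function checkTotal(b: SketchBudget, elapsedMs: number) {
  return {
    withinTarget: elapsedMs <= b.target,
    withinHardCap: elapsedMs <= b.hardCap,
    remainingTargetMs: Math.max(0, b.target - elapsedMs),
    remainingHardCapMs: Math.max(0, b.hardCap - elapsedMs),
  };
}
```

For example, an MCP call that takes 450 ms overflows its 400 ms stage budget by 50 ms, and a 950 ms turn misses the 800 ms target while staying 250 ms under the hard cap.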

Configuration

typescript
import { defineConfig, loadConfig, getDefaultConfig, VoiceAgentKitConfigSchema } from '@reaatech/voice-agent-core';
 
const config = defineConfig({
  mcp: {
    endpoint: 'https://my-agent.example.com/mcp',
    timeout: 400, // ms
  },
  stt: {
    provider: 'deepgram',
    model: 'nova-2',
    language: 'en',
    sampleRate: 8000, // Hz (8 kHz is typical for telephony audio)
  },
  tts: {
    provider: 'deepgram',
    voice: 'asteria',
    model: 'aura',
  },
  latency: { // all values in ms
    total: { target: 800, hardCap: 1200 },
    stages: { stt: 200, mcp: 400, tts: 200 },
  },
  session: {
    ttl: 3600, // seconds
    history: { maxTurns: 20, maxTokens: 4000 },
  },
  bargeIn: {
    enabled: true,
    minSpeechDuration: 300,
    confidenceThreshold: 0.7,
    silenceThreshold: 0.3,
  },
});

Observability

typescript
import { initializeObservability, getObservability, shutdownObservability } from '@reaatech/voice-agent-core';
 
await initializeObservability({
  serviceName: 'voice-agent-kit',
  serviceVersion: '1.0.0',
  enabled: true,
  otlpEndpoint: 'http://localhost:4318/v1/traces',
});
 
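const obs = getObservability();
const sessionId = 'abc'; // your session ID
const span = obs.startSpan('voice.stt', { sessionId, provider: 'deepgram' });
// ... run the STT stage, then end the span when it completes.
// Call shutdownObservability() on process exit to flush pending telemetry.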

OpenTelemetry metrics exported:

| Metric | Type | Description |
| --- | --- | --- |
| `voice.turn.duration_ms` | Histogram | End-to-end turn latency |
| `voice.stt.latency_ms` | Histogram | Time to final transcript |
| `voice.tts.first_byte_ms` | Histogram | Time to first audio byte |
| `voice.mcp.latency_ms` | Histogram | MCP round-trip time |
| `voice.barge_in.count` | Counter | Barge-in event count |
| `voice.session.active` | UpDownCounter | Active session count |
| `voice.latency_budget.exceeded` | Counter | Budget exceedances, per stage |
| `voice.cost.per_turn` | Histogram | Per-turn cost in cents |
| `voice.cost.total` | Counter | Cumulative cost |
| `voice.cost.per_minute` | Gauge | Cost rate per minute |

Transport

typescript
import type { Transport, TransportConfig, TransportSessionMetadata } from '@reaatech/voice-agent-core';

The Transport interface abstracts telephony/browser transport providers. Implementations exist for Twilio, Telnyx, SignalWire, Vonage, and WebRTC.

VAD & Endpointing

typescript
import { createVADProvider, EnergyVADProvider, SemanticEndpointDetector } from '@reaatech/voice-agent-core';
 
const vad = createVADProvider({ provider: 'energy', silenceTimeoutMs: 500 });

| Provider | Description |
| --- | --- |
| `EnergyVADProvider` | RMS-based energy detection with an adaptive noise floor |
| `SemanticEndpointDetector` | Wraps any VAD with utterance-aware endpoint detection |
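The energy-based approach can be illustrated with a compact sketch: compute the RMS level of each frame, treat a frame as speech when it is well above a running noise floor, and end the utterance after `silenceTimeoutMs` of quiet frames. `EnergyGate`, the 16-bit PCM input, and the `speechRatio` and `alpha` parameters are assumptions for illustration, not `EnergyVADProvider`'s actual implementation or options.

```typescript
// Hypothetical RMS energy gate; not EnergyVADProvider's implementation.

// Root-mean-square level of 16-bit PCM samples, normalized to [0, 1].
function rms(samples: Int16Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    const x = samples[i] / 32768;
    sumSquares += x * x;
  }
  return Math.sqrt(sumSquares / samples.length);
}

interface GateOptions {
  initialNoiseFloor?: number;
  speechRatio?: number;      // frame must be this many times louder than the floor
  silenceTimeoutMs?: number; // quiet time that ends an utterance
  alpha?: number;            // smoothing factor for the noise-floor estimate
}

class EnergyGate {
  speaking = false;
  private noiseFloor: number;
  private silenceMs = 0;
  private readonly speechRatio: number;
  private readonly silenceTimeoutMs: number;
  private readonly alpha: number;

  constructor(opts: GateOptions = {}) {
    this.noiseFloor = opts.initialNoiseFloor ?? 0.01;
    this.speechRatio = opts.speechRatio ?? 3;
    this.silenceTimeoutMs = opts.silenceTimeoutMs ?? 500;
    this.alpha = opts.alpha ?? 0.05;
  }

  // Feed one frame; returns 'start' or 'end' on a state change, otherwise null.
  process(frame: Int16Array, frameMs: number): 'start' | 'end' | null {
    const energy = rms(frame);
    const isLoud = energy > this.noiseFloor * this.speechRatio;
    if (isLoud) {
      this.silenceMs = 0;
      if (!this.speaking) {
        this.speaking = true;
        return 'start';
      }
      return null;
    }
    // Adapt the floor only on quiet frames so speech does not raise it.
    this.noiseFloor = (1 - this.alpha) * this.noiseFloor + this.alpha * energy;
    if (this.speaking) {
      this.silenceMs += frameMs;
      if (this.silenceMs >= this.silenceTimeoutMs) {
        this.speaking = false;
        this.silenceMs = 0;
        return 'end';
      }
    }
    return null;
  }
}
```

Updating the noise floor only on quiet frames keeps sustained speech from dragging the threshold upward; a semantic endpoint detector would sit on top of the `'end'` signal and decide whether the utterance is actually complete.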

Recording

typescript
import { createRecordingManager } from '@reaatech/voice-agent-core';
 
const recording = createRecordingManager({
  enabled: true,
  storage: 'filesystem',
  directory: './recordings',
  saveAudio: true,
  saveTranscript: true,
});

| Storage | Description |
| --- | --- |
| `memory` | In-memory storage with LRU eviction |
| `filesystem` | Saves a WAV file, a Markdown transcript, and JSON metadata to disk |
| `s3` | Uploads to S3 (requires `@aws-sdk/client-s3`) |
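The filesystem backend saves WAV audio. For reference, an uncompressed PCM WAV file begins with the canonical 44-byte RIFF header sketched below. `pcmWavHeader` is a hypothetical helper, not part of the package, and whether the recorder stores PCM or another WAV encoding (such as 8 kHz μ-law telephony audio) is not stated here.

```typescript
// Hypothetical helper: builds the standard 44-byte header for a PCM WAV file (little-endian fields).
function pcmWavHeader(dataBytes: number, sampleRate: number, channels: number, bitsPerSample: number): Uint8Array {
  const buf = new ArrayBuffer(44);
  const v = new DataView(buf);
  const writeAscii = (offset: number, text: string) => {
    for (let i = 0; i < text.length; i++) v.setUint8(offset + i, text.charCodeAt(i));
  };
  const blockAlign = channels * (bitsPerSample / 8); // bytes per sample frame
  const byteRate = sampleRate * blockAlign;          // bytes per second

  writeAscii(0, 'RIFF');
  v.setUint32(4, 36 + dataBytes, true); // file size minus the first 8 bytes
  writeAscii(8, 'WAVE');
  writeAscii(12, 'fmt ');
  v.setUint32(16, 16, true);            // fmt chunk size for PCM
  v.setUint16(20, 1, true);             // audio format 1 = PCM
  v.setUint16(22, channels, true);
  v.setUint32(24, sampleRate, true);
  v.setUint32(28, byteRate, true);
  v.setUint16(32, blockAlign, true);
  v.setUint16(34, bitsPerSample, true);
  writeAscii(36, 'data');
  v.setUint32(40, dataBytes, true);     // size of the raw sample data that follows
  return new Uint8Array(buf);
}
```

Prepend the header to the raw sample bytes: one second of 8 kHz mono 16-bit audio is 16,000 data bytes, giving a byte rate of 16,000 and a RIFF size field of 16,036.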

Cost Tracking

typescript
import { createCostTracker } from '@reaatech/voice-agent-core';
 
const cost = createCostTracker({
  enabled: true,
  currency: 'USD',
  providers: {
    deepgram: { stt: { pricePerMinute: 0.0059 }, tts: { pricePerCharacter: 0.000015 } },
  },
});
 
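// Placeholder values for illustration:
const sessionId = 'abc';
const turnId = 'turn-1';
const audioDurationMs = 6000; // audio duration sent to STT, in ms
const characterCount = 200;   // characters sent to TTS

cost.trackSTTUsage(sessionId, turnId, audioDurationMs);
cost.trackTTSUsage(sessionId, turnId, characterCount);
const sessionCost = cost.getSessionCost(sessionId);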
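With the Deepgram prices configured above, per-turn cost is simple arithmetic. This sketch assumes linear pricing (STT billed per minute of audio, TTS per character, no minimums or rounding), which may differ from how `CostTracker` actually rounds or aggregates; `SketchPricing` and `turnCostUSD` are hypothetical names.

```typescript
// Hypothetical per-turn cost calculation assuming linear pricing with no rounding.
interface SketchPricing {
  sttPricePerMinute: number;    // USD per minute of audio transcribed
  ttsPricePerCharacter: number; // USD per character synthesized
}

function turnCostUSD(p: SketchPricing, audioDurationMs: number, characterCount: number): number {
  const sttCost = (audioDurationMs / 60_000) * p.sttPricePerMinute; // ms -> minutes
  const ttsCost = characterCount * p.ttsPricePerCharacter;
  return sttCost + ttsCost;
}

// Prices from the createCostTracker example above.
const deepgram: SketchPricing = { sttPricePerMinute: 0.0059, ttsPricePerCharacter: 0.000015 };

// 6 s of caller audio and a 200-character reply:
// STT 0.1 min * 0.0059 = 0.00059 USD, TTS 200 * 0.000015 = 0.003 USD, total 0.00359 USD.
const usd = turnCostUSD(deepgram, 6_000, 200);
const cents = usd * 100; // about 0.359 cents
```

Multiplying by 100 gives cents, the unit the `voice.cost.per_turn` histogram reports.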

Mock Providers

typescript
import {
  MockSTTProvider,
  MockTTSProvider,
  MockMCPClient,
  createMockSTTProvider,
  createMockTTSProvider,
  createMockMCPClient,
} from '@reaatech/voice-agent-core';

Pre-built mock implementations for testing pipelines without live provider connections: `MockSTTProvider` emits configurable utterances, `MockTTSProvider` yields fake audio chunks, and `MockMCPClient` returns canned responses.
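If a test needs behavior the built-in mocks do not cover, the same idea can be hand-rolled. The `SketchSTT` interface and `ScriptedSTT` class below are hypothetical shapes for illustration only, not the package's STT provider interface; the `SketchUtterance` fields mirror the `Utterance` description in the Types table (text, confidence, `isFinal`, timestamp). For each scripted line, the mock emits growing interim transcripts followed by a final one.

```typescript
// Hypothetical shapes for illustration; the package's own interfaces may differ.
interface SketchUtterance {
  text: string;
  confidence: number;
  isFinal: boolean;
  timestamp: number;
}

interface SketchSTT {
  transcribe(audio: AsyncIterable<Uint8Array>): AsyncIterable<SketchUtterance>;
}

class ScriptedSTT implements SketchSTT {
  private readonly script: string[];

  constructor(script: string[]) {
    this.script = script;
  }

  async *transcribe(audio: AsyncIterable<Uint8Array>): AsyncIterable<SketchUtterance> {
    // For simplicity, drain the whole audio stream first; its content is ignored.
    for await (const _chunk of audio) {
      // no-op
    }
    let t = 0;
    for (const line of this.script) {
      const words = line.split(' ');
      // Interim transcripts grow one word at a time, like a streaming recognizer.
      for (let i = 1; i < words.length; i++) {
        yield { text: words.slice(0, i).join(' '), confidence: 0.5, isFinal: false, timestamp: t++ };
      }
      yield { text: line, confidence: 0.95, isFinal: true, timestamp: t++ };
    }
  }
}
```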

License

MIT