@reaatech/voice-agent-core

Orchestrates STT, MCP, and TTS pipelines for voice-enabled AI agents using an event-driven `Pipeline` class and session management utilities. It provides latency enforcement, OpenTelemetry-instrumented lifecycle hooks, and Zod-validated configuration schemas.

Status: Pre-1.0 — APIs may change in minor versions. Pin to a specific version in production.

Core pipeline orchestration, session management, latency enforcement, configuration, and types for building voice-enabled AI agents. No runtime dependencies beyond zod, uuid, and the OpenTelemetry API.

Installation

terminal
npm install @reaatech/voice-agent-core
# or, with pnpm:
pnpm add @reaatech/voice-agent-core

Feature Overview

  • Pipeline orchestrator — Full STT → MCP → TTS pipeline with event-driven lifecycle
  • Latency budget enforcer — Per-stage timing with hard caps, overflow detection, and metrics
  • Session manager — Multi-turn conversation state with TTL expiry and automatic cleanup
  • Zod-validated config — defineConfig() with full TypeScript IntelliSense and runtime validation
  • Observability — OpenTelemetry tracing spans, histograms, and counters for every stage
  • Mock providers — Built-in MockSTTProvider, MockTTSProvider, and MockMCPClient for testing
  • 25+ exported types — AudioChunk, Utterance, AgentResponse, Session, Turn, and more

Quick Start

typescript
import {
  createPipeline,
  createLatencyBudget,
  initializeSessionManager,
  LatencyBudgetEnforcer,
} from '@reaatech/voice-agent-core';
 
const sessionManager = initializeSessionManager({
  defaultTTL: 3600,
  maxTurns: 20,
  maxTokens: 4000,
});
 
const latencyEnforcer = new LatencyBudgetEnforcer(
  createLatencyBudget({
    target: 800,
    hardCap: 1200,
    stt: 200,
    mcp: 400,
    tts: 200,
  })
);
 
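// mySTTProvider, myTTSProvider, myMCPClient, and myConfig are placeholders for your own instances.
const pipeline = createPipeline({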
  sessionManager,
  latencyEnforcer,
  sttProvider: mySTTProvider,
  ttsProvider: myTTSProvider,
  mcpClient: myMCPClient,
  config: myConfig,
});
 
// Register listeners before starting the session so that no early events are missed.
pipeline.on('pipeline:turn:end', (event) => {
  console.log('Turn complete:', event.data.metrics);
});
await pipeline.startSession({ sessionId: 'abc', status: 'active' });

API Reference

Types

| Type | Description |
| --- | --- |
| AudioChunk | Raw audio buffer with sample rate, encoding, channels, timestamp |
| Utterance | Transcribed text with confidence, isFinal flag, timestamp |
| AgentResponse | MCP agent output: text, tool calls, latency |
| Session | Multi-turn session with ID, TTL, conversation turns, status |
| Turn | Single conversation turn: user utterance, agent response, latency |
| PipelineEvent | Typed event from the pipeline with sessionId, turnId, data |
| LatencyBudget | Per-stage timing targets and hard caps |
| VoiceAgentKitConfig | Complete kit configuration (MCP, STT, TTS, latency, session, barge-in) |

Pipeline

typescript
class Pipeline extends EventEmitter {
  constructor(dependencies: PipelineDependencies);
  startSession(session: { sessionId: string; status: string }): Promise<void>;
  processAudioChunk(sessionId: string, chunk: AudioChunk): Promise<void>;
  bargeIn(sessionId: string): void;
  endSession(sessionId: string): Promise<void>;
  destroy(): void;
}

Pipeline events:

| Event | Description |
| --- | --- |
| pipeline:start | Session started |
| pipeline:stt:start | STT processing begun for a turn |
| pipeline:stt:interim | Interim (non-final) transcript received |
| pipeline:stt:final | Final transcript received |
| pipeline:stt:eos | End-of-speech detected |
| pipeline:mcp:request | Request sent to MCP server |
| pipeline:mcp:response | Response received from MCP server |
| pipeline:tts:start | TTS synthesis begun |
| pipeline:tts:first_byte | First audio byte emitted from TTS |
| pipeline:tts:chunk | Audio chunk emitted |
| pipeline:tts:complete | TTS synthesis complete |
| pipeline:turn:end | Turn complete with latency metrics |
| pipeline:error | Error at any stage |
| pipeline:end | Session ended |
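
As a rough illustration of how the events listed above can be consumed, the sketch below derives per-stage timings from an ordered event log. The `SketchEvent` shape (an event name plus a millisecond `timestamp`) and the `elapsedBetween` helper are assumptions made for this example, and the timestamps are invented; the actual `PipelineEvent` payload may look different.

```typescript
// Hypothetical event shape for this sketch; the real PipelineEvent may differ.
interface SketchEvent {
  type: string; // an event name such as 'pipeline:stt:start'
  timestamp: number; // milliseconds since an arbitrary origin
}

// Elapsed ms between the first occurrence of two event types,
// or undefined if either event is missing from the log.
function elapsedBetween(events: SketchEvent[], from: string, to: string): number | undefined {
  const start = events.find((e) => e.type === from);
  const end = events.find((e) => e.type === to);
  if (start === undefined || end === undefined) return undefined;
  return end.timestamp - start.timestamp;
}

// Invented timestamps for a single turn.
const sampleTurn: SketchEvent[] = [
  { type: 'pipeline:stt:start', timestamp: 0 },
  { type: 'pipeline:stt:final', timestamp: 180 },
  { type: 'pipeline:mcp:request', timestamp: 185 },
  { type: 'pipeline:mcp:response', timestamp: 520 },
  { type: 'pipeline:tts:start', timestamp: 525 },
  { type: 'pipeline:tts:first_byte', timestamp: 690 },
];

const sttMs = elapsedBetween(sampleTurn, 'pipeline:stt:start', 'pipeline:stt:final');
const mcpMs = elapsedBetween(sampleTurn, 'pipeline:mcp:request', 'pipeline:mcp:response');
const firstByteMs = elapsedBetween(sampleTurn, 'pipeline:tts:start', 'pipeline:tts:first_byte');
console.log({ sttMs, mcpMs, firstByteMs });
```

In a real application the log would be filled from `pipeline.on(...)` listeners rather than a hard-coded array.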

SessionManager

typescript
class SessionManager {
  constructor(options: SessionManagerOptions);
  createSession(params: { callSid, mcpEndpoint, sttProvider, ttsProvider, metadata? }): Session;
  getSession(sessionId: string): Session | undefined;
  getSessionByCallSid(callSid: string): Session | undefined;
  updateSession(sessionId: string, updates: Partial<Session>): Session | undefined;
  addTurn(sessionId: string, turn: Omit<Turn, 'turnId'>): Turn | undefined;
  getConversationHistory(sessionId: string, maxTurns?: number): Turn[];
  closeSession(sessionId: string): boolean;
  getActiveSessionCount(): number;
  getAllSessions(): Session[];
  destroy(): void;
}

Session manager options:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| defaultTTL | number | | Session time-to-live in seconds |
| maxTurns | number | | Maximum conversation turns retained per session |
| maxTokens | number | | Maximum token budget (for future use) |
| cleanupInterval | number | 60000 | Interval for expired session cleanup in ms |
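
To make the TTL and turn-retention options more concrete, here is a minimal sketch of one way such behavior could work. `SketchSessionStore` and `SketchSession` are hypothetical stand-ins written for this example, not the library's `SessionManager` or `Session`, and the real implementation may differ (for instance, it runs cleanup on a timer, whereas this sketch takes the current time as an argument).

```typescript
// Illustrative stand-ins; not the library's Session or SessionManager.
interface SketchSession {
  id: string;
  expiresAt: number; // epoch milliseconds
  turns: string[];
}

class SketchSessionStore {
  private sessions = new Map<string, SketchSession>();
  private ttlMs: number;
  private maxTurns: number;

  constructor(defaultTTLSeconds: number, maxTurns: number) {
    this.ttlMs = defaultTTLSeconds * 1000;
    this.maxTurns = maxTurns;
  }

  create(id: string, nowMs: number): SketchSession {
    const session: SketchSession = { id, expiresAt: nowMs + this.ttlMs, turns: [] };
    this.sessions.set(id, session);
    return session;
  }

  get(id: string): SketchSession | undefined {
    return this.sessions.get(id);
  }

  // Append a turn and drop the oldest entries beyond maxTurns.
  addTurn(id: string, text: string): void {
    const session = this.sessions.get(id);
    if (session === undefined) return;
    session.turns.push(text);
    const overflow = session.turns.length - this.maxTurns;
    if (overflow > 0) session.turns.splice(0, overflow);
  }

  // Delete sessions whose TTL has elapsed; returns the number removed.
  cleanup(nowMs: number): number {
    let removed = 0;
    this.sessions.forEach((session, id) => {
      if (session.expiresAt <= nowMs) {
        this.sessions.delete(id);
        removed += 1;
      }
    });
    return removed;
  }
}

const sketchStore = new SketchSessionStore(3600, 2);
sketchStore.create('abc', 0);
sketchStore.addTurn('abc', 'first');
sketchStore.addTurn('abc', 'second');
sketchStore.addTurn('abc', 'third'); // 'first' is dropped because maxTurns is 2
const keptTurns = sketchStore.get('abc')?.turns.slice();
const removedEarly = sketchStore.cleanup(3599000); // TTL has not elapsed yet
const removedLate = sketchStore.cleanup(3600000); // TTL has elapsed
```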

LatencyBudgetEnforcer

typescript
class LatencyBudgetEnforcer extends EventEmitter {
  constructor(budget: LatencyBudget);
  startTurn(turnId: string): void;
  startStage(turnId: string, stage: string): void;
  endStage(turnId: string, stage: string): number;
  endTurn(turnId: string): LatencyMetrics;
  checkStageBudget(stage, elapsedMs): { withinBudget, remainingMs, exceeded };
  checkTotalBudget(elapsedMs): { withinTarget, withinHardCap, remainingTargetMs, remainingHardCapMs };
  getStageBudget(stage: 'stt' | 'mcp' | 'tts'): number;
  getTotalTargetBudget(): number;
  getTotalHardCap(): number;
}

Latency budget defaults:

| Stage | Target |
| --- | --- |
| STT | 200 ms |
| MCP | 400 ms |
| TTS | 200 ms |
| Total | 800 ms (hard cap 1200 ms) |
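
The sketch below shows how checks against these defaults might be evaluated. The return shapes reuse the field names from the `checkStageBudget` and `checkTotalBudget` signatures, but the comparison rules (inclusive limits, remaining time clamped at zero) are assumptions, and `checkStage` and `checkTotal` are hypothetical helpers rather than the enforcer's actual code.

```typescript
type SketchStage = 'stt' | 'mcp' | 'tts';

interface SketchBudget {
  target: number; // total target in ms
  hardCap: number; // total hard cap in ms
  stages: Record<SketchStage, number>; // per-stage targets in ms
}

// Values taken from the defaults table.
const defaultBudget: SketchBudget = {
  target: 800,
  hardCap: 1200,
  stages: { stt: 200, mcp: 400, tts: 200 },
};

// Assumed semantics: a stage is within budget when elapsed <= its target.
function checkStage(budget: SketchBudget, stage: SketchStage, elapsedMs: number) {
  const limit = budget.stages[stage];
  return {
    withinBudget: elapsedMs <= limit,
    remainingMs: Math.max(0, limit - elapsedMs),
    exceeded: elapsedMs > limit,
  };
}

// Assumed semantics: the target is a soft goal, the hard cap an upper bound.
function checkTotal(budget: SketchBudget, elapsedMs: number) {
  return {
    withinTarget: elapsedMs <= budget.target,
    withinHardCap: elapsedMs <= budget.hardCap,
    remainingTargetMs: Math.max(0, budget.target - elapsedMs),
    remainingHardCapMs: Math.max(0, budget.hardCap - elapsedMs),
  };
}

const slowMcp = checkStage(defaultBudget, 'mcp', 450); // 50 ms over the MCP target
const slowTurn = checkTotal(defaultBudget, 950); // misses the target, still under the hard cap
console.log(slowMcp, slowTurn);
```

Under these assumptions a 950 ms turn misses the 800 ms target but still has 250 ms of headroom before the 1200 ms hard cap.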

Configuration

typescript
import { defineConfig, loadConfig, getDefaultConfig, VoiceAgentKitConfigSchema } from '@reaatech/voice-agent-core';
 
const config = defineConfig({
  mcp: {
    endpoint: 'https://my-agent.example.com/mcp',
    timeout: 400,
  },
  stt: {
    provider: 'deepgram',
    model: 'nova-2',
    language: 'en',
    sampleRate: 8000,
  },
  tts: {
    provider: 'deepgram',
    voice: 'asteria',
    model: 'aura',
  },
  latency: {
    total: { target: 800, hardCap: 1200 },
    stages: { stt: 200, mcp: 400, tts: 200 },
  },
  session: {
    ttl: 3600,
    history: { maxTurns: 20, maxTokens: 4000 },
  },
  bargeIn: {
    enabled: true,
    minSpeechDuration: 300,
    confidenceThreshold: 0.7,
    silenceThreshold: 0.3,
  },
});
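
The latency values in a configuration like this are related to one another: the stage budgets add up to the total target, and the MCP timeout matches the MCP stage budget. Whether the schema enforces such cross-field relationships is not stated here, so the following sketch shows a hand-written sanity check. `SketchConfig` and `findBudgetProblems` are hypothetical names for this example, and treating `mcp.timeout` as milliseconds is an assumption.

```typescript
// Only the fields this check needs; not the full VoiceAgentKitConfig.
interface SketchConfig {
  mcp: { timeout: number }; // assumed to be milliseconds
  latency: {
    total: { target: number; hardCap: number };
    stages: { stt: number; mcp: number; tts: number };
  };
}

// Return a human-readable message for each inconsistency found.
function findBudgetProblems(config: SketchConfig): string[] {
  const problems: string[] = [];
  const { total, stages } = config.latency;
  const stageSum = stages.stt + stages.mcp + stages.tts;

  if (stageSum > total.target) {
    problems.push(`stage budgets sum to ${stageSum} ms, above the ${total.target} ms target`);
  }
  if (total.target > total.hardCap) {
    problems.push(`target ${total.target} ms is above the hard cap of ${total.hardCap} ms`);
  }
  if (config.mcp.timeout > stages.mcp) {
    problems.push(`MCP timeout ${config.mcp.timeout} ms is above the MCP stage budget of ${stages.mcp} ms`);
  }
  return problems;
}

// Same numbers as the configuration example.
const consistentConfig: SketchConfig = {
  mcp: { timeout: 400 },
  latency: {
    total: { target: 800, hardCap: 1200 },
    stages: { stt: 200, mcp: 400, tts: 200 },
  },
};

// Same config, but with a timeout longer than the MCP stage budget.
const looseTimeoutConfig: SketchConfig = {
  ...consistentConfig,
  mcp: { timeout: 600 },
};

const noProblems = findBudgetProblems(consistentConfig);
const timeoutProblems = findBudgetProblems(looseTimeoutConfig);
console.log(noProblems, timeoutProblems);
```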

Observability

typescript
import { initializeObservability, getObservability, shutdownObservability } from '@reaatech/voice-agent-core';
 
await initializeObservability({
  serviceName: 'voice-agent-kit',
  serviceVersion: '1.0.0',
  enabled: true,
  otlpEndpoint: 'http://localhost:4318/v1/traces',
});
 
const obs = getObservability();
// sessionId is a placeholder for the ID of the session being traced.
const span = obs.startSpan('voice.stt', { sessionId, provider: 'deepgram' });

OpenTelemetry metrics exported:

| Metric | Type | Description |
| --- | --- | --- |
| voice.turn.duration_ms | Histogram | End-to-end turn latency |
| voice.stt.latency_ms | Histogram | Time to final transcript |
| voice.tts.first_byte_ms | Histogram | Time to first audio byte |
| voice.mcp.latency_ms | Histogram | MCP round-trip time |
| voice.barge_in.count | Counter | Barge-in event count |
| voice.session.active | UpDownCounter | Active session count |
| voice.latency_budget.exceeded | Counter | Budget exceeded per stage |

Mock Providers

typescript
import {
  MockSTTProvider,
  MockTTSProvider,
  MockMCPClient,
  createMockSTTProvider,
  createMockTTSProvider,
  createMockMCPClient,
} from '@reaatech/voice-agent-core';

Pre-built mock implementations for testing pipelines without live provider connections. MockSTTProvider emits configurable utterances, MockTTSProvider yields fake audio chunks, and MockMCPClient returns canned responses.
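
The constructor options of the built-in mocks are not documented here, so the sketch below illustrates only the general idea of a scripted speech-to-text test double. `ScriptedSTT` and `SketchUtterance` are hypothetical names for this example; the utterance fields follow the `Utterance` description in the Types table, but the exact type and the real provider interface are assumptions.

```typescript
// Field names follow the Utterance description; the exact type is an assumption.
interface SketchUtterance {
  text: string;
  confidence: number;
  isFinal: boolean;
  timestamp: number; // milliseconds
}

// A scripted test double: returns canned utterances in order.
class ScriptedSTT {
  private queue: SketchUtterance[];

  constructor(script: SketchUtterance[]) {
    this.queue = script.slice(); // copy so the caller's array is not mutated
  }

  // Next scripted utterance, or undefined once the script is exhausted.
  next(): SketchUtterance | undefined {
    return this.queue.shift();
  }

  remaining(): number {
    return this.queue.length;
  }
}

const scriptedStt = new ScriptedSTT([
  { text: 'book a', confidence: 0.62, isFinal: false, timestamp: 0 },
  { text: 'book a table', confidence: 0.94, isFinal: true, timestamp: 400 },
]);

const interimUtterance = scriptedStt.next(); // interim result
const finalUtterance = scriptedStt.next(); // final result
const afterScript = scriptedStt.next(); // undefined: script exhausted
console.log(interimUtterance?.text, finalUtterance?.text, afterScript);
```

A deterministic script like this keeps test assertions stable, which is the main reason to prefer mocks over live providers in unit tests.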

License

MIT