These packages provide a framework for building multi-step media processing workflows (image generation, audio transcription, document extraction, and similar tasks) as chainable tools within an MCP server. Adopt them to automate complex media pipelines that require artifact passing, quality validation, and multi-provider orchestration. The system is built around a unified pipeline engine that uses variable interpolation to pass outputs between steps, enforces quality gates, and applies resilience patterns such as circuit breakers.
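As a rough illustration of how variable interpolation can pass one step's output into the next, here is a minimal sketch. The `{{steps.<id>.<field>}}` placeholder syntax, the `interpolate` function, and the `StepOutputs` shape are assumptions made for this example; the framework's actual template syntax may differ.

```typescript
// Hypothetical sketch: resolve "{{steps.<stepId>.<field>}}" placeholders
// against the outputs of earlier steps. The placeholder syntax is an
// assumption, not the framework's documented format.
type StepOutputs = Record<string, Record<string, string>>;

function interpolate(template: string, outputs: StepOutputs): string {
  return template.replace(
    /\{\{\s*steps\.([\w-]+)\.([\w-]+)\s*\}\}/g,
    (_match: string, stepId: string, field: string): string => {
      const step = outputs[stepId];
      const value = step === undefined ? undefined : step[field];
      if (value === undefined) {
        // Fail loudly rather than silently passing an empty value downstream.
        throw new Error(`Unresolved reference: steps.${stepId}.${field}`);
      }
      return value;
    },
  );
}

// Output recorded by an earlier (hypothetical) transcription step.
const outputs: StepOutputs = {
  transcribe: { text: "hello world" },
};

const resolvedPrompt = interpolate("Summarize: {{steps.transcribe.text}}", outputs);
```

Throwing on an unresolved reference is a deliberate choice in this sketch: a missing artifact surfaces at the step that needs it instead of producing a malformed request further down the pipeline.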
This package orchestrates media processing workflows through a `PipelineExecutor` class that handles sequential step execution, artifact tracking, and quality gate validation. It provides a Zod-based type system for defining pipelines and integrates with custom providers to manage media operations, cost tracking, and lifecycle events.
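The combination of sequential execution, artifact tracking, and quality gates can be sketched as follows. This is not the `PipelineExecutor` API: the `Step`, `Artifact`, and `runPipeline` names, the numeric score, and the synchronous `run` callback are all assumptions chosen to keep the example small.

```typescript
// Hypothetical sketch of sequential execution with one quality gate per step.
// Names and shapes are illustrative; real steps would call providers asynchronously.
interface Artifact {
  stepId: string;
  value: string;
  score: number; // quality score in [0, 1], as reported by the step
}

interface Step {
  id: string;
  minScore: number; // quality gate threshold for this step
  run: (input: string) => { value: string; score: number };
}

interface RunResult {
  status: "completed" | "gate_failed";
  failedStep?: string;
  artifacts: Artifact[];
}

function runPipeline(steps: Step[]): RunResult {
  const artifacts: Artifact[] = [];
  let input = "";
  for (const step of steps) {
    const out = step.run(input);
    // The artifact is recorded even when the gate fails, so it can be inspected.
    artifacts.push({ stepId: step.id, value: out.value, score: out.score });
    if (out.score < step.minScore) {
      return { status: "gate_failed", failedStep: step.id, artifacts };
    }
    input = out.value; // the next step consumes this step's output
  }
  return { status: "completed", artifacts };
}

const runResult = runPipeline([
  { id: "generate", minScore: 0.5, run: () => ({ value: "image.png", score: 0.9 }) },
  { id: "upscale", minScore: 0.8, run: (input) => ({ value: "2x-" + input, score: 0.6 }) },
  { id: "publish", minScore: 0, run: (input) => ({ value: "published:" + input, score: 1 }) },
]);
```

In this run the `upscale` step scores below its threshold, so the pipeline stops there and the `publish` step never executes.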
An Anthropic provider for the media pipeline framework that wraps Claude Sonnet's vision models to perform image description, OCR, table extraction, structured field extraction, and document summarization. It exports an `AnthropicProvider` class with an `execute()` method that accepts an operation name and parameters, and it supports token-by-token streaming for all operations that return text.
A factory function that creates an `AudioGenOperations` instance providing text-to-speech, speech-to-text, speaker diarization, source separation, music generation, and sound effects, with automatic multi-provider routing to OpenAI, ElevenLabs, Deepgram, or any conformant provider.
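One way to picture multi-provider routing is a lookup that selects the first registered provider able to handle an operation, with an optional caller preference. The sketch below is an assumption throughout: the `route` function, the capability lists, and the registration order are invented for illustration and do not reflect the package's real defaults.

```typescript
// Hypothetical sketch of operation-based routing. The capability table is
// an assumption for illustration, not the package's actual configuration.
type AudioOperation = "tts" | "stt" | "diarize";

interface AudioProvider {
  name: string;
  supports: AudioOperation[];
}

function route(
  operation: AudioOperation,
  providers: AudioProvider[],
  preferred?: string,
): AudioProvider {
  let firstCapable: AudioProvider | undefined;
  for (const provider of providers) {
    if (provider.supports.indexOf(operation) === -1) continue;
    // An explicit preference wins, but only if that provider can do the job.
    if (provider.name === preferred) return provider;
    if (firstCapable === undefined) firstCapable = provider;
  }
  if (firstCapable === undefined) {
    throw new Error(`No provider registered for "${operation}"`);
  }
  return firstCapable; // otherwise fall back to registration order
}

const providers: AudioProvider[] = [
  { name: "openai", supports: ["tts", "stt"] },
  { name: "elevenlabs", supports: ["tts"] },
  { name: "deepgram", supports: ["stt", "diarize"] },
];

const defaultTts = route("tts", providers);
const preferredTts = route("tts", providers, "elevenlabs");
const diarizer = route("diarize", providers, "openai");
```

Here `diarizer` resolves to the Deepgram entry even though OpenAI was preferred, because the preferred provider does not list `diarize` among its capabilities.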
A ComfyUI provider for the media pipeline framework that runs image generation, image editing, and video generation on your own GPU via local ComfyUI workflows, with zero API cost. It exports a `ComfyUIProvider` class that accepts a `baseUrl` pointing to a running ComfyUI instance and optionally a `workflowsDir` for custom JSON workflows.
Core framework for media pipeline orchestration, providing a Zod-validated type system, pipeline execution engine with variable interpolation, quality gate evaluation, artifact registry, budget enforcement, persistence-based resume, cost tracking, event bus, and a configurable mock provider for testing.
A typed cost ledger for tracking per-operation expenses in a media pipeline, providing an `InMemoryCostLedger` class with `charge()`, `preflight()`, `totalForRun()`, and `totalForTenant()` methods that support USD micro-precision, run-scoped and tenant-scoped queries, and preflight budget checks.
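A minimal sketch of such a ledger follows, assuming that "USD micro-precision" means amounts stored as integer millionths of a dollar. The method names come from the description above; the `SketchCostLedger` class, the entry shape, the method signatures, and the `image.generate` and `audio.tts` operation names are hypothetical and are not the `InMemoryCostLedger` API.

```typescript
// Hypothetical sketch of an in-memory ledger storing integer micro-USD
// (1 USD = 1_000_000 micro-USD). Signatures and entry shape are assumptions.
interface LedgerEntry {
  runId: string;
  tenantId: string;
  operation: string;
  microUsd: number;
}

class SketchCostLedger {
  private entries: LedgerEntry[] = [];

  charge(entry: LedgerEntry): void {
    if (Math.floor(entry.microUsd) !== entry.microUsd || entry.microUsd < 0) {
      throw new Error("microUsd must be a non-negative integer");
    }
    this.entries.push(entry);
  }

  totalForRun(runId: string): number {
    return this.sum((e) => e.runId === runId);
  }

  totalForTenant(tenantId: string): number {
    return this.sum((e) => e.tenantId === tenantId);
  }

  // Would charging `estimateMicroUsd` keep the run within its budget?
  preflight(runId: string, estimateMicroUsd: number, budgetMicroUsd: number): boolean {
    return this.totalForRun(runId) + estimateMicroUsd <= budgetMicroUsd;
  }

  private sum(matches: (e: LedgerEntry) => boolean): number {
    return this.entries.filter(matches).reduce((total, e) => total + e.microUsd, 0);
  }
}

const ledger = new SketchCostLedger();
ledger.charge({ runId: "run-1", tenantId: "acme", operation: "image.generate", microUsd: 40_000 });
ledger.charge({ runId: "run-1", tenantId: "acme", operation: "audio.tts", microUsd: 15_000 });
ledger.charge({ runId: "run-2", tenantId: "acme", operation: "audio.stt", microUsd: 6_000 });

const run1Total = ledger.totalForRun("run-1"); // 55_000 micro-USD
const tenantTotal = ledger.totalForTenant("acme"); // 61_000 micro-USD
const fitsBudget = ledger.preflight("run-1", 50_000, 100_000); // false: 105_000 > 100_000
```

Storing integers rather than fractional dollars keeps sums exact, which matters when many sub-cent charges accumulate across a run.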
A Deepgram provider for the media pipeline framework that exposes `audio.stt` and `audio.diarize` operations via a `DeepgramProvider` class, using Nova-2 for speech-to-text transcription with smart formatting, speaker diarization, and WebSocket streaming support.
A factory function (`createDocumentExtractionOperations`) that returns a `DocumentExtractionOperations` instance providing OCR, table extraction, schema-driven field extraction, and content summarization, delegating each operation to registered LLM providers (e.g., Google, Anthropic, OpenAI) with automatic fallback chains.
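The fallback-chain idea can be sketched as trying providers in order and returning the first success while recording earlier failures. Everything in this example is assumed for illustration: the `ExtractionProvider` shape, the `ocrWithFallback` helper, and the stubbed provider behavior are hypothetical, and the calls are synchronous only for brevity.

```typescript
// Hypothetical sketch of a fallback chain: try providers in order and return
// the first success. Synchronous for brevity; real provider calls would be async.
interface ExtractionProvider {
  name: string;
  ocr: (documentId: string) => string;
}

interface FallbackResult {
  provider: string;
  text: string;
  failures: string[]; // "<provider>: <message>" for each provider that failed
}

function ocrWithFallback(documentId: string, chain: ExtractionProvider[]): FallbackResult {
  const failures: string[] = [];
  for (const provider of chain) {
    try {
      const text = provider.ocr(documentId);
      return { provider: provider.name, text, failures };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      failures.push(`${provider.name}: ${message}`);
    }
  }
  throw new Error(`All providers failed: ${failures.join("; ")}`);
}

// Stubbed providers: the first one fails, so the second one is used.
const chain: ExtractionProvider[] = [
  { name: "google", ocr: () => { throw new Error("quota exceeded"); } },
  { name: "anthropic", ocr: (id) => `text of ${id}` },
  { name: "openai", ocr: (id) => `unused for ${id}` },
];

const ocrResult = ocrWithFallback("invoice-17", chain);
```

Returning the list of failures alongside the successful result lets a caller log or surface which providers were skipped and why.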
An ElevenLabs provider for the media pipeline framework that exposes a `MediaProvider` implementation, `ElevenLabsProvider`, with `execute`, `healthCheck`, and `estimateCost` methods for generating text-to-speech audio with configurable voice, speed, model, and output format.
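To make the three-method contract concrete, here is an offline stub with the same method names. The parameter and return types, the `audio.tts` operation name, the `StubTtsProvider` class, and the per-character rate are all assumptions for this sketch; they are not taken from `ElevenLabsProvider`, whose real methods would be asynchronous and call the ElevenLabs API.

```typescript
// Hypothetical contract for the three methods named above. Parameter and
// return types are assumptions; real provider calls would be asynchronous.
interface TtsParams {
  text: string;
  voice: string;
  speed?: number;
}

interface TtsResult {
  mimeType: string;
  byteLength: number;
}

interface SketchMediaProvider {
  execute(operation: string, params: TtsParams): TtsResult;
  healthCheck(): boolean;
  estimateCost(operation: string, params: TtsParams): number; // micro-USD
}

// Offline stub with a made-up per-character rate, handy for dry runs and tests.
class StubTtsProvider implements SketchMediaProvider {
  private readonly microUsdPerChar = 30; // illustrative, not a real price

  execute(operation: string, params: TtsParams): TtsResult {
    this.assertSupported(operation);
    // Pretend each character yields 100 bytes of audio.
    return { mimeType: "audio/mpeg", byteLength: params.text.length * 100 };
  }

  healthCheck(): boolean {
    return true;
  }

  estimateCost(operation: string, params: TtsParams): number {
    this.assertSupported(operation);
    return params.text.length * this.microUsdPerChar;
  }

  private assertSupported(operation: string): void {
    if (operation !== "audio.tts") {
      throw new Error(`Unsupported operation: ${operation}`);
    }
  }
}

const stub = new StubTtsProvider();
const ttsParams: TtsParams = { text: "Hello there", voice: "narrator" };
const estimate = stub.estimateCost("audio.tts", ttsParams);
const audio = stub.execute("audio.tts", ttsParams);
```

Separating `estimateCost` from `execute` lets a caller check a budget before any billable request is made.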
A Fal.ai provider for the media pipeline framework that exposes a `FalProvider` class supporting image generation, upscaling, background removal, text-to-video, and image-to-video operations via the fal.ai API, with native webhook support and streaming queue events for long-running tasks.
A Google Cloud provider for the media pipeline framework that exposes Document AI (OCR, table extraction, field extraction) and Vertex AI Gemini (image description) as a unified set of operations via an `execute` method on the `GoogleProvider` class.
An image editing operations factory that provides Sharp-based local processing (resize, crop, composite) and provider-delegated operations (upscale, background removal, inpainting, image description) through a multi-provider routing system.