@reaatech/media-pipeline-mcp-anthropic

npm v0.3.0

An Anthropic provider for the media pipeline framework that wraps Claude Sonnet's vision models to perform image description, OCR, table extraction, structured field extraction, and document summarization. It exports an `AnthropicProvider` class with an `execute()` method that accepts an operation name and parameters, and supports streaming token-by-token responses for all text-shaped operations.

Status: Pre-1.0 — APIs may change in minor versions. Pin to a specific version in production.

Installation

terminal
npm install @reaatech/media-pipeline-mcp-anthropic
# or
pnpm add @reaatech/media-pipeline-mcp-anthropic

Feature Overview

  • Vision-based image description at three detail levels (brief, detailed, structured)
  • Document OCR with plain text, structured JSON, or markdown output
  • Table extraction from documents in markdown or JSON formats
  • Structured field extraction with configurable JSON schema
  • Document summarization with adjustable length and style
  • Streaming support for all text-shaped operations (supportsStreaming)
  • Per-token cost tracking via usage.input_tokens / usage.output_tokens

Quick Start

typescript
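import { readFile } from "node:fs/promises";
import { AnthropicProvider } from "@reaatech/media-pipeline-mcp-anthropic";

const provider = new AnthropicProvider({ apiKey: process.env.ANTHROPIC_API_KEY! });

// Illustrative inputs: the file paths are placeholders, substitute your own.
const imageBuffer = await readFile("./photo.png");
const docBuffer = await readFile("./scan.png");
const invoiceBuffer = await readFile("./invoice.png");
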
// Describe an image
const description = await provider.execute({
  operation: "image.describe",
  params: { image_data: imageBuffer, detail_level: "detailed", mime_type: "image/png" },
  config: {},
});
 
// OCR a scanned document
const text = await provider.execute({
  operation: "document.ocr",
  params: { image_data: docBuffer, output_format: "markdown", mime_type: "image/png" },
  config: {},
});
 
// Extract structured fields from an invoice
const fields = await provider.execute({
  operation: "document.extract_fields",
  params: {
    image_data: invoiceBuffer,
    field_schema: { invoice_number: "string", date: "date", total_amount: "number", vendor_name: "string" },
    mime_type: "image/png",
  },
  config: {},
});

Supported Operations

| Operation | Default Model | Description | Output Options |
| --- | --- | --- | --- |
| image.describe | claude-sonnet-4-20250514 | Vision-based image analysis | brief / detailed / structured |
| document.ocr | claude-sonnet-4-20250514 | Text extraction from document images | plain_text / structured_json / markdown |
| document.extract_tables | claude-sonnet-4-20250514 | Table extraction with structural parsing | markdown / json |
| document.extract_fields | claude-sonnet-4-20250514 | Schema-driven field extraction | JSON matching provided schema |
| document.summarize | claude-sonnet-4-20250514 | Content summarization with style control | short / medium / long / detailed |

Configuration Parameters

image.describe

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| image_data | Buffer | required | Input image as raw buffer |
| detail_level | string | detailed | Description detail: brief, detailed, structured |
| mime_type | string | image/png | Image MIME type (image/png, image/jpeg, image/gif, image/webp) |

document.ocr

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| image_data | Buffer | required | Document image as raw buffer |
| output_format | string | plain_text | Output format: plain_text, structured_json, markdown |
| mime_type | string | image/png | Image MIME type |

document.extract_tables

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| image_data | Buffer | required | Document image as raw buffer |
| output_format | string | markdown | Output format: markdown, json |
| mime_type | string | image/png | Image MIME type |

document.extract_fields

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| image_data | Buffer | required | Document image as raw buffer |
| field_schema | Record<string, string> | required | JSON schema mapping field names to types (string, number, date, boolean) |
| mime_type | string | image/png | Image MIME type |
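
Because the model's output is only expected to match the schema, it can help to check the parsed result before using it. The following is a minimal sketch, not part of the package: `FieldSchema`, `matchesType`, and `findInvalidFields` are hypothetical names, and it assumes the extraction result has already been parsed into a plain object and that dates arrive as parseable strings.

```typescript
// Hypothetical helper, not part of the package. It checks a parsed extraction
// result against a field_schema of the shape shown in the table above.
type FieldType = "string" | "number" | "date" | "boolean";
type FieldSchema = Record<string, FieldType>;

function matchesType(value: unknown, type: FieldType): boolean {
  switch (type) {
    case "string":
      return typeof value === "string";
    case "number":
      return typeof value === "number" && isFinite(value);
    case "boolean":
      return typeof value === "boolean";
    case "date":
      // Assumption: dates arrive as strings that Date.parse can read.
      return typeof value === "string" && !isNaN(Date.parse(value));
  }
}

// Returns the names of fields that are missing or have the wrong type.
function findInvalidFields(schema: FieldSchema, data: Record<string, unknown>): string[] {
  return Object.keys(schema).filter((name) => !matchesType(data[name], schema[name]));
}

const schema: FieldSchema = {
  invoice_number: "string",
  date: "date",
  total_amount: "number",
  vendor_name: "string",
};

const invalid = findInvalidFields(schema, {
  invoice_number: "INV-001",
  date: "2024-05-01",
  total_amount: "129.50", // wrong type: a string instead of a number
});
console.log(invalid); // [ 'total_amount', 'vendor_name' ]
```

Fields that fail the check can be retried, flagged for review, or dropped, depending on the pipeline.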

document.summarize

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| content | string |  | Plain text content to summarize (used if no image_data) |
| image_data | Buffer |  | Document image as raw buffer (optional, for image-based docs) |
| length | string | medium | Summary length: short (1-2 sentences), medium (1 paragraph), long (2-3 paragraphs), detailed |
| style | string | neutral | Writing style |
| mime_type | string | image/png | Image MIME type (when using image_data) |
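
The defaults in the table above can be made explicit on the caller's side. This is a sketch under stated assumptions: `SummarizeParams` and `resolveSummarizeParams` are hypothetical names (the package may export its own types), and the rule that at least one of content or image_data must be present is inferred from the parameter descriptions rather than confirmed.

```typescript
// Hypothetical types: the package may export its own equivalents.
interface SummarizeParams {
  content?: string;
  image_data?: Uint8Array; // a Node.js Buffer is a Uint8Array
  length?: "short" | "medium" | "long" | "detailed";
  style?: string;
  mime_type?: string;
}

// Applies the documented defaults and rejects input with nothing to summarize.
function resolveSummarizeParams(params: SummarizeParams): SummarizeParams {
  // Assumption: at least one of content or image_data is needed.
  if (params.content === undefined && params.image_data === undefined) {
    throw new Error("document.summarize needs either content or image_data");
  }
  const resolved: SummarizeParams = {
    ...params,
    length: params.length ?? "medium",
    style: params.style ?? "neutral",
  };
  // mime_type only matters when an image is supplied.
  if (params.image_data !== undefined) {
    resolved.mime_type = params.mime_type ?? "image/png";
  }
  return resolved;
}

// Same { operation, params, config } shape as the Quick Start calls.
const input = {
  operation: "document.summarize",
  params: resolveSummarizeParams({ content: "Quarterly revenue rose 12%.", length: "short" }),
  config: {},
};
console.log(input.params.length, input.params.style); // short neutral
```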

API Reference

AnthropicProvider

typescript
class AnthropicProvider extends MediaProvider {
  constructor(config: AnthropicProviderConfig)
 
  healthCheck(): Promise<ProviderHealth>
  estimateCost(input: ProviderInput): Promise<CostEstimate>
  execute(input: ProviderInput): Promise<ProviderOutput>
}

AnthropicProviderConfig

typescript
interface AnthropicProviderConfig {
  apiKey: string;       // Anthropic API key
  model?: string;       // Default: "claude-sonnet-4-20250514"
  maxTokens?: number;   // Default: 4096
  timeout?: number;     // Request timeout in ms
}

Factory Function

typescript
import { defineAnthropicProvider } from "@reaatech/media-pipeline-mcp-anthropic";
 
const provider = defineAnthropicProvider({ apiKey: process.env.ANTHROPIC_API_KEY! });

Key Methods

| Method | Returns | Description |
| --- | --- | --- |
| healthCheck() | ProviderHealth | Validates API connectivity by creating a minimal message |
| estimateCost(input) | CostEstimate | Estimates cost based on operation, model, and estimated token usage |
| execute(input) | ProviderOutput | Runs the requested operation and returns output with metadata |

Non-Retryable Errors

The provider classifies these errors as non-retryable: authentication failed, invalid API key, permission denied, insufficient credits, content filtering, policy violation.
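
A caller that adds its own retry loop may want to honor the same classification. The sketch below is illustrative only: `isNonRetryable` and `withRetry` are hypothetical helpers, and matching the phrases as case-insensitive substrings of the error message is an assumption about how such errors surface, not a description of the provider's internals.

```typescript
// The phrases come from the list above; substring matching is an assumption.
const NON_RETRYABLE_PHRASES = [
  "authentication failed",
  "invalid api key",
  "permission denied",
  "insufficient credits",
  "content filtering",
  "policy violation",
];

function isNonRetryable(error: unknown): boolean {
  const message = (error instanceof Error ? error.message : String(error)).toLowerCase();
  return NON_RETRYABLE_PHRASES.some((phrase) => message.indexOf(phrase) !== -1);
}

// Hypothetical wrapper: retries `task` with exponential backoff, but rethrows
// immediately when the error is classified as non-retryable.
async function withRetry<T>(
  task: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 250,
): Promise<T> {
  let attempt = 0;
  for (;;) {
    attempt += 1;
    try {
      return await task();
    } catch (error) {
      if (isNonRetryable(error) || attempt >= maxAttempts) throw error;
      // Wait 250 ms, 500 ms, 1000 ms, ... before the next attempt.
      const delayMs = baseDelayMs * Math.pow(2, attempt - 1);
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

With the Quick Start provider, a call would look like `withRetry(() => provider.execute({ ... }))`.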

Cost Estimation

Token Pricing (per 1M tokens)

| Model | Input | Output |
| --- | --- | --- |
| claude-sonnet-4-20250514 | $3.00 | $15.00 |
| claude-3-5-sonnet-20241022 | $3.00 | $15.00 |

Per-Operation Estimates

| Operation | Est. Input Tokens | Est. Output Tokens | Est. Cost |
| --- | --- | --- | --- |
| image.describe | 1,200 | 300 | ~$0.0081 |
| document.ocr | 800 | 300 | ~$0.0069 |
| document.extract_tables | 800 | 300 | ~$0.0069 |
| document.extract_fields | 800 | 300 | ~$0.0069 |
| document.summarize | 800 | 300 | ~$0.0069 |

Actual cost varies with token usage and model selection. Costs are computed from usage.input_tokens and usage.output_tokens returned by the API.
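
The per-operation figures follow directly from the pricing table: cost = (input tokens × input price + output tokens × output price) / 1,000,000. A minimal sketch of that arithmetic, where `tokenCostUsd` is a hypothetical helper and not the package's `estimateCost()`:

```typescript
// USD per one million tokens, copied from the pricing table above.
const PRICING: Record<string, { input: number; output: number }> = {
  "claude-sonnet-4-20250514": { input: 3.0, output: 15.0 },
  "claude-3-5-sonnet-20241022": { input: 3.0, output: 15.0 },
};

// Hypothetical helper: converts token counts (for example usage.input_tokens
// and usage.output_tokens from a response) into a USD amount.
function tokenCostUsd(model: string, inputTokens: number, outputTokens: number): number {
  const price = PRICING[model];
  if (price === undefined) {
    throw new Error("No pricing known for model: " + model);
  }
  return (inputTokens * price.input + outputTokens * price.output) / 1000000;
}

// image.describe estimate: 1,200 input + 300 output tokens
console.log(tokenCostUsd("claude-sonnet-4-20250514", 1200, 300).toFixed(4)); // 0.0081
// document.ocr estimate: 800 input + 300 output tokens
console.log(tokenCostUsd("claude-sonnet-4-20250514", 800, 300).toFixed(4)); // 0.0069
```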

Cache Configuration

The provider exposes a static cacheConfig that separates deterministic parameters, which contribute to the cache key, from non-deterministic parameters, which are dropped before the key is computed.

Deterministic parameters: prompt, model, system, max_tokens, temperature, top_p, top_k, stop_sequences, image_data, image_url, document_data

Non-deterministic parameters: metadata, user_id

The normalize() function trims whitespace, collapses spaces, and drops non-deterministic fields so that equivalent requests produce matching cache keys. Image bytes are not hashed separately; the image content itself forms part of the deterministic key set.
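
The normalization described above can be approximated as follows. This is an illustrative re-implementation, not the package's `normalize()`: the names `normalizeValue` and `cacheKey` are hypothetical, and sorting keys and encoding binary data as hex are assumptions added to make the sketch deterministic.

```typescript
// Parameter names taken from the non-deterministic list above.
const NON_DETERMINISTIC = ["metadata", "user_id"];

// Hypothetical stand-in for the package's normalize(); details may differ.
function normalizeValue(value: unknown): unknown {
  if (typeof value === "string") {
    // Trim, then collapse runs of spaces to a single space.
    return value.trim().replace(/ +/g, " ");
  }
  if (value instanceof Uint8Array) {
    // Assumption: binary payloads (image bytes) enter the key as hex text.
    let hex = "";
    for (let i = 0; i < value.length; i++) {
      hex += ("0" + value[i].toString(16)).slice(-2);
    }
    return hex;
  }
  return value;
}

function cacheKey(params: Record<string, unknown>): string {
  const normalized: Record<string, unknown> = {};
  // Assumption: keys are sorted so property order does not change the key.
  for (const name of Object.keys(params).sort()) {
    if (NON_DETERMINISTIC.indexOf(name) !== -1) continue;
    normalized[name] = normalizeValue(params[name]);
  }
  return JSON.stringify(normalized);
}

const a = cacheKey({ prompt: "  Describe   this image ", model: "claude-sonnet-4-20250514", user_id: "u1" });
const b = cacheKey({ model: "claude-sonnet-4-20250514", prompt: "Describe this image", user_id: "u2" });
console.log(a === b); // true
```

The two requests differ only in spacing, property order, and user_id, so they map to the same key.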

Health Check

The health check sends a lightweight message creation request (max_tokens: 10) to the Anthropic API to verify connectivity and API key validity. It returns { healthy: true, latency: <ms> } on success or { healthy: false, error: "<message>" } on failure.
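
The success and failure shapes above can be modeled as a discriminated union. In this sketch, `HealthResult` and `checkHealth` are hypothetical names, and the injected `ping` function stands in for the minimal request the provider sends; the package's own ProviderHealth type may carry additional fields.

```typescript
// Result shape as described above: latency in milliseconds on success,
// an error message on failure.
type HealthResult =
  | { healthy: true; latency: number }
  | { healthy: false; error: string };

// Hypothetical helper: `ping` stands in for the minimal message-creation
// request (max_tokens: 10) sent to the Anthropic API.
async function checkHealth(ping: () => Promise<unknown>): Promise<HealthResult> {
  const startedAt = Date.now();
  try {
    await ping();
    return { healthy: true, latency: Date.now() - startedAt };
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    return { healthy: false, error: message };
  }
}

// A failing ping is reported as a value instead of a thrown error.
checkHealth(async () => {
  throw new Error("invalid API key");
}).then((result) => console.log(result)); // { healthy: false, error: 'invalid API key' }
```

Returning failures as values lets a pipeline poll several providers without wrapping each check in try/catch.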

License

MIT