These packages give you a semantic caching layer for LLM calls that returns cached responses for both exact prompt matches and semantically similar prompts above a configurable cosine similarity threshold. You'd adopt them to reduce API costs and latency by avoiding redundant LLM calls, especially when users ask the same question in different phrasings. The system is built as a modular engine with pluggable storage adapters (Redis, DynamoDB, Qdrant) and optional cost tracking, observability, and HTTP server packages that compose together through well-defined interfaces rather than a monolithic service.
A caching engine for LLM calls that provides both exact-match (SHA-256 hash) and semantic (cosine similarity on embeddings) cache lookups, with model-aware fingerprinting, use-case segmentation, and adaptive TTL. It exports a `CacheEngine` class that requires a `StorageAdapter` implementation (e.g., in-memory, Redis, DynamoDB), a `VectorStorageAdapter` implementation (e.g., in-memory, Qdrant), and an `Embedder` for semantic matching.
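The two lookup paths can be illustrated with a minimal sketch. It is not the `CacheEngine` implementation: the `Entry` shape and the `exactKey`, `cosineSimilarity`, and `lookup` names are hypothetical, and a linear scan over a `Map` stands in for a real vector store.

```typescript
import { createHash } from "node:crypto";

// Hypothetical entry shape; the real CacheEngine types may differ.
interface Entry {
  model: string;
  response: string;
  embedding: number[];
}

// Exact-match key: SHA-256 over model name and prompt, so the same prompt
// sent to a different model produces a different key.
function exactKey(model: string, prompt: string): string {
  return createHash("sha256").update(`${model}\n${prompt}`).digest("hex");
}

// Cosine similarity of two equal-length vectors; 0 if either is a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Try the exact key first, then fall back to the most similar entry for the
// same model, accepting it only if it meets the threshold.
function lookup(
  store: Map<string, Entry>,
  model: string,
  prompt: string,
  promptEmbedding: number[],
  threshold: number,
): { hit: "exact" | "semantic" | "miss"; response?: string } {
  const direct = store.get(exactKey(model, prompt));
  if (direct) return { hit: "exact", response: direct.response };

  let best: Entry | undefined;
  let bestScore = -Infinity;
  for (const entry of Array.from(store.values())) {
    if (entry.model !== model) continue;
    const score = cosineSimilarity(promptEmbedding, entry.embedding);
    if (score > bestScore) {
      best = entry;
      bestScore = score;
    }
  }
  if (best && bestScore >= threshold) {
    return { hit: "semantic", response: best.response };
  }
  return { hit: "miss" };
}
```

Checking the exact key first keeps the common repeat-prompt case to a single hash lookup, and the similarity search only runs on a miss.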
A DynamoDB storage adapter for `@reaatech/llm-cache` that persists exact-match cache entries with native TTL, GSI-backed metadata queries, and batch operations chunked to AWS limits. Exports a `DynamoDBAdapter` class implementing the `StorageAdapter` interface from `@reaatech/llm-cache`.
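As an illustration of chunking to AWS limits, the sketch below splits a list into batches sized for the DynamoDB batch APIs (`BatchWriteItem` accepts at most 25 put or delete requests per call, `BatchGetItem` at most 100 keys). The `chunk` and `writeInBatches` helpers are hypothetical names, not the `DynamoDBAdapter` API, and `send` stands in for a single batch call.

```typescript
// Per-call limits of the DynamoDB batch APIs.
const BATCH_WRITE_LIMIT = 25;
const BATCH_GET_LIMIT = 100;

// Split items into consecutive slices of at most `size` elements.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Hypothetical helper: sends each chunk through `send`, which represents one
// BatchWriteItem call. A production version would also retry any
// UnprocessedItems returned by the service.
async function writeInBatches<T>(
  items: readonly T[],
  send: (batch: T[]) => Promise<void>,
): Promise<number> {
  let calls = 0;
  for (const batch of chunk(items, BATCH_WRITE_LIMIT)) {
    await send(batch);
    calls++;
  }
  return calls;
}
```

Writing 60 items this way would issue three calls, with batches of 25, 25, and 10.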
A Qdrant vector database adapter for `@reaatech/llm-cache` that implements the `VectorStorageAdapter` interface, providing HNSW approximate nearest neighbor search with metadata filtering and deterministic UUID-based point IDs.
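Qdrant point IDs must be unsigned integers or UUIDs, so an arbitrary cache key has to be mapped to one of those forms. One way to get a deterministic UUID is to hash the key and format the first 128 bits as a UUID. The sketch below does this with SHA-256 and the version 8 (custom) layout from RFC 9562. This is an assumption for illustration: the adapter's actual derivation is not described here, and `pointIdFor` is a hypothetical name.

```typescript
import { createHash } from "node:crypto";

// Derive a stable UUID string from a cache key. The same key always yields
// the same ID, so re-inserting an entry overwrites its point instead of
// creating a duplicate.
function pointIdFor(cacheKey: string): string {
  // First 32 hex characters = first 128 bits of the SHA-256 digest.
  const hex = createHash("sha256").update(cacheKey).digest("hex").slice(0, 32);

  // Force the RFC variant bits (binary 10xx) on the 17th hex digit.
  const variant = ((parseInt(hex.charAt(16), 16) & 0x3) | 0x8).toString(16);

  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    "8" + hex.slice(13, 16), // version nibble set to 8 (custom UUID)
    variant + hex.slice(17, 20),
    hex.slice(20, 32),
  ].join("-");
}
```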
A Redis storage adapter for the `@reaatech/llm-cache` library that implements the `StorageAdapter` interface, providing exact-match cache operations with automatic TTL via `SETEX`, batch operations, and metadata queries using `SCAN`.
A cost calculator and pricing database for LLM API usage, providing a `CostCalculator` class that computes per-request costs from token counts and model pricing, and tracks savings from cache hits. It ships with reference pricing for 40+ models across OpenAI, Anthropic, and Google, and implements the `CostCalculatorLike` interface for drop-in integration with `@reaatech/llm-cache`.
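The per-request arithmetic can be sketched as follows. The pricing table, the `requestCost` function, and the `SavingsTracker` class are hypothetical and do not reflect the `CostCalculator` API. The prices are placeholders, not real model pricing.

```typescript
// Prices in USD per one million tokens. Placeholder values only.
interface ModelPricing {
  inputPerMTok: number;
  outputPerMTok: number;
}

const PRICING: Record<string, ModelPricing> = {
  "example-model": { inputPerMTok: 2, outputPerMTok: 8 },
};

// Cost of one request: tokens times the per-million-token rate, for input
// and output separately.
function requestCost(model: string, inputTokens: number, outputTokens: number): number {
  const pricing = PRICING[model];
  if (!pricing) throw new Error(`no pricing for model: ${model}`);
  return (inputTokens * pricing.inputPerMTok + outputTokens * pricing.outputPerMTok) / 1e6;
}

// Each cache hit saves what the avoided request would have cost.
class SavingsTracker {
  private savedUsd = 0;
  private hits = 0;

  recordHit(model: string, inputTokens: number, outputTokens: number): void {
    this.savedUsd += requestCost(model, inputTokens, outputTokens);
    this.hits++;
  }

  summary(): { hits: number; savedUsd: number } {
    return { hits: this.hits, savedUsd: this.savedUsd };
  }
}
```

With the placeholder rates above, a request with 1,000 input tokens and 500 output tokens costs (1,000 × 2 + 500 × 8) / 1,000,000 = 0.006 USD, and every cache hit for that request adds the same amount to the tracked savings.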
A structured JSON logger and Prometheus-compatible metrics collector for LLM cache operations, providing automatic PII redaction on 17 sensitive field names, correlation ID propagation via `child()`, and cardinality-protected counters and histograms with zero runtime dependencies.
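Field-name redaction and `child()` propagation can be combined as in the sketch below. It is a minimal illustration, not the package's logger: the `Logger` class shape, the `redact` helper, and the four field names in `SENSITIVE` are assumptions (the package's own list of 17 names is not reproduced here).

```typescript
// Hypothetical subset of sensitive field names, compared case-insensitively.
const SENSITIVE = new Set(["password", "apikey", "authorization", "token"]);

type Fields = Record<string, unknown>;

// Recursively replace the value of any sensitive field with a marker.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === "object") {
    const source = value as Fields;
    const out: Fields = {};
    for (const key of Object.keys(source)) {
      out[key] = SENSITIVE.has(key.toLowerCase()) ? "[REDACTED]" : redact(source[key]);
    }
    return out;
  }
  return value;
}

class Logger {
  private readonly bound: Fields;
  private readonly sink: (line: string) => void;

  constructor(bound: Fields = {}, sink: (line: string) => void = console.log) {
    this.bound = bound;
    this.sink = sink;
  }

  // A child logger carries the parent's fields plus its own, which is how a
  // correlation ID reaches every line logged for one request.
  child(fields: Fields): Logger {
    return new Logger({ ...this.bound, ...fields }, this.sink);
  }

  info(message: string, fields: Fields = {}): void {
    const record = { level: "info", message, ...this.bound, ...fields };
    this.sink(JSON.stringify(redact(record)));
  }
}
```

A request handler would typically create one child per request, for example `logger.child({ correlationId })`, and pass it down so that every log line can be joined on that ID.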
An HTTP server wrapper for `@reaatech/llm-cache` that exposes a REST API for cache operations, Prometheus metrics, and health endpoints. Storage backends (memory, Redis, DynamoDB) and vector search backends (memory, Qdrant) are selected through environment variables. Exports `createApp()`, which returns an `App` object with an `http.Server`, a cache engine instance, and a `shutdown()` method, plus a `main()` convenience function for direct CLI or programmatic use.