@reaatech/llm-cache

npm v0.1.0

A caching engine for LLM calls that provides both exact-match (SHA-256 hash) and semantic (cosine similarity on embeddings) cache lookups, with model-aware fingerprinting, use-case segmentation, and adaptive TTL. It exports a `CacheEngine` class that requires `StorageAdapter` and `VectorStorageAdapter` implementations (e.g., in-memory, Redis, DynamoDB) and an `Embedder` for semantic matching.

Status: Pre-1.0 — APIs may change in minor versions. Pin to a specific version in production.

Canonical caching engine for LLM calls — semantic and exact-match caching with embedding-based similarity matching, model-aware fingerprinting, use-case segmentation, and adaptive TTL.

Installation

terminal
npm install @reaatech/llm-cache
# or
pnpm add @reaatech/llm-cache

Feature Overview

  • Exact-match cache — SHA-256 hash of the full prompt for sub-millisecond cache hits
  • Semantic cache — Embed prompts and search for similar cached entries above a configurable cosine similarity threshold
  • Generation config fingerprinting — Model, temperature, top_p, system prompt, and tools are hashed so different configurations never collide
  • Use-case segmentation — Isolate caches by use case to prevent cross-contamination (e.g., summarization vs. classification)
  • Adaptive TTL — Different TTLs for factual, creative, analytical, and sensitive data
  • Cost-aware — Optional CostCalculatorLike integration for tracking savings per cache hit
  • Encryption-ready — Pluggable EncryptionService for encrypting prompts, responses, and embeddings at the storage layer
  • Zod-validated config — CacheConfigSchema validates the full configuration object at startup
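
The semantic cache bullet above describes matching on cosine similarity against a configurable threshold. A minimal sketch of that comparison follows, using toy 3-dimensional vectors in place of real embeddings. `cosineSimilarity` and `isSemanticHit` are hypothetical helpers written for this illustration, not exports of the package, and the default of 0.8 simply mirrors the Quick Start configuration.

```typescript
// Illustrative only: these helpers are not part of @reaatech/llm-cache.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// A cached entry counts as a semantic hit when it meets the threshold.
function isSemanticHit(query: number[], cached: number[], threshold = 0.8): boolean {
  return cosineSimilarity(query, cached) >= threshold;
}

const cachedEmbedding = [0.9, 0.1, 0.0];
const similarQuery = [0.8, 0.2, 0.1];
const unrelatedQuery = [0.0, 0.1, 0.9];

console.log(isSemanticHit(similarQuery, cachedEmbedding)); // true
console.log(isSemanticHit(unrelatedQuery, cachedEmbedding)); // false
```

Raising the threshold trades hit rate for precision: fewer paraphrases match, but the matches that remain are closer to the cached prompt.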

Quick Start

typescript
import { CacheEngine, InMemoryAdapter, OpenAIEmbedder } from "@reaatech/llm-cache";
 
const cache = new CacheEngine({
  storage: new InMemoryAdapter(),
  vectorStorage: new InMemoryAdapter(),
  embedder: new OpenAIEmbedder({
    provider: "openai",
    model: "text-embedding-3-small",
    dimensions: 1536,
    apiKey: process.env.OPENAI_API_KEY,
  }),
  config: {
    storage: { adapter: "memory" },
    vectorStorage: { adapter: "memory" },
    embedding: {
      provider: "openai",
      model: "text-embedding-3-small",
      dimensions: 1536,
      batchSize: 100,
      maxRetries: 3,
    },
    similarity: { threshold: 0.8, metric: "cosine", maxResults: 10 },
    ttl: {
      default: 3600,
      factual: 1800,
      creative: 7200,
      analytical: 3600,
      sensitive: 600,
      byUseCase: {},
    },
    segmentation: { enabled: true, defaultUseCase: "general" },
    cost: { enabled: true, currency: "USD" },
    observability: { metrics: true, tracing: false, logging: "info" },
  },
});
 
// Store a response
await cache.set(
  "What is TypeScript?",
  { answer: "A typed superset of JavaScript" },
  { model: "gpt-4", modelVersion: "gpt-4-0613" },
);
 
// Exact match — < 1ms
const exact = await cache.get("What is TypeScript?", {
  model: "gpt-4",
  modelVersion: "gpt-4-0613",
});
// → { hit: true, type: "exact", entry: {...} }
 
// Semantic match — uses embedding similarity
const semantic = await cache.get("Tell me about TypeScript", {
  model: "gpt-4",
  modelVersion: "gpt-4-0613",
});
// → { hit: true, type: "semantic", confidence: 0.92, entry: {...} }

API Reference

CacheEngine

The main caching orchestrator. Performs multi-stage lookup: exact match → semantic search → cache miss.

typescript
import { CacheEngine } from "@reaatech/llm-cache";
 
const engine = new CacheEngine({ storage, vectorStorage, embedder, config });
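
To make the exact → semantic → miss order concrete, here is a small synchronous sketch over an in-memory `Map`. It is not the package's implementation: `Entry`, `LookupResult`, `lookup`, and the hand-written 2-dimensional embeddings are all assumptions for illustration, and the brute-force scan stands in for whatever a real vector store does.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shapes for illustration; the real CacheEngine internals may differ.
type Entry = { prompt: string; response: string; embedding: number[] };
type LookupResult =
  | { hit: true; type: "exact" | "semantic"; entry: Entry; similarity?: number }
  | { hit: false; reason: "not_found" | "below_threshold" };

const sha256 = (text: string): string =>
  createHash("sha256").update(text, "utf8").digest("hex");

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

function lookup(
  prompt: string,
  queryEmbedding: number[],
  store: Map<string, Entry>,
  threshold: number,
): LookupResult {
  // Stage 1: exact match on the prompt hash.
  const exact = store.get(sha256(prompt));
  if (exact) return { hit: true, type: "exact", entry: exact };

  // Stage 2: brute-force semantic search over the stored embeddings.
  let best: { entry: Entry; similarity: number } | undefined;
  for (const entry of store.values()) {
    const similarity = cosine(queryEmbedding, entry.embedding);
    if (!best || similarity > best.similarity) best = { entry, similarity };
  }

  // Stage 3: report a miss if nothing is stored or nothing is close enough.
  if (!best) return { hit: false, reason: "not_found" };
  if (best.similarity < threshold) return { hit: false, reason: "below_threshold" };
  return { hit: true, type: "semantic", entry: best.entry, similarity: best.similarity };
}

const store = new Map<string, Entry>();
const cached: Entry = {
  prompt: "What is TypeScript?",
  response: "A typed superset of JavaScript",
  embedding: [1, 0],
};
store.set(sha256(cached.prompt), cached);

const exactResult = lookup("What is TypeScript?", [1, 0], store, 0.8);
const semanticResult = lookup("Tell me about TypeScript", [0.9, 0.1], store, 0.8);
const missResult = lookup("What is the capital of France?", [0, 1], store, 0.8);

console.log(exactResult.hit, semanticResult.hit, missResult.hit); // true true false
```

Checking the hash first means the embedding only matters when the cheap lookup fails, which is presumably why the engine orders its stages this way.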

CacheEngineDependencies

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| `storage` | `StorageAdapter` | Yes | Exact-match metadata store (e.g., InMemoryAdapter, RedisAdapter, DynamoDBAdapter) |
| `vectorStorage` | `VectorStorageAdapter` | Yes | Vector search store for semantic matching (e.g., InMemoryAdapter, QdrantAdapter) |
| `embedder` | `EmbeddingProvider` | Yes | Embedding generation (e.g., OpenAIEmbedder) |
| `config` | `CacheConfig` | Yes | Full cache configuration (Zod-validated) |
| `costCalculator` | `CostCalculatorLike` | No | Optional cost tracking integration |
| `encryptionService` | `EncryptionService` | No | Optional encryption for prompts/responses/embeddings |

Methods

| Method | Returns | Description |
| --- | --- | --- |
| `get(prompt, options?)` | `Promise<CacheResult>` | Look up a prompt: exact → semantic → miss |
| `set(prompt, response, options?, metadata?)` | `Promise<CacheEntry>` | Store a response and its embedding |
| `invalidate(criteria)` | `Promise<InvalidateResult>` | Delete entries matching criteria (useCase, modelVersion, olderThan, etc.) |
| `healthCheck()` | `Promise<{ storage: HealthStatus; vectorStorage: HealthStatus }>` | Check storage and vector backend health |

CacheOptions

| Property | Type | Description |
| --- | --- | --- |
| `useCase` | `string` | Cache segment namespace |
| `model` | `string` | Model identifier |
| `modelVersion` | `string` | Specific model version |
| `generationConfigHash` | `string` | Pre-computed fingerprint (auto-generated if omitted) |
| `temperature` | `number` | Sampling temperature (affects fingerprint) |
| `topP` | `number` | Nucleus sampling parameter (affects fingerprint) |
| `maxTokens` | `number` | Max completion tokens (affects fingerprint) |
| `systemPrompt` | `string` | System prompt (affects fingerprint) |
| `tools` | `unknown[]` | Tool definitions (affect fingerprint) |
| `responseFormat` | `"text" \| "json_object" \| "json_schema"` | Response format (affects fingerprint) |

CacheResult

A discriminated union returned by get():

typescript
type CacheResult =
  | { hit: true; type: "exact" | "semantic"; entry: CacheEntry; confidence?: number; similarity?: number; cachedAt: Date; age: number }
  | { hit: false; reason: "not_found" | "below_threshold" | "expired" | "dimension_mismatch" };
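
Because `hit` is the discriminant, checking it narrows the union so that only the fields of the matching branch are accessible. The sketch below shows one way a caller might consume the result. The `CacheEntry` shape here is a placeholder (its real fields are not listed in this README), and `describeResult` is a hypothetical helper.

```typescript
// CacheEntry's real fields are not documented here; this is a placeholder.
type CacheEntry = { response: unknown };

type CacheResult =
  | { hit: true; type: "exact" | "semantic"; entry: CacheEntry; confidence?: number; similarity?: number; cachedAt: Date; age: number }
  | { hit: false; reason: "not_found" | "below_threshold" | "expired" | "dimension_mismatch" };

function describeResult(result: CacheResult): string {
  if (!result.hit) {
    // Miss branch: only `reason` is available.
    return `miss (${result.reason})`;
  }
  // Hit branch: `entry`, `cachedAt`, and `age` are available.
  if (result.type === "semantic") {
    const score = result.similarity ?? result.confidence;
    return score === undefined
      ? "semantic hit"
      : `semantic hit (score ${score.toFixed(2)})`;
  }
  return "exact hit";
}

const semanticHit: CacheResult = {
  hit: true,
  type: "semantic",
  entry: { response: { answer: "A typed superset of JavaScript" } },
  similarity: 0.92,
  cachedAt: new Date(),
  age: 12,
};

console.log(describeResult(semanticHit)); // semantic hit (score 0.92)
console.log(describeResult({ hit: false, reason: "expired" })); // miss (expired)
```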

CacheMetadata

Pass to set() to control TTL, sensitivity, and token tracking:

| Property | Type | Description |
| --- | --- | --- |
| `queryType` | `"factual" \| "creative" \| "analytical"` | Determines TTL from config |
| `ttl` | `number` | Override TTL in seconds |
| `sensitive` | `boolean` | Mark entry for encryption and shorter TTL |
| `tokens` | `{ prompt: number; completion: number }` | Token usage for cost calculation |
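
The metadata fields above interact with the `ttl` section of the configuration. The README does not spell out the precedence, so the sketch below assumes one plausible order (explicit `ttl`, then `sensitive`, then a per-use-case override, then `queryType`, then `default`). `resolveTtl` and its input type are hypothetical, and the `classification: 900` override is invented for the example.

```typescript
type QueryType = "factual" | "creative" | "analytical";

interface TtlConfig {
  default: number;
  factual: number;
  creative: number;
  analytical: number;
  sensitive: number;
  byUseCase: Record<string, number>;
}

interface TtlInput {
  ttl?: number;
  sensitive?: boolean;
  queryType?: QueryType;
  useCase?: string;
}

// Assumed precedence (not confirmed by the README):
// explicit ttl > sensitive > per-use-case > query type > default.
function resolveTtl(config: TtlConfig, input: TtlInput): number {
  if (input.ttl !== undefined) return input.ttl;
  if (input.sensitive) return config.sensitive;
  const useCase = input.useCase;
  if (
    useCase !== undefined &&
    Object.prototype.hasOwnProperty.call(config.byUseCase, useCase)
  ) {
    return config.byUseCase[useCase];
  }
  if (input.queryType !== undefined) return config[input.queryType];
  return config.default;
}

// TTL values in seconds, taken from the Quick Start config plus one
// made-up per-use-case override.
const ttlConfig: TtlConfig = {
  default: 3600,
  factual: 1800,
  creative: 7200,
  analytical: 3600,
  sensitive: 600,
  byUseCase: { classification: 900 },
};

console.log(resolveTtl(ttlConfig, { queryType: "creative" })); // 7200
console.log(resolveTtl(ttlConfig, { sensitive: true, queryType: "creative" })); // 600
console.log(resolveTtl(ttlConfig, { ttl: 60, sensitive: true })); // 60
```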

InvalidationCriteria

| Property | Type | Description |
| --- | --- | --- |
| `useCase` | `string` | Invalidate all entries in a use case |
| `modelVersion` | `string` | Invalidate by model version |
| `generationConfigHash` | `string` | Invalidate by fingerprint |
| `embeddingModel` | `string` | Invalidate by embedding model |
| `olderThan` | `Date` | Invalidate entries created before this time |
| `promptHash` | `string` | Invalidate a specific prompt hash |
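
One way to picture how these criteria select entries is as a predicate over stored metadata. The sketch below assumes that multiple criteria combine with AND and that an empty criteria object matches nothing; neither behavior is confirmed by this README. `StoredEntry`, `matchesCriteria`, and `makeEntry` are hypothetical names.

```typescript
interface InvalidationCriteria {
  useCase?: string;
  modelVersion?: string;
  generationConfigHash?: string;
  embeddingModel?: string;
  olderThan?: Date;
  promptHash?: string;
}

// Hypothetical stored-entry shape; the real CacheEntry fields may differ.
interface StoredEntry {
  useCase: string;
  modelVersion: string;
  generationConfigHash: string;
  embeddingModel: string;
  createdAt: Date;
  promptHash: string;
}

// Assumptions: criteria combine with AND; empty criteria match nothing,
// so a stray `{}` cannot wipe the whole cache.
function matchesCriteria(entry: StoredEntry, criteria: InvalidationCriteria): boolean {
  const checks: boolean[] = [];
  if (criteria.useCase !== undefined) checks.push(entry.useCase === criteria.useCase);
  if (criteria.modelVersion !== undefined) checks.push(entry.modelVersion === criteria.modelVersion);
  if (criteria.generationConfigHash !== undefined) {
    checks.push(entry.generationConfigHash === criteria.generationConfigHash);
  }
  if (criteria.embeddingModel !== undefined) checks.push(entry.embeddingModel === criteria.embeddingModel);
  if (criteria.olderThan !== undefined) {
    checks.push(entry.createdAt.getTime() < criteria.olderThan.getTime());
  }
  if (criteria.promptHash !== undefined) checks.push(entry.promptHash === criteria.promptHash);
  return checks.length > 0 && checks.every(Boolean);
}

function makeEntry(useCase: string, modelVersion: string, created: string, promptHash: string): StoredEntry {
  return {
    useCase,
    modelVersion,
    generationConfigHash: "cfg-a",
    embeddingModel: "text-embedding-3-small",
    createdAt: new Date(created),
    promptHash,
  };
}

const entries: StoredEntry[] = [
  makeEntry("classification", "gpt-4-0613", "2024-01-01T00:00:00Z", "p1"),
  makeEntry("summarization", "gpt-4-0613", "2024-03-01T00:00:00Z", "p2"),
  makeEntry("summarization", "gpt-4-turbo", "2024-06-01T00:00:00Z", "p3"),
];

const survivors = entries.filter((e) => !matchesCriteria(e, { modelVersion: "gpt-4-0613" }));
console.log(survivors.length); // 1
```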

Adapters & Embedder

| Export | Description |
| --- | --- |
| `InMemoryAdapter` | In-memory storage/vector adapter with LRU eviction and TTL cleanup |
| `OpenAIEmbedder` | OpenAI embedding provider with batch processing and retry |
| `SimilarityMatcher` | Cosine similarity matcher with configurable threshold |
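
For readers unfamiliar with the terms, LRU eviction combined with TTL expiry can be sketched with a plain `Map`, which preserves insertion order. This is a generic illustration of the technique, not the code behind `InMemoryAdapter`; `LruTtlCache` and its method signatures are hypothetical, and the optional `now` parameter exists only to make the behavior easy to demonstrate.

```typescript
// Hypothetical sketch of LRU + TTL; not the package's InMemoryAdapter.
class LruTtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly maxEntries: number;

  constructor(maxEntries: number) {
    this.maxEntries = maxEntries;
  }

  get(key: string, now: number = Date.now()): V | undefined {
    const found = this.entries.get(key);
    if (!found) return undefined;
    if (found.expiresAt <= now) {
      this.entries.delete(key); // lazily drop expired entries on read
      return undefined;
    }
    // Re-insert so the key becomes the most recently used one.
    this.entries.delete(key);
    this.entries.set(key, found);
    return found.value;
  }

  set(key: string, value: V, ttlSeconds: number, now: number = Date.now()): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: now + ttlSeconds * 1000 });
    if (this.entries.size > this.maxEntries) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}

const lru = new LruTtlCache<string>(2);
lru.set("a", "first", 60, 0);
lru.set("b", "second", 60, 0);
lru.get("a", 1); // touch "a" so "b" becomes least recently used
lru.set("c", "third", 60, 2); // exceeds capacity, evicts "b"

console.log(lru.get("b", 3)); // undefined
console.log(lru.get("a", 3)); // first
```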

Config

| Export | Description |
| --- | --- |
| `CacheConfig` | TypeScript interface for the full configuration tree |
| `CacheConfigSchema` | Zod schema; use `safeParse` to validate at startup |

Utility Functions

| Export | Description |
| --- | --- |
| `buildPromptHash(prompt)` | SHA-256 hex hash of a prompt string |
| `buildCacheFingerprint(options)` | SHA-256 hash of the generation configuration |
| `buildExactMatchKey(options)` | Composite key: `promptHash:generationConfigHash` |
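
The idea behind these helpers can be reproduced with Node's built-in `crypto` module. The functions below are hypothetical re-implementations for illustration: the package's own versions may normalize prompts, choose fields, or serialize the configuration differently, so the hashes produced here should not be expected to match the library's.

```typescript
import { createHash } from "node:crypto";

// Hypothetical stand-ins; not the package's buildPromptHash and friends.
const sha256Hex = (input: string): string =>
  createHash("sha256").update(input, "utf8").digest("hex");

interface GenerationConfig {
  model: string;
  modelVersion?: string;
  temperature?: number;
  topP?: number;
  maxTokens?: number;
  systemPrompt?: string;
}

function promptHash(prompt: string): string {
  return sha256Hex(prompt);
}

function fingerprint(config: GenerationConfig): string {
  // Drop undefined values and sort keys so that property order and
  // omitted-vs-undefined fields do not change the hash.
  const pairs = (Object.keys(config) as (keyof GenerationConfig)[])
    .filter((key) => config[key] !== undefined)
    .sort()
    .map((key) => [key, config[key]]);
  return sha256Hex(JSON.stringify(pairs));
}

function exactMatchKey(prompt: string, config: GenerationConfig): string {
  return `${promptHash(prompt)}:${fingerprint(config)}`;
}

const keyA = exactMatchKey("What is TypeScript?", { model: "gpt-4", temperature: 0 });
const keyB = exactMatchKey("What is TypeScript?", { temperature: 0, model: "gpt-4" });
const keyC = exactMatchKey("What is TypeScript?", { model: "gpt-4", temperature: 0.7 });

console.log(keyA === keyB); // true: same config, different property order
console.log(keyA === keyC); // false: temperature differs
```

Keeping the prompt hash and the configuration hash as separate halves of the key means the same prompt under a different model or temperature lands in a different slot.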

Encryption

| Export | Description |
| --- | --- |
| `EncryptionService` | AES-256-GCM encryption for prompts, responses, and embeddings |
| `EncryptedPayload` | Type for the encrypted output (`{ ciphertext, iv, tag }`) |
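
For context on what an AES-256-GCM payload of this shape involves, here is a round trip using Node's built-in `crypto` module. It is a sketch rather than the package's `EncryptionService`: the `encrypt` and `decrypt` functions are hypothetical, and the base64 string encoding of each field is an assumption (the README names the fields but not their types).

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Field names follow the EncryptedPayload description above; the base64
// encoding is an assumption made for this sketch.
interface EncryptedPayload {
  ciphertext: string;
  iv: string;
  tag: string;
}

function encrypt(plaintext: string, key: Buffer): EncryptedPayload {
  const iv = randomBytes(12); // 96-bit nonce, the usual size for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return {
    ciphertext: ciphertext.toString("base64"),
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"), // available after final()
  };
}

function decrypt(payload: EncryptedPayload, key: Buffer): string {
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(payload.iv, "base64"));
  decipher.setAuthTag(Buffer.from(payload.tag, "base64"));
  const plaintext = Buffer.concat([
    decipher.update(Buffer.from(payload.ciphertext, "base64")),
    decipher.final(), // throws if the key, ciphertext, or tag is wrong
  ]);
  return plaintext.toString("utf8");
}

const secretKey = randomBytes(32); // AES-256 requires a 32-byte key
const sealed = encrypt("Patient: John Doe", secretKey);

console.log(decrypt(sealed, secretKey) === "Patient: John Doe"); // true
```

A fresh random `iv` per call matters: reusing a nonce with the same key undermines GCM's guarantees, which is why the payload carries the `iv` alongside the ciphertext and tag.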

Usage Patterns

Use Case Segmentation

typescript
// Each use case has an isolated cache namespace
await cache.set("classify: spam", { label: "spam" }, {
  model: "gpt-4",
  modelVersion: "gpt-4-0613",
  useCase: "classification",
});
 
// Same prompt in a different use case will miss
const result = await cache.get("classify: spam", {
  model: "gpt-4",
  modelVersion: "gpt-4-0613",
  useCase: "summarization",
});
// → { hit: false, reason: "not_found" }

Sensitive Data Handling

typescript
const entry = await cache.set(
  "Patient: John Doe, SSN: 123-45-6789",
  response,
  { model: "gpt-4", modelVersion: "gpt-4-0613" },
  { sensitive: true, ttl: 600 },
);
// Entry gets the config's `sensitive` TTL (600s default)
// Encryption is applied if encryptionService is configured

Model Rotation

typescript
// After upgrading from gpt-4 to gpt-4-turbo, invalidate old entries
const removed = await cache.invalidate({ modelVersion: "gpt-4-0613" });
console.log(`Cleared ${removed.total} old model entries`);
 
// New requests with gpt-4-turbo will generate fresh cache entries
const result = await cache.get(prompt, {
  model: "gpt-4",
  modelVersion: "gpt-4-turbo",
});

License

MIT
