@reaatech/llm-cache-server

Provides a REST API server for managing LLM cache operations, including semantic search and exact-match lookups. It exposes a configurable HTTP interface that supports Redis, DynamoDB, and Qdrant backends via environment variables.

Status: Pre-1.0 — APIs may change in minor versions. Pin to a specific version in production.

HTTP server wrapper for llm-cache providing a REST API for cache operations, Prometheus metrics, and health endpoints. Supports multiple storage and vector adapter backends via environment variables — deploy as a sidecar or centralized caching service.

Installation

terminal
npm install @reaatech/llm-cache-server
# or
pnpm add @reaatech/llm-cache-server

Feature Overview

  • REST API: JSON endpoints for get, set, and invalidate cache operations
  • Pluggable storage: switch between memory, redis, and dynamodb via STORAGE_ADAPTER
  • Pluggable vector search: switch between memory and qdrant via VECTOR_STORAGE_ADAPTER
  • API key authentication: Bearer token auth via LLM_CACHE_API_KEY (constant-time comparison)
  • Prometheus metrics: GET /metrics returns Prometheus text exposition format
  • Health probes: GET /health (liveness) and GET /ready (readiness with storage checks)
  • Correlation ID: every response carries X-Correlation-Id for distributed tracing
  • Configurable body limit: MAX_BODY_BYTES caps incoming request size

Quick Start

CLI

terminal
export LLM_CACHE_API_KEY=my-secret-key
export OPENAI_API_KEY=sk-...
export STORAGE_ADAPTER=redis
export REDIS_URL=redis://localhost:6379
export VECTOR_STORAGE_ADAPTER=qdrant
export QDRANT_URL=http://localhost:6333
 
npx @reaatech/llm-cache-server
# → llm-cache server listening on port 3000

Docker

terminal
docker compose up

Programmatic

typescript
import { createApp, main } from "@reaatech/llm-cache-server";
 
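// Option A: start the default server (configured port, SIGTERM/SIGINT handlers)
// main().catch(console.error);
 
// Option B: create the app and customize it. Use this instead of Option A;
// running both would try to start two servers.
const app = await createApp();
app.server.listen(3000, () => console.log("Listening on :3000"));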
 
// Graceful shutdown
process.on("SIGTERM", () => app.shutdown().then(() => process.exit(0)));

API Reference

createApp(): Promise<App>

Creates a fully configured HTTP server with cache engine, storage adapters, and embedder. Configuration is loaded from environment variables via loadConfig().

typescript
import { createApp } from "@reaatech/llm-cache-server";
 
const app = await createApp();

App

| Property | Type | Description |
| --- | --- | --- |
| server | http.Server | Node.js HTTP server |
| cache | CacheEngine | The configured cache engine instance |
| shutdown | () => Promise<void> | Graceful shutdown: closes server and storage connections |

main(): Promise<void>

Convenience function that calls createApp(), starts listening on the configured port, and registers SIGTERM/SIGINT handlers for graceful shutdown.

loadConfig(): ServerConfig

Loads and validates configuration from environment variables. Returns the full ServerConfig object.

typescript
import { loadConfig } from "@reaatech/llm-cache-server";
 
const config = loadConfig();
// → { port: 3000, storageAdapter: "redis", vectorStorageAdapter: "qdrant", ... }

ServerConfig

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| port | number | 3000 | HTTP server port |
| storageAdapter | "memory" \| "redis" \| "dynamodb" | "memory" | Exact-match storage backend |
| vectorStorageAdapter | "memory" \| "qdrant" | "memory" | Semantic search backend |
| redisUrl | string | | Redis connection URL |
| dynamodbRegion | string | | AWS region for DynamoDB |
| dynamodbTable | string | | DynamoDB table name |
| dynamodbEndpoint | string | | DynamoDB endpoint override |
| qdrantUrl | string | | Qdrant server URL |
| qdrantCollection | string | | Qdrant collection name |
| qdrantApiKey | string | | Qdrant API key |
| openaiApiKey | string | (required) | OpenAI API key for embeddings |
| openaiOrganization | string | | OpenAI organization ID |
| apiKey | string | | Bearer token for server authentication |
| maxBodyBytes | number | 1048576 | Max request body size in bytes |
| cacheConfig | CacheConfig | (see env vars) | Full cache configuration object |

REST Endpoints

| Method | Path | Auth | Description |
| --- | --- | --- | --- |
| GET | /health | No | Liveness probe: always returns 200 |
| GET | /ready | No | Readiness probe: checks storage and vector backend health |
| POST | /cache/get | Yes | Lookup a prompt; returns CacheResult |
| POST | /cache/set | Yes | Store a response; returns { id, cached } |
| POST | /cache/invalidate | Yes | Invalidate cache entries by criteria |
| GET | /metrics | Yes | Prometheus text or JSON metrics snapshot |
| GET | /stats | Yes | Storage and vector adapter stats |
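
For consumers of GET /metrics, here is a minimal sketch of reading the Prometheus text exposition format in TypeScript. It assumes the endpoint is returning the text form. The helper `parseMetrics` is hypothetical and is not part of this package, and the sketch does not handle escaped quotes or braces inside label values.

```typescript
// One parsed sample line from the Prometheus text exposition format.
interface Sample {
  name: string;
  labels: string; // raw label block such as {type="exact"}, or "" when absent
  value: number;
}

// Prometheus writes infinities as +Inf / -Inf; everything else goes through Number().
function toNumber(token: string): number {
  if (token === "+Inf") return Infinity;
  if (token === "-Inf") return -Infinity;
  return Number(token);
}

// Hypothetical helper: extracts samples and skips comments (# HELP / # TYPE) and blank lines.
function parseMetrics(text: string): Sample[] {
  const samples: Sample[] = [];
  // name, optional {labels}, value, optional integer timestamp
  const pattern = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)(?:\s+-?\d+)?$/;
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const match = pattern.exec(line);
    if (!match) continue; // ignore lines this sketch does not understand
    samples.push({ name: match[1], labels: match[2] ?? "", value: toNumber(match[3]) });
  }
  return samples;
}
```

The parsed samples can then be filtered by name or label block; the metric names the server actually exports are not listed in this README.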

POST /cache/get

json
{
  "prompt": "What is TypeScript?",
  "options": {
    "model": "gpt-4",
    "modelVersion": "gpt-4-0613",
    "useCase": "qa"
  }
}

Response (hit):

json
{ "hit": true, "type": "exact", "entry": { /* CacheEntry */ }, "confidence": 1.0 }
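
A client-side lookup could be sketched as follows. The `CacheResult` fields mirror the hit response shown above; the shape of a miss is not documented here, so only `hit` is treated as required. `buildGetRequest`, `cacheGet`, and `FetchLike` are hypothetical names for illustration, not exports of this package.

```typescript
// Response shape as shown in this README; `entry` is left opaque.
interface CacheResult {
  hit: boolean;
  type?: string;
  entry?: unknown;
  confidence?: number;
}

interface LookupOptions {
  model?: string;
  modelVersion?: string;
  useCase?: string;
}

// Minimal structural type so the sketch does not depend on DOM typings.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Builds the HTTP request for POST /cache/get (pure, so it is easy to unit test).
function buildGetRequest(baseUrl: string, apiKey: string, prompt: string, options?: LookupOptions) {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/cache/get`,
    init: {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(options ? { prompt, options } : { prompt }),
    },
  };
}

// Sends the lookup and returns the parsed JSON body.
async function cacheGet(
  fetchImpl: FetchLike,
  baseUrl: string,
  apiKey: string,
  prompt: string,
  options?: LookupOptions,
): Promise<CacheResult> {
  const { url, init } = buildGetRequest(baseUrl, apiKey, prompt, options);
  const res = await fetchImpl(url, init);
  if (!res.ok) {
    throw new Error(`cache lookup failed with HTTP ${res.status}`);
  }
  return (await res.json()) as CacheResult;
}
```

On Node 18 or later, the global `fetch` can be passed as `fetchImpl`, for example `cacheGet(fetch, "http://localhost:3000", key, "What is TypeScript?")`. Injecting it keeps the function testable without a running server.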

POST /cache/set

json
{
  "prompt": "What is TypeScript?",
  "response": { "choices": [{ "message": { "content": "A typed superset of JavaScript" } }] },
  "options": { "model": "gpt-4", "modelVersion": "gpt-4-0613" },
  "metadata": { "queryType": "factual", "tokens": { "prompt": 10, "completion": 20 } }
}
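
Because MAX_BODY_BYTES caps the request size, a client may want to check a /cache/set payload before sending it. The sketch below assumes the limit applies to the UTF-8 encoded JSON body, which this README does not confirm. `serializeSetPayload` is a hypothetical helper; the default mirrors the documented 1048576 bytes.

```typescript
import { Buffer } from "node:buffer";

// Default taken from the MAX_BODY_BYTES entry in this README.
const DEFAULT_MAX_BODY_BYTES = 1048576;

// Payload fields as shown in the /cache/set example above.
interface SetPayload {
  prompt: string;
  response: unknown;
  options?: { model?: string; modelVersion?: string; useCase?: string };
  metadata?: Record<string, unknown>;
}

// Hypothetical client-side helper: serializes the payload and rejects it
// locally when the encoded JSON exceeds the byte limit.
function serializeSetPayload(payload: SetPayload, maxBodyBytes: number = DEFAULT_MAX_BODY_BYTES): string {
  const body = JSON.stringify(payload);
  const size = Buffer.byteLength(body, "utf8");
  if (size > maxBodyBytes) {
    throw new RangeError(`payload is ${size} bytes, limit is ${maxBodyBytes}`);
  }
  return body;
}
```

Checking locally avoids a round trip for payloads the server would refuse; how the server responds to an oversized body is not described in this README.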

POST /cache/invalidate

json
{
  "criteria": { "useCase": "qa", "modelVersion": "gpt-4-0613" }
}

Response:

json
{ "total": 42, "storage": 42, "vectorStorage": 0 }

Environment Variables

The server reads the following environment variables. See .env.example for the complete annotated reference.

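| Variable | Required | Default | Adapters |
| --- | --- | --- | --- |
| PORT | No | 3000 | All |
| LLM_CACHE_API_KEY | No | | All (enables auth) |
| MAX_BODY_BYTES | No | 1048576 | All |
| OPENAI_API_KEY | Yes | | All |
| OPENAI_ORGANIZATION | No | | All |
| STORAGE_ADAPTER | No | memory | redis, dynamodb |
| REDIS_URL | Conditional | | Redis |
| DYNAMODB_REGION | Conditional | | DynamoDB |
| DYNAMODB_TABLE | Conditional | | DynamoDB |
| DYNAMODB_ENDPOINT | No | | DynamoDB |
| VECTOR_STORAGE_ADAPTER | No | memory | qdrant |
| QDRANT_URL | Conditional | | Qdrant |
| QDRANT_COLLECTION | No | llm-cache | Qdrant |
| QDRANT_API_KEY | No | | Qdrant |
| SIMILARITY_THRESHOLD | No | 0.8 | All |
| SIMILARITY_MAX_RESULTS | No | 10 | All |
| TTL_DEFAULT | No | 3600 | All |
| TTL_FACTUAL | No | 1800 | All |
| TTL_CREATIVE | No | 7200 | All |
| TTL_ANALYTICAL | No | 3600 | All |
| TTL_SENSITIVE | No | 600 | All |
| LOG_LEVEL | No | info | All |
| METRICS_ENABLED | No | true | All |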
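
A pre-flight check of these variables might look like the sketch below. It assumes that "Conditional" means "required when the matching adapter is selected", which the table implies but does not state. `checkEnv` is a hypothetical helper and does not reproduce the validation that loadConfig() performs.

```typescript
type Env = Record<string, string | undefined>;

// Returns human-readable problems; an empty list means the environment
// looks complete for the selected adapters.
function checkEnv(env: Env): string[] {
  const problems: string[] = [];
  const storage = env["STORAGE_ADAPTER"] ?? "memory";
  const vector = env["VECTOR_STORAGE_ADAPTER"] ?? "memory";

  if (!env["OPENAI_API_KEY"]) {
    problems.push("OPENAI_API_KEY is required");
  }
  if (storage !== "memory" && storage !== "redis" && storage !== "dynamodb") {
    problems.push(`unknown STORAGE_ADAPTER "${storage}"`);
  }
  if (vector !== "memory" && vector !== "qdrant") {
    problems.push(`unknown VECTOR_STORAGE_ADAPTER "${vector}"`);
  }

  // Assumption: conditional variables are required for their adapter.
  if (storage === "redis" && !env["REDIS_URL"]) {
    problems.push("REDIS_URL is required when STORAGE_ADAPTER=redis");
  }
  if (storage === "dynamodb") {
    for (const name of ["DYNAMODB_REGION", "DYNAMODB_TABLE"]) {
      if (!env[name]) {
        problems.push(`${name} is required when STORAGE_ADAPTER=dynamodb`);
      }
    }
  }
  if (vector === "qdrant" && !env["QDRANT_URL"]) {
    problems.push("QDRANT_URL is required when VECTOR_STORAGE_ADAPTER=qdrant");
  }
  return problems;
}
```

Calling `checkEnv(process.env)` before starting the server and printing any returned problems gives a quick failure message in deployment scripts.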

Usage Patterns

Authentication

Set LLM_CACHE_API_KEY to require Bearer token authentication on the /cache/*, /metrics, and /stats endpoints (the routes marked Auth: Yes under REST Endpoints). The comparison is constant-time to prevent timing attacks. The /health and /ready endpoints remain public.

terminal
export LLM_CACHE_API_KEY=my-secret-key
 
curl -X POST http://localhost:3000/cache/get \
  -H "Authorization: Bearer my-secret-key" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is TypeScript?"}'
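
To illustrate what a constant-time Bearer token check can look like, here is a minimal sketch using Node's `crypto.timingSafeEqual`. It is not taken from the server's source, and `isAuthorized` is a hypothetical name. Both values are hashed with SHA-256 first so the compared buffers always have equal length, since `timingSafeEqual` throws when the lengths differ.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hypothetical helper, not part of @reaatech/llm-cache-server.
// Returns true when the Authorization header carries the expected Bearer token.
function isAuthorized(authorizationHeader: string | undefined, expectedKey: string): boolean {
  if (!authorizationHeader || !authorizationHeader.startsWith("Bearer ")) {
    return false;
  }
  const presented = authorizationHeader.slice("Bearer ".length);

  // Hash both sides so the buffers have the same byte length;
  // timingSafeEqual throws a RangeError on a length mismatch.
  const a = createHash("sha256").update(presented, "utf8").digest();
  const b = createHash("sha256").update(expectedKey, "utf8").digest();
  return timingSafeEqual(a, b);
}
```

A plain `===` comparison can return early at the first differing character, which is the timing signal this approach is meant to avoid.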

Redis + Qdrant (Production)

terminal
export STORAGE_ADAPTER=redis
export REDIS_URL=redis://:password@redis.internal:6379
export VECTOR_STORAGE_ADAPTER=qdrant
export QDRANT_URL=http://qdrant.internal:6333
export QDRANT_COLLECTION=llm-cache
export OPENAI_API_KEY=sk-...
 
npx @reaatech/llm-cache-server

DynamoDB + In-Memory Vector (Testing)

terminal
export STORAGE_ADAPTER=dynamodb
export DYNAMODB_REGION=us-east-1
export DYNAMODB_TABLE=llm-cache
export DYNAMODB_ENDPOINT=http://localhost:8000
export VECTOR_STORAGE_ADAPTER=memory
export OPENAI_API_KEY=sk-...
 
npx @reaatech/llm-cache-server

Docker Compose

The project’s docker-compose.yml starts Qdrant, Redis, and the cache server:

terminal
docker compose up
 
# Health check
curl http://localhost:3000/health
# → { "status": "ok", "timestamp": "..." }

License

MIT