
@reaatech/session-continuity-tokenizers

Provides classes that compute exact or heuristic token counts for OpenAI and Anthropic models, implementing the `TokenCounter` interface from `@reaatech/session-continuity`. It includes a factory that selects a tokenizer automatically by model name. Anthropic support needs the optional peer dependency `@anthropic-ai/tokenizer`.

Status: Pre-1.0 — APIs may change in minor versions. Pin to a specific version in production.

Token-counting implementations of the TokenCounter interface from @reaatech/session-continuity. The package provides three tokenizers (exact WASM-based tiktoken for OpenAI, exact Anthropic, and a fast heuristic estimator) plus a factory that auto-selects the right tokenizer by model name.

Installation

terminal
npm install @reaatech/session-continuity-tokenizers
# or
pnpm add @reaatech/session-continuity-tokenizers

For Anthropic token counting, install the optional peer dependency:

terminal
npm install @anthropic-ai/tokenizer

Feature Overview

  • TiktokenTokenizer — exact token counts for OpenAI models via WASM-based tiktoken (supports gpt-4, gpt-4o, gpt-3.5-turbo, text-davinci-003, embedding models)
  • AnthropicTokenizer — exact token counts for Anthropic models (lazy-loads @anthropic-ai/tokenizer; falls back gracefully if not installed)
  • EstimateTokenizer — fast heuristic: Math.ceil(text.length / charsPerToken) with configurable ratio
  • TokenizerFactory — auto-selects the correct tokenizer by model name; supports custom registry for user-defined models
  • Consistent message counting — per-message overhead (3 tokens for role) plus 3 tokens for the message list, accounting for tool calls/results
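
The message-counting rule in the last bullet can be sketched as follows. This is a minimal illustration, not the package's implementation: `SketchMessage` and `countMessagesSketch` are hypothetical names, tool calls and results are left out, and whether the role string itself is tokenized is not stated in this README.

```typescript
// Hypothetical minimal message shape for this sketch; the real Message type
// from @reaatech/session-continuity has more fields.
interface SketchMessage {
  role: string;
  content: string;
}

const PER_MESSAGE_OVERHEAD = 3; // tokens added per message (role overhead)
const LIST_OVERHEAD = 3; // tokens added once for the whole message list

// Sums the list overhead, the per-message overhead, and the content tokens.
// Tool calls and results are ignored here (assumption: the real counters add them).
function countMessagesSketch(
  messages: SketchMessage[],
  count: (text: string) => number,
): number {
  let total = LIST_OVERHEAD;
  for (const message of messages) {
    total += PER_MESSAGE_OVERHEAD + count(message.content);
  }
  return total;
}

// Heuristic counter: 4 characters per token, rounded up.
const estimate = (text: string): number => Math.ceil(text.length / 4);

const total = countMessagesSketch(
  [
    { role: 'system', content: 'You are helpful.' }, // 16 chars → 4 tokens
    { role: 'user', content: 'Hello!' }, // 6 chars → 2 tokens
  ],
  estimate,
);
console.log(total); // 3 + (3 + 4) + (3 + 2) = 15
```

Passing the string counter in as a function keeps the overhead arithmetic independent of which tokenizer produces the per-string counts.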

Quick Start

typescript
import {
  TiktokenTokenizer,
  AnthropicTokenizer,
  EstimateTokenizer,
  TokenizerFactory,
} from '@reaatech/session-continuity-tokenizers';
 
// Exact: OpenAI
const openai = new TiktokenTokenizer('gpt-4');
openai.count('Hello, world!'); // → exact token count
openai.countMessages(messages); // → token count with overhead (messages is a Message[] defined elsewhere)
 
// Exact: Anthropic
const claude = new AnthropicTokenizer('claude-3-sonnet');
 
// Fast: heuristic
const estimate = new EstimateTokenizer(4); // 4 chars per token
 
// Auto-select by model name
const auto = TokenizerFactory.create('gpt-4o');

API Reference

TiktokenTokenizer

Constructor

typescript
new TiktokenTokenizer(model?: string)  // default: "gpt-4"

Model-to-encoding mappings:

| Model | Encoding |
| --- | --- |
| gpt-4, gpt-4-turbo, gpt-4-32k | cl100k_base |
| gpt-4o, gpt-4o-mini | o200k_base |
| gpt-3.5-turbo | cl100k_base |
| text-davinci-003 | p50k_base |
| text-embedding-ada-002, text-embedding-3-small, text-embedding-3-large | cl100k_base |

Unknown models fall back to cl100k_base.
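
The lookup-with-fallback behavior described above might look like the sketch below. `encodingForModel` and `MODEL_ENCODINGS` are hypothetical names, and exact-name matching is an assumption; the README does not say how versioned names such as dated snapshots are resolved.

```typescript
// Encoding names taken from the mappings listed above.
type EncodingName = 'cl100k_base' | 'o200k_base' | 'p50k_base';

const FALLBACK_ENCODING: EncodingName = 'cl100k_base';

// A Map avoids accidental hits on Object.prototype keys such as "constructor".
const MODEL_ENCODINGS = new Map<string, EncodingName>([
  ['gpt-4', 'cl100k_base'],
  ['gpt-4-turbo', 'cl100k_base'],
  ['gpt-4-32k', 'cl100k_base'],
  ['gpt-4o', 'o200k_base'],
  ['gpt-4o-mini', 'o200k_base'],
  ['gpt-3.5-turbo', 'cl100k_base'],
  ['text-davinci-003', 'p50k_base'],
  ['text-embedding-ada-002', 'cl100k_base'],
  ['text-embedding-3-small', 'cl100k_base'],
  ['text-embedding-3-large', 'cl100k_base'],
]);

// Exact-name lookup (assumption); unknown models get the fallback encoding.
function encodingForModel(model: string): EncodingName {
  return MODEL_ENCODINGS.get(model) ?? FALLBACK_ENCODING;
}

console.log(encodingForModel('gpt-4o')); // o200k_base
console.log(encodingForModel('my-fine-tune')); // cl100k_base (fallback)
```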

Public Methods

| Method | Returns | Description |
| --- | --- | --- |
| count(text) | number | Exact token count via WASM-based tiktoken |
| countMessages(messages) | number | Total tokens including per-message overhead and tool calls/results |
| dispose() | void | Frees WASM encoding resources |
| model (getter) | string | The model name |
| tokenizer (getter) | tiktoken | Tokenizer name |
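
Because dispose() frees WASM resources, it can help to guarantee the call even when counting throws. The helper below is one possible pattern, not part of this package: `withCounter` and `DisposableCounter` are hypothetical names, and a stand-in counter is used so the sketch runs without the WASM dependency.

```typescript
// Hypothetical minimal shape; the real TokenCounter interface has more members.
interface DisposableCounter {
  count(text: string): number;
  dispose(): void;
}

// Runs `use` and always calls dispose() afterwards, even if `use` throws.
function withCounter<T>(
  counter: DisposableCounter,
  use: (counter: DisposableCounter) => T,
): T {
  try {
    return use(counter);
  } finally {
    counter.dispose();
  }
}

// Stand-in counter that records how often dispose() was called.
let disposeCalls = 0;
const fake: DisposableCounter = {
  count: (text) => Math.ceil(text.length / 4),
  dispose: () => {
    disposeCalls += 1;
  },
};

const tokens = withCounter(fake, (c) => c.count('Hello, world!'));
console.log(tokens, disposeCalls); // 4 1
```

In real use, a `new TiktokenTokenizer('gpt-4')` instance would take the place of `fake`, assuming it satisfies the same two-method shape.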

AnthropicTokenizer

Constructor

typescript
new AnthropicTokenizer(model?: string)  // default: "claude-3-sonnet"

Requires the optional peer dependency @anthropic-ai/tokenizer, which is lazy-loaded on the first count() call.
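
Lazy loading with a fallback can be sketched as below. This is an illustration of the pattern only: `LazyCounterSketch` is a hypothetical class, the loader is injected rather than importing the peer dependency, and falling back to a heuristic when loading fails is an assumption about what a graceful fallback means here.

```typescript
type CountFn = (text: string) => number;

class LazyCounterSketch {
  private readonly load: () => CountFn;
  private readonly fallback: CountFn;
  private countFn: CountFn | undefined;

  constructor(load: () => CountFn, fallback: CountFn) {
    this.load = load;
    this.fallback = fallback;
  }

  count(text: string): number {
    let fn = this.countFn;
    if (fn === undefined) {
      try {
        fn = this.load(); // runs at most once, on the first count() call
      } catch {
        fn = this.fallback; // loader failed (e.g. dependency missing): degrade
      }
      this.countFn = fn; // cache whichever counter was chosen
    }
    return fn(text);
  }
}

// Stand-in loader that counts how often it runs.
let loads = 0;
const lazy = new LazyCounterSketch(
  () => {
    loads += 1;
    return (text) => text.length; // placeholder for an exact counter
  },
  (text) => Math.ceil(text.length / 4),
);
lazy.count('abc');
lazy.count('abcd');
console.log(loads); // 1

// A loader that throws triggers the heuristic fallback.
const degraded = new LazyCounterSketch(
  () => {
    throw new Error('module not found');
  },
  (text) => Math.ceil(text.length / 4),
);
console.log(degraded.count('Hello, world!')); // 4
```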

Public Methods

| Method | Returns | Description |
| --- | --- | --- |
| count(text) | number | Exact token count via Anthropic tokenizer |
| countMessages(messages) | number | Total tokens with overhead and tool accounting |
| dispose() | void | Frees encoding resources if supported |
| model (getter) | string | The model name |
| tokenizer (getter) | anthropic | Tokenizer name |

EstimateTokenizer

Constructor

typescript
new EstimateTokenizer(charsPerToken?: number)  // default: 4

Throws if charsPerToken <= 0.

Public Methods

| Method | Returns | Description |
| --- | --- | --- |
| count(text) | number | Math.ceil(text.length / charsPerToken) |
| countMessages(messages) | number | Estimated total with overhead |
| dispose() | void | No-op |
| model (getter) | estimate | Model name |
| tokenizer (getter) | estimate | Tokenizer name |
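
The heuristic and its constructor check are simple enough to sketch in full. `EstimateSketch` is a hypothetical stand-in for illustration, and the use of RangeError is an assumption, since the README only says the constructor throws.

```typescript
class EstimateSketch {
  private readonly charsPerToken: number;

  constructor(charsPerToken: number = 4) {
    // The README says the constructor throws for values <= 0;
    // the error type used here is an assumption.
    if (charsPerToken <= 0) {
      throw new RangeError('charsPerToken must be greater than 0');
    }
    this.charsPerToken = charsPerToken;
  }

  // Character length divided by the ratio, rounded up.
  count(text: string): number {
    return Math.ceil(text.length / this.charsPerToken);
  }
}

const text = 'Hello, world!'; // 13 characters
console.log(new EstimateSketch().count(text)); // 4 (13 / 4 = 3.25, rounded up)
console.log(new EstimateSketch(3).count(text)); // 5 (13 / 3 ≈ 4.33, rounded up)
```

A smaller ratio yields a more conservative (higher) estimate, which can be the safer choice when the count guards a hard context limit.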

TokenizerFactory

Static Methods

| Method | Signature | Description |
| --- | --- | --- |
| create | (model: string): TokenCounter | Auto-selects by model name. OpenAI → TiktokenTokenizer, Anthropic → AnthropicTokenizer (falls back to EstimateTokenizer with warning if @anthropic-ai/tokenizer not installed). Custom-registered models use registered constructor. Ultimate fallback: EstimateTokenizer(4). |
| register | (name: string, ctor: new () => TokenCounter): void | Register a custom tokenizer |
| getSupportedModels | (): string[] | All known model names + registry keys |
| setLogger | (logger: Logger \| undefined): void | Custom logger for warnings (pass undefined to suppress) |

Recognized model prefixes:

| Prefix | Matches | Tokenizer |
| --- | --- | --- |
| gpt-, text-davinci-, text-embedding- | OpenAI models | TiktokenTokenizer |
| claude- | Anthropic models | AnthropicTokenizer |
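
The selection logic described for create() can be sketched as a pure function. `selectTokenizerKind` is a hypothetical name, and the precedence shown (prefixes first, then the custom registry) is an assumption that simply follows the order in which the README lists the cases.

```typescript
type TokenizerKind = 'tiktoken' | 'anthropic' | 'custom' | 'estimate';

const OPENAI_PREFIXES = ['gpt-', 'text-davinci-', 'text-embedding-'];
const ANTHROPIC_PREFIX = 'claude-';

// Decides which kind of tokenizer a model name maps to.
// `anthropicInstalled` stands in for "the optional peer dependency resolved".
function selectTokenizerKind(
  model: string,
  customModels: ReadonlySet<string>,
  anthropicInstalled: boolean,
): TokenizerKind {
  if (OPENAI_PREFIXES.some((prefix) => model.startsWith(prefix))) {
    return 'tiktoken';
  }
  if (model.startsWith(ANTHROPIC_PREFIX)) {
    // Without the peer dependency, degrade to the heuristic estimator.
    return anthropicInstalled ? 'anthropic' : 'estimate';
  }
  if (customModels.has(model)) {
    return 'custom';
  }
  return 'estimate'; // ultimate fallback
}

const custom = new Set(['my-model']);
console.log(selectTokenizerKind('gpt-4o', custom, true)); // tiktoken
console.log(selectTokenizerKind('claude-3-sonnet', custom, false)); // estimate
console.log(selectTokenizerKind('my-model', custom, true)); // custom
console.log(selectTokenizerKind('llama-3', custom, true)); // estimate
```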

Usage Patterns

With SessionManager

typescript
import { SessionManager } from '@reaatech/session-continuity';
import { TiktokenTokenizer } from '@reaatech/session-continuity-tokenizers';
 
const manager = new SessionManager({
  storage: myStorage,
  tokenCounter: new TiktokenTokenizer('gpt-4o'),
  tokenBudget: { maxTokens: 128000, reserveTokens: 4096, overflowStrategy: 'compress' },
});

Registering a Custom Tokenizer

typescript
import { TokenizerFactory } from '@reaatech/session-continuity-tokenizers';
import type { TokenCounter, Message } from '@reaatech/session-continuity';
 
class MyCustomTokenizer implements TokenCounter {
  readonly model = 'my-model';
  readonly tokenizer = 'custom';
 
  count(text: string): number {
    /* ... */ return 0;
  }
  countMessages(messages: Message[]): number {
    /* ... */ return 0;
  }
  dispose(): void {}
}
 
// register() takes a constructor (new () => TokenCounter), so pass the class itself
TokenizerFactory.register('my-model', MyCustomTokenizer);
const tokenizer = TokenizerFactory.create('my-model');

Standalone Token Counting

typescript
import { TiktokenTokenizer } from '@reaatech/session-continuity-tokenizers';
import type { Message } from '@reaatech/session-continuity';
 
const tokenizer = new TiktokenTokenizer('gpt-4');
 
// Count a single string
const promptTokens = tokenizer.count('Explain quantum computing in simple terms.');
 
// Count an array of messages (includes role overhead)
const messages: Message[] = [
  { id: '1', sessionId: 's1', role: 'system', content: 'You are helpful.', createdAt: new Date() },
  { id: '2', sessionId: 's1', role: 'user', content: 'Hello!', createdAt: new Date() },
];
const totalTokens = tokenizer.countMessages(messages);
 
// Clean up when done
tokenizer.dispose();

License

MIT