
Full-Text Search

TopGun’s InvertedIndex provides fast full-text search with O(K) performance, where K is the number of matching tokens. On 100K documents, text search drops from 50-100ms (full scan) to under 1ms.

Token-Based Search

An inverted index maps each token to the documents that contain it, so lookups skip the full scan.

Flexible Tokenization

Configure tokenizers and filters for your use case.

CRDT Integration

Indexes update automatically on set, remove, and merge operations.

Basic Usage

Add an InvertedIndex to enable text search on a field:

Basic Text Search
import {
  IndexedLWWMap,
  simpleAttribute,
  HLC
} from '@topgunbuild/core';

interface Product {
  id: string;
  name: string;
  description: string;
  tags: string[];
}

const hlc = new HLC('node-1');
const products = new IndexedLWWMap<string, Product>(hlc);

// Add inverted index on text fields
const nameAttr = simpleAttribute<Product, string>('name', p => p.name);
products.addInvertedIndex(nameAttr);

// Add some products
products.set('p1', {
  id: 'p1',
  name: 'Wireless Bluetooth Mouse',
  description: 'Ergonomic design with 6 buttons',
  tags: ['electronics', 'wireless', 'mouse']
});

products.set('p2', {
  id: 'p2',
  name: 'USB-C Wireless Keyboard',
  description: 'Mechanical switches with RGB',
  tags: ['electronics', 'wireless', 'keyboard']
});

// Search for products containing "wireless"
const results = products.queryValues({
  type: 'contains',
  attribute: 'name',
  value: 'wireless'
});
// Returns both products (both names contain "wireless")

Query Types

InvertedIndex supports three query types with different matching semantics:

Query Types
// 1. contains - All tokens must match (AND semantics)
const wireless = products.queryValues({
  type: 'contains',
  attribute: 'name',
  value: 'wireless mouse'  // Matches: "wireless" AND "mouse"
});
// Returns: [{ name: 'Wireless Bluetooth Mouse', ... }]

// 2. containsAll - All specified values must match
const withTags = products.queryValues({
  type: 'containsAll',
  attribute: 'name',
  values: ['wireless', 'bluetooth']
});
// Returns: [{ name: 'Wireless Bluetooth Mouse', ... }]

// 3. containsAny - Any token matches (OR semantics)
const anyMatch = products.queryValues({
  type: 'containsAny',
  attribute: 'name',
  values: ['keyboard', 'mouse']
});
// Returns both products

| Query Type | Semantics | Use Case |
|---|---|---|
| contains | All tokens must match (AND) | Search box with multiple words |
| containsAll | All values must be present | Filter by required tags |
| containsAny | Any token matches (OR) | Search with alternatives |
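To make the AND/OR semantics concrete, here is a minimal sketch of how a token-to-keys postings map can answer these queries with set intersection and union. The `Postings` type and the `containsAll`/`containsAny` helpers are hypothetical illustrations, not TopGun's internal API, and tokens are assumed to be already lowercased by the pipeline.

```typescript
// Hypothetical postings map: token -> set of document keys.
type Postings = Map<string, Set<string>>;

// AND semantics (as used by `contains` after tokenizing the query, and by `containsAll`):
// a key matches only if every token's posting set contains it.
function containsAll(postings: Postings, tokens: string[]): Set<string> {
  if (tokens.length === 0) return new Set<string>();
  // Start from the smallest posting set so the intersection does the least work.
  const sets = tokens
    .map((t) => postings.get(t) ?? new Set<string>())
    .sort((a, b) => a.size - b.size);
  const result = new Set<string>();
  for (const key of Array.from(sets[0])) {
    if (sets.every((s) => s.has(key))) result.add(key);
  }
  return result;
}

// OR semantics (`containsAny`): a key matches if any token's posting set contains it.
function containsAny(postings: Postings, tokens: string[]): Set<string> {
  const result = new Set<string>();
  for (const t of tokens) {
    const keys = postings.get(t);
    if (keys) keys.forEach((k) => result.add(k));
  }
  return result;
}

// Roughly the postings for the name field of the two products from Basic Usage.
const postings: Postings = new Map([
  ['wireless', new Set(['p1', 'p2'])],
  ['bluetooth', new Set(['p1'])],
  ['mouse', new Set(['p1'])],
  ['usb', new Set(['p2'])],
  ['keyboard', new Set(['p2'])],
]);

console.log(Array.from(containsAll(postings, ['wireless', 'mouse'])));        // ['p1']
console.log(Array.from(containsAny(postings, ['keyboard', 'mouse'])).sort()); // ['p1', 'p2']
```

A token with no postings empties an AND result immediately, while it simply contributes nothing to an OR result.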

Tokenization Pipeline

Text is processed through a tokenization pipeline before indexing and searching:

Tokenization Pipeline
import {
  TokenizationPipeline,
  WordBoundaryTokenizer,
  LowercaseFilter,
  MinLengthFilter,
  StopWordFilter
} from '@topgunbuild/core';

// Simple pipeline (default)
const simple = TokenizationPipeline.simple();
simple.process("Hello World!");
// → ["hello", "world"]

// Search pipeline (with stop words removed)
const search = TokenizationPipeline.search();
search.process("The quick brown fox");
// → ["quick", "brown", "fox"]  ("the" removed as stop word)

// Custom pipeline
const custom = TokenizationPipeline.custom(
  new WordBoundaryTokenizer(),
  [
    new LowercaseFilter(),
    new MinLengthFilter(3),     // Min 3 characters
    new StopWordFilter()        // Remove common words
  ]
);
custom.process("I have a wireless mouse");
// → ["wireless", "mouse"]  ("i", "have", "a" removed)

Pre-built Pipelines

| Pipeline | Tokenizer | Filters | Best For |
|---|---|---|---|
| simple() | WordBoundary | Lowercase, MinLength(2) | General use |
| search() | WordBoundary | Lowercase, MinLength(2), StopWords | Search engines |
| minimal() | WordBoundary | Lowercase only | Preserve all tokens |
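A pipeline is a tokenizer followed by an ordered list of filters. The sketch below reproduces that shape in plain TypeScript to show how simple() and search() differ. The `TokenFilter` type, the ASCII-only word splitting, and the short stop-word list are assumptions for illustration; TopGun's own StopWordFilter uses its own, larger list.

```typescript
// A filter transforms a token array; a pipeline is a tokenizer plus ordered filters.
type TokenFilter = (tokens: string[]) => string[];

// Word-boundary tokenizer (ASCII-only simplification): split on non-alphanumerics.
const wordBoundary = (text: string): string[] =>
  text.split(/[^A-Za-z0-9]+/).filter((t) => t.length > 0);

const lowercase: TokenFilter = (tokens) => tokens.map((t) => t.toLowerCase());
const minLength = (n: number): TokenFilter => (tokens) => tokens.filter((t) => t.length >= n);

// Hypothetical, deliberately short stop-word list for this example.
const STOP_WORDS = new Set(['the', 'a', 'an', 'is', 'have', 'i', 'and', 'of']);
const stopWords: TokenFilter = (tokens) => tokens.filter((t) => !STOP_WORDS.has(t));

// Runs the tokenizer, then each filter in order.
function makePipeline(tokenize: (text: string) => string[], filters: TokenFilter[]) {
  return (text: string): string[] =>
    filters.reduce((tokens, filter) => filter(tokens), tokenize(text));
}

// Rough equivalents of the pre-built pipelines.
const simple = makePipeline(wordBoundary, [lowercase, minLength(2)]);
const search = makePipeline(wordBoundary, [lowercase, minLength(2), stopWords]);

console.log(simple('Hello World!'));        // ['hello', 'world']
console.log(search('The quick brown fox')); // ['quick', 'brown', 'fox']
```

Filter order matters: lowercasing must run before stop-word removal, otherwise "The" would not match "the" and would slip through.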

Available Components

Tokenizers & Filters
// Available tokenizers:
// - WhitespaceTokenizer: splits on whitespace
// - WordBoundaryTokenizer: splits on word boundaries (default)
// - NGramTokenizer: generates n-grams for substring matching

// Available filters:
// - LowercaseFilter: converts to lowercase
// - MinLengthFilter(n): removes tokens shorter than n
// - MaxLengthFilter(n): removes tokens longer than n
// - StopWordFilter: removes common words (the, a, is, etc.)
// - TrimFilter: trims whitespace
// - UniqueFilter: removes duplicate tokens

Custom Pipelines

Create custom pipelines for specialized use cases:

N-Gram Pipeline for Substring Matching
import {
  IndexedLWWMap,
  simpleAttribute,
  TokenizationPipeline,
  NGramTokenizer,
  LowercaseFilter,
  HLC
} from '@topgunbuild/core';

// N-gram pipeline for substring matching
const ngramPipeline = TokenizationPipeline.custom(
  new NGramTokenizer(3),  // 3-character n-grams
  [new LowercaseFilter()]
);

const hlc = new HLC('node-1');
// Product is the interface defined in Basic Usage
const products = new IndexedLWWMap<string, Product>(hlc);

// Use custom pipeline for description field
const descAttr = simpleAttribute<Product, string>('description', p => p.description);
products.addInvertedIndex(descAttr, ngramPipeline);

// Now substring matches work
products.set('p1', {
  id: 'p1',
  name: 'Ergonomic Mouse',
  description: 'Ergonomic mouse',
  tags: ['mouse']
});
const results = products.queryValues({
  type: 'contains',
  attribute: 'description',
  value: 'rgo'  // Matches "Ergonomic" via n-gram
});

N-gram tip: N-grams enable substring matching but increase memory usage significantly. Use only when needed (e.g., autocomplete, partial name matching).
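To see why n-grams match substrings, and why they cost memory, here is a minimal sketch of character trigram generation. The `ngrams` function is a hypothetical illustration of the idea behind NGramTokenizer, not its implementation; in particular, how it handles input shorter than n is a choice made for this sketch.

```typescript
// Character n-grams: every contiguous substring of length n, lowercased.
function ngrams(text: string, n: number): string[] {
  const s = text.toLowerCase();
  // Design choice for this sketch: input shorter than n is kept as one token.
  if (s.length < n) return s.length > 0 ? [s] : [];
  const grams: string[] = [];
  for (let i = 0; i + n <= s.length; i++) {
    grams.push(s.slice(i, i + n));
  }
  return grams;
}

const indexed = ngrams('Ergonomic', 3);
console.log(indexed); // ['erg', 'rgo', 'gon', 'ono', 'nom', 'omi', 'mic']

// The query goes through the same pipeline, so 'rgo' becomes ['rgo'];
// every query gram is present in the indexed grams, hence a substring match.
const queryGrams = ngrams('rgo', 3);
console.log(queryGrams.every((g) => indexed.indexOf(g) !== -1)); // true
```

Here one 9-letter word produces 7 index entries instead of 1, which is where the extra memory goes.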

Multi-Value Fields

Index array fields like tags using multiAttribute:

Indexing Array Fields
import { multiAttribute } from '@topgunbuild/core';

// Index an array field (tags)
const tagsAttr = multiAttribute<Product, string>('tags', p => p.tags);
products.addInvertedIndex(tagsAttr);

// Each tag becomes searchable
const wirelessProducts = products.queryValues({
  type: 'contains',
  attribute: 'tags',
  value: 'wireless'
});

Combined Queries

Combine text search with other index types for powerful filtering:

Combining with Other Indexes
// Combine text search with other predicates
// (assumes Product also has status and price fields)
const results = products.queryValues({
  type: 'and',
  children: [
    { type: 'contains', attribute: 'name', value: 'wireless' },
    { type: 'eq', attribute: 'status', value: 'active' },
    { type: 'between', attribute: 'price', from: 10, to: 100 }
  ]
});

// Query optimizer will use:
// - InvertedIndex for "contains"
// - HashIndex for "eq" (if indexed)
// - NavigableIndex for "between" (if indexed)

The Query Optimizer automatically selects the best index for each part of the query.
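Why combining indexes pays off: an AND can use indexed children to shrink the candidate set first, and only check un-indexed children against the survivors. The sketch below models that idea with precomputed candidate sets. The `Child` shape and `evaluateAnd` are hypothetical and greatly simplified; this is not TopGun's Query Optimizer.

```typescript
interface Doc { name: string; status: string; price: number }

// One child of an AND. `candidates` exists only if an index can answer it;
// `test` is the per-document fallback check.
interface Child {
  candidates?: () => Set<string>;
  test: (doc: Doc) => boolean;
}

function intersect(sets: Set<string>[]): string[] {
  // Smallest set first: fewer membership checks.
  const sorted = sets.slice().sort((a, b) => a.size - b.size);
  return Array.from(sorted[0]).filter((k) => sorted.every((s) => s.has(k)));
}

function evaluateAnd(docs: Map<string, Doc>, children: Child[]): string[] {
  const indexed = children.filter((c) => c.candidates !== undefined);
  const unindexed = children.filter((c) => c.candidates === undefined);
  // Indexed children narrow the candidates; with none, fall back to a full scan.
  const keys = indexed.length > 0
    ? intersect(indexed.map((c) => c.candidates!()))
    : Array.from(docs.keys());
  // Un-indexed children are checked only against the surviving keys.
  return keys.filter((k) => unindexed.every((c) => c.test(docs.get(k)!)));
}

const docs = new Map<string, Doc>([
  ['p1', { name: 'Wireless Bluetooth Mouse', status: 'active', price: 25 }],
  ['p2', { name: 'USB-C Wireless Keyboard', status: 'archived', price: 80 }],
  ['p3', { name: 'Wired Mouse', status: 'active', price: 150 }],
]);

const result = evaluateAnd(docs, [
  // "contains" answered by an inverted index (postings precomputed here)
  { candidates: () => new Set(['p1', 'p2']), test: (d) => d.name.toLowerCase().indexOf('wireless') !== -1 },
  // "eq" answered by a hash index
  { candidates: () => new Set(['p1', 'p3']), test: (d) => d.status === 'active' },
  // "between" with no index: checked per surviving candidate
  { test: (d) => d.price >= 10 && d.price <= 100 },
]);

console.log(result); // ['p1']
```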

For search-engine-style relevance ranking, TopGun provides BM25 full-text search. Unlike InvertedIndex (boolean matching), BM25 scores documents by relevance and returns results sorted by score.

Relevance Ranking

The BM25 algorithm scores documents by term frequency, how rare each term is across the collection (inverse document frequency), and document length.

Porter Stemming

Words are reduced to their stems (running → run) so that word variants match, and 174 common English stopwords are filtered out.

Basic Usage

Enable BM25 search on an IndexedORMap:

BM25 Ranked Search
import { IndexedORMap, HLC } from '@topgunbuild/core';

interface Article {
  title: string;
  body: string;
  author: string;
}

const hlc = new HLC('node-1');
const articles = new IndexedORMap<string, Article>(hlc);

// Enable BM25 full-text search on title and body fields
articles.enableFullTextSearch({
  fields: ['title', 'body']
});

// Add some articles
articles.add('a1', {
  title: 'Introduction to Machine Learning',
  body: 'Machine learning is a subset of artificial intelligence...',
  author: 'Alice'
});

articles.add('a2', {
  title: 'Deep Learning Tutorial',
  body: 'Deep learning uses neural networks with many layers...',
  author: 'Bob'
});

articles.add('a3', {
  title: 'Getting Started with AI',
  body: 'Artificial intelligence is transforming industries...',
  author: 'Charlie'
});

// Search with BM25 relevance ranking
const results = articles.search('machine learning');
// Results sorted by relevance score (example values):
// [
//   { key: 'a1', score: 2.34, matchedTerms: ['machin', 'learn'], value: {...} },
//   { key: 'a2', score: 0.89, matchedTerms: ['learn'], value: {...} }
// ]

Search Options

Control search behavior with options:

Search Options
// Search with options
const results = articles.search('artificial intelligence', {
  limit: 10,           // Maximum results to return
  minScore: 0.5,       // Minimum relevance score threshold
  boost: {             // Field boosting weights
    title: 2.0,        // Title matches worth 2x
    body: 1.0          // Body matches worth 1x (default)
  }
});

// Results are ranked by weighted BM25 scores
for (const result of results) {
  console.log(`[${result.score.toFixed(2)}] ${result.value.title}`);
  console.log(`  Matched terms: ${result.matchedTerms.join(', ')}`);
}
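Field boosting is easiest to picture as a weighted sum of per-field scores. This sketch assumes the per-field scores are already computed (for example by BM25) and only applies the weights; `boostedScore` is a hypothetical helper, and how TopGun combines fields internally may differ.

```typescript
// Per-field scores for one document (assumed precomputed).
type FieldScores = Record<string, number>;

// Weighted sum; fields without an explicit boost default to 1.0.
function boostedScore(scores: FieldScores, boost: Record<string, number> = {}): number {
  let total = 0;
  for (const field of Object.keys(scores)) {
    const weight = boost[field] !== undefined ? boost[field] : 1.0;
    total += scores[field] * weight;
  }
  return total;
}

const titleHit: FieldScores = { title: 1.5, body: 0.2 };
const bodyHit: FieldScores = { title: 0, body: 1.6 };
const boost = { title: 2.0, body: 1.0 };

console.log(boostedScore(titleHit, boost)); // 3.2
console.log(boostedScore(bodyHit, boost));  // 1.6
```

Unboosted, the two documents are close (1.7 vs 1.6); with a 2x title boost the title match clearly wins.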

Configuration

Customize tokenization and BM25 parameters:

BM25 Configuration
// Configure tokenizer and BM25 parameters
articles.enableFullTextSearch({
  fields: ['title', 'body', 'tags'],  // assumes Article also has a tags field

  // Tokenizer options
  tokenizer: {
    minLength: 2,        // Minimum token length (default: 2)
    maxLength: 50,       // Maximum token length (default: 50)
    lowercase: true,     // Convert to lowercase (default: true)
    // Uses Porter stemmer and 174 English stopwords by default
  },

  // BM25 scoring parameters
  bm25: {
    k1: 1.2,  // Term frequency saturation (default: 1.2)
              // Higher = more weight to term frequency
    b: 0.75   // Document length normalization (default: 0.75)
              // 0 = no normalization, 1 = full normalization
  }
});
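To see what k1 and b control, here is a compact BM25 scorer over pre-tokenized documents. It uses the common Lucene-style IDF, ln(1 + (N - n + 0.5) / (n + 0.5)); TopGun's exact IDF variant is not stated here, and the input tokens are hand-stemmed approximations (Porter stemming and stopword removal are not reproduced), so treat the scores as illustrative.

```typescript
interface Bm25Params { k1: number; b: number }

// Scores documents (token arrays) against query terms.
// Returns docId -> score, only for documents that match at least one term.
function bm25(
  docs: Record<string, string[]>,
  queryTerms: string[],
  { k1, b }: Bm25Params = { k1: 1.2, b: 0.75 },
): Map<string, number> {
  const ids = Object.keys(docs);
  const N = ids.length;
  const scores = new Map<string, number>();
  if (N === 0) return scores;
  const avgdl = ids.reduce((sum, id) => sum + docs[id].length, 0) / N;

  // Document frequency: how many documents contain each query term.
  const df = new Map<string, number>();
  for (const term of queryTerms) {
    df.set(term, ids.filter((id) => docs[id].indexOf(term) !== -1).length);
  }

  for (const id of ids) {
    const tokens = docs[id];
    let score = 0;
    for (const term of queryTerms) {
      const n = df.get(term) ?? 0;
      const tf = tokens.filter((t) => t === term).length;
      if (n === 0 || tf === 0) continue;
      const idf = Math.log(1 + (N - n + 0.5) / (n + 0.5));
      // Length normalization: b = 0 ignores length, b = 1 fully normalizes.
      const norm = 1 - b + b * (tokens.length / avgdl);
      // Saturation: a larger k1 lets repeated occurrences keep adding score.
      score += (idf * (tf * (k1 + 1))) / (tf + k1 * norm);
    }
    if (score > 0) scores.set(id, score);
  }
  return scores;
}

const docs: Record<string, string[]> = {
  a1: ['machin', 'learn', 'subset', 'artifici', 'intellig'],
  a2: ['deep', 'learn', 'use', 'neural', 'network', 'mani', 'layer'],
  a3: ['artifici', 'intellig', 'transform', 'industri'],
};

const ranked = Array.from(bm25(docs, ['machin', 'learn']).entries()).sort((x, y) => y[1] - x[1]);
console.log(ranked.map(([id]) => id)); // ['a1', 'a2']
```

a1 outranks a2 because it matches the rarer term machin as well as learn; a3 matches neither term and is omitted.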

BM25 vs InvertedIndex

Choose the right approach for your use case:

Comparison
// Two approaches to full-text search:

// 1. InvertedIndex - Boolean matching (fast, no ranking)
const nameAttr = simpleAttribute<Product, string>('name', p => p.name);
products.addInvertedIndex(nameAttr);
const matches = products.queryValues({
  type: 'contains',
  attribute: 'name',
  value: 'wireless mouse'
});
// Returns all products containing both "wireless" AND "mouse"
// No relevance score, order is arbitrary

// 2. BM25 Search - Relevance ranking (search engine style)
products.enableFullTextSearch({ fields: ['name', 'description'] });
const ranked = products.search('wireless mouse');
// Returns products sorted by relevance
// "Wireless Bluetooth Mouse" ranks higher than "Mouse pad for wireless setup"
// Each result includes: score, matchedTerms

| Feature | InvertedIndex | BM25 Search |
|---|---|---|
| Query method | queryValues({ type: 'contains' }) | search('query') |
| Ranking | No ranking (boolean match) | Relevance-sorted by score |
| Stemming | Optional (via pipeline) | Built-in Porter stemmer |
| Stopwords | Optional (via pipeline) | 174 English stopwords |
| Best for | Filtering, exact matches | Search boxes, content discovery |
| Works with | IndexedLWWMap, IndexedORMap | IndexedORMap |

Index Persistence

Serialize and restore the BM25 index:

Index Serialization
// Serialize index for persistence
const ftsIndex = articles.getFullTextIndex();
const serialized = ftsIndex.serialize();

// Save to storage (IndexedDB, localStorage, file, etc.)
localStorage.setItem('fts-index', JSON.stringify(serialized));

// Later: restore from storage
const raw = localStorage.getItem('fts-index');  // null if nothing was saved
if (raw !== null) {
  ftsIndex.load(JSON.parse(raw));
}

// Index is ready to use immediately
const results = articles.search('machine learning');

Performance tip: BM25 index builds in <100ms for 1K documents. Search queries complete in <10ms. For large datasets, consider serializing the index to avoid rebuilding on page load.
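One reason an explicit serialize()/load() step is needed: index structures built from Map and Set do not survive JSON.stringify, which turns both into {}. A minimal sketch of converting a token-to-keys map into a JSON-safe shape and back; `toJSONSafe`/`fromJSONSafe` and the array-of-pairs format are hypothetical and unrelated to TopGun's serialized layout.

```typescript
type Postings = Map<string, Set<string>>;

// Convert to arrays of [token, keys[]] pairs, which JSON can represent.
function toJSONSafe(postings: Postings): Array<[string, string[]]> {
  return Array.from(postings.entries()).map(
    ([token, keys]): [string, string[]] => [token, Array.from(keys)],
  );
}

// Rebuild the Map/Set structure from the plain arrays.
function fromJSONSafe(data: Array<[string, string[]]>): Postings {
  return new Map(data.map(([token, keys]): [string, Set<string>] => [token, new Set(keys)]));
}

const postings: Postings = new Map([['wireless', new Set(['p1', 'p2'])]]);

console.log(JSON.stringify(postings)); // {}  (the data is silently lost)

const json = JSON.stringify(toJSONSafe(postings));
console.log(json); // [["wireless",["p1","p2"]]]

const restored = fromJSONSafe(JSON.parse(json));
console.log(restored.get('wireless')?.has('p2')); // true
```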

For multi-client applications, TopGun supports server-side BM25 search. The server maintains indexes centrally, eliminating the need for each client to build and store indexes locally.

Centralized Indexes

Server maintains FTS indexes, clients query via WebSocket.

Permission-Based

Search respects RBAC: users need READ permission to search a map.

Server Configuration

Enable FTS on the server for specific maps:

Server Setup
import { ServerCoordinator } from '@topgunbuild/server';

const server = new ServerCoordinator({
  port: 8080,
  // Enable full-text search for specific maps
  fullTextSearch: {
    articles: {
      fields: ['title', 'body'],
      tokenizer: { minLength: 2 },
      bm25: { k1: 1.2, b: 0.75 }
    },
    products: {
      fields: ['name', 'description', 'tags']
    }
  }
});

await server.start();
// Server now maintains FTS indexes for 'articles' and 'products' maps
// Indexes are automatically updated when data changes
// Indexes are backfilled from storage on startup

The server will:

  • Create and maintain BM25 indexes for configured maps
  • Automatically update indexes when data changes (add/update/remove)
  • Backfill indexes from persistent storage on startup

Client API

Search from the client using client.search():

Client Search
import { TopGunClient } from '@topgunbuild/client';

const client = new TopGunClient({
  serverUrl: 'ws://localhost:8080'
});

await client.authenticate({ token: 'user-token' });

// Search articles on the server
const results = await client.search<Article>('articles', 'machine learning', {
  limit: 20,
  minScore: 0.5,
  boost: { title: 2.0, body: 1.0 }
});

// Results are sorted by relevance
for (const result of results) {
  console.log(`[${result.score.toFixed(2)}] ${result.key}: ${result.value.title}`);
  console.log(`  Matched: ${result.matchedTerms.join(', ')}`);
}

Search Result Structure

Each result includes the document key, value, relevance score, and matched terms:

SearchResult Interface
// SearchResult<T> interface
interface SearchResult<T> {
  key: string;        // Document key
  value: T;           // Full document value
  score: number;      // BM25 relevance score
  matchedTerms: string[];  // Stemmed terms that matched
}

// Example result
const result: SearchResult<Article> = {
  key: 'a1',
  value: { title: 'Introduction to ML', body: '...', author: 'Alice' },
  score: 2.34,
  matchedTerms: ['machin', 'learn']  // Stemmed
};

Permissions

Server-side search respects the security model. Users must have READ permission on a map to search it:

Permissions
// Server security configuration
const server = new ServerCoordinator({
  port: 8080,
  fullTextSearch: {
    articles: { fields: ['title', 'body'] }
  },
  security: {
    permissions: {
      // Users need READ permission to search a map
      articles: {
        read: ['user', 'admin'],
        write: ['admin']
      }
    }
  }
});

// Client-side: search requires READ permission
try {
  const results = await client.search('articles', 'query');
} catch (error) {
  // Error: Permission denied for map: articles
}

When to use server-side search: Use server-side search when you have multiple clients, need centralized index management, or want to offload search computation from clients. Use local BM25 (IndexedORMap) for offline-first apps where clients need to search without server connectivity.

Live Search Subscriptions

For real-time search UIs, TopGun provides Live Search subscriptions that automatically update when matching documents change. Unlike the one-shot search() method, searchSubscribe() pushes delta updates to your callback.

Real-Time Updates

Results update automatically when documents are added, modified, or removed.

Delta Updates

Server sends ENTER/UPDATE/LEAVE events instead of full result sets.

Client API

Create a live search subscription with searchSubscribe():

Live Search Subscription
import { TopGunClient } from '@topgunbuild/client';

const client = new TopGunClient({
  serverUrl: 'ws://localhost:8080'
});

// Create a live search subscription
const handle = client.searchSubscribe<Article>('articles', 'machine learning', {
  limit: 20,
  minScore: 0.5,
  boost: { title: 2.0, body: 1.0 }
});

// Subscribe to result changes (includes initial results + delta updates)
const unsubscribe = handle.subscribe((results) => {
  console.log('Search results updated:', results.length);
  for (const result of results) {
    console.log(`[${result.score.toFixed(2)}] ${result.key}: ${result.value.title}`);
  }
});

// Get current results snapshot at any time
const snapshot = handle.getResults();

// Update query dynamically (re-subscribes automatically)
handle.setQuery('deep learning');

// Cleanup when done
handle.dispose();

Delta Update Types

The server sends incremental updates instead of full result sets:

Delta Updates
// Delta update types:
// - ENTER: Document now matches the query (was added or score increased above minScore)
// - UPDATE: Document still matches but score/value changed
// - LEAVE: Document no longer matches (was removed or score dropped below minScore)

// The SearchHandle maintains a sorted result set internally.
// Your subscribe callback receives the full sorted array on each change.

// Example: Building a real-time search UI
const handle = client.searchSubscribe<Product>('products', 'wireless');

handle.subscribe((results) => {
  // Results are always sorted by score (highest first)
  renderSearchResults(results);
});

// When a product matching "wireless" is added to the 'products' map,
// or an existing product's text is updated to include "wireless",
// your callback fires with the updated results array.

| Update Type | When Sent |
|---|---|
| ENTER | Document now matches query (added or score increased above threshold) |
| UPDATE | Document still matches but score or value changed |
| LEAVE | Document no longer matches (removed or score dropped below threshold) |
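The delta types translate into a small reducer over a sorted result set. This is a hypothetical model of the bookkeeping a SearchHandle performs, not its source; the `SearchDelta` shape and its field names are assumptions, not the wire format.

```typescript
interface Result<T> { key: string; score: number; value: T }

// Hypothetical delta shape mirroring the ENTER / UPDATE / LEAVE table.
type SearchDelta<T> =
  | { type: 'ENTER' | 'UPDATE'; key: string; score: number; value: T }
  | { type: 'LEAVE'; key: string };

// Returns a new array with the delta applied, sorted by score (highest first).
function applyDelta<T>(results: Result<T>[], delta: SearchDelta<T>): Result<T>[] {
  // Remove any existing entry for the key; ENTER and UPDATE re-insert it.
  const next = results.filter((r) => r.key !== delta.key);
  if (delta.type !== 'LEAVE') {
    next.push({ key: delta.key, score: delta.score, value: delta.value });
  }
  return next.sort((a, b) => b.score - a.score);
}

let results: Result<string>[] = [];
results = applyDelta(results, { type: 'ENTER', key: 'p1', score: 1.2, value: 'Wireless Mouse' });
results = applyDelta(results, { type: 'ENTER', key: 'p2', score: 2.0, value: 'Wireless Keyboard' });
console.log(results.map((r) => r.key)); // ['p2', 'p1']

// p1's text changed and its score rose: UPDATE moves it to the top.
results = applyDelta(results, { type: 'UPDATE', key: 'p1', score: 2.5, value: 'Wireless Bluetooth Mouse' });
console.log(results.map((r) => r.key)); // ['p1', 'p2']

// p2 no longer matches: LEAVE removes it.
results = applyDelta(results, { type: 'LEAVE', key: 'p2' });
console.log(results.map((r) => r.key)); // ['p1']
```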

In clustered environments, live search subscriptions work across all nodes. When you subscribe to a search query, you receive real-time updates for matching documents regardless of which node owns the data.

// Connect to any node in the cluster
const client = new TopGunClient({
  serverUrl: 'ws://node1:8080'
});

// Subscribe to search - automatically distributed across cluster
const handle = client.searchSubscribe<Article>('articles', 'machine learning');

handle.subscribe((results) => {
  // Results include matches from ALL cluster nodes
  // Updates push automatically when documents change on ANY node
  console.log('Results from all nodes:', results.length);
});

How it works:

  1. Subscription broadcast: The coordinator node registers the search subscription on all cluster nodes
  2. Local evaluation: Each node maintains a local FTS index and evaluates changes against subscriptions
  3. Targeted updates: When a document changes, only the owning node sends an update (via CLUSTER_SUB_UPDATE)
  4. RRF merging: Initial results are merged using Reciprocal Rank Fusion for consistent relevance ordering
  5. Client delivery: The coordinator forwards delta updates to the client
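Reciprocal Rank Fusion scores each key by summing 1 / (k + rank) over every ranked list it appears in. Only positions matter, so raw BM25 scores computed against different nodes' local index statistics never have to be compared directly. A minimal sketch with the conventional k = 60; the constant TopGun uses is not stated in this guide, and `rrfMerge` is a hypothetical helper.

```typescript
// Merges ranked key lists with Reciprocal Rank Fusion (rank is 1-based).
// k dampens the advantage of top positions; 60 is a commonly used default.
function rrfMerge(rankings: string[][], k = 60): Array<{ key: string; score: number }> {
  const scores = new Map<string, number>();
  for (const list of rankings) {
    list.forEach((key, index) => {
      const rank = index + 1;
      scores.set(key, (scores.get(key) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .map(([key, score]) => ({ key, score }))
    // Highest fused score first; ties broken by key for a stable order.
    .sort((a, b) => b.score - a.score || a.key.localeCompare(b.key));
}

// Each node returns its own top results, ranked by its local BM25 scores.
const node1 = ['a1', 'a4'];
const node2 = ['a2', 'a3'];

console.log(rrfMerge([node1, node2]).map((r) => r.key)); // ['a1', 'a2', 'a3', 'a4']
```

With disjoint lists (each document owned by one node), RRF interleaves results by rank; if a key did appear in several lists, its contributions would add up and push it higher.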

This architecture eliminates the need to broadcast every data change to all nodes: only subscription-relevant updates are sent to coordinators.

Scalability: Each node evaluates subscriptions locally against its FTS index. Update messages flow only to coordinators with active subscriptions, making this approach efficient even with many concurrent subscriptions.

React Integration

The @topgunbuild/react package provides the useSearch hook for easy integration with React applications:

Basic Usage

useSearch Hook
import { useSearch } from '@topgunbuild/react';
import { useState } from 'react';

function SearchResults() {
  const [searchTerm, setSearchTerm] = useState('');

  const { results, loading, error } = useSearch<Article>('articles', searchTerm, {
    limit: 20,
    boost: { title: 2.0 }
  });

  if (loading) return <Spinner />;
  if (error) return <div>Error: {error.message}</div>;

  return (
    <ul>
      {results.map(r => (
        <li key={r.key}>
          [{r.score.toFixed(2)}] {r.value.title}
          <small>Matched: {r.matchedTerms.join(', ')}</small>
        </li>
      ))}
    </ul>
  );
}

With Debounce

For search-as-you-type interfaces, use the debounceMs option to avoid excessive server requests:

Debounced Search
import { useSearch } from '@topgunbuild/react';
import { useState } from 'react';

function SearchInput() {
  const [input, setInput] = useState('');

  // Debounce search queries by 300ms to avoid excessive server requests
  const { results, loading, error } = useSearch<Product>('products', input, {
    debounceMs: 300,  // Wait 300ms after user stops typing
    limit: 10,
    minScore: 0.5
  });

  return (
    <div>
      <input
        value={input}
        onChange={(e) => setInput(e.target.value)}
        placeholder="Search products..."
      />
      {loading && <span>Searching...</span>}
      {error && <span className="error">{error.message}</span>}
      <ul>
        {results.map(r => (
          <li key={r.key}>{r.value.name} - ${r.value.price}</li>
        ))}
      </ul>
    </div>
  );
}

useSearch Return Values

| Property | Type | Description |
|---|---|---|
| results | SearchResult<T>[] | Current search results sorted by score |
| loading | boolean | True while waiting for initial results |
| error | Error \| null | Error if search failed |

useSearch Options

| Option | Type | Description |
|---|---|---|
| limit | number | Maximum results to return |
| minScore | number | Minimum BM25 score threshold |
| boost | Record<string, number> | Field boost weights |
| debounceMs | number | Debounce delay for query changes |

See Also: The React Hooks Reference for all available hooks.

Hybrid Queries

Hybrid queries combine full-text search with traditional filter predicates in a single query. This is powerful for building faceted search UIs where users can search by text while also filtering by category, price range, date, etc.

FTS + Filters

Combine match() with equal(), greaterThan(), between() in one query.

_score Sorting

Sort results by BM25 relevance score alongside other sort fields.

useHybridQuery Hook

The useHybridQuery hook provides a React-friendly way to create hybrid queries:

Basic Hybrid Query
import { useHybridQuery } from '@topgunbuild/react';
import { Predicates } from '@topgunbuild/core';

function TechArticles() {
  // Combine FTS with traditional filters
  const { results, loading, error } = useHybridQuery<Article>('articles', {
    predicate: Predicates.and(
      Predicates.match('body', 'machine learning'),  // FTS predicate
      Predicates.equal('category', 'tech')           // Filter predicate
    ),
    sort: { _score: 'desc' },  // Sort by relevance
    limit: 20
  });

  if (loading) return <Spinner />;
  if (error) return <div>Error: {error.message}</div>;

  return (
    <ul>
      {results.map(r => (
        <li key={r._key}>
          [{r._score?.toFixed(2)}] {r.value.title}
          <small>Matched: {r._matchedTerms?.join(', ')}</small>
        </li>
      ))}
    </ul>
  );
}

Dynamic Filters

Build complex filter UIs with dynamic predicates:

Dynamic Filters
import { useHybridQuery } from '@topgunbuild/react';
import { Predicates } from '@topgunbuild/core';
import { useState, useMemo } from 'react';

function SearchWithFilters() {
  const [searchTerm, setSearchTerm] = useState('');
  const [category, setCategory] = useState('all');
  const [priceMax, setPriceMax] = useState(1000);

  // Build predicate dynamically
  const filter = useMemo(() => {
    const conditions = [];

    // Add FTS if search term exists
    if (searchTerm.trim()) {
      conditions.push(Predicates.match('description', searchTerm));
    }

    // Add category filter
    if (category !== 'all') {
      conditions.push(Predicates.equal('category', category));
    }

    // Add price filter
    conditions.push(Predicates.lessThanOrEqual('price', priceMax));

    return {
      predicate: conditions.length > 1
        ? Predicates.and(...conditions)
        : conditions[0],
      sort: searchTerm.trim() ? { _score: 'desc' } : { createdAt: 'desc' },  // relevance only when FTS is active
      limit: 20
    };
  }, [searchTerm, category, priceMax]);

  const { results, loading } = useHybridQuery<Product>('products', filter);

  return (
    <div>
      <input
        value={searchTerm}
        onChange={(e) => setSearchTerm(e.target.value)}
        placeholder="Search products..."
      />
      <select value={category} onChange={(e) => setCategory(e.target.value)}>
        <option value="all">All Categories</option>
        <option value="electronics">Electronics</option>
        <option value="clothing">Clothing</option>
      </select>
      <input
        type="range"
        value={priceMax}
        onChange={(e) => setPriceMax(Number(e.target.value))}
        min={0}
        max={1000}
      />
      {loading && <span>Loading...</span>}
      <ul>
        {results.map(r => (
          <li key={r._key}>
            {r.value.name} - ${r.value.price}
            {r._score !== undefined && <span> (score: {r._score.toFixed(2)})</span>}
          </li>
        ))}
      </ul>
    </div>
  );
}

Client API

For non-React environments, use client.hybridQuery() directly:

Client API
import { TopGunClient } from '@topgunbuild/client';
import { Predicates } from '@topgunbuild/core';

const client = new TopGunClient({
  serverUrl: 'ws://localhost:8080'
});

// Create a hybrid query handle
const handle = client.hybridQuery<Article>('articles', {
  predicate: Predicates.and(
    Predicates.match('body', 'artificial intelligence'),
    Predicates.equal('status', 'published'),
    Predicates.greaterThan('views', 100)
  ),
  sort: { _score: 'desc' },
  limit: 50
});

// Subscribe to results
const unsubscribe = handle.subscribe((results) => {
  for (const r of results) {
    console.log(`[${r._score?.toFixed(2)}] ${r._key}: ${r.value.title}`);
  }
});

// Cleanup
unsubscribe();

HybridResultItem Interface

Each result includes relevance score and matched terms:

| Property | Type | Description |
|---|---|---|
| value | T | The document value |
| _key | string | Document key |
| _score | number \| undefined | BM25 relevance score (only for FTS queries) |
| _matchedTerms | string[] \| undefined | Stemmed terms that matched |
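Because _score is only present for FTS queries, client code that re-sorts hybrid results itself has to decide where unscored items go. A small sketch of a comparator for the HybridResultItem shape above, placing unscored items last and breaking ties by _key; `byScoreDesc` is hypothetical and these ordering rules are a choice made for this example, not documented server behavior.

```typescript
interface HybridResultItem<T> {
  value: T;
  _key: string;
  _score?: number;
  _matchedTerms?: string[];
}

// Highest _score first; items without a score go last; ties fall back to _key.
function byScoreDesc<T>(a: HybridResultItem<T>, b: HybridResultItem<T>): number {
  const sa = a._score ?? Number.NEGATIVE_INFINITY;
  const sb = b._score ?? Number.NEGATIVE_INFINITY;
  // Compare explicitly so two missing scores never produce NaN.
  if (sa !== sb) return sb - sa;
  return a._key.localeCompare(b._key);
}

const items: HybridResultItem<{ title: string }>[] = [
  { value: { title: 'Getting Started with AI' }, _key: 'a3' },
  { value: { title: 'Deep Learning Tutorial' }, _key: 'a2', _score: 0.9 },
  { value: { title: 'Introduction to Machine Learning' }, _key: 'a1', _score: 2.3 },
];

console.log(items.slice().sort(byScoreDesc).map((i) => i._key)); // ['a1', 'a2', 'a3']
```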

Available Predicates

Hybrid queries support all predicate types:

| Predicate | Example | Description |
|---|---|---|
| match(field, query) | match('body', 'machine learning') | FTS with BM25 scoring |
| matchPhrase(field, phrase) | matchPhrase('title', 'getting started') | Exact phrase matching |
| matchPrefix(field, prefix) | matchPrefix('name', 'prod') | Prefix autocomplete |
| equal(field, value) | equal('status', 'active') | Exact equality |
| greaterThan(field, value) | greaterThan('price', 100) | Range comparison |
| lessThanOrEqual(field, value) | lessThanOrEqual('stock', 10) | Range comparison |
| contains(field, value) | contains('tags', 'featured') | Array contains |
| and(...predicates) | and(match(...), equal(...)) | All must match |
| or(...predicates) | or(match(...), match(...)) | Any can match |
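Conceptually, a hybrid predicate is a tree whose leaves are either text matches or ordinary comparisons. The sketch below evaluates such a tree against a single document. The `Pred` type and `evaluate` function are hypothetical, and match is reduced to "every query word appears in the field" (no stemming, no scoring), so it shows only the boolean structure, not TopGun's evaluation.

```typescript
type Pred =
  | { op: 'match'; field: string; query: string }
  | { op: 'equal'; field: string; value: unknown }
  | { op: 'greaterThan'; field: string; value: number }
  | { op: 'and'; children: Pred[] }
  | { op: 'or'; children: Pred[] };

type Doc = Record<string, unknown>;

// Lowercased ASCII word split, used for the simplified match leaf.
const words = (s: string): string[] =>
  s.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 0);

function evaluate(p: Pred, doc: Doc): boolean {
  switch (p.op) {
    case 'match': {
      const field = doc[p.field];
      if (typeof field !== 'string') return false;
      const fieldWords = words(field);
      return words(p.query).every((w) => fieldWords.indexOf(w) !== -1);
    }
    case 'equal':
      return doc[p.field] === p.value;
    case 'greaterThan': {
      const v = doc[p.field];
      return typeof v === 'number' && v > p.value;
    }
    case 'and':
      return p.children.every((c) => evaluate(c, doc));
    case 'or':
      return p.children.some((c) => evaluate(c, doc));
    default:
      return false;
  }
}

const article: Doc = { body: 'Machine learning is a subset of AI', category: 'tech', views: 250 };

const query: Pred = {
  op: 'and',
  children: [
    { op: 'match', field: 'body', query: 'machine learning' },
    { op: 'equal', field: 'category', value: 'tech' },
    { op: 'greaterThan', field: 'views', value: 100 },
  ],
};

console.log(evaluate(query, article)); // true
```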

When to use Hybrid vs useSearch: Use useSearch for pure text search (search box). Use useHybridQuery when you need to combine text search with filters (faceted search, filtered listings).

Index Statistics

Monitor your inverted index with extended statistics:

Index Statistics
// Get extended statistics for inverted index
const index = products.getIndexes().find(i => i.type === 'inverted');
if (index) {
  const stats = index.getExtendedStats();
  console.log(`Unique tokens: ${stats.totalTokens}`);
  console.log(`Documents indexed: ${stats.totalEntries}`);
  console.log(`Avg tokens/doc: ${stats.avgTokensPerDocument.toFixed(1)}`);
  console.log(`Max docs/token: ${stats.maxDocumentsPerToken}`);
}

Performance Characteristics

| Metric | Value |
|---|---|
| Query complexity | O(K) where K = matching tokens |
| Index memory overhead | 30-50% of text data |
| Index update time | < 10μs per document |
| Supported operations | contains, containsAll, containsAny, has |

Memory Usage

Inverted indexes store:

  • Token index: Map<Token, Set<Key>> - tokens to document keys
  • Reverse index: Map<Key, Set<Token>> - document keys to tokens (for updates)

For text-heavy documents, expect 30-50% memory overhead. Monitor with getExtendedStats().
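The two maps are what make updates cheap: when a document changes, the reverse index says exactly which postings to remove before the new tokens are added. A minimal sketch of that bookkeeping; the `TinyInvertedIndex` class and its whitespace tokenizer are illustrative assumptions, not TopGun's InvertedIndex source.

```typescript
class TinyInvertedIndex {
  // token -> keys of documents containing it
  private readonly tokenIndex = new Map<string, Set<string>>();
  // key -> tokens it was indexed under (needed to undo on update/remove)
  private readonly reverseIndex = new Map<string, Set<string>>();
  private readonly tokenize: (text: string) => string[];

  constructor(tokenize: (text: string) => string[]) {
    this.tokenize = tokenize;
  }

  set(key: string, text: string): void {
    this.remove(key); // clear stale postings first, so set() doubles as update
    const tokens = new Set(this.tokenize(text));
    tokens.forEach((token) => {
      let keys = this.tokenIndex.get(token);
      if (!keys) {
        keys = new Set<string>();
        this.tokenIndex.set(token, keys);
      }
      keys.add(key);
    });
    this.reverseIndex.set(key, tokens);
  }

  remove(key: string): void {
    const tokens = this.reverseIndex.get(key);
    if (!tokens) return;
    tokens.forEach((token) => {
      const keys = this.tokenIndex.get(token);
      if (!keys) return;
      keys.delete(key);
      if (keys.size === 0) this.tokenIndex.delete(token); // drop empty postings
    });
    this.reverseIndex.delete(key);
  }

  lookup(token: string): string[] {
    const keys = this.tokenIndex.get(token);
    return keys ? Array.from(keys) : [];
  }

  get uniqueTokens(): number {
    return this.tokenIndex.size;
  }
}

const idx = new TinyInvertedIndex((text) => text.toLowerCase().split(/\s+/).filter((t) => t.length > 0));
idx.set('p1', 'Wireless Bluetooth Mouse');
idx.set('p2', 'USB-C Wireless Keyboard');
console.log(idx.lookup('wireless')); // ['p1', 'p2']

// Update p1: its old 'wireless' and 'bluetooth' postings are removed via the reverse index.
idx.set('p1', 'Wired Mouse');
console.log(idx.lookup('wireless'));  // ['p2']
console.log(idx.lookup('bluetooth')); // []
```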

Best Practices

  1. Choose the right pipeline

    • simple() for general use
    • search() for search boxes (removes noise words)
    • Custom N-gram for autocomplete/substring matching
  2. Index selectively

    • Only index fields users actually search
    • Long text fields increase memory usage
  3. Combine with other indexes

    • Use HashIndex for category filtering
    • Use NavigableIndex for date/price ranges
    • Let the optimizer combine them efficiently
  4. Test your tokenization

    const pipeline = TokenizationPipeline.search();
    console.log(pipeline.process("your sample text"));
    // Verify tokens match your expectations

Next Steps