Full-Text Search
TopGun’s InvertedIndex provides fast full-text search with O(K) performance, where K is the number of matching tokens. On 100K documents, text search drops from 50-100ms (full scan) to under 1ms.
Token-Based Search
Inverted index maps tokens to documents for instant lookups.
Flexible Tokenization
Configure tokenizers and filters for your use case.
CRDT Integration
Indexes update automatically on set, remove, and merge operations.
Basic Usage
Add an InvertedIndex to enable text search on a field:
import {
IndexedLWWMap,
simpleAttribute,
HLC
} from '@topgunbuild/core';
interface Product {
id: string;
name: string;
description: string;
tags: string[];
}
const hlc = new HLC('node-1');
const products = new IndexedLWWMap<string, Product>(hlc);
// Add inverted index on text fields
const nameAttr = simpleAttribute<Product, string>('name', p => p.name);
products.addInvertedIndex(nameAttr);
// Add some products
products.set('p1', {
id: 'p1',
name: 'Wireless Bluetooth Mouse',
description: 'Ergonomic design with 6 buttons',
tags: ['electronics', 'wireless', 'mouse']
});
products.set('p2', {
id: 'p2',
name: 'USB-C Wireless Keyboard',
description: 'Mechanical switches with RGB',
tags: ['electronics', 'wireless', 'keyboard']
});
// Search for products containing "wireless"
const results = products.queryValues({
type: 'contains',
attribute: 'name',
value: 'wireless'
});
// Returns both products (both names contain "wireless")
Query Types
InvertedIndex supports three query types with different matching semantics:
// 1. contains - All tokens must match (AND semantics)
const wireless = products.queryValues({
type: 'contains',
attribute: 'name',
value: 'wireless mouse' // Matches: "wireless" AND "mouse"
});
// Returns: [{ name: 'Wireless Bluetooth Mouse', ... }]
// 2. containsAll - All specified values must match
const withTags = products.queryValues({
type: 'containsAll',
attribute: 'name',
values: ['wireless', 'bluetooth']
});
// Returns: [{ name: 'Wireless Bluetooth Mouse', ... }]
// 3. containsAny - Any token matches (OR semantics)
const anyMatch = products.queryValues({
type: 'containsAny',
attribute: 'name',
values: ['keyboard', 'mouse']
});
// Returns both products
| Query Type | Semantics | Use Case |
|---|---|---|
| contains | All tokens must match (AND) | Search box with multiple words |
| containsAll | All values must be present | Filter by required tags |
| containsAny | Any token matches (OR) | Search with alternatives |
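The AND/OR semantics in the table come down to set operations over posting lists. The following is a minimal sketch, not TopGun's implementation: the `postings` map and the helper names `matchAll` and `matchAny` are hypothetical and exist only to illustrate the idea.

```typescript
// Hypothetical posting lists: token -> set of document keys.
const postings = new Map<string, Set<string>>([
  ['wireless', new Set(['p1', 'p2'])],
  ['bluetooth', new Set(['p1'])],
  ['mouse', new Set(['p1'])],
  ['keyboard', new Set(['p2'])],
]);

// AND semantics (contains / containsAll): keep keys present for every token.
function matchAll(tokens: string[]): string[] {
  if (tokens.length === 0) return [];
  const sets = tokens.map(t => postings.get(t) ?? new Set<string>());
  // Start from the smallest set so the intersection does the least work.
  sets.sort((a, b) => a.size - b.size);
  return Array.from(sets[0]).filter(key => sets.every(s => s.has(key))).sort();
}

// OR semantics (containsAny): union of keys across tokens.
function matchAny(tokens: string[]): string[] {
  const out = new Set<string>();
  for (const t of tokens) {
    postings.get(t)?.forEach(key => out.add(key));
  }
  return Array.from(out).sort();
}

const andResult = matchAll(['wireless', 'mouse']); // ['p1']
const orResult = matchAny(['keyboard', 'mouse']);  // ['p1', 'p2']
```

Because each lookup touches only the posting lists of the query tokens, the cost depends on the number of matching tokens and keys rather than on the total number of documents.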
Tokenization Pipeline
Text is processed through a tokenization pipeline before indexing and searching:
import {
TokenizationPipeline,
WordBoundaryTokenizer,
LowercaseFilter,
MinLengthFilter,
StopWordFilter
} from '@topgunbuild/core';
// Simple pipeline (default)
const simple = TokenizationPipeline.simple();
simple.process("Hello World!");
// → ["hello", "world"]
// Search pipeline (with stop words removed)
const search = TokenizationPipeline.search();
search.process("The quick brown fox");
// → ["quick", "brown", "fox"] ("the" removed as stop word)
// Custom pipeline
const custom = TokenizationPipeline.custom(
new WordBoundaryTokenizer(),
[
new LowercaseFilter(),
new MinLengthFilter(3), // Min 3 characters
new StopWordFilter() // Remove common words
]
);
custom.process("I have a wireless mouse");
// → ["wireless", "mouse"] ("i", "have", "a" removed)
Pre-built Pipelines
| Pipeline | Tokenizer | Filters | Best For |
|---|---|---|---|
| simple() | WordBoundary | Lowercase, MinLength(2) | General use |
| search() | WordBoundary | Lowercase, MinLength(2), StopWords | Search engines |
| minimal() | WordBoundary | Lowercase only | Preserve all tokens |
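To see how a tokenizer and an ordered list of filters fit together, here is a small self-contained sketch. It is an illustration under assumptions, not the library's code: the names `tokenizeWords`, `runPipeline`, and the tiny `STOP_WORDS` list are hypothetical, and the word-boundary rule is simplified to "split on non-alphanumeric characters".

```typescript
type TokenFilter = (tokens: string[]) => string[];

// Split on any run of non-alphanumeric characters (a simplified word-boundary rule).
const tokenizeWords = (text: string): string[] =>
  text.split(/[^A-Za-z0-9]+/).filter(t => t.length > 0);

const lowercase: TokenFilter = tokens => tokens.map(t => t.toLowerCase());
const minLength = (n: number): TokenFilter => tokens => tokens.filter(t => t.length >= n);

// Tiny illustrative stop list; real lists are much longer.
const STOP_WORDS = new Set(['the', 'a', 'an', 'is', 'of', 'and']);
const stopWords: TokenFilter = tokens => tokens.filter(t => !STOP_WORDS.has(t));

// Run the tokenizer, then apply each filter in order.
function runPipeline(text: string, filters: TokenFilter[]): string[] {
  return filters.reduce((tokens, f) => f(tokens), tokenizeWords(text));
}

const simpleTokens = runPipeline('Hello World!', [lowercase, minLength(2)]);
// → ['hello', 'world']
const searchTokens = runPipeline('The quick brown fox', [lowercase, minLength(2), stopWords]);
// → ['quick', 'brown', 'fox']
```

Filter order matters in this sketch: lowercasing has to run before the stop-word filter, otherwise "The" would not match the lowercase entry "the".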
Available Components
// Available tokenizers:
// - WhitespaceTokenizer: splits on whitespace
// - WordBoundaryTokenizer: splits on word boundaries (default)
// - NGramTokenizer: generates n-grams for substring matching
// Available filters:
// - LowercaseFilter: converts to lowercase
// - MinLengthFilter(n): removes tokens shorter than n
// - MaxLengthFilter(n): removes tokens longer than n
// - StopWordFilter: removes common words (the, a, is, etc.)
// - TrimFilter: trims whitespace
// - UniqueFilter: removes duplicate tokens
Custom Pipelines
Create custom pipelines for specialized use cases:
import {
IndexedLWWMap,
simpleAttribute,
TokenizationPipeline,
NGramTokenizer,
LowercaseFilter,
HLC
} from '@topgunbuild/core';
// N-gram pipeline for substring matching
const ngramPipeline = TokenizationPipeline.custom(
new NGramTokenizer(3), // 3-character n-grams
[new LowercaseFilter()]
);
// Product is the interface defined under Basic Usage
const hlc = new HLC('node-1');
const products = new IndexedLWWMap<string, Product>(hlc);
// Use custom pipeline for description field
const descAttr = simpleAttribute<Product, string>('description', p => p.description);
products.addInvertedIndex(descAttr, ngramPipeline);
// Now substring matches work
products.set('p1', {
id: 'p1',
name: 'Wireless Bluetooth Mouse',
description: 'Ergonomic mouse',
tags: ['electronics', 'wireless', 'mouse']
});
const results = products.queryValues({
type: 'contains',
attribute: 'description',
value: 'rgo' // Matches "Ergonomic" via n-gram
});
N-gram tip: N-grams enable substring matching but increase memory usage significantly. Use only when needed (e.g., autocomplete, partial name matching).
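The memory cost mentioned in the tip is easy to see by generating the n-grams by hand. This sketch assumes n-grams are produced per whitespace-separated word after lowercasing; the guide does not state how NGramTokenizer treats word boundaries, and the `ngrams` helper is hypothetical.

```typescript
// Generate all contiguous n-character substrings of each word, lowercased.
function ngrams(text: string, n: number): string[] {
  const grams: string[] = [];
  for (const word of text.toLowerCase().split(/\s+/).filter(w => w.length > 0)) {
    for (let i = 0; i + n <= word.length; i++) {
      grams.push(word.slice(i, i + n));
    }
  }
  return grams;
}

const indexed = ngrams('Ergonomic mouse', 3);
// 'ergonomic' yields 7 trigrams and 'mouse' yields 3, so 2 words become 10 index entries.

const queryGrams = ngrams('rgo', 3); // ['rgo']
// The query matches when every query n-gram appears among the indexed n-grams.
const isMatch = queryGrams.every(g => indexed.includes(g)); // true
```

Two words turning into ten index entries is the trade-off in miniature: substring queries become possible, at the price of several times as many tokens per document.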
Multi-Value Fields
Index array fields like tags using multiAttribute:
import { multiAttribute } from '@topgunbuild/core';
// Index an array field (tags)
const tagsAttr = multiAttribute<Product, string>('tags', p => p.tags);
products.addInvertedIndex(tagsAttr);
// Each tag becomes searchable
const wirelessProducts = products.queryValues({
type: 'contains',
attribute: 'tags',
value: 'wireless'
});
Combined Queries
Combine text search with other index types for powerful filtering:
// Combine text search with other predicates
const results = products.queryValues({
type: 'and',
children: [
{ type: 'contains', attribute: 'name', value: 'wireless' },
{ type: 'eq', attribute: 'status', value: 'active' },
{ type: 'between', attribute: 'price', from: 10, to: 100 }
]
});
// Query optimizer will use:
// - InvertedIndex for "contains"
// - HashIndex for "eq" (if indexed)
// - NavigableIndex for "between" (if indexed)
The Query Optimizer automatically selects the best index for each part of the query.
BM25 Ranked Search
For search-engine-style relevance ranking, TopGun provides BM25 full-text search. Unlike InvertedIndex (boolean matching), BM25 scores documents by relevance and returns results sorted by score.
Relevance Ranking
BM25 algorithm scores documents based on term frequency and document length.
Porter Stemming
Words are stemmed (running→run) for better matching, and 174 English stopwords are filtered out.
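For readers who want to see what "term frequency and document length" mean in practice, here is a minimal sketch of the per-term BM25 score. It assumes the common IDF variant that adds 1 inside the logarithm; this guide does not say which variant TopGun implements, and the function names are illustrative. The default values k1 = 1.2 and b = 0.75 are the defaults listed under Configuration.

```typescript
interface Bm25Params {
  k1: number; // term frequency saturation
  b: number;  // document length normalization (0 = none, 1 = full)
}

// Inverse document frequency; the "+1 inside the log" variant is never negative.
function idf(totalDocs: number, docsWithTerm: number): number {
  return Math.log(1 + (totalDocs - docsWithTerm + 0.5) / (docsWithTerm + 0.5));
}

// Score contribution of one query term in one document.
function bm25Term(
  tf: number,            // occurrences of the term in the document
  docLength: number,     // tokens in this document
  avgDocLength: number,  // average tokens per document
  totalDocs: number,
  docsWithTerm: number,
  { k1, b }: Bm25Params = { k1: 1.2, b: 0.75 },
): number {
  const norm = 1 - b + b * (docLength / avgDocLength);
  return idf(totalDocs, docsWithTerm) * (tf * (k1 + 1)) / (tf + k1 * norm);
}

// Same term frequency, different lengths: the shorter document scores higher.
const shortDoc = bm25Term(2, 50, 100, 1000, 10);
const longDoc = bm25Term(2, 200, 100, 1000, 10);
```

A document's total score for a query is the sum of `bm25Term` over the query's terms, which is why a document matching both stemmed terms ranks above one matching only one of them.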
Basic Usage
Enable BM25 search on an IndexedORMap:
import { IndexedORMap, HLC } from '@topgunbuild/core';
interface Article {
title: string;
body: string;
author: string;
}
const hlc = new HLC('node-1');
const articles = new IndexedORMap<string, Article>(hlc);
// Enable BM25 full-text search on title and body fields
articles.enableFullTextSearch({
fields: ['title', 'body']
});
// Add some articles
articles.add('a1', {
title: 'Introduction to Machine Learning',
body: 'Machine learning is a subset of artificial intelligence...',
author: 'Alice'
});
articles.add('a2', {
title: 'Deep Learning Tutorial',
body: 'Deep learning uses neural networks with many layers...',
author: 'Bob'
});
articles.add('a3', {
title: 'Getting Started with AI',
body: 'Artificial intelligence is transforming industries...',
author: 'Charlie'
});
// Search with BM25 relevance ranking
const results = articles.search('machine learning');
// Results sorted by relevance score:
// [
// { key: 'a1', score: 2.34, matchedTerms: ['machin', 'learn'], value: {...} },
// { key: 'a2', score: 0.89, matchedTerms: ['learn'], value: {...} }
// ]
Search Options
Control search behavior with options:
// Search with options
const results = articles.search('artificial intelligence', {
limit: 10, // Maximum results to return
minScore: 0.5, // Minimum relevance score threshold
boost: { // Field boosting weights
title: 2.0, // Title matches worth 2x
body: 1.0 // Body matches worth 1x (default)
}
});
// Results are ranked by weighted BM25 scores
for (const result of results) {
console.log(`[${result.score.toFixed(2)}] ${result.value.title}`);
console.log(` Matched terms: ${result.matchedTerms.join(', ')}`);
}
Configuration
Customize tokenization and BM25 parameters:
// Configure tokenizer and BM25 parameters
articles.enableFullTextSearch({
fields: ['title', 'body', 'tags'],
// Tokenizer options
tokenizer: {
minLength: 2, // Minimum token length (default: 2)
maxLength: 50, // Maximum token length (default: 50)
lowercase: true, // Convert to lowercase (default: true)
// Uses Porter stemmer and 174 English stopwords by default
},
// BM25 scoring parameters
bm25: {
k1: 1.2, // Term frequency saturation (default: 1.2)
// Higher = more weight to term frequency
b: 0.75 // Document length normalization (default: 0.75)
// 0 = no normalization, 1 = full normalization
}
});
BM25 vs InvertedIndex
Choose the right approach for your use case:
// Two approaches to full-text search:
// 1. InvertedIndex - Boolean matching (fast, no ranking)
const nameAttr = simpleAttribute<Product, string>('name', p => p.name);
products.addInvertedIndex(nameAttr);
const matches = products.queryValues({
type: 'contains',
attribute: 'name',
value: 'wireless mouse'
});
// Returns all products containing both "wireless" AND "mouse"
// No relevance score, order is arbitrary
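// 2. BM25 Search - Relevance ranking (search engine style)
// Note: per the comparison table in this section, BM25 search works with IndexedORMap,
// so `products` must be an IndexedORMap for the next two calls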
products.enableFullTextSearch({ fields: ['name', 'description'] });
const ranked = products.search('wireless mouse');
// Returns products sorted by relevance
// "Wireless Bluetooth Mouse" ranks higher than "Mouse pad for wireless setup"
// Each result includes: score, matchedTerms
| Feature | InvertedIndex | BM25 Search |
|---|---|---|
| Query method | queryValues({ type: 'contains' }) | search('query') |
| Ranking | No ranking (boolean match) | Relevance-sorted by score |
| Stemming | Optional (via pipeline) | Built-in Porter stemmer |
| Stopwords | Optional (via pipeline) | 174 English stopwords |
| Best for | Filtering, exact matches | Search boxes, content discovery |
| Works with | IndexedLWWMap, IndexedORMap | IndexedORMap |
Index Persistence
Serialize and restore the BM25 index:
// Serialize index for persistence
const ftsIndex = articles.getFullTextIndex();
const serialized = ftsIndex.serialize();
// Save to storage (IndexedDB, localStorage, file, etc.)
localStorage.setItem('fts-index', JSON.stringify(serialized));
// Later: restore from storage
const raw = localStorage.getItem('fts-index');
if (raw !== null) {
// getItem returns null when nothing has been saved yet
ftsIndex.load(JSON.parse(raw));
}
// Index is ready to use immediately
const results = articles.search('machine learning');
Performance tip: BM25 index builds in <100ms for 1K documents. Search queries complete in <10ms. For large datasets, consider serializing the index to avoid rebuilding on page load.
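When persisting a serialized index, it helps to guard the load path so that a missing or corrupt entry falls back to a rebuild. The sketch below is one possible shape, not part of TopGun: the `Snapshot` envelope, `SNAPSHOT_VERSION`, and the helper names are hypothetical, and `KeyValueStore` is just the subset of the Web Storage API that the sketch needs.

```typescript
// Minimal subset of the Web Storage API, so the sketch also runs outside a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical envelope: a version tag lets you discard snapshots written in an older format.
interface Snapshot<T> { version: number; data: T; }
const SNAPSHOT_VERSION = 1;

function saveSnapshot<T>(store: KeyValueStore, key: string, data: T): void {
  const snapshot: Snapshot<T> = { version: SNAPSHOT_VERSION, data };
  store.setItem(key, JSON.stringify(snapshot));
}

// Returns null when nothing is stored, the JSON is corrupt, or the version differs.
// The caller should then rebuild the index from the documents.
function loadSnapshot<T>(store: KeyValueStore, key: string): T | null {
  const raw = store.getItem(key);
  if (raw === null) return null;
  try {
    const parsed = JSON.parse(raw) as Snapshot<T>;
    return parsed.version === SNAPSHOT_VERSION ? parsed.data : null;
  } catch {
    return null;
  }
}

// In-memory stand-in for localStorage, used here only for demonstration.
function memoryStore(): KeyValueStore {
  const items = new Map<string, string>();
  return {
    getItem: key => items.get(key) ?? null,
    setItem: (key, value) => { items.set(key, value); },
  };
}
```

In a browser you would pass `localStorage` as the store, save the output of `ftsIndex.serialize()`, and call `ftsIndex.load()` only when `loadSnapshot` returns a non-null value.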
Server-Side Search
For multi-client applications, TopGun supports server-side BM25 search. The server maintains indexes centrally, eliminating the need for each client to build and store indexes locally.
Centralized Indexes
Server maintains FTS indexes, clients query via WebSocket.
Permission-Based
Search respects RBAC - users need READ permission to search a map.
Server Configuration
Enable FTS on the server for specific maps:
import { ServerCoordinator } from '@topgunbuild/server';
const server = new ServerCoordinator({
port: 8080,
// Enable full-text search for specific maps
fullTextSearch: {
articles: {
fields: ['title', 'body'],
tokenizer: { minLength: 2 },
bm25: { k1: 1.2, b: 0.75 }
},
products: {
fields: ['name', 'description', 'tags']
}
}
});
await server.start();
// Server now maintains FTS indexes for 'articles' and 'products' maps
// Indexes are automatically updated when data changes
// Indexes are backfilled from storage on startup
The server will:
- Create and maintain BM25 indexes for configured maps
- Automatically update indexes when data changes (add/update/remove)
- Backfill indexes from persistent storage on startup
Client API
Search from the client using client.search():
import { TopGunClient } from '@topgunbuild/client';
const client = new TopGunClient({
serverUrl: 'ws://localhost:8080'
});
await client.authenticate({ token: 'user-token' });
// Search articles on the server
const results = await client.search<Article>('articles', 'machine learning', {
limit: 20,
minScore: 0.5,
boost: { title: 2.0, body: 1.0 }
});
// Results are sorted by relevance
for (const result of results) {
console.log(`[${result.score.toFixed(2)}] ${result.key}: ${result.value.title}`);
console.log(` Matched: ${result.matchedTerms.join(', ')}`);
}
Search Result Structure
Each result includes the document key, value, relevance score, and matched terms:
// SearchResult<T> interface
interface SearchResult<T> {
key: string; // Document key
value: T; // Full document value
score: number; // BM25 relevance score
matchedTerms: string[]; // Stemmed terms that matched
}
// Example result
const result: SearchResult<Article> = {
key: 'a1',
value: { title: 'Introduction to ML', body: '...', author: 'Alice' },
score: 2.34,
matchedTerms: ['machin', 'learn'] // Stemmed
};
Permissions
Server-side search respects the security model - users must have READ permission on a map to search it:
// Server security configuration
const server = new ServerCoordinator({
port: 8080,
fullTextSearch: {
articles: { fields: ['title', 'body'] }
},
security: {
permissions: {
// Users need READ permission to search a map
articles: {
read: ['user', 'admin'],
write: ['admin']
}
}
}
});
// Client-side: search requires READ permission
try {
const results = await client.search('articles', 'query');
} catch (error) {
// Error: Permission denied for map: articles
}
When to use server-side search: Use server-side search when you have multiple clients, need centralized index management, or want to offload search computation from clients. Use local BM25 (IndexedORMap) for offline-first apps where clients need to search without server connectivity.
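The READ check described under Permissions can be pictured as a role lookup against the `permissions` block. This is only a sketch of the idea: `isAllowed` is a hypothetical helper, and the deny-by-default behavior for unlisted maps is an assumption rather than documented server behavior.

```typescript
type Action = 'read' | 'write';
// Mirrors the shape of the `permissions` block in the server configuration.
type Permissions = Record<string, Partial<Record<Action, string[]>>>;

// Hypothetical helper: true when any of the user's roles may perform the action on the map.
function isAllowed(perms: Permissions, map: string, action: Action, roles: string[]): boolean {
  const allowed = perms[map]?.[action] ?? [];
  return roles.some(role => allowed.includes(role));
}

const permissions: Permissions = {
  articles: { read: ['user', 'admin'], write: ['admin'] },
};

const userCanSearch = isAllowed(permissions, 'articles', 'read', ['user']);  // true
const userCanWrite = isAllowed(permissions, 'articles', 'write', ['user']);  // false
const unlistedMap = isAllowed(permissions, 'drafts', 'read', ['admin']);     // false in this sketch
```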
Live Search Subscriptions
For real-time search UIs, TopGun provides Live Search subscriptions that automatically update when matching documents change. Unlike the one-shot search() method, searchSubscribe() pushes delta updates to your callback.
Real-Time Updates
Results update automatically when documents are added, modified, or removed.
Delta Updates
Server sends ENTER/UPDATE/LEAVE events instead of full result sets.
Client API
Create a live search subscription with searchSubscribe():
import { TopGunClient } from '@topgunbuild/client';
const client = new TopGunClient({
serverUrl: 'ws://localhost:8080'
});
// Create a live search subscription
const handle = client.searchSubscribe<Article>('articles', 'machine learning', {
limit: 20,
minScore: 0.5,
boost: { title: 2.0, body: 1.0 }
});
// Subscribe to result changes (includes initial results + delta updates)
const unsubscribe = handle.subscribe((results) => {
console.log('Search results updated:', results.length);
for (const result of results) {
console.log(`[${result.score.toFixed(2)}] ${result.key}: ${result.value.title}`);
}
});
// Get current results snapshot at any time
const snapshot = handle.getResults();
// Update query dynamically (re-subscribes automatically)
handle.setQuery('deep learning');
// Cleanup when done
handle.dispose();
Delta Update Types
The server sends incremental updates instead of full result sets:
// Delta update types:
// - ENTER: Document now matches the query (was added or score increased above minScore)
// - UPDATE: Document still matches but score/value changed
// - LEAVE: Document no longer matches (was removed or score dropped below minScore)
// The SearchHandle maintains a sorted result set internally.
// Your subscribe callback receives the full sorted array on each change.
// Example: Building a real-time search UI
const handle = client.searchSubscribe<Product>('products', 'wireless');
handle.subscribe((results) => {
// Results are always sorted by score (highest first)
renderSearchResults(results);
});
// When a product matching "wireless" is added to the 'products' map,
// or an existing product's text is updated to include "wireless",
// your callback fires with the updated results array.
| Update Type | When Sent |
|---|---|
| ENTER | Document now matches query (added or score increased above threshold) |
| UPDATE | Document still matches but score or value changed |
| LEAVE | Document no longer matches (removed or score dropped below threshold) |
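To illustrate how a client-side handle could fold these deltas into a sorted result set, here is a minimal sketch. The `Delta` message shape and the `applyDelta` helper are hypothetical; the guide does not specify the wire format or how SearchHandle is implemented internally.

```typescript
interface Result<T> { key: string; value: T; score: number; }

// Hypothetical delta message shape, for illustration only.
type Delta<T> =
  | { type: 'ENTER' | 'UPDATE'; result: Result<T> }
  | { type: 'LEAVE'; key: string };

// Apply one delta and return a new array sorted by score, highest first.
function applyDelta<T>(current: Result<T>[], delta: Delta<T>): Result<T>[] {
  const key = delta.type === 'LEAVE' ? delta.key : delta.result.key;
  // Drop any previous entry for this key, then re-add it unless the document left.
  const next = current.filter(r => r.key !== key);
  if (delta.type !== 'LEAVE') next.push(delta.result);
  return next.sort((a, b) => b.score - a.score);
}

let results: Result<{ title: string }>[] = [];
results = applyDelta(results, {
  type: 'ENTER',
  result: { key: 'a2', value: { title: 'Deep Learning Tutorial' }, score: 0.89 },
});
results = applyDelta(results, {
  type: 'ENTER',
  result: { key: 'a1', value: { title: 'Introduction to Machine Learning' }, score: 2.34 },
});
// results is now ordered a1, a2
results = applyDelta(results, { type: 'LEAVE', key: 'a2' });
// results now contains only a1
```

Returning a fresh array on every change keeps the helper easy to use with UI frameworks that rely on reference equality to detect updates.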
Cluster-Wide Live Search
In clustered environments, live search subscriptions work across all nodes. When you subscribe to a search query, you receive real-time updates for matching documents regardless of which node owns the data.
// Connect to any node in the cluster
const client = new TopGunClient({
serverUrl: 'ws://node1:8080'
});
// Subscribe to search - automatically distributed across cluster
const handle = client.searchSubscribe<Article>('articles', 'machine learning');
handle.subscribe((results) => {
// Results include matches from ALL cluster nodes
// Updates push automatically when documents change on ANY node
console.log('Results from all nodes:', results.length);
});
How it works:
- Subscription broadcast: The coordinator node registers the search subscription on all cluster nodes
- Local evaluation: Each node maintains a local FTS index and evaluates changes against subscriptions
- Targeted updates: When a document changes, only the owning node sends an update (via CLUSTER_SUB_UPDATE)
- RRF merging: Initial results are merged using Reciprocal Rank Fusion for consistent relevance ordering
- Client delivery: The coordinator forwards delta updates to the client
This architecture eliminates the need to broadcast every data change to all nodes - only subscription-relevant updates are sent to coordinators.
Scalability: Each node evaluates subscriptions locally against its FTS index. Update messages flow only to coordinators with active subscriptions, making this approach efficient even with many concurrent subscriptions.
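Reciprocal Rank Fusion, mentioned in the list above, combines per-node rankings using only each document's rank, which avoids comparing raw scores computed against different local indexes. The sketch below shows the general technique; the `rrfMerge` name, the tie-break by key, and the constant k = 60 (a value commonly used for RRF) are assumptions, since the guide does not state TopGun's parameters.

```typescript
// Reciprocal Rank Fusion: each list contributes 1 / (k + rank) for every key it contains.
function rrfMerge(rankedLists: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((key, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(key, (scores.get(key) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries())
    // Higher fused score first; ties are broken by key so the order is deterministic.
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .map(([key]) => key);
}

const merged = rrfMerge([
  ['a1', 'a3'], // node 1's local ranking
  ['a2', 'a4'], // node 2's local ranking
]);
// → ['a1', 'a2', 'a3', 'a4']: the top hit of each node comes before any second-place hit
```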
React Integration
The @topgunbuild/react package provides the useSearch hook for easy integration with React applications:
Basic Usage
import { useState } from 'react';
import { useSearch } from '@topgunbuild/react';
function SearchResults() {
const [searchTerm, setSearchTerm] = useState('');
const { results, loading, error } = useSearch<Article>('articles', searchTerm, {
limit: 20,
boost: { title: 2.0 }
});
if (loading) return <Spinner />;
if (error) return <div>Error: {error.message}</div>;
return (
<ul>
{results.map(r => (
<li key={r.key}>
[{r.score.toFixed(2)}] {r.value.title}
<small>Matched: {r.matchedTerms.join(', ')}</small>
</li>
))}
</ul>
);
}
With Debounce
For search-as-you-type interfaces, use the debounceMs option to avoid excessive server requests:
import { useSearch } from '@topgunbuild/react';
import { useState } from 'react';
function SearchInput() {
const [input, setInput] = useState('');
// Debounce search queries by 300ms to avoid excessive server requests
const { results, loading, error } = useSearch<Product>('products', input, {
debounceMs: 300, // Wait 300ms after user stops typing
limit: 10,
minScore: 0.5
});
return (
<div>
<input
value={input}
onChange={(e) => setInput(e.target.value)}
placeholder="Search products..."
/>
{loading && <span>Searching...</span>}
{error && <span className="error">{error.message}</span>}
<ul>
{results.map(r => (
<li key={r.key}>{r.value.name} - ${r.value.price}</li>
))}
</ul>
</div>
);
}
useSearch Return Values
| Property | Type | Description |
|---|---|---|
| results | SearchResult<T>[] | Current search results sorted by score |
| loading | boolean | True while waiting for initial results |
| error | Error \| null | Error if search failed |
useSearch Options
| Option | Type | Description |
|---|---|---|
| limit | number | Maximum results to return |
| minScore | number | Minimum BM25 score threshold |
| boost | Record<string, number> | Field boost weights |
| debounceMs | number | Debounce delay for query changes |
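Outside React, the effect of `debounceMs` can be approximated with a plain debounce wrapper. This is a generic sketch of the technique, not how the hook is implemented; the `debounce` helper and the injectable `Scheduler` parameter (added so the behavior can be tested without real timers) are hypothetical.

```typescript
type Cancel = () => void;
// Schedules `fn` to run after `ms` milliseconds and returns a function that cancels it.
type Scheduler = (fn: () => void, ms: number) => Cancel;

const timeoutScheduler: Scheduler = (fn, ms) => {
  const id = setTimeout(fn, ms);
  return () => clearTimeout(id);
};

// Returns a wrapper that only invokes `fn` once calls have stopped for `ms` milliseconds.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  ms: number,
  schedule: Scheduler = timeoutScheduler,
): (...args: A) => void {
  let cancelPending: Cancel | null = null;
  return (...args: A) => {
    if (cancelPending) cancelPending(); // a newer call supersedes the pending one
    cancelPending = schedule(() => {
      cancelPending = null;
      fn(...args);
    }, ms);
  };
}

// Usage: only the final query is sent once the user pauses typing.
const sent: string[] = [];
const sendQuery = debounce((q: string) => { sent.push(q); }, 300);
sendQuery('w');
sendQuery('wi');
sendQuery('wireless'); // after 300ms of inactivity, only 'wireless' is pushed to `sent`
```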
See Also: The React Hooks Reference for all available hooks.
Hybrid Queries
Hybrid queries combine full-text search with traditional filter predicates in a single query. This is powerful for building faceted search UIs where users can search by text while also filtering by category, price range, date, etc.
FTS + Filters
Combine match() with equal(), greaterThan(), between() in one query.
_score Sorting
Sort results by BM25 relevance score alongside other sort fields.
useHybridQuery Hook
The useHybridQuery hook provides a React-friendly way to create hybrid queries:
import { useHybridQuery } from '@topgunbuild/react';
import { Predicates } from '@topgunbuild/core';
function TechArticles() {
// Combine FTS with traditional filters
const { results, loading, error } = useHybridQuery<Article>('articles', {
predicate: Predicates.and(
Predicates.match('body', 'machine learning'), // FTS predicate
Predicates.equal('category', 'tech') // Filter predicate
),
sort: { _score: 'desc' }, // Sort by relevance
limit: 20
});
if (loading) return <Spinner />;
if (error) return <div>Error: {error.message}</div>;
return (
<ul>
{results.map(r => (
<li key={r._key}>
[{r._score?.toFixed(2)}] {r.value.title}
<small>Matched: {r._matchedTerms?.join(', ')}</small>
</li>
))}
</ul>
);
}
Dynamic Filters
Build complex filter UIs with dynamic predicates:
import { useHybridQuery } from '@topgunbuild/react';
import { Predicates } from '@topgunbuild/core';
import { useState, useMemo } from 'react';
function SearchWithFilters() {
const [searchTerm, setSearchTerm] = useState('');
const [category, setCategory] = useState('all');
const [priceMax, setPriceMax] = useState(1000);
// Build predicate dynamically
const filter = useMemo(() => {
const conditions = [];
// Add FTS if search term exists
if (searchTerm.trim()) {
conditions.push(Predicates.match('description', searchTerm));
}
// Add category filter
if (category !== 'all') {
conditions.push(Predicates.equal('category', category));
}
// Add price filter
conditions.push(Predicates.lessThanOrEqual('price', priceMax));
return {
predicate: conditions.length > 1
? Predicates.and(...conditions)
: conditions[0],
sort: searchTerm ? { _score: 'desc' } : { createdAt: 'desc' },
limit: 20
};
}, [searchTerm, category, priceMax]);
const { results, loading } = useHybridQuery<Product>('products', filter);
return (
<div>
<input
value={searchTerm}
onChange={(e) => setSearchTerm(e.target.value)}
placeholder="Search products..."
/>
<select value={category} onChange={(e) => setCategory(e.target.value)}>
<option value="all">All Categories</option>
<option value="electronics">Electronics</option>
<option value="clothing">Clothing</option>
</select>
<input
type="range"
value={priceMax}
onChange={(e) => setPriceMax(Number(e.target.value))}
min={0}
max={1000}
/>
{loading && <span>Loading...</span>}
<ul>
{results.map(r => (
<li key={r._key}>
{r.value.name} - ${r.value.price}
{r._score !== undefined && <span> (score: {r._score.toFixed(2)})</span>}
</li>
))}
</ul>
</div>
);
}
Client API
For non-React environments, use client.hybridQuery() directly:
import { TopGunClient } from '@topgunbuild/client';
import { Predicates } from '@topgunbuild/core';
const client = new TopGunClient({
serverUrl: 'ws://localhost:8080'
});
// Create a hybrid query handle
const handle = client.hybridQuery<Article>('articles', {
predicate: Predicates.and(
Predicates.match('body', 'artificial intelligence'),
Predicates.equal('status', 'published'),
Predicates.greaterThan('views', 100)
),
sort: { _score: 'desc' },
limit: 50
});
// Subscribe to results
const unsubscribe = handle.subscribe((results) => {
for (const r of results) {
console.log(`[${r._score?.toFixed(2)}] ${r._key}: ${r.value.title}`);
}
});
// Cleanup
unsubscribe();
HybridResultItem Interface
Each result includes relevance score and matched terms:
| Property | Type | Description |
|---|---|---|
| value | T | The document value |
| _key | string | Document key |
| _score | number \| undefined | BM25 relevance score (only for FTS queries) |
| _matchedTerms | string[] \| undefined | Stemmed terms that matched |
Available Predicates
Hybrid queries support all predicate types:
| Predicate | Example | Description |
|---|---|---|
| match(field, query) | match('body', 'machine learning') | FTS with BM25 scoring |
| matchPhrase(field, phrase) | matchPhrase('title', 'getting started') | Exact phrase matching |
| matchPrefix(field, prefix) | matchPrefix('name', 'prod') | Prefix autocomplete |
| equal(field, value) | equal('status', 'active') | Exact equality |
| greaterThan(field, value) | greaterThan('price', 100) | Range comparison |
| lessThanOrEqual(field, value) | lessThanOrEqual('stock', 10) | Range comparison |
| contains(field, value) | contains('tags', 'featured') | Array contains |
| and(...predicates) | and(match(...), equal(...)) | All must match |
| or(...predicates) | or(match(...), match(...)) | Any can match |
When to use Hybrid vs useSearch: Use useSearch for pure text search (search box). Use useHybridQuery when you need to combine text search with filters (faceted search, filtered listings).
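Conceptually, a hybrid query filters documents with the non-text predicates and ranks the survivors by text relevance. The sketch below illustrates that flow with a hypothetical predicate tree and an `evaluate` helper; it is not TopGun's query engine, and the relevance measure is a naive count of matching words standing in for BM25.

```typescript
// Hypothetical predicate tree, loosely modelled on the predicate names in the table above.
type Predicate =
  | { op: 'equal'; field: string; value: unknown }
  | { op: 'lessThanOrEqual'; field: string; value: number }
  | { op: 'match'; field: string; query: string }
  | { op: 'and'; children: Predicate[] };

type Doc = Record<string, unknown>;

// Returns null when the document is filtered out, otherwise its relevance score
// (pure filters contribute 0, text matches contribute the number of matching words).
function evaluate(doc: Doc, p: Predicate): number | null {
  switch (p.op) {
    case 'equal':
      return doc[p.field] === p.value ? 0 : null;
    case 'lessThanOrEqual': {
      const v = doc[p.field];
      return typeof v === 'number' && v <= p.value ? 0 : null;
    }
    case 'match': {
      const words = String(doc[p.field] ?? '').toLowerCase().split(/\W+/);
      const hits = p.query.toLowerCase().split(/\s+/)
        .filter(q => q.length > 0 && words.includes(q)).length;
      return hits > 0 ? hits : null;
    }
    case 'and': {
      let total = 0;
      for (const child of p.children) {
        const score = evaluate(doc, child);
        if (score === null) return null; // one failing child rejects the document
        total += score;
      }
      return total;
    }
  }
}

const docs: Doc[] = [
  { name: 'Wireless Mouse', category: 'electronics', price: 25 },
  { name: 'Wireless Keyboard', category: 'electronics', price: 120 },
  { name: 'Mouse Pad', category: 'office', price: 8 },
];

const query: Predicate = {
  op: 'and',
  children: [
    { op: 'match', field: 'name', query: 'wireless mouse' },
    { op: 'equal', field: 'category', value: 'electronics' },
    { op: 'lessThanOrEqual', field: 'price', value: 100 },
  ],
};

const ranked = docs
  .map(doc => ({ doc, score: evaluate(doc, query) }))
  .filter((r): r is { doc: Doc; score: number } => r.score !== null)
  .sort((a, b) => b.score - a.score);
// Only 'Wireless Mouse' passes all three conditions.
```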
Index Statistics
Monitor your inverted index with extended statistics:
// Get extended statistics for inverted index
const index = products.getIndexes().find(i => i.type === 'inverted');
if (index) {
const stats = index.getExtendedStats();
console.log(`Unique tokens: ${stats.totalTokens}`);
console.log(`Documents indexed: ${stats.totalEntries}`);
console.log(`Avg tokens/doc: ${stats.avgTokensPerDocument.toFixed(1)}`);
console.log(`Max docs/token: ${stats.maxDocumentsPerToken}`);
}
Performance Characteristics
| Metric | Value |
|---|---|
| Query complexity | O(K) where K = matching tokens |
| Index memory overhead | 30-50% of text data |
| Index update time | < 10μs per document |
| Supported operations | contains, containsAll, containsAny, has |
Memory Usage
Inverted indexes store:
- Token index: Map<Token, Set<Key>> - tokens to document keys
- Reverse index: Map<Key, Set<Token>> - document keys to tokens (for updates)
For text-heavy documents, expect 30-50% memory overhead. Monitor with getExtendedStats().
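The role of the two maps is easiest to see in a toy version. The sketch below is an illustration under assumptions, not the library's InvertedIndex: the `TinyInvertedIndex` class is hypothetical, and its `stats()` fields borrow the names printed in the Index Statistics example while their exact semantics in TopGun are not specified here.

```typescript
class TinyInvertedIndex {
  private tokenIndex = new Map<string, Set<string>>();   // token -> document keys
  private reverseIndex = new Map<string, Set<string>>(); // document key -> tokens

  // Index (or re-index) a document. The reverse index tells us which old postings to clear.
  set(key: string, tokens: string[]): void {
    this.remove(key);
    const unique = new Set(tokens);
    this.reverseIndex.set(key, unique);
    unique.forEach(token => {
      let keys = this.tokenIndex.get(token);
      if (!keys) {
        keys = new Set<string>();
        this.tokenIndex.set(token, keys);
      }
      keys.add(key);
    });
  }

  remove(key: string): void {
    const tokens = this.reverseIndex.get(key);
    if (!tokens) return;
    tokens.forEach(token => {
      const keys = this.tokenIndex.get(token);
      if (!keys) return;
      keys.delete(key);
      if (keys.size === 0) this.tokenIndex.delete(token); // drop empty postings
    });
    this.reverseIndex.delete(key);
  }

  stats() {
    let totalPostings = 0;
    let maxDocumentsPerToken = 0;
    this.tokenIndex.forEach(keys => {
      totalPostings += keys.size;
      maxDocumentsPerToken = Math.max(maxDocumentsPerToken, keys.size);
    });
    const totalEntries = this.reverseIndex.size;
    return {
      totalTokens: this.tokenIndex.size,
      totalEntries,
      avgTokensPerDocument: totalEntries === 0 ? 0 : totalPostings / totalEntries,
      maxDocumentsPerToken,
    };
  }
}

const idx = new TinyInvertedIndex();
idx.set('p1', ['wireless', 'bluetooth', 'mouse']);
idx.set('p2', ['usb', 'wireless', 'keyboard']);
idx.set('p1', ['wired', 'mouse']); // update: p1's stale 'wireless' and 'bluetooth' postings are removed
const indexStats = idx.stats();
```

Without the reverse index, an update would have to scan every posting list to find the old tokens of a document, so the extra memory buys cheap updates and removals.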
Best Practices
- Choose the right pipeline
  - simple() for general use
  - search() for search boxes (removes noise words)
  - Custom N-gram for autocomplete/substring matching
- Index selectively
  - Only index fields users actually search
  - Long text fields increase memory usage
- Combine with other indexes
  - Use HashIndex for category filtering
  - Use NavigableIndex for date/price ranges
  - Let the optimizer combine them efficiently
- Test your tokenization
  const pipeline = TokenizationPipeline.search();
  console.log(pipeline.process("your sample text"));
  // Verify tokens match your expectations
Next Steps
- Indexing - Overview of all index types
- Adaptive Indexing - Auto-suggest and auto-create indexes
- Live Queries - Combine search with real-time updates