Cluster Replication
TopGun clusters automatically replicate data across nodes for high availability and fault tolerance. This guide covers server-side replication configuration, consistency levels, and anti-entropy repair.
Overview
Partitioned Data
271 partitions with consistent hashing
Backup Replicas
Each partition replicated to backup nodes
Anti-Entropy Repair
Merkle tree-based consistency verification
Cluster Setup
import { ServerCoordinator } from '@topgunbuild/server';
// Node 1 (seed node)
const node1 = new ServerCoordinator({
port: 8080,
nodeId: 'node-1',
host: 'localhost',
clusterPort: 9000,
peers: [], // First node has no peers
jwtSecret: process.env.JWT_SECRET,
replicationEnabled: true,
defaultConsistency: 'EVENTUAL',
});
await node1.ready();
// Node 2 (joins via seed)
const node2 = new ServerCoordinator({
port: 8081,
nodeId: 'node-2',
host: 'localhost',
clusterPort: 9001,
peers: ['localhost:9000'], // Connect to node-1
jwtSecret: process.env.JWT_SECRET,
replicationEnabled: true,
defaultConsistency: 'EVENTUAL',
});
await node2.ready();
Gossip Protocol: You only need to provide one seed node. Nodes automatically discover each other via the gossip protocol - when node-2 connects to node-1, it learns about all other members.
Gossip-Based Discovery
// Cluster membership uses gossip protocol
// Nodes automatically discover each other
const server = new ServerCoordinator({
clusterPort: 9000,
peers: ['node1:9000'], // Just one seed is enough
});
// When node2 joins:
// 1. Connects to seed node (node1)
// 2. Sends HELLO message with its info
// 3. Receives MEMBER_LIST with all known members
// 4. Connects to discovered members
// 5. Broadcasts updated member list
// Result: All nodes learn about all others automatically
Consistency Levels
TopGun supports three consistency levels for replication:
// EVENTUAL - Fire-and-forget replication (default)
// Best for: High throughput, tolerance for brief inconsistency
const eventualServer = new ServerCoordinator({
defaultConsistency: 'EVENTUAL',
replicationConfig: {
batchIntervalMs: 50, // Batch replication every 50ms
batchSize: 100, // Max operations per batch
queueSizeLimit: 10000, // Max queued operations
},
});
// QUORUM - Wait for majority of replicas
// Best for: Balance of consistency and availability
const quorumServer = new ServerCoordinator({
defaultConsistency: 'QUORUM',
replicationConfig: {
ackTimeoutMs: 5000, // Wait up to 5s for acks
},
});
// STRONG - Wait for all replicas
// Best for: Critical data requiring full consistency
const strongServer = new ServerCoordinator({
defaultConsistency: 'STRONG',
replicationConfig: {
ackTimeoutMs: 10000, // Longer timeout for all replicas
},
});
Comparison
| Level | Latency | Consistency | Availability | Use Case |
|---|---|---|---|---|
| EVENTUAL | Lowest | Eventually consistent | Highest | Real-time updates, gaming, social feeds |
| QUORUM | Medium | Majority agree | High | Most applications, shopping carts |
| STRONG | Highest | All replicas agree | Lower | Financial data, inventory counts |
Partitioning
// TopGun uses consistent hashing with 271 partitions
// Each partition has an owner and backup nodes
// Data is automatically distributed:
// - Key is hashed to partition ID (0-270)
// - Partition owner stores the primary copy
// - Backup nodes store replica copies
// Example: With 3 nodes and BACKUP_COUNT=1
// - Node A owns ~90 partitions, backs up ~90 others
// - Node B owns ~90 partitions, backs up ~90 others
// - Node C owns ~90 partitions, backs up ~90 others
// Total: Each piece of data exists on 2 nodes (owner + 1 backup)
Data Flow
- Write: A write arrives at any node
- Routing: If the receiving node is not the partition owner, it forwards the write to the owner (see the routing sketch below)
- Storage: The owner stores the data locally using CRDT merge
- Replication: The owner sends the operation to its backup nodes
- Acknowledgment: The client is acknowledged according to the configured consistency level
Important: Only the partition owner and its backups store the data. Other nodes route requests but don’t keep copies. This is the partitioned data grid model, not full replication.
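To make the routing step concrete, here is a minimal sketch of how a key could map to one of the 271 partitions and its owner and backups. The hash function, partition-map shape, and helper names below are illustrative assumptions, not TopGun's actual internals.
// Illustrative sketch only: hashing and partition-table shapes are assumptions,
// not TopGun's implementation.
const PARTITION_COUNT = 271;

interface PartitionInfo {
  owner: string;      // nodeId of the primary owner
  backups: string[];  // nodeIds holding replica copies
}

// Hypothetical partition map, normally maintained by the cluster.
type PartitionMap = Map<number, PartitionInfo>;

// Simple FNV-1a string hash used here purely for illustration.
function hashKey(key: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// Routing step of the data flow: decide which node stores the primary copy.
function routeWrite(key: string, partitions: PartitionMap): PartitionInfo {
  const partitionId = hashKey(key) % PARTITION_COUNT; // 0..270
  const info = partitions.get(partitionId);
  if (!info) throw new Error(`No owner assigned for partition ${partitionId}`);
  return info; // forward to info.owner, replicate to info.backups
}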
Anti-Entropy Repair
TopGun uses Merkle trees for efficient anti-entropy repair:
// Anti-entropy repair runs automatically
const server = new ServerCoordinator({
replicationEnabled: true,
// RepairScheduler configuration (defaults shown)
repairConfig: {
enabled: true,
scanIntervalMs: 300000, // Full scan every 5 minutes
repairBatchSize: 1000, // Keys per repair batch
maxConcurrentRepairs: 2, // Parallel repair streams
throttleMs: 100, // Delay between batches
prioritizeRecent: true, // Repair recent data first
},
});
// Repair process:
// 1. MerkleTreeManager builds hash tree of partition data
// 2. RepairScheduler compares Merkle roots with backup nodes
// 3. Differences trigger targeted data sync
// 4. Uses efficient bucket-level comparison before key-level
How It Works
Build Merkle Trees
Each node builds a Merkle tree of its partition data, hashing keys into buckets.
Compare Roots
RepairScheduler periodically compares Merkle roots between owner and backup.
Drill Down
If roots differ, compare bucket hashes to find divergent buckets.
Repair Keys
Fetch and sync only the specific keys that differ, minimizing network traffic.
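As a rough illustration of the bucket-level comparison in the steps above, the sketch below hashes keys into a fixed number of buckets and diffs two bucket lists to locate divergent buckets. The bucket count, hash choice, and function names are illustrative assumptions rather than the MerkleTreeManager API.
import { createHash } from 'node:crypto';

// Illustrative only: bucket count and hashing are assumptions, not TopGun internals.
const BUCKET_COUNT = 64;

type BucketHashes = string[]; // one hash per bucket

function bucketOf(key: string): number {
  const digest = createHash('sha1').update(key).digest();
  return digest.readUInt32BE(0) % BUCKET_COUNT;
}

// Build per-bucket hashes from a partition's key/value pairs.
function buildBuckets(data: Map<string, string>): BucketHashes {
  const buckets: string[][] = Array.from({ length: BUCKET_COUNT }, () => []);
  for (const [key, value] of data) {
    buckets[bucketOf(key)].push(`${key}=${value}`);
  }
  return buckets.map((entries) =>
    createHash('sha1').update(entries.sort().join('|')).digest('hex'),
  );
}

// Root hash: if roots match, owner and backup are already in sync.
function rootOf(buckets: BucketHashes): string {
  return createHash('sha1').update(buckets.join('')).digest('hex');
}

// Return the bucket indexes whose hashes differ; only keys in these
// buckets need to be fetched and compared key-by-key.
function divergentBuckets(owner: BucketHashes, backup: BucketHashes): number[] {
  if (rootOf(owner) === rootOf(backup)) return [];
  return owner.flatMap((hash, i) => (hash !== backup[i] ? [i] : []));
}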
Failover Handling
// Automatic failover on node failure
// FailureDetector uses Phi Accrual algorithm
const server = new ServerCoordinator({
// FailureDetector config (defaults shown)
failureDetector: {
heartbeatIntervalMs: 1000,
suspicionTimeoutMs: 5000,
confirmationTimeoutMs: 10000,
phiThreshold: 8,
},
});
// Failover process:
// 1. Heartbeats stop arriving from failed node
// 2. Phi value exceeds threshold -> node marked SUSPECT
// 3. After confirmationTimeout -> node marked FAILED
// 4. PartitionService rebalances:
// - Backup nodes become new owners
// - Remaining nodes become new backups
// 5. Clients receive updated partition map
Phi Accrual Failure Detector
TopGun uses the Phi Accrual failure detector algorithm, which:
- Tracks heartbeat arrival times
- Calculates probability of node failure
- Adapts to network conditions automatically
- Avoids false positives from temporary network issues
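For intuition, here is a minimal sketch of a phi calculation under the simplifying assumption that heartbeat intervals follow an exponential distribution; the real FailureDetector's statistics, windowing, and tuning may differ.
// Illustrative phi estimator; the class name and exponential model are assumptions.
class PhiEstimator {
  private intervals: number[] = [];
  private lastHeartbeatAt = Date.now();

  // Record a heartbeat and remember the interval since the previous one.
  heartbeat(now = Date.now()): void {
    this.intervals.push(now - this.lastHeartbeatAt);
    if (this.intervals.length > 1000) this.intervals.shift(); // sliding window
    this.lastHeartbeatAt = now;
  }

  // phi = -log10(P(next heartbeat arrives later than now)).
  // With an exponential model, P_later(t) = exp(-t / meanInterval),
  // so phi = (t / meanInterval) * log10(e).
  phi(now = Date.now()): number {
    if (this.intervals.length === 0) return 0;
    const mean =
      this.intervals.reduce((sum, x) => sum + x, 0) / this.intervals.length;
    const elapsed = now - this.lastHeartbeatAt;
    return (elapsed / mean) * Math.LOG10E;
  }
}

// Usage: mark a peer SUSPECT once phi crosses the configured threshold.
const detector = new PhiEstimator();
// detector.heartbeat() on every heartbeat from the peer...
const suspect = detector.phi() > 8; // phiThreshold from the config above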
Read Replicas
// Read from replicas for lower latency
const server = new ServerCoordinator({
replicationEnabled: true,
// ReadReplicaHandler is automatically enabled
});
// Read routing options:
// - Primary: Always read from partition owner
// - Replica: Read from nearest replica (lower latency)
// - Any: Read from any node that has the data
// Clients can specify read preference per query
Docker Compose Example
Deploy a 3-node cluster with Docker Compose:
version: '3.8'
services:
  node1:
    image: topgun:latest
    ports:
      - "8080:8080"
      - "9000:9000"
    environment:
      TOPGUN_NODE_ID: node-1
      TOPGUN_PORT: 8080
      TOPGUN_CLUSTER_PORT: 9000
      TOPGUN_PEERS: ""
      TOPGUN_REPLICATION: "true"
      TOPGUN_CONSISTENCY: EVENTUAL
  node2:
    image: topgun:latest
    ports:
      - "8081:8080"
      - "9001:9000"
    environment:
      TOPGUN_NODE_ID: node-2
      TOPGUN_PORT: 8080
      TOPGUN_CLUSTER_PORT: 9000
      TOPGUN_PEERS: node1:9000
      TOPGUN_REPLICATION: "true"
      TOPGUN_CONSISTENCY: EVENTUAL
    depends_on:
      - node1
  node3:
    image: topgun:latest
    ports:
      - "8082:8080"
      - "9002:9000"
    environment:
      TOPGUN_NODE_ID: node-3
      TOPGUN_PORT: 8080
      TOPGUN_CLUSTER_PORT: 9000
      TOPGUN_PEERS: node1:9000
      TOPGUN_REPLICATION: "true"
      TOPGUN_CONSISTENCY: EVENTUAL
    depends_on:
      - node1
Run with:
docker-compose up -d
Monitoring
Prometheus Metrics
// Replication metrics available via Prometheus
// GET /metrics on metricsPort
# Replication queue size per node
topgun_replication_queue_size{node="node-2"} 15
# Pending synchronous acknowledgments
topgun_replication_pending_acks 3
# Replication lag in milliseconds
topgun_replication_lag_ms{node="node-2"} 42
# Health status
topgun_replication_healthy 1
topgun_replication_unhealthy_nodes 0
Health Checks
The ReplicationPipeline.getHealth() method returns:
interface ReplicationHealth {
healthy: boolean; // Overall health status
unhealthyNodes: string[]; // Nodes with issues
laggyNodes: string[]; // Nodes with high lag
avgLagMs: number; // Average replication lag
}
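As a sketch of how getHealth() could feed monitoring, the snippet below polls it and warns when replication degrades. The replicationPipeline accessor is a hypothetical stand-in for illustration; check how your ServerCoordinator instance actually exposes the pipeline.
// Hypothetical accessor, declared here only so the sketch type-checks.
declare const server: { replicationPipeline: { getHealth(): ReplicationHealth } };

setInterval(() => {
  const health = server.replicationPipeline.getHealth();
  if (!health.healthy) {
    // Surface the same fields exposed by ReplicationHealth above.
    console.warn('Replication degraded', {
      unhealthyNodes: health.unhealthyNodes,
      laggyNodes: health.laggyNodes,
      avgLagMs: health.avgLagMs,
    });
  }
}, 30_000); // poll every 30 seconds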
Distributed Subscriptions
TopGun supports distributed live subscriptions for both queries and full-text search. When a client subscribes, the subscription is automatically registered across all cluster nodes, and updates flow efficiently to the client.
Subscription Protocol
The distributed subscription protocol uses four message types:
| Message | Direction | Purpose |
|---|---|---|
| CLUSTER_SUB_REGISTER | Coordinator → All Nodes | Register subscription for evaluation |
| CLUSTER_SUB_ACK | Node → Coordinator | Acknowledge registration + initial results |
| CLUSTER_SUB_UPDATE | Node → Coordinator | Delta update (ENTER/UPDATE/LEAVE) |
| CLUSTER_SUB_UNREGISTER | Coordinator → All Nodes | Remove subscription |
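To make the protocol concrete, the shapes below sketch what each message might carry, based only on the table and flow in this section; the field names and payload types are illustrative assumptions, not the wire format.
// Illustrative message shapes only; actual fields and payloads may differ.
type SubscriptionKind = 'SEARCH' | 'QUERY';
type ChangeType = 'ENTER' | 'UPDATE' | 'LEAVE';

interface ClusterSubRegister {
  type: 'CLUSTER_SUB_REGISTER';
  subscriptionId: string;
  kind: SubscriptionKind;
  coordinatorNodeId: string; // where updates must be sent back
  criteria: unknown;         // query filter or search expression
}

interface ClusterSubAck {
  type: 'CLUSTER_SUB_ACK';
  subscriptionId: string;
  nodeId: string;
  initialResults: unknown[]; // local matches, merged by the coordinator
}

interface ClusterSubUpdate {
  type: 'CLUSTER_SUB_UPDATE';
  subscriptionId: string;
  nodeId: string;
  changeType: ChangeType;
  document: unknown;
}

interface ClusterSubUnregister {
  type: 'CLUSTER_SUB_UNREGISTER';
  subscriptionId: string;
}

type ClusterSubMessage =
  | ClusterSubRegister
  | ClusterSubAck
  | ClusterSubUpdate
  | ClusterSubUnregister;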
Architecture
┌─────────────────────────────────────────────────────────────────────────┐
│ DISTRIBUTED SUBSCRIPTION FLOW │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ 1. SUBSCRIBE │
│ ──────────── │
│ Client ──SEARCH_SUB──▶ Coordinator ──CLUSTER_SUB_REGISTER──▶ All Nodes │
│ CLUSTER_SUB_REGISTER │
│ │
│ 2. INITIAL RESULTS │
│ ───────────────── │
│ Coordinator ◀──CLUSTER_SUB_ACK── All Nodes (includes local results) │
│ │ │
│ └──▶ Merge results (RRF for search, dedupe for query) │
│ └──▶ Send to Client │
│ │
│ 3. LIVE UPDATES │
│ ─────────────── │
│ Document changes on Node B: │
│ │ │
│ ▼ │
│ Node B evaluates local subscriptions │
│ │ │
│ ▼ │
│ Node B ──CLUSTER_SUB_UPDATE──▶ Coordinator only (NOT broadcast) │
│ │ │
│ ▼ │
│ Coordinator forwards to Client │
│ │
│ 4. UNSUBSCRIBE │
│ ──────────── │
│ Client ──UNSUB──▶ Coordinator ──CLUSTER_SUB_UNREGISTER──▶ All Nodes │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Key Design Decisions
- Coordinator Pattern: The node receiving the client subscription becomes the “coordinator” - it aggregates results and forwards updates to the client (see the merge sketch after this list)
- Targeted Updates: Nodes send updates ONLY to coordinators with matching subscriptions, not broadcast to all nodes
- Local Evaluation: Each node evaluates subscriptions against its local data/index
- Unified Protocol: Same message structure for FTS and Query (different evaluation logic)
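The coordinator's merge step for search results (the RRF step in the flow above) can be sketched as follows; the k constant, result shape, and function name are illustrative assumptions rather than TopGun's implementation.
// Reciprocal Rank Fusion sketch for merging per-node search rankings.
interface NodeResult {
  docId: string;
  rank: number; // 1-based rank within that node's local results
}

function mergeWithRrf(perNodeResults: NodeResult[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const results of perNodeResults) {
    for (const { docId, rank } of results) {
      // Each node contributes 1 / (k + rank) to the fused score.
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank));
    }
  }
  // Higher fused score first; duplicates across nodes naturally collapse.
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([docId]) => docId);
}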
Node Disconnect Handling
When a cluster node disconnects:
- Its results are automatically removed from active subscriptions
- Pending ACKs for the disconnected node are resolved
- Local subscriptions where the disconnected node was coordinator are cleaned up
- When the node rejoins, subscriptions can be re-registered
Metrics
Distributed subscription metrics are available via Prometheus:
# Active subscriptions by type (SEARCH/QUERY)
topgun_distributed_sub_active{type="SEARCH"} 15
topgun_distributed_sub_active{type="QUERY"} 8
# Subscription registrations
topgun_distributed_sub_total{type="SEARCH",status="success"} 100
topgun_distributed_sub_total{type="QUERY",status="timeout"} 2
# Update delivery
topgun_distributed_sub_updates{direction="sent",change_type="ENTER"} 5000
topgun_distributed_sub_updates{direction="received",change_type="UPDATE"} 4800
# Latency histogram
topgun_distributed_sub_update_latency_ms{type="SEARCH",quantile="0.99"} 12
See Also: Live Queries and Full-Text Search guides for client-side usage of distributed subscriptions.
Best Practices
Use Odd Node Counts
For QUORUM consistency, use 3, 5, or 7 nodes. This ensures a clear majority and avoids split-brain scenarios.
Match Consistency to Use Case
Use EVENTUAL for real-time features, QUORUM for most data, STRONG only for critical operations.
Monitor Replication Lag
Set up alerts on topgun_replication_lag_ms.
High lag may indicate network issues or overloaded nodes.
Enable Cluster TLS
In production, always use TLS for inter-node communication with clusterTls config.
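A minimal sketch of what a clusterTls block could look like is shown below; the nested field names (cert, key, ca) and file paths are assumptions for illustration, so consult the ServerCoordinator options for the actual TLS configuration shape.
import { readFileSync } from 'node:fs';
import { ServerCoordinator } from '@topgunbuild/server';

// Field names inside clusterTls are illustrative assumptions.
const server = new ServerCoordinator({
  port: 8080,
  nodeId: 'node-1',
  clusterPort: 9000,
  peers: ['node2:9001'],
  jwtSecret: process.env.JWT_SECRET,
  replicationEnabled: true,
  clusterTls: {
    cert: readFileSync('/etc/topgun/tls/node.crt'),
    key: readFileSync('/etc/topgun/tls/node.key'),
    ca: readFileSync('/etc/topgun/tls/cluster-ca.crt'),
  },
});
await server.ready();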