Partially Implemented
Some metric names and config options on this page reference the old Node.js server. The Rust server uses different metric names and tracing configuration.
Observability
Monitor the health and performance of your TopGun cluster using Prometheus metrics and structured logging.
Prometheus Metrics
The TopGun server exposes a /metrics endpoint that is compatible with Prometheus. By default, it is served on the main server port (or on a dedicated metrics port, if one is configured).
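For a quick check without a full Prometheus setup, the endpoint's text output can be read directly. The following is a minimal sketch that parses sample lines of the Prometheus text exposition format into a dictionary. The URL in `fetch_metrics` is an assumption (port 8080 is taken from the log example further down this page), so adjust it to your deployment. The function names are hypothetical helpers, not part of TopGun.

```python
import re
import urllib.request

# One sample line: metric name, optional {labels}, then the value (timestamps are ignored).
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return {(name, frozenset(label items)): value} for every sample line."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and HELP/TYPE comments
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples[(name, frozenset(labels.items()))] = float(value)
    return samples


def fetch_metrics(url="http://localhost:8080/metrics"):
    # Assumed URL: host, port, and path depend on how the server is configured.
    with urllib.request.urlopen(url, timeout=5) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

A labeled series such as `topgun_ops_total{type="PUT",map="users"}` is keyed by its name plus the set of its labels, so each label combination stays a separate entry.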
Key Metrics
| Metric Name | Type | Description |
|---|---|---|
| topgun_connected_clients | Gauge | Number of currently connected WebSocket clients. |
| topgun_map_size_items | Gauge | Number of items in a specific map (labeled by map name). |
| topgun_ops_total | Counter | Total number of operations processed. Labels: type (PUT, GET, DELETE, SUBSCRIBE), map. |
| topgun_memory_usage_bytes | Gauge | Current heap memory usage of the server process. |
| topgun_cluster_members | Gauge | Number of active nodes in the cluster. |
On the Node.js server, standard process metrics (CPU, event loop, garbage collection) are also exported with the topgun_ prefix.
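Counters such as topgun_ops_total only ever grow, so throughput is read as a rate rather than as a raw value. In Prometheus you would normally use PromQL's `rate()`; the sketch below shows the same idea by hand, in simplified form (no extrapolation). It assumes you have collected `(timestamp_seconds, value)` samples for one series and treats any drop in value as a counter reset, for example after a server restart.

```python
def counter_rate(samples):
    """Per-second rate of a counter from (timestamp_seconds, value) samples.

    A drop in value is treated as a counter reset, so the post-reset value
    counts as the increase for that interval (similar in spirit to PromQL rate()).
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0
```

For example, the samples `[(0, 100), (10, 160), (20, 30), (30, 90)]` contain one reset and give `(60 + 30 + 60) / 30 = 5.0` operations per second.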
Event Routing Metrics
| Metric Name | Type | Description |
|---|---|---|
| topgun_events_routed_total | Counter | Total events processed for routing to subscribers. |
| topgun_events_filtered_by_subscription | Counter | Events not sent because no clients were subscribed. |
| topgun_subscribers_per_event | Summary | Distribution of subscriber count per event (quantiles). |
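Two derived values are often more useful than the raw routing counters: the share of events that no client was subscribed to, and the mean fan-out per event. The sketch below computes both from a flat `{metric_name: value}` mapping. It assumes that topgun_events_filtered_by_subscription counts a subset of topgun_events_routed_total, and that the summary exposes the standard Prometheus `_sum` and `_count` series; neither is confirmed on this page.

```python
def routing_stats(metrics):
    """Derive ratios from a {metric_name: value} mapping of event routing metrics.

    Assumptions: filtered events are a subset of routed events, and the
    summary exposes the standard <name>_sum and <name>_count series.
    """
    routed = metrics.get("topgun_events_routed_total", 0.0)
    filtered = metrics.get("topgun_events_filtered_by_subscription", 0.0)
    sub_sum = metrics.get("topgun_subscribers_per_event_sum", 0.0)
    sub_count = metrics.get("topgun_subscribers_per_event_count", 0.0)
    return {
        # Fraction of routed events that reached no subscriber.
        "filtered_ratio": filtered / routed if routed else 0.0,
        # Mean number of subscribers that received each event.
        "avg_subscribers_per_event": sub_sum / sub_count if sub_count else 0.0,
    }
```

A high filtered ratio is not necessarily a problem: it means the router skipped work because nobody was listening.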
Event Queue Metrics
| Metric Name | Type | Description |
|---|---|---|
| topgun_event_queue_size | Gauge | Current number of events in the bounded queue. |
| topgun_event_queue_enqueued_total | Counter | Total events successfully enqueued. |
| topgun_event_queue_dequeued_total | Counter | Total events dequeued for processing. |
| topgun_event_queue_rejected_total | Counter | Events rejected because the queue was at capacity. Alert if increasing. |
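To see how the four queue metrics relate, here is a hypothetical bounded queue that keeps the same counters. It is an illustrative sketch, not TopGun's implementation; `capacity` stands in for the server's eventQueueCapacity setting. While no counter has been reset, the current size equals `enqueued_total - dequeued_total`, and rejections only start once the queue is full.

```python
from collections import deque


class BoundedEventQueue:
    """Illustrative bounded queue mirroring the topgun_event_queue_* metrics.

    Hypothetical sketch; `capacity` plays the role of eventQueueCapacity.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = deque()
        self.enqueued_total = 0  # topgun_event_queue_enqueued_total
        self.dequeued_total = 0  # topgun_event_queue_dequeued_total
        self.rejected_total = 0  # topgun_event_queue_rejected_total

    @property
    def size(self):  # topgun_event_queue_size
        return len(self._items)

    def enqueue(self, event):
        if len(self._items) >= self.capacity:
            self.rejected_total += 1  # full: drop the event instead of blocking
            return False
        self._items.append(event)
        self.enqueued_total += 1
        return True

    def dequeue(self):
        if not self._items:
            return None
        self.dequeued_total += 1
        return self._items.popleft()
```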
Backpressure Metrics
| Metric Name | Type | Description |
|---|---|---|
| topgun_backpressure_sync_forced_total | Counter | Number of times synchronous processing was forced. |
| topgun_backpressure_pending_ops | Gauge | Current number of pending async operations. |
| topgun_backpressure_waits_total | Counter | Times processing had to wait for capacity. |
| topgun_backpressure_timeouts_total | Counter | Operations that timed out while waiting for capacity. Alert if increasing. |
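The pending, wait, and timeout metrics describe a limit on concurrent asynchronous work. The sketch below models that with a semaphore: an operation that finds no free slot counts as a wait, and one that still has no slot after `timeout_s` counts as a timeout. `max_pending` and `timeout_s` are hypothetical parameters, and the "forced synchronous processing" counter is not modeled because this page does not describe when the server triggers it.

```python
import threading


class BackpressureLimiter:
    """Caps concurrent async operations and records backpressure metrics.

    Hypothetical sketch; `max_pending` and `timeout_s` are illustrative.
    """

    def __init__(self, max_pending, timeout_s):
        self._slots = threading.Semaphore(max_pending)
        self._lock = threading.Lock()
        self.timeout_s = timeout_s
        self.pending_ops = 0     # topgun_backpressure_pending_ops
        self.waits_total = 0     # topgun_backpressure_waits_total
        self.timeouts_total = 0  # topgun_backpressure_timeouts_total

    def acquire(self):
        """Reserve a slot; return False if none frees up within timeout_s."""
        if not self._slots.acquire(blocking=False):
            with self._lock:
                self.waits_total += 1  # had to wait for capacity
            if not self._slots.acquire(timeout=self.timeout_s):
                with self._lock:
                    self.timeouts_total += 1  # gave up waiting
                return False
        with self._lock:
            self.pending_ops += 1
        return True

    def release(self):
        with self._lock:
            self.pending_ops -= 1
        self._slots.release()
```

In this model, a rising waits count with flat timeouts means the limit is being hit but absorbed. Rising timeouts mean work is being refused.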
Connection Rate Limiting Metrics
| Metric Name | Type | Description |
|---|---|---|
| topgun_connections_accepted_total | Counter | Total connections accepted by the rate limiter. |
| topgun_connections_rejected_total | Counter | Connections rejected due to rate limiting. |
| topgun_connections_pending | Gauge | Current pending connection handshakes. |
| topgun_connection_rate_per_second | Gauge | Current connection rate (connections/second). |
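This page does not say which algorithm the connection rate limiter uses. A token bucket is a common choice, and the sketch below uses it purely to illustrate how accepted and rejected counts arise. `rate_per_second` and `burst` are hypothetical parameters, and the clock can be injected so that the behavior is deterministic in tests.

```python
import time


class ConnectionRateLimiter:
    """Token-bucket limiter for new connections (illustrative technique only).

    `rate_per_second` and `burst` are hypothetical parameters.
    """

    def __init__(self, rate_per_second, burst, clock=time.monotonic):
        self.rate = rate_per_second
        self.capacity = burst
        self.tokens = float(burst)
        self.clock = clock
        self.last = clock()
        self.accepted_total = 0  # topgun_connections_accepted_total
        self.rejected_total = 0  # topgun_connections_rejected_total

    def try_accept(self):
        now = self.clock()
        # Refill tokens for the elapsed time, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            self.accepted_total += 1
            return True
        self.rejected_total += 1
        return False
```

With `rate_per_second=2` and `burst=2`, three simultaneous handshakes give two accepts and one reject. Half a second later, one more token is available.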
Alert Recommendations
Key Metrics to Alert On
- topgun_event_queue_rejected_total: events are being dropped because the queue is at capacity.
- topgun_backpressure_timeouts_total: operations are timing out while waiting for capacity.
- topgun_connections_rejected_total: clients are being rejected (rate limiting or a possible DDoS).
- topgun_event_queue_size: watch for queue depth approaching eventQueueCapacity.
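The alert conditions above can be expressed as a small check over two metric snapshots taken some time apart. This is a sketch, not a shipped alerting rule. The snapshots are plain `{metric_name: value}` mappings, and `depth_threshold=0.8` is an illustrative fraction of eventQueueCapacity, not a TopGun default.

```python
def check_alerts(previous, current, event_queue_capacity, depth_threshold=0.8):
    """Return names of alert conditions that fire between two metric snapshots.

    `depth_threshold` is an illustrative fraction, not a TopGun default.
    """
    alerts = []
    # Counters that should stay flat: any increase between snapshots fires.
    # A drop (counter reset) is not treated as an increase.
    for name in (
        "topgun_event_queue_rejected_total",
        "topgun_backpressure_timeouts_total",
        "topgun_connections_rejected_total",
    ):
        if current.get(name, 0.0) > previous.get(name, 0.0):
            alerts.append(name)
    # Gauge: queue depth close to the configured capacity.
    if current.get("topgun_event_queue_size", 0.0) >= depth_threshold * event_queue_capacity:
        alerts.append("topgun_event_queue_size")
    return alerts
```

In Prometheus itself, the same intent is usually written as `increase(...) > 0` over a window for the counters and a threshold on the gauge.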
Structured Logging
The Node.js server uses Pino for high-performance, structured JSON logging.
This makes it easy to ingest logs into systems like ELK, Datadog, or Loki.
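Pino writes numeric levels and epoch-millisecond timestamps, which some pipelines prefer to convert before indexing. The sketch below normalizes one log line using Pino's default level numbers (10 trace, 20 debug, 30 info, 40 warn, 50 error, 60 fatal). It assumes the default `time` format. `normalize` is a hypothetical helper, and most log shippers can perform the same mapping with their own configuration.

```python
import json
from datetime import datetime, timezone

# Pino's default numeric log levels.
PINO_LEVELS = {10: "trace", 20: "debug", 30: "info", 40: "warn", 50: "error", 60: "fatal"}


def normalize(line):
    """Turn one Pino JSON log line into a dict with a named level and ISO time."""
    record = json.loads(line)
    level = record.get("level")
    record["level"] = PINO_LEVELS.get(level, str(level))
    # Pino's default timestamp is milliseconds since the Unix epoch.
    record["time"] = datetime.fromtimestamp(record["time"] / 1000, tz=timezone.utc).isoformat()
    return record
```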
Log Example
server.log
{
"level": 30,
"time": 1678901234567,
"pid": 12345,
"hostname": "topgun-server-0",
"msg": "Server started on port 8080",
"nodeId": "node-xyz-123"
}
Logs include context such as nodeId, clientId, and requestId where applicable, allowing you to trace requests across the distributed system.
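Once the logs from all nodes are collected in one place, a request can be followed by grouping lines on requestId and ordering them by time. Here is a minimal sketch of that grouping over raw JSON lines. It assumes that the requestId value is the same on every node that handles the request and that the `time` values are comparable across nodes (clock skew between hosts can reorder close events). `group_by_request` is a hypothetical helper.

```python
import json
from collections import defaultdict


def group_by_request(lines):
    """Group JSON log lines by requestId, each group ordered by timestamp.

    Lines without a requestId are skipped; nodeId shows which node logged each step.
    """
    traces = defaultdict(list)
    for line in lines:
        record = json.loads(line)
        request_id = record.get("requestId")
        if request_id is not None:
            traces[request_id].append(record)
    for records in traces.values():
        records.sort(key=lambda r: r["time"])
    return dict(traces)
```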