Observability
Monitor the health and performance of your TopGun cluster using Prometheus metrics and structured logging.
Prometheus Metrics
The TopGun server exposes a /metrics endpoint that is compatible with Prometheus. By default, it is served on the main server port (or on a dedicated metrics port, if one is configured).
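To check the endpoint from a script or a custom health probe, you can read the plain-text exposition format that Prometheus scrapes. The following is a minimal TypeScript sketch. parsePrometheusText and readMetric are hypothetical helpers, not part of TopGun, and the parser deliberately skips parts of the format (escaped label values, special values such as +Inf).

```typescript
// A minimal parser for the Prometheus text exposition format. Sketch only:
// it ignores escaped quotes inside label values, optional timestamps, and
// special values such as +Inf.
interface Sample {
  name: string;
  labels: Record<string, string>;
  value: number;
}

function parsePrometheusText(body: string): Sample[] {
  const samples: Sample[] = [];
  for (const raw of body.split("\n")) {
    const line = raw.trim();
    // Skip blank lines and the "# HELP" / "# TYPE" comment lines.
    if (line === "" || line.startsWith("#")) continue;
    // Matches `name value` or `name{label="v",...} value`.
    const match = line.match(/^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{([^}]*)\})?\s+(\S+)/);
    if (!match) continue;
    const [, metricName, labelText, valueText] = match;
    const labels: Record<string, string> = {};
    if (labelText) {
      const labelRe = /([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"/g;
      let pair: RegExpExecArray | null;
      while ((pair = labelRe.exec(labelText)) !== null) {
        labels[pair[1]] = pair[2];
      }
    }
    samples.push({ name: metricName, labels, value: Number(valueText) });
  }
  return samples;
}

// Value of the first sample with this name whose labels include all of `labels`.
function readMetric(
  samples: Sample[],
  metricName: string,
  labels: Record<string, string> = {},
): number | undefined {
  for (const s of samples) {
    if (s.name !== metricName) continue;
    if (Object.keys(labels).every((k) => s.labels[k] === labels[k])) return s.value;
  }
  return undefined;
}
```

In Node.js 18 or later you could obtain the body with `await (await fetch(url)).text()`, where url is your server's metrics address.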
Key Metrics
| Metric Name | Type | Description |
|---|---|---|
| topgun_connected_clients | Gauge | Number of currently connected WebSocket clients. |
| topgun_map_size_items | Gauge | Number of items in a specific map (labeled by map name). |
| topgun_ops_total | Counter | Total number of operations processed. Labels: type (PUT, GET, DELETE, SUBSCRIBE), map. |
| topgun_memory_usage_bytes | Gauge | Current heap memory usage of the server process. |
| topgun_cluster_members | Gauge | Number of active nodes in the cluster. |
Standard Node.js process metrics (CPU, event loop, garbage collection) are also exported, using the same topgun_ prefix.
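Counters such as topgun_ops_total only ever increase, so dashboards usually plot their rate. In Prometheus that is a query like `sum by (type) (rate(topgun_ops_total[5m]))`. If you read the endpoint yourself, a per-second rate between two scrapes can be sketched as below. counterRate is a hypothetical helper; unlike PromQL's rate() it does not extrapolate, and it treats a drop in value as a process restart.

```typescript
// One scrape of a counter value.
interface CounterSample {
  value: number;       // counter value at scrape time
  timestampMs: number; // scrape time in milliseconds
}

// Per-second increase of a counter between two scrapes.
function counterRate(prev: CounterSample, curr: CounterSample): number {
  const elapsedSeconds = (curr.timestampMs - prev.timestampMs) / 1000;
  if (elapsedSeconds <= 0) {
    throw new Error("samples must be in chronological order");
  }
  // If the counter went down, assume the process restarted and the counter
  // began again from zero, so everything counted since then is curr.value.
  const delta = curr.value >= prev.value ? curr.value - prev.value : curr.value;
  return delta / elapsedSeconds;
}
```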
Event Routing Metrics
| Metric Name | Type | Description |
|---|---|---|
| topgun_events_routed_total | Counter | Total events processed for routing to subscribers. |
| topgun_events_filtered_by_subscription | Counter | Events not sent because no clients were subscribed. |
| topgun_subscribers_per_event | Summary | Distribution of subscriber count per event (quantiles). |
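Because topgun_subscribers_per_event is a Summary, Prometheus client libraries normally expose it as quantile series plus a _sum and a _count series; the series names in the comments below assume that standard convention. From two scrapes you can derive the mean fan-out per event and, assuming every filtered event is also counted as routed, the share of events that found no subscriber. Both helpers are hypothetical.

```typescript
// The two series a Prometheus Summary exposes besides its quantiles.
interface SummarySnapshot {
  sum: number;   // e.g. topgun_subscribers_per_event_sum
  count: number; // e.g. topgun_subscribers_per_event_count
}

// Mean subscribers per event over the interval between two scrapes.
// Returns undefined when no events were observed (or a counter reset occurred).
function meanSubscribersPerEvent(prev: SummarySnapshot, curr: SummarySnapshot): number | undefined {
  const events = curr.count - prev.count;
  if (events <= 0) return undefined;
  return (curr.sum - prev.sum) / events;
}

// Share of routed events that reached no subscriber, assuming each event in
// topgun_events_filtered_by_subscription is also in topgun_events_routed_total.
function filteredFraction(routedDelta: number, filteredDelta: number): number {
  return routedDelta > 0 ? filteredDelta / routedDelta : 0;
}
```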
Event Queue Metrics
| Metric Name | Type | Description |
|---|---|---|
| topgun_event_queue_size | Gauge | Current number of events in the bounded queue. |
| topgun_event_queue_enqueued_total | Counter | Total events successfully enqueued. |
| topgun_event_queue_dequeued_total | Counter | Total events dequeued for processing. |
| topgun_event_queue_rejected_total | Counter | Events rejected because the queue was at capacity. Alert if increasing. |
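The enqueue and dequeue counters together show whether the queue is filling faster than it drains, before rejections start. The sketch below estimates the time until the queue reaches capacity from two scrapes. secondsUntilQueueFull is a hypothetical helper, the eventQueueCapacity parameter stands for whatever value you configured for that setting, and the estimate assumes the current fill and drain rates stay constant, which real traffic rarely does.

```typescript
// One scrape of the event queue metrics.
interface QueueScrape {
  timestampMs: number;
  enqueuedTotal: number; // topgun_event_queue_enqueued_total
  dequeuedTotal: number; // topgun_event_queue_dequeued_total
  size: number;          // topgun_event_queue_size
}

function secondsUntilQueueFull(prev: QueueScrape, curr: QueueScrape, eventQueueCapacity: number): number {
  const elapsed = (curr.timestampMs - prev.timestampMs) / 1000;
  if (elapsed <= 0) throw new Error("scrapes must be in chronological order");
  const fillRate = (curr.enqueuedTotal - prev.enqueuedTotal) / elapsed;  // events/s in
  const drainRate = (curr.dequeuedTotal - prev.dequeuedTotal) / elapsed; // events/s out
  const netGrowth = fillRate - drainRate;
  // A queue that drains at least as fast as it fills never reaches capacity.
  if (netGrowth <= 0) return Infinity;
  return Math.max(0, eventQueueCapacity - curr.size) / netGrowth;
}
```

For example, if the queue grows by 20 events per second with 6,000 free slots left, the estimate is 300 seconds.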
Backpressure Metrics
| Metric Name | Type | Description |
|---|---|---|
| topgun_backpressure_sync_forced_total | Counter | Number of times synchronous processing was forced. |
| topgun_backpressure_pending_ops | Gauge | Current number of pending async operations. |
| topgun_backpressure_waits_total | Counter | Number of times processing had to wait for capacity. |
| topgun_backpressure_timeouts_total | Counter | Number of operations that timed out waiting for capacity. Alert if increasing. |
Connection Rate Limiting Metrics
| Metric Name | Type | Description |
|---|---|---|
| topgun_connections_accepted_total | Counter | Total connections accepted by the rate limiter. |
| topgun_connections_rejected_total | Counter | Connections rejected due to rate limiting. |
| topgun_connections_pending | Gauge | Current number of pending connection handshakes. |
| topgun_connection_rate_per_second | Gauge | Current connection rate (connections/second). |
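A rising topgun_connections_rejected_total on its own does not say how much traffic is affected; relating it to accepted connections over the same interval does. This hypothetical helper assumes every connection attempt is counted as either accepted or rejected by the rate limiter.

```typescript
// Share of connection attempts the rate limiter turned away between two scrapes.
function connectionRejectionRatio(
  acceptedDelta: number, // increase in topgun_connections_accepted_total
  rejectedDelta: number, // increase in topgun_connections_rejected_total
): number {
  const attempts = acceptedDelta + rejectedDelta;
  // No attempts in the window means nothing was rejected.
  return attempts > 0 ? rejectedDelta / attempts : 0;
}
```

A small, steady ratio usually means a few clients hit the limit; a sudden jump together with a high topgun_connection_rate_per_second is more consistent with a connection burst or an attack.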
Alert Recommendations
Key Metrics to Alert On
- topgun_event_queue_rejected_total: events are being dropped because the queue is at capacity.
- topgun_backpressure_timeouts_total: operations are timing out while waiting for capacity.
- topgun_connections_rejected_total: clients are being rejected (rate limiting, or a possible DDoS).
- topgun_event_queue_size: monitor the queue depth as it approaches eventQueueCapacity.
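In Prometheus these recommendations map onto alerting rules such as `increase(topgun_event_queue_rejected_total[5m]) > 0`. If you run your own health probe instead, the same checks can be sketched against two consecutive scrapes. evaluateAlerts, the Snapshot shape, and the 0.8 fill threshold are illustrative choices, not TopGun settings; eventQueueCapacity stands for your configured value.

```typescript
// The values needed for the recommended alerts, taken from one scrape.
interface Snapshot {
  eventQueueRejectedTotal: number;   // topgun_event_queue_rejected_total
  backpressureTimeoutsTotal: number; // topgun_backpressure_timeouts_total
  connectionsRejectedTotal: number;  // topgun_connections_rejected_total
  eventQueueSize: number;            // topgun_event_queue_size
}

// Returns the names of the metrics whose alert condition holds.
function evaluateAlerts(prev: Snapshot, curr: Snapshot, eventQueueCapacity: number, fillThreshold = 0.8): string[] {
  const fired: string[] = [];
  // Counters: any increase between scrapes matters, not the absolute value.
  // (A counter reset after a restart makes curr < prev and is not flagged.)
  if (curr.eventQueueRejectedTotal > prev.eventQueueRejectedTotal) fired.push("topgun_event_queue_rejected_total");
  if (curr.backpressureTimeoutsTotal > prev.backpressureTimeoutsTotal) fired.push("topgun_backpressure_timeouts_total");
  if (curr.connectionsRejectedTotal > prev.connectionsRejectedTotal) fired.push("topgun_connections_rejected_total");
  // Gauge: compare the current queue depth with the configured capacity.
  if (curr.eventQueueSize >= fillThreshold * eventQueueCapacity) fired.push("topgun_event_queue_size");
  return fired;
}
```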
Structured Logging
TopGun uses Pino for high-performance, structured JSON logging.
This makes it easy to ingest logs into systems like ELK, Datadog, or Loki.
Log Example
server.log
```json
{
  "level": 30,
  "time": 1678901234567,
  "pid": 12345,
  "hostname": "topgun-server-0",
  "msg": "Server started on port 8080",
  "nodeId": "node-xyz-123"
}
```

Logs include context fields such as nodeId, clientId, and requestId where applicable, so you can trace a request across the nodes of the distributed system.
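Pino writes one JSON object per line, with numeric levels (10 trace, 20 debug, 30 info, 40 warn, 50 error, 60 fatal). The sketch below merges lines collected from several nodes, keeps the entries for one requestId, and orders them by time. traceRequest is a hypothetical helper, and it assumes requestId appears as a top-level field on every entry that belongs to the request.

```typescript
// Pino's default numeric levels.
const PINO_LEVELS: Record<number, string> = {
  10: "trace", 20: "debug", 30: "info", 40: "warn", 50: "error", 60: "fatal",
};

interface LogEntry {
  level: number;
  time: number; // milliseconds since the Unix epoch
  msg: string;
  nodeId?: string;
  clientId?: string;
  requestId?: string;
  [key: string]: unknown;
}

// Returns one readable line per log entry of the given request, oldest first.
function traceRequest(lines: string[], requestId: string): string[] {
  const entries: LogEntry[] = [];
  for (const line of lines) {
    if (line.trim() === "") continue;
    try {
      entries.push(JSON.parse(line) as LogEntry);
    } catch {
      // Skip lines that are not JSON (e.g. output from other processes).
    }
  }
  return entries
    .filter((e) => e.requestId === requestId)
    .sort((a, b) => a.time - b.time)
    .map((e) => `${new Date(e.time).toISOString()} ${PINO_LEVELS[e.level] ?? e.level} [${e.nodeId ?? "?"}] ${e.msg}`);
}
```

In production the same filter is usually a query in your log system (ELK, Datadog, Loki) rather than application code.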