Partially Implemented

Some metric names and config options on this page reference the old Node.js server. The Rust server uses different metric names and tracing configuration.


Observability

Monitor the health and performance of your TopGun cluster using Prometheus metrics and structured logging.

Prometheus Metrics

The TopGun server exposes a Prometheus-compatible /metrics endpoint.

By default, the endpoint is served on the main server port. If a dedicated metrics port is configured, it is served there instead.
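To see what a scrape returns, it can help to parse the endpoint's plain-text output yourself. The sketch below assumes the standard Prometheus text exposition layout (metric_name{label="value",...} value, plus # HELP / # TYPE comment lines). It is deliberately minimal: it does not handle escaped quotes, commas, or braces inside label values, and it is not TopGun code. The function and type names are illustrative.

```typescript
// One parsed sample from a Prometheus text exposition body.
interface Sample {
  name: string;
  labels: Record<string, string>;
  value: number;
}

// Prometheus writes infinities as +Inf / -Inf; everything else is a float literal.
function parseValue(text: string): number {
  if (text === "+Inf") return Infinity;
  if (text === "-Inf") return -Infinity;
  return Number(text); // "NaN" parses to NaN
}

// Minimal parser: name, optional {labels}, value, optional trailing timestamp.
// Assumes label values contain no escaped quotes, commas, or closing braces.
function parseMetrics(body: string): Sample[] {
  const lineRe = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{([^}]*)\})?\s+(\S+)(?:\s+\S+)?$/;
  const samples: Sample[] = [];
  for (const raw of body.split("\n")) {
    const line = raw.trim();
    // Skip blank lines and # HELP / # TYPE comments.
    if (line === "" || line.startsWith("#")) continue;
    const m = lineRe.exec(line);
    if (!m) continue;
    const [, name = "", rawLabels, rawValue = "NaN"] = m;
    const labels: Record<string, string> = {};
    if (rawLabels) {
      for (const pair of rawLabels.split(",")) {
        const eq = pair.indexOf("=");
        if (eq < 0) continue; // tolerates a trailing comma
        const key = pair.slice(0, eq).trim();
        // Strip the surrounding double quotes from the label value.
        labels[key] = pair.slice(eq + 1).trim().replace(/^"|"$/g, "");
      }
    }
    samples.push({ name, labels, value: parseValue(rawValue) });
  }
  return samples;
}
```

Against a running server you would pass it the text of a fetch() response from something like http://localhost:8080/metrics; that host and port are assumptions taken from the log example on this page, so substitute your own.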

Key Metrics

| Metric Name | Type | Description |
| --- | --- | --- |
| topgun_connected_clients | Gauge | Number of currently connected WebSocket clients. |
| topgun_map_size_items | Gauge | Number of items in a specific map (labeled by map name). |
| topgun_ops_total | Counter | Total number of operations processed. Labels: type (PUT, GET, DELETE, SUBSCRIBE), map. |
| topgun_memory_usage_bytes | Gauge | Current heap memory usage of the server process. |
| topgun_cluster_members | Gauge | Number of active nodes in the cluster. |

Standard Node.js process metrics (CPU, event loop, GC) are also exported, with the topgun_ prefix.
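Because topgun_ops_total carries both a type and a map label, a per-operation-type total means summing across maps. In PromQL you would write sum by (type) (rate(topgun_ops_total[5m])). The sketch below does the same grouping on already-scraped samples so the label semantics are explicit; the Sample shape and the function name are illustrative, not part of TopGun.

```typescript
interface Sample {
  name: string;
  labels: Record<string, string>;
  value: number;
}

// Sum every sample of `metric`, grouped by the value of `label`.
// All other labels (e.g. map) are collapsed into the group total.
function sumByLabel(samples: Sample[], metric: string, label: string): Map<string, number> {
  const totals = new Map<string, number>();
  for (const s of samples) {
    if (s.name !== metric) continue;
    const key = s.labels[label] ?? ""; // samples missing the label share one bucket
    totals.set(key, (totals.get(key) ?? 0) + s.value);
  }
  return totals;
}
```

Note that this sums raw counter values, which gives lifetime totals since the last restart; for a rate, compare two scrapes or let Prometheus compute rate().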

Event Routing Metrics

| Metric Name | Type | Description |
| --- | --- | --- |
| topgun_events_routed_total | Counter | Total events processed for routing to subscribers. |
| topgun_events_filtered_by_subscription | Counter | Events not sent because no clients were subscribed. |
| topgun_subscribers_per_event | Summary | Distribution of subscriber count per event (quantiles). |
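Two derived numbers are often more useful than the raw counters: the share of routed events that had no subscribers, and the mean fan-out per event. The sketch below computes both from two scrapes. It assumes that filtered events are a subset of routed events, and that the summary is exposed in the usual Prometheus form with _sum and _count series next to the quantiles; neither is confirmed by this page. It also ignores counter resets between the two snapshots.

```typescript
// Two snapshots of the event-routing series, taken some interval apart.
interface RoutingSnapshot {
  routedTotal: number;      // topgun_events_routed_total
  filteredTotal: number;    // topgun_events_filtered_by_subscription
  subscribersSum: number;   // topgun_subscribers_per_event_sum (assumed)
  subscribersCount: number; // topgun_subscribers_per_event_count (assumed)
}

interface RoutingStats {
  filteredRatio: number | null;   // null when no events were routed
  meanSubscribers: number | null; // null when nothing was observed
}

function routingStats(prev: RoutingSnapshot, curr: RoutingSnapshot): RoutingStats {
  const routed = curr.routedTotal - prev.routedTotal;
  const filtered = curr.filteredTotal - prev.filteredTotal;
  const observations = curr.subscribersCount - prev.subscribersCount;
  return {
    // Share of routed events that reached nobody (assumes filtered ⊆ routed).
    filteredRatio: routed > 0 ? filtered / routed : null,
    // Mean of a summary over an interval: Δsum / Δcount.
    meanSubscribers:
      observations > 0 ? (curr.subscribersSum - prev.subscribersSum) / observations : null,
  };
}
```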

Event Queue Metrics

| Metric Name | Type | Description |
| --- | --- | --- |
| topgun_event_queue_size | Gauge | Current number of events in the bounded queue. |
| topgun_event_queue_enqueued_total | Counter | Total events successfully enqueued. |
| topgun_event_queue_dequeued_total | Counter | Total events dequeued for processing. |
| topgun_event_queue_rejected_total | Counter | Events rejected because the queue was at capacity. Alert if increasing. |
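The four queue metrics fit together as a simple invariant: size equals enqueued minus dequeued, and every event that did not fit is counted as rejected instead of enqueued. The hypothetical bounded queue below shows where each counter would be updated. It is an illustration, not TopGun's implementation; its capacity parameter presumably plays the role of the eventQueueCapacity option mentioned under Alert Recommendations.

```typescript
// Hypothetical bounded queue that maintains the event-queue metrics.
class BoundedEventQueue<T> {
  enqueuedTotal = 0; // topgun_event_queue_enqueued_total
  dequeuedTotal = 0; // topgun_event_queue_dequeued_total
  rejectedTotal = 0; // topgun_event_queue_rejected_total

  private readonly items: T[] = [];
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // topgun_event_queue_size
  get size(): number {
    return this.items.length;
  }

  // Returns false and counts a rejection when the queue is full.
  enqueue(item: T): boolean {
    if (this.items.length >= this.capacity) {
      this.rejectedTotal++;
      return false;
    }
    this.items.push(item);
    this.enqueuedTotal++;
    return true;
  }

  // Returns undefined when empty; only real removals are counted.
  dequeue(): T | undefined {
    if (this.items.length === 0) return undefined;
    this.dequeuedTotal++;
    return this.items.shift();
  }
}
```

Rejecting at the door keeps memory bounded at the cost of dropped events, which is why a rising rejected counter is worth an alert.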

Backpressure Metrics

| Metric Name | Type | Description |
| --- | --- | --- |
| topgun_backpressure_sync_forced_total | Counter | Number of times synchronous processing was forced. |
| topgun_backpressure_pending_ops | Gauge | Current number of pending async operations. |
| topgun_backpressure_waits_total | Counter | Number of times processing had to wait for capacity. |
| topgun_backpressure_timeouts_total | Counter | Number of operations that timed out waiting for capacity. Alert if increasing. |
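One plausible reading of these metrics is a regulator that caps in-flight async work, makes callers wait (with a timeout) when the cap is reached, and periodically forces an operation onto the synchronous path. The class below is a hypothetical sketch of that reading. The syncFrequency and maxPending parameters are illustrative and are not documented TopGun options, and "every Nth operation runs synchronously" is an assumption about what "forced" means.

```typescript
// Hypothetical backpressure regulator producing the four metrics above.
class BackpressureRegulator {
  syncForcedTotal = 0; // topgun_backpressure_sync_forced_total
  pendingOps = 0;      // topgun_backpressure_pending_ops
  waitsTotal = 0;      // topgun_backpressure_waits_total
  timeoutsTotal = 0;   // topgun_backpressure_timeouts_total

  private opCount = 0;
  private waiters: Array<() => void> = [];
  private readonly syncFrequency: number;
  private readonly maxPending: number;

  constructor(syncFrequency: number, maxPending: number) {
    this.syncFrequency = syncFrequency;
    this.maxPending = maxPending;
  }

  // Assumption: every Nth operation is processed synchronously so async
  // work cannot pile up indefinitely.
  shouldForceSync(): boolean {
    this.opCount++;
    if (this.opCount % this.syncFrequency === 0) {
      this.syncForcedTotal++;
      return true;
    }
    return false;
  }

  // Reserve a pending-op slot, waiting up to timeoutMs if none is free.
  async acquire(timeoutMs: number): Promise<void> {
    if (this.pendingOps < this.maxPending) {
      this.pendingOps++;
      return;
    }
    this.waitsTotal++;
    await new Promise<void>((resolve, reject) => {
      const wake = (): void => {
        clearTimeout(timer);
        resolve();
      };
      const timer = setTimeout(() => {
        this.waiters = this.waiters.filter((w) => w !== wake);
        this.timeoutsTotal++;
        reject(new Error("backpressure timeout"));
      }, timeoutMs);
      this.waiters.push(wake);
    });
    // No increment here: release() handed its slot directly to this caller.
  }

  // Free a slot, or hand it straight to the oldest waiter.
  release(): void {
    const next = this.waiters.shift();
    if (next) {
      next(); // slot transferred, so pendingOps is unchanged
    } else {
      this.pendingOps--;
    }
  }
}
```

Handing the freed slot directly to a waiter, rather than decrementing and letting callers race for it, keeps pendingOps from ever exceeding maxPending.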

Connection Rate Limiting Metrics

| Metric Name | Type | Description |
| --- | --- | --- |
| topgun_connections_accepted_total | Counter | Total connections accepted by the rate limiter. |
| topgun_connections_rejected_total | Counter | Connections rejected due to rate limiting. |
| topgun_connections_pending | Gauge | Current pending connection handshakes. |
| topgun_connection_rate_per_second | Gauge | Current connection rate (connections/second). |
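A rate limiter that produces these four metrics needs two limits: how many new connections per second it admits, and how many handshakes may be in flight at once. The hypothetical sketch below uses a sliding one-second window and takes the clock as a parameter so it is deterministic. maxPerSecond and maxPending are illustrative names, not TopGun configuration options, and measuring "current rate" as accepts in the trailing second is an assumption.

```typescript
// Hypothetical connection rate limiter producing the four metrics above.
class ConnectionRateLimiter {
  acceptedTotal = 0; // topgun_connections_accepted_total
  rejectedTotal = 0; // topgun_connections_rejected_total
  pending = 0;       // topgun_connections_pending

  private recent: number[] = []; // accept timestamps (ms) in the trailing second
  private readonly maxPerSecond: number;
  private readonly maxPending: number;

  constructor(maxPerSecond: number, maxPending: number) {
    this.maxPerSecond = maxPerSecond;
    this.maxPending = maxPending;
  }

  // topgun_connection_rate_per_second: accepts within the last 1000 ms.
  ratePerSecond(nowMs: number): number {
    this.prune(nowMs);
    return this.recent.length;
  }

  // Called when a socket arrives; true means "start the handshake".
  tryAccept(nowMs: number): boolean {
    this.prune(nowMs);
    if (this.recent.length >= this.maxPerSecond || this.pending >= this.maxPending) {
      this.rejectedTotal++;
      return false;
    }
    this.recent.push(nowMs);
    this.acceptedTotal++;
    this.pending++;
    return true;
  }

  // Called when a handshake finishes, successfully or not.
  handshakeDone(): void {
    if (this.pending > 0) this.pending--;
  }

  // Drop timestamps that have left the one-second window.
  private prune(nowMs: number): void {
    this.recent = this.recent.filter((t) => nowMs - t < 1000);
  }
}
```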

Alert Recommendations

Key Metrics to Alert On

  • topgun_event_queue_rejected_total: events are being dropped because the queue is at capacity.
  • topgun_backpressure_timeouts_total: operations are timing out while waiting for capacity.
  • topgun_connections_rejected_total: clients are being rejected by the rate limiter (a connection burst or a DDoS).
  • topgun_event_queue_size: queue depth is approaching eventQueueCapacity.
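For the three counters, "alert if increasing" translates to PromQL such as increase(topgun_event_queue_rejected_total[5m]) > 0; for the queue gauge, a threshold relative to capacity. The sketch below spells out the same logic over raw scraped values, including the counter-reset rule Prometheus applies (a drop in value is treated as a restart, so the new value counts as fresh increase). Unlike Prometheus's increase(), it does no extrapolation to the window edges. The 0.8 threshold is an illustrative default, not a TopGun recommendation.

```typescript
// Increase of a counter across consecutive scraped values. A decrease means
// the process restarted and the counter began again from zero.
function counterIncrease(values: number[]): number {
  let total = 0;
  let prev: number | undefined;
  for (const v of values) {
    if (prev !== undefined) total += v >= prev ? v - prev : v;
    prev = v;
  }
  return total;
}

// Fire when a counter that should stay flat moved at all in the window.
function counterAlert(values: number[]): boolean {
  return counterIncrease(values) > 0;
}

// Fire when queue depth reaches a fraction of its configured capacity.
function queueNearCapacity(size: number, capacity: number, threshold = 0.8): boolean {
  return size >= capacity * threshold;
}
```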

Structured Logging

TopGun uses Pino for high-performance, structured JSON logging.

This makes it easy to ingest logs into systems like ELK, Datadog, or Loki.
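Pino writes numeric levels (trace 10, debug 20, info 30, warn 40, error 50, fatal 60) and epoch-millisecond timestamps, while many log pipelines prefer string levels and ISO 8601 times. If your collector does not remap them itself, a small transform like the hypothetical one below can run before ingestion; the NormalizedLog shape is an assumption, not a format any of the named systems requires.

```typescript
// Pino's default numeric levels.
const PINO_LEVELS: Record<number, string> = {
  10: "trace",
  20: "debug",
  30: "info",
  40: "warn",
  50: "error",
  60: "fatal",
};

interface NormalizedLog {
  level: string;
  timestamp: string; // ISO 8601
  msg: string;
  fields: Record<string, unknown>; // everything else, e.g. nodeId, pid
}

// Convert one line of Pino JSON output into a string-level, ISO-time record.
function normalizePinoLine(line: string): NormalizedLog {
  const { level, time, msg, ...rest } = JSON.parse(line) as {
    level: number;
    time: number;
    msg?: string;
    [key: string]: unknown;
  };
  return {
    level: PINO_LEVELS[level] ?? String(level), // keep custom levels as numbers
    timestamp: new Date(time).toISOString(),
    msg: msg ?? "",
    fields: rest,
  };
}
```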

Log Example

server.log

```json
{
  "level": 30,
  "time": 1678901234567,
  "pid": 12345,
  "hostname": "topgun-server-0",
  "msg": "Server started on port 8080",
  "nodeId": "node-xyz-123"
}
```

Logs include context such as nodeId, clientId, and requestId where applicable, allowing you to trace requests across the distributed system.
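As an illustration of that tracing, the hypothetical helper below collects every line carrying a given requestId from several nodes' newline-delimited Pino output and orders the matches by time. It assumes each line is a complete JSON object and that node clocks are reasonably synchronized; with clock skew, ordering by time across nodes is only approximate. In practice you would run the equivalent query in your log backend.

```typescript
interface LogRecord {
  time: number;
  msg: string;
  nodeId?: string;
  clientId?: string;
  requestId?: string;
}

// Gather all records for one request across nodes, oldest first.
function traceRequest(logsByNode: string[], requestId: string): LogRecord[] {
  const hits: LogRecord[] = [];
  for (const ndjson of logsByNode) {
    for (const line of ndjson.split("\n")) {
      if (line.trim() === "") continue;
      const rec = JSON.parse(line) as LogRecord;
      if (rec.requestId === requestId) hits.push(rec);
    }
  }
  // Cross-node ordering is only as good as the nodes' clock sync.
  return hits.sort((a, b) => a.time - b.time);
}
```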