
Observability

Monitor the health and performance of your TopGun cluster using Prometheus metrics and structured logging.

Prometheus Metrics

The TopGun server exposes a /metrics endpoint that is compatible with Prometheus.

By default, the endpoint is served on the main server port. If a dedicated metrics port is configured, it is served there instead.
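To sanity-check the endpoint outside of Prometheus, the response body can be parsed by hand. The following is a minimal sketch rather than anything shipped with TopGun: `parseMetrics` and `Sample` are hypothetical names, and the parser handles only simple samples (no timestamps, no escaped quotes or commas inside label values).

```typescript
// Hypothetical helper: parses simple Prometheus text-format samples.
// Assumes no timestamps and no escaped quotes or commas inside label values.
interface Sample {
  name: string;
  labels: Record<string, string>;
  value: number;
}

function parseMetrics(text: string): Sample[] {
  const samples: Sample[] = [];
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    // Skip blank lines and "# HELP" / "# TYPE" comment lines.
    if (line === "" || line.startsWith("#")) continue;
    // metric_name, optional {label="value",...}, then the sample value.
    const match = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$/.exec(line);
    if (!match) continue;
    const labels: Record<string, string> = {};
    for (const pair of (match[2] ?? "").split(",")) {
      const kv = /^\s*([a-zA-Z_][a-zA-Z0-9_]*)="(.*)"\s*$/.exec(pair);
      if (kv) labels[kv[1]] = kv[2];
    }
    samples.push({ name: match[1], labels, value: Number(match[3]) });
  }
  return samples;
}
```

Pass it the body of a GET request to /metrics on whichever host and port your deployment uses. For real monitoring, a Prometheus scrape job is the better fit; this is only meant for quick inspection.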

Key Metrics

Metric Name	Type	Description
topgun_connected_clients	Gauge	Number of currently connected WebSocket clients.
topgun_map_size_items	Gauge	Number of items in a specific map (labeled by map name).
topgun_ops_total	Counter	Total number of operations processed. Labels: `type` (PUT, GET, DELETE, SUBSCRIBE), `map`.
topgun_memory_usage_bytes	Gauge	Current heap memory usage of the server process.
topgun_cluster_members	Gauge	Number of active nodes in the cluster.

Standard Node.js metrics (CPU, Event Loop, GC) are also exported with the topgun_ prefix.

Event Routing Metrics

Metric Name	Type	Description
topgun_events_routed_total	Counter	Total events processed for routing to subscribers.
topgun_events_filtered_by_subscription	Counter	Events not sent because no clients were subscribed.
topgun_subscribers_per_event	Summary	Distribution of subscriber count per event (quantiles).

Event Queue Metrics

Metric Name	Type	Description
topgun_event_queue_size	Gauge	Current number of events in the bounded queue.
topgun_event_queue_enqueued_total	Counter	Total events successfully enqueued.
topgun_event_queue_dequeued_total	Counter	Total events dequeued for processing.
topgun_event_queue_rejected_total	Counter	Events rejected due to queue being at capacity. Alert if increasing.
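These metrics can be combined into two simple health figures: how full the queue is, and what fraction of offered events is being rejected. The sketch below is illustrative only. `QueueSnapshot`, `QueueHealth`, and `queueHealth` are hypothetical names, and it assumes that rejected events are not counted in topgun_event_queue_enqueued_total, as the "successfully enqueued" description suggests.

```typescript
// Hypothetical snapshot of the event queue metrics from a single scrape.
interface QueueSnapshot {
  size: number;     // topgun_event_queue_size
  enqueued: number; // topgun_event_queue_enqueued_total
  rejected: number; // topgun_event_queue_rejected_total
}

interface QueueHealth {
  utilization: number;    // fraction of capacity currently in use
  rejectionRatio: number; // fraction of offered events that were rejected
}

// `capacity` is the configured queue capacity. The caller supplies it because
// this sketch does not assume the capacity is exported as a metric.
function queueHealth(s: QueueSnapshot, capacity: number): QueueHealth {
  if (capacity <= 0) throw new RangeError("capacity must be positive");
  // Assumption: offered events = accepted + rejected.
  const offered = s.enqueued + s.rejected;
  return {
    utilization: s.size / capacity,
    rejectionRatio: offered === 0 ? 0 : s.rejected / offered,
  };
}
```

Because the counters are cumulative since process start, the rejection ratio here is a lifetime average. A ratio computed over the difference between two scrapes reflects recent behavior more closely.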

Backpressure Metrics

Metric Name	Type	Description
topgun_backpressure_sync_forced_total	Counter	Number of times synchronous processing was forced.
topgun_backpressure_pending_ops	Gauge	Current number of pending async operations.
topgun_backpressure_waits_total	Counter	Times processing had to wait for capacity.
topgun_backpressure_timeouts_total	Counter	Backpressure timeouts. Alert if increasing.

Connection Rate Limiting Metrics

Metric Name	Type	Description
topgun_connections_accepted_total	Counter	Total connections accepted by the rate limiter.
topgun_connections_rejected_total	Counter	Connections rejected due to rate limiting.
topgun_connections_pending	Gauge	Current pending connection handshakes.
topgun_connection_rate_per_second	Gauge	Current connection rate (connections/second).

Alert Recommendations

Key Metrics to Alert On

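topgun_event_queue_rejected_total: events are being dropped because the queue is at capacity. Alert on any increase.
topgun_backpressure_timeouts_total: operations are timing out while waiting for capacity. Alert on any increase.
topgun_connections_rejected_total: clients are being rejected by the rate limiter, either through normal rate limiting or during a connection flood such as a DDoS attack.
topgun_event_queue_size: current queue depth. Alert when it approaches the configured eventQueueCapacity.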
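The three counters in this list should be alerted on when they grow, not on their absolute value. In practice this is usually expressed as a Prometheus alerting rule; the sketch below only illustrates the underlying logic in code. `counterIncrease`, `WATCHED`, and `firingAlerts` are hypothetical names, and a drop in a counter's value is assumed to mean the server restarted.

```typescript
// Increase of a cumulative counter between two scrapes.
// A lower current value is treated as a counter reset (process restart),
// in which case the current value is the increase since the reset.
function counterIncrease(previous: number, current: number): number {
  return current >= previous ? current - previous : current;
}

// Counters from the list above that should never grow in a healthy system.
const WATCHED = [
  "topgun_event_queue_rejected_total",
  "topgun_backpressure_timeouts_total",
  "topgun_connections_rejected_total",
];

// Returns the watched counters that increased between two scrapes.
// Counters missing from a scrape are treated as 0.
function firingAlerts(
  previous: Record<string, number>,
  current: Record<string, number>,
): string[] {
  return WATCHED.filter(
    (name) => counterIncrease(previous[name] ?? 0, current[name] ?? 0) > 0,
  );
}
```

A production rule would normally also require the increase to persist for some window before firing, to avoid alerting on a single rejected event.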

Structured Logging

TopGun uses Pino for high-performance, structured JSON logging.

This makes it easy to ingest logs into systems like ELK, Datadog, or Loki.

Log Example

server.log

{
  "level": 30,
  "time": 1678901234567,
  "pid": 12345,
  "hostname": "topgun-server-0",
  "msg": "Server started on port 8080",
  "nodeId": "node-xyz-123"
}

Logs include context such as nodeId, clientId, and requestId where applicable, allowing you to trace requests across the distributed system.
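Tracing a request across nodes then amounts to collecting every log line with the same requestId and ordering the lines by time. The sketch below assumes the logs are newline-delimited JSON, which is Pino's default output (the example above is pretty-printed for readability). `LogEntry`, `parseLogLines`, `traceRequest`, and `levelName` are hypothetical names; the numeric levels are Pino's defaults.

```typescript
// Pino's default numeric log levels.
const LEVEL_NAMES: Record<number, string> = {
  10: "trace", 20: "debug", 30: "info", 40: "warn", 50: "error", 60: "fatal",
};

// Subset of fields used here; context fields are optional because they are
// only present where applicable.
interface LogEntry {
  level: number;
  time: number;
  msg: string;
  nodeId?: string;
  clientId?: string;
  requestId?: string;
}

function levelName(level: number): string {
  return LEVEL_NAMES[level] ?? String(level);
}

// Parses newline-delimited JSON logs, skipping lines that are not valid JSON.
function parseLogLines(text: string): LogEntry[] {
  const entries: LogEntry[] = [];
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue;
    try {
      entries.push(JSON.parse(line) as LogEntry);
    } catch {
      // Ignore non-JSON lines, for example output from other tools.
    }
  }
  return entries;
}

// Returns all entries for one request, oldest first, across all nodes.
function traceRequest(entries: LogEntry[], requestId: string): LogEntry[] {
  return entries
    .filter((e) => e.requestId === requestId)
    .sort((a, b) => a.time - b.time);
}
```

In a log platform such as Loki or Datadog the same result comes from filtering on the requestId field; the code form is mainly useful for ad hoc analysis of log files merged from several nodes.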
