Monitoring — Prometheus & Grafana

Observability stack for Garuda Chain production and staging.

Components

Service           Port  Purpose
metrics-exporter  9101  Custom chain + service metrics
Prometheus        9090  Metrics collection & alerting
Grafana           3001  Dashboards & visualization
Alertmanager      9093  Alert routing
node-exporter     9100  Host system metrics
Besu validators   9545  Validator metrics (when enabled)

Metrics Exposed

Metric                       Description
garuda_chain_block_number    Latest block height
garuda_chain_id              Connected chain ID
garuda_chain_id_valid        1 if chain ID matches expected
garuda_rpc_up                RPC reachability
garuda_peer_count            P2P peer count
garuda_gas_price_wei         Current gas price
garuda_service_up{service}   Per-service health
garuda_poll_duration_ms      Exporter poll latency
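
To spot-check one of these metrics without opening Grafana, the exporter output can be filtered on the command line. The sketch below assumes the exporter serves the standard Prometheus text format at /metrics on port 9101; the path and the helper name metric_value are assumptions, not taken from the repository.

```shell
# Print the value of an unlabelled metric from Prometheus text-format
# input on stdin. Comment lines (# HELP / # TYPE) never match because
# their first field is "#".
metric_value() {
  awk -v m="$1" '$1 == m { print $2; exit }'
}

# Usage (assumed endpoint path):
#   curl -s http://localhost:9101/metrics | metric_value garuda_chain_block_number
```

Labelled series such as garuda_service_up{service} carry the label set in the first field, so they need a prefix match rather than the exact match used here.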

Quick Start

# Production monitoring
npm run monitoring:setup

# Staging monitoring
npm run monitoring:setup:staging

# Or combined with full deploy
bash scripts/staging/deploy.sh --with-monitoring

Access

Environment  Grafana URL                             Local
Production   https://grafana.garudachain.id          http://localhost:3001
Staging      https://grafana.staging.garudachain.id  http://localhost:3001

SSH tunnel (production):

ssh -L 3001:localhost:3001 -L 9090:localhost:9090 deploy@<production-host>
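
With the tunnel open, a quick reachability check confirms that both forwarded ports answer. This is a minimal sketch: check_endpoint is a hypothetical helper, while /-/healthy (Prometheus) and /api/health (Grafana) are the health endpoints those projects document.

```shell
# Report whether an HTTP endpoint answers with a success status.
# $1 = label to print, $2 = URL to probe.
check_endpoint() {
  if curl -fsS --max-time 5 -o /dev/null "$2"; then
    echo "$1: ok"
  else
    echo "$1: FAILED"
    return 1
  fi
}

# Usage, run on the machine holding the tunnel:
#   check_endpoint prometheus http://localhost:9090/-/healthy
#   check_endpoint grafana    http://localhost:3001/api/health
```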

Dashboard

Auto-provisioned: Garuda Chain Overview

  • Block height, RPC status, chain ID validation
  • Service health matrix
  • Poll latency & gas price trends

Path: deploy/monitoring/grafana/dashboards/garuda-chain.json

Alerts

Configured in deploy/monitoring/alerts.yml:

Alert                  Severity  Condition
GarudaRPCDown          critical  RPC unreachable for 2m
GarudaChainIdMismatch  critical  Wrong chain ID for 1m
GarudaBlockStall       critical  No new blocks for 5m
GarudaServiceDown      warning   Service down for 3m
GarudaLowPeers         warning   Peers < 2 for 10m
GarudaHighPollLatency  warning   Poll > 5s for 5m

Alertmanager routes alerts to the webhook at security-monitor:4002/alerts.
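
To see which of these alerts are currently active, Prometheus can be queried directly through its /api/v1/alerts endpoint. The helper below is an illustrative sketch (active_alert_names is a hypothetical name) that pulls alert names out of the compact JSON response with grep and sed, so it works on hosts without jq.

```shell
# List the names of active (pending or firing) alerts from a Prometheus
# /api/v1/alerts JSON response on stdin, one per line, de-duplicated.
active_alert_names() {
  grep -o '"alertname":"[^"]*"' | sed 's/.*:"\(.*\)"/\1/' | sort -u
}

# Usage:
#   curl -s http://localhost:9090/api/v1/alerts | active_alert_names
```

An empty result means no rule is pending or firing. It does not show whether the webhook received earlier notifications.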

Enable Besu Validator Metrics

Validator metrics are enabled automatically when the monitoring overlay is deployed (it sets METRICS_ENABLED=true on the validators).

Manual check:

curl http://localhost:9545/metrics  # via validator port forward

Production Compose

docker compose \
  -f docker-compose.mainnet.yml \
  -f docker-compose.production.yml \
  -f docker-compose.monitoring.yml \
  up -d

Environment Variables

GRAFANA_ADMIN_PASSWORD=strong_password
METRICS_RPC_URL=http://rpc-mainnet:8585
METRICS_CHAIN_ID=8846
PROMETHEUS_CONFIG=prometheus.yml   # or prometheus.staging.yml
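
A small preflight check can catch a missing variable before the stack starts. The sketch below assumes the first three variables are mandatory and that PROMETHEUS_CONFIG has a default; that split is an assumption, and missing_vars is a hypothetical helper.

```shell
# Print the name of every required variable that is unset or empty.
# The list of required names is an assumption for illustration.
missing_vars() {
  for name in GRAFANA_ADMIN_PASSWORD METRICS_RPC_URL METRICS_CHAIN_ID; do
    eval "value=\${$name:-}"          # indirect lookup of $name
    [ -n "$value" ] || echo "$name"
  done
}

# Usage:
#   missing="$(missing_vars)"
#   [ -z "$missing" ] || { echo "Missing: $missing" >&2; exit 1; }
```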

Retention

Prometheus TSDB retention: 30 days (configurable in docker-compose.monitoring.yml).
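
Prometheus takes its retention period from the --storage.tsdb.retention.time flag. Assuming the compose file passes that flag in the Prometheus service command (not confirmed here), a helper like the hypothetical retention_setting below reads back the configured value.

```shell
# Print the retention period (e.g. 30d) configured in the given file.
# Assumes the flag appears as --storage.tsdb.retention.time=<value>.
retention_setting() {
  grep -o 'storage\.tsdb\.retention\.time=[0-9a-z]*' "$1" | head -n 1 | cut -d= -f2
}

# Usage:
#   retention_setting docker-compose.monitoring.yml
```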

Security

  • Prometheus/Grafana bind to 127.0.0.1 only
  • Production Grafana is exposed via Caddy; restrict access with a firewall or VPN
  • Change the default Grafana password immediately
  • Do not expose Alertmanager publicly
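
The loopback-only binding can be verified on the host from the list of listening sockets. This sketch filters the output of ss -ltnH for the Prometheus, Grafana, and Alertmanager ports from the Components table and prints any that are bound to a non-loopback address. The helper name non_loopback_listeners is hypothetical.

```shell
# Read `ss -ltnH` output on stdin and print local addresses of monitoring
# ports (9090, 3001, 9093) that are NOT bound to 127.0.0.1 or [::1].
# Prints nothing when every monitoring port is loopback-only.
non_loopback_listeners() {
  awk '{ print $4 }' \
    | grep -E ':(9090|3001|9093)$' \
    | grep -Ev '^(127\.0\.0\.1|\[::1\]):' || true
}

# Usage:
#   ss -ltnH | non_loopback_listeners
```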