Monitoring — Prometheus & Grafana¶
Observability stack for Garuda Chain production and staging.
Components¶
| Service | Port | Purpose |
|---|---|---|
| metrics-exporter | 9101 | Custom chain + service metrics |
| Prometheus | 9090 | Metrics collection & alerting |
| Grafana | 3001 | Dashboards & visualization |
| Alertmanager | 9093 | Alert routing |
| node-exporter | 9100 | Host system metrics |
| Besu validators | 9545 | Validator metrics (when enabled) |
Metrics Exposed¶
| Metric | Description |
|---|---|
| `garuda_chain_block_number` | Latest block height |
| `garuda_chain_id` | Connected chain ID |
| `garuda_chain_id_valid` | 1 if chain ID matches expected |
| `garuda_rpc_up` | RPC reachability |
| `garuda_peer_count` | P2P peer count |
| `garuda_gas_price_wei` | Current gas price (wei) |
| `garuda_service_up{service}` | Per-service health |
| `garuda_poll_duration_ms` | Exporter poll latency (ms) |
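
To spot-check a single metric from the command line, a small filter over the exporter output can help. This is a minimal sketch: it assumes metrics-exporter serves the Prometheus text exposition format at `/metrics` on port 9101 and that samples carry no trailing timestamp. `metric_value` is a hypothetical helper name, not part of the repository.

```shell
# metric_value NAME: read Prometheus text exposition format on stdin and
# print the value of every sample whose metric name is exactly NAME.
# Assumption: samples have no trailing timestamp, so the value is the last field.
metric_value() {
  awk -v name="$1" '
    /^#/ { next }                      # skip HELP/TYPE comment lines
    NF >= 2 {
      metric = $1
      sub(/[{].*/, "", metric)         # drop the label set, if any
      if (metric == name) print $NF
    }'
}
```

Usage, assuming the exporter is reachable locally: `curl -fsS http://localhost:9101/metrics | metric_value garuda_chain_block_number`. For a labelled metric such as `garuda_service_up`, one value is printed per service.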
Quick Start¶
# Production monitoring
npm run monitoring:setup
# Staging monitoring
npm run monitoring:setup:staging
# Or combined with full deploy
bash scripts/staging/deploy.sh --with-monitoring
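
After setup it can be useful to wait until the stack actually answers before opening dashboards. The sketch below polls a URL with `curl` until it responds successfully. `wait_for_url` is a hypothetical helper, and the attempt count and delay are arbitrary defaults; the endpoints named in the usage line are the standard Prometheus readiness and Grafana health endpoints.

```shell
# wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
# Poll URL until curl gets a successful (non-error) HTTP response.
# Returns 0 on success, 1 if every attempt failed.
wait_for_url() {
  local url="$1" attempts="${2:-30}" delay="${3:-2}" i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f: treat HTTP errors as failure, -sS: quiet but still report errors
    if curl -fsS -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out waiting for $url" >&2
  return 1
}
```

Usage: `wait_for_url http://localhost:9090/-/ready && wait_for_url http://localhost:3001/api/health`.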
Access¶
| Environment | Grafana URL | Local |
|---|---|---|
| Production | https://grafana.garudachain.id | http://localhost:3001 |
| Staging | https://grafana.staging.garudachain.id | http://localhost:3001 |
SSH tunnel (production):
ssh -L 3001:localhost:3001 -L 9090:localhost:9090 deploy@<production-host>
Dashboard¶
Auto-provisioned: Garuda Chain Overview
- Block height, RPC status, chain ID validation
- Service health matrix
- Poll latency & gas price trends
Path: deploy/monitoring/grafana/dashboards/garuda-chain.json
Alerts¶
Configured in deploy/monitoring/alerts.yml:
| Alert | Severity | Condition |
|---|---|---|
| GarudaRPCDown | critical | RPC unreachable for 2m |
| GarudaChainIdMismatch | critical | Wrong chain ID for 1m |
| GarudaBlockStall | critical | No new blocks for 5m |
| GarudaServiceDown | warning | Service down for 3m |
| GarudaLowPeers | warning | Peers < 2 for 10m |
| GarudaHighPollLatency | warning | Poll > 5s for 5m |
Alertmanager routes alerts to the webhook at security-monitor:4002/alerts.
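
To smoke-test the webhook without waiting for a real incident, a hand-built alert can be posted to it. The sketch below prints a body shaped like Alertmanager's version 4 webhook payload, reduced to a subset of its fields. It assumes security-monitor is configured as a standard Alertmanager webhook receiver and tolerates the omitted fields. `alert_payload`, the `manual-test` receiver name, and the summary text are illustrative, not taken from the repository.

```shell
# alert_payload ALERTNAME SEVERITY
# Print a minimal Alertmanager-style webhook body (one firing alert).
# Assumption: the receiver accepts a payload with only these fields.
alert_payload() {
  local name="$1" severity="$2" now
  now=$(date -u +%Y-%m-%dT%H:%M:%SZ)   # RFC 3339 timestamp in UTC
  printf '{"version":"4","status":"firing","receiver":"manual-test",'
  printf '"groupLabels":{"alertname":"%s"},' "$name"
  printf '"commonLabels":{"alertname":"%s","severity":"%s"},' "$name" "$severity"
  printf '"commonAnnotations":{},"externalURL":"","alerts":[{'
  printf '"status":"firing",'
  printf '"labels":{"alertname":"%s","severity":"%s"},' "$name" "$severity"
  printf '"annotations":{"summary":"manual test alert"},'
  printf '"startsAt":"%s","endsAt":"0001-01-01T00:00:00Z","generatorURL":""' "$now"
  printf '}]}\n'
}
```

Usage, run from somewhere the `security-monitor` hostname resolves (for example inside the compose network): `alert_payload GarudaRPCDown critical | curl -fsS -X POST -H 'Content-Type: application/json' --data-binary @- http://security-monitor:4002/alerts`.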
Enable Besu Validator Metrics¶
Validator metrics are enabled automatically when the monitoring overlay is deployed (METRICS_ENABLED=true on validators).
Manual check:
curl http://localhost:9545/metrics # via validator port forward
Production Compose¶
docker compose \
-f docker-compose.mainnet.yml \
-f docker-compose.production.yml \
-f docker-compose.monitoring.yml \
up -d
Environment Variables¶
GRAFANA_ADMIN_PASSWORD=strong_password
METRICS_RPC_URL=http://rpc-mainnet:8585
METRICS_CHAIN_ID=8846
PROMETHEUS_CONFIG=prometheus.yml # or prometheus.staging.yml
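
A pre-flight check can catch missing variables before the stack starts. This is a sketch only: `require_env` is a hypothetical helper, and treating all four variables above as mandatory is an assumption, since the compose files may provide defaults for some of them.

```shell
# require_env NAME...
# Report every named variable that is unset or empty.
# Returns 0 if all are set, 1 otherwise.
require_env() {
  local missing=0 name value
  for name in "$@"; do
    eval "value=\${$name:-}"           # portable indirect expansion
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `require_env GRAFANA_ADMIN_PASSWORD METRICS_RPC_URL METRICS_CHAIN_ID PROMETHEUS_CONFIG || exit 1`.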
Retention¶
Prometheus TSDB retention: 30 days (configurable in docker-compose.monitoring.yml).
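
To confirm the retention a running Prometheus is using, its flags can be read back over the HTTP API (`/api/v1/status/flags`). The sketch below extracts the `storage.tsdb.retention.time` value from that JSON with `sed`. It assumes the whole response arrives on one line, and `retention_time` is a hypothetical helper name.

```shell
# retention_time: read the JSON from /api/v1/status/flags on stdin and
# print the value of the storage.tsdb.retention.time flag.
# Assumption: the response is a single line of JSON.
retention_time() {
  sed -n 's/.*"storage\.tsdb\.retention\.time"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

Usage: `curl -fsS http://localhost:9090/api/v1/status/flags | retention_time`.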
Security¶
- Prometheus/Grafana bind to 127.0.0.1 only
- Production Grafana exposed via Caddy; restrict with firewall/VPN
- Change default Grafana password immediately
- Do not expose Alertmanager publicly
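
The loopback-only binding can be checked on the host by filtering the output of `ss -ltn`. A minimal sketch, assuming a Linux host with `ss` from iproute2; `exposed_listeners` is a hypothetical helper, and the port list in the usage line comes from the Components table.

```shell
# exposed_listeners PORT...
# Read `ss -ltn` output on stdin and print the local address of every
# listening socket on the given ports that is NOT bound to loopback.
exposed_listeners() {
  awk -v ports="$*" '
    BEGIN {
      n = split(ports, p, " ")
      for (i = 1; i <= n; i++) want[p[i]] = 1
    }
    $1 == "LISTEN" {
      addr = $4                        # Local Address:Port column
      port = addr
      sub(/.*:/, "", port)             # keep text after the last colon
      host = substr(addr, 1, length(addr) - length(port) - 1)
      if ((port in want) && host != "127.0.0.1" && host != "[::1]") print addr
    }'
}
```

Usage: `ss -ltn | exposed_listeners 9090 3001 9093`. Empty output means none of those ports is listening on a non-loopback address.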