[10:23:07] FIRING: [2x] ErrorBudgetBurn: logstash-availability codfw - https://slo.wikimedia.org/?search=logstash-availability - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[10:33:07] RESOLVED: [2x] ErrorBudgetBurn: logstash-availability codfw - https://slo.wikimedia.org/?search=logstash-availability - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[17:52:25] FIRING: SystemdUnitFailed: grafana-loki.service on grafana2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:28:48] FIRING: PuppetFailure: Puppet has failed on titan2001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[18:31:48] FIRING: PuppetFailure: Puppet has failed on webperf2003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[18:36:48] FIRING: [2x] PuppetFailure: Puppet has failed on webperf1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[18:37:48] FIRING: [3x] PuppetFailure: Puppet has failed on prometheus2005:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[18:38:48] FIRING: PuppetFailure: Puppet has failed on grafana1002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[18:39:02] This is strange, I'm taking a look.
[18:41:38] Okay, this is related to Gerrit being down.
[18:41:48] FIRING: PuppetFailure: Puppet has failed on alert2002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[18:42:48] FIRING: [7x] PuppetFailure: Puppet has failed on prometheus1005:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[18:43:11] I've ACK'd and silenced the alerts for 1 day.
[18:45:44] For more context, the alerts are happening because the hosts can't pull the alerts repository from Gerrit.
[21:53:13] FIRING: SystemdUnitFailed: grafana-loki.service on grafana2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed