[04:34:48] FIRING: PuppetFailure: Puppet has failed on logging-hd2005:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[04:44:48] RESOLVED: PuppetFailure: Puppet has failed on logging-hd2005:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[13:28:02] FYI, I fixed the underlying issue which prevented the re-migration of prometheus5003 back to DRBD earlier this week and will migrate it to DRBD in a bit (during the migration the VM will be unavailable)
[13:42:54] moritzm: thx, sre.metamonitoring.downtime cookbook is now available
[13:43:49] the cookbook to change between DRBD and plain also downtimes, so that shouldn't be needed
[13:44:00] and it just completed, prometheus5003 is back on DRBD
[13:44:29] ack, thanks again moritzm
[22:50:56] FIRING: PrometheusZombieSeriesDetected: Zombie series detected on k8s (eqiad) - https://wikitech.wikimedia.org/wiki/Prometheus#Runbooks - https://grafana.wikimedia.org/d/taff979/prometheus-tsdb-cardinality-monitoring?orgId=1&from=now-14d&to=now&timezone=utc&var-prometheus=k8s&var-site=eqiad - https://alerts.wikimedia.org/?q=alertname%3DPrometheusZombieSeriesDetected
[23:25:56] FIRING: [2x] PrometheusZombieSeriesDetected: Zombie series detected on k8s (codfw) - https://wikitech.wikimedia.org/wiki/Prometheus#Runbooks - https://alerts.wikimedia.org/?q=alertname%3DPrometheusZombieSeriesDetected