[00:22:23] PROBLEM - MariaDB sustained replica lag on x1 on db1220 is CRITICAL: 25 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1220&var-port=9104 [00:22:55] PROBLEM - MariaDB sustained replica lag on x1 on db1179 is CRITICAL: 16.6 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1179&var-port=9104 [00:23:23] RECOVERY - MariaDB sustained replica lag on x1 on db1220 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1220&var-port=9104 [00:23:55] RECOVERY - MariaDB sustained replica lag on x1 on db1179 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1179&var-port=9104 [01:12:08] FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-mysqld-exporter.service on es2044:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [05:12:08] FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-mysqld-exporter.service on es2044:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [06:27:08] FIRING: [2x] SystemdUnitFailed: wmf_auto_restart_prometheus-mysqld-exporter.service on es2044:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [06:57:08] FIRING: [2x] SystemdUnitFailed: wmf_auto_restart_prometheus-mysqld-exporter.service on es2044:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:02:08] RESOLVED: SystemdUnitFailed: wmf_auto_restart_prometheus-mysqld-exporter.service on es2044:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [09:33:09] Emperor: o/ just to confirm, all the ms-be nodes should be ready to go (except the last one of each series to keep as test)