[00:29:06] FIRING: MysqlReplicationThreadCountTooLow: MySQL instance db1233:9104 has replication issues. - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1233&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationThreadCountTooLow
[03:50:48] FIRING: [2x] MysqlReplicationLagPtHeartbeat: MySQL instance db1233:9104 has too large replication lag (4h 11m 6s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1233&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLagPtHeartbeat
[04:29:06] FIRING: MysqlReplicationThreadCountTooLow: MySQL instance db1233:9104 has replication issues. - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1233&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationThreadCountTooLow
[07:14:30] PROBLEM - MariaDB sustained replica lag on s2 on db1155 is CRITICAL: 1.227e+05 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1155&var-port=13312
[07:24:48] FIRING: MysqlReplicationLag: MySQL instance db1155:13312 has too large replication lag (1d 7h 44m 4s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1155&var-port=13312 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[07:50:48] FIRING: [2x] MysqlReplicationLagPtHeartbeat: MySQL instance db1233:9104 has too large replication lag (8h 11m 6s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1233&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLagPtHeartbeat
[08:29:06] FIRING: MysqlReplicationThreadCountTooLow: MySQL instance db1233:9104 has replication issues. - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1233&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationThreadCountTooLow
[10:49:48] RESOLVED: MysqlReplicationLag: MySQL instance db1155:13312 has too large replication lag (6h 24m 55s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1155&var-port=13312 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[12:11:36] PROBLEM - MariaDB sustained replica lag on s2 on db1155 is CRITICAL: 5001 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1155&var-port=13312
[12:21:48] FIRING: MysqlReplicationLag: MySQL instance db1155:13312 has too large replication lag (5m 7s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1155&var-port=13312 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[12:26:36] RECOVERY - MariaDB sustained replica lag on s2 on db1155 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1155&var-port=13312
[12:26:48] RESOLVED: MysqlReplicationLag: MySQL instance db1155:13312 has too large replication lag (5m 7s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1155&var-port=13312 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[14:04:12] FIRING: SystemdUnitFailed: ceph-59ea825c-2a67-11ef-9c1c-bc97e1bbace4@osd.20.service on moss-be2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:07:07] FIRING: SystemdUnitFailed: ceph-59ea825c-2a67-11ef-9c1c-bc97e1bbace4@osd.20.service on moss-be2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:56:09] that'll be a disk gone
[22:07:07] FIRING: SystemdUnitFailed: ceph-59ea825c-2a67-11ef-9c1c-bc97e1bbace4@osd.20.service on moss-be2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:29:04] PROBLEM - MariaDB sustained replica lag on s4 on db2155 is CRITICAL: 18.8 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2155&var-port=9104
[22:31:04] RECOVERY - MariaDB sustained replica lag on s4 on db2155 is OK: (C)10 ge (W)5 ge 0.4 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2155&var-port=9104