[02:51:40] FIRING: SystemdUnitFailed: ceph-59ea825c-2a67-11ef-9c1c-bc97e1bbace4@osd.31.service on moss-be2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:32:06] Emperor godog can we do something about moss-be2002:9100 ^ it's been alerting every 4 hours for the whole weekend [04:51:12] I am going to switch m2 master [06:51:40] FIRING: SystemdUnitFailed: ceph-59ea825c-2a67-11ef-9c1c-bc97e1bbace4@osd.31.service on moss-be2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:04:29] * volans is back, on clinic duty and reporting for durty for Data Persistence :) [07:21:46] welcome volans ! [07:25:57] thanks! [07:42:43] marostegui: easiest is to silence the alert for that unit [07:47:02] godog: but is there any point then if it fires every 4 hours and we just need to silence it? :) [07:50:29] marostegui: heh that would be https://phabricator.wikimedia.org/T357333 [07:51:40] Yeah, that is why I was asking :) [07:52:17] godog: Do you have some ETA on when that'd be fixed? [07:52:55] I have silenced the alerts for 7 days btw [07:53:03] marostegui: my plan is to work on it this quarter [07:53:08] godog: nice!! <3 [07:53:43] I've suggested two work arounds in the task, please consider implementing those in the meantime too [07:56:52] would it be possible to route all alerts to data-persistence-feed except p*ges ? [07:58:01] yes that's doable too [07:58:44] that would be nice! [07:58:53] marostegui: ^ that would be my suggestion, but I don't get to decide [07:59:22] I'm ok with that [08:06:26] That'd be okay too, but that's essentially sort of ignoring alerts :) [08:09:58] godog: I'd prefer not to set long-running silence as otherwise we may simply forget about this problem, and I believe it needs some better fixing than just silencing for good (whatever the fix we agreed on) [08:13:26] yeah the redirect to -feed or to a sink/devnull is the next best option in this case I think then, at least for now that is [08:13:55] yep, I agree [08:16:25] can I depool es1025 for 24 hours? [08:16:31] yep [08:20:01] there were some ongoing errors on ruwiki api [08:23:23] dump.es5.2024-07-01--08-22-56 is ongoing [10:36:47] PROBLEM - MariaDB sustained replica lag on s5 on db1154 is CRITICAL: 330 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1154&var-port=13315 [10:37:26] marostegui: we finished the meeting, did you find something about the p*ge? [10:37:47] RECOVERY - MariaDB sustained replica lag on s5 on db1154 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1154&var-port=13315 [10:38:21] :-( on index corruption [10:38:38] jynus: nope, and I got buried in 200 things more [10:38:47] I guess :-( [11:28:05] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 12.4 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104 [11:30:05] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 1 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104 [11:55:06] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 13.8 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104 [11:56:06] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104 [11:57:43] is that the dumps ? [11:59:03] * volans lunch [12:14:19] jynus: yes [12:14:28] :-((( [12:15:06] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 43 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104 [12:18:06] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 2.6 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104 [13:08:11] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 25.2 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104 [13:10:11] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 1 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104 [13:57:15] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 16.6 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104 [14:01:18] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 3.2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104 [14:34:22] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 15.6 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104 [14:36:22] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 0.2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104 [14:41:46] procurement task will be coming soon, dcops will file it today and I will fill it in tomorrow and will give you a heads up [14:48:27] 🎉 [15:30:24] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 32.4 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104 [15:37:24] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 1.4 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104 [18:20:35] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 12.8 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104 [18:21:35] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 0.2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104 [20:12:37] PROBLEM - MariaDB sustained replica lag on s4 on db1248 is CRITICAL: 10.6 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1248&var-port=9104 [20:15:37] RECOVERY - MariaDB sustained replica lag on s4 on db1248 is OK: (C)10 ge (W)5 ge 4.8 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1248&var-port=9104 [20:22:37] PROBLEM - MariaDB sustained replica lag on s4 on db1248 is CRITICAL: 10 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1248&var-port=9104 [20:26:37] PROBLEM - MariaDB sustained replica lag on s4 on db1248 is CRITICAL: 11.6 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1248&var-port=9104 [20:39:37] RECOVERY - MariaDB sustained replica lag on s4 on db1248 is OK: (C)10 ge (W)5 ge 4.6 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1248&var-port=9104 [21:02:37] PROBLEM - MariaDB sustained replica lag on s4 on db1248 is CRITICAL: 17.6 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1248&var-port=9104 [21:05:37] RECOVERY - MariaDB sustained replica lag on s4 on db1248 is OK: (C)10 ge (W)5 ge 4.8 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1248&var-port=9104 [21:10:37] PROBLEM - MariaDB sustained replica lag on s4 on db1248 is CRITICAL: 14.6 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1248&var-port=9104 [21:14:37] RECOVERY - MariaDB sustained replica lag on s4 on db1248 is OK: (C)10 ge (W)5 ge 3.4 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1248&var-port=9104 [21:50:41] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 10.4 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104 [21:51:43] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104 [22:03:41] PROBLEM - MariaDB sustained replica lag on s4 on db1248 is CRITICAL: 13.2 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1248&var-port=9104 [22:12:43] RECOVERY - MariaDB sustained replica lag on s4 on db1248 is OK: (C)10 ge (W)5 ge 4.6 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1248&var-port=9104 [22:15:43] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 37.6 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104 [22:17:43] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 0.2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104 [22:19:43] PROBLEM - MariaDB sustained replica lag on s4 on db1248 is CRITICAL: 10.8 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1248&var-port=9104 [22:24:43] RECOVERY - MariaDB sustained replica lag on s4 on db1248 is OK: (C)10 ge (W)5 ge 2.8 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1248&var-port=9104 [22:29:45] PROBLEM - MariaDB sustained replica lag on s4 on db1248 is CRITICAL: 20.8 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1248&var-port=9104 [22:32:43] RECOVERY - MariaDB sustained replica lag on s4 on db1248 is OK: (C)10 ge (W)5 ge 3.8 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1248&var-port=9104 [22:54:45] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 120.8 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104 [23:06:45] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 0.2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104 [23:10:43] PROBLEM - MariaDB sustained replica lag on s4 on db1248 is CRITICAL: 20.4 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1248&var-port=9104 [23:13:43] RECOVERY - MariaDB sustained replica lag on s4 on db1248 is OK: (C)10 ge (W)5 ge 4.6 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1248&var-port=9104 [23:19:43] PROBLEM - MariaDB sustained replica lag on s4 on db1248 is CRITICAL: 20.4 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1248&var-port=9104 [23:22:44] RECOVERY - MariaDB sustained replica lag on s4 on db1248 is OK: (C)10 ge (W)5 ge 2.8 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1248&var-port=9104 [23:32:44] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 10.6 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104 [23:33:44] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 3 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104