[09:12:50] Hi, can I get a +1 to https://gerrit.wikimedia.org/r/c/operations/puppet/+/1178484 please? I've checked that the relevant node is drained; now it needs to come out of the rings entirely before the controller swap.
[10:34:26] Amir1: can I upgrade db2165?
[10:34:50] old s8 codfw master? Go for it
[10:55:53] yep
[13:24:25] FIRING: SystemdUnitFailed: ferm.service on db2165:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:31:27] federico3: ^-- expected?
[13:31:54] I'll chase it
[13:33:08] it's not the first time ferm fails during the OS upgrade
[13:34:28] gave it a kick, it restarted
[13:38:46] 👍
[13:39:25] RESOLVED: SystemdUnitFailed: ferm.service on db2165:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:28:39] it's a long-standing bug in ferm and/or iptables-persistent, which sometimes happens: https://phabricator.wikimedia.org/T254477
[14:29:18] moving to nftables will fix that for good, but it'll take some time for the DBs to migrate
[15:49:31] Amir1: can I start the flip for s4 codfw?
[15:49:53] actually wait, so I can push a change there
[15:50:12] actually, you shouldn't do it at all, since you're running a schema change there?
[15:50:27] https://phabricator.wikimedia.org/T399249#11082411
[15:50:29] yup
[15:54:43] federico3: ^
[15:54:59] that got stuck, but I have to rerun it anyway
[15:55:28] it wasn't stuck; this is an expensive schema change, it's going to take hours
[15:55:47] the previous ones took four to five hours too
[15:56:16] do you expect it to take days, though?
[15:56:59] the full schema change, yes
[15:57:03] four to five hours per replica
[15:57:17] (more generally, I'm treating the OS upgrade as higher priority than schema changes)
[15:57:18] https://phabricator.wikimedia.org/T399249#11082411
[15:57:26] it was depooled three hours ago
[15:57:42] the schema change has high priority too
[15:57:48] this one specifically
[15:58:01] also, now you have two replicas depooled at the same time
[15:58:11] and it's depooling hosts that are already done
[15:58:53] until we get the smarter depooling merged
[15:59:35] you can run the --check and put the output in the replicas variable, that's what we do
[16:00:10] (BTW https://gitlab.wikimedia.org/repos/data_persistence/dbtools/auto_schema/-/merge_requests/14 is in draft)
[16:01:04] wait, then I can stop, manually repool, and do --check etc.
[16:01:37] yup
[16:02:26] we can earmark the CR for discussing when you have some time - I'm not 100% sure about the logic in sql_on_each*
[16:18:51] make sure to repool db2219 too once it's caught up
[16:22:48] yes, it should take 10 mins
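
The ferm failure above is the known race from T254477, and the in-the-moment fix is just the "kick": restart the unit and confirm it comes back so the SystemdUnitFailed alert resolves. A minimal sketch of that check-and-restart step (a hypothetical helper run as root on the host, not an existing cookbook or script):

```python
#!/usr/bin/env python3
"""Hypothetical helper: restart ferm if systemd reports it as failed.

Only a sketch of the manual "gave it a kick" step above; not a
Wikimedia cookbook.
"""
import subprocess
import sys


def unit_failed(unit: str) -> bool:
    # `systemctl is-failed --quiet` exits 0 when the unit is in the failed state.
    return subprocess.run(["systemctl", "is-failed", "--quiet", unit]).returncode == 0


def main() -> int:
    unit = "ferm.service"
    if not unit_failed(unit):
        print(f"{unit} is not failed, nothing to do")
        return 0
    subprocess.run(["systemctl", "restart", unit], check=True)
    # Re-check so the alert can resolve on the next scrape.
    state = subprocess.run(
        ["systemctl", "is-active", unit], capture_output=True, text=True
    ).stdout.strip()
    print(f"{unit} is now {state}")
    return 0 if state == "active" else 1


if __name__ == "__main__":
    sys.exit(main())
```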
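On the "run the --check and put the output in the replicas variable" suggestion: the idea is to capture once the list of replicas the schema change still needs, then pin that list for the rerun so it does not depool hosts that are already done or that are depooled for the OS upgrade. A rough sketch of that filtering step with made-up names and example hostnames; this is not the actual auto_schema API:

```python
"""Sketch of pinning the replica list for a schema-change rerun.

All names and hostnames here are illustrative; auto_schema's real
interface differs.
"""

# Output of a hypothetical `--check` run: replicas still missing the change.
check_output = ["db2001", "db2219", "db2002"]

# Hosts that already received the ALTER in the aborted run.
already_done = {"db2001"}

# Hosts currently depooled for the OS upgrade; leave them alone.
os_upgrade_depooled = {"db2165"}

# Pin this list in the schema-change definition so the rerun only touches
# hosts that still need the change and are not busy with something else.
replicas = [
    host
    for host in check_output
    if host not in already_done and host not in os_upgrade_depooled
]

print(replicas)  # ['db2219', 'db2002']
```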
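And on "repool db2219 once it's caught up": the catch-up check is replication lag returning to roughly zero before the host takes traffic again. A small sketch of polling that with PyMySQL (connection parameters are placeholders, and the actual repool is done by an operator with dbctl, not by this script):

```python
"""Sketch: wait for a replica to catch up before repooling it.

Connection details are placeholders; this is not a production tool.
"""
import time

import pymysql


def replication_lag(host: str) -> int | None:
    conn = pymysql.connect(
        host=host,
        user="monitor",      # placeholder credentials
        password="secret",
        cursorclass=pymysql.cursors.DictCursor,
    )
    try:
        with conn.cursor() as cur:
            cur.execute("SHOW SLAVE STATUS")
            row = cur.fetchone()
            # None means replication is not configured or the SQL thread is down.
            return row["Seconds_Behind_Master"] if row else None
    finally:
        conn.close()


def wait_until_caught_up(host: str, max_lag: int = 1, poll: int = 30) -> None:
    while True:
        lag = replication_lag(host)
        print(f"{host}: lag={lag}")
        if lag is not None and lag <= max_lag:
            break
        time.sleep(poll)


if __name__ == "__main__":
    wait_until_caught_up("db2219.codfw.wmnet")
    # Once caught up, repool via dbctl (run by an operator), e.g.
    # `dbctl instance db2219 pool` followed by `dbctl config commit`.
```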