[07:07:40] Going to switch s2 codfw master
[07:16:45] marostegui: can you let me know when you're done playing with it please
[07:16:49] ?
[07:16:53] will do!
[07:16:58] <3
[07:28:47] arnaudb: the switchover is done, I am running my schema change, will let you know when you can take the host
[07:28:54] great thanks!
[08:38:05] for some reason, yesterday's dump of s8 took 10 hours, so I am going to rerun it again (normally it takes 4)
[09:25:25] FIRING: SystemdUnitFailed: ifup@eno5.service on ms-be2053:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:35:25] RESOLVED: SystemdUnitFailed: ifup@eno5.service on ms-be2053:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:35:56] I am going to switch x2 master
[09:35:58] wish me luck
[09:39:58] 😱
[09:54:03] I abandoned lots of automatically generated gerrit patches for switchovers that were not merged (and probably were not needed anymore) if I abandoned anything by mistake, please restore
[09:55:08] Amir1: were tasks generated as well?
[09:55:31] yeah https://gerrit.wikimedia.org/r/q/topic:%22lsc%22
[09:55:49] e.g. https://gerrit.wikimedia.org/r/c/operations/puppet/+/1036601 which the task is now declined
[09:56:14] Amir1: Ah maybe, it got autothrottled by wikibugs? I didn't see them in our feed channel
[09:56:24] Ah but they are in my mail
[09:56:25] right
[10:01:28] arnaudb: the old s2 codfw master can be used. It is _pooled_
[10:02:52] I go to old codfw s1 master
[10:03:09] thanks marostegui, will go for it
[10:13:12] I have reimaged all the core bullseye hosts that were reported at https://phabricator.wikimedia.org/T366556
[10:13:22] I saw that, thanks!
[10:15:21] db2204 was the one requiring the change, I guess I've missed a previous switchover 🤔
[10:16:26] Probably
[10:16:46] well, I'll have to swap it again
[10:19:00] marostegui: I can't see T366684 I think the space should stay as S1
[10:19:13] I just changed it
[10:19:29] arnaudb: if you're not touching db2204, I have schema change there, can I? :D
[10:19:42] Thanks marostegui, do you want me to run my script?
[10:19:53] Amir1: which script?
[10:20:08] oh, it's only bullseye, nvm then
[10:20:17] Amir1: please go ahead, I did not wanted to perform a switchmaster right now as I'm gonna eat lunch :D
[10:23:44] started, it's going to take a couple of hours :D
[12:50:27] fyi jclarck-ctr will go for T363119 → host has been depooled and downtimed
[12:50:27] T363119: db1246 crashed - https://phabricator.wikimedia.org/T363119
[17:30:11] FYI, some time this or next week, I'll deploy a new conftool for T365123. flagging that here, since the main change in behavior is that dbctl will now validate changes to "external" sections just like it does for "regular" (i.e., each section has exactly one master and replica count satisfies min_replicas).
[17:30:11] I've verified that all external-flavored sections should validate.
[17:30:11] in any case, let me know if you have any questions / concerns :)
[17:30:11] T365123: Make dbctl check for depooled future masters - https://phabricator.wikimedia.org/T365123
[23:36:33] > s8 eqiad snapshot wrong_size 19 hours ago 1.4 TB -9.0 % The previous backup had a size of 1.6 TB, a change larger than 5.0%.
[23:36:51] pagelinks drop
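
Note on the 17:30 message: the validation rule described there (each section has exactly one master, and the replica count satisfies min_replicas) could be sketched roughly as below. This is only an illustration of the rule as stated in the chat, not conftool's or dbctl's actual code; the `Section` class, its fields other than `min_replicas`, the `validate_section` function, and the host and section names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Section:
    """Hypothetical stand-in for a database section's pooled state."""
    name: str
    min_replicas: int
    masters: list[str] = field(default_factory=list)
    replicas: list[str] = field(default_factory=list)


def validate_section(section: Section) -> list[str]:
    """Return a list of human-readable problems; empty means the section validates."""
    errors = []
    # Rule 1: exactly one master per section.
    if len(section.masters) != 1:
        errors.append(
            f"{section.name}: expected exactly one master, "
            f"found {len(section.masters)}"
        )
    # Rule 2: the number of replicas must be at least min_replicas.
    if len(section.replicas) < section.min_replicas:
        errors.append(
            f"{section.name}: {len(section.replicas)} replicas, "
            f"min_replicas is {section.min_replicas}"
        )
    return errors


if __name__ == "__main__":
    # Made-up example: no master and too few replicas, so both rules fail.
    broken = Section("ext-example", min_replicas=2, masters=[], replicas=["host-b"])
    for problem in validate_section(broken):
        print(problem)
```

Under this sketch, the same function would apply to "regular" and "external" sections alike, which is the behavior change the message flags.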