[04:51:24] Amir1: you finished with the old s8 master?
[07:12:26] Would anyone care to 👀 and maybe +1 https://gerrit.wikimedia.org/r/c/operations/puppet/+/1037558 please? I know PCC is unhappy, but j.hathaway (who has been very helpful!) is of the view that this is a bug in PCC not the CR and that it'd be more useful to merge this and look at fixing PCC in due course rather than blocking on it.
[07:13:14] The change is a starter-for-ten on RGW (i.e. S3) setup for apus
[07:13:47] thanks arnaud.b :)
[07:13:54] my pleasure!
[07:24:25] FIRING: SystemdUnitFailed: envoyproxy.service on moss-fe1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:27:39] let me downtime that host for a bit, it's in dev
[07:34:21] [yes, puppet is happy, but has left me with an empty /etc/envoy/envoy.yaml so more work needed here...]
[08:04:46] Hm, fixed by running sudo /usr/local/sbin/build-envoy-config -c /etc/envoy which puppet should have done for me.
[08:26:50] have you ever met: "resize2fs: Invalid argument While checking for on-line resizing support"
[08:30:40] sounds like the FS doesn't support it? [I'd lazily strace to see what actually got EINVAL]
[09:11:51] marostegui: sorry I just woke up. Yes. I'm done!
[09:13:51] Thanks!
[09:59:16] arnaudb: I will be done with s2 codfw in a day: https://orchestrator.wikimedia.org/web/cluster/alias/s2
[09:59:47] (the schema change goes alphabetically)
[10:01:26] I can pick the old s3 master in codfw? Are you done marostegui and arnaudb ?
[10:04:24] ok for me !
[10:51:12] Amir1: I'm not doing anything with it
[10:51:24] awesome
[13:31:12] Couple of tiny but useful apus hiera updates if anyone's feeling kind, please? https://gerrit.wikimedia.org/r/c/operations/puppet/+/1037792 and https://gerrit.wikimedia.org/r/c/operations/puppet/+/1037791
[13:33:04] [PCC still busted on moss-fe1002]
[13:42:26] arnaud.b: thanks :)
[13:43:14] anytime!
[13:49:25] FIRING: SystemdUnitFailed: logrotate.service on moss-be1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:30:19] arnaudb, Emporer: thanks for the input on the eqiad switch upgrade task and creating all the sub-tasks
[14:30:31] Emperor: even, damn I always spell that wrong :)
[14:30:47] :D
[14:31:03] I'll propose to do them at 15:00 UTC, which is 4pm for me in Ireland, and 11am Eastern in the US
[14:31:14] if that works for you guys?
[14:31:37] in terms of the tasks you created I started editing the description in them but I'm not sure that was right
[14:31:54] do you want to use those tasks just to track the actions you need to take for hosts you guys manage?
[14:32:15] oh no it was mostly to take a first inventory
[14:32:17] if so I will create new, per-rack tasks (assigned to me) to track the actual network switch upgrades, and make those children of them
[14:32:34] you can edit at will, don't worry :D
[14:32:40] ok, I've tried to do the inventory as best I can on the google sheet
[14:33:06] cool no probs, basically I need a "master" task for each rack which should be assigned to me to do the actual upgrade if that makes sense
[14:33:45] so I don't know whether to add to your tasks to turn them into that, or create my own set and make your ones children of that
[14:33:47] either works for me
[14:42:55] topranks: I don't mind - if you roll them in together we don't have to remember to separately close ours
[14:43:04] exactly!
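The 08:30 suggestion for the resize2fs "Invalid argument" error was to strace it and see which call actually returned EINVAL. Below is a minimal Python sketch of that filtering step. It assumes the trace was captured to a file with something like `strace -f -o resize.trace resize2fs <device>` (device left unspecified). The helper name `einval_calls` is hypothetical, and the sample trace lines in the test are invented for illustration, including the ioctl name.

```python
import re
from typing import Iterable, List, Tuple

# Matches strace lines for a syscall that failed with EINVAL, e.g.
#   ioctl(3, SOME_IOCTL, 0x7ffd1234) = -1 EINVAL (Invalid argument)
# A leading "[pid N] " prefix (as printed with strace -f) is tolerated
# because the pattern is searched for rather than anchored at the start.
EINVAL_RE = re.compile(r"(\w+)\((.*)\)\s+=\s+-1\s+EINVAL\b")


def einval_calls(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (syscall name, raw argument string) for each call that got EINVAL."""
    hits = []
    for line in lines:
        m = EINVAL_RE.search(line)
        if m:
            hits.append((m.group(1), m.group(2)))
    return hits


if __name__ == "__main__":
    # Hypothetical usage: read a trace file written by strace -o.
    with open("resize.trace") as fh:
        for name, args in einval_calls(fh):
            print(f"{name}({args}) -> EINVAL")
```

Lines that fail with other errors (ENOENT, EPERM, ...) or succeed are ignored, so the output narrows the trace down to the call(s) behind the "Invalid argument" message.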
[14:43:27] ok guys thanks, yeah that sounds good to me, don't want to overdo it with a million tasks
[14:43:30] cheers :)
[15:00:09] topranks: time-wise, that's fine with me except for when it clashes with the staff meeting, which I'd rather not miss if poss (but obviously if that's the only good time I can live with it!)
[15:04:02] They're all planned for Tuesdays and Thursdays so with any luck that won't happen, but I'll double-check
[15:04:29] sorry staff meeting rather than SRE meeting - good call
[15:04:56] Maybe 14:00 UTC is better in that case to avoid it
[15:08:41] that's good for me
[15:11:07] cool thanks I'll do that
[15:13:55] ta
[15:14:25] RESOLVED: SystemdUnitFailed: logrotate.service on moss-be1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
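The switch-upgrade slot discussed above (15:00 UTC as 4pm in Ireland and 11am US Eastern, later moved to 14:00 UTC) only lines up that way while both regions are on summer time. Below is a minimal Python sketch of that arithmetic. It assumes fixed summer offsets (IST = UTC+1, EDT = UTC-4) rather than reading a tz database. The helper name `local_times` and the date inside it are hypothetical; the date is arbitrary because the offsets are fixed.

```python
from datetime import datetime, timedelta, timezone

# Assumed summer-time offsets, hard-coded rather than looked up:
IST = timezone(timedelta(hours=1), "IST")   # Irish Standard Time, UTC+1
EDT = timezone(timedelta(hours=-4), "EDT")  # US Eastern Daylight Time, UTC-4


def local_times(utc_hour: int) -> dict:
    """Map a whole UTC hour to HH:MM wall-clock strings in UTC, Ireland and US Eastern."""
    # Hypothetical date; with fixed offsets only the hour affects the result.
    t = datetime(2024, 6, 4, utc_hour, 0, tzinfo=timezone.utc)
    return {
        "UTC": t.strftime("%H:%M"),
        "Ireland": t.astimezone(IST).strftime("%H:%M"),
        "US Eastern": t.astimezone(EDT).strftime("%H:%M"),
    }


if __name__ == "__main__":
    for hour in (15, 14):  # original proposal, then the slot chosen to avoid the staff meeting
        print(local_times(hour))
```

Under these assumptions 15:00 UTC gives 16:00 Ireland and 11:00 Eastern, and 14:00 UTC gives 15:00 and 10:00. Outside summer time both offsets shift by an hour, so a tz-aware lookup (for example `zoneinfo`) would be the safer choice for real scheduling.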