[08:20:57] jayme, hnowlan, claime: could you have a look at https://phabricator.wikimedia.org/T372728 ? I think there is a typo causing some `wikikube-worker` hosts to not have any role set in Puppet
[08:21:08] (and Alex is out)
[08:24:22] sorry yeah there's a missing 0
[08:24:40] patch incoming shortly
[08:24:52] cool, thx!
[10:36:43] godog: fyi, I'm pushing updated ACLs that include the new alert hosts and new prometheus hosts to the mgmt and core routers
[10:38:20] XioNoX: sweet, thank you
[13:03:27] I seem to remember something needing to be done after a host stayed in netbox as failed for quite a while, but I'm failing at finding the doc for it
[13:04:27] I put it back to active but there is no change from sre.dns.netbox
[13:08:50] ah, found it. I need to run sre.puppet.sync-netbox-hiera
[14:43:47] wow, I am deploying https://gerrit.wikimedia.org/r/c/operations/puppet/+/1063761 and it turns out systemd-sysusers oauth2-proxy.conf on bullseye segfaults because of long lines in gshadow (adm in our case) https://github.com/systemd/systemd/issues/6512
[14:45:37] heh godog we were just talking about that https://phabricator.wikimedia.org/T372472#10064694
[14:45:52] cdanis: lolsob
[14:46:10] reverting
[14:48:08] tbh it is kind of depressing to search for gshadow in phab and see T256098 pop up
[14:48:09] T256098: Segfault for systemd-sysusers.service on stat1007 - https://phabricator.wikimedia.org/T256098
[14:51:22] hah, indeed
[15:04:21] godog: I set the priority on the task, since this is having a more immediate effect than I realized. Still not sure of the best path forward.
[15:04:43] apologies for making you re-triage the issue
[15:06:25] jhathaway: sure, no worries, but yeah I'm not sure either what the best path forward is
[15:07:25] I do know patching libc is very tricky, at least in my experience
[15:09:42] easy to believe, though I haven't done it myself
[15:10:27] I'll run the issue by the o11y team on Wed, we'll see
[15:10:56] it is possible that for our use case specifically we might as well bite the bullet and upgrade in place to bookworm, we have to do it anyway
[15:12:05] okay
[19:11:48] !incidents
[19:11:49] 5081 (ACKED) Primary outbound port utilisation over 80% (paged) global noc (cloudsw1-c8-eqiad.mgmt.eqiad.wmnet)
[19:11:49] 5082 (ACKED) Primary inbound port utilisation over 80% (paged) global noc (cloudsw1-f4-eqiad.mgmt.eqiad.wmnet)
[19:11:49] 5074 (RESOLVED) VarnishUnavailable global sre (varnish-text thanos-rule)
[19:11:49] 5067 (RESOLVED) ProbeDown sre (10.2.2.76 ip4 mw-api-ext:4447 probes/service http_mw-api-ext_ip4 eqiad)
[19:11:50] 5068 (RESOLVED) HaproxyUnavailable cache_text global sre (thanos-rule)
[19:11:50] 5076 (RESOLVED) db1249 (paged)/MariaDB Replica Lag: s4 (paged)
[19:11:50] 5078 (RESOLVED) db1248 (paged)/MariaDB Replica Lag: s4 (paged)
[19:11:50] 5075 (RESOLVED) db1238 (paged)/MariaDB Replica Lag: s4 (paged)
[19:11:51] 5077 (RESOLVED) db1242 (paged)/MariaDB Replica Lag: s4 (paged)
[19:11:51] 5073 (RESOLVED) db1241 (paged)/MariaDB Replica Lag: s4 (paged)
[19:11:52] 5072 (RESOLVED) db1243 (paged)/MariaDB Replica Lag: s4 (paged)
[19:11:52] 5071 (RESOLVED) db1221 (paged)/MariaDB Replica Lag: s4 (paged)
[19:11:54] lol