[01:19:55] FIRING: SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:19:55] FIRING: SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:14:32] slyngs: ^ any luck with that one ?
[07:14:42] Also is there anyone who can review https://gerrit.wikimedia.org/r/c/operations/puppet/+/1259748 ?
[07:35:08] XioNoX: Yes, I have a patch, let me just see if it got reviewed
[07:35:34] Apparently not: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1255580
[07:38:44] slyngs: shouldn't ExternalCloudVendorRIPE log and skip the one problematic network instead of failing the run?
[07:39:00] +1 to remove it, but the issue might happen again in the future, no?
[07:39:57] Agreed, we kinda need to handle this a little differently. The idea is well intended, we get zero networks, something has gone wrong, but WHAT?
[07:40:36] slyngs: sounds good
[07:40:37] +1
[07:41:20] I'll ask what the intention was and come up with a solution. There's information enough in the ripe API that we should be able to handle it more intelligently
[07:42:14] slyngs: should we also check requestctl to make sure there are no mentions of that provider in the rules?
[07:43:42] Probably, I'll do that in a bit
[08:49:28] XioNoX: We had the IP block in one place in requestctl. I've removed it.
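For reference, the "log and skip the one problematic network" idea raised at 07:38:44 could be sketched roughly as below. This is only an illustration under assumptions: `collect_networks`, the `fetch` callable, and the logger name are hypothetical and are not taken from the actual ExternalCloudVendorRIPE code or the RIPE API. The point is that a lookup that fails or returns zero prefixes gets recorded with a reason and skipped, so the rest of the run still completes.

```python
import logging
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("cloud_ip_ranges_sketch")


def collect_networks(
    names: Iterable[str],
    fetch: Callable[[str], Iterable[str]],
) -> Tuple[Dict[str, List[str]], List[Tuple[str, str]]]:
    """Fetch prefixes for each network name, skipping the ones that fail.

    `fetch` is a stand-in for whatever performs the real lookup.
    Returns (collected, skipped), where skipped holds (name, reason) pairs.
    """
    collected: Dict[str, List[str]] = {}
    skipped: List[Tuple[str, str]] = []
    for name in names:
        try:
            prefixes = list(fetch(name))
        except Exception as exc:  # broad on purpose: one bad lookup must not abort the run
            reason = f"lookup failed: {exc}"
            logger.warning("skipping %s: %s", name, reason)
            skipped.append((name, reason))
            continue
        if not prefixes:
            # Zero networks means something went wrong; record which one and why.
            reason = "lookup returned zero prefixes"
            logger.warning("skipping %s: %s", name, reason)
            skipped.append((name, reason))
            continue
        collected[name] = prefixes
    return collected, skipped
```

A caller could still exit non-zero when `skipped` is non-empty, so the systemd unit keeps alerting, while the data for the healthy networks is written out anyway.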
[08:49:38] cool
[08:59:19] FIRING: [2x] SystemdUnitFailed: netbox_ganeti_ulsfo_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:04:19] FIRING: [2x] SystemdUnitFailed: netbox_ganeti_ulsfo_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:08:26] RESOLVED: SystemdUnitFailed: netbox_ganeti_ulsfo_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:10:32] FIRING: SystemdUnitFailed: netbox_ganeti_ulsfo_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:13:52] SRE-tools, ServiceOps new: Add a --rack flag to sre.k8s.pool-depool-node - https://phabricator.wikimedia.org/T410537#11742517 (MLechvien-WMF) Open→Resolved Tentatively resolving as this was tested and merged, please reopen if any concerns
[10:38:26] RESOLVED: SystemdUnitFailed: netbox_ganeti_ulsfo_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:40:31] FIRING: SystemdUnitFailed: netbox_ganeti_ulsfo_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:52:25] jhathaway: thanks for the review on https://gerrit.wikimedia.org/r/c/operations/puppet/+/1212097, I fixed the one issue I found and added tests to cover that, PCC is still running but so far looks correct so I think that's ready for a final look from you and then we can merge it
[10:53:42] fwiw it seems like that notrack issue affected every rule except the first from each definition, not just IPv6
[11:38:26] RESOLVED: SystemdUnitFailed: netbox_ganeti_ulsfo_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:40:32] FIRING: SystemdUnitFailed: netbox_ganeti_ulsfo_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:48:26] RESOLVED: SystemdUnitFailed: netbox_ganeti_ulsfo_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:57:25] FIRING: SystemdUnitFailed: netbox_ganeti_ulsfo_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:43:00] netops, Infrastructure-Foundations, SRE: Atlas no longer reachable from monitoring on routed ganeti - https://phabricator.wikimedia.org/T420975#11743113 (cmooney) Open→Resolved a:cmooney This should now be working again. Big thanks to @ayounsi for the heavy-lifting with all the puppe...
[13:36:40] Hey IF! small patch to add a repo component if anyone has time to look: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1259232
[13:39:06] many thanks c-danis ;)
[15:23:44] netops, Infrastructure-Foundations, ops-magru, SRE: cr2-magru <-> asw1-b3-magru link down March 2026 - https://phabricator.wikimedia.org/T418978#11744633 (RobH) Open→Resolved
[15:57:40] FIRING: SystemdUnitFailed: netbox_ganeti_ulsfo_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:21:30] https://www.irccloud.com/pastebin/G3EG11XL/
[16:21:52] uff it got all in
[16:21:58] topranks, XioNoX --^ :D
[16:23:05] elukey: you can push that feel free
[16:23:42] thank youu
[16:32:11] topranks: another thing if you have a min - aux-k8s-worker1006 is connected to lsw1-d4-eqiad that is a Nokia, do I need to do some special magic to allow BGP in there? Other than the usual netbox BGP flag + homer dance
[16:35:24] elukey: probably not, let me check
[16:37:50] elukey: so the good news is that host is on the new per-rack vlan
[16:38:13] the bad news is that means it has to peer with the switch, and the bgp policy might not yet be defined
[16:38:15] let me check
[16:44:27] I checked in deployment-charts and I see it there, plus the aux's admin_ng diff comes up empty (so in theory it should know how to talk to it)
[16:44:39] the calico's BGPPeers config I mean
[16:49:05] ok I had to make a patch
[16:54:01] ah snap there is some extra static config?
[16:54:45] yeah
[16:54:50] https://gerrit.wikimedia.org/r/c/operations/homer/public/+/1260033
[16:54:57] arzhel is just reviewing
[17:01:52] elukey: ok the policy is updated on lsw1-d4 now
[17:03:56] elukey: switch looks configured correctly for aux-k8s-worker1006, but bgp hasn't established for some reason
[17:04:15] yeah tried to restart calico as well there, mmm
[17:04:29] ah wait there is something else missing
[17:06:32] topranks: I also have a worker on lsw1-d7-eqiad btw
[17:06:47] I am not sure if you are going to kill me now or not, because I might have told you sooner
[17:07:31] ah ok no it is a per cluster thing, oook
[17:09:25] so aux-k8s-worker1006's calico works
[17:12:21] 1007 still not working, weird
[17:12:34] I guess I'll need to run homer on d7?
[17:12:45] should be ok in d4 now
[17:12:48] yep
[17:13:05] super doing it! Thanks! <3
[17:13:18] actually I need to merge the last patch so homer works from cumin host
[17:22:11] elukey: ok should be good now in d7 too
[17:22:20] https://www.irccloud.com/pastebin/FgC6nv4V/
[17:32:40] topranks: confirmed, thanks a lot!
[18:23:44] hi folks, I'm following [0] to convert host over to UEFI - anyone happen to have a minute to look over my cookbook flags?
[18:23:44] [0] - https://wikitech.wikimedia.org/wiki/UEFI_Boot
[18:37:18] jasmine_: sure
[18:37:25] ah, just seeing now that UEFI is default (https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1214055)
[18:38:59] jhathaway: thanks! in that case, just provisioning with `--no-switch --no-users --no-dhcp` should be okay if converting?
[18:39:52] yes
[18:44:08] sweet, ty!
[19:57:40] FIRING: SystemdUnitFailed: netbox_ganeti_ulsfo_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:57:40] FIRING: SystemdUnitFailed: netbox_ganeti_ulsfo_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed