[00:21:42] RESOLVED: SystemdUnitCrashLoop: node-bgpalerter.service crashloop on rpki2003:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[00:25:27] FIRING: SystemdUnitCrashLoop: node-bgpalerter.service crashloop on rpki2003:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[00:55:27] RESOLVED: SystemdUnitCrashLoop: node-bgpalerter.service crashloop on rpki2003:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[00:56:27] FIRING: SystemdUnitCrashLoop: node-bgpalerter.service crashloop on rpki2003:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[01:42:25] RESOLVED: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:31:27] RESOLVED: SystemdUnitCrashLoop: node-bgpalerter.service crashloop on rpki2003:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[02:32:26] FIRING: SystemdUnitCrashLoop: node-bgpalerter.service crashloop on rpki2003:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[02:52:26] RESOLVED: SystemdUnitCrashLoop: node-bgpalerter.service crashloop on rpki2003:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[02:53:57] FIRING: SystemdUnitCrashLoop: node-bgpalerter.service crashloop on rpki2003:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[03:33:57] RESOLVED: SystemdUnitCrashLoop: node-bgpalerter.service crashloop on rpki2003:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[03:36:42] FIRING: SystemdUnitCrashLoop: node-bgpalerter.service crashloop on rpki2003:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[03:56:42] RESOLVED: SystemdUnitCrashLoop: node-bgpalerter.service crashloop on rpki2003:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[03:59:42] FIRING: SystemdUnitCrashLoop: node-bgpalerter.service crashloop on rpki2003:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[04:53:57] RESOLVED: SystemdUnitCrashLoop: node-bgpalerter.service crashloop on rpki2003:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[04:54:42] FIRING: SystemdUnitCrashLoop: node-bgpalerter.service crashloop on rpki2003:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[05:14:42] RESOLVED: SystemdUnitCrashLoop: node-bgpalerter.service crashloop on rpki2003:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[05:16:12] FIRING: SystemdUnitCrashLoop: node-bgpalerter.service crashloop on rpki2003:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[05:26:12] RESOLVED: SystemdUnitCrashLoop: node-bgpalerter.service crashloop on rpki2003:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[05:27:11] FIRING: SystemdUnitCrashLoop: node-bgpalerter.service crashloop on rpki2003:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[05:47:12] RESOLVED: SystemdUnitCrashLoop: node-bgpalerter.service crashloop on rpki2003:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[05:49:42] FIRING: SystemdUnitCrashLoop: node-bgpalerter.service crashloop on rpki2003:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[06:53:57] RESOLVED: SystemdUnitCrashLoop: node-bgpalerter.service crashloop on rpki2003:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[06:54:42] FIRING: SystemdUnitCrashLoop: node-bgpalerter.service crashloop on rpki2003:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[07:34:59] I'll have a look at ^
[08:08:57] RESOLVED: SystemdUnitCrashLoop: node-bgpalerter.service crashloop on rpki2003:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[08:09:56] FIRING: SystemdUnitCrashLoop: node-bgpalerter.service crashloop on rpki2003:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[08:19:42] RESOLVED: SystemdUnitCrashLoop: node-bgpalerter.service crashloop on rpki2003:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[08:20:56] FIRING: SystemdUnitCrashLoop: node-bgpalerter.service crashloop on rpki2003:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[08:40:56] RESOLVED: SystemdUnitCrashLoop: node-bgpalerter.service crashloop on rpki2003:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[08:43:27] FIRING: SystemdUnitCrashLoop: node-bgpalerter.service crashloop on rpki2003:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[09:23:27] RESOLVED: SystemdUnitCrashLoop: node-bgpalerter.service crashloop on rpki2003:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[10:15:22] hey folks, we have some newly renamed and reimaged hosts on T353788, we would like to change the current vlan/add them to the analytics vlan. Any pointers on how to go about this?
[10:15:22] T353788: Add kafka-stretch100[1-2] to the hadoop cluster - https://phabricator.wikimedia.org/T353788
[11:29:19] netops, cloud-services-team, Cloud-VPS, Infrastructure-Foundations: keepalived: it doesn't support mixing IPv4 and IPv6 VIPs on the same VRRP instance - https://phabricator.wikimedia.org/T376879#10221052 (aborrero) In progress→Resolved >>! In T376879#10219564, @Multichill wrote: > Ipv...
[12:12:22] netops, Infrastructure-Foundations, SRE: Upgrade core routers to Junos 23.4R2 - https://phabricator.wikimedia.org/T364092#10221196 (ayounsi) A few more reasons to upgrade in {T376986}.
[13:33:35] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 3 others: codfw:frack:rack/install/configuration new firewalls - https://phabricator.wikimedia.org/T374176#10221410 (Papaul)
[13:35:15] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 3 others: codfw:frack:rack/install/configuration new firewalls - https://phabricator.wikimedia.org/T374176#10221411 (Papaul) Open→Resolved This is now complete the new firewall is in place and in production
[13:38:59] netbox, Infrastructure-Foundations: Evaluate usage of Kubernetes/Wikikube Tags in netbox and replace them with something if possible - https://phabricator.wikimedia.org/T354169#10221417 (ayounsi) FYI, I finally cleaned up the description field and removed the `WikiKube` tag in Netbox. > A tag or custom...
[14:14:23] reading through email, are we good to ignore these alerts for RPKI invalids for those new(?) WMCS codfw prefixes? (e.g. 2a02:ec80:a100:fe04::/64)
[14:16:11] ah, lol - just saw https://gerrit.wikimedia.org/r/c/operations/homer/public/+/1079487 for the export policy ... never mind
[14:47:47] cdanis: o/
[14:47:56] elukey: hi!
[14:48:05] if you are around, we could break ehm expand aux-etcd with https://wikitech.wikimedia.org/wiki/Etcd#Adding_a_new_member_to_the_cluster
[14:48:17] basically going from 3->5
[14:48:26] https://phabricator.wikimedia.org/T344230
[14:48:39] if it is too dangerous for a friday we can do it next week :D
[14:48:39] I am about to have to drive the family to the airport but I will be around in about 2hrs, I'm happy to pick up the work wherever you leave off if you want to proceed
[14:48:59] ah nono let's do it next week then, I'll prep the changes in the meantime
[14:49:01] breaking the aux cluster for a bit would be fine
[14:49:03] ok!
[14:57:16] swfrench-wmf: em yeah that was my bad. we only announced to RIPE (not the internet), so the alerts are an anomaly (the routes we are announcing to peers and transit do have RPKI ROAs)
[14:57:25] I'll go ahead and merge arzhel's patch to clear the issue
[15:07:47] topranks: ah, thanks for clarifying about where these were actually getting announced to (didn't realize it was _just_ RIS). sounds good and thanks :)
[15:30:27] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 3 others: codfw:frack:rack/install/configuration new switches - https://phabricator.wikimedia.org/T374587#10221746 (Papaul)
[15:35:19] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: codfw:frack:rack/install/configuration new switches - https://phabricator.wikimedia.org/T374587#10221749 (Papaul) Open→Resolved This is complete switches are in production
[16:15:10] CAS-SSO, Data-Engineering, Data-Engineering-Jupyter, Infrastructure-Foundations, Data-Platform-SRE (2024.09.28 - 2024.10.18): Allow login to JupyterHub via CAS - https://phabricator.wikimedia.org/T260386#10222001 (BTullis) I have written a proposal document (restricted to WMF) about how to im...
[16:21:56] CAS-SSO, Data-Engineering, Data-Engineering-Jupyter, Infrastructure-Foundations, Data-Platform-SRE (2024.09.28 - 2024.10.18): Improve the JupyterHub services and use CAS/SSO - https://phabricator.wikimedia.org/T260386#10222015 (BTullis)
[18:29:02] Puppet, Infrastructure-Foundations, SRE: Facter is slow on a few hosts - https://phabricator.wikimedia.org/T251293#10222459 (colewhite) Open→Resolved Haven't seen widespread problems with export_smart_data_dump.service [[ https://grafana-rw.wikimedia.org/explore?orgId=1&left=%7B%22datasou...
[20:13:24] FIRING: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed