[08:58:55] netops, Infrastructure-Foundations, SRE: cloudsw1-d5-eqiad instability Aug 6 2024 - https://phabricator.wikimedia.org/T371879#10227934 (cmooney) Open→Resolved Closing this one, things have been ok since upgrade/reset.
[09:13:07] netops, Ceph, Infrastructure-Foundations, Data-Platform-SRE (2024.09.28 - 2024.10.18): cephosd advertised v6 prefix flapping - https://phabricator.wikimedia.org/T376697#10227973 (BTullis) a:BTullis
[09:28:11] netops, Ceph, Infrastructure-Foundations, Data-Platform-SRE (2024.09.28 - 2024.10.18): cephosd advertised v6 prefix flapping - https://phabricator.wikimedia.org/T376697#10228035 (cmooney) p:Medium→Low
[09:54:59] hey folks, I noticed https://alerts.wikimedia.org/?q=%40state%3Dactive&q=%40cluster%3Dwikimedia.org&q=debmonitor
[09:55:25] we use cfssl for debmonitor, so I'd say it is just a matter of destroying the old puppet cert
[09:55:39] can somebody sanity check before I proceed?
[09:59:50] netops, Ceph, Infrastructure-Foundations, Data-Platform-SRE (2024.09.28 - 2024.10.18): cephosd advertised v6 prefix flapping - https://phabricator.wikimedia.org/T376697#10228256 (BTullis) >>! In T376697#10225721, @cmooney wrote: > The envoy service it connects to seems stable for longer though....
[10:15:17] netops, Ceph, Infrastructure-Foundations, Data-Platform-SRE (2024.09.28 - 2024.10.18): cephosd advertised v6 prefix flapping - https://phabricator.wikimedia.org/T376697#10228372 (BTullis) Ah, it looks like `check_rise` and `check_interval` are currently hardcoded in the template: https://github.c...
[10:35:36] netops, Ceph, Infrastructure-Foundations, Data-Platform-SRE (2024.09.28 - 2024.10.18), Patch-For-Review: cephosd advertised v6 prefix flapping - https://phabricator.wikimedia.org/T376697#10228450 (cmooney) >>! In T376697#10228372, @BTullis wrote: > Ah, it looks like `check_rise` and `check_in...
[11:22:06] netops, cloud-services-team, Cloud-VPS, Infrastructure-Foundations, SRE: openstack: initial IPv6 support in neutron - https://phabricator.wikimedia.org/T375847#10228569 (aborrero) In progress→Resolved I think we can consider this to be completed. We may reopen if required.
[11:52:04] netops, Ceph, Infrastructure-Foundations, Data-Platform-SRE (2024.09.28 - 2024.10.18): cephosd advertised v6 prefix flapping - https://phabricator.wikimedia.org/T376697#10228652 (BTullis) An interesting data point is that in our radosgw logs we have only 200 responses recorded from the `check_htt...
[12:50:27] removed the debmonitor.discovery.wment cert from puppetmaster1001
[13:52:37] netops, Ceph, Infrastructure-Foundations, Data-Platform-SRE (2024.09.28 - 2024.10.18): cephosd advertised v6 prefix flapping - https://phabricator.wikimedia.org/T376697#10229187 (BTullis) Open→Resolved I'll mark this as done, for now. Haven't had any more occurrences since this mornin...
[14:45:11] elukey: thx
[14:45:18] netops, fundraising-tech-ops, Infrastructure-Foundations: Test prototype fundraising pybal replacement based on haproxy + anycast-healthchecker. - https://phabricator.wikimedia.org/T373942#10229563 (Jgreen) Open→In progress p:Triage→Medium
[15:11:24] FIRING: SystemdUnitFailed: netbox_ganeti_eqiad_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:41:24] RESOLVED: SystemdUnitFailed: netbox_ganeti_eqiad_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:07:50] Mail, Infrastructure-Foundations, Sustainability (Incident Followup): Upgrade Exim to 4.96 - https://phabricator.wikimedia.org/T310836#10230035 (jhathaway) Open→Invalid We have migrated postfix
[16:52:46] Mail, Infrastructure-Foundations: Remove wikivoyage-ev.org mail aliases from wikivoyage.org & wikivoyage.de - https://phabricator.wikimedia.org/T319041#10230312 (jhathaway) Open→Resolved
[17:08:07] Mail, Infrastructure-Foundations, SRE: Lisa@wikipedia.org is receiving a large number of donor responses - https://phabricator.wikimedia.org/T375643#10230435 (nisrael) Apologies, I was out of office last week. @jhathaway and @Dzahn here is the .eml file! {F57617255}
[17:18:31] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: codfw:frack:servers migration task - https://phabricator.wikimedia.org/T375151#10230470 (Papaul)
[17:37:27] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: codfw:frack:servers migration task - https://phabricator.wikimedia.org/T375151#10230551 (Papaul)
[18:26:23] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: codfw:frack pfw3 and old fasw decommission - https://phabricator.wikimedia.org/T377254 (Papaul) NEW
[18:52:29] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: codfw:frack pfw3 and old fasw decommission - https://phabricator.wikimedia.org/T377254#10230870 (Papaul)
[22:59:11] netops, Infrastructure-Foundations: Arelion IPv6 transit renumbering - https://phabricator.wikimedia.org/T365697#10231895 (Papaul) @ayounsi I double check again that all the old IPV6 were removed. But we have the old IPV6 that were still in cache so here are the 2 commands I used to fix the issue. I...