[01:06:53] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: codfw:frack:rack/install/configuration new firewalls - https://phabricator.wikimedia.org/T374176#10132300 (Papaul)
[01:08:25] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: codfw:frack:rack/install/configuration new firewalls - https://phabricator.wikimedia.org/T374176#10132304 (Papaul) Cluster creation complete ` root@pfw1-codfw# run show chassis cluster status Cluster ID: 1 Node Pr...
[01:11:21] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: codfw:frack:rack/install/configuration new firewalls - https://phabricator.wikimedia.org/T374176#10132313 (Papaul)
[03:45:25] FIRING: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:45:25] FIRING: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:04:26] SRE-tools, Infrastructure-Foundations, serviceops-radar, Patch-For-Review: Race condition on puppetdb in sre.hosts.rename cookbook - https://phabricator.wikimedia.org/T374351#10132710 (Volans) While the above is totally true the probability that a rename+reimage happens exactly at the time of the...
[08:13:17] SRE-tools, Infrastructure-Foundations, serviceops-radar, Patch-For-Review: Race condition on puppetdb in sre.hosts.rename cookbook - https://phabricator.wikimedia.org/T374351#10132717 (MoritzMuehlenhoff) >>! In T374351#10132710, @Volans wrote: > Your problem is not a Puppet run and disabling pupp...
[08:15:37] netops, cloud-services-team, Cloud-VPS, Infrastructure-Foundations, and 2 others: Upgrade cloudsw1-c8-eqiad and cloudsw1-d5-eqiad to Junos 20+ - https://phabricator.wikimedia.org/T316544#10132731 (dcaro)
[08:21:38] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: asw2-d2-eqid <-> asw2-d4-eqiad vcp link flapping - https://phabricator.wikimedia.org/T374272#10132749 (cmooney) Open→Resolved a:cmooney Still all looking good, there have been no logs or cases the interface reported d...
[08:24:39] netops, cloud-services-team, Cloud-VPS, Infrastructure-Foundations, and 2 others: Upgrade cloudsw1-c8-eqiad and cloudsw1-d5-eqiad to Junos 20+ - https://phabricator.wikimedia.org/T316544#10132754 (dcaro) @cmooney @VRiley-WMF Hi! I'm almost done draining the rack, we can try to find a slot startin...
[08:27:53] netops, cloud-services-team, Cloud-VPS, Infrastructure-Foundations, and 2 others: Upgrade cloudsw1-c8-eqiad and cloudsw1-d5-eqiad to Junos 20+ - https://phabricator.wikimedia.org/T316544#10132768 (dcaro)
[08:39:40] netops, Infrastructure-Foundations, SRE, Patch-For-Review: BFD won't esablish between QFX in VRF and host from IPv6 link-local - https://phabricator.wikimedia.org/T374379#10132799 (cmooney) Ok patch has been merged and things are ok for now. Hosts are configured to peer with the switch unicast I...
[08:51:37] netops, Infrastructure-Foundations, SRE, Patch-For-Review: BFD won't esablish between QFX in VRF and host from IPv6 link-local - https://phabricator.wikimedia.org/T374379#10132830 (cmooney) I'll leave this open for now, we will need to make a call on how to proceed here in general, there are two...
[09:16:11] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Routed Ganeti: Add support for VM QoS marking - https://phabricator.wikimedia.org/T374392#10132903 (cmooney) It seems the routed ganeti hosts actually use nftables instead. This is nice as it does allow us to match on the incoming interf...
[09:19:55] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Routed Ganeti: Add support for VM QoS marking - https://phabricator.wikimedia.org/T374392#10132909 (MoritzMuehlenhoff) >>! In T374392#10132903, @cmooney wrote: > It seems the routed ganeti hosts actually use nftables instead. This is nic...
[09:45:25] RESOLVED: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:46:45] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks C4 & C5 from asw to lsw - https://phabricator.wikimedia.org/T373097#10132983 (ABran-WMF) db2114 is decommed (see T362948)
[10:24:48] SRE-tools, Infrastructure-Foundations, serviceops-radar, Patch-For-Review: Race condition on puppetdb in sre.hosts.rename cookbook - https://phabricator.wikimedia.org/T374351#10133176 (Volans) I don't think it does anymore unfortunately... In https://gerrit.wikimedia.org/r/plugins/gitiles/operat...
[10:35:03] SRE-tools, Infrastructure-Foundations, serviceops-radar: Race condition on puppetdb in sre.hosts.rename cookbook - https://phabricator.wikimedia.org/T374351#10133213 (MoritzMuehlenhoff) >>! In T374351#10133176, @Volans wrote: > I don't think it does anymore unfortunately... > > In https://gerrit.wik...
[12:26:03] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, and 2 others: (2) new singlemode fiber patches from dmarc to routers for IX ports - https://phabricator.wikimedia.org/T373376#10133598 (cmooney) To confirm the links look good, interfaces come up when enable and rx light is good: ` cmooney@re...
[12:46:22] hey folks, I created https://phabricator.wikimedia.org/T374443 to collect all the info about puppet-merge
[12:46:30] lemme know if anything is missing
[13:28:48] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks C4 & C5 from asw to lsw - https://phabricator.wikimedia.org/T373097#10133916 (cmooney) >>! In T373097#10129063, @Jelto wrote: > I depooled `gitlab-runner2003` for tomorrows maintenance...
[13:47:25] FIRING: SystemdUnitFailed: envoyproxy.service on puppetserver1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:57:25] RESOLVED: SystemdUnitFailed: envoyproxy.service on puppetserver1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:59:00] netops, Infrastructure-Foundations, SRE: Enable BFD on 'core' EBGP peerings from L3 switches to CRs - https://phabricator.wikimedia.org/T374452 (cmooney) NEW p:Triage→Low
[14:09:55] FIRING: [2x] SystemdUnitFailed: envoyproxy.service on puppetserver1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:29:55] RESOLVED: SystemdUnitFailed: envoyproxy.service on puppetserver1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:30:55] FIRING: SystemdUnitFailed: envoyproxy.service on puppetserver1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:35:55] RESOLVED: SystemdUnitFailed: envoyproxy.service on puppetserver1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:40:06] netops, fundraising-tech-ops, Infrastructure-Foundations: Test prototype fundraising pybal replacement based on haproxy + anycast-healthchecker. - https://phabricator.wikimedia.org/T373942#10134339 (cmooney) >>! In T373942#10117963, @Jgreen wrote: > The haproxy configuration part seems to work, I'm a...
[14:50:37] netops, fundraising-tech-ops, Infrastructure-Foundations: Test prototype fundraising pybal replacement based on haproxy + anycast-healthchecker. - https://phabricator.wikimedia.org/T373942#10134407 (Jgreen) >>! In T373942#10134339, @cmooney wrote: >>>! In T373942#10117963, @Jgreen wrote: >> The hapro...
[14:56:10] SRE-tools, Infrastructure-Foundations, serviceops-radar, Patch-For-Review: Race condition on puppetdb in sre.hosts.rename cookbook - https://phabricator.wikimedia.org/T374351#10134469 (Clement_Goubert) Sorry I didn't see the updates to the discussion before merging the previous iteration. Patch u...
[15:47:38] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks C4 & C5 from asw to lsw - https://phabricator.wikimedia.org/T373097#10134741 (ABran-WMF) db/es hosts have been depooled
[15:50:57] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks C4 & C5 from asw to lsw - https://phabricator.wikimedia.org/T373097#10134763 (cmooney) >>! In T373097#10134741, @ABran-WMF wrote: > db/es hosts have been depooled thanks for confirming!
[16:01:07] hello I/F friends - when allocating a VIP for an LVS service [0], is it alright if there's a delay between running `sre.dns.netbox` and the subsequent operations/dns patch that's still needed for the svc zones.
[16:01:07] [0] https://wikitech.wikimedia.org/wiki/DNS/Netbox#How_to_manually_allocate_a_special_purpose_IP_address_in_Netbox
[16:02:00] basically, confirming whether it's alright to have the (unused) netbox-generated records lingering for a bit, before I add the "real" ones
[16:03:08] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks C4 & C5 from asw to lsw - https://phabricator.wikimedia.org/T373097#10134808 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=c5ef5c49-317c-49af-b11b-61e58fe45620) set...
[16:07:59] swfrench-wmf: not I/F but it's fine. the thing that should not linger IMO is the Netbox changes themselves since they block other changes
[16:09:28] sukhe: great, thank you! yeah, once I save those changes I'll plan to follow up with the cookbook run soon after.
[16:09:34] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks C4 & C5 from asw to lsw - https://phabricator.wikimedia.org/T373097#10134820 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=a5d7ae66-6b48-4bdb-8951-87b0e41404de) set...
[16:14:04] swfrench-wmf: what suk.he said :) manual change can wait a bit, not a big deal, only risk is if someone wants to add a new IP, starts with the manual patch and steals yours, but it will show up in netbox afterwards :)
[16:15:26] volans: thanks as well :) ah yes, inconsistent resource acquisition order, always a recipe for a fun time.
[16:17:56] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks C4 & C5 from asw to lsw - https://phabricator.wikimedia.org/T373097#10134839 (cmooney) Move done, all migrated hosts are pinging again no issues to report.
[16:22:19] indeed :)
[16:24:27] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks C4 & C5 from asw to lsw - https://phabricator.wikimedia.org/T373097#10134882 (ABran-WMF) db/es hosts are repooling
[19:06:27] Mail, Infrastructure-Foundations, User-notice-archive: Notifications stop after bot edits until page is manually viewed or watchlist is marked as read - https://phabricator.wikimedia.org/T374404#10135366 (AlbanGeller)
[19:06:31] Mail, Infrastructure-Foundations, User-notice-archive: Stop sending change notification email if edit is done by a bot - https://phabricator.wikimedia.org/T356984#10135367 (AlbanGeller)
[19:30:10] Mail, Infrastructure-Foundations, SRE: Having issues with Zendesk e-mail notifications - https://phabricator.wikimedia.org/T374489#10135576 (JLam-WMF)
[21:39:46] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks C4 & C5 from asw to lsw - https://phabricator.wikimedia.org/T373097#10135906 (cmooney) Open→Resolved a:cmooney