[01:28:47] (SystemdUnitFailed) firing: (3) docker-reporter-base-images.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[03:03:51] (SystemdUnitFailed) firing: (3) docker-reporter-base-images.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:48:47] (SystemdUnitFailed) firing: (3) docker-reporter-base-images.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:08:07] Mail, Infrastructure-Foundations: Exim: add lists and auto-generated headers - https://phabricator.wikimedia.org/T347831 (ayounsi)
[08:34:02] Mail, Infrastructure-Foundations: Add Auto-Submitted: auto-generated header to emails sent by scripts - https://phabricator.wikimedia.org/T347835 (ayounsi) p:Triage→Low
[08:43:28] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 3 others: Upgrade cloudsw1-c8-eqiad and cloudsw1-d5-eqiad to Junos 20+ - https://phabricator.wikimedia.org/T316544 (dcaro) I'm going to start draining nodes from `D5`: cloudcephosd1011 cloudcephosd1012 cloudcephosd1013 cloudcephosd1014 clou...
[08:46:54] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 3 others: Upgrade cloudsw1-c8-eqiad and cloudsw1-d5-eqiad to Junos 20+ - https://phabricator.wikimedia.org/T316544 (cmooney) >>! In T316544#9214514, @dcaro wrote: > I'm going to start draining nodes from `D5`: @dcaro that's great thanks! L...
[08:48:47] (SystemdUnitFailed) firing: (3) docker-reporter-base-images.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:13:13] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 3 others: Upgrade cloudsw1-c8-eqiad and cloudsw1-d5-eqiad to Junos 20+ - https://phabricator.wikimedia.org/T316544 (dcaro)
[09:18:51] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 3 others: Upgrade cloudsw1-c8-eqiad and cloudsw1-d5-eqiad to Junos 20+ - https://phabricator.wikimedia.org/T316544 (cmooney) In terms of the other nodes in that rack we have the following cloudvirts, and should consider possibly moving insta...
[10:03:48] netops, Infrastructure-Foundations, SRE: Plan codfw row A/B top-of-rack switch refresh - https://phabricator.wikimedia.org/T327938 (cmooney)
[10:04:48] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Export routes generated from ARP/ND in EVPN - https://phabricator.wikimedia.org/T329369 (cmooney) Open→Resolved Change merged and pushed out to live devices. No change to announced routes on existing devices, e.g. type 5 routes a...
[10:10:14] Packaging, Infrastructure-Foundations, cloud-services-team (FY2023/2024-Q1): wmfbackups packages for Debian Bookworm - https://phabricator.wikimedia.org/T347740 (fnegri) I was planning to work on this one. It's not an emergency but it's blocking the upgrade of our hosts to Bookworm (T345810), which i...
[10:32:46] SRE-tools, Cloud-VPS, Infrastructure-Foundations, Goal, and 2 others: cloudcumin: decide sudoers rules for users without global root - https://phabricator.wikimedia.org/T325067 (fnegri)
[10:40:41] SRE-tools, Infrastructure-Foundations, SRE: WMCS VIPs: Netbox netmask inconsistencies - https://phabricator.wikimedia.org/T295774 (aborrero) Open→Resolved
[10:40:44] SRE-tools, netops, Infrastructure-Foundations, SRE: Netbox - PuppetDB audit 2021-11 - https://phabricator.wikimedia.org/T295762 (aborrero)
[10:47:14] SRE-tools, Infrastructure-Foundations, SRE: WMCS VIPs: Netbox netmask inconsistencies - https://phabricator.wikimedia.org/T295774 (cmooney) > I think they should be converted all to be /32 both on Netbox and on the instances. This will also let the automation know that they are proper VIPs and will p...
[10:58:47] (SystemdUnitFailed) firing: (4) docker-reporter-base-images.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:13:47] (SystemdUnitFailed) firing: (4) docker-reporter-base-images.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:17:27] SRE-tools, netops, Infrastructure-Foundations, SRE: Netbox - PuppetDB audit 2021-11 - https://phabricator.wikimedia.org/T295762 (cmooney)
[11:17:56] SRE-tools, Infrastructure-Foundations, SRE: WMCS VIPs: Netbox netmask inconsistencies - https://phabricator.wikimedia.org/T295774 (cmooney) Resolved→Open I'm gonna re-open this for now, as it looks like the issue isn't fully solved. On the cloudnet side of this particular link the VIP is sti...
[11:33:52] SRE-tools, Infrastructure-Foundations, SRE: WMCS VIPs: Netbox netmask inconsistencies - https://phabricator.wikimedia.org/T295774 (Volans) @cmooney this change would affect a lot of VIPs assigned by puppet all over production so we must check carefully the consequences of any changes. That said I'm h...
[12:39:55] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 3 others: Move cloud vps ns-recursor IPs to host/row-independent addressing - https://phabricator.wikimedia.org/T307357 (cmooney)
[12:51:22] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 3 others: Move cloud vps ns-recursor IPs to host/row-independent addressing - https://phabricator.wikimedia.org/T307357 (cmooney)
[12:57:45] SRE-tools, Cloud-VPS, Infrastructure-Foundations, Goal, and 2 others: cloudcumin: decide sudoers rules for users without global root - https://phabricator.wikimedia.org/T325067 (fnegri) Open→Resolved a:fnegri The patch above has been merged and now all members of the `wmcs-roots` grou...
[13:07:35] Puppet, Patch-For-Review: pg replication lag UNKNOWN for puppetdb2003 - https://phabricator.wikimedia.org/T346016 (jbond) In progress→Resolved a:jbond This has now been corrected
[13:51:38] SRE-tools, Infrastructure-Foundations, SRE, Patch-For-Review: WMCS VIPs: Netbox netmask inconsistencies - https://phabricator.wikimedia.org/T295774 (cmooney) @Volans yep thanks. I created a provisional patch but I agree we need to consider all the cases. I believe from looking through the code...
[14:04:06] SRE-tools, Infrastructure-Foundations, Spicerack, Patch-For-Review: Spicerack: add distributed locking support - https://phabricator.wikimedia.org/T341973 (Volans) To ensure that the generated read/write traffic on the etcd cluster will be ok and not cause any issue I've made some tests using the...
[15:13:47] (SystemdUnitFailed) firing: (3) docker-reporter-base-images.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:59:08] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 3 others: Upgrade cloudsw1-c8-eqiad and cloudsw1-d5-eqiad to Junos 20+ - https://phabricator.wikimedia.org/T316544 (aborrero)
[15:59:52] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 3 others: Upgrade cloudsw1-c8-eqiad and cloudsw1-d5-eqiad to Junos 20+ - https://phabricator.wikimedia.org/T316544 (aborrero)
[19:13:47] (SystemdUnitFailed) firing: (3) docker-reporter-base-images.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:04:13] (DiskSpace) firing: Disk space idp2002:9100:/ 5.979% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=idp2002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[22:44:13] (DiskSpace) resolved: Disk space idp2002:9100:/ 5.802% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=idp2002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[23:13:47] (SystemdUnitFailed) firing: (3) docker-reporter-base-images.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed