[01:49:44] (SystemdUnitFailed) firing: netbox_report_accounting_run.service Failed on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:19:44] (SystemdUnitFailed) resolved: netbox_report_accounting_run.service Failed on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:04:44] (SystemdUnitFailed) firing: sync_bitu_username_block.service Failed on idm1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:27:26] netbox, Infrastructure-Foundations: Netbox report test_matching_vlan - AttributeError: 'NoneType' object has no attribute 'prefixes' - https://phabricator.wikimedia.org/T339078 (ayounsi)
[07:27:46] netops, Infrastructure-Foundations, SRE, Patch-For-Review: test_matching_vlan() function crashing in Netbox network report - https://phabricator.wikimedia.org/T339133 (ayounsi)
[07:28:03] netops, Infrastructure-Foundations, SRE, Patch-For-Review: test_matching_vlan() function crashing in Netbox network report - https://phabricator.wikimedia.org/T339133 (ayounsi)
[09:08:34] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 4 others: cloudservices2004-dev: reimage into new network setup - https://phabricator.wikimedia.org/T338778 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin2002 for host cloudservices2004-dev.codfw.wmne...
[09:21:50] Just to let you know, I've temporarily put 3.8 GBs of hadoop debs in /home/btullis on apt1001 - The root volume is at 88% with 21 GB available, so it could alert if it goes much higher. I'll try to clear up after myself.
[09:26:39] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 4 others: cloudservices2004-dev: reimage into new network setup - https://phabricator.wikimedia.org/T338778 (aborrero)
[09:30:17] ack
[10:08:59] (PuppetDisabled) firing: Puppet disabled on puppetserver2001:9100 - https://wikitech.wikimedia.org/wiki/Puppet/Runbooks#Puppet_Disabled - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?var-cluster=misc&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DPuppetDisabled
[10:55:47] Anyone knows how to catch a squirrel? 😄
[11:12:11] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 3 others: cloudservices2004-dev: reimage into new network setup - https://phabricator.wikimedia.org/T338778 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin2002 for host cloudservices2004-dev.codfw.wmnet wi...
[11:52:58] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 3 others: cloudservices2004-dev: reimage into new network setup - https://phabricator.wikimedia.org/T338778 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin2002 for host cloudservices2004-dev.codfw.wmne...
[12:35:57] netops, Infrastructure-Foundations, SRE-Sprint-Week-Sustainability-March2023, ops-eqiad: eqiad: upgrade row C and D uplinks from 4x10G to 1x40G - https://phabricator.wikimedia.org/T313463 (Jclark-ctr) @ayounsi removed 8 cables. deleted from netbox
[12:41:04] netops, Infrastructure-Foundations, SRE: Packet Drops on Eqiad ASW -> CR uplinks - https://phabricator.wikimedia.org/T291627 (ayounsi)
[12:41:11] netops, Infrastructure-Foundations, SRE-Sprint-Week-Sustainability-March2023, ops-eqiad: eqiad: upgrade row C and D uplinks from 4x10G to 1x40G - https://phabricator.wikimedia.org/T313463 (ayounsi) Open→Resolved Awesome, thanks!
[12:44:44] (SystemdUnitFailed) firing: wmf_auto_restart_krb5-kpropd.service Failed on krb2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:47:35] jobo: make friends with it is by far the best approach :)
[12:52:30] Her name is Susie, she lives with us now.
[13:22:00] netops, Infrastructure-Foundations, SRE, ops-codfw: codfw: Relocate servers to make space for new switches in rowA and rowB - https://phabricator.wikimedia.org/T326564 (Papaul) I will be working with @Clement_Goubert today at 10am CT to relocate those mw nodes.
[14:08:59] (PuppetDisabled) firing: Puppet disabled on puppetserver2001:9100 - https://wikitech.wikimedia.org/wiki/Puppet/Runbooks#Puppet_Disabled - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?var-cluster=misc&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DPuppetDisabled
[14:13:41] SRE-tools, Spicerack: Service without monitor breaks spicerack - https://phabricator.wikimedia.org/T339243 (Clement_Goubert)
[14:43:07] jbond: what should I do (if anything) to try to report a redirect loop bug with IDP and people.wm.o ?
[14:47:29] SRE-tools, Infrastructure-Foundations, Spicerack: ServiceLVS without monitor breaks spicerack - https://phabricator.wikimedia.org/T339243 (Clement_Goubert)
[14:56:43] netops, Infrastructure-Foundations, SRE, ops-codfw: codfw: Relocate servers to make space for new switches in rowA and rowB - https://phabricator.wikimedia.org/T326564 (Jhancock.wm)
[14:56:59] (PuppetDisabled) firing: Puppet disabled on puppetdb1003:9100 - https://wikitech.wikimedia.org/wiki/Puppet/Runbooks#Puppet_Disabled - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?var-cluster=puppet&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DPuppetDisabled
[15:04:08] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 4 others: cloudservices2004-dev: reimage into new network setup - https://phabricator.wikimedia.org/T338778 (aborrero)
[15:29:11] netops, Infrastructure-Foundations, SRE, ops-codfw: codfw: Relocate servers to make space for new switches in rowA and rowB - https://phabricator.wikimedia.org/T326564 (Papaul)
[15:37:24] netops, Infrastructure-Foundations, SRE, ops-codfw: codfw: Relocate servers to make space for new switches in rowA and rowB - https://phabricator.wikimedia.org/T326564 (Papaul)
[15:38:29] netops, Infrastructure-Foundations, SRE: Plan codfw row A/B top-of-rack switch refresh - https://phabricator.wikimedia.org/T327938 (Papaul)
[15:39:04] netops, Infrastructure-Foundations, SRE, ops-codfw: codfw: Relocate servers to make space for new switches in rowA and rowB - https://phabricator.wikimedia.org/T326564 (Papaul) Open→Resolved This is complete, thanks to @ssingh and @Clement_Goubert
[15:44:44] (SystemdUnitFailed) firing: (2) debian-weekly-rebuild.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:54:23] netops, Infrastructure-Foundations, SRE: Plan codfw row A/B top-of-rack switch refresh - https://phabricator.wikimedia.org/T327938 (Papaul)
[15:57:13] netops, Infrastructure-Foundations, SRE: Plan codfw row A/B top-of-rack switch refresh - https://phabricator.wikimedia.org/T327938 (Papaul)
[16:14:44] (SystemdUnitFailed) firing: (3) debian-weekly-rebuild.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:30:49] netops, Infrastructure-Foundations, SRE: Plan codfw row A/B top-of-rack switch refresh - https://phabricator.wikimedia.org/T327938 (Papaul)
[16:44:33] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 3 others: Move cloud vps ns-recursor IPs to host/row-independent addressing - https://phabricator.wikimedia.org/T307357 (aborrero)
[16:44:43] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 3 others: cloudservices2004-dev: reimage into new network setup - https://phabricator.wikimedia.org/T338778 (aborrero) Open→In progress Note: I started to boostrap the node with instructions from https://wikitech.wikimedia.org/wiki/P...
[16:48:54] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 3 others: cloudservices2004-dev: reimage into new network setup - https://phabricator.wikimedia.org/T338778 (aborrero) Also, `designate-producer` is complaining about something related to rabbitmq, possibly related to the new IP address: `...
[16:52:16] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 3 others: cloudservices2004-dev: reimage into new network setup - https://phabricator.wikimedia.org/T338778 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin2002 for host cloudservices2004-dev.codfw.wmnet wi...
[16:59:36] puppet-compiler, Infrastructure-Foundations, SRE, Continuous-Integration-Config, Release-Engineering-Team (Seen): Figure out a way to enable volunteers to use the puppet compiler - https://phabricator.wikimedia.org/T192532 (hashar) Open→Resolved a:Legoktm That was implemented by @...
[17:01:28] puppet-compiler, Infrastructure-Foundations, SRE, Continuous-Integration-Config, Release-Engineering-Team (Seen): Figure out a way to enable volunteers to use the puppet compiler - https://phabricator.wikimedia.org/T192532 (hashar) (I think that task was left open to have the list of hosts pa...
[18:08:59] (PuppetDisabled) firing: Puppet disabled on puppetserver2001:9100 - https://wikitech.wikimedia.org/wiki/Puppet/Runbooks#Puppet_Disabled - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?var-cluster=misc&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DPuppetDisabled
[18:56:07] netops, Infrastructure-Foundations: IC-307235 down yet again - https://phabricator.wikimedia.org/T339289 (CDanis)
[18:56:59] (PuppetDisabled) firing: Puppet disabled on puppetdb1003:9100 - https://wikitech.wikimedia.org/wiki/Puppet/Runbooks#Puppet_Disabled - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?var-cluster=puppet&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DPuppetDisabled
[19:19:21] netops, Infrastructure-Foundations, SRE, ops-codfw: Codfw:row A/B: rack/cable new switches - https://phabricator.wikimedia.org/T332180 (Jhancock.wm)
[20:14:44] (SystemdUnitFailed) firing: (3) debian-weekly-rebuild.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:08:59] (PuppetDisabled) firing: Puppet disabled on puppetserver2001:9100 - https://wikitech.wikimedia.org/wiki/Puppet/Runbooks#Puppet_Disabled - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?var-cluster=misc&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DPuppetDisabled
[22:57:14] (PuppetDisabled) firing: Puppet disabled on puppetdb1003:9100 - https://wikitech.wikimedia.org/wiki/Puppet/Runbooks#Puppet_Disabled - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?var-cluster=puppet&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DPuppetDisabled
[23:37:01] netops, Infrastructure-Foundations, ops-codfw: codfw:basic spines/leaves configuration using ZTP - https://phabricator.wikimedia.org/T339315 (Papaul)
[23:37:23] netops, Infrastructure-Foundations, ops-codfw: codfw:basic spines/leaves configuration using ZTP - https://phabricator.wikimedia.org/T339315 (Papaul) p:Triage→Medium
[23:52:27] netops, Infrastructure-Foundations, ops-codfw: codfw:basic spines/leaves configuration using ZTP - https://phabricator.wikimedia.org/T339315 (Papaul)