[01:30:04] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on testvm2005:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[03:17:14] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:30:04] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on testvm2005:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[07:17:14] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:44:29] netbox, ChangeProp, collaboration-services, GitLab, and 9 others: Figure out a plan to move forward with regarding Redis License changes - https://phabricator.wikimedia.org/T360596#9653141 (Jelto)
[08:44:49] SRE-tools, collaboration-services, Infrastructure-Foundations, Puppet-Core, and 4 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619#9653145 (Gehel)
[09:30:04] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on testvm2005:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[10:28:10] Puppet, Wikidata, Wikidata Dev Team, wmde-wikidata-tech, and 2 others: Remove the WDCM clone (stats1007) - https://phabricator.wikimedia.org/T351072#9653490 (Manuel) Open→Stalled
[11:07:12] netbox, DC-Ops, Infrastructure-Foundations, SRE: sre.hardware.upgrade-firmware cookbook: product slug parsing - https://phabricator.wikimedia.org/T348036#9653584 (BTullis) I've just had a failure to update firmware for a host and a brief search led me to this issue. The error I got was from an-wo...
[11:17:14] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:15:01] netops, Infrastructure-Foundations, SRE: Move public-vlan host BGP peerings from CRs to top-of-rack switches in codfw - https://phabricator.wikimedia.org/T360772 (cmooney) NEW p:Triage→Low
[13:15:17] netops, Infrastructure-Foundations, SRE: Re-IP hosts on codfw row A and B to new per-rack vlans/subnets - https://phabricator.wikimedia.org/T354869#9653918 (cmooney)
[13:15:21] netops, Infrastructure-Foundations, SRE: Move public-vlan host BGP peerings from CRs to top-of-rack switches in codfw - https://phabricator.wikimedia.org/T360772#9653917 (cmooney)
[13:17:37] netops, Infrastructure-Foundations, SRE: Move public-vlan host BGP peerings from CRs to top-of-rack switches in codfw - https://phabricator.wikimedia.org/T360772#9653941 (cmooney)
[13:30:04] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on testvm2005:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[13:33:09] netops, Infrastructure-Foundations, ops-codfw, SRE: Decom asw-b-codfw switch stack - https://phabricator.wikimedia.org/T360776 (cmooney) NEW p:Triage→Medium
[13:34:41] netops, Infrastructure-Foundations, ops-codfw, SRE: Decom asw-b-codfw switch stack - https://phabricator.wikimedia.org/T360776#9654011 (Papaul) @cmooney what works for you works for me as well
[13:35:12] netops, Infrastructure-Foundations, ops-codfw, SRE: Decom asw-b-codfw switch stack - https://phabricator.wikimedia.org/T360776#9654012 (Papaul)
[13:35:34] netops, Infrastructure-Foundations, ops-codfw, SRE: Decom asw-b-codfw switch stack - https://phabricator.wikimedia.org/T360776#9654013 (Papaul)
[14:07:59] netops, Infrastructure-Foundations, ops-codfw, SRE, Patch-For-Review: Decom asw-b-codfw switch stack - https://phabricator.wikimedia.org/T360776#9654083 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=79f10d11-133e-477b-be4d-b326d7e4bcf9) set by cmooney@cumin1002 for 4:00:00...
[14:18:19] netops, Infrastructure-Foundations, ops-codfw, SRE, Patch-For-Review: Decom asw-b-codfw switch stack - https://phabricator.wikimedia.org/T360776#9654094 (cmooney)
[15:17:14] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:29:43] netops, DC-Ops, Infrastructure-Foundations, ops-codfw: codfw row C/D upgrade racking task - https://phabricator.wikimedia.org/T360789 (RobH) NEW
[15:30:15] netops, DC-Ops, Infrastructure-Foundations, ops-codfw: codfw row C/D upgrade racking task - https://phabricator.wikimedia.org/T360789#9654382 (RobH)
[16:02:12] netops, Infrastructure-Foundations, ops-codfw, SRE, Patch-For-Review: Decom asw-b-codfw switch stack - https://phabricator.wikimedia.org/T360776#9654489 (cmooney)
[16:26:50] netops, Infrastructure-Foundations, SRE: Migrate IP gateway for public1-a-codfw to spine switches - https://phabricator.wikimedia.org/T351532#9654574 (cmooney) Open→Resolved
[16:27:28] netops, Infrastructure-Foundations, SRE: Migrate IP gateway for private1-b-codfw to spine switches - https://phabricator.wikimedia.org/T351534#9654580 (cmooney) Open→Resolved
[16:28:35] netops, Infrastructure-Foundations, SRE: Codfw row A/B top-of-rack switch refresh - https://phabricator.wikimedia.org/T327938#9654586 (cmooney)
[16:28:48] netops, Infrastructure-Foundations, ops-codfw, SRE: Bring codfw row A-B EVPN switches live and make them gateway for existing Vlans - https://phabricator.wikimedia.org/T347191#9654584 (cmooney) Open→Resolved Closing this task, everything now completed. For future rows we can b...
[16:28:56] netops, Infrastructure-Foundations, ops-codfw, SRE: Upgrade new codfw switches to Juniper recommended - https://phabricator.wikimedia.org/T341670#9654588 (cmooney)
[16:29:11] netops, SRE-tools, Infrastructure-Foundations, SRE, Patch-For-Review: Setup zero touch provisioning (ZTP) for network devices - https://phabricator.wikimedia.org/T336485#9654587 (cmooney)
[16:40:30] netops, Infrastructure-Foundations, ops-codfw, SRE: Connect two hosts in codfw row A/B for switch migration testing - https://phabricator.wikimedia.org/T345803#9654614 (cmooney) >>! In T345803#9479281, @Papaul wrote: > @cmooney can we get those 2 hosts back in decom? Thanks @papaul I'm done wit...
[16:41:53] netops, Infrastructure-Foundations, SRE: Codfw row A/B top-of-rack switch refresh - https://phabricator.wikimedia.org/T327938#9654617 (cmooney) Open→Resolved a:cmooney Closing this one, I've made some notes on wikitech below about how to approach these for future rows. https:/...
[16:43:22] netops, Infrastructure-Foundations, ops-codfw, SRE: Connect two hosts in codfw row A/B for switch migration testing - https://phabricator.wikimedia.org/T345803#9654622 (cmooney)
[17:12:28] netops, Infrastructure-Foundations, ops-codfw, SRE: Connect two hosts in codfw row A/B for switch migration testing - https://phabricator.wikimedia.org/T345803#9654809 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by cmooney@cumin1002 for hosts: `sretest2003.codfw.wmnet` - sretes...
[17:30:04] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on testvm2005:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[19:01:49] Mail, Infrastructure-Foundations, Trust-and-Safety: Mail from Bishzilla to emergency@wikimedia.org is possibly getting lost - https://phabricator.wikimedia.org/T338032#9655151 (RoySmith) Is there any progress on this?
[19:17:14] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:43:04] Mail, Infrastructure-Foundations, Trust-and-Safety: Mail from Bishzilla to emergency@wikimedia.org is possibly getting lost - https://phabricator.wikimedia.org/T338032#9655375 (jhathaway) Thanks for the poke @RoySmith, ITS obtained this information from Zendesk on how Zendesk's spam marking system o...
[21:26:31] Mail, Infrastructure-Foundations, Trust-and-Safety: Mail from Bishzilla to emergency@wikimedia.org is possibly getting lost - https://phabricator.wikimedia.org/T338032#9655475 (RoySmith) @jhathaway thanks for the response. Yes, I agree with you that trying to make zendesk/cloudmark do something it...
[21:30:04] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on testvm2005:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[23:17:14] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed