[03:50:18] (MDRAIDFailedDisk) firing: MD RAID - Failed disk(s) on aqs1013:9100 - https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook#Hardware_Raid_Information_Gathering - TODO - https://alerts.wikimedia.org/?q=alertname%3DMDRAIDFailedDisk
[07:50:18] (MDRAIDFailedDisk) firing: MD RAID - Failed disk(s) on aqs1013:9100 - https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook#Hardware_Raid_Information_Gathering - TODO - https://alerts.wikimedia.org/?q=alertname%3DMDRAIDFailedDisk
[10:33:19] SRE-tools, Infrastructure-Foundations, Puppet-Core, SRE, and 3 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (MoritzMuehlenhoff)
[10:40:29] SRE-tools, DBA, Infrastructure-Foundations, Puppet-Core, and 2 others: puppet7 on cumin breaks database connections - https://phabricator.wikimedia.org/T352974 (ABran-WMF) >>! In T352974#9446142, @MoritzMuehlenhoff wrote: > One other option is that the TLS toolchain as used by Orchestrator be not...
[11:50:18] (MDRAIDFailedDisk) firing: MD RAID - Failed disk(s) on aqs1013:9100 - https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook#Hardware_Raid_Information_Gathering - TODO - https://alerts.wikimedia.org/?q=alertname%3DMDRAIDFailedDisk
[13:16:28] volans: I have added you as a reviewer to a patch for Puppet tox, I'd like to eventually switch the CI Docker image being used from tox v3 to tox v4 https://gerrit.wikimedia.org/r/c/operations/puppet/+/977223 ;)
[13:21:01] hashar: ack I'll have a look after lunch
[13:31:15] volans: thanks! :)
[13:57:26] hashar: merged+puppet-merged
[13:57:35] youpih!
[14:04:53] volans: and after spending more than a few hours on tox v4, I am pretty sure I will look for a replacement to tox :)
[14:05:24] version 4 has some broken stuff :-\
[14:05:34] eheheh
[14:05:54] like I can't force pass an environment variable like `CI` or `XDG_CACHE_HOME`
[14:06:04] TOX_TESTENV_PASSENV is no more supported
[14:06:21] or we can no more share an environment directory between tox envs
[14:06:33] anyway I am ranting :]
[14:06:53] OpenStack moved to nox https://nox.thea.codes/en/stable/index.html
[14:11:42] IIRC if CI=1 is set in the environment where tox runs you don't need to pass it over
[14:12:03] to have tox detect it, or you meant to pass it over to the tested env
[14:15:47] netbox, Infrastructure-Foundations, Patch-For-Review: Netbox: get rid of WMF Production Patches - https://phabricator.wikimedia.org/T310717 (ayounsi)
[14:29:42] volans: I meant passing it to the tested env :)
[14:29:58] I need to write a plugin to restore that behavior
[14:30:52] ?
[14:30:57] can't you use https://tox.wiki/en/4.11.4/config.html#setenv ?
[14:31:32] there is also https://tox.wiki/en/4.11.4/config.html#passenv
[14:45:35] SRE-tools, Infrastructure-Foundations, Puppet-Core, SRE, and 4 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (MoritzMuehlenhoff)
[15:31:22] SRE-tools, Infrastructure-Foundations, Puppet-Core, SRE, and 4 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (MoritzMuehlenhoff)
[15:48:32] SRE-tools, Infrastructure-Foundations, Puppet-Core, SRE, and 3 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (MoritzMuehlenhoff)
[15:50:18] (MDRAIDFailedDisk) firing: MD RAID - Failed disk(s) on aqs1013:9100 - https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook#Hardware_Raid_Information_Gathering - TODO - https://alerts.wikimedia.org/?q=alertname%3DMDRAIDFailedDisk
[15:55:06] netops, Infrastructure-Foundations, Prod-Kubernetes, SRE, and 2 others: Update puppet's topology.kubernetes.io/zone logic to take into account the new setup - https://phabricator.wikimedia.org/T352893 (Clement_Goubert) Summary of the discussion on the linked CR: - LLDP based logic runs the risk o...
[15:57:40] netops, Infrastructure-Foundations, Prod-Kubernetes, SRE, and 3 others: Update puppet's topology.kubernetes.io/zone logic to take into account the new setup - https://phabricator.wikimedia.org/T352893 (JMeybohm)
[16:00:20] netops, Infrastructure-Foundations, Prod-Kubernetes, SRE, and 3 others: Update puppet's topology.kubernetes.io/zone logic to take into account the new setup - https://phabricator.wikimedia.org/T352893 (akosiaris) >>! In T352893#9450792, @Clement_Goubert wrote: > Summary of the discussion on the l...
[16:09:27] netops, Infrastructure-Foundations, Prod-Kubernetes, SRE, and 3 others: Update puppet's topology.kubernetes.io/zone logic to take into account the new setup - https://phabricator.wikimedia.org/T352893 (cmooney) >>! In T352893#9450792, @Clement_Goubert wrote: > I am left wondering if the fear of L...
[16:16:53] netops, Infrastructure-Foundations, SRE, Traffic, and 2 others: Move lvs2014 link to row A and connect to new row A/B vlans - https://phabricator.wikimedia.org/T352758 (Papaul) @cmooney link moved to ssw1-a8
[16:28:49] netops, Infrastructure-Foundations, Prod-Kubernetes, SRE, and 3 others: Update puppet's topology.kubernetes.io/zone logic to take into account the new setup - https://phabricator.wikimedia.org/T352893 (Volans) I might be missing context, but why we can't get that info from netbox? Extracting it d...
[17:06:14] netops, Infrastructure-Foundations, SRE, ops-codfw, Patch-For-Review: Migrate atlas-codfw from asw-a1-codfw to lsw1-a2-codfw - https://phabricator.wikimedia.org/T348159 (cmooney)
[17:09:14] netops, Infrastructure-Foundations, SRE, ops-codfw: Codfw row A-B migration - non-standard device moves - https://phabricator.wikimedia.org/T348128 (cmooney)
[17:09:50] netops, Infrastructure-Foundations, SRE, ops-codfw, Patch-For-Review: Migrate atlas-codfw from asw-a1-codfw to lsw1-a2-codfw - https://phabricator.wikimedia.org/T348159 (cmooney) Open→Resolved Work completed. Cable moved and irb.2201 added to lsw1-a2-codfw. As no other devices are o...
[17:13:42] netops, Infrastructure-Foundations, Prod-Kubernetes, SRE, and 3 others: Update puppet's topology.kubernetes.io/zone logic to take into account the new setup - https://phabricator.wikimedia.org/T352893 (cmooney) >>! In T352893#9450929, @Volans wrote: > I might be missing context, but why we can't...
[17:28:33] netops, Infrastructure-Foundations, SRE, ops-codfw: Migrate mr1-codfw from asw-a1-codfw to lsw1-a2-codfw - https://phabricator.wikimedia.org/T348164 (cmooney)
[17:29:27] netops, Infrastructure-Foundations, SRE, ops-codfw: Migrate mr1-codfw from asw-a1-codfw to lsw1-a2-codfw - https://phabricator.wikimedia.org/T348164 (cmooney) Link is now up and BGP has established. ` cmooney@lsw1-a2-codfw> show route receive-protocol bgp 10.192.254.9 table PRODUCTION.inet.0 ters...
[19:50:18] (MDRAIDFailedDisk) firing: MD RAID - Failed disk(s) on aqs1013:9100 - https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook#Hardware_Raid_Information_Gathering - TODO - https://alerts.wikimedia.org/?q=alertname%3DMDRAIDFailedDisk
[22:10:41] netops, Infrastructure-Foundations, SRE: Automate BGP peering on MR routers towards core - https://phabricator.wikimedia.org/T354809 (cmooney) p:Triage→Low
[23:50:18] (MDRAIDFailedDisk) firing: MD RAID - Failed disk(s) on aqs1013:9100 - https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook#Hardware_Raid_Information_Gathering - TODO - https://alerts.wikimedia.org/?q=alertname%3DMDRAIDFailedDisk