[09:37:46] topranks, XioNoX: [sprint week question] is https://phabricator.wikimedia.org/T320566 still worked on/in your radar?
[09:40:03] volans: yes
[09:40:16] hope the upgrades will fix the issue
[09:41:40] ack thx
[10:03:04] Mail, Infrastructure-Foundations, MassMessage, WMF-JobQueue, Platform Team Workboards (Clinic Duty Team): Same MassMessage is being sent more than once - https://phabricator.wikimedia.org/T93049 (Samwalton9) This just happened again (https://zh.wikipedia.org/wiki/User_talk:Evesiesta/Archives/...
[10:23:42] netops, Infrastructure-Foundations, SRE-Sprint-Week-Sustainability-March2023, Sustainability (Incident Followup): Cr1-eqiad comms problem when moving to 40G row D handoff - https://phabricator.wikimedia.org/T320566 (Volans)
[10:30:20] topranks, XioNoX: same question of before for T313463
[10:30:21] T313463: eqiad: upgrade row C and D uplinks from 4x10G to 1x40G - https://phabricator.wikimedia.org/T313463
[10:30:49] sorry to bother, but I'll be pinging people to triage things as part of the sustainability sprint week
[10:31:52] Mail, Infrastructure-Foundations, SRE-Sprint-Week-Sustainability-March2023, Sustainability (Incident Followup): Upgrade Exim to 4.96 - https://phabricator.wikimedia.org/T310836 (Volans)
[10:34:43] same for T297355 too
[10:34:44] T297355: Optimise WMF WAN Network Configuration - https://phabricator.wikimedia.org/T297355
[10:35:10] netops, Infrastructure-Foundations, SRE-Sprint-Week-Sustainability-March2023, Sustainability (Incident Followup): Optimise WMF WAN Network Configuration - https://phabricator.wikimedia.org/T297355 (Volans)
[10:42:27] netops, Infrastructure-Foundations, SRE-Sprint-Week-Sustainability-March2023, ops-eqiad, Sustainability (Incident Followup): eqiad: upgrade row C and D uplinks from 4x10G to 1x40G - https://phabricator.wikimedia.org/T313463 (Volans)
[10:44:19] volans: yes and yes
[10:44:28] <3 thx
[11:43:15] Mail, Infrastructure-Foundations, Observability-Metrics, SRE-Sprint-Week-Sustainability-March2023, Sustainability (Incident Followup): Add exim queue size to grafana graph - https://phabricator.wikimedia.org/T275867 (Volans) The mail dashboard has already a quick display of the queues, I've a...
[11:48:06] SRE-tools, Infrastructure-Foundations, Prod-Kubernetes, SRE-Sprint-Week-Sustainability-March2023, and 2 others: Write a cookbook to set a k8s cluster in maintenance mode - https://phabricator.wikimedia.org/T277677 (Volans) I've spoken with the people involved, and the original request has been me...
[11:53:16] Mail, Infrastructure-Foundations, SRE-Sprint-Week-Sustainability-March2023, observability, Sustainability (Incident Followup): Improve outbound mail service alerting - https://phabricator.wikimedia.org/T197172 (Volans)
[11:54:17] Mail, Infrastructure-Foundations, SRE Observability, SRE-Sprint-Week-Sustainability-March2023, and 2 others: Graph outbound mail volume on per-service or hostgroup level - https://phabricator.wikimedia.org/T197171 (Volans)
[11:59:50] Puppet, Infrastructure-Foundations, SRE-Sprint-Week-Sustainability-March2023, Sustainability (Incident Followup): Fix the general problem of randomly-bad puppet agent cron timings within redundant clusters - https://phabricator.wikimedia.org/T161145 (Volans) Although the principle still stands, I...
[12:06:18] SRE-tools, DC-Ops, Infrastructure-Foundations, Sustainability (Incident Followup): PXE Boot defaults to automatically reimaging (normally destroying os and all filesystemdata) on all servers - https://phabricator.wikimedia.org/T251416 (jcrespo) > So I think that the original concern has been almo...
[12:07:14] SRE-tools, DC-Ops, Infrastructure-Foundations, SRE-Sprint-Week-Sustainability-March2023, Sustainability (Incident Followup): PXE Boot defaults to automatically reimaging (normally destroying os and all filesystemdata) on all servers - https://phabricator.wikimedia.org/T251416 (Volans) Open...
[12:21:04] Puppet, Infrastructure-Foundations, SRE-Sprint-Week-Sustainability-March2023, Sustainability (Incident Followup): A puppet run should not start if a box is under abnormal load. - https://phabricator.wikimedia.org/T84183 (Volans) Open→Invalid Resolving as invalid because is not very well d...
[15:12:05] is this circuit still down? https://phabricator.wikimedia.org/T322529
[15:12:14] / worked on?
[15:13:12] cc XioNoX
[15:14:57] volans: 302 willy
[15:18:51] ack thx
[15:37:32] netops, Infrastructure-Foundations, SRE, cloud-services-team (FY2022/2023-Q3): Configure cloudsw1-b1-codfw and migrate cloud hosts in codfw B1 to it - https://phabricator.wikimedia.org/T327919 (Papaul) @cmooney We can move any servers racked from U11 up
[16:23:44] netops, Infrastructure-Foundations, SRE, ops-codfw: codfw: Relocate servers to make space for new switches in rowA and rowB - https://phabricator.wikimedia.org/T326564 (Papaul)
[18:54:48] netops, Infrastructure-Foundations, SRE, cloud-services-team (FY2022/2023-Q3): Configure cloudsw1-b1-codfw and migrate cloud hosts in codfw B1 to it - https://phabricator.wikimedia.org/T327919 (cmooney) >>! In T327919#8710573, @Papaul wrote: > @cmooney We can move any servers racked from U11 up...
[19:57:14] Mail, Infrastructure-Foundations, MassMessage, WMF-JobQueue, Platform Team Workboards (Clinic Duty Team): Same MassMessage is being sent more than once - https://phabricator.wikimedia.org/T93049 (jpxg) This happened with the Signpost today ([[ https://en.wikipedia.org/wiki/Wikipedia_talk:Wiki...
[19:58:29] Mail, Infrastructure-Foundations, MassMessage, WMF-JobQueue, Platform Team Workboards (Clinic Duty Team): Same MassMessage is being sent more than once - https://phabricator.wikimedia.org/T93049 (jpxg) p:High→Triage
[20:33:59] netops, Infrastructure-Foundations, SRE, cloud-services-team (FY2022/2023-Q3): Configure cloudsw1-b1-codfw and migrate cloud hosts in codfw B1 to it - https://phabricator.wikimedia.org/T327919 (Papaul) @cmooney Please see first batch proposal. We can move all those servers next week. @aborrero ca...
[23:34:13] Mail, Infrastructure-Foundations, MassMessage, WMF-JobQueue, Platform Team Workboards (Clinic Duty Team): Same MassMessage is being sent more than once - https://phabricator.wikimedia.org/T93049 (Tgr) Ideally this would be fixed in the job runner, but given that's not happening and this exten...