[02:21:45] netops, Infrastructure-Foundations, SRE, SRE Observability (FY2021/2022-Q1): Ingest Cron and Root Alerts Into Logstash - https://phabricator.wikimedia.org/T274377 (lmata)
[02:22:09] netops, DC-Ops, Infrastructure-Foundations, SRE, SRE Observability: SCS CPU monitoring issue - https://phabricator.wikimedia.org/T285229 (lmata)
[02:25:22] Traffic, SRE, SRE Observability, Patch-For-Review: Implement SLI measurement for Varnish Frontend - https://phabricator.wikimedia.org/T284576 (lmata)
[02:28:57] Traffic, SRE, SRE Observability, Patch-For-Review: varnishmtail silently stops working if varnishncsa crashes - https://phabricator.wikimedia.org/T259020 (lmata)
[02:29:07] Traffic, SRE, SRE Observability, Performance-Team (Radar), Sustainability (Incident Followup): Document and/or improve navigation of the various HTTP frontend Grafana dashboards - https://phabricator.wikimedia.org/T253655 (lmata)
[02:33:38] netops, Infrastructure-Foundations, SRE, SRE Observability: replace check_ripe_atlas Python script with a check_prometheus backed by atlasexporter data - https://phabricator.wikimedia.org/T251155 (lmata)
[02:33:46] netops, Infrastructure-Foundations, SRE, SRE Observability: add traceroute measurements to RIPE Atlas prometheus data - https://phabricator.wikimedia.org/T251156 (lmata)
[02:39:46] netops, Infrastructure-Foundations, SRE, SRE Observability: Provision plaintext syslog collectors in esams/ulsfo/eqsin - https://phabricator.wikimedia.org/T243065 (lmata)
[07:32:14] netops, DBA, Infrastructure-Foundations, SRE, cloud-services-team (Kanban): Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (elukey) >>! In T286032#7197078, @MoritzMuehlenhoff wrote: > Looking at Ganeti VMs, they broadly fall under three/four categories: >...
[07:39:53] netops, Analytics, DBA, Infrastructure-Foundations, and 2 others: Switch buffer re-partition - Eqiad Row D - https://phabricator.wikimedia.org/T286069 (elukey)
[08:04:00] netops, Analytics, DBA, Infrastructure-Foundations, and 2 others: Switch buffer re-partition - Eqiad Row D - https://phabricator.wikimedia.org/T286069 (cmooney)
[08:04:13] netops, Analytics, DBA, Infrastructure-Foundations, SRE: Switch buffer re-partition - Eqiad Row C - https://phabricator.wikimedia.org/T286065 (cmooney)
[08:04:23] netops, DBA, Infrastructure-Foundations, SRE: Switch buffer re-partition - Eqiad Row B - https://phabricator.wikimedia.org/T286061 (cmooney)
[08:04:31] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Adjust egress buffer allocations on ToR switches - https://phabricator.wikimedia.org/T284592 (cmooney)
[08:04:43] netops, DBA, Infrastructure-Foundations, SRE, cloud-services-team (Kanban): Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (cmooney)
[08:05:13] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Adjust egress buffer allocations on ToR switches - https://phabricator.wikimedia.org/T284592 (cmooney)
[08:13:59] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Adjust egress buffer allocations on ToR switches - https://phabricator.wikimedia.org/T284592 (ArielGlenn) With the new schedule I think I can swap one dumpsdata host and one snapshot host and avoid any impact whatsoever on XMl/SQL dumps....
[08:41:15] Traffic, SRE, Patch-For-Review: LetsEncrypt cert expiration warning for some ncredir names - https://phabricator.wikimedia.org/T286377 (Vgutierrez) Open→Resolved a:Vgutierrez
[09:00:03] netops, Analytics, DBA, Infrastructure-Foundations, SRE: Switch buffer re-partition - Eqiad Row C - https://phabricator.wikimedia.org/T286065 (aborrero) >>! In T286065#7194569, @Bstorm wrote: > @aborrero does cloudgw require manual failover? it doesn't require manual failover, but we could...
[09:02:23] netops, Analytics, DBA, Infrastructure-Foundations, and 2 others: Switch buffer re-partition - Eqiad Row D - https://phabricator.wikimedia.org/T286069 (Kormat)
[09:07:13] netops, Analytics, DBA, Infrastructure-Foundations, SRE: Switch buffer re-partition - Eqiad Row C - https://phabricator.wikimedia.org/T286065 (cmooney)
[09:07:59] netops, Analytics, DBA, Infrastructure-Foundations, SRE: Switch buffer re-partition - Eqiad Row C - https://phabricator.wikimedia.org/T286065 (cmooney)
[09:12:12] netops, Analytics, DBA, Infrastructure-Foundations, SRE: Switch buffer re-partition - Eqiad Row C - https://phabricator.wikimedia.org/T286065 (cmooney)
[09:13:31] netops, Analytics, DBA, Infrastructure-Foundations, SRE: Switch buffer re-partition - Eqiad Row C - https://phabricator.wikimedia.org/T286065 (cmooney)
[09:15:25] netops, Analytics, DBA, Infrastructure-Foundations, SRE: Switch buffer re-partition - Eqiad Row C - https://phabricator.wikimedia.org/T286065 (cmooney)
[09:16:02] netops, Analytics, DBA, Infrastructure-Foundations, SRE: Switch buffer re-partition - Eqiad Row C - https://phabricator.wikimedia.org/T286065 (cmooney)
[09:29:36] netops, Analytics, DBA, Infrastructure-Foundations, SRE: Switch buffer re-partition - Eqiad Row C - https://phabricator.wikimedia.org/T286065 (MoritzMuehlenhoff)
[09:39:00] netops, Analytics, DBA, Infrastructure-Foundations, and 2 others: Switch buffer re-partition - Eqiad Row D - https://phabricator.wikimedia.org/T286069 (cmooney)
[09:52:27] netops, DBA, Infrastructure-Foundations, SRE: Switch buffer re-partition - Eqiad Row B - https://phabricator.wikimedia.org/T286061 (cmooney)
[09:59:41] netops, DBA, Infrastructure-Foundations, SRE, cloud-services-team (Kanban): Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (cmooney)
[10:26:42] netops, Analytics, DBA, Infrastructure-Foundations, SRE: Switch buffer re-partition - Eqiad Row C - https://phabricator.wikimedia.org/T286065 (cmooney)
[10:29:19] netops, Analytics, DBA, Infrastructure-Foundations, SRE: Switch buffer re-partition - Eqiad Row C - https://phabricator.wikimedia.org/T286065 (cmooney) @BStorm / @aborrero as mentioned on IRC I messed up with the list of servers here, inadvertently including those in the row connected to //cl...
[10:33:18] netops, Analytics, DBA, Infrastructure-Foundations, and 2 others: Switch buffer re-partition - Eqiad Row D - https://phabricator.wikimedia.org/T286069 (cmooney)
[10:33:51] netops, Analytics, DBA, Infrastructure-Foundations, and 2 others: Switch buffer re-partition - Eqiad Row D - https://phabricator.wikimedia.org/T286069 (cmooney) @Bstorm / @aborrero as mentioned on IRC I messed up with the list of servers here, inadvertently including those in the row connected to...
[11:24:06] Traffic, SRE: Enable UDS support on varnish - https://phabricator.wikimedia.org/T285374 (Vgutierrez) Open→Resolved
[11:24:12] Traffic, SRE: Test envoyproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T271421 (Vgutierrez)
[13:57:34] Traffic: Decomission malmok.wikimedia.org - https://phabricator.wikimedia.org/T286480 (ssingh)
[15:16:12] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Adjust egress buffer allocations on ToR switches - https://phabricator.wikimedia.org/T284592 (jijiki) @cmooney should we sent out an email about this to ops@ and possibly add those times/dates to the maintenance calendar? Thank you!
[15:27:19] netops, DBA, Infrastructure-Foundations, SRE, cloud-services-team (Kanban): Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (hnowlan)
[15:45:40] netops, Analytics, DBA, Infrastructure-Foundations, and 2 others: Switch buffer re-partition - Eqiad Row D - https://phabricator.wikimedia.org/T286069 (MoritzMuehlenhoff)
[15:45:58] netops, DBA, Infrastructure-Foundations, SRE: Switch buffer re-partition - Eqiad Row B - https://phabricator.wikimedia.org/T286061 (MoritzMuehlenhoff)
[15:51:52] sukhe: Moritz raised a question about doh1001 during our planned change on the eqiad switches in row C (planned for Thurs week July 22nd)
[15:52:23] He suggested it might need some manual intervention prior, but I was kind of thinking it should be ok without any changes?
[15:52:36] topranks: sorry I missed that
[15:52:50] If the interruption is very brief, as expected, the impact shouldn't be large, if it lasts any longer BGP will die and it will automatically be depooled?
[15:53:05] yes, it should be ok! traffic should go to doh1002
[15:53:20] we can depool it manually as well. was the question in this channel?
[15:53:45] I expect the interruption will be too brief for that, but perhaps one or two packets might be dropped in transit. Given how DNS clients behave I don't think that would be a major issue.
[15:54:02] yep
[15:54:10] It wasn't really a question, Moritz listed it as needing attention on https://phabricator.wikimedia.org/T286065
[15:54:22] But I thought I'd ask as I suspect it will be ok without any action.
[15:54:42] thanks for checking but yeah, I think we should be good
[15:55:07] Ok cool thanks, I'll mark it as action not needed in that case.
[15:55:11] cheers!
[15:55:57] thanks <3!
[16:48:44] Traffic, SRE, serviceops, User-jijiki: Access mwdebug kubernetes deployment via the 'X-Wikimedia-Debug' header - https://phabricator.wikimedia.org/T286491 (jijiki)
[16:48:58] Traffic, SRE, serviceops, User-jijiki: Access mwdebug kubernetes deployment via the 'X-Wikimedia-Debug' header - https://phabricator.wikimedia.org/T286491 (jijiki)
[18:55:48] netops, Analytics, DBA, Infrastructure-Foundations, SRE: Switch buffer re-partition - Eqiad Row C - https://phabricator.wikimedia.org/T286065 (Dwisehaupt)
[18:57:29] netops, Analytics, DBA, Infrastructure-Foundations, SRE: Switch buffer re-partition - Eqiad Row C - https://phabricator.wikimedia.org/T286065 (Dwisehaupt) Still need to confirm the window with Advancement, but it is looking ok right now. There will be some work on the FR-Tech side to ensure d...
[21:28:18] netops, Analytics, DBA, Infrastructure-Foundations, SRE: Switch buffer re-partition - Eqiad Row C - https://phabricator.wikimedia.org/T286065 (Bstorm) >>! In T286065#7205088, @cmooney wrote: > @BStorm / @aborrero as mentioned on IRC I messed up with the list of servers here, inadvertently inc...
[21:31:17] netops, DBA, Infrastructure-Foundations, SRE: Switch buffer re-partition - Eqiad Row B - https://phabricator.wikimedia.org/T286061 (Bstorm) @cmooney Do the cloudsw switches get impacted by row B updates?