[04:31:57] (VarnishTrafficDrop) firing: 68% GET drop in text@ during the past 30 minutes - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org
[04:36:57] (VarnishTrafficDrop) resolved: 69% GET drop in text@ during the past 30 minutes - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org
[05:01:34] netops, Analytics, DBA, Infrastructure-Foundations, and 2 others: Switch buffer re-partition - Eqiad Row C - https://phabricator.wikimedia.org/T286065 (Marostegui) I have switched m3-master from dbproxy1020 to dbproxy1016: https://gerrit.wikimedia.org/r/705789
[05:02:06] netops, Analytics, DBA, Infrastructure-Foundations, and 2 others: Switch buffer re-partition - Eqiad Row C - https://phabricator.wikimedia.org/T286065 (Marostegui)
[07:05:36] netops, Analytics, DBA, Infrastructure-Foundations, and 2 others: Switch buffer re-partition - Eqiad Row C - https://phabricator.wikimedia.org/T286065 (MoritzMuehlenhoff)
[07:58:28] netops, Infrastructure-Foundations, SRE, Datacenter-Switchover: Record traffic flows in and out of eqiad during switchover - https://phabricator.wikimedia.org/T286038 (ayounsi) `lang=diff re0.cr2-eqiad# show | compare [edit interfaces xe-3/2/2 unit 0 family inet filter] + output sample-ac...
[08:04:36] netops, Infrastructure-Foundations, SRE, Datacenter-Switchover: Record traffic flows in and out of eqiad during switchover - https://phabricator.wikimedia.org/T286038 (ayounsi) Talked to @fgiunchedi on IRC, let us know when to rollback. Ideally before the end of the week so we don't keep "hacks"...
[08:11:55] netops, Infrastructure-Foundations, SRE, Datacenter-Switchover: Record traffic flows in and out of eqiad during switchover - https://phabricator.wikimedia.org/T286038 (fgiunchedi) Thank you @ayounsi @cmooney ! Could we keep the sampling for a week straight ? I understand if you are not comforta...
[08:25:45] netops, Infrastructure-Foundations, SRE, Datacenter-Switchover: Record traffic flows in and out of eqiad during switchover - https://phabricator.wikimedia.org/T286038 (fgiunchedi)
[08:27:21] netops, Infrastructure-Foundations, SRE, Datacenter-Switchover, User-fgiunchedi: Record traffic flows in and out of eqiad during switchover - https://phabricator.wikimedia.org/T286038 (fgiunchedi)
[08:34:01] netops, Analytics, DBA, Infrastructure-Foundations, and 2 others: Switch buffer re-partition - Eqiad Row C - https://phabricator.wikimedia.org/T286065 (MoritzMuehlenhoff)
[08:35:59] netops, Infrastructure-Foundations, SRE: Adjust egress buffer allocations on ToR switches - https://phabricator.wikimedia.org/T284592 (cmooney)
[08:36:16] netops, Analytics, DBA, Infrastructure-Foundations, and 3 others: Switch buffer re-partition - Eqiad Row D - https://phabricator.wikimedia.org/T286069 (cmooney) Open→Resolved
[08:51:38] netops, Analytics, DBA, Infrastructure-Foundations, and 2 others: Switch buffer re-partition - Eqiad Row C - https://phabricator.wikimedia.org/T286065 (cmooney)
[08:52:47] netops, Analytics, DBA, Infrastructure-Foundations, and 2 others: Switch buffer re-partition - Eqiad Row C - https://phabricator.wikimedia.org/T286065 (cmooney)
[09:01:11] netops, DBA, Infrastructure-Foundations, SRE: Switch buffer re-partition - Eqiad Row B - https://phabricator.wikimedia.org/T286061 (cmooney)
[09:01:34] Traffic, netops, DBA, Infrastructure-Foundations, and 2 others: Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (cmooney)
[09:02:19] Traffic, netops, DBA, Infrastructure-Foundations, and 2 others: Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (cmooney)
[09:31:37] Traffic, Performance-Team, SRE, SRE-swift-storage, Patch-For-Review: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (fgiunchedi) In light of longer than expected lead time for new ms-be hardware (T284953) I'd like to explore again the object expi...
[10:22:31] Traffic, netops, DBA, Infrastructure-Foundations, and 2 others: Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (hnowlan)
[10:27:47] Traffic, netops, DBA, Infrastructure-Foundations, and 2 others: Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (MoritzMuehlenhoff)
[10:28:33] Traffic, netops, DBA, Infrastructure-Foundations, and 2 others: Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (MoritzMuehlenhoff)
[13:06:05] Traffic, netops, DBA, Infrastructure-Foundations, and 2 others: Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (fgiunchedi)
[13:06:58] netops, Analytics, DBA, Infrastructure-Foundations, and 2 others: Switch buffer re-partition - Eqiad Row C - https://phabricator.wikimedia.org/T286065 (fgiunchedi)
[13:26:06] Traffic, SRE: Sudden surge of requests to https://wikipedia.org/ from Telus customers - https://phabricator.wikimedia.org/T276213 (Aklapper) Four months later: Is this something to still further investigate, or can this be closed?
[15:46:34] Traffic, Performance-Team, SRE, SRE-swift-storage, Patch-For-Review: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (dpifke) Sorry, this kinda dropped off my radar. The object-expirer works (it's been running in deployment-prep with no issues fo...
[16:24:09] Traffic, Performance-Team, SRE, SRE-swift-storage, Patch-For-Review: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (dpifke) Looks like we're already tracking DELETEs, e.g. the second panel in https://grafana-rw.wikimedia.org/d/OPgmB1Eiz/swift?or...
[16:44:03] netops, Continuous-Integration-Infrastructure, DC-Ops, Infrastructure-Foundations, serviceops: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ) - https://phabricator.wikimedia.org/T283582 (Dzahn) ACKed some more today, gerrit2001.mgmt, wdqs2002.mgmt
[17:52:43] netops, Infrastructure-Foundations, SRE, fundraising-tech-ops: Automate diff and commit of frack ACL - https://phabricator.wikimedia.org/T260655 (Jgreen) a:Jgreen→None
[18:29:52] netops, Infrastructure-Foundations: cr2-codfw:fpc0 crash - https://phabricator.wikimedia.org/T287110 (ayounsi) p:Triage→High
[18:31:01] netops, Infrastructure-Foundations: cr2-codfw:fpc0 crash - https://phabricator.wikimedia.org/T287110 (ayounsi)
[18:37:43] netops, Infrastructure-Foundations: cr2-codfw:fpc0 crash - https://phabricator.wikimedia.org/T287110 (ayounsi) > Case ID 2021-0721-0486 has been created for you.
[19:27:52] netops, Infrastructure-Foundations, SRE: cr2-codfw:fpc0 crash - https://phabricator.wikimedia.org/T287110 (cmooney) First related log I can find referencing FPC (ae interface down logs were before). Jul 21, 2021 @ 17:53:34.000 CMTFPC: Fabric request time out pfe 0 plane 1 pg 0, trying recovery....
[23:06:06] netops, Infrastructure-Foundations, SRE: cr2-codfw:fpc0 crash - https://phabricator.wikimedia.org/T287110 (Papaul) Email from JTAC ` Please perform a physical re-seat of the card. Remove it and insert it back into the chassis. If this doesn’t work, we’ll proceed with a replacement. `