[00:11:57] (VarnishTrafficDrop) firing: 56% GET drop in text@ during the past 30 minutes - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org
[00:21:57] (VarnishTrafficDrop) resolved: 61% GET drop in text@ during the past 30 minutes - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org
[07:00:51] Traffic, SRE, User-MoritzMuehlenhoff: Unexpected auditd service restart failure - https://phabricator.wikimedia.org/T287266 (MoritzMuehlenhoff)
[08:40:55] Traffic, Performance-Team, SRE, SRE-swift-storage, Patch-For-Review: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (fgiunchedi) >>! In T211661#7227857, @dpifke wrote: > Looks like we're already tracking DELETEs, e.g. the second panel in https://...
[09:19:02] netops, Infrastructure-Foundations, SRE, Datacenter-Switchover, User-fgiunchedi: Record traffic flows in and out of eqiad during switchover - https://phabricator.wikimedia.org/T286038 (fgiunchedi)
[10:36:01] (VarnishTrafficDrop) firing: 55% GET drop in text@ during the past 30 minutes - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org
[10:40:57] (VarnishTrafficDrop) resolved: (3) 65% GET drop in text@ during the past 30 minutes - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org
[10:47:43] (VarnishTrafficDrop) firing: (3) 61% GET drop in text@ during the past 30 minutes - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org
[10:52:27] (VarnishTrafficDrop) firing: (3) 68% GET drop in text@ during the past 30 minutes - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org
[10:57:27] (VarnishTrafficDrop) firing: (3) 67% GET drop in text@ during the past 30 minutes - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org
[11:02:27] (VarnishTrafficDrop) resolved: (3) 67% GET drop in text@ during the past 30 minutes - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org
[14:40:15] netops, Infrastructure-Foundations, SRE: cr2-codfw:fpc0 crash - https://phabricator.wikimedia.org/T287110 (Papaul) FPC0 S/N updated in Netbox
[14:44:22] netops, Infrastructure-Foundations, SRE: cr2-codfw:fpc0 crash - https://phabricator.wikimedia.org/T287110 (Papaul) Shipped out faulty line card today. Tracking information below {F34564155}
[15:51:18] netops, Analytics-Clusters, Infrastructure-Foundations, SRE: Automate ingestion of netflow event stream - https://phabricator.wikimedia.org/T248865 (Ottomata)
[16:05:16] netops, DBA, Infrastructure-Foundations, SRE, Patch-For-Review: Switch buffer re-partition - Eqiad Row B - https://phabricator.wikimedia.org/T286061 (MoritzMuehlenhoff)
[16:06:23] netops, DBA, Infrastructure-Foundations, SRE, Patch-For-Review: Switch buffer re-partition - Eqiad Row B - https://phabricator.wikimedia.org/T286061 (MoritzMuehlenhoff)
[16:14:53] netops, DBA, Infrastructure-Foundations, SRE, Patch-For-Review: Switch buffer re-partition - Eqiad Row B - https://phabricator.wikimedia.org/T286061 (Bstorm)
[16:17:08] netops, DBA, Infrastructure-Foundations, SRE, Patch-For-Review: Switch buffer re-partition - Eqiad Row B - https://phabricator.wikimedia.org/T286061 (Bstorm) Cloud team has decided we have too much in this row, and since breakage is possible if we freeze the cloud intentionally, we are going...
[18:56:57] (VarnishTrafficDrop) firing: 61% GET drop in text@ during the past 30 minutes - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org
[19:01:57] (VarnishTrafficDrop) resolved: 61% GET drop in text@ during the past 30 minutes - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org
[19:03:03] (VarnishTrafficDrop) firing: 60% GET drop in text@ during the past 30 minutes - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org
[19:07:57] (VarnishTrafficDrop) resolved: 61% GET drop in text@ during the past 30 minutes - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org
[23:14:41] Traffic, SRE, Sustainability (Incident Followup): LVS can't handle losing a NIC on eqiad and codfw - https://phabricator.wikimedia.org/T286924 (Legoktm) p:Triage→Medium [ Setting priority as part of clinic duty, please retriage if incorrect ]
[23:15:00] Traffic, DC-Ops, SRE, Sustainability (Incident Followup): Audit eqiad & codfw LVS network links - https://phabricator.wikimedia.org/T286881 (Legoktm) p:Triage→High
[23:15:21] Traffic, DC-Ops, SRE, ops-codfw, Sustainability (Incident Followup): lvs2007, lvs2009 and lvs2010 connected to the same row A switch - https://phabricator.wikimedia.org/T286879 (Legoktm) p:Triage→High
[23:16:48] Traffic, SRE, observability, Sustainability (Incident Followup): Per-country Frontend Traffic dashboards - https://phabricator.wikimedia.org/T286554 (Legoktm) p:Triage→Low
[23:17:35] Traffic, SRE, serviceops, Patch-For-Review, User-jijiki: Access mwdebug kubernetes deployment via the 'X-Wikimedia-Debug' header - https://phabricator.wikimedia.org/T286491 (Legoktm) p:Triage→Medium
[23:18:57] Traffic, SRE, serviceops, Patch-For-Review, User-jijiki: Access mwdebug kubernetes deployment via the 'X-Wikimedia-Debug' header - https://phabricator.wikimedia.org/T286491 (Legoktm) @jijiki is {T286482} a duplicate of this one? To me it looks like both tasks have basically the same checklist
[23:19:03] Traffic, SRE, WikimediaDebug, Performance-Team (Radar): Allow ATS to route traffic to mwdebug deployment on kubernetes - https://phabricator.wikimedia.org/T286482 (Legoktm) p:Triage→Medium
[23:50:45] netops, DBA, Infrastructure-Foundations, SRE, Patch-For-Review: Switch buffer re-partition - Eqiad Row B - https://phabricator.wikimedia.org/T286061 (Legoktm) p:Triage→Medium
[23:51:39] netops, Infrastructure-Foundations, SRE, Datacenter-Switchover, User-fgiunchedi: Record traffic flows in and out of eqiad during switchover - https://phabricator.wikimedia.org/T286038 (Legoktm) p:Triage→Medium
[23:53:21] Traffic, netops, DBA, Infrastructure-Foundations, and 2 others: Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (Legoktm) p:Triage→Medium