[00:54:57] (HAProxyEdgeTrafficDrop) firing: 68% request drop in text@esams during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=esams&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[00:59:56] (HAProxyEdgeTrafficDrop) resolved: 69% request drop in text@esams during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=esams&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[05:01:16] (VarnishTrafficDrop) firing: Varnish traffic in eqiad has dropped 66.08128093112053% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[05:02:56] (HAProxyEdgeTrafficDrop) firing: 60% request drop in text@eqiad during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=eqiad&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[05:06:16] (VarnishTrafficDrop) firing: (2) Varnish traffic in eqiad has dropped 55.57954798127524% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[05:07:56] (HAProxyEdgeTrafficDrop) resolved: 63% request drop in text@eqiad during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=eqiad&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[05:11:16] (VarnishTrafficDrop) resolved: (2) Varnish traffic in eqiad has dropped 58.9192945877856% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[06:35:53] netops, DC-Ops, Infrastructure-Foundations, SRE, and 3 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (ArielGlenn) Hey @Ottomata it's great to see these hosts moving closer to being in production! One thing I noticed, they are picking...
[10:51:47] Traffic, Performance-Team, SRE: Enable HTTP compression for arclamp trace logs - https://phabricator.wikimedia.org/T305783 (Aklapper) a:dpifke→None Removing inactive task assignee (please do so as part of offboarding processes).
[10:52:50] Traffic, Performance-Team, SRE: Review socket balancing in ATS/Varnish traffic layers - https://phabricator.wikimedia.org/T248522 (Aklapper) a:dpifke→None Removing inactive task assignee (please do so as part of offboarding processes).
[10:53:42] Traffic, Performance-Team, SRE, SRE-swift-storage, Patch-For-Review: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (Aklapper) a:dpifke→None Removing inactive task assignee (please do so as part of offboarding processes).
[12:15:42] netops, DC-Ops, Infrastructure-Foundations, SRE, and 3 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (Ottomata) Wrong andrew, I think you meant to ping @Andrew ?
[12:20:06] netops, DC-Ops, Infrastructure-Foundations, SRE, and 3 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (ArielGlenn) >>! In T302981#8157062, @Ottomata wrote: > Wrong andrew, I think you meant to ping @Andrew ? Bah, yes I did. Thank you!
[12:24:02] netops, Continuous-Integration-Infrastructure, DC-Ops, SRE, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ))) - https://phabricator.wikimedia.org/T283582 (hashar)
[12:38:29] vgutierrez: will https://gerrit.wikimedia.org/r/c/operations/puppet/+/819677 go out today still?
[12:38:50] yep
[12:39:05] actually let me deploy that right now :)
[12:39:51] cool. thanks :)
[12:42:29] done, give puppet time to deploy it globally
[12:50:28] netops, DC-Ops, Infrastructure-Foundations, SRE, and 3 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (Andrew) Thanks for the suggestion @ArielGlenn. Those hosts are really not working at all right now (something awful is happening wi...
[12:53:30] vgutierrez: i also have a unit test for it in https://gerrit.wikimedia.org/r/c/operations/puppet/+/822715 (didn't want to overload the parent patch since you had already reviewed it)
[12:53:45] I'm running that varnishtest locally as we speak :)
[12:53:56] 0 tests failed, 0 tests skipped, 33 tests passed looking good :D
[12:53:59] weee thank you
[13:32:11] Traffic, MediaWiki-General, SRE, MW-1.39-notes (1.39.0-wmf.23; 2022-08-01), Patch-For-Review: Roll out query parameter normalization - https://phabricator.wikimedia.org/T314868 (ori)
[13:49:04] Traffic, Performance-Team, SRE, SRE-swift-storage, Patch-For-Review: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (MatthewVernon) We had a bit of a chat about this today, and thought it worth noting some of the reasons it would be good to actua...
[15:25:20] netops, Infrastructure-Foundations, SRE, ops-eqiad: Link from lsw1-e1-eqiad to lsw1-f2-eqiad down - https://phabricator.wikimedia.org/T315052 (Cmjohnson) @cmooney The QSFP28 module for et-o/o/54 on lsw1-f3-eqiad has been replaced.
[15:40:53] Traffic, SRE, ops-eqiad: SSH on cp1089.mgmt is flapping - https://phabricator.wikimedia.org/T314951 (Cmjohnson) Open→Resolved replaced the cable
[15:52:13] win 27
[20:31:11] Traffic, MediaWiki-General, SRE, MW-1.39-notes (1.39.0-wmf.23; 2022-08-01), Patch-For-Review: Roll out query parameter normalization - https://phabricator.wikimedia.org/T314868 (ori)
[20:56:48] Traffic, Beta-Cluster-Infrastructure, Infrastructure-Foundations, Puppet: Evaluation Error on deployment-cache-text06 puppet run - https://phabricator.wikimedia.org/T315351 (RhinosF1) p:Triage→Unbreak! Hi Traffic, this might be stopping beta coming back up (or a false alarm). Can you take...
[20:57:13] Hey, can someone see ^
[21:09:58] I'm seeing 'Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: [401 Unauthorized] 401 Authorization Required nginx/1.14.2' instead
[21:29:36] ori: no idea what to do about it though
[21:29:56] That sounds even more broken
[21:50:20] the 'unauthorized' thing is intermittent; the occasional run gets the catalog compile error
[21:50:34] so it sounds like if you fix the duplicate def issue you can get a puppet run to complete
[21:54:50] oh ori, puppet b0rked stuff, is https://phabricator.wikimedia.org/T315379#8159650 helpful/related ?
[22:08:57] (HAProxyEdgeTrafficDrop) firing: (2) 44% request drop in text@eqsin during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[22:13:56] (HAProxyEdgeTrafficDrop) resolved: (5) 62% request drop in text@drmrs during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[22:36:56] netops, Infrastructure-Foundations, Observability-Metrics, SRE, and 2 others: LibreNMS seemingly not collecting data for many ports after migration to netmon1003 - https://phabricator.wikimedia.org/T314972 (andrea.denisse) Hello team, I submitted the following patches for this issue: 1. [[ https...
[22:37:45] netops, Infrastructure-Foundations, Observability-Metrics, SRE, and 2 others: LibreNMS seemingly not collecting data for many ports after migration to netmon1003 - https://phabricator.wikimedia.org/T314972 (andrea.denisse)
[22:54:57] (HAProxyEdgeTrafficDrop) firing: 64% request drop in text@esams during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=esams&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[22:59:56] (HAProxyEdgeTrafficDrop) resolved: 66% request drop in text@esams during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=esams&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[23:28:52] Traffic, Beta-Cluster-Infrastructure, Infrastructure-Foundations, SRE, Puppet: Evaluation Error on deployment-cache-text06 puppet run - https://phabricator.wikimedia.org/T315351 (TheresNoTime) Introduced by https://gerrit.wikimedia.org/r/c/operations/puppet/+/816806 ? `lang=diff diff --git a...