[00:55:57] (EdgeTrafficDrop) firing: 69% request drop in text@esams during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=esams&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DEdgeTrafficDrop
[01:00:57] (EdgeTrafficDrop) resolved: 69% request drop in text@esams during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=esams&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DEdgeTrafficDrop
[04:05:56] (EdgeTrafficDrop) firing: (2) 62% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DEdgeTrafficDrop
[04:10:56] (EdgeTrafficDrop) firing: (2) 67% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DEdgeTrafficDrop
[04:15:56] (EdgeTrafficDrop) resolved: (2) 67% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DEdgeTrafficDrop
[08:56:27] Traffic, Thumbor, affects-Kiwix-and-openZIM: Unjustified HTTP 429 responses lead to "endless" Wikipedia scrapes - https://phabricator.wikimedia.org/T304814 (Krinkle)
[09:24:43] Traffic, SRE, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mmandere@cumin1001 for host cp2033.codfw.wmnet with OS buster
[09:54:56] (EdgeTrafficDrop) firing: 68% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=codfw&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DEdgeTrafficDrop
[09:59:57] (EdgeTrafficDrop) resolved: 68% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=codfw&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DEdgeTrafficDrop
[10:13:14] Traffic, SRE, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mmandere@cumin1001 for host cp2033.codfw.wmnet with OS buster com...
[11:55:32] Traffic, SRE, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mmandere@cumin1001 for host cp2031.codfw.wmnet with OS buster
[12:37:44] netops, Infrastructure-Foundations, SRE: IPv6 BFD Sessions Failing from Bird (Anycast VMs) to Juniper QFX in drmrs - https://phabricator.wikimedia.org/T304501 (ayounsi) thanks for documenting it, and yes, I fully agree. We have BGP configured to the core-routers loopback in many different locations...
[12:38:29] Traffic, SRE, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mmandere@cumin1001 for host cp2031.codfw.wmnet with OS buster com...
[12:56:56] (EdgeTrafficDrop) firing: 60% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=codfw&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DEdgeTrafficDrop
[12:57:35] Traffic, SRE, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mmandere@cumin1001 for host cp2029.codfw.wmnet with OS buster
[13:21:57] (EdgeTrafficDrop) resolved: 67% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=codfw&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DEdgeTrafficDrop
[13:40:27] Traffic, SRE, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mmandere@cumin1001 for host cp2029.codfw.wmnet with OS buster com...
[14:36:12] Traffic, SRE, Thumbor, affects-Kiwix-and-openZIM: Unjustified HTTP 429 responses lead to "endless" Wikipedia scrapes - https://phabricator.wikimedia.org/T304814 (AntiCompositeNumber) 429 is returned when the thumbnail hits one of four ratelimits (see https://wikitech.wikimedia.org/wiki/Thumbor#Th...
[14:37:20] Traffic, SRE, Thumbor, affects-Kiwix-and-openZIM: MWoffliner scrapes slowed down by Thumbor failure throttling 429s - https://phabricator.wikimedia.org/T304814 (AntiCompositeNumber)
[14:54:37] Traffic, SRE, Thumbor, affects-Kiwix-and-openZIM: MWoffliner scrapes slowed down by Thumbor failure throttling 429s - https://phabricator.wikimedia.org/T304814 (AntiCompositeNumber) The actual failure for this thumbnail is ` ImageMagickException: Failed to convert image convert: IDAT: invalid di...
[14:57:56] bblack: o/ Any chance you've got bandwidth to deploy https://gerrit.wikimedia.org/r/c/operations/puppet/+/765485?
[15:03:30] Traffic, SRE, Thumbor, affects-Kiwix-and-openZIM: MWoffliner scrapes slowed down by Thumbor failure throttling 429s - https://phabricator.wikimedia.org/T304814 (Kelson) >>! In T304814#7810926, @AntiCompositeNumber wrote: > 429 is returned when the thumbnail hits one of four ratelimits (see https:...
[15:09:29] phuedx: yeah I have a short window now
[15:13:30] phuedx: it's going out now, will take ~30 minutes to spread around the fleet
[15:14:05] Oh? Just a regular Puppet run?
[15:14:09] yes
[15:14:12] Neato
[15:14:23] Thanks <3
[15:14:32] I manually ran agent on one node to confirm no major rollout issues, but the rest will go out "naturally"
[15:14:35] np
[16:27:33] Traffic, SRE, Thumbor, affects-Kiwix-and-openZIM: MWoffliner scrapes slowed down by Thumbor failure throttling 429s - https://phabricator.wikimedia.org/T304814 (herron) p:Triage→Medium
[17:12:17] Traffic, SRE, envoy, serviceops, Patch-For-Review: Upgrade Envoy to supported version - https://phabricator.wikimedia.org/T300324 (JMeybohm) >>! In T300324#7801057, @RLazarus wrote: > Hmm, the 1.21.1 build didn't work out of the box. Running `build-envoy-deb buster future` got me this: > > `...
[20:06:57] (EdgeTrafficDrop) firing: 52% request drop in text@eqsin during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=eqsin&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DEdgeTrafficDrop
[20:11:57] (EdgeTrafficDrop) resolved: 54% request drop in text@eqsin during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=eqsin&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DEdgeTrafficDrop
[20:26:23] Traffic, MediaWiki-Stakeholders-Group, SRE, Wikipedia-Android-App-Backlog, Performance-Team (Radar): RFC: API-driven web front-end - https://phabricator.wikimedia.org/T111588 (cscott) A lot more work was done under {T114542} and eventually several projects were started up. #marvin was the off...
[20:41:02] netops, DC-Ops, Infrastructure-Foundations, SRE, ops-eqiad: Q2:(Need By: TBD) replace mr1-eqiad - https://phabricator.wikimedia.org/T294474 (Jclark-ctr)
[20:42:13] netops, DC-Ops, Infrastructure-Foundations, SRE, ops-eqiad: Q2:(Need By: TBD) replace mr1-eqiad - https://phabricator.wikimedia.org/T294474 (Jclark-ctr) Racked and cabled updated netbox with connections
[21:00:04] Traffic, SRE, Thumbor, affects-Kiwix-and-openZIM: MWoffliner scrapes slowed down by Thumbor failure throttling 429s - https://phabricator.wikimedia.org/T304814 (AntiCompositeNumber) > But, even if we agree with that, what is sure is that it can not be that a random final user, after one request,...
[21:19:09] Traffic, MediaWiki-Stakeholders-Group, SRE, Wikipedia-Android-App-Backlog, Performance-Team (Radar): RFC: API-driven web front-end - https://phabricator.wikimedia.org/T111588 (Krinkle) A more narrow proposal, driven by specific performance and user experience outcomes, exists at {T140664} as...