[00:48:41] (PurgedHighEventLag) firing: High event process lag with purged on cp5025:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=eqsin%20prometheus/ops&var-instance=cp5025 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[04:48:41] (PurgedHighEventLag) firing: High event process lag with purged on cp5025:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=eqsin%20prometheus/ops&var-instance=cp5025 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[07:14:11] netops, Infrastructure-Foundations, SRE, ops-eqiad: [eqiad] faulty VC optics - https://phabricator.wikimedia.org/T325803 (ayounsi) Unfortunately that didn't solve it for all switches: asw2-c-eqiad is all good, but A and B are still showing errors. asw2-a-eqiad: fpc1:port: 1/1 - CRC alignment er...
[08:48:41] (PurgedHighEventLag) firing: High event process lag with purged on cp5025:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=eqsin%20prometheus/ops&var-instance=cp5025 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[11:18:21] netops, Infrastructure-Foundations, SRE, SRE-OnFire, Sustainability (Incident Followup): Upgrade POPs asw to Junos 21 - https://phabricator.wikimedia.org/T316532 (ayounsi) I didn't proceed with the upgrade as there was errors. I opened a JTAC case 2023-0109-616616 with: > Hi, > I'm trying to...
[11:48:41] (PurgedHighEventLag) resolved: (2) High event process lag with purged on cp5025:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=eqsin%20prometheus/ops&var-instance=cp5025 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[11:51:49] netops, Infrastructure-Foundations, SRE, SRE-OnFire, Sustainability (Incident Followup): Upgrade POPs asw to Junos 21 - https://phabricator.wikimedia.org/T316532 (ayounsi) JTAC replied with (I cherry picked the useful info): > As JTAC we would suggest that you perform a step upgrade in your c...
[14:42:59] Traffic, SRE: Fix LVS "sh" shortcomings - https://phabricator.wikimedia.org/T86651 (LSobanski)
[14:45:12] Traffic, SRE: Fix LVS "sh" shortcomings - https://phabricator.wikimedia.org/T86651 (BBlack)
[14:59:52] netops, Infrastructure-Foundations, ops-codfw: codfw: Relocate servers racked in U27 in all racks in rowA and rowB - https://phabricator.wikimedia.org/T326564 (Papaul)
[16:19:26] netops, Infrastructure-Foundations, SRE, ops-eqiad: [eqiad] faulty VC optics - https://phabricator.wikimedia.org/T325803 (ayounsi) Replaced and counters cleared. Let's check in a couple days.
[16:22:33] netops, Infrastructure-Foundations, SRE, SRE-OnFire, Sustainability (Incident Followup): Upgrade POPs asw to Junos 21 - https://phabricator.wikimedia.org/T316532 (ayounsi)
[18:29:56] Traffic, SRE: libvmod-netmapper: must specify ABI stanza - https://phabricator.wikimedia.org/T266567 (BCornwall) @Vgutierrez, since you were involved with the libvmod-netmapper upgrades, would you say that this 2-year-old issue is fixed?
[18:30:31] Traffic, SRE: libvmod-netmapper: must specify ABI stanza - https://phabricator.wikimedia.org/T266567 (BCornwall) Ugh, sorry, meant to put that in T266651
[18:50:17] Traffic, Performance-Team, SRE, Performance Issue: en.wiki slow to respond when editing, and occasionally throws an error with Chrome search shortcuts, or blocked because missing HTTPS - https://phabricator.wikimedia.org/T326496 (MBinder_WMF) I can report today that while the site is still slower...
[19:45:25] Traffic, SRE: oom killed varnish on cp4052 - https://phabricator.wikimedia.org/T325797 (BBlack) We have the patched package on cp5032 (bullseye). Did some manual testing on it today: * With stock config, can still reproduce the large transient spike by running `hey` with default params against a large...
[19:54:48] netops, Data-Services, Infrastructure-Foundations, Wikidata, and 5 others: Do not rate limit dumps from internal network - https://phabricator.wikimedia.org/T222349 (Hokwelum) >>! In T222349#8485437, @bking wrote: > Per above patches and discussion with @akosiaris , we are going to try using the...
[20:30:28] Traffic, SRE, Patch-For-Review: Set CORS headers on error pages? - https://phabricator.wikimedia.org/T270526 (BCornwall) Open→Resolved Thankfully, we have thumbor to give us a test! :D ` [~]$ curl -I 'https://upload.wikimedia.org/wikipedia/commons/thumb/b/ba/Circuit_de_la_Sarthe_track_map.sv...
[23:33:28] Traffic, Fundraising-Backlog, SRE: nginx SSL_do_handshake failed - https://phabricator.wikimedia.org/T326601 (AnnWF)
[23:34:07] Traffic, Fundraising-Backlog, SRE: nginx SSL_do_handshake failed - https://phabricator.wikimedia.org/T326601 (AnnWF)