[00:49:42] (SystemdUnitFailed) resolved: export_smart_data_dump.service Failed on cp4037:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:13:06] Traffic, SRE, envoy, serviceops, Patch-For-Review: Upgrade Envoy to supported version - https://phabricator.wikimedia.org/T300324 (JMeybohm)
[11:38:17] Traffic, Commons: After ~750MB file download is cut off - https://phabricator.wikimedia.org/T351876 (Aklapper) Please add relevant project tags - I'm afraid that it will be hard for the #Commons community to fix this entirely themselves :) T210890 comes to my mind. Confirming from Central Europe: ` [ac...
[11:55:22] Traffic, Commons: After ~750MB file download is cut off - https://phabricator.wikimedia.org/T351876 (AlexisJazz) >>! In T351876#9354897, @Aklapper wrote: > Please add relevant project tags - I'm afraid that it will be hard for the #Commons community to fix this entirely themselves :) I wasn't sure what...
[12:00:12] Traffic, Commons: After ~1 minute file download is cut off - https://phabricator.wikimedia.org/T351876 (AlexisJazz)
[12:14:01] Just a heads-up, ncredir ocsp staple expires in a few hours, but there's no runbook. Is it autorenewed/refreshed?
[12:21:02] claime: :?
[12:21:33] Traffic, Commons: After ~1 minute file download is cut off - https://phabricator.wikimedia.org/T351876 (Lucas_Werkmeister_WMDE) I can add another datapoint to it being time-based, on my connection the download finished within less than a minute without any interruption. `lang=shell-session $ time wget "...
[12:21:36] vgutierrez: https://alerts.wikimedia.org/?q=%40state%3Dactive&q=instance%21%3Dwdqs2011&q=team%3Dsre&q=alertname%3DHTTPS%20non-canonical-redirect-3
[12:21:53] that's not right
[12:22:00] * vgutierrez checking
[12:22:06] Thank you :)
[12:22:58] Nov 23 06:00:00 acmechief1001 acme-chief-backend[7721]: Refreshing live OCSP response for certificate non-canonical-redirect-6 / ec-prime256v1
[12:22:58] Nov 23 06:00:00 acmechief1001 acme-chief-backend[7721]: live OCSP response refreshed successfully for non-canonical-redirect-6 / ec-prime256v1
[12:23:00] hmmm
[12:23:02] weird
[12:23:55] It's alerting for -1, -3 and -5
[12:23:59] also for -6
[12:24:03] (with a warning)
[12:24:23] Oh right
[12:26:07] https://www.irccloud.com/pastebin/EojPOGRF/
[12:26:11] that's weird
[12:26:59] ncredir has been migrated to Puppet 7
[12:27:17] but it's only impacting ncredir4001?
[12:27:33] The last Puppet run was at Tue Nov 21 09:28:23 UTC 2023 (3059 minutes ago).
[12:27:36] puppet is broken there
[12:27:54] 2023-11-23T12:13:16.980217+00:00 ncredir4001 puppet-agent[3833399]: Creating a new SSL certificate request for ncredir4001.ulsfo.wmnet
[12:27:54] 2023-11-23T12:13:16.990535+00:00 ncredir4001 puppet-agent[3833399]: Certificate Request fingerprint (SHA256): 1E:36:D3:6E:7F:04:FE:E6:FA:2A:8C:AF:F3:CF:FE:C3:1A:1B:F3:AE:2A:FF:04:36:12:C9:2E:1A:4C:F4:14:37
[12:27:54] 2023-11-23T12:13:17.405861+00:00 ncredir4001 puppet-agent[3833399]: Certificate for ncredir4001.ulsfo.wmnet has not been signed yet
[12:27:58] moritzm: ^^
[12:28:13] moritzm: that seems a problem with ncredir migration to puppet 7
[12:30:32] i've depooled ncredir4001 till puppet is fixed there
[12:31:54] ack, thank you
[13:53:02] Traffic, Commons: After ~1 minute file download is cut off - https://phabricator.wikimedia.org/T351876 (AlexisJazz) >>! In T351876#9354956, @Lucas_Werkmeister_WMDE wrote: > I can add another datapoint to it being time-based, on my connection the download finished within less than a minute without any int...
[13:58:20] Traffic, SRE, SRE-swift-storage: Revisit CDN<-->Swift communication - https://phabricator.wikimedia.org/T317616 (MatthewVernon) In progress→Resolved a:MatthewVernon I think this is now done - ms clusters default to using envoy (I've not done anything to beta, but it should carry on using...
[14:08:15] Traffic, Commons: After ~1 minute file download is cut off - https://phabricator.wikimedia.org/T351876 (Lucas_Werkmeister_WMDE) Indeed. Lower bound for the cutoff time is 66 s: `lang=shell-session,lines=7 $ time wget https://upload.wikimedia.org/wikipedia/commons/6/64/Gameplay_0_A.D._Alpha_26_Gefecht_ge...
[14:23:16] Traffic, DC-Ops, SRE, ops-eqiad: Q1:Install cp11[00-15] and rotate into production - https://phabricator.wikimedia.org/T349244 (Fabfur)
[14:35:10] netops, Infrastructure-Foundations, SRE: Migrate IP gateway for private1-b-codfw to spine switches - https://phabricator.wikimedia.org/T351534 (cmooney)
[15:54:14] Traffic, Commons: After ~1 minute file download is cut off - https://phabricator.wikimedia.org/T351876 (AlexisJazz) With limit-rate it doesn't interrupt: ` lang=log $ time wget --limit-rate 10K "https://upload.wikimedia.org/wikipedia/commons/6/64/Gameplay_0_A.D._Alpha_26_Gefecht_gegen_KI_20221106_Teil_01...
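The 12:27:33 paste ("The last Puppet run was at Tue Nov 21 09:28:23 UTC 2023 (3059 minutes ago)") is what pointed at broken Puppet on ncredir4001 during the ncredir OCSP discussion. The figure is plain elapsed-time arithmetic. What follows is a minimal Python sketch of that staleness check, assuming the channel timestamps are UTC (the journal lines pasted at 12:27:54 are +00:00). `minutes_since` and `is_stale` are hypothetical helpers, and the one-hour threshold is purely illustrative, not the setting of any real alert:

```python
from datetime import datetime, timedelta, timezone


def minutes_since(last: datetime, now: datetime) -> int:
    """Whole minutes elapsed between two timezone-aware datetimes (floored)."""
    return int((now - last).total_seconds() // 60)


def is_stale(last: datetime, now: datetime, max_age: timedelta) -> bool:
    """True if the last successful run is older than max_age.

    Hypothetical helper: the threshold passed in is an illustrative
    assumption, not the value used by any production check.
    """
    return now - last > max_age


if __name__ == "__main__":
    # Last Puppet run as reported in the 12:27:33 paste.
    last_run = datetime(2023, 11, 21, 9, 28, 23, tzinfo=timezone.utc)
    # Time of the 12:27:33 message, assuming the log timestamps are UTC.
    now = datetime(2023, 11, 23, 12, 27, 33, tzinfo=timezone.utc)
    print(minutes_since(last_run, now))  # 3059
    print(is_stale(last_run, now, timedelta(hours=1)))  # True
```

Two days plus 2 h 59 min is 2880 + 179 = 3059 minutes, which matches the pasted figure. Any host whose Puppet agent stopped at that point would also have missed every later refresh of files it distributes, such as the OCSP responses that acme-chief refreshed at 06:00.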