[09:06:04] serviceops, SRE: mcrouter crashing on mwmaint2002 - https://phabricator.wikimedia.org/T288787 (jijiki) mcrouter in mwmaint2002 is on version 0.37, and I found this: [[ https://github.com/wikimedia/operations-debs-mcrouter/blob/upstream/ProxyDestination.cpp#L341 | ProxyDestination.cpp ]]. There is no poin...
[09:30:53] serviceops, SRE, Patch-For-Review: mcrouter crashing on mwmaint2002 - https://phabricator.wikimedia.org/T288787 (Dzahn) upgrading mwmaint2002 will happen on or after September 13th, the day of the DC switch.
[11:56:35] hi, I'm testing a communication problem between certain swift operations and envoy, apparently the bandaid is to set envoy.reloadable_features.strict_1xx_and_204_response_headers=false in envoy, but for the life of me I can't figure out what/where I'm supposed to put in the configuration
[11:56:40] any hint/pointers ?
[12:11:36] serviceops, SRE, Patch-For-Review: bring 43 new mediawiki appserver in eqiad into production - https://phabricator.wikimedia.org/T279309 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jelto on cumin1001.eqiad.wmnet for hosts: ` mw1455.eqiad.wmnet ` The log can be found in `/var/log/wmf-...
[12:20:10] serviceops, SRE-swift-storage, envoy: Envoy and swift HEAD with 204 response turns into 503 - https://phabricator.wikimedia.org/T288815 (fgiunchedi)
[12:20:16] filed as ^
[12:21:39] serviceops, SRE, Patch-For-Review: bring 43 new mediawiki appserver in eqiad into production - https://phabricator.wikimedia.org/T279309 (Dzahn)
[12:35:14] serviceops, SRE, Patch-For-Review: bring 43 new mediawiki appserver in eqiad into production - https://phabricator.wikimedia.org/T279309 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1455.eqiad.wmnet'] ` and were **ALL** successful.
[13:13:25] serviceops, SRE, Patch-For-Review: bring 43 new mediawiki appserver in eqiad into production - https://phabricator.wikimedia.org/T279309 (Jelto)
[13:29:34] godog: rzl might have an idea
[13:30:08] serviceops, SRE-swift-storage, envoy: Envoy and swift HEAD with 204 response turns into 503 - https://phabricator.wikimedia.org/T288815 (fgiunchedi)
[13:30:14] jelto mutante, there was a setback removing mcrouter using our generated ssl certs
[13:30:21] I will merge the temp fix on monday
[13:30:30] effie: thank you! appreciate it
[13:30:50] jelto mutante more info in https://phabricator.wikimedia.org/T288787
[13:38:38] thanks Effie, we were just talking about the certs, we will just create them for now
[13:39:10] I will merge the fix on monday, if you can wait
[15:43:46] effie: ohhh, I missed that it's a different mcrouter version, that explains it
[15:43:49] thanks for looking
[15:44:53] it was an unexpected error, and it hit hours after I deployed the change
[15:45:12] anyway, it is just delaying the code cleanup part, thanks for reverting it yesterday
[15:45:37] 👍
[15:54:01] godog: it looks like that setting only exists as "runtime config," there's not a permanent option -- that means the setting is only there as a temporary backwards-compatibility thing to keep using the old deprecated behavior, it'll go away in a future version
[15:54:14] godog: but there is a place to insert it in the config to set that setting on startup, I'll find it for you
[15:59:24] rzl: nice, thank you! yeah I tried to figure that place out myself and failed miserably heh
[15:59:41] also I pinged swift upstream because that's arguably the culprit
[16:00:20] the bandaid is good enough for me now but obviously doesn't survive a restart
[16:01:01] yeah, I think it'll have to be fixed upstream, eventually -- not sure how long Envoy plans to keep this feature flag
[16:02:19] oh, not long at all, it's already removed in 1.19 https://github.com/envoyproxy/envoy/issues/14651
[16:02:21] indeed, something else I noticed and I wasn't expecting that after POSTing runtime_modify I kept getting 503s for a little bit, in the order of minutes
[16:03:44] interesting, the changelog mentions a rename
[16:03:54] https://www.envoyproxy.io/docs/envoy/latest/version_history/v1.19.0 that is
[16:04:17] but a bandaid nevertheless
[16:04:43] oh you're right, just splitting up the features for incoming and outgoing headers
[16:04:47] but, still yeah
[16:06:06] we'll see what swift upstream says, it is also possible I'm holding it wrong
[16:06:22] I'm signing off, thanks again effie rzl
[16:07:05] btw I think we'll need to modify https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/envoyproxy/files/build_envoy_config.py -- we don't currently have anything in the layered_runtime field
[16:07:19] but the envoy.yaml should look like this: https://github.com/envoyproxy/envoy/blob/95038feabf260c3937465951d5da603d31ea3bd4/configs/using_deprecated_config.yaml#L65
[16:07:24] I'll post on phab, have a good evening :)
[17:05:34] serviceops, SRE: Update MaxMind GeoIP2 license key and product IDs for application servers - https://phabricator.wikimedia.org/T288844 (phuedx)
[17:14:03] serviceops, SRE, Patch-For-Review: bring 43 new mediawiki appserver in eqiad into production - https://phabricator.wikimedia.org/T279309 (Jelto)
[18:10:06] serviceops, MW-on-K8s, SRE, Release-Engineering-Team (Radar): The restricted/mediawiki-webserver image should include skins and resources - https://phabricator.wikimedia.org/T285232 (dancy) Note: There is always a delay of 3 seconds before the 500 response is returned.
[18:16:34] serviceops, MW-on-K8s, SRE: Make HTTP calls work within mediawiki on kubernetes - https://phabricator.wikimedia.org/T288848 (Krinkle)
[18:27:11] serviceops, SRE-swift-storage, envoy: Envoy and swift HEAD with 204 response turns into 503 - https://phabricator.wikimedia.org/T288815 (RLazarus) Summarizing the discussion from IRC: - "Permanent" is relative -- it looks like this only exists as a runtime option for temporary backward compatibility...
[18:28:46] serviceops, MW-on-K8s, SRE, observability: Make logging work for mediawiki in k8s - https://phabricator.wikimedia.org/T288851 (Krinkle)
[18:29:32] serviceops, MW-on-K8s, SRE, Release-Engineering-Team (Radar): The restricted/mediawiki-webserver image should include skins and resources - https://phabricator.wikimedia.org/T285232 (dancy) favicon.ico issue is an example of T288848
[18:55:00] serviceops, IP Info, SRE: Update MaxMind GeoIP2 license key and product IDs for application servers - https://phabricator.wikimedia.org/T288844 (phuedx)
[20:55:22] serviceops: Fix the php7adm "apcu dump" command - https://phabricator.wikimedia.org/T288866 (Krinkle)
[20:56:02] serviceops, Performance-Team: Rewrite mw-warmup.js in Python - https://phabricator.wikimedia.org/T288867 (Krinkle)
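
Note on the 16:07 layered_runtime discussion: a minimal sketch of how a config builder might add a static runtime layer so the flag is set at startup rather than through runtime_modify. It assumes the builder works on a plain dict that is later serialized to envoy.yaml. The helper name `with_runtime_overrides` and the layer name `static_layer_0` are hypothetical and are not taken from build_envoy_config.py; only the flag name comes from the log. Per the 16:02 to 16:04 messages the flag is removed or renamed as of Envoy 1.19, so this would be a stopgap at best.

```python
import copy
import json

# Flag name as quoted in the log; everything else in this sketch is illustrative.
STRICT_HEADERS_FLAG = "envoy.reloadable_features.strict_1xx_and_204_response_headers"


def with_runtime_overrides(bootstrap, overrides, layer_name="static_layer_0"):
    """Return a copy of an Envoy bootstrap dict with a static runtime layer.

    `bootstrap` is a plain dict mirroring envoy.yaml and `overrides` maps
    runtime keys to values. The input dict is left unmodified.
    """
    config = copy.deepcopy(bootstrap)
    layers = config.setdefault("layered_runtime", {}).setdefault("layers", [])
    # Drop any existing layer with the same name so repeated calls do not
    # pile up duplicate layers.
    layers[:] = [layer for layer in layers if layer.get("name") != layer_name]
    layers.append({"name": layer_name, "static_layer": dict(overrides)})
    return config


if __name__ == "__main__":
    # Hypothetical starting config; a real bootstrap has far more in it.
    base = {"static_resources": {"listeners": [], "clusters": []}}
    patched = with_runtime_overrides(base, {STRICT_HEADERS_FLAG: False})
    # JSON is used only to keep the sketch stdlib-only.
    print(json.dumps(patched["layered_runtime"], indent=2))
```

Whether an admin layer also needs to be listed for runtime_modify to keep working alongside the static layer is not settled in the log and should be checked against the Envoy runtime documentation.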