[07:55:46] Traffic, SRE: Intermittent access issues to English Wikipedia on desktop/laptop - https://phabricator.wikimedia.org/T402142#11097062 (Josve05a) > Follow-up from ticket #2025081710002753: > > - OS: Windows 11 > - Browser: Chromium v126.0.6478.251 > - Browser add-ons: uBlock Origin, Shazam, Don't f***...
[10:35:56] Hello, we are rolling out a new DSE cluster in codfw would anyone be available to help roll out the LVs changes here https://gerrit.wikimedia.org/r/c/operations/puppet/+/1178834 ?
[11:13:52] stevemunene: left two comments on the CR
[11:58:28] Thanks vgutierrez updated to reflect the suggestions
[13:06:37] stevemunene: so you should merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/1178834 first. Use conftool to set the realservers as pooled and with the right weights and then you can move forward with the 2nd one
[13:50:51] FIRING: FermMSS: Unexpected MSS value on 10.2.2.27:80 @ ms-fe1019 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=4&var-site=eqiad&var-cluster=swift - https://alerts.wikimedia.org/?q=alertname%3DFermMSS
[13:52:53] Amir1: ^^ today it looks like ms-fe1019 is upset
[13:58:22] yeah it's pretty upset
[13:58:26] 09:58:07 <+icinga-wm> PROBLEM - Swift https backend on ms-fe1019 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:00:51] RESOLVED: FermMSS: Unexpected MSS value on 10.2.2.27:80 @ ms-fe1019 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=4&var-site=eqiad&var-cluster=swift - https://alerts.wikimedia.org/?q=alertname%3DFermMSS
[14:01:21] FIRING: [2x] FermMSS: Unexpected MSS value on 10.2.2.27:80 @ ms-fe1017 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=4&var-site=eqiad&var-cluster=swift
- https://alerts.wikimedia.org/?q=alertname%3DFermMSS
[14:01:31] wow
[14:03:15] yeah both are unhappy
[14:04:30] > END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe1019.eqiad.wmnet with OS bullseye
[14:04:36] > END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe1017.eqiad.wmnet with reason: host reimage
[14:06:21] RESOLVED: [2x] FermMSS: Unexpected MSS value on 10.2.2.27:80 @ ms-fe1017 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=4&var-site=eqiad&var-cluster=swift - https://alerts.wikimedia.org/?q=alertname%3DFermMSS
[14:14:40] Traffic, MediaWiki-extensions-QuickInstantCommons, MediaWiki-File-management, MediaWiki-Platform-Team, and 5 others: Make InstantCommons and other uses of ForeignApiRepo use WMF policy-compliant user agents - https://phabricator.wikimedia.org/T400881#11098531 (Tgr) The patches are merged, and I a...
[14:39:12] sukhe: given that we think the maxconn change should be a noop from metrics, any objections to me letting it roll out organically (after quick manual testing on one cp host) ?
[14:44:14] vgutierrez: maybe related maybe not, but one of the backends is throwing segfaults left and right, which is probably because the driver/disk is kaput. T402247
[14:44:15] T402247: rsyslog is segfaulting non-stop on ms-be1071 - https://phabricator.wikimedia.org/T402247
[14:44:48] maybe urandom could take a look? I need to go afk for ~five hours
[14:44:59] six actually
[14:49:40] Amir1: I need maybe 20mins or so before I can be in front of a keyboard, but I’ll have a look asap
[14:50:53] 20 minutes is definitely better than six hours so no worries
[14:53:22] specially if someone can take a look at ms-be1071's disk, I'd appreciate it.
[14:57:26] lots of I/O errors on sdg, should we open a ticket for swapping the disk?
[14:59:44] cdanis: none, thanks for checking!
[15:00:51] FIRING: FermMSS: Unexpected MSS value on 10.2.2.27:80 @ ms-fe1020 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=4&var-site=eqiad&var-cluster=swift - https://alerts.wikimedia.org/?q=alertname%3DFermMSS
[15:05:51] RESOLVED: FermMSS: Unexpected MSS value on 10.2.2.27:80 @ ms-fe1020 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=4&var-site=eqiad&var-cluster=swift - https://alerts.wikimedia.org/?q=alertname%3DFermMSS
[15:06:51] FIRING: FermMSS: Unexpected MSS value on 10.2.2.27:80 @ ms-fe1020 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=4&var-site=eqiad&var-cluster=swift - https://alerts.wikimedia.org/?q=alertname%3DFermMSS
[15:07:46] yeah... ms-fe@eqiad is definitely not happy
[15:11:51] RESOLVED: FermMSS: Unexpected MSS value on 10.2.2.27:80 @ ms-fe1020 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=4&var-site=eqiad&var-cluster=swift - https://alerts.wikimedia.org/?q=alertname%3DFermMSS
[15:31:41] ms-fe1020 was also being reimaged (I am not sure why all of these are being but yeah)
[15:35:35] yeah.. a reimage triggers that kind of errors
[15:35:43] downtime needs some improvement obviously
[15:38:08] They are refresh/expansions but they are not set to go to production now. If lvs is picking them up than something is wrong
[15:40:15] https://phabricator.wikimedia.org/T401448
[15:40:31] They shouldn't go in rotation right away
[15:41:13] Amir1: yeah.. that's right, I don't see those hosts on https://config-master.wikimedia.org/pybal/eqiad/swift