[00:04:55] (SystemdUnitFailed) resolved: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [00:08:25] (SystemdUnitFailed) firing: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [00:33:32] (03CR) 10Tim Starling: [C: 03+1] "Approved" [puppet] - 10https://gerrit.wikimedia.org/r/1010590 (https://phabricator.wikimedia.org/T357877) (owner: 10Ahmon Dancy) [00:38:41] (03PS1) 10TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1010669 [00:38:43] (03CR) 10TrainBranchBot: [C: 03+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1010669 (owner: 10TrainBranchBot) [00:41:54] (03PS1) 10Tim Starling: migrateBlocks.php: Fix infinite loop [core] (wmf/1.42.0-wmf.21) - 10https://gerrit.wikimedia.org/r/1010571 [00:42:19] (03PS1) 10Tim Starling: migrateBlocks.php: Fix infinite loop [core] (wmf/1.42.0-wmf.22) - 10https://gerrit.wikimedia.org/r/1010572 [01:01:25] (03Merged) 10jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1010669 (owner: 10TrainBranchBot) [01:28:21] (03CR) 10TrainBranchBot: [C: 03+2] "Approved by tstarling@deploy2002 using scap backport" [core] (wmf/1.42.0-wmf.21) - 10https://gerrit.wikimedia.org/r/1010571 (owner: 10Tim Starling) [01:28:24] (03CR) 10TrainBranchBot: [C: 03+2] "Approved by tstarling@deploy2002 using scap backport" [core] (wmf/1.42.0-wmf.22) - 10https://gerrit.wikimedia.org/r/1010572 (owner: 10Tim Starling) [01:46:47] (03Merged) 10jenkins-bot: 
migrateBlocks.php: Fix infinite loop [core] (wmf/1.42.0-wmf.21) - 10https://gerrit.wikimedia.org/r/1010571 (owner: 10Tim Starling) [01:49:06] (03Merged) 10jenkins-bot: migrateBlocks.php: Fix infinite loop [core] (wmf/1.42.0-wmf.22) - 10https://gerrit.wikimedia.org/r/1010572 (owner: 10Tim Starling) [01:50:23] !log tstarling@deploy2002 Started scap: Backport for [[gerrit:1010571|migrateBlocks.php: Fix infinite loop]], [[gerrit:1010572|migrateBlocks.php: Fix infinite loop]] [01:52:44] !log tstarling@deploy2002 tstarling: Backport for [[gerrit:1010571|migrateBlocks.php: Fix infinite loop]], [[gerrit:1010572|migrateBlocks.php: Fix infinite loop]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [01:55:35] (03CR) 10Tim Starling: "I am here because the modelines failed with an error in vim 8.2. I am not some sort of human modeline validator." [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/1010664 (owner: 10Tim Starling) [01:55:55] !log tstarling@deploy2002 tstarling: Continuing with sync [02:01:34] PROBLEM - Check whether ferm is active by checking the default input chain on mw1464 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [02:02:06] PROBLEM - Check whether ferm is active by checking the default input chain on kubernetes2046 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [02:07:05] !log tstarling@deploy2002 Finished scap: Backport for [[gerrit:1010571|migrateBlocks.php: Fix infinite loop]], [[gerrit:1010572|migrateBlocks.php: Fix infinite loop]] (duration: 16m 42s) [02:08:31] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [02:08:38] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [02:10:13] !log on mwmaint2002: 
running migrateBlocks.php on all wikis [02:10:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:12:02] (03PS2) 10Tim Starling: Add procps to base images [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/1010690 [02:17:12] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [02:26:49] (03PS3) 10Tim Starling: Fix broken vim modelines [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/1010664 [02:31:34] RECOVERY - Check whether ferm is active by checking the default input chain on mw1464 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [02:32:06] RECOVERY - Check whether ferm is active by checking the default input chain on kubernetes2046 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [02:37:14] (JobUnavailable) firing: (3) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [02:41:04] PROBLEM - Check unit status of httpbb_kubernetes_mw-api-ext_hourly on cumin1002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-api-ext_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [02:41:44] (SwiftTooManyMediaUploads) firing: Too many codfw mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads - https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads [02:42:25] (SystemdUnitFailed) firing: 
httpbb_kubernetes_mw-api-ext_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [02:50:16] PROBLEM - Host asw1-eqsin is DOWN: PING CRITICAL - Packet loss = 100% [02:50:48] PROBLEM - OSPF status on cr2-eqsin is CRITICAL: OSPFv2: 2/3 UP : OSPFv3: 2/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [02:50:52] PROBLEM - OSPF status on cr3-eqsin is CRITICAL: OSPFv2: 2/3 UP : OSPFv3: 2/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [02:51:02] PROBLEM - Host ps1-604-eqsin is DOWN: PING CRITICAL - Packet loss = 100% [02:51:14] PROBLEM - Host ps1-603-eqsin is DOWN: PING CRITICAL - Packet loss = 100% [02:51:24] PROBLEM - Host mr1-eqsin IPv6 is DOWN: PING CRITICAL - Packet loss = 100% [02:57:14] (JobUnavailable) firing: (4) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [03:07:56] PROBLEM - Host mr1-eqsin.oob IPv6 is DOWN: CRITICAL - Host Unreachable (2403:b100:3001:9::2) [03:11:45] (SwiftTooManyMediaUploads) resolved: Too many codfw mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads - https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads [03:35:30] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [03:35:37] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [03:41:04] RECOVERY - Check unit status of httpbb_kubernetes_mw-api-ext_hourly on cumin1002 is OK: OK: Status of the systemd unit 
httpbb_kubernetes_mw-api-ext_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [03:42:25] (SystemdUnitFailed) resolved: httpbb_kubernetes_mw-api-ext_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [03:49:50] RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 46, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [03:50:04] RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 122, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [03:54:07] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [03:54:13] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [04:08:25] (SystemdUnitFailed) firing: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:08:45] (03CR) 10Tim Starling: "I updated the commit message to clarify that this is a real error, not a quibble about the spec." 
[docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/1010664 (owner: 10Tim Starling) [04:34:02] (03CR) 10KartikMistry: [C: 03+2] Update cxserver to 2024-03-12-113634-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1010508 (https://phabricator.wikimedia.org/T350773) (owner: 10KartikMistry) [04:34:55] (03Merged) 10jenkins-bot: Update cxserver to 2024-03-12-113634-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1010508 (https://phabricator.wikimedia.org/T350773) (owner: 10KartikMistry) [04:37:52] !log kartik@deploy2002 helmfile [staging] START helmfile.d/services/cxserver: apply [04:38:17] !log kartik@deploy2002 helmfile [staging] DONE helmfile.d/services/cxserver: apply [04:44:52] !log kartik@deploy2002 helmfile [codfw] START helmfile.d/services/cxserver: apply [04:45:24] !log kartik@deploy2002 helmfile [codfw] DONE helmfile.d/services/cxserver: apply [04:48:52] !log kartik@deploy2002 helmfile [eqiad] START helmfile.d/services/cxserver: apply [04:49:27] !log kartik@deploy2002 helmfile [eqiad] DONE helmfile.d/services/cxserver: apply [04:50:27] !log Updated cxserver to 2024-03-12-113634-production (T350773, T359525) [04:50:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [04:50:32] T350773: Remove preq and use node fetch - https://phabricator.wikimedia.org/T350773 [04:50:33] T359525: MinT: Translation with MinT/Apertium are failing: fetch failed - https://phabricator.wikimedia.org/T359525 [06:00:04] Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240313T0600) [06:08:25] (SystemdUnitFailed) resolved: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [06:12:11] !log @deploy2002 helmfile [eqiad] START 
helmfile.d/services/cirrus-streaming-updater: apply [06:12:18] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [06:13:55] (SystemdUnitFailed) firing: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [06:17:12] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [06:21:45] (SwiftTooManyMediaUploads) firing: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads [06:35:24] RECOVERY - Host ps1-603-eqsin is UP: PING OK - Packet loss = 0%, RTA = 223.47 ms [06:35:24] RECOVERY - Host ps1-604-eqsin is UP: PING OK - Packet loss = 0%, RTA = 223.30 ms [06:35:26] RECOVERY - Host asw1-eqsin is UP: PING OK - Packet loss = 0%, RTA = 224.43 ms [06:35:28] RECOVERY - OSPF status on cr3-eqsin is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [06:35:38] RECOVERY - Host mr1-eqsin.oob IPv6 is UP: PING OK - Packet loss = 0%, RTA = 237.18 ms [06:36:16] RECOVERY - OSPF status on cr2-eqsin is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [06:37:14] (JobUnavailable) firing: (3) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [06:39:58] RECOVERY - Host mr1-eqsin IPv6 is UP: PING OK - 
Packet loss = 0%, RTA = 222.96 ms [06:47:39] SRE-Access-Requests: Not able to access Gerrit - https://phabricator.wikimedia.org/T360006 (cchen) NEW [07:00:05] Amir1 and Urbanecm: Time to do the UTC morning backport window deploy. Don't look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240313T0700). [07:00:05] No Gerrit patches in the queue for this window AFAICS. [07:00:40] SRE, SRE-Access-Requests: Not able to access Gerrit - https://phabricator.wikimedia.org/T360006#9625620 (RhinosF1) [07:01:45] (SwiftTooManyMediaUploads) resolved: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads [07:08:36] (CR) Volans: "And the actual solution is Id41ff6d1ce41447a315dee2886771b382eaaebdc, that is pending some additional verification/testing to ensure is no" [software/spicerack] - https://gerrit.wikimedia.org/r/1010507 (owner: Arturo Borrero Gonzalez) [07:10:41] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [07:10:48] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [07:24:19] (CR) Marostegui: [C: +2] admin: add rkhan to group 'restricted' (mwmaint access) [puppet] - https://gerrit.wikimedia.org/r/1010514 (https://phabricator.wikimedia.org/T359490) (owner: Dzahn) [07:24:49] SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to mwmaint for rkhan / Himejijo - https://phabricator.wikimedia.org/T359490#9625647 (Marostegui) [07:25:40] SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to mwmaint for rkhan / Himejijo - https://phabricator.wikimedia.org/T359490#9625649 (Marostegui) Open→Resolved This has been done. Give it 30-45 minutes for the change to get spread across the infra. 
[07:35:44] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [07:35:50] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [07:46:02] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [07:46:09] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [07:49:30] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [07:49:33] OH I got it [07:49:36] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [07:49:41] I know why I was confused yesterday for the train deployment [07:50:02] all the deployment slots have shifted one hour earlier because they are defined relative to a USA time zone [07:50:14] but are intended for non-US consumption [07:50:15] fun [07:50:36] "when=2024-03-13 01:00 SF" [07:50:37] aka PST [07:53:33] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [07:53:39] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [07:59:36] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [07:59:43] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [07:59:47] SRE, SRE-Access-Requests, Gerrit: Not able to access Gerrit - https://phabricator.wikimedia.org/T360006#9625673 (Peachey88) [08:00:04] hashar and jnuche: Deploy window MediaWiki train - Utc-0 Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240313T0800) [08:01:32] jouncebot: refresh [08:01:32] I refreshed my knowledge about deployments. 
[08:01:35] jouncebot: now [08:01:36] For the next 0 hour(s) and 58 minute(s): UTC morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240313T0800) [08:03:00] !log Moved "UTC morning backport window" and "MediaWiki train" deployment windows from PST to UTC, effectively shifting them one hour later. That is due to daylight saving time kicking in at different times. With the change, the windows are now at their usual time relative to CET [08:03:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:03:50] urbanecm: Amir1: jnuche: I have moved the backport window to 9:00 CET and the MediaWiki train to 10:00 CET [08:04:19] the deployment calendar has them defined with San Francisco timezone so they ended up one hour earlier this week [08:04:29] and that explains why I was confused yesterday [08:07:42] ack [08:20:21] SRE, SRE-Access-Requests, Gerrit: Not able to access Gerrit - https://phabricator.wikimedia.org/T360006#9625691 (hashar) Please please give your login account next time which saves a bit of time figuring out which one has to be diagnosed. Eventually I found it via the parent task on T356645#9522774... 
[08:20:25] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [08:20:31] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [08:29:49] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [08:29:55] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [08:30:52] (03PS1) 10JHathaway: jhathaway: update dotfiles [puppet] - 10https://gerrit.wikimedia.org/r/1010848 [08:31:53] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [08:31:59] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [08:32:49] 06SRE, 10SRE-Access-Requests, 10Gerrit: 14Not able to access Gerrit - 14https://phabricator.wikimedia.org/T360006#9625700 (10hashar) 05Open→03Resolved a:03hashar 14I have manually reactivated the account, you should be able to login now :) [08:33:23] (03CR) 10JHathaway: [C: 03+2] jhathaway: update dotfiles [puppet] - 10https://gerrit.wikimedia.org/r/1010848 (owner: 10JHathaway) [08:34:33] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [08:34:39] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [08:38:02] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [08:38:09] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [08:45:06] !log Disable GTID on eqiad es5 master T358199 [08:45:09] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [08:45:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:45:12] T358199: Database pre-switchover tasks March 2024 - https://phabricator.wikimedia.org/T358199 [08:45:16] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [08:47:50] !log 
Disable GTID on eqiad es4 master T358199 [08:47:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:53:03] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [08:53:09] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [08:57:13] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [08:57:19] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [09:00:04] hashar and jnuche: Deploy window MediaWiki train - Utc-0 Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240313T0900) [09:01:11] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [09:01:18] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [09:04:21] I will run the train in a few minutes [09:04:26] I am finishing filing a bug report [09:15:01] 06SRE, 10SRE-Access-Requests, 10Gerrit: 14Not able to access Gerrit - 14https://phabricator.wikimedia.org/T360006#9625758 (10hashar) 14To address the root cause of the Gerrit user not being reactivated when the accounts is unblocked from Wikitech, I have reopened a past task and posted my analysis on it... [09:15:04] ok train [09:16:04] !log isaranto@deploy2002 helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' . [09:16:23] (03PS1) 10TrainBranchBot: group1 wikis to 1.42.0-wmf.22 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1010849 (https://phabricator.wikimedia.org/T354440) [09:16:26] (03CR) 10TrainBranchBot: [C: 03+2] group1 wikis to 1.42.0-wmf.22 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1010849 (https://phabricator.wikimedia.org/T354440) (owner: 10TrainBranchBot) [09:17:10] !log isaranto@deploy2002 helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' . 
[09:17:14] (Merged) jenkins-bot: group1 wikis to 1.42.0-wmf.22 [mediawiki-config] - https://gerrit.wikimedia.org/r/1010849 (https://phabricator.wikimedia.org/T354440) (owner: TrainBranchBot) [09:23:01] (CR) Ilias Sarantopoulos: httpbb: add ores-legacy tests (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1010245 (https://phabricator.wikimedia.org/T359871) (owner: Ilias Sarantopoulos) [09:30:13] !log hashar@deploy2002 rebuilt and synchronized wikiversions files: group1 wikis to 1.42.0-wmf.22 refs T354440 [09:30:19] T354440: 1.42.0-wmf.22 deployment blockers - https://phabricator.wikimedia.org/T354440 [09:31:06] * hashar whistles [09:32:15] (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 1.139s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [09:37:15] (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 1.139s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [09:42:32] !log hashar@deploy2002 Synchronized php: group1 wikis to 1.42.0-wmf.22 refs T354440 (duration: 12m 18s) [09:42:39] T354440: 1.42.0-wmf.22 deployment blockers - https://phabricator.wikimedia.org/T354440 [09:44:06] did the train just hit Wikidata? [09:44:13] yes [09:44:20] common.js stopped loading [09:44:22] for several users [09:45:36] filing a bug now [09:45:37] site common.js or their user common.js? 
[09:45:39] ok [09:46:14] user common.js [09:46:37] T360014 extremely bare bones, will add more [09:46:38] T360014: common.js not loading on Wikidata - https://phabricator.wikimedia.org/T360014 [09:47:36] ok, can confirm I think [09:49:00] i boldly set that to UBN [09:49:50] what projects should i add to that task? [10:00:17] should the train be rolled back? [10:01:53] o/ [10:03:52] IMHO yes [10:04:06] the error seems easy enough to reproduce locally, so I don’t think we need to keep the site broken to investigate it [10:04:11] let me commute to my office (which is 30 seconds away) [10:04:15] and user JS is a pretty important feature [10:09:25] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [10:09:31] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [10:10:17] Jhs: Lucas_WMDE: there was another bug in that code yesterday which got fixed by matmarex [10:10:24] i am rolling back [10:11:24] I guess the issue can be reproduced on a group0 wiki meanwhile (eg mediawiki.org or test.wikipedia.org ) [10:14:10] (SystemdUnitFailed) firing: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [10:15:12] Jhs: thank you for the bug report and flagging it for attention! 
:) [10:15:28] hashar, np :) [10:15:55] it takes a bit of time to rollback nowadays unfortunately, but the old versions will be rolled in a few minutes [10:16:27] I think we end up doing a full deployment rather than rolling back [10:17:12] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [10:17:59] fwiw i can reproduce on mediawiki.org too [10:19:00] my guess is that is less disruptive there [10:19:05] compared to commons/wikidata [10:21:42] yeah [10:26:36] !log hashar@deploy2002 rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.42.0-wmf.22" - T354440 T360014 [10:26:42] T354440: 1.42.0-wmf.22 deployment blockers - https://phabricator.wikimedia.org/T354440 [10:26:42] T360014: User JavaScript not loading on 1.42.0-wmf.22 (Wikidata, Commons, other group1 wikis) - https://phabricator.wikimedia.org/T360014 [10:37:14] (JobUnavailable) firing: (2) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [10:39:14] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [10:39:20] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [10:43:03] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [10:43:10] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [10:47:25] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [10:47:32] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [10:51:54] lunch & [11:00:05] Deploy window 
MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240313T1100) [11:03:55] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [11:04:02] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [11:12:28] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [11:12:35] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [11:15:32] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [11:15:39] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [11:17:45] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [11:17:52] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [11:22:47] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [11:22:54] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [11:26:59] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [11:27:06] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [11:27:59] (03PS1) 10Hashar: wikitech: allow unblocking inactive accounts [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1010853 (https://phabricator.wikimedia.org/T307558) [11:28:35] (03CR) 10Hashar: "That is a minor follow up to the otherwise excellent *wikitech: Update Gerrit blocking logic* change ( https://gerrit.wikimedia.org/r/c/op" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1010853 (https://phabricator.wikimedia.org/T307558) (owner: 10Hashar) [11:31:24] (03PS1) 10Hashar: Revert "group1 wikis to 1.42.0-wmf.22" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1010854 (https://phabricator.wikimedia.org/T354440) 
[11:31:32] (03CR) 10Hashar: [C: 03+2] Revert "group1 wikis to 1.42.0-wmf.22" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1010854 (https://phabricator.wikimedia.org/T354440) (owner: 10Hashar) [11:31:40] (03Merged) 10jenkins-bot: Revert "group1 wikis to 1.42.0-wmf.22" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1010854 (https://phabricator.wikimedia.org/T354440) (owner: 10Hashar) [11:33:50] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [11:33:56] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [11:38:35] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [11:38:42] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [11:41:53] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [11:41:59] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [11:43:27] (03PS1) 10Dreamy Jazz: Use iterator_to_array when calling ::assertCount [extensions/MediaModeration] (wmf/1.42.0-wmf.21) - 10https://gerrit.wikimedia.org/r/1010575 (https://phabricator.wikimedia.org/T360017) [11:43:43] (03PS1) 10Dreamy Jazz: Use iterator_to_array when calling ::assertCount [extensions/MediaModeration] (wmf/1.42.0-wmf.22) - 10https://gerrit.wikimedia.org/r/1010576 (https://phabricator.wikimedia.org/T360017) [11:44:24] (03PS1) 10Dreamy Jazz: Use wgCanonicalServer instead of wgSitename in intro text of email [extensions/MediaModeration] (wmf/1.42.0-wmf.21) - 10https://gerrit.wikimedia.org/r/1010577 (https://phabricator.wikimedia.org/T359979) [11:44:32] (03PS1) 10Dreamy Jazz: Use wgCanonicalServer instead of wgSitename in intro text of email [extensions/MediaModeration] (wmf/1.42.0-wmf.22) - 10https://gerrit.wikimedia.org/r/1010578 (https://phabricator.wikimedia.org/T359979) [11:44:48] (03PS2) 10Dreamy Jazz: Use wgCanonicalServer instead of 
wgSitename in intro text of email [extensions/MediaModeration] (wmf/1.42.0-wmf.21) - 10https://gerrit.wikimedia.org/r/1010577 (https://phabricator.wikimedia.org/T359979) [11:44:56] (03PS2) 10Dreamy Jazz: Use wgCanonicalServer instead of wgSitename in intro text of email [extensions/MediaModeration] (wmf/1.42.0-wmf.22) - 10https://gerrit.wikimedia.org/r/1010578 (https://phabricator.wikimedia.org/T359979) [11:44:58] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [11:45:05] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [11:45:20] (03CR) 10Klausman: [C: 03+1] Remove unecessary regexes from Lift Wing metrics [grafana-grizzly] - 10https://gerrit.wikimedia.org/r/1010193 (owner: 10Elukey) [11:45:44] (03CR) 10Klausman: [C: 03+1] Add Dragonfly 2p2 cache to ml-serve k8s [puppet] - 10https://gerrit.wikimedia.org/r/1010535 (https://phabricator.wikimedia.org/T359416) (owner: 10Elukey) [11:47:33] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [11:47:39] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [11:47:53] (03Abandoned) 10Arturo Borrero Gonzalez: spicerack: avoid pyyaml 5.4.1 [software/spicerack] - 10https://gerrit.wikimedia.org/r/1010507 (owner: 10Arturo Borrero Gonzalez) [11:50:38] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [11:50:45] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [11:52:16] (03PS7) 10Majavah: P:toolforge::proxy: drop grid engine dynamicproxy support [puppet] - 10https://gerrit.wikimedia.org/r/1010503 (https://phabricator.wikimedia.org/T314664) [11:52:24] (03PS7) 10Majavah: dynamicproxy: cleanup after removing toolforge support [puppet] - 10https://gerrit.wikimedia.org/r/1010504 (https://phabricator.wikimedia.org/T314664) [11:52:32] (03PS8) 10Majavah: dynamicproxy: add support for 
per-project zones [puppet] - 10https://gerrit.wikimedia.org/r/1010505 (https://phabricator.wikimedia.org/T342398) [11:52:40] (03PS10) 10Majavah: dynamicproxy: allow specifying different certs for each zone [puppet] - 10https://gerrit.wikimedia.org/r/1010506 (https://phabricator.wikimedia.org/T342398) [11:52:48] (03PS6) 10Majavah: dynamicproxy: add spec test for API [puppet] - 10https://gerrit.wikimedia.org/r/1010509 [11:52:49] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [11:52:56] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [11:53:28] (03CR) 10Daniel Kinzler: "I guess this was lost, is it still relevant?" [deployment-charts] - 10https://gerrit.wikimedia.org/r/937090 (owner: 10Alexandros Kosiaris) [11:53:44] (03PS11) 10Majavah: dynamicproxy: allow specifying different certs for each zone [puppet] - 10https://gerrit.wikimedia.org/r/1010506 (https://phabricator.wikimedia.org/T342398) [11:53:52] (03PS7) 10Majavah: dynamicproxy: add spec test for API [puppet] - 10https://gerrit.wikimedia.org/r/1010509 [11:55:10] (03PS12) 10Majavah: dynamicproxy: allow specifying different certs for each zone [puppet] - 10https://gerrit.wikimedia.org/r/1010506 (https://phabricator.wikimedia.org/T342398) [11:55:18] (03PS8) 10Majavah: dynamicproxy: add spec test for API [puppet] - 10https://gerrit.wikimedia.org/r/1010509 [11:55:34] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [11:55:41] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [12:07:17] (03PS13) 10Majavah: dynamicproxy: allow specifying different certs for each zone [puppet] - 10https://gerrit.wikimedia.org/r/1010506 (https://phabricator.wikimedia.org/T342398) [12:07:25] (03PS9) 10Majavah: dynamicproxy: add spec test for API [puppet] - 10https://gerrit.wikimedia.org/r/1010509 [12:08:34] (03PS2) 10KartikMistry: Enable Content/Section translation on 
some Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1010226 (https://phabricator.wikimedia.org/T353510) [12:13:55] (SystemdUnitFailed) resolved: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [12:14:39] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [12:14:46] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [12:17:25] (SystemdUnitFailed) firing: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [12:24:23] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [12:24:30] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [12:27:00] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [12:27:07] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [12:42:14] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [12:42:21] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [12:49:34] (03PS1) 10Majavah: Remove labtesttoolsadmin [dns] - 10https://gerrit.wikimedia.org/r/1010884 [12:51:15] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [12:51:21] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [12:52:07] (03PS4) 10GergesShamon: Set ShowRollbackConfirmationDefaultUserOptions on arwiki to false [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1010645 
(https://phabricator.wikimedia.org/T355213) [12:52:40] Hi [12:54:46] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [12:54:53] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [12:57:12] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [12:57:19] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [13:00:05] RoanKattouw, Lucas_WMDE, Urbanecm, awight, and TheresNoTime: I, the Bot under the Fountain, call upon thee, The Deployer, to do UTC afternoon backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240313T1300). [13:00:05] mo_abualruz, _Gerges, and Dreamy_Jazz: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [13:00:13] \o [13:00:16] I can self-deploy [13:00:22] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [13:00:29] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [13:00:32] Hi [13:01:38] mo_abualruz: You around? 
[13:02:03] (03PS1) 10Majavah: hieradata: remove SGE nodes from monitoring [puppet] - 10https://gerrit.wikimedia.org/r/1010887 [13:02:03] Hello I am around I want to deploy the ticket please [13:02:42] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [13:02:45] Dreamy_Jazz: I think you can go ahead [13:02:48] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [13:02:52] * Lucas_WMDE hasn’t had lunch yet and hopes someone else can also deploy the other changes [13:03:03] (03CR) 10Dom Walden: [C: 03+1] Switch block schema to read-new/write-both mode [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1006180 (https://phabricator.wikimedia.org/T355034) (owner: 10Tim Starling) [13:03:16] I can deploy [13:03:33] (03CR) 10Majavah: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/1636/console" [puppet] - 10https://gerrit.wikimedia.org/r/1010887 (owner: 10Majavah) [13:03:54] I will deploy 1010698 [13:04:02] Okay [13:04:09] Do you want to go ahead now then [13:04:28] Thanks 🥰 [13:04:35] yea [13:04:42] (03CR) 10TrainBranchBot: [C: 03+2] "Approved by mabualruz@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1010698 (https://phabricator.wikimedia.org/T359183) (owner: 10Jdlrobson) [13:05:25] (03Merged) 10jenkins-bot: Disable night mode on history pages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1010698 (https://phabricator.wikimedia.org/T359183) (owner: 10Jdlrobson) [13:06:07] !log mabualruz@deploy2002 Started scap: Backport for [[gerrit:1010698|Disable night mode on history pages (T359183)]] [13:06:24] T359183: Exclude non-functional pages from night mode - https://phabricator.wikimedia.org/T359183 [13:08:22] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [13:08:29] !log @deploy2002 helmfile [eqiad] DONE 
helmfile.d/services/cirrus-streaming-updater: apply [13:08:33] !log mabualruz@deploy2002 jdlrobson and mabualruz: Backport for [[gerrit:1010698|Disable night mode on history pages (T359183)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [13:09:25] I'm busy right now, please postpone my deploy for half an hour [13:09:29] https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1010645/4 [13:09:31] Checking the changes with mwdebug [13:09:51] Gerges: Okay. I will proceed with my backports before your change then. [13:10:10] (after this change has been deployed ofc) [13:11:16] All good proceeding [13:11:19] !log mabualruz@deploy2002 jdlrobson and mabualruz: Continuing with sync [13:12:09] (03PS1) 10PipelineBot: citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1010676 [13:12:15] (03PS1) 10PipelineBot: citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1010677 [13:12:22] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [13:12:29] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [13:13:01] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] "LGTM."
[puppet] - 10https://gerrit.wikimedia.org/r/1010887 (owner: 10Majavah) [13:14:28] (03PS1) 10Majavah: Archive repository [software/tools-manifest] - 10https://gerrit.wikimedia.org/r/1010891 (https://phabricator.wikimedia.org/T359935) [13:14:35] (03CR) 10Majavah: [V: 03+1 C: 03+2] hieradata: remove SGE nodes from monitoring [puppet] - 10https://gerrit.wikimedia.org/r/1010887 (owner: 10Majavah) [13:14:38] (03CR) 10CI reject: [V: 04-1] Archive repository [software/tools-manifest] - 10https://gerrit.wikimedia.org/r/1010891 (https://phabricator.wikimedia.org/T359935) (owner: 10Majavah) [13:15:10] (03CR) 10Majavah: [V: 03+2 C: 03+2] Archive repository [software/tools-manifest] - 10https://gerrit.wikimedia.org/r/1010891 (https://phabricator.wikimedia.org/T359935) (owner: 10Majavah) [13:15:34] (03CR) 10CI reject: [V: 04-1] Archive repository [software/tools-manifest] - 10https://gerrit.wikimedia.org/r/1010891 (https://phabricator.wikimedia.org/T359935) (owner: 10Majavah) [13:17:09] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [13:17:22] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [13:21:55] !log mabualruz@deploy2002 Finished scap: Backport for [[gerrit:1010698|Disable night mode on history pages (T359183)]] (duration: 15m 48s) [13:22:00] T359183: Exclude non-functional pages from night mode - https://phabricator.wikimedia.org/T359183 [13:22:17] (03CR) 10TrainBranchBot: [C: 03+2] "Approved by dreamyjazz@deploy2002 using scap backport" [extensions/MediaModeration] (wmf/1.42.0-wmf.21) - 10https://gerrit.wikimedia.org/r/1010575 (https://phabricator.wikimedia.org/T360017) (owner: 10Dreamy Jazz) [13:22:18] (03CR) 10TrainBranchBot: [C: 03+2] "Approved by dreamyjazz@deploy2002 using scap backport" [extensions/MediaModeration] (wmf/1.42.0-wmf.22) - 10https://gerrit.wikimedia.org/r/1010576 (https://phabricator.wikimedia.org/T360017) (owner: 10Dreamy Jazz) [13:24:25] Hi [13:24:43] (03Merged) 
10jenkins-bot: Use iterator_to_array when calling ::assertCount [extensions/MediaModeration] (wmf/1.42.0-wmf.21) - 10https://gerrit.wikimedia.org/r/1010575 (https://phabricator.wikimedia.org/T360017) (owner: 10Dreamy Jazz) [13:24:49] (03Merged) 10jenkins-bot: Use iterator_to_array when calling ::assertCount [extensions/MediaModeration] (wmf/1.42.0-wmf.22) - 10https://gerrit.wikimedia.org/r/1010576 (https://phabricator.wikimedia.org/T360017) (owner: 10Dreamy Jazz) [13:25:04] thanks all the back port of 1010698 is done and changes are checked [13:25:11] 👍 [13:25:17] !log dreamyjazz@deploy2002 Started scap: Backport for [[gerrit:1010575|Use iterator_to_array when calling ::assertCount (T360017)]], [[gerrit:1010576|Use iterator_to_array when calling ::assertCount (T360017)]] [13:25:22] T360017: MediaModeration CI failing with "Passing an argument of type Generator for the $haystack parameter is deprecated." - https://phabricator.wikimedia.org/T360017 [13:26:10] Gerges: Are you back now? [13:26:14] Yes [13:26:28] Okay. I'll wait until these backports are done and then proceed with your change. 
[13:26:35] (03PS8) 10Majavah: P:toolforge::proxy: drop grid engine dynamicproxy support [puppet] - 10https://gerrit.wikimedia.org/r/1010503 (https://phabricator.wikimedia.org/T314664) [13:26:36] (03PS1) 10Majavah: Remove Toolforge grid engine [puppet] - 10https://gerrit.wikimedia.org/r/1010892 (https://phabricator.wikimedia.org/T314664) [13:27:35] !log dreamyjazz@deploy2002 dreamyjazz: Backport for [[gerrit:1010575|Use iterator_to_array when calling ::assertCount (T360017)]], [[gerrit:1010576|Use iterator_to_array when calling ::assertCount (T360017)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [13:27:38] !log dreamyjazz@deploy2002 dreamyjazz: Continuing with sync [13:28:19] (03CR) 10Majavah: [V: 03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/1637/co" [puppet] - 10https://gerrit.wikimedia.org/r/1010892 (https://phabricator.wikimedia.org/T314664) (owner: 10Majavah) [13:29:47] !log Disable GTID on eqiad s8 master T358199 [13:29:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:29:51] T358199: Database pre-switchover tasks March 2024 - https://phabricator.wikimedia.org/T358199 [13:30:27] 06SRE, 06Data-Persistence, 06Infrastructure-Foundations: Integrate dbctl IP changes as part of VLAN changes. 
- https://phabricator.wikimedia.org/T360029 (10Marostegui) 03NEW [13:31:57] !log Disable GTID on eqiad s7 master T358199 [13:32:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:35:05] !log Disable GTID on eqiad s6 master T358199 [13:35:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:35:22] T358199: Database pre-switchover tasks March 2024 - https://phabricator.wikimedia.org/T358199 [13:36:46] !log Disable GTID on eqiad s5 master T358199 [13:36:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:36:54] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [13:37:01] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [13:38:18] !log dreamyjazz@deploy2002 Finished scap: Backport for [[gerrit:1010575|Use iterator_to_array when calling ::assertCount (T360017)]], [[gerrit:1010576|Use iterator_to_array when calling ::assertCount (T360017)]] (duration: 13m 01s) [13:38:26] T360017: MediaModeration CI failing with "Passing an argument of type Generator for the $haystack parameter is deprecated." 
- https://phabricator.wikimedia.org/T360017 [13:39:09] !log Disable GTID on eqiad s4 master T358199 [13:39:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:39:31] (03CR) 10Elukey: [V: 03+1 C: 03+2] Add Dragonfly 2p2 cache to ml-serve k8s [puppet] - 10https://gerrit.wikimedia.org/r/1010535 (https://phabricator.wikimedia.org/T359416) (owner: 10Elukey) [13:40:08] (03CR) 10TrainBranchBot: [C: 03+2] "Approved by dreamyjazz@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1010645 (https://phabricator.wikimedia.org/T355213) (owner: 10GergesShamon) [13:40:31] !log Disable GTID on eqiad s3 master T358199 [13:40:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:40:45] T358199: Database pre-switchover tasks March 2024 - https://phabricator.wikimedia.org/T358199 [13:41:27] (03Merged) 10jenkins-bot: Set ShowRollbackConfirmationDefaultUserOptions on arwiki to false [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1010645 (https://phabricator.wikimedia.org/T355213) (owner: 10GergesShamon) [13:41:50] !log dreamyjazz@deploy2002 Started scap: Backport for [[gerrit:1010645|Set ShowRollbackConfirmationDefaultUserOptions on arwiki to false (T355213)]] [13:41:54] T355213: Set ShowRollbackConfirmationDefaultUserOptions on arwiki to true - https://phabricator.wikimedia.org/T355213 [13:43:06] !log Disable GTID on eqiad s2 master T358199 [13:43:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:44:09] !log dreamyjazz@deploy2002 dreamyjazz and gergesshamon: Backport for [[gerrit:1010645|Set ShowRollbackConfirmationDefaultUserOptions on arwiki to false (T355213)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [13:44:43] Gerges: Please test [13:44:58] Ok [13:45:18] !log jclark@cumin1002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts db1246.eqiad.wmnet [13:45:18] !log Disable GTID on eqiad s1 master 
T358199 [13:45:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:45:36] !log jclark@cumin1002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts db1246.eqiad.wmnet [13:46:06] !log jclark@cumin1002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts db1246.eqiad.wmnet [13:46:38] !log jclark@cumin1002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts db1246.eqiad.wmnet [13:46:47] !log jclark@cumin1002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts db1246.eqiad.wmnet [13:46:49] Dreamy_Jazz: done [13:46:52] Thanks. [13:46:54] !log jclark@cumin1002 END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts db1246.eqiad.wmnet [13:46:55] !log dreamyjazz@deploy2002 dreamyjazz and gergesshamon: Continuing with sync [13:47:34] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [13:47:41] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [13:48:58] Dreamy_Jazz: Are there any changes published on the main server? [13:49:09] It should be done soon. [13:49:11] 10ops-eqiad, 06SRE, 06Data-Persistence, 06DC-Ops: hw troubleshooting: Unidentified for db1246.eqiad.wmnet - https://phabricator.wikimedia.org/T359940#9626616 (10Marostegui) For what is worth, this error is also present on the HW logs, even though it is a month old, it might be an indication of something el... 
[13:49:37] Okey [13:51:18] Thanks [13:51:22] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [13:51:35] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [13:52:18] (03CR) 10Herron: [C: 03+1] Remove unecessary regexes from Lift Wing metrics [grafana-grizzly] - 10https://gerrit.wikimedia.org/r/1010193 (owner: 10Elukey) [13:52:47] !log Stopping MediaModeration scanning scripts for backport [13:52:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:57:23] !log dreamyjazz@deploy2002 Finished scap: Backport for [[gerrit:1010645|Set ShowRollbackConfirmationDefaultUserOptions on arwiki to false (T355213)]] (duration: 15m 33s) [13:57:28] T355213: Set ShowRollbackConfirmationDefaultUserOptions on arwiki to true - https://phabricator.wikimedia.org/T355213 [13:57:42] Gerges: This should be deployed to all wikis now. [13:57:47] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] "LGTM." [puppet] - 10https://gerrit.wikimedia.org/r/1010503 (https://phabricator.wikimedia.org/T314664) (owner: 10Majavah) [13:58:05] (03CR) 10TrainBranchBot: [C: 03+2] "Approved by dreamyjazz@deploy2002 using scap backport" [extensions/MediaModeration] (wmf/1.42.0-wmf.21) - 10https://gerrit.wikimedia.org/r/1010577 (https://phabricator.wikimedia.org/T359979) (owner: 10Dreamy Jazz) [13:58:06] (03CR) 10TrainBranchBot: [C: 03+2] "Approved by dreamyjazz@deploy2002 using scap backport" [extensions/MediaModeration] (wmf/1.42.0-wmf.22) - 10https://gerrit.wikimedia.org/r/1010578 (https://phabricator.wikimedia.org/T359979) (owner: 10Dreamy Jazz) [13:58:08] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] "LGTM." 
[puppet] - 10https://gerrit.wikimedia.org/r/1010892 (https://phabricator.wikimedia.org/T314664) (owner: 10Majavah) [14:00:05] Deploy window Wikifunction Services UTC Afternoon (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240313T1400) [14:00:35] (03Merged) 10jenkins-bot: Use wgCanonicalServer instead of wgSitename in intro text of email [extensions/MediaModeration] (wmf/1.42.0-wmf.21) - 10https://gerrit.wikimedia.org/r/1010577 (https://phabricator.wikimedia.org/T359979) (owner: 10Dreamy Jazz) [14:00:38] 10ops-eqiad, 06SRE, 06Data-Persistence, 06DC-Ops: hw troubleshooting: Unidentified for db1246.eqiad.wmnet - https://phabricator.wikimedia.org/T359940#9626678 (10Marostegui) I've managed to get it boot up past the grub and it looks storage related: ` Starting default.target [85598.425324] XFS (dm-0): Metada... [14:00:51] (03Merged) 10jenkins-bot: Use wgCanonicalServer instead of wgSitename in intro text of email [extensions/MediaModeration] (wmf/1.42.0-wmf.22) - 10https://gerrit.wikimedia.org/r/1010578 (https://phabricator.wikimedia.org/T359979) (owner: 10Dreamy Jazz) [14:01:16] !log dreamyjazz@deploy2002 Started scap: Backport for [[gerrit:1010577|Use wgCanonicalServer instead of wgSitename in intro text of email (T359979)]], [[gerrit:1010578|Use wgCanonicalServer instead of wgSitename in intro text of email (T359979)]] [14:01:24] T359979: Use wgCanonicalServer in notification message - https://phabricator.wikimedia.org/T359979 [14:03:45] !log dreamyjazz@deploy2002 dreamyjazz: Backport for [[gerrit:1010577|Use wgCanonicalServer instead of wgSitename in intro text of email (T359979)]], [[gerrit:1010578|Use wgCanonicalServer instead of wgSitename in intro text of email (T359979)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [14:05:35] !log dreamyjazz@deploy2002 dreamyjazz: Continuing with sync [14:10:24] !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1246.eqiad.wmnet with reason: 
HW issues [14:10:26] !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1246.eqiad.wmnet with reason: HW issues [14:10:28] 10ops-eqiad, 06SRE, 06Data-Persistence, 06DC-Ops: hw troubleshooting: Unidentified for db1246.eqiad.wmnet - https://phabricator.wikimedia.org/T359940#9626739 (10wiki_willy) ++ @VRiley-WMF & @Jclark-ctr for troubleshooting the hardware. (host was installed a few quarters ago) [14:11:03] PROBLEM - Check whether ferm is active by checking the default input chain on kubernetes1061 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [14:13:17] 10ops-eqiad, 06SRE, 06Data-Persistence, 06DC-Ops: hw troubleshooting: Unidentified for db1246.eqiad.wmnet - https://phabricator.wikimedia.org/T359940#9626774 (10Jclark-ctr) Followed dell troubleshooting steps. updated firmware for Bios ,idrac already most recent multiple bios firmwares versions have come... 
[14:16:13] !log dreamyjazz@deploy2002 Finished scap: Backport for [[gerrit:1010577|Use wgCanonicalServer instead of wgSitename in intro text of email (T359979)]], [[gerrit:1010578|Use wgCanonicalServer instead of wgSitename in intro text of email (T359979)]] (duration: 14m 56s) [14:16:17] T359979: Use wgCanonicalServer in notification message - https://phabricator.wikimedia.org/T359979 [14:16:24] !log Afternoon UTC backport window done [14:16:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:17:12] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [14:17:35] !log Starting MediaModeration scanning scripts on group2 wikis and commonswiki - https://wikitech.wikimedia.org/wiki/MediaModeration [14:17:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:32:31] 10ops-eqiad, 06SRE, 06Data-Persistence, 06DC-Ops: hw troubleshooting: Unidentified for db1246.eqiad.wmnet - https://phabricator.wikimedia.org/T359940#9626814 (10Marostegui) Rebooting the server resulted in the same XFS errors and the OS doesn't boot past mounting the filesystem. I am going to reimage it in... 
[14:33:02] !log marostegui@cumin1002 START - Cookbook sre.hosts.reimage for host db1246.eqiad.wmnet with OS bookworm [14:33:07] 10ops-eqiad, 06SRE, 06Data-Persistence, 06DC-Ops: hw troubleshooting: Unidentified for db1246.eqiad.wmnet - https://phabricator.wikimedia.org/T359940#9626818 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host db1246.eqiad.wmnet with OS bookworm [14:37:14] (JobUnavailable) firing: (3) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [14:38:39] RECOVERY - SSH on db1246 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring [14:40:22] (03PS1) 10Marostegui: installserver: Format /srv in db1246 [puppet] - 10https://gerrit.wikimedia.org/r/1010896 (https://phabricator.wikimedia.org/T359940) [14:41:03] RECOVERY - Check whether ferm is active by checking the default input chain on kubernetes1061 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [14:44:31] (03CR) 10Marostegui: [C: 03+2] installserver: Format /srv in db1246 [puppet] - 10https://gerrit.wikimedia.org/r/1010896 (https://phabricator.wikimedia.org/T359940) (owner: 10Marostegui) [14:49:03] !log marostegui@cumin1002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1246.eqiad.wmnet with OS bookworm [14:49:11] 10ops-eqiad, 06SRE, 06Data-Persistence, 06DC-Ops, 13Patch-For-Review: hw troubleshooting: Unidentified for db1246.eqiad.wmnet - https://phabricator.wikimedia.org/T359940#9626872 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host db1246.eqiad.wmnet with OS... 
[14:49:34] !log marostegui@cumin1002 START - Cookbook sre.hosts.reimage for host db1246.eqiad.wmnet with OS bookworm [14:49:43] 10ops-eqiad, 06SRE, 06Data-Persistence, 06DC-Ops, 13Patch-For-Review: hw troubleshooting: Unidentified for db1246.eqiad.wmnet - https://phabricator.wikimedia.org/T359940#9626873 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host db1246.eqiad.wmnet wit... [14:54:05] back [14:54:26] jouncebot: nowandnext [14:54:26] For the next 0 hour(s) and 5 minute(s): Wikifunction Services UTC Afternoon (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240313T1400) [14:54:26] In 2 hour(s) and 5 minute(s): MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240313T1700) [14:54:48] nice… then I’ll probably backport https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1010856 in 5-10 minutes [14:56:03] !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db1182 to clone db1246', diff saved to https://phabricator.wikimedia.org/P58778 and previous config saved to /var/cache/conftool/dbconfig/20240313-145603-marostegui.json [14:56:36] !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Cloning db1246 [14:56:50] !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Cloning db1246 [14:57:14] (JobUnavailable) firing: (3) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [14:59:03] Lucas_WMDE: TheresNoTime: for the resource loader WikiModule , I will apply the patch and roll the train [14:59:35] sounds good to me [14:59:36] the test on https://gerrit.wikimedia.org/r/1010871 can probably benefit some additional 
review ;) [14:59:43] that looks like a lot of lines for a test function [14:59:45] hashar: I was about to backport it [14:59:49] \o/ [14:59:56] (and test on https://www.mediawiki.org/wiki/User:Lucas_Werkmeister_(WMDE)/common.js) [15:00:03] jouncebot: now [15:00:04] No deployments scheduled for the next 1 hour(s) and 59 minute(s) [15:00:16] should I do it now or leave it to you? [15:01:23] well, I can kick off the gate-and-submit already I guess [15:01:39] let me know if I should skip the scap backport :) [15:01:56] !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on db1246.eqiad.wmnet with reason: host reimage [15:01:59] wikibugs: ping? [15:02:16] pong [15:02:40] hashar: ah, I didn’t see you +2ed it already ^^ [15:02:58] yeah we can do it [15:03:15] (PHPFPMTooBusy) firing: Not enough idle PHP-FPM workers for Mediawiki mw-parsoid at codfw: 47.52% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [15:03:21] we are having our product & technology staff meeting right now (which is like half of the foundation), but I can follow in parallel [15:03:43] Lucas_WMDE: T360038 [15:03:44] T360038: AttributeError: 'GerritMessageBuilder' object has no attribute 'esc' - https://phabricator.wikimedia.org/T360038 [15:04:01] ah, sorry, I just restarted it [15:04:23] !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1246.eqiad.wmnet with reason: host reimage [15:08:15] (PHPFPMTooBusy) resolved: Not enough idle PHP-FPM workers for Mediawiki mw-parsoid at codfw: 47.52% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-container_name=All - 
https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [15:10:15] (PHPFPMTooBusy) firing: Not enough idle PHP-FPM workers for Mediawiki mw-parsoid at codfw: 48.29% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [15:11:49] hm, https://grafana.wikimedia.org/d/U7JT--knk/mediawiki-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus%2Fk8s&var-service=mediawiki&var-namespace=mw-parsoid&var-container_name=All&from=now-7d&to=now&refresh=5m looks a bit concerning since this morning UTC [15:12:03] but not bad enough to block a deploy, I think [15:14:58] Lucas_WMDE: I think I have seen something about raising the 50% threshold [15:15:15] (PHPFPMTooBusy) resolved: Not enough idle PHP-FPM workers for Mediawiki mw-parsoid at codfw: 48.29% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [15:15:23] there are other namespaces alerting in a similar way [15:15:45] yeah, the jobrunners one looked even worse in grafana [15:16:40] yup [15:19:31] !log lucaswerkmeister-wmde@deploy2002 Started scap: Backport for [[gerrit:1010581|WikiModule: Fix pages merging (T360014)]] [15:19:37] T360014: User JavaScript not loading on 1.42.0-wmf.22 (Wikidata, Commons, other group1 wikis) - https://phabricator.wikimedia.org/T360014 [15:19:50] you are faster than me [15:20:53] my scap backport was faster, maybe [15:21:03] I didn’t do anything when it merged, I just watched the terminal come to life ^^ [15:21:57] !log lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde: Backport for [[gerrit:1010581|WikiModule: Fix pages merging (T360014)]] synced to the testservers 
(https://wikitech.wikimedia.org/wiki/Mwdebug) [15:22:01] testing [15:22:15] yup, seems to work on mwdebug [15:22:17] !log lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde: Continuing with sync [15:22:20] \o/ [15:23:04] 10SRE-swift-storage, 06Commons, 10ConfirmEdit (CAPTCHA extension), 06Editing-team, and 5 others: Make SwiftFileBackend::doStoreInternal defer the opening of file handles to stay in the concurrency limit - https://phabricator.wikimedia.org/T230245#9626946 (10Reedy) [15:24:46] !log marostegui@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - marostegui@cumin1002" [15:25:36] !log marostegui@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - marostegui@cumin1002" [15:25:38] !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1246.eqiad.wmnet with OS bookworm [15:25:46] 10ops-eqiad, 06SRE, 06Data-Persistence, 06DC-Ops, 13Patch-For-Review: hw troubleshooting: Unidentified for db1246.eqiad.wmnet - https://phabricator.wikimedia.org/T359940#9626953 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host db1246.eqiad.wmnet with OS... 
[15:27:06] PROBLEM - Check whether ferm is active by checking the default input chain on kubernetes1045 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [15:27:52] !log marostegui@cumin1002 START - Cookbook sre.mysql.clone of db1182.eqiad.wmnet onto db1246.eqiad.wmnet [15:32:59] !log lucaswerkmeister-wmde@deploy2002 Finished scap: Backport for [[gerrit:1010581|WikiModule: Fix pages merging (T360014)]] (duration: 13m 27s) [15:33:06] T360014: User JavaScript not loading on 1.42.0-wmf.22 (Wikidata, Commons, other group1 wikis) - https://phabricator.wikimedia.org/T360014 [15:33:18] * Lucas_WMDE done [15:33:25] seems to work without wikimedia-debug now [15:34:28] hashar: all yours :) [15:36:01] Lucas_WMDE: excellent thank you! [15:36:26] and wikibugs seems to be back, yay [15:39:41] I am rolling the train [15:41:42] Best wishes [15:42:02] it is like 90% time waiting for k8s :) [15:42:26] nod. [15:42:43] at least earlier today the backend error log looked fine [15:45:00] Speaking of which, hashar, nagging needs to happen regarding https://phabricator.wikimedia.org/T345319 [15:46:26] PROBLEM - Check whether ferm is active by checking the default input chain on mw1362 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [15:46:58] dancy: that is an old one isn't it? [15:47:07] old and live. [15:47:09] or is that surfacing again with this week version? [15:47:16] never stopped surfacing [15:47:26] :-\ [15:47:28] hence the need for nagging [15:49:35] !log hashar@deploy2002 rebuilt and synchronized wikiversions files: group1 wikis to 1.42.0-wmf.22 refs T354440 [15:49:39] T354440: 1.42.0-wmf.22 deployment blockers - https://phabricator.wikimedia.org/T354440 [15:49:44] Perhaps you can offer your wisdom. 
[15:50:33] I' [15:51:29] dancy: I have never heard of that HTMLFormatter ;) [15:52:26] then apparently a workaround has been found to drop the pcre call and replace it with strpos / substr [15:57:06] RECOVERY - Check whether ferm is active by checking the default input chain on kubernetes1045 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [16:01:21] !log hashar@deploy2002 Synchronized php: group1 wikis to 1.42.0-wmf.22 refs T354440 (duration: 11m 46s) [16:01:40] OH [16:01:40] T354440: 1.42.0-wmf.22 deployment blockers - https://phabricator.wikimedia.org/T354440 [16:01:48] yeah so that task is easy dancy [16:02:04] the HtmlFormatter has no maintainer [16:02:28] so that fallback to the good will of who ever knows / has the rights to do a git tag and bump the composer dependency in mediawiki core & vendor [16:03:27] I have commented [16:03:31] but in short, that needs a git tag [16:03:40] and bumping the version in core & vendor [16:07:15] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [16:07:22] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [16:16:26] RECOVERY - Check whether ferm is active by checking the default input chain on mw1362 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [16:17:25] (SystemdUnitFailed) firing: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [16:17:56] ops-eqiad, SRE, DBA, DC-Ops: hw troubleshooting: Unidentified for db1246.eqiad.wmnet - https://phabricator.wikimedia.org/T359940#9627126 (Marostegui) In progress→Open a:Marostegui So the filesystem was totally corrupted. 
A full reimage (deleting all the partitions) seems to have fixed... [16:19:46] (PS1) Arturo Borrero Gonzalez: aptrepro: enable thirdparty/kubeadm-k8s-1-24 for buster and bullseye [puppet] - https://gerrit.wikimedia.org/r/1010906 (https://phabricator.wikimedia.org/T359619) [16:26:17] (CR) Klausman: [C: +2] httpbb: add ores-legacy tests [puppet] - https://gerrit.wikimedia.org/r/1010245 (https://phabricator.wikimedia.org/T359871) (owner: Ilias Sarantopoulos) [16:30:12] (PS1) Elukey: ml-services: add OMP_NUM_THREADS to readability settings [deployment-charts] - https://gerrit.wikimedia.org/r/1010909 [16:32:09] !log root@cumin1002 START - Cookbook sre.discovery.datacenter status all services in all: None - None [16:32:13] !log root@cumin1002 END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in all: None - None [16:33:38] !log eevans@cumin1002 START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase1021.eqiad.wmnet with reason: Decommissioning — T354561 [16:33:45] T354561: Decommission restbase10[19-27] - https://phabricator.wikimedia.org/T354561 [16:33:51] !log eevans@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase1021.eqiad.wmnet with reason: Decommissioning — T354561 [16:36:08] dancy: https://gerrit.wikimedia.org/r/c/HtmlFormatter/+/1010910/ (and parent change) [16:37:21] Dear all, services switchover dryrun is in progress [16:37:27] jouncebot now [16:37:33] jouncebot next [16:38:01] jouncebot: nowandnext [16:38:01] No deployments scheduled for the next 0 hour(s) and 21 minute(s) [16:38:02] In 0 hour(s) and 21 minute(s): MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240313T1700) [16:38:23] thanks taavi [16:41:56] fun finding, we still have 1.42.0-wmf.19 code running [16:43:03] from mwmaint2002 [16:43:04] hashar: is it persistRevisionThreadItems.php on viwiki by any chance? 
[16:43:17] * Lucas_WMDE just attached to that tmux again, coincidentally [16:43:17] PHP Warning: require(/srv/mediawiki/php-1.42.0-wmf.19/includes/deferred/DeferredUpdatesScopeMediaWikiStack.php): failed to open stream: No such file or directory [16:43:25] (CR) Arturo Borrero Gonzalez: [C: +1] "LGTM." [puppet] - https://gerrit.wikimedia.org/r/990703 (https://phabricator.wikimedia.org/T314665) (owner: Majavah) [16:43:29] I guess it is broken cause the files have been deleted [16:43:42] I don't know which script runs :/ [16:43:49] (CR) Arturo Borrero Gonzalez: [C: +1] "LGTM." [puppet] - https://gerrit.wikimedia.org/r/990704 (owner: Majavah) [16:44:14] I’m trying to check now [16:44:44] that script has only been running since last week, surely that wasn’t wmf.19? https://sal.toolforge.org/log/zpdcFI4BGiVuUzOdVAoW [16:44:53] trace.id 380b9f967c680823f4d8c549 [16:45:10] (PS1) Marostegui: Revert "installserver: Format /srv in db1246" [puppet] - https://gerrit.wikimedia.org/r/1010584 [16:45:13] it can probably be ignored I guess [16:45:30] logstash says that one was on ukwiki [16:45:37] so probably not “my” maintenance script ^^ [16:45:48] yup [16:46:37] there’s some other long-running scripts in htop I don’t know much about, could be any of them I guess [16:46:41] oh wait, logstash has the cli_argv [16:46:47] yeah it’s one of the migrateLinksTable’s [16:47:09] OH [16:47:27] probably Amir1 then [16:47:49] yes, that's me [16:47:57] what's going on? 
[16:48:05] migrateLinksTable on ukwiki died [16:48:07] !log root@cumin1002 START - Cookbook sre.discovery.datacenter status all services in all: None - None [16:48:08] https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-deploy-1-7.0.0-1-2024.03.13?id=i7iUOI4BX0U9mJhK7u0k [16:48:10] !log root@cumin1002 END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in all: None - None [16:48:17] it's fine, I restart it now [16:48:19] I’m guessing you just need to run it again and hopefully nothing worse happened [16:48:21] yeah [16:48:30] yeah, it has happened multiple times [16:48:32] nbd [16:48:37] 👍 [16:49:16] (CR) Ilias Sarantopoulos: [C: +1] "Sorry, I forgot to add this, thanks for doing it!" [deployment-charts] - https://gerrit.wikimedia.org/r/1010909 (owner: Elukey) [16:49:24] (CR) Marostegui: [C: +2] Revert "installserver: Format /srv in db1246" [puppet] - https://gerrit.wikimedia.org/r/1010584 (owner: Marostegui) [16:52:46] !log marostegui@cumin1002 END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1182.eqiad.wmnet onto db1246.eqiad.wmnet [16:53:24] Dear all, mediawiki dry-run is in progress [16:56:54] (CR) Elukey: [C: +2] ml-services: add OMP_NUM_THREADS to readability settings [deployment-charts] - https://gerrit.wikimedia.org/r/1010909 (owner: Elukey) [17:00:05] Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240313T1700) [17:00:12] !log elukey@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' . 
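A minimal Python sketch of the failure mode guessed at above: a long-running script that loads files lazily from a release directory fails once that directory is removed, and works again after a restart against the current release. The directory names, the file name and the cleanup step are hypothetical stand-ins for illustration, not the actual scap or MediaWiki behaviour.

```python
import shutil
import tempfile
from pathlib import Path


def make_release(root: Path, version: str) -> Path:
    """Create a fake release directory holding one lazily loaded file."""
    release = root / f"php-{version}"
    (release / "includes").mkdir(parents=True)
    (release / "includes" / "Lazy.php").write_text("<?php // loaded on demand\n")
    return release


def lazy_load(release: Path, name: str) -> str:
    """Stand-in for a lazy require: the file is only read on first use."""
    return (release / "includes" / name).read_text()


def simulate() -> list[str]:
    """Return what happens before and after restarting on the new release."""
    events = []
    root = Path(tempfile.mkdtemp())
    try:
        # A long-running script starts on the old release; nothing is loaded yet.
        old = make_release(root, "old")
        # Meanwhile a newer release is deployed and the old one is cleaned up
        # (hypothetical cleanup step, assumed for this sketch).
        new = make_release(root, "new")
        shutil.rmtree(old)
        # The script now needs a file it has not loaded before.
        try:
            lazy_load(old, "Lazy.php")
            events.append("loaded")
        except FileNotFoundError:
            events.append("missing")
        # Restarting the script picks up the current release and succeeds.
        lazy_load(new, "Lazy.php")
        events.append("loaded")
    finally:
        shutil.rmtree(root, ignore_errors=True)
    return events


if __name__ == "__main__":
    print(simulate())
```

In this sketch the only remedy is the one used in the channel: run the script again so that it starts from a release directory that still exists.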
[17:02:23] !log marostegui@cumin1002 dbctl commit (dc=all): 'db1182 (re)pooling @ 1%: After recloning db1246', diff saved to https://phabricator.wikimedia.org/P58779 and previous config saved to /var/cache/conftool/dbconfig/20240313-170222-root.json [17:17:05] Amir1: thank you :) [17:17:29] !log marostegui@cumin1002 dbctl commit (dc=all): 'db1182 (re)pooling @ 5%: After recloning db1246', diff saved to https://phabricator.wikimedia.org/P58781 and previous config saved to /var/cache/conftool/dbconfig/20240313-171728-root.json [17:17:37] my guess is the mediawiki train deletes the php files and anything running for more than x weeks end up with a fatal when they try to include a file [17:17:48] anyway, that looked harmless and a mystery is solved [17:17:51] * hashar dinner & [17:22:00] oh [17:22:11] dancy: I am pushing the signed tag for HtmlFormatter [17:23:03] ok [17:26:21] damn [17:26:30] it is already on https://packagist.org/packages/wikimedia/html-formatter#4.1.0 [17:26:33] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [17:26:40] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [17:27:14] dancy: and to update it for mediawiki the guide should be https://gerrit.wikimedia.org/g/mediawiki/vendor/+/refs/heads/master#mediawiki_vendor [17:27:51] with a first patch for vendor to ship the lib then a mediawiki/core patch that bumps the version and depends-on the vendor patch [17:28:01] I am off for dinner [17:29:32] (Abandoned) Andrew Bogott: git-sync-upstream: on puppet7, deploy code after update [puppet] - https://gerrit.wikimedia.org/r/1009798 (https://phabricator.wikimedia.org/T351450) (owner: Andrew Bogott) [17:32:34] !log marostegui@cumin1002 dbctl commit (dc=all): 'db1182 (re)pooling @ 10%: After recloning db1246', diff saved to https://phabricator.wikimedia.org/P58782 and previous config saved to /var/cache/conftool/dbconfig/20240313-173234-root.json [17:34:58] !log 
@deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [17:35:04] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [17:43:53] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [17:44:00] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [17:45:23] hashar: Got it. Making an attempt [17:47:40] !log marostegui@cumin1002 dbctl commit (dc=all): 'db1182 (re)pooling @ 25%: After recloning db1246', diff saved to https://phabricator.wikimedia.org/P58783 and previous config saved to /var/cache/conftool/dbconfig/20240313-174740-root.json [17:53:26] (RoutinatorRsyncErrors) firing: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [17:57:51] (SwaggerProbeHasFailures) firing: Not all openapi/swagger endpoints returned healthy - https://wikitech.wikimedia.org/wiki/Runbook#https://cxserver.svc.eqiad.wmnet:4002 - https://grafana.wikimedia.org/d/_77ik484k/openapi-swagger-endpoint-state?var-site=eqiad - https://alerts.wikimedia.org/?q=alertname%3DSwaggerProbeHasFailures [17:58:26] (RoutinatorRsyncErrors) resolved: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [18:00:05] hashar and jnuche: Deploy window Train log triage with CPT (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240313T1800) [18:02:46] !log marostegui@cumin1002 dbctl commit (dc=all): 'db1182 (re)pooling @ 50%: After recloning db1246', diff saved to https://phabricator.wikimedia.org/P58784 and previous config saved to /var/cache/conftool/dbconfig/20240313-180245-root.json [18:02:51] (SwaggerProbeHasFailures) resolved: Not 
all openapi/swagger endpoints returned healthy - https://wikitech.wikimedia.org/wiki/Runbook#https://cxserver.svc.eqiad.wmnet:4002 - https://grafana.wikimedia.org/d/_77ik484k/openapi-swagger-endpoint-state?var-site=eqiad - https://alerts.wikimedia.org/?q=alertname%3DSwaggerProbeHasFailures [18:17:12] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [18:17:25] (SystemdUnitFailed) resolved: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [18:17:52] !log marostegui@cumin1002 dbctl commit (dc=all): 'db1182 (re)pooling @ 75%: After recloning db1246', diff saved to https://phabricator.wikimedia.org/P58785 and previous config saved to /var/cache/conftool/dbconfig/20240313-181751-root.json [18:21:25] (SystemdUnitFailed) firing: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [18:26:15] (PHPFPMTooBusy) firing: Not enough idle PHP-FPM workers for Mediawiki mw-misc at codfw: 50% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-misc&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [18:29:22] (PS1) Jdlrobson: Fix Issue with localization of special page titles in exclusion logic [skins/MinervaNeue] (wmf/1.42.0-wmf.22) - https://gerrit.wikimedia.org/r/1010585 
(https://phabricator.wikimedia.org/T359958) [18:31:15] (PHPFPMTooBusy) resolved: Not enough idle PHP-FPM workers for Mediawiki mw-misc at codfw: 50% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-misc&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [18:32:57] !log marostegui@cumin1002 dbctl commit (dc=all): 'db1182 (re)pooling @ 100%: After recloning db1246', diff saved to https://phabricator.wikimedia.org/P58786 and previous config saved to /var/cache/conftool/dbconfig/20240313-183256-root.json [18:52:34] dancy: I have +2ed the vendor and core patch. Thank you for having taken care of those [18:52:57] then I guess that will solve T345319 & T348402 when it is rolling next week [18:52:58] T345319: TypeError: Argument 1 passed to HtmlFormatter\HtmlFormatter::onHtmlReady() must be of the type string, null given, called in /srv/mediawiki/php-1.41.0-wmf.24/vendor/wikimedia/html-formatter/src/HtmlFormatter.php on line 314 - https://phabricator.wikimedia.org/T345319 [18:52:58] T348402: MobileFrontend's transforms replaced some spaces in inlined `