[00:02:33] !log reedy@deploy2002 jforrester, reedy, zabe: Backport for [[gerrit:1233860|Updated phpunit/phpunit from 9.6.21 to 9.6.33 (T415723)]], [[gerrit:1233862|Revert "Language: Namespace dependency classes" (T415619)]], [[gerrit:1233858|build: Upgrade PHPUnit from 10.5.59 to 10.5.62 to unblock CI (T415723)]], [[gerrit:1233859|Updated phpunit/phpunit from 9.6.21 to 9.6.33 (T415723)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[00:02:39] T415723: CI blocked from installing phpunit by CVE-2026-24765 - https://phabricator.wikimedia.org/T415723
[00:02:40] T415619: Creation of dynamic property MediaWiki\Language\Dependency\FileDependency::$filename is deprecated {"exception":"[object] (ErrorException(code: 0) - https://phabricator.wikimedia.org/T415619
[00:02:51] !log reedy@deploy2002 jforrester, reedy, zabe: Continuing with sync
[00:12:10] Reedy: Roll-out error rate look OK? Not stuck in a can't-deploy-forwards-or-backwards cache disaster?
[00:12:23] just slooow because l10n rebuild
[00:13:15] FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[00:13:33] we're still getting ~100 a minute on the .12 deserialization glitch. am wondering if rolling train forward would fix that.
[00:14:10] or if that's more of a bad-state-in-cache disaster.
[00:14:17] I don't think the wmf.13 fixes will help much for that. We could bump the cache epoch?
[00:15:11] !log reedy@deploy2002 Finished scap sync-world: Backport for [[gerrit:1233860|Updated phpunit/phpunit from 9.6.21 to 9.6.33 (T415723)]], [[gerrit:1233862|Revert "Language: Namespace dependency classes" (T415619)]], [[gerrit:1233858|build: Upgrade PHPUnit from 10.5.59 to 10.5.62 to unblock CI (T415723)]], [[gerrit:1233859|Updated phpunit/phpunit from 9.6.21 to 9.6.33 (T415723)]] (duration: 37m 10s)
[00:15:18] T415723: CI blocked from installing phpunit by CVE-2026-24765 - https://phabricator.wikimedia.org/T415723
[00:15:18] T415619: Creation of dynamic property MediaWiki\Language\Dependency\FileDependency::$filename is deprecated {"exception":"[object] (ErrorException(code: 0) - https://phabricator.wikimedia.org/T415619
[00:16:02] it's all mediawiki.org, for whatever that's worth.
[00:16:22] MW.org is the only group0 wiki with lots of Translate activity, IIRC.
[00:16:24] So that tracks.
[00:16:29] yeah, makes sense.
[00:18:15] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[00:19:03] FIRING: MediaWikiEditFailures: Elevated MediaWiki edit failures (session_loss) for cluster - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000208/edit-count?orgId=1&viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiEditFailures
[00:23:15] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[00:34:03] RESOLVED: MediaWikiEditFailures: Elevated MediaWiki edit failures (session_loss) for cluster - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000208/edit-count?orgId=1&viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiEditFailures
[00:40:23] (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1233934
[00:40:23] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1233934 (owner: TrainBranchBot)
[00:52:55] (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1233934 (owner: TrainBranchBot)
[00:53:15] FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[00:53:46] (CR) Dwisehaupt: frack dns cleanup and reconfig (1 comment) [dns] - https://gerrit.wikimedia.org/r/1233877 (https://phabricator.wikimedia.org/T364185) (owner: Dwisehaupt)
[00:58:15] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[01:03:15] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[01:06:32] FIRING: [12x] ProbeDown: Service wdqs1011:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[01:09:45] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[01:10:44] (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1233944
[01:10:44] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1233944 (owner: TrainBranchBot)
[01:14:45] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[01:15:00] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[01:16:53] (CR) Ssingh: frack dns cleanup and reconfig (1 comment) [dns] - https://gerrit.wikimedia.org/r/1233877 (https://phabricator.wikimedia.org/T364185) (owner: Dwisehaupt)
[01:19:45] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[01:24:45] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[01:29:41] FIRING: [9x] SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:31:15] FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[01:36:15] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[01:38:23] (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1233944 (owner: TrainBranchBot)
[01:41:06] (PS1) Zabe: Start reading from il_target_id on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1233947 (https://phabricator.wikimedia.org/T413669)
[01:46:15] FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[01:51:15] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[01:58:15] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[02:03:15] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[02:04:15] FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook -
https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[02:05:11] FIRING: ProbeDown: Service phab1004:443 has failed probes (http_phabricator_wikimedia_org_collab_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#phab1004:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[02:09:15] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[02:10:11] RESOLVED: ProbeDown: Service phab1004:443 has failed probes (http_phabricator_wikimedia_org_collab_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#phab1004:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[02:13:30] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[03:05:15] FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[03:10:15] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[03:19:15] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate eventstreams-internal.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[03:34:15] FIRING: JobUnavailable: Reduced availability for job thanos-compact in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:38:15] FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[03:43:15] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[04:42:15] FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[04:49:56] ops-magru: Inbound errors on interface cr2-magru:xe-0/1/0 (Transit: EdgeUno (E1-SER-7853-IP) {#70091}) - https://phabricator.wikimedia.org/T415743 (phaultfinder) NEW
[04:52:15] RESOLVED: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[05:02:15] FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[05:06:32] FIRING: [12x] ProbeDown: Service wdqs1011:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[05:07:15] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[05:09:15] FIRING: [3x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:12:15] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[05:19:15] FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[05:24:15] RESOLVED: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[05:29:30] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[05:29:41] FIRING: [9x] SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:34:15] FIRING: [3x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:34:30] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[05:41:15] FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[05:46:15] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[05:50:45] (CR) Clare Ming: [C:+1] Removed `mpic` as local service
[mediawiki-config] - https://gerrit.wikimedia.org/r/1233670 (https://phabricator.wikimedia.org/T407805) (owner: Santiago Faci)
[05:54:31] (CR) Clare Ming: [C:+2] Test Kitchen UI: Deploy v.1.1.7 release to staging [deployment-charts] - https://gerrit.wikimedia.org/r/1233696 (https://phabricator.wikimedia.org/T415325) (owner: Santiago Faci)
[05:56:22] (Merged) jenkins-bot: Test Kitchen UI: Deploy v.1.1.7 release to staging [deployment-charts] - https://gerrit.wikimedia.org/r/1233696 (https://phabricator.wikimedia.org/T415325) (owner: Santiago Faci)
[06:01:15] FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[06:04:36] (PS5) Ryan Kemper: opensearch-semantic-search: provision namespaces [deployment-charts] - https://gerrit.wikimedia.org/r/1230512 (https://phabricator.wikimedia.org/T414702)
[06:04:36] (PS1) Ryan Kemper: opensearch-semantic-search: deploy eqiad & codfw [deployment-charts] - https://gerrit.wikimedia.org/r/1234128 (https://phabricator.wikimedia.org/T414691)
[06:06:15] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[06:07:43] (Abandoned) Ryan Kemper: opensearch-semantic-search-test: provision ns [deployment-charts] - https://gerrit.wikimedia.org/r/1231046 (https://phabricator.wikimedia.org/T414702) (owner: Ryan Kemper)
[06:09:47] (CR) Ryan Kemper: "Oops, got a little confused. This patch needs work to split into namespace provision versus deployment, but the patch i linked in the aban" [deployment-charts] - https://gerrit.wikimedia.org/r/1231046 (https://phabricator.wikimedia.org/T414702) (owner: Ryan Kemper)
[06:27:50] (CR) Brouberol: [C:+1] opensearch-semantic-search: provision namespaces [deployment-charts] - https://gerrit.wikimedia.org/r/1230512 (https://phabricator.wikimedia.org/T414702) (owner: Ryan Kemper)
[06:28:15] FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[06:29:25] (CR) Brouberol: opensearch-semantic-search: deploy eqiad & codfw (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1234128 (https://phabricator.wikimedia.org/T414691) (owner: Ryan Kemper)
[06:31:46] (CR) Brouberol: [C:+2] airflow: store large XCOMs in s3 to alleviate load on the database [deployment-charts] - https://gerrit.wikimedia.org/r/1233776 (https://phabricator.wikimedia.org/T415661) (owner: Brouberol)
[06:33:15] RESOLVED: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[06:36:15] FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[06:38:30] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[06:39:25] (PS1) Marostegui: Revert "db1163: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1234143
[06:40:37] (CR) Marostegui: [C:+2] Revert "db1163: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1234143 (owner: Marostegui)
[06:40:37] !log marostegui@cumin1003 START - Cookbook sre.mysql.newpool pool db1163: After schema change
[06:41:15] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[06:42:17] !log marostegui@cumin1003 dbctl commit (dc=all): 'Set db2203 with weight 0 T415171', diff saved to https://phabricator.wikimedia.org/P87977 and previous config saved to /var/cache/conftool/dbconfig/20260128-064215-marostegui.json
[06:42:23] T415171: Switchover s1 master (db2212 -> db2203) - https://phabricator.wikimedia.org/T415171
[06:42:30] (CR) Marostegui: [C:+2] mariadb: Promote db2203 to s1 master [puppet] - https://gerrit.wikimedia.org/r/1229526 (https://phabricator.wikimedia.org/T415171) (owner: Gerrit maintenance bot)
[06:42:46] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s1 T415171
[06:48:27] !log Starting s1 codfw failover from db2212 to db2203 - T415171
[06:48:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:48:32] T415171: Switchover s1 master (db2212 -> db2203) - https://phabricator.wikimedia.org/T415171
[06:49:01] !log marostegui@cumin1003 dbctl commit (dc=all): 'Set s1 codfw as read-only for maintenance - T415171', diff saved to https://phabricator.wikimedia.org/P87978 and previous config saved to /var/cache/conftool/dbconfig/20260128-064900-marostegui.json
[06:49:15] FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[06:49:35] !log marostegui@cumin1003 dbctl commit (dc=all): 'Promote db2203 to s1 primary and set section read-write T415171', diff saved to https://phabricator.wikimedia.org/P87979 and previous config saved to /var/cache/conftool/dbconfig/20260128-064934-marostegui.json
[06:49:55] !log marostegui@dns1006 START - running authdns-update
[06:50:06] (CR) Marostegui: [C:+2] wmnet: Update s1-master alias [dns] - https://gerrit.wikimedia.org/r/1229527 (https://phabricator.wikimedia.org/T415171) (owner: Gerrit maintenance bot)
[06:50:38] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depool db2212 T415171', diff saved to https://phabricator.wikimedia.org/P87980 and previous config saved to /var/cache/conftool/dbconfig/20260128-065037-marostegui.json
[06:51:03] !log marostegui@dns1006 END - running authdns-update
[06:51:08] !log marostegui@dns1006 START - running authdns-update
[06:52:20] !log marostegui@dns1006 END - running authdns-update
[06:52:46] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2212.codfw.wmnet with reason: schema change
[06:53:20] (PS1) Marostegui: db2212: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1234145 (https://phabricator.wikimedia.org/T411163)
[06:54:15] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook -
https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[06:54:22] !log Deploy schema change on old s1 master db2212 T411163 T411164
[06:54:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:54:29] T411163: Drop ar_sha1 from archive table in wmf production - https://phabricator.wikimedia.org/T411163
[06:54:29] T411164: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164
[06:54:33] (CR) Marostegui: [C:+2] db2212: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1234145 (https://phabricator.wikimedia.org/T411163) (owner: Marostegui)
[06:59:15] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[07:00:04] Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260128T0700)
[07:04:15] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[07:05:15] FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[07:09:30] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[07:10:15] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[07:14:30] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[07:19:15] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate eventstreams-internal.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[07:19:30] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[07:23:15] FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[07:24:04] (PS1) Gerrit maintenance bot: mariadb: Promote db2165 to s8 master [puppet] - https://gerrit.wikimedia.org/r/1234158 (https://phabricator.wikimedia.org/T415748)
[07:24:10] (PS1) Gerrit maintenance bot: wmnet: Update s8-master alias [dns] - https://gerrit.wikimedia.org/r/1234159 (https://phabricator.wikimedia.org/T415748)
[07:26:06] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.newpool (exit_code=0) pool db1163: After schema change
[07:28:15] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[07:38:15] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[07:44:15] FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[07:44:25] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2195 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P87984 and previous config saved to /var/cache/conftool/dbconfig/20260128-074424-marostegui.json
[07:44:32] T411163: Drop ar_sha1 from archive table in wmf production - https://phabricator.wikimedia.org/T411163
[07:44:32] T411164: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164
[07:49:15] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[07:54:33] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P87985 and previous config saved to /var/cache/conftool/dbconfig/20260128-075432-marostegui.json
[07:57:15] FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[08:00:05] Amir1, Urbanecm, and awight: UTC morning backport window
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260128T0800). Please do the needful.
[08:00:05] No Gerrit patches in the queue for this window AFAICS.
[08:02:15] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[08:04:41] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P87986 and previous config saved to /var/cache/conftool/dbconfig/20260128-080441-marostegui.json
[08:05:15] FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[08:10:15] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[08:14:51] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2195 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P87987 and previous config saved to /var/cache/conftool/dbconfig/20260128-081449-marostegui.json
[08:14:56] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance
[08:14:59] T411163: Drop ar_sha1 from archive table in wmf production - https://phabricator.wikimedia.org/T411163
[08:14:59] T411164: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164
[08:15:15] RESOLVED: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[08:16:41] (CR) Joal: [C:+1] "LGTM!" [puppet] - https://gerrit.wikimedia.org/r/1233836 (https://phabricator.wikimedia.org/T414389) (owner: Xcollazo)
[08:18:15] FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[08:18:45] (CR) Joal: airflow: store large XCOMs in s3 to alleviate load on the database (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1233776 (https://phabricator.wikimedia.org/T415661) (owner: Brouberol)
[08:20:30] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[08:21:42] PROBLEM - Check unit status of httpbb_kubernetes_mw-api-ext-next_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-api-ext-next_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[08:26:46] (CR) Gehel: [C:+2] Add comments to dse-k8s-eqiad airflow helmfiles [deployment-charts] - https://gerrit.wikimedia.org/r/1233730 (owner: Joal)
[08:29:30] FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[08:29:45] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[08:34:30] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[08:38:51] (PS2) Arnaudb: gerrit: remove differenciated logs for mod_qos [puppet] - https://gerrit.wikimedia.org/r/1234269
[08:38:51] (CR) Arnaudb: "this should help to reduce alerting on diskspace" [puppet] - https://gerrit.wikimedia.org/r/1234269 (owner: Arnaudb)
[08:39:30] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[08:52:15] FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[08:57:15] RESOLVED: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[09:00:04] brennen and andre: It is that lovely time of the day again! You are hereby commanded to deploy MediaWiki train - Utc-7+Utc-0 Version (secondary timeslot).
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260128T0900). [09:06:32] FIRING: [12x] ProbeDown: Service wdqs1011:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [09:14:15] FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [09:18:45] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply [09:19:15] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [09:19:25] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply [09:21:42] RECOVERY - Check unit status of httpbb_kubernetes_mw-api-ext-next_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-api-ext-next_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [09:24:15] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [09:29:41] FIRING: [9x] SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [09:33:15] FIRING: 
MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [09:34:15] FIRING: JobUnavailable: Reduced availability for job thanos-compact in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [09:38:15] RESOLVED: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [09:39:15] FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [09:42:08] (03CR) 10Marostegui: [C:03+1] clone: setup and start repl on target host early [cookbooks] - 10https://gerrit.wikimedia.org/r/1233761 (https://phabricator.wikimedia.org/T415564) (owner: 10Federico Ceratto) [09:43:30] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [09:48:03] (03PS4) 10Dpogorzelski: admin: add the ml-builder-docker group [puppet] - 10https://gerrit.wikimedia.org/r/1230280 [09:48:03] (03PS1) 10Dpogorzelski: aptrepo: remove 
yarn due to expired key [puppet] - 10https://gerrit.wikimedia.org/r/1234291 [09:57:54] (03CR) 10Giuseppe Lavagetto: [C:03+1] aptrepo: remove yarn due to expired key [puppet] - 10https://gerrit.wikimedia.org/r/1234291 (owner: 10Dpogorzelski) [09:58:30] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [09:59:15] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [10:01:38] (03CR) 10Elukey: "I had a chat with Moritz about this and it seems good, but since it is a new sudo-like set of permissions it will have to go through a for" [puppet] - 10https://gerrit.wikimedia.org/r/1230280 (owner: 10Dpogorzelski) [10:03:21] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply [10:03:59] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply [10:13:01] (03CR) 10Dpogorzelski: [C:03+2] aptrepo: remove yarn due to expired key [puppet] - 10https://gerrit.wikimedia.org/r/1234291 (owner: 10Dpogorzelski) [10:23:07] (03Abandoned) 10Mvolz: citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1216165 (owner: 10PipelineBot) [10:23:19] (03Abandoned) 10Mvolz: citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1211772 (owner: 10PipelineBot) [10:26:15] FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - 
https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [10:27:51] (03PS8) 10Mvolz: Remove deprecated parameter [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1204813 (https://phabricator.wikimedia.org/T361576) [10:28:24] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, January 28 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#depl" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1204813 (https://phabricator.wikimedia.org/T361576) (owner: 10Mvolz) [10:31:15] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [10:32:15] FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [10:36:30] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [10:39:50] (03PS1) 10PipelineBot: citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1234307 [10:51:30] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [10:56:30] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [10:57:15] RESOLVED: [2x] 
MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [10:59:15] FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [11:00:04] Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260128T1100) [11:01:30] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [11:04:15] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [11:06:30] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [11:11:15] FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [11:16:15] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - 
https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [11:19:15] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate eventstreams-internal.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [11:21:15] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [11:27:38] (03PS3) 10Dzahn: ncredir: remove wikipedia25.org, keep wikipedia25.com to www.wikipedia25.org [puppet] - 10https://gerrit.wikimedia.org/r/1216856 (https://phabricator.wikimedia.org/T408592) [11:32:15] FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [11:36:50] PROBLEM - Check unit status of httpbb_kubernetes_mw-api-int_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-api-int_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [11:37:15] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [11:42:15] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [11:54:34] (03CR) 10Mvolz: [C:03+2] citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1234307 (owner: 
10PipelineBot) [11:56:39] (03Merged) 10jenkins-bot: citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1234307 (owner: 10PipelineBot) [12:00:05] mvolz: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Services – Citoid / Zotero. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260128T1200). [12:01:13] !log mvolz@deploy2002 helmfile [staging] START helmfile.d/services/citoid: apply [12:01:18] !log mvolz@deploy2002 helmfile [staging] DONE helmfile.d/services/citoid: apply [12:02:08] !log mvolz@deploy2002 helmfile [staging] START helmfile.d/services/citoid: apply [12:02:15] !log mvolz@deploy2002 helmfile [staging] DONE helmfile.d/services/citoid: apply [12:06:38] !log mvolz@deploy2002 helmfile [staging] START helmfile.d/services/citoid: apply [12:06:47] !log mvolz@deploy2002 helmfile [staging] DONE helmfile.d/services/citoid: apply [12:13:32] !log mvolz@deploy2002 helmfile [staging] START helmfile.d/services/citoid: apply [12:13:39] !log mvolz@deploy2002 helmfile [staging] DONE helmfile.d/services/citoid: apply [12:15:34] So it's been 20 minutes since I've merged to deployment-charts and the helmfile still seems outdated... any ideas? [12:18:59] elukey: thoughts? [12:22:01] brouberol: ^ I think you have local changes in the airflow values in deployment-charts preventing the updater timer? [12:22:42] PROBLEM - Check unit status of httpbb_kubernetes_mw-api-ext-next_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-api-ext-next_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [12:23:22] (03PS1) 10Brouberol: airflow: store XCOMs>256KB in s3 to alleviate load on the database [deployment-charts] - 10https://gerrit.wikimedia.org/r/1234346 (https://phabricator.wikimedia.org/T415661) [12:24:20] that would do it. 
I tried and apparently I can just manually edit the files on the server rather than letting the updater do it but that's probably a bad idea haha [12:24:34] taavi oops sorry, let me stash them [12:25:07] done [12:25:44] PROBLEM - Check unit status of httpbb_kubernetes_mw-api-ext_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-api-ext_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [12:27:11] !log mvolz@deploy2002 helmfile [staging] START helmfile.d/services/citoid: apply [12:27:33] !log mvolz@deploy2002 helmfile [staging] DONE helmfile.d/services/citoid: apply [12:27:46] PROBLEM - Check unit status of httpbb_kubernetes_mw-parsoid_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-parsoid_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [12:28:28] (03CR) 10Brouberol: [C:03+1] aptrepo: remove yarn due to expired key [puppet] - 10https://gerrit.wikimedia.org/r/1234291 (owner: 10Dpogorzelski) [12:28:58] PROBLEM - Check unit status of httpbb_kubernetes_mw-web-next_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-web-next_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [12:36:13] (03PS4) 10Mvolz: citoid: downgrade version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1230315 (owner: 10PipelineBot) [12:37:05] (03CR) 10Mvolz: [C:03+2] citoid: downgrade version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1230315 (owner: 10PipelineBot) [12:37:48] PROBLEM - Check unit status of httpbb_kubernetes_mw-web_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-web_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [12:38:03] !log mvolz@deploy2002 helmfile [staging] START helmfile.d/services/citoid: apply [12:38:10] !log mvolz@deploy2002 helmfile [staging] DONE helmfile.d/services/citoid: apply [12:40:07] 
06SRE, 10Observability-Metrics, 06SRE Observability (FY2025/2026-Q3): Add Druid as a Private Grafana Datasource - https://phabricator.wikimedia.org/T410933#11561886 (10BCornwall) Considering Turnilo's development has largely stalled, Superset has usability issues regarding on-the-fly data spelunking, this se... [12:40:39] (03PS1) 10Majavah: P:wmcs::instance: Convert root keys list to YAML [puppet] - 10https://gerrit.wikimedia.org/r/1234357 (https://phabricator.wikimedia.org/T398217) [12:40:42] (03PS1) 10Majavah: openldap: offboard-user: Query list of Cloud VPS root keys [puppet] - 10https://gerrit.wikimedia.org/r/1234358 (https://phabricator.wikimedia.org/T398214) [12:41:12] (03CR) 10CI reject: [V:04-1] P:wmcs::instance: Convert root keys list to YAML [puppet] - 10https://gerrit.wikimedia.org/r/1234357 (https://phabricator.wikimedia.org/T398217) (owner: 10Majavah) [12:41:18] !log mvolz@deploy2002 helmfile [staging] START helmfile.d/services/citoid: apply [12:41:43] !log mvolz@deploy2002 helmfile [staging] DONE helmfile.d/services/citoid: apply [12:42:51] !log mvolz@deploy2002 helmfile [codfw] START helmfile.d/services/citoid: apply [12:42:52] (03CR) 10Joal: [C:03+1] "Thank you!" 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/1234346 (https://phabricator.wikimedia.org/T415661) (owner: 10Brouberol) [12:43:17] !log mvolz@deploy2002 helmfile [codfw] DONE helmfile.d/services/citoid: apply [12:43:47] !log mvolz@deploy2002 helmfile [eqiad] START helmfile.d/services/citoid: apply [12:43:50] (03PS2) 10Majavah: P:wmcs::instance: Convert root keys list to YAML [puppet] - 10https://gerrit.wikimedia.org/r/1234357 (https://phabricator.wikimedia.org/T398217) [12:43:50] (03PS2) 10Majavah: openldap: offboard-user: Query list of Cloud VPS root keys [puppet] - 10https://gerrit.wikimedia.org/r/1234358 (https://phabricator.wikimedia.org/T398214) [12:44:16] !log mvolz@deploy2002 helmfile [eqiad] DONE helmfile.d/services/citoid: apply [12:44:47] (03CR) 10Majavah: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/7953/co" [puppet] - 10https://gerrit.wikimedia.org/r/1234357 (https://phabricator.wikimedia.org/T398217) (owner: 10Majavah) [12:44:54] (03CR) 10Brouberol: [C:03+2] airflow: store XCOMs>256KB in s3 to alleviate load on the database [deployment-charts] - 10https://gerrit.wikimedia.org/r/1234346 (https://phabricator.wikimedia.org/T415661) (owner: 10Brouberol) [12:45:19] (03PS3) 10Majavah: P:wmcs::instance: Convert root keys list to YAML [puppet] - 10https://gerrit.wikimedia.org/r/1234357 (https://phabricator.wikimedia.org/T398217) [12:45:19] (03PS3) 10Majavah: openldap: offboard-user: Query list of Cloud VPS root keys [puppet] - 10https://gerrit.wikimedia.org/r/1234358 (https://phabricator.wikimedia.org/T398214) [12:45:52] (03CR) 10Majavah: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/7954/co" [puppet] - 10https://gerrit.wikimedia.org/r/1234357 (https://phabricator.wikimedia.org/T398217) (owner: 10Majavah) [12:47:46] (03Merged) 10jenkins-bot: airflow: store 
XCOMs>256KB in s3 to alleviate load on the database [deployment-charts] - 10https://gerrit.wikimedia.org/r/1234346 (https://phabricator.wikimedia.org/T415661) (owner: 10Brouberol) [12:48:38] (03Abandoned) 10Mvolz: citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1230283 (owner: 10PipelineBot) [12:52:17] (03Abandoned) 10Dzahn: zookeeper: add parameter and path to tls cert passphrase [puppet] - 10https://gerrit.wikimedia.org/r/1233697 (https://phabricator.wikimedia.org/T405119) (owner: 10Dzahn) [12:56:59] (03PS5) 10Dzahn: zookeeper: add ssl.keyStore.passwordPath [puppet] - 10https://gerrit.wikimedia.org/r/1224908 (https://phabricator.wikimedia.org/T405119) [12:58:40] (03CR) 10Dzahn: "https://puppet-compiler.wmflabs.org/output/1224908/7955/zuul1001.eqiad.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/1224908 (https://phabricator.wikimedia.org/T405119) (owner: 10Dzahn) [12:59:55] (03PS4) 10Majavah: P:wmcs::instance: Convert root keys list to YAML [puppet] - 10https://gerrit.wikimedia.org/r/1234357 (https://phabricator.wikimedia.org/T398217) [12:59:55] (03PS4) 10Majavah: openldap: offboard-user: Query list of Cloud VPS root keys [puppet] - 10https://gerrit.wikimedia.org/r/1234358 (https://phabricator.wikimedia.org/T398214) [12:59:55] (03PS1) 10Majavah: wmflib: deep_merge: Do not duplicate array values [puppet] - 10https://gerrit.wikimedia.org/r/1234366 [13:00:37] (03CR) 10Majavah: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/7957/co" [puppet] - 10https://gerrit.wikimedia.org/r/1234357 (https://phabricator.wikimedia.org/T398217) (owner: 10Majavah) [13:01:14] (03CR) 10Dzahn: [V:03+1] "https://puppet-compiler.wmflabs.org/output/1224908/7956/" [puppet] - 10https://gerrit.wikimedia.org/r/1224908 (https://phabricator.wikimedia.org/T405119) (owner: 10Dzahn) [13:01:14] jouncebot: nowandnext [13:01:14] No deployments scheduled for the 
next 0 hour(s) and 58 minute(s) [13:01:14] In 0 hour(s) and 58 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260128T1400) [13:01:27] Going to do my backport early [13:03:35] (03CR) 10Dzahn: [V:03+1 C:03+2] zookeeper: add ssl.keyStore.passwordPath [puppet] - 10https://gerrit.wikimedia.org/r/1224908 (https://phabricator.wikimedia.org/T405119) (owner: 10Dzahn) [13:05:25] (03PS6) 10Dzahn: zookeeper: add ssl.keyStore.passwordPath [puppet] - 10https://gerrit.wikimedia.org/r/1224908 (https://phabricator.wikimedia.org/T405119) [13:07:31] 06SRE, 10SRE-Access-Requests: Gashamo Hawd Somali Ethiopia - https://phabricator.wikimedia.org/T415781 (10Ahmedderai1000) 03NEW [13:08:12] 06SRE, 10SRE-Access-Requests: Gashamo Hawd Somali Ethiopia - https://phabricator.wikimedia.org/T415781#11561994 (10Ahmedderai1000) Hawd Zone [13:10:57] (03CR) 10Dzahn: [V:03+1 C:03+2] "https://puppet-compiler.wmflabs.org/output/1224908/7958/" [puppet] - 10https://gerrit.wikimedia.org/r/1224908 (https://phabricator.wikimedia.org/T405119) (owner: 10Dzahn) [13:25:07] FIRING: MediaWikiEditFailures: Elevated MediaWiki edit failures (session_loss) for cluster - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000208/edit-count?orgId=1&viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiEditFailures [13:26:54] FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [13:29:41] FIRING: [9x] SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - 
https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [13:33:05] !log jforrester@deploy2002 mwscript-k8s job started: /srv/mediawiki/php-1.46.0-wmf.13/extensions/Translate/scripts/createMessageIndex.php --wiki=mediawikiwiki # T415725 [13:33:11] T415725: TypeError: MediaWiki\Extension\Translate\MessageGroupProcessing\CachedMessageGroupFactoryLoader::MediaWiki\Extension\Translate\MessageGroupProcessing\{closure}(): Argument #1 ($value) must be of type DependencyWrapper, __PHP_In - https://phabricator.wikimedia.org/T415725 [13:34:15] FIRING: JobUnavailable: Reduced availability for job thanos-compact in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [13:41:54] RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [13:49:26] FIRING: [10x] SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [13:52:16] (03CR) 10Xcollazo: "Thanks for the review @joal@wikimedia.org." 
[puppet] - 10https://gerrit.wikimedia.org/r/1233836 (https://phabricator.wikimedia.org/T414389) (owner: 10Xcollazo) [13:56:35] !log sfaci@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply [13:57:04] !log sfaci@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply [13:59:00] (03PS1) 10Anzx: kajwiki: add tagline [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1234395 (https://phabricator.wikimedia.org/T415038) [13:59:02] (03PS2) 10Anzx: kajwiki: add tagline [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1234395 (https://phabricator.wikimedia.org/T415038) [13:59:53] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, January 28 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#depl" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1234395 (https://phabricator.wikimedia.org/T415038) (owner: 10Anzx) [14:00:05] Lucas_WMDE, Urbanecm, and TheresNoTime: #bothumor My software never has bugs. It just develops random features. Rise for UTC afternoon backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260128T1400). [14:00:05] Dreamy_Jazz, Mvolz, and anzx: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [14:00:06] (03CR) 10CI reject: [V:04-1] kajwiki: add tagline [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1234395 (https://phabricator.wikimedia.org/T415038) (owner: 10Anzx) [14:00:23] \o [14:00:25] o/ [14:01:07] I can self deploy if you're happy for me to go. [14:01:17] just doing a quick grep for the setting [14:01:24] it still seems to be used in wmf.11… [14:01:55] hm, and in wmf.13 too? 
[14:02:09] (03CR) 10Anzx: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1234395 (https://phabricator.wikimedia.org/T415038) (owner: 10Anzx) [14:03:33] ok I think I see what’s going on. should be good to go then [14:03:42] Mvolz: go ahead :) [14:03:53] Thanks for double checking! [14:04:19] unrelated, I think it’s possible we can’t deploy at all due to T415725 [14:04:21] T415725: TypeError: MediaWiki\Extension\Translate\MessageGroupProcessing\CachedMessageGroupFactoryLoader::MediaWiki\Extension\Translate\MessageGroupProcessing\{closure}(): Argument #1 ($value) must be of type DependencyWrapper, __PHP_In - https://phabricator.wikimedia.org/T415725 [14:04:42] but I would say let’s try it [14:04:55] Mvolz: o/ sorry I am at the SRE summit, do you still need help? [14:05:21] elukey: no, all fixed, sorry for the ping! [14:05:31] Lucas_WMDE: okay, I'll give it a go [14:06:57] Lucas_WMDE: hmm, it does not want to give me my one time password. says it doesn't recognise host? related? or coincidence? [14:07:22] Doesn't seem like it would be related... [14:07:26] don’t think that can be related [14:07:39] depending on your SSH config you might have to connect to the host differently than what the message says [14:07:55] (e.g. 
for me it’s `ssh deployment.eqiad.wmnet scap spiderpig-otp`) [14:08:44] okay, that workd [14:09:20] (03CR) 10TrainBranchBot: [C:03+2] "Approved by mvolz@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1204813 (https://phabricator.wikimedia.org/T361576) (owner: 10Mvolz) [14:10:53] (03Abandoned) 10Anzx: kajwiki: add tagline [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1234395 (https://phabricator.wikimedia.org/T415038) (owner: 10Anzx) [14:11:02] (03Merged) 10jenkins-bot: Remove deprecated parameter [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1204813 (https://phabricator.wikimedia.org/T361576) (owner: 10Mvolz) [14:12:14] !log mvolz@deploy2002 Started scap sync-world: Backport for [[gerrit:1204813|Remove deprecated parameter (T361576)]] [14:12:19] T361576: Switch from restbase to rest-gateway for Citoid - https://phabricator.wikimedia.org/T361576 [14:13:06] Lucas_WMDE: um. https://test.wikidata.org/wiki/Wikidata:Main_Page [14:13:36] this too is T415725, I think [14:13:37] T415725: TypeError: MediaWiki\Extension\Translate\MessageGroupProcessing\CachedMessageGroupFactoryLoader::MediaWiki\Extension\Translate\MessageGroupProcessing\{closure}(): Argument #1 ($value) must be of type DependencyWrapper, __PHP_Incomplete_Class given - https://phabricator.wikimedia.org/T415725 [14:13:40] someone said a purge fixes it [14:13:50] hm, nope [14:14:10] not on this page at least [14:14:43] I regret starting this deploy because it (should) only affect test wikidata, so I won't be able to confirm anything. [14:14:54] well, you would test this on item pages, right? [14:14:57] those still work AFAICT [14:15:03] Ah okay. [14:15:04] trying to find one with citoid-y citations [14:15:09] I panicked. [14:15:11] :P [14:15:19] hm, no dice [14:16:19] https://test.wikidata.org/wiki/Q178735 is working. 
[14:16:51] oh dear [14:16:58] but yeah [14:17:00] !log mvolz@deploy2002 mvolz: Backport for [[gerrit:1204813|Remove deprecated parameter (T361576)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [14:17:36] in october 2018 rowling hadn’t gone full mask off yet (iirc), so, yeah, cool test item we have there :S [14:19:06] hm, I get “Unable to generate a reference from this input. Use the manual tab, or try again.” [14:19:41] ok, a different URL works for me [14:20:04] https://test.wikidata.org/w/index.php?title=Q178735&diff=prev&oldid=756379 [14:20:10] Yeah it works for me too. [14:20:14] yay [14:20:59] meanwhile in spiderpig, testserver checks failed [14:21:18] I think that’s due to the same error still [14:21:42] yeah in logstash it’s all CachedMessageGroupFactoryLoader errors [14:22:01] hmm. wat do? [14:22:03] I would suggest retrying a few times, and if it keeps failing with the same error (and nothing citoid-related), IMHO it’s okay to ignore failure and proceed with deployment [14:22:16] ok [14:23:40] I retried twice, no dice, but I don't see where the errors go? [14:23:47] err, are output? [14:24:56] they should be in https://logstash.wikimedia.org/app/dashboards#/view/mwdebug1002 I think? [14:25:07] RESOLVED: MediaWikiEditFailures: Elevated MediaWiki edit failures (session_loss) for cluster - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000208/edit-count?orgId=1&viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiEditFailures [14:25:37] yeah, the scap output says it’s sending to mwdebug-next.discovery.wmnet / mwdebug.discovery.wmnet [14:25:46] which matches the host names in that logstash [14:25:56] (mw-debug.codfw.next-55bdffb4c6-5gr5r, mw-debug.codfw.pinkunicorn-5b494d5994-ttr6g, etc.) [14:27:55] okay well I don't see anything. I guess I'll proceed. 
[14:28:04] related to the change* [14:28:39] okay [14:29:38] !log mvolz@deploy2002 mvolz: Continuing with sync [14:33:38] (03CR) 10Federico Ceratto: [C:03+2] clone: setup and start repl on target host early [cookbooks] - 10https://gerrit.wikimedia.org/r/1233761 (https://phabricator.wikimedia.org/T415564) (owner: 10Federico Ceratto) [14:33:45] the test item is now safe to click fyi :P [14:34:21] \o/ [14:34:23] :D [14:35:32] meanwhile the canary check in scap didn’t show an elevated error rate, so that’s good [14:35:47] !log mvolz@deploy2002 Finished scap sync-world: Backport for [[gerrit:1204813|Remove deprecated parameter (T361576)]] (duration: 23m 33s) [14:35:52] T361576: Switch from restbase to rest-gateway for Citoid - https://phabricator.wikimedia.org/T361576 [14:36:13] Dreamy_Jazz: do you want to deploy your change next? [14:42:34] (03PS1) 10Jforrester: wikifunctions: Upgrade evaluators from 2026-01-15-194836 to 2026-01-27-063404 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1234427 (https://phabricator.wikimedia.org/T353354) [14:42:54] (03PS1) 10Jforrester: wikifunctions: Upgrade orchestrator from 2026-01-21-135031 to 2026-01-28-071101 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1234428 (https://phabricator.wikimedia.org/T353354) [14:45:05] (03PS1) 10AOkoth: admin: change johannnes89 -> j89 [puppet] - 10https://gerrit.wikimedia.org/r/1234429 (https://phabricator.wikimedia.org/T414789) [14:45:16] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, January 28 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1233670 (https://phabricator.wikimedia.org/T407805) (owner: 10Santiago Faci) [14:50:32] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1165.eqiad.wmnet with reason: Maintenance [14:50:51] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 
16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance [14:50:58] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1165 (T415786)', diff saved to https://phabricator.wikimedia.org/P87999 and previous config saved to /var/cache/conftool/dbconfig/20260128-145057-marostegui.json [14:51:03] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [14:51:15] Dreamy_Jazz: do you want to deploy your config change? [14:51:39] Sorry, was away from the PC [14:51:41] I can deploy it now [14:51:48] ok, I think there should be enough time [14:52:15] note that you’ll probably run into T415725 during the deploy [14:52:15] T415725: TypeError: MediaWiki\Extension\Translate\MessageGroupProcessing\CachedMessageGroupFactoryLoader::MediaWiki\Extension\Translate\MessageGroupProcessing\{closure}(): Argument #1 ($value) must be of type DependencyWrapper, __PHP_Incomplete_Class given - https://phabricator.wikimedia.org/T415725 [14:52:15] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2151.codfw.wmnet with reason: Maintenance [14:52:24] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2151 (T415786)', diff saved to https://phabricator.wikimedia.org/P88000 and previous config saved to /var/cache/conftool/dbconfig/20260128-145223-marostegui.json [14:52:49] (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1233752 (https://phabricator.wikimedia.org/T361199) (owner: 10Dreamy Jazz) [14:52:55] Thanks [14:54:00] (03Merged) 10jenkins-bot: CheckUser: Read new for user agent table migration everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1233752 (https://phabricator.wikimedia.org/T361199) (owner: 10Dreamy Jazz) [14:54:31] !log dreamyjazz@deploy2002 Started scap sync-world: Backport for 
[[gerrit:1233752|CheckUser: Read new for user agent table migration everywhere (T361199)]] [14:54:37] T361199: Set user agent schema migration config to read new on WMF wikis - https://phabricator.wikimedia.org/T361199 [14:56:53] !log dreamyjazz@deploy2002 dreamyjazz: Backport for [[gerrit:1233752|CheckUser: Read new for user agent table migration everywhere (T361199)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [14:59:16] I'm getting errors that https://test.wikidata.org/wiki/Wikidata:Main_Page has a 500 error [14:59:29] yup, that’s T415725 [14:59:30] T415725: TypeError: MediaWiki\Extension\Translate\MessageGroupProcessing\CachedMessageGroupFactoryLoader::MediaWiki\Extension\Translate\MessageGroupProcessing\{closure}(): Argument #1 ($value) must be of type DependencyWrapper, __PHP_Incomplete_Class given - https://phabricator.wikimedia.org/T415725 [14:59:35] Thanks [14:59:39] you can look at the errors in logstash to confirm [14:59:51] all I see is CachedMessageGroupFactoryLoader [15:00:02] !log dreamyjazz@deploy2002 dreamyjazz: Continuing with sync [15:00:04] Deploy window Wikifunctions Services UTC Afternoon (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260128T1500) [15:00:08] (and a fair amount of CSP reports, heh) [15:00:08] (03CR) 10Cory Massaro: [C:03+2] wikifunctions: Upgrade evaluators from 2026-01-15-194836 to 2026-01-27-063404 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1234427 (https://phabricator.wikimedia.org/T353354) (owner: 10Jforrester) [15:00:18] Yeah, I had forgot that the error was "TypeError" in the interface [15:00:21] Proceeding as my test was fine [15:00:45] (03PS1) 10Wargo: Revert "REST: enable the site.v1 module" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1234437 (https://phabricator.wikimedia.org/T415771) [15:02:27] (03Merged) 10jenkins-bot: wikifunctions: Upgrade evaluators from 2026-01-15-194836 to 2026-01-27-063404 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/1234427 (https://phabricator.wikimedia.org/T353354) (owner: 10Jforrester) [15:03:31] !log apine@deploy2002 helmfile [staging] START helmfile.d/services/wikifunctions: apply [15:04:19] !log apine@deploy2002 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply [15:04:30] !log dreamyjazz@deploy2002 Finished scap sync-world: Backport for [[gerrit:1233752|CheckUser: Read new for user agent table migration everywhere (T361199)]] (duration: 09m 58s) [15:04:35] T361199: Set user agent schema migration config to read new on WMF wikis - https://phabricator.wikimedia.org/T361199 [15:04:38] I'm done [15:04:46] !log apine@deploy2002 helmfile [codfw] START helmfile.d/services/wikifunctions: apply [15:05:04] !log UTC afternoon backport+config window done [15:05:05] thanks! [15:05:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:05:27] !log apine@deploy2002 helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply [15:05:30] Thanks for the reminder ping :D. 
On the phone with government tax people, so that's what caused time to slip away :D [15:05:36] !log apine@deploy2002 helmfile [eqiad] START helmfile.d/services/wikifunctions: apply [15:06:04] oh dear :D [15:06:18] !log apine@deploy2002 helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply [15:06:45] (03CR) 10Cory Massaro: [C:03+2] wikifunctions: Upgrade orchestrator from 2026-01-21-135031 to 2026-01-28-071101 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1234428 (https://phabricator.wikimedia.org/T353354) (owner: 10Jforrester) [15:08:36] (03Merged) 10jenkins-bot: wikifunctions: Upgrade orchestrator from 2026-01-21-135031 to 2026-01-28-071101 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1234428 (https://phabricator.wikimedia.org/T353354) (owner: 10Jforrester) [15:09:15] FIRING: [3x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [15:09:50] !log apine@deploy2002 helmfile [staging] START helmfile.d/services/wikifunctions: apply [15:10:18] !log apine@deploy2002 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply [15:11:34] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1214 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P88001 and previous config saved to /var/cache/conftool/dbconfig/20260128-151133-marostegui.json [15:11:42] T411163: Drop ar_sha1 from archive table in wmf production - https://phabricator.wikimedia.org/T411163 [15:11:44] T411164: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164 [15:12:04] !log apine@deploy2002 helmfile [eqiad] START helmfile.d/services/wikifunctions: apply [15:12:33] !log apine@deploy2002 helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply [15:12:47] !log 
apine@deploy2002 helmfile [codfw] START helmfile.d/services/wikifunctions: apply [15:13:16] !log apine@deploy2002 helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply [15:19:15] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate eventstreams-internal.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [15:21:42] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P88002 and previous config saved to /var/cache/conftool/dbconfig/20260128-152141-marostegui.json [15:25:03] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1165 (T415786)', diff saved to https://phabricator.wikimedia.org/P88003 and previous config saved to /var/cache/conftool/dbconfig/20260128-152501-marostegui.json [15:25:10] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [15:30:05] Deploy window Wikifunctions Services UTC Afternoon (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260128T1500) [15:30:05] Deploy window Test Kitchen Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260128T1530) [15:30:49] <+stashbot> T411164: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164 [15:30:50] T411164: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164 [15:31:51] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P88004 and previous config saved to /var/cache/conftool/dbconfig/20260128-153150-marostegui.json [15:32:46] PROBLEM - mailman list info ssl expiry on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds 
https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [15:34:15] FIRING: [3x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [15:35:49] (03PS1) 10Jdlrobson: Replace width with max-width in ext.math.less [extensions/Math] (wmf/1.46.0-wmf.12) - 10https://gerrit.wikimedia.org/r/1234442 (https://phabricator.wikimedia.org/T415577) [15:37:36] RECOVERY - mailman list info ssl expiry on lists1004 is OK: OK - Certificate lists.wikimedia.org will expire on Sat 04 Apr 2026 07:22:16 PM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [15:39:15] FIRING: [3x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [15:39:26] (03PS1) 10Jforrester: wikifunctions: Add memory usage to check-wf-services [deployment-charts] - 10https://gerrit.wikimedia.org/r/1234444 [15:40:11] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P88005 and previous config saved to /var/cache/conftool/dbconfig/20260128-154010-marostegui.json [15:41:59] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1214 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P88006 and previous config saved to /var/cache/conftool/dbconfig/20260128-154158-marostegui.json [15:42:10] T411163: Drop ar_sha1 from archive table in wmf production - https://phabricator.wikimedia.org/T411163 [15:42:11] T411164: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164 [15:42:15] FIRING: [3x] MediaWikiHighErrorRate: Elevated rate 
of MediaWiki errors - kube-mw-api-int - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [15:42:16] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance [15:42:25] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1226 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P88007 and previous config saved to /var/cache/conftool/dbconfig/20260128-154224-marostegui.json [15:47:15] FIRING: [3x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-api-int - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [15:47:58] (03CR) 10CI reject: [V:04-1] Replace width with max-width in ext.math.less [extensions/Math] (wmf/1.46.0-wmf.12) - 10https://gerrit.wikimedia.org/r/1234442 (https://phabricator.wikimedia.org/T415577) (owner: 10Jdlrobson) [15:49:20] (03CR) 10Gehel: [C:03+2] "Key verified out of band." 
[puppet] - 10https://gerrit.wikimedia.org/r/1233566 (owner: 10Clare Ming) [15:49:21] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2151 (T415786)', diff saved to https://phabricator.wikimedia.org/P88009 and previous config saved to /var/cache/conftool/dbconfig/20260128-154919-marostegui.json [15:49:26] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [15:49:52] (03PS2) 10Majavah: wmflib: deep_merge: Do not duplicate array values [puppet] - 10https://gerrit.wikimedia.org/r/1234366 [15:49:52] (03PS5) 10Majavah: P:wmcs::instance: Convert root keys list to YAML [puppet] - 10https://gerrit.wikimedia.org/r/1234357 (https://phabricator.wikimedia.org/T398214) [15:49:55] (03PS5) 10Majavah: openldap: offboard-user: Query list of Cloud VPS root keys [puppet] - 10https://gerrit.wikimedia.org/r/1234358 (https://phabricator.wikimedia.org/T398214) [15:55:12] (03CR) 10Majavah: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/7959/co" [puppet] - 10https://gerrit.wikimedia.org/r/1234357 (https://phabricator.wikimedia.org/T398214) (owner: 10Majavah) [15:55:19] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P88010 and previous config saved to /var/cache/conftool/dbconfig/20260128-155518-marostegui.json [15:55:51] (03CR) 10Jdlrobson: "recheck" [extensions/Math] (wmf/1.46.0-wmf.12) - 10https://gerrit.wikimedia.org/r/1234442 (https://phabricator.wikimedia.org/T415577) (owner: 10Jdlrobson) [15:58:52] PROBLEM - MariaDB Replica Lag: analytics-meta-replica on an-mariadb1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 689173.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [15:59:25] btullis: ^ [15:59:52] PROBLEM - MariaDB Replica Lag: analytics_meta on db1208 is CRITICAL: 
CRITICAL slave_sql_lag Replication lag: 696541.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [16:00:23] btullis: and ^ [16:01:52] 06SRE, 10PageImages, 06Traffic: OGP lists fullsize thumbnail version of original instead the original itself - https://phabricator.wikimedia.org/T415598#11562882 (10TheDJ) [16:04:29] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P88011 and previous config saved to /var/cache/conftool/dbconfig/20260128-160429-marostegui.json [16:07:46] FYI anyone that might be interested while most SREs are at the summit - starting tomorrow around 16:00 UTC we will be sending events from 100% of pageviews on eswiki and ptwiki to allow Traffic and netops to have some numbers for an upcoming 100% enwiki instrument. We ran a very similar instrument in the fall on 10% of enwiki without issue, so not expecting anything weird now. [16:10:29] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1165 (T415786)', diff saved to https://phabricator.wikimedia.org/P88012 and previous config saved to /var/cache/conftool/dbconfig/20260128-161027-marostegui.json [16:10:34] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [16:10:45] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1168.eqiad.wmnet with reason: Maintenance [16:10:52] (03CR) 10Cory Massaro: [C:03+2] wikifunctions: Add memory usage to check-wf-services [deployment-charts] - 10https://gerrit.wikimedia.org/r/1234444 (owner: 10Jforrester) [16:10:53] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1168 (T415786)', diff saved to https://phabricator.wikimedia.org/P88013 and previous config saved to /var/cache/conftool/dbconfig/20260128-161052-marostegui.json [16:12:41] (03Merged) 10jenkins-bot: wikifunctions: Add memory usage to 
check-wf-services [deployment-charts] - 10https://gerrit.wikimedia.org/r/1234444 (owner: 10Jforrester) [16:19:38] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P88014 and previous config saved to /var/cache/conftool/dbconfig/20260128-161937-marostegui.json [16:25:45] (03PS1) 10Abijeet Patro: Update cache version for message group caches [extensions/Translate] (wmf/1.46.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1234451 (https://phabricator.wikimedia.org/T415725) [16:27:30] (03CR) 10Abijeet Patro: "This can be tried if `createMessageIndex.php` script doesn't work to resolve the errors." [extensions/Translate] (wmf/1.46.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1234451 (https://phabricator.wikimedia.org/T415725) (owner: 10Abijeet Patro) [16:27:52] RECOVERY - MariaDB Replica Lag: analytics_meta on db1208 is OK: OK slave_sql_lag Replication lag: 0.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [16:29:43] (03CR) 10Jforrester: "We tried that in https://sal.toolforge.org/log/IJLOBJwBvg159pQrxo9h but it didn't have any apparent effect." [extensions/Translate] (wmf/1.46.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1234451 (https://phabricator.wikimedia.org/T415725) (owner: 10Abijeet Patro) [16:32:42] (03CR) 10Lucas Werkmeister (WMDE): [C:03+1] "Worth a try IMHO." 
[extensions/Translate] (wmf/1.46.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1234451 (https://phabricator.wikimedia.org/T415725) (owner: 10Abijeet Patro) [16:33:59] (03CR) 10Jforrester: [C:03+1] Update cache version for message group caches [extensions/Translate] (wmf/1.46.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1234451 (https://phabricator.wikimedia.org/T415725) (owner: 10Abijeet Patro) [16:34:46] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2151 (T415786)', diff saved to https://phabricator.wikimedia.org/P88015 and previous config saved to /var/cache/conftool/dbconfig/20260128-163445-marostegui.json [16:34:52] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [16:35:02] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2158.codfw.wmnet with reason: Maintenance [16:35:11] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2158 (T415786)', diff saved to https://phabricator.wikimedia.org/P88016 and previous config saved to /var/cache/conftool/dbconfig/20260128-163510-marostegui.json [16:36:50] RECOVERY - Check unit status of httpbb_kubernetes_mw-api-int_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-api-int_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [16:36:52] RECOVERY - MariaDB Replica Lag: analytics-meta-replica on an-mariadb1002 is OK: OK slave_sql_lag Replication lag: 0.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [16:37:48] RECOVERY - Check unit status of httpbb_kubernetes_mw-web_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-web_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [16:45:28] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1168 (T415786)', diff saved to https://phabricator.wikimedia.org/P88017 and 
previous config saved to /var/cache/conftool/dbconfig/20260128-164528-marostegui.json [16:45:36] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [16:46:38] (03PS1) 10Muehlenhoff: Remove access for iflorez [puppet] - 10https://gerrit.wikimedia.org/r/1234455 [16:47:16] (03CR) 10CI reject: [V:04-1] Remove access for iflorez [puppet] - 10https://gerrit.wikimedia.org/r/1234455 (owner: 10Muehlenhoff) [16:50:10] FIRING: ProbeDown: Service thanos-query:443 has failed probes (http_thanos-query_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#thanos-query:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [16:54:14] RESOLVED: ProbeDown: Service thanos-query:443 has failed probes (http_thanos-query_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#thanos-query:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [16:54:17] (03PS1) 10Joal: Update ext-EventStreamConfig [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1234457 (https://phabricator.wikimedia.org/T415638) [16:55:28] (03PS2) 10Muehlenhoff: Remove access for iflorez [puppet] - 10https://gerrit.wikimedia.org/r/1234455 [16:55:40] (03CR) 10Jgiannelos: [C:03+1] Update ext-EventStreamConfig [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1234457 (https://phabricator.wikimedia.org/T415638) (owner: 10Joal) [16:56:22] (03CR) 10JHathaway: [C:03+2] Remove access for iflorez [puppet] - 10https://gerrit.wikimedia.org/r/1234455 (owner: 10Muehlenhoff) [16:58:20] !log jmm@cumin2002 DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Iflorez out of all services on: 2487 hosts [17:00:37] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P88018 and 
previous config saved to /var/cache/conftool/dbconfig/20260128-170036-marostegui.json [17:00:50] jouncebot nowandnext [17:00:50] No deployments scheduled for the next 0 hour(s) and 59 minute(s) [17:00:50] In 0 hour(s) and 59 minute(s): MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260128T1800) [17:02:15] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [17:07:51] brennen: should we try deploying https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Translate/+/1234451? (cc James_F) [17:08:13] bah, I was about to say it should be easy to test because https://test.wikidata.org/wiki/Wikidata:Main_Page consistently fails for me, but of course now it loaded -.- [17:08:13] Lucas_WMDE: Yes, it almost certainly can't make it worse. [17:08:17] Lucas_WMDE: seems like a good idea. i can run it. [17:08:19] question though... [17:08:25] We're presumably going to have to bump again when we try re-namespacing :/ [17:08:27] is there going to be weirdness with .12 vs. 13? [17:09:07] (this is my free admission of the obvious that i don't actually have a mental model of the cache situation.) [17:09:15] FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [17:09:36] same here, I haven’t looked closely into the issue at all :/ [17:09:47] I don't think anyone has. 
[17:09:57] but given that this seems to be about page translation, not interface messages, I would expect no cross-wiki caching here [17:10:05] so hopefully no wmf version issues from that, at least [17:10:42] Most of the noise does relate to translate [17:10:48] And a spew of related messages [17:10:53] ok, i'll deploy. i think if errors drop off from test.wikidata we can then assume safe to roll forward to group0? [17:11:12] Yes. [17:11:20] kk, going ahead. [17:11:24] sounds sensible to me [17:11:45] As individual reproduction steps "fix" themselves at times, so we can't guarantee that the new code is what fixed any individual case. [17:12:06] yeah… though FWIW right now I can at least reproduce it at https://test.wikidata.org/w/index.php?title=Wikidata:Main_Page&action=history [17:12:27] (might be more valuable than a case from half an hour ago, at least) [17:13:31] (03CR) 10TrainBranchBot: [C:03+2] "Approved by brennen@deploy2002 using scap backport" [extensions/Translate] (wmf/1.46.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1234451 (https://phabricator.wikimedia.org/T415725) (owner: 10Abijeet Patro) [17:14:15] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [17:15:04] (03Merged) 10jenkins-bot: Update cache version for message group caches [extensions/Translate] (wmf/1.46.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1234451 (https://phabricator.wikimedia.org/T415725) (owner: 10Abijeet Patro) [17:15:40] !log brennen@deploy2002 Started scap sync-world: Backport for [[gerrit:1234451|Update cache version for message group caches (T415725)]] [17:15:45] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P88019 and previous config saved to /var/cache/conftool/dbconfig/20260128-171544-marostegui.json [17:15:47] 
T415725: TypeError: MediaWiki\Extension\Translate\MessageGroupProcessing\CachedMessageGroupFactoryLoader::MediaWiki\Extension\Translate\MessageGroupProcessing\{closure}(): Argument #1 ($value) must be of type DependencyWrapper, __PHP_Incomplete_Class given - https://phabricator.wikimedia.org/T415725 [17:17:58] !log brennen@deploy2002 abi, brennen: Backport for [[gerrit:1234451|Update cache version for message group caches (T415725)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [17:19:37] https://test.wikidata.org/w/index.php?title=Wikidata:Main_Page&action=history looks to work on mw-debug. [17:21:33] same here [17:21:38] cool [17:21:43] !log brennen@deploy2002 abi, brennen: Continuing with sync [17:21:44] * Lucas_WMDE goes to random pages a few times [17:21:44] (03Abandoned) 10Jdlrobson: Add namespace-specific collapsible section handlng for Parsoid mobile [extensions/MobileFrontend] (wmf/1.46.0-wmf.11) - 10https://gerrit.wikimedia.org/r/1227849 (https://phabricator.wikimedia.org/T407815) (owner: 10Jdlrobson) [17:22:42] RECOVERY - Check unit status of httpbb_kubernetes_mw-api-ext-next_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-api-ext-next_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [17:23:29] (still no errors for me) [17:25:44] RECOVERY - Check unit status of httpbb_kubernetes_mw-api-ext_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-api-ext_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [17:25:51] !log brennen@deploy2002 Finished scap sync-world: Backport for [[gerrit:1234451|Update cache version for message group caches (T415725)]] (duration: 10m 11s) [17:25:57] T415725: TypeError: MediaWiki\Extension\Translate\MessageGroupProcessing\CachedMessageGroupFactoryLoader::MediaWiki\Extension\Translate\MessageGroupProcessing\{closure}(): Argument #1 ($value) must be of type 
DependencyWrapper, __PHP_Incomplete_Class given - https://phabricator.wikimedia.org/T415725 [17:26:42] * Lucas_WMDE looks at logspam-watch [17:26:47] the error is happening on wmf.12 as well? [17:26:59] I thought it was only on the new train, sorry [17:27:22] yeah, it's been happening primarily on .12 [17:27:46] RECOVERY - Check unit status of httpbb_kubernetes_mw-parsoid_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-parsoid_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [17:28:05] last seen for .13 at 17:25 UTC, giving it a few minutes to see if it's actually fixed there and then i'll roll forward [17:28:24] And only seen in group0 wikis which had .13 rolled out to them. [17:28:33] well, crap. still seems to happen for .13. [17:28:39] So if it's fixed in wmf.13 now rolling the train will squash them? [17:28:45] ohhh, right, mediawikiwiki isn’t in testwikis [17:28:46] brennen: In Translate? Or a different cache user? [17:28:58] RECOVERY - Check unit status of httpbb_kubernetes_mw-web-next_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-web-next_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [17:29:15] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [17:29:16] good question, checking logstash [17:29:22] rate might have dropped [17:30:45] if I read https://logstash.wikimedia.org/goto/e4144698b4bc34c7d1888045b145b19f correctly, the rate dropped half an hour ago, which would be too early for it to be related to the backport [17:30:54] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1168 (T415786)', diff saved to https://phabricator.wikimedia.org/P88020 and previous config saved to /var/cache/conftool/dbconfig/20260128-173053-marostegui.json [17:31:01] 
T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [17:31:10] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1173.eqiad.wmnet with reason: Maintenance [17:31:19] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1173 (T415786)', diff saved to https://phabricator.wikimedia.org/P88021 and previous config saved to /var/cache/conftool/dbconfig/20260128-173118-marostegui.json [17:31:40] and the latest occurrences have a different replicaset in the host (IIUC), so I think you’re right that it’s still happening :/ [17:32:26] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2158 (T415786)', diff saved to https://phabricator.wikimedia.org/P88022 and previous config saved to /var/cache/conftool/dbconfig/20260128-173225-marostegui.json [17:34:13] Lucas_WMDE, James_F: yeah, seems like [17:34:15] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [17:35:06] weird, at the moment I get an error from https://test.wikidata.org/w/index.php?title=Wikidata:Main_Page&action=history on the main servers but not on mwdebug [17:35:09] pretty consistently afaict [17:35:41] Cache-poisoning from good servers into bad mid-deploy? [17:35:50] but earlier today I’m pretty sure I got the error on mwdebug, so it’s not intrinsically invulnerable to the bug… [17:38:06] The only TypeErrors that I can see in prod are from Translate. 
[17:38:54] PROBLEM - SSH on bast4005 is CRITICAL: Server answer: Exceeded MaxStartups https://wikitech.wikimedia.org/wiki/SSH/monitoring [17:39:15] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [17:39:54] RECOVERY - SSH on bast4005 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u7 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring [17:41:39] (PS1) Zabe: BETA: Stop writing to il_to [mediawiki-config] - https://gerrit.wikimedia.org/r/1234468 (https://phabricator.wikimedia.org/T415787) [17:44:15] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [17:47:33] TypeError when viewing https://www.mediawiki.org/wiki/Extension:Math [17:47:35] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P88023 and previous config saved to /var/cache/conftool/dbconfig/20260128-174734-marostegui.json [17:49:41] FIRING: [10x] SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [18:00:04] Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260128T1800) [18:02:45] Krinkle, I think that's https://phabricator.wikimedia.org/T415725 [18:02:47] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P88024 and previous config saved to /var/cache/conftool/dbconfig/20260128-180243-marostegui.json [18:06:56] !log
marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1173 (T415786)', diff saved to https://phabricator.wikimedia.org/P88025 and previous config saved to /var/cache/conftool/dbconfig/20260128-180655-marostegui.json [18:07:02] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [18:11:50] (CR) Dwisehaupt: frack dns cleanup and reconfig (1 comment) [dns] - https://gerrit.wikimedia.org/r/1233877 (https://phabricator.wikimedia.org/T364185) (owner: Dwisehaupt) [18:14:51] (PS3) Dwisehaupt: frack dns cleanup and reconfig [dns] - https://gerrit.wikimedia.org/r/1233877 (https://phabricator.wikimedia.org/T364185) [18:15:52] (CR) CI reject: [V:-1] frack dns cleanup and reconfig [dns] - https://gerrit.wikimedia.org/r/1233877 (https://phabricator.wikimedia.org/T364185) (owner: Dwisehaupt) [18:17:56] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2158 (T415786)', diff saved to https://phabricator.wikimedia.org/P88026 and previous config saved to /var/cache/conftool/dbconfig/20260128-181755-marostegui.json [18:18:02] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [18:18:13] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2169.codfw.wmnet with reason: Maintenance [18:18:21] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2169 (T415786)', diff saved to https://phabricator.wikimedia.org/P88027 and previous config saved to /var/cache/conftool/dbconfig/20260128-181820-marostegui.json [18:22:04] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P88028 and previous config saved to /var/cache/conftool/dbconfig/20260128-182203-marostegui.json [18:24:50] (PS4) Dwisehaupt: frack dns cleanup and reconfig [dns] - https://gerrit.wikimedia.org/r/1233877
(https://phabricator.wikimedia.org/T364185) [18:27:00] (PS2) Santiago Faci: Removed `mpic` as local service [mediawiki-config] - https://gerrit.wikimedia.org/r/1233670 (https://phabricator.wikimedia.org/T407805) [18:29:15] (CR) Dwisehaupt: "@jgreen@wikimedia.org here are the proposed changes to set us up for next week's maintenance." [dns] - https://gerrit.wikimedia.org/r/1233877 (https://phabricator.wikimedia.org/T364185) (owner: Dwisehaupt) [18:34:15] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [18:37:12] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P88029 and previous config saved to /var/cache/conftool/dbconfig/20260128-183711-marostegui.json [18:39:15] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [18:43:03] !log dancy@deploy2002 Installing scap version "4.238.0" for 2 host(s) [18:44:15] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [18:44:58] !log dancy@deploy2002 Installation of scap version "4.238.0" completed for 2 hosts [18:49:15] FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [18:52:21] !log marostegui@cumin1003 dbctl commit (dc=all):
'Repooling after maintenance db1173 (T415786)', diff saved to https://phabricator.wikimedia.org/P88030 and previous config saved to /var/cache/conftool/dbconfig/20260128-185220-marostegui.json [18:52:27] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [18:52:37] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1180.eqiad.wmnet with reason: Maintenance [18:52:46] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1180 (T415786)', diff saved to https://phabricator.wikimedia.org/P88031 and previous config saved to /var/cache/conftool/dbconfig/20260128-185245-marostegui.json [18:54:15] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [19:00:05] brennen and andre: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for MediaWiki train - Utc-7+Utc-0 Version. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260128T1900). [19:00:13] noooo [19:01:15] FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [19:02:06] still blocked. 
[19:06:15] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [19:11:15] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [19:13:54] (PS1) Dreamy Jazz: Set wgCheckUserSuggestedInvestigationsUseGlobalContributionsLink [mediawiki-config] - https://gerrit.wikimedia.org/r/1234481 [19:14:14] jouncebot: nowandnext [19:14:15] For the next 1 hour(s) and 45 minute(s): MediaWiki train - Utc-7+Utc-0 Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260128T1900) [19:14:15] In 1 hour(s) and 45 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260128T2100) [19:14:35] brennen: Can I backport if you are not running the train yet? [19:14:52] Just a simple mediawiki config change [19:14:55] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2169 (T415786)', diff saved to https://phabricator.wikimedia.org/P88032 and previous config saved to /var/cache/conftool/dbconfig/20260128-191454-marostegui.json [19:15:01] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [19:15:55] If you want me to wait until the backport window, I'm happy to do that [19:17:14] cc andre: thoughts on the above?
[19:17:15] FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [19:18:02] Dreamy_Jazz: better to ping b.rennen, I'm on my way out, sorry :-/ [19:18:17] Thanks. Have done, will await their response [19:18:34] Dreamy_Jazz: I think you should be okay to deploy. Keep in mind that the mediawiki error rate is high right now due to the blocker, so scap may complain about that during the deployment. [19:18:49] Yeah. I saw that earlier today doing another scap deployment [19:18:51] Thanks [19:19:09] (CR) TrainBranchBot: [C:+2] "Approved by dreamyjazz@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1234481 (owner: Dreamy Jazz) [19:19:15] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate eventstreams-internal.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [19:20:03] (Merged) jenkins-bot: Set wgCheckUserSuggestedInvestigationsUseGlobalContributionsLink [mediawiki-config] - https://gerrit.wikimedia.org/r/1234481 (owner: Dreamy Jazz) [19:20:35] !log dreamyjazz@deploy2002 Started scap sync-world: Backport for [[gerrit:1234481|Set wgCheckUserSuggestedInvestigationsUseGlobalContributionsLink]] [19:22:15] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [19:22:48] !log dreamyjazz@deploy2002 dreamyjazz: Backport for [[gerrit:1234481|Set wgCheckUserSuggestedInvestigationsUseGlobalContributionsLink]] synced to
the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [19:25:57] !log dreamyjazz@deploy2002 dreamyjazz: Continuing with sync [19:28:26] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1180 (T415786)', diff saved to https://phabricator.wikimedia.org/P88033 and previous config saved to /var/cache/conftool/dbconfig/20260128-192825-marostegui.json [19:28:31] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [19:29:17] sorry Dreamy_Jazz, stepped afk for food [19:30:03] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P88034 and previous config saved to /var/cache/conftool/dbconfig/20260128-193002-marostegui.json [19:30:15] FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [19:30:27] !log dreamyjazz@deploy2002 Finished scap sync-world: Backport for [[gerrit:1234481|Set wgCheckUserSuggestedInvestigationsUseGlobalContributionsLink]] (duration: 09m 52s) [19:35:05] (PS1) Jon Harald Søby: Enable VE in Project namespace on mswiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/1234484 (https://phabricator.wikimedia.org/T415823) [19:35:15] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [19:35:24] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, January 28 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal"
[mediawiki-config] - https://gerrit.wikimedia.org/r/1234484 (https://phabricator.wikimedia.org/T415823) (owner: Jon Harald Søby) [19:39:15] FIRING: JobUnavailable: Reduced availability for job thanos-compact in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [19:40:15] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [19:42:15] FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [19:43:34] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P88035 and previous config saved to /var/cache/conftool/dbconfig/20260128-194333-marostegui.json [19:44:26] FIRING: [10x] SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [19:45:11] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P88036 and previous config saved to /var/cache/conftool/dbconfig/20260128-194511-marostegui.json [19:47:13] (CR) Tacsipacsi: BETA: Stop writing to il_to (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/1234468 (https://phabricator.wikimedia.org/T415787) (owner: Zabe)
[19:47:15] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [19:49:10] (PS2) Zabe: BETA: Stop writing to il_to [mediawiki-config] - https://gerrit.wikimedia.org/r/1234468 (https://phabricator.wikimedia.org/T415787) [19:50:16] (CR) Zabe: BETA: Stop writing to il_to (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1234468 (https://phabricator.wikimedia.org/T415787) (owner: Zabe) [19:52:15] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [19:54:15] FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [19:58:43] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P88037 and previous config saved to /var/cache/conftool/dbconfig/20260128-195842-marostegui.json [19:59:15] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [20:00:18] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2169 (T415786)', diff saved to https://phabricator.wikimedia.org/P88038 and previous config saved to /var/cache/conftool/dbconfig/20260128-200018-marostegui.json [20:00:25] T415786: Update imagelinks primary key on wmf production -
https://phabricator.wikimedia.org/T415786 [20:00:34] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2180.codfw.wmnet with reason: Maintenance [20:00:43] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2180 (T415786)', diff saved to https://phabricator.wikimedia.org/P88039 and previous config saved to /var/cache/conftool/dbconfig/20260128-200042-marostegui.json [20:04:15] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [20:09:15] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [20:13:52] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1180 (T415786)', diff saved to https://phabricator.wikimedia.org/P88040 and previous config saved to /var/cache/conftool/dbconfig/20260128-201351-marostegui.json [20:13:59] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [20:14:09] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1187.eqiad.wmnet with reason: Maintenance [20:14:15] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [20:14:17] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1187 (T415786)', diff saved to https://phabricator.wikimedia.org/P88041 and previous config saved to /var/cache/conftool/dbconfig/20260128-201416-marostegui.json [20:19:15] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - 
https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [20:24:15] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [20:33:41] (CR) Ottomata: [C:+1] Update ext-EventStreamConfig [mediawiki-config] - https://gerrit.wikimedia.org/r/1234457 (https://phabricator.wikimedia.org/T415638) (owner: Joal) [20:35:03] (PS2) Mstyles: CommonSettings.php: Stop loading WebAuthn [mediawiki-config] - https://gerrit.wikimedia.org/r/1233679 (https://phabricator.wikimedia.org/T303495) (owner: Reedy) [20:35:15] (PS3) Mstyles: CommonSettings.php: Stop loading WebAuthn [mediawiki-config] - https://gerrit.wikimedia.org/r/1233679 (https://phabricator.wikimedia.org/T303495) (owner: Reedy) [20:36:24] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2180 (T415786)', diff saved to https://phabricator.wikimedia.org/P88042 and previous config saved to /var/cache/conftool/dbconfig/20260128-203623-marostegui.json [20:36:29] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [20:36:41] (CR) CI reject: [V:-1] CommonSettings.php: Stop loading WebAuthn [mediawiki-config] - https://gerrit.wikimedia.org/r/1233679 (https://phabricator.wikimedia.org/T303495) (owner: Reedy) [20:49:18] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1187 (T415786)', diff saved to https://phabricator.wikimedia.org/P88043 and previous config saved to /var/cache/conftool/dbconfig/20260128-204917-marostegui.json [20:49:24] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [20:51:31] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to
https://phabricator.wikimedia.org/P88044 and previous config saved to /var/cache/conftool/dbconfig/20260128-205130-marostegui.json [20:54:40] thcipriani: Re. deleting l10n_cache-en.cdb for T415725 and having scap re-build and re-deploy, that sounds like a good attempt. [20:54:40] T415725: TypeError: MediaWiki\Extension\Translate\MessageGroupProcessing\CachedMessageGroupFactoryLoader::MediaWiki\Extension\Translate\MessageGroupProcessing\{closure}(): Argument #1 ($value) must be of type DependencyWrapper, __PHP_Incomplete_Class given - https://phabricator.wikimedia.org/T415725 [20:59:48] James_F: thanks! I'm still curious: what's the theory there? Like what disagrees with the cdb on disk? Is it the cdb on disk? [21:00:05] RoanKattouw, Urbanecm, TheresNoTime, kindrobot, and cjming: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for UTC late backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260128T2100). [21:00:05] cjming and Jhs: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [21:00:08] thcipriani: I'm mostly out of ideas and flailing at this point. [21:00:58] thcipriani: I'm… more than a little surprised that the Translate extension's content cache corruption is *not* fixed by running the Translate script to fix its cache, but *is* fixed (at least for aude) by wiping the i18n cache. [21:01:15] James_F: thank you for your candor. I still haven't actually caught one of these things, the pages seem to get fixed(?) before I can check them out. [21:02:23] thcipriani: I get one right now at e.g. https://www.mediawiki.org/wiki/MediaWiki_1.39/fr [21:03:22] :/ no error on the page as a user for me, logged in or logged out [21:03:37] yeah, same. this is just generally weird. [21:03:48] Maybe it's DC-related somehow? [21:03:57] Which'd be even worse to debug. 
[21:03:59] like if these are fixing themselves wouldn't we expect the rate to fall off over time? [21:04:12] i'm intuiting all bets are off with deploying until current issues are resolved? [21:04:27] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P88045 and previous config saved to /var/cache/conftool/dbconfig/20260128-210426-marostegui.json [21:04:31] if l10n rebuild is fixing locally, there's like...a buncha things: content cache, messageBlobStore, cdb files, ...etc? so like a noop sync would purge the message blob store. [21:04:41] Aha. [21:04:58] cjming: yeah, at this point i think we should prioritize train ops, given that it's weds and we're still on testwikis. [21:05:09] if I use k8s-mwdebug-eqiad it fatals. If I use k8s-mwdebug-codfw it renders correctly. [21:05:17] Have we somehow got a cache split? [21:05:24] oh goodie [21:05:27] * brennen raises eyebrow [21:05:34] If so, that's… worse. [21:06:34] Same differential (works on codfw, dies on eqiad) for k8s-mwdebug-next-* and k8s-mwdebug-experimental-* for me. [21:06:40] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P88046 and previous config saved to /var/cache/conftool/dbconfig/20260128-210639-marostegui.json [21:06:48] This feels like it's a clue. [21:06:51] it do [21:06:58] I'll put it on the task. [21:08:51] reproduced that with mwdebug [21:09:37] So… this is not the i18n cache that scap touches, as that's bit-identical from the docker image between DCs. [21:09:55] The fact that re-building the i18n cache worked for aude must have had a secondary effect somehow? [21:11:25] We could try it anyway, but it'll take ~45 mins and might well not fix anything. 
[21:17:45] !log dancy@deploy2002 Installing scap version "4.239.0" for 2 host(s) [21:19:35] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P88047 and previous config saved to /var/cache/conftool/dbconfig/20260128-211934-marostegui.json [21:19:35] !log dancy@deploy2002 Installation of scap version "4.239.0" completed for 2 hosts [21:20:05] The hammer is now easily accessible (via `scap sync-world --force-l10n-update`) should you choose to use it. [21:20:53] thanks dancy [21:21:49] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2180 (T415786)', diff saved to https://phabricator.wikimedia.org/P88048 and previous config saved to /var/cache/conftool/dbconfig/20260128-212148-marostegui.json [21:21:55] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [21:21:57] dancy: <# [21:21:59] er <3 [21:22:05] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2193.codfw.wmnet with reason: Maintenance [21:22:13] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2193 (T415786)', diff saved to https://phabricator.wikimedia.org/P88049 and previous config saved to /var/cache/conftool/dbconfig/20260128-212213-marostegui.json [21:23:25] dancy: you missed an opportunity to name that feature flag `--make-scap-so-much-slower` :) [21:24:06] hmm. maybe we could add dollar signs to the end of the flag to indicate its cost [21:24:52] Please deposit 3 wikicoin to continue... 
[21:30:02] i lean towards using the hammer since we don't have other ideas and we're burning time trying to debug anyhow [21:31:59] going that direction after discussion with thcipriani [21:32:10] for want of other options, let's rebuild l10n :) [21:34:01] !log brennen@deploy2002 Started scap sync-world: Syncing with --force-l10n-update to see if it clears out T415725 [21:34:06] T415725: TypeError: MediaWiki\Extension\Translate\MessageGroupProcessing\CachedMessageGroupFactoryLoader::MediaWiki\Extension\Translate\MessageGroupProcessing\{closure}(): Argument #1 ($value) must be of type DependencyWrapper, __PHP_Incomplete_Class given - https://phabricator.wikimedia.org/T415725 [21:34:12] Good luck! [21:34:44] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1187 (T415786)', diff saved to https://phabricator.wikimedia.org/P88050 and previous config saved to /var/cache/conftool/dbconfig/20260128-213443-marostegui.json [21:34:48] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [21:35:00] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1225.eqiad.wmnet with reason: Maintenance [21:35:35] jouncebot nowandnext [21:35:36] For the next 0 hour(s) and 24 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260128T2100) [21:35:36] In 0 hour(s) and 24 minute(s): Wikifunctions Services UTC Late (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260128T2200) [21:37:58] PROBLEM - PyBal backends health check on lvs1019 is CRITICAL: PYBAL CRITICAL - CRITICAL - kubemaster_6443: Servers wikikube-ctrl1004.eqiad.wmnet, wikikube-ctrl1002.eqiad.wmnet, wikikube-ctrl1003.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal [21:38:58] RECOVERY - PyBal backends health check on lvs1019 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal 
[21:46:15] FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [21:51:15] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [21:51:35] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2193 (T415786)', diff saved to https://phabricator.wikimedia.org/P88051 and previous config saved to /var/cache/conftool/dbconfig/20260128-215133-marostegui.json [21:51:40] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [21:52:15] (PS1) Reedy: Upgrade symfony/* [vendor] (wmf/1.46.0-wmf.12) - https://gerrit.wikimedia.org/r/1234516 (https://phabricator.wikimedia.org/T415834) [21:53:11] (PS1) Reedy: Upgrade symfony/* [vendor] (wmf/1.46.0-wmf.13) - https://gerrit.wikimedia.org/r/1234517 (https://phabricator.wikimedia.org/T415834) [21:53:18] jouncebot: nowandnext [21:53:19] For the next 0 hour(s) and 6 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260128T2100) [21:53:19] In 0 hour(s) and 6 minute(s): Wikifunctions Services UTC Late (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260128T2200) [21:55:10] this sync-world will probably take another ~20 [21:56:18] heh [21:57:50] we need a scap ETA bot [21:59:32] I assume the deployment window is out... Should I just reschedule? [21:59:58] Jhs: yeah, that'd probably be best at this point. my apologies for the inconvenience, it's been an unusually rocky week for train.
[22:00:05] Deploy window Wikifunctions Services UTC Late (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260128T2200) [22:00:23] (well, unusual for these days.) [22:00:31] Or you can wait a little bit and we may be ok :P [22:00:33] brennen, no worries, shit happens :) [22:00:44] Reedy, yeah, i'll be around in case ^^ [22:01:04] my two patches should be relatively quick to get out first [22:01:25] rolling train forward should also be relatively quick, assuming nothing blows up. [22:01:42] well, and assuming this sync actually fixes the error rate. [22:04:33] (03CR) 10Reedy: [C:03+2] Upgrade symfony/* [vendor] (wmf/1.46.0-wmf.12) - 10https://gerrit.wikimedia.org/r/1234516 (https://phabricator.wikimedia.org/T415834) (owner: 10Reedy) [22:04:35] (03CR) 10Reedy: [C:03+2] Upgrade symfony/* [vendor] (wmf/1.46.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1234517 (https://phabricator.wikimedia.org/T415834) (owner: 10Reedy) [22:06:43] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P88052 and previous config saved to /var/cache/conftool/dbconfig/20260128-220642-marostegui.json [22:08:14] (03PS1) 10Clare Ming: Remove old ssh key for cjming [puppet] - 10https://gerrit.wikimedia.org/r/1234520 [22:10:09] !log brennen@deploy2002 Finished scap sync-world: Syncing with --force-l10n-update to see if it clears out T415725 (duration: 36m 46s) [22:10:14] T415725: TypeError: MediaWiki\Extension\Translate\MessageGroupProcessing\CachedMessageGroupFactoryLoader::MediaWiki\Extension\Translate\MessageGroupProcessing\{closure}(): Argument #1 ($value) must be of type DependencyWrapper, __PHP_Incomplete_Class given - https://phabricator.wikimedia.org/T415725 [22:10:21] survey says.. [22:10:55] well, crap. [22:11:18] made it worse? or no better? [22:11:28] i don't think there's any difference. [22:12:03] :( [22:12:21] shall i go ahead and do the symfony patches? 
[22:12:32] !log roll restart druid middle managers on an-druid*, daemons stuck since the 21st (probably due to the upgrade) - T415799 [22:12:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:12:37] T415799: Since upgrade Druid realtime-ingestion tasks replication fails - https://phabricator.wikimedia.org/T415799 [22:12:43] Still going through CI, but feel free to :) [22:15:22] (03CR) 10TrainBranchBot: [C:03+2] "Approved by brennen@deploy2002 using scap backport" [vendor] (wmf/1.46.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1234517 (https://phabricator.wikimedia.org/T415834) (owner: 10Reedy) [22:15:23] (03CR) 10TrainBranchBot: [C:03+2] "Approved by brennen@deploy2002 using scap backport" [vendor] (wmf/1.46.0-wmf.12) - 10https://gerrit.wikimedia.org/r/1234516 (https://phabricator.wikimedia.org/T415834) (owner: 10Reedy) [22:15:33] (03Merged) 10jenkins-bot: Upgrade symfony/* [vendor] (wmf/1.46.0-wmf.12) - 10https://gerrit.wikimedia.org/r/1234516 (https://phabricator.wikimedia.org/T415834) (owner: 10Reedy) [22:15:40] (03Merged) 10jenkins-bot: Upgrade symfony/* [vendor] (wmf/1.46.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1234517 (https://phabricator.wikimedia.org/T415834) (owner: 10Reedy) [22:16:17] !log brennen@deploy2002 Started scap sync-world: Backport for [[gerrit:1234517|Upgrade symfony/* (T415834)]], [[gerrit:1234516|Upgrade symfony/* (T415834)]] [22:16:23] T415834: CVE-2026-24739: Symfony's incorrect argument escaping under MSYS2/Git Bash can lead to destructive file operations on Windows - https://phabricator.wikimedia.org/T415834 [22:20:24] !log brennen@deploy2002 brennen, reedy: Backport for [[gerrit:1234517|Upgrade symfony/* (T415834)]], [[gerrit:1234516|Upgrade symfony/* (T415834)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[22:21:51] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P88053 and previous config saved to /var/cache/conftool/dbconfig/20260128-222150-marostegui.json [22:22:17] !log brennen@deploy2002 brennen, reedy: Continuing with sync [22:28:31] !log brennen@deploy2002 Finished scap sync-world: Backport for [[gerrit:1234517|Upgrade symfony/* (T415834)]], [[gerrit:1234516|Upgrade symfony/* (T415834)]] (duration: 12m 15s) [22:28:36] T415834: CVE-2026-24739: Symfony's incorrect argument escaping under MSYS2/Git Bash can lead to destructive file operations on Windows - https://phabricator.wikimedia.org/T415834 [22:29:04] one more thing checked off... [22:29:21] Jhs: Could do yours... cjming around too? [22:29:29] Reedy, sweet [22:30:04] (03CR) 10Reedy: [C:03+2] Enable VE in Project namespace on mswiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1234484 (https://phabricator.wikimedia.org/T415823) (owner: 10Jon Harald Søby) [22:30:16] (03CR) 10Reedy: [C:03+2] Removed `mpic` as local service [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1233670 (https://phabricator.wikimedia.org/T407805) (owner: 10Santiago Faci) [22:31:02] (03Merged) 10jenkins-bot: Enable VE in Project namespace on mswiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1234484 (https://phabricator.wikimedia.org/T415823) (owner: 10Jon Harald Søby) [22:31:35] (03Merged) 10jenkins-bot: Removed `mpic` as local service [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1233670 (https://phabricator.wikimedia.org/T407805) (owner: 10Santiago Faci) [22:34:10] !log reedy@deploy2002 Started scap sync-world: Backport for [[gerrit:1234484|Enable VE in Project namespace on mswiktionary (T415823)]], [[gerrit:1233670|Removed `mpic` as local service (T407805)]] [22:34:17] T415823: Enable Visual Editor on mobile for Malay Wiktionary project namespace - https://phabricator.wikimedia.org/T415823 [22:34:18] 
T407805: Rename mpic.wikimedia.org - https://phabricator.wikimedia.org/T407805 [22:36:27] !log reedy@deploy2002 sfaci, jhsoby, reedy: Backport for [[gerrit:1234484|Enable VE in Project namespace on mswiktionary (T415823)]], [[gerrit:1233670|Removed `mpic` as local service (T407805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [22:37:00] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2193 (T415786)', diff saved to https://phabricator.wikimedia.org/P88054 and previous config saved to /var/cache/conftool/dbconfig/20260128-223659-marostegui.json [22:37:05] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [22:37:16] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2197.codfw.wmnet with reason: Maintenance [22:37:44] Jhs: Do you need to test? I don't mind either way ;) [22:37:48] Reedy, confirmed on mwdebug [22:38:08] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance [22:38:11] !log reedy@deploy2002 sfaci, jhsoby, reedy: Continuing with sync [22:42:41] The connection to the server kubemaster.svc.eqiad.wmnet:6443 was refused - did you specify the right host or port? [22:42:44] well that's not fun [22:43:00] failed a few times and seems to be continuing... [22:43:14] i don't know if that's normal or not. 
[22:43:33] can't say I've seen that before [22:43:52] !log reedy@deploy2002 Finished scap sync-world: Backport for [[gerrit:1234484|Enable VE in Project namespace on mswiktionary (T415823)]], [[gerrit:1233670|Removed `mpic` as local service (T407805)]] (duration: 09m 42s) [22:43:57] just had to mash enter a few times to get scap to continue [22:43:59] T415823: Enable Visual Editor on mobile for Malay Wiktionary project namespace - https://phabricator.wikimedia.org/T415823 [22:43:59] T407805: Rename mpic.wikimedia.org - https://phabricator.wikimedia.org/T407805 [22:44:53] * Reedy dumps it in phab [22:45:21] mashing enter to get scap to continue, at least, is a time honored tradition. [22:45:39] * brennen goes afk for a few. [22:50:29] uhhh all the errors stopped [22:50:33] not all the errors [22:50:42] the error that's been holding up the train [22:51:19] ever since Reedy finished that last deploy [22:52:17] https://phabricator.wikimedia.org/T415725 went from like 70/min to 0 [22:52:28] Congrats! [22:52:42] (03CR) 10Zabe: BETA: Stop writing to il_to (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1234468 (https://phabricator.wikimedia.org/T415787) (owner: 10Zabe) [22:52:52] yw? [22:52:52] well except I'm pretty sure the deploy was unrelated? [22:52:58] Almost certainly [22:53:13] so that's...weird [22:53:14] could be some caching TTL expiring after the rebuild [22:53:28] it lines up so perfectly with your deploy [22:53:32] random unsolicited thought: it's been ~24hrs since https://sal.toolforge.org/log/CZGmAZwBvg159pQr64Gs, maybe a ~24hr cache somewhere? [22:53:43] thcipriani: get brennen back from his walk to move the train right now [22:53:49] hahaha [22:55:12] the very last error happened 22:42:30 [22:55:48] we do purge the messageblobstore at the end of scap. But I'd guess triggering that l10n rebuild did that too [22:56:39] blerg. computers. 
[22:58:35] * brennen returneth [22:59:26] welp, gonna move the train i guess, while muttering to myself [23:00:05] Deploy window Web Team deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260128T2300) [23:00:57] my thought is probably just move to group0, let it bake in a bit, and do group1 first thing in my thursday morning. [23:01:06] (03PS1) 10TrainBranchBot: group0 to 1.46.0-wmf.13 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1234534 (https://phabricator.wikimedia.org/T413804) [23:01:09] (03CR) 10TrainBranchBot: [C:03+2] "Initiated by brennen@deploy2002" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1234534 (https://phabricator.wikimedia.org/T413804) (owner: 10TrainBranchBot) [23:01:37] brennen: let me know when you are done with the train. I have one backport I need to do [23:01:46] Jdlrobson: ack, will do. [23:02:51] (03Merged) 10jenkins-bot: group0 to 1.46.0-wmf.13 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1234534 (https://phabricator.wikimedia.org/T413804) (owner: 10TrainBranchBot) [23:03:53] FIRING: [2x] CertAlmostExpired: Certificate for service titan2001:443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#titan2001:443 - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [23:08:58] !log brennen@deploy2002 rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.13 refs T413804 [23:09:07] T413804: 1.46.0-wmf.13 deployment blockers - https://phabricator.wikimedia.org/T413804 [23:11:36] A_smart_kitten: CachedMessageGroupFactoryLoader: private const CACHE_TTL = ExpirationAwareness::TTL_DAY; so there you go [23:14:12] things are pretty quiet on group0. thcipriani, whaddya think - move forward to group1 now? 
[23:15:04] brennen: it's your train, it does look pretty quiet [23:15:09] no objections [23:15:18] * thcipriani helpful [23:15:23] i might regret this, but it would be nice to actually have some logs to triage in the morning i guess. [23:15:52] (03PS1) 10TrainBranchBot: group1 to 1.46.0-wmf.13 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1234536 (https://phabricator.wikimedia.org/T413804) [23:15:56] (03CR) 10TrainBranchBot: [C:03+2] "Initiated by brennen@deploy2002" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1234536 (https://phabricator.wikimedia.org/T413804) (owner: 10TrainBranchBot) [23:16:00] gotta keep the logs flowing [23:16:14] presumably, logs that don't say "Argument #1 ($value) must be of type DependencyWrapper, __PHP_Incomplete_Class given" :P [23:16:25] preferably not [23:16:26] A_smart_kitten: yes, specifically ones that do not say that. :D [23:16:44] (03Merged) 10jenkins-bot: group1 to 1.46.0-wmf.13 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1234536 (https://phabricator.wikimedia.org/T413804) (owner: 10TrainBranchBot) [23:16:48] well, in an ideal world there would be no new ones because we're so good at this [23:16:54] but on the evidence... 
[23:17:18] naming things is hard [23:17:32] renaming things (in software) is even harder [23:17:56] smash renaming things together with caching and you're bound to have a good time [23:19:15] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate eventstreams-internal.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [23:22:52] !log brennen@deploy2002 rebuilt and synchronized wikiversions files: group1 to 1.46.0-wmf.13 refs T413804 [23:22:57] T413804: 1.46.0-wmf.13 deployment blockers - https://phabricator.wikimedia.org/T413804 [23:23:43] (03PS1) 10Bartosz Dziewoński: Configure rate limit class for local and global bots [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1234538 (https://phabricator.wikimedia.org/T415588) [23:27:30] Jdlrobson: this is looking pretty good. i'd say go ahead with your backport. [23:30:51] thanks brennen [23:32:04] (03CR) 10TrainBranchBot: [C:03+2] "Approved by jdlrobson@deploy2002 using scap backport" [extensions/Math] (wmf/1.46.0-wmf.12) - 10https://gerrit.wikimedia.org/r/1234442 (https://phabricator.wikimedia.org/T415577) (owner: 10Jdlrobson) [23:35:15] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2217.codfw.wmnet with reason: Maintenance [23:35:24] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2217 (T415786)', diff saved to https://phabricator.wikimedia.org/P88055 and previous config saved to /var/cache/conftool/dbconfig/20260128-233523-marostegui.json [23:35:26] (03CR) 10Clare Ming: [C:03+1] "@reedy@wikimedia.org tysm!" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/1233670 (https://phabricator.wikimedia.org/T407805) (owner: 10Santiago Faci) [23:35:28] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [23:39:15] FIRING: JobUnavailable: Reduced availability for job thanos-compact in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [23:43:51] (03Merged) 10jenkins-bot: Replace width with max-width in ext.math.less [extensions/Math] (wmf/1.46.0-wmf.12) - 10https://gerrit.wikimedia.org/r/1234442 (https://phabricator.wikimedia.org/T415577) (owner: 10Jdlrobson) [23:44:26] !log jdlrobson@deploy2002 Started scap sync-world: Backport for [[gerrit:1234442|Replace width with max-width in ext.math.less (T415577)]] [23:44:26] FIRING: [9x] SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [23:44:33] T415577: increasing browser scale/decreasing window width causes punctuation marks after formulas go on the next line - https://phabricator.wikimedia.org/T415577 [23:46:34] !log jdlrobson@deploy2002 jdlrobson: Backport for [[gerrit:1234442|Replace width with max-width in ext.math.less (T415577)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[23:48:46] !log jdlrobson@deploy2002 jdlrobson: Continuing with sync [23:52:52] !log jdlrobson@deploy2002 Finished scap sync-world: Backport for [[gerrit:1234442|Replace width with max-width in ext.math.less (T415577)]] (duration: 08m 26s) [23:52:57] T415577: increasing browser scale/decreasing window width causes punctuation marks after formulas go on the next line - https://phabricator.wikimedia.org/T415577