[00:02:08] PROBLEM - Work requests waiting in Zuul Gearman server on contint2001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [400.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/d/000000322/zuul-gearman?orgId=1&viewPanel=10
[00:19:40] (Queue (Jenkins jobs + Zuul functions) alert) firing: Queue (Jenkins jobs + Zuul functions) alert - https://alerts.wikimedia.org/?q=alertname%3DQueue+%28Jenkins+jobs+%2B+Zuul+functions%29+alert
[00:28:18] (PS1) Tim Starling: Add the new profiling libraries [integration/config] - https://gerrit.wikimedia.org/r/874959 (https://phabricator.wikimedia.org/T291015)
[01:56:36] (CR) CI reject: [V: -1] zuul/parameter_functions.py: Make phan work for Phonos [integration/config] - https://gerrit.wikimedia.org/r/874964 (https://phabricator.wikimedia.org/T322368) (owner: Dmaza)
[02:00:19] (PS2) Dmaza: zuul/parameter_functions.py: Make phan work for Phonos Here is the patch that needs it Ia6a5ae10401d7b6a38dc74603bf2661543507441 [integration/config] - https://gerrit.wikimedia.org/r/874964 (https://phabricator.wikimedia.org/T322368)
[02:03:14] (CR) CI reject: [V: -1] zuul/parameter_functions.py: Make phan work for Phonos Here is the patch that needs it Ia6a5ae10401d7b6a38dc74603bf2661543507441 [integration/config] - https://gerrit.wikimedia.org/r/874964 (https://phabricator.wikimedia.org/T322368) (owner: Dmaza)
[02:06:00] (PS3) Dmaza: zuul/parameter_functions.py: Make phan work for Phonos [integration/config] - https://gerrit.wikimedia.org/r/874964 (https://phabricator.wikimedia.org/T322368)
[02:44:06] (CR) Tim Starling: [C: +2] Add the new profiling libraries [integration/config] - https://gerrit.wikimedia.org/r/874959 (https://phabricator.wikimedia.org/T291015) (owner: Tim Starling)
[02:46:02] (Merged) jenkins-bot: Add the new profiling libraries [integration/config] - https://gerrit.wikimedia.org/r/874959 (https://phabricator.wikimedia.org/T291015) (owner: Tim Starling)
[02:47:58] !log Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/874959
[02:47:59] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[02:59:40] (Queue (Jenkins jobs + Zuul functions) alert) firing: (2) Queue (Jenkins jobs + Zuul functions) alert - https://alerts.wikimedia.org/?q=alertname%3DQueue+%28Jenkins+jobs+%2B+Zuul+functions%29+alert
[03:19:40] (Queue (Jenkins jobs + Zuul functions) alert) resolved: Queue (Jenkins jobs + Zuul functions) alert - https://alerts.wikimedia.org/?q=alertname%3DQueue+%28Jenkins+jobs+%2B+Zuul+functions%29+alert
[03:26:38] RECOVERY - Work requests waiting in Zuul Gearman server on contint2001 is OK: OK: Less than 100.00% above the threshold [200.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/d/000000322/zuul-gearman?orgId=1&viewPanel=10
[08:39:43] Beta-Cluster-Infrastructure, ChangeProp: changeprop-jobqueue@deployment-prep fails with: getaddrinfo ENOTFOUND cloudmetrics1002.eqiad.wmnet - https://phabricator.wikimedia.org/T326192 (dcausse)
[09:43:30] (CR) Hashar: [C: +2] "Ooops" [integration/config] - https://gerrit.wikimedia.org/r/874964 (https://phabricator.wikimedia.org/T322368) (owner: Dmaza)
[09:45:25] (Merged) jenkins-bot: zuul/parameter_functions.py: Make phan work for Phonos [integration/config] - https://gerrit.wikimedia.org/r/874964 (https://phabricator.wikimedia.org/T322368) (owner: Dmaza)
[09:48:40] (PS1) DCausse: cirrus-streaming-updater: test with java8 [integration/config] - https://gerrit.wikimedia.org/r/875269
[09:51:09] (CR) CI reject: [V: -1] cirrus-streaming-updater: test with java8 [integration/config] - https://gerrit.wikimedia.org/r/875269 (owner: DCausse)
[09:59:47] (PS2) DCausse: cirrus-streaming-updater: test with java8 [integration/config] - https://gerrit.wikimedia.org/r/875269
[10:22:03] (CR) Hashar: [C: +2] "I have created the job:" [integration/config] - https://gerrit.wikimedia.org/r/875269 (owner: DCausse)
[10:22:26] hashar: thanks! :)
[10:23:54] (Merged) jenkins-bot: cirrus-streaming-updater: test with java8 [integration/config] - https://gerrit.wikimedia.org/r/875269 (owner: DCausse)
[10:25:06] dcausse: deployed and I have done a `recheck` on https://gerrit.wikimedia.org/r/c/search/cirrus-streaming-updater/+/871227
[10:26:53] hashar: thanks! I think the patch will still fail (there's a new spotbug violation to fix). But the failure should be identical in both builds
[10:27:34] 00:00:26.433 [ERROR] Failed to execute goal com.diffplug.spotless:spotless-maven-plugin:2.27.2:check (default) on project cirrus-streaming-updater-parent: Execution default of goal com.diffplug.spotless:spotless-maven-plugin:2.27.2:check failed: You are running Spotless on JVM 8, which limits you to google-java-format 1.7.
[10:27:35] :(
[10:27:45] good luck!
[10:28:03] meh
[10:28:09] thanks! will take a look :)
[10:51:33] (CR) Hashar: "The Zuul test-prio pipeline is the same as the test pipeline but the jobs have a high precedence and are thus triggered before others. It" [integration/config] - https://gerrit.wikimedia.org/r/874926 (owner: Majavah)
[11:57:38] Is login on beta broken for anyone else?
[13:21:04] Beta-Cluster-Infrastructure, SRE: cannot curl to wiki from beta mw appservers - https://phabricator.wikimedia.org/T278599 (LSobanski) Open→Resolved a:LSobanski Resolving based on the previous comment. Please reopen if this is not a satisfactory solution.
[13:47:17] Platonides: the "unrelated" task is where the given repo spammed noise
[14:05:18] Beta-Cluster-Infrastructure, ChangeProp: changeprop-jobqueue@deployment-prep fails with: getaddrinfo ENOTFOUND cloudmetrics1002.eqiad.wmnet - https://phabricator.wikimedia.org/T326192 (CBogen)
[14:07:18] GitLab (Infrastructure), Data-Persistence-Backup, serviceops-collab, Patch-For-Review, User-brennen: Backups for GitLab - https://phabricator.wikimedia.org/T274463 (Jelto) I picked up the GitLab backup task again because we hit around 90% disk usage on the GitLab backup volume during backup c...
[16:41:12] (CR) Thcipriani: [C: +2] deploy_artifacts: add dry run mode [software/gerrit] (deploy/wmf/stable-3.5) - https://gerrit.wikimedia.org/r/868461 (owner: Hashar)
[16:41:23] (CR) Thcipriani: [C: +2] deploy_artifacts: --version is a required option [software/gerrit] (deploy/wmf/stable-3.5) - https://gerrit.wikimedia.org/r/868462 (owner: Hashar)
[16:42:53] (Merged) jenkins-bot: deploy_artifacts: add dry run mode [software/gerrit] (deploy/wmf/stable-3.5) - https://gerrit.wikimedia.org/r/868461 (owner: Hashar)
[16:42:56] (Merged) jenkins-bot: deploy_artifacts: --version is a required option [software/gerrit] (deploy/wmf/stable-3.5) - https://gerrit.wikimedia.org/r/868462 (owner: Hashar)
[17:40:01] Release-Engineering-Team, Scap: Save K8s image build logs - https://phabricator.wikimedia.org/T323939 (dancy) Open→Resolved a:dancy Scap 4.31.1 has been deployed with this change.
[17:40:03] Release-Engineering-Team, Scap, MW-on-K8s: Scap Mediawiki K8s deployments - https://phabricator.wikimedia.org/T318536 (dancy)
[18:31:32] Continuous-Integration-Infrastructure, Jenkins, SRE, SRE-Access-Requests, and 2 others: New Keyholder identity for RelEng Jenkins service - https://phabricator.wikimedia.org/T324014 (Dzahn) new admin group `deployment-jenkins` (gid: 838) has been created on deploy* and releases* servers. ` [dep...
[18:32:04] Continuous-Integration-Infrastructure, Jenkins, SRE, SRE-Access-Requests, and 2 others: New Keyholder identity for RelEng Jenkins service - https://phabricator.wikimedia.org/T324014 (Dzahn) Open→In progress
[18:34:40] (Queue (Jenkins jobs + Zuul functions) alert) firing: Queue (Jenkins jobs + Zuul functions) alert - https://alerts.wikimedia.org/?q=alertname%3DQueue+%28Jenkins+jobs+%2B+Zuul+functions%29+alert
[18:39:40] (Queue (Jenkins jobs + Zuul functions) alert) firing: (2) Queue (Jenkins jobs + Zuul functions) alert - https://alerts.wikimedia.org/?q=alertname%3DQueue+%28Jenkins+jobs+%2B+Zuul+functions%29+alert
[18:59:40] (Queue (Jenkins jobs + Zuul functions) alert) resolved: Queue (Jenkins jobs + Zuul functions) alert - https://alerts.wikimedia.org/?q=alertname%3DQueue+%28Jenkins+jobs+%2B+Zuul+functions%29+alert
[19:08:26] Release-Engineering-Team, Scap: `scap backport` should not +2 already merged patches - https://phabricator.wikimedia.org/T326176 (dancy) Open→Resolved a:dancy Deployed via scap 4.32.0.
[19:16:42] (CR) Jforrester: [C: +1] "WFM. Let's ship it?" [integration/config] - https://gerrit.wikimedia.org/r/874780 (https://phabricator.wikimedia.org/T321536) (owner: Hashar)
[19:42:43] !log Ran maintenance script refreshGlobalimagelinks.php for T322588
[19:42:45] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[19:42:46] T322588: Run `refreshGlobalimagelinks.php --pages=nonexisting` from the GlobalUsage extension - https://phabricator.wikimedia.org/T322588
[19:42:52] (hope i'm doing this right)
[21:18:03] Krinkle: ah
[21:20:23] (PS2) Krinkle: Speed up integration-config-shellcheck-docker [integration/config] - https://gerrit.wikimedia.org/r/874780 (https://phabricator.wikimedia.org/T321536) (owner: Hashar)
[21:20:54] (CR) Krinkle: [C: +1] Speed up integration-config-shellcheck-docker [integration/config] - https://gerrit.wikimedia.org/r/874780 (https://phabricator.wikimedia.org/T321536) (owner: Hashar)
[21:21:00] hashar: very nice!
[21:56:54] btw, I've noticed that a significant portion of backport windows have recently run over the calendar slot now that deploying a single patch takes significantly longer than it used to. should we get worried / try to change something to prevent them from running overtime?
[21:59:44] GitLab, serviceops, serviceops-collab, Kubernetes: Trusted gitlab runner containers need access to staging k8s cluster - https://phabricator.wikimedia.org/T325385 (dancy) I verified today that trusted runners can now complete a network connection to kubestagemaster.svc.eqiad.wmnet:6443 so that pa...
[22:17:01] taavi: we should get all the wikis running out of k8s so that instead of a slow rsync plus a slow container build we only have a slow container build. :)
[22:21:04] progress!
[22:22:36] bd808: rsync isn't slow, it's the php-fpm restarts that are slow!
[22:32:17] let's deinstall php-fpm!
[22:51:02] (PS1) Tim Starling: Add Excimer packages to PHP 7.4+ dockerfiles [integration/config] - https://gerrit.wikimedia.org/r/875443 (https://phabricator.wikimedia.org/T291015)
[23:00:33] Continuous-Integration-Infrastructure, Jenkins, SRE, SRE-Access-Requests, and 2 others: New Keyholder identity for RelEng Jenkins service - https://phabricator.wikimedia.org/T324014 (Dzahn) ` Identity added: /etc/keyholder.d/deploy_jenkins (/etc/keyholder.d/deploy_jenkins) `
[23:03:14] Continuous-Integration-Infrastructure, Jenkins, SRE, SRE-Access-Requests, and 2 others: New Keyholder identity for RelEng Jenkins service - https://phabricator.wikimedia.org/T324014 (Dzahn) In progress→Resolved @jnuche This is done now. on both deployment server keyholder has been re-arme...
[23:09:38] Release-Engineering-Team (Priority Backlog 📥), Patch-For-Review, Release, Train Deployments: 1.40.0-wmf.17 deployment blockers - https://phabricator.wikimedia.org/T325580 (Umherirrender)
[23:14:45] (PS1) Krinkle: dev: Remove redundant 'variables_order' override [integration/docroot] - https://gerrit.wikimedia.org/r/875445
[23:17:14] (PS2) Krinkle: dev: Add Access-Control-Allow-Origin to json files [integration/docroot] - https://gerrit.wikimedia.org/r/858612 (owner: Hashar)
[23:17:26] (CR) Krinkle: "I've documented the use case (Gerrit plugin) based on recent convo. LGTM!" [integration/docroot] - https://gerrit.wikimedia.org/r/858612 (owner: Hashar)
[23:17:31] (CR) Krinkle: [C: +2] dev: Remove redundant 'variables_order' override [integration/docroot] - https://gerrit.wikimedia.org/r/875445 (owner: Krinkle)
[23:17:34] (CR) Krinkle: [C: +2] dev: Add Access-Control-Allow-Origin to json files [integration/docroot] - https://gerrit.wikimedia.org/r/858612 (owner: Hashar)
[23:19:05] (Merged) jenkins-bot: dev: Remove redundant 'variables_order' override [integration/docroot] - https://gerrit.wikimedia.org/r/875445 (owner: Krinkle)
[23:19:07] (Merged) jenkins-bot: dev: Add Access-Control-Allow-Origin to json files [integration/docroot] - https://gerrit.wikimedia.org/r/858612 (owner: Hashar)
[23:54:35] (CR) Platonides: [C: -1] Add Excimer packages to PHP 7.4+ dockerfiles (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/875443 (https://phabricator.wikimedia.org/T291015) (owner: Tim Starling)