[00:25:26] GitLab (Infrastructure), serviceops, Patch-For-Review: bring new gitlab hardware servers into production - https://phabricator.wikimedia.org/T307142 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by dzahn@cumin2002 for hosts: `gitlab1001.wikimedia.org` - gitlab1001.wikimedia.org (**PA...
[00:30:26] GitLab (Infrastructure), serviceops, Patch-For-Review: bring new gitlab hardware servers into production - https://phabricator.wikimedia.org/T307142 (Dzahn)
[00:34:55] brennen: It might be a good time to pick up php7.4 again as default for mw-docker. I recall you looking at that at some point, though I don't see it on the board at https://phabricator.wikimedia.org/tag/mediawiki-docker/ now.
[00:35:01] Prod is well underway with the transition now
[00:35:08] and beta as well.
[01:34:00] GitLab (Infrastructure), serviceops, Patch-For-Review: bring new gitlab hardware servers into production - https://phabricator.wikimedia.org/T307142 (Dzahn)
[03:14:18] Beta-Cluster-Infrastructure: Error: 502, Next Hop Connection Failed (Jul 2022) - https://phabricator.wikimedia.org/T312252 (AlexisJazz) ..**now** it works..
[04:28:01] yeah, have been thinking similar re: php7.4. i'll update some things on the morrow.
[05:20:58] CI failing with ENOSPC https://integration.wikimedia.org/ci/job/wmf-quibble-selenium-php72-docker/159772/console
[06:14:34] how do you tell what physical host a job runs on? is it always contint1001?
[06:14:38] Project mediawiki-core-doxygen-docker build #35472: FAILURE in 10 min: https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen-docker/35472/
[06:19:13] TimStarling: at the top of the log "Building remotely on integration-agent-docker-1039 (pipelinelib Docker blubber) in workspace /srv/jenkins/workspace/wmf-quibble-selenium-php72-docker@3"
[06:19:20] most jobs run on cloud VMs
[06:19:37] where is the cloud
[06:19:45] and in theory, jenkins is supposed to automatically wipe stuff to restore disk space when it runs out
[06:19:49] our cloud :)
[06:20:11] https://openstack-browser.toolforge.org/server/integration-agent-docker-1039.integration.eqiad1.wikimedia.cloud
[06:20:29] you should be able to ssh into that
[06:20:36] apparently there is a cron job which wipes images when they get to 85% full or so, which sounds unreliable
[06:21:37] I think it's https://integration.wikimedia.org/ci/view/All/job/maintenance-disconnect-full-disks/
[06:22:22] https://integration.wikimedia.org/ci/computer/ doesn't show any problems with 1039 so it might already be fixed now
[06:22:32] yeah, I found the groovy source before I got distracted
[06:23:02] trying to merge Daniel's changes but they conflict with each other
[06:56:22] (CR) Jaime Nuche: [C: +2] Clean up php fpm restart [tools/scap] - https://gerrit.wikimedia.org/r/811373 (https://phabricator.wikimedia.org/T266055) (owner: Ahmon Dancy)
[07:01:03] (Merged) jenkins-bot: Clean up php fpm restart [tools/scap] - https://gerrit.wikimedia.org/r/811373 (https://phabricator.wikimedia.org/T266055) (owner: Ahmon Dancy)
[07:17:53] Yippee, build fixed!
[07:17:53] Project mediawiki-core-doxygen-docker build #35473: FIXED in 13 min: https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen-docker/35473/
[08:14:54] Project mediawiki-core-doxygen-docker build #35474: FAILURE in 10 min: https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen-docker/35474/
[08:34:06] Amir1: https://phabricator.wikimedia.org/feed/7117545193276688502/
[08:35:39] oh no
[08:35:48] taavi: yup, I had to reenable it twice so far
[08:36:03] third time now
[08:36:09] https://phabricator.wikimedia.org/people/manage/20152/
[08:37:43] taavi: btw, if you're curious, this is why: https://phabricator.wikimedia.org/T311866
[08:38:12] that's quite a lot of tasks :P
[09:18:53] Yippee, build fixed!
[09:18:54] Project mediawiki-core-doxygen-docker build #35475: FIXED in 14 min: https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen-docker/35475/
[09:19:05] Project-Admins: Requests for addition to the #acl*Project-Admins group (in comments) - https://phabricator.wikimedia.org/T706 (MaryMunyoki) Hello @Aklapper. I am a Technical Program Manager for the Language & Inuka teams. I would like to be able to create milestones for the sprint work for the Language team....
[09:48:43] (Queue (Jenkins jobs + Zuul functions) alert) firing: Queue (Jenkins jobs + Zuul functions) alert - https://alerts.wikimedia.org/?q=alertname%3DQueue+%28Jenkins+jobs+%2B+Zuul+functions%29+alert
[09:53:43] (Queue (Jenkins jobs + Zuul functions) alert) firing: (2) Queue (Jenkins jobs + Zuul functions) alert - https://alerts.wikimedia.org/?q=alertname%3DQueue+%28Jenkins+jobs+%2B+Zuul+functions%29+alert
[10:07:47] Project-Admins: Requests for addition to the #acl*Project-Admins group (in comments) - https://phabricator.wikimedia.org/T706 (Aklapper) @MaryMunyoki Hi, I've added you. //Usual disclaimer: Please follow the [guidelines](https://www.mediawiki.org/wiki/Phabricator/Creating_and_renaming_projects#Creating_new_p...
[10:13:43] (Queue (Jenkins jobs + Zuul functions) alert) resolved: Queue (Jenkins jobs + Zuul functions) alert - https://alerts.wikimedia.org/?q=alertname%3DQueue+%28Jenkins+jobs+%2B+Zuul+functions%29+alert
[11:15:13] Project mediawiki-core-doxygen-docker build #35477: FAILURE in 10 min: https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen-docker/35477/
[11:43:57] (CR) Jaime Nuche: [C: -1] Set umask in file creating cmds (1 comment) [tools/scap] - https://gerrit.wikimedia.org/r/811785 (owner: Jeena Huneidi)
[12:12:14] Continuous-Integration-Infrastructure, Release-Engineering-Team: Puppet failure on integration-agent-docker-1039.integration.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T312534 (hashar)
[12:21:18] Yippee, build fixed!
[12:21:18] Project mediawiki-core-doxygen-docker build #35478: FIXED in 16 min: https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen-docker/35478/
[12:22:09] Continuous-Integration-Infrastructure, Release-Engineering-Team: Puppet failure on integration-agent-docker-1039.integration.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T312534 (hashar) The CI agent is up and running at https://integration.wikimedia.org/ci/computer/integration%2Dagent%2Ddo...
[12:22:18] !log integration: rebooting `integration-agent-docker-1039` T312534
[12:22:20] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[12:22:20] T312534: Puppet failure on integration-agent-docker-1039.integration.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T312534
[12:31:58] Continuous-Integration-Infrastructure, Release-Engineering-Team: Puppet failure on integration-agent-docker-1039.integration.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T312534 (hashar) Open→Resolved a:hashar The command failing is: ` /usr/sbin/usermod -aG docker 'jenkins-deplo...
[13:10:26] (PS1) Jaime Nuche: startup: warn user if not running from a Python virtual environment [tools/scap] - https://gerrit.wikimedia.org/r/811987 (https://phabricator.wikimedia.org/T303559)
[13:18:20] (PS1) Jaime Nuche: scap startup: signal scap not to check for Python virtual environments [tools/train-dev] - https://gerrit.wikimedia.org/r/811992 (https://phabricator.wikimedia.org/T310858)
[14:35:46] Release-Engineering-Team (Priority Backlog 📥), Patch-For-Review, Release, Train Deployments: 1.39.0-wmf.19 deployment blockers - https://phabricator.wikimedia.org/T308072 (Urbanecm_WMF) @jnuche FYI, I filed {T312544} earlier today about an error in #growthexperiments. That error is not related t...
[14:39:24] Release-Engineering-Team (Priority Backlog 📥), Patch-For-Review, Release, Train Deployments: 1.39.0-wmf.19 deployment blockers - https://phabricator.wikimedia.org/T308072 (jnuche) @Urbanecm_WMF thanks!
[14:58:53] (CR) Ahmon Dancy: startup: warn user if not running from a Python virtual environment (1 comment) [tools/scap] - https://gerrit.wikimedia.org/r/811987 (https://phabricator.wikimedia.org/T303559) (owner: Jaime Nuche)
[15:00:06] (CR) Ahmon Dancy: startup: warn user if not running from a Python virtual environment (1 comment) [tools/scap] - https://gerrit.wikimedia.org/r/811987 (https://phabricator.wikimedia.org/T303559) (owner: Jaime Nuche)
[15:07:22] (CR) Jaime Nuche: startup: warn user if not running from a Python virtual environment (1 comment) [tools/scap] - https://gerrit.wikimedia.org/r/811987 (https://phabricator.wikimedia.org/T303559) (owner: Jaime Nuche)
[15:09:25] (CR) Jaime Nuche: startup: warn user if not running from a Python virtual environment (1 comment) [tools/scap] - https://gerrit.wikimedia.org/r/811987 (https://phabricator.wikimedia.org/T303559) (owner: Jaime Nuche)
[15:09:35] (CR) Ahmon Dancy: "LGTM" [tools/scap] - https://gerrit.wikimedia.org/r/811785 (owner: Jeena Huneidi)
[15:14:22] (CR) Ahmon Dancy: startup: warn user if not running from a Python virtual environment (2 comments) [tools/scap] - https://gerrit.wikimedia.org/r/811987 (https://phabricator.wikimedia.org/T303559) (owner: Jaime Nuche)
[15:22:54] (PS2) Jaime Nuche: startup: warn user if not running from a Python virtual environment [tools/scap] - https://gerrit.wikimedia.org/r/811987 (https://phabricator.wikimedia.org/T303559)
[15:23:14] (PS2) Jaime Nuche: scap startup: signal scap not to check for Python virtual environments [tools/train-dev] - https://gerrit.wikimedia.org/r/811992 (https://phabricator.wikimedia.org/T310858)
[15:24:01] (CR) Jaime Nuche: startup: warn user if not running from a Python virtual environment (1 comment) [tools/scap] - https://gerrit.wikimedia.org/r/811987 (https://phabricator.wikimedia.org/T303559) (owner: Jaime Nuche)
[15:24:37] (CR) Ahmon Dancy: startup: warn user if not running from a Python virtual environment (1 comment) [tools/scap] - https://gerrit.wikimedia.org/r/811987 (https://phabricator.wikimedia.org/T303559) (owner: Jaime Nuche)
[15:25:54] (PS3) Jaime Nuche: startup: warn user if not running from a Python virtual environment [tools/scap] - https://gerrit.wikimedia.org/r/811987 (https://phabricator.wikimedia.org/T303559)
[15:26:19] (CR) Jaime Nuche: startup: warn user if not running from a Python virtual environment (1 comment) [tools/scap] - https://gerrit.wikimedia.org/r/811987 (https://phabricator.wikimedia.org/T303559) (owner: Jaime Nuche)
[15:26:42] (CR) Ahmon Dancy: [C: +2] startup: warn user if not running from a Python virtual environment [tools/scap] - https://gerrit.wikimedia.org/r/811987 (https://phabricator.wikimedia.org/T303559) (owner: Jaime Nuche)
[15:26:44] Release-Engineering-Team (Priority Backlog 📥), Patch-For-Review, Release, Train Deployments: 1.39.0-wmf.19 deployment blockers - https://phabricator.wikimedia.org/T308072 (cscott) T308072 is a regression in wmf.19 and crashes a handful of pages with multiple instances of identical `` tags...
[15:32:43] (Merged) jenkins-bot: startup: warn user if not running from a Python virtual environment [tools/scap] - https://gerrit.wikimedia.org/r/811987 (https://phabricator.wikimedia.org/T303559) (owner: Jaime Nuche)
[15:35:52] (CR) Ahmon Dancy: scap startup: signal scap not to check for Python virtual environments (1 comment) [tools/train-dev] - https://gerrit.wikimedia.org/r/811992 (https://phabricator.wikimedia.org/T310858) (owner: Jaime Nuche)
[15:36:39] (CR) Ahmon Dancy: scap startup: signal scap not to check for Python virtual environments (1 comment) [tools/train-dev] - https://gerrit.wikimedia.org/r/811992 (https://phabricator.wikimedia.org/T310858) (owner: Jaime Nuche)
[16:41:42] (PS1) Ahmon Dancy: Remove wmf-beta-autoupdate subcommand [tools/scap] - https://gerrit.wikimedia.org/r/812034
[16:47:15] !log deployment-prep: wikiadmin@172.16.3.206(enwiki)> delete from growthexperiments_mentor_mentee where gemm_mentor_id=93651; # testing a specific workflow in Special:MentorDashboard
[16:47:16] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[16:51:27] GitLab (CI & Job Runners), serviceops, serviceops-collab: DNS/networking not working on Trusted Runners - https://phabricator.wikimedia.org/T311241 (dduvall) Thanks for explaining that, @Jelto ! I was pulling my hair out the other day trying to troubleshoot. Would it be possible to move that script...
[16:59:56] GitLab (CI & Job Runners), serviceops, serviceops-collab: DNS/networking not working on Trusted Runners - https://phabricator.wikimedia.org/T311241 (dduvall) >>! In T311241#8062475, @dduvall wrote: > (The only outlier at that point would be the default docker image which doesn't seem re-configurable....
[20:10:33] GitLab (Infrastructure), serviceops, Patch-For-Review: bring new gitlab hardware servers into production - https://phabricator.wikimedia.org/T307142 (Dzahn)
[20:11:30] GitLab (Infrastructure), serviceops, Patch-For-Review: bring new gitlab hardware servers into production - https://phabricator.wikimedia.org/T307142 (Dzahn) old VMs completely gone now. all decom boxes checked.
[20:11:47] Continuous-Integration-Infrastructure, Release-Engineering-Team (Seen), SRE, serviceops, and 2 others: replace doc1001.eqiad.wmnet with a buster VM and create the codfw equivalent - https://phabricator.wikimedia.org/T247653 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by dzahn@c...
[20:13:15] Continuous-Integration-Infrastructure, Release-Engineering-Team (Seen), SRE, serviceops, and 2 others: replace doc1001.eqiad.wmnet with a buster VM and create the codfw equivalent - https://phabricator.wikimedia.org/T247653 (Dzahn) :) yw doc1001.eqiad.wmnet has now been destroyed (via decom cook...
[20:16:35] Continuous-Integration-Infrastructure, OOUI, Performance-Team: Demos page for OOUI in php is broken - https://phabricator.wikimedia.org/T297035 (Dzahn)
[20:17:29] Continuous-Integration-Infrastructure, Release-Engineering-Team (Seen), SRE, serviceops, and 2 others: replace doc1001.eqiad.wmnet with a buster VM and create the codfw equivalent - https://phabricator.wikimedia.org/T247653 (Dzahn) In progress→Resolved the original ticket is resolved. doc...
[21:10:41] !log clear stuck beta deployment jobs, T72597
[21:10:43] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[21:10:43] T72597: Jenkins Gearman plugin has deadlock on executor threads (was: Beta Cluster stopped receiving code updates (beta-update-databases-eqiad hung) - https://phabricator.wikimedia.org/T72597
[21:54:01] Release-Engineering-Team (Priority Backlog 📥), Release, Train Deployments: 1.39.0-wmf.21 deployment blockers - https://phabricator.wikimedia.org/T308074 (Dreamy_Jazz) See T305093#8063451 - autogenerated LocalSettings.php included an empty string for $wgLocaltimezone. With https://gerrit.wikimedia.org/...
[22:04:26] (PS4) Jeena Huneidi: Set umask in file creating cmds [tools/scap] - https://gerrit.wikimedia.org/r/811785
[22:06:58] (CR) Jeena Huneidi: Set umask in file creating cmds (1 comment) [tools/scap] - https://gerrit.wikimedia.org/r/811785 (owner: Jeena Huneidi)
[22:40:09] (PS3) Ahmon Dancy: Many changes to support mwpresync [tools/train-dev] - https://gerrit.wikimedia.org/r/810384
[22:42:26] !log clear stuck beta deployment jobs (again), T72597
[22:42:28] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[22:42:28] T72597: Jenkins Gearman plugin has deadlock on executor threads (was: Beta Cluster stopped receiving code updates (beta-update-databases-eqiad hung) - https://phabricator.wikimedia.org/T72597
[22:46:07] Release-Engineering-Team (Priority Backlog 📥), Release, Train Deployments: 1.39.0-wmf.21 deployment blockers - https://phabricator.wikimedia.org/T308074 (Krinkle) @Dreamy_Jazz This task represents the current weekly production deployment of MediaWiki to Wikipedia.org and other WMF wikis. Afaik this d...
[22:47:13] Release-Engineering-Team (Priority Backlog 📥), Release, Train Deployments: 1.39.0-wmf.21 deployment blockers - https://phabricator.wikimedia.org/T308074 (Dreamy_Jazz) I wasn't sure if wgLocaltimezone was always set, but if it is then that shouldn't be an issue here.
[22:48:14] (CR) Ahmon Dancy: [C: -2] "Not quite ready." [tools/train-dev] - https://gerrit.wikimedia.org/r/810384 (owner: Ahmon Dancy)
[22:48:28] TheresNoTime: T72597 is a variant of the song that never ends :((
[22:48:29] T72597: Jenkins Gearman plugin has deadlock on executor threads (was: Beta Cluster stopped receiving code updates (beta-update-databases-eqiad hung) - https://phabricator.wikimedia.org/T72597
[22:49:49] (CR) CI reject: [V: -1] Many changes to support mwpresync [tools/train-dev] - https://gerrit.wikimedia.org/r/810384 (owner: Ahmon Dancy)
[22:51:25] I think I'm slowly narrowing down what triggers it.... I ended up writing some Python to ping me when a deployment job goes overdue so I can clear the stuck job, been a good few times the last month :p
[22:53:21] Next step: Fixing it!
[22:54:23] It's my understanding that it's a bad interaction between jobs that are triggered by Zuul and those triggered by Jenkins (e.g., by job schedule).
[22:55:50] ahaha nope :3
[22:55:59] (nope to fixing it!)
[22:56:28] I think you're right, yes.. namely the config update job it seems
[22:59:55] (PS4) Ahmon Dancy: Many changes to support mwpresync [tools/train-dev] - https://gerrit.wikimedia.org/r/810384
[23:01:46] (PS1) Ahmon Dancy: git.clone_or_update_repo: Set core.sharedRepository=group [tools/scap] - https://gerrit.wikimedia.org/r/812099
[23:24:56] Phabricator: Archive my extra user account - https://phabricator.wikimedia.org/T312607 (Bethany)
[23:25:54] (CR) Ahmon Dancy: [C: +1] Many changes to support mwpresync [tools/train-dev] - https://gerrit.wikimedia.org/r/810384 (owner: Ahmon Dancy)
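
Note on the 22:51:25 message: the Python that pings when a deployment job goes overdue is not shown anywhere in the log. Below is only a rough sketch of the general idea, not the actual script. The `JobRun` record, the `find_overdue` helper, the 30-minute threshold, and the sample timestamps are all made up for illustration, and nothing here talks to Jenkins or Zuul.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass(frozen=True)
class JobRun:
    """Hypothetical record of a running job (not a Jenkins API type)."""
    name: str
    started_at: datetime


def find_overdue(
    runs: Iterable[JobRun],
    now: datetime,
    max_age: timedelta = timedelta(minutes=30),  # assumed threshold
) -> List[JobRun]:
    """Return the runs that have been going for longer than max_age."""
    return [run for run in runs if now - run.started_at > max_age]


if __name__ == "__main__":
    # Invented sample data; the timestamps are arbitrary.
    now = datetime(2022, 1, 1, 12, 0)
    runs = [
        JobRun("beta-update-databases-eqiad", datetime(2022, 1, 1, 10, 0)),
        JobRun("example-job", datetime(2022, 1, 1, 11, 55)),
    ]
    for run in find_overdue(runs, now):
        print(f"overdue: {run.name}")
```

A real version would have to fetch the running jobs from somewhere and send the ping somehow; both parts are left out because the log gives no detail on either.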