[01:42:21] !log ladsgroup@deployment-deploy01:~$ foreachwikiindblist all-labs maintenance/migraeRevisionActorTemp.php (T275246)
[01:42:23] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[01:42:24] T275246: Populate rev_actor and rev_comment_id - https://phabricator.wikimedia.org/T275246
[04:44:38] PROBLEM - SSH on contint1001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[05:45:32] RECOVERY - SSH on contint1001.mgmt is OK: SSH OK - OpenSSH_6.6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[08:36:56] Project-Admins: Requests for addition to the #acl*Project-Admins group (in comments) - https://phabricator.wikimedia.org/T706 (Aklapper) @Galahad, @diegodlh: I've added you. //Usual disclaimer: Please follow [guidelines](https://www.mediawiki.org/wiki/Phabricator/Creating_and_renaming_projects#Creating_new_p...
[08:38:59] Release-Engineering-Team (Priority Backlog 📥), wikimedia-irc-libera, GitLab (Administration, Settings & Policy), User-brennen: Create an IRC channel for GitLab collaboration - https://phabricator.wikimedia.org/T295917 (Peachey88)
[09:33:50] Release-Engineering-Team (Radar), GitLab, Security-Team, serviceops, SecTeam-Processed: Setup GitLab Runner in trusted environment - https://phabricator.wikimedia.org/T295481 (MoritzMuehlenhoff) >>! In T295481#7511899, @Dzahn wrote: > @Jelto I [[ https://wikitech.wikimedia.org/wiki/Ganeti#Ver...
[09:41:25] o/ hello
[09:41:51] I'm getting diskspace / memory errors on some of my patches
[09:42:16] https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/739753/
[09:42:30] but also on master https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/739756
[09:52:04] It's weird when it's happening across branches, not sure if `Wikibase` is on some special machine for CI? Some other more recent patches seem to pass without problems, maybe it's just temporary.
[09:57:37] toan: hi, the jobs are balanced across all the CI instances
[09:57:45] regardless of repo or branch (well to simplify)
[09:58:31] gotcha, thanks
[09:58:35] on a dummy change like the above the failure is most probably caused by the underlying infra indeed. I am looking
[09:58:42] 09:18:45 npm WARN tar TAR_ENTRY_ERROR ENOSPC: no space left on device, write
[09:58:43] joy
[10:00:09] toan: https://phabricator.wikimedia.org/T292729 is the task
[10:00:40] thanks!
[10:04:30] 7220 ./wmf-quibble-apache-selenium-php72-docker
[10:04:34] 7G grr
[10:13:55] Continuous-Integration-Infrastructure, Release-Engineering-Team (Doing), ci-test-error (WMF-deployed Build Failure): TAR_ENTRY_ERROR ENOSPC: no space left on device - https://phabricator.wikimedia.org/T292729 (hashar) **TLDR**, a Wikibase patch cause some job workspace to file 7GB+ of disk space. We...
[10:55:27] Project-Admins, User-Urbanecm: Create tag for server-side upload requests - https://phabricator.wikimedia.org/T295231 (Aklapper) I took the liberty to silently re-tag existing server-side upload tickets with #server-side-upload-request and also removing #wikimedia-site-requests from those tickets (which...
[11:05:08] Project-Admins, User-Urbanecm: Create tag for server-side upload requests - https://phabricator.wikimedia.org/T295231 (Urbanecm) >>! In T295231#7513107, @Aklapper wrote: > I took the liberty to silently re-tag existing server-side upload tickets with #server-side-upload-request and also removing #wikimed...
[11:05:40] Beta-Cluster-Infrastructure, Wikimedia-Site-requests: Undeploy DismissableSiteNotice from Wikimedia Beta Cluster - https://phabricator.wikimedia.org/T262122 (Urbanecm) @Ammarpad Is this ready to go at any time? Or does it need to be coordinated with someone/something?
[11:09:50] Project-Admins, User-Urbanecm: Create tag for server-side upload requests - https://phabricator.wikimedia.org/T295231 (Aklapper) >>! In T295231#7513154, @Urbanecm wrote: >Maybe we should re-consider the tag proposal by @legoktm and edit the project to be a component? Yeah, I'd recommend that.
[11:11:15] Release-Engineering-Team (Doing), GitLab (Support), User-brennen: Couldn't fork a gitlab repository - https://phabricator.wikimedia.org/T295468 (aborrero) I tried with a different repo today, and it worked. Was able to fork https://gitlab.wikimedia.org/taavi/python-flask-keystone into https://gitlab...
[11:12:26] Project-Admins, User-Urbanecm: Create tag for server-side upload requests - https://phabricator.wikimedia.org/T295231 (Urbanecm) >>! In T295231#7513171, @Aklapper wrote: >>>! In T295231#7513154, @Urbanecm wrote: >>Maybe we should re-consider the tag proposal by @legoktm and edit the project to be a compo...
[11:30:33] Continuous-Integration-Infrastructure, Release-Engineering-Team (Doing), ci-test-error (WMF-deployed Build Failure): TAR_ENTRY_ERROR ENOSPC: no space left on device - https://phabricator.wikimedia.org/T292729 (Lucas_Werkmeister_WMDE) I uploaded [a change](https://gerrit.wikimedia.org/r/c/mediawiki/ex...
[11:43:00] Continuous-Integration-Infrastructure, Release-Engineering-Team (Doing), ci-test-error (WMF-deployed Build Failure): TAR_ENTRY_ERROR ENOSPC: no space left on device - https://phabricator.wikimedia.org/T292729 (hashar) Probably not, that is a just 210KB more. If we could get Wikibase `view/lib` and `...
[11:43:29] toan: Lucas_WMDE: in short the job testing Wikibase with all extensions consumes roughly 7.2GBytes of disk space (repos + node modules)
[11:43:36] so 3 builds consume ~ 21.6 GB
[11:43:54] and the Jenkins agents have an 18GB partition
[11:44:23] o_0
[11:44:37] 1016 wmf-quibble-selenium-php72-docker/src/extensions/Wikibase/view/lib
[11:44:37] 1042 wmf-quibble-selenium-php72-docker/src/extensions/Wikibase/client/data-bridge
[11:44:46] I am blaming NodeJS ecosystem :D
[11:49:21] !log Restarting CI Jenkins due to maintenance-disconnect-full-disks job being deadlocked
[11:49:23] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[11:51:31] eek that sounds crazy :S
[12:02:25] hashar, we will wait for word from you on the CI restart (backport window is happening)
[12:46:02] (PS3) Hashar: jjb: don't mount src twice [integration/config] - https://gerrit.wikimedia.org/r/739654
[12:46:37] (CR) Hashar: [C: +2] "Caught a few more, deploying!" [integration/config] - https://gerrit.wikimedia.org/r/739654 (owner: Hashar)
[12:47:06] !log Deploying jobs for https://gerrit.wikimedia.org/r/c/integration/config/+/739654 | jjb: don't mount src twice
[12:47:08] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[12:47:19] which updates the phan and codehealth jobs
[12:48:43] (Merged) jenkins-bot: jjb: don't mount src twice [integration/config] - https://gerrit.wikimedia.org/r/739654 (owner: Hashar)
[12:51:54] PROBLEM - SSH on contint1001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[12:54:40] hashar: I just got a Phan failure on a change that shouldn’t affect Phan at all https://integration.wikimedia.org/ci/job/mwext-php72-phan-docker/149230/console
[12:54:47] Invalid working directory specified, /src/extensions/Wikibase does not exist.
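hashar's 11:43 explanation of the ENOSPC failures is plain arithmetic, and the sketch below only makes it explicit. It is a minimal illustration assuming just the figures quoted in the log (about 7.2 GB per Wikibase "all extensions" workspace and an 18 GB partition per Jenkins agent); the helper names are hypothetical and not part of any Wikimedia CI tooling.

```python
# Back-of-the-envelope check of the CI disk budget described at 11:43.
# Assumed figures, taken from the log: ~7.2 GB per Wikibase workspace
# (repos + node modules) and an 18 GB partition per Jenkins agent.
# The helpers below are hypothetical, not part of the CI tooling.

WORKSPACE_GB = 7.2
PARTITION_GB = 18.0


def total_usage_gb(builds: int, per_build_gb: float = WORKSPACE_GB) -> float:
    """Disk consumed when `builds` full workspaces sit on one agent."""
    return round(builds * per_build_gb, 1)


def fits_on_partition(builds: int, partition_gb: float = PARTITION_GB) -> bool:
    """True if that many workspaces fit on the partition."""
    return total_usage_gb(builds) <= partition_gb


def max_workspaces(partition_gb: float = PARTITION_GB,
                   per_build_gb: float = WORKSPACE_GB) -> int:
    """Largest number of full workspaces the partition can hold."""
    return int(partition_gb // per_build_gb)


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, total_usage_gb(n), fits_on_partition(n))
    print("max workspaces:", max_workspaces())
```

With these figures two workspaces (14.4 GB) fit on the partition but three (21.6 GB) do not, which is consistent with the "no space left on device" errors reported earlier in the day.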
[12:58:06] damn, and apitests failed with ENOSPC even though I only pushed one Wikibase change :(
[12:59:22] hashar: ^
[12:59:43] Your change was supposed to stop it being mounted twice, but it seems it's now not mounted at all
[13:04:47] oh no
[13:05:03] * hashar rolls back
[13:05:32] !log Rolledback Jenkins jobs update https://gerrit.wikimedia.org/r/c/integration/config/+/739654 | jjb: don't mount src twice
[13:05:34] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[13:05:53] Lucas_WMDE: RhinosF1 yeah hmm so I definitely broke it sorry
[13:06:07] Np
[13:06:20] Today feels like the day to hide in a corner
[13:06:45] (PS1) Hashar: Revert "jjb: don't mount src twice" [integration/config] - https://gerrit.wikimedia.org/r/739640
[13:06:57] (CR) Hashar: [C: +2] "rollback deployed" [integration/config] - https://gerrit.wikimedia.org/r/739640 (owner: Hashar)
[13:07:56] hashar: thanks
[13:08:26] I am trying to make the CI jobs slightly simpler
[13:08:30] but that is not that easy :-\
[13:08:43] (Merged) jenkins-bot: Revert "jjb: don't mount src twice" [integration/config] - https://gerrit.wikimedia.org/r/739640 (owner: Hashar)
[13:13:25] (PS1) Hashar: jjb: don't mount src twice (take 2) [integration/config] - https://gerrit.wikimedia.org/r/739795
[13:19:26] Project-Admins: Requests for addition to the #acl*Project-Admins group (in comments) - https://phabricator.wikimedia.org/T706 (diegodlh) Thank you, @Aklapper. I'll certainly follow the guidelines and ask if unsure. Have a great day!
[13:57:41] (PS1) Jbond: operations-puppet-catalog-compiler: increase number of threads to 4 [integration/config] - https://gerrit.wikimedia.org/r/739799
[14:03:09] Continuous-Integration-Infrastructure, Release-Engineering-Team (Doing), ci-test-error (WMF-deployed Build Failure): TAR_ENTRY_ERROR ENOSPC: no space left on device - https://phabricator.wikimedia.org/T292729 (toan) >>! In T292729#7512677, @hashar wrote: > **TLDR**, a Wikibase patch cause some job wo...
[14:07:45] Continuous-Integration-Infrastructure, Release-Engineering-Team (Doing), ci-test-error (WMF-deployed Build Failure): TAR_ENTRY_ERROR ENOSPC: no space left on device - https://phabricator.wikimedia.org/T292729 (WMDE-leszek) @hashar WMDE appreciates any magic changes/boosts to the infrastructure that y...
[14:23:46] Release-Engineering-Team (Radar), GitLab, Security-Team, serviceops, SecTeam-Processed: Setup GitLab Runner in trusted environment - https://phabricator.wikimedia.org/T295481 (akosiaris) >>! In T295481#7512587, @MoritzMuehlenhoff wrote: >>>! In T295481#7511899, @Dzahn wrote: >> @Jelto I [[ ht...
[14:31:29] Project beta-scap-sync-world build #27752: FAILURE in 7 min 2 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/27752/
[14:36:50] Yippee, build fixed!
[14:36:51] Project beta-scap-sync-world build #27753: FIXED in 2 min 23 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/27753/
[14:41:45] !log Updating Jenkins jobs with https://gerrit.wikimedia.org/r/739795 jjb: don't mount src twice (take 2)
[14:41:46] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[14:42:59] better :)
[14:43:27] (CR) Hashar: [C: +2] jjb: don't mount src twice (take 2) [integration/config] - https://gerrit.wikimedia.org/r/739795 (owner: Hashar)
[14:44:22] I have rebuilt the one that failed previously https://integration.wikimedia.org/ci/job/mwext-php72-phan-docker/149246/console
[14:45:16] (Merged) jenkins-bot: jjb: don't mount src twice (take 2) [integration/config] - https://gerrit.wikimedia.org/r/739795 (owner: Hashar)
[14:45:18] https://integration.wikimedia.org/ci/job/mediawiki-core-php72-phan-docker/58207/console failed bah
[14:53:34] (PS1) Hashar: jjb: fix composer update for mediawiki-core phan job [integration/config] - https://gerrit.wikimedia.org/r/739817
[14:53:43] take 3 https://integration.wikimedia.org/ci/job/mediawiki-core-php72-phan-docker/58208/console
[14:56:24] congrats to paladox for becoming a gerrit upstream maintainer: https://groups.google.com/g/repo-discuss/c/T06LGw7A4h0 :)
[14:56:34] \o/
[15:17:36] (CR) Hashar: [C: +2] jjb: fix composer update for mediawiki-core phan job [integration/config] - https://gerrit.wikimedia.org/r/739817 (owner: Hashar)
[15:19:34] (Merged) jenkins-bot: jjb: fix composer update for mediawiki-core phan job [integration/config] - https://gerrit.wikimedia.org/r/739817 (owner: Hashar)
[15:19:47] (CR) Hashar: [C: +2] "The CI instances have 4 cpu and only one executor. So that sounds correct :]" [integration/config] - https://gerrit.wikimedia.org/r/739799 (owner: Jbond)
[15:21:32] (Merged) jenkins-bot: operations-puppet-catalog-compiler: increase number of threads to 4 [integration/config] - https://gerrit.wikimedia.org/r/739799 (owner: Jbond)
[15:22:57] (CR) Hashar: [C: +2] "Lets build it! :)" [integration/config] - https://gerrit.wikimedia.org/r/739544 (owner: JMeybohm)
[15:24:47] (Merged) jenkins-bot: helm_linter: Ensure not helm repositories are defined [integration/config] - https://gerrit.wikimedia.org/r/739544 (owner: JMeybohm)
[15:28:04] !log Builder helm-linter image for https://gerrit.wikimedia.org/r/739544
[15:28:06] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[15:32:21] MediaWiki-Releasing, MW-1.37-notes, MW-1.37-release: Write and send release announcement for 1.37.0 - https://phabricator.wikimedia.org/T289594 (Reedy)
[15:39:04] (CR) Hashar: [C: +2] "INFO:jenkins_jobs.builder:Number of jobs generated: 1" [integration/config] - https://gerrit.wikimedia.org/r/739545 (owner: JMeybohm)
[15:41:54] (Merged) jenkins-bot: jjb: update helm-linter job to releng/helm-linter:0.2.18 [integration/config] - https://gerrit.wikimedia.org/r/739545 (owner: JMeybohm)
[15:42:21] (CR) Hashar: [C: +2] "It is magic and well done!" [integration/config] - https://gerrit.wikimedia.org/r/739565 (https://phabricator.wikimedia.org/T295362) (owner: Zfilipin)
[15:44:32] (Merged) jenkins-bot: jjb: Create selenium-daily-beta-VisualEditor [integration/config] - https://gerrit.wikimedia.org/r/739565 (https://phabricator.wikimedia.org/T295362) (owner: Zfilipin)
[15:46:31] MediaWiki-Releasing, MW-1.37-notes, MW-1.37-release: Write and send release announcement for 1.37.0 - https://phabricator.wikimedia.org/T289594 (Reedy) In progress→Resolved https://lists.wikimedia.org/hyperkitty/list/mediawiki-announce@lists.wikimedia.org/thread/XEVG4HTPHRDHTV6GXJ4SP2ZSIJBBN27K/
[15:46:34] MediaWiki-Releasing, MW-1.37-notes, MW-1.37-release, Patch-For-Review: Release MW 1.37.0 - https://phabricator.wikimedia.org/T289585 (Reedy)
[15:47:11] MediaWiki-Releasing, MW-1.37-notes, MW-1.37-release, Patch-For-Review: Release MW 1.37.0 - https://phabricator.wikimedia.org/T289585 (Reedy) In progress→Resolved a:Reedy
[15:47:19] (Abandoned) Ahmon Dancy: Add cdb_rebuild_using_rebuildLocalisationCache config option [tools/scap] - https://gerrit.wikimedia.org/r/737495 (https://phabricator.wikimedia.org/T295304) (owner: Ahmon Dancy)
[15:47:43] (Abandoned) Ahmon Dancy: deploy: Ensure mwdeploy user is a member of the www-data group [tools/train-dev] - https://gerrit.wikimedia.org/r/738954 (https://phabricator.wikimedia.org/T295304) (owner: Ahmon Dancy)
[15:52:38] RECOVERY - SSH on contint1001.mgmt is OK: SSH OK - OpenSSH_6.6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[15:54:13] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Scap, Patch-For-Review: Improve efficiency of scap l10n operations - https://phabricator.wikimedia.org/T295304 (dancy) In progress→Resolved p:Triage→Medium
[15:57:02] Release-Engineering-Team (Done by Wed 24 Nov 🔥), GitLab (CI & Job Runners): Run docker-gc resource monitor on gitlab runners - https://phabricator.wikimedia.org/T295707 (dancy) In progress→Resolved
[15:57:04] Release-Engineering-Team (Done by Wed 24 Nov 🔥), GitLab (CI & Job Runners), User-brennen: runner-1002 is out of space - https://phabricator.wikimedia.org/T291221 (dancy)
[16:13:49] Release-Engineering-Team (Doing), GitLab (Support), User-brennen: Couldn't fork a gitlab repository - https://phabricator.wikimedia.org/T295468 (brennen) > Feel free to close the task if you think this was a one-off failure. We've upgraded since this originally came up, so it's possible it was a bug...
[16:24:43] Project-Admins, User-dcaro: Create tag projects worktype-project, origin-user, origin-alert, origin-team - https://phabricator.wikimedia.org/T295692 (dcaro) >>! In T295692#7510433, @MBinder_WMF wrote: > Given what's discussed so far (meaning there might be more worth considering)...probably? I can't say...
[16:45:04] Beta-Cluster-Infrastructure, MinervaNeue: 'en.wikipedia.beta.wmflabs.org' Certificate has expired - https://phabricator.wikimedia.org/T296000 (bwang)
[16:45:32] Beta-Cluster-Infrastructure: 'en.wikipedia.beta.wmflabs.org' Certificate has expired - https://phabricator.wikimedia.org/T296000 (Reedy)
[16:46:05] Beta-Cluster-Infrastructure: 'en.wikipedia.beta.wmflabs.org' Certificate has expired - https://phabricator.wikimedia.org/T296000 (Reedy)
[16:46:07] Beta-Cluster-Infrastructure, Quality-and-Test-Engineering-Team (QTE), SRE, Traffic, and 2 others: [epic] The SSL certificate for Beta cluster domains fails to properly renew & deploy - https://phabricator.wikimedia.org/T293585 (Reedy)
[16:59:01] Reedy: lovely
[16:59:11] Project-Admins, User-dcaro: Create tag projects worktype-project, origin-user, origin-alert, origin-team - https://phabricator.wikimedia.org/T295692 (MBinder_WMF) Right, sorry, lemme re-rephrase: I'm assuming that you are tracking on behalf of one team or more (SRE?). Even if the data is just for your us...
[17:02:52] (PS1) Ahmon Dancy: scap clean: No backtrace if non-existent branch is supplied [tools/scap] - https://gerrit.wikimedia.org/r/739873
[17:03:23] (PS1) Ahmon Dancy: scap clean: Delete cache/l10n separately [tools/scap] - https://gerrit.wikimedia.org/r/739874 (https://phabricator.wikimedia.org/T295304)
[17:03:54] (CR) Ahmon Dancy: [C: +2] scap clean: No backtrace if non-existent branch is supplied [tools/scap] - https://gerrit.wikimedia.org/r/739873 (owner: Ahmon Dancy)
[17:04:20] (CR) jerkins-bot: [V: -1] scap clean: Delete cache/l10n separately [tools/scap] - https://gerrit.wikimedia.org/r/739874 (https://phabricator.wikimedia.org/T295304) (owner: Ahmon Dancy)
[17:04:48] (Merged) jenkins-bot: scap clean: No backtrace if non-existent branch is supplied [tools/scap] - https://gerrit.wikimedia.org/r/739873 (owner: Ahmon Dancy)
[17:05:08] (PS2) Ahmon Dancy: scap clean: Delete cache/l10n separately [tools/scap] - https://gerrit.wikimedia.org/r/739874 (https://phabricator.wikimedia.org/T295304)
[17:11:40] (PS6) Hashar: jjb: play with Jinja2 [integration/config] - https://gerrit.wikimedia.org/r/739282
[17:15:17] (CR) Ahmon Dancy: [C: -1] scap clean: Delete cache/l10n separately [tools/scap] - https://gerrit.wikimedia.org/r/739874 (https://phabricator.wikimedia.org/T295304) (owner: Ahmon Dancy)
[17:15:32] maintenance-disconnect-full-disks build 335022 integration-agent-docker-1012 (/: 23%, /srv: 96%, /var/lib/docker: 26%): OFFLINE due to disk space
[17:16:33] hashar: ^
[17:18:30] yeah, same issue as earlier today I bet
[17:20:10] !log Pooled back integration-agent-docker-1012
[17:20:12] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[17:22:36] RhinosF1: thank you
[17:22:52] Np
[17:24:41] Release-Engineering-Team (Radar), Security-Team, serviceops, GitLab (CI & Job Runners), SecTeam-Processed: Setup GitLab Runner in trusted environment - https://phabricator.wikimedia.org/T295481 (brennen)
[17:25:56] !log registered #wikimedia-gitlab (T295917)
[17:26:00] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[17:26:01] T295917: Create an IRC channel for GitLab collaboration - https://phabricator.wikimedia.org/T295917
[17:26:16] maintenance-disconnect-full-disks build 335024 integration-agent-docker-1002 (/: 19%, /srv: 98%, /var/lib/docker: 29%): OFFLINE due to disk space
[17:28:42] Release-Engineering-Team (Doing), wikimedia-irc-libera, GitLab (Administration, Settings & Policy), User-brennen: Create an IRC channel for GitLab collaboration - https://phabricator.wikimedia.org/T295917 (brennen)
[17:29:03] Project-Admins, User-dcaro: Create tag projects worktype-project, origin-user, origin-alert, origin-team - https://phabricator.wikimedia.org/T295692 (dcaro) > I'm assuming that you are tracking on behalf of one team or more Wrong assumption, this is so far exclusively for myself, and my tasks. It might...
[17:30:36] maintenance-disconnect-full-disks build 335025 integration-agent-docker-1002 (/: 19%, /srv: 60%, /var/lib/docker: 29%): RECOVERY disk space OK
[17:31:58] stupid refactoring
[17:32:17] now I can either `srcdir: /mediawiki` or `volumes: { src: /mediawiki }`
[17:32:41] (PS2) Hashar: jjb: more migration to yaml based volumes definitions [integration/config] - https://gerrit.wikimedia.org/r/739583
[17:38:41] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.9 deployment blockers - https://phabricator.wikimedia.org/T293950 (jeena)
[17:40:31] maintenance-disconnect-full-disks build 335027 integration-agent-docker-1010 (/: 25%, /srv: 100%, /var/lib/docker: 32%): OFFLINE due to disk space
[17:43:33] (PS1) Hashar: jjb: move docker run --volume option(s) at end [integration/config] - https://gerrit.wikimedia.org/r/739888
[17:45:29] maintenance-disconnect-full-disks build 335028 integration-agent-docker-1010 (/: 25%, /srv: 47%, /var/lib/docker: 31%): RECOVERY disk space OK
[17:47:34] (Abandoned) Ahmon Dancy: scap clean: Delete cache/l10n separately [tools/scap] - https://gerrit.wikimedia.org/r/739874 (https://phabricator.wikimedia.org/T295304) (owner: Ahmon Dancy)
[17:48:11] (PS7) Hashar: jjb: play with Jinja2 [integration/config] - https://gerrit.wikimedia.org/r/739282
[17:48:13] (PS3) Hashar: jjb: more migration to yaml based volumes definitions [integration/config] - https://gerrit.wikimedia.org/r/739583
[17:48:34] (PS1) Ahmon Dancy: scap clean: Don't backtrace if in-use branch is supplied [tools/scap] - https://gerrit.wikimedia.org/r/739891
[17:49:00] (CR) Ahmon Dancy: [C: +2] scap clean: Don't backtrace if in-use branch is supplied [tools/scap] - https://gerrit.wikimedia.org/r/739891 (owner: Ahmon Dancy)
[17:51:02] Beta-Cluster-Infrastructure: deployment-echostore01 periodically going offline - https://phabricator.wikimedia.org/T296013 (Majavah)
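The maintenance-disconnect-full-disks bot lines in this log share a regular shape, and in every OFFLINE report it is /srv that sits at 96 to 100% while / and /var/lib/docker stay well below that. The Python sketch below parses such lines so that pattern can be checked mechanically. The regular expressions are inferred only from the lines quoted in this log, so the real job may emit other formats, and `parse_status` / `fullest_mount` are hypothetical helpers.

```python
import re
from typing import Dict, NamedTuple, Optional


class DiskStatus(NamedTuple):
    build: int              # build number of the maintenance job
    agent: str              # CI agent the report is about
    usage: Dict[str, int]   # mount point -> percent used
    state: str              # "OFFLINE" or "RECOVERY"


# Inferred from the bot lines in this log only; other formats are possible.
LINE_RE = re.compile(
    r"maintenance-disconnect-full-disks build (?P<build>\d+) "
    r"(?P<agent>\S+) \((?P<mounts>[^)]*)\): (?P<state>OFFLINE|RECOVERY)"
)
MOUNT_RE = re.compile(r"(?P<mount>/[^:\s]*): (?P<pct>\d+)%")


def parse_status(line: str) -> Optional[DiskStatus]:
    """Return the parsed report, or None if the line is not a disk report."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    usage = {
        m.group("mount"): int(m.group("pct"))
        for m in MOUNT_RE.finditer(match.group("mounts"))
    }
    return DiskStatus(int(match.group("build")), match.group("agent"),
                      usage, match.group("state"))


def fullest_mount(status: DiskStatus) -> str:
    """Mount point with the highest usage in one report."""
    return max(status.usage, key=status.usage.get)


if __name__ == "__main__":
    line = ("[17:40:31] maintenance-disconnect-full-disks build 335027 "
            "integration-agent-docker-1010 (/: 25%, /srv: 100%, "
            "/var/lib/docker: 32%): OFFLINE due to disk space")
    status = parse_status(line)
    print(status.agent, status.state, fullest_mount(status))
```

Feeding it the OFFLINE lines above reports /srv as the fullest mount every time, which matches where the job workspaces live according to the earlier `du` output.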
[17:51:23] (Merged) jenkins-bot: scap clean: Don't backtrace if in-use branch is supplied [tools/scap] - https://gerrit.wikimedia.org/r/739891 (owner: Ahmon Dancy)
[18:06:36] (PS1) Ahmon Dancy: scap clean: Make --delete flag optional [tools/scap] - https://gerrit.wikimedia.org/r/739897
[18:07:08] (CR) jerkins-bot: [V: -1] scap clean: Make --delete flag optional [tools/scap] - https://gerrit.wikimedia.org/r/739897 (owner: Ahmon Dancy)
[18:07:56] (PS2) Ahmon Dancy: scap clean: Make --delete flag optional [tools/scap] - https://gerrit.wikimedia.org/r/739897
[18:18:18] Project-Admins, User-dcaro: Create tag projects worktype-project, origin-user, origin-alert, origin-team - https://phabricator.wikimedia.org/T295692 (MBinder_WMF) FWIW, I think that this approach isn't exclusive of personal tasks or otherwise. You could, for example, have "SRE-origin-user" and track that...
[18:31:00] maintenance-disconnect-full-disks build 335037 integration-agent-docker-1008 (/: 20%, /srv: 97%, /var/lib/docker: 50%): OFFLINE due to disk space
[18:32:01] (PS3) Ahmon Dancy: scap clean: Make --delete flag optional [tools/scap] - https://gerrit.wikimedia.org/r/739897
[18:32:03] (PS1) Ahmon Dancy: scap clean / scap prep mods for T295304 [tools/scap] - https://gerrit.wikimedia.org/r/739907 (https://phabricator.wikimedia.org/T295304)
[18:34:03] (PS1) Nikki Nikkhoui: jjb, Zuul: [mediawiki/services/servicelib-node/spec] test and gate-and-submit pipeline [integration/config] - https://gerrit.wikimedia.org/r/739908 (https://phabricator.wikimedia.org/T295994)
[18:35:30] maintenance-disconnect-full-disks build 335038 integration-agent-docker-1008 (/: 20%, /srv: 12%, /var/lib/docker: 50%): RECOVERY disk space OK
[18:36:18] (CR) jerkins-bot: [V: -1] jjb, Zuul: [mediawiki/services/servicelib-node/spec] test and gate-and-submit pipeline [integration/config] - https://gerrit.wikimedia.org/r/739908 (https://phabricator.wikimedia.org/T295994) (owner: Nikki Nikkhoui)
[18:39:56] !log deployment-prep root@deployment-acme-chief03:/var/lib/acme-chief/certs/mx# rm new && mv dbe71be4db0b4e58a3da4fc410d322bd dbe71be4db0b4e58a3da4fc410d322bd-bak && ln -s 92b8ed4bf5494405a75a0b3fb1d59422 new # T296000
[18:39:59] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[18:40:00] T296000: 'en.wikipedia.beta.wmflabs.org' Certificate has expired - https://phabricator.wikimedia.org/T296000
[18:40:22] (PS2) Nikki Nikkhoui: jjb, Zuul: [mediawiki/services/servicelib-node/spec] add test pipeline [integration/config] - https://gerrit.wikimedia.org/r/739908 (https://phabricator.wikimedia.org/T295994)
[18:43:09] Project-Admins, User-Urbanecm: Create tag for server-side upload requests - https://phabricator.wikimedia.org/T295231 (Legoktm) Resolved→Open Re-opening since I guess we're not entirely satisfied with the current project setup. I proposed a tag as a minimal change that made it easier to find the...
[18:43:18] !log deployment-prep remove wikifunctions-related from ACME chief to attempt to at least workaround T296000
[18:43:21] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[18:43:57] does anybody know why in InitialiseSettings-labs.php a key with a 'default' array and a '+enwiki' array are not being properly merged?
[18:43:57] This is the config change: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/737503/14/wmf-config/InitialiseSettings-labs.php and this is the config diff https://integration.wikimedia.org/ci/job/operations-mw-config-php72-composer-diffConfig-docker/9087/console
[18:44:04] !log deployment-prep run puppet at deployment-acme-chief03
[18:44:05] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[18:44:19] joakino: i can have a look after i finish fixing certs at beta :)
[18:44:24] it is replacing the first item in the default :/
[18:44:26] thanks urbanecm
[18:50:22] majavah: maybe i fixed the certs!
[18:50:38] (well, worked around it by removing a bunch of stuff from the certs to generate)
[18:51:16] https://en.wikipedia.beta.wmflabs.org/wiki/Main_Page still gives me a warning
[18:51:25] ah, i fixed the generation
[18:51:37] syncing gives "permission denied (publickey)" for the acme-chief system user
[18:51:51] like this https://www.irccloud.com/pastebin/zphy7FJg/
[18:52:14] is keyholder armed?
[18:52:41] good point
[18:52:41] - The agent has no identities.
[18:52:47] * urbanecm goes to arm it
[18:53:26] now it says active and lists a key...but still permission denied :D https://www.irccloud.com/pastebin/7x84QO8f/
[18:53:45] want me to try?
[18:53:52] Nov 18 18:53:32 deployment-acme-chief03 ssh-agent-proxy[391]: Refusing agent sign request for user root
[18:54:20] majavah: if you can guide me on what to look for, I'd prefer that
[18:54:27] sure
[18:54:41] currently i'm confused why /etc/acme-chief/cert-sync.conf talks about acme-chief04
[18:54:50] i ssh'ed there, and it has very old certs
[18:55:16] acme-chief-cert-sync is the script responsible for updating certs from the active acme-chief server (03) to the passive one (04)
[18:55:36] aha
[18:56:30] see the log line I pasted above?
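joakino's symptom at 18:44 (the '+enwiki' value "replacing the first item in the default") is what a key-by-key merge of PHP arrays produces: a PHP list is an ordered map keyed 0, 1, 2, ..., so index 0 of the override lands on index 0 of the default. The Python sketch below models that behaviour under the assumption that the '+' merge works key by key; it is an illustration, not MediaWiki's actual code, and the setting values and helper names are hypothetical. `append_merge` models the append-and-renumber behaviour that PHP's `array_merge()` has for numerically indexed arrays, which is usually what a list-valued setting wants.

```python
from typing import Any, Dict, List


def as_php_array(value: Any) -> Any:
    """Model PHP storage: a list becomes a dict keyed 0..n-1 (recursively)."""
    if isinstance(value, list):
        return {i: as_php_array(v) for i, v in enumerate(value)}
    if isinstance(value, dict):
        return {k: as_php_array(v) for k, v in value.items()}
    return value


def keywise_merge(base: Dict[Any, Any], override: Dict[Any, Any]) -> Dict[Any, Any]:
    """Merge key by key: nested arrays are merged recursively, any other
    value from `override` replaces the one in `base` under the same key."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(result.get(key), dict) and isinstance(value, dict):
            result[key] = keywise_merge(result[key], value)
        else:
            result[key] = value
    return result


def append_merge(base: List[Any], extra: List[Any]) -> List[Any]:
    """Append and renumber, like PHP array_merge() on numerically indexed arrays."""
    return list(base) + list(extra)


if __name__ == "__main__":
    default = ["A", "B"]    # hypothetical 'default' value
    plus_enwiki = ["C"]     # hypothetical '+enwiki' value
    print(keywise_merge(as_php_array(default), as_php_array(plus_enwiki)))
    print(append_merge(default, plus_enwiki))
```

With these values the key-by-key merge yields `{0: 'C', 1: 'B'}`, losing the first default entry, while the append-style merge yields `['A', 'B', 'C']`. Giving list entries string keys, or merging explicitly, is one way to avoid the collision if the key-by-key assumption holds.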
[18:57:11] have a look into the keyholder::agent definition in modules/acme_chief/manifests/server.pp
[18:58:10] yeah
[18:58:33] thanks
[18:58:35] it only lets users in the acme-chief group use the cert, root is not in that group
[18:59:09] side note, why exactly are you looking into cert-sync? it doesn't help with the "I see expired certs" issue
[18:59:27] because i originally thought it's syncing certs to other servers
[18:59:31] (like...cache)
[18:59:49] no, that's dealt with by puppet
[19:00:06] in that case, running it at cache
[19:01:45] so, deployment-cache-text06:/etc/acmecerts/unified/live now has the new cert
[19:03:40] I wonder why puppet didn't reload trafficserver-tls.service
[19:04:15] i'm not sure
[19:04:16] !log taavi@deployment-cache-text06:~$ sudo systemctl reload trafficserver-tls.service
[19:04:18] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[19:04:22] and it works fine now :D
[19:04:28] i did that too!
[19:04:31] and it didn't work
[19:04:44] clearly it didn't like you
[19:04:48] when did you do it?
[19:04:49] looks so
[19:04:54] like 30 secs ago?
[19:05:05] slightly different command though (`/bin/systemctl reload trafficserver-tls` as root)
[19:05:09] maybe it just took a while, there's some special wrapper that manages how ats runs
[19:05:26] btw, the certs might have been fine all along, it was just missing a reload
[19:05:32] they weren't
[19:05:34] i was checking that
[19:05:36] hmh
[19:05:41] the service at acme-chief was down
[19:05:48] ah
[19:06:01] systemd complained about it shutting down too frequently
[19:06:04] i had to disable mx certs _and_ wikiversions certs for it to work again
[19:06:19] see https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/58d1602bc620b72f19b89e35f63e6e6fbbc89798%5E%21/#F0 and https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/88ab9da4d6602186c12d854dd7188701042b86a0%5E%21/#F0
[19:06:34] i'll report on the task, majavah, thanks for the help
[19:09:00] (CR) Thcipriani: [C: +2] scap clean: Make --delete flag optional (1 comment) [tools/scap] - https://gerrit.wikimedia.org/r/739897 (owner: Ahmon Dancy)
[19:09:54] (Merged) jenkins-bot: scap clean: Make --delete flag optional [tools/scap] - https://gerrit.wikimedia.org/r/739897 (owner: Ahmon Dancy)
[19:10:37] (PS1) Ahmon Dancy: More ssh-keygn quieting [tools/train-dev] - https://gerrit.wikimedia.org/r/739920
[19:10:57] (CR) Ahmon Dancy: [C: +2] More ssh-keygn quieting [tools/train-dev] - https://gerrit.wikimedia.org/r/739920 (owner: Ahmon Dancy)
[19:11:25] (Merged) jenkins-bot: More ssh-keygn quieting [tools/train-dev] - https://gerrit.wikimedia.org/r/739920 (owner: Ahmon Dancy)
[19:11:42] Beta-Cluster-Infrastructure: 'en.wikipedia.beta.wmflabs.org' Certificate has expired - https://phabricator.wikimedia.org/T296000 (Urbanecm) I logged to acme-chief and checked the cert there. `root@deployment-acme-chief03:/var/lib/acme-chief/certs/unified/live# openssl x509 -in rsa-2048.crt -text -noout` told...
[19:11:46] majavah: ^^
[19:25:14] (CR) Thcipriani: [C: -1] "Inline question: what creates wgCacheDirectory now?" [tools/scap] - https://gerrit.wikimedia.org/r/739907 (https://phabricator.wikimedia.org/T295304) (owner: Ahmon Dancy)
[19:27:54] (CR) Ahmon Dancy: scap clean / scap prep mods for T295304 (4 comments) [tools/scap] - https://gerrit.wikimedia.org/r/739907 (https://phabricator.wikimedia.org/T295304) (owner: Ahmon Dancy)
[19:30:42] (CR) Thcipriani: [C: +2] "Makes sense, I figured that might be the case, but I'm not very familiar with the php side of this process." [tools/scap] - https://gerrit.wikimedia.org/r/739907 (https://phabricator.wikimedia.org/T295304) (owner: Ahmon Dancy)
[19:32:03] (Merged) jenkins-bot: scap clean / scap prep mods for T295304 [tools/scap] - https://gerrit.wikimedia.org/r/739907 (https://phabricator.wikimedia.org/T295304) (owner: Ahmon Dancy)
[19:45:54] Continuous-Integration-Config, Codex, Design-Systems-team (Design Systems Team FY2021-22 Kanban Board), Patch-For-Review: Deploy current main branch's docs site to doc.wikimedia.org - https://phabricator.wikimedia.org/T293704 (Catrope) I don't think Netlify is self-hostable, is it? So I'm not sur...
[19:51:11] Release-Engineering-Team (Doing), wikimedia-irc-libera, GitLab (Administration, Settings & Policy), Patch-For-Review, User-brennen: Create an IRC channel for GitLab collaboration - https://phabricator.wikimedia.org/T295917 (brennen) p:Triage→Medium
[19:51:37] Release-Engineering-Team (Doing), wikimedia-irc-libera, GitLab (Administration, Settings & Policy), Patch-For-Review, User-brennen: Create an IRC channel for GitLab collaboration - https://phabricator.wikimedia.org/T295917 (brennen) Open→In progress
[19:57:51] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.9 deployment blockers - https://phabricator.wikimedia.org/T293950 (Tacsipacsi) f7febb6754d6aa86562fd219c47b3e8909e69573 reverted group1 wikis to wmf.7, without any explanation (and unfortunat...
[20:01:10] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.9 deployment blockers - https://phabricator.wikimedia.org/T293950 (Majavah) >>! In T293950#7514870, @Tacsipacsi wrote: > f7febb6754d6aa86562fd219c47b3e8909e69573 reverted group1 wikis to wmf....
[20:04:47] Release-Engineering-Team (Next), Release, Train Deployments: 1.38.0-wmf.11 deployment blockers - https://phabricator.wikimedia.org/T293952 (Krinkle) ##### Risky Patch! 🚂🔥 >>! In T292489#7511285, @gerritbot wrote: > %%%[mediawiki/core@master] mediawiki.base: Deprecate stateful use of toString()%%% >...
[20:05:29] maintenance-disconnect-full-disks build 335056 integration-agent-docker-1012 (/: 23%, /srv: 99%, /var/lib/docker: 26%): OFFLINE due to disk space
[20:06:10] oh f**
[20:08:16] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.9 deployment blockers - https://phabricator.wikimedia.org/T293950 (jeena) Sorry for the miscommunication. What @Majavah wrote is correct. In the future I will try to update this task with mor...
[20:10:30] maintenance-disconnect-full-disks build 335057 integration-agent-docker-1012 (/: 23%, /srv: 14%, /var/lib/docker: 26%): RECOVERY disk space OK
[20:14:42] Phabricator, Release-Engineering-Team, serviceops: Deprecate git-ssh service on phabricator.wikimedia.org - https://phabricator.wikimedia.org/T296022 (mmodell)
[20:27:44] :(
[20:29:08] so those disk filling up I will dig into it seriously tomorrow
[20:29:21] some builds take too much disk space for some reason, I guess I will look at redoing the partitions
[20:29:37] and probably head at using instances with larger disk
[20:30:37] sounds fun :)
[20:30:48] Release-Engineering-Team (Priority Backlog 📥), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.7 deployment blockers - https://phabricator.wikimedia.org/T293948 (thcipriani) Open→Resolved Closing out this task since it's live everywhere.
[20:40:43] Release-Engineering-Team (Doing), wikimedia-irc-libera, GitLab (Administration, Settings & Policy), Patch-For-Review, User-brennen: Create an IRC channel for GitLab collaboration - https://phabricator.wikimedia.org/T295917 (brennen) In progress→Resolved
[20:41:24] Release-Engineering-Team (Deployment Training Requests): Deployment training request for JKieserman - https://phabricator.wikimedia.org/T296024 (JKieserman)
[20:41:32] Release-Engineering-Team (Doing), wikimedia-irc-libera, GitLab (Administration, Settings & Policy), Patch-For-Review, User-brennen: Create an IRC channel for GitLab collaboration - https://phabricator.wikimedia.org/T295917 (brennen)
[20:44:30] Release-Engineering-Team (Done by Wed 24 Nov 🔥), GitLab (Auth & Access), User-brennen, cloud-services-team (Kanban): Create top level 'cloud' group on Gitlab - https://phabricator.wikimedia.org/T293741 (brennen) Open→Resolved I've created: - https://gitlab.wikimedia.org/people/wmf-team-c...
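The maintenance-disconnect-full-disks messages in this log report usage per mount point and take integration-agent-docker-1012 offline when /srv is nearly full, then bring it back once usage drops. A minimal sketch of that kind of check follows, assuming a 95% threshold and df-style percentages; the threshold and the function names are hypothetical, and this is not the actual Jenkins job's logic.

```python
import math
import os
import shutil

# Assumed threshold: the log shows /srv at 99% triggering OFFLINE and 14%
# reported as OK, but not the limit the real job uses.
THRESHOLD_PERCENT = 95

# Mount points listed in the log line for integration-agent-docker-1012.
MOUNTS = ("/", "/srv", "/var/lib/docker")


def usage_percent(path):
    """Used space on the filesystem holding `path`, roughly as df reports
    Use%: used / (used + available), rounded up."""
    usage = shutil.disk_usage(path)
    denominator = usage.used + usage.free
    if denominator == 0:
        return 0
    return math.ceil(usage.used * 100 / denominator)


def classify(percentages, threshold=THRESHOLD_PERCENT):
    """Return ("OFFLINE", full_mounts) if any mount is at or above
    `threshold`, otherwise ("OK", [])."""
    full = [mount for mount, pct in percentages.items() if pct >= threshold]
    return ("OFFLINE", full) if full else ("OK", [])


def format_usage(percentages):
    """Render per-mount usage in the same shape as the log message."""
    return ", ".join(f"{mount}: {pct}%" for mount, pct in percentages.items())


if __name__ == "__main__":
    # Only measure mount points that exist on this machine.
    measured = {m: usage_percent(m) for m in MOUNTS if os.path.exists(m)}
    state, full = classify(measured)
    print(f"({format_usage(measured)}): {state}", full or "")
```

Feeding classify the two sets of figures from the log (/srv at 99% for build 335056, 14% for build 335057) gives OFFLINE and OK respectively under the assumed 95% threshold.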
[20:55:04] Release-Engineering-Team (Done by Wed 24 Nov 🔥), GitLab (Auth & Access), User-brennen, cloud-services-team (Kanban): Create top level 'cloud' group on Gitlab - https://phabricator.wikimedia.org/T293741 (Majavah) >>! In T293741#7515010, @brennen wrote: > I've created: > > - https://gitlab.wikimed...
[20:57:58] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Security-Team, GitLab (CI & Job Runners), Patch-For-Review, and 2 others: Limit GitLab shared runners to images from Wikimedia Docker registry - https://phabricator.wikimedia.org/T291978 (brennen) > What will the criteria be for adding new upstream...
[21:02:27] OH
[21:02:41] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.9 deployment blockers - https://phabricator.wikimedia.org/T293950 (ppelberg)
[21:02:53] side effect of https://gerrit.wikimedia.org/r/c/integration/config/+/739517
[21:09:43] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.9 deployment blockers - https://phabricator.wikimedia.org/T293950 (Tacsipacsi) >>! In T293950#7514874, @Majavah wrote: >>>! In T293950#7514870, @Tacsipacsi wrote: >> f7febb6754d6aa86562fd219c...
[21:15:42] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.9 deployment blockers - https://phabricator.wikimedia.org/T293950 (jeena)
[21:24:43] Continuous-Integration-Infrastructure, Release-Engineering-Team (Doing), ci-test-error (WMF-deployed Build Failure): TAR_ENTRY_ERROR ENOSPC: no space left on device - https://phabricator.wikimedia.org/T292729 (hashar) > @hashar WMDE appreciates any magic changes/boosts to the infrastructure that you...
[21:45:38] (PS1) Hashar: jjb: stop using host src for Quibble jobs [integration/config] - https://gerrit.wikimedia.org/r/739939 (https://phabricator.wikimedia.org/T292729)
[21:47:07] ^ so that would solve the Quibble jobs filling disk
[21:47:17] but it is 11pm here so well it will wait tomorrow
[21:48:10] Continuous-Integration-Infrastructure, Release-Engineering-Team (Doing), Patch-For-Review, ci-test-error (WMF-deployed Build Failure): TAR_ENTRY_ERROR ENOSPC: no space left on device - https://phabricator.wikimedia.org/T292729 (hashar) >>! In T292729#7515201, @gerritbot wrote: > Change 739939 had...
[22:12:42] (CR) Hashar: [C: +2] jjb: stop using host src for Quibble jobs [integration/config] - https://gerrit.wikimedia.org/r/739939 (https://phabricator.wikimedia.org/T292729) (owner: Hashar)
[22:14:33] (Merged) jenkins-bot: jjb: stop using host src for Quibble jobs [integration/config] - https://gerrit.wikimedia.org/r/739939 (https://phabricator.wikimedia.org/T292729) (owner: Hashar)
[22:14:48] !log Updated Quibble jobs so that they no more fill the /srv/ partition https://gerrit.wikimedia.org/r/739939 # T292729
[22:14:50] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[22:14:51] T292729: TAR_ENTRY_ERROR ENOSPC: no space left on device - https://phabricator.wikimedia.org/T292729
[22:33:11] dancy: if I understand correctly, running the rebuildLocCache maint script on appservers is abandoned, but for future reference, we've generally never run maintenance scripts on appservers afaik, and writing localisation cache in particular is a tricky one. It could be supported I suppose, but I'd say as-is it's unsupported even if it happens to work, so would be good to check in with PET or other MW maintainers about changing that as
[22:33:11] developers will generally assume that maint scripts do not run on live app servers, and that rebuild loc cache specifically doesn't since it's analogous with recache=false. To ensure mental models stay in sync :)
[22:35:09] I'm curious what you ended up changing effectively to make the perf win you intended. I'm a bit lost among the various patches. No rush, just curious what changed and how it got faster :)
[22:45:17] Hi Krinkle. I was about to gain some time out of `scap sync-world` by not copying CDB files from /srv/mediawiki-staging/php-/cache/l10n to /tmp and back during the l10n rebuild portion of the process.
[22:45:46] (just on the deploy server)
[22:47:44] yikes.. typos.. *I was able to gain....
[22:55:58] Release-Engineering-Team (Deployment Training Requests): Deployment training request for JKieserman - https://phabricator.wikimedia.org/T296024 (thcipriani) p:Triage→Medium a:thcipriani Hi @JKieserman ! I've added you to the training session on Thursday, December 2nd, hopefully you got the calen...
[23:06:37] Continuous-Integration-Infrastructure, Release-Engineering-Team (Doing), Patch-For-Review, ci-test-error (WMF-deployed Build Failure): TAR_ENTRY_ERROR ENOSPC: no space left on device - https://phabricator.wikimedia.org/T292729 (Tgr) p:Unbreak!→Triage
[23:39:25] dancy: ack, okay so that did end up working out. I'm guessing that means we'll be passing the rebuildLc script a different parameter now for the directory and drop the copy logic from scap wrapper?
[23:39:47] It's easy in hindsight I suppose ;)
[23:40:45] Exactly: https://gerrit.wikimedia.org/r/c/mediawiki/tools/scap/+/738453/8/scap/tasks.py
[23:42:11] No output directory parameter was needed. I left a comment about that in a later commit: https://gerrit.wikimedia.org/r/c/mediawiki/tools/scap/+/739040/2/scap/tasks.py#572
[23:46:01] dancy: ack, no new value needed changed from override to temp, to default location. Got it
[23:46:42] The main difference is the ownership of the output directory/files. Was l10n-update, now www-data. (in /srv/mediawiki-staging only, /srv/mediawiki is all owned by mwdeploy)
[23:46:53] I hope that "soon" we can also remove the json rebuilding part when we switch to php arrays which can be synced directly
[23:48:46] Depending on how interested you are, it might also be fun to see if it's still worthwhile even in the cdb format, eg that it is actually still faster to sync json and rebuild vs syncing cdb binary. I'm guessing that given lc changes are rare and new trains need full builds anyway, maybe it can even go away today already without a loss in deploy time.
[23:49:32] haha. I had just added "prove that changed json rsyncs better than changed cdb".
[23:49:48] .. to my todo list about 30 minutes ago
[23:53:30] dduvall, thcipriani: I just wrote up a feature request for Blubber at T296046. Before I dive into trying to implement I would love critical feedback on the idea.
[23:53:31] T296046: Allow build time control of effective UID/GID for runtime in Blubber generated Dockerfile - https://phabricator.wikimedia.org/T296046
[23:57:20] dancy: awesome, well, looking forward to the result then. at least thinking about it won't be a wasted effort as we'll need that either way for the non-binary php files, but it's certainly not a priority right now, but cool that you're looking at it already :)
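On the todo item "prove that changed json rsyncs better than changed cdb": one rough way to compare the two formats is to estimate how many bytes a delta transfer would have to resend for each. The sketch below is a crude, hypothetical proxy, not rsync's algorithm. It only matches fixed-size, block-aligned chunks, whereas rsync's rolling checksum can match blocks at any byte offset, so it overestimates the transfer whenever an edit shifts the rest of the file. The 4096-byte block size, the function names and the sample message table are assumptions.

```python
import hashlib
import json

# Assumed block size; rsync chooses its own block size based on file
# length, so this is only an approximation.
BLOCK_SIZE = 4096


def _blocks(data, block_size):
    """Split `data` into consecutive block_size chunks (the last may be short)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def changed_bytes(old, new, block_size=BLOCK_SIZE):
    """Estimate how many bytes of `new` a delta sync would have to resend.

    A block of `new` counts as reusable only if an identical block exists
    at some block-aligned offset in `old`. Unlike rsync's rolling checksum,
    this does not find matches at arbitrary byte offsets, so an insertion
    near the start of a file inflates the estimate.
    """
    known = {hashlib.sha1(b).digest() for b in _blocks(old, block_size)}
    return sum(
        len(b)
        for b in _blocks(new, block_size)
        if hashlib.sha1(b).digest() not in known
    )


if __name__ == "__main__":
    # Hypothetical message table: change one value and measure the delta.
    messages = {f"msg-{n}": f"text {n}" for n in range(2000)}
    old = json.dumps(messages, sort_keys=True, indent=1).encode()
    messages["msg-1500"] = "edited text"
    new = json.dumps(messages, sort_keys=True, indent=1).encode()
    print(f"{changed_bytes(old, new)} of {len(new)} bytes would be resent")
```

To use it, run changed_bytes(old, new) on the previous and rebuilt versions of the same l10n file in each format and compare the totals. For a real measurement, running rsync with --stats on the actual files gives a more trustworthy number than this approximation.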