[00:59:18] Phabricator, Release-Engineering-Team, Performance Issue: Submitting actions taking ~10 seconds to load after making changes - https://phabricator.wikimedia.org/T360484#9678137 (brennen)
[01:36:04] Beta-Cluster-Infrastructure, Cloud-Services: Launching new bullseye deployment-prep instances fails, no sudo access - https://phabricator.wikimedia.org/T361536 (thcipriani) NEW The #Cloud-Services project tag is not intended to have any tasks. Please check the list on https://phabricator.wikimedia.or...
[01:36:34] Beta-Cluster-Infrastructure, Cloud-VPS (Debian Buster Deprecation): Launching new bullseye deployment-prep instances fails, no sudo access - https://phabricator.wikimedia.org/T361536#9678194 (thcipriani)
[03:10:52] Project-Admins: Create a beta cluster equivalent for the Wikimedia-production-error tag - https://phabricator.wikimedia.org/T344999#9678253 (Aklapper) Stalled→Declined
[07:02:53] Phabricator, wikimedia.biterg.io: Closed tickets in Bugzilla migrated without a closing date - https://phabricator.wikimedia.org/T107254#9678627 (valerio.bozzolan) a:valerio.bozzolan→None I sincerely do not remember what I was doing here :D will retry to do something during the next WMHack. I wil...
[07:30:55] Phabricator, Wikimedia-Hackathon-2024: Phorge (Phabricator) Code Review Sprint - https://phabricator.wikimedia.org/T356384#9678694 (valerio.bozzolan)
[07:33:26] GitLab (Pipeline Services Migration🐤), collaboration-services, Patch-For-Review: move security.wikimedia.org to kubernetes - https://phabricator.wikimedia.org/T350796#9678704 (Arnoldokoth)
[08:15:32] (PS3) Hashar: Update branch name for releng/release.git [integration/utils] - https://gerrit.wikimedia.org/r/1016000 (https://phabricator.wikimedia.org/T361513) (owner: Majavah)
[08:16:09] (CR) Hashar: "I have pushed your change directly to the repository given the tox configuration is heavily broken as it is :)" [integration/utils] - https://gerrit.wikimedia.org/r/1016000 (https://phabricator.wikimedia.org/T361513) (owner: Majavah)
[08:21:47] (PS2) Majavah: build: Update Tox config [integration/utils] - https://gerrit.wikimedia.org/r/1016004
[08:29:54] (CR) Hashar: [C:+2] build: Update Tox config [integration/utils] - https://gerrit.wikimedia.org/r/1016004 (owner: Majavah)
[08:31:00] (Merged) jenkins-bot: build: Update Tox config [integration/utils] - https://gerrit.wikimedia.org/r/1016004 (owner: Majavah)
[08:46:52] GitLab (CI & Job Runners), collaboration-services, Patch-For-Review: Create a special-purpose Trusted Runner with Dockerfile frontend - https://phabricator.wikimedia.org/T357612#9678957 (Jelto)
[08:56:30] GitLab (CI & Job Runners), collaboration-services, Patch-For-Review: Create a special-purpose Trusted Runner with Dockerfile frontend - https://phabricator.wikimedia.org/T357612#9679050 (CodeReviewBot) jelto merged https://gitlab.wikimedia.org/repos/releng/buildkit/-/merge_requests/55 add CI pipelin...
[08:59:57] (PS1) Hashar: Archive mediawiki/extensions/CodeReview [integration/config] - https://gerrit.wikimedia.org/r/1016298 (https://phabricator.wikimedia.org/T309052)
[09:05:53] hashar: Did you check for recent activity in the extension? :P
[09:06:45] GitLab (CI & Job Runners), collaboration-services, Patch-For-Review: Create a special-purpose Trusted Runner with Dockerfile frontend - https://phabricator.wikimedia.org/T357612#9679092 (Jelto) The Trusted Dockerfile Runner `gitlab-runner2004` is available now. The first project which is allowed to u...
[09:06:55] Reedy: yeah there were some bulk updates to it?
[09:07:30] from the history gathered on the task, shoutwikis were the sole other users we could tell about
[09:07:42] hence I guess why Jack Phoenix is added as a reviewer
[09:07:47] but they have moved to Phabricator
[09:08:09] Beta-Cluster-Infrastructure, Release-Engineering-Team, Infrastructure-Foundations, observability, and 2 others: beta-scap-sync-world fails: logstash_checker.py: KeyError: 'aggregations' - https://phabricator.wikimedia.org/T360595#9679101 (elukey) PKI intermediate cloud node fixed, now I think tha...
[09:08:13] (CR) Hashar: [C:+2] Archive mediawiki/extensions/CodeReview [integration/config] - https://gerrit.wikimedia.org/r/1016298 (https://phabricator.wikimedia.org/T309052) (owner: Hashar)
[09:08:23] looks like ashley wanted to keep it around "because it works"
[09:08:59] I don't really understand why we archive things like we do
[09:09:05] ie blanking the repos
[09:09:30] But similarly... The changes I'd made to it recently weren't "bulk changes" done by a script
[09:09:36] I'd purposefully been keeping it working
[09:09:50] (Merged) jenkins-bot: Archive mediawiki/extensions/CodeReview [integration/config] - https://gerrit.wikimedia.org/r/1016298 (https://phabricator.wikimedia.org/T309052) (owner: Hashar)
[09:10:25] for blanking I don't know, the process has more or less grown organically with steps being described in #cleanup > "Fill an archive request" which leads to https://phabricator.wikimedia.org/maniphest/task/edit/form/33/
[09:12:24] I also don't see the need to keep maintaining it given nobody uses it anymore
[09:12:48] I do/did :)
[09:12:52] but not in a public wiki
[09:14:06] OHAHEraaREZRH
[09:14:28] GitLab (CI & Job Runners), collaboration-services, Patch-For-Review: Create a special-purpose Trusted Runner with Dockerfile frontend - https://phabricator.wikimedia.org/T357612#9679125 (Jelto) >>! In T357612#9676305, @bd808 wrote: > Just a note that #toolforge has [[https://gerrit.wikimedia.org/g/op...
[09:17:07] Reedy: so you are the last person on earth still using Subversion? :)
[09:17:31] (I am exaggerating on purpose for the narrative)
[09:20:49] I still use SVN but not with Extension:CodeReview
[09:28:11] Reedy: so essentially I can rollback what I did, decline that task requesting CodeReview to be archived and then list you as the maintainer?
[09:35:12] Continuous-Integration-Infrastructure, CoverMe, Wikidata, wmde-wikidata-tech, Test-Coverage: Generate coverage report for Wikidata extensions - https://phabricator.wikimedia.org/T185211#9679207 (Lucas_Werkmeister_WMDE) Resolved→Open Yet the most important one by far, Wikibase, still d...
[09:38:05] (PS1) Reedy: layout.yaml: Add extension-coverage to Wikibase extension [integration/config] - https://gerrit.wikimedia.org/r/1016303 (https://phabricator.wikimedia.org/T185211)
[09:38:44] Lucas_WMDE: what could possibly go wrong
[09:39:28] (CR) Reedy: [C:+2] layout.yaml: Add extension-coverage to Wikibase extension [integration/config] - https://gerrit.wikimedia.org/r/1016303 (https://phabricator.wikimedia.org/T185211) (owner: Reedy)
[09:40:14] (CR) Lucas Werkmeister (WMDE): "Can we test this before deploying it?" [integration/config] - https://gerrit.wikimedia.org/r/1016303 (https://phabricator.wikimedia.org/T185211) (owner: Reedy)
[09:40:20] um
[09:40:25] guess that +2 answers my question :|
[09:40:34] We can revert it
[09:40:41] I don't think we can easily test it, no
[09:41:18] (Merged) jenkins-bot: layout.yaml: Add extension-coverage to Wikibase extension [integration/config] - https://gerrit.wikimedia.org/r/1016303 (https://phabricator.wikimedia.org/T185211) (owner: Reedy)
[09:41:21] Beta-Cluster-Infrastructure, Release-Engineering-Team, Infrastructure-Foundations, observability, and 2 others: beta-scap-sync-world fails: logstash_checker.py: KeyError: 'aggregations' - https://phabricator.wikimedia.org/T360595#9679250 (Ladsgroup) I see "local" private commits on the new puppet...
[09:41:50] !log Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1016303
[09:41:51] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[09:44:38] Continuous-Integration-Infrastructure, CoverMe, Wikidata, wmde-wikidata-tech, and 2 others: Generate coverage report for Wikidata extensions - https://phabricator.wikimedia.org/T185211#9679248 (Lucas_Werkmeister_WMDE) Though we also have T288396 for Wikibase specifically (I temporarily forgot it...
[10:38:41] Phabricator, translatewiki.net, Language-Team (Language-2024-January-March), Localization Infrastructure FY2023-24, and 2 others: Reduce or remove translation export threshold for Phabricator - https://phabricator.wikimedia.org/T360861#9679477 (abi_) Open→Resolved Exports were...
[11:51:20] !log On deployment-deploy03.deployment-prep executed “mwscript extensions/WikimediaMaintenance/addWiki.php --wiki=aawiki --skipclusters=main,echo,growth,mediamoderation en wikipedia test2wiki test2.wikipedia.beta.wmcloud.org”
[11:51:21] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[11:54:31] Lucas_WMDE: the coverage job is not voting (it runs in the `coverage` pipeline which does not vote at all). Then there is a postmerge job which would surely fail and add some noise if the coverage fails
[11:54:48] but maybe it will just work
[12:40:48] I’ll be very surprised if it does
[12:40:52] but we can try it out once T361520 is fixed
[12:40:53] T361520: "The cypress npm package is installed, but the Cypress binary is missing" error prevents merging changes - https://phabricator.wikimedia.org/T361520
[12:45:07] Phabricator, Release-Engineering-Team, Trust-and-Safety: Account recovery help needed for Phabricator account Ifeatu_Nnaobi_WMDE - https://phabricator.wikimedia.org/T355414#9679535 (Ifeatu_Nnaobi_WMDE) Open→Resolved
[12:46:39] Phabricator, translatewiki.net, Language-Team (Language-2024-January-March), Localization Infrastructure FY2023-24, Unplanned-Sprint-Work: Reduce or remove translation export threshold for Phabricator - https://phabricator.wikimedia.org/T360861#9679554 (Nikerabbit)
[12:49:08] Release-Engineering-Team (Priority Backlog 📥), Patch-For-Review, Release, Train Deployments: 1.42.0-wmf.25 deployment blockers - https://phabricator.wikimedia.org/T360157#9679651 (jnuche)
[12:52:03] GitLab, Release-Engineering-Team (Radar), ChangeProp, collaboration-services, and 9 others: Figure out a plan to move forward with regarding Redis License changes - https://phabricator.wikimedia.org/T360596#9679718 (jijiki) >>! In T360596#9676049, @akosiaris wrote: > > My 2, operationally minded,...
[12:59:34] Continuous-Integration-Infrastructure, CoverMe, Wikidata, wmde-wikidata-tech, and 2 others: Generate coverage report for Wikidata extensions - https://phabricator.wikimedia.org/T185211#9679984 (hashar) >>! In T185211#9679207, @Lucas_Werkmeister_WMDE wrote: > Yet the most important one by far, Wik...
[13:00:27] Continuous-Integration-Config, [DEPRECATED] wdwb-tech, MediaWiki-extensions-WikibaseClient, MediaWiki-extensions-WikibaseRepository, and 2 others: Re-start Wikibase test coverage reporting - https://phabricator.wikimedia.org/T288396#9679989 (hashar)
[13:00:32] Continuous-Integration-Infrastructure, CoverMe, Wikidata, wmde-wikidata-tech, and 2 others: Generate coverage report for Wikidata extensions - https://phabricator.wikimedia.org/T185211#9679988 (hashar)
[13:00:44] Beta-Cluster-Infrastructure, Release-Engineering-Team, Infrastructure-Foundations, observability, and 2 others: beta-scap-sync-world fails: logstash_checker.py: KeyError: 'aggregations' - https://phabricator.wikimedia.org/T360595#9679979 (elukey) I think that by default any puppetmaster that pull...
[13:17:20] Beta-Cluster-Infrastructure, Release-Engineering-Team, Infrastructure-Foundations, observability, and 2 others: beta-scap-sync-world fails: logstash_checker.py: KeyError: 'aggregations' - https://phabricator.wikimedia.org/T360595#9680032 (hashar) ` name=TLDR rm /srv/git/labs/private/.git/hooks/pr...
[13:19:42] Beta-Cluster-Infrastructure, cloud-services-team, Cloud-VPS (Debian Buster Deprecation): Replace deployment-ores02 - https://phabricator.wikimedia.org/T361385#9680074 (Andrew) Open→Resolved Yep, it's gone now.
[13:39:32] Release-Engineering-Team, collaboration-services, mwcli, serviceops-radar: Create /nonexistent directory for nobody user in golang images - https://phabricator.wikimedia.org/T331209#9680158 (LSobanski) @Addshore I see that https://gitlab.com/gitlab-org/gitlab-runner/-/issues/2750 has been impleme...
[13:45:49] !log `rm /srv/git/labs/private/.git/hooks/pre-commit` in deployment-puppetserver-1 - T360595
[13:45:52] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[13:45:53] T360595: beta-scap-sync-world fails: logstash_checker.py: KeyError: 'aggregations' - https://phabricator.wikimedia.org/T360595
[13:45:55] hashar: o/
[13:46:09] :)
[13:46:17] tldr: Puppet is broken :)
[13:46:20] if you have a min - the trick worked but for some reason the vm is totally busy now
[13:46:23] yeah
[13:46:51] because of the file deletion?
[13:46:52] it is all OOMs, I am wondering if the VM is too tiny, ram-wise
[13:47:00] oh
[13:47:02] right after the commit
[13:47:42] which commit?
[13:47:50] the private one that I added
[13:48:33] the ooms are in the horizon's logs but not sure if they are recent or not
[13:48:40] for sure I can't ssh to the puppetserver anymore
[13:48:58] according to https://openstack-browser.toolforge.org/project/deployment-prep the server has 8G of ram
[13:49:33] not sure how much puppetmaster04 had, and this runs puppet 7 IIUC
[13:49:43] I can try to soft-reboot
[13:50:25] I wanted to try if the git-sync worked with the local commit, but it looks like it didn't
[13:52:20] https://grafana.wmcloud.org/d/0g9N-7pVz/cloud-vps-project-board?orgId=1&var-project=deployment-prep&var-instance=deployment-puppetserver-1&from=now-3h&to=now
[13:52:28] elukey: that looks like it was a short burst?
[13:52:45] Release-Engineering-Team, Scap, Patch-For-Review: deploy promote needs a timeout when waiting for CI - https://phabricator.wikimedia.org/T361585#9680261 (CodeReviewBot) jnuche opened https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/264 deploy-promote: add timeout to loop waiting for v...
[13:52:48] TIL this dashboard
[13:52:53] yeah
[13:53:05] it is well hidden!
[13:54:35] the memory panel is not that helpful
[13:54:50] I can't ssh so I assume that the oom killer made a mess and now the host is half usable
[13:54:52] I am pretty sure I have fixed it at some point to show the actual metrics
[13:55:00] so yeah
[13:55:03] +1 on soft rebooting
[13:55:42] !log soft reboot deployment-puppetserver-1 - T360595
[13:55:44] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[13:55:44] T360595: beta-scap-sync-world fails: logstash_checker.py: KeyError: 'aggregations' - https://phabricator.wikimedia.org/T360595
[13:55:50] [468779.230737] Out of memory: Killed process 1020032 (java) total-vm:11908356kB, anon-rss:7781688kB, file-rss:0kB, shmem-rss:0kB, UID:104 pgtables:15912kB oom_score_adj:0
[13:55:51] hehe
[13:55:56] and more and more java
[13:56:12] anon-rss:7809820kB
[13:56:15] and the instance has 8G
[13:56:24] as to why there is Java on that instance? Well I don't know
[13:57:43] holy hell
[13:58:21] /usr/bin/java -jar /usr/share/puppetserver/puppetserver.jar
[13:58:23] that is it
[13:58:31] if it is a jvm running it may be Clojure, puppetdb is written in it
[13:58:32] I am off. I resign from IT for good
[13:58:33] ah yeah
[13:58:48] that was the last horror story I wanted to hear
[13:59:34] no luck with the instance
[13:59:38] anyway, at least the server answers now
[13:59:55] ah okok now it works
[14:00:21] and I don't see how the Linux oom killer kicked in
[14:00:32] given the jvm should have a limited heap already
[14:00:35] but who knows really
[14:01:21] /usr/bin/java -Xms1g -Xmx3g
[14:01:27] so 1g at start and 3g max
[14:01:28] checking if a puppet run works
[14:01:29] iirc
[14:02:54] puppetserver.service down, not a great start
[14:04:47] /usr/bin/java -Xms1g -Xmx7g
[14:04:55] so yeah 7G of heap
[14:05:18] I have no idea why java managed to reach anon-rss:7781688kB
[14:05:40] then it is not like I know the actual relation between Java Heap size and the actual memory consumed
[14:07:38] we probably need to ask for a bigger instance
[14:07:42] just to be safe
[14:07:57] anyway, puppet works, of course my trick didn't update the cfssl stuff
[14:08:48] we already define the cfssl auth key as aaaabbbbccccdddd in the fake private repo
[14:09:04] I tried to add in the same file a redefinition of the hiera key, hoping it would override and take the priority
[14:09:11] but I think I need to add it elsewhere
[14:11:04] andrewbogott: o/ is there a way in horizon to add more vram to a running VM without recreating it?
[14:11:09] sort of what we do with ganeti
[14:11:13] https://phabricator.wikimedia.org/P59175 :)
[14:11:17] (just a reboot and that's it)
[14:11:18] is for the history of OOM killing
[14:12:16] horizon has a 'resize instance' button
[14:12:24] elukey: yeah, you can resize, there should be a button on Horizon -- it'll reboot the VM as part of resizing and then you need to confirm it's working and tell Horizon all is well
[14:12:25] have you tried reducing the java heap allocation?
[14:12:30] hashar: if you check the command line of puppetserver you'll see that the jvm runs jruby
[14:13:04] taavi: I wouldn't touch the base config and just add some vram
[14:13:21] you can also mess with 'profile::puppetserver::java_max_mem' hiera setting
[14:13:34] (probably that's what taavi meant)
[14:14:02] yesyes thanks both for the suggestions, I found the resize, IIUC I just need to select a new flavor and reboot right?
[14:14:03] Continuous-Integration-Infrastructure, Jenkins, Release-Engineering-Team, collaboration-services, and 3 others: Jenkins core security advisory - 2024-03-20 - https://phabricator.wikimedia.org/T360759#9680404 (hashar) I will look at upgrading the CI Jenkins tomorrow morning.
[14:15:00] elukey: it'll reboot for you :)
[14:15:14] gimme a sec, checking some stuff
[14:15:33] But then a 'confirm resize' button will appear -- make sure all is well before clicking
[14:15:52] so we have only ~30GB of vram left in there, and the next bump is +8
[14:16:30] at this point I can try to reduce the heap size and see
[14:23:45] the integration puppet master has 3G of heap
[14:24:17] I am trying to find where it is set to 7g
[14:24:22] maybe there is a reason
[14:24:54] though the oom killer did trigger once for the integration puppet master :)
[14:26:04] deployment-prep/deployment-puppetserver.yaml:profile::puppetserver::java_max_mem: 7g
[14:26:17] I think as a rule I set it to n-1 for most servers I made.
[14:26:21] I use https://gerrit.wikimedia.org/r/cloud/instance-puppet.git
[14:26:26] Is 1GB not enough to run the rest of the system?
[14:26:31] definitely not
[14:26:36] :(
[14:26:52] there must be a bunch of overhead
[14:27:11] 7G still had the linux kernel to kill the jvm due to 7781688kB of anon-rss mem usage
[14:27:16] andrewbogott: from https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+blame/master/deployment-prep/deployment-puppetserver.yaml it seems that you set 7g, was it a copy/paste from prev configs?
[14:27:20] which looks like more than 7G
[14:28:05] ...what project are we talking about right now?
[14:28:09] cloud having a default set to 1g
[14:28:16] andrewbogott: deployment-prep sorry
[14:29:16] OK, so the VM has 8GB and the limit is set to 7, that's consistent with me trying to set things to n-1
[14:29:40] if n-1 is too much then that worries me since I would really like to be able to run a puppetserver on 2GB of ram for a small project :(
[14:30:58] the 7G is just for the heap
[14:31:06] but there is more memory used for other things
[14:31:08] by the jvm
[14:31:20] okok let's go down to 5g
[14:31:24] add in the rest of the system and the oom ends up triggering
[14:31:28] I'll commit if everybody agrees
[14:31:46] lowering it shouldn't break puppet in any case, it'll just make things slower
[14:31:57] the integration server has 4G of RAM and the Java heap is at 3G but that still triggered the oom killer :D
[14:32:24] as to how Puppet uses that memory, well I have no idea. My guess is caching of some sort maybe
[14:34:14] elukey: +1 :)
[14:34:32] Continuous-Integration-Config, MediaWiki-extensions-WikibaseClient, MediaWiki-extensions-WikibaseRepository, Wikidata, and 2 others: Re-start Wikibase test coverage reporting - https://phabricator.wikimedia.org/T288396#9680507 (ArthurTaylor)
[14:35:33] Release-Engineering-Team, Scap, Patch-For-Review: scap deploy-promote needs a timeout when waiting for CI - https://phabricator.wikimedia.org/T361585#9680508 (dancy)
[14:35:34] Continuous-Integration-Config, MediaWiki-extensions-WikibaseClient, MediaWiki-extensions-WikibaseRepository, Wikidata, and 2 others: [REPO][CLIENT][SW] Re-start Wikibase test coverage reporting - https://phabricator.wikimedia.org/T288396#9680509 (ArthurTaylor)
[14:37:03] * andrewbogott now worried that every puppetserver he made last week is going to oom
[14:37:39] Continuous-Integration-Config, MediaWiki-extensions-WikibaseClient, MediaWiki-extensions-WikibaseRepository, Wikidata, and 3 others: [REPO][CLIENT][SW] Re-start Wikibase test coverage reporting - https://phabricator.wikimedia.org/T288396#9680511 (ArthurTaylor)
[14:37:46] But... we run VMs with 1GB of ram all the time. So surely that's enough to run the overhead of the OS. Assuming that puppet doesn't gobble up tons of ram outside of the heap.
[14:38:00] * andrewbogott should assume that puppet does all manner of hostile things
[14:38:25] it is also a matter of how the kernel is configured, since the oom kicks in under certain conditions
[14:39:07] if the heap of a jvm reaches its full size, it is 7 out of 8 total GBs used, and the oom might kick in
[14:39:53] !log restart puppetserver on deployment-puppetserver-1 with 5g of Xmx (rather than 7g) - T360595
[14:39:55] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[14:39:55] T360595: beta-scap-sync-world fails: logstash_checker.py: KeyError: 'aggregations' - https://phabricator.wikimedia.org/T360595
[14:40:03] all right, done!
[14:40:10] now I can try to do another commit :D
[14:43:02] worked, didn't die this tine
[14:43:06] *time
[14:44:15] GitLab (Infrastructure), Release-Engineering-Team (Radar), collaboration-services, Patch-For-Review: Add GitLab upgrades and maintenance to deployment calendar - https://phabricator.wikimedia.org/T336470#9680540 (CodeReviewBot) jelto opened https://gitlab.wikimedia.org/repos/releng/rel...
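The heap-versus-RSS arithmetic discussed above can be worked through with a short Python sketch. It only uses the figures quoted in the log (the kernel OOM line, `-Xmx7g`, an 8G instance, and the later 5g setting). The helper names are hypothetical, and it assumes that the kernel's `kB` fields and the JVM's `g` suffix are both 1024-based and that the instance has a nominal 8 GiB; the RAM actually usable by the guest is somewhat lower.

```python
import re

# Kernel log line quoted in the chat at 13:55:50.
OOM_LINE = (
    "Out of memory: Killed process 1020032 (java) total-vm:11908356kB, "
    "anon-rss:7781688kB, file-rss:0kB, shmem-rss:0kB, UID:104 "
    "pgtables:15912kB oom_score_adj:0"
)


def parse_anon_rss_kib(line: str) -> int:
    """Extract the anon-rss figure (assumed to be KiB) from an OOM-killer line."""
    match = re.search(r"anon-rss:(\d+)kB", line)
    if match is None:
        raise ValueError("no anon-rss field found")
    return int(match.group(1))


def jvm_size_to_kib(value: str) -> int:
    """Convert a JVM-style size such as '7g' or '512m' to KiB (hypothetical helper)."""
    units = {"k": 1, "m": 1024, "g": 1024 * 1024}
    return int(value[:-1]) * units[value[-1].lower()]


anon_rss = parse_anon_rss_kib(OOM_LINE)
heap_max = jvm_size_to_kib("7g")      # -Xmx7g from the puppetserver command line
instance_ram = jvm_size_to_kib("8g")  # nominal size of the instance (assumption)

# Memory the JVM held beyond its maximum heap (metaspace, thread stacks,
# code cache, GC structures, direct buffers and so on).
non_heap_mib = (anon_rss - heap_max) / 1024
# What was nominally left for the kernel and every other process.
left_for_system_mib = (instance_ram - anon_rss) / 1024

print(f"JVM resident memory beyond -Xmx: {non_heap_mib:.0f} MiB")
print(f"Nominally left for the rest of the system: {left_for_system_mib:.0f} MiB")

# With the 5g setting applied later, the same non-heap overhead would leave:
headroom_5g_mib = (instance_ram - jvm_size_to_kib("5g")) / 1024 - non_heap_mib
print(f"Headroom with -Xmx5g and the same overhead: {headroom_5g_mib:.0f} MiB")
```

Under these assumptions the process was resident about 431 MiB above its 7g heap limit, which left under 600 MiB of the nominal 8 GiB for everything else. That is consistent with the remark that `-Xmx` bounds only the heap, not the whole process.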
[14:45:07] elukey: congrats :)
[14:45:30] thanks all for the support
[14:45:42] buuut the cfssl value is not updated, so need more work
[14:49:39] I am off
[14:50:39] elukey: `puppet lookup --compile --node ` is your friend here
[14:53:29] taavi: it is a weird use case, I was attempting a hack but it doesn't really work
[14:54:40] and I am not 100% sure if the private repo in deployment-prep works with local commits now
[14:54:46] Release-Engineering-Team, Scap, Patch-For-Review: scap deploy-promote needs a timeout when waiting for CI - https://phabricator.wikimedia.org/T361585#9680619 (CodeReviewBot) jnuche merged https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/264 deploy-promote: add timeout to loop waiting...
[14:58:59] GitLab (CI & Job Runners), collaboration-services, Patch-For-Review: Create a special-purpose Trusted Runner with Dockerfile frontend - https://phabricator.wikimedia.org/T357612#9680650 (dancy) >>! In T357612#9679092, @Jelto wrote: > The Trusted Dockerfile Runner `gitlab-runner2004` is available now....
[14:59:00] Release-Engineering-Team, Scap, Patch-For-Review: scap deploy-promote needs a timeout when waiting for CI - https://phabricator.wikimedia.org/T361585#9680644 (jnuche) Open→Resolved a:jnuche Fix will go out with the next scap release
[15:02:53] hashar (when you get back): Regarding https://integration.wikimedia.org/ci/job/operations-mw-config-tox-docker/6803/console , what uid is running the job?
[15:05:25] Beta-Cluster-Infrastructure, Release-Engineering-Team, Infrastructure-Foundations, observability, and 2 others: beta-scap-sync-world fails: logstash_checker.py: KeyError: 'aggregations' - https://phabricator.wikimedia.org/T360595#9680674 (elukey) Update: after a lot of configs, we are able now to...
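T361585 above concerns a loop that waits for CI without a timeout. The following Python sketch shows the general pattern of bounding such a polling loop with a deadline. It is not the code from the scap merge request: `wait_until`, its parameters, and the stand-in `ci_has_voted` check are all hypothetical names chosen for illustration.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float, interval: float = 1.0) -> bool:
    """Poll check() until it returns True or `timeout` seconds have elapsed.

    Returns True if the condition was met, False if the deadline passed first.
    time.monotonic() is used so wall-clock adjustments cannot stretch the wait.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(interval, remaining))


# Usage with a stand-in condition that succeeds on the third poll.
polls = {"count": 0}


def ci_has_voted() -> bool:  # hypothetical placeholder for a real CI status query
    polls["count"] += 1
    return polls["count"] >= 3


if wait_until(ci_has_voted, timeout=5.0, interval=0.01):
    print("condition met after", polls["count"], "polls")
else:
    print("gave up waiting")
```

Returning a boolean instead of raising leaves it to the caller to decide whether a timeout should abort the run or only produce a warning.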
[15:05:34] yeah I believe that the values in the private repo are not picked up
[15:21:00] GitLab (CI & Job Runners), collaboration-services, Patch-For-Review: Create a special-purpose Trusted Runner with Dockerfile frontend - https://phabricator.wikimedia.org/T357612#9680745 (bd808) >>! In T357612#9679125, @Jelto wrote: >>>! In T357612#9676305, @bd808 wrote: >> Just a note that #toolforge...
[15:24:57] GitLab (CI & Job Runners), collaboration-services, Patch-For-Review: Create a special-purpose Trusted Runner with Dockerfile frontend - https://phabricator.wikimedia.org/T357612#9680824 (taavi) For that use case converting the repository to something like `docker-pkg` would be a better idea I think....
[15:28:10] Release-Engineering-Team, Recommendation-API, affects-translatewiki.net, Language-Team (Language-2024-April-June), and 2 others: Automatic merging of localization updates broken for recommendation-api - https://phabricator.wikimedia.org/T348655#9680841 (Pginer-WMF)
[15:28:31] kafka logging seems to be working now :)
[15:37:56] Release-Engineering-Team, RESTBase: RESTBase scap deployment failed - https://phabricator.wikimedia.org/T361608 (Jgiannelos) NEW
[15:37:57] Release-Engineering-Team, RESTBase: RESTBase scap deployment failed - https://phabricator.wikimedia.org/T361608#9680932 (Jgiannelos)
[15:39:25] GitLab (CI & Job Runners), collaboration-services, Patch-For-Review: Create a special-purpose Trusted Runner with Dockerfile frontend - https://phabricator.wikimedia.org/T357612#9680945 (CodeReviewBot) jelto opened https://gitlab.wikimedia.org/repos/releng/buildkit/-/merge_requests/58 fix syntax in...
[15:45:08] Beta-Cluster-Infrastructure, Release-Engineering-Team, Infrastructure-Foundations, observability, and 2 others: beta-scap-sync-world fails: logstash_checker.py: KeyError: 'aggregations' - https://phabricator.wikimedia.org/T360595#9681012 (elukey) Ok so the issue was that the `profile::pki::client...
[15:51:24] Phabricator, collaboration-services, wikimedia.biterg.io: Closed tickets in Bugzilla migrated without a closing date - https://phabricator.wikimedia.org/T107254#9681047 (Dzahn)
[15:51:29] Continuous-Integration-Infrastructure, Jenkins, castor: castor-save-workspace-cache aborted during postbuild - https://phabricator.wikimedia.org/T352319#9681046 (ArthurTaylor) Saw this again today for [[https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php74-noselenium-docker/166245/conso...
[15:52:07] GitLab (CI & Job Runners), collaboration-services, Patch-For-Review: Create a special-purpose Trusted Runner with Dockerfile frontend - https://phabricator.wikimedia.org/T357612#9681067 (CodeReviewBot) jelto merged https://gitlab.wikimedia.org/repos/releng/buildkit/-/merge_requests/58 fix syntax in...
[15:58:48] Beta-Cluster-Infrastructure, Release-Engineering-Team, Infrastructure-Foundations, observability, and 2 others: beta-scap-sync-world fails: logstash_checker.py: KeyError: 'aggregations' - https://phabricator.wikimedia.org/T360595#9681147 (thcipriani) >>! In T360595#9681012, @elukey wrote: > `depl...
[16:04:55] GitLab (CI & Job Runners), collaboration-services, Patch-For-Review: Create a special-purpose Trusted Runner with Dockerfile frontend - https://phabricator.wikimedia.org/T357612#9681188 (CodeReviewBot) jelto opened https://gitlab.wikimedia.org/repos/releng/buildkit/-/merge_requests/59 fix syntax in...
[16:06:12] GitLab (CI & Job Runners), collaboration-services, Patch-For-Review: Create a special-purpose Trusted Runner with Dockerfile frontend - https://phabricator.wikimedia.org/T357612#9681207 (CodeReviewBot) jelto merged https://gitlab.wikimedia.org/repos/releng/buildkit/-/merge_requests/59 fix syntax in...
[16:08:32] Release-Engineering-Team (Priority Backlog 📥), Patch-For-Review, Release, Train Deployments: 1.42.0-wmf.25 deployment blockers - https://phabricator.wikimedia.org/T360157#9681234 (TheresNoTime)
[16:11:32] Reedy / hashar: https://integration.wikimedia.org/ci/job/mwext-phpunit-coverage-docker-publish/98323/console looks like the coverage isn’t working, though the error isn’t obvious to me tbh
[16:12:11] I’m guessing the nonzero exit is from `test -f /workspace/cover/index.html` (i.e. file missing), and that could be due to “Incorrect filter configuration, code coverage will not be processed”?
[16:12:16] I might look a bit more tomorrow
[16:27:12] Beta-Cluster-Infrastructure, MediaWiki-Platform-Team, MW-1.42-notes (1.42.0-wmf.25; 2024-04-02), Patch-For-Review: Cannot create a new wiki on beta cluster - https://phabricator.wikimedia.org/T358236#9681283 (pmiazga) I managed to run this script locally in docker env. But it gets pass that line,...
[16:28:24] Beta-Cluster-Infrastructure, Release-Engineering-Team, Infrastructure-Foundations, observability, and 2 others: beta-scap-sync-world fails: logstash_checker.py: KeyError: 'aggregations' - https://phabricator.wikimedia.org/T360595#9681300 (elukey) >>! In T360595#9681147, @thcipriani wrote: >> @thc...
[16:32:02] GitLab, Diffusion-Repository-Administrators, Projects-Cleanup, Wikimedia Design Style Guide, and 2 others: Archive Design Style Guide code bases / project / docs - https://phabricator.wikimedia.org/T360362#9681303 (Volker_E) Thanks @DDeSouza for this list! Note that #vuetest is decommissioned, se...
[16:35:40] Continuous-Integration-Infrastructure: Running `cypress` in Wikimedia CI requires unusual env variables - https://phabricator.wikimedia.org/T361624 (matmarex) NEW
[17:05:47] Beta-Cluster-Infrastructure, Release-Engineering-Team, Infrastructure-Foundations, observability, and 2 others: beta-scap-sync-world fails: logstash_checker.py: KeyError: 'aggregations' - https://phabricator.wikimedia.org/T360595#9681515 (colewhite) >>! In T360595#9681012, @elukey wrote: > `deplo...
[20:58:36] Continuous-Integration-Config, MediaWiki-extensions-CentralAuth, MediaWiki-Installer, MediaWiki-Platform-Team, ci-test-error: Admin account created by the installer isn't made global by CentralAuth - https://phabricator.wikimedia.org/T358985#9682187 (matmarex)
[22:28:16] Beta-Cluster-Infrastructure, Cloud-VPS (Debian Buster Deprecation): Launching new bullseye deployment-prep instances fails, no sudo access - https://phabricator.wikimedia.org/T361536#9682456 (thcipriani)
[23:50:02] GitLab (Account Approval), Release-Engineering-Team: Requesting GitLab account activation for [YOUR DEVELOPER ACCOUNT USERNAME HERE] - https://phabricator.wikimedia.org/T361658 (Berete5212) NEW
[23:51:04] GitLab (Account Approval), Release-Engineering-Team: Requesting GitLab account activation for Baba Berete - https://phabricator.wikimedia.org/T361658#9682573 (Berete5212)