[00:07:20] (update) bd808: SpiderPig: auto select first backport search match [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/731 (https://phabricator.wikimedia.org/T392508)
[00:11:34] Continuous-Integration-Infrastructure, Patch-For-Review: CI is overwhelmed and lots of jobs are failing randomly (2025-04-29) - https://phabricator.wikimedia.org/T392963#10778692 (thcipriani) >>! In T392963#10778630, @Daimona wrote: >>>! In T392963#10778428, @Daimona wrote: >> -- How to get a list of...
[00:34:26] (update) bd808: SpiderPig: auto select first backport search match [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/731 (https://phabricator.wikimedia.org/T392508)
[00:38:23] (approved) bd808: log.py: @version should be "1" [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/779 (owner: dancy)
[01:55:24] Continuous-Integration-Infrastructure, collaboration-services, Patch-For-Review: CI is overwhelmed and lots of jobs are failing randomly (2025-04-29) - https://phabricator.wikimedia.org/T392963#10778743 (Dzahn)
[02:00:03] maintenance-disconnect-full-disks build 697403 integration-agent-docker-1062 (/: 26%, /srv: 95%, /var/lib/docker: 47%): OFFLINE due to disk space
[02:05:03] maintenance-disconnect-full-disks build 697404 integration-agent-docker-1062 (/: 26%, /srv: 71%, /var/lib/docker: 46%): RECOVERY disk space OK
[03:15:03] maintenance-disconnect-full-disks build 697418 integration-agent-docker-1062 (/: 26%, /srv: 95%, /var/lib/docker: 47%): OFFLINE due to disk space
[03:25:03] maintenance-disconnect-full-disks build 697420 integration-agent-docker-1062 (/: 26%, /srv: 90%, /var/lib/docker: 47%): RECOVERY disk space OK
[04:18:44] Gerrit, LPL Essential (LPL Essential 2025 Apr-Jun: CX), Unplanned-Sprint-Work: Delete two branches from mediawiki/extensions/Translate Gerrit extension - https://phabricator.wikimedia.org/T392787#10778829 (abi_) I have tried, and lack the necessary permissions {F59570437 size=full} Attempt from CLI...
[06:28:24] Yippee, build fixed!
[06:28:24] Project mediawiki-core-doxygen build #10048: FIXED in 10 min: https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen/10048/
[06:30:59] !log integration: cleared /srv/jenkins/workspace on integration-agent-docker-1062
[06:31:00] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[06:43:28] (CR) Jakob: "Amaziiing, thanks! Where did it fail with `localhost` before? Looks like PS14 also passed here after making sure that the env var was set " [integration/quibble] - https://gerrit.wikimedia.org/r/1137857 (https://phabricator.wikimedia.org/T386691) (owner: Jakob)
[06:47:24] Phabricator (Upstream), Upstream: Modified files not counted in total when attaching files - https://phabricator.wikimedia.org/T380361#10779005 (Aklapper)
[06:47:25] Phabricator, Release-Engineering-Team (Priority Backlog 📥): Update to Phorge upstream 2025.xx release - https://phabricator.wikimedia.org/T386558#10779006 (Aklapper)
[07:25:17] Continuous-Integration-Infrastructure, collaboration-services, Patch-For-Review: CI is overwhelmed and lots of jobs are failing randomly (2025-04-29) - https://phabricator.wikimedia.org/T392963#10779109 (hashar) >>! In T392963#10778630, @Daimona wrote: > > Based on what we know now: the shell loo...
[07:33:52] jakob_WMDE: Guten Tag. Do you have anything to add on https://gerrit.wikimedia.org/r/c/integration/quibble/+/1137857 ?
[07:34:13] if it is fine, I will amend it to add tests for `strtobool` and +2it
[07:34:14] :)
[07:34:27] (CR) Hashar: "I have looked up for a past issue in the git history of `integration/config` and `integration/quibble` but could not find it back. Or may" [integration/quibble] - https://gerrit.wikimedia.org/r/1137857 (https://phabricator.wikimedia.org/T386691) (owner: Jakob)
[07:34:33] ah and I forgot to post my remark :)
[07:34:46] 8
[07:36:07] hashar: bonjour! I'm happy with it as is :)
[07:36:33] ok ):
[07:36:36] ok :)
[07:36:41] WRONG SMILEY
[07:36:45] :b
[07:36:51] I should learn about emojis
[07:36:52] :D
[07:36:59] 😜
[07:42:52] and if I read upstream Python code
[07:43:00] strtobool() does not even return a boolean
[07:43:02] but 0 or 1
[07:43:03] :)
[07:43:22] :O
[07:43:53] should be called strto0or1() then
[07:44:15] or have it cast to a bool
[07:44:33] when I retire, maybe I will contribute back to Python (or whatever)
[07:44:34] :b
[07:45:55] fixing strtobool sounds like a good first retirement project
[07:48:42] hashar: we ultimately want either Wikibase's apitests job to run with OpenSearch, or possibly a separate job that only runs search-related api tests to avoid breaking anything unrelated that doesn't expect it to be enabled. what do you think would be a good next step?
[07:48:56] running wikibase apitests locally with the new image is probably a good start
[07:51:36] (PS16) Hashar: Add support for OpenSearch [integration/quibble] - https://gerrit.wikimedia.org/r/1137857 (https://phabricator.wikimedia.org/T386691) (owner: Jakob)
[07:51:52] jakob_WMDE: I have added strtobool and added a bunch of text to the commit message. If both looks good to you I will +2
[07:52:04] and then cut a new Quibble release
[07:52:15] or maybe hold
[07:52:27] we can have a dedicated job in Jenkins for sure
[07:53:00] then it depends on how your tests are written. Is that PHPUnit ? apitesting stack? Cypress?
[07:53:15] apitesting stack, yup
[07:53:57] how are the apitesting tests being run currently? Is that part of the general jobs?
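Hashar's remark between 07:42 and 07:43 matches the old `distutils.util.strtobool`, which returns the integers 1 and 0 rather than `True`/`False` (and `distutils` itself was removed in Python 3.12). Below is a minimal bool-returning sketch that accepts the same spellings as the distutils version; it is an illustration, not the helper that landed in Quibble. Reading `QUIBBLE_OPENSEARCH` (the variable named later in this conversation) and treating an unset variable as false are assumptions.

```python
import os


def strtobool(value: str) -> bool:
    """Convert a truthy/falsy string to a real bool.

    Accepts the spellings of the old distutils.util.strtobool, but returns
    True/False instead of 1/0. Raises ValueError for anything else.
    """
    normalized = value.strip().lower()
    if normalized in ("y", "yes", "t", "true", "on", "1"):
        return True
    if normalized in ("n", "no", "f", "false", "off", "0"):
        return False
    raise ValueError(f"invalid truth value {value!r}")


def opensearch_enabled(environ=os.environ) -> bool:
    # Assumption: an unset QUIBBLE_OPENSEARCH means "disabled".
    return strtobool(environ.get("QUIBBLE_OPENSEARCH", "false"))


if __name__ == "__main__":
    print(strtobool("Yes"), strtobool("0"))
    print(opensearch_enabled({"QUIBBLE_OPENSEARCH": "true"}))
```

Returning a real `bool` means `strtobool(x) is True` behaves as readers expect, which an `int` result does not.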
[07:54:19] (I should know really, but the stack is so wide this days that it is hard to remember about everything :b)
[07:54:59] they are running as regular apitests, but aren't actually using CirrusSearch/OpenSearch, but some sql fallback search implementation at the moment
[07:55:29] the Quibble jobs skip api testing apparently
[07:57:38] it also wouldn't tell us anything at the moment. if I remember correctly, we currently actively disable CirrusSearch-based search in CI for wikibase
[07:58:41] the repos you are interested in are WikibaseCirrusSearch and WikibaseLexemeCirrusSearch?
[07:59:02] so we can add the apitest jobs to each of them (bby adding in zuul/layout.yaml the template `extension-apitests`)
[07:59:36] then have CI inject QUIBBLE_OPENSEARCH=true to the *apitests* jobs when it is one of those two projects
[07:59:51] no, Wikibase.git itself
[07:59:55] AH
[07:59:55] :)
[08:00:32] which already runs the apitests jobs
[08:00:55] yup!
[08:01:07] so yeah we can do something such as: if ZUUL_PROJECT==Wikibase and 'apitests' in job.name: env+=QUIBBLE_OPENSEARCH=true
[08:01:20] but we first need a new Quibble release
[08:01:23] rebuild the images
[08:01:32] switch all the Jenkins jobs to those new images
[08:01:36] and pray :b
[08:01:41] or prey
[08:01:42] whatever
[08:02:06] I am running the MediaWiki train then I got a meeting. I will do the Quibble release later today
[08:02:19] haha, good plan!
[08:03:25] great, thanks! I'll try to run the api-testing job locally against a wikibase change to see if anything falls over if OpenSearch is enabled
[08:04:41] jakob_WMDE: can you check the commit message at https://gerrit.wikimedia.org/r/1137857 ?
[08:04:49] that is your Quibble change :)
[08:04:58] yes!
[08:05:00] * jakob_WMDE looks
[08:06:44] (CR) Jakob: [C:+1] "The changes in PS15 and PS16 LGTM, thanks!" [integration/quibble] - https://gerrit.wikimedia.org/r/1137857 (https://phabricator.wikimedia.org/T386691) (owner: Jakob)
[08:07:07] (CR) Hashar: [C:+2] "Awesome!!!" [integration/quibble] - https://gerrit.wikimedia.org/r/1137857 (https://phabricator.wikimedia.org/T386691) (owner: Jakob)
[08:07:11] Los geht's [that is powered by AI German for "let's roll"]
[08:07:38] hehe, sounds about right!
[08:07:47] (CR) Hashar: [C:+2] "Done" [integration/quibble] - https://gerrit.wikimedia.org/r/1137857 (https://phabricator.wikimedia.org/T386691) (owner: Jakob)
[08:08:04] I don't think the code is that complicated
[08:08:21] but the number of layers involved makes the whole model rather complicated to keep in a single brain :(
[08:08:32] or in other terms: there are lot of moving parts
[08:09:02] (and yesterday I was reviewing while multitasking, which was not a great way to do it and explains a bit of the back and forth I had)
[08:09:04] anyway
[08:26:13] hashar: hehe, working with the code itself was a bit of a challenge for me, but the bit of back and forth during the review was totally fine :)
[08:26:38] Release-Engineering-Team (Priority Backlog 📥), Essential-Work, Patch-For-Review, Release, Train Deployments: 1.44.0-wmf.27 deployment blockers - https://phabricator.wikimedia.org/T386222#10779304 (hashar)
[08:26:44] (Merged) jenkins-bot: Add support for OpenSearch [integration/quibble] - https://gerrit.wikimedia.org/r/1137857 (https://phabricator.wikimedia.org/T386691) (owner: Jakob)
[08:33:20] jakob_WMDE: you did great code wise :]
[08:34:30] thanks :D
[09:45:49] GitLab (Infrastructure), Ceph, collaboration-services, Data-Persistence-Backup, and 2 others: Migrate gitlab storage to apus (also: backups from S3?) - https://phabricator.wikimedia.org/T378922#10779631 (Jelto) I switched the replica to use the read-only credentials but unfortunately I get a `Acc...
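Hashar's pseudocode at 08:01:07 can be sketched as a Zuul parameter function. The `(item, job, params)` signature, the `ZUUL_PROJECT` entry in `params`, and the Gerrit project name `mediawiki/extensions/Wikibase` are assumptions modelled on Zuul v2-style parameter functions, not code taken from integration/config; the job name used below is hypothetical. As noted in the log, setting the variable only has an effect once a Quibble release that understands it has been built into the images the jobs run.

```python
from types import SimpleNamespace

# Assumed Gerrit project name for "Wikibase.git"; extend as needed.
OPENSEARCH_APITEST_PROJECTS = {"mediawiki/extensions/Wikibase"}


def set_parameters(item, job, params):
    """Zuul v2-style parameter function (signature assumed).

    Adds QUIBBLE_OPENSEARCH=true to *apitests* jobs of the listed projects
    and leaves every other job untouched. `item` is unused here.
    """
    if (params.get("ZUUL_PROJECT") in OPENSEARCH_APITEST_PROJECTS
            and "apitests" in job.name):
        params["QUIBBLE_OPENSEARCH"] = "true"


if __name__ == "__main__":
    # Hypothetical job name, used only to exercise the function.
    params = {"ZUUL_PROJECT": "mediawiki/extensions/Wikibase"}
    set_parameters(None, SimpleNamespace(name="quibble-apitests-php81"), params)
    print(params)
```

Matching on both project and job name keeps the OpenSearch-backed setup away from unrelated jobs, which was Jakob's concern at 07:48:42.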
[10:26:45] Phabricator: Custom task form for #MW-Interfaces-Team - https://phabricator.wikimedia.org/T392598#10779733 (Aklapper) Hi, custom fields are available for //all and any// tasks in our Phabricator instance. Are further engineering teams also interested in this proposal? The problem doesn't sound uncommon so I...
[10:36:41] GitLab (Infrastructure), Ceph, collaboration-services, Data-Persistence-Backup, and 2 others: Migrate gitlab storage to apus (also: backups from S3?) - https://phabricator.wikimedia.org/T378922#10779759 (MatthewVernon) I think I found the relevant request - was this about 08:33 UTC today (and the...
[11:58:07] hashar: I found that the wikibase CI config needed some tweaking in order for the CirrusSearch maintenance scripts to run, but even with that fixed some things don't seem quite right when I ran the api-testing job with the new image and my quibble changes locally. I'm hopeful that it only needs some more config adjustments, though.
[11:58:35] I gotta run away for today and am off on friday, so I'll have to save myself that work for monday :D
[12:03:36] Continuous-Integration-Infrastructure, collaboration-services, Patch-For-Review: CI is overwhelmed and lots of jobs are failing randomly (2025-04-29) - https://phabricator.wikimedia.org/T392963#10779987 (Daimona) >>! In T392963#10779109, @hashar wrote: >>>! In T392963#10778692, @thcipriani wrote:...
[12:07:26] (PS1) Jelto: helm-linter: Remove duplicate update-alternatives for helm3 [integration/config] - https://gerrit.wikimedia.org/r/1140168 (https://phabricator.wikimedia.org/T387548)
[12:11:05] Continuous-Integration-Infrastructure, Testing Support, ci-test-error (WMF-deployed Build Failure), MW-1.44-notes (1.44.0-wmf.23; 2025-04-01), Patch-For-Review: Selenium timeouts can cause the job to remain stuck until the build times out - https://phabricator.wikimedia.org/T389536#10780005 (D...
[12:22:06] Hey folks, I got another selenium job that is currently stuck: https://integration.wikimedia.org/ci/job/wmf-quibble-selenium-php81/11852/console
[12:22:18] Anyone around who would like to take a look at what's happening inside?
[12:22:57] Like what processes are running, and whether we see the same chrome crash as https://phabricator.wikimedia.org/T389536#10675707
[12:28:15] https://grafana.wmcloud.org/goto/eQp3mVbNk?orgId=1 is basically flat so wth is it doing
[13:00:16] Continuous-Integration-Infrastructure, Testing Support, ci-test-error (WMF-deployed Build Failure), MW-1.44-notes (1.44.0-wmf.23; 2025-04-01), Patch-For-Review: Selenium timeouts can cause the job to remain stuck until the build times out - https://phabricator.wikimedia.org/T389536#10780154 (D...
[13:02:06] Ooooooooh nice, I re-submitted both patches, and thanks to the success cache it only needs to run the single job that failed.
[13:12:15] (open) jnuche: Release 4.158.0 [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/781
[13:14:23] (merge) jnuche: Release 4.158.0 [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/781
[13:27:34] Daimona: yup the success cache is a great thing :]
[13:27:37] Release-Engineering-Team (Yak Shaving 🐃🪒), Scap (SpiderPig 🕸️): Add browser notification when deployment is awaiting user interaction - https://phabricator.wikimedia.org/T392487#10780297 (jnuche) Open→Resolved In prod now, enjoy!
[13:27:53] about the selenium test being stuck is that the same issue that was happening some weeks ago?
[13:28:13] I can't spend time debugging it today, I have too many things to do unfortunately :/
[13:28:42] It is indeed amazing! I don't get to see it in action very often, but when it does, it's a relief. Great job y'all!
[13:29:24] Re selenium: yes, same thing, it keeps happening a few times per day. Sometimes, like the example above, in gate-and-submit, which ends up disrupting my work (and surely not only mine?)
[13:30:08] Also don't worry, I'm sure it'll come back again. I'm also not sure how to surface these, because we don't have reliable repro steps, so if we want to gather useful data, it needs to be done from builds that are currently stuck.
[13:30:48] But that means someone needs to notice that the build is stuck, and find someone who can check the agent, all before the build reaches the 1h timeout. As I pointed out in the task, the process isn't scalable.
[13:33:03] yeah definitely not
[13:33:07] but maybe it can be automatized
[13:51:11] GitLab (Infrastructure), Ceph, collaboration-services, Data-Persistence-Backup, and 2 others: Migrate gitlab storage to apus (also: backups from S3?) - https://phabricator.wikimedia.org/T378922#10780399 (Jelto) Thank you @MatthewVernon for digging into the logs. It was a bit tricky for me to find...
[13:51:21] Project-Admins: Create project tag for calendar-Wikivoyage extension - https://phabricator.wikimedia.org/T393011 (Pppery) NEW
[13:51:24] Project-Admins: Create project tag for calendar-Wikivoyage extension - https://phabricator.wikimedia.org/T393011#10780410 (Pppery)
[13:52:20] (PS1) Hashar: release: Quibble 1.14.0 [integration/quibble] - https://gerrit.wikimedia.org/r/1140182 (https://phabricator.wikimedia.org/T378797)
[13:53:10] (CR) Hashar: [C:+2] release: Quibble 1.14.0 [integration/quibble] - https://gerrit.wikimedia.org/r/1140182 (https://phabricator.wikimedia.org/T378797) (owner: Hashar)
[13:54:05] (PS1) Hashar: release: Start 1.14.1 cycle [integration/quibble] - https://gerrit.wikimedia.org/r/1140183
[13:55:55] Actually, that's interesting. Can we print the process list before forcefully terminating the build?
[13:56:17] maybe using a Jenkins plugin, I don't know
[13:56:30] And add the chrome error log to the build artifacts
[13:56:46] Project-Admins: Create project tag for calendar-Wikivoyage extension - https://phabricator.wikimedia.org/T393011#10780449 (A_smart_kitten) Anecdotally, I feel like I've previously seen #shoutwiki_calendar used for tasks relating to this extension. Whether or not it //should// be used like that is a different...
[13:56:56] I also think there is a way to take a heapdump of nodejs if it is started with the right option, sending a specific signal would dump it
[13:57:00] which would have the stacktrace
[13:57:29] That'd help, yes
[13:57:44] Chrome I don't know
[13:57:52] I think it is started via Chromedriver
[13:57:55] Project-Admins: Create project tag for calendar-Wikivoyage extension - https://phabricator.wikimedia.org/T393011#10780453 (Pppery) It shouldn't. That's an unrelated codebase. T212165#10771377 was what made me finally file this, although it's been bothering me for a while.
[13:58:04] which may or may not obey CHROMIUM_FLAGS
[13:58:31] so maybe it is possible to set a flag that would have the chrome logs to be written under LOG_DIR or MW_LOG_DIR or simply the hardcoded /workspace/log
[13:59:12] an alternative is to take some daemon system that watch nodejs process that have been running for too long and take a core dump of them (if that is at all possible?)
[13:59:20] Or copy it over there?
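The 13:59:12 idea of a daemon that watches for Node.js processes running too long could start from something like the following minimal sketch. It assumes a Linux agent with procps `ps`, whose `etimes` column is the elapsed time in seconds; the 30-minute default is an arbitrary example, and what to do with a matching PID (core dump, diagnostic report) is left out.

```python
import subprocess


def long_running(ps_output: str, command: str, max_seconds: int) -> list[int]:
    """Return PIDs of `command` processes running longer than `max_seconds`.

    `ps_output` is the text printed by `ps -eo pid,etimes,comm`: a header
    row, then "PID ELAPSED_SECONDS COMMAND" for each process.
    """
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue
        pid, elapsed, comm = fields
        if comm.strip() == command and int(elapsed) > max_seconds:
            pids.append(int(pid))
    return pids


def scan(command: str = "node", max_seconds: int = 30 * 60) -> list[int]:
    """Run `ps` on this (Linux, procps) host and filter its output."""
    result = subprocess.run(
        ["ps", "-eo", "pid,etimes,comm"],
        capture_output=True, text=True, check=True,
    )
    return long_running(result.stdout, command, max_seconds)
```

Keeping the parsing separate from the `ps` call makes the selection logic testable without a live agent. One candidate action for the returned PIDs: Node.js can write a diagnostic report, which includes the JavaScript stack, when it receives a signal, provided it was started with `--report-on-signal`. Whether that works inside the CI containers is untested here.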
[13:59:55] (CR) Kamila Součková: [C:+1] helm-linter: Remove duplicate update-alternatives for helm3 [integration/config] - https://gerrit.wikimedia.org/r/1140168 (https://phabricator.wikimedia.org/T387548) (owner: Jelto)
[14:00:01] the core would be generted by the kernel which will write it somewhere on the host , probably under /var/tmp or similar
[14:00:02] IIRC
[14:00:45] maybe SIGTRAP would work
[14:00:54] it dumps a Core rather than terminating
[14:01:31] IIRC that is nodejs entering an infinite loop
[14:01:45] well
[14:01:50] some ajvascript code entering an infinite loop
[14:02:05] which would reflect with a nodejs process using 100% of a CPU
[14:03:25] also the https://plugins.jenkins.io/build-timeout/ is currently configured with a hard limit such as 30/40 or 60 minutes or whatever value
[14:04:04] it has an other strategy which is to abort if the console has not received any output for X minutes
[14:04:06] iirc
[14:04:17] which would fit what is happening with the selenium jobs
[14:12:58] We don't seem to have an infinite loop. There was no resource usage on agent 1040 while the job was stuck
[14:13:36] (PS1) Pppery: Add Kxeo to CI allowlist [integration/config] - https://gerrit.wikimedia.org/r/1140187
[14:13:57] (PS2) Pppery: Zuul: Add Kxeo to CI allowlist [integration/config] - https://gerrit.wikimedia.org/r/1140187
[14:14:15] There's also T390125
[14:14:15] T390125: Find and document how to debug a NodeJS process on CI/Docker - https://phabricator.wikimedia.org/T390125
[14:21:04] (Merged) jenkins-bot: release: Quibble 1.14.0 [integration/quibble] - https://gerrit.wikimedia.org/r/1140182 (https://phabricator.wikimedia.org/T378797) (owner: Hashar)
[14:32:20] Project-Admins, ShoutWiki: Create project tag for calendar-Wikivoyage extension - https://phabricator.wikimedia.org/T393011#10780632 (A_smart_kitten) Judging by , it looks like - prior to being renamed to `#ShoutWiki Calendar` in December 2016...
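The build-timeout strategy described at 14:04:04 (abort when the console has received no output for a while) can be reproduced locally with a small wrapper. This is a sketch of the idea, not the Jenkins plugin's code. It runs a command, echoes its output, and kills it after `idle_seconds` of silence. Before the kill it calls an optional `on_stall` hook, so something like the process listing asked for at 13:55:55 can be captured. The function name and hook are hypothetical.

```python
import subprocess
import sys
import threading
import time


def run_with_idle_timeout(cmd, idle_seconds, on_stall=None):
    """Run `cmd`, killing it once it has printed nothing for `idle_seconds`.

    Returns (exit_code, stalled). If the command stalls, `on_stall(pid)` is
    called before the kill, e.g. to save `ps` output for later debugging.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    last_output = time.monotonic()

    def pump():
        # Echo the child's output and record when the last line arrived.
        nonlocal last_output
        for line in proc.stdout:
            last_output = time.monotonic()
            sys.stdout.write(line)

    reader = threading.Thread(target=pump, daemon=True)
    reader.start()

    stalled = False
    while proc.poll() is None:
        if time.monotonic() - last_output > idle_seconds:
            stalled = True
            if on_stall is not None:
                on_stall(proc.pid)
            proc.kill()
            break
        time.sleep(0.1)

    exit_code = proc.wait()
    reader.join(timeout=1)
    return exit_code, stalled
```

One caveat: programs that block-buffer their output when writing to a pipe can look idle while they are still working, so the idle threshold should be generous compared with the normal gap between log lines.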
[14:38:09] (update) dancy: log.py: @version should be "1" [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/779
[14:40:14] (merge) dancy: log.py: @version should be "1" [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/779
[14:40:15] Release-Engineering-Team (Yak Shaving 🐃🪒), Scap (SpiderPig 🕸️): Add browser notification when deployment is awaiting user interaction - https://phabricator.wikimedia.org/T392487#10780667 (dancy) @jnuche I ran `sudo systemctl restart spiderpig-apiserver` just now to make this live.
[14:44:06] Release-Engineering-Team (Yak Shaving 🐃🪒), Scap (SpiderPig 🕸️): Add browser notification when deployment is awaiting user interaction - https://phabricator.wikimedia.org/T392487#10780693 (jnuche) > @jnuche I ran sudo systemctl restart spiderpig-apiserver just now to make this live. Thanks for doing...
[14:58:56] (CR) Hashar: [C:+2] release: Start 1.14.1 cycle [integration/quibble] - https://gerrit.wikimedia.org/r/1140183 (owner: Hashar)
[15:01:03] !log Tagged Quibble 1.14.0 @ 6d7c736d12daa7ea23b261ede02093f8fe7a83ae # T378797 T384927 T386691
[15:01:08] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[15:01:08] T378797: [SPIKE] Use PHPUnit test results cache timing data to distribute tests in parallel runs - https://phabricator.wikimedia.org/T378797
[15:01:09] T384927: Download combined phpunit.results.cache timing data and use it to create balanced split_groups - https://phabricator.wikimedia.org/T384927
[15:01:09] T386691: How to e2e/integration test simple Item search - https://phabricator.wikimedia.org/T386691
[15:01:40] ah
[15:01:45] ./utils/update-quibble
[15:02:54] (PS1) Hashar: dockerfiles: update Quibble to 1.14.0 [integration/config] - https://gerrit.wikimedia.org/r/1140203
[15:03:03] https://phabricator.wikimedia.org/F59587530
[15:03:04] :)
[15:06:12] (CR) Hashar: [C:+2] dockerfiles: update Quibble to 1.14.0 [integration/config] - https://gerrit.wikimedia.org/r/1140203 (owner: Hashar)
[15:07:27] (Merged) jenkins-bot: dockerfiles: update Quibble to 1.14.0 [integration/config] - https://gerrit.wikimedia.org/r/1140203 (owner: Hashar)
[15:18:32] GitLab (Infrastructure), Ceph, collaboration-services, Data-Persistence-Backup, and 2 others: Migrate gitlab storage to apus (also: backups from S3?) - https://phabricator.wikimedia.org/T378922#10780806 (MatthewVernon) Two thoughts - first, sorry, I was rebooting all the things today because of T...
[15:19:54] (Merged) jenkins-bot: release: Start 1.14.1 cycle [integration/quibble] - https://gerrit.wikimedia.org/r/1140183 (owner: Hashar)
[15:21:16] Project mediawiki-core-doxygen build #10058: FAILURE in 3 min 13 sec: https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen/10058/
[15:23:02] Project beta-code-update-eqiad build #545903: FAILURE in 1.4 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/545903/
[15:26:54] (update) dancy: spiderpig-otp: Don't offer a code that will expire too soon [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/782 (https://phabricator.wikimedia.org/T392815)
[15:26:55] (open) dancy: spiderpig-otp: Don't offer a code that will expire too soon [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/782 (https://phabricator.wikimedia.org/T392815)
[15:29:34] (update) dancy: spiderpig-otp: Don't offer a code that will expire too soon [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/782 (https://phabricator.wikimedia.org/T392815)
[15:33:02] Project beta-code-update-eqiad build #545904: STILL FAILING in 1.4 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/545904/
[15:45:04] Yippee, build fixed!
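The scap change "log.py: @version should be "1"" (approved at 00:38:23, merged at 14:40:14) concerns the Logstash JSON event format, where `@version` is the string `"1"` next to `@timestamp`. Below is a minimal sketch of a standard-library `logging.Formatter` that emits such events. It is not scap's `log.py`, and the fields other than `@version` and `@timestamp` are illustrative.

```python
import json
import logging
from datetime import datetime, timezone


class LogstashJSONFormatter(logging.Formatter):
    """Format log records as Logstash-style JSON events (illustrative)."""

    def format(self, record: logging.LogRecord) -> str:
        event = {
            # The event schema version is the string "1", not the integer 1.
            "@version": "1",
            "@timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "message": record.getMessage(),
            "level": record.levelname,
            "logger": record.name,
        }
        return json.dumps(event)


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(LogstashJSONFormatter())
    log = logging.getLogger("example")
    log.addHandler(handler)
    log.warning("sync finished on %d hosts", 3)
```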
[15:45:05] Project beta-code-update-eqiad build #545905: FIXED in 2 min 4 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/545905/
[15:50:41] !log Updating docker-pkg files on contint primary for https://gerrit.wikimedia.org/r/1140203
[15:50:42] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[15:52:41] (approved) dancy: SpiderPig: auto select first backport search match [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/731 (https://phabricator.wikimedia.org/T392508) (owner: bd808)
[16:04:45] Release-Engineering-Team (Yak Shaving 🐃🪒), Scap (SpiderPig 🕸️): Dismiss interaction notification if someone responded to the interaction - https://phabricator.wikimedia.org/T393026 (dancy) NEW
[16:05:13] Release-Engineering-Team (Yak Shaving 🐃🪒), Scap (SpiderPig 🕸️): Dismiss interaction notification if someone responded to the interaction - https://phabricator.wikimedia.org/T393026#10781025 (dancy)
[16:14:57] the Quibble 1.14.0 images have been built
[16:16:26] (PS1) Hashar: jjb: switch jobs to Quibble 1.14.0 [integration/config] - https://gerrit.wikimedia.org/r/1140215 (https://phabricator.wikimedia.org/T378797)
[16:22:09] (CR) Hashar: "I am not deploying it this week since:" [integration/config] - https://gerrit.wikimedia.org/r/1140215 (https://phabricator.wikimedia.org/T378797) (owner: Hashar)
[16:30:21] (CR) Hashar: [C:+2] Zuul: Add Kxeo to CI allowlist [integration/config] - https://gerrit.wikimedia.org/r/1140187 (owner: Pppery)
[16:31:43] (Merged) jenkins-bot: Zuul: Add Kxeo to CI allowlist [integration/config] - https://gerrit.wikimedia.org/r/1140187 (owner: Pppery)
[16:32:58] (open) jnuche: spiderpig: dismiss notifications when user selects interaction choice [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/783 (https://phabricator.wikimedia.org/T393026)
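T378797 and T384927 (titles quoted in the 15:01:08 bot lines) are about using PHPUnit timing data to build balanced split groups. A common heuristic for this is greedy longest-processing-time-first (LPT): sort tests by duration, then always add the next test to the group with the smallest running total. The sketch below assumes the durations have already been extracted into a dict of seconds. It does not parse `phpunit.results.cache` and is not Quibble's implementation.

```python
import heapq


def balanced_groups(durations: dict[str, float], n_groups: int) -> list[list[str]]:
    """Split tests into `n_groups` with similar total runtime (greedy LPT).

    `durations` maps a test name to its runtime in seconds.
    """
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    groups: list[list[str]] = [[] for _ in range(n_groups)]
    # Min-heap of (total_seconds, group_index): the lightest group is on top.
    heap = [(0.0, i) for i in range(n_groups)]
    # Longest tests first; ties sorted by name so the result is deterministic.
    for name, seconds in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        total, idx = heapq.heappop(heap)
        groups[idx].append(name)
        heapq.heappush(heap, (total + seconds, idx))
    return groups
```

For example, durations of 8, 7, 6, 5 and 4 seconds split into two groups give totals of 17 and 13 seconds. A perfect split would be 15 and 15. LPT is only a heuristic, but it is guaranteed to keep the longest group within a factor of 4/3 of the optimum.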
[16:33:08] Yippee, build fixed!
[16:33:09] Project mediawiki-core-doxygen build #10059: FIXED in 15 min: https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen/10059/
[16:33:11] Release-Engineering-Team (Yak Shaving 🐃🪒), Scap (SpiderPig 🕸️): SpiderPig should support train deployments - https://phabricator.wikimedia.org/T392610#10781219 (thcipriani) a:dduvall
[16:34:23] (update) jnuche: spiderpig: dismiss notifications when user selects interaction choice [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/783 (https://phabricator.wikimedia.org/T393026)
[16:35:23] (update) jnuche: spiderpig: dismiss notifications when user selects interaction choice [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/783 (https://phabricator.wikimedia.org/T393026)
[16:36:36] Scap (SpiderPig 🕸️): Integrate mwscript-k8s with SpiderPig - https://phabricator.wikimedia.org/T392275#10781234 (thcipriani)
[16:45:24] (approved) dancy: spiderpig: dismiss notifications when user selects interaction choice [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/783 (https://phabricator.wikimedia.org/T393026) (owner: jnuche)
[16:46:20] (merge) jnuche: spiderpig: dismiss notifications when user selects interaction choice [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/783 (https://phabricator.wikimedia.org/T393026)
[16:49:27] Release-Engineering-Team (Priority Backlog 📥), collaboration-services: Upgrade phab (phorge) hosts to bookworm - https://phabricator.wikimedia.org/T372619#10781316 (Aklapper) I feel that once T386558 has been resolved this very task could be looked into. I have not ran into remaining PHP 8.x issues local...
[17:02:40] (PS1) Sbisson: WikimediaMessages: add cldr as phan dependency [integration/config] - https://gerrit.wikimedia.org/r/1140208 (https://phabricator.wikimedia.org/T391230)
[17:07:46] Release-Engineering-Team (Priority Backlog 📥), collaboration-services: Upgrade phab (phorge) hosts to bookworm - https://phabricator.wikimedia.org/T372619#10781371 (Dzahn) There is T377889 (phab1005) which is separate new hardware. We can use it to test that without having to touch existing servers.
[17:08:24] Release-Engineering-Team (Priority Backlog 📥), collaboration-services: Upgrade phab (phorge) hosts to bookworm - https://phabricator.wikimedia.org/T372619#10781373 (Dzahn)
[17:12:53] Gerrit, Release-Engineering-Team, collaboration-services: Investigate out of date refs following gerrit switchover - https://phabricator.wikimedia.org/T393034 (thcipriani) NEW
[17:12:57] Gerrit, Release-Engineering-Team, collaboration-services: Investigate out of date refs following gerrit switchover - https://phabricator.wikimedia.org/T393034#10781429 (thcipriani) p:Triage→Unbreak!
[17:13:01] Project beta-code-update-eqiad build #545914: FAILURE in 1.2 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/545914/
[17:13:26] Gerrit, Release-Engineering-Team, collaboration-services: Investigate out of date refs following gerrit switchover - https://phabricator.wikimedia.org/T393034#10781432 (hashar)
[17:13:27] Release-Engineering-Team (Priority Backlog 📥), Essential-Work, Patch-For-Review, Release, Train Deployments: 1.44.0-wmf.27 deployment blockers - https://phabricator.wikimedia.org/T386222#10781431 (hashar)
[17:18:03] Project mediawiki-core-doxygen build #10060: FAILURE in 0.61 sec: https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen/10060/
[17:23:01] Project beta-code-update-eqiad build #545915: STILL FAILING in 1.2 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/545915/
[17:33:02] Project beta-code-update-eqiad build #545916: STILL FAILING in 1.3 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/545916/
[17:43:02] Project beta-code-update-eqiad build #545917: STILL FAILING in 1.3 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/545917/
[17:53:02] Project beta-code-update-eqiad build #545918: STILL FAILING in 1.3 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/545918/
[18:03:02] Project beta-code-update-eqiad build #545919: STILL FAILING in 1.3 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/545919/
[18:13:01] Project beta-code-update-eqiad build #545920: STILL FAILING in 1.2 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/545920/
[18:18:03] Project mediawiki-core-doxygen build #10061: STILL FAILING in 0.63 sec: https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen/10061/
[18:23:01] Project beta-code-update-eqiad build #545921: STILL FAILING in 1.2 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/545921/
[18:33:01] Project beta-code-update-eqiad build #545922: STILL FAILING in 1.2 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/545922/
[18:36:59] Gerrit, Release-Engineering-Team, collaboration-services: Investigate out of date refs following gerrit switchover - https://phabricator.wikimedia.org/T393034#10781753 (LSobanski) The current hypothesis is that during the DNS change both hosts were considered to be primary and unexpected replication...
[18:43:02] Project beta-code-update-eqiad build #545923: STILL FAILING in 1.2 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/545923/
[18:53:04] Project beta-code-update-eqiad build #545924: STILL FAILING in 4.2 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/545924/
[18:54:19] !log Disabled https://integration.wikimedia.org/ci/job/beta-code-update-eqiad while Gerrit is down.
[18:54:20] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[18:57:14] Gerrit, Release-Engineering-Team, collaboration-services: Investigate out of date refs following gerrit switchover - https://phabricator.wikimedia.org/T393034#10781805 (ssingh) >>! In T393034#10781753, @LSobanski wrote: > The current hypothesis is that during the DNS change both hosts were considered...
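T393034 is about out-of-date refs after the Gerrit switchover, and the 18:36:59 hypothesis is that both hosts acted as primary. One way to list the affected refs is to compare `git ls-remote` output taken from each host. The sketch below only diffs two such outputs, which have one `<object id><TAB><ref name>` pair per line. It leaves out how the outputs are fetched and which hostnames are used, and it does not describe the investigation that was actually carried out.

```python
def parse_ls_remote(text: str) -> dict[str, str]:
    """Map ref name -> object id from `git ls-remote` output."""
    refs = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        object_id, ref = line.split("\t", 1)
        refs[ref] = object_id
    return refs


def diff_refs(host_a: str, host_b: str) -> dict[str, tuple]:
    """Return {ref: (id_on_a, id_on_b)} for every ref the hosts disagree on.

    A ref missing on one side is reported as None for that side.
    """
    a, b = parse_ls_remote(host_a), parse_ls_remote(host_b)
    return {
        ref: (a.get(ref), b.get(ref))
        for ref in sorted(a.keys() | b.keys())
        if a.get(ref) != b.get(ref)
    }
```

To use it, the output of `git ls-remote <url>` for the same repository would be saved from each host and passed in as the two strings. An empty result means the two copies agree on every ref.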
[19:16:58] PROBLEM - gerrit process on gerrit2002 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/lib/jvm/java-17-openjdk-amd64/bin/java .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site https://wikitech.wikimedia.org/wiki/Gerrit
[19:18:03] Project mediawiki-core-doxygen build #10062: STILL FAILING in 0.35 sec: https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen/10062/
[19:18:31] FIRING: [4x] ProbeDown: Service gerrit2002:29418 has failed probes (tcp_gerrit_ssh_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[19:18:39] Release-Engineering-Team, collaboration-services: ProbeDown - https://phabricator.wikimedia.org/T393050 (phaultfinder) NEW
[19:22:09] Release-Engineering-Team, collaboration-services: ProbeDown - https://phabricator.wikimedia.org/T393050#10781848 (ABran-WMF)
[19:23:24] Release-Engineering-Team, collaboration-services: ProbeDown - https://phabricator.wikimedia.org/T393050#10781854 (ABran-WMF) Open→Resolved p:Triage→Medium a:ABran-WMF [[ https://docs.google.com/document/d/1kh6vYGLdGIEpN-EsUaXb6u82gNW5TvBkoI_yCPjB6_8/edit?tab=t.0 | Incident documen...
[19:43:00] Project-Admins, Security-Team, Vulnerability Management, SecTeam-Processed, Security: Add numerous Security Vuln-* Tags in Phabricator - https://phabricator.wikimedia.org/T387508#10781905 (Pppery) Anything left to do here?
[19:49:14] (open) dancy: spiderpig-otp: Add --qr flag to generate QR code [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/784
[19:49:17] (update) dancy: spiderpig-otp: Add --qr flag to generate QR code [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/784
[19:54:21] (update) dancy: spiderpig-otp: Add --qr flag to generate QR code [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/784
[19:56:49] (update) dancy: spiderpig-otp: Add --qr flag to generate QR code [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/784
[20:06:27] Gerrit, Release-Engineering-Team, collaboration-services, Wikimedia-Incident: Investigate out of date refs following gerrit switchover - https://phabricator.wikimedia.org/T393034#10781941 (Peachey88)
[20:18:03] Project mediawiki-core-doxygen build #10063: STILL FAILING in 0.79 sec: https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen/10063/
[20:29:09] Project-Admins, ShoutWiki: Create project tag for calendar-Wikivoyage extension - https://phabricator.wikimedia.org/T393011#10781983 (A_smart_kitten) Ah, I think I might have found the context behind the tag being renamed: T154242#2905231
[20:31:10] Gerrit, Release-Engineering-Team, collaboration-services, Wikimedia-Incident: Investigate out of date refs following gerrit switchover - https://phabricator.wikimedia.org/T393034#10781992 (LSobanski) Replication from gerrit2002 -> gerrit1003 has affected two repos outside of draft comments—SmashP...
[20:33:30] Project-Admins, ShoutWiki: Create project tag for calendar-Wikivoyage extension - https://phabricator.wikimedia.org/T393011#10781994 (Pppery) These two extensions seem to have been conflated from very far back. My guess is the current project labeled #shoutwiki_calendar was originally intended to be for...
[21:07:24] Gerrit, Release-Engineering-Team, collaboration-services, Wikimedia-Incident: Investigate out of date refs following gerrit switchover - https://phabricator.wikimedia.org/T393034#10782114 (LSobanski) Affected change in SmashPig was abandoned, this leaves operations/mediawiki-config as the remaini...
[21:10:38] Project-Admins, Security-Team, Vulnerability Management, SecTeam-Processed, Security: Add numerous Security Vuln-* Tags in Phabricator - https://phabricator.wikimedia.org/T387508#10782115 (sbassett) Open→Resolved p:Triage→Medium >>! In T387508#10781905, @Pppery wrote: > An...
[21:16:16] (PS1) Hashar: Review access change [mediawiki-config] (refs/meta/config) - https://gerrit.wikimedia.org/r/1140241
[21:16:54] (PS2) Hashar: Allow force push to reconstruct repo [mediawiki-config] (refs/meta/config) - https://gerrit.wikimedia.org/r/1140241 (https://phabricator.wikimedia.org/T393034)
[21:17:04] (CR) Hashar: [V:+2 C:+2] Allow force push to reconstruct repo [mediawiki-config] (refs/meta/config) - https://gerrit.wikimedia.org/r/1140241 (https://phabricator.wikimedia.org/T393034) (owner: Hashar)
[21:18:51] (PS1) Hashar: Revert "Allow force push to reconstruct repo" [mediawiki-config] (refs/meta/config) - https://gerrit.wikimedia.org/r/1140242 (https://phabricator.wikimedia.org/T393034)
[21:19:01] (CR) Hashar: [V:+2 C:+2] Revert "Allow force push to reconstruct repo" [mediawiki-config] (refs/meta/config) - https://gerrit.wikimedia.org/r/1140242 (https://phabricator.wikimedia.org/T393034) (owner: Hashar)
[21:20:55] Gerrit, Release-Engineering-Team, collaboration-services, Patch-For-Review, Wikimedia-Incident: Investigate out of date refs following gerrit switchover - https://phabricator.wikimedia.org/T393034#10782142 (Krinkle)
[21:29:54] Gerrit, Release-Engineering-Team, collaboration-services, Patch-For-Review, Wikimedia-Incident: Investigate out of date refs following gerrit switchover - https://phabricator.wikimedia.org/T393034#10782182 (hashar)
[21:32:08] Yippee, build fixed!
[21:32:08] Project mediawiki-core-doxygen build #10064: FIXED in 14 min: https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen/10064/
[21:46:24] I did 8:30 -> 23:30. That feels like good old days again :b
[21:48:59] RECOVERY - gerrit process on gerrit2002 is OK: PROCS OK: 1 process with regex args ^/usr/lib/jvm/java-17-openjdk-amd64/bin/java .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site https://wikitech.wikimedia.org/wiki/Gerrit
[21:53:31] RESOLVED: [2x] ProbeDown: Service gerrit2002:29418 has failed probes (tcp_gerrit_ssh_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#gerrit2002:29418 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[22:18:38] Gerrit, Release-Engineering-Team, collaboration-services, Wikimedia-Incident: Investigate out of date refs following gerrit switchover - https://phabricator.wikimedia.org/T393034#10782329 (Dzahn) I linked https://gerrit.wikimedia.org/r/c/operations/puppet/+/1140250 to the wrong old ticket. Should...
[23:30:36] Beta-Cluster-Infrastructure, observability, Patch-For-Review: Deployment-prep should host its own statsd/Prometheus server - https://phabricator.wikimedia.org/T241285#10782463 (Krinkle)
[23:46:59] !log Re-enabled https://integration.wikimedia.org/ci/view/Beta/job/beta-code-update-eqiad/
[23:47:00] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[23:55:16] Yippee, build fixed!
[23:55:16] Project beta-code-update-eqiad build #545925: FIXED in 2 min 16 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/545925/