[00:04:01] GitLab (Upstream pit of despair 🕳️), Release-Engineering-Team (Radar), Upstream: GitLab group permissions are not inherited by sub-groups for groups of users invited to the parent repo - https://phabricator.wikimedia.org/T300939#10655090 (bd808) >>! In T300939#10141337, @brennen wrote: > [[https://gi...
[00:04:36] GitLab (Upstream pit of despair 🕳️), Release-Engineering-Team (Radar), Upstream: GitLab group permissions are not inherited by sub-groups for groups of users invited to the parent repo - https://phabricator.wikimedia.org/T300939#10655101 (bd808) Stalled→Open
[07:40:11] Gerrit: Gerrit's syntax highlighting for PHP code breaks when encountering an apostrophe in a // comment in a function call - https://phabricator.wikimedia.org/T372404#10655929 (kostajh) Thanks for tracking this down. This would be nice to fix, as the status quo makes it much more difficult to efficiently re...
[08:10:03] (PS1) Gergő Tisza: Copy diffConfig 7.4 settings for 8.1 [integration/config] - https://gerrit.wikimedia.org/r/1129765
[08:16:04] Continuous-Integration-Config, ci-test-error: mediawiki/operations-config patches with non-empty diffConfig output failing - https://phabricator.wikimedia.org/T389460 (Tgr) NEW
[08:16:23] Continuous-Integration-Config, ci-test-error: mediawiki/operations-config patches with non-empty diffConfig output failing - https://phabricator.wikimedia.org/T389460#10656001 (Tgr) Probably caused by {c3e3056d6a4b80afc930a8c627e2e43393c9a079}?
[08:17:03] (PS2) Gergő Tisza: Copy diffConfig 7.4 settings for 8.1 [integration/config] - https://gerrit.wikimedia.org/r/1129765 (https://phabricator.wikimedia.org/T389460)
[08:18:21] (CR) CI reject: [V:-1] Copy diffConfig 7.4 settings for 8.1 [integration/config] - https://gerrit.wikimedia.org/r/1129765 (https://phabricator.wikimedia.org/T389460) (owner: Gergő Tisza)
[08:22:39] Continuous-Integration-Config, ci-test-error, Patch-For-Review: mediawiki/operations-config patches with non-empty diffConfig output failing - https://phabricator.wikimedia.org/T389460#10656022 (Tgr) p:Triage→Unbreak!
[08:23:26] o/ anyone around who can unbreak CI for mediawiki/operations-config?
[08:24:15] (PS3) Gergő Tisza: Copy diffConfig 7.4 settings for 8.1 [integration/config] - https://gerrit.wikimedia.org/r/1129765 (https://phabricator.wikimedia.org/T389460)
[08:26:21] (CR) Majavah: [C:+2] Copy diffConfig 7.4 settings for 8.1 [integration/config] - https://gerrit.wikimedia.org/r/1129765 (https://phabricator.wikimedia.org/T389460) (owner: Gergő Tisza)
[08:27:54] (Merged) jenkins-bot: Copy diffConfig 7.4 settings for 8.1 [integration/config] - https://gerrit.wikimedia.org/r/1129765 (https://phabricator.wikimedia.org/T389460) (owner: Gergő Tisza)
[08:27:56] thanks taavi!
[08:28:24] !log reloading zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1129765
[08:28:25] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[08:28:36] tgr_: done
[08:33:40] (PS1) Majavah: Archive operations/debs/bdsync [integration/config] - https://gerrit.wikimedia.org/r/1129768 (https://phabricator.wikimedia.org/T377882)
[08:47:52] Continuous-Integration-Config, ci-test-error: mediawiki/operations-config patches with non-empty diffConfig output failing - https://phabricator.wikimedia.org/T389460#10656082 (Tgr) Open→Resolved a:Tgr
[08:48:34] (CR) Gergő Tisza: "Caused T389460 (I think)." [integration/config] - https://gerrit.wikimedia.org/r/1129364 (owner: Reedy)
[08:50:11] Phabricator: rewrite "Feature request" maniphest form to follow the what/why pattern - https://phabricator.wikimedia.org/T387072#10656104 (Aklapper) Yeah, I'm open to this. Ideally I'd love the current situation/problem to be described (what _is_, and not what _is not_, but that seems hard to express in Engl...
[09:03:19] Gerrit-Privilege-Requests, MediaWiki-extensions-Translate: Request membership in extension-Translate group for jhsoby - https://phabricator.wikimedia.org/T385963#10656132 (taavi) Open→Resolved a:taavi Done.
[09:14:54] Phabricator: rewrite "Feature request" maniphest form to follow the what/why pattern - https://phabricator.wikimedia.org/T387072#10656178 (Novem_Linguae) I'm not sure if "reproduce" is the right word for a feature request. I'd also be in favor of less rather than more explanatory text. I think any switch fr...
[09:21:09] (CR) Arthur taylor: [C:+1] "oof - that's unfortunate. Seems like a safe change though, and the test looks good." [integration/quibble] - https://gerrit.wikimedia.org/r/1129251 (owner: Hashar)
[09:44:13] Phabricator: Evaluate adding "In progress" status to Phabricator. - https://phabricator.wikimedia.org/T288956#10656260 (Aklapper) Things often sound good in theory and do not work in practice. In order for this to work, folks would need to realize and reset the status when something is **not** being act...
[10:05:34] Beta-Cluster-Infrastructure, Release-Engineering-Team: PHP on Beta cluster fails due to mismatching PCRE - https://phabricator.wikimedia.org/T387276#10656288 (BTullis) Just as a note, I also had to upgrade these packages manually on the snapshot hosts, after applying this: [Upgrade snapshot hosts to...
[10:09:17] (CR) Kosta Harlan: Fix logging exception when using --resolve-requires (1 comment) [integration/quibble] - https://gerrit.wikimedia.org/r/1129251 (owner: Hashar)
[10:45:23] (open) btullis: Add curl to the mediawiki-debug image for use with dumps [repos/releng/release] - https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/154 (https://phabricator.wikimedia.org/T352650 https://phabricator.wikimedia.org/T381473)
[10:51:12] Continuous-Integration-Infrastructure, cloud-services-team, Cloud-VPS, Wikidata, and 2 others: Wikibase selenium tests timeout, seemingly due to "memory compaction" events on CI VMs - https://phabricator.wikimedia.org/T281122#10656478 (zeljkofilipin)
[11:24:27] Project-Admins: Create project tag for emi (etalemi-mingi) - https://phabricator.wikimedia.org/T383219#10656617 (BamLifa) Hi @Aklapper, I'm not sure what I have to do here Trusted-contributors to help me create the project workboard. Can you help or guide me how I can do?
[11:30:25] If anyone could find time to look at adding curl to mediawiki-debug, we'd be grateful. Thanks. https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/154
[11:53:18] Project-Admins: Create project tag for emi (etalemi-mingi) - https://phabricator.wikimedia.org/T383219#10656700 (Aklapper) Please bring up general questions on https://www.mediawiki.org/wiki/Talk:Phabricator/Help as pinging individuals does not scale. See the text on https://phabricator.wikimedia.org/pro...
[12:02:28] Gerrit-Privilege-Requests, MediaWiki-extensions-Translate: Request membership in extension-Translate group for jhsoby - https://phabricator.wikimedia.org/T385963#10656722 (jhsoby) Thanks!
[12:33:05] Release-Engineering-Team, Data-Engineering, Dumps-Generation, Experimentation Lab, and 3 others: Create a mediawiki-cli image - https://phabricator.wikimedia.org/T389484 (Clement_Goubert) NEW
[12:34:06] Release-Engineering-Team, Data-Engineering, Dumps-Generation, Experimentation Lab, and 3 others: Create a mediawiki-cli image - https://phabricator.wikimedia.org/T389484#10656829 (Clement_Goubert) p:Triage→High
[12:34:46] Release-Engineering-Team, Data-Engineering, Dumps-Generation, Experimentation Lab, and 3 others: Create a mediawiki-cli image - https://phabricator.wikimedia.org/T389484#10656831 (Clement_Goubert)
[12:39:54] (close) btullis: Add curl to the mediawiki-debug image for use with dumps [repos/releng/release] - https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/154 (https://phabricator.wikimedia.org/T352650 https://phabricator.wikimedia.org/T381473)
[12:43:33] Release-Engineering-Team, Data-Engineering, Dumps-Generation, Experimentation Lab, and 3 others: Create a mediawiki-cli image - https://phabricator.wikimedia.org/T389484#10656844 (Clement_Goubert)
[12:43:58] Release-Engineering-Team, Data-Engineering, Dumps-Generation, Experimentation Lab, and 3 others: Create a mediawiki-cli image - https://phabricator.wikimedia.org/T389484#10656845 (Clement_Goubert)
[12:48:24] Beta-Cluster-Infrastructure, Wikidata, Browser-Tests: Wikidata daily browser tests fails on Beta due to "Unable to store text to external storage" - https://phabricator.wikimedia.org/T242717#10656882 (zeljkofilipin)
[12:48:37] Release-Engineering-Team (Priority Backlog 📥), Patch-For-Review, Release, Train Deployments: 1.44.0-wmf.21 deployment blockers - https://phabricator.wikimedia.org/T386216#10656885 (jnuche)
[12:58:38] Continuous-Integration-Config, Machine-Learning-Team, MediaWiki-Core-Tests, ORES, and 3 others: Audit tests/selenium/LocalSettings.php file aiming at possibly deprecating the feature - https://phabricator.wikimedia.org/T199939#10656932 (zeljkofilipin)
[13:00:47] Release-Engineering-Team (Seen), Scap, Testing Support, Browser-Tests: Running smoke tests during deployment - https://phabricator.wikimedia.org/T187733#10656939 (zeljkofilipin)
[13:12:03] (open) cgoubert: mediawiki-cli: Create new image [repos/releng/release] - https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/155 (https://phabricator.wikimedia.org/T389484)
[13:57:04] Phabricator, Release-Engineering-Team (Doing 😎): Understand which Legalpad documents are still used - https://phabricator.wikimedia.org/T388962#10657164 (Aklapper) Open→Resolved a:Aklapper 80 signatures added within the last 3 months, part of them likely spam accounts. I'll post some resu...
[13:58:42] Phabricator, collaboration-services, Trust-and-Safety, Znuny: Sunset / Retire Legalpad Phabricator application? - https://phabricator.wikimedia.org/T363009#10657171 (Aklapper) * L2 got sunset per T374406 and T349595 * L3 is linked from https://wikitech.wikimedia.org/wiki/SRE/Production_access , h...
[14:00:33] Phabricator, Patch-For-Review, PM: Reset status of tickets that languish in "in progress" - https://phabricator.wikimedia.org/T380300#10657180 (Aklapper) (I remember that I was not convinced when the "In Progress" task status was created in T288956 and I am still not convinced.) I've reset about 50...
[14:05:28] What's going on with CI? https://integration.wikimedia.org/ci/job/wmf-quibble-selenium-php81/4439/consoleFull <-- seems like a test timed out 20 minutes ago and then nothing else happened
[14:05:48] And for context, this job is for the first patch in the gate-and-submit queue, so it's blocking the entire queue
[14:06:51] And the queue itself is quite long right now, so it would help if it didn't wait for 20 minutes while doing nothing
[14:17:44] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Release-Engineering-Team, ci-test-error (WMF-deployed Build Failure): CI jobs failing with various timeouts (March 2025) - https://phabricator.wikimedia.org/T388416#10657222 (Daimona) p:Triage→Unbreak! Okay, current sta...
[14:18:22] Alright, I UBN'ed T388416 because I've been seeing a bazillion of timeouts over the last couple weeks, and this really is just the straw that broke etc.
[14:18:22] T388416: CI jobs failing with various timeouts (March 2025) - https://phabricator.wikimedia.org/T388416
[14:37:20] Daimona: that happened yesterday as well – if you see it in time, just abort the job IMHO
[14:37:38] I wanted to give people a chance to SSH into the machine and see what was going on
[14:38:13] But that chance is gone now. Nonetheless, I guess it won't take long before it happens again.
[14:38:27] I also asked yesterday in here if it was worth decreasing the timeout from 60 minutes to e.g. 45 or 30
[14:38:38] doesn’t solve the timeouts but at least reduces the harm they do to the rest of the queue
[14:39:56] I would really just like to figure out what's going on. When I filed T380061 two weeks ago I thought it might have been an impression, but by now it's clear that it isn't. CI really is timing out a lot lately.
[14:39:57] T380061: Flaky selenium test "Failed to wait for mediawiki.base to be ready" (tests/selenium/specs/page.js) - https://phabricator.wikimedia.org/T380061
[14:40:05] The problem is, I'm not even sure where to start.
[14:41:44] I'm around now so I can ssh somewhere if needed.
[14:43:09] That build eventually timed out unfortunately :/ I don't know if there's a way to gather useful information now
[14:45:43] * dancy shakes a fist at selenium tests.
[14:45:45] Also taking a look at the load graphs https://grafana.wmcloud.org/d/0g9N-7pVz/cloud-vps-project-board?orgId=1&var-project=integration&var-instance=integration-agent-docker-1051&from=1742468344718&to=1742481844901
[14:50:15] The job got stuck at 13:40Z, there's an IO spike and a network spike for that exact time, but they're not the only ones even in a 4h window
[14:50:45] Phabricator, Patch-For-Review, PM: Reset status of tickets that languish in "in progress" - https://phabricator.wikimedia.org/T380300#10657300 (MBinder_WMF) This makes a lot of sense to me! I think it would also encourage breaking tasks into smaller pieces (for instance, in such cases where work take...
[14:52:05] (All of this while I was trying to debug a different mysterious selenium timeout. Selenium really trying hard to be as annoying as it can be.)
[14:53:31] And obviously, the one damned time you need it to fail, it will happily pass 100 times out of 100.
[14:53:43] (open) jnuche: jenkins-rel: update plugins to address vulnerabilities [repos/releng/jenkins-deploy] - https://gitlab.wikimedia.org/repos/releng/jenkins-deploy/-/merge_requests/99 (https://phabricator.wikimedia.org/T389362)
[14:56:56] (merge) jnuche: jenkins-rel: update plugins to address vulnerabilities [repos/releng/jenkins-deploy] - https://gitlab.wikimedia.org/repos/releng/jenkins-deploy/-/merge_requests/99 (https://phabricator.wikimedia.org/T389362)
[14:59:34] There is an interesting long queue of castor-save-workspace-cache jobs.
[14:59:57] About 29 queued up
[15:03:34] I noticed that those tend to queue up a lot when there's a lot of stuff in gate-and-submit. But it's generally not a problem under normal circumstances
[15:04:07] Hey :) would someone versed in scap have some time to help with https://phabricator.wikimedia.org/T389484 (once done with CI UBN obviously)
[15:04:40] claime: I'm reviewing that right now.
[15:04:45] dancy: <3
[15:05:10] I think there's some stuff to change in scap's code itself as well right
[15:05:15] yep
[15:05:15] but I got lost in it :D
[15:05:26] GitLab (Account Approval), Release-Engineering-Team: Requesting GitLab account activation for davidcoronel - https://phabricator.wikimedia.org/T389444#10657365 (DidiCoronel) Just requested Toolforge membership. Will wait to see how it goes. Thanks!
[15:08:27] (update) cgoubert: mediawiki-cli: Create new image [repos/releng/release] - https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/155 (https://phabricator.wikimedia.org/T389484)
[15:10:05] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Release-Engineering-Team, ci-test-error (WMF-deployed Build Failure): CI jobs failing with various timeouts (March 2025) - https://phabricator.wikimedia.org/T388416#10657386 (Jdforrester-WMF) > so these patches will be merged a...
[15:10:17] dancy: I would also need to add something to DeploymentsConfig and its handler to say that specific deployments should use that image on top of adding the flag?
[15:11:26] no train log triage today?
[15:11:27] claime: I'm not sure about that part of things. I think swfrench-wmf could give you a better answer about that part of things.
[15:11:44] ack
[15:13:05] * swfrench-wmf hides
[15:13:11] !log Rebooting integration-agent-docker-1046 (Seems to be be inaccessible since February)
[15:13:12] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[15:15:17] my vote would be that we deprecate the `debug` boolean for selecting debug image and replace it with an enum-ish (`mw_kind`?) that's one of 'production' (default) 'debug' or 'cli'
[15:15:31] (open) annet: releases: Bump Codex to 1.21.1 [repos/ci-tools/libup-config] - https://gitlab.wikimedia.org/repos/ci-tools/libup-config/-/merge_requests/65 (https://phabricator.wikimedia.org/T389094)
[15:16:23] that would then be used to select which image "kind" to use, where "flavour" remains a build-arg variant on top of that
[15:16:29] claime: thoughts? ^
[15:20:38] Continuous-Integration-Infrastructure: integration-agent-docker-1046 not accessible - https://phabricator.wikimedia.org/T389495 (dancy) NEW
[15:22:29] (update) cgoubert: mediawiki-cli: Create new image [repos/releng/release] - https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/155 (https://phabricator.wikimedia.org/T389484)
[15:22:38] (update) cgoubert: mediawiki-cli: Create new image [repos/releng/release] - https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/155 (https://phabricator.wikimedia.org/T389484)
[15:22:57] (update) dancy: mediawiki-cli: Create new image [repos/releng/release] - https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/155 (https://phabricator.wikimedia.org/T389484) (owner: cgoubert)
[15:22:58] swfrench-wmf: That sounds good yeah
[15:23:27] I'm gonna push the merge request that allows for building the image
[15:23:38] we can then see for the deployment side of thing
[15:23:40] s
[15:24:05] Cool. I can do the scap release right after that.
[15:25:46] (open) cgoubert: kubernetes.py: Build mediawiki-cli image [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/693 (https://phabricator.wikimedia.org/T389484)
[15:25:59] (train log triage sorted out)
[15:26:13] (thanks to jnuch e for the train log summary)
[15:26:51] (update) cgoubert: mediawiki-cli: Create new image [repos/releng/release] - https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/155 (https://phabricator.wikimedia.org/T389484)
[15:26:54] (update) cgoubert: kubernetes.py: Build mediawiki-cli image [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/693 (https://phabricator.wikimedia.org/T389484)
[15:29:46] claime: sounds good, I can give this some thought later today and put together a proof of concept
[15:29:52] swfrench-wmf: <3
[15:31:51] claime: Ready for https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/155 to be merged?
[15:33:07] yeah
[15:33:19] (approved) dancy: kubernetes.py: Build mediawiki-cli image [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/693 (https://phabricator.wikimedia.org/T389484) (owner: cgoubert)
[15:33:24] (approved) dancy: mediawiki-cli: Create new image [repos/releng/release] - https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/155 (https://phabricator.wikimedia.org/T389484) (owner: cgoubert)
[15:33:29] (merge) dancy: mediawiki-cli: Create new image [repos/releng/release] - https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/155 (https://phabricator.wikimedia.org/T389484) (owner: cgoubert)
[15:33:54] Merged. Ready for a `scap sync-world` test which I'll leave to you.
[15:34:20] ack
[15:35:21] (merge) dancy: kubernetes.py: Build mediawiki-cli image [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/693 (https://phabricator.wikimedia.org/T389484) (owner: cgoubert)
[15:36:47] (open) dancy: Release 4.142.0 [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/694
[15:38:07] 15:37:29 [tag-latest] Running sudo /usr/local/bin/docker-pusher -q docker-registry.discovery.wmnet/restricted/mediawiki-multiversion-cli:latest
[15:38:12] looks like that worked
[15:38:44] Excellent
[15:39:58] ty dancy <3
[15:43:06] Release-Engineering-Team, Data-Engineering, Dumps-Generation, Experimentation Lab, and 4 others: Create a mediawiki-cli image - https://phabricator.wikimedia.org/T389484#10657560 (Clement_Goubert) Build succeeded, the image is currently being tested by @brouberol If it's conclusive, we can resol...
[15:45:51] Continuous-Integration-Infrastructure: integration-agent-docker-1046 not accessible - https://phabricator.wikimedia.org/T389495#10657574 (dancy)
[15:48:20] Release-Engineering-Team, MW-on-K8s, serviceops: Refactor scap's kubernetes DeploymentsConfig - https://phabricator.wikimedia.org/T389499 (Clement_Goubert) NEW
[15:48:35] (merge) lwatson: releases: Bump Codex to 1.21.1 [repos/ci-tools/libup-config] - https://gitlab.wikimedia.org/repos/ci-tools/libup-config/-/merge_requests/65 (https://phabricator.wikimedia.org/T389094) (owner: annet)
[15:53:06] Release-Engineering-Team, Data-Engineering, Dumps-Generation, Experimentation Lab, and 4 others: Create a mediawiki-cli image - https://phabricator.wikimedia.org/T389484#10657651 (Clement_Goubert) Open→Resolved a:Clement_Goubert Image has what's needed for `dumps`, resolving.
[15:53:53] (merge) dancy: Release 4.142.0 [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/694
[16:00:50] Continuous-Integration-Infrastructure: integration-agent-docker-1046 not accessible - https://phabricator.wikimedia.org/T389495#10657684 (dancy) Open→Resolved a:dancy Resolved by @Andrew. Thank you! Notes from @Andrew in IRC: ` 8:58 AM I migrated it to a new hypervisor which c...
[16:21:26] We have another one: https://integration.wikimedia.org/ci/job/wmf-quibble-selenium-php81/4492/console
[16:22:00] dancy: you still around / able to SSH into that machine?
[16:23:08] I am.
[16:23:35] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Release-Engineering-Team, ci-test-error (WMF-deployed Build Failure), Patch-For-Review: CI jobs failing with various timeouts (March 2025) - https://phabricator.wikimedia.org/T388416#10657767 (Daimona) >>! In T388416#106573...
[16:24:03] Logging into `integration-agent-docker-1040.integration`
[16:25:03] maintenance-disconnect-full-disks build 685768 integration-agent-docker-1041 (/: 26%, /srv: 96%, /var/lib/docker: 35%): OFFLINE due to disk space
[16:25:03] Thank you! We have about 14 minutes before it gets killed.
[16:27:01] There are 4 jobs running on that host right now... ffmpeg processes taking most of the CPU time.
[16:27:31] Oooooooh that's interesting.
[16:27:45] e.g. `ffmpeg -f x11grab -video_size 1280x1024 -i :94 -loglevel error -y -pix_fmt yuv420p /workspace/log/Verify-checkuser-can-make-checks%3A-Should-be-able-to-run-'Get-actions'-check-2025-03-20T15-49-34-031Z.mp4`
[16:28:39] The system does not appear to be overloaded in any way.
[16:28:53] Yeah, so, it's still recording the screen for the selenium tests. Apparently we are not correctly terminating ffmpeg when selenium times out then?
[16:29:48] Hmm. unclear to me at this point. I'd have to find a way to associate each ffmpeg process with a specific job.
[16:30:03] maintenance-disconnect-full-disks build 685769 integration-agent-docker-1041 (/: 26%, /srv: 93%, /var/lib/docker: 32%): RECOVERY disk space OK
[16:31:02] The mp4 file name matches the last test run in https://integration.wikimedia.org/ci/job/wmf-quibble-selenium-php81/4492/consoleFull
[16:31:12] Can you try terminating ffmpeg manually and see if it unlocks the job?
[16:31:17] sure.
[16:31:48] ok. I killed the Get-Actions one.
[16:31:48] dancy: is there a way to build the new mediawiki-cli image (to include a recent patch in the dumps codebase) without triggering a full MW prod redeploy?
[16:32:04] or am I looking at `scap sync-world --k8s-only` ?
[16:33:03] brouberol: Try `scap build-images` (a new thing)
[16:33:26] oooh A New Thing :D
[16:33:40] I like new things
[16:33:41] thanks!
[16:34:43] Can you kill the other ones too? There should be a "get ips" and a "get users"
[16:35:02] maintenance-disconnect-full-disks build 685770 integration-agent-docker-1041 (/: 26%, /srv: 98%, /var/lib/docker: 32%): OFFLINE due to disk space
[16:35:26] Daimona: Done
[16:35:43] Thank you. The job is still stuck unfortunately.
[16:36:09] I need new ffmpeg processes being created and destroyed
[16:36:13] *I see new...
[16:36:24] How long do they stay alive?
[16:36:46] Around 10 seconds.
[16:36:49] Anything below say 15 seconds should be OK
[16:37:04] Those could be from other jobs I guess?
[16:37:37] Quite possible. Unclear on their origin.
[16:38:01] I'll see if I can grab the environment variables from one
[16:38:20] but they appear to have stopped for the time being
[16:40:03] Yeah that sounds normal. But even if the job is still stuck, I'm of the idea that we're not properly terminating ffmpeg
[16:40:03] maintenance-disconnect-full-disks build 685771 integration-agent-docker-1041 (/: 26%, /srv: 92%, /var/lib/docker: 32%): RECOVERY disk space OK
[16:40:11] Phabricator, CAS-SSO, wikitech.wikimedia.org, LDAP: Password reset not working for uid=maskaret,ou=people,dc=wikimedia,dc=org account - https://phabricator.wikimedia.org/T389496#10657815 (bd808) Ok there is funkiness but it may just be confusion about SUL account vs Developer account naming. @Gry...
[16:40:52] Phabricator, CAS-SSO, wikitech.wikimedia.org, LDAP: Password reset not working as expected for Gryllida's Developer account - https://phabricator.wikimedia.org/T389496#10657818 (bd808)
[16:41:22] Alright now the build timed out
[16:41:41] * dancy This is what was running inside of the container before the job finished: https://phabricator.wikimedia.org/P74283
[16:42:05] I made a patch that triggers a timeout on purpose: it sounds like it might be consistently reproducible, so let's see https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1129873/1/tests/selenium/specs/page.js
[16:42:57] Thank you. I'll take a look later today and try to collect my thoughts in a comment on the task.
[16:43:10] OK. Good luck!
[16:54:42] (open) jforrester: Stop branching Graph for production, no longer used [repos/releng/release] - https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/156 (https://phabricator.wikimedia.org/T362317)
[16:55:13] dancy: sorry to be a pain. I ran scap build-images from the deployment server but it seems that all layers were cached, and that the recent dumps changes were not pulled in the newly published image. Did I mess up something or is that expected?
[16:55:29] (update) jforrester: Draft: Stop branching Graph for production, no longer used [repos/releng/release] - https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/156 (https://phabricator.wikimedia.org/T362317)
[16:56:21] brouberol: Taking a look
[16:56:57] the related build logs are in my home on the deployment server,. but this is the snippet for the affected layer
[16:56:57] 16:53:06 [mediawiki-publish-81] Step 5/16 : RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y ca-certificates git && cd /usr/src && git clone https://gerrit.wikimedia.org/r/operations/dumps && cd dumps && rm -rf .git
[16:56:58] 16:53:06 [mediawiki-publish-81] ---> Using cache
[16:56:58] 16:53:06 [mediawiki-publish-81] ---> 984268ba3f1c
[16:57:56] brouberol: I just now deleted the related base image, so a re-run should pick up the changes to operations/dumps
[16:58:30] is that something I should do whenever I rebuild the -cli image?
[16:59:05] (we're currently in the middle of a very iterative process of "deploy / new bug / fix bug / deploy" so might have to rebuild this image a few times)
[16:59:16] Seems like we need a specific flag to `build-images` to do what you need so you don't have to deal w/ the details.
[17:00:14] But for the time being, yes.. you can use `docker image ls -f label=org.wikimedia.mediawiki-dumps` to locate the base image(s) and then use `docker image rm ` (e.g. `docker image rm 984268ba3f1c a19391731a08 2d46d1933d51`)
[17:00:49] thank you! And indeed, the build logs show apt install logs, so that worked
[17:00:56] m(_ _)m
[17:09:15] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Release-Engineering-Team, ci-test-error (WMF-deployed Build Failure), Patch-For-Review: CI jobs failing with various timeouts (March 2025) - https://phabricator.wikimedia.org/T388416#10658030 (Daimona) Preliminary findings...
[17:10:03] maintenance-disconnect-full-disks build 685777 integration-agent-docker-1052 (/: 26%, /srv: 98%, /var/lib/docker: 77%): OFFLINE due to disk space
[17:11:05] Daimona: Dealing w/ the leaked ffmpeg processes is still probably a worthwhile thing to fix, since they consume a CPU that could be used for processing other jobs on the same node.
[17:11:56] Yeah definitely! I'm still not sure if it's ffmpeg, but at least now I know where to look thanks to the data you collected. Once we figure that out we can think about the general slowness and other failures.
[17:12:27] Sounds good. I'm stepping out for a break now. Happy to help out more later.
[17:14:53] Yeah, me too. I have a hard limit on the time that I can spend yelling at selenium non-stop.
[17:15:02] maintenance-disconnect-full-disks build 685778 integration-agent-docker-1052 (/: 26%, /srv: 40%, /var/lib/docker: 76%): RECOVERY disk space OK
[17:33:02] Release-Engineering-Team (Priority Backlog 📥), Scap (SpiderPig 🕸️), collaboration-services: Scap SpiderPig: Routing for the web frontend - https://phabricator.wikimedia.org/T383946#10658168 (Dzahn) On a deployment server in cloud VPS: ` Error 500 on SERVER: Server Error: Evaluation Error: Error whi...
[17:33:04] Beta-Cluster-Infrastructure, Testing Support, Browser-Tests, OKR-Work: Make selenium users use botflags at beta-cluster by setting the bot flag (passing "bot=1" parameter) when saving edits - https://phabricator.wikimedia.org/T116027#10658169 (zeljkofilipin)
[17:34:35] Release-Engineering-Team, MW-on-K8s, serviceops: Refactor scap's kubernetes DeploymentsConfig - https://phabricator.wikimedia.org/T389499#10658190 (Scott_French) Taking a step back, there are a couple of ways we could go about this. IMO, the two most obvious are as follows: **One option** is what we...
[17:35:03] maintenance-disconnect-full-disks build 685782 integration-agent-docker-1048 (/: 26%, /srv: 96%, /var/lib/docker: 34%): OFFLINE due to disk space [17:40:03] maintenance-disconnect-full-disks build 685783 integration-agent-docker-1048 (/: 26%, /srv: 8%, /var/lib/docker: 33%): RECOVERY disk space OK [17:45:07] 10GitLab (Account Approval), 06Release-Engineering-Team: Requesting GitLab account activation for [YOUR DEVELOPER ACCOUNT USERNAME HERE] - https://phabricator.wikimedia.org/T389519 (10cwylo) 03NEW [17:56:42] 10Continuous-Integration-Config, 10MediaWiki-Core-Tests, 10Testing Support, 07Browser-Tests: Jenkins selenium job should fail when all tests are skipped - https://phabricator.wikimedia.org/T324480#10658450 (10zeljkofilipin) [17:59:24] 06Release-Engineering-Team, 10MW-on-K8s, 06serviceops: Refactor scap's kubernetes DeploymentsConfig to support selection of image kinds - https://phabricator.wikimedia.org/T389499#10658458 (10Scott_French) [18:02:08] 10Beta-Cluster-Infrastructure, 10MediaWiki-Core-Tests, 10Testing Support: Run Selenium tests targeting Beta cluster - https://phabricator.wikimedia.org/T373680#10658502 (10zeljkofilipin) [18:04:08] 10Continuous-Integration-Config, 10MediaWiki-Core-Tests, 10Testing Support: Jenkins selenium job should fail when all tests are skipped - https://phabricator.wikimedia.org/T324480#10658539 (10zeljkofilipin) [18:07:19] 10Continuous-Integration-Config, 10MediaWiki-Core-Tests, 10Testing Support, 13Patch-For-Review, 10Quality-and-Test-Engineering-Team (Test Infrastructure): Make MediaWiki Wdio tests less slow (Sept 2019) - https://phabricator.wikimedia.org/T234002#10658593 (10zeljkofilipin) [18:07:31] 10Beta-Cluster-Infrastructure, 10Testing Support, 07OKR-Work: Make selenium users use botflags at beta-cluster by setting the bot flag (passing "bot=1" parameter) when saving edits - https://phabricator.wikimedia.org/T116027#10658601 (10zeljkofilipin) [18:07:43] 10Release-Engineering-Team (Seen), 10Scap, 10Testing 
Support: Running smoke tests during deployment - https://phabricator.wikimedia.org/T187733#10658605 (zeljkofilipin) [18:08:00] Continuous-Integration-Config, Machine-Learning-Team, MediaWiki-Core-Tests, ORES, and 2 others: Audit tests/selenium/LocalSettings.php file aiming at possibly deprecating the feature - https://phabricator.wikimedia.org/T199939#10658603 (zeljkofilipin) [18:08:36] Release-Engineering-Team (Radar), Testing Support, Epic, Testing-Roadblocks: Create and run a suite of end-to-end tests for the Wikimedia environment - https://phabricator.wikimedia.org/T248683#10658624 (zeljkofilipin) [18:09:20] Release-Engineering-Team, MW-on-K8s, serviceops: Refactor scap's kubernetes DeploymentsConfig to support selection of image kinds - https://phabricator.wikimedia.org/T389499#10658653 (Scott_French) [18:10:21] Continuous-Integration-Config, MediaWiki-Core-Tests, Testing Support, Patch-For-Review: Make MediaWiki Wdio tests less slow (Sept 2019) - https://phabricator.wikimedia.org/T234002#10658677 (zeljkofilipin) [18:26:04] (update) dancy: Fix test-train now that TrainInfo() requires an io [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/679 (owner: hashar) [18:27:02] (approved) dancy: Fix test-train now that TrainInfo() requires an io [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/679 (owner: hashar) [18:28:17] (merge) dancy: Fix test-train now that TrainInfo() requires an io [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/679 (owner: hashar) [18:29:04] (update) dancy: Allow multiple kubernetes clusters to be used [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/681 (https://phabricator.wikimedia.org/T388761) (owner: oblivian) [18:29:55] (update) dancy: Allow multiple kubernetes clusters to be used [repos/releng/scap] -
https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/681 (https://phabricator.wikimedia.org/T388761) (owner: oblivian) [18:34:54] GitLab (Account Approval), Release-Engineering-Team: Requesting GitLab account activation for [YOUR DEVELOPER ACCOUNT USERNAME HERE] - https://phabricator.wikimedia.org/T389519#10658791 (Aklapper) @cwylo: Please link your developer/LDAP account https://ldap.toolforge.org/user/cwylo at https://phabricator... [18:38:53] (update) dancy: Allow multiple kubernetes clusters to be used [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/681 (https://phabricator.wikimedia.org/T388761) (owner: oblivian) [18:39:33] (merge) dancy: Allow multiple kubernetes clusters to be used [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/681 (https://phabricator.wikimedia.org/T388761) (owner: oblivian) [18:46:32] (open) dancy: Release 4.143.0 [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/695 [18:57:41] (merge) dancy: Release 4.143.0 [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/695 [19:08:55] Continuous-Integration-Config: Provide a way for "bot" patch providers (like LibIp or lsc) to be low-priority, and for zuul to try to merge them later after human-C+2'ed patches - https://phabricator.wikimedia.org/T389535 (Jdforrester-WMF) NEW [19:10:03] maintenance-disconnect-full-disks build 685801 integration-agent-docker-1055 (/: 26%, /srv: 99%, /var/lib/docker: 39%): OFFLINE due to disk space [19:11:58] Phabricator: Evaluate adding "In progress" status to Phabricator. - https://phabricator.wikimedia.org/T288956#10658947 (leila) I have a related question for you, @Aklapper, which is low priority : can the status be disabled on a board by board basis? (It has started to create some confusions on my teams'...
[19:13:06] Continuous-Integration-Infrastructure, MediaWiki-Core-Tests, ci-test-error (WMF-deployed Build Failure): Selenium timeouts can cause the job to remain stuck until the build times out - https://phabricator.wikimedia.org/T389536#10658953 (Daimona) [19:14:14] (PS1) Dduvall: Avoid success cache key data collisions using null separator [integration/quibble] - https://gerrit.wikimedia.org/r/1129916 [19:14:21] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Release-Engineering-Team, ci-test-error (WMF-deployed Build Failure), Patch-For-Review: CI jobs failing with various timeouts (March 2025) - https://phabricator.wikimedia.org/T388416#10658964 (Daimona) p:Unbreak!→Hig... [19:15:02] maintenance-disconnect-full-disks build 685802 integration-agent-docker-1055 (/: 26%, /srv: 12%, /var/lib/docker: 36%): RECOVERY disk space OK [19:15:48] (PS2) Dduvall: Avoid success cache key data collisions using null separator [integration/quibble] - https://gerrit.wikimedia.org/r/1129916 [19:16:05] Continuous-Integration-Infrastructure, MediaWiki-Core-Tests, ci-test-error (WMF-deployed Build Failure): Selenium timeouts can cause the job to remain stuck until the build times out - https://phabricator.wikimedia.org/T389536#10658978 (Umherirrender) Sometimes selenium jobs hanging without timeout m...
[20:47:04] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Release-Engineering-Team, ci-test-error (WMF-deployed Build Failure), Patch-For-Review: CI jobs failing with various timeouts (March 2025) - https://phabricator.wikimedia.org/T388416#10659281 (Daimona) [20:47:22] Continuous-Integration-Infrastructure, MediaWiki-Core-Tests, ci-test-error (WMF-deployed Build Failure): Selenium timeouts can cause the job to remain stuck until the build times out - https://phabricator.wikimedia.org/T389536#10659284 (Daimona) [20:47:27] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Release-Engineering-Team, ci-test-error (WMF-deployed Build Failure), Patch-For-Review: CI jobs failing with various timeouts (March 2025) - https://phabricator.wikimedia.org/T388416#10659283 (Daimona) [21:25:03] !log civicrm upgraded from 7b532ad7 to fba4c3d6 [21:25:04] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [21:25:08] opps [21:27:12] Continuous-Integration-Infrastructure, Release-Engineering-Team: Add some integration executors to spread the load - https://phabricator.wikimedia.org/T389554 (thcipriani) NEW [21:31:23] Continuous-Integration-Infrastructure, Release-Engineering-Team: Add some integration executors to spread the load - https://phabricator.wikimedia.org/T389554#10659517 (thcipriani) I'm making integration-agent-docker-(1060-1062) Brennen is making integration-agent-docker-(1063–1065) [21:41:39] !log integration: launched integration-agent-docker-106{3,4,5} (T389554) [21:41:41] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [21:41:41] T389554: Add some integration executors to spread the load - https://phabricator.wikimedia.org/T389554 [21:47:00] Continuous-Integration-Infrastructure, Release-Engineering-Team: Add some integration executors to spread the load - https://phabricator.wikimedia.org/T389554#10659605
(Reedy) How long is a long time? [21:49:41] GitLab (Account Approval), Release-Engineering-Team: Requesting GitLab account activation for cwylo - https://phabricator.wikimedia.org/T389519#10659653 (Reedy) [22:41:31] Project-Admins: Create project tag for MediaWiki-extensions-BoilerPlate - https://phabricator.wikimedia.org/T389568 (apaskulin) NEW [22:50:38] !log integration: added jenkins nodes for integration-agent-docker-106{3,4,5} with 3 executors per each (T389554) [22:50:39] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [22:50:40] T389554: Add some integration executors to spread the load - https://phabricator.wikimedia.org/T389554 [22:57:36] Continuous-Integration-Infrastructure, Release-Engineering-Team: Add some integration executors to spread the load - https://phabricator.wikimedia.org/T389554#10659940 (thcipriani) >>! In T389554#10659605, @Reedy wrote: > How long is a long time? Looking at the [[https://grafana.wmcloud.org/d/0g9N-7pVz/... [22:59:10] Continuous-Integration-Infrastructure, Release-Engineering-Team: Add some integration executors to spread the load - https://phabricator.wikimedia.org/T389554#10659942 (brennen) It looks like these are picking up jobs. Will have to monitor and make sure they don't blow up in some unusual way. [23:01:16] Continuous-Integration-Infrastructure, Release-Engineering-Team: Add some integration executors to spread the load - https://phabricator.wikimedia.org/T389554#10659946 (thcipriani) Open→Resolved a:thcipriani Whilst I failed to log mine (unlike @brennen :D), mine are also launched. I also...
[23:01:33] Continuous-Integration-Infrastructure, Release-Engineering-Team: Add some integration executors to spread the load - https://phabricator.wikimedia.org/T389554#10659950 (thcipriani) [23:12:48] Phabricator, CAS-SSO, wikitech.wikimedia.org, LDAP: Password reset not working as expected for Gryllida's Developer account - https://phabricator.wikimedia.org/T389496#10659986 (bd808) Open→Invalid `lang=irc [18:33] < gry> bd808: thanks, i was able to login as you told me the  w... [23:31:41] !log integration: thcipriani added integration-agent-docker-106{0,1,2} earlier today (T389554) [23:31:43] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [23:31:44] T389554: Add some integration executors to spread the load - https://phabricator.wikimedia.org/T389554