[00:36:29] https://www.theautomatedtester.co.uk/blog/2024/flakiness-isnt-from-your-test-framework/
[00:37:36] Talks about the hype and misinfo that brought Cypress into popularity.
[00:56:33] FIRING: Queue (Jenkins jobs + Zuul functions): - https://alerts.wikimedia.org/?q=alertname%3DQueue+%28Jenkins+jobs+%2B+Zuul+functions%29
[01:02:34] PROBLEM - Work requests waiting in Zuul Gearman server on contint1002 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [400.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/d/000000322/zuul-gearman?orgId=1&viewPanel=10
[01:09:34] RECOVERY - Work requests waiting in Zuul Gearman server on contint1002 is OK: OK: Less than 100.00% above the threshold [200.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/d/000000322/zuul-gearman?orgId=1&viewPanel=10
[01:11:33] RESOLVED: Queue (Jenkins jobs + Zuul functions): - https://alerts.wikimedia.org/?q=alertname%3DQueue+%28Jenkins+jobs+%2B+Zuul+functions%29
[01:20:34] PROBLEM - Work requests waiting in Zuul Gearman server on contint1002 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [400.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/d/000000322/zuul-gearman?orgId=1&viewPanel=10
[01:22:34] RECOVERY - Work requests waiting in Zuul Gearman server on contint1002 is OK: OK: Less than 100.00% above the threshold [200.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/d/000000322/zuul-gearman?orgId=1&viewPanel=10
[06:51:54] GitLab (Account Approval), Release-Engineering-Team: Requesting GitLab account activation for eulersidentity - https://phabricator.wikimedia.org/T384170 (Eulersidentity) NEW
[08:19:32] (CR) Hashar: [C:+2] jjb: Use Quibble 1.12.0 [integration/config] - https://gerrit.wikimedia.org/r/1112269 (owner: Dduvall)
[08:20:00] !log Updating all Jenkins jobs to update
Quibble to 1.12.0
[08:20:01] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[08:21:51] (Merged) jenkins-bot: jjb: Use Quibble 1.12.0 [integration/config] - https://gerrit.wikimedia.org/r/1112269 (owner: Dduvall)
[08:39:06] After waiting 30+ minutes for backport CI and jobs, I'm thinking it might be prudent to waste some cycles in order to speed things up for the operator.
[08:40:46] hashar: one wacky idea: the backports are stacked on top of one another as they appear on-wiki, and CI gate-and-submit jobs run automatically ahead of time, but without the submit step. Then in theory, we're left with V+2 and k8s images ready to be applied in the window with nearly zero wait time. The unhappy path, of course, is that a patch fails CI or manual debug server testing.
[08:57:05] Anyway, my backport job today was 29m of CI, 7m of k8s build, 4m sync to testservers, 2m sync to canaries, 7m sync to production. It would have been awesome if the first two phases could have been preloaded before my window.
[08:58:01] awight: I think the breakdown is something like:
[08:58:01] 1) the CI Selenium jobs taking 20-25 minutes
[08:58:01] 2) the first deploy on Monday builds an image from scratch because the base image is automatically rebuilt during the weekend.
That takes maybe 10 minutes, including several minutes "just" to upload the large image
[08:58:01] 3) the deployment to kubernetes is done in 3 stages, each of which takes several minutes (test-servers, canaries, prod), I think mostly due to the large image being synced to the workers + helm overhead
[08:59:31] we do have the timing collected in the logs (via https://logstash.wikimedia.org/app/dashboards#/view/f7e31de0-9f0d-11eb-863c-3588009e4dd9 then filter based on `event.duration:*`)
[09:00:39] and even if the image has been built, I think the way it is organized is that any tiny code change still invalidates several descendant container layers that need to be rebuilt
[09:00:47] so yeah slow :/
[09:02:46] re. speeding up Selenium tests, I've been following up on https://phabricator.wikimedia.org/T370033 with a couple of investigation tasks. If anyone else wants to chime in on https://phabricator.wikimedia.org/T374003 I think our next steps there are going to be collecting that feedback and turning it into some development tasks.
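The timings reported at 08:57:05 can be tallied as a quick sanity check. This is a minimal sketch, assuming only the first two phases (CI and the k8s build) could be preloaded ahead of the window, as that message suggests. The names PHASES, PRELOADABLE and summarize are illustrative and not part of any deployment tooling.

```python
# Phase durations in minutes, copied from the 08:57:05 message.
PHASES = [
    ("CI", 29),
    ("k8s build", 7),
    ("sync to testservers", 4),
    ("sync to canaries", 2),
    ("sync to production", 7),
]

# Assumption: only these phases could run before the backport window opens.
PRELOADABLE = {"CI", "k8s build"}


def summarize(phases, preloadable):
    """Return (total, preloadable, remaining in-window) minutes."""
    total = sum(minutes for _, minutes in phases)
    preloaded = sum(minutes for name, minutes in phases if name in preloadable)
    return total, preloaded, total - preloaded


if __name__ == "__main__":
    total, preloaded, remaining = summarize(PHASES, PRELOADABLE)
    print(f"total: {total}m, preloadable: {preloaded}m, in-window: {remaining}m")
```

Under that assumption, most of the wall-clock time falls in the two phases that could run ahead of the window, leaving only the three sync stages for the operator to wait on.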
[09:54:57] Continuous-Integration-Infrastructure, ci-test-error (WMF-deployed Build Failure): TAR_ENTRY_ERROR ENOSPC: no space left on device (January 2025) - https://phabricator.wikimedia.org/T384187 (Michael) NEW
[09:55:18] !log Updating Quibble jobs to enable success cache experiment - T383243
[09:55:20] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[09:55:20] T383243: Zuul/Jenkins: Investigate caching of build results for MediaWiki testsuite jobs - https://phabricator.wikimedia.org/T383243
[09:56:12] Continuous-Integration-Infrastructure, ci-test-error (WMF-deployed Build Failure): TAR_ENTRY_ERROR ENOSPC: no space left on device (January 2025) (integration-agent-docker-1045) - https://phabricator.wikimedia.org/T384187#10475496 (Michael)
[09:56:22] (CR) Hashar: [C:+2] jjb: Enable Quibble's success caching in MediaWiki jobs [integration/config] - https://gerrit.wikimedia.org/r/1112106 (https://phabricator.wikimedia.org/T383243) (owner: Dduvall)
[09:58:41] (Merged) jenkins-bot: jjb: Enable Quibble's success caching in MediaWiki jobs [integration/config] - https://gerrit.wikimedia.org/r/1112106 (https://phabricator.wikimedia.org/T383243) (owner: Dduvall)
[10:55:03] maintenance-disconnect-full-disks build 668715 integration-agent-docker-1043 (/: 27%, /srv: 96%, /var/lib/docker: 39%): OFFLINE due to disk space
[11:00:03] maintenance-disconnect-full-disks build 668716 integration-agent-docker-1043 (/: 27%, /srv: 30%, /var/lib/docker: 37%): RECOVERY disk space OK
[11:03:23] Scap, serviceops: Retire use of scap proxies - https://phabricator.wikimedia.org/T384196 (hnowlan) NEW
[11:56:32] Scap, serviceops, Patch-For-Review: Retire use of scap proxies - https://phabricator.wikimedia.org/T384196#10475851 (hnowlan)
[12:19:22] Continuous-Integration-Infrastructure, ci-test-error (WMF-deployed Build Failure): TAR_ENTRY_ERROR ENOSPC: no space left on device (January 2025)
(integration-agent-docker-1045) - https://phabricator.wikimedia.org/T384187#10475894 (Urbanecm_WMF) p:Triage→High
[12:27:27] Release-Engineering-Team, Data-Engineering, MediaWiki-extensions-EventLogging, Web-Team: Allow JavaScript errors to fail CI builds - https://phabricator.wikimedia.org/T318902#10475909 (kostajh) @Jdlrobson maybe #web-team is interested in this task? I also think it is interesting for #release-en...
[13:33:07] (update) aokoth: projects: add sre/miscweb/os-reports [repos/releng/gitlab-trusted-runner] - https://gitlab.wikimedia.org/repos/releng/gitlab-trusted-runner/-/merge_requests/103 (https://phabricator.wikimedia.org/T350794)
[13:34:53] (update) aokoth: projects: add sre/miscweb/os-reports [repos/releng/gitlab-trusted-runner] - https://gitlab.wikimedia.org/repos/releng/gitlab-trusted-runner/-/merge_requests/103 (https://phabricator.wikimedia.org/T350794)
[13:35:46] (update) aokoth: projects: add sre/miscweb/os-reports [repos/releng/gitlab-trusted-runner] - https://gitlab.wikimedia.org/repos/releng/gitlab-trusted-runner/-/merge_requests/103 (https://phabricator.wikimedia.org/T350794)
[13:40:47] (update) aokoth: projects: add sre/miscweb/os-reports [repos/releng/gitlab-trusted-runner] - https://gitlab.wikimedia.org/repos/releng/gitlab-trusted-runner/-/merge_requests/103 (https://phabricator.wikimedia.org/T350794)
[13:44:41] (merge) aokoth: projects: add sre/miscweb/os-reports [repos/releng/gitlab-trusted-runner] - https://gitlab.wikimedia.org/repos/releng/gitlab-trusted-runner/-/merge_requests/103 (https://phabricator.wikimedia.org/T350794)
[14:15:02] maintenance-disconnect-full-disks build 668755 integration-agent-docker-1048 (/: 27%, /srv: 97%, /var/lib/docker: 37%): OFFLINE due to disk space
[14:20:03] maintenance-disconnect-full-disks build 668756 integration-agent-docker-1048 (/: 27%, /srv: 18%, /var/lib/docker: 35%): RECOVERY disk space OK
[14:20:24] Release-Engineering-Team, docker-pkg,
serviceops: Attach opencontainers image metadata to docker images - https://phabricator.wikimedia.org/T345070#10476287 (elukey) @dduvall Hi! Thanks a lot for the long explanation, I am trying to get back to this task looking for anything actionable. IIUC Gitlab a...
[14:23:56] Continuous-Integration-Infrastructure, ci-test-error (WMF-deployed Build Failure): TAR_ENTRY_ERROR ENOSPC: no space left on device (January 2025) (integration-agent-docker-1048) - https://phabricator.wikimedia.org/T384209 (Urbanecm_WMF) NEW
[14:24:02] Continuous-Integration-Infrastructure, ci-test-error (WMF-deployed Build Failure): TAR_ENTRY_ERROR ENOSPC: no space left on device (January 2025) (integration-agent-docker-1048) - https://phabricator.wikimedia.org/T384209#10476324 (Urbanecm_WMF) p:Triage→High
[14:24:15] could someone take a look please? ^^
[14:35:58] temporarily disabled the termbox and data-bridge test suites
[14:36:08] which is from August 2023 :)
[14:39:54] ah T354841
[14:39:54] T354841: Decide on how to move forward with Wikidata Bridge browser testing - https://phabricator.wikimedia.org/T354841
[14:42:41] out of all the bash things, https://bash.toolforge.org/quip/AU7VTzhg6snAnmqnK_pc is probably what i link the most
[14:44:22] https://bash.toolforge.org/quip/AVCVyNIm1oXzWjit6EY3 is also very good
[14:58:38] Continuous-Integration-Config, Release-Engineering-Team (Priority Backlog 📥), [DEPRECATED] wdwb-tech, Wikibase (3rd party installations), and 2 others: Move some Wikibase selenium tests to a standalone job - https://phabricator.wikimedia.org/T287582#10476603 (hashar) The data bridge have been dis...
[15:03:14] I wish we could run `npm install` to only install a subset of packages instead of all of them
[16:00:32] (PS1) Hashar: jjb: ensure npm script exists in wikibase-selenium job [integration/config] - https://gerrit.wikimedia.org/r/1112784 (https://phabricator.wikimedia.org/T287582)
[16:02:22] (CR) CI reject: [V:-1] jjb: ensure npm script exists in wikibase-selenium job [integration/config] - https://gerrit.wikimedia.org/r/1112784 (https://phabricator.wikimedia.org/T287582) (owner: Hashar)
[16:03:30] Continuous-Integration-Config, Release-Engineering-Team (Priority Backlog 📥), [DEPRECATED] wdwb-tech, Wikibase (3rd party installations), and 2 others: Move some Wikibase selenium tests to a standalone job - https://phabricator.wikimedia.org/T287582#10476986 (hashar) a:hashar
[16:18:39] (PS18) Hashar: Use standalone jobs for Wikibase Selenium tests [integration/config] - https://gerrit.wikimedia.org/r/676107 (https://phabricator.wikimedia.org/T287582)
[16:18:45] (PS2) Hashar: jjb: ensure npm script exists in wikibase-selenium job [integration/config] - https://gerrit.wikimedia.org/r/1112784 (https://phabricator.wikimedia.org/T287582)
[16:21:20] (CR) Hashar: "I have to reverify this change then reach out to WMDE for awareness.
Then I can deploy this and then https://gerrit.wikimedia.org/r/c/medi" [integration/config] - https://gerrit.wikimedia.org/r/676107 (https://phabricator.wikimedia.org/T287582) (owner: Hashar)
[16:35:01] Phabricator, Security-Team: Change the dropdown in security ticket dropdown to not include WMF Product and WMF Technology as two separate departments - https://phabricator.wikimedia.org/T384243 (Urbanecm_WMF) NEW
[16:54:48] Phabricator, Security-Team: Change the dropdown in security ticket dropdown to not include WMF Product and WMF Technology as two separate departments - https://phabricator.wikimedia.org/T384243#10477264 (Bawolff) If the drop down is changing, perhaps miraheze should be added. Miraheze affiliated people r...
[17:10:07] Phabricator, Security-Team: Create 'Author Affiliation' custom drop down field for forms - https://phabricator.wikimedia.org/T240999#10477309 (Aklapper)
[17:20:24] Phabricator, Security-Team: Change the dropdown in security ticket dropdown to not include WMF Product and WMF Technology as two separate departments - https://phabricator.wikimedia.org/T384243#10477322 (Aklapper) I can neither find the dropdown config in https://gitlab.wikimedia.org/repos/phabricator/de...
[17:22:48] GitLab (Account Approval), Release-Engineering-Team: Requesting GitLab account activation for JainLakshita28 - https://phabricator.wikimedia.org/T384247 (Jainlakshita28) NEW
[17:35:59] Phabricator, Observability-Logging: Phabricator dashboard in Logstash does not show event details anymore under "Data Gateway search" - https://phabricator.wikimedia.org/T383878#10477370 (Aklapper) Thanks a lot @colewhite! Appreciated.
[17:38:17] GitLab (Account Approval), Release-Engineering-Team: Requesting GitLab account activation for JainLakshita28 - https://phabricator.wikimedia.org/T384248 (Jainlakshita28) NEW
[17:53:01] GitLab (Account Approval), Release-Engineering-Team: Requesting GitLab account activation for JainLakshita28 - https://phabricator.wikimedia.org/T384248#10477425 (Reedy) →Duplicate dup:T384247
[17:53:03] GitLab (Account Approval), Release-Engineering-Team: Requesting GitLab account activation for JainLakshita28 - https://phabricator.wikimedia.org/T384247#10477427 (Reedy)
[18:40:35] FIRING: Queue (Jenkins jobs + Zuul functions): - https://alerts.wikimedia.org/?q=alertname%3DQueue+%28Jenkins+jobs+%2B+Zuul+functions%29
[18:46:38] PROBLEM - Work requests waiting in Zuul Gearman server on contint1002 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [400.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/d/000000322/zuul-gearman?orgId=1&viewPanel=10
[19:00:35] RESOLVED: Queue (Jenkins jobs + Zuul functions): - https://alerts.wikimedia.org/?q=alertname%3DQueue+%28Jenkins+jobs+%2B+Zuul+functions%29
[19:12:10] (update) addshore: Fix OTEL configuration [repos/releng/cli] (cdanis/fixjaeger) - https://gitlab.wikimedia.org/repos/releng/cli/-/merge_requests/596 (owner: cdanis)
[19:12:27] (merge) addshore: Fix OTEL configuration [repos/releng/cli] (cdanis/fixjaeger) - https://gitlab.wikimedia.org/repos/releng/cli/-/merge_requests/596 (owner: cdanis)
[19:12:57] (open) addshore: Fix OTEL configuration [repos/releng/cli] - https://gitlab.wikimedia.org/repos/releng/cli/-/merge_requests/597
[19:14:40] RECOVERY - Work requests waiting in Zuul Gearman server on contint1002 is OK: OK: Less than 100.00% above the threshold [200.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/d/000000322/zuul-gearman?orgId=1&viewPanel=10
[20:35:16] (CR) Arlolra:
Improve noise suppression by ensuring images are same size (2 comments) [integration/visualdiff] - https://gerrit.wikimedia.org/r/1112393 (owner: Subramanya Sastry)
[21:09:01] (CR) Subramanya Sastry: Improve noise suppression by ensuring images are same size (2 comments) [integration/visualdiff] - https://gerrit.wikimedia.org/r/1112393 (owner: Subramanya Sastry)
[21:15:55] (PS3) Subramanya Sastry: Improve noise suppression by ensuring images are same size [integration/visualdiff] - https://gerrit.wikimedia.org/r/1112393
[21:17:21] (CR) Subramanya Sastry: "We don't use these for rt-testing much, but one could argue that these belong in the testreduce repo rather than the visual diff repo. I p" [integration/visualdiff] - https://gerrit.wikimedia.org/r/1109503 (https://phabricator.wikimedia.org/T383255) (owner: Subramanya Sastry)
[21:54:15] (CR) Arlolra: [C:+2] Improve noise suppression by ensuring images are same size (2 comments) [integration/visualdiff] - https://gerrit.wikimedia.org/r/1112393 (owner: Subramanya Sastry)
[21:54:45] (Merged) jenkins-bot: Improve noise suppression by ensuring images are same size [integration/visualdiff] - https://gerrit.wikimedia.org/r/1112393 (owner: Subramanya Sastry)
[22:59:46] (PS10) Subramanya Sastry: Commit Cloud VPS config files as our version of lazy puppetization [integration/visualdiff] - https://gerrit.wikimedia.org/r/1109127 (https://phabricator.wikimedia.org/T295907)
[22:59:46] (PS10) Subramanya Sastry: Poor man's puppetization of visual diffing vm and services [integration/visualdiff] - https://gerrit.wikimedia.org/r/1109519 (https://phabricator.wikimedia.org/T295907)
[22:59:47] (PS14) Subramanya Sastry: Add retry scripts to simplify retrying failures [integration/visualdiff] - https://gerrit.wikimedia.org/r/1109503 (https://phabricator.wikimedia.org/T383255)