[04:38:32] FIRING: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance deployment-changeprop-1 in project deployment-prep - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[05:18:33] FIRING: PuppetAgentNoResources: No Puppet resources found on instance deployment-changeprop-1 on project deployment-prep - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[05:36:23] GitLab (Project and group requests), Release-Engineering-Team, Wikimedia Australia: Create new GitLab project group: wmau - https://phabricator.wikimedia.org/T429745 (Samwilson) NEW
[06:58:02] Continuous-Integration-Infrastructure, Gerrit, Release-Engineering-Team, collaboration-services, ci-test-error (WMF-deployed Build Failure): Gerrit: network errors in CI - https://phabricator.wikimedia.org/T429749 (ABran-WMF) NEW
[06:59:25] Continuous-Integration-Infrastructure, Gerrit, Release-Engineering-Team, collaboration-services, and 2 others: Fetches from Gerrit aborted due to: GnuTLS recv error (-54): Error in the pull function - https://phabricator.wikimedia.org/T420865#12039170 (ABran-WMF) Open→Resolved Given t...
[07:21:53] Continuous-Integration-Infrastructure, Gerrit, Release-Engineering-Team, collaboration-services, ci-test-error (WMF-deployed Build Failure): Gerrit: network errors in CI - https://phabricator.wikimedia.org/T429749#12039254 (ABran-WMF) Open→In progress p:Triage→High
[08:20:57] Gerrit, collaboration-services: Investigate Gerrit root disk usage and logging - https://phabricator.wikimedia.org/T425667#12039469 (ABran-WMF) >>! In T425667#12032013, @Dzahn wrote: > I am not sure yet where this needs to be fixed: > > > ` > [sre-collaboration-services] [FIRING:1] AlertLintProblem col...
[09:58:15] Phabricator, CAS-SSO, Infrastructure-Foundations, SRE: Add logout.d script for Phabricator - https://phabricator.wikimedia.org/T286904#12039921 (LSobanski) @brennen @Aklapper https://phabricator.wikimedia.org/T406495 is now complete, how would you like to proceed from here?
[10:01:35] GitLab (Auth & Access), CAS-SSO, collaboration-services, Infrastructure-Foundations: gitlab account maps to two different developer accounts - https://phabricator.wikimedia.org/T384025#12039937 (LSobanski) Open→Resolved a:LSobanski From the information above, this looks like expec...
[10:36:29] (merge) kamila: Add a bookworm image flavour [repos/releng/release] - https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/246 (https://phabricator.wikimedia.org/T418200 https://phabricator.wikimedia.org/T429030)
[10:43:19] (update) kbach: Draft: Add rule reference [repos/ci-tools/wikimedia-spectral-ruleset] - https://gitlab.wikimedia.org/repos/ci-tools/wikimedia-spectral-ruleset/-/merge_requests/13 (https://phabricator.wikimedia.org/T422930)
[10:45:49] (update) kbach: Draft: Add rule reference [repos/ci-tools/wikimedia-spectral-ruleset] - https://gitlab.wikimedia.org/repos/ci-tools/wikimedia-spectral-ruleset/-/merge_requests/13 (https://phabricator.wikimedia.org/T422930)
[10:48:21] (update) kbach: Draft: Add rule reference [repos/ci-tools/wikimedia-spectral-ruleset] - https://gitlab.wikimedia.org/repos/ci-tools/wikimedia-spectral-ruleset/-/merge_requests/13 (https://phabricator.wikimedia.org/T422930)
[10:48:57] (update) kbach: Add rule reference [repos/ci-tools/wikimedia-spectral-ruleset] - https://gitlab.wikimedia.org/repos/ci-tools/wikimedia-spectral-ruleset/-/merge_requests/13 (https://phabricator.wikimedia.org/T422930)
[10:50:51] (update) kbach: Add rule reference [repos/ci-tools/wikimedia-spectral-ruleset] - https://gitlab.wikimedia.org/repos/ci-tools/wikimedia-spectral-ruleset/-/merge_requests/13 (https://phabricator.wikimedia.org/T422930)
[11:09:02] (update) kbach: Add rule reference [repos/ci-tools/wikimedia-spectral-ruleset] - https://gitlab.wikimedia.org/repos/ci-tools/wikimedia-spectral-ruleset/-/merge_requests/13 (https://phabricator.wikimedia.org/T422930)
[11:25:11] (PS1) Hslater: Zuul: [mediawiki/extensions/ContentStabilization] Add dependencies [integration/config] - https://gerrit.wikimedia.org/r/1304777
[12:24:42] (update) jforrester: Stop branching ShortUrl for wmf [repos/releng/release] - https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/247 (https://phabricator.wikimedia.org/T107188) (owner: krinkle)
[13:04:19] (merge) jforrester: Stop branching ShortUrl for wmf [repos/releng/release] - https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/247 (https://phabricator.wikimedia.org/T107188) (owner: krinkle)
[13:04:42] (open) trueg: wikidata-platform-tools: A reusable image for different scripts [repos/releng/gitlab-trusted-runner] - https://gitlab.wikimedia.org/repos/releng/gitlab-trusted-runner/-/merge_requests/177 (https://phabricator.wikimedia.org/T425119)
[13:33:54] Phabricator (Upstream), Release-Engineering-Team (Doing 😎), Upstream: EXCEPTION: (Exception) Policy identifier is an object PHID (''), but no object handle was provided. A handle must be provided for object policies. - https://phabricator.wikimedia.org/T361459#12040754 (Aklapper) ...and put in also h...
[13:34:26] Phabricator (2026-06-16), Release-Engineering-Team (Doing 😎), Development-Metrics: GrimoireLab's Perceval fails pulling Maniphest transaction data since 20260403 - https://phabricator.wikimedia.org/T428300#12040755 (Aklapper) ...and put in also https://gitlab.wikimedia.org/repos/phabricator/phabricat...
[13:42:37] Release-Engineering-Team (Priority Backlog 📥), Development-Metrics: Several projects and/or forks with overlapping names - https://phabricator.wikimedia.org/T429796 (Aklapper) NEW p:Triage→Low
[13:49:00] RelEng: Could someone please add @aklapper to https://gitlab.wikimedia.org/repos/releng/developer-metrics/sources/-/project_members ? I don't feel like forking for trivial edits. TIA!
[14:22:40] GitLab (Project and group requests), Release-Engineering-Team, Wikimedia Australia, Essential-Work: Create new GitLab project group: wmau - https://phabricator.wikimedia.org/T429745#12040914 (brennen) Open→Resolved a:brennen Created: https://gitlab.wikimedia.org/repos/wmau Added...
[14:23:33] andre: yep, one sec
[14:24:19] {{done}}
[14:24:59] thank you!
[14:29:40] Continuous-Integration-Infrastructure, Gerrit, Release-Engineering-Team, collaboration-services, and 2 others: Gerrit: network errors in CI - https://phabricator.wikimedia.org/T429749#12040949 (ABran-WMF) I tested the following envoy config to log more details on network errors: ` $ diff tls.deb...
[14:31:26] (update) aokoth: scap: replace phab2002 with phab2003 [repos/phabricator/deployment] (wmf/stable) - https://gitlab.wikimedia.org/repos/phabricator/deployment/-/merge_requests/109 (https://phabricator.wikimedia.org/T423727)
[14:37:05] GitLab, Release-Engineering-Team (Radar), collaboration-services, Traffic: gitlab behind CDN: serve gitlab.wm.o via text-lb instead of dedicated IPs? - https://phabricator.wikimedia.org/T428903#12041004 (ABran-WMF) In progress→Resolved a:ABran-WMF This is essentially resolved, the...
[14:37:21] (open) kamila: mediawiki-cli image: remove mwbzutils [repos/releng/release] - https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/248 (https://phabricator.wikimedia.org/T429030)
[14:39:49] Release-Engineering-Team (Priority Backlog 📥), Development-Metrics: Several projects and/or forks with overlapping names - https://phabricator.wikimedia.org/T429796#12041023 (Aklapper) Made one rename in https://gitlab.wikimedia.org/repos/releng/developer-metrics/sources/-/commit/277dc165f55f8388b2810b1b...
[14:40:08] Gerrit, Release-Engineering-Team, Release-Engineering-Team-TODO (2020-04 to 2020-06 (Q4)), collaboration-services, and 2 others: Jenkins job failing intermittently due to Gerrit HTTP 502 errors when interacting with repos - https://phabricator.wikimedia.org/T246763#12041024 (ABran-WMF)
[14:40:12] Continuous-Integration-Infrastructure, Gerrit, Release-Engineering-Team, collaboration-services, and 2 others: Gerrit: network errors in CI - https://phabricator.wikimedia.org/T429749#12041025 (ABran-WMF)
[15:21:55] Phabricator, CAS-SSO, Infrastructure-Foundations, SRE: Add logout.d script for Phabricator - https://phabricator.wikimedia.org/T286904#12041211 (Aklapper) Hi, I myself am not sure what to add apart from T286904#11091482. Please elaborate if anything is needed from our side - thanks!
[15:27:17] GitLab (Infrastructure), collaboration-services: Explore apt pinning / hold for gitlab-ce package - https://phabricator.wikimedia.org/T429595#12041239 (LSobanski) p:Triage→Medium
[15:27:28] GitLab (Infrastructure), collaboration-services: Explore apt pinning / hold for gitlab-ce package - https://phabricator.wikimedia.org/T429595#12041240 (LSobanski) a:Jelto
[15:31:49] (PS1) Vaughn Walters: jjb: [catalyst-daily-core] Add core job [integration/config] - https://gerrit.wikimedia.org/r/1304835 (https://phabricator.wikimedia.org/T429629)
[15:33:27] (CR) CI reject: [V:-1] jjb: [catalyst-daily-core] Add core job [integration/config] - https://gerrit.wikimedia.org/r/1304835 (https://phabricator.wikimedia.org/T429629) (owner: Vaughn Walters)
[15:36:37] Phabricator, collaboration-services: Add cache policy to static resources in phab.wmfusercontent.org - https://phabricator.wikimedia.org/T429019#12041328 (LSobanski) p:Triage→High
[15:40:58] Continuous-Integration-Config, Continuous-Integration-Infrastructure: Expand bundlesize test to support monitoring any ResourceLoader module and not just those loaded on mainspace views - https://phabricator.wikimedia.org/T429811 (MusikAnimal) NEW
[15:44:26] GitLab (CI & Job Runners), Release-Engineering-Team, collaboration-services: Standardize Debian package builds on GitLab CI - https://phabricator.wikimedia.org/T304491#12041436 (LSobanski) Open→Resolved a:LSobanski Let's consider this task closed and create new tasks for any other ide...
[15:45:04] Continuous-Integration-Config, Continuous-Integration-Infrastructure: Expand bundlesize test to support monitoring any ResourceLoader module and not just those loaded on mainspace views - https://phabricator.wikimedia.org/T429811#12041439 (MusikAnimal)
[15:45:15] Gerrit, Release-Engineering-Team, Release-Engineering-Team-TODO (2020-04 to 2020-06 (Q4)), collaboration-services, and 2 others: Jenkins job failing intermittently due to Gerrit HTTP 502 errors when interacting with repos - https://phabricator.wikimedia.org/T246763#12041440 (ABran-WMF) Ope...
[15:49:07] Continuous-Integration-Infrastructure, Gerrit, Release-Engineering-Team, collaboration-services, and 2 others: Gerrit: network errors in CI - https://phabricator.wikimedia.org/T429749#12041464 (Tgr) FWIW CI is occasionally failing with non-Gerrit network errors as well. E.g. https://gerrit.wikime...
[15:52:33] (PS2) Vaughn Walters: jjb: [catalyst-daily-core] Add core job [integration/config] - https://gerrit.wikimedia.org/r/1304835 (https://phabricator.wikimedia.org/T429629)
[15:52:33] (CR) Vaughn Walters: "After this is passing green for a few days, in a separate patch I will move the core tests to run first." [integration/config] - https://gerrit.wikimedia.org/r/1304835 (https://phabricator.wikimedia.org/T429629) (owner: Vaughn Walters)
[15:53:54] (open) dduvall: ci: Create a release from published images [repos/releng/zuul/zuul] - https://gitlab.wikimedia.org/repos/releng/zuul/zuul/-/merge_requests/19
[15:57:01] (update) dduvall: ci: Create a release from published images [repos/releng/zuul/zuul] - https://gitlab.wikimedia.org/repos/releng/zuul/zuul/-/merge_requests/19
[16:04:09] (update) dduvall: ci: Create a release from published images [repos/releng/zuul/zuul] - https://gitlab.wikimedia.org/repos/releng/zuul/zuul/-/merge_requests/19
[16:05:30] Release-Engineering-Team, Infrastructure-Foundations: Add tox-uv support to the tox-v{3,4} Docker images - https://phabricator.wikimedia.org/T421348#12041590 (elukey) ping :)
[16:14:41] Phabricator, Catalyst (PatchDemo): Replace deprecated Phabricator Conduit API calls with their stable equivalents (PatchDemoBot) - https://phabricator.wikimedia.org/T428850#12041645 (thcipriani) p:Triage→Low Tagging with #patchdemo means we'll see it during triage, setting this as low for now. If...
[16:18:50] (update) dduvall: ci: Create a release from published images [repos/releng/zuul/zuul] - https://gitlab.wikimedia.org/repos/releng/zuul/zuul/-/merge_requests/19
[16:22:39] (update) kamila: Draft: mediawiki-cli image: remove mwbzutils [repos/releng/release] - https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/248 (https://phabricator.wikimedia.org/T429030)
[16:25:55] Release-Engineering-Team (Priority Backlog 📥), Catalyst (Luka Ijo Pimeja Jan), MW-1.46-release: Cannot select REL1_46 on PatchDemo - https://phabricator.wikimedia.org/T425165#12041707 (thcipriani) a:SDunlap→None
[16:30:22] (PS5) Krinkle: Prepare 26.06.1 release [fresh] - https://gerrit.wikimedia.org/r/1304650
[16:30:28] (PS3) Krinkle: Release 26.06.1 [fresh] - https://gerrit.wikimedia.org/r/1304651
[16:31:15] (update) dduvall: ci: Create a release from published images [repos/releng/zuul/zuul] - https://gitlab.wikimedia.org/repos/releng/zuul/zuul/-/merge_requests/19
[16:32:03] Release-Engineering-Team (Priority Backlog 📥), Catalyst (Luka Ijo Pimeja Jan), MW-1.46-release: Cannot select REL1_46 on PatchDemo - https://phabricator.wikimedia.org/T425165#12041758 (thcipriani) a:brennen
[16:39:44] (merge) dduvall: ci: Create a release from published images [repos/releng/zuul/zuul] - https://gitlab.wikimedia.org/repos/releng/zuul/zuul/-/merge_requests/19
[16:50:45] (merge) aokoth: scap: replace phab2002 with phab2003 [repos/phabricator/deployment] (wmf/stable) - https://gitlab.wikimedia.org/repos/phabricator/deployment/-/merge_requests/109 (https://phabricator.wikimedia.org/T423727)
[17:00:16] Krinkle: if you want me to review/approve the change you made to Fresh I am happy to do so ;)
[17:00:26] though not today, I am heading out for dinner
[17:05:36] hashar: Jenkins is doing nothing, any chance of restarting it?
[17:05:46] Zuul shows no running jobs
[17:06:24] it is busy
[17:06:34] see the graph at the bottom of https://integration.wikimedia.org/zuul/
[17:06:40] "Queue (Jenkins jobs + Zuul functions)"
[17:06:48] which is https://grafana.wikimedia.org/d/ad656c66-d8b5-4b09-a54b-61e7df71fb17/zuul-gearman-prometheus
[17:07:04] I had assumed that just shows jobs that are waiting to be picked up
[17:07:48] Hmm, those graphs do say 2 running jobs
[17:08:43] I'm guessing https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1304499 is causing lots of work to do :D
[17:08:56] yup indeed
[17:09:03] maybe cscott can split that series of deprecations
[17:09:29] Come to think of it, every time I've seen that stack of patches CI starts going slowly :D
[17:09:39] yup
[17:09:57] I am off for dinner
[17:10:02] zuul will eventually catch up
[17:10:26] I think it's any stack of patches involving core
[17:10:43] Because the merge conflict checking for core is very slow
[17:10:55] puppet has a similar issue
[17:11:06] Any patch to core (not just a stack) is noticeably slower to kick off tests
[17:11:12] and core has a bunch of branches that takes a while to update cause the old Zuul code is quite dumb at it
[17:12:31] I had to resubmit the stack because zuul was giving spurious merge conflict complaints on it. But I've thought about trying to give git-review an option to pace patch submission.
[17:14:31] Continuous-Integration-Infrastructure, Gerrit, Release-Engineering-Team, collaboration-services, and 2 others: Gerrit: network errors in CI - https://phabricator.wikimedia.org/T429749#12041867 (Dreamy_Jazz) The above error is blocking all merges to gated extensions AFAICS
[17:24:46] Gerrit (Gerrit 3.12): Upgrade to Gerrit 3.12 - https://phabricator.wikimedia.org/T392448#12041893 (Aklapper)
[17:43:24] Continuous-Integration-Infrastructure, Gerrit, Release-Engineering-Team, collaboration-services, and 2 others: Gerrit: network errors in CI - https://phabricator.wikimedia.org/T429749#12041925 (Dreamy_Jazz) The above error is seen in every `quibble-with-gated-extensions-vendor-mysql-php83` job th...
[18:04:40] Continuous-Integration-Infrastructure, Gerrit, Release-Engineering-Team, collaboration-services, and 2 others: Gerrit: network errors in CI - https://phabricator.wikimedia.org/T429749#12042017 (Dreamy_Jazz) Filed {T429824} for the above error
[18:09:33] Phabricator, Release-Engineering-Team (Doing 😎), Catalyst (PatchDemo), Patch-For-Review: Replace deprecated Phabricator Conduit API calls with their stable equivalents (PatchDemoBot) - https://phabricator.wikimedia.org/T428850#12042039 (Aklapper) a:Aklapper
[18:13:28] Release-Engineering-Team (Radar), Infrastructure-Foundations, SRE, Patch-For-Review, User-notice: Sunsetting mirrors.wikimedia.org - https://phabricator.wikimedia.org/T416707#12042047 (Dzahn) seeing CI errors on `labs/codesearch` that seem caused by this
[18:36:35] Continuous-Integration-Infrastructure, Gerrit, Release-Engineering-Team, collaboration-services, and 2 others: Gerrit: network errors in CI - https://phabricator.wikimedia.org/T429749#12042102 (Jdlrobson-WMF) Seen with every Vector 2022 change too.
[18:40:52] Phabricator: Highlight Phabricator activity during hackathons in Phabricator statistics reports - https://phabricator.wikimedia.org/T425274#12042115 (Aklapper) Open→Declined I'm afraid that I have to realistically decline this (at least for this time), sorry. :-/ The hard part for me already star...
[18:49:38] hashar: I'm OOO but yeah, anytime. thanks!
[18:49:59] * Krinkle adds in Gerrit
[18:56:26] (open) bking: Add lvmd repo to trusted runners [repos/releng/gitlab-trusted-runner] - https://gitlab.wikimedia.org/repos/releng/gitlab-trusted-runner/-/merge_requests/178 (https://phabricator.wikimedia.org/T429310)
[18:56:55] (update) bking: Add lvmd repo to trusted runners [repos/releng/gitlab-trusted-runner] - https://gitlab.wikimedia.org/repos/releng/gitlab-trusted-runner/-/merge_requests/178 (https://phabricator.wikimedia.org/T429310)
[19:30:18] !log thcipriani@integration-castor06:~$ sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/quibble-with-gated-extensions-vendor-mysql-php83 #T429824
[19:30:20] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[19:30:20] T429824: Gated extensions are blocked by corrupted npm tarball data - https://phabricator.wikimedia.org/T429824
[19:31:15] guys, I destroyed codesearch by accident.
[19:31:20] trying to recreate it on trixie first
[19:31:46] if that doesn't work.. back to bookworm as it was.. if that doesn't work.. back to backup restore attempts
[19:32:11] mutante: not sure exactly what it'd be, but if i can help lemme know
[19:32:52] brennen: thank you
[19:33:36] currently wondering if it had network storage
[19:33:46] that I can still just mount or if it was all on the VM
[19:34:37] well.. no containers in object store on horizon. not sure yet which outcome I preferred, heh
[19:35:32] i see "sourcebot-data" and "data2" under volumes
[19:35:38] sourcebot was... the other thing that didn't pan out?
[19:39:02] yea, sourcebot is what I wanted to delete
[19:40:00] i guess that data2 volume could be something. i've only really glanced at this project a couple of times or i'd have a better idea how it was set up...
[19:40:19] where is the button to attach it though
[19:40:48] ah, with the instance of course
[19:40:59] "manage attachments" under dropdown on the right?
[19:41:15] oh, yeah, that makes sense i guess
[19:41:25] I have one other thing I did not get yet.
[19:41:42] there is the http://codesearch.wmcloud.org/ URL of course
[19:41:51] but I don't see it under proxies and not under DNS records
[19:41:54] and not under floating IPs
[19:42:07] would have expected it in at least one of those
[19:45:20] i think i would have too - it shows up under "DNS Zones" but i'm not quite sure what maps it to the instance.
[19:46:45] talking to Andrew as well. the webproxy might have been deleted with the VM
[19:46:48] I recreated it
[19:47:01] pointed to port 3000 since the security group gave me the hint that's where it was
[19:47:18] in theory it could work now.. just that it isn't yet
[19:47:38] data is mounted, proxy is created, security group still there. puppet finished..
[19:48:14] i recall from previous restarts that there's something that takes a while to spin up
[19:48:41] for some reason puppet is starting all the hound services again
[19:48:59] yea, indexing does. but should see the frontend
[19:50:18] not listening on port 3000 where it was expected.. something on 3002 though.. looking more
[19:53:29] I will try the same steps but on bookworm instead of trixie. Since the hound services fail to start.
[19:53:34] and that's what it was before
[19:59:00] GitLab (Account Approval), Release-Engineering-Team: Requesting GitLab account activation for Gouvernathor - https://phabricator.wikimedia.org/T429832 (Gouvernathor) NEW
[20:00:07] GitLab (Account Approval), Release-Engineering-Team: Requesting GitLab account activation for Gouvernathor - https://phabricator.wikimedia.org/T429832#12042398 (Gouvernathor) Well, apparently I'm now unlocked
[20:03:12] GitLab (Account Approval), Release-Engineering-Team: Requesting GitLab account activation for Gouvernathor - https://phabricator.wikimedia.org/T429832#12042408 (brennen) Open→Resolved a:brennen Added you to #trusted-contributors for good measure.
[20:13:02] added the missing mount on /srv/hound
[20:13:07] but this would also do it: bash: /usr/bin/docker: No such file or directory
[20:15:21] yeah, that seems relevant.
[20:15:47] pull access denied for codesearch-frontend, repository does not exist or may require 'docker login': denied
[20:17:14] ah, there is a README in the repo how to set it up:)
[20:17:25] hopefully covering this too
[20:58:48] brennen: https://codesearch.wmcloud.org/ :)
[21:01:30] mutante: excellent. :)
[21:02:01] :) well, that was an unplanned trixie upgrade now that we got out of that
[21:02:20] because that .. worked with that minimal fix to install docker-cli
[21:26:10] Phabricator, Release-Engineering-Team, collaboration-services, VPS-project-devtools, and 2 others: devtools: Create new Phab/Phorge test instance on: Debian Trixie, Debian Bookworm - https://phabricator.wikimedia.org/T424055#12042784 (Dzahn) The bookworm instance has been deleted in T428069#12042082
[21:26:32] Release-Engineering-Team (Priority Backlog 📥), collaboration-services: Upgrade phab (phorge) hosts to bookworm - https://phabricator.wikimedia.org/T372619#12042787 (Dzahn) the bookworm test instance has been deleted in T428069#12042082
[22:23:00] Continuous-Integration-Infrastructure, Gerrit, Release-Engineering-Team, collaboration-services, and 2 others: Gerrit: network errors in CI - https://phabricator.wikimedia.org/T429749#12042977 (cscott) I think I just saw this in a patch to mediawiki-core as well? ` 18:03:21 npm warn tarball tarba...
[22:23:45] is there a workaround for that 'npm failure on zip-stream' (T429749)? It's blocking a stack of patches to mediawiki-core
[22:23:45] T429749: Gerrit: network errors in CI - https://phabricator.wikimedia.org/T429749
[22:24:44] (open) dduvall: dev: Define a `kubeconfig` make target for easier cluster debugging [repos/releng/zuul/tofu-provisioning] - https://gitlab.wikimedia.org/repos/releng/zuul/tofu-provisioning/-/merge_requests/59
[22:25:05] thcipriani said "removed the cached for the job" to fix it -- how do I remove the cache for a job?
[22:26:59] (update) dduvall: dev: Define a `kubeconfig` make target for easier cluster debugging [repos/releng/zuul/tofu-provisioning] - https://gitlab.wikimedia.org/repos/releng/zuul/tofu-provisioning/-/merge_requests/59
[22:30:07] cscott: is this the mediawiki-node24 job?
[22:31:06] to remove the cache you remove the cache directory on the integration-castor06 host, you might not have access to that(?), but if that's the right job I can wipe it.
[22:33:13] I think i did it. There's a "wipe out current workspace" button in https://integration.wikimedia.org/ci/job/mediawiki-node24/ws/
[22:33:57] the option is worded in a frightening manner, hopefully i didn't earn a t-shirt by clicking it
[22:33:59] (merge) dduvall: dev: Define a `kubeconfig` make target for easier cluster debugging [repos/releng/zuul/tofu-provisioning] - https://gitlab.wikimedia.org/repos/releng/zuul/tofu-provisioning/-/merge_requests/59
[22:34:34] I ... don't actually know what that button does :D
[22:34:51] i am relieved that i'm not alone in this
[22:35:14] but it does sound like that's the right job, so I'll clear the cache for it
[22:35:35] !log thcipriani@integration-castor06:~$ sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node24/ #T429824
[22:35:38] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[22:35:39] T429824: Gated extensions are blocked by corrupted npm tarball data - https://phabricator.wikimedia.org/T429824
[22:37:25] i think clicking the button did clear the cache; i saw some node24 jobs pass even before tyler's !log
[22:38:11] hm, maybe i'm wrong: https://integration.wikimedia.org/ci/job/mediawiki-node24/39098/console just failed.
[22:38:24] let's try that again, assuming that tyler cleared the cache now and things are golden
[22:38:52] jenkins isn't very aware of this cache, it's pretty homespun
[22:39:51] Continuous-Integration-Infrastructure, Gerrit, Release-Engineering-Team, collaboration-services, and 2 others: Gerrit: network errors in CI - https://phabricator.wikimedia.org/T429749#12043045 (Dzahn) "npm not finding a file" is pretty different from "Gerrit network problems". We have a separate...
[22:52:24] https://integration.wikimedia.org/ci/job/mediawiki-node24/39102/console (in progress) seems to have the same corrupted npm files
[22:53:50] Continuous-Integration-Infrastructure, Gerrit, Release-Engineering-Team, collaboration-services, and 2 others: Gerrit: network errors in CI - https://phabricator.wikimedia.org/T429749#12043072 (cscott) I don't agree. I believe we are getting corruption due to the network problems. The corruptio...
[23:02:00] thcipriani: clearing the cache didn't seem to help :(
[23:02:21] * thcipriani looks
[23:03:28] heh, well, the cache is back with the same timestamps as before. I guess there was a successful job that was running when I cleaned it.
[23:06:27] ok, doesn't look like mediawiki-node24 has started for anything in the G+S queue. So if I clear it now, it should stay cleared.
[23:06:34] FIRING: InstanceDown: Project deployment-prep instance deployment-cache-text08 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[23:07:04] !log thcipriani@integration-castor06:~$ sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node24/ #T429824 (again)
[23:07:06] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[23:07:07] T429824: Gated extensions are blocked by corrupted npm tarball data - https://phabricator.wikimedia.org/T429824
[23:07:51] jobs for https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1271913 (once they start) should be what repopulates the cache
[23:08:42] fingers crossed!
[23:16:51] RESOLVED: InstanceDown: Project deployment-prep instance deployment-cache-text08 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[23:18:51] ah crud, I should have looked at the CASTOR_NAMESPACE in the job. Cleared the wrong cache
[23:19:01] CASTOR_NAMESPACE="mediawiki-core/master/mediawiki-node24"
[23:19:40] !log thcipriani@integration-castor06:~$ sudo -u jenkins-deploy rm -rf /srv/castor/mediawiki-core/master/mediawiki-node24/ #T429824
[23:19:43] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[23:19:43] T429824: Gated extensions are blocked by corrupted npm tarball data - https://phabricator.wikimedia.org/T429824
[23:19:50] ^ this time for sure cscott :/
[23:20:55] (open) dduvall: kubernetes: Allow pods/attach in launcher/nodepool role [repos/releng/zuul/tofu-provisioning] - https://gitlab.wikimedia.org/repos/releng/zuul/tofu-provisioning/-/merge_requests/60
[23:24:13] (update) dduvall: kubernetes: Allow pods/attach in launcher/nodepool role [repos/releng/zuul/tofu-provisioning] - https://gitlab.wikimedia.org/repos/releng/zuul/tofu-provisioning/-/merge_requests/60
[23:28:23] thcipriani: yeah, i was going to ask if the caches for the 'test' pipeline and the 'gate-and-submit' pipeline are different, because i just saw a node24 job fail in the 'test' pipeline.
[23:28:45] (merge) dduvall: kubernetes: Allow pods/attach in launcher/nodepool role [repos/releng/zuul/tofu-provisioning] - https://gitlab.wikimedia.org/repos/releng/zuul/tofu-provisioning/-/merge_requests/60
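
Editor's note on the cache mix-up: the first two cleanups for mediawiki-node24 removed a directory that did not match the job's CASTOR_NAMESPACE. A minimal sketch of deriving the directory from the namespace, assuming (only from the `rm -rf` commands logged above) that each cache lives at /srv/castor/<CASTOR_NAMESPACE>. The names `CASTOR_ROOT` and `castor_cache_dir` are hypothetical and not part of any Wikimedia tool:

```python
from pathlib import PurePosixPath

# Assumption: cache root inferred from the `rm -rf /srv/castor/...` commands in the log.
CASTOR_ROOT = PurePosixPath("/srv/castor")


def castor_cache_dir(namespace: str) -> PurePosixPath:
    """Return the cache directory for a job's CASTOR_NAMESPACE (hypothetical helper)."""
    parts = PurePosixPath(namespace).parts
    # Reject empty, absolute, or ".."-containing namespaces so the
    # result always stays underneath CASTOR_ROOT.
    if not parts or namespace.startswith("/") or ".." in parts:
        raise ValueError(f"unexpected namespace: {namespace!r}")
    return CASTOR_ROOT.joinpath(*parts)


# Namespace value as quoted in the log at 23:19:01.
print(castor_cache_dir("mediawiki-core/master/mediawiki-node24"))
```

Reading CASTOR_NAMESPACE from the failing job's console output and feeding it to a helper like this would give the directory to remove, rather than guessing it from the job name.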