[07:16:06] GitLab (Auth & Access), Release-Engineering-Team (They Live 🕶️🧟), CAS-SSO, Infrastructure-Foundations, and 3 others: migrate gitlab away from the CAS protocol - https://phabricator.wikimedia.org/T320390 (Jelto) I switched GitLab oidc login back to production idp (https://gerrit.wikimedia.org/r/c/...
[07:25:44] Release-Engineering-Team (Priority Backlog 📥), Release, Train Deployments: 1.41.0-wmf.19 deployment blockers - https://phabricator.wikimedia.org/T340247 (daniel) ##### Risky Patch! 🚂🔥 * **Change**: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/809295/51 * **Summary**: ** This causes rate limit...
[08:22:38] Release-Engineering-Team (Deployment Training Requests): Deployment Training Request for jebe - https://phabricator.wikimedia.org/T341559 (JEbe-WMF) 7am UTC on Thursday (2023-07-27) works for me.
[09:00:58] GitLab (Account Approval), Release-Engineering-Team: Requesting GitLab account activation for Fabfur - https://phabricator.wikimedia.org/T342521 (Fabfur)
[09:16:05] Release-Engineering-Team (Seen), MW-on-K8s, SRE, Traffic, and 3 others: Migrate internal traffic to k8s - https://phabricator.wikimedia.org/T333120 (Joe)
[09:30:01] twentyafterfour: jbond: thcipriani: I use vim fugitive, which pretty much gives the same text UI as tig :) https://github.com/tpope/vim-fugitive#fugitivevim
[09:55:32] Release-Engineering-Team (Seen), MW-on-K8s, SRE, Traffic, and 2 others: Direct 5% of all traffic to mw-on-k8s - https://phabricator.wikimedia.org/T341780 (Clement_Goubert) We'll first make the move to 2% of traffic, then ramp up from there during the week.
[10:46:42] FYI: I'm going to shut down the old releases hosts (releases1002, releases2002). In 48 hours if no complaints, I'm going to decommission them.
They've been removed from service since Thursday (https://gerrit.wikimedia.org/r/c/operations/puppet/+/938889)
[11:11:22] Release-Engineering-Team (Seen), MW-on-K8s, SRE, Traffic, and 3 others: Direct 1% of all traffic to mw-on-k8s - https://phabricator.wikimedia.org/T341463 (Clement_Goubert) >>! In T341463#9014217, @Quiddity wrote: > Thanks for the draft, appreciated! I've [[https://meta.wikimedia.org/wiki/Tech/New...
[11:11:32] Release-Engineering-Team (Seen), MW-on-K8s, SRE, Traffic, and 2 others: Serve production traffic via Kubernetes - https://phabricator.wikimedia.org/T290536 (Clement_Goubert)
[11:11:44] Release-Engineering-Team (Seen), MW-on-K8s, SRE, Traffic, and 3 others: Direct 1% of all traffic to mw-on-k8s - https://phabricator.wikimedia.org/T341463 (Clement_Goubert) In progress→Resolved
[11:14:13] Release-Engineering-Team, Scap, MW-on-K8s: Inform deployers of mw-debug deployment - https://phabricator.wikimedia.org/T341798 (Clement_Goubert) Thanks @dancy <3
[11:58:14] GitLab (Infrastructure), collaboration-services, Patch-For-Review: Create alerting for GitLab CI failures - https://phabricator.wikimedia.org/T339370 (Jelto)
[12:08:44] GitLab (Auth & Access), Release-Engineering-Team (They Live 🕶️🧟), CAS-SSO, Infrastructure-Foundations, and 3 others: migrate gitlab away from the CAS protocol - https://phabricator.wikimedia.org/T320390 (SLyngshede-WMF) @Jelto I think you have the wrong client id. Should be: "gitlab_replica_oidc"...
[12:13:18] GitLab (Auth & Access), Release-Engineering-Team (They Live 🕶️🧟), CAS-SSO, Infrastructure-Foundations, and 3 others: migrate gitlab away from the CAS protocol - https://phabricator.wikimedia.org/T320390 (Jelto) We are using `gitlab_replica_oidc` on the replicas (`"identifier" => "gitlab_replica_o...
[12:18:03] GitLab (Auth & Access), Release-Engineering-Team (They Live 🕶️🧟), CAS-SSO, Infrastructure-Foundations, and 3 others: migrate gitlab away from the CAS protocol - https://phabricator.wikimedia.org/T320390 (SLyngshede-WMF) When I attempt a login I get the following: `
GitLab (Auth & Access), Release-Engineering-Team (They Live 🕶️🧟), CAS-SSO, Infrastructure-Foundations, and 3 others: migrate gitlab away from the CAS protocol - https://phabricator.wikimedia.org/T320390 (Jelto) Yes we had the same error before, I reported it in T320390#8930839. After my vacations...
[12:39:24] Gerrit: Clone with commit-msg hook command not interpreted correctly on zsh - https://phabricator.wikimedia.org/T342536 (TheresNoTime)
[12:39:51] Gerrit, Developer Productivity: Clone with commit-msg hook command not interpreted correctly on zsh - https://phabricator.wikimedia.org/T342536 (TheresNoTime)
[12:40:26] (tfw Sammy moves to Mac full-time for development)
[12:41:36] Gerrit, Developer Productivity: Clone with commit-msg hook command not interpreted correctly on zsh - https://phabricator.wikimedia.org/T342536 (TheresNoTime) Honestly, can't we just replace `git rev-parse --git-dir` with `.git`...? Does anyone really have a different git directory..?
[13:44:03] Release-Engineering-Team (Seen), Content-Transform-Team: [Bug] nodejs10 typescript CI showing message that node10 is unsupported - https://phabricator.wikimedia.org/T269257 (Aklapper) Adding #Content-Transform-Team as #Product-Infrastructure-Team-Backlog-Deprecated has been deprecated for a while, and as...
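TheresNoTime's question about replacing `git rev-parse --git-dir` with `.git` has a concrete answer. In a submodule, a linked worktree (`git worktree add`) or a clone made with `git clone --separate-git-dir`, `.git` is not a directory. It is a small "gitfile" containing `gitdir: <path>`, so a hardcoded `.git/hooks/` would not exist there. The Python sketch below is a simplified stand-in for what `git rev-parse --git-dir` resolves. The function names are hypothetical, and the sketch ignores `GIT_DIR` and the other overrides that real git honours.

```python
import posixpath
from pathlib import Path, PurePosixPath

GITFILE_PREFIX = "gitdir:"


def gitdir_from_gitfile(content: str, gitfile_parent: PurePosixPath) -> PurePosixPath:
    """Parse a gitfile ("gitdir: <path>") into the git directory it names.

    A relative <path> is relative to the directory holding the .git file;
    ".." is collapsed lexically by normpath (symlinks are not followed).
    """
    text = content.strip()
    if not text.startswith(GITFILE_PREFIX):
        raise ValueError(f"not a gitfile: {text!r}")
    target = text[len(GITFILE_PREFIX):].strip()
    # posixpath.join discards gitfile_parent when target is absolute.
    return PurePosixPath(posixpath.normpath(posixpath.join(str(gitfile_parent), target)))


def resolve_git_dir(worktree: Path) -> Path:
    """Simplified stand-in for `git rev-parse --git-dir` (POSIX paths only)."""
    dot_git = worktree / ".git"
    if dot_git.is_dir():
        return dot_git  # ordinary clone: .git really is the git directory
    if dot_git.is_file():
        # Submodule, linked worktree or --separate-git-dir clone: follow the pointer.
        parent = PurePosixPath(dot_git.parent.as_posix())
        return Path(gitdir_from_gitfile(dot_git.read_text(), parent))
    raise FileNotFoundError(f"no .git in {worktree}")
```

For example, a submodule at `/repo/sub` whose `.git` file says `gitdir: ../.git/modules/sub` resolves to `/repo/.git/modules/sub`. That is where its `hooks/` directory lives.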
[13:52:17] GitLab (Infrastructure), collaboration-services: Create alerting for GitLab CI failures - https://phabricator.wikimedia.org/T339370 (Jelto)
[13:52:24] GitLab (Infrastructure), collaboration-services, Patch-For-Review: GitLabCIPipelineErrors (tweak thresholds of new alert) - https://phabricator.wikimedia.org/T341927 (Jelto) Open→Resolved New alert thresholds are merged, see T339370#9037685. I'll close this task.
[14:09:25] (CR) Jaime Nuche: [C: +1] "This is awesome, thanks for adding it!" [releng/phatality] - https://gerrit.wikimedia.org/r/940265 (https://phabricator.wikimedia.org/T342400) (owner: Dduvall)
[14:17:57] (PS1) Hashar: Add job for integration/zuul/deploy [integration/config] - https://gerrit.wikimedia.org/r/940950 (https://phabricator.wikimedia.org/T342346)
[15:26:45] GitLab (Account Approval), Release-Engineering-Team: Requesting GitLab account activation for Fabfur - https://phabricator.wikimedia.org/T342521 (brennen) Open→Resolved
[15:27:18] GitLab (Account Approval), Release-Engineering-Team: Requesting GitLab account activation for Fabfur - https://phabricator.wikimedia.org/T342521 (brennen) GitLab account appears to have been approved some time ago.
[15:31:16] Release-Engineering-Team, Scap, Data-Platform-SRE: "scap deploy"'s config-deploy should check for broken symlinks - https://phabricator.wikimedia.org/T342162 (Gehel)
[16:12:44] (CR) Jforrester: [DNM] Docker: Provide node18 images (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/894126 (https://phabricator.wikimedia.org/T331181) (owner: Jforrester)
[16:20:07] Gitlab-Application-Security-Pipeline, Security-Team, SecTeam-Processed, Security: Application Security Pipeline Components for Gitlab - Phase 2 Work - https://phabricator.wikimedia.org/T342177 (sbassett)
[16:20:15] Project-Admins, Security-Team, SecTeam-Processed, Security: Create #security-bug tag to specifically classify security bugs within phabricator - https://phabricator.wikimedia.org/T342449 (sbassett)
[16:20:52] Project-Admins, Security-Team, SecTeam-Processed, Security: Create #production-risk-assessment Phabricator project/tag - https://phabricator.wikimedia.org/T342466 (sbassett)
[16:24:19] hashar: would you mind prepping commits for the next fresh release? (And thus testing how good my docs are :D)
[16:26:28] James_F: hm.. not sure I understand the question. I do think we could roll out node18 images with npm8 on its own indeed. Individual repos can go there directly from node16/npm7, and indeed in general I would expect upgrades of node to be paired with npm upgrades in the future, similar to how upstream nodejs pairs with specific npm releases. for stuff that's on node14, like quibble, that would make sense to go via node16, but we might not have to hold up everything for that.
[16:27:12] on the other hand, if the browser test issue is specific to node16 and not npm7, we could switch the commits around and do: upgrade npm7 to 8 on node16, then add node18 with npm8, then switch quibble from node14 to node16-with-npm8 (the latter being blocked).
[16:27:18] maybe that's a safer route, WDYT?
[16:28:24] * Krinkle reads ref T256626
[16:28:30] right it's about node version, not npm version.
[16:50:07] Project-Admins, Security-Team, SecTeam-Processed, Security: Create #security-bug tag to specifically classify security bugs within phabricator - https://phabricator.wikimedia.org/T342449 (sbassett) Open→Resolved Done: https://phabricator.wikimedia.org/project/view/6674/
[16:58:58] Release-Engineering-Team (Deployment Training Requests): Deployment Training Request for xcollazo - https://phabricator.wikimedia.org/T341377 (xcollazo) >>! In T341377#9034636, @thcipriani wrote: > Hi @xcollazo thanks for requesting deployment training, apologies for the late reply. > > The late-UTC trainin...
[17:14:48] Krinkle: No, that would mean running npm on node *14*.
[17:14:56] Krinkle: Which might work or might be a problem.
[17:17:09] James_F: That's not what I meant :) Right now we have npm7 on node14 and node16, and no node18 or npm8 anywhere yet (afaik). The current stack is: 1) quibble: node14-npm7 to node16-npm7, 2) node16: update npm to v8, 3) add node18. I'm proposing: 1) node16: upgrade npm to 8, 2) add node18, 3) quibble: node14 (npm7) to node16 (npm8).
[17:17:56] Krinkle: We agreed that we definitely wouldn't have the node16 images diverge further from node14 and it's yet another thing to break.
[17:18:12] Krinkle: Instead I propose just dropping the CI jobs that are still stuck on node14.
[17:18:21] It's beyond ridiculous at this point. :-(
[17:18:32] afaik all mw repos right now run quibble with node14 given some browsertests still break
[17:18:44] Yes, those are the tests I'm proposing to kill.
[17:18:50] but I agree we can disable quibble or rm -rf browsertests from extensions that don't pass on node16
[17:19:26] All qunit testing has been in node16 for almost a year, I think.
[17:20:35] Of course zeljkof will probably have a view.
[17:20:52] $ git grep node14 | grep -E 'quibble.*Docker'
[17:20:52] dockerfiles/quibble-buster/Dockerfile.template:FROM {{ "node14" | image_tag }} as node
[17:20:52] $ git grep node16 | grep -E 'quibble.*Docker'
[17:20:52] $
[17:21:47] I meant real-qunit (i.e. mwgate) not Special:JavaScriptTest, but yeah, fair point.
[17:28:38] right, yeah, the node16 images are used by various jobs for standalone jobs
[17:59:06] Release-Engineering-Team (Deployment Training Requests): Deployment Training Request for xcollazo - https://phabricator.wikimedia.org/T341377 (thcipriani)
[17:59:11] Release-Engineering-Team (Deployment Training Requests): Deployment Training Request for xcollazo - https://phabricator.wikimedia.org/T341377 (thcipriani) Done! See you then :)
[18:00:34] Release-Engineering-Team (Deployment Training Requests): Deployment Training Request for jebe - https://phabricator.wikimedia.org/T341559 (thcipriani)
[18:00:50] Release-Engineering-Team (Deployment Training Requests): Deployment Training Request for jebe - https://phabricator.wikimedia.org/T341559 (thcipriani) >>! In T341559#9036976, @JEbe-WMF wrote: > 7am UTC on Thursday (2023-07-27) works for me. Perfect, you should have an invite at that time.
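The `git grep ... | grep -E 'quibble.*Docker'` check in the log can be written as a small Python sketch. It lists which image templates still build `FROM` a `nodeNN` image. The sketch assumes the `dockerfiles/<image>/Dockerfile.template` layout and the `FROM {{ "node14" | image_tag }}` syntax visible in the grep output. The helper names are hypothetical. Unlike the shell pipeline, it matches the path pattern against the file path only, not against the matched line.

```python
import re
from pathlib import Path

# Matches template lines such as: FROM {{ "node14" | image_tag }} as node
NODE_FROM = re.compile(r'^FROM\s+\{\{\s*"(node\d+)"\s*\|\s*image_tag\s*\}\}', re.MULTILINE)


def node_bases(files: dict[str, str], path_pattern: str = r"quibble.*Docker") -> dict[str, list[str]]:
    """Map each template whose path matches path_pattern to its nodeNN base images."""
    result: dict[str, list[str]] = {}
    for path, text in sorted(files.items()):
        if not re.search(path_pattern, path):
            continue
        images = NODE_FROM.findall(text)
        if images:
            result[path] = images
    return result


def load_templates(repo: Path) -> dict[str, str]:
    """Read dockerfiles/*/Dockerfile.template (layout assumed from the grep output)."""
    return {
        p.relative_to(repo).as_posix(): p.read_text()
        for p in repo.glob("dockerfiles/*/Dockerfile.template")
    }
```

Running `node_bases(load_templates(Path(".")))` from the root of a checkout should report only the templates that are still pinned to a node base image.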
[18:28:04] Project-Admins, Security-Team, SecTeam-Processed, Security: Create #production-risk-assessment Phabricator project/tag - https://phabricator.wikimedia.org/T342466 (sbassett) Open→Resolved Done: https://phabricator.wikimedia.org/project/profile/6675/
[18:50:10] (CR) Dduvall: Link to git blames for each of the stacktrace frames (4 comments) [releng/phatality] - https://gerrit.wikimedia.org/r/940265 (https://phabricator.wikimedia.org/T342400) (owner: Dduvall)
[19:10:44] (CR) Krinkle: Link to git blames for each of the stacktrace frames (1 comment) [releng/phatality] - https://gerrit.wikimedia.org/r/940265 (https://phabricator.wikimedia.org/T342400) (owner: Dduvall)
[19:34:06] Gerrit, Developer Productivity: Clone with commit-msg hook command not interpreted correctly on zsh - https://phabricator.wikimedia.org/T342536 (hashar) >>! In T342536#9037550, @TheresNoTime wrote: > Honestly, can't we just replace `git rev-parse --git-dir` with `.git`...? Does anyone really have a diffe...
[19:39:38] Release-Engineering-Team, Developer Productivity, Gerrit (Gerrit 3.6): Clone with commit-msg hook command not interpreted correctly on zsh - https://phabricator.wikimedia.org/T342536 (hashar) That is solved by #upstream with https://gerrit-review.googlesource.com/c/plugins/download-commands/+/365142...
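The zsh breakage in T342536 comes down to tokenization. As hashar explains in the messages that follow, zsh sees a URL after `curl` and backslash-escapes characters such as `;` and `&`. A `;` meant to end the `curl` command then becomes part of its URL argument. The rough Python model below illustrates this with `shlex`. It is not zsh's implementation, which typically comes from zsh's `url-quote-magic` widget, and the escaped character set is an assumption.

```python
import shlex

# ASSUMPTION: a small subset of the characters zsh's URL quoting escapes,
# picked to cover the ; & and ? from hashar's examples.
URL_SPECIALS = frozenset(";&?")


def escape_like_zsh_paste(url: str) -> str:
    """Backslash-escape shell-special characters in a URL (simplified model)."""
    return "".join("\\" + ch if ch in URL_SPECIALS else ch for ch in url)


def split_command(line: str) -> list[str]:
    """Split a command line roughly like a POSIX shell.

    Words are whitespace-separated; unescaped ; & | ( ) < > come out as
    operator tokens, while backslash-escaped ones stay inside the word.
    """
    lexer = shlex.shlex(line, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    return list(lexer)


if __name__ == "__main__":
    # What the user meant: `curl URL`, then a separate `echo done`.
    print(split_command("curl http://example.org; echo done"))
    # After zsh-style escaping the ; is glued to the URL, and "echo" and
    # "done" become extra arguments to curl.
    print(split_command("curl http://example.org\\; echo done"))
```

The same escaping is what makes hashar's `search?term=magic;&rofl=true` example survive a paste as one argument. The Gerrit clone command relies on an unescaped `;` to separate its steps, so the escaping breaks it.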
[19:40:18] TheresNoTime: thcipriani: so under zsh if one does: `curl http://example.org; echo done` that would with something like "echo timeout"
[19:41:00] cause zsh notices you are using `curl` it would then helpfully escape the trailing `;`
[19:41:47] which I guess is the use case of helping when one copy paste a url to the command line such as: `curl http://example.org/search?term=magic;&rofl=true`
[19:41:56] which would break on the ; and &
[19:42:08] so zsh escapes them for you
[19:42:20] it will be fixed in Gerrit 3.6.5 ;)
[19:42:37] I am off
[19:54:47] oh no
[19:55:22] what is the opposite of `git submodule absorbgitdirs`? :]
[19:55:58] * thcipriani waits for punchline
[19:56:09] I went with deleting the submodule `.git` file, move the directory it was pointing to as `.git`
[19:56:12] and I was so happy
[19:56:24] then I get: `17:59:12 fatal: cannot chdir to '../../../src': No such file or directory`
[19:56:31] git config --unset core.worktree
[19:56:34] solves it
[20:01:11] and that is how I made my own `git submodule adsorbgitdirs`
[20:02:28] or maybe resorb I don't
[20:04:02] (PS2) Hashar: Add job for integration/zuul/deploy [integration/config] - https://gerrit.wikimedia.org/r/940950 (https://phabricator.wikimedia.org/T342346)
[20:04:36] tldr, I need the build to have full access to the deploy repository holding the source code
[20:04:50] since the source code might refer to the git directory which is in the deploy .git
[20:04:58] we will see. I am off for real
[20:36:02] (PS3) Hashar: Add job for integration/zuul/deploy [integration/config] - https://gerrit.wikimedia.org/r/940950 (https://phabricator.wikimedia.org/T342346)
[20:36:28] cause I had to fix the build before sleeping [ https://integration.wikimedia.org/ci/job/integration-zuul-deploy-python2-buster/31/console ]
[20:36:34] * hashar vanishes
[20:37:56] Hm.. looks like a bunch of jobs failed in the past hour due to timeouts / full disk.
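hashar's home-made reverse of `git submodule absorbgitdirs` comes down to three steps:

1. Read the path from the submodule's `.git` file.
2. Replace that file with the directory it points to.
3. Drop `core.worktree`, whose relative value (like `../../../src` in the error above) no longer resolves once the git directory has moved.

The minimal Python sketch below follows those steps. The function names are hypothetical. Since hashar ended up doing this by hand, treat it as an illustration and try it on a scratch copy first.

```python
import shutil
import subprocess
from collections.abc import Callable
from pathlib import Path


def read_gitfile(dot_git: Path) -> Path:
    """Return the directory named by a `gitdir: <path>` file."""
    text = dot_git.read_text().strip()
    if not text.startswith("gitdir:"):
        raise ValueError(f"{dot_git} is not a gitfile")
    target = Path(text[len("gitdir:"):].strip())
    # Relative paths are relative to the directory containing the .git file.
    return target if target.is_absolute() else (dot_git.parent / target).resolve()


def git_unset_worktree(git_dir: Path) -> None:
    """Run `git config --unset core.worktree` against git_dir's config file."""
    result = subprocess.run(
        ["git", "config", "--file", str(git_dir / "config"), "--unset", "core.worktree"],
        check=False,
    )
    if result.returncode not in (0, 5):  # 5 means the key was not set
        result.check_returncode()


def unabsorb(submodule: Path,
             unset_worktree: Callable[[Path], None] = git_unset_worktree) -> Path:
    """Hypothetical reverse of `git submodule absorbgitdirs` for one submodule."""
    dot_git = submodule / ".git"
    real_git_dir = read_gitfile(dot_git)
    dot_git.unlink()                              # 1. delete the pointer file
    shutil.move(str(real_git_dir), str(dot_git))  # 2. move the git dir into place
    unset_worktree(dot_git)                       # 3. stale core.worktree breaks chdir
    return dot_git
```

The `core.worktree` step is passed in as a parameter so that the file moves can be exercised without invoking git. By default it shells out to `git config --file ... --unset core.worktree`, which is the command hashar used.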
[20:38:04] that might explain the 1h queue backlog
[20:46:17] https://integration.wikimedia.org/ci/job/mwgate-node16-docker/52226/console
[20:46:17] 15:11:01 + git init
[20:46:17] 15:40:56 Build timed out (after 30 minutes). Marking the build as failed.
[20:47:48] it's slowly recovering now I think after retries
[20:48:02] presumably some kind of automated action happened, e.g. depool or prune
[20:50:53] hrm, fetch from zuul timed out after 30 mins?
[20:52:04] more recent jobs seem like they're doing ok. I don't see any notifications for automated depools in backscroll here
[20:57:20] recent runs here also seem healthy https://integration.wikimedia.org/ci/job/maintenance-disconnect-full-disks/
[21:19:43] dduvall and/or bd808, now that I have a local build of Horizon in a docker container... how do I get it into our registry so that puppet can "Exec[docker pull of docker-registry.wikimedia.org/wikimedia/openstack-horizon:2023-07-24-072400-dev for horizon]"?
[21:20:42] andrewbogott: typically we let CI do that. In your case that would be via the magic of https://gitlab.wikimedia.org/repos/releng/kokkuri
[21:21:09] ooh, documentation!
[21:21:59] Bad news, I can't tell what the verb is in the first sentence of the documentation. "Source files from this project into your own project's .gitlab-ci.yml file using GitLab CI includes."
[21:22:11] "source"
[21:22:15] huh
[21:22:26] as in "source a shell script"
[21:23:30] I guess I need to read more
[21:23:46] Striker uses the older pipelinelib process to build and publish via the config at https://github.com/wikimedia/labs-striker/blob/master/.pipeline/config.yaml.
Docs at https://wikitech.wikimedia.org/wiki/PipelineLib, but you probably only need to read up on kokkuri
[21:24:48] andrewbogott: looking at the CI config for scap may help -- https://gitlab.wikimedia.org/repos/releng/scap/-/blob/master/.gitlab-ci.yml#L31
[21:25:36] i'd start from the examples in the kokkuri readme, the scap one seems somewhat more complicated which is not needed here
[21:26:20] ^
[21:26:30] So, 'source files from this project into' could just be 'include files from this project in'
[21:26:49] although I suppose technically something slightly different is happening there
[21:26:59] andrewbogott: another requirement not mentioned in the docs is that your project will have to be added to https://gitlab.wikimedia.org/repos/releng/gitlab-trusted-runner in order to use the trusted runners
[21:27:27] only trusted runners are allowed to push to docker-registry.wikimedia.org
[21:27:48] good to know!
[21:29:04] So those examples will magically know to look for .pipeline/blubber.yaml?
[21:31:32] andrewbogott: that is the default `blubber.yaml` path, yes. you can see all the GitLab CI variables here https://gitlab.wikimedia.org/repos/releng/kokkuri/-/blob/main/includes/images.yaml#L18
[21:35:17] "Please provide the id of your project (can be found under Settings, General)" that's in gitlab someplace? I'm looking, looking for something called 'settings' there
[21:35:31] I'm sure it's staring me in the face
[21:36:26] oh, it was hidden because gitlab logged me out
[21:37:09] sneaky gitlab. I've had that confuse me for a bit before too
[21:51:05] bd808: does your docker container mount anything on the cloudweb host? I'm looking for where docker runtime config goes.
[21:54:56] andrewbogott: no, Striker does not mount any files from the host. I think you may need to add some new magic to service::docker to support it.
[21:55:27] profile::wmcs::striker::docker is the module that striker uses service::docker from
[21:55:32] ok
[21:59:37] andrewbogott: it looks like there is some default mounting magic in modules/service/templates/docker-service-shim.erb -- `-v <%= etc %>:/etc/<%= @title %>`
[22:00:42] That says approximately "mount $etc from the host to /etc/$title in the container"
[22:01:44] And $etc is defined as /etc/$title/ earlier in that same template
[22:02:45] You might want a way to change that dir name from $title to something else I guess?
[22:08:30] maybe although that is precariously close to what I need
[22:09:52] I'm being summoned to (weirdly early) dinner but that gives me something to work with later! Tomorrow I should be ready to switch this on in codfw1dev and watch it crash+burn :)
[22:10:41] * bd808 approves of the early bird special dinner time
[22:44:14] !log Changing index on the ce_question_aggregation DB table in beta wikishared # T342479
[22:44:16] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[22:44:16] T342479: Wrong uniqueness constraint on ce_question_aggregation - https://phabricator.wikimedia.org/T342479
[23:03:40] GitLab, Release-Engineering-Team, User-brennen: Enable GitLab's support for OAuth application integrations - https://phabricator.wikimedia.org/T341738 (brennen)
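bd808's reading of the shim template can be modelled in a few lines of Python. With `etc` set to `/etc/<title>/`, the `-v <%= etc %>:/etc/<%= @title %>` fragment mounts the host's `/etc/<title>/` at `/etc/<title>` inside the container. The `dir_name` parameter is a hypothetical knob for the "change that dir name from $title" idea, applied to both sides of the mount. The real template has no such option, and this is only a sketch of the mapping, not of the ERB itself.

```python
from typing import Optional


def etc_volume_args(title: str, dir_name: Optional[str] = None) -> list[str]:
    """Model the shim template's `-v <%= etc %>:/etc/<%= @title %>` mount.

    By default both sides use the service title, with etc = /etc/<title>/.
    dir_name is a HYPOTHETICAL override for that name; the real template
    (modules/service/templates/docker-service-shim.erb) has no such option.
    """
    name = dir_name or title
    etc = f"/etc/{name}/"                # host directory
    return ["-v", f"{etc}:/etc/{name}"]  # host:container, as `docker run -v` expects


if __name__ == "__main__":
    print(" ".join(etc_volume_args("horizon")))
```

For a service titled `horizon`, this yields `-v /etc/horizon/:/etc/horizon`. Anything puppet writes under `/etc/horizon/` on the host would therefore appear at the same path in the container.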