[00:04:36] GitLab, Patch-For-Review, Release-Engineering-Team (GitLab-a-thon 🦊), User-brennen: Replicate select published images from GitLab container registry to WMF prod registry - https://phabricator.wikimedia.org/T308080 (brennen)
[00:13:30] GitLab (Infrastructure), serviceops, Patch-For-Review, Release-Engineering-Team (GitLab-a-thon 🦊), cloud-services-team (Kanban): Assess GitLab-provided docker container registry as a default for docker-in-docker build processes - https://phabricator.wikimedia.org/T307537 (brennen) > For "dock...
[01:50:29] GitLab: gitlab-restore: version detection fail / restore fail - https://phabricator.wikimedia.org/T308089 (Dzahn)
[01:51:56] GitLab: gitlab-restore: version detection fail / restore fail - https://phabricator.wikimedia.org/T308089 (Dzahn)
[08:03:52] GitLab, Release-Engineering-Team, wikimedia.biterg.io, User-AKlapper: How to identify affiliation of indexed GitLab accounts - https://phabricator.wikimedia.org/T306770 (Aklapper) Would love to have input from folks who better know GitLab here. Currently in Bitergia's Hatstall I only see: `Name...
[13:18:35] GitLab, Patch-For-Review: gitlab-restore: version detection fail / restore fail - https://phabricator.wikimedia.org/T308089 (Jelto) Open→Resolved a:Jelto Thanks for opening the task! I can confirm, puma was stuck and can't be stopped which blocked the restore. There were multiple issues: A...
[14:10:30] GitLab (Project Migration), Release-Engineering-Team (Doing), User-brennen, User-dduvall: Write a GitLab "Migrating a Project" runbook / manual based on Blubber migration - https://phabricator.wikimedia.org/T307538 (hashar)
[14:10:36] GitLab (CI & Job Runners), Release-Engineering-Team (Doing): Investigate alternatives to docker-in-docker for container image creation in GitLab - https://phabricator.wikimedia.org/T307599 (hashar)
[14:10:42] GitLab, Release-Engineering-Team (Doing): Investigate buildkitd instances as image builders for GitLab - https://phabricator.wikimedia.org/T307810 (hashar)
[14:20:30] oh f** I messed up the projects
[14:21:01] GitLab, Release-Engineering-Team (GitLab-a-thon 🦊): Investigate buildkitd instances as image builders for GitLab - https://phabricator.wikimedia.org/T307810 (hashar)
[14:21:12] GitLab (CI & Job Runners), Release-Engineering-Team (GitLab-a-thon 🦊): Investigate alternatives to docker-in-docker for container image creation in GitLab - https://phabricator.wikimedia.org/T307599 (hashar)
[14:21:22] GitLab (Project Migration), Release-Engineering-Team (GitLab-a-thon 🦊), User-brennen, User-dduvall: Write a GitLab "Migrating a Project" runbook / manual based on Blubber migration - https://phabricator.wikimedia.org/T307538 (hashar)
[14:21:50] Because git-lab-a-thon is the first column on the dashboard https://phabricator.wikimedia.org/tag/release-engineering-team/
[14:22:06] and I automatically assumed it was the INBOX one being the first ...
[14:48:22] sorry for the mess
[15:13:55] GitLab, Release-Engineering-Team, wikimedia.biterg.io, User-AKlapper: How to identify affiliation of indexed GitLab accounts - https://phabricator.wikimedia.org/T306770 (brennen) My first thought here is that username is always LDAP / CAS `uid`, which maps both to email and to groups that might i...
[15:14:22] GitLab (Misc), Release-Engineering-Team, wikimedia.biterg.io, User-AKlapper: How to identify affiliation of indexed GitLab accounts - https://phabricator.wikimedia.org/T306770 (brennen)
[15:24:50] brennen: re: https://gerrit.wikimedia.org/r/c/operations/puppet/+/790778 how difficult would it be to get an additional volume/partition for the image store?
[15:25:29] (we just had a meeting with _joe_ and the lack of quotas enforced for the gitlab registry was a concern for him)
[15:25:33] i think that might be a good question for mutante
[15:26:11] but when we discussed expanding the filesystem earlier, consensus was it would be better to wait for physical hardware than hack extra space into the existing setup
[15:26:12] using a separate partition would at least prevent intentional or inadvertent DoS
[15:26:38] (this came up in the context of us originally wanting to mirror a bunch of gerrit repos for this sprint)
[15:26:50] hmm, what about ext4 quotas i wonder
[15:27:04] ah, i see
[15:27:21] hmm, yeah, i wonder if we could just sort of limit things at the filesystem level
[15:27:58] Quotas are per-user or per-group.. they can't be applied on a single directory as far as I understand.
[15:28:44] really... that's very limited
[15:29:03] hmm.. but I see something about "projects" in the quota docs.. that's new to me...
[15:29:04] * dancy keeps reading
[15:29:30] unless the registry runs as a separate user or group
[15:31:01] digital ocean also has a container registry service...
[15:32:55] we could instead use that and try https://docs.gitlab.com/ee/administration/packages/container_registry.html#use-an-external-container-registry-with-gitlab-as-an-auth-endpoint
[15:33:58] i'm very unsure how complicated that gets
[15:34:28] *sigh*
[15:34:33] maybe we need to take a step back here
[15:34:39] yeah, maybe.
[15:35:28] so, we need to get a built image into the registry. what exactly is doing the pushing?
[15:35:44] if we go with kaniko, it's the user provided pipeline process
[15:36:02] if we go with buildkitd, it's the buildkitd daemon (isolated from the runner and pipeline)
[15:36:32] we definitely can't expose the current wmf prod cred to pipeline processes. that's a non-starter
[15:36:57] can we expose it to the buildkitd instances though?
[15:37:18] I think so.
[15:37:29] if they live on a 3rd party cloud?
[15:37:32] is that crossing a line?
[15:38:06] imo, a 3rd party cloud is not inherently less secure than jenkins :)
[15:38:31] so my vote would be that it's ok
[15:39:57] however, if it's not ok. plan b(1) could be: set up a registry on DO k8s and have buildkitd auth and push to that, then replicate to wmf prod elsewhere
[15:40:03] what if we had them push to a DO container registry and went ahead with the shim on contint for publishing them, on the theory that this is similar enough to the gitlab registry we might enable later?
[15:40:04] yeah
[15:40:19] (/me butts in with: the current credential process is silly simple: only root can access the creds, and there's a sudoer rule used by a builder process to run a script that does the pushing which reads those credentials, so the builder process doesn't have access to the creds)
[15:40:49] right
[15:41:11] (/me also butts in with: let's ask ServiceOps folks what they think in standup)
[15:41:17] the builder process in jenkins is congruent with the pipeline process in gitlab i think
[15:41:48] in the scenario we're talking about with buildkitd, the "builder" (pipeline) process still has no access to the credential
[15:41:51] only buildkitd does
[15:44:29] got it, only providing historical context in case folks weren't aware of the current model---entirely unix permissions.
[15:45:02] * dduvall nods
[15:45:05] the other question about pushing from a third party that serviceopsen folks here would be able to answer that I don't have any mental model of is network access
[15:46:28] oh interesting.
does the registry currently only allow pushes from wmnet. ?
[15:47:06] another q for mutante or jelto when they have a chance
[15:48:30] i have a vague memory of this being the case, but I don't know if it's still valid or accurate
[15:48:33] Pushes are limited to addresses listed in profile::docker_registry_ha::registry::image_builders
[15:48:50] was just looking at that
[15:49:37] My favorite comment from operations/puppet/modules/docker_registry_ha/templates/registry-nginx.conf.erb: `# Fuck you docker.`
[15:49:52] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/hieradata/role/common/docker_registry_ha/registry.yaml
[15:49:53] haha
[15:51:35] relatable.
[16:04:16] looks relevant :)
[16:19:51] we're discussing registry stuff etc in https://meet.google.com/eve-esmx-wea if anyone's interested.
[16:57:26] i keep typing "registy". wonder if my r key is sticking...
[17:02:01] https://www.irccloud.com/pastebin/2Xx1i2TC/
[17:02:34] guess I will look at Kaniko / https://gitlab.wikimedia.org/jnuche/kaniko-poc ;)
[17:02:41] gotta learn what it is about
[17:03:15] TL;DR rootless and daemonless container building.
[18:06:06] Taking a break
[18:43:12] hmm - anybody know offhand why i'd get a "no route to host" for `ssh brennen@gitlab-prod-1001.devtools.eqiad1.wikimedia.cloud`?
[18:43:29] the host seems up, i'm a projectadmin, and i can get to other instances like phabricator-prod-1001.
[18:44:47] brennen: yea, click "soft reboot" in Horizon and it should be back in 2 minutes or so
[18:45:01] mutante: thx
[18:45:13] a couple of us ran into this. that's how I know
[18:45:17] doesnt mean I know why that happens
[19:03:26] dancy: nice, but does that mean we'd need a docker runner that's `seccomp:unconfined` and `apparmor:unconfined` for it to work in gitlab
[19:03:27] ?
[19:04:23] e.g.
https://gitlab.wikimedia.org/repos/releng/blubber/-/blob/1f949df5956ac28deba3505001814abdceaad91a/.gitlab-ci.yml doesn't seem to work
[19:05:48] https://gitlab.wikimedia.org/repos/releng/blubber/-/jobs/17419#L24
[19:11:28] hmm.. yeah that makes sense. I tested w/ a normal docker daemon.
[19:32:44] I confirmed that kaniko cannot process the #syntax line from blubber's .pipeline/blubber.yaml.
[19:40:18] ok
[19:40:34] i'm going to look at the feature set that kaniko currently supports and see what's lacking
[19:41:06] but i think at this point i'm leaning towards a proposal of using buildctl + isolated rootless buildkitd
[19:41:22] Nod.. we know that definitely works.
[19:41:58] do we know where we'd like to provision buildkitd as a part of the general ci infra?
[19:42:23] e.g. can we add it to the aforementioned terraform stuff for secure runners?
[19:45:29] brennen: do you happen to know where ^ lives?
[19:46:32] oh, i see https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner
[19:46:47] https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/tree/main/terraform
[19:48:12] i think that gitlab-cloud-runner is intended for _un_trusted runners?
[19:48:34] hmm, ok
[19:48:49] well, we still need to build images there i think, but perhaps not publish them
[19:49:05] so where does the config live for trusted runners?
[19:49:09] or do we not have that yet?
[19:50:03] I'm trying to find something running that fits the description of a trusted runner but I don't see any
[19:50:31] i do believe they're running - https://wikitech.wikimedia.org/wiki/GitLab/Gitlab_Runner/Trusted_Runners
[19:50:36] `mv untrusted trusted` :)
[19:50:38] gitlab-runner1001 & 1002
[19:51:02] ah, ok. so puppet i'm guessing
[19:51:12] yeah
[19:51:21] `::profile::gitlab::runner`
[19:52:11] this is where it's unfortunate that this morning was too meeting-saturated to drag j.elto into that breakout...
[19:52:13] btw this is what I used: https://gitlab.wikimedia.org/admin/runners?search=trusted Perhaps we should add a tag or some other searchable parameter
[19:52:45] so if we're going with buildkitd then the possible next step might be to add buildkitd to puppet and get some more ganeti hosts?
[19:53:06] brennen: yeah, that'd be nice to have some sre input here
[19:53:13] https://gitlab.wikimedia.org/admin/runners?tag[]=protected
[19:53:14] i know everyone is swamped with meetings today
[19:53:25] protected & physical seems like the current set, i think
[19:53:31] but yeah, a specific tag would be a good idea
[19:53:41] i think
[19:54:23] yeah, i think adding buildkitd to puppet probably lines up with what j.elto's been thinking about those being the place for these builds to happen
[19:54:35] or re: next step for buildkitd, do we set up buildkitd on another DO kubernetes cluster and be ok with those being used by both trusted and untrusted runners?
[19:54:41] ok
[19:54:51] it would be much nicer to admin on k8s
[19:55:09] yeah. :\
[19:55:39] well, hmm, why don't we target wmf prod k8s?
[19:56:02] or will that start an uprising?
[19:56:15] (since it's ci related)
[19:56:55] (giant cyclical conversations of years long frequency)
[19:57:00] hehe
[19:57:53] speaking of puppet, does anybody have a moment to help me figure out https://gerrit.wikimedia.org/r/c/operations/puppet/+/790778 ?
[19:58:06] i'm getting:
[19:58:09] > Error while evaluating a Resource Statement, Class[Gitlab]: has no parameter named 'registry_enabled' (file: /etc/puppet/modules/profile/manifests/gitlab.pp, line: 173, column: 5) on node gitlab-prod-1001.devtools.eqiad1.wikimedia.cloud
[19:58:16] sure. spoffice?
[19:58:21] yup, there in two shakes
[19:58:42] gah, actually, i've got a meeting
[19:58:53] * dancy shakes a fist
[19:59:01] i'll try to make it like 10 min. :)
[20:01:06] alrighty.
i'm all by my lonesome
[20:01:22] but contented
[20:17:56] (sorry, 1-on-1s running long because I'm involved :))
[21:21:35] If y'all are still looking for a registry for prototyping, WMCS ended up using quay.io for some things after Docker went pay to play.
[21:23:05] on the gitlab side of things, we got the one on gitlab.devtools.wmcloud.org turned on
[21:23:10] (just now)
[21:25:14] dduvall: Ping on https://gitlab.wikimedia.org/dduvall/gitlab-buildkitd-eval/-/merge_requests/1
[21:26:32] merged :)
[21:26:41] thx
[21:44:39] dancy: since we're waiting for the gitlab container registry, maybe i'll work on packaging up a "standard" image build helper
[21:44:56] OK.
[21:48:02] in addition to abstracting the build command, it could also include an after_script section that notifies the replicator job, if polling isn't enough on its own
[21:48:24] notifies in the case where an image is pushed, that is
[21:48:47] nod.. that'll be ncie.
[21:48:49] *nice.
[21:57:32] It looks like Gitlab's registry will not provide a list of tags for an image w/o prior authentication. Pulling an image by a known tag/digest works though.
[21:57:37] dancy: you're probably using something nicer than bash to do the polling but it looks like the container registry uses bearer token auth
[21:57:41] :)
[21:57:54] https://www.irccloud.com/pastebin/Lxy7M2p9/
[21:58:38] interesting that pulling by known tag works
[21:58:44] i guess that's fine
[21:58:54] nod.. it's fine if we can get notifications of pushes.
[21:59:01] won't work for polling
[22:00:15] the bearer token retrieval above is an additional auth step but it could work if we can bind proper credentials for the jenkins job
[22:00:33] Nod. Probably the best bet
[22:01:18] So we'll need a gitlab account for that service user.
[22:25:44] i vaguely remember gitlab having some concept of bot users, but that might be a purely internal thing.
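(Editor's note on the bearer token step discussed at 21:57-22:01: a rough Python sketch of how a poller might list tags once it has credentials. The pastebin above is not reproduced in this log, so this follows the general Docker Registry v2 token flow rather than whatever was actually pasted. The registry host, repository path, and service account credentials are placeholders, and `parse_bearer_challenge`, `token_url`, `fetch_json`, and `list_tags` are hypothetical helper names, not part of any existing tool.)

```python
import base64
import json
import re
import urllib.error
import urllib.parse
import urllib.request


def parse_bearer_challenge(header):
    """Split a `WWW-Authenticate: Bearer ...` header into its key="value" parameters."""
    scheme, _, params = header.partition(" ")
    if scheme.lower() != "bearer":
        raise ValueError(f"not a Bearer challenge: {header!r}")
    return dict(re.findall(r'(\w+)="([^"]*)"', params))


def token_url(challenge, scope):
    """Build the token endpoint URL from the realm and service named in the challenge."""
    query = urllib.parse.urlencode({"service": challenge["service"], "scope": scope})
    return f"{challenge['realm']}?{query}"


def fetch_json(url, authorization=None):
    """GET a URL and decode its JSON body, optionally sending an Authorization header."""
    request = urllib.request.Request(url)
    if authorization:
        request.add_header("Authorization", authorization)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def list_tags(registry, repository, username, password):
    """List tags for a repository, answering a 401 Bearer challenge if one comes back."""
    tags_url = f"https://{registry}/v2/{repository}/tags/list"
    try:
        # Anonymous attempt; this only succeeds if the registry allows unauthenticated listing.
        return fetch_json(tags_url).get("tags") or []
    except urllib.error.HTTPError as err:
        if err.code != 401:
            raise
        challenge = parse_bearer_challenge(err.headers.get("WWW-Authenticate", ""))
    # Assumption: fall back to a pull scope if the challenge does not name one.
    scope = challenge.get("scope", f"repository:{repository}:pull")
    basic = base64.b64encode(f"{username}:{password}".encode()).decode()
    token_body = fetch_json(token_url(challenge, scope), f"Basic {basic}")
    token = token_body.get("token") or token_body.get("access_token")
    return fetch_json(tags_url, f"Bearer {token}").get("tags") or []
```

(A polling job would call something like `list_tags("registry.example.org", "group/project", user, password)` with the service user's credentials and compare the result against the tags it has already replicated. Reading the realm and service from the challenge, instead of hard-coding them, keeps the sketch independent of how a given GitLab instance is configured.)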
[22:29:54] GitLab (Project Migration), Release-Engineering-Team (GitLab-a-thon 🦊), User-dduvall: Create Blubber repo on GitLab, archive Gerrit repo - https://phabricator.wikimedia.org/T307533 (jeena)
[22:30:03] GitLab (Project Migration), Release-Engineering-Team (GitLab-a-thon 🦊), User-dduvall: Implement linting and unit tests for Blubber on GitLab CI - https://phabricator.wikimedia.org/T307534 (jeena) Open→In progress https://gitlab.wikimedia.org/repos/releng/blubber/-/merge_requests/2
[22:48:30] we should come up with a good standard for modeling "more than one reviewer on a thing"
[23:31:03] GitLab (Infrastructure), serviceops, Patch-For-Review, Release-Engineering-Team (GitLab-a-thon 🦊), cloud-services-team (Kanban): Assess GitLab Container Registry as a default for container build processes - https://phabricator.wikimedia.org/T307537 (brennen)
[23:35:07] GitLab (Infrastructure), serviceops, Patch-For-Review, Release-Engineering-Team (GitLab-a-thon 🦊), cloud-services-team (Kanban): Assess GitLab Container Registry as a default for container build processes - https://phabricator.wikimedia.org/T307537 (brennen) In progress→Stalled We ena...
[23:35:15] GitLab (Project Migration), Release-Engineering-Team (GitLab-a-thon 🦊): Build Blubber images on GitLab - https://phabricator.wikimedia.org/T307536 (brennen)
[23:35:23] GitLab (Administration, Settings & Policy), Release-Engineering-Team (GitLab-a-thon 🦊), cloud-services-team (Kanban): gitlab: consider enabling docker container registry - https://phabricator.wikimedia.org/T304845 (brennen)
[23:56:25] GitLab (CI & Job Runners), Release-Engineering-Team (GitLab-a-thon 🦊): Investigate alternatives to docker-in-docker for container image creation in GitLab - https://phabricator.wikimedia.org/T307599 (brennen) To capture some earlier discussion with @dduvall and @dancy, I think we've pretty much landed on...