[15:07:17] gehel: we talked about that question at the checkin and i think you are far more likely to know than any of us. java IDEs are sort of a foreign country for me. :)
[15:09:13] brennen: IntelliJ is just as good for PHP, or JS, or Python :) (well, PHPStorm, WebStorm, PyCharm and the like).
[15:09:27] but ok, I'll try a few and see if I can find anything that works for me
[15:09:28] gehel: s/java//
[15:09:49] i live in vim so the whole landscape of IDEs is mostly pretty obscure to me.
[15:10:30] :) fair enough!
[15:22:51] if you locate something good specifically for gitlab, it'd be great to have documented on-wiki
[15:23:40] though thinking about it more i guess i'd expect that any decent git integration plugin should be fine here, but i'm guessing maybe you're looking for integration that exposes stuff like merge requests and CI jobs?
[17:26:18] darn. looks like there are still networking issues on trusted runners https://gitlab.wikimedia.org/repos/releng/gitlab-runner-test/-/jobs/21691 `Could not resolve host: gitlab.wikimedia.org`
[17:26:32] i'll reopen the task
[17:29:17] GitLab (CI & Job Runners), Patch-For-Review, Release-Engineering-Team (GitLab-a-thon 🦊), User-brennen: Deploy buildkitd to trusted GitLab runners - https://phabricator.wikimedia.org/T308271 (dduvall)
[17:29:26] GitLab (CI & Job Runners), serviceops-collab: DNS/networking not working on Trusted Runners - https://phabricator.wikimedia.org/T311241 (dduvall) Resolved→Open I ran into this issue again today while attempting to build an image via buildkitd. See https://gitlab.wikimedia.org/repos/releng/gitlab...
[17:29:52] dduvall: ah, I hope it's just because I restarted docker on one host but not all of them
[17:30:11] looking
[17:30:18] is there a way to route a pipeline job to a specific trusted runner?
[17:30:26] mutante: thx!
[17:32:25] I was wondering that too. so far I just found out if I click the rebuild button it uses the same one from last time
[17:34:52] arrg..
that's 2004
[17:35:04] that's the one where it worked :/
[17:36:18] I had even installed nslookup inside the container just to double confirm it.. sigh..why
[17:39:00] what does /etc/resolv.conf say within a container?
[17:39:48] dduvall: You could add a tag to a specific runner and then set your job to target runners w/ that tag.
[17:40:00] dancy: ah, good idea!
[17:41:57] well.. WTF
[17:42:06] the content of /etc/resolv.conf in the container.. CHANGED
[17:42:23] it should include `nameserver 127.0.0.11`
[17:42:24] but nothing in the firewall fix would have done that
[17:42:48] dduvall: it does now. it did not the other day when things worked
[17:43:05] 127.0.0.11 is correct when using a custom docker network
[17:43:22] so maybe our firewall rules need to be tweaked to work with that
[17:45:21] root@2cb8d178d4b2:/# host gitlab.wikimedia.org
[17:45:21] gitlab.wikimedia.org has address 208.80.154.145
[17:45:29] DNS lookup working inside the container
[17:46:05] via which nameserver though?
[17:46:27] for the docker network to function correctly, it needs to be able to resolve names via 127.0.0.11
[17:46:44] otherwise we won't be able to resolve the buildkitd container via a name
[17:46:59] docker's internal DNS is at 127.0.0.11
[17:47:41] (i'm open to scrapping the whole docker network thing if we can figure out another way to consistently address buildkitd, i just couldn't figure out another way)
[17:50:16] dig gitlab.wikimedia.org @127.0.0.11
[17:50:23] ;; ANSWER SECTION:
[17:50:23] gitlab.wikimedia.org. 300 IN A 208.80.154.145
[17:53:37] makes no sense yet that everything worked and then stopped working without touching it
[17:53:44] but of course I will look more soon
[17:53:58] and yea, works from 127.0.0.11 too
[17:54:56] Solar flares
[17:55:12] that's really weird
[17:55:28] so what made it work?
[17:56:05] firewall change and restarting docker to create a fresh container
[17:56:17] i have a feeling we might be messing with the wrong docker iptables chains.
the docs say to only use DOCKER-USER https://docs.docker.com/network/iptables/
[17:56:50] GitLab, Release-Engineering-Team: A new maintainer needs access to the generated-data-platform group - https://phabricator.wikimedia.org/T311657 (gmodena)
[17:57:15] ah, ok. i submitted https://gerrit.wikimedia.org/r/c/operations/puppet/+/809650/ but not sure if it's necessary now
[17:58:14] without being able to login and see all the chains, we're relying on your gracious help mutante :) thank you!
[18:58:37] GitLab, Release-Engineering-Team: A new maintainer needs access to the generated-data-platform group - https://phabricator.wikimedia.org/T311657 (XCollazo-WMF) Open→Resolved a:XCollazo-WMF Thanks for creating this @gmodena. @Eevans took care of it. Closing.
[20:01:08] dduvall: thanks for the code upload. just deployed it and restarted the job. "Job succeeded"
[20:01:23] just.. that I saw this before, heh
[20:01:57] I also had to restart docker itself
[20:20:19] GitLab (CI & Job Runners), serviceops-collab, Patch-For-Review: DNS/networking not working on Trusted Runners - https://phabricator.wikimedia.org/T311241 (Dzahn) deployed the latest change, restarted docker on gitlab-runner2004, restarted the job and got a "Job succeeded". (just that it also succeed...
[20:41:26] mutante: np. thanks for the deployment!
[20:42:00] now i'm getting `The "docker-registry.wikimedia.org/dev/buster:1.0.1" image is not present on list of allowed images` (i need dev/buster for curl to install buildctl temporarily)
[20:42:15] brennen: looks like the allow pattern still isn't sorted?
[20:43:11] i'm pretty sure we determined in our testing that it needs to be `docker-registry.wikimedia.org/**/*`
[20:45:50] I restarted docker on all runners via cumin, fwiw.
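A rough sketch of why the `docker-registry.wikimedia.org/**/*` allow pattern mentioned above would be expected to cover `dev/buster:1.0.1`. It assumes doublestar-like glob semantics (`**/` spans zero or more path segments, `*` stays within one segment); that is an assumption about how the runner matches `allowed_images`, not something confirmed in this log, and `glob_to_regex` / `image_allowed` are hypothetical helper names.

```python
import re

def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Approximate a doublestar-like glob as a regex (assumed semantics)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:[^/]+/)*")  # zero or more whole path segments
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")           # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")        # anything within one segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))

def image_allowed(image: str, allowed: list) -> bool:
    """True if the image name fully matches at least one allow pattern."""
    return any(glob_to_regex(p).fullmatch(image) for p in allowed)

allowed = ["docker-registry.wikimedia.org/**/*", "debian:*"]
print(image_allowed("docker-registry.wikimedia.org/dev/buster:1.0.1", allowed))
print(image_allowed("example.org/dev/buster:1.0.1", allowed))
```

If the pattern itself matches, as this suggests, the remaining suspect is that the running runner is not using the list that contains it.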
[20:45:56] yeah, wtf https://gerrit.wikimedia.org/r/c/operations/puppet/+/805247/2/modules/gitlab_runner/templates/config-template.toml.erb was merged and deployed but i'm seeing
[20:46:01] and that change was merged to make the pattern **/*
[20:46:14] https://www.irccloud.com/pastebin/Fy6jp89T/
[20:46:21] :/
[20:46:44] * dduvall head-desks
[20:49:28] mutante: can you verify the trusted_images runner config on gitlab-runner2004.codfw.wmnet?
[20:49:54] er, `allowed_images`
[20:51:41] the image on the registry: https://docker-registry.wikimedia.org/dev/buster/tags/
[20:51:44] config on runner:
[20:51:45] allowed_images = [
[20:51:45] # Everything in Wikimedia registry:
[20:51:45] "docker-registry.wikimedia.org/**/*",
[20:51:45] "docker-registry.discovery.wmnet/**/*",
[20:52:12] let me try something and manually add a line
[20:52:15] while puppet is disabled
[20:52:18] is it possible the runner wasn't restarted after the config change?
[20:53:46] yea, that's possible
[20:53:47] Active: active (running) since Mon 2022-06-13 22:31:42 UTC; 2 weeks 1 days ago
[20:54:04] that's before the merge.. restarting now
[20:54:15] doing all of them right away
[20:54:44] fantastic. thx!
[20:55:03] now to figure out what needs to change in puppet to notify the service...
[20:55:27] first just try if the build works now
[20:55:33] * dduvall does
[20:55:49] looks at puppet code meanwhile
[20:56:24] huh. still failed with the same error
[20:56:29] ran on gitlab-runner2004.codfw.wmnet
[20:56:43] GitLab, Release-Engineering-Team: Add XCollazo-WMF maintainer to generated-data-platform GitLab group - https://phabricator.wikimedia.org/T311657 (Aklapper)
[20:56:47] what. is. happening.
[20:57:01] dduvall: do it again. a new container now
[20:57:46] failed
[20:59:05] try stop/start maybe?
[20:59:20] I notice there are 2 files. config.toml and config-template.toml
[20:59:32] gotta see how those relate to each other
[21:00:12] stopped and started. done
[21:00:47] failed again.
yeah, maybe something up with the template
[21:01:46] a comment in `gitlab_runner::config` says the template is used during registration
[21:02:05] but i wonder if it is or isn't re-read upon service restart
[21:02:35] dduvall: I temp. added "docker-registry.wikimedia.org/dev/buster:1.0.1" to the list directly
[21:02:38] as a test
[21:02:55] like literally the full name without wildcards
[21:03:01] and stopped/started
[21:04:27] i still get a failure and that image is not shown as an allowed_image in the error message
[21:06:07] this is very odd
[21:06:23] is it possible there's some other configuration somewhere overriding the runner config?
[21:06:54] figuring out how to delete it and make it register again
[21:07:13] because i don't see other recent additions to that list either, i.e. the `docker-registry.discovery.wmnet/**/*` entry jelto added
[21:07:42] the register command runs unless => "/usr/bin/gitlab-runner list 2>&1 | /bin/grep -q '^${runner_name}'",
[21:09:02] /usr/bin/gitlab-runner list does not show the runner_name
[21:09:11] which would mean it would have to register
[21:09:50] that's not good
[21:10:21] I would like it more if it did list the runner name
[21:10:33] because then I could delete it in web UI
[21:10:37] and just run puppet
[21:10:39] i'm pretty sure it used it
[21:10:41] to get it to re-register
[21:10:42] used *to*
[21:10:57] /usr/bin/gitlab-runner list
[21:10:57] Runtime platform arch=amd64 os=linux pid=2637292 revision=f761588f version=14.10.1
[21:11:00] Listing configured runners ConfigFile=/etc/gitlab-runner/config.toml
[21:11:07] unless it's the wrong user...
[21:11:21] oh.
wait, found "unregister" command
[21:11:27] let me try that
[21:12:04] FATAL: could not find a runner with the name 'gitlab-runner2004.codfw.wmnet'
[21:12:41] while it is in the list in the web UI as a runner ready to accept jobs
[21:12:59] so, as i understand https://docs.gitlab.com/runner/register/#example, the config template should only be read at registration and the actual config should have everything from the template in it after registration
[21:13:22] so the way the config template and config are puppetized will probably need to change
[21:13:58] i.e. we can't make subsequent changes to the template and have them automatically propagate to the runner config
[21:14:07] _after_ registration
[21:14:36] re: "could not find a runner" :(
[21:14:39] dduvall: one mystery solved
[21:14:47] sudo -u gitlab-runner /usr/bin/gitlab-runner list
[21:14:57] gitlab-runner2004.codfw.wmnet
[21:15:03] ah
[21:15:24] this is different because it's a protected runner
[21:15:27] not running as root
[21:15:38] gotta do the commands as that same user
[21:15:53] Unregistering runner from GitLab succeeded
[21:15:54] before you re-register, would you mind just diffing the config against the template?
[21:15:57] Updated /home/gitlab-runner/.gitlab-runner/config.toml
[21:15:58] oh :)
[21:16:01] oh. in the home
[21:16:04] not in /etc
[21:16:06] interesting
[21:17:24] we have a config and a template in /etc
[21:17:31] and just a config in /home
[21:17:45] diff between template in /etc/ and config in home is ... large
[21:17:49] and does the config under /home now have everything from the template in /etc?
[21:18:11] no, but we have not re-registered yet
[21:18:17] k
[21:18:53] re-enabling puppet to let puppet do it
[21:19:04] this config template pattern that the gitlab folks have created seems...
bad
[21:19:07] because it should now
[21:19:33] Notice: /Stage[main]/Profile::Gitlab::Runner/Exec[gitlab-register-runner]/returns: executed successfully (corrective)
[21:20:01] allowed_images = ["docker-registry.wikimedia.org/**/*", "docker-registry.discovery.wmnet/**/*", "centos/*:*", "debian:*", "fedora:*", "opensuse/*:*", "ubuntu:*", "python:*", "ruby:*", "rust:*", "rustlang/rust:nightly", "registry.gitlab.com/gitlab-org/**/*"]
[21:20:14] :)
[21:20:22] dduvall: ^ this is now straight from the config.toml in the user home
[21:20:35] k. i'll retry the job
[21:21:27] hmm "This job is stuck because you don't have any active runners online or available with any of these tags assigned to them: protected"
[21:23:39] it's tagged with protected where I see it
[21:23:59] seems like it got confused because it was unregistered
[21:24:00] same
[21:24:19] can you start a new job vs clicking the rebuild button?
[21:24:33] i can try that
[21:24:46] but i don't see it listed as "available" under https://gitlab.wikimedia.org/repos/releng/gitlab-runner-test/-/settings/ci_cd#js-runners-settings
[21:25:02] i do see it as available under repos/releng ci settings
[21:25:23] i see it here https://gitlab.wikimedia.org/groups/repos/releng/-/runners
[21:28:14] so i restarted the pipeline, and it still got stuck on trusted-buildkit-job https://gitlab.wikimedia.org/repos/releng/gitlab-runner-test/-/pipelines/5265
[21:28:27] but the `trusted-build-job` job ran
[21:28:36] which has the same tags...
[21:29:00] same tags, same rules
[21:44:10] dduvall: sorry but real life events at the office and the part that my car is in valet.. means I can't continue right now. actually at the office event
[21:44:21] almost forgot to get my car out in time..
[21:44:32] oh no! glad you got it
[21:44:39] no worries.
i'm fried for today
[21:44:49] maybe i should have gone to the event :)
[22:50:02] GitLab (CI & Job Runners), Security Team AppSec, Security-Team, SecTeam-Processed, and 2 others: Re-implement semgrep ci includes - https://phabricator.wikimedia.org/T307962 (sbassett) **fake-gitlab-bot:** - Initial commit for repo, mostly app structure - https://gitlab.wikimedia.org/repos/securi...
[23:01:43] (this is me noting that i will attempt to process the scrollback in here when my brain's working again)