[04:41:59] I have a Kubernetes job that is writing "Killed" to stderr and then dying
[04:42:17] It doesn't appear to be using exceptional amounts of memory or anything
[04:42:20] How can I debug this?
[04:42:28] (This is on toolforge btw)
[04:55:27] !help
[04:55:27] If you don't get a response in 15-30 minutes, please email the cloud@ mailing list -- https://wikitech.wikimedia.org/wiki/Help:Cloud_Services_communication
[05:19:29] I guess everyone is asleep :(
[05:19:37] No worries, I'll try again later.
[07:11:40] tto: you can check the tool namespace graphs (https://grafana-rw.wmcloud.org/d/TJuKfnt4z/kubernetes-namespace?orgId=1), or the kubernetes events (advanced, `kubectl get events -w` will "watch" the events while they happen)
[07:20:22] derenrich: yes please, ask for more space if you need it, though changing the default right now might be a bit premature (https://grafana.wmcloud.org/d/m9V1RQs4k/harbor-overview?orgId=1 shows only 1 project over 90%). If you are curious :), our current infra has the drive replicated 3 times and distributed on a ~40-node storage cluster, so 45TB right off the bat might get us too close to cluster full
[07:20:22] (https://grafana.wikimedia.org/d/DpbFWWCGk/wmcs-ceph-eqiad-capacity?orgId=1)
[08:36:52] !log tools.admin configure `health-check-path: /healthz` in service.template T365562
[08:36:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.admin/SAL
[08:36:56] T365562: toolforge: admin tool /healthz returns 503 from time to time - https://phabricator.wikimedia.org/T365562
[12:59:02] Hi, I want to add a private secret to a standalone puppetmaster in the gitlab-runners wmcs project. But when I try to commit I get:
[12:59:02] gitlab-runners-puppetserver-01:/srv/git/labs/private$ sudo -u gitpuppet git commit
[12:59:02] Local commits are not allowed in this repository. Please go to frontend puppetmaster and commit
[12:59:02] How do I add secrets properly? In the git log I can see that this has happened before as well
[13:02:09] jelto: the hook that prevents local commits should not be there, you can safely remove it (and Puppet should not add it back)
[13:07:06] great, thanks!
[13:48:01] I have another question. Is it possible to add private hiera config for a single wmcs host only? I tried to add a unique profile::gitlab::runner::token for runner-1030 on the gitlab-runners puppetserver (gitlab-runners-puppetserver-01) but the host is not picking up the new secret.
[13:48:01] I tried putting the token in /srv/git/labs/private/hieradata/labs/gitlab-runners/hosts/runner-1030.gitlab-runners.eqiad1.wikimedia.cloud.yaml and /srv/git/labs/private/hieradata/hosts/runner-1030.gitlab-runners.eqiad1.wikimedia.cloud.yaml
[13:48:01] Is this possible, and what hiera location should I use?
[13:54:11] jelto: I think that should be the project-local puppetmaster
[13:55:07] in particular, a local patch for the host-level override in the labs/private copy of the project-local puppetmaster
[13:55:33] yes, I'm doing the changes on gitlab-runners-puppetserver-01.gitlab-runners.eqiad1.wikimedia.cloud in the directories mentioned above (/srv/git/labs/private)
[13:56:24] why does the file in hosts/ have the full fqdn? is that expected? I would have tried with the short hostname first
[13:57:00] i.e. `/srv/git/labs/private/hieradata/hosts/runner-1030.yaml`
[13:57:23] ah good catch, I'll try that
[13:58:17] have you looked at /etc/puppet/hiera.yaml to see which paths should work in theory?
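(Aside: a quick way to confirm which hierarchy levels a key is actually resolved through on a project-local puppetserver. The key and node names are just the ones from this conversation; `puppet lookup` is standard Puppet tooling, but treat the exact invocation as a sketch and adjust it for your own case.)

    # Show the hierarchy the puppetserver consults (paths are relative to each datadir)
    cat /etc/puppet/hiera.yaml

    # Ask puppet where the key would resolve for a given node; --explain lists
    # every hierarchy level that was checked and which one (if any) matched
    sudo puppet lookup profile::gitlab::runner::token \
      --node runner-1030.gitlab-runners.eqiad1.wikimedia.cloud --explain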
[14:03:02] adding the token in /srv/git/labs/private/hieradata/hosts/runner-1030.yaml also doesn't work.
[14:03:02] And /etc/puppet/hiera.yaml doesn't mention any host-specific private hiera files, only the common.yaml files
[14:04:56] so that's probably the cause
[14:11:38] can I just add my own config there, or is there a standardized place to put the host secrets? something like:
[14:11:38] - name: Private host hierarchy
[14:11:38]   path: hosts/%{facts.networking.hostname}.yaml
[14:11:38]   datadir: "/etc/puppet/private/hieradata"
[14:15:36] yeah, something like that
[14:15:53] but I honestly don't know why that is not in the main repo, configured for all local puppetmasters
[16:01:55] yeah, for whatever reason i'm getting linker errors when trying to execute ffmpeg after using `Aptfile` to install ffmpeg. i'm guessing something isn't quite right in the Aptfile buildpack. might need to investigate further
[16:04:29] ouch… which tool name is this in?
[16:15:35] okay, I can reproduce it in `docker run --rm -it --entrypoint=bash tools-harbor.wmcloud.org/tool-video-answer-tool/tool-video-answer-tool:latest`
[16:16:25] hm, the `$LD_LIBRARY_PATH` looks suspicious
[16:16:41] ffmpeg isn’t finding `libpulsecommon-15.99.so`, and it does exist in `/layers/fagiani_apt/apt/usr/lib/x86_64-linux-gnu/pulseaudio/libpulsecommon-15.99.so`
[16:16:44] derenrich: yep, it's quite possible, as it does not really install the package, it just extracts it, so any path mangling/symlinks done by post-scripts might not be ok
[16:16:57] and the LD_LIBRARY_PATH includes
[16:16:58] :/layers/fagiani_apt/apt/usr/lib/x86_64-linux-gnu/pulseaudio/layers/fagiani_apt/apt/usr/lib/i386-linux-gnu/layers/fagiani_apt/apt/usr/lib:
[16:17:04] that feels like there’s a colon or two missing in there
[16:17:10] between “pulseaudio” and the next “/layers”
[16:17:26] yep, that feels like a missing colon
[16:17:43] which would be a bug in the apt buildpack, I guess?
[16:17:45] might be easy to fix
[16:17:52] hopefully ^^
[16:24:01] derenrich: can you try rebuilding again? I just sent a fix
[16:24:19] (in a meeting, will do in a sec)
[16:24:24] (thx)
[16:25:08] np, feel free to open a bug if it did not fix it (I'm going out in a bit, so might not be around later)
[16:54:01] any recent changes to the way sudo is configured on cloud VPS?
[16:54:09] dcaro: nice, thanks! (fix looks sensible to me ^^)
[16:54:15] sudo: PAM account management error: Unknown error -1
[16:54:27] PAM and Unknown error sounds .. interesting
[16:55:37] dcaro: yeah, still not working. wrote up a ticket https://phabricator.wikimedia.org/T365633
[16:59:21] mutante: not that i'm aware of, where are you seeing that?
[17:01:01] taavi: gitlab-prod-1002.devtools - just tested on an unrelated VPS in another project (wikistats) and it doesn't happen there. I guess we must have misconfigured something somehow. I have never seen it before though.
[17:01:25] currently I am working around it by using actual root
[17:01:46] May 22 16:53:30 gitlab-prod-1002 sudo: pam_sss(sudo:account): Access denied for user dzahn: 4 (System error)
[17:03:00] the interesting part seems to be that it's "system" and "unknown" and not just denied
[17:03:55] maybe it could be somehow related to this host using the local project puppetmaster
[17:04:04] didn't get to look more yet
[17:04:10] sssd-sudo.service is logging some weird errors
[17:09:25] currently upgrading gitlab-ce there, as root
[17:12:03] taavi: I found this.. cat /etc/sudoers.d/T205463-disable-sudo-password-prompts
[17:12:40] 2018 ticket? heh https://phabricator.wikimedia.org/T205463
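(Aside: none of the following is the eventual resolution of T365633; it is only the kind of generic first checks that apply when sudo fails with a pam_sss "System error", assuming a stock sssd setup on the VM. Run them as root, since sudo itself is what is broken.)

    # Check the sudoers drop-in found above for syntax problems
    visudo -c -f /etc/sudoers.d/T205463-disable-sudo-password-prompts

    # See what sssd and its sudo responder have been logging recently
    journalctl -u sssd -u sssd-sudo --since "1 hour ago"

    # Invalidate sssd's caches and restart it, in case stale data is involved
    sss_cache -E
    systemctl restart sssd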
[17:14:40] well, that also exists on a machine without this problem. so disregard I guess.. just that it kind of seemed to match the "a password is required" part
[19:16:20] Hi! I'm struggling to set up the environment. I get this error: "The virtual environment was not created successfully because ensurepip is not available. [...]" and the only solutions I found needed * perms. Could someone help me?
[19:34:49] Echecs: can you share more about where and how you are trying to create your python venv?
[19:39:44] in my main folder on toolforge with bootstrap_venv.sh (see: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Python)
[19:40:02] bd808
[19:41:21] Echecs: and had you first entered a `webservice shell` before attempting to run that? If so, what container did you select? If not, this could be your problem.
[19:42:40] or were you running it as a `toolforge jobs` command as shown on that page? If so, same question about the selected container
[19:48:51] I didn't do webservice shell, I'm trying it rn
[19:49:09] you will probably want `webservice python3.11 shell`
[19:49:34] attempting to run any of this directly on login.toolforge.org or another bastion is expected to fail
[20:48:39] it seems to work, thanks!
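(Aside: the usual shape of that venv setup, sketched under the conventional Toolforge Python webservice layout; the ~/www/python paths and the requirements.txt are illustrative rather than taken from the conversation, so adjust them to wherever your tool keeps its code.)

    # From the tool account on login.toolforge.org, open a shell in the same
    # container the webservice will run in, so the venv is built against it
    webservice python3.11 shell

    # Inside that shell: create and populate the venv
    python3 -m venv ~/www/python/venv
    source ~/www/python/venv/bin/activate
    pip install --upgrade pip wheel
    pip install -r ~/www/python/src/requirements.txt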