[03:39:59] I'm trying to start a k8s job, but I get:
[03:40:08] indexer-tasks-job-d2mjp 0/1 ContainerCannotRun 0 91s
[03:40:23] How do I find out what went wrong?
[03:40:39] I tried printing the log for the pod, but don't get any output.
[03:52:00] you can try `kubectl describe job indexer-tasks-job`
[03:52:51] ah, that's useful
[03:52:53] thanks.
[03:56:22] hmm, looks like this gets me even more verbosity:
[03:56:23] kubectl get pod indexer-tasks-job-9slr6 --output=yaml
[03:57:49] I guess you don't get anything in the log until the pod is up and running.
[15:22:59] !log tools scaling up the grid with 10 buster exec nodes (T277653)
[15:23:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:23:03] T277653: Toolforge: add Debian Buster to the grid and eliminate Debian Stretch - https://phabricator.wikimedia.org/T277653
[15:44:15] AntiComposite: turns out, part of my problem was that /usr/bin/bash doesn't exist on my pod. I thought I was being clever by using absolute paths to avoid $PATH issues. On the pod, it's only /bin/bash. On the bastion host, either one works.
[15:44:39] well that's new and different
[15:45:51] roy649_: which container are you using?
[15:46:09] docker-registry.tools.wmflabs.org/toolforge-python37-sssd-base:latest
[15:46:10] because /bin should be a symlink to /usr/bin
[15:46:33] if I do:
[15:46:34] command: ["ls", "-l", "/bin/bash", "/usr/bin/bash"]
[15:46:38] I get:
[15:46:50] k logs indexer-tasks-job-zstws
[15:46:50] -rwxr-xr-x 1 root root 1168776 Apr 18 2019 /bin/bash
[15:46:50] ls: cannot access '/usr/bin/bash': No such file or directory
[15:47:01] does the python39 image have the same results?
[15:47:49] Let me give that a try. I've been using 3.7 for development, so I want to stick with that, but I'll do the experiment to see what happens.
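[Editor's note] A sketch of the triage discussed above: when a container never starts, `kubectl logs` stays empty, so the useful information is in the object's events and status instead. The job and pod names are the ones from this session; substitute your own.

```shell
# The Events section at the bottom of describe output usually names the
# actual failure (bad command path, image pull error, etc.)
kubectl describe job indexer-tasks-job
kubectl describe pod indexer-tasks-job-d2mjp

# The full pod object carries the container's error message under
# .status.containerStatuses[].state
kubectl get pod indexer-tasks-job-d2mjp --output=yaml
```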
[15:47:49] `#!/usr/bin/env bash` is probably the most portable way to ask for a bash shell
[15:48:04] I've got a 3.9 shell up, works as expected
[15:48:25] and lrwxrwxrwx 1 root root 7 Aug 15 04:00 bin -> usr/bin
[15:48:55] Yeah, 3.9 gives
[15:48:56] k logs indexer-tasks-job-vjdrp
[15:48:56] -rwxr-xr-x 1 root root 1234376 Aug 4 20:25 /bin/bash
[15:48:56] -rwxr-xr-x 1 root root 1234376 Aug 4 20:25 /usr/bin/bash
[15:49:01] Hello everyone, I have a question: in the article https://phabricator.wikimedia.org/T228824 I found the solution I needed at https://librenms.wikimedia.org/ports/ifType=vcp/format=list_basic/
[15:49:01] Following the link requires authorization at https://idp.wikimedia.org/login . I registered, but now when I log in, I get the error "Service access denied due to missing privileges."
[15:49:03] How do I get in there? :)
[15:49:43] yeah, /bin on 3.7 is not a symlink
[15:49:59] there's some stuff in /bin and some other stuff in /usr/bin
[15:52:35] did Debian change their directory structure between buster and bullseye or something?
[15:53:22] I think they made the /usr merge the default on new installs, yes
[15:53:39] (/usr merge meaning: more stuff lives in /usr, and /bin, /lib and a few other root dirs become symlinks)
[15:53:50] that would do it then
[15:53:55] https://wiki.debian.org/UsrMerge
[15:54:07] wow, this is even stranger:
[15:54:08] ls -li /bin/bash /usr/bin/bash
[15:54:08] 786 -rwxr-xr-x 1 root root 1168776 Apr 18 2019 /bin/bash
[15:54:08] 786 -rwxr-xr-x 1 root root 1168776 Apr 18 2019 /usr/bin/bash
[15:54:18] that's on tools-sgebastion-11
[15:54:19] @galant_13: I believe that access to librenms requires that your Developer account is a member of the "nda" group. https://wikitech.wikimedia.org/wiki/Volunteer_NDA has some info on that process. This is a hat that you need to show a need for in order to collect, rather than just a set of forms to fill out.
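[Editor's note] The `#!/usr/bin/env bash` suggestion above works because `env` searches `$PATH` for `bash` rather than hard-coding its location, so the script starts whether bash lives in `/bin` or `/usr/bin`. A minimal sketch:

```shell
#!/usr/bin/env bash
# env(1) looks bash up on $PATH, so this shebang works on both
# merged-/usr systems (where /bin is a symlink to /usr/bin) and
# unmerged ones like the python37 image discussed above.
echo "bash found at: $(command -v bash)"
```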
[15:54:36] they have the same inode number, so they must be hard links, yet ls shows the link count as 1.
[15:55:08] oh, I see
[15:55:16] lrwxrwxrwx 1 root root 7 Mar 27 2021 /bin -> usr/bin
[15:55:57] how come every time I finally understand how the universe works, somebody changes it?
[15:55:58] looks like the usrmerge package is installed on that system
[15:56:18] promotes beard growth
[15:56:30] oh wait, no it isn't?
[15:56:37] so why is that a symlink then 🤔
[15:56:50] (`dpkg -s usrmerge` says it's not installed; `apt show` shows it, but not as installed, I think)
[15:57:16] cat /etc/debian_version
[16:01:03] * Lucas_WMDE prefers /etc/os-release ;)
[16:01:16] I use /etc/os-release too.
[16:01:33] Hi, SQL question: where is the CentralAuth DB configured? It is not on meta as far as I can see
[16:01:43] matanya: centralauth_p
[16:01:58] Thanks!
[16:20:41] https://www.debian.org/releases/buster/amd64/release-notes/ch-whats-new.en.html#merged-usr looks like usrmerge is the default in new buster installs
[16:21:45] the python 3.7 image is based on buster (and 3.9 is based on bullseye)
[16:22:42] ah, so on fresh Buster installs, the symlinks are installed without the usrmerge package?
[16:22:47] looks like the upstream debian buster Docker image doesn't have merged usr though
[16:23:00] so it's not anything we did :)
[16:28:37] aha https://github.com/debuerreotype/docker-debian-artifacts/issues/60
[16:28:57] there's your answer
[16:29:20] test telegram bridge
[16:29:58] Aniket: it hasn't fallen into the river yet
[16:33:18] Hi, everyone
[16:33:19] I need help with the following error
[16:33:21] ERROR: Could not install packages due to an EnvironmentError: [Errno 28] No space left on device
[16:33:22] which occurs when installing libraries on the tool server using pip (tensorflow specifically).
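[Editor's note] The "same inode, link count 1" puzzle above is explained by the directory symlink: both paths traverse `/bin -> usr/bin` and end at the same single file, so there is one real name, not two hard links. The effect can be reproduced in a scratch directory (all paths here are made up for the demo):

```shell
# Recreate the /usr-merge layout: bin is a symlink to usr_bin, so
# bin/tool and usr_bin/tool are one file seen via two paths.
tmp=$(mktemp -d)
mkdir "$tmp/usr_bin"
printf 'hello\n' > "$tmp/usr_bin/tool"
ln -s usr_bin "$tmp/bin"

# Same inode, link count 1: two names resolving to one real path,
# not two hard links.
ls -li "$tmp/bin/tool" "$tmp/usr_bin/tool"
rm -rf "$tmp"
```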
[16:33:24] I was following https://wikitech.wikimedia.org/wiki/Help:Toolforge/Python
[16:34:06] what tool account, and do you happen to know how large tensorflow is?
[16:34:19] well, here's the other half of my problem. When you install celery into your virtualenv, it installs as:
[16:34:30] #!/mnt/nfs/labstore-secondary-tools-project/spi-tools-dev/wp-search-tools/venv/bin/python3
[16:34:41] which doesn't exist on the pod
[16:34:56] toolforge tool account
[16:35:06] I guess that's pip being more clever than it should be
[16:35:14] are you installing it inside the pod?
[16:35:28] taavi: was that for me?
[16:35:39] yes
[16:35:41] sorry
[16:35:55] I'm doing the "pip install" on the bastion host.
[16:36:30] try doing it inside a kubernetes pod with a similar/same image (`webservice shell` should be similar enough) and see what happens?
[16:36:39] I don't see a spi-tools-dev/venv, I see a spi-tools-dev/www/python/venv
[16:37:29] but yes, if you're planning to run Python from a kubernetes container, you should do all your pip/venv commands from a webservice shell, not on the bastion directly
[16:38:12] /data/project/spi-tools-dev/wp-search-tools/venv
[16:39:25] ok, I guess that makes sense.
[16:40:10] I'm actually running two different things; one is the webservice, the other is a batch job.
[16:40:23] They each have their own venvs.
[16:40:36] Maybe it makes more sense to spin up another tool account for the batch job?
[16:40:45] shouldn't make a difference
[16:49:57] Aniket: is your tool called "imagesimilarity"?
[16:50:14] yes
[16:50:25] @Aniket: That sounds related to taavi noticing earlier that the root partition on tools-sgebastion-07 is filling up...
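[Editor's note] Background on the celery shebang problem above: pip writes the venv's absolute interpreter path into the first line of every console script it installs. That is why a venv built on the bastion breaks inside a pod where that interpreter path does not exist, and why the advice is to run pip/venv commands from a `webservice shell`. A quick way to see the baked-in path (the venv location is hypothetical):

```shell
# Create a throwaway venv and inspect an installed console script's
# shebang line: it hard-codes this venv's absolute python path, so the
# script only works where that exact path exists.
python3 -m venv /tmp/demo-venv
head -1 /tmp/demo-venv/bin/pip
```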
[16:51:14] bd808: I'm following the above chats then
[16:51:26] Aniket: try now, I just cleaned up some space on the server
[16:53:19] ok arturo, I'm trying it now
[16:54:14] ok, let us know how that goes
[17:24:12] !log tools.notwikilambda kubectl delete deployment update # T299934
[17:24:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.notwikilambda/SAL
[17:24:49] arturo: it works now, I no longer get the storage error
[17:24:49] but I got this one: distutils.errors.CompileError: command 'x86_64-linux-gnu-gcc' failed with exit status 4
[17:24:51] is it also related to the server? because this solution https://stackoverflow.com/questions/41492878/command-x86-64-linux-gnu-gcc-failed-with-exit-status-1
[17:24:52] suggests installing some tools using apt
[17:25:06] !log tools.notwikilambda git -C ~/public_html/w/extensions/PluggableAuth/ checkout 8b278afaca # T299934
[17:25:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.notwikilambda/SAL
[17:28:19] I don't know if it is ok to install dependencies using apt on a Toolforge tool account
[17:34:21] !log tools.notwikilambda git -C ~/public_html/w/extensions/PluggableAuth/ checkout cdf73daf11^ # T299934
[17:34:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.notwikilambda/SAL
[17:44:45] !log tools reconfiguring the grid by using grid-configurator - cookbook ran by arturo@nostromo
[17:44:48] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:49:55] @Aniket: You cannot do it yourself, but https://phabricator.wikimedia.org/project/profile/3978/ is the place to ask for new packages to be installed. If this is needed inside a container on Kubernetes it becomes less likely that the request will be approved, but without more details it is hard to say what is needed and how possible it is.
[19:15:15] Hi, how can I query centralauth from the labsdb replicas?
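[Editor's note] On the `CompileError` above: that failure means pip fell back to compiling the package from source, which needs build tools (the apt packages the Stack Overflow answer suggests) and a fair amount of memory. One hedged workaround, rather than installing compilers, is to make pip refuse source builds and require a prebuilt wheel; whether a wheel exists for this platform and Python version is not guaranteed:

```shell
# Ask pip to fail fast unless a prebuilt binary wheel is available,
# instead of attempting a local gcc build on the shared host.
python3 -m pip install --only-binary=:all: tensorflow
```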
I don't see centralauth as one of the DBs there
[19:15:29] centralauth.analytics.db.svc.wikimedia.cloud doesn't exist
[19:18:38] it does for me, https://phabricator.wikimedia.org/P19076
[19:18:40] where are you trying to reach it from?
[19:19:20] From matanya@tools-sgebastion-07
[19:19:48] `sql centralauth_p` should do the job
[19:20:38] Yes, I found the typo, thank you all
[19:20:41] (at least when you want to query from the console)
[20:08:20] hello, why is there no link to download the result at https://quarry.wmcloud.org/query/61836?
[20:23:06] urbanecm: It looks like the download link isn't rendered until the result is loaded.
[20:24:45] thanks Dylsss. Checking the network tab shows https://quarry.wmcloud.org/run/610083/output/0/json, and https://quarry.wmcloud.org/run/610083/output/0/tsv seems to have the output.
[20:25:31] When I run "kubectl get pods", I sometimes get:
[20:25:32] runtime: failed to create new OS thread (have 44 already; errno=11)
[20:25:32] runtime: may need to increase max user processes (ulimit -u)
[20:25:32] fatal error: newosproc
[20:25:40] followed by pages of stack dumps.
[20:25:46] Am I just exceeding some user quota?
[20:26:24] yeah, go is very hungry for OS threads and our bastions are configured to not like things that are hungry for OS threads
[20:26:41] you can try something like `GOMAXPROCS=1 kubectl ...` and see if it helps
[20:27:02] ok, I'll give that a try, thanks.
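[Editor's note] On the thread-limit crash at the end: kubectl is a Go program, and the Go runtime's environment variable for capping scheduler parallelism is `GOMAXPROCS`. Setting it to 1 reduces (though does not strictly bound) the number of OS threads the runtime spawns, which can keep it under a tight `ulimit -u`. A sketch; whether this is enough for a given invocation is not guaranteed:

```shell
# Show the per-user process/thread cap the Go runtime is hitting.
ulimit -u

# Cap the Go scheduler at one logical processor so kubectl creates far
# fewer threads (needs a working cluster config to actually run).
command -v kubectl >/dev/null && GOMAXPROCS=1 kubectl get pods || true
```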