[00:26:36] weird, https://k8s-status.toolforge.org/namespaces/tool-meta/pods/meta-f47f54984-4bb8k/ reports running but https://meta.toolforge.org/ says no webservice
[04:44:07] I've noticed that there are not many limits on how much mischief you can get up to when you're non-root in a buildpack image
[04:44:39] I was having trouble debugging apache so I installed strace inside the container, using dpkg -x like how fagiani_apt does it
[04:44:58] seems to work
[04:46:21] I should write a fake apt-get for muscle memory compliance
[04:47:19] although I guess fakeroot already exists
[05:44:45] Hi! I'm looking into an issue where the meta tool on Toolforge (which I maintain) hangs. It seems to possibly be a filesystem issue? The tool has a `cache/wikimedia-wikis.dat` file which I can't edit or delete; e.g. `rm -rf wikimedia-wikis.dat` hangs forever (with the service stopped to make sure it's not accessing the file or something).
[05:46:11] (It just started today; I got a report on Meta about it: https://meta.wikimedia.org/wiki/User_talk:Pathoschild.)
[08:56:05] AntiComposite: Your processes seem to be getting stuck in D state, might be a leftover from the last outage a couple days ago
[08:56:21] !log tools restart tools-k8s-worker-50 due to some stuck processes in D state
[08:56:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[08:58:19] oh sorry, just read Pathoschild xd, yep, it seems that file is having trouble, and not only on one worker node, looking
[09:01:12] Pathoschild: I think there's some issue at the block device level on the NFS server (probably also related to the outage), I moved the whole cache directory and created an empty one so the tool can restart
[09:03:30] Pathoschild: the tool seems back online, I'll take care of the leftover undeletable data
[09:11:21] TimStarling: yes, the limits come from both the non-root processes running in the container and the container itself, which only you (well, your tool) is using, and which will get scrapped whenever it's restarted. The rest is just like any other non-root user environment with internet access, you can work around almost anything (and power users like yourself are able to debug things in depth, and that's ok). If you find any security issue (e.g. you can affect other users' containers/etc.) please open a security task
[13:06:35] !log tools reboot tools-sgecron-2 due to high load
[13:06:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:26:57] @dcaro: Thanks! Everything seems to be working now.
[17:47:25] btullis: I've run across a Stretch VM in the 'search' project that is all kinds of broken: rel2.search.eqiad.wmflabs. Any chance you could just delete that so I don't have to understand how it survived?
[17:48:06] !log paws jupyterlab upgraded to 4.1.0 T357027
[17:48:10] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[17:48:10] T357027: Upgrade Jupyterlab - https://phabricator.wikimedia.org/T357027
[17:49:07] inflatador, bearloga, same question. Can we delete rel2.search.eqiad.wmflabs? Surely it's been emailing you about broken puppet for about a year.
[17:50:06] andrewbogott I don't think I've been getting these emails. I think ebernhardson and dcausse might actually use that server... guys?
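(As a minimal sketch of the dpkg -x approach mentioned at 04:44:39, unpacking a package into a user-writable prefix since a non-root user can't install it normally. strace is the example package and $HOME/unpacked an arbitrary target directory; apt-get download assumes readable package lists in the container, otherwise fetching the .deb by URL works too:

    # Download the .deb without installing it; no root needed.
    apt-get download strace
    # Extract the package contents under a writable prefix instead of /.
    mkdir -p "$HOME/unpacked"
    dpkg -x strace_*.deb "$HOME/unpacked"
    # Make the extracted binaries available for this session.
    export PATH="$HOME/unpacked/usr/bin:$PATH"
    strace -V
)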
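(For the D-state diagnosis at 08:56:05, a generic way to spot processes stuck in uninterruptible sleep, which usually means blocked on I/O such as a hung NFS mount; this is a sketch of the idea, not necessarily the command that was run:

    # Show processes whose state includes D (uninterruptible sleep),
    # plus the kernel function they are blocked in.
    ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /D/'
    # Such processes ignore SIGKILL while blocked in the kernel, which is
    # why rebooting the worker node is the usual remedy.
)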
[17:51:58] inflatador: you might also take a look at the members of that project and do a purge, seems like there can't be that many ex-staff still engaged in that project :)
[17:56:53] andrewbogott agreed, I can also add a .nopuppetchecks assuming I can get into that host
[17:57:12] Please don't add .nopuppetchecks, that host needs to be deleted
[17:57:18] It's running an unsupported OS
[17:57:28] If it can't be deleted today then I'll make a task
[17:57:42] hmm, checking what that host is
[17:59:39] looks like this hosted discernatron, certainly not used. It has a database though, will see if I can dump that
[18:01:44] andrewbogott I created a task: https://phabricator.wikimedia.org/T357162 . Feel free to add anything you need there
[18:01:52] thx
[18:03:01] !log tools updated the default security group, removing the 0.0.0.0/0 rule allowing port 22 access from everywhere, replaced it with a 172.16.0.0/21 rule
[18:03:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:55:09] k8s question – does anyone know if it's intentional that I can't get the jobs/status of my own jobs?
[18:55:19] or should I just file a task to have that added to whatever default permissions we have? :)
[18:55:30] I'm getting this error when trying to read the status of a job via k8s: https://paste.toolforge.org/view/3a362aef
[18:56:58] (`read_namespaced_job()` in Python works btw, only the `_status` version doesn't)
[19:01:29] doesn't seem familiar, and https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/blob/main/deployment/chart/templates/rbac.yaml.tpl does list a few other /status resources, so probably just an oversight. please do send a task and/or patch
[19:02:02] will do, thanks
[19:07:30] created https://phabricator.wikimedia.org/T357172
[19:08:40] !log devtools deleting instance phabricator-prod-1001 (shut down a couple days ago, buster instance replaced by phabricator-bullseye instance) T356530
[19:08:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Devtools/SAL
[19:08:44] T356530: clean up phabricator test host situation - https://phabricator.wikimedia.org/T356530
[21:27:08] Hi, I need help... I have created a tool in Toolforge but I don't know the next step
[23:33:16] Hi all, happy Friday. After using the build service to deploy, is there a way to run one-off scripts without using the jobs framework?
[23:42:11] imdeni: something like `toolforge webservice --backend=kubernetes buildservice shell` might get you what you are interested in. That would start a pod using the default buildservice image for your tool and attach to it in an interactive session.
[23:43:51] With my gitlab-account-approval tool I end up in a bash shell in the /app directory inside the container when I run that
[23:45:16] from there I can do things like `launcher dry-run` to start the command labeled "dry-run" in my Procfile
[23:45:20] That sounds like exactly what I was asking for. Thank you!
[23:49:40] Hmm. It looks like that does not have the dependencies installed: `ImportError: Couldn't import Django. Are you sure it's installed and available on your PYTHONPATH environment variable? Did you forget to activate a virtual environment?`
[23:51:01] I haven't dug into how the python builder installs dependencies. I wouldn't be surprised to find that there is a venv in use somewhere.
[23:52:21] Gotcha. I'll take a look around.
[23:55:26] imdeni: is there a /app/.heroku/python/bin/python3 in your container?
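(For context on the 18:03:01 security group change, the OpenStack CLI steps would look roughly like the following; <rule-id> is a placeholder and the exact invocation is an assumption, not a record of what was run:

    # Inspect the current rules on the default security group.
    openstack security group rule list default
    # Remove the rule that allowed port 22 from 0.0.0.0/0.
    openstack security group rule delete <rule-id>
    # Re-add SSH access restricted to the internal range.
    openstack security group rule create --proto tcp --dst-port 22 \
        --remote-ip 172.16.0.0/21 default
)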
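(The jobs/status gap from the 18:55 discussion can also be seen with kubectl instead of the Python client; tool-mytool and myjob are placeholder names. Reading the Job object succeeds because the RBAC template grants get on jobs, while the /status subresource needs its own rule:

    # Allowed: the Job resource itself (matches read_namespaced_job()).
    kubectl get job myjob -n tool-mytool
    # Expected to fail with 403 until jobs/status is added to the RBAC
    # template (matches read_namespaced_job_status()).
    kubectl get --raw "/apis/batch/v1/namespaces/tool-mytool/jobs/myjob/status"
    # Check the permission directly.
    kubectl auth can-i get jobs/status -n tool-mytool
)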
[23:56:20] If https://github.com/heroku/heroku-buildpack-python is the builder we are using for python, that seems like the venv entrypoint
[23:57:32] Or maybe just /app/.heroku/python/bin/python
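(Putting the tail of this conversation together, a plausible way to run a one-off script from the buildservice shell using the buildpack's own interpreter; the /app/.heroku path comes from the messages above, and manage.py is just an example entry point for the Django tool in question:

    # Start an interactive pod built from the tool's buildservice image.
    toolforge webservice --backend=kubernetes buildservice shell
    # Inside the container, the buildpack's interpreter has the declared
    # dependencies installed, unlike the bare system python3.
    /app/.heroku/python/bin/python3 manage.py check
    # Or put it first on PATH so plain python3 resolves to it:
    export PATH=/app/.heroku/python/bin:$PATH
    python3 manage.py check
)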