[02:58:06] !log commonsarchive rebooting VM commonsarchive-mwtest -- oom
[02:58:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Commonsarchive/SAL
[03:02:04] !log commtech rebooting commtech-2 -- oom
[03:02:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Commtech/SAL
[03:10:44] My k8s job keeps dying with an error message:
[03:10:56] [2022-01-26 03:06:27,605: ERROR/MainProcess] Process 'ForkPoolWorker-5' pid:13 exited with 'signal 9 (SIGKILL)'
[03:10:56] [2022-01-26 03:06:27,626: ERROR/MainProcess] Task handler raised error: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL) Job: 1.')
[03:10:56] Traceback (most recent call last):
[03:10:56] File "/data/project/spi-tools-dev/wp-search-tools/venv/lib/python3.7/site-packages/billiard/pool.py", line 1267, in mark_as_worker_lost
[03:10:57] human_status(exitcode), job._job),
[03:10:57] billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL) Job: 1.
[03:11:15] Is it possible I'm running into some sort of usage quota?
[03:45:49] https://grafana-labs.wikimedia.org/d/toolforge-k8s-namespace-resources/kubernetes-namespace-resources?orgId=1&var-namespace=tool-spi-tools-dev&from=now-6h&to=now&refresh=5m
[03:55:03] it looks like you might be hitting the allocated memory limit for that container, but you have room to increase that limit still
[03:58:49] https://wikitech.wikimedia.org/wiki/Help:Toolforge/Kubernetes#Quotas_and_Resources you can try increasing the memory limit, see if that helps
[04:18:05] ok, that sounds reasonable. I didn't know about the grafana dashboard; that's good to know about.
[04:18:08] thanks
[04:33:41] AntiComposite it's not clear how to interpret the grafana data. It looks like my quota limit is 8 GiB, and my actual usage peaks at just under 600 MiB.
[04:34:00] quota limit is the maximum your namespace can be allocated
[04:34:19] Ah
[04:34:21] (namespace = tool account)
[04:34:38] So I guess "Memory Allocated Requests" is the current limit?
[04:35:06] allocated limit is the total your namespace has asked to be able to use
[04:35:43] however, the actual limits are usually enforced by the per-pod limits, so it's not super useful
[04:36:56] the default limits are 0.5 CPU and 512MiB
[04:37:02] OK, I need to poke around more. It's possible I've got a memory leak. Or it might be that I'm running into a really large page that I'm trying to load.
[04:38:56] requests determine how much CPU time and memory space the scheduler will reserve for that pod only
[04:38:58] If I watch memory use with kubectl top pod, I see it get up to about 300 MiB and then things blow up, but it could be that I'm spiking up over 512 and top pod just isn't fast enough to see the real spike
[04:39:58] I need to play around a bit more to know for usre.
[04:39:59] sure
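A SIGKILL with usage hovering near the container limit usually means the kernel OOM-killed the worker when it crossed its per-pod memory limit. Below is a minimal sketch of inspecting and raising that limit with kubectl, assuming the job runs from a deployment the tool manages itself; the deployment and container names (celery-worker, worker) are made up for illustration, and any new limit still has to fit inside the namespace quota:

  # watch live per-pod memory use (as mentioned above; a fast spike can be missed between samples)
  kubectl top pod

  # show the requests/limits currently set on the (hypothetical) deployment
  kubectl get deployment celery-worker -o yaml | grep -A 6 'resources:'

  # raise the memory limit above the 512Mi default while keeping the request unchanged
  kubectl set resources deployment celery-worker -c worker \
      --requests=cpu=500m,memory=512Mi --limits=cpu=500m,memory=1Gi

The Help:Toolforge/Kubernetes page linked above covers the Toolforge-specific ways of setting these values; the commands here only illustrate the underlying requests-versus-limits mechanics.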
[13:55:33] !log tools scaling up the buster web grid with 5 lighttd and 2 generic nodes (T277653)
[13:55:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[13:55:36] T277653: Toolforge: add Debian Buster to the grid and eliminate Debian Stretch - https://phabricator.wikimedia.org/T277653
[14:17:53] !log integration created flavor g3.cores8.ram24.disk20.ephemeral60.4xiops T299704
[14:17:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Integration/SAL
[14:17:56] T299704: Request increased quota for integration Cloud VPS project - https://phabricator.wikimedia.org/T299704
[15:56:40] !log devtools bump quota, RAM from 32 to 40, cores from 16 to 20 (T299561)
[15:56:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Devtools/SAL
[15:56:43] T299561: Request increased quota for devtools Cloud VPS project - https://phabricator.wikimedia.org/T299561
[17:26:14] !log devtools bump quota, floating IP from 1 to 2 (T299561)
[17:26:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Devtools/SAL
[17:26:17] T299561: Request increased quota for devtools Cloud VPS project - https://phabricator.wikimedia.org/T299561
[17:27:34] thanks arturo :)
[17:27:54] mutante thanks you :-)
[18:05:38] Hey cloud folks! Some of my cron jobs on the grid engine are failing because they're unable to find the Python executable in my virtual environment. Example error: /var/spool/gridengine/execd/tools-sgeexec-10-1/job_scripts/8436610: line 26: /data/project/suggestbot/venv/opentask/bin/python: No such file or directory
[18:06:07] Looks like this started a couple of days ago. Is there anything I should do on my end to fix this?
[18:06:37] Nettrom: your job is now being executed in the buster grid. that's very unfortunate. I'm sorry, that's a mistake
[18:07:18] but a venv...? it should work....
[18:07:37] anyway, the problem is that somehow the cron server is scheduling stuff by default in the buster grid? -_-
[18:07:45] if that venv is built against stretch and its python 3.5, it won't work with buster and python 3.7
[18:08:29] I think I will depool all buster grid nodes to prevent this, but I need to double check
[18:08:43] is jsub defaulting to `-release buster`?
[18:08:54] I'm happy to make any changes (e.g. build a new venv) to make this work, not a problem. And no, I'm not using jsub, these are done using qsub
[18:09:20] https://tools-static.wmflabs.org/bridgebot/5c477ffa/file_11115.jpg
[18:09:51] chico: that indicates a venv rebuild is required, no? as taavi suggested
[18:10:15] ah, I guess qsub (provided by sge) does not specify the debian release as jsub (provided by us) does
[18:10:58] arturo: yeah, that or run it somewhere python3.5 exists. Venv rebuild seems more sensible
[18:11:19] I wonder if buster is default just because it sorts after stretch
[18:11:27] s/after/before/
[18:11:38] that would be...entirely in-character for sge
[18:11:46] so Buster and Stretch are both available now? If so, I can update https://wikitech.wikimedia.org/wiki/Help:Toolforge/Grid#Specifying_an_operating_system_release
[18:12:06] we're not quite ready to announce it (yet)
[18:12:19] hah :) I'll hold on that then
[18:12:24] Nettrom: we're in the middle of that work. Yes, soon. What you are experiencing, I think, is an early mistake
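The venv rebuild suggested earlier is normally just a matter of recreating the environment with the target release's Python and reinstalling the dependencies. A minimal sketch, assuming the tool keeps a requirements.txt alongside its code (that file name, like the exact commands, is illustrative rather than taken from the suggestbot setup):

  # run as the tool account on a host with the target (buster) python3
  python3 -m venv --clear /data/project/suggestbot/venv/opentask
  /data/project/suggestbot/venv/opentask/bin/pip install --upgrade pip
  /data/project/suggestbot/venv/opentask/bin/pip install -r requirements.txt

A venv hard-codes the path and version of the interpreter it was created with, which is why one built against stretch's Python 3.5 stops working as soon as the job lands on a buster node with Python 3.7.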
[18:12:44] when the time arrives we will distribute https://wikitech.wikimedia.org/wiki/News/Toolforge_Stretch_deprecation
[18:13:01] either it defaults to that, or it just schedules it to somewhere without caring about the release at all and picks a node with the most resources available (which happens to be a buster node since no-one else is using them)
[18:16:51] arturo: I like how the timeline on that page still says 2021-xx-xx ;)
[18:17:11] Lucas_WMDE: :-P too many things going on man
[18:17:16] :D
[18:17:28] deep in our hearts we all know it’s still March 2020 anyways
[18:17:45] ^^ that is way too true
[18:18:04] unsolicited spam: we're hiring https://boards.greenhouse.io/wikimedia/jobs/3790172?gh_src=7d31c48b1us
[18:18:35] alright, adding `-l release=stretch` to the options in my shell script (~suggestbot/project/opentask/opentak.sh) solves the problem since it forces it to the Stretch grid
[18:19:02] make sure you've got a -hard before the -l
[18:19:17] otherwise it'll only best-effort put you on the Stretch grid
[18:20:11] Nettrom: just curious, is there a specific reason you are using qsub directly rather than using our jsub helper script?
[18:20:37] bd808: because I use a shell script to define the various options on a per project basis
[18:20:52] I find it neater to have things specified in those files
[18:21:10] AntiComposite: thankfully kubernetes does not do that :D "sorry, you wanted a python 3.9 image, but you get an ancient version of perl instead because it's faster to download from the docker registry"
[18:21:18] Nettrom: Most qsub options transparently pass through jsub with jsub adding useful defaults
[18:24:39] bd808: Replacing `qsub` with `jsub` in crontab should "just work"? Might be worth testing that when moving to Buster, then
[18:26:09] Nettrom: yeah, most of the time it should just work. I apparently didn't explicitly document what I left out, but https://github.com/wikimedia/labs-toollabs/blob/5baa25904dee53adf17829c5c79394aa9ada0c46/jobutils/bin/jsub#L324-L495 shows what is supported but not advertised when you do `jsub --help`
[18:27:20] !log tools depooled grid node tools-sgeexec-10-1 - cookbook ran by arturo@nostromo
[18:27:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:27:50] bd808: thanks for the link! I'll take a look at it and most likely start making some changes then :)
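For reference, the qsub invocation being described would look roughly like the sketch below; the job name, log paths and script path are illustrative, not the actual crontab entry:

  # force the job onto the Stretch grid; -hard makes the release request mandatory
  qsub -hard -l release=stretch -N opentask \
      -o "$HOME/logs/opentask.out" -e "$HOME/logs/opentask.err" \
      "$HOME/project/opentask/opentask.sh"

  # rough jsub equivalent, using the -release flag mentioned earlier
  jsub -release stretch "$HOME/project/opentask/opentask.sh"

As noted above, without -hard the release request is treated as best-effort, so the scheduler can still place the job on a Buster node.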
[18:27:52] !log tools depooled grid node tools-sgeexec-10-2 - cookbook ran by arturo@nostromo
[18:27:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:28:09] !log tools depooled grid node tools-sgeexec-10-3 - cookbook ran by arturo@nostromo
[18:28:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:30:46] !log tools depooled grid node tools-sgeexec-10-4 - cookbook ran by arturo@nostromo
[18:30:48] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:31:02] !log tools depooled grid node tools-sgeexec-10-5 - cookbook ran by arturo@nostromo
[18:31:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:32:36] !log tools depooled grid node tools-sgeexec-10-6 - cookbook ran by arturo@nostromo
[18:32:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:32:52] !log tools depooled grid node tools-sgeexec-10-7 - cookbook ran by arturo@nostromo
[18:32:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:33:08] !log tools depooled grid node tools-sgeexec-10-8 - cookbook ran by arturo@nostromo
[18:33:10] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:33:23] !log tools depooled grid node tools-sgeexec-10-9 - cookbook ran by arturo@nostromo
[18:33:25] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:33:39] !log tools depooled grid node tools-sgeexec-10-10 - cookbook ran by arturo@nostromo
[18:33:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:48:24] I can't pay a lot more attention to the grid today, real life is requesting me, sorry!
[18:48:28] be back tomorrow
[18:57:47] !log admin restarting mariadb on cloudcontrol1004
[18:57:49] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[19:09:37] !log admin bootstrapping a fresh galera node on cloudcontrol1004
[19:09:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[20:14:58] In the pages-meta-history XML dumps, is there any way to know how many pages were dumped into each file without actually reading the whole file?
[20:15:32] I know I can get a rough estimate from the pXXXpYYY parts of the filenames, but that's often a gross over-estimate.
[20:22:26] roy649_: you can use zcat to get the content without "really" unpacking it and then count how many lines are in it. it's fairly quick. example:
[20:22:31] <mutante> zcat tawiktionary-20220120-stub-meta-history.xml.gz | grep title | wc -l
[20:22:34] <mutante> 378929
[20:25:03] <roy649_> I need the pages-meta-history files, not the stub files, though :-)
[20:26:48] <mutante> actually, make that "zgrep" to combine cat and grep into one.
[20:27:16] <mutante> roy649_: eh, I don't even know where to get those because they seem to be "skipped" on https://dumps.wikimedia.org/hiwiktionary/20220120/ etc
[20:28:05] <mutante> do they also have one "title" line per page though?
[20:29:39] <roy649_> The point is, I don't want to have to decompress them. They're like 1 GB (compressed). I was hoping the job that created them kept track of how many pages it put in each and logged that somewhere.
[20:33:11] <mutante> I see. Still, it just took me 3 seconds to do that with an example file of like 120M
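Counting <page> elements rather than lines containing "title" gives an exact per-file page count, although it still has to stream-decompress the whole file; a sketch with illustrative file names (pages-meta-history files are typically bzip2- or 7z-compressed, unlike the gzipped stubs):

  # gzipped stub file, as in the example above
  zgrep -c '<page>' tawiktionary-20220120-stub-meta-history.xml.gz

  # bzip2-compressed full-history file
  bzcat enwiki-20220120-pages-meta-history1.xml-p1p857.bz2 | grep -c '<page>'

  # 7z-compressed full-history file (7z e -so writes the XML to stdout)
  7z e -so enwiki-20220120-pages-meta-history1.xml-p1p857.7z | grep -c '<page>'

Each dumped page has exactly one <page> opening tag, whereas grepping for "title" in a full-history file can overcount because the word also shows up inside revision text.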
[20:33:26] <mutante> you can optionally make a ticket and ask the dumps person for more details
[20:33:32] <mutante> maybe they do keep track
[20:33:39] <mutante> I wouldn't know that part, it's possible they do
[20:36:19] <roy649_> ok, thanks
[23:05:06] <roy649_> is there an elasticsearch CLI installed on the toolforge bastions?
[23:08:12] <bd808> roy649_: yes, `curl` :)
[23:08:44] <roy649_> :-)
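The joke works because Elasticsearch is driven entirely over an HTTP REST API, so curl really is all the client that's needed. A few generic examples follow; ES_HOST and the index name are placeholders rather than the actual Toolforge Elasticsearch endpoint, and any authentication the cluster requires is omitted:

  # cluster health, pretty-printed
  curl -s 'http://ES_HOST:9200/_cluster/health?pretty'

  # list indices with document counts and sizes
  curl -s 'http://ES_HOST:9200/_cat/indices?v'

  # simple query-string search against a hypothetical index
  curl -s 'http://ES_HOST:9200/my-index/_search?q=example&pretty'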