[04:51:08] I'm trying to figure out why an instance is down or very slow
[04:51:30] preferably without someone rebooting it because I mostly want this to not happen rather than to fix it each time
[04:52:24] I was logged in, typing SSH commands, and it got very slow, like I typed "top" about 10 minutes ago, and it echoed that back but hasn't managed to show any results yet
[04:53:43] ok, it just recovered, let's talk about what happened then
[04:54:56] on the host I could see QEMU running as normal, using 250% CPU, which shouldn't be a problem since it has 4 cores allocated
[04:55:14] also the host was not busy, there was no problem with CPU or IO
[04:59:03] ok, now that I'm back in I see that oom-killer did run -- I thought it didn't because memory usage was fine in prometheus in the last available data point, but something just used memory very quickly
[04:59:55] it took 25 minutes between the server becoming completely unresponsive and oom-killer appearing in /var/log/syslog
[05:12:45] how can linux be so bad at this? I mean, I have a linux laptop so this is a familiar problem -- with no swap it will lock up for 20 minutes before running oom-killer if I use too much memory
[05:13:01] so I press sysrq-f which fixes it instantly
[05:13:34] but I can't be hanging around on the servers pressing sysrq-f every time they go down
[06:51:43] I put MemoryMax=85% in a systemd unit override file and that seems to be working
[06:52:06] but this is not a satisfactory answer to my rant
[06:52:30] default OOM policy should not be hang the whole system for 25 minutes
[08:50:59] TimStarling: can you open a task and add the CloudVPS tag to it?
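A minimal sketch of the `MemoryMax=85%` override mentioned at 06:51:43, assuming it was applied to a service unit. The log does not say which unit was overridden, so `example.service` is a hypothetical name, and the drop-in path is the conventional location that `systemctl edit` writes to. The Python below only renders the file path and contents; it does not install anything.

```python
from pathlib import PurePosixPath

# Conventional directory for administrator drop-ins (assumption: system unit, not a user unit).
SYSTEMD_ROOT = PurePosixPath("/etc/systemd/system")


def memory_override(unit: str, percent: int) -> tuple[PurePosixPath, str]:
    """Return the drop-in path and contents that cap a unit's memory.

    `unit` is a hypothetical unit name such as "example.service".
    `percent` is interpreted by systemd relative to installed physical memory.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range 1..100")
    path = SYSTEMD_ROOT / f"{unit}.d" / "override.conf"
    # Assumption: the unit is a service, so the setting goes under [Service].
    content = f"[Service]\nMemoryMax={percent}%\n"
    return path, content


if __name__ == "__main__":
    path, content = memory_override("example.service", 85)
    print(path)
    print(content, end="")
```

With a cap like this, the kernel's cgroup OOM handling acts on the unit when it reaches the limit, rather than waiting for the whole machine to run out of memory. After writing such a file by hand, `systemctl daemon-reload` and a restart of the unit are needed for it to take effect.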
[08:51:12] !log tools.cewbot truncate 678G log file T358555
[08:51:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cewbot/SAL
[08:51:17] T358555: cewbot k8s-20230418.fix-redirected-wikilinks-of-templates.out is unreasonably large - https://phabricator.wikimedia.org/T358555
[08:52:33] o_O
[09:04:43] !log lucaswerkmeister@tools-sgebastion-10 tools.bridgebot Double IRC messages to other bridges
[09:04:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.bridgebot/SAL
[09:14:54] commons delinker hasn't worked in 3 days, needs a restart or something
[10:24:46] Hi, my cron jobs (tool:rebot) stopped february 15th, could somebody tell me what happened?
[10:27:00] @Pau: https://wikitech.wikimedia.org/wiki/News/Toolforge_Grid_Engine_deprecation
[10:28:24] Thanks, I'll try to migrate. If I'm not able to do it, could somebody help me?
[10:29:55] this channel is a good place to ask if you have specific questions.
[10:33:15] ok, I'll sure have some issues, as I am a newbie. Thanks in advance
[10:35:53] This was my crontab: 0 23 * * * /usr/bin/jsub -N robot -once -quiet -release buster sh /data/project/rebot/robot.sh
[10:35:54] 0 01 * * * /usr/bin/jsub -N robot -once -quiet -release buster python3 /data/project/rebot/discussions.py
[10:35:55] 0 02 * * * /usr/bin/jsub -N robot -once -quiet -release buster python3 /data/project/rebot/catpet.py
[10:35:57] 0 03 * * * /usr/bin/jsub -N robot -once -quiet -release buster python3 /data/project/rebot/sensecat.py
[10:35:58] 0 04 1 * * /usr/bin/jsub -N robot -once -quiet -release buster python3 /data/project/rebot/disambig.py
[10:36:00] 0 04 2 * * /usr/bin/jsub -N robot -once -quiet -release buster sh /data/project/rebot/robottot.sh
[10:36:01] 0 04 3 * * /usr/bin/jsub -N robot -once -quiet -release buster sh /data/project/rebot/commonscat.sh
[10:36:03] 0 04 4 * * /usr/bin/jsub -N robot -once -quiet -release buster sh /data/project/rebot/oficial.sh
[10:36:05] 0 04 5 * * /usr/bin/jsub -N robot -once -quiet -release buster python3 /data/project/rebot/catgra.py
[10:36:06] 0 04 6 * * /usr/bin/jsub -N robot -once -quiet -release buster python3 /data/project/rebot/ee.py
[10:36:14] I don't understand what must I do to migrate to Kubernetes
[10:38:33] How do I create a Docker image?
[10:43:33] @Pau is your code in a public git repository?
[10:44:07] no
[10:44:33] I only have some very basic pywikibot scripts
[10:45:23] @Pau you can try following https://wikitech.wikimedia.org/wiki/Help:Toolforge/Running_Pywikibot_scripts
[10:46:58] Okay, thanks. I'll try it
[10:47:11] I would encourage you to publish your scripts though, your code must be licensed with an open source license, and should be publicly published unless there's a really really good reason not to (part of the TOC of toolforge)
[10:47:52] you can get free hosting for git repos in gitlab.wikimedia.org (you can create the repo from the toolforge UI https://toolsadmin.wikimedia.org/tools/id/rebot under git repositories)
[10:48:08] s/TOC/TOU/ xd
[10:49:30] I have published in the bot page of wikipedia, because I don't know how to use Git
[10:49:47] I have read https://wikitech.wikimedia.org/wiki/Help:Toolforge/Running_Pywikibot_scripts
[10:50:37] but it does not say how to schedule more than one job and how to translate things like " 0 04 2 * *"
[10:52:44] This is it: https://ca.wikipedia.org/wiki/Usuari:Rebot
[11:01:41] I get this error: "/usr/bin/toolforge-jobs:15: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html"
[11:04:30] @Pau do you speak Catalan? It's been a long time since I spoke it regularly xd
[11:04:49] That error is usually related to the virtual environment
[11:05:38] wait, that is from toolforge-jobs directly
[11:07:14] @Pau you can dismiss that warning
[11:08:54] Yes, I speak Mallorcan. Where are you from?
[11:09:56] I've created a Dockerfile. Do I now have to write a yaml for each task?
[11:10:56] or do I have to build docker images first?
[11:12:07] !log wikisource on wsexport-prod02 install cron and add cron job for killing ebook-convert orphans
[11:12:09] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikisource/SAL
[11:13:12] I'm from Barcelona :)
[11:13:22] Well then!
[11:13:29] you don't have to build docker images, you don't need them
[11:13:36] ah
[11:13:57] all you need is the scripts in the tool's home
[11:14:10] *are
[11:14:32] They're already there. Do you think you could give me an ultra-basic three-point guide to what I have to do?
[11:17:09] one second, I'm in a meeting :)
[11:18:45] you have to put the scripts in the home, and create the jobs with `--mount=all --command="./elmeuscript.py"`
[11:19:23] and `--image tool-pywikibot/pywikibot-scripts-stable:latest`
[11:21:51] what's the command I have to use in front of these parameters?
[11:29:33] it doesn't matter :), the command would be something like `toolforge jobs run --schedule "00 04 2 * *" --command="./elmeuscript.py" --image="tool-pywikibot/pywikibot-scripts-stable:latest" --mount=all nomdelmeujob`
[11:29:41] @Pau ^
[11:31:00] Ok
[11:31:19] This evening I'll try to do it, and if it doesn't work out, I'll bother you again
[11:31:31] ;)
[11:34:37] Thank you very much!
[11:50:16] 👍 you can open a task at https://phabricator.wikimedia.org/ in case there's nobody in the chat :)
[12:13:03] !log superpes@tools-sgebastion-10 tools.stewardbots Restarted StewardBot not feeding on IRC and SULWatcher which had quit from IRC
[12:13:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[13:37:24] Hello! Could anyone restart the `wikiquantos` webservice? It got froze and the maintainer is currently offline.
[13:41:14] !log taavi@tools-sgebastion-11 tools.wikiquantos toolforge webservice restart
[13:41:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikiquantos/SAL
[13:42:18] Thanks! =D
[19:11:18] ahh still down
[19:15:16] stemoc: https://toolsadmin.wikimedia.org/tools/id/commons-delinquent lists the folks who might know what to do
[19:17:42] not that, croptool having backend issues, won't oauth, probably cookie issues, works now..
[19:19:03] commonsdelinker could use an upgrade tho, same for its alt filedelinker
[19:57:36] ooh, another wikibugs "bugfix" for something that's been broken for longer than I've been here
[20:07:44] the color change/update?
[20:09:49] the project colors matching what's in phab, and not just always being blue
[20:45:49] !log bd808@tools-sgebastion-11 tools.wikibugs Restarted wikibugs-phab job to pick up fixes for T1175, T1176, and T1177
[20:45:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[20:47:46] taavi: wikibugs is a "long now" project ;)
[23:33:18] !log bd808@tools-sgebastion-11 tools.wikibugs Changed the $HOME/libera git checkout remote origin to https://gitlab.wikimedia.org/toolforge-repos/wikibugs2 (T357850)
[23:33:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
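Regarding the 10:50:37 question about translating crontab entries such as "0 04 2 * *": the sketch below turns one of the old `jsub` crontab lines into a command of the shape given at 11:29:33. It is an illustration, not an official migration tool. The assumptions are that each script lives in the tool's home directory and is executable (so it can be run as `./name`), and that the job name is derived from the script's file name; the function name `cron_to_jobs_command` is hypothetical.

```python
import shlex
from pathlib import PurePosixPath

# Image named in the chat at 11:19:23.
IMAGE = "tool-pywikibot/pywikibot-scripts-stable:latest"


def cron_to_jobs_command(cron_line: str) -> str:
    """Build a `toolforge jobs run` command line from a jsub crontab entry."""
    fields = cron_line.split()
    if len(fields) < 6:
        raise ValueError("expected five schedule fields followed by a command")
    # The first five fields are the cron schedule and are reused unchanged.
    schedule = " ".join(fields[:5])
    # The last field is the script path in the old crontab entries.
    script = PurePosixPath(fields[-1])
    # Assumption: one job per script, named after the file without its extension.
    job_name = script.stem.lower()
    args = [
        "toolforge", "jobs", "run",
        "--schedule", schedule,
        "--command", f"./{script.name}",
        "--image", IMAGE,
        "--mount=all",
        job_name,
    ]
    # shlex.join quotes the schedule so the shell does not expand the asterisks.
    return shlex.join(args)


if __name__ == "__main__":
    old = ("0 04 2 * * /usr/bin/jsub -N robot -once -quiet -release buster "
           "sh /data/project/rebot/robottot.sh")
    print(cron_to_jobs_command(old))
```

Running it once per crontab line gives one scheduled job per script, which also covers the question of how to schedule more than one job: each gets its own `toolforge jobs run` invocation with a distinct job name.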