[09:46:14] Hi there! Does anyone have clues on why the K8s container is recreating after an exception happens on Toolforge?
[09:47:33] After a job failed - message: -- Pod 'adsbot-english-paper-test-8njr2'. Phase: 'pending'. Container state: 'waiting'. With reason 'ContainerCreating'.
[09:47:33]   -- Pod 'adsbot-english-paper-test-8njr2'. Phase: 'running'. Container state: 'running'. Start timestamp 2022-08-04T13:34:21Z.
[10:07:03] Feliciss: I imagine if the exception is not handled, the container will terminate and K8s will attempt creating a new one. Would you expect the container not to terminate, or would you expect K8s not to create a new one when it terminates?
[10:07:46] I expect K8s not to create a new one when it terminates. dhinus
[10:11:18] That should be configurable with 'backoffLimit' in the K8s job, but I'm very new to Toolforge so I'm not sure where you could set that :)
[10:12:04] Feliciss: I assume you're talking about the jobs framework?
[10:12:36] Yes. It's jobs framework on K8s. taavi
[10:12:53] you're not the first person to get confused by that and I honestly agree that for cronjobs it's not really expected to just retry it if it fails
[10:13:03] I'll put up a patch to change that
[10:14:32] it also happens in normal jobs framework, not only cronjobs.
[10:20:27] I put up https://gerrit.wikimedia.org/r/c/cloud/toolforge/jobs-framework-api/+/820665/
[10:21:52] Thanks. taavi
[10:23:37] Does this affect in any way the `--continuous` run execution? It should not
[10:27:24] Since many toolforge-jobs exit with != 0 for maxlag etc. and I presume most users expect them to just keep continuous...ly as default, even if in fail state
[10:29:07] (Well what I'm saying is true for both schedule and continuous)
[10:29:58] no, the continuous mode is unaffected
[10:32:36] Will it affect toolforge-jobs --schedule?
[10:32:37] If they exit 1 (e.g. caused by maxlag or whatever) at the moment they are repeated, but then they will not?
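For reference, the 'backoffLimit' setting mentioned at 10:11:18 is a field of a plain Kubernetes Job spec. The following is a minimal sketch of a Job manifest that is not retried after a failure, assuming direct access to the Kubernetes API rather than the Toolforge jobs framework (whose own interface for this is not shown in the log). The helper name `build_job_manifest`, the job name, the image, and the command are hypothetical placeholders.

```python
import json


def build_job_manifest(name, image, command, backoff_limit=0):
    """Return a Kubernetes batch/v1 Job manifest as a plain dict.

    Hypothetical helper for illustration only. With backoff_limit=0 and
    restartPolicy "Never", Kubernetes does not create a replacement pod
    after the first failed one. If backoffLimit is omitted, the
    Kubernetes default is 6 retries.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Number of retries before the Job is marked as failed.
            "backoffLimit": backoff_limit,
            "template": {
                "spec": {
                    # Job pods must use "Never" or "OnFailure".
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "command": list(command),
                        }
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    # Placeholder values, not taken from the conversation above.
    manifest = build_job_manifest(
        name="example-bot-job",
        image="example.invalid/example-bot:latest",
        command=["python3", "bot.py"],
    )
    # kubectl accepts JSON as well as YAML manifests.
    print(json.dumps(manifest, indent=2))
```

Setting `restartPolicy` to "OnFailure" instead would restart the container inside the same pod, which is a different retry path from the pod recreation described in the log.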
[15:40:00] !log tools.php-security-checker New tool created and set up: T296967
[15:40:02] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.php-security-checker/SAL
[15:40:03] T296967: Move php-security-checker.wmcloud.org to Toolforge - https://phabricator.wikimedia.org/T296967
[15:59:59] !log tools.bridgebot Double IRC messages to other bridges
[16:00:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.bridgebot/SAL
[18:46:27] hi, anything known wrong with trove? libup for some reason can't connect to its database
[18:46:35] sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on 'rtriy5eslxo.svc.trove.eqiad1.wikimedia.cloud' ([Errno 111] Connection refused)")
[18:47:08] I haven't changed anything in months (related to DB)...
[18:52:45] andrewbogott: ^^
[18:53:52] legoktm: rabbitmq has been acting up for several days, and rabbit is involved in controlling the client node. I probably won't be able to get to it today...
[18:54:08] ack! I will file a bug just so I can point people to it
[18:54:22] is there a separate bug about the rabbitmq issues?
[18:54:30] thanks. I might be able to get in a one-off fix but we're really out in the weeds in terms of having these things talk to each other right now :(
[18:54:44] T314522 has some documentation about the rabbit thing
[19:12:45] T314679 was filed about db issues too
[19:12:45] T314679: tools.tools-info has problem connecting to database - https://phabricator.wikimedia.org/T314679
[19:14:58] that looks like a different issue
[19:15:00] T314680
[19:15:00] T314680: LibUp down because of database connection issues - https://phabricator.wikimedia.org/T314680
[19:57:56] The problem in T314679 is stale code/unmaintained tool. It's interesting that the task was created today, but the app would have been broken since April 2021 (~16 months).
[19:57:57] T314679: tools.tools-info has problem connecting to database - https://phabricator.wikimedia.org/T314679
[20:31:49] !log tools.directory shutdown broken webservice. maintainer seems to have been gone since 2014
[20:31:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.directory/SAL
[20:43:54] !log packagist-mirror Added bd808 (self) to project to process T314677
[20:43:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Packagist-mirror/SAL
[20:43:58] T314677: Redirect https://php-security-checker.wmcloud.org/ to https://php-security-checker.toolforge.org/ - https://phabricator.wikimedia.org/T314677
[21:04:42] Hi. Is Docker installed on WMCS/Toolforge, or can I install it or request it? I want to pull a docker image and run it (in grid engine, kube, whatever can run it)
[21:06:19] (bd808 or taavi if you're around ^)
[21:07:02] proc: no, Toolforge does not offer Docker. We do offer Kubernetes, but only using the containers that we maintain.
[21:07:32] A cloud vps project can choose to install Docker on a vm of course
[21:07:49] can I request a cloud vps project for a bot?
[21:08:23] basically just looking to move a bot from personal hosting into WM hosting. It was a pain to setup/configure on toolforge last time I tried, hoping a functional docker image will be less painful
[21:16:53] proc: you can request one yes. we try to convince most folks with bot sided projects to use Toolforge instead, but there can be valid reasons that does not work.
[21:17:44] There is ongoing work to improve Toolforge with a buildpack based system for creating custom Docker containers. That project has quite a way to go before it is ready for beta users though.
[21:18:09] *bot sized projects
[21:21:11] proc: mostly curious, what programming language is your bot written in?
[21:23:27] it's a few different apps, the bulk is Ruby and *most* of the Ruby tasks eventually worked on Toolforge after some help from legoktm, but not all. The parts that use a puppeteer/a browser (due to missing MediaWiki APIs) are in JS and there's a third app in Go
[21:24:57] the puppeteer stuff was the most fiddly, I remember spending some time last year (with you I think) trying to get it to work but couldn't get it consistently working
[21:26:54] If you're comfortable maintaining a Debian VM then probably rolling your own docker image is going to be easier yeah
[21:31:07] > The parts that use a puppeteer/a browser (due to missing MediaWiki APIs) -- me runs in the other direction from the idea of a bot doing html scraping
[21:33:08] !log packagist-mirror Deleted php-security-checker.wmcloud.org proxy (T314677)
[21:33:10] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Packagist-mirror/SAL
[21:33:11] T314677: Redirect https://php-security-checker.wmcloud.org/ to https://php-security-checker.toolforge.org/ - https://phabricator.wikimedia.org/T314677
[21:33:23] !log redirects Added php-security-checker.wmcloud.org proxy (T314677)
[21:33:25] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Redirects/SAL
[21:44:06] !log packagist-mirror Removed bd808 (self) from project
[21:44:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Packagist-mirror/SAL
[22:46:37] !log tools.bridgebot Updating to https://github.com/42wim/matterbridge/releases/tag/v1.25.2
[22:46:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.bridgebot/SAL
[22:47:26] bd808: thank you for doing the redirect!
[22:47:45] yw