[05:35:55] Who is the maintainer for superset.wmcloud.org? Something is misconfigured with the proxy, I think
[05:36:10] https://superset.wmcloud.org/
[05:43:22] the WMCS team - mostly Rook, I think
[05:47:35] thanks, I don't know how long that has been down, but I'll probably create a ticket at some point
[08:41:32] the k8s cluster there seems non-responsive (though I might be doing things wrong, the docs are not very detailed), so please open a task, yes
[11:12:36] Yes, please open a ticket for superset
[11:46:22] Probably don't need the ticket any longer. superset appears to be back
[12:44:42] !log taavi@tools-bastion-12 tools.wikibugs toolforge jobs restart irc
[12:44:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[12:59:12] !log bsadowski1@tools-sgebastion-10 tools.stewardbots Restarted StewardBot/SULWatcher because of a connection loss
[12:59:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[16:47:23] !log bd808@tools-sgebastion-10 tools.wikibugs-testing Build new container from changes in MR!27 and restarted web, irc, gerrit, and phorge tasks
[16:47:25] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs-testing/SAL
[16:57:13] My application runs on tools-sgebastion-10, and today (and part of yesterday) it's apparently been hanging a lot. Recent error logs record numerous restarts and kills with signal 9. Does anyone know what might be happening there (or if there have been any recent changes there)?
[17:04:41] JMarkOckerbloom: what sort of process are you trying to run on the bastion that is leading to trouble?
[17:05:18] I ask because there are things like the Wheel of Misfortune that actively look for rogue processes to kill there
[17:12:45] This is the ftl service (a CGI script implemented in Perl). It's been running for a number of years, but today and yesterday it seems to be hanging and getting restarted a fair bit.
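[Editor's note: the "kill with signal 9" reports above can be investigated from the pod status that `kubectl get pod <name> -o json` returns; a container terminated with exit code 137 (128 + 9) was killed with SIGKILL, e.g. by the kubelet after a failed probe or an OOM kill. A minimal sketch, using hypothetical pod JSON rather than the ftl tool's real output:]

```python
"""Sketch: summarise container restarts from `kubectl get pod <name> -o json` output.

The pod data below is a hypothetical, trimmed-down example; real output
has many more fields, and the container name here is an assumption.
"""
import json


def summarize_restarts(pod):
    """Return restart count and last termination details for each container."""
    summary = []
    for cs in pod["status"].get("containerStatuses", []):
        term = (cs.get("lastState") or {}).get("terminated") or {}
        summary.append({
            "name": cs["name"],
            "restarts": cs.get("restartCount", 0),
            "reason": term.get("reason"),        # e.g. "Error" or "OOMKilled"
            "exit_code": term.get("exitCode"),   # 137 == 128 + SIGKILL(9)
        })
    return summary


# Hypothetical excerpt of what `kubectl get pod -o json` might return:
pod_json = json.loads("""
{
  "status": {
    "containerStatuses": [
      {
        "name": "webservice",
        "restartCount": 10,
        "lastState": {
          "terminated": {"reason": "Error", "exitCode": 137}
        }
      }
    ]
  }
}
""")

for c in summarize_restarts(pod_json):
    print(c["name"], c["restarts"], c["reason"], c["exit_code"])
```

[An exit code of 137 alone does not say *why* the kill happened; `kubectl describe pod` events distinguish probe failures from OOM kills.]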
[17:18:00] JMarkOckerbloom: if it has been running on a bastion for a number of years, it has been in violation of Toolforge rules for several years. Webservices, bot jobs, etc. should always run on Kubernetes (and formerly Grid Engine). The bastions are for humans to start distributed tasks and do light file editing.
[17:21:42] It looks like ftl is actually running its webservice on Kubernetes
[17:21:56] but it is restarting a lot...
[17:22:51] I wonder if there is something about the new default health checks that is causing problems there?
[17:24:11] the output in $HOME/error.log doesn't mean much to me, but it also doesn't look like crash logs
[17:26:36] JMarkOckerbloom: this is probably worth a Phabricator task. The `kubectl get pod` output shows 10 restarts in the last 138 minutes, so something is up for sure.
[17:27:11] I have to run to an IRL meeting, but I can poke around a bit later in my day to see if I can spot anything in particular.
[17:48:30] Thanks! (I'm in meetings as well, which means my responses may be delayed, but I appreciate the help.)
[17:49:36] (and yes, to clarify: it's running on Kubernetes; the bastion is what I sign in on. Sorry for the confusion earlier.)
[21:54:51] JMarkOckerbloom: I started T361652, where I also dropped a tiny bit of a pointer to the thing to look into -- why is the liveness probe failing?
[21:54:52] T361652: "ftl" tool's perl5.32 webservice pod being frequently killed due to liveness probe failures - https://phabricator.wikimedia.org/T361652
[23:56:23] !log bd808@tools-sgebastion-10 tools.ftl Restarted webservice to pick up new service.template defined RAM and CPU limits. (T361652)
[23:56:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.ftl/SAL
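[Editor's note: the final `!log` entry refers to a `service.template`. On Toolforge, `$HOME/service.template` is a YAML file whose keys mirror the `webservice` command-line flags, so limits persist across restarts. A hedged sketch; the keys follow the documented flag names, but the values here are hypothetical, not the ftl tool's actual settings:]

```
# $HOME/service.template -- example only; values are illustrative
backend: kubernetes
type: perl5.32
cpu: 1        # CPU limit, as accepted by `webservice --cpu`
mem: 2Gi      # RAM limit, as accepted by `webservice --mem`
```

[Raising the limits matters here because a pod that is CPU-throttled or near its memory limit can respond to liveness probes too slowly, which causes exactly the probe-failure restarts tracked in T361652.]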