[01:47:30] I need help migrating this GridEngine cron job to its Kubernetes equivalent:
[01:47:33] o /data/project/botwikiawk/stdout.wcc -v "AWKPATH=.:/data/project/botwikiawk/BotWikiAwk/lib" -v "PATH=/sbin:/bin:/usr/sbin:/usr/loca
[01:47:34] l/bin:/usr/bin:/data/project/botwikiawk/BotWikiAwk/bin" -wd /data/project/botwikiawk /data/project/botwikiawk/wcc.awk
[01:48:38] It's unclear how to specify the location of the stderr and stdout files, because I need them to be somewhere other than the home directory. It's also unclear how to include environment variables like AWKPATH.
[01:49:48] Sorry, the above didn't copy-paste right; trying again:
[01:51:42] Currently you cannot change the stderr/out files. T304421 is the request to change that.
[01:51:43] T304421: Allow customizing the out/err files with toolforge-jobs - https://phabricator.wikimedia.org/T304421
[01:54:56] OK re: stderr/out. (Can't copy-paste; maybe too long for chat.) Can the environment be specified, like AWKPATH?
[01:55:18] Or PATH
[01:56:33] You might be able to do --command "AWKPATH=foo PATH=bar run-the-bot", but I haven't tried it.
[01:56:48] I usually use wrapper bash scripts.
[01:57:41] let me try..
[02:00:39] tell you who has an open file handle? (re @lucaswerkmeister: what is lsof supposed to do in this context?)
[02:01:16] of course I do prefer just setting it right to begin with
[07:56:02] yeah, you’d need to run lsof on all Toolforge hosts to know if there are open references or not…
[07:56:33] (and even then – what do you do if you find an open reference with lsof? the same thing you should’ve done anyway: make a new file that has the right permissions to begin with, as you say ^^)
[08:07:04] JJMC89: thanks for your comments on T304893, that's valuable feedback.
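The wrapper-script approach suggested above can be sketched as follows. This is a minimal illustration, not the tool's actual script: the AWKPATH value is taken from the pasted GridEngine command, the filename wcc-wrapper.sh is made up, and a one-line awk child stands in for wcc.awk. The point is that variables exported in the wrapper are inherited by the child process, which replaces GridEngine's -v flags:

```shell
#!/bin/sh
# Hypothetical wrapper (wcc-wrapper.sh): export the variables the
# GridEngine job used to receive via -v, then run the real program.
# Here an awk one-liner stands in for wcc.awk and prints what it inherited.
AWKPATH=".:/data/project/botwikiawk/BotWikiAwk/lib"
export AWKPATH
# The child process sees the exported environment:
awk 'BEGIN { print ENVIRON["AWKPATH"] }'
# prints: .:/data/project/botwikiawk/BotWikiAwk/lib
```

The wrapper then becomes the single command passed to the job runner, so no per-variable flags are needed on the job submission itself.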
I'll get back to you as soon as I have a bit of time to elaborate (currently focusing on other projects)
[08:07:05] T304893: Rethink job retries in case of failures - https://phabricator.wikimedia.org/T304893
[08:49:31] !log admin-monitoring cleaning up a bunch of leaked VMs on "BUILD" status (T320232)
[08:49:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin-monitoring/SAL
[08:49:35] T320232: nova-fullstack leaking VMs - https://phabricator.wikimedia.org/T320232
[09:03:23] !log admin-monitoring restarted nova-fullstack.service on cloudcontrol1005 (T320232)
[09:03:28] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin-monitoring/SAL
[09:03:29] T320232: nova-fullstack leaking VMs - https://phabricator.wikimedia.org/T320232
[09:45:31] !log admin restarting rabbitmq-server.service @ cloudrabbit1002 (T320232)
[09:45:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[09:45:35] T320232: nova-fullstack leaking VMs: rabbitmq may have problems - https://phabricator.wikimedia.org/T320232
[10:19:16] !log admin restarting nova-conductor in all 3 cloudcontrols (T320232)
[10:19:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[10:19:20] T320232: nova-fullstack leaking VMs: rabbitmq may have problems - https://phabricator.wikimedia.org/T320232
[10:24:40] !log admin stopping rabbitmq-server.service @ cloudrabbit1002 (T320232)
[10:24:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[10:24:44] T320232: nova-fullstack leaking VMs: rabbitmq may have problems - https://phabricator.wikimedia.org/T320232
[11:33:21] !log admin rabbitmq-server.service @ cloudrabbit1002 is again up and running (T320232)
[11:33:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[11:33:25] T320232: nova-fullstack leaking VMs: rabbitmq may have problems - https://phabricator.wikimedia.org/T320232
[13:02:10] !log taavi@cloudcontrol1005 ~ $ sudo mark_tool --disable oncall # T320240
[13:02:11] taavi: Unknown project "taavi@cloudcontrol1005"
[13:02:11] T320240: Archive/delete tool oncall - https://phabricator.wikimedia.org/T320240
[13:02:13] !log tools taavi@cloudcontrol1005 ~ $ sudo mark_tool --disable oncall # T320240
[13:02:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[13:40:38] !log admin dhinus is resetting rabbitmq cluster in an attempt to resolve a suspected (by Andrew) split-brain
[13:40:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[15:25:45] Hi folks, I'm having some trouble logging in to login.toolforge.org & bastion.wmflabs.org with PuTTY:
[15:25:45] Authenticating with public key "valhallasw for labs" from agent
[15:25:45] Server refused public-key signature despite accepting key!
[15:26:04] The key used matches the first one listed in my https://wikitech.wikimedia.org/wiki/Special:Preferences#mw-prefsection-openstack , so the key itself seems correct
[15:26:12] * taavi pokes at the relevant logs
[15:27:00] I'll try exporting the key from my password manager in the meantime; maybe that's what's causing the issue (although it worked for years on my old computer)
[15:27:16] Oct 7 15:21:36 tools-sgebastion-10 sshd[14364]: Failed publickey for valhallasw from x.x.x.x port 52001 ssh2: ED25519 SHA256:+Y9fBaLMPwPpkpoU8XHG4qZ9MdXzitQEjfNBgoS7kag
[15:29:07] valhallasw: as far as I can see, the key you're trying to use is an ed25519 key, but the one in LDAP is an RSA one
[15:30:19] Hmm, exporting and using it as-is does work. Maybe some version incompatibility between this version of PuTTY and of KeeAgent...
[15:30:38] I do have an ED25519 key in KeeAgent, but for a different host. Maybe it's mixing up the keys somehow
[15:31:31] Thanks for checking the logs, @taavi :-)
[15:34:28] Indeed, updating KeePass/KeeAgent seems to have fixed it!
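For the key-mismatch debugging above: a generic way to check what type a local key actually is, and to compare its SHA256 fingerprint against the one sshd writes to its log, is `ssh-keygen -l`. A sketch using a throwaway key (the /tmp path is illustrative; nothing here is specific to PuTTY or KeeAgent):

```shell
# Clean up any previous run, then generate a throwaway ed25519
# keypair with no passphrase (demo only -- never do this for real keys).
rm -f /tmp/demo_key /tmp/demo_key.pub
ssh-keygen -t ed25519 -N '' -f /tmp/demo_key -q
# Print bit length, SHA256 fingerprint, comment, and key type.
# The fingerprint has the same "ED25519 SHA256:..." form seen in the
# sshd log line above, so the two can be compared directly.
ssh-keygen -lf /tmp/demo_key.pub
```

If the fingerprint the agent offers does not match the one attached to the key stored in LDAP, the agent is offering the wrong key, which is what happened here.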
[17:43:39] !log tools.bridgebot commented out static-cleaner cronjob, created toolforge-jobs periodic job instead (T319609)
[17:43:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.bridgebot/SAL
[17:57:45] !log tools.wikibugs Updated channels.yaml to: 5c488e7d414102a949371d105d3ec8cb85872976 channels: route Grid-Engine-to-K8s-Migration to -cloud-feed
[17:57:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[18:48:14] hello, are email notifications in toolforge-jobs actually working? I can successfully send messages to myself via tools.tool-name@tools.wmflabs.org, which I think is the address being used by the framework under the hood, but the --emails option has no effect for me
[18:59:41] I know the failure ones aren't. I reported that at T317998.
[18:59:41] T317998: No emails on pod failure - https://phabricator.wikimedia.org/T317998
[19:12:30] Random question, but what storage format do Wikimedia servers use? ext4? btrfs?
[19:14:58] I think mostly ext4
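The cron-to-periodic-job migration logged at 17:43 would have been done with something along these lines. This is a sketch only: the command, image name, and schedule are assumptions, not the bridgebot tool's actual configuration, and the exact flag set depends on the toolforge-jobs version in use:

```shell
# Hypothetical toolforge-jobs invocation creating a scheduled
# (cron-like) Kubernetes job in place of a crontab entry.
# Job name taken from the SAL entry; everything else is illustrative.
toolforge-jobs run static-cleaner \
    --command "./static-cleaner.sh" \
    --image tf-bullseye-std \
    --schedule "17 * * * *"

# Inspect the result / discover valid image names:
toolforge-jobs list
toolforge-jobs images
```

The schedule string uses ordinary five-field cron syntax, so an existing crontab line's schedule can usually be carried over unchanged.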