[01:47:30] I need help migrating this GridEngine cron job to its Kubernetes equivalent:
[01:47:33] o /data/project/botwikiawk/stdout.wcc -v "AWKPATH=.:/data/project/botwikiawk/BotWikiAwk/lib" -v "PATH=/sbin:/bin:/usr/sbin:/usr/loca
[01:47:34] l/bin:/usr/bin:/data/project/botwikiawk/BotWikiAwk/bin" -wd /data/project/botwikiawk /data/project/botwikiawk/wcc.awk
[01:48:38] It's unclear how to specify the location of the stderr and stdout files, because I need them to be somewhere other than the home directory. It's also unclear how to include environment variables like AWKPATH.
[01:49:48] Sorry, the above didn't copy-paste right; trying again:
[01:51:42] Currently you cannot change the stderr/out files. T304421 is the request to change that.
[01:51:43] T304421: Allow customizing the out/err files with toolforge-jobs - https://phabricator.wikimedia.org/T304421
[01:54:56] OK re: stderr/out. (Can't copy-paste; maybe too long for chat.) Can the environment be specified, like AWKPATH?
[01:55:18] Or PATH
[01:56:33] You might be able to do --command "AWKPATH=foo PATH=bar run-the-bot", but I haven't tried it.
[01:56:48] I usually use wrapper bash scripts.
[01:57:41] let me try..
[02:00:39] tell you who has an open file handle? (re @lucaswerkmeister: what is lsof supposed to do in this context?)
[02:01:16] of course I do prefer just setting it right to begin with
[07:56:02] yeah, you’d need to run lsof on all Toolforge hosts to know if there are open references or not…
[07:56:33] (and even then – what do you do if you find an open reference with lsof? the same thing you should’ve done anyway: make a new file that has the right permissions to begin with, as you say ^^)
[08:07:04] JJMC89: thanks for your comments on T304893, that's valuable feedback.
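The wrapper-script approach suggested above can be sketched as follows. This is a minimal illustration, not the tool's actual script: the AWKPATH value is taken from the pasted GridEngine command, the filename wcc-wrapper.sh is made up, and a one-line awk child stands in for wcc.awk. The point is that variables exported in the wrapper are inherited by the child process, which replaces GridEngine's -v flags:

```shell
#!/bin/sh
# Hypothetical wrapper (wcc-wrapper.sh): export the variables the
# GridEngine job used to receive via -v, then run the real program.
# Here an awk one-liner stands in for wcc.awk and prints what it inherited.
AWKPATH=".:/data/project/botwikiawk/BotWikiAwk/lib"
export AWKPATH
# The child process sees the exported environment:
awk 'BEGIN { print ENVIRON["AWKPATH"] }'
# prints: .:/data/project/botwikiawk/BotWikiAwk/lib
```

The wrapper then becomes the single command passed to the job runner, so no per-variable flags are needed on the job submission itself.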
I'll get back to you as soon as I have a bit of time to elaborate (currently focusing on other projects)
[08:07:05] T304893: Rethink job retries in case of failures - https://phabricator.wikimedia.org/T304893
[08:49:31] !log admin-monitoring cleaning up a bunch of leaked VMs on "BUILD" status (T320232)
[08:49:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin-monitoring/SAL
[08:49:35] T320232: nova-fullstack leaking VMs - https://phabricator.wikimedia.org/T320232
[09:03:23] !log admin-monitoring restarted nova-fullstack.service on cloudcontrol1005 (T320232)
[09:03:28] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin-monitoring/SAL
[09:03:29] T320232: nova-fullstack leaking VMs - https://phabricator.wikimedia.org/T320232
[09:45:31] !log admin restarting rabbitmq-server.service @ cloudrabbit1002 (T320232)
[09:45:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[09:45:35] T320232: nova-fullstack leaking VMs: rabbitmq may have problems - https://phabricator.wikimedia.org/T320232
[10:19:16] !log admin restarting nova-conductor in all 3 cloudcontrols (T320232)
[10:19:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[10:19:20] T320232: nova-fullstack leaking VMs: rabbitmq may have problems - https://phabricator.wikimedia.org/T320232
[10:24:40] !log admin stopping rabbitmq-server.service @ cloudrabbit1002 (T320232)
[10:24:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[10:24:44] T320232: nova-fullstack leaking VMs: rabbitmq may have problems - https://phabricator.wikimedia.org/T320232
[11:33:21] !log admin rabbitmq-server.service @ cloudrabbit1002 is again up and running (T320232)
[11:33:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[11:33:25] T320232: nova-fullstack leaking VMs: rabbitmq may have problems - https://phabricator.wikimedia.org/T320232
[13:02:10] !log taavi@cloudcontrol1005 ~ $ sudo mark_tool --disable oncall # T320240
[13:02:11] taavi: Unknown project "taavi@cloudcontrol1005"
[13:02:11] T320240: Archive/delete tool oncall - https://phabricator.wikimedia.org/T320240
[13:02:13] !log tools taavi@cloudcontrol1005 ~ $ sudo mark_tool --disable oncall # T320240
[13:02:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[13:40:38] !log admin dhinus is resetting rabbitmq cluster in an attempt to resolve a suspected (by Andrew) split-brain
[13:40:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[15:25:45] Hi folks, I'm having some trouble logging in to login.toolforge.org & bastion.wmflabs.org with PuTTY:
[15:25:45] Authenticating with public key "valhallasw for labs" from agent
[15:25:45] Server refused public-key signature despite accepting key!
[15:26:04] The key used matches the first one listed in my https://wikitech.wikimedia.org/wiki/Special:Preferences#mw-prefsection-openstack , so the key itself seems correct
[15:26:12] * taavi pokes at the relevant logs
[15:27:00] I'll try exporting the key from my password manager in the meantime; maybe that's what's causing the issue (although it worked for years on my old computer)
[15:27:16] Oct 7 15:21:36 tools-sgebastion-10 sshd[14364]: Failed publickey for valhallasw from x.x.x.x port 52001 ssh2: ED25519 SHA256:+Y9fBaLMPwPpkpoU8XHG4qZ9MdXzitQEjfNBgoS7kag
[15:29:07] valhallasw: as far as I can see, the key you're trying to use is an ed25519 key, but the one in LDAP is an RSA one
[15:30:19] Hmm, exporting and using it as-is does work. Maybe some version incompatibility between this version of PuTTY and of KeeAgent...
[15:30:38] I do have an ED25519 key in KeeAgent, but for a different host. Maybe it's mixing up the keys somehow
[15:31:31] Thanks for checking the logs, @taavi :-)
[15:34:28] Indeed, updating KeePass/KeeAgent seems to have fixed it!
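For the key-mismatch debugging above: a generic way to check what type a local key actually is, and to compare its SHA256 fingerprint against the one sshd writes to its log, is `ssh-keygen -l`. A sketch using a throwaway key (the /tmp path is illustrative; nothing here is specific to PuTTY or KeeAgent):

```shell
# Clean up any previous run, then generate a throwaway ed25519
# keypair with no passphrase (demo only -- never do this for real keys).
rm -f /tmp/demo_key /tmp/demo_key.pub
ssh-keygen -t ed25519 -N '' -f /tmp/demo_key -q
# Print bit length, SHA256 fingerprint, comment, and key type.
# The fingerprint has the same "ED25519 SHA256:..." form seen in the
# sshd log line above, so the two can be compared directly.
ssh-keygen -lf /tmp/demo_key.pub
```

If the fingerprint the agent offers does not match the one attached to the key stored in LDAP, the agent is offering the wrong key, which is what happened here.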
[17:43:39] !log tools.bridgebot commented out static-cleaner cronjob, created toolforge-jobs periodic job instead (T319609)
[17:43:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.bridgebot/SAL
[17:57:45] !log tools.wikibugs Updated channels.yaml to: 5c488e7d414102a949371d105d3ec8cb85872976 channels: route Grid-Engine-to-K8s-Migration to -cloud-feed
[17:57:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[18:48:14] hello, are email notifications in toolforge-jobs actually working? I can successfully send messages to myself via tools.tool-name@tools.wmflabs.org, which I think is the address being used by the framework under the hood, but the --emails option has no effect for me
[18:59:41] I know the failure ones aren't. I reported that at T317998.
[18:59:41] T317998: No emails on pod failure - https://phabricator.wikimedia.org/T317998
[19:12:30] Random question, but what storage format do Wikimedia servers use? ext4? btrfs?
[19:14:58] I think mostly ext4
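The cron-to-periodic-job migration logged at 17:43 would have been done with something along these lines. This is a sketch only: the command, image name, and schedule are assumptions, not the bridgebot tool's actual configuration, and the exact flag set depends on the toolforge-jobs version in use:

```shell
# Hypothetical toolforge-jobs invocation creating a scheduled
# (cron-like) Kubernetes job in place of a crontab entry.
# Job name taken from the SAL entry; everything else is illustrative.
toolforge-jobs run static-cleaner \
    --command "./static-cleaner.sh" \
    --image tf-bullseye-std \
    --schedule "17 * * * *"

# Inspect the result / discover valid image names:
toolforge-jobs list
toolforge-jobs images
```

The schedule string uses ordinary five-field cron syntax, so an existing crontab line's schedule can usually be carried over unchanged.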