[01:24:54] Something seems to not be working. I had a job that seemed to be hung (anomiebot-4-lhlxj), doing a `kubectl exec` into it and trying to tail a file hung. So I did a `toolforge-jobs delete`, and it went away from there, but the bot was still there, stuck in a "Terminating" state. When I resubmitted the job it got a new pod (anomiebot-4-n754c), but that one seems to be stuck in a "Pending" state now.
[01:26:19] err, s/the bot was still there/the pod was still there/
[01:32:48] I tested it, and while `toolforge jobs list` says the job is continuous, it terminated when the script completed. So it just seems to be an indication fault (re @MaartenDammers: I just submitted a one-off job.
[01:32:48] $ toolforge jobs run myjob --image python3.11 --command ./mycommand.sh
[01:32:49] It became a continuous j...)
[07:20:48] !log tools.wikibugs restarted the gerrit job, was not reporting updates since last friday
[07:20:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[08:15:18] AntiComposite: there's a bug on the cli side that reports the one-offs as continuous, but it's just the reporting, the job is still a one-off
[12:56:42] !log paws restarting the k8s workers 1, 2 and 4
[12:56:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[13:44:21] Re my thing from last night, it seems to have finally resolved the "Pending" at 04:27 UTC. 🤷
[13:52:24] it might be related to T404584, we are investigating
[13:52:24] T404584: Address tools NFS getting stuck with processes in D state - https://phabricator.wikimedia.org/T404584
[16:22:09] !log tools reboot old bastions to kick long-living connections into newer ones
[16:22:12] !log bastion reboot old bastions to kick long-living connections into newer ones
[16:22:13] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[16:22:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Bastion/SAL
[16:24:11] !log admin update nova-fullstack to run on trixie image
[16:24:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:51:21] Warning - Potential security breach! The host key does not match the one WinSCP has cached for this server: login.toolforge.org (port 22) This means that either the server administrator has changed the host key, or you have actually connected to another computer pretending to be the server. If you were expecting this change, trust the new key and want to continue connecting to the server, either select Update to update cache, or select Add to add the new key to the cache while keeping the old one(s). Please advise, as I was not expecting this change. Thanks
[16:51:56] this was announced on the cloud-announce@ mailing list: https://lists.wikimedia.org/hyperkitty/list/cloud-announce@lists.wikimedia.org/thread/I4M335NMS6CT23AT23P5PL4N3NUI2YMT/
[17:19:55] !log paws hard rebooting paws worker nodes one by one
[17:19:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[17:31:34] I'm now having a problem logging in using SSH. I updated the ssh keys in idm and admin.toolforge before coming here and seeing the announcement. I'm receiving: "debug1: No more authentication methods to try. ederporto@login.toolforge.org: Permission denied (publickey)."
[17:35:38] wait, what did you do in idm and admin.toolforge? you shouldn’t have to do anything there
[17:41:09] I removed old or inactive ssh connections there. I put my new ssh public key in admin.toolforge and updated in idm. Following directions of https://wikitech.wikimedia.org/wiki/Help:Accessing_Cloud_VPS_instances#Permission_denied_(publickey) (re @lucaswerkmeister: wait, what did you do in idm and admin.toolforge? you shouldn’t have to do anything there)
[17:41:57] ok, I see
[17:42:08] I thought maybe you misunderstood the announcement 😅
[17:42:31] no idea what’s wrong then, I think some other people know more about how to debug SSH connections
[17:51:09] Try `ssh-keygen -R login.toolforge.org` (re @ederporto: I removed old or inactive ssh connections there. I put my new ssh public key in admin.toolforge and updated in idm. Following di...)
[17:56:49] still the same output (re @albertoleoncio: Try "ssh-keygen -R login.toolforge.org")
[18:17:49] Found it! SMH! I used PuTTY to create the SSH key pair, and it needs to be OpenSSH, and I forgot to convert it (and the documentation didn't remind me of that!). Thank you all for trying
[18:48:27] Trying to use mysql on the new bastion14 I get this:
[18:48:27] $ mysql
[18:48:28] ERROR 2026 (HY000): TLS/SSL error: Certificate verification failure: The certificate is NOT trusted.
[18:48:30] Is this a known issue? How to solve it?
[18:49:53] it’s a known issue but AFAIK it’s supposed to be resolved already via `disable-ssl=true` in `replica.my.cnf`
[18:50:14] maybe that wasn’t added to your `replica.my.cnf` for some reason?
[18:50:48] (or perhaps you added an extra config section to the file, and when `disable-ssl=true` was appended to the end of the file, it ended up in the wrong section instead of in `[client]`?)
[18:51:31] hm, your `~multichill/replica.my.cnf` looks normal to me at least
[18:51:42] I seem to have two cnf files (re @lucaswerkmeister: (or perhaps you added an extra config section to the file, and when disable-ssl=true was appended to the end of the file, it ende...)
[18:52:08] huh
[18:52:14] The .my.cnf from 2014 and another one.
[18:52:57] I would try deleting (or renaming) the ancient one (the user+password is the same anyway)
[18:53:24] only a handful of other users seem to have that and for half of them it’s a symlink to `replica.my.cnf` again
[18:54:09] (and by “a handful” I mean 42, but still, that’s not a lot compared to all toolforge users)
[18:54:22] I'm from the Toolserver days (re @lucaswerkmeister: only a handful of other users seem to have that and for half of them it’s a symlink to replica.my.cnf again)
[18:54:51] So someone decided to use a different file setup at some point in time and didn't do the clean up? 😊
[18:55:18] no idea
[18:55:31] taavi also has a .my.cnf and I *think* they’re not from the toolserver days ^^
[18:57:09] actually, wait, if you delete (or rename) `.my.cnf` then that won’t fix `mysql` anyway
[18:57:24] is there a reason you’re not using the `sql` utility? (which adds `replica.my.cnf`)
[18:57:37] I didn’t pay attention earlier to the command you’re running
[19:42:45] @MaartenDammers: re: didn't do the cleanup, I guess yes and that was circa 2014 :)
[19:44:47] Never heard of it, never used it, mysql worked fine for the last 20+ years? I'll have a look (some other day) (re @lucaswerkmeister: is there a reason you’re not using the sql utility? (which adds replica.my.cnf))
[19:44:58] Rusty technical debt :P (re @wmtelegram_bot: @MaartenDammers: re: didn't do the cleanup, I guess yes and that was circa 2014 :))
[19:47:21] Toolforge didn't exist 20 years ago. If anyone is expecting that their workflows from toolserver should be unchanged 20 years later I guess we are still using ssh. Most everything else is somehow different.
[19:47:42] Anyway, thanks for the help, was able to update https://commons.wikimedia.org/wiki/Commons:Geograph_Britain_and_Ireland/Deleted_files
[19:51:23] Bot did around 8M uploads, about 3K deleted so about 0.04% deleted. Not that bad 😊
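(Editorial note on the `replica.my.cnf` hypothesis raised at 18:50:48: one way to check whether an appended `disable-ssl=true` ended up outside `[client]` is to scan the file and report which section each occurrence falls under. The following is only a minimal sketch under stated assumptions, not a Toolforge tool: the function name `find_option_sections` and both sample file contents, including the `s12345` user and the `[mysqldump]` section, are hypothetical. It relies on the documented MySQL behaviour that dashes and underscores are interchangeable in option names, and it does not follow `!include` directives.)

```python
def find_option_sections(cnf_text, option):
    """Return the section name for every line that sets `option`.

    A value of None means the option appeared before any [section] header.
    """
    wanted = option.replace("_", "-")
    sections = []
    current = None
    for raw in cnf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith(("#", ";")):
            continue
        # A [section] header changes where following options belong.
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()
            continue
        # Options may be bare ("quick") or "key = value".
        key = line.split("=", 1)[0].strip().replace("_", "-")
        if key == wanted:
            sections.append(current)
    return sections


# Hypothetical file where the option landed in [client], as intended.
GOOD_CNF = """[client]
user = s12345
password = hypothetical
disable-ssl=true
"""

# Hypothetical file where a user-added section captured the appended line.
BAD_CNF = """[client]
user = s12345
password = hypothetical

[mysqldump]
quick
disable-ssl=true
"""

if __name__ == "__main__":
    print(find_option_sections(GOOD_CNF, "disable-ssl"))
    print(find_option_sections(BAD_CNF, "disable-ssl"))
```

(To check a real file, read its contents, for example with `pathlib.Path("replica.my.cnf").read_text()`, and pass them in. Any result other than `['client']` would be consistent with the wrong-section explanation, although in this conversation the file was reported to look normal.)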