[07:18:15] I'm probably being stupid again, but I can't ssh to toolforge since yesterday. All I get is "Connection closed by 185.15.56.62 port 22"
[07:19:27] can someone tell me what exactly that means?
[07:31:33] to be precise, the complete log ends with
[07:31:34] debug1: Server accepts key: /home/a/.ssh/id_ed25519 ED25519 SHA256:[...] agent
[07:31:34] debug3: sign_and_send_pubkey: using publickey-hostbound-v00@openssh.com with ED25519 SHA256:[...]
[07:31:35] debug3: sign_and_send_pubkey: signing using ssh-ed25519 SHA256:[...]
[07:31:35] debug3: send packet: type 50
[07:31:36] Connection closed by 185.15.56.62 port 22
[11:50:19] (FTR, the above messages were also reported at T393829 which was then marked as a duplicate of T393732)
[11:50:20] T393829: Ssh to toolforge failing with "Connection closed by 185.15.56.62 port 22" - https://phabricator.wikimedia.org/T393829
[11:50:20] T393732: Toolforge bastion sssd/LDAP flakiness (May 2025) - https://phabricator.wikimedia.org/T393732
[11:52:20] !log root@tools-bastion-13:~# systemctl restart sssd-pam{,{,-priv}.socket} # all three failed with start-limit-hit / Start request repeated too quickly; T393732?
[11:52:22] lucaswerkmeister: Unknown project "root@tools-bastion-13:~#"
[11:52:38] !log tools root@tools-bastion-13:~# systemctl restart sssd-pam{,{,-priv}.socket} # all three failed with start-limit-hit / Start request repeated too quickly; T393732?
[11:52:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:53:16] !log tools T393732 note: restart of sssd-pam.service actually failed, “may be requested by dependency only”; overall it still seems to have worked though (so next time restarting the sockets is probably sufficient)
[11:53:19] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:57:57] right now it seems to be working for me again
[14:07:39] Hi, I can't run `become `. Have we changed the way to login?
[14:08:18] It says "sudo: a password is required"
[14:08:22] try `dev.toolforge.org` instead of `login.toolforge.org`;
[14:08:56] @kanashimi
[14:10:29] !log tools root@tools-bastion-13:~# systemctl restart sssd-sudo.socket # service-start-limit-hit, T393732?
[14:10:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:10:33] T393732: Toolforge bastion sssd/LDAP flakiness (May 2025) - https://phabricator.wikimedia.org/T393732
[14:10:57] Thank you. It works. What happens to login.toolforge.org?
[14:11:12] it’s been having issues for a few days (see the task above) :(
[14:13:38] lucaswerkmeister: sigh, so my fix attempt from earlier clearly didn't work?
[14:13:47] OK I see
[14:13:49] seems so, yeah :(
[14:14:07] I’ve just been blindly restarting the stuff in `systemctl --failed` in the hope that it helps at least temporarily
[14:14:28] (though this time I actually wasn’t able to reproduce the error, `become` still worked for me. maybe it was cached somewhere)
[14:32:52] https://phabricator.wikimedia.org/T393732#10809252, unfortunately any of those will probably have to wait until business hours on Monday
[16:20:46] I'm getting Connection closed by 185.15.56.62 port 22
[16:22:02] !log tools systemctl restart sssd-{pam{,-priv},sudo}.socket # service-start-limit-hit, T393732?
[16:22:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[16:22:06] T393732: Toolforge bastion sssd/LDAP flakiness (May 2025) - https://phabricator.wikimedia.org/T393732
[16:22:48] went right down again :(
[16:23:17] you can try dev.toolforge.org instead, that might work better at the moment (re @marufhasan24: I'm getting Connection closed by 185.15.56.62 port 22)
[17:33:58] !log tools root@tools-bastion-13:~# systemctl reset-failed sssd-{pam,sudo}.service && systemctl restart sssd-pam{,-priv}.socket # try to reset the rate limits this way (T393732)
[17:34:02] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:34:02] T393732: Toolforge bastion sssd/LDAP flakiness (May 2025) - https://phabricator.wikimedia.org/T393732
[17:35:53] !log root@tools-bastion-13:~# systemctl restart sssd-sudo{,.socket} # looks like the reset-failed didn’t work properly, systemd didn’t even try to start the service again afaict (T393732)
[17:35:53] lucaswerkmeister: Unknown project "root@tools-bastion-13:~#"
[17:35:56] !log tools root@tools-bastion-13:~# systemctl restart sssd-sudo{,.socket} # looks like the reset-failed didn’t work properly, systemd didn’t even try to start the service again afaict (T393732)
[17:35:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:43:01] (FTR, these restarts didn’t work, I left a comment on the task)
[17:44:14] !status login.toolforge.org bastion unstable, dev.toolforge.org may work better
[17:47:15] lucaswerkmeister: you need to start/stop the socket unit (sssd-sudo.socket), doing anything on the service itself is going to do nothing useful
[17:47:42] I had restarted both (brace expansion)
[17:48:15] but restarting just the socket didn’t seem to bring the service out of the rate-limited state
[18:48:58] Looking for a FOSS website analytics solution that doesn't need a lot of storage space on the database, is easy to set up on Toolforge and that can be used with static webpages. Any recommendations?
[18:56:52] or maybe there are statistics for 'tools-static' published somewhere already?
[19:29:17] Hi, I might need some help. I changed my SSH on toolforge but I fail to connect to the server.
[19:30:43] the main bastion server is having some issues at the moment; dev.toolforge.org might work better
[19:31:11] oh that might explain
[19:31:18] what about login.toolforge.org?
[19:31:56] that’s the one with issues
[19:32:32] any phab ticket related to it?
[19:33:27] yes, T393732
[19:33:27] T393732: Toolforge bastion sssd/LDAP flakiness (May 2025) - https://phabricator.wikimedia.org/T393732
[19:34:14] thanks
[19:46:52] by any chance, can we webrestart or even stop letaxobot.toolforge.org?
[19:58:30] !log tools.letaxobot webservice restart (per request on behalf of tool maintainer, as the bastion is having issues atm)
[19:58:32] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.letaxobot/SAL
[19:58:47] seems to be responding again, yay
[19:59:36] thanks lucaswerkmeister ; btw something is weird about this tool, like I have to webrestart it times to times or it fails at some point
[20:00:47] hm, there’s a bunch of noise in error.log that doesn’t look related
[20:01:05] possibly there was something else in the pod logs that is now gone due to the restart :S
[20:02:18] a health check (https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web#Health_checks) *might* help? but I’ve never tried them on a PHP / lighttpd webservice
[20:02:29] (that’s a suggestion for later, once the bastion works again, not now ^^)
[20:02:58] indeed, thanks I'll try later
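
Editor's note on the `!log` entries above: several `systemctl` commands use shell brace expansion (`sssd-pam{,{,-priv}.socket}`, `sssd-sudo{,.socket}`), which is why "I had restarted both" covers the service and the socket units. The sketch below only illustrates which unit names those patterns expand to. It is not taken from the log: `expand_braces` is a hypothetical helper written for this note, and it handles only comma-separated lists, not sequences like `{1..3}` or the other corner cases a real shell covers. A name with no suffix, such as `sssd-pam`, is treated by `systemctl` as `sssd-pam.service`.

```python
def expand_braces(pattern: str) -> list[str]:
    """Expand comma-separated brace lists, left to right, roughly like bash.

    Hypothetical helper for illustration only: it does not handle
    sequences ({1..3}), escaping, or bash's rule that a brace group
    without a comma stays literal.
    """
    start = pattern.find("{")
    if start == -1:
        return [pattern]  # nothing to expand

    depth = 0
    parts = []          # top-level alternatives inside the first brace group
    last = start + 1    # where the current alternative begins
    for i in range(start, len(pattern)):
        ch = pattern[i]
        if ch == "{":
            depth += 1
        elif ch == "," and depth == 1:
            parts.append(pattern[last:i])
            last = i + 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                parts.append(pattern[last:i])
                prefix, suffix = pattern[:start], pattern[i + 1:]
                results = []
                for part in parts:
                    # Alternatives may contain nested groups; the suffix
                    # may contain further groups of its own.
                    for middle in expand_braces(part):
                        for tail in expand_braces(suffix):
                            results.append(prefix + middle + tail)
                return results
    return [pattern]  # unbalanced braces: leave the text as-is


if __name__ == "__main__":
    # Patterns copied from the !log entries in this transcript.
    for pattern in (
        "sssd-pam{,{,-priv}.socket}",
        "sssd-{pam{,-priv},sudo}.socket",
        "sssd-{pam,sudo}.service",
        "sssd-sudo{,.socket}",
    ):
        print(pattern, "->", " ".join(expand_braces(pattern)))
```

Running the script prints each pattern next to its expanded unit names, which makes it easier to see that the first 11:52 command targeted one service and two sockets ("all three failed"), while the 16:22 command targeted three sockets only.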