[01:09:05] hey there cloud team! Montage users are reporting a spike in lock timeouts ("Lock wait timeout exceeded") leading to application errors (can't login, etc.). Any chance there's something going on or something that can be done? I've already restarted the service to no effect.
[05:57:41] been getting database issues trying to access PetScan and Fountain too, both erroring out with something re. max_user_connections
[06:01:35] i don't operate either of those tools but the single tool that i do operate which uses ToolsDB is fine somehow...
[06:42:40] hi. my tool (fountain.toolforge.org) is having trouble connecting to the DB: `ERROR 1203 (42000): User s53098 already has more than 'max_user_connections' active connections`. I stopped the web service, `toolforge jobs list` returns nothing, yet I still get that same error if I try to connect manually via cli. what can I do?
[06:49:32] is there any update on why toolsdb is down and when i would be fixed?
[06:50:27] ie. https://phabricator.wikimedia.org/T409244
[06:59:32] Zache same with my tool
[07:00:10] hi, I'll take a look at toolsdb
[07:08:47] not super familiar with it and I'm new to the team, might take a while :D
[07:15:52] ok a bunch of entries in show processlist show commits in status 'killed', investigating
[07:17:38] doh, ok we're out of disk space
[07:23:25] I'll use T409244 for tracking
[07:23:26] T409244: Could not connect to database: User s51080 already has more than 'max_user_connections' active connections - https://phabricator.wikimedia.org/T409244
[07:29:11] I have the same problem (max connection) with userdb databases s51412__data and s51512__data
[07:35:07] ack, I'm investigating the easiest way to recover
[07:36:00] good luck! (and good night)
[07:52:43] @jeremy_b cheers
[07:54:41] !status toolsdb out of space
[08:51:21] toolsdb should be up back and running, anyone having issues to reconnect?
[08:52:52] connect is okay, but read-only
[08:53:39] duh, let me change that
[08:54:31] done
[08:54:44] thx
[09:04:19] !status ok
[09:17:49] leloiandudu: seems like Fountain is still down? might need a small kick
[09:18:46] yep I turned the service off
[09:18:53] loading now! thanks everyone
[09:19:16] !log jeanfred@tools-bastion-15 tools.heritage Deploy 7a24e4b, 4d38a0b, 665d6f6 (Localisation updates from https://translatewiki.net)
[09:19:19] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.heritage/SAL
[09:19:47] !log jeanfred@tools-bastion-15 tools.heritage Deploy 644f974 (Re-lock and upgrade all dependencies in Pipfile.lock) for T409003)
[09:19:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.heritage/SAL
[09:19:59] !log jeanfred@tools-bastion-15 tools.heritage Deploy ed99d15 (Switch from Pipenv to uv as project manager)
[09:20:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.heritage/SAL
[09:20:22] I see my backups haven't been running since 2024 oops. `mysqldump: command not found`. do we have a replacement?
[09:24:14] not a good one, but see https://phabricator.wikimedia.org/T378882
[12:26:36] https://github.com/cluebotng/component-configs/blob/main/cluebotng.yaml#L92 works reliably for mysql dumps (note, those are public) for me in multiple tools (mariadb container, or your own if needing buildpack)
[18:23:09] >18:17:02 urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='logging-logstash-02.logging.eqiad1.wikimedia.cloud', port=9200): Max retries exceeded with url: /logstash-*/_search (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 113] No route to host'))
[18:23:13] beta CI unhappy atm
[18:28:11] bastion host appears to just be hanging, for about 10min or so, is nfs happy?
[18:29:15] (just tried from a random server in case it was my wifi and ssh is hanging/failing repeatedly)
[18:29:54] 👀
[18:29:59] works for me
[18:30:09] (using login.toolforge.org -> bastion-15)
[18:30:36] there's some ruby processes doing io though
[18:30:54] that might be you Damianz xd, a vim too
[18:30:57] they are stuck in D state
[18:31:22] I was dumping some logs out a while ago... I see nfs nearly full just fired also
[18:31:34] hmm, segfaults in the logs, though not right now
[18:31:37] https://www.irccloud.com/pastebin/4lO60Q7W/
[18:32:01] nothing in journal
[18:32:06] (on a quick look)
[18:32:17] hmm, let me try from my german box
[18:32:36] nfs is 85%
[18:32:59] I wonder if my account is just stuck with ulimits
[18:33:43] bah
[18:34:11] So `damian` hangs after ssh tries to get a new tty, `damian-scripts` works directly... so guessing there's some stuck procs there
[18:34:27] bd808: do you want to email cloud-announce or would I be able to (not sure how moderation is configured)?
[18:35:05] yep, I think the D processes are hanging your user session
[18:36:10] that file that was open is let's say quite large
[18:36:51] guess I'll come back later and hope it got un-stuck, got the bot running without debugging again now
[18:37:03] JJMC89: as long as you are subscribed to the list, any emails from you will go to the moderation queue
[18:37:26] hmm, lsb_release command is stuck :/
[18:39:01] Damianz: you have 10 ssh sessions open
[18:40:43] !log taavi@tools-bastion-15:~ $ sudo loginctl terminate-user damian
[18:40:45] taavi: Unknown project "taavi@tools-bastion-15:~"
[18:40:47] Those must be all hung, I've literally none open in terminal now... bounced to a different wifi to drop all connections a bit ago
[18:40:47] !log tools taavi@tools-bastion-15:~ $ sudo loginctl terminate-user damian
[18:40:49] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:41:01] ^ which seems to have gotten it unstuck
[18:41:47] * Damianz tries to login
[18:41:58] Yeah, now working
[18:42:50] \o/
[18:44:05] Not sure how I managed to get 10, for sure 1 or 2 hung... probably a combination of trying to parse out a big log and dodgy wifi. Heading out in a min so you're all safe, for now (:
[18:44:43] thanks taavi
[18:54:49] ty taavi
[19:13:16] JJMC89: sorry I was late seeing your ping. Thanks for pushing the policy vote and announcement forward! :)