[08:41:40] !log tools.wikibugs restart irc and gerrit jobs
[08:41:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[08:43:51] !log lucaswerkmeister@tools-sgebastion-10 tools.bridgebot Double IRC messages to other bridges
[08:43:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.bridgebot/SAL
[11:04:12] !log taavi@tools-sgebastion-11 tools.wikibugs toolforge jobs restart gerrit
[11:04:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[12:43:09] !log tools reboot tools-sgegrid-shadow due to high number of procs in D state
[12:43:13] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[13:25:56] !log melos@tools-sgebastion-10 tools.stewardbots Restarted SULWatcher
[13:26:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[13:57:48] !log paws add wikibase-cli T358649
[13:57:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[13:57:53] T358649: Add wikibase-cli to paws - https://phabricator.wikimedia.org/T358649
[14:55:29] !log loggerdiscordbot delete project in favor of new project "discordbots" T358337,T358427
[14:55:30] dhinus: Unknown project "loggerdiscordbot"
[14:55:31] T358337: Request creation of logger-discord-bot VPS project - https://phabricator.wikimedia.org/T358337
[14:55:31] T358427: Request creation of discordbots VPS project - https://phabricator.wikimedia.org/T358427
[14:56:33] !log admin delete project "loggerdiscordbot" in favor of new project "discordbots" T358337,T358427
[14:56:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[15:55:01] taavi: any notes on what I should look into regarding those irc and gerrit restarts for wikibugs? I haven't looked at the logs yet, but hope to soon.
[15:56:02] bd808: the gerrit ssh connection seems to be the most unreliable part currently.
[15:57:34] interesting. I've been wondering about trying Paramiko instead of shelling out. In theory that could give the code more visibility into protocol errors and network interruptions
[15:59:06] The reboot of bridgebot that l.ucas did right after you rebooted the wikibugs irc process looks like network blip stuff, but that should be handled by znc now in wikibugs (at least in theory).
[16:00:39] honestly I'm starting to wonder if we have some cloudvps-wide issue with long-lived TCP connections. I do see other IRC bots flapping relatively frequently too.
[16:09:03] it does feel like the bots usually have problems around the same time, yes
[16:09:49] including non-IRC bots apparently (stewardbots, reading EventStreams according to AntiComposite)
[16:12:34] the stewardbots are IRC bots reading EventStreams, they're having problems staying connected to IRC and getting 429 errors from EventStreams
[16:13:01] on the other hand, the CVNBots (which run from floating IPs on Cloud VPS) are fine
[16:15:08] The N layers of software defined network to get from a Kubernetes container out to the internet probably doesn't help in reasoning about where to start looking either.
[16:17:05] AntiComposite's anecdote about vps with its own ip vs other things is a potentially interesting thing to try and think about proving.
[16:17:54] hm. I wonder if it's "not on toolforge k8s", or "has a floating ip" that's causing it to not have problems
[16:20:16] or both!
[16:20:31] they definitely follow a different network path
[16:20:37] (floating IP vs not)
[16:21:39] * bd808 has asked the network nerd snipe question and now waits for the aha! in 3-10 days
[16:22:52] bd808: one thing we could try is to set up a 'special' k8s worker with a floating ip and pin wikibugs to that, and see if it's still having issues
[16:26:02] can we think of a more general "how stable is a TCP/IP stream connected from X to Y" test that could measure different paths over time?
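One rough way to approach the "how stable is a TCP/IP stream connected from X to Y" question is a probe that holds a single connection open, exchanges small heartbeats, and records how long the stream survived and why it ended. The sketch below is only an illustration, not something proposed in the channel: it assumes Y runs a plain TCP echo service, and the names `probe_stream` and `ProbeResult` are hypothetical. Running the same probe from several vantage points (a Kubernetes pod, a VM with a floating IP, a VM without one) and comparing lifetimes would be one way to compare network paths.

```python
import socket
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    """Outcome of holding one TCP connection open (hypothetical record type)."""
    target: str            # "host:port" that was probed
    started_at: float      # wall-clock start, seconds since the epoch
    lifetime_s: float      # how long the attempt lasted
    heartbeats_ok: int     # heartbeats that were echoed back intact
    error: Optional[str]   # None if the probe ended on its own schedule


def probe_stream(host: str, port: int, max_lifetime_s: float = 3600.0,
                 interval_s: float = 10.0, timeout_s: float = 5.0) -> ProbeResult:
    """Hold one connection to an echo service (assumed) and count heartbeats."""
    started_at = time.time()
    t0 = time.monotonic()
    ok = 0
    error = None
    try:
        with socket.create_connection((host, port), timeout=timeout_s) as sock:
            sock.settimeout(timeout_s)
            while time.monotonic() - t0 < max_lifetime_s:
                payload = f"hb {ok}\n".encode()
                sock.sendall(payload)
                # Read until the whole heartbeat has been echoed back.
                received = b""
                while len(received) < len(payload):
                    chunk = sock.recv(len(payload) - len(received))
                    if not chunk:
                        raise ConnectionError("peer closed the connection")
                    received += chunk
                if received != payload:
                    raise ConnectionError("echo mismatch")
                ok += 1
                time.sleep(interval_s)
    except OSError as exc:
        # Covers refused connections, resets, and timeouts alike.
        error = f"{type(exc).__name__}: {exc}"
    return ProbeResult(f"{host}:{port}", started_at,
                       time.monotonic() - t0, ok, error)
```

Calling `probe_stream` in a loop and appending each `ProbeResult` to a log would give a per-path record of connection lifetimes over time. A result with `error` set and a short `lifetime_s` marks a flap.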
[16:27:00] * bd808 sees that the wikibugs bouncer flapped during this conversation
[16:27:55] I'm pretty sure cloudgw has something to do with all this
[16:33:03] two nerds decided to meet tomorrow for a network debug session
[18:12:31] !log lucaswerkmeister@tools-sgebastion-10 tools.lexeme-forms deployed e7a659802c (l10n updates: ar, io, lb)
[18:12:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.lexeme-forms/SAL
[18:13:31] !logs tools deploy toolforge-webservice 0.103.4 (T319797)
[18:13:31] T319797: Migrate huggle from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319797
[19:40:52] Hi
[19:41:46] I noticed I opened a ticket my name tool https://phabricator.wikimedia.org/T357555
[19:42:03] Is there something I should do?
[19:43:01] komla: ^ just noticed in that description the https://grid-deprecation.toolforge.org link is wrong (gonna fix it)
[19:43:40] Looks like they're wrong in many tasks...
[19:43:55] ~16 of them
[19:45:20] GergesShamon: The TLDR is yes, there is. Did you read the blog post in the ticket?
[19:46:37] No (re @wmtelegram_bot: GergesShamon: The TLDR is yes, there is. Did you read the blog post in the ticket?)
[19:46:59] I would suggest starting there :)
[19:47:14] I basically no longer use the lighttpd-gergesbot job
[19:49:16] Are you able to stop and disable the job then?
[19:51:32] reedy: do you mean the description on the portal about the timelines? let me know so I can assist.
[19:52:22] komla: no, 16 of them had "See: https://grid-deprecation.toolforge.org/t/abbe98tools"
[19:52:25] I've fixed them now :)
[19:52:37] https://phabricator.wikimedia.org/transactions/detail/PHID-XACT-TASK-luxveownk4amfqe/
[19:53:19] Just incase you used a script or something to create them... there's possibly a bug :D
[19:53:50] oh okay. Cool! Thanks!
[20:10:49] I don't remember how I did it, do you know how to disable it? (re @wmtelegram_bot: Are you able to stop and disable the job then?)
[20:25:54] https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web#Using_the_webservice_command
[20:26:07] `webservice stop` in theory should be enough to stop it..
[20:27:46] or `toolforge webservice --backend=gridengine stop`
[20:44:40] @GergesShamon: if you have abandoned the tool and no longer need it for anything, you can also comment on the task and someone with admin rights can go and archive the whole thing for you.
[22:07:58] musikanimal hey, I'm working on the CCS ticket T358541 , are you around for testing (my phab name is bking, I'm an SRE on the search team)
[22:07:58] T358541: 400 - Bad Request on any Global Search - https://phabricator.wikimedia.org/T358541
[22:08:13] sure!
[22:08:19] thanks for looking into it :)
[22:09:37] musikanimal np, sorry it took so long to fix...let me know if it's working now. the curls given in the ticket seem to be working for me
[22:10:48] inflatador: working now! thank you :D
[22:11:25] anytime, again sorry it took so long...will update ticket shortly
[23:31:25] inflatador: any idea why we might be getting duplicate results now? https://phabricator.wikimedia.org/T358541#9598745