[01:34:04] !log admin resetting eqiad1 rabbitmq in hopes of resolving neutron double message warnings
[01:34:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[03:56:12] Anybody know anything about https://betacommand-dev.toolforge.org/
[03:56:39] It's returning a "No webservice" error
[03:57:37] Apparently it's something that folks at https://en.wikipedia.org/wiki/Wikipedia:Did_you_know depend on
[03:59:07] I don't see anything in toolhub
[04:04:20] Oh, it looks like https://phabricator.wikimedia.org/T319587 provides the answer
[04:04:47] sounds like it got shut down because it was never migrated off of grid.
[08:15:38] Big thanks 🤩 (re @bd808: You need to use a venv. https://wikitech.wikimedia.org/wiki/Help:Toolforge/Python#Virtual_environments)
[15:56:52] !log paws upgrade OpenRefine T356448
[15:56:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[15:56:57] T356448: New upstream release for OpenRefine - https://phabricator.wikimedia.org/T356448
[15:58:26] roy649: it was not shut down (that will be the 14th), it might have failed by itself and the maintainer did not turn it back on
[16:14:04] OK, thanks. I read through the phab ticket. TL;DR is "I'm taking my ball and going home"
[16:16:34] Hmmm, GridEnginePocolypse is Valentine's Day.
[16:16:55] yep xd
[16:17:42] roy649: Is there any public discussion of the tool ? (I might be able to write a quick replacement if it's not too involved)
[16:18:46] * (of what the tool does)
[16:18:53] https://en.wikipedia.org/wiki/Wikipedia_talk:Did_you_know#QPQ_tool_is_down?
[16:19:48] Just based on the fact that it's using SVN and CGI, I'm guessing this is really old legacy code.
[16:20:42] I would not at all be surprised if it's in Python2.
[16:21:16] Life moves on. We can't cling to obsolescent technologies forever.
[16:22:05] I'm sure if you were willing to write a replacement, you would have some fans at DYK.
[16:23:16] No promises right now, but I'll try and write one from scratch over this week
[16:23:38] :)
[16:27:15] Before you do that, my suggestion is you start a conversation on that WT:DYK thread about what exactly it should do. I don't use the tool myself, but there are discussions going on right now about possibly changing how we handle the QPQ requirement.
[16:27:38] It would be silly to write some new code and then discover the requirements have changed out from under you.
[16:29:51] Ah thanks for the heads up, I did not know that the rules were changing (I'll start a conversation asking what the tool did/would do) (re @wmtelegram_bot: Before you do that, my suggestion is you start a conversation on that WT:DYK thread about what exactly it should do. I...)
[16:44:40] !log bd808@tools-sgebastion-11 tools.gitlab-account-approval Temporarily stopping bot job while I look into a malformed LDAP filter crash.
[16:44:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.gitlab-account-approval/SAL
[17:14:30] Btw, where should I request that phetools be kept alive after the Grid Engine shutdown ? (I'm looking at migrating the important bits, but it seems like it will take a while)
[17:16:01] (For starters it seems to be using a python file as a database) 🙈
[17:16:40] @sohom_datta: comment on the phab task about moving the tool off of grid engine. That can buy you another month.
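For context on the venv advice acknowledged at [08:15:38] (which also applies to Python tools being migrated off the grid, like phetools above), here is a minimal sketch of the workflow described at https://wikitech.wikimedia.org/wiki/Help:Toolforge/Python#Virtual_environments. The tool name `mytool`, the `python3.11` runtime, and the `www/python` directory layout are illustrative assumptions, not details taken from this log.

```bash
# Minimal sketch of the Toolforge virtual-environment workflow referenced above.
# The tool name, runtime version, and paths are assumptions for illustration only.
become mytool                                           # switch to the tool account (interactive)
webservice python3.11 shell                             # open a shell inside the tool's runtime container
python3 -m venv "$HOME/www/python/venv"                 # create the venv in the tool's home directory
source "$HOME/www/python/venv/bin/activate"             # activate it for this shell session
pip install --upgrade pip                               # make sure pip itself is current
pip install -r "$HOME/www/python/src/requirements.txt"  # install the tool's pinned dependencies
```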
[17:17:12] For phetools, the task is https://phabricator.wikimedia.org/T319965
[17:24:13] !log bd808@tools-sgebastion-11 tools.gitlab-account-approval Rebuilt image to pick up fixes for T357328
[17:24:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.gitlab-account-approval/SAL
[17:34:06] !log bd808@tools-sgebastion-11 tools.gitlab-account-approval Bot restarted after deploying T357328 fix
[17:34:09] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.gitlab-account-approval/SAL
[18:39:15] Hi! Are there any known issues with the NFS (in general or on the bastion hosts)? For me it hangs deleting a certain file, see https://phabricator.wikimedia.org/T357340
[18:43:37] CountCount: there was another user with a magically un-deleteable (even by roots on the NFS server directly) file a few days ago. Let me see if I can find the ticket...
[18:45:51] T357098
[18:45:52] T357098: [tools.meta] can't delete file inside cache/wikimedia-wikis.dat - https://phabricator.wikimedia.org/T357098
[18:53:15] bd808: thx.
[18:54:07] CountCount: I'm looking at your file now from the NFS server side. I will try to rm it for you from there.
[18:56:06] CountCount: no joy, but I added some data to the task you created
[18:59:33] bd808: thx for trying. It seems likely that the three issues that have been reported so far are just the tip of the iceberg and that there are many more files which are un(re)movable.
[19:00:02] I will work around it for now by using a different binary name.
[19:01:48] I would try rebooting the NFS server, but I know that functionally breaks all of Toolforge for an extended period of time
[19:07:59] I understand. Though processes just hanging in local filesystem syscalls is not a good sign.
[19:40:35] looks like more NFS issues - T357342
[19:40:35] T357342: Cannot delete directory from incolabot project on Toolforge - https://phabricator.wikimedia.org/T357342
[20:59:44] it looks like inbound requests to zoomviewer are being throttled -- how do I fix that? https://phabricator.wikimedia.org/T320210#9532676
[21:08:47] TimStarling: there is a global rate limit on Toolforge ingress, but it is currently set to `rate=50r/s`, which I think is higher than what you are reporting?
[21:09:19] The next level in would be the Kubernetes ingress for that tool. I am not aware of any deliberate rate limiting that happens there
[21:11:28] and it would give 429 responses if the limit were hit, not queue them, right?
[21:12:05] yeah, that seems to be the config in nginx
[21:13:00] "Excessive requests are delayed until their number exceeds the maximum burst size in which case the request is terminated with an error."
[21:13:16] says the nginx manual
[21:17:04] `nodelay` would change that
[21:17:37] we don't set burst, so I'm not sure that nodelay would change anything effectively
[21:17:52] `limit_req zone=toolforge burst=<%= @rate_limit_requests %> nodelay;`
[21:18:09] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/dynamicproxy/templates/urlproxy.conf#216
[21:19:10] oh that is confusing. I didn't notice that second limit_req
[21:22:19] The second line was for T313131 apparently -- https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/b25f58e91f36921c69a0fd9d2e50ddd48f370ff5
[21:22:43] what is the correct fqdn to log in to the proxy?
[21:22:59] it's a VPS, not a pod, right?
[21:23:49] tools-proxy-05.tools.eqiad1.wikimedia.cloud and tools-proxy-06.tools.eqiad1.wikimedia.cloud per https://openstack-browser.toolforge.org/project/tools
[21:24:34] I think -06is currently active
[21:24:54] thanks
[21:25:01] I'd expect the rate limit to start to be hit at above 100 requests with that config, and only send 429s, not delay responses.
[21:25:50] T313131
[21:26:40] eh. it's a security taskso stashbot can't see it
[21:27:01] * bd808 has a keyboard that is dropping spaces apparently
[21:28:50] it's always the boring tasks that get interesting bug numbers
[21:36:18] I'll see if I can reproduce it while skipping various layers
[21:36:39] is there a server in this network with ab installed?
[21:48:54] TimStarling: if you haven't just installed somewhere convenient for this test yet, I'd say it would be fine to put on the proxy hosts themselves.
[23:01:46] wikibugs is again missing from -fundraising and also -feed like last time
[23:02:14] shall I do the `load libera/k8s-jobs.yaml` kick?
[23:11:15] bd808: well, I did that ^ but things still seem to be wonky. I'm seeing errors in the log file but not sure how to interpret them.
[23:16:38] greg-g: Let me see if I can understand anything in the logs...
[23:17:16] (what's the fancy command to log from the toolforge command line?)
[23:18:05] greg-g: `toolforge jobs logs NAME`
[23:18:35] heh "ERROR: Error: Job 'grrrrit' has file logging enabled, which is incompatible with the logs command"
[23:18:58] so `tail -f grrrrit.{err,out}` then
[23:20:00] greg-g: your load didn't seem to restart the job. Let me try a different way
[23:20:45] !log bd808@tools-sgebastion-11 tools.wikibugs toolforge jobs restart grrrrit
[23:20:49] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[23:20:55] one by one?
[23:21:40] eh. I think I was just confused by AM/PM timestamps. wacky datetime formats
[23:21:54] yah
[23:24:23] and sorry for being a bother, apparently we're the only ones still relying on wikibugs on irc? :)
[23:24:32] there it goes
[23:25:15] greg-g: all the cool kids just let their bugs rot in the backlog ;)
[23:27:40] bd808: so just doing the single job restart?
[23:30:16] greg-g: I did `tail -f *.{err,out}` and noticed that the redis2irc job was whinging about getting connected to libera.chat so I restarted it again. I've seen a couple other irc bots have issues rejoining yesterday/today. I wonder if there is a bad host in the libera.chat pool?
[23:30:33] ah, gotcha
[23:32:04] my znc had problems with SASL auth not working on Saturday too until it had cycled through a few hosts
[23:38:37] greg-g: you may have nerd sniped me into trying to update the ridiculously old python libs in wikibugs... the irc3 library version it has pinned is 9.5 years old :/
[23:40:40] oh my gosh
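As a follow-up to the stale-dependency concern at [23:38:37], one quick way to survey how old a tool's pinned Python libraries are is to activate its venv and ask pip. This is a hedged sketch only: the venv path and requirements file location are assumptions about the wikibugs tool's layout, not facts taken from this log.

```bash
# Hedged sketch for auditing old pins such as the irc3 one mentioned above.
# The venv and requirements.txt paths are assumptions, not the tool's actual layout.
become wikibugs                               # tool account name as used in the SAL entries above
source "$HOME/venv/bin/activate"              # assumed location of the tool's virtual environment
pip list --outdated                           # compare installed packages against the newest PyPI releases
grep -n 'irc3' "$HOME/src/requirements.txt"   # locate the old pin before deciding what to bump
```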