[08:03:25] !log tools.lexeme-forms deployed 474e48d752 (update Breton grammatical feature)
[08:03:28] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.lexeme-forms/SAL
[10:45:50] !log toolsbeta redeploy jobs-emailer into k8s (T341084)
[10:45:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[10:45:54] T341084: [toolforge] Move all the components to the gitlab ci/cd flow - https://phabricator.wikimedia.org/T341084
[11:02:20] !log tools redeploy jobs-emailer 0.0.41-20230718103342-3dddcfb8 into k8s (T341084)
[11:02:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:02:24] T341084: [toolforge] Move all the components to the gitlab ci/cd flow - https://phabricator.wikimedia.org/T341084
[14:09:10] Can someone help explain to me what is happening with the iabot tool? It's getting 503 service unavailable despite a web service being active.
[14:09:16] Logs are flooded with
[14:09:18] 2023-07-18 13:00:29: gw_backend.c.238) establishing connection failed: socket: unix:/var/run/lighttpd/php.socket.iabot-1: Resource temporarily unavailable
[14:09:18] 2023-07-18 13:00:29: gw_backend.c.255) backend error; we'll disable for 1secs and send the request to another backend instead:load: 262
[14:09:18] 2023-07-18 13:00:29: gw_backend.c.262) If this happened on Linux: You have run out of local ports. Check the manual, section Performance how to handle this.
[14:11:12] (Please ping me)
[14:17:55] Cyberpower678, I've had that happen before, usually goes away with a webservice restart
[17:46:02] AntiComposite: indeed, a restart clears that, but this is happening fairly frequently lately. Not sure why, as the logs don't mean much to me other than indicate that apparently, all ports are taken up.
[17:46:40] So I have left my webservice as is in the hopes maybe someone here can glean something meaningful.
[18:01:28] Cyberpower678: sounds like the tool might be keeping too many connections/files/sockets open. What is "fairly frequently"? Can you try to run lsof in the running container?
[18:13:47] (there is no lsof in the container, nvm that)
[18:14:45] @chicocvenancio how do I do that?
[18:14:53] Oh nvm
[18:15:38] I do not know what sockets are being left open as the web UI does have execution timeouts. But this 503 hasn't gone away on its own.
[18:17:01] If the problem is entirely on my end, I will absolutely fix it, but right now, I don't know what's going on.
[18:17:30] So if anyone can help here, I will give you a beer. :-)
[20:17:13] Cyberpower678: complete speculation, but "Resource temporarily unavailable" is the error message for an EAGAIN error code which is raised when attempting to read(2) a file/socket/fifo with O_NONBLOCK set and no data waiting. My hunch would be that lighttpd is getting that error when trying to talk to the PHP fcgi container because all of the worker threads in the PHP have hung or crashed.
[20:18:04] Well that sounds unfortunate.
[20:18:45] It also sounds like either PHP is the issue, or my application is. But I have timeouts set on execution time for PHP, so they should terminate when they hang. Interesting.
[20:45:38] bd808: I think the problem is a bit bigger. I can't restart the webservice. The command seems to error out.
[20:46:21] try webservice stop && webservice start
[20:47:11] That seems to work.
[20:47:50] Thank you. Still need to figure out what is going on with these hung threads though. Is there something I can do to investigate this further the next time it happens?
[20:57:15] Cyberpower678: the last time I tried to help debug this -- https://phabricator.wikimedia.org/T335923#8849521
[22:05:49] could that segfault be related to "not enough memory inside the container"?
[22:06:08] that could cause such heisencrashes on otherwise correct code
[22:32:31] I am getting a timeout and ultimately a 502 bad gateway when I try to hit my tool on Kubernetes PHP 7.4. I am not getting any PHP logs so I suspect the issue is not in my tool.
[22:32:45] URL https://magog.toolforge.org/fileinfo.php
[22:37:34] I am not able to hit a pure HTML page either
[22:51:24] magog_the_ogre: the most "wow" thing I'm seeing right now is that the deployment for your magog webservice is 685 days old. The pod itself is 64 days old.
[22:53:05] magog_the_ogre: I would personally do a `webservice stop` followed by a `webservice --backend=kubernetes php7.4 start` to see if that 'magically' makes things work again.
[22:53:28] lol yeah I don't have much time to spend tinkering anymore
[22:53:58] I wrote most of this code almost a decade ago
[22:54:35] anyway that fixed it thanks
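
Editor's note on the [20:17:13] message: the EAGAIN condition described there (a read on a descriptor with O_NONBLOCK set and no data waiting) can be reproduced in isolation. The following is a minimal sketch, assuming Python 3 on a POSIX system. The function name `nonblocking_read_errno` is hypothetical and chosen for illustration. It only shows where the error code comes from; it says nothing about why the PHP workers behind lighttpd stop responding.

```python
import errno
import os


def nonblocking_read_errno() -> int:
    """Read from an empty non-blocking pipe and return the errno raised (0 if none)."""
    read_fd, write_fd = os.pipe()
    try:
        # Equivalent to setting O_NONBLOCK on the read end of the pipe.
        os.set_blocking(read_fd, False)
        try:
            # Nothing has been written to the pipe, so no data is waiting.
            os.read(read_fd, 1)
        except BlockingIOError as exc:
            return exc.errno
        return 0
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    code = nonblocking_read_errno()
    # Print the symbolic name and the message the C library associates with it.
    print(errno.errorcode.get(code), "-", os.strerror(code))
```

On Linux the message printed for this error code should match the "Resource temporarily unavailable" text in the lighttpd log lines at [14:09:18].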