[08:04:31] !log toolsbeta tools-manifest 0.24, T290325
[08:04:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[08:04:37] T290325: tools-manifest broken on toolsbeta - https://phabricator.wikimedia.org/T290325
[08:08:28] !log tools update tools-manifest to 0.24
[08:08:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[08:56:53] is something up with the toolforge proxy? :/ I'm getting 502s
[09:01:00] repeatedly?
[09:01:04] occasional 502s are a known issue
[09:01:55] everything seems to go a bit slow, and quite frequent 502s for me, but nothing seems wrong with my tool
[09:02:42] * majavah pokes at dashboards
[09:03:36] (the Phabricator task I meant would be https://phabricator.wikimedia.org/T282732)
[09:04:59] yeah, I see a few traffic spikes, and kubernetes ingress is known to not handle them very well
[09:05:30] gotcha, cool, as long as I don't need to investigate why my tool isnt works :D
[09:05:32] *working
[09:05:52] What ingress are we using?
[09:06:12] ingress-nginx
[09:07:38] URL?
[09:08:20] For mine you can see this quite a bit at https://backstage.toolforge.org/catalog I believe
[09:08:32] Who knows, maybe I am the cause of the traffic spikes xD
[09:08:43] but I doubt it, this tool is 2 days old
[09:18:45] there was a spike of 5xx errors not so long ago
[09:18:47] https://usercontent.irccloud-cdn.com/file/sWEZnKlR/image.png
[09:26:12] just got one on the versions tool too
[09:34:27] yup, still getting them :(
[10:05:37] will keep an eye on this, if the issue continues I'll investigate deeper
[15:23:06] addshore: +1, I also get more 502s than usual
[15:55:16] arturo: the logs from the ingress cluster roll over so quickly (because there's so much of them) that I usually miss the logs from the time when these events happen. If you happen to see one happening, doing a logs --tail 1000 on the ingress label would be good. The closest I ever came to a useful bit of data from that end was actual 500s from tools that really had errors.
[15:55:38] The problem *might* secretly be the front proxy blowing up lua processing silently, though.
[15:56:10] But yeah, if you see a spike like that, I'm eager to capture logs from the moment one happens :)
[15:56:10] we do haproxy in tcp and not http mode, right?
[15:56:22] Yes, I believe so there.
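
The "logs --tail 1000 on the ingress label" suggestion above could be wrapped in a small script that snapshots the ingress-nginx pod logs to a file before they roll over. This is only a sketch: the namespace and label selector below are assumptions, not the actual Toolforge values, and would need to be adjusted.

    #!/usr/bin/env python3
    """Snapshot recent ingress-nginx logs so they survive the fast rollover."""
    import subprocess
    from datetime import datetime, timezone

    NAMESPACE = "ingress-nginx"                         # assumed namespace
    SELECTOR = "app.kubernetes.io/name=ingress-nginx"   # assumed label selector

    def snapshot_ingress_logs(tail: int = 1000) -> str:
        """Grab the last `tail` lines from every matching pod and write them
        to a timestamped file for later inspection."""
        out = subprocess.run(
            ["kubectl", "logs", "-n", NAMESPACE, "-l", SELECTOR,
             f"--tail={tail}", "--prefix", "--timestamps"],
            capture_output=True, text=True, check=True,
        ).stdout
        fname = datetime.now(timezone.utc).strftime("ingress-logs-%Y%m%dT%H%M%SZ.log")
        with open(fname, "w") as f:
            f.write(out)
        return fname

    if __name__ == "__main__":
        print("wrote", snapshot_ingress_logs())

Run by hand (or from a cron job) the moment a 502 spike shows up on the dashboards; the point is simply to capture the window before the ingress logs rotate away.
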
[15:56:32] We use http mode for one endpoint in paws
[15:57:10] It's because tls is terminated at the front proxy, iirc
[15:57:28] In paws the termination is at haproxy
[16:02:57] At some point my theory was that we were running out of nginx workers on the ingress-nginx level
[16:08:17] will keep an eye
[16:08:59] if workers are the problem, perhaps a simple solution is to add yet another k8s ingress node to the rotation
[16:36:47] Definitely, scaling the ingress is pretty easy
[19:31:22] !log tools.wikibugs restarted libera-phab to pick up new "In progress" status
[19:31:25] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[19:33:13] !log tools.wikibugs restarted libera-irc to pick up new "In progress" status (didn't actually need to restart libera-phab)
[19:33:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[19:35:51] and now it can't reconnect :/
[19:40:24] > 2021-09-15 19:39:24,962 - irc3.wikibugs - CRITICAL - connection lost (23385571927936): None
[19:42:13] ^ me running it manually
[19:45:05] the delay in logs coming across over NFS is really frustrating :/
[19:48:02] > Exception ignored in:
[19:58:17] I'm utterly stumped, it works fine on the exec node itself if I run it manually, but not when run under the grid
[20:01:12] !screen
[20:01:12] $ script /dev/null (https://wikitech.wikimedia.org/wiki/Screen#Troubleshooting)
[20:03:02] !log tools.wikibugs redis2irc is running in a screen because of T291129
[20:03:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[20:03:08] T291129: wikibugs failing to connect when run on exec hosts - https://phabricator.wikimedia.org/T291129
[20:03:21] apologies in advance for doing a v. bad thing
[20:05:47] legoktm: don't forget the wheel of fortune can kill some scripts
[20:07:22] sigh
[20:07:44] I'll ask for help from the cloud team after I eat lunch
[20:08:28] Today is a day of sighing
[20:10:43] legoktm: I think screen is ok actually https://github.com/wikimedia/puppet/blob/a5144914e0cc1f777f70c3482c91ce6a84240160/modules/toolforge/files/wmcs_wheel_of_misfortune.py#L38
[20:11:59] the Python process will still get killed
[20:12:21] Ah :(
[20:50:09] legoktm: did you change anything other than config? That traceback looks like deep asyncio problems on the surface, but I'm no asyncio nerd to validate that.
[20:51:07] nope
[20:51:20] specifically I changed https://gerrit.wikimedia.org/r/plugins/gitiles/labs/tools/wikibugs2/+/acc3360b08b4f9807e0fb45067ed8e97bf351c36%5E%21/#F0
[20:51:38] and there were no undeployed changes afaics
[20:51:46] wacky
[20:53:19] unfortunately valhallasw is the asyncio expert, I barely know enough to get by
[20:54:24] ...and if my change was the breaking cause, then it should break when run manually too
[20:54:47] The "runs on a grid exec node, but not under grid control" part is the most confusing to me. There really shouldn't be much difference there.
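
One way to narrow down the "works manually on the exec node, fails under the grid" puzzle is a tiny connectivity probe run both ways and diffed. The sketch below is only illustrative: the first hostname is the one that ends up working later in the log, while the short name is a guess at what an older config might have used, since the actual old value isn't quoted here.

    #!/usr/bin/env python3
    """Probe DNS resolution and TCP reachability of candidate redis hosts.
    Run manually on an exec node and again as a grid job, then compare output."""
    import socket
    import sys

    HOSTS = [
        ("tools-redis.svc.eqiad.wmflabs", 6379),  # host that worked per the later SAL entry
        ("tools-redis", 6379),                    # hypothetical short name, for comparison
    ]

    for host, port in HOSTS:
        try:
            addrs = sorted({ai[4][0] for ai in socket.getaddrinfo(host, port)})
            print(f"{host} resolves to {addrs}")
            with socket.create_connection((host, port), timeout=5):
                print(f"{host}:{port} TCP connect OK")
        except OSError as exc:
            print(f"{host}:{port} FAILED: {exc}", file=sys.stderr)

If resolution or connectivity differs between the two environments, that points at name resolution or network policy on the grid side rather than anything in the asyncio/irc3 code.
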
[21:48:36] omg
[21:50:28] !log tools.wikibugs switched config to use tools-redis.svc.eqiad.wmflabs as redis host and now it seems to work
[21:50:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[21:50:33] h/t to b.storm for suggesting ^
[23:43:41] !log tools.iabot truncated massive files Worker[1-5].out T288300 T288276
[23:43:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.iabot/SAL
[23:43:47] T288300: IAbot is writing loads of text to Toolforge NFS at a high rate - https://phabricator.wikimedia.org/T288300
[23:43:48] T288276: 2021-08-05: Tools NFS share cleanup - https://phabricator.wikimedia.org/T288276
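
The iabot cleanup logged above is the kind of thing that can be scripted as a periodic job. This is only a rough sketch: the glob pattern and size threshold are guesses, not the actual iabot layout. Truncating in place keeps the same inode, so a writer that still has the file open in append mode can keep logging without a restart (a writer that is not in append mode would just leave a sparse hole up to its old offset).

    #!/usr/bin/env python3
    """Truncate oversized Worker*.out files in place instead of deleting them."""
    import glob
    import os

    LOG_GLOB = os.path.expanduser("~/Worker*.out")   # assumed location
    MAX_BYTES = 100 * 1024 * 1024                    # assumed 100 MB threshold

    for path in glob.glob(LOG_GLOB):
        size = os.path.getsize(path)
        if size > MAX_BYTES:
            os.truncate(path, 0)   # drop the contents, keep the file and its inode
            print(f"truncated {path} ({size} bytes freed)")
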