[05:59:21] The above message from andrewbogot.t still stands
[05:59:29] btullis: you still have that puppet merge locked
[05:59:52] <_joe_> marostegui: revert
[06:01:06] Should we also add a timeout to puppet merges?
[06:01:13] Having a lock for 12h is a bit crazy
[06:02:22] there are now 2 pending changes there, andrew's and btullis's
[06:02:28] I am not comfortable merging those two
[06:03:20] I am going to revert both
[06:05:04] andrewbogott btullis I have reverted both of the changes
[08:34:29] I'm sorry all about that puppet merge.
[08:36:28] <_joe_> btullis: it happens :/
[08:38:02] Should I create a task to discuss adding a timeout or something like that?
[08:38:37] <_joe_> marostegui: the problem isn't the puppet merge lock, that can be removed
[08:38:43] <_joe_> the problem is we need a revert
[08:38:54] <_joe_> and that's hard to automate / I would avoid it
[08:39:21] <_joe_> we could add an alert that points to a runbook explaining what to do in this situation, IMHO
[08:39:32] There's an alert about unmerged puppet changes
[08:39:39] So maybe we can do that
[08:39:52] <_joe_> yes
[09:45:51] we could also add a ping on IRC in operations (or in private) once the lock has been held for a few minutes, maybe
[09:46:11] I was wondering about adding the same behaviour to the interactive functions in wmflib, btw
[12:12:19] Folks, office.wikimedia.org seems to be 502'ing.
[12:12:19] Request from 2405:201:2:e990:9a5e:96ac:1c5f:1da8 via cp5017.eqsin.wmnet, ATS/9.1.4
[12:12:19] Error: 502, - at 2023-08-09 12:10:16 GMT
[12:13:20] scherukuwada: can you try again?
[12:13:25] (works for me fwiw)
[12:13:55] there are ongoing deployments, which make errors like this more likely for a brief period
[12:13:55] LOL works now. Didn't for several minutes right until I posted here. :-)
[12:14:10] it's not you, you were unlucky
[12:14:33] Funny, usually when I'm unlucky it's still me.
[12:14:56] Thanks, ttyl!
[12:18:48] have there been any changes to the job queues lately? there have been several CentralAuth renames in the last couple of weeks where the jobs have been delayed by days or lost completely
[12:48:30] TY marostegui
[13:11:08] <_joe_> ongoing deployments shouldn't cause a wiki to be unavailable
[13:13:06] <_joe_> it was because we had a problem :)
[16:37:19] taavi: ref T249745
[16:37:19] T249745: Could not enqueue jobs: "Unable to deliver all events: 503: Service Unavailable" - https://phabricator.wikimedia.org/T249745
[16:38:20] Job loss is normal, unfortunately. Perhaps you can prove in Logstash that some originate from URLs that relate to renames?
[16:56:29] thanks Krinkle. the timing of those log entries on meta doesn't seem to match with the broken renames :/
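The stale-lock alert idea discussed in the morning (a timeout or an IRC ping once a puppet-merge lock has been held too long) could be sketched roughly as follows. This is a minimal illustration only, assuming a file-based lock; `LOCK_PATH`, `MAX_LOCK_AGE`, and the `check_lock` helper are hypothetical names, not the actual puppet-merge implementation:

```python
import os
import sys
import time

# Hypothetical values for illustration -- not Wikimedia's real paths or thresholds.
LOCK_PATH = "/var/lock/puppet-merge.lock"
MAX_LOCK_AGE = 2 * 60 * 60  # flag locks older than 2 hours

def lock_age_seconds(path: str) -> float:
    """Return how long the lock file has existed, in seconds."""
    return time.time() - os.path.getmtime(path)

def check_lock(path: str = LOCK_PATH, max_age: float = MAX_LOCK_AGE) -> bool:
    """Return True if a lock file exists and is older than max_age."""
    if not os.path.exists(path):
        return False
    return lock_age_seconds(path) > max_age

if __name__ == "__main__":
    if check_lock():
        # A real version would fire an alert pointing at a runbook,
        # or ping IRC, rather than just printing to stderr.
        print(f"puppet-merge lock at {LOCK_PATH} is stale", file=sys.stderr)
        sys.exit(1)
```

As _joe_ notes above, the hard part (the revert itself) should stay manual; a check like this only surfaces the forgotten lock so a human can follow the runbook.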