[05:07:31] <_joe_> elukey: happens every few months
[12:04:05] fyi ripe85 is starting now https://ripe85.ripe.net/live/main/
[12:09:06] cheers jbond
[13:24:58] anyone else having nickserv issues? I can't seem to login
[13:29:01] thanks jbond! some nice DNS talks too https://ripe85.ripe.net/programme/meeting-plan/dns-wg/
[13:34:07] sukhe: yes ripe gets a lot of the speakers from DNS-OARC so they often have a good dns schedule
[16:03:29] lists.wm.o timing out for me. https://lists.wikimedia.org/postorius/lists/wikimedia-l.lists.wikimedia.org/
[16:04:01] ping is fine. curl gets stuck at "TLSv1.3 (OUT), TLS handshake, Client hello (1):"
[16:04:06] hm.. intermittent I guess
[16:05:37] I was having super slow load times there yesterday too. Looks like it is slow for me right now as well...
[16:05:41] that would fit: [18:02] PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[16:15:18] it does seem to be intermittent and is not the first time fwiw
[16:45:51] ebernhardson: I'm trying to understand https://wikitech.wikimedia.org/wiki/Incidents/2022-09-09_Elastic_Autocomplete_Missing more for review/summary. It sounds like "traffic switches to Codfw" is a side-effect of the train branch and/or the backported patch. Is that the case? I'm not sure I understand that part.
[16:47:17] Krinkle: yes, when we migrate elasticsearch versions typically only the write side is compatible between versions, the read side has breaking changes. So we use the train as an atomic point to change the code to the new version of elasticsearch, and have some bit of code that switches traffic over as well
[16:48:42] ebernhardson: ah you mean part of the elastic extension patches that rolled out (not backported) were to effectively make the jobs that write to codfw nodes (which previously failed on the prior branch) work.
[16:48:57] so read traffic remained in eqiad as before?
[16:49:15] or was there a config change that rolled out that switched read traffic at some point?
[16:50:40] Krinkle: the config change ran ahead of time such that it would switch based on $wmgVersionNumber: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/824787/14/wmf-config/CirrusSearch-production.php#62
[16:51:07] ah, I see. That makes sense.
[16:52:17] in this case that maintenance script does both reads and writes, and we missed verifying that that code would work in both directions. It could have been made to work, we simply missed it in our testing
[17:05:10] ebernhardson: ack, I've updated the doc a bit. Feel free to edit or comment here as you see fit. Hope I got it right :) https://wikitech.wikimedia.org/wiki/Incidents/2022-09-09_Elastic_Autocomplete_Missing
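
For readers following the version-gated switch described above, here is a minimal sketch of the idea in Python. It is not the actual mediawiki-config code (that lives in the linked Gerrit change and is PHP). The threshold version, the function names, and the assumption that reads move from eqiad to codfw exactly at the threshold are all hypothetical, chosen only to illustrate how a config deployed ahead of time can flip read traffic when the train rolls a new branch.

```python
import re

# Hypothetical values for illustration only: the real threshold version
# is not stated in the log.
SWITCH_VERSION = "1.39.0-wmf.28"
OLD_READ_CLUSTER = "eqiad"
NEW_READ_CLUSTER = "codfw"


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a string like '1.39.0-wmf.28' into (1, 39, 0, 28) so versions compare in order."""
    return tuple(int(n) for n in re.findall(r"\d+", version))


def pick_read_cluster(running_version: str) -> str:
    """Choose the read cluster from the running branch version.

    Branches at or past the threshold read from the new cluster;
    older branches keep reading from the old one. Because the rule
    keys off the version, the config can ship before the train does.
    """
    if parse_version(running_version) >= parse_version(SWITCH_VERSION):
        return NEW_READ_CLUSTER
    return OLD_READ_CLUSTER
```

With this shape, a wiki still on the older branch keeps reading from the old cluster while a wiki already on the new branch reads from the new one. Code that both reads and writes, like the maintenance script mentioned at 16:52:17, would need checking on both sides of the threshold.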