[01:11:39] m.utante: Is it this one? https://wikitech.wikimedia.org/wiki/Puppet#Private_puppet
[08:34:50] o/ trying to troubleshoot a possible connectivity issue between wikikube and search.svc.eqiad.wmnet:9443, it's mostly working but getting random timeout failures, is it possible to list the hosts behind this cluster which I believe is named "production-search-omega-eqiad"?
[08:37:17] tried "confctl select dc=eqiad,cluster=production-search-omega-eqiad get" but getting nothing
[09:22:59] <_joe_> dcausse: what information do you want?
[09:23:34] the info that you need can be found here https://config-master.wikimedia.org/pybal/eqiad/cloudelastic-omega-https
[09:23:36] <_joe_> looks like we had some strange traffic patterns this morning
[09:23:47] <_joe_> elukey: maybe he wants to know the live status of load-balancers
[09:23:50] but yes what Joe asked is a good starting point, let us know what you need :)
[09:23:54] _joe_: I think I found a discrepancy https://gerrit.wikimedia.org/r/c/operations/puppet/+/1152020
[09:24:14] <_joe_> dcausse: ah damn :)
[09:24:50] but I'm having other issues now, two clusters are red and I'm not sure I understand why yet :(
[09:25:31] I'll stop the bleeding and recover these indices, no users are affected for now (search is still served from codfw)
[09:25:48] it's the update pipeline that's suffering and accumulating a massive lag :(
[09:26:10] <_joe_> dcausse: :(
[09:26:30] <_joe_> dcausse: https://grafana.wikimedia.org/d/000000422/pybal-service?orgId=1&from=now-12h&to=now&timezone=utc&var-datasource=000000006&var-server=$__all&var-service=search-omega-https_9443 also can be useful to know the status of a cluster from the POV of load-balancers
[09:29:26] _joe_: yes thanks, that's where I found cirrussearch1110 actually :)
[10:23:22] ongoing es issues FYI
[10:26:50] I am going to disable writes on es7
[10:27:46] ok
[10:27:59] I need a review here https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1152032
[10:28:25] done
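The thread above resolves to fetching the pybal pool listing from config-master to see which backends sit behind the service. A minimal sketch of parsing such a listing, assuming each line is a single Python-style dict of the form `{ 'host': ..., 'weight': ..., 'enabled': ... }` (the hostnames in the sample below are illustrative; `cirrussearch1110` is taken from the conversation):

```python
import ast

def parse_pybal_pool(text):
    """Parse a config-master pybal pool listing (assumed one dict per line)
    into (host, enabled) pairs; lines that do not parse are skipped."""
    hosts = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            entry = ast.literal_eval(line)
        except (ValueError, SyntaxError):
            continue
        if isinstance(entry, dict) and "host" in entry:
            hosts.append((entry["host"], entry.get("enabled", False)))
    return hosts

# Sample input mimicking the assumed listing format.
sample = """\
{ 'host': 'cirrussearch1110.eqiad.wmnet', 'weight': 10, 'enabled': True }
{ 'host': 'cirrussearch1111.eqiad.wmnet', 'weight': 10, 'enabled': False }
"""
print(parse_pybal_pool(sample))
```

In practice the input would come from `curl` against the config-master URL mentioned above rather than a literal string; a depooled host shows up here as `enabled: False`.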
[10:28:56] thank you
[10:29:15] deploying mediawiki
[10:34:35] thanks, let us know when finished, CC federico3
[14:41:24] .
[14:53:56] o
[14:56:33] O
[15:00:00] thx. I just wanted the timestamp :)
[15:00:48] X
[15:01:05] (that was the growing bubble eventually popping)
[15:07:55] :)
[15:21:57] Reedy: swfrench-wmf: Is the fancy captcha fix worth backporting? That is, could we do another trial run today if we do?
[15:30:46] Krinkle: apologies, I have minimal context on the status of the captcha job (either the details of the fix, or what needs to be verified in order to re-test it). just to confirm, is there a fix that's been recently merged?
[15:33:07] https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ConfirmEdit/+/1151795
[15:33:12] that should disable the limit.sh wrapper
[15:33:47] sorry, I mixed you up with claime. I thought you did this one a few days ago
[16:00:33] Krinkle: ah, got it! that's great if the fix is that simple. I'll take a quick look later today to see if it's clear to me what needs to happen next. alas, c.laime is out, and they're likely the authority on this one.
[16:26:03] Krinkle: We might as well jfdi in terms of backporting
[16:26:12] If it somehow breaks the existing workflow too, that's good to know
[16:38:55] agreed
[16:53:33] +1 - yeah, unless backporting that patch is a hassle in some non-obvious way, then going ahead and doing so at least provides signal as to whether it breaks anything as-is
[20:34:46] no alerts during on-call (I traded 2 days with herron this week vs next week)