[07:23:30] inflatador: data transfers complete. i'll leave the adding to etcd and pooling for tomorrow
[08:41:00] ryankemper: Thanks for the work on the Graph Split! It's good to see that we're not serving the full graph anymore!
[13:52:02] \o
[13:52:27] ryankemper NICE! I just checked the logs and the categories reload takes ~90m...just leaving that as a benchmark
[13:52:30] .o/
[16:29:19] * ebernhardson wonders if we should be caching deepcat filters, would improve latency of scrolling on commons mediasearch
[16:40:58] Apologies if this is a dumb question, but how would we implement deepcat caching?
[16:43:15] inflatador: well, one way would be an HTTP cache between mediawiki and blazegraph, another would be to use the application cache in mediawiki
[16:43:40] it's essentially going from a category name to a set of categories contained, with an expensive call in the middle
[16:44:43] quick CR for adding the new cirrussearch names to site.pp: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1143865
[16:46:24] Another dumb question: do we cache WDQS queries now?
[16:46:54] inflatador: i don't think so
[16:49:51] I was just wondering whether a cache for deepcat would have broader utility. I also have no idea how you'd implement that anyway
[16:52:04] it would probably just be some cache-wrapping code in the mediawiki bits, i'm only really thinking of it because i'm looking into a bug report about deepcat timing out
[16:52:21] curiously, i'm actually having a difficult time getting it to regularly time out. One category will sometimes get a timeout, but not reliably
[17:08:38] we've had 2 alerts for eqiad omega today. They cleared almost immediately, but it's a little weird. Let me check their pybal config
[17:09:38] can't tell from https://config-master.wikimedia.org/pybal/eqiad/search-omega-https , but my best guess is that I accidentally put a non-omega host in rotation
[17:16:19] let's see, omega is 9400. in a quick test, all the hosts in the config-master list respond on 9400 with the cluster name omega
[17:23:08] hmm, maybe not then
[17:36:27] yeah, I'm not seeing any problems there either...back to the drawing board, I guess
[17:48:33] ebernhardson I changed https://gerrit.wikimedia.org/r/c/operations/puppet/+/1143633 pretty significantly. One thing I'm not sure about is why relforge had `search.svc.eqiad.wmnet` as one of its cert domains, would it break anything if we get rid of that? ref https://puppet-compiler.wmflabs.org/output/1143633/3773/relforge1008.eqiad.wmnet/index.html
[17:49:03] inflatador: hmm, no, sounds like a mistake from when it was created
[17:49:52] :q
[17:51:17] all good, just wanted to make sure we weren't actually using it
[17:51:55] inflatador: i don't think we will have search-<cluster>.svc.<dc>.wmnet, should we?
[17:52:04] only search-<cluster>.discovery.wmnet
[17:57:33] ebernhardson there is a `search.svc.codfw.wmnet` SAN on all the CODFW hosts' certs currently. Are you saying it won't be necessary anymore once we're using discovery records?
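As a rough illustration of the cache-wrapping idea from the 16:43–16:52 exchange above — going from a category name to the set of contained categories, with an expensive backend call in the middle — here is a minimal Python sketch. It is not the MediaWiki/CirrusSearch implementation (that would be PHP using its object cache); `cached_deepcat`, `CACHE_TTL`, and the fake expansion function are hypothetical names used only to show the shape of the wrapper.

```python
"""Minimal sketch of cache-wrapping an expensive deepcat expansion call.
Illustrative only: the real code would live in MediaWiki/CirrusSearch and
use its object cache, not this Python stand-in."""

import time
from typing import Callable, Dict, Set, Tuple

CACHE_TTL = 3600  # seconds; hypothetical — category trees change slowly

# category name -> (expiry timestamp, set of contained categories)
_cache: Dict[str, Tuple[float, Set[str]]] = {}


def cached_deepcat(category: str, expand: Callable[[str], Set[str]]) -> Set[str]:
    """Return the expanded category set, reusing a cached result while fresh.

    `expand` stands in for the expensive call to the categories graph
    (Blazegraph/WDQS); only on a miss or after expiry do we pay its cost.
    """
    now = time.monotonic()
    hit = _cache.get(category)
    if hit and hit[0] > now:
        return hit[1]
    result = expand(category)                 # the expensive round trip
    _cache[category] = (now + CACHE_TTL, result)
    return result


if __name__ == "__main__":
    # Fake expansion function so the sketch runs standalone.
    def fake_expand(name: str) -> Set[str]:
        time.sleep(0.5)                       # simulate the slow backend call
        return {name, name + "/sub-a", name + "/sub-b"}

    print(cached_deepcat("Paintings", fake_expand))  # slow: cache miss
    print(cached_deepcat("Paintings", fake_expand))  # fast: served from cache
```

Whether the cache sits in the application layer like this or as an HTTP cache between mediawiki and blazegraph (the other option mentioned above) mostly changes where invalidation and TTLs are managed, not the basic lookup-or-compute pattern.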
[17:59:08] I think discovery works by pointing `x.discovery.wmnet -> x.svc.${dc}.wmnet`
[17:59:38] Looking at wdqs1022 I see `wdqs-main.svc.eqiad.wmnet` and `wdqs-main.discovery.wmnet` on its cert
[18:01:38] lunch, back in ~40
[18:41:58] i think you're right, we do need them
[18:48:52] back
[18:51:23] it does make it a bigger task, we essentially need to follow https://wikitech.wikimedia.org/wiki/LVS#Add_a_new_load_balanced_service, but some parts are already in place
[18:56:55] Oh boy, LVS ;P
[19:11:34] i wonder how dumb it would be to make search-chi.svc.<dc>.wmnet a CNAME for search.svc.<dc>.wmnet and use the current lvs setup
[19:11:48] probably asking for trouble :P
[19:11:49] I had to redo that site.pp patch https://gerrit.wikimedia.org/r/c/operations/puppet/+/1143888
[19:14:21] Anything that avoids touching LVS is a win ;) . I'm trying to find the presentation about PyBal's replacement (https://wikitech.wikimedia.org/wiki/Liberica ) . I vaguely remember something about opportunities to control it through etcd, similar to k8s service discovery/autoconfig
[19:15:53] I'll hit up Traffic next wk and see where they're at
[19:16:02] awesome
[19:21:57] I'd like to use envoy, but if the choice is between an mwconfig patch or multiple patch sets and multiple turnup calls with Traffic for LVS, well...
[19:23:05] on the other hand, we just had to do this for WDQS so maybe we should strike now before we forget everything again ;)
[19:55:05] pondering things, i suspect a cname will work fine.
[20:10:15] yeah, maybe it depends on how the gdnsd DYNA records work? https://man.archlinux.org/man/gdnsd.zonefile.5.en#DYNA ?
[20:10:57] or maybe we use DYNC
[20:11:15] probably worth asking b-black, but we'll probably be OK
[20:19:02] I wonder if those omega alerts are related to some kind of rate-limiting or blocking. my shard-checking script craps out after a few `GET _cat/shards` to search.svc.eqiad.wmnet from CODFW
[20:47:59] ryankemper just updated the description on T388610 , but FYI 1053 and 1054 are having problems reimaging. I would've liked to use their capacity during the migration, but we don't really need them, so we can decom
[20:48:00] T388610: Migrate production Elastic clusters to Opensearch - https://phabricator.wikimedia.org/T388610
[20:49:35] I downtimed them both 'til next week
[21:05:59] ryankemper I'm heading out a bit early. Feel free to work on https://gerrit.wikimedia.org/r/c/operations/puppet/+/1143897 if you have time. If not, have a great weekend!
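If the CNAME idea floated at 19:11 and 19:55 were tried — pointing a per-cluster name such as search-chi.svc.<dc>.wmnet at the existing search.svc.<dc>.wmnet LVS record — one quick sanity check is to confirm both names resolve to the same addresses. Below is a minimal Python sketch under that assumption; the search-chi.svc name does not exist yet, and whether such a CNAME interacts cleanly with gdnsd's DYNA/DYNC handling is exactly the open question raised above, so this only checks the observable resolution result from inside the network.

```python
"""Sanity-check sketch: verify a (hypothetical) CNAME such as
search-chi.svc.eqiad.wmnet resolves to the same addresses as the existing
search.svc.eqiad.wmnet record. The search-chi name is an assumption taken
from the discussion above, not something that exists today."""

import socket
import sys


def resolve(name: str) -> set:
    """Return the set of IP addresses a hostname resolves to."""
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror as exc:
        sys.exit(f"could not resolve {name}: {exc}")
    return {info[4][0] for info in infos}


if __name__ == "__main__":
    alias = "search-chi.svc.eqiad.wmnet"   # hypothetical CNAME
    target = "search.svc.eqiad.wmnet"      # existing LVS service record
    a, t = resolve(alias), resolve(target)
    print(f"{alias} -> {sorted(a)}")
    print(f"{target} -> {sorted(t)}")
    print("match" if a == t else "MISMATCH - CNAME is not following the service record")
```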