[07:47:31] pfischer, dcausse: we have a SUP alert in #wikimedia-operations (MediaWiki CirrusSearch update rate - codfw on alert1001 is CRITICAL: CRITICAL: 30.00% of data under the critical threshold [50.0]). Should we have a look?
[07:48:11] looking
[07:52:37] But that’s expected, isn’t it, if Cirrus (the extension) no longer writes since the Cirrus Update Pipeline does?
[07:53:27] Oh, that might make sense. Then we need to disable that alert
[07:53:51] I'll silence it and let ryankemper / inflatador clean it up
[07:54:15] yes this is my understanding as well, I think this alert should be dropped in favor of something coming from the SUP metrics
[07:55:43] but because we're still transitioning I'm not sure how to avoid these false positives
[07:56:34] https://grafana-rw.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?forceLogin&orgId=1&viewPanel=44&editPanel=44 actually shows an increased rate over the past few days
[07:57:26] but I'm not entirely sure what it is measuring
[08:00:50] gehel: this graph is looking at codfw only; it's a known issue that Erik is trying to address in the way MW configuration defaults are applied (https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1032543)
[08:00:55] It reports updates coming from the cirrus extension but only for CODFW (it’s hardcoded in the panel). We discussed this during the Wednesday meeting. IIRC, the increase happened after Erik’s config change
[08:01:11] …what David is saying
[08:01:36] I've silenced the alert until Tuesday, I'll let our SRE handle the follow-up
[08:02:22] long story short, when we disabled the jobqueue on eqiad for the first batch of wikis, the undesirable effect was that it re-enabled the use of the jobqueue on codfw for those wikis
[09:09:31] dcausse: is T364837 completed?
[09:09:39] T364837: Q125918173 missing from elastic@codfw - https://phabricator.wikimedia.org/T364837
[09:10:57] gehel: almost, https://gerrit.wikimedia.org/r/c/operations/alerts/+/1031522 is still under review, but no objections to closing the ticket and just keeping the discussion in gerrit
[09:11:39] Nah, let's keep it!
[10:24:13] lunch
[12:11:09] lunch
[13:12:36] o/
[13:45:45] low-priority patch for fixing elastic pools if anyone wants to take a look https://gerrit.wikimedia.org/r/c/operations/puppet/+/1032784
[14:06:46] o/
[14:50:03] pfischer dcausse able to do the meet?
[14:56:23] dr0ptp4kt: yes
[14:59:44] \o
[15:17:47] .o/
[15:30:32] workout, back in ~30
[15:39:23] o/
[16:07:14] back
[16:14:58] going offline, have a nice weekend
[16:36:51] ebernhardson: I feel the same way but for every SQL engine ever
[16:59:44] pyspark vs 'real' pandas is also a huge nightmare
[17:03:06] oh yea, i suppose i've been insulated enough to not notice so much. mostly mysql and spark, but the use case is so different there isn't much overlap
[17:03:14] pandas on the other hand ... that's not even sql. It's more akin to perl
[17:03:18] :P
[17:28:19] I know, it's great
[17:47:57] lunch, back in ~40
[17:59:33] dr0ptp4kt: i'm realizing while reviewing, this seems off: is_minerva_autocomplete_pv / num_actors_w_pageviews
[18:00:03] i'm thinking it would be more like, num_actors_w_autocomplete_pv / num_actors_w_pageviews perhaps?
[18:00:27] and then split by access_method, not directly looking at the minerva/desktop split but assuming it through the access_method column
[18:08:48] Also naming is hard, i feel like almost everything here needs to be renamed. But first stab: https://superset.wikimedia.org/superset/dashboard/520/
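(For illustration only: a minimal pyspark sketch of the num_actors_w_autocomplete_pv / num_actors_w_pageviews ratio split by access_method, roughly as proposed in the messages above. The table name event.pageview_actor_sketch and the columns actor_signature, access_method and is_autocomplete_pv are hypothetical stand-ins; the queries actually backing the Superset dashboard may look quite different.)

```python
# Sketch only: table and column names are hypothetical stand-ins, not the
# real schema behind the dashboard linked above.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

pv = spark.read.table("event.pageview_actor_sketch").where("year = 2024 AND month = 5")

per_method = (
    pv.groupBy("access_method")
    .agg(
        F.countDistinct("actor_signature").alias("num_actors_w_pageviews"),
        F.countDistinct(
            # count an actor here only if at least one of their pageviews
            # was reached through an autocomplete suggestion
            F.when(F.col("is_autocomplete_pv"), F.col("actor_signature"))
        ).alias("num_actors_w_autocomplete_pv"),
    )
    .withColumn(
        "autocomplete_actor_ratio",
        F.col("num_actors_w_autocomplete_pv") / F.col("num_actors_w_pageviews"),
    )
)
per_method.show()
```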
[18:09:07] also feels like maybe there should be some monthly summary numbers to the right or something for each section
[18:11:10] re: https://superset.wikimedia.org/superset/dashboard/520/
[18:51:59] back
[19:13:46] ebernhardson i was scratching my head about that when i wrote it as well. the idea with the first ratio is that if the utility of autocomplete improves, that ratio will go up: because of better relevancy and then probably more use as an information-seeking behavior. it's likely that if the utility of autocomplete improves the second ratio would go up as well, although the link is more indirect - i suspect it would be mere exposure through a CTA to use the search bar that could drive that second ratio, as opposed to a slow upward climb as people find it useful. _maybe_ there's another way to think about this, though...
[19:15:37] is_minerva_autocomplete_pv / num_actors_w_autocomplete_pv is a pretty direct measure of its utility.
[19:19:29] as for use of the access_method column, if that seems most straightforward, i think that's fine - there's the mild risk of commingling Minerva and Vector 2022, but that's theoretically very mild.
[19:20:00] hmm, lemme think a minute here about the things we're counting as it pertains to autocomplete...
[19:25:04] bleep, comp crashed upon docking. Will be back
[19:50:35] ebernhardson: i updated https://phabricator.wikimedia.org/T364600#9801594 . it now says "In addition to expressing the ratios, provide the raw counts used for those ratios." And then it's asking for two ratios in Part 1: (1) num_actors_w_autocomplete_pv / num_actors_w_pageviews and (2) num_autocomplete_pv / num_actors_w_autocomplete_pv .
[19:50:47] as far as that dashboard, it wasn't showing for me. has SAVE been hit?
[19:52:39] I started T364936 to move the one-liners out of our main Search page and into a repo. But the repo's also open to any one-liners for Elastic, WDQS etc. if you wanna add 'em
[19:52:40] T364936: Move inline code snippets from https://wikitech.wikimedia.org/wiki/Search to a repo - https://phabricator.wikimedia.org/T364936
[20:04:11] dr0ptp4kt: hmm, oh i bet it's because it's not published and marked as draft. Lemme see how that gets switched...
[20:14:52] ok published now, link (probably) works. I also saw how to give it a better url while poking around: https://superset.wikimedia.org/superset/dashboard/search-monthly-overview
[21:05:16] ebernhardson just to confirm, we're using the new SUP everywhere in prod now, right?
[21:06:57] hmm, jobqueue dashboard still shows updates happening in eqiad? https://grafana.wikimedia.org/d/CbmStnlGk/jobqueue-job?from=now-30d&orgId=1&refresh=5m&to=now&var-dc=eqiad%20prometheus%2Fk8s&var-job=cirrusSearchLinksUpdate
[21:08:49] context is this noisy alert. Just wondering if we can get rid of it yet https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/icinga/manifests/monitor/elasticsearch/cirrus_cluster_checks.pp#42
[21:11:07] inflatador: yes, eqiad should still have updates, we only tried to turn off writes for 25% of eqiad (and that didn't work, and is awaiting an mw-core patch to be merged/deployed)
[21:12:18] ebernhardson ACK, ryankemper is working on a patch that'll disable the alert in CODFW but not EQIAD
[21:12:43] dr0ptp4kt: for the two metrics, i can calculate them but they seem a little odd on the same graph. the y-axis isn't really the same thing. I tried adding a second y-axis, but that's still odd (i suppose it's rare a second y-axis is really all that clear)
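(A minimal sketch of the two Part 1 ratios from T364600, assuming the raw counts have already been computed; argument names are taken from the ticket wording. Note that every actor counted in num_actors_w_autocomplete_pv contributes at least one autocomplete pageview, so the second ratio is always >= 1 while the first is a share of actors and stays <= 1, which is why the two series don't sit comfortably on one y-axis.)

```python
def part1_ratios(num_actors_w_pageviews: int,
                 num_actors_w_autocomplete_pv: int,
                 num_autocomplete_pv: int) -> tuple[float, float]:
    """Compute the two Part 1 ratios described in T364600.

    ratio_1: share of actors with any autocomplete pageview (<= 1).
    ratio_2: autocomplete pageviews per actor with any autocomplete
             pageview (>= 1), hence the mismatched y-axis scales.
    """
    ratio_1 = num_actors_w_autocomplete_pv / num_actors_w_pageviews
    ratio_2 = num_autocomplete_pv / num_actors_w_autocomplete_pv
    return ratio_1, ratio_2
```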
[21:12:47] maybe add a second graph?
[23:49:53] yeah, I think a second graph is the way to go. thanks, and have a good weekend!