[09:05:49] we're discussing a place to put notebooks, for now it's mainly the ones related to the wdqs graph split. Tempted to create a very generic "notebooks" project under gitlab:repos/search-platform, any objections to this?
[09:08:58] No objections as a starting point. If it becomes too messy, we can always reorganize.
[09:39:43] sonar started to complain about the SUP Java code: https://sonarcloud.io/project/issues?resolved=false&severities=BLOCKER%2CCRITICAL%2CMAJOR%2CMINOR&sinceLeakPeriod=true&types=BUG&id=wmftest_cirrus-streaming-updater&open=AY0YOIi340DOfg35nq6V
[09:39:57] seems like a false positive but could be me missing something
[11:42:08] lunch
[13:59:44] FYI this graphite-based alert has been UNKNOWN for the last 22 days https://alerts.wikimedia.org/?q=%40state%3Dactive&q=%40cluster%3Dwikimedia.org&q=alertname%3DNumber%20of%20requests%20triggering%20circuit%20breakers%20due%20to%20excessive%20memory%20usage
[14:00:12] looks like the underlying metric either disappeared or was renamed
[14:01:23] godog thanks for the heads-up, will make a task to investigate
[14:01:46] inflatador: thank you! appreciate it
[14:04:45] np, T355795 is up. Feel free to add/edit if I missed anything
[14:04:45] T355795: Fix "requests triggering circuit breakers" Elastic alert - https://phabricator.wikimedia.org/T355795
[14:05:25] looks good!
[14:15:16] inflatador: should we keep our 1:1 today?
[14:15:42] gehel I don't think we need to
[14:16:28] dcausse: looks like a false positive to me as well. Seems that the synthetic execution is getting thrown off by the exceptions
[14:18:29] I would probably extract the retry logic to its own class, which would make things clearer, both for humans and machines...
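(A minimal sketch of the kind of extraction suggested above: a small, self-contained retry helper, so the retry control flow is isolated and easier for both humans and static analysis to follow. The class and all names here are hypothetical, not taken from the cirrus-streaming-updater code base.)

    import java.util.concurrent.Callable;

    /** Runs a callable, retrying on failure up to a fixed number of attempts. */
    public final class BoundedRetry {
        private final int maxAttempts;

        public BoundedRetry(int maxAttempts) {
            if (maxAttempts < 1) {
                throw new IllegalArgumentException("maxAttempts must be >= 1");
            }
            this.maxAttempts = maxAttempts;
        }

        /** Returns the callable's result, rethrowing the last exception if all attempts fail. */
        public <T> T call(Callable<T> callable) throws Exception {
            Exception last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    return callable.call();
                } catch (Exception e) {
                    last = e; // remember the failure and try again
                }
            }
            throw last; // every attempt failed; surface the last exception
        }
    }

(Usage would look something like new BoundedRetry(3).call(() -> client.fetch(uri)), where client.fetch stands in for whatever call is being retried; the caller no longer carries its own retry loop, so an analyzer sees a single bounded control flow.)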
[14:21:15] inflatador: ok, let's cancel for today
[14:22:18] o/
[14:48:12] dcausse: would you be up for presenting the status of the WDQS Graph Split at our next DPE staff meeting? (January 29th)
[14:48:33] gehel: sure
[14:48:34] just a slide and a few minutes, not a large presentation
[16:03:21] OTW, just got into the cowork space building
[16:57:05] workout, back in ~40
[17:50:04] back
[18:05:07] meant to ask, d.causse: could group perms be set on T352538_wdqs_graph_split_eval in hdfs? i'm looking at regexing things for iguana to pre-filter types of queries that aren't useful for analysis purposes (they would skew how it thinks about performance), while also taking into consideration queries that already succeed in both the consolidated graph and the main-side-only graph with the same result count.
[18:07:13] * dr0ptp4kt (btw, am looking at the notebook to see if there are some already-accessible paths that may already have the stuff, plus to avoid dead ends)
[18:41:03] lunch, back in ~40
[18:52:24] dr0ptp4kt: I've run hdfs dfs -chgrp -R analytics-search-users hdfs:///user/dcausse/T352538_wdqs_graph_split_eval, hopefully that's enough?
[18:52:27] dinner
[18:53:45] looks good, thanks!
[19:11:02] back
[19:14:57] hmm, a deploy on SUP will change the consumer envoy from a 1 cpu to a 200m cpu request. Assuming that was done with --set '...' but not sure which values. Thought it would be `mesh.resources.requests.cpu=1` but that doesn't seem to do anything
[19:25:40] weird. I was thinking we were the only ones adding more resources to envoy, but that's not true, is it?
[19:27:41] i think it's trying to change back to the defaults from a custom adjustment. I'm going to put together a patch to align the git repo with what's currently deployed
[19:35:36] ahh, 2 comments above was partially PEBKAC. The reason --set didn't seem to work is that i was diffing both producer and consumer, but only one of them changes, so the diff was actually changing and i just didn't notice what had changed
[19:35:49] so it was working, but the diff was always there because something was changing
[20:42:52] hmm, we got another WDQS alert about the streaming updater using too much object storage space
[21:14:20] that one should definitely create a task instead
[21:46:27] seems sensible
[22:39:27] hmm, cloudelastic red
[22:51:15] ebernhardson Y we merged the master change, having issues with order of operations
[22:51:39] we're in https://meet.google.com/fde-tbpf-wqh if you wanna join