[08:39:05] errand
[09:19:12] pfischer: I'm reading the standup notes. Which phab task is related to your work on the Spark Kafka writer?
[09:19:30] gehel: one sec.
[09:19:33] Oh, that should be T374341
[09:19:34] T374341: Add support for Spark producers in Event Platform - https://phabricator.wikimedia.org/T374341
[09:20:23] https://phabricator.wikimedia.org/T374341 - yeah, we have another one but that suffices
[09:21:07] gehel: forgot about the Growth adventures in the standup notes
[09:23:00] gehel: added it to the etherpad
[09:25:59] Status update published: https://wikitech.wikimedia.org/wiki/Search_Platform/Weekly_Updates/2024-11-08
[09:26:06] I'll add the Growth adventure
[09:27:15] pfischer (and others): if you could add phab links to the standup notes, that's super helpful to me!
[09:27:36] Sure, my bad.
[09:28:43] I suspect the relevant tickets for that last update are T378983 and T378664?
[09:28:44] T378983: Add Link recommendation are not being processed by CirrusSearch (November 2024) - https://phabricator.wikimedia.org/T378983
[09:28:44] T378664: [wmf.1] refreshLinkRecommendations.php - Unable to deliver all events: 400: Bad Request - https://phabricator.wikimedia.org/T378664
[09:29:07] and maybe also T377150
[09:29:08] T377150: Config: enable CirrusSearchEnableEventBusWeightedTags - https://phabricator.wikimedia.org/T377150
[09:29:11] Yes, there were a few related to the errors that led to the fix
[09:29:45] T377150 was the original one to enable it (which in turn caused trouble)
[09:30:41] I've added those 3. Feel free to edit the update (and also post a follow-up on the slack thread) if I missed anything!
[11:04:14] lunch
[14:56:22] \o
[15:06:13] .o/
[15:10:29] o/
[15:27:49] Monday is a holiday in the US, so a lot of us will be out... unless we forget - don't forget!
[15:39:00] sigh... I'm getting throttled by the wdqs-internal endpoint when extracting wdqs query results...
[15:39:40] Trey314159: thanks for the reminder!
[15:44:23] dcausse: can we disable throttling without doing a deploy? LMK if I can help one-off a host or something
[15:45:30] inflatador: yes, thanks, I won't be the one running the analysis but yeah we might need a solution... will ping you once I know more
[15:45:49] ACK, heading into my office but will be back in ~20
[16:03:31] back
[16:26:06] Going out for the week! Enjoy your weekend, and your Monday for those in the US!
[16:33:51] things to think about: how do we want to manage access controls / users in opensearch: T379288
[17:32:15] T379288: Plan for access control with opensearch - https://phabricator.wikimedia.org/T379288
[18:08:13] I think I need to read more about what's possible with opensearch regarding ACLs first :/
[18:13:20] https://opensearch.org/docs/latest/security/access-control/permissions/ easy to get lost :(
[18:13:25] it's not a particularly refined approach, but might be the same as elastic is doing. All the things that can be done in the cluster have actions with names associated with them, like the actions we created for ltr. The ACLs amount to a list of index patterns and a list of actions to allow
[18:14:39] but there's no simple shortcut for "all" read operations with index_pattern: xyz*?
[18:15:06] they have something, might be called "role group" or something like that, which is essentially a set of actions wrapped up into a name you give it. There are some default ones
[18:15:15] well it's in the name tho, indices:data/read/*
[18:16:26] unsure but I feel that it's not something we could get 100% right while we migrate and it might be something we enable after the migration?
[18:16:58] yea, I was pondering something similar, it could be mixing up too many things to do at the same time. There is risk involved there that could be isolated
[18:17:40] if we do turn them on, while not suggested, we could start with giving the anonymous user full access and migrate things to accounts with more limited access
[18:18:17] yes I think that's reasonable
[18:18:31] cannot be worse than what we have today :)
[18:18:36] indeed :)
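For reference, a minimal sketch of the role model being discussed, assuming the OpenSearch Security plugin and its REST API (PUT _plugins/_security/api/roles/<name>). OpenSearch calls the named bundles of actions "action groups"; "read" and "cluster_composite_ops_ro" used below are built-in ones, and raw action names such as indices:data/read/* can be listed alongside them. The host, role name, index pattern, and credentials are hypothetical placeholders, not the production setup.

    # Sketch only: create a read-only role scoped to an index pattern via the
    # OpenSearch Security plugin REST API. Endpoint, role name, pattern, and
    # credentials are placeholders.
    import requests

    SECURITY_API = "https://opensearch.example.org:9200/_plugins/_security/api"

    role = {
        # built-in action group covering read-only cluster-level operations
        "cluster_permissions": ["cluster_composite_ops_ro"],
        "index_permissions": [
            {
                "index_patterns": ["xyz*"],
                # "read" is a built-in action group; raw action names work too
                "allowed_actions": ["read", "indices:data/read/*"],
            }
        ],
    }

    resp = requests.put(
        f"{SECURITY_API}/roles/readonly_xyz",  # hypothetical role name
        json=role,
        auth=("admin", "changeme"),            # placeholder admin credentials
        verify=True,
    )
    resp.raise_for_status()

Mapping the anonymous user to a permissive role first and then tightening per-client roles would match the migration path suggested above.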
[18:19:04] the other quasi-related question, do we still want envoy?
[18:19:17] opensearch does tls now, meaning we don't need tls termination, but we still get stats
[18:19:32] ah
[18:19:43] the o11y ones aren't using envoy, afaict
[18:20:08] we'd still get telemetry from the mw envoy side anyways?
[18:21:02] yea, we'd depend on clients using envoy, which is true for SUP and cirrus
[18:21:07] I'm not clear on what envoy is buying us on the elastic side other than tls, but I'm all for removing unnecessary elements in the request stack
[18:21:56] I think the only thing it buys us is maybe simplifying tls config (re-use whatever envoy does, vs provide it to opensearch), but probably not a big deal
[18:22:00] and then the stats
[18:22:56] +1 to do what o11y is doing and remove it
[18:23:04] ok, sounds good
[18:24:35] ok, finally got my sparql queries executed without being throttled... running with a single executor, which is obviously extremely slow... :(
[18:24:54] heading out, have a nice long weekend
[19:41:18] How much RAM do you think your new Blazegraph servers will have? Mine currently have 384 GB
[19:42:52] hare: not sure, I know at least some of our current servers are 128G
[19:43:12] As is documented on Wikitech :) I definitely recommend as much RAM as you can afford
[19:43:27] This is stupid but I found Blazegraph to be more stable once I simply gave it more RAM
[19:43:40] hare: to the heap or the disk cache?
[19:43:58] Heap. I'm not sure I've changed disk cache
[19:44:32] disk cache isn't something you really change directly; all the unused memory on the system is used for the Linux disk cache, in this case basically system memory - heap
[19:45:13] I think in traditional Linux fashion anything not already used by the heap is used for caching, so having more RAM is good for that as well
[19:47:22] right. How large are you setting the heap?
[19:47:32] 256 GB. Probably too much, but it works!
[19:48:53] interesting. That's certainly large for a Java heap.
[19:49:24] Problem solving via incrementing numbers until the problem goes away
[19:50:05] :) Modern G1GC is supposed to be much better than the old collectors at giant heaps, maybe it's not so bad anymore
[20:02:39] errand, back in ~90
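To make the heap-versus-page-cache point above concrete, a small illustrative sketch; the numbers are simply the ones from the conversation (384 GB host, 256 GB heap), not a recommendation. Blazegraph's heap is set through standard JVM options (-Xms/-Xmx), and -XX:+UseG1GC names the G1 collector referred to above (already the default on JDK 9+); whatever the heap does not reserve stays available to the Linux page cache, which is what effectively caches the journal file on disk.

    # Illustrative only: heap sizing expressed as JVM options, plus the rough
    # amount of RAM left for the OS page cache. Figures come from the chat
    # above and are not a sizing recommendation.
    TOTAL_RAM_GB = 384
    HEAP_GB = 256

    jvm_opts = [
        f"-Xms{HEAP_GB}g",   # initial heap
        f"-Xmx{HEAP_GB}g",   # maximum heap
        "-XX:+UseG1GC",      # explicit, though G1 is the default on JDK 9+
    ]

    print("JVM options:", " ".join(jvm_opts))
    print(f"Left for the Linux page cache: ~{TOTAL_RAM_GB - HEAP_GB} GB")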