[09:01:35] pfischer: I'm looking at https://etherpad.wikimedia.org/p/search-standup Do you have a bit more context on "Adapted EventBus extension"?
[09:14:42] gehel: Sure, I had a CR open since I started working on passing more redirect information from within the EventBus extension. Later on we focused more on the schema itself. Once the schema changes were finalised, I had to adapt the implementation that produces the events so that it complies with the schema. Does that answer your question?
[09:15:11] pfischer: sounds good! Thanks! Would you have links as well?
[09:15:36] gehel: Sure, one sec
[09:15:39] https://gerrit.wikimedia.org/r/c/mediawiki/extensions/EventBus/+/913030 and https://gerrit.wikimedia.org/r/c/schemas/event/primary/+/914867 ?
[09:16:13] Indeed
[09:16:19] all good!
[09:25:07] dcausse: Despite all the work done for the redirects, we still cannot detect transitions from page to redirect when it’s caused by a user creating a revision manually. As a consequence the last known revision (that was not a redirect) would remain in ES unless we issue a delete request for any redirect with a previous_state, just to make sure we keep ES clean.
[09:26:51] Does that sound like an option? Or do we live with those skeletons in the cupboard?
[09:31:26] pfischer: (thinking out loud) when a page p1 is changed into a redirect to p2 we should send a delete for p1 and ask p2 to be rerendered (fetching its redirects)
[09:32:49] when a redirect p1 switches from target p2 to p3, we might just know that p3 needs to refresh its redirects, we might know nothing about p2 and p1 might remain in the redirects array of p2
[09:33:09] checking but I believe that even CirrusSearch internals cannot do better
[09:34:25] Good. Thanks! So we need a way to flat-map input event Rows to 1+ InputEvents.
[09:34:29] when a redirect p1 pointing to p2 we emit a normal update for p1 (creating this doc in the index), p1 might remain in the redirects array of p2
[09:35:18] pfischer: yes, if you're in a process function you can emit multiple events (you have access to the collector)
[09:35:35] a simple MapFunction would not work here
[09:36:57] * when a redirect p1 pointing to p2 changes into a plain page we emit a normal update for p1 (creating this doc in the index), p1 might remain in the redirects array of p2
[09:37:40] Now it makes sense, thanks for clarifying.
[09:39:16] dcausse: Are we fine with dropping any redirect NOT pointing to a local page?
[09:39:55] pfischer: yes, this is what's done today so no reason to change this I suppose
[09:41:08] we can always test a few assumptions on testwiki to better assess what CirrusSearch is doing but I suspect it's leaking a couple of redirects
[09:41:41] so it should be on a best effort basis
[09:44:19] Sure, reason for the question was: if we want to know how often such leaking transitions occur, we’d need either a side output or at least some logs. As long as we can run some aggregation over logs, we might as well count them for now.
[09:47:26] pfischer: sure, whenever you encounter an "ambiguity" in the process it's good to at least measure it (see CirrusNamespaceIndexMap for how to push metrics) and if we have a strong feeling that we might need an extra process doing some sort of reconciliation, a side output is great for that
[09:49:46] Thanks!
[09:52:55] lunch
[09:56:50] Lunch
[12:18:46] lunch
[12:56:21] o/
[13:16:26] Weekly update is out (I'm late). Let me know if I missed anything: https://wikitech.wikimedia.org/wiki/Search_Platform/Weekly_Updates/2023-06-16
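The 09:31–09:47 exchange above describes the fan-out in prose: a ProcessFunction (rather than a plain MapFunction) that can emit several update events per input row via the Collector, count ambiguous transitions as a metric, and push them to a side output for later reconciliation. A minimal Flink sketch of that shape follows; all class, field and metric names here are hypothetical illustrations, not the actual streaming updater code or event schemas.

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

/** Fans one page-change row out into one or more update events (hypothetical shapes). */
public class RedirectFanoutFunction
        extends ProcessFunction<RedirectFanoutFunction.PageChange, RedirectFanoutFunction.UpdateEvent> {

    /** Side output for transitions we cannot fully resolve, kept for possible reconciliation. */
    public static final OutputTag<PageChange> AMBIGUOUS =
            new OutputTag<PageChange>("ambiguous-redirect-transition") {};

    private transient Counter ambiguousTransitions;

    @Override
    public void open(Configuration parameters) {
        // Flink metric, in the spirit of the "push metrics" suggestion above.
        ambiguousTransitions = getRuntimeContext()
                .getMetricGroup()
                .counter("ambiguous_redirect_transitions");
    }

    @Override
    public void processElement(PageChange change, Context ctx, Collector<UpdateEvent> out) {
        if (change.isRedirectNow && !change.wasRedirect) {
            // p1 became a redirect to p2: delete p1 and ask p2 to be re-rendered.
            out.collect(UpdateEvent.delete(change.pageId));
            out.collect(UpdateEvent.rerender(change.redirectTargetId));
        } else if (!change.isRedirectNow && change.wasRedirect) {
            // Redirect p1 became a plain page: normal update for p1, but p1 may still
            // linger in the redirects array of its old target, so count it and
            // forward it to the side output.
            out.collect(UpdateEvent.rerender(change.pageId));
            ambiguousTransitions.inc();
            ctx.output(AMBIGUOUS, change);
        } else {
            out.collect(UpdateEvent.rerender(change.pageId));
        }
    }

    /** Hypothetical input row. */
    public static class PageChange {
        public long pageId;
        public long redirectTargetId;
        public boolean isRedirectNow;
        public boolean wasRedirect;
    }

    /** Hypothetical output event. */
    public static class UpdateEvent {
        public long pageId;
        public boolean isDelete;

        static UpdateEvent of(long pageId, boolean isDelete) {
            UpdateEvent e = new UpdateEvent();
            e.pageId = pageId;
            e.isDelete = isDelete;
            return e;
        }

        static UpdateEvent delete(long pageId) { return of(pageId, true); }
        static UpdateEvent rerender(long pageId) { return of(pageId, false); }
    }
}
```

Wiring it up would look roughly like `pageChanges.process(new RedirectFanoutFunction())`, with `getSideOutput(RedirectFanoutFunction.AMBIGUOUS)` feeding whatever counting or reconciliation job comes later.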
[13:35:52] wdqs2022 is able to start wdqs-updater now. Restarting the categories and blazegraph services seems to have fixed it, but not sure why. Maybe something to do with the perms issue gehel mentioned in https://phabricator.wikimedia.org/T331300#8930322
[13:49:05] Interesting observation: given a redirect from A -> B, if I move B to C (with “leave redirect behind” checked), MW silently creates a revision of A, correcting its redirect from A -> B to A -> C. Sadly, no page_change event is fired.
[13:54:49] pfischer: interesting, I did not know that MW would do such cleanup automatically, but if A gets a new revision without a page-change I'd consider this a bug
[14:00:39] inflatador: you removed the chown command in https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/914018/24/cookbooks/sre/wdqs/data-transfer.py#b191 so that's most probably the reason?
[14:03:54] dcausse good catch, I'll get a patch up
[14:21:04] although I still don't understand why it wouldn't start after the permissions were corrected, but before I restarted categories and blazegraph
[14:23:15] dcausse: went through a dead spot, sorry for the delay. I made a mistake. No revision gets created automatically.
[14:24:07] A becomes a so-called double redirect then?
[14:27:29] Correct.
[14:29:05] Just put together a (hopefully complete) set of cases with the latest events.
[14:55:37] thanks, that'll be very helpful
[14:56:15] \o
[14:56:34] o/
[14:58:42] one thing i'm wondering ... with these tumbling time windows is the streaming updater going to become extremely spiky in its output rate?
[15:06:00] like, a tumbling time window is going to use strict epoch-based buckets, so every page id seen in the last 5 minutes will flush at the same time
[15:06:22] and nothing will come out in the middle
[15:07:43] although i'm not seeing an option in flink to do what i would want, maybe it would take too many timers, but i kinda want a 5 minute countdown to start when it receives an event and has to create a new window to put it in
[15:12:33] ebernhardson: yes, windows will create bursts of events
[15:13:17] should we care? I suppose for elasticsearch it's almost better if it's a bit bursty so it can batch more index updates into a flush
[15:14:09] ebernhardson: yes that was my reasoning as well, since we do bulk requests it doesn't matter too much?
[15:15:50] yea seems likely, just seems a little awkward i suppose compared to what i was expecting, but should work fine
[15:16:30] it will be deterministic, but perhaps a little odd that we dedup two edits at 11:58 and 11:59, but not 11:59 and 12:00
[15:16:33] using timers directly is possible tho, you give up on the higher level features like windows but that's totally possible
[15:17:22] i dunno that we gain enough, it's a bit awkward that the windows are rigid but i suppose it's only an optimization, it's not like we need to get fancy for it to be correct
[15:18:29] i guess i could run some quick analysis to see if the deduplication numbers are massively different
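For reference, the tumbling-window arrangement being discussed at 14:58–15:18 would look roughly like the sketch below: strict, epoch-aligned five-minute buckets keyed by page id, keeping only the last event per page per window, which is why everything flushes in a burst at each bucket boundary. It reuses the hypothetical `UpdateEvent` from the previous sketch (same package assumed) and assumes event-time timestamps and watermarks are assigned upstream.

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public final class TumblingDedup {

    // UpdateEvent is the hypothetical placeholder from the previous sketch.
    static DataStream<RedirectFanoutFunction.UpdateEvent> dedup(
            DataStream<RedirectFanoutFunction.UpdateEvent> updates) {
        return updates
                // one logical window per page id
                .keyBy(e -> e.pageId)
                // strict, epoch-aligned 5-minute buckets: every key flushes at the same boundary
                .window(TumblingEventTimeWindows.of(Time.minutes(5)))
                // keep only the newest event for each page within the window
                .reduce((earlier, later) -> later);
    }

    private TumblingDedup() {}
}
```

As noted in the conversation, the bursts are largely harmless for Elasticsearch since updates go out as bulk requests.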
[15:18:59] ebernhardson: perhaps what you want could be done with session windows?
[15:19:29] dcausse: i did look at session windows, but it wasn't clear that we could set a maximum time on the window
[15:19:50] dcausse: like, if some bot edits a page every 3 minutes for hours, my reading is the session window would stay open the whole time
[15:20:33] ah indeed, if there's no way to force the window to fire after a certain time that's not usable
[15:21:37] could ponder writing a window assigner that does what we want, might be a slight extension to sessions, who knows
[15:21:44] i haven't looked at how those are done yet, can take a peek
[15:21:53] sure
[15:22:19] going offline, have a nice weekend!
[15:22:29] take care
[15:23:16] going offline too. Enjoy the weekend!
[16:16:17] lunch, back in ~1h
[17:42:53] back
[18:18:27] any plans to restart elasticsearch instances? Intending to kick off a reindex on wikidata and commonswiki for T334194
[18:18:27] T334194: Optimize the elasticsearch analysis settings for wikibase - https://phabricator.wikimedia.org/T334194
[18:18:45] i suppose i usually do these on fridays so if they take 30 hours no one will be doing much maintenance work :)
[18:20:06] and related to a few hours ago, i ran the analysis: using 5 minute windows that start when the first event comes in, basically deduping the next 5 minutes worth of events, the dedup rate at 5 min increases from 7.5% to 8.8%
[18:33:40] ebernhardson not at the moment
[18:35:26] inflatador: cool, i'll kick that off then
[20:40:12] gehel re: https://gerrit.wikimedia.org/r/c/operations/puppet/+/930870 I may need a couple more patches to get it going. I think we're going to need a hiera lookup somewhere?
[20:41:10] actually, I reviewed this too fast. This should be taken care of by https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/hieradata/role/common/wcqs/public.yaml#2 already
[20:42:58] inflatador: what are the symptoms?
[20:43:21] gehel I think you're right, but I need to test this against the new bullseye hosts just to make sure
[20:43:57] You're right in that the public.yaml files probably enable the profile and we don't need to do anything else
[20:44:21] As long as newly-provisioned bullseye hosts get the right java packages we're good
[20:45:05] gehel confirmed, let me abandon this patch
[20:45:06] inflatador: note that we closed T264181 with ryankemper yesterday
[20:45:07] T264181: Migrate WDQS to profile::java - https://phabricator.wikimedia.org/T264181
[20:45:50] gehel ACK, I saw that earlier but then promptly forgot ;(
[20:47:04] Seeing your patch, I thought that we had forgotten about WCQS. But that seems to not be the case
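The timer-based alternative pondered at 15:07–15:21 above, and measured at 18:20, can be done with a KeyedProcessFunction: buffer the latest event per page and start a five-minute countdown when the first event for that page arrives, instead of flushing on epoch-aligned window boundaries. A minimal sketch follows, again with illustrative names only, assuming event-time timestamps and watermarks are set upstream.

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

/**
 * Keeps only the latest event per key and flushes it 5 minutes after the *first*
 * event of a batch, rather than at fixed epoch-aligned boundaries.
 */
public class FirstEventWindowDedup<T> extends KeyedProcessFunction<Long, T, T> {

    private static final long WINDOW_MS = 5 * 60 * 1000L;

    private final TypeInformation<T> eventType;
    private transient ValueState<T> latest;
    private transient ValueState<Long> flushAt;

    public FirstEventWindowDedup(TypeInformation<T> eventType) {
        this.eventType = eventType;
    }

    @Override
    public void open(Configuration parameters) {
        latest = getRuntimeContext().getState(new ValueStateDescriptor<>("latest", eventType));
        flushAt = getRuntimeContext().getState(new ValueStateDescriptor<>("flushAt", Long.class));
    }

    @Override
    public void processElement(T event, Context ctx, Collector<T> out) throws Exception {
        latest.update(event); // later events for the same key overwrite earlier ones
        if (flushAt.value() == null) {
            // the 5-minute countdown starts at the first event (requires event timestamps upstream)
            long fireAt = ctx.timestamp() + WINDOW_MS;
            flushAt.update(fireAt);
            ctx.timerService().registerEventTimeTimer(fireAt);
        }
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<T> out) throws Exception {
        T buffered = latest.value();
        if (buffered != null) {
            out.collect(buffered);
        }
        latest.clear();
        flushAt.clear();
    }
}
```

It would plug in as something like `updates.keyBy(e -> e.pageId).process(new FirstEventWindowDedup<>(TypeInformation.of(UpdateEvent.class)))`. Per the quick analysis above, the gain over plain tumbling windows was modest (7.5% to 8.8% dedup at 5 minutes), so the added complexity may not be worth it.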