[00:19:30] :P
[06:17:55] o/
[06:49:42] Thinking out loud (while working on redirect processing in our update pipeline): do we really need a fetch request from CirrusSearch? In case of a page_change event, we’d have all the information we need to ask ES for a partial update of the ‘redirect’ property of the redirect-target document. That would reduce traffic to CirrusSearch and bring us closer to self-contained event processing.
[06:57:10] pfischer: yes, possibly? If you mean an update to redirect A that has a target to B -> only add A to the redirect array of B. We have to think about whether this is feasible with the "set" noop_handler (https://github.com/nomoa/search-extra/blob/master/docs/super_detect_noop.md), esp. verify how duplicates are handled
[07:01:46] and how complex values are treated; the redirect array is an array of complex values [{namespace:0, title:"title"}, {...}]
[07:07:51] What script languages do we support on ES? I saw the example https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html that updates a tag array via a painless script, and that made me wonder if we could leverage this kind of script, too.
[07:11:20] pfischer: possibly? I barely remember a discussion we had with Erik pondering whether it would be sane to switch to painless instead of the noop handlers
[07:11:57] main issue I suppose is that we can't combine existing noop features with painless, you must use a single engine
[07:13:43] but if we never need to merge redirect updates with other updates I suppose painless could be used?
[07:15:17] rewriting our superdetectnoop script engine as a custom painless one is also possible I guess, but might be a bit involved and risky
[07:17:34] But do we use noop detection for the redirect property? I looked at the CirrusSearch code and it adds noop detection only for `incoming_links`, the combined counter for redirects and other links.
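A minimal sketch of the duplicate handling an incremental redirect update would need for the complex values in the `redirect` array. This is not CirrusSearch code; the `Redirect` record and `addIfAbsent` helper are hypothetical names, and record equality stands in for whatever duplicate detection the "set" noop handler would have to provide for `{namespace, title}` pairs:

```java
import java.util.ArrayList;
import java.util.List;

public class RedirectMerge {
    // Mirrors the complex values stored in the `redirect` array:
    // [{namespace: 0, title: "title"}, ...]. Record equality compares
    // both fields, which is exactly the duplicate check we need.
    public record Redirect(int namespace, String title) {}

    /**
     * Adds the redirect unless an equal entry is already present.
     * Returns false for a noop: nothing changed, no reindex needed.
     */
    public static boolean addIfAbsent(List<Redirect> redirects, Redirect r) {
        if (redirects.contains(r)) {
            return false;
        }
        redirects.add(r);
        return true;
    }

    public static void main(String[] args) {
        List<Redirect> redirects = new ArrayList<>();
        redirects.add(new Redirect(0, "Foo"));
        System.out.println(addIfAbsent(redirects, new Redirect(0, "Foo"))); // false
        System.out.println(addIfAbsent(redirects, new Redirect(0, "Bar"))); // true
    }
}
```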
[07:19:01] pfischer: not today, because we always send back the whole list of redirects, but doing a partial update like you mention might be doable by leveraging something like the "set" noop handler
[07:20:04] (or a scripted update with painless, I suppose)
[07:35:23] quick clarification: when I say "we can't combine existing noop features with painless", I think a scripted update can only have one script engine set. You can't send a single update request that'll use both painless and the super_detect_noop script; these will have to be 2 separate updates even if the painless script is targeting different fields than the super_detect_noop
[07:42:09] Alright, understood. Hm. So we’d have to a) update redirects, b) increment/decrement the incoming_links counter, and c) detect a noop and set noop (ctx.op = 'noop’)
[07:43:08] I’ll familiarise myself with painless and see if we can do this with reasonable effort.
[07:44:49] incoming_links is taken care of by another process, so no need to deal with it
[07:48:19] hm, and also something I forgot, which might or might not be problematic with an incremental approach: we want to prevent generating giant arrays; in case bazillions of pages redirect to a page, we don't want an array that's too big in elastic
[07:51:49] currently CirrusSearch does limit the number of redirects it fetches for a given page: https://gerrit.wikimedia.org/g/mediawiki/extensions/CirrusSearch/+/70a9b4cc18dadfd79cb0e8777b0dbe71b55a6cdf/includes/BuildDocument/RedirectsAndIncomingLinks.php#104
[07:52:21] Ah, alright. Thanks!
[07:52:39] I’ll be out for breakfast, back in 45m
[09:01:45] Un meeting is starting if anyone is interested
[10:01:34] maven learning circle in https://meet.google.com/ibf-ghno-gbm
[10:02:24] pfischer: ^
[12:12:36] gehel: should I have received an invite / is that a regular meeting?
[12:13:37] pfischer: not a regular one. There is a similar one next Wednesday, and I see that you've accepted.
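For illustration, a hedged sketch of what such a scripted partial update could look like. The painless source and the `updateBody` helper are assumptions, not actual CirrusSearch code; the script folds the duplicate check and a size cap (like the limit CirrusSearch applies when fetching redirects) into one update, setting `ctx.op = 'noop'` when there is nothing to do:

```java
import java.util.Map;

public class RedirectUpdateScript {
    // Hypothetical, untested painless source illustrating the shape:
    // append the redirect if absent; otherwise (or past the cap) noop.
    static final String PAINLESS =
        "if (ctx._source.redirect == null) { ctx._source.redirect = []; } "
      + "if (ctx._source.redirect.contains(params.r) "
      + "    || ctx._source.redirect.size() >= params.max) { ctx.op = 'noop'; } "
      + "else { ctx._source.redirect.add(params.r); }";

    /** Builds the request body for POST /{index}/_update/{docId}. */
    public static Map<String, Object> updateBody(int namespace, String title, int maxRedirects) {
        return Map.of("script", Map.of(
            "source", PAINLESS,
            "lang", "painless",
            "params", Map.of(
                "r", Map.of("namespace", namespace, "title", title),
                "max", maxRedirects)));
    }
}
```

This would still have to be a separate request from any super_detect_noop update, per the single-engine constraint above.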
[12:14:58] Alright, because I didn’t see anything in my calendar today. Sorry I missed it. I’m afraid I won’t make it to our retrospective today, since we’ll be on the road. I’ll be working (offline), however.
[12:52:31] o/
[13:53:04] addshore: if you are back, any chance you could help us get https://gitlab.wikimedia.org/repos/releng/cli/-/merge_requests/402 merged?
[14:33:31] \o
[14:33:37] o/
[14:33:58] sigh... looks like we accept negative offsets from the action api :)
[14:41:02] lol
[14:41:46] * ebernhardson still dreams of type systems that are easy to use, and yet powerful enough to easily represent things like bounded integers
[14:55:36] sigh, canary events seem down since yesterday 18h
[14:58:36] from 2023-06-21T19:00 to 2023-06-22T12:00 on revision-score (inclusive)
[15:00:45] again? :( I was assuming I had to look into those airflow mails but hadn't yet
[15:01:01] retrospective time: https://meet.google.com/eki-rafx-cxi
[15:01:08] dcausse, ebernhardson, ryankemper ^
[15:50:17] ryankemper, inflatador: I'll skip the pairing session today, conflicting meeting about hiring
[15:50:31] gehel: inflatador: ack
[15:56:37] ACK
[15:56:46] Workout, back in ~40
[16:28:10] ebernhardson: would you have a couple minutes for some airflow admin/UI questions? (https://meet.google.com/tgk-ynkc-hud)
[16:38:19] sudo -u analytics-search /srv/airflow-search/bin/airflow-search tasks clear --downstream --only-failed --start-date 2023-05-04T17:00:00Z --end-date 2023-05-19T00:00:00Z --yes --task-regex wait_for_hourly_data ores_predictions_hourly
[16:38:27] sudo -u analytics-search /srv/airflow-search/bin/airflow-search tasks run -fAlim wcqs_streaming_updater_reconcile_hourly wait_for_lapsed_actions_eqiad 2023-05-04T20:00:00Z
[16:53:26] back
[17:05:15] going to take an early lunch after all. Back in ~1h
[17:24:51] dinner
[18:20:48] hmm, flink seems to be using Kryo for sub-field serializers even when providing type information.
Have a test case failing because Kryo tries to copy an ImmutableMap using Map.put
[18:21:07] although this whole test harness thing is a bit complex, I wouldn't be surprised if I have it wrong somewhere :P
[18:32:10] back
[19:44:21] quick break, back in ~15
[20:06:59] back
[20:46:13] ebernhardson: IIRC type information is only used to get fields/properties and their types to derive a structure/order in which properties are encoded in a Row. Did you register a custom serializer (https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/dev/serialization/custom_serializers/)?
[20:49:31] pfischer: hmm, no, I don't think so. This is just providing the type information for InputEvent
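The failure mode above is easy to reproduce without Flink or Kryo involved: a copy strategy that `put`s entries into an immutable map throws at the first entry. A minimal illustration using the JDK's `Map.of` (Guava's ImmutableMap rejects `put` the same way):

```java
import java.util.Map;

public class ImmutableCopyDemo {
    public static void main(String[] args) {
        Map<String, Integer> m = Map.of("a", 1); // JDK immutable map
        try {
            // Roughly what a generic map-copying serializer does when it
            // instantiates/reuses the target map and puts entries into it.
            m.put("b", 2);
            System.out.println("put succeeded");
        } catch (UnsupportedOperationException e) {
            System.out.println("UnsupportedOperationException");
        }
    }
}
```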