[07:47:46] inflatador: np, ping me when you can but feel free to do the upgrade without me
[09:57:46] there's the wcqs_streaming_updater_reconcile_hourly dag stuck on some sensors (wait_for_lapsed_actions_codfw), the partition is there but the sensor does not seem active...
[09:59:26] I see [('dag_id', 'wcqs_streaming_updater_reconcile_hourly'), ('task_id', 'wait_for_lapsed_actions_codfw'), ('execution_date', '2023-05-22T17:00:00+00:00'), ('map_index', '-1')] in the audit logs tho...
[09:59:53] ah no, it's me browsing the task...
[10:02:59] or did the whole airflow scheduler stop
[10:03:01] ?
[10:06:32] something I clearly don't get here :(
[10:07:19] the sensor tasks for codfw partitions are blank, never started
[10:13:11] I see these tasks as being in the "removed" status for the run scheduled__2023-05-04T17:00:00+00:00
[10:14:21] so that suggests the dag "dynamically" changed somehow
[10:18:22] we should loop over eventgate_datacenters, which is wmf_props.get("eventgate_datacenters", ["eqiad", "codfw"])
[10:28:10] unsure that's the cause of the problem but I wonder how this could happen: https://people.wikimedia.org/~dcausse/airflow_task_removed.png
[10:35:20] or for some reason they got "removed" for the two runs that are greyed but were not "restored" afterward (remained blank)?
[10:38:37] hm... marking the task as "success" did the trick and seemed to have "restored" the codfw tasks... so weird
[10:40:14] must be something similar to this: https://stackoverflow.com/questions/53654302/tasks-are-moved-to-removed-state-in-airflow-when-they-are-queued-and-not-restore
[10:42:18] maybe related: https://github.com/apache/airflow/issues/25744 ?
[10:46:20] perhaps not, seems like we run 2.5.1 but that issue was fixed in 2.4...
[10:46:26] lunch
[12:33:36] o/
[12:37:51] other dags looping over DC partitions are using eventgate_partitions() rather than creating a separate sensor task per DC; I should probably do that...
[13:22:40] gehel btullis are we doing the hiring debrief now or in 15m? Your message says "at the beginning of pairing" but the time says :35
[13:26:37] inflatador: yeah, I moved it a bit to leave a buffer after my interview
[13:26:44] I'm available now and I'll join
[13:27:59] btullis never mind, we will meet at :35
[13:35:07] dcausse: hmm, that would be me breaking things, but i didn't expect that :S eventgate canary events were broken so i had set the eventgate datacenters to only wait for eqiad (with airflow variables); recently i checked and those were fixed so i put it back to both DCs
[13:36:31] ebernhardson: oh ok that makes sense, thanks! I think the problem is more on the airflow side not "restoring" these tasks imo
[13:36:49] or me dynamically building the dag based on this var
[13:37:13] working on a patch to have a single sensor task waiting for both DCs
[13:38:16] yea
[13:38:34] it's better in airflow if the dags themselves are not dynamic, only the tasks. airflow is indeed not great at shape changes in the dag
[13:38:47] yes...
[13:56:43] errand
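To make the shape problem concrete, the difference between the two DAG layouts discussed above might look roughly like this. It is only a sketch, not the real airflow-dags code: the wmf_props dict, the table/partition names and the choice of NamedHivePartitionSensor are stand-ins for whatever the actual DAG uses.

import pendulum
from airflow import DAG
from airflow.providers.apache.hive.sensors.named_hive_partition import (
    NamedHivePartitionSensor,
)

# Stands in for the real wmf_props / Airflow Variable lookup mentioned above.
wmf_props = {"eventgate_datacenters": ["eqiad", "codfw"]}
datacenters = wmf_props.get("eventgate_datacenters", ["eqiad", "codfw"])

with DAG(
    dag_id="wcqs_streaming_updater_reconcile_hourly_sketch",
    start_date=pendulum.datetime(2023, 5, 1, tz="UTC"),
    schedule="@hourly",
    catchup=False,
) as dag:
    # Shape-changing variant: one sensor task per DC. If the variable is
    # trimmed to ["eqiad"], wait_for_lapsed_actions_codfw disappears from the
    # DAG, existing runs mark it "removed", and the scheduler may not restore
    # it once the variable grows back to both DCs.
    per_dc_sensors = [
        NamedHivePartitionSensor(
            task_id=f"wait_for_lapsed_actions_{dc}",
            partition_names=[f"discovery.lapsed_actions/datacenter={dc}"],
        )
        for dc in datacenters
    ]

    # Stable variant (shown side by side for comparison): a single task whose
    # partition list varies with the variable instead of the task set, so the
    # DAG shape itself never changes.
    wait_for_lapsed_actions = NamedHivePartitionSensor(
        task_id="wait_for_lapsed_actions",
        partition_names=[
            f"discovery.lapsed_actions/datacenter={dc}" for dc in datacenters
        ],
    )

Only the second layout keeps the set of task_ids constant when the variable changes, which is what avoids task instances ending up in the "removed" state.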
[14:11:41] trying to decide what to do with the event merging in streaming updater... still not coming to great conclusions. As discussed, merging is post-fetch and at that point we are back to the Row objects, so everything to merge is a bit fuzzy, even if it's all contained to one class
[14:42:23] * ebernhardson is sad to find out intellij undo doesn't go nearly as far back in time as expected
[14:50:42] ebernhardson: would it help to create a dedicated model class?
[14:50:56] pfischer: you might have ideas ^
[14:51:41] dcausse: i've been thinking about that, we already use an output schema and peter had mentioned generating classes from it. will poke at it at least
[14:51:58] sure
[14:52:36] perhaps something hybrid with the fields we care about + a map for everything else? not sure...
[14:55:07] not entirely sure either. This whole concept of merging is more complicated the more i think about it :P
[14:55:55] like when it attempts to merge a higher revision id it should probably throw away the old event, except maybe not the weighted_tags field if that was merged in... probably will use the bit that processes the full window at once as an array, but still a bit squishy
[14:57:20] very true, we can perhaps do the obvious merges and let the events flow with proper ordering when unsure, and add more optimizations when we have a better idea?
[14:57:22] dcausse: one other thing that isn't clear, where are the serialization boundaries and are those tested by the test suite? I mean like datastream.map(foo).map(bar) could have a serialization between foo and bar, but probably doesn't
[14:58:26] i suppose i'm wondering about how to ensure when i add some sort of Map fields to InputEvent it will serialize correctly (or verify InputEvent is never serialized)
[14:58:57] so by default flink will serialize the events even when running in the same JVM to always run the serialization code, this can be optimized by a config to skip this serialization step
[14:59:27] for testing the serialization itself (esp. schema upgrades) we should have a test
[14:59:40] I think there's one, lemme check
[15:01:07] yes, there should be InputEventTest
[15:01:21] do we have a use case for fields other than weighted tags being merged in? Maybe i'm overcomplicating by trying to have a generic merge and instead should carry a weightedTags field in InputEvent, and as you suggested just emit multiple events instead of dealing with more complex merges
[15:01:41] retrospective happening in https://meet.google.com/eki-rafx-cxi cc: dcausse, ebernhardson, ryankemper, inflatador, mpham
[15:01:45] oh, sec
[15:01:48] oops
[15:02:53] pfischer: we're in the triage meeting
[15:52:12] the only merge that comes to mind is the "equals" noop_handler for wikidata labels/descs but it might not have to be treated within the pipeline yet
[15:53:12] "equals" means that if you don't pass the field you actually want the data deleted instead of kept around
[15:55:35] or more precisely not keeping old language labels, as the noop will merge the map of languages instead of overwriting it
[15:57:33] hmm, but even those we wouldn't be collecting from multiple events. If we keyBy the (wiki, revision id) then the only merges should be weighted_tags?
[15:59:08] yes
[16:00:37] keyBy rev id might be difficult for some existing weighted_tags, no?
[16:00:57] hmm, i don't follow?
[16:01:15] oh, some of the providers. hmm
[16:01:25] yes, image rec might not be able to feed us with the rev id?
[16:01:34] revision scoring is of course revision based, but yea recommendations aren't so strictly tied
[16:02:32] additionally we wouldn't want to throw out image recs if the revision id is old. Which complicates merging them with a regular document update
[16:02:45] yes
[16:03:29] sounds like only revision based tags should be merged, the others should just go through. I suppose that also makes sense since there won't be an edit in the window to merge with
[16:03:36] but it should be more explicit in the code i guess.
[16:05:04] yes I suppose they could be merged with revision based ones but the chances are small that they'll overlap
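The rule being converged on here (revision-based weighted_tags get folded into the page update for that revision, anything not tied to a revision just flows through) could be sketched roughly as below. This is a toy illustration in Python rather than the actual Flink/Java updater: InputEvent's fields, the tag_only flag and the per-page window handling are assumptions made for the example, not the real model.

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InputEvent:
    wiki: str
    page_id: int
    # Assumed None only for tag producers not tied to a revision
    # (e.g. image recommendations); full page updates always carry one.
    revision_id: Optional[int]
    # True when the event only carries weighted_tags, no fetched content.
    tag_only: bool = False
    weighted_tags: dict[str, list[str]] = field(default_factory=dict)


def merge_window(events: list[InputEvent]) -> list[InputEvent]:
    """Collapse the events buffered for one page's window."""
    merged: list[InputEvent] = []
    page_update: Optional[InputEvent] = None
    folded_tags: dict[str, list[str]] = {}

    for ev in events:
        if ev.tag_only and ev.revision_id is None:
            # Not anchored to a revision: don't merge, let it go through alone.
            merged.append(ev)
        elif ev.tag_only:
            # Revision-based tags: fold them so they ride along with the
            # page update instead of becoming a separate write.
            folded_tags.update(ev.weighted_tags)
        elif page_update is None or ev.revision_id >= page_update.revision_id:
            # A newer revision supersedes an older buffered update...
            page_update = ev

    if page_update is not None:
        # ...but keeps whatever weighted_tags were folded in along the way.
        page_update.weighted_tags.update(folded_tags)
        merged.append(page_update)
    elif folded_tags:
        # Tags arrived with no page update in the window: emit them on their own.
        carrier = next(
            ev for ev in reversed(events)
            if ev.tag_only and ev.revision_id is not None
        )
        carrier.weighted_tags = folded_tags
        merged.append(carrier)
    return merged

Folding only the revision-anchored tags keeps to the "obvious merges" suggested above, while everything ambiguous is simply emitted in order.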
[16:07:40] workout, back in ~40
[16:40:09] back
[17:00:48] ryankemper and ebernhardson: since mrG won't be at pairing today, do you want to move back? Ryan and I usually do 2 PM PT (also you're under no obligation to show up)
[17:01:05] i can do earlier
[17:04:11] \o
[17:04:18] Either slot works for me
[17:07:28] ebernhardson ryankemper ACK, let's keep it as-is then
[17:42:21] lunch, back in ~45
[18:02:35] dinner
[18:33:50] sorry, back
[19:42:02] quick break, back in ~20
[20:11:07] back