[07:56:39] o/
[07:56:45] ottomata: thanks!
[09:57:26] lunch
[10:01:23] ebernhardson: is 11% the average on every window, or the total of duplicated events over the time period or your whole dataset?
[10:15:24] lunch
[10:23:38] lunch 2
[12:16:34] errand
[13:24:26] dcausse: 11% is total, basically the number of unique (time bucket, wikiid, page_id) tuples over the dataset vs the total dataset
[13:24:57] ebernhardson: thanks, so perhaps a single big window could drive most of the dups?
[13:25:30] dcausse: seems like it, yea. we mostly have to decide how long we are willing to let events wait around
[14:44:51] pfischer: didn't you also have a presentation ready?
[16:20:57] gehel: Yes I did. I guess I filled in the form too late. Anyway, those slides are still there. Maybe another time.
[16:27:07] * ebernhardson struggles to understand what Flink is going to do if there are sequential windows in a processing pipeline
[16:29:48] i suppose i'm wondering what the time assigned to the output of the first aggregation is. first deduped event, last? or the time when the window closes?
[16:44:15] * ebernhardson is just bad at reading :P Apparently it's window end timestamp - 1.
[17:50:49] dinner
[18:48:37] inflatador: fyi, we just repooled wdqs2021
[18:48:44] s/2021/2012/
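
For context on the 11% figure discussed at 13:24: it is the fraction of the whole dataset that is duplicated, derived from the count of unique (time bucket, wikiid, page_id) tuples versus the total row count. A minimal sketch of that computation is below; it assumes the events sit in a Spark-readable Parquet dataset with columns named time_bucket, wikiid, and page_id (the path and column names are illustrative, not taken from the log).

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class DuplicateRate {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("duplicate-rate")
                .getOrCreate();

        // Hypothetical input path and column names; the log does not specify them.
        Dataset<Row> events = spark.read().parquet("hdfs:///path/to/events");

        long total = events.count();
        // Unique (time bucket, wikiid, page_id) tuples over the whole dataset.
        long unique = events.select("time_bucket", "wikiid", "page_id").distinct().count();

        // "11% duplicated" reads as 1 - unique/total over the full dataset,
        // not an average per window.
        double dupRate = 1.0 - (double) unique / total;
        System.out.printf("duplicate rate: %.2f%%%n", dupRate * 100);

        spark.stop();
    }
}
```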
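On the 16:27–16:44 question about sequential windows: records emitted by a Flink event-time window operator carry the window's max timestamp (window end - 1 ms), which is what lets a second, downstream window pick them up in the expected bucket. The sketch below illustrates two chained tumbling event-time windows in the Java DataStream API; the event class, field names, window sizes, and the keep-first reduce used for dedup are illustrative assumptions, not the actual pipeline.

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class SequentialWindows {

    // Illustrative event type; not from the log.
    public static class PageEvent {
        public String wikiid;
        public long pageId;
        public long eventTimeMs;
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<PageEvent> events = env
                .fromElements(new PageEvent()) // placeholder source
                .assignTimestampsAndWatermarks(
                        WatermarkStrategy.<PageEvent>forMonotonousTimestamps()
                                .withTimestampAssigner((e, ts) -> e.eventTimeMs));

        // First window: dedupe per (wikiid, pageId) within a 5-minute bucket by
        // keeping the first event seen. Records emitted by this operator are
        // assigned the window's max timestamp, i.e. window end - 1 ms.
        DataStream<PageEvent> deduped = events
                .keyBy(e -> e.wikiid + ":" + e.pageId)
                .window(TumblingEventTimeWindows.of(Time.minutes(5)))
                .reduce((a, b) -> a);

        // Second window: because upstream output timestamps are end - 1, they
        // still fall inside the enclosing 1-hour bucket, so chaining works.
        deduped
                .keyBy(e -> e.wikiid)
                .window(TumblingEventTimeWindows.of(Time.hours(1)))
                .reduce((a, b) -> a)
                .print();

        env.execute("sequential-windows-sketch");
    }
}
```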