[10:04:06] pfischer: 1:1 ? https://meet.google.com/vkf-mkgd-ywo
[11:24:15] lunch
[12:58:34] Having IRC issues... hopefully sorted out soon
[13:10:26] OK back
[13:26:21] Trying the fish shell today... seems nice so far
[13:31:24] o/
[13:50:46] dcausse cancelled pairing today, still looking over some stuff
[13:51:02] inflatador: ok
[15:36:53] dcausse: sorry, last minute, but I've just forwarded an invite for the WDQS Scaling workshop in 30'
[15:37:03] feel free to skip if you have something else
[15:37:13] gehel: ok
[16:50:26] back
[17:43:43] dinner
[17:44:04] lunch/errands, back in ~1h
[18:38:11] * ebernhardson is tempted to unify the various subgraph query mapping/metrics into a single weekly and daily dag, it's a bit tedious to work out the relationships between them
[18:45:18] it turns out subgraph_query_metrics_daily hadn't run since 2-28; it was stalled waiting on subgraph_queries. Somehow that dag (in Airflow 1) reported success, but it has no outputs or logs for Feb 28 through Mar 7 (logs do exist prior to that date)
[18:45:21] something fishy :S
[18:45:53] I guess just roll back the new deployment start date to Feb 28 and see how it goes
[18:53:04] back
[18:56:34] ebernhardson: could well be because of me, marking some dags as success but not fixing failed downstream dags; I gave up at some point on understanding all the dependencies and failures (missing wikidata dump *and* missing canary events)
[18:57:06] dcausse: ahh, is the actual dump for 2023-02-20 missing upstream? I was just about to start looking at that piece :)
[18:57:15] yes
[18:58:15] hmm, yeah that makes things complicated... perhaps we could allow the subgraph query/mapping to use an older discovery.wikibase_rdf partition if the new one can't be created. Not sure how to encode that in the Airflow dags though
[18:58:46] yes, if possible; IIRC they've done that for the imagereco dag
[18:59:07] or perhaps allow using an older diff
[18:59:26] in other parts we use the hive.max_partition helper, which basically looks at the table and simply uses its most recent partition; maybe that would be plausible here
[18:59:29] I mean using an older dataset than week-1 to build the diff
[19:00:27] have it wait a specified amount of time for the new partition to show up, but if it doesn't, continue on and use the most recent partition
[19:01:25] {{ prev_execution_date_success }} from https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/287
[19:01:34] might not be applicable for our use case
[19:02:45] hmm, in this case we don't have a previous run in Airflow 2; maybe I could set it in Airflow 1 along with an end_date to let it rebuild those
[19:03:45] getting all these things aligned is kinda hard :P Part of why I was thinking this might be better as a more unified dag for subgraph/metrics
[19:03:58] possibly indeed
[19:04:41] I suppose I could also lie to the system to make it easier: copy a different partition of discovery.wikibase_rdf into the 20230220 partition
[19:05:04] as a one-off that doesn't seem terrible
[19:06:57] it's also fine to let it fail? I guess nothing depends on past?
[19:09:32] hmm, I suppose. My starting point was choosing a point in time and trying to get the new dags running; I started it at Mar 3rd and some weren't passing. I suppose we can ignore some
[19:11:13] yes... but feel free to hack a discovery.wikibase_rdf partition if you want to see them running while activating them in Airflow 2
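
A minimal sketch of the hive.max_partition idea from 18:59:26: Airflow 1 ships a `macros.hive.max_partition` template macro that asks the Hive metastore for the highest partition value of a table (the repo-local helper mentioned in the log may differ). The dag id, schedule, and the assumption that the table's leading partition column is `date` are illustrative, not the real subgraph dag config.

```python
# Sketch only: template the most recent discovery.wikibase_rdf partition into
# an operator argument via Airflow 1's macros.hive.max_partition. In Airflow 2
# the equivalent lives in the apache-hive provider package.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

with DAG(
    dag_id="max_partition_sketch",            # hypothetical dag id
    start_date=datetime(2023, 2, 28),
    schedule_interval="@weekly",
) as dag:
    show_latest = BashOperator(
        task_id="show_latest_rdf_partition",
        # Resolves at runtime to the newest 'date' partition registered in the
        # metastore, e.g. 20230213 if the 20230220 dump never landed.
        # The 'date' partition column name is an assumption about the table.
        bash_command=(
            "echo latest wikibase_rdf partition: "
            "{{ macros.hive.max_partition('discovery.wikibase_rdf', field='date') }}"
        ),
    )
```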
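
The 19:00:27 idea (wait a bounded time for the new partition, then fall back to the newest one that exists) could look roughly like the sketch below, written against Airflow 2 provider imports. Dag/task ids, the two-day timeout, and the `date`/`wiki` partition layout of discovery.wikibase_rdf are assumptions for illustration only.

```python
# Sketch only: wait up to two days for the expected weekly partition, then
# fall back to whatever the metastore says is newest. Names and the partition
# layout (date, wiki) are assumptions, not the real subgraph dag config.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.apache.hive.hooks.hive import HiveMetastoreHook
from airflow.providers.apache.hive.sensors.named_hive_partition import (
    NamedHivePartitionSensor,
)


def resolve_rdf_partition(**context):
    """Choose the expected partition if it arrived, otherwise the newest one."""
    expected = context["ds_nodash"]
    hook = HiveMetastoreHook(metastore_conn_id="metastore_default")
    latest = hook.max_partition(
        schema="discovery", table_name="wikibase_rdf", field="date"
    )
    chosen = expected if latest and str(latest) >= expected else latest
    # Downstream tasks read the chosen partition from XCom.
    context["ti"].xcom_push(key="rdf_date", value=chosen)


with DAG(
    dag_id="subgraph_mapping_fallback_sketch",   # hypothetical dag id
    start_date=datetime(2023, 2, 28),
    schedule_interval="@weekly",
) as dag:
    wait_for_rdf = NamedHivePartitionSensor(
        task_id="wait_for_wikibase_rdf",
        partition_names=[
            "discovery.wikibase_rdf/date={{ ds_nodash }}/wiki=wikidata"
        ],
        poke_interval=60 * 60,
        timeout=2 * 24 * 60 * 60,   # give the dump two days to land
        mode="reschedule",
        soft_fail=True,             # time out as 'skipped', not 'failed'
    )

    pick_partition = PythonOperator(
        task_id="resolve_rdf_partition",
        python_callable=resolve_rdf_partition,
        trigger_rule="none_failed",  # run even when the sensor skipped
    )

    wait_for_rdf >> pick_partition
```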
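
On `{{ prev_execution_date_success }}` from the merge request linked at 19:01:25: it is a built-in Airflow template variable holding the execution date of the last successful dag run (None on the first run), which is one way to express "build the diff against whatever last succeeded" rather than strictly week-1. A tiny illustrative use, with the dag and command assumed:

```python
# Illustrative only: template the last successful run's execution date into a
# command, falling back to the current run when there is no prior success.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="diff_window_sketch",       # hypothetical dag id
    start_date=datetime(2023, 2, 28),
    schedule_interval="@weekly",
) as dag:
    echo_window = BashOperator(
        task_id="echo_diff_window",
        # prev_execution_date_success is None on the first run, hence the fallback.
        bash_command=(
            "echo building diff from "
            "{{ (prev_execution_date_success or execution_date).strftime('%Y%m%d') }} "
            "to {{ ds_nodash }}"
        ),
    )
```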
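
And for the 19:04:41 one-off (copy a different partition of discovery.wikibase_rdf into the missing 20230220 slot so the Airflow 2 dags can be exercised), a hedged PySpark sketch. The source date (20230213) and the `date`/`wiki` partition columns are assumptions about the table layout; verify both before running anything like this against production.

```python
# One-off sketch: clone an existing discovery.wikibase_rdf partition into the
# missing 20230220 slot. Source date and partition columns are assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("backfill_wikibase_rdf_20230220")
    .enableHiveSupport()
    .getOrCreate()
)

PARTITION_COLS = ("date", "wiki")    # assumed partition layout
SRC_DATE, DST_DATE, WIKI = "20230213", "20230220", "wikidata"

# Select every non-partition column so the schema doesn't have to be hard-coded.
data_cols = [
    f.name
    for f in spark.table("discovery.wikibase_rdf").schema.fields
    if f.name not in PARTITION_COLS
]

spark.sql(
    f"""
    INSERT OVERWRITE TABLE discovery.wikibase_rdf
    PARTITION (date='{DST_DATE}', wiki='{WIKI}')
    SELECT {', '.join(data_cols)}
    FROM discovery.wikibase_rdf
    WHERE date = '{SRC_DATE}' AND wiki = '{WIKI}'
    """
)
```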