[07:14:30] o/
[07:30:58] o/
[07:33:47] FYI: I moved our tickets into the next sprint/sub project: https://phabricator.wikimedia.org/project/view/8099/
[07:50:43] thanks!
[09:31:06] dcausse: Have you seen Marco’s comment (https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1514#note_156671)? Would you know if we could wait for a DAG instead of a hive partition sensor?
[09:34:40] pfischer: looking
[09:36:16] I think I've seen cases where we can sense a dag running on another instance
[09:36:20] checking
[09:40:09] pfischer: I think that might be RestExternalTaskSensor, see wmde/dags/wd_query_segments/wd_query_segments_daily_dag.py if we keep the dag on our side
[10:53:27] lunch
[12:11:43] dcausse: thanks!
[12:13:11] I’ll be on the road and have limited connectivity, but I’ll be back for Talk to Search Platform
[13:13:00] o/
[13:25:44] \o
[13:29:09] o/
[13:58:32] hmm, seems we used to have curl in the flink container but not anymore
[13:58:45] should really find a better way...maybe should be setting up temp port forwards or something
[14:00:15] I vaguely remember you used a python one-liner to do connectivity check within the container?
[14:00:29] but yes seems tedious
[14:01:16] i did have a python one-liner, but it always seemed a terrible idea, then i noticed we had curl so changed it to at least be more direct, but then curl disappeared :)
[14:01:40] ok :)
[14:02:15] it seems to be working ok anyways, maybe flink has resolved the problem they had before with not entering the finished state. We've updated a few times?
[14:04:10] it's almost luck that it works though, the status check fails 30 times in a row, the backfiller exits. Then the backfiller restarts and loads the currently running set of backfills from k8s state and restarts the 30 failures
[14:08:00] * ebernhardson will probably at least revert the curl change and go back to python
[14:41:23] * ebernhardson realizes while looking at it that detecting a running backfill request doesn't distinguish between eqiad and cloudelastic, it just waits
[14:41:49] :/
[14:43:44] needs another label or something, will ponder
[14:46:10] somehow this feels much more fragile than i remember first time around...the intricacies of having the mwscript invocations being remote adds a lot of uncertainty
[14:59:19] yes...
[15:51:14] > seems we used to have curl in the flink container but not anymore
[15:51:14] ebernhardson we are hoping to do T400296 sometime soon, and along the way possibly could consider adding some simple debugging tools like curl to the container? why not?
[15:51:15] T400296: Flink images should be build on top of openjdk - https://phabricator.wikimedia.org/T400296
[15:51:36] if you want that maybe comment in the task?
[15:55:31] ottomata: sure, thanks
[16:01:23] I just added a comment about jattach too
[16:01:28] Also, workout, back in ~40
[16:02:55] ottomata: fwiw, roots can also run things in the network namespace of the container with nsenter
[16:04:06] if we were on k8s v1.25 we'd have support for ephemeral containers + `kubectl debug`, but, we aren't
[16:37:37] walking dog
[16:50:37] back
[17:27:42] dinner
[18:02:45] lunch/errands, back in ~1h
[19:40:44] sigh, yea. My local test failed at `240m0.092s`, so exactly 4 hours. But it completely ignored me setting the timeout i was suspecting it was to 30s
[20:48:29] sorry, been back awhile
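
Note on the 13:58-14:08 discussion: the log does not show the python one-liner that was used, so the following is only a minimal sketch of what a curl-free connectivity check could look like, assuming python3 is available in the container. The function name `can_connect` and any host or port passed to it are hypothetical and do not come from the backfiller code.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened.

    Hypothetical helper for illustration only; it checks reachability
    at the TCP level and does not inspect any HTTP response.
    """
    try:
        # create_connection resolves the host and connects, raising
        # OSError (including timeouts and refusals) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Something like this could be run inside the container through `python3 -c` with the relevant service host and port filled in. It only says whether the port accepts connections, which is a weaker check than the HTTP status curl would report.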