[08:17:26] o/
[08:17:57] gehel when you have a sec, could you grant me access to the Q3 doc? No rush!
[08:25:26] o/
[08:27:44] gmodena: done
[08:40:00] dcausse thanks!
[09:19:18] mjolnir is acting up again :/
[09:19:44] the feature selection task does not end
[09:25:04] is the skein launcher stuck https://yarn.wikimedia.org/cluster/app/application_1734703658237_603740 ?
[09:27:45] yes that's it, underlying app is application_1734703658237_603756
[09:33:38] tempted to give mjolnir its own pool, we're using "sequential" for all our busy jobs but with this one possibly stuck for several days it causes issues on others
[09:34:00] (while we figure out how to fix the root cause)
[09:34:27] there is no info in the logs. oof
[09:36:38] yes, nothing... only TimeoutException ...
[09:38:00] I'm not super familiar with skein tbh, maybe joal or aqu have some pointers
[09:38:24] dcausse do we still need skein at all, now that the airflow instance has moved to k8s?
[09:39:19] gmodena: no clue... but I'm not sure we moved the scheduler to k8s yet
[09:40:17] ack
[09:41:00] https://yarn.wikimedia.org/proxy/application_1734703658237_603756
[09:47:38] will open a task
[09:48:43] I might pause mjolnir a bit to give room for other dags
[09:49:17] dcausse happy to hop on a call if you'd like to rubber duck the problem
[09:49:26] gmodena: sure
[09:49:58] gmodena: https://meet.google.com/owi-bpkc-nsd
[09:50:26] other than relatively high GC pauses, I don't see anything immediately off in workers metrics
[09:50:31] with you in one sec
[10:49:06] errand+lunch
[12:47:34] dcausse looks like the second retry feature_selection-20180215-query_explorer-dbn-20180215-query_explorer-pruned_mrmr worked
[12:47:48] Duration 01:49:40
[12:51:12] errand+lunch
[13:15:05] gmodena: going to be fun debugging this :/
[14:12:14] o/
[14:14:39] o/
[14:17:03] How did my desk get even messier during the break? WHO DID THIS
[14:22:30] dcausse eh. I hope we'll at least be able to repro the scenario where it got stuck. In hindsight, maybe we could have attached to the live jvm process
[14:22:35] next time :D
[14:22:46] yes... :)
[14:23:03] inflatador must be a 4th law of thermodynamics or something
[14:25:05] LOL
[15:05:21] dcausse I'm in pairing if you wanna join
[15:05:46] inflatador: I'm still in https://meet.google.com/ysb-afqp-pct?authuser=0 with Trey, will join pairing in a sec
[15:07:40] dcausse np, not urgent
[15:55:54] office hours starting in 5' at https://meet.google.com/vgj-bbeb-uyi
[16:01:50] gmodena, Trey314159, we're in ^
[16:03:46] shoot. I think I am in the wrong Meet
[17:04:20] workout, back in ~40
[17:49:09] back
[18:32:56] d-causse re: pairing, wmde's docker-compose should give us some idea of how to run categories https://github.com/wmde/wikibase-release-pipeline/blob/main/deploy/docker-compose.yml
[18:34:03] cc ryankemper ^^
[19:01:41] lunch, back in ~40
[20:09:04] back