[08:37:15] errand, back in 20'
[09:36:55] pfischer: just to make sure I have the right context about T327381: the RDF jobs are now running in production with Spark 3 and we have confirmed that they run properly at least once, right? Or is it just that the job is tested but there is more work to complete the deployment?
[09:36:56] T327381: Migrate RDF Tooling to Spark 3 - https://phabricator.wikimedia.org/T327381
[09:37:12] I assume that we'll at least have some minor work to deploy on Airflow 2 once ready.
[09:41:53] dcausse: given your update in the standup notes, I've moved T328330 to in progress and assigned it to you
[09:41:54] T328330: Create SLI / SLO on Search update lag and error rate - https://phabricator.wikimedia.org/T328330
[09:45:23] gehel: I was using https://phabricator.wikimedia.org/T320408 for this, but T328330 is closely related indeed
[09:46:48] I just saw this. I'm adding T320408 as a subtask of T328330
[09:46:48] T320408: Monitor CirrusSearch update lag - https://phabricator.wikimedia.org/T320408
[09:47:46] makes sense, thanks!
[10:03:52] hacked puppet-managed/05-PoolCounter.php on cindy to stop loading the PoolCounter extension; it might be better to stop loading this role, but I was not brave enough to attempt a "vagrant provision" there
[10:25:34] weekly update is out, both on Asana and on Wikitech: https://app.asana.com/0/0/1204055438761188 / https://wikitech.wikimedia.org/wiki/Search_Platform/Weekly_Updates/2023-02-24
[11:00:36] Lunch
[11:06:14] lunch 2
[13:58:48] gehel: I missed your message from this morning: together with David, I verified that at least one of the Airflow DAGs ran successfully with ref-spark-tools:0.3.121 (Spark 3). Yesterday I successfully ran another DAG (import_ttl) that has been migrated to the new repository (based on Airflow 2)
[13:59:28] So that should be fully done, and we shouldn't have to come back to that task. Is that correct?
[14:07:03] gehel: I'd say so, T327381 is done.
[14:07:03] T327381: Migrate RDF Tooling to Spark 3 - https://phabricator.wikimedia.org/T327381
[14:07:11] great!
[14:30:16] o/
[15:06:15] dcausse: I think Andrew answered this for you, but just in case you wanted to respond (flink on DSE) https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/891577/comments/05255c4b_970e4fca
[15:07:29] inflatador: thanks for the heads up, responding
[15:43:44] France is going out tonight with a few friends for her birthday. I'll be on kid duty.
[15:43:53] And so, early stop for me.
[15:43:58] Have fun! Enjoy the weekend!
[15:48:18] gehel: have a good weekend!
[15:52:20] thanks!
[16:02:46] \o
[16:04:27] hmm, my touchpad isn't doing anything this morning. Button clicks work but no movement :P Guess I'll try a reboot
[16:06:06] yup, that seems to have fixed it
[16:09:10] o/
[16:09:42] hmm, fetching from GitLab is giving me auth failures this morning :S
[16:11:32] working for me
[16:11:56] using git@gitlab.wikimedia.org
[16:12:22] same here, odd
[16:13:50] key created Feb 24, 2022... I wonder if there is some sort of key expiration involved
[16:14:06] ah, yeah, that's exactly it
[16:19:13] skipping unmeeting, have to prep for a long ride tomorrow, have a nice weekend!
[16:19:21] enjoy!
[16:19:52] workout, back in ~40
[17:48:57] back
[18:14:35] ryankemper: now I remember why I was using 'without-lvs' for the data xfers: wdqs1010 is not in any pools, so I couldn't xfer from it because the cookbook would fail. We don't need to revert the changes, but we should move 1010 into production once the xfers are finished
[18:39:06] lunch, back in ~40
[19:08:42] back
[19:38:32] we didn't fully rip out the LVS logic either, ref https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/wdqs/data-transfer.py#L219 . Will get a patch started
[20:01:24] inflatador: it doesn't need to move into prod. Instead, for that case, we pass the no-depool flag and manually depool the prod host being xferred to
[20:03:04] ryankemper: ACK, that works for the current xfers. Just thinking that since wdqs1010 has valid data now, we should actually move it into prod once we're done w/ the xfers
[20:10:24] Patch up at https://gerrit.wikimedia.org/r/891899
[20:11:24] hmmm, and I need to fix it already
[20:11:37] logic is wrong ;(
[20:13:16] OK, that should be better
[20:36:43] inflatador: 'if args.depool' is the correct way actually, since we want the cookbook to repool after a successful run
[20:39:55] ryankemper: OK, I got confused. Just for my clarification: 'depool' is implicit now, and will be set if no flag is supplied?
[20:44:01] inflatador: exactly, depool is the default, and --no-depool is the flag we pass when we want it to not depool or pool at all. (That may be why the previous naming scheme was 'without-lvs', since it's slightly confusing that the "yes depool" flag being on also means "yes, pool it afterwards too")
[20:45:20] ryankemper: thanks for clearing that up. Just added a new patchset that reverts to the original logic
[20:46:33] inflatador: I threw my +1 down. One small nit: the title should mention finishing the rename rather than removal, since we're not really removing the flag, just bringing its naming into alignment w/ the other files' changes
[20:49:51] ACK, I changed the commit msg
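
[Editor's note] For readers following the --no-depool discussion above, here is a minimal Python sketch of the flag semantics being described: depooling (and repooling on success) is the default, and --no-depool skips both. This is not the real sre.wdqs.data-transfer cookbook (linked at [19:38:32]); the pool()/depool()/transfer_data() helpers are hypothetical stand-ins for illustration only.

    #!/usr/bin/env python3
    """Sketch of the depool-by-default / --no-depool flag pattern (hypothetical)."""
    import argparse


    def depool(host: str) -> None:
        """Hypothetical stand-in for removing a host from its LVS pools."""
        print(f"depooling {host}")


    def pool(host: str) -> None:
        """Hypothetical stand-in for adding a host back to its LVS pools."""
        print(f"repooling {host}")


    def transfer_data(source: str, dest: str) -> None:
        """Hypothetical stand-in for the actual data transfer."""
        print(f"transferring data from {source} to {dest}")


    def main() -> None:
        parser = argparse.ArgumentParser(description="wdqs data-transfer sketch")
        parser.add_argument("source")
        parser.add_argument("dest")
        # Depooling is the default; passing --no-depool sets args.depool to
        # False, which skips BOTH the depool before the transfer and the
        # repool after it (useful when a host, like wdqs1010 above, is not
        # in any pools yet).
        parser.add_argument(
            "--no-depool",
            dest="depool",
            action="store_false",
            help="don't depool/repool the destination host",
        )
        args = parser.parse_args()

        if args.depool:
            depool(args.dest)
        transfer_data(args.source, args.dest)
        if args.depool:
            # "yes depool" also implies "yes, repool afterwards" on success,
            # which is the slightly confusing coupling noted at [20:44:01].
            pool(args.dest)


    if __name__ == "__main__":
        main()

This also illustrates why the earlier 'without-lvs' name arguably described the behavior better: the single flag really toggles all LVS interaction, not just the depool step.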