[04:48:54] Machine-Learning-Team: Spark Job in airflow-devenv cannot access Hive Metastore because of Kerberos Authentication Failure - https://phabricator.wikimedia.org/T398907 (kevinbazira) NEW
[05:29:14] Machine-Learning-Team: Spark Job in airflow-devenv cannot access Hive Metastore because of Kerberos Authentication Failure - https://phabricator.wikimedia.org/T398907#10981944 (kevinbazira) To isolate the issue, I ran a manual test to confirm whether direct Hive access works from a Kerberos-authenticated Spa...
[06:33:57] Good morning
[06:39:25] bartosz: o/ I am off today, sorry for the delay, I'll get to review your change tomorrow I promise :)
[06:43:17] good morning
[06:43:45] elukey: o/ Thanks and no worries, I’m finding some fun things to do in the meantime :D Enjoy your day!
[07:01:57] morning folks!
[07:54:04] klausman: staging testing patch for s3, if we can deploy this, https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1166543 we can wait for a day to observe and then plan for final deployment.
[07:55:30] SGTM!
[07:58:41] Thanks!
[08:00:26] klausman: staging show diff in AWS_ACCESS_KEY_ID that's what we fix recently, right?
[08:00:39] let me have a look
[08:01:44] yes, it shortens by two characters (:prod -> :ro)
[08:04:16] morning!
[08:08:17] klausman: also codfw has `initialDelaySeconds: 450` diff. OK to keep it?
[08:08:55] yeah, probably better to get back to what's in the charts before we tweak the probing delays
[08:09:33] But let's see how long staging takes to download everything (and if it works, fingers crossed) before pushing anything prod-side
[08:10:00] sure
[08:20:04] Machine-Learning-Team: Goal: Increase the number of models hosted on Lift Wing - https://phabricator.wikimedia.org/T353335#10982290 (isarantopoulos) Open→Invalid
[08:26:53] Machine-Learning-Team, ORES, Growth-Team, MediaWiki-Recent-changes, Moderator-Tools-Team: User: Wikipedia recent changes list the edit highlighting by ORES has disappeared - https://phabricator.wikimedia.org/T346175#10982341 (isarantopoulos)
[09:19:52] Machine-Learning-Team: Investigate null scores being returned by revertrisk language agnostic - https://phabricator.wikimedia.org/T394910#10982614 (isarantopoulos) Open→Declined After discussing this with WME it seems that this is not a priority anymore and no additional context was provided.
[09:20:33] * klausman out for an errand and early lunch
[09:26:45] Lift-Wing, Machine-Learning-Team, Patch-For-Review: Update revertrisk to kserve 0.14.1 - https://phabricator.wikimedia.org/T383119#10982669 (isarantopoulos) a:gkyziridis→BWojtowicz-WMF
[09:27:28] Machine-Learning-Team, Patch-For-Review: Reimplement the model-upload script to take into consideration new use cases - https://phabricator.wikimedia.org/T394301#10982682 (isarantopoulos) a:BWojtowicz-WMF
[10:39:09] Machine-Learning-Team, Research: Score probability evaluation for languages without enough data - https://phabricator.wikimedia.org/T398930 (achou) NEW
[10:40:21] Machine-Learning-Team, Research: Score probability evaluation for languages without enough data - https://phabricator.wikimedia.org/T398930#10982927 (achou)
[10:40:21] Machine-Learning-Team, Goal: FY2024-25 Q4 Goal: Productionize tone check model - https://phabricator.wikimedia.org/T391940#10982928 (achou)
[10:46:13] Machine-Learning-Team, EditCheck, Editing-team (Tracking): Verify cost of gathering peacock training/evaluation data for top 20 languages - https://phabricator.wikimedia.org/T388215#10982949 (achou) In progress→Resolved Resolved this task, as we have the following follow-up tasks: T398930...
[11:09:40] Machine-Learning-Team: Create a notebook for tone check Airflow pipeline - https://phabricator.wikimedia.org/T398937 (achou) NEW
[11:10:40] Machine-Learning-Team: Create a notebook for tone check Airflow pipeline - https://phabricator.wikimedia.org/T398937#10983118 (achou)
[11:10:42] Machine-Learning-Team, Goal: FY2024-25 Q4 Goal: Productionize tone check model - https://phabricator.wikimedia.org/T391940#10983119 (achou)
[11:50:08] Machine-Learning-Team, Data-Platform-SRE, Prod-Kubernetes, serviceops, Kubernetes: Update kserve to v0.13.0 on ML clusters - https://phabricator.wikimedia.org/T380722#10983317 (isarantopoulos) @klausman Shall we rename this task and switch to a newer version? A candidate could be the latest v...
[11:53:14] Machine-Learning-Team, Goal: Q1 25-26 Goal: Operational Excellence - LiftWing Platform Updates & Improvements - https://phabricator.wikimedia.org/T398948 (isarantopoulos) NEW
[11:57:43] Machine-Learning-Team, Data-Platform-SRE, Prod-Kubernetes, serviceops, Kubernetes: Update kserve to v0.13.0 on ML clusters - https://phabricator.wikimedia.org/T380722#10983360 (klausman) >>! In T380722#10983317, @isarantopoulos wrote: > @klausman Shall we rename this task and switch to a newe...
[11:57:48] Machine-Learning-Team, Goal: Q1 25-26 Goal: Scaling Add-a-link to more wikis via production pipelines - https://phabricator.wikimedia.org/T398950 (isarantopoulos) NEW
[12:00:09] Machine-Learning-Team, Data-Platform-SRE, Prod-Kubernetes, serviceops, Kubernetes: Update kserve to v0.15.2* on ML clusters - https://phabricator.wikimedia.org/T380722#10983373 (klausman)
[12:02:22] Machine-Learning-Team, Goal: Q1 25-26 Goal: Operational Excellence - LiftWing Platform Updates & Improvements - https://phabricator.wikimedia.org/T398948#10983381 (isarantopoulos)
[12:02:23] Machine-Learning-Team, Data-Platform-SRE, Prod-Kubernetes, serviceops, and 2 others: Update knative-serving+net-istio to v1.12.x on ML clusters - https://phabricator.wikimedia.org/T380723#10983382 (isarantopoulos)
[12:06:56] Machine-Learning-Team: Inputs for tone check model prediction - https://phabricator.wikimedia.org/T397013#10983400 (achou)
[12:06:57] Machine-Learning-Team, Goal: FY2024-25 Q4 Goal: Productionize tone check model - https://phabricator.wikimedia.org/T391940#10983401 (achou)