[04:48:17] Machine-Learning-Team, EditCheck, VisualEditor, Editing-team (Tracking), Epic: Expand language coverage for Tone Check - https://phabricator.wikimedia.org/T394448#11192434 (SSalgaonkar-WMF)
[06:48:37] morning folks o/
[06:48:46] good morning
[08:04:17] morning :)
[08:13:44] good morning!
[09:02:23] Machine-Learning-Team, Data-Persistence, Data-Persistence-Design-Review: Data Persistence Design Review: Article topic model caching - https://phabricator.wikimedia.org/T402984#11192876 (BWojtowicz-WMF) **Why do we need Cache** Machine Learning Team decided to add Cache mechanism to our article topi...
[09:12:19] Machine-Learning-Team, Essential-Work: Reimplement the model-upload script to take into consideration new use cases - https://phabricator.wikimedia.org/T394301#11192889 (isarantopoulos) We should update the documentation in Wikitech on how to use the new script before we resolve this task. https://wikite...
[09:17:00] Machine-Learning-Team, Essential-Work: Reimplement the model-upload script to take into consideration new use cases - https://phabricator.wikimedia.org/T394301#11192906 (BWojtowicz-WMF) Yes, I would keep this task open until the documentation has been updated. > shall we also remove the old one? I'd su...
[09:21:06] Lift-Wing, Machine-Learning-Team, EditCheck, SRE-SLO, Editing-team (Tracking): Create SLO dashboard for tone (peacock) check model - https://phabricator.wikimedia.org/T390706#11192908 (isarantopoulos) Open→Resolved I'm resolving this task as the work to define the SLO and implemen...
[09:21:08] Machine-Learning-Team, Goal: Q1 FY2025-26 Goal: Make article topic data available at scale and within SLOs for Year in Review - https://phabricator.wikimedia.org/T392833#11192915 (BWojtowicz-WMF) >>! In T392833#11190327, @Dbrant wrote: >>>! In T392833#11188134, @BWojtowicz-WMF wrote: >> could we agree on...
[09:32:04] Lift-Wing, Machine-Learning-Team, Essential-Work: Fix locust load test for edit-check - https://phabricator.wikimedia.org/T400460#11192944 (isarantopoulos) Open→Resolved a:isarantopoulos
[09:34:07] Machine-Learning-Team: Fix CI/CD on ml-pipelines repository - https://phabricator.wikimedia.org/T404717#11192955 (isarantopoulos) a:gkyziridis
[09:44:26] Machine-Learning-Team, Pilot-Flag, Project-Admins, Projects-Cleanup: Consider archiving #Pilot-Flag - https://phabricator.wikimedia.org/T404308#11192963 (isarantopoulos) The project hasn't been active so we can archive it.
[10:09:25] Machine-Learning-Team: Build and push images to the docker registry from ml-lab - https://phabricator.wikimedia.org/T394778#11193036 (isarantopoulos) Following up on this task as it is quite important for the proper utilization of the new GPU hosts. >>! In T394778#10967994, @elukey wrote: > 2) Have a plac...
[10:10:37] Machine-Learning-Team, Essential-Work: Incorporate notebook into Tone-Check data generation ml-pipeline - https://phabricator.wikimedia.org/T404722#11193037 (kevinbazira) >>! In T404722#11188376, @achou wrote: > Hi! I want to add more information regarding the data used for training. I was checking the [...
[10:17:55] Machine-Learning-Team, observability: Istio recording rules for Pyrra and Grizzly - https://phabricator.wikimedia.org/T351390#11193055 (isarantopoulos) Since the issue with the gaps has been fixed I suggest we resolve this task. The ML team will continue using Pyrra for the time being unless a replaceme...
[10:22:18] Machine-Learning-Team, EditCheck, Editing-team (Tracking): Investigate `edit-check` returning empty responses - https://phabricator.wikimedia.org/T400606#11193059 (isarantopoulos) Open→Resolved Since there have been no further reports on additional information regarding the time of the re...
[10:49:04] * isaranto afk for the next ~30-40'
[11:29:34] Machine-Learning-Team, Essential-Work: Reimplement the model-upload script to take into consideration new use cases - https://phabricator.wikimedia.org/T394301#11193198 (matmarex) (wrong Bartosz :) )
[11:39:42] Machine-Learning-Team, Data-Persistence, Data-Persistence-Design-Review, Growth-Team, and 3 others: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task - https://phabricator.wikimedia.org/T401021#11193231 (Michael) >>! In T401021#11190742, @achou wrote: > Here's my proposed...
[11:48:14] Machine-Learning-Team, Essential-Work: Reimplement the model-upload script to take into consideration new use cases - https://phabricator.wikimedia.org/T394301#11193265 (isarantopoulos) >>! In T394301#11193197, @matmarex wrote: > (wrong Bartosz :) ) right, sorry for the wrong ping!
[12:16:32] FIRING: [3x] HelmfileAdminNGPendingChangesLiftWing: Pending admin_ng changes on ml-serve-codfw - https://wikitech.wikimedia.org/wiki/Kubernetes/Add_a_new_service#Deploy_changes_to_helmfile.d%2Fadmin_ng - https://alerts.wikimedia.org/?q=alertname%3DHelmfileAdminNGPendingChangesLiftWing
[12:21:28] FIRING: [6x] HelmfileAdminNGPendingChangesLiftWing: Pending admin_ng changes on ml-serve-codfw - https://wikitech.wikimedia.org/wiki/Kubernetes/Add_a_new_service#Deploy_changes_to_helmfile.d%2Fadmin_ng - https://alerts.wikimedia.org/?q=alertname%3DHelmfileAdminNGPendingChangesLiftWing
[12:27:03] klausman: are the above alerts related to the work you're doing in https://phabricator.wikimedia.org/T403047?
[12:28:15] Yep
[12:28:42] I'll push the admin_ng changes (after checking them closely) and then the alerts should stop firing
[12:31:41] isaranto: I see that edit-check has unpushed memory limits (or they were hand-edited) in staging: 8Gi/15Gi would be pushed, vs 30Gi/15Gi currently live
[12:43:59] Similar 10Gi vs 15Gi in serve-codfw
[12:44:22] ...and eqiad
[13:19:25] Machine-Learning-Team, Data-Persistence, Data-Persistence-Design-Review: Data Persistence Design Review: Article topic model caching - https://phabricator.wikimedia.org/T402984#11193553 (Ottomata) Nice! > lang Text Partition Key Language code for the page (e.g., 'en', 'fr', 'es') Suggestion to stan...
[13:22:51] Machine-Learning-Team, Data-Persistence, Data-Persistence-Design-Review, Growth-Team, and 3 others: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task - https://phabricator.wikimedia.org/T401021#11193561 (Ottomata) > Only if we also store the scores in CirrusSearch. The re...
[13:33:05] Machine-Learning-Team, Data-Persistence, Data-Persistence-Design-Review, Growth-Team, and 3 others: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task - https://phabricator.wikimedia.org/T401021#11193631 (dcausse) >>! In T401021#11193231, @Michael wrote: > > Only if we al...
[13:42:54] klausman: I'm in meetings but I can follow up on that later to check if the patch is there or not.
[13:44:04] Machine-Learning-Team, Data-Persistence, Data-Persistence-Design-Review, Growth-Team, and 3 others: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task - https://phabricator.wikimedia.org/T401021#11193689 (Michael) >>! In T401021#11193561, @Ottomata wrote: >> Only if we als...
[13:51:29] I've pushed the external-services changes to all clusters. We can sort edit-check later
[13:59:38] ok, thanks!
[14:02:46] Machine-Learning-Team, Data-Persistence, Data-Persistence-Design-Review, Growth-Team, and 3 others: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task - https://phabricator.wikimedia.org/T401021#11193749 (Ottomata) > `hasrecommentation:tone>0.7` Oh! Cool!
[14:02:53] klausman: o/ https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1184063 comes to mind, IIRC I unblocked manually a deployment a while ago
[14:02:59] but I thought there was no diff
[14:11:19] ok so ml-serve-codfw looks right, IIRC it was applied manually so this will be a no-op
[14:11:40] same for codfw
[14:11:44] err eqiad
[14:12:12] ok staging is curious
[14:12:47] ahhh wait
[14:12:57] in staging we have higher limit ranges for all namespaces
[14:13:02] Machine-Learning-Team, Project-Admins, Projects-Cleanup, Release-Engineering-Team (Doing 😎): Consider archiving #Pilot-Flag - https://phabricator.wikimedia.org/T404308#11193796 (Aklapper) Open→Resolved a:Aklapper Thanks everyone! Archived.
[14:14:14] so in theory it should be fine in staging as well, having limit ranges so high in there is convenient but also bad since we'll not see resource constraints like in prod
[14:14:23] Ack, agreed
[14:14:34] what's your preference?
[14:15:16] I think if we can keep staging as close to prod as possible from a defaults pov, that would be preferable
[14:21:41] git blame says me and Ilias are the responsible, directly from 2022 :D https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/865589
[14:22:22] ok so ideally we just remove that limit range in bulk, and update staging with ad-hoc overrides when we need
[14:26:18] yep. currently in a meeting, I can look at it after
[16:21:28] FIRING: [6x] HelmfileAdminNGPendingChangesLiftWing: Pending admin_ng changes on ml-serve-codfw - https://wikitech.wikimedia.org/wiki/Kubernetes/Add_a_new_service#Deploy_changes_to_helmfile.d%2Fadmin_ng - https://alerts.wikimedia.org/?q=alertname%3DHelmfileAdminNGPendingChangesLiftWing
[17:14:10] I have silenced the above alerts for the next 14 hours
[17:31:51] ty!
[17:32:07] I totally forgot about them re-firing more frequent than the base 24h delay