[05:25:27] (CR) CI reject: [V:-1] build: Updating eslint-config-wikimedia to 0.30.0 [extensions/ORES] - https://gerrit.wikimedia.org/r/1150219 (owner: Libraryupgrader)
[06:02:25] Good morning!
[06:08:33] morning isaranto o/
[06:08:34] you'll let me know whenever you're ready to discuss the article-summaries notebook
[06:30:41] We can talk in ~1h. Is that ok?
[06:37:31] yep, that's ok.
[06:47:21] good morning!
[06:51:04] Good morning
[06:51:30] Good morning
[07:07:27] have a great week every1!
[07:07:39] kevinbazira: ping me whenever you want I'm available
[07:10:41] isaranto: in the call https://meet.google.com/fiq-bnnw-rjq
[07:20:47] Machine-Learning-Team, Patch-For-Review: Migrate all Lift Wing k8s workers to Bookworm and containerd - https://phabricator.wikimedia.org/T387854#10855699 (elukey)
[07:21:49] Machine-Learning-Team, Patch-For-Review: Migrate all Lift Wing k8s workers to Bookworm and containerd - https://phabricator.wikimedia.org/T387854#10855700 (elukey) @klausman I just remembered that the move-vlan cookbook supports only codfw at the moment, so for eqiad we can simply depool/reimage/repool,...
[07:22:35] hello folks!
[07:22:46] I am going to depool and reimage ml-serve1003
[07:30:26] buon giorno o/
[07:37:59] kalimera :)
[08:08:43] elukey: ack!
[08:08:56] and good morning :)
[08:11:36] (CR) Nik Gkountas: [C:+2] Fix: Filter out section recommendations with no missing sections [research/recommendation-api] - https://gerrit.wikimedia.org/r/1149719 (https://phabricator.wikimedia.org/T394441) (owner: Sbisson)
[08:13:10] (Merged) jenkins-bot: Fix: Filter out section recommendations with no missing sections [research/recommendation-api] - https://gerrit.wikimedia.org/r/1149719 (https://phabricator.wikimedia.org/T394441) (owner: Sbisson)
[08:23:28] klausman: o/ I left a message in the task, we can reimage the eqiad workers without any special confd-workflow
[08:23:42] (move-vlan is available only in codfw)
[08:24:06] that should speed up a lot the reimages, we can simply depool via the reimage cookbook
[08:24:07] ack, I should have realized that, since Cathal mentioned it a while back.
[08:24:36] yeah cordon/depool is automagic with the cookbooks, right?
[08:25:15] the drain+cordon part needs to be done via the k8s cookbook, the depool part is in reimage
[08:25:20] (confd depool)
[08:25:25] aye
[08:40:54] interesting, ml-serve1003 seems to get stuck when loading d-i
[08:41:13] ok no unblocked, I just needed to say that out loud
[09:01:24] isaranto: I am a couple of mins late, Chrome hates me today
[09:46:05] ml-serve1003 pooled and serving traffic :)
[09:46:32] Machine-Learning-Team: Migrate all Lift Wing k8s workers to Bookworm and containerd - https://phabricator.wikimedia.org/T387854#10856127 (elukey)
[09:48:06] I have a bit of time so I am going to do ml-serve1004 as well - https://gerrit.wikimedia.org/r/c/operations/puppet/+/1150604
[10:25:45] ack! Thank you
[10:25:49] * isaranto afk lunch
[10:41:45] Machine-Learning-Team, Add-Link, Growth-Team, Goal: Q4 24-25 Goal: Investigate Add-a-link model training and deployment - https://phabricator.wikimedia.org/T393474#10856209 (Michael) >>!
In T393474#10854494, @OKarakaya-WMF wrote: > > looking into the growthexperiments_link_recommendations with [...
[10:55:19] for some reason it is taking ages to install debian, I am going to afk for lunch and then I'll come back to check later on. The host is drained and safely depooled :)
[11:34:29] Hey folks, can we add Bartosz to deployment-charts repo? He can pull but not push
[11:38:23] Update~~~^ He can push but he cannot merge, the +2 option is not available for him. Please add him whenever you have time.
[11:44:28] Machine-Learning-Team: Article Summary Generation and Evaluation Pipeline using vLLM image - https://phabricator.wikimedia.org/T395246 (kevinbazira) NEW
[12:00:56] georgekyz: bartosz I saw that Bartosz wasn't in the deployment group (only in ml). it should be fine now
[12:01:08] isaranto: thnx
[12:13:01] thank you for the help isaranto: georgekyz:! ❤️
[12:52:41] Machine-Learning-Team, MediaWiki-extensions-ORES, DBA, MediaWiki-Recent-changes, and 2 others: Improve ORES extension table backfill script - https://phabricator.wikimedia.org/T395253 (isarantopoulos) NEW
[12:53:10] Machine-Learning-Team, MediaWiki-extensions-ORES, MediaWiki-Recent-changes, Moderator-Tools-Team, Epic: Improve ORES extension table backfill script - https://phabricator.wikimedia.org/T395253#10856653 (isarantopoulos)
[13:35:27] Machine-Learning-Team, MediaWiki-extensions-ORES, MediaWiki-Recent-changes, Moderator-Tools-Team: Improve ORES extension table backfill script - https://phabricator.wikimedia.org/T395253#10856772 (A_smart_kitten)
[13:42:18] Machine-Learning-Team, MediaWiki-Recent-changes, Moderator-Tools-Team (Kanban): [Spike] Investigate why filtering wasn't working on testwiki - https://phabricator.wikimedia.org/T395256#10856800 (Kgraessle)
[13:46:49] FIRING: KubernetesDeploymentUnavailableReplicas: ...
[13:46:49] Deployment reference-need-predictor-00010-deployment in revision-models at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s-mlserve&var-namespace=revision-models&var-deployment=reference-need-predictor-00010-deployment - ...
[13:46:49] https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[13:47:23] ml-serve1004 back in service
[13:48:07] klausman: if you have time later on during the week you can start from 1011, we should be able to finish by end of week if so :)
[13:48:20] Machine-Learning-Team: Migrate all Lift Wing k8s workers to Bookworm and containerd - https://phabricator.wikimedia.org/T387854#10856821 (elukey)
[13:48:40] and after that we'll enable PSS in eqiad, so the whole migration will be done
[13:51:49] RESOLVED: KubernetesDeploymentUnavailableReplicas: ...
[13:51:49] Deployment reference-need-predictor-00010-deployment in revision-models at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s-mlserve&var-namespace=revision-models&var-deployment=reference-need-predictor-00010-deployment - ...
[13:51:49] https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[13:52:57] Lift-Wing, Machine-Learning-Team, EditCheck: Create SLO dashboard for tone (peacock) check model - https://phabricator.wikimedia.org/T390706#10856827 (gkyziridis) >>! In T390706#10842717, @elukey wrote: > The first step is to read and create a draft of https://wikitech.wikimedia.org/wiki/SLO/Template...
[14:16:35] elukey: ack.
that's the plan
[14:23:25] Machine-Learning-Team, MediaWiki-extensions-ORES, Moderator-Tools-Team: Improve ORES extension table backfill script - https://phabricator.wikimedia.org/T395253#10856907 (Pppery)
[14:30:23] Lift-Wing, Machine-Learning-Team, EditCheck: Create SLO dashboard for tone (peacock) check model - https://phabricator.wikimedia.org/T390706#10856946 (elukey) @gkyziridis naming clash :D That SLO is related to a part of Visual Editor, so we should find a different name. Maybe we could come up with a...
[14:35:58] Lift-Wing, Machine-Learning-Team, EditCheck: Create SLO dashboard for tone (peacock) check model - https://phabricator.wikimedia.org/T390706#10856982 (isarantopoulos) I'd suggest ToneCheck, EditCheck_tone or something similar.
[14:44:21] Machine-Learning-Team, Add-Link, Growth-Team, Goal: Q4 24-25 Goal: Investigate Add-a-link model training and deployment - https://phabricator.wikimedia.org/T393474#10857007 (OKarakaya-WMF) Thank you for the answer @Michael , I agree. I think there is something wrong with the tool dfwmf. I've crea...
[15:38:23] Lift-Wing, Machine-Learning-Team, EditCheck: Create SLO dashboard for tone (peacock) check model - https://phabricator.wikimedia.org/T390706#10857165 (gkyziridis) I created an initial page for [[ https://wikitech.wikimedia.org/wiki/SLO/ToneCheck | SLO/ToneCheck ]]. It is still under progress, I just...
[16:27:22] Machine-Learning-Team, MediaWiki-Recent-changes, Moderator-Tools-Team (Kanban): [Spike] Investigate why filtering wasn't working on testwiki - https://phabricator.wikimedia.org/T395256#10857325 (isarantopoulos) 1. highlighting indeed didn't work either on idwiki or testwiki. I'm not sure how to inves...
[17:25:38] Machine-Learning-Team, ORES, Moderator-Tools-Team, PageTriage, and 3 others: ParserFunctionsTest::testIfexist failure by run of ORESFetchScoreJob in CI - https://phabricator.wikimedia.org/T395074#10857446 (matmarex) Open→Resolved
[17:33:05] Machine-Learning-Team, Add-Link, Growth-Team, Goal: Q4 24-25 Goal: Investigate Add-a-link model training and deployment - https://phabricator.wikimedia.org/T393474#10857452 (Michael) @OKarakaya-WMF I look forward to tomorrow's meeting! I should be able to help you with several of the questions yo...
[17:46:14] (PS1) Sbisson: test_section_recommendations: 'present' section is optional [research/recommendation-api] - https://gerrit.wikimedia.org/r/1150741 (https://phabricator.wikimedia.org/T394441)
[18:12:47] (PS1) Sbisson: Stop adding languages to TranslationRecommendation [research/recommendation-api] - https://gerrit.wikimedia.org/r/1150744
[18:15:40] (CR) Umherirrender: [C:+2] "Resubmit" [extensions/ORES] - https://gerrit.wikimedia.org/r/1150219 (owner: Libraryupgrader)