[04:05:37] (PS2) Santhosh: Tweak srqiprofile and gsrsort for better search results variety [research/recommendation-api] - https://gerrit.wikimedia.org/r/1081237 (https://phabricator.wikimedia.org/T377124) (owner: Sbisson)
[04:07:15] (CR) CI reject: [V:-1] Tweak srqiprofile and gsrsort for better search results variety [research/recommendation-api] - https://gerrit.wikimedia.org/r/1081237 (https://phabricator.wikimedia.org/T377124) (owner: Sbisson)
[06:56:22] Good morning!
[07:03:06] (PS1) Kevin Bazira: article-country: remove support for QID input [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1081809 (https://phabricator.wikimedia.org/T371897)
[07:04:29] (CR) CI reject: [V:-1] article-country: remove support for QID input [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1081809 (https://phabricator.wikimedia.org/T371897) (owner: Kevin Bazira)
[07:11:04] (PS2) Kevin Bazira: article-country: remove support for QID input [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1081809 (https://phabricator.wikimedia.org/T371897)
[08:48:54] morning!! :D
[08:53:13] Good morning :)
[08:56:16] (PS6) Nik Gkountas: Use category search to find campaign pages instead of template [research/recommendation-api] - https://gerrit.wikimedia.org/r/1076020 (https://phabricator.wikimedia.org/T373132)
[08:56:23] hey!
[08:56:30] welcome back Tobias!
[08:56:42] <3 I see the place is still standing :)
[09:01:16] (PS2) Nik Gkountas: Replace "campaign" term with "collection" or "page_collection" [research/recommendation-api] - https://gerrit.wikimedia.org/r/1079467
[09:02:12] (PS13) Nik Gkountas: Fetch campaign metadata and return them with recommendations [research/recommendation-api] - https://gerrit.wikimedia.org/r/1070308 (https://phabricator.wikimedia.org/T373132)
[09:06:42] (PS14) Nik Gkountas: Fetch campaign metadata and return them with recommendations [research/recommendation-api] - https://gerrit.wikimedia.org/r/1070308 (https://phabricator.wikimedia.org/T373132)
[09:12:30] Machine-Learning-Team, Continuous-Integration-Infrastructure, Release-Engineering-Team: Python torch fills disk of CI Jenkins instances - https://phabricator.wikimedia.org/T338317#10245464 (hashar) Open→Resolved
[09:15:38] (PS15) Nik Gkountas: Fetch campaign metadata and return them with recommendations [research/recommendation-api] - https://gerrit.wikimedia.org/r/1070308 (https://phabricator.wikimedia.org/T373132)
[09:15:38] (PS10) Nik Gkountas: Support Default collections [research/recommendation-api] - https://gerrit.wikimedia.org/r/1072175 (https://phabricator.wikimedia.org/T374597) (owner: Santhosh)
[09:21:29] I am rebooting a bunch of the prod and staging machines to pick up security updates. Shouldn't disrupt anything (ml-lab machines are already done)
[09:21:38] It only affects our Supermicro nodes.
[09:33:27] hey folks
[09:33:35] Morning, Luca
[09:33:44] since we are rebooting, I'd need to apply new BMC settings to ml-serve2010 and ml-serve2011
[09:33:56] no rush, if you have time this week :)
[09:34:05] ack. Speaking of the BMC, I had an oddity with ml-lab1002 today
[09:34:30] I logged in, ran cd /system1/sol1 and then "start" .... which just completely hung
[09:34:49] I tried resetting the BMC via the web UI, but that didn't fix it
[09:37:25] elukey: I'll do the codfw machines (including 10/11) this afternoon. Will ping you when I get to them
[09:43:11] klausman: ok for the codfw machines, I am going to be afk right after lunch but should be ready later
[09:43:33] re: BMC - just checked and ml-lab1002 doesn't have the Redfish license yet, so it was configured manually
[09:43:53] ah, ack, so maybe a cfg error there
[09:43:57] I cannot retrieve the BIOS config (due to the license problem), I guess something in the console-redir settings needs to be fixed
[09:44:19] yeah, so hopefully when the licenses arrive, a re-run of the provision cookbook will give us more info
[09:44:27] Ack, ty!
[09:45:01] the other nice thing that I found out last week was that ml nodes have AMD CPUs as well :D
[09:45:08] so the BIOS configs are named differently
[09:45:12] yes! and beasts, too
[09:45:24] very lovely, but now the provision cookbook should support that use case as well
[09:45:44] ah, we should've probably poked people about that
[09:47:20] nah it is fine, we are slowly picking up all the use cases now, new vendor == new problems :D
[09:47:46] Some of them exciting :)
[10:22:39] * klausman lunch
[12:09:13] Machine-Learning-Team, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install ml-serve1009-1011 (3x), ml-lab1001-1002 (2x), dse-k8s-worker1009 (1x) - https://phabricator.wikimedia.org/T372432#10245987 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by klausman@cumin1002 for host...
[12:09:22] Reinstalling ml-lab1002 to fix some disk layout issues.
[12:10:40] Machine-Learning-Team: ml-lab should have a symlink /home -> /srv/home/ - https://phabricator.wikimedia.org/T377478#10245991 (klausman) This is done for lab1001, I will do the same to 1002 once it is re-imaged.
The old contents of users' `/srv/$USER` have been moved to `/srv/home/$USER/old-srv`
[12:46:52] Machine-Learning-Team, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install ml-serve1009-1011 (3x), ml-lab1001-1002 (2x), dse-k8s-worker1009 (1x) - https://phabricator.wikimedia.org/T372432#10246093 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by klausman@cumin1002 for host ml-l...
[12:48:16] Machine-Learning-Team: ml-lab should have a symlink /home -> /srv/home/ - https://phabricator.wikimedia.org/T377478#10246099 (klausman) Now also done for 1002.
[12:48:50] Machine-Learning-Team: vscode remote ssh into ml-lab freezes - https://phabricator.wikimedia.org/T377067#10246107 (klausman) My suspicion is that VSCode is trying to fetch some components via http/https and does not know that it has to use a proxy. I don't know how one could tell it to use webproxy:8080...
[15:32:38] Machine-Learning-Team, ORES, CheckUser, Moderator-Tools-Team, and 4 others: Failure in PageTriage extension on CheckUser test GlobalBlockingHandlerWithDatabaseRowsTest::testRetroactiveAutoblockWhenLocalUserNotAttached - https://phabricator.wikimedia.org/T377609#10246971 (Dreamy_Jazz) a:Dreamy_...
[16:02:53] I found something related to the langid 5xx over the weekend -> https://logstash.wikimedia.org/goto/46765bc63d06d39dc32adef256063287
[16:25:10] Lift-Wing, Machine-Learning-Team: [langid] fasttext only processes one line at a time - https://phabricator.wikimedia.org/T377751 (isarantopoulos) NEW
[16:30:55] Lift-Wing, Machine-Learning-Team: [langid] fasttext only processes one line at a time - https://phabricator.wikimedia.org/T377751#10247305 (isarantopoulos)
[16:31:20] going afk folks, have a nice evening/rest of day o/
[16:54:33] (CR) Umherirrender: "recheck" [extensions/ORES] - https://gerrit.wikimedia.org/r/1081608 (owner: Umherirrender)
[17:12:52] Machine-Learning-Team, ORES, CheckUser, Moderator-Tools-Team, and 5 others: Failure in PageTriage extension on CheckUser test GlobalBlockingHandlerWithDatabaseRowsTest::testRetroactiveAutoblockWhenLocalUserNotAttached - https://phabricator.wikimedia.org/T377609#10247514 (Umherirrender) Ope...
[17:39:49] Machine-Learning-Team, Add-Link, Growth-Scaling, Growth-Team: Establish processes for running the dataset pipeline - https://phabricator.wikimedia.org/T276438#10247742 (Michael) Growth is working on surfacing link-recommendations in new ways (T362584), and so I'm trying to get a grasp on how this...
[17:59:47] (CR) Reedy: [C:+2] Use namespaced classes [extensions/ORES] - https://gerrit.wikimedia.org/r/1081608 (owner: Umherirrender)
[18:26:49] FIRING: KubernetesDeploymentUnavailableReplicas: ...
[18:26:49] Deployment euwiki-articlequality-predictor-default-00020-deployment in revscoring-articlequality at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - ...
[18:26:49] https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s-mlserve&var-namespace=revscoring-articlequality&var-deployment=euwiki-articlequality-predictor-default-00020-deployment - https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[18:31:35] (Merged) jenkins-bot: Use namespaced classes [extensions/ORES] - https://gerrit.wikimedia.org/r/1081608 (owner: Umherirrender)
[18:31:49] RESOLVED: KubernetesDeploymentUnavailableReplicas: ...
[18:31:49] Deployment euwiki-articlequality-predictor-default-00020-deployment in revscoring-articlequality at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - ...
[18:31:49] https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s-mlserve&var-namespace=revscoring-articlequality&var-deployment=euwiki-articlequality-predictor-default-00020-deployment - https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[19:00:06] (PS3) Sbisson: Tweak srqiprofile and gsrsort for better search results variety [research/recommendation-api] - https://gerrit.wikimedia.org/r/1081237 (https://phabricator.wikimedia.org/T377124)
[19:31:03] (CR) Eamedina: Fetch campaign metadata and return them with recommendations (1 comment) [research/recommendation-api] - https://gerrit.wikimedia.org/r/1070308 (https://phabricator.wikimedia.org/T373132) (owner: Nik Gkountas)