[06:42:04] hello folks!
[08:56:19] Machine-Learning-Team, DC-Ops, SRE, ops-eqiad, Patch-For-Review: Q2:(Need By: TBD) rack/setup/install ml-serve100[5-8] - https://phabricator.wikimedia.org/T294949 (cmooney) These hosts hit the ARP issue described in T306421, and have been offline following re-image until this morning: https:...
[09:00:02] elukey o/
[09:00:02] Thanks for the merge.
[09:00:02] I was planning to deploy on eqiad.
[09:00:02] Please let me know in case there is some work being done on eqiad or I should proceed with the deployment.
[09:00:58] (CR) Elukey: [C: +1] "Left some nits for the commit msg, but the code looks good!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/783843 (https://phabricator.wikimedia.org/T301766) (owner: AikoChou)
[09:01:23] (CR) Elukey: [C: +1] "Left a nit but the code looks good" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/778250 (https://phabricator.wikimedia.org/T301766) (owner: AikoChou)
[09:02:05] (CR) Elukey: [C: +1] draftquality: add the ORES augmented feature output (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/778225 (https://phabricator.wikimedia.org/T301766) (owner: AikoChou)
[09:05:14] (CR) Elukey: "I have some doubts about the predictor code, I left some comments about it!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/778248 (https://phabricator.wikimedia.org/T301766) (owner: AikoChou)
[09:06:08] kevinbazira: o/ yep please go ahead on eqiad! I'll try to fix codfw today/tomorrow so after it we'll be free from maintenance-related problems :)
[09:06:41] great. thanks for the confirmation. starting deployment now ...
[09:25:24] Both editquality damaging and goodfaith deployments on eqiad have been completed successfully.
[09:25:24] checking eqiad isvcs pods now ...
[09:29:11] the new isvcs are up and running:
[09:29:12] NAME READY STATUS RESTARTS AGE
[09:29:12] ruwiki-damaging-predictor-default-bcxqj-deployment-f89bcc845nbb 3/3 Running 0 9m12s
[09:29:12] ruwiki-goodfaith-predictor-default-npfz4-deployment-5999956lhsl 3/3 Running 0 5m14s
[09:29:12] sqwiki-damaging-predictor-default-jxp27-deployment-b5755dbpwvcv 3/3 Running 0 9m11s
[09:29:12] sqwiki-goodfaith-predictor-default-hvjkw-deployment-7c4d55j6xp8 3/3 Running 0 5m13s
[09:29:14] srwiki-damaging-predictor-default-nfwq2-deployment-948756btczc7 3/3 Running 0 9m10s
[09:29:16] srwiki-goodfaith-predictor-default-mwrqc-deployment-d9c5847jstq 3/3 Running 0 5m12s
[09:32:40] good!
[09:32:56] do you know, more or less, how many more are left?
[09:41:05] (PS3) AikoChou: articlequality: add the ORES augmented feature output [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/778248 (https://phabricator.wikimedia.org/T301766)
[09:45:42] (CR) AikoChou: "Thanks for reviewing! I replied to your question." [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/778248 (https://phabricator.wikimedia.org/T301766) (owner: AikoChou)
[09:49:39] (PS2) AikoChou: editquality: fix incorrect values in augmented feature output [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/783843 (https://phabricator.wikimedia.org/T301766)
[09:50:12] (CR) Elukey: articlequality: add the ORES augmented feature output (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/778248 (https://phabricator.wikimedia.org/T301766) (owner: AikoChou)
[09:54:23] (CR) AikoChou: "Thanks for the suggestion!
😊" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/783843 (https://phabricator.wikimedia.org/T301766) (owner: AikoChou)
[09:55:52] (PS3) AikoChou: topic: add the ORES augmented feature output [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/778250 (https://phabricator.wikimedia.org/T301766)
[09:59:49] (PS4) AikoChou: topic: add the ORES augmented feature output [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/778250 (https://phabricator.wikimedia.org/T301766)
[10:04:24] (PS3) AikoChou: draftquality: add the ORES augmented feature output [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/778225 (https://phabricator.wikimedia.org/T301766)
[10:05:31] (CR) AikoChou: draftquality: add the ORES augmented feature output (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/778225 (https://phabricator.wikimedia.org/T301766) (owner: AikoChou)
[10:07:15] (CR) AikoChou: topic: add the ORES augmented feature output (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/778250 (https://phabricator.wikimedia.org/T301766) (owner: AikoChou)
[10:18:39] (CR) AikoChou: articlequality: add the ORES augmented feature output (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/778248 (https://phabricator.wikimedia.org/T301766) (owner: AikoChou)
[10:30:47] going afk in a few for lunch, ttl!
[10:42:43] elukey about 13/76 editquality models are left.
[12:58:26] kevinbazira: ack thanks!
[13:17:27] Morning all!
[13:19:14] o/
[13:29:52] I am re-initializing the ml-serve-codfw cluster, it will likely take 2/3 hours
[14:21:59] how'd it go?
[14:25:46] still in progress :)
[14:26:16] but me and Tobias should have caught all corner cases with eqiad, I hope this one will take less
[14:41:23] once the reimages are done (ETA ~1 or 2 hours) we should be able to just re-deploy all k8s configs via helm
[14:41:31] and then the pods should come up
[15:04:01] we also have 4 new ml-serve nodes for eqiad, I'll add them tomorrow
[15:39:56] elukey: o/ hi Luca! regarding the task https://phabricator.wikimedia.org/T302851, did you get the user/password for revscoring release from halfak?
[15:40:54] aiko: o/ hi! I wanted to follow up with you, Aaron gave me a password but it doesn't work, I am trying to follow up with him again to see if we can change it
[15:41:25] the main issue is that the email of the pypi account is, IIUC, scoring-internal@, that is an old mailing list probably not up anymore (maintained by the WMF)
[15:41:31] so resetting the password may be tricky
[15:41:47] once done we can publish revscoring and deploy it :)
[15:41:52] does it sound ok?
[16:06:35] yep, sounds good to me. :)
[16:53:36] elukey: If there is something I could help, please let me know :)
[17:06:42] ack!
[17:20:44] aiko: https://pypi.org/project/revscoring/2.11.2/ :)
[17:20:45] done!
[17:22:27] ORES, Machine-Learning-Team (Active Tasks): revscoring feature extraction error for wikitext papes in Wikidata - https://phabricator.wikimedia.org/T302851 (elukey) https://pypi.org/project/revscoring/2.11.2/ is published, next step is to prepare the changes to deploy ORES and to test them in deployment-p...
[17:39:56] ok so the ml-serve-codfw cluster is up again, up to kserve. I'll deploy the revscoring pods tomorrow morning, but it looks good :)
[17:40:02] going afk! have a nice rest of the day folks
[21:52:01] chrisalbon: https://arxiv.org/abs/2204.06974 This one's sorta scary
[21:52:31] Lol well fuck
[21:52:44] Roughly my thoughts as well
[21:53:26] I don't know how feasible the attack is (and what its merits/payback would be).
But it is another of those Transparency trumps all things