[06:52:21] Machine-Learning-Team, ORES, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work): Estimate how many Wikidata items have low/no ORES score - https://phabricator.wikimedia.org/T288262 (AKhatun_WMF) @MPhamWMF Hi, could you please clarify the question `Is there an optimal separation...
[10:38:16] Lift-Wing: Integrate cert-manager/issuer in ml-serve clusters - https://phabricator.wikimedia.org/T298976 (elukey) I had a chat with Janis today, and IIUC for `inference.discovery.wmnet` we should: 1) Create a new signing profile like https://gerrit.wikimedia.org/r/c/operations/puppet/+/745496 (and set the...
[11:49:30] * elukey lunch!
[12:32:06] Machine-Learning-Team, ORES, translatewiki.net, Security, Vuln-DoS: New ORES model relies on translatewiki.net API, which is not hosted on WMF production - https://phabricator.wikimedia.org/T213131 (Nikerabbit) Can this task be closed? ORES is not in use for translatewiki.net.
[12:34:24] Machine-Learning-Team, ORES, translatewiki.net, Security, Vuln-DoS: New ORES model relies on translatewiki.net API, which is not hosted on WMF production - https://phabricator.wikimedia.org/T213131 (Ladsgroup) Open→Declined Yes. ORES is being moved to a new infra but that still doesn'...
[15:00:29] Machine-Learning-Team, ORES, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work): Estimate how many Wikidata items have low/no ORES score - https://phabricator.wikimedia.org/T288262 (MPhamWMF) @AKhatun_WMF , sorry, it's been a while since I wrote this, but I think what I meant w...
[15:16:30] Machine-Learning-Team, ORES, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work): Estimate how many Wikidata items have low/no ORES score - https://phabricator.wikimedia.org/T288262 (AKhatun_WMF) >>! In T288262#7628599, @MPhamWMF wrote: > @AKhatun_WMF , sorry, it's been a while...
[15:17:58] Machine-Learning-Team, ORES, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work): Estimate how many Wikidata items have low/no ORES score - https://phabricator.wikimedia.org/T288262 (MPhamWMF) Oh cool, no need to reinvent the wheel then! we can just use the current solution then
[15:45:31] * elukey bbiab
[16:02:49] o/
[16:06:23] o/
[16:10:31] I am currently trying to deploy cert-manager on ml-serve-eqiad
[16:12:38] nice!
[16:14:56] it is re-using what serviceops has done, and it will handle (initially) inference.discovery.wmnet
[16:15:08] hopefully after it we'll also move the webhook certs etc..
[16:15:12] so we'll not use the puppet ca
[16:17:52] ahh ok, does this mean we won't need the `wmf-certificates` package in the images too?
[16:22:06] nono we will, it contains the ca bundle to support puppet CA and PKI
[16:22:22] (the PKI is basically a new root CA, based off cfssl)
[16:23:10] gotcha, ok that makes sense
[16:23:41] accraze: I got some answers from kserve upstream about processes etc..
[16:23:48] not sure what is best for us
[16:23:58] I left some comments in the task, lemme know your thoughts
[16:24:06] (not urgent, when you have time)
[16:24:15] will do!
[16:24:34] i was actually just reading up a bit more on our options
[16:24:45] in theory the ray workers sound really cool
[16:25:20] but i don't know enough about ray yet to know if it will increase complexity
[16:27:17] I am not sure if it will be much better than increasing the tornado workers
[16:32:33] agreed, let's try increasing the tornado workers for the 'frontend-endpoint' first before digging into the ray workers
[16:33:01] (also wow i like the new kserve doc site redesign)
[16:39:29] hmmm it seems like jenkins is down?
[16:39:45] elukey: what project should I put the ECC error task on?
[16:41:27] Lift-Wing: ml-serve2001 logged a corrected memory error - https://phabricator.wikimedia.org/T299427 (klausman)
[16:42:30] klausman: ops-codfw I'd say
[16:42:58] Lift-Wing, ops-codfw: ml-serve2001 logged a corrected memory error - https://phabricator.wikimedia.org/T299427 (klausman)
[16:43:02] elukey: we're not running anything of substance in codfw, right, so the usual cookbook-enabled Icinga downtime+reboot should be enough
[16:44:47] it should yes
[16:45:53] Lift-Wing, SRE, ops-codfw: ml-serve2001 logged a corrected memory error - https://phabricator.wikimedia.org/T299427 (ops-monitoring-bot) Host rebooted by klausman@cumin2001 with reason: Reboot to clear ECC state in dmesg
[16:58:32] Lift-Wing, SRE, ops-codfw: ml-serve2001 logged a corrected memory error - https://phabricator.wikimedia.org/T299427 (klausman) `root@ml-serve2001:/sys/devices/system/edac/mc# grep . mc*/*count mc0/ce_count:0 mc0/ce_noinfo_count:0 mc0/ue_count:0 mc0/ue_noinfo_count:0 mc1/ce_count:0 mc1/ce_noinfo_coun...
[17:06:49] folks I have to go in a bit, deployments for admin_ng stuff are broken on ml-serve-eqiad/codfw due to https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/754981 (needs to be merged etc.. but CI is under maintenance)
[17:08:21] no worries elukey! thanks for the update
[17:08:30] +1'd anyway :)
[17:09:43] Lift-Wing, SRE, ops-codfw: ml-serve2001 logged a corrected memory error - https://phabricator.wikimedia.org/T299427 (Papaul) confirmed all green in IDRAC
[17:18:36] thanks!
[17:18:41] going afk for a run, ttl!
[17:19:17] see ya later elukey
[17:39:39] Machine-Learning-Team, ORES, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work): Estimate how many Wikidata items have low/no ORES score - https://phabricator.wikimedia.org/T288262 (Lydia_Pintscher) Yeah I think the underlying question we came to with this was if it would make...
[17:44:41] Machine-Learning-Team, ORES, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work): Estimate how many Wikidata items have low/no ORES score - https://phabricator.wikimedia.org/T288262 (AKhatun_WMF) >>! In T288262#7629267, @Lydia_Pintscher wrote: > @AKhatun_WMF: You mention on the...
[17:45:20] Machine-Learning-Team, DC-Ops, ops-codfw: Q3:(Need By: TBD) rack/setup/install ml-cache200[1-3] - https://phabricator.wikimedia.org/T299433 (RobH)
[17:45:46] Machine-Learning-Team, DC-Ops, SRE, ops-codfw: Q3:(Need By: TBD) rack/setup/install ml-cache200[1-3] - https://phabricator.wikimedia.org/T299433 (RobH)
[17:46:04] Machine-Learning-Team, DC-Ops, SRE, ops-codfw: Q3:(Need By: TBD) rack/setup/install ml-cache200[1-3] - https://phabricator.wikimedia.org/T299433 (RobH) a:Papaul
[17:54:48] Machine-Learning-Team, ORES, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work): Estimate how many Wikidata items have low/no ORES score - https://phabricator.wikimedia.org/T288262 (Lydia_Pintscher) Ahh makes sense. Probably not worth bothering then.
[18:03:09] Machine-Learning-Team, DC-Ops, ops-eqiad: Q3:(Need By: TBD) rack/setup/install ml-cache100[1-3] - https://phabricator.wikimedia.org/T299435 (RobH)
[18:03:41] I was planning on grooming the phab board on monday, but apparently that was a day off for me.
Let's do it this Wednesday for this week then go back to Mondays
[18:03:44] Machine-Learning-Team, DC-Ops, ops-eqiad: Q3:(Need By: TBD) rack/setup/install ml-cache100[1-3] - https://phabricator.wikimedia.org/T299435 (RobH)
[18:04:13] Machine-Learning-Team, DC-Ops, ops-eqiad: Q3:(Need By: TBD) rack/setup/install ml-cache100[1-3] - https://phabricator.wikimedia.org/T299435 (RobH) a:Jclark-ctr
[18:11:49] Machine-Learning-Team, DC-Ops, SRE, ops-codfw: (Need By: TBD) rack/setup/install ml-serve200[5-8] - https://phabricator.wikimedia.org/T294945 (Papaul)
[18:18:39] Machine-Learning-Team, DC-Ops, SRE, ops-codfw: (Need By: TBD) rack/setup/install ml-staging200[12] - https://phabricator.wikimedia.org/T294946 (Papaul)
[20:27:47] hmm currently working more on the draftquality-transformer
[20:28:04] it seems the transformer and predictor images will be nearly the same size
[20:28:37] (for draftquality, probably editquality + article|drafttopic too)
[20:28:55] due to loading revscoring deps and assets into both
[20:30:22] it would be soo nice if we could separate feature extraction and scoring into different packages
[20:31:03] the only way it would work with revscoring is if we had a feature store with revision features pre-computed though
[21:29:54] Lift-Wing, Machine-Learning-Team, Outreach-Programs-Projects, Google-Summer-of-Code (2021): Retraining models from ORES to be deployable on Lift Wing - https://phabricator.wikimedia.org/T278261 (srishakatux) GSoC 2021 is long over. Is there anything remaining in this task before it can be resolve...
[21:36:19] Lift-Wing, Machine-Learning-Team (Active Tasks): Factor out feature retrieve functionality to a transformer - https://phabricator.wikimedia.org/T294419 (ACraze)
[21:36:21] Lift-Wing, Machine-Learning-Team (Active Tasks), Patch-For-Review: Draftquality transformer - https://phabricator.wikimedia.org/T298989 (ACraze) Open→In progress