[04:44:40] Machine-Learning-Team: Move the article-descriptions model server from staging to production - https://phabricator.wikimedia.org/T358467#9756279 (kevinbazira) In progress→Resolved
[04:46:42] Machine-Learning-Team: Create logo-detection model-server to be hosted on LiftWing - https://phabricator.wikimedia.org/T361803#9756284 (kevinbazira) Open→Resolved
[04:51:53] (PS2) Kevin Bazira: logo-detection: restrict image processing to trusted domains [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1023542 (https://phabricator.wikimedia.org/T363449)
[06:53:23] Good morning!
[08:04:14] Morning!
[08:12:12] o/
[09:25:33] hello! this is a bit of a long shot, but have ye been doing anything that might have caused increased impact on parsoid over the last 24h or so? Something finished at 00:00 exactly today and I can't nail down what was happening :)
[09:27:54] I'm not aware of anything, but I was out yesterday. isaranto would probably know.
[09:29:32] hnowlan: nope, we haven't done anything related to that
[09:29:46] grand, thanks!
[10:07:07] * klausman lunch
[11:03:52] * isaranto lunch
[11:10:39] Morning all!
[11:14:08] hi Chris o/
[11:14:56] it's super early!
[11:26:39] hey!
[11:53:57] (CR) Ilias Sarantopoulos: "Thanks for the explanation. Also my comment about localhost was wrong since we would still be requesting commons (but rewrite the url)." [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1023542 (https://phabricator.wikimedia.org/T363449) (owner: Kevin Bazira)
[12:03:30] Machine-Learning-Team, Structured-Data-Backlog: Pass image objects to the logo detection service - https://phabricator.wikimedia.org/T363506#9757394 (isarantopoulos) @mfossati I am in favor of passing the image object in some serialized form. We would need the upload wizard to send a resized image (224x...
[12:18:50] (PS3) Kevin Bazira: logo-detection: restrict image processing to trusted domains [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1023542 (https://phabricator.wikimedia.org/T363449)
[12:19:31] (CR) CI reject: [V:-1] logo-detection: restrict image processing to trusted domains [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1023542 (https://phabricator.wikimedia.org/T363449) (owner: Kevin Bazira)
[12:22:33] kevinbazira: o/ we can discuss here and decide what to do before implementing
[12:23:29] (PS4) Kevin Bazira: logo-detection: restrict image processing to trusted domains [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1023542 (https://phabricator.wikimedia.org/T363449)
[12:25:16] (CR) Kevin Bazira: "sure sure. I've fixed it to pass the allowed_domains through configuration." [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1023542 (https://phabricator.wikimedia.org/T363449) (owner: Kevin Bazira)
[12:25:21] isaranto: o/ I responded to your comment in the patch ^---
[12:43:08] ok! thanks
[13:04:31] folks about WIKI_URL, there will be changes after this week's istio config change, I'll explain more with the presentation :)
[13:06:15] elukey: o/ ack. looking forward to the presentation :)
[13:32:14] o/ elukey! ack
[13:55:12] Machine-Learning-Team, MW-on-K8s, serviceops, SRE, Patch-For-Review: Migrate ml-services to mw-api-int - https://phabricator.wikimedia.org/T362316#9757868 (elukey) All changes rebased and ready to go (for prod). The main idea is the following: * Remove WIKI_URL for revscoring isvcs, so we'll...
[14:13:26] Machine-Learning-Team, Goal: 2024 Q4 Goal: Revert Risk models are supported by caching in production - https://phabricator.wikimedia.org/T362672#9757994 (calbon) Update:
- Rebased code after prototype.
- Waiting for istio change for making a new service, which is imminent
- Need to add new visual...
[14:13:55] Machine-Learning-Team, MW-on-K8s, serviceops, SRE, Patch-For-Review: Migrate ml-services to mw-api-int - https://phabricator.wikimedia.org/T362316#9757995 (elukey) a:elukey
[14:14:16] Machine-Learning-Team, Goal: 2024 Q4 Goal: An HuggingFace 7B LLM is hosted on ml-staging on Lift Wing powered by GPU - https://phabricator.wikimedia.org/T362670#9757997 (calbon) Update: No update
[14:15:15] Machine-Learning-Team, Goal: 2024 Q4 Goal: An HuggingFace 7B LLM is hosted on ml-staging on Lift Wing powered by GPU - https://phabricator.wikimedia.org/T362670#9758000 (calbon) Decision point: Do we upgrade ROCm drivers? Aiko is getting up to speed with how HF set up the inference endpoints and mayb...
[14:19:31] Machine-Learning-Team, Goal: 2024 Q4 Goal: An HuggingFace 7B LLM is hosted on ml-staging on Lift Wing powered by GPU - https://phabricator.wikimedia.org/T362670#9758018 (calbon) We have a theory that the ROCm drivers on the debian package is not required.
[14:22:53] Machine-Learning-Team, Goal: 2024 Q4 Goal: Operational Excellence - Improve base monitoring, alerting and logging of Lift Wing services - https://phabricator.wikimedia.org/T362674#9758029 (calbon) Logging queries and logging when things are slow is the short term goal. Knowing WHY a query takes a long ti...
[14:24:47] Machine-Learning-Team, Goal: 2024 Q4 Goal: Operational Excellence - Improve base monitoring, alerting and logging of Lift Wing services - https://phabricator.wikimedia.org/T362674#9758039 (klausman)
[14:24:48] Machine-Learning-Team, ORES, Patch-For-Review: Add slow-logs for ML isvcs - https://phabricator.wikimedia.org/T362663#9758040 (klausman)
[15:27:36] klausman: codfw depooled for inference :)
[15:39:57] ack!
[15:57:14] (PS1) Ilias Sarantopoulos: revertrisk: update locust results [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1025805 (https://phabricator.wikimedia.org/T361881)
[16:09:05] Machine-Learning-Team, Add-Link, Growth-Team, User-notice: Deploy "add a link" to 18th round of wikis (en.wp and de.wp) - https://phabricator.wikimedia.org/T308144#9758442 (KStoller-WMF)
[16:09:47] Machine-Learning-Team, Add-Link, Growth-Team, User-notice: Deploy "add a link" to 18th round of wikis (en.wp and de.wp) - https://phabricator.wikimedia.org/T308144#9758448 (KStoller-WMF)
[16:10:14] Machine-Learning-Team, Add-Link, Growth-Team, User-notice: Deploy "add a link" to 18th round of wikis (en.wp and de.wp) - https://phabricator.wikimedia.org/T308144#9758451 (KStoller-WMF)
[16:11:48] Machine-Learning-Team, Add-Link, Growth-Team, User-notice: Deploy "add a link" to 18th round of wikis (en.wp and de.wp) - https://phabricator.wikimedia.org/T308144#9758459 (KStoller-WMF) p:Triage→Medium
[16:12:31] going afk folks, have a nice day/evening!
[16:16:59] klausman: repooled codfw
[16:17:01] isaranto: o/
[16:17:07] \o Ilias
[16:17:28] elukey: ack re: repool. I didn't see any alerts or blowings-up, so we're probably good to go on Thursday
[16:20:11] metrics were also good, and codfw handles ~60 rps (more or less) compared to eqiad that does ~150/200
[16:30:01] bye Ilias!
[16:44:05] Heading out as well. Happy labor day to all who have tomorrow off, and see y'all on Thursday
[16:45:25] same to all of you folks! See you on Thursday!