[03:43:14] FIRING: ErrorBudgetBurn: - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[05:45:52] (CR) Santhosh: WIP - Community-defined campaign translations (2 comments) [research/recommendation-api] - https://gerrit.wikimedia.org/r/1059945 (https://phabricator.wikimedia.org/T371515) (owner: Eamedina)
[06:41:28] Good morning!
[06:41:36] Taking a look at the alert above
[06:46:19] Machine-Learning-Team: Reorganize LiftWing isvcs repo structure to improve maintainability - https://phabricator.wikimedia.org/T369344#10060426 (kevinbazira)
[06:52:16] Machine-Learning-Team, MW-1.43-notes (1.43.0-wmf.17; 2024-08-06), OKR-Work: Deploy Modernized Recommendation API to LiftWing - https://phabricator.wikimedia.org/T371465#10060434 (santhosh) Thanks @kevinbazira. I also tested, LGTM. From API consumer side, CX is ready to use this based on https://gerr...
[06:54:00] Machine-Learning-Team: Reorganize LiftWing isvcs repo structure to improve maintainability - https://phabricator.wikimedia.org/T369344#10060436 (kevinbazira) Following the conversation shown in the screenshot, we are going to remove the llm directory from the LiftWing isvc repo and its corresponding CI pipel...
[07:25:02] There was a usage spike in codfw https://logstash.wikimedia.org/goto/8a575b7fdccbe005676ca8caacc5687a
[07:25:39] the thing is that in multiprocessing we don't log the json payload, so I'm filing a patch to fix that
[07:29:03] (PS1) Ilias Sarantopoulos: revscoring: log payload in multiprocessing [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1062286
[07:29:39] good morning folks :)
[07:31:24] (CR) CI reject: [V:-1] revscoring: log payload in multiprocessing [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1062286 (owner: Ilias Sarantopoulos)
[07:43:29] FIRING: ErrorBudgetBurn: - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[07:46:15] hey aiko!
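A minimal sketch of the idea behind the "log payload in multiprocessing" patch discussed above. It is not the content of change 1062286: it only shows one way to make slow requests traceable, by logging the JSON request body in the parent process before preprocessing is handed to a worker pool. The logger name, `preprocess` and `run_preprocess` are hypothetical.

```python
import json
import logging
from concurrent.futures import Executor

logger = logging.getLogger("revscoring.preprocess")


def preprocess(payload: dict) -> dict:
    # Hypothetical stand-in for the CPU-heavy feature extraction that runs
    # in a worker process; the real revscoring preprocessing differs.
    return {"rev_id": payload["rev_id"], "fields": sorted(payload)}


def run_preprocess(pool: Executor, payload: dict) -> dict:
    # Log the request body in the calling (parent) process before it is
    # submitted to the pool, so a slow or failing request can be matched
    # to its rev_id even though the work happens elsewhere.
    logger.info("payload: %s", json.dumps(payload, sort_keys=True))
    future = pool.submit(preprocess, payload)
    return future.result()
```

In a server the pool would typically be a `ProcessPoolExecutor` (`preprocess` is a top-level function, so it can be pickled); because `run_preprocess` accepts any `concurrent.futures.Executor`, it can also be exercised with a `ThreadPoolExecutor`.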
[07:51:58] (CR) Ilias Sarantopoulos: "recheck" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1062286 (owner: Ilias Sarantopoulos)
[07:56:33] I'm getting this in CI https://phabricator.wikimedia.org/P67283, seems like cloudevents has invalid metadata according to pip 24. will try to bump the package
[07:56:50] (PS2) Ilias Sarantopoulos: revscoring: log payload in multiprocessing [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1062286
[07:57:47] (CR) Nik Gkountas: [C:+2] Add support for using both topic and seed filters [research/recommendation-api] - https://gerrit.wikimedia.org/r/1060333 (owner: Santhosh)
[07:59:07] (CR) CI reject: [V:-1] revscoring: log payload in multiprocessing [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1062286 (owner: Ilias Sarantopoulos)
[07:59:35] (Merged) jenkins-bot: Add support for using both topic and seed filters [research/recommendation-api] - https://gerrit.wikimedia.org/r/1060333 (owner: Santhosh)
[08:07:23] I've made a patch to increase the asyncio workers https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1062342
[08:07:38] just trying to mitigate the issue
[08:07:58] updating the revscoring image seems to be a disaster
[08:27:11] Morning!
[08:29:06] ο/
[08:29:59] Anything I can help with?
[08:32:40] let me see what the above patch will do and we'll see
[08:35:25] preprocessing is taking too long for the moment
[08:37:42] Ah, MWPFH again?
[08:42:16] not sure
[08:43:09] it wasn't as long as the other times (those were above 10s), could be due to the rev ids, but we don't have json payload logging in multiprocessing
[08:43:18] ack
[08:44:11] so it has been going on for some days but the budget got burnt now
[08:46:54] https://grafana.wikimedia.org/d/n3LJdTGIk/kserve-inference-services?orgId=1&var-cluster=codfw%20prometheus%2Fk8s-mlserve&var-namespace=revscoring-articlequality&var-component=All&var-model_name=enwiki-articlequality&from=now-6h&to=now
[08:47:30] codfw has a lot more traffic than eqiad. And all this traffic is coming from MediaWiki
[08:53:08] well the new language agnostic model will save us from all this horror
[08:55:58] so I'm open to namespace recommendations. I suggest grouping them per product: e.g. articlemodels. Models that operate on articles are more likely to use the same resources to fetch features etc.
[08:56:15] and ofc we are talking about language agnostic versions of the models
[08:56:28] any other suggestions welcome as I'm a bit short of ideas
[08:57:00] I just want to avoid creating another namespace named `articlequality`
[09:14:09] Do we want to distinguish revisions from articles in this context?
[09:20:01] it would make sense to have them in the same "bucket", right?
[09:35:49] klausman: when does the errorbudgetburn alert go away? I can't find the alert declaration in the alerts repo. is there a separate repo?
[09:36:27] I think it's part of the grizzly templating
[09:36:37] elukey: do you know where the SLO burn alert is defined?
[09:37:13] as for articles vs revisions, this would mean that articlequality and revertrisk would be in the same NS, right?
[09:38:23] Sorry, Purra, not Grizzly
[09:38:30] Pyrra*
[09:39:42] And the alert should stop once we are no longer on course to burn the budget before the end of the period
[09:41:50] as for the Pyrra config, it's in Puppet: modules/profile/manifests/pyrra/filesystem/slos.pp
[09:42:48] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/profile/manifests/pyrra/filesystem/slos.pp if you like something clickable :)
[09:44:06] So in this case 98%ile over 12w must be <5s latency
[09:44:39] And errors must be <2% 500s over 12w
[09:49:14] thanks! iiuc that means that the alert may stop firing in the next quarter (end of Aug) if too much of the budget is burnt
[09:50:50] on the articles vs revisions: yes revertrisk and articlequality would go in the same ns. same with articletopic-outlink. Given that the latter 2 are already in use we would have to move them gradually
[09:51:09] I think it's a sliding window in this case
[09:51:27] so if we serve <5s for a while, the alert should clear
[09:52:20] as for the NSes, yeah, it sounds good to me.
[09:58:28] Machine-Learning-Team, MW-1.43-notes (1.43.0-wmf.17; 2024-08-06), OKR-Work: Deploy Modernized Recommendation API to LiftWing - https://phabricator.wikimedia.org/T371465#10060995 (kevinbazira) @santhosh, thank you for the confirmation. @klausman, as shown in P67284, the rec-api has been deployed in pr...
[10:04:41] Machine-Learning-Team, MW-1.43-notes (1.43.0-wmf.17; 2024-08-06), OKR-Work: Deploy Modernized Recommendation API to LiftWing - https://phabricator.wikimedia.org/T371465#10061003 (klausman) This is likely caused by a missing `/` in the API GW config. I will prepare a patch in a moment. For illustratio...
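A hedged sketch of the arithmetic behind the ErrorBudgetBurn alert, using the SLO figures quoted above (98% of requests under 5s, and under 2% 5xx responses, both over 12 weeks). It assumes a multi-window burn-rate check in the style of the Google SRE Workbook; the actual windows and thresholds configured through slos.pp/Pyrra are not in this log, so `should_alert`, its threshold and the request counts are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SLO:
    target: float       # fraction of requests that must be "good", e.g. 0.98
    window_days: float  # SLO window, e.g. 84 days for 12 weeks

    @property
    def error_budget(self) -> float:
        # Fraction of requests allowed to be "bad" over the whole window.
        return 1.0 - self.target


def burn_rate(slo: SLO, bad: int, total: int) -> float:
    """Observed bad-request ratio divided by the error budget.

    1.0 spends the budget exactly by the end of the window;
    values above 1.0 exhaust it early.
    """
    if total == 0:
        return 0.0
    return (bad / total) / slo.error_budget


def days_until_exhausted(slo: SLO, rate: float) -> float:
    # At a constant burn rate the full budget lasts window_days / rate.
    return float("inf") if rate <= 0 else slo.window_days / rate


def should_alert(slo: SLO, long_window: Tuple[int, int],
                 short_window: Tuple[int, int], threshold: float) -> bool:
    # Each window is a (bad, total) pair. Firing only when BOTH windows burn
    # faster than the threshold lets the alert clear soon after latency
    # recovers, even while the long window still contains the spike.
    return (burn_rate(slo, *long_window) > threshold
            and burn_rate(slo, *short_window) > threshold)


latency_slo = SLO(target=0.98, window_days=84)
# Hypothetical counts: 100 of 1000 requests slower than 5s in the last hour.
rate = burn_rate(latency_slo, bad=100, total=1000)
print(f"burn rate {rate:.1f}, budget gone in {days_until_exhausted(latency_slo, rate):.1f} days")
# prints: burn rate 5.0, budget gone in 16.8 days
```

This matches the "sliding window" reading above: once the short window is back under 5s, the condition is false and the alert clears, without waiting for the 12-week period to end.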
[10:35:38] Hello
[10:42:53] Hello, I'm trying to use LiftWing with PyWikibot as described at https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Usage#Example_usage_of_external_endpoint
[10:44:00] On a wiki which is not in the "wikipedia" family
[10:44:38] I got an error when I try to add a family to data = {"lang": "en", family="another", "page_title": "Wings of Fire (novel series)"}
[10:45:11] do you know how I can ask liftwing to check a page on another family which is not a Wikimedia project?
[10:45:37] Hi nic25 ! which model do you want to use from Lift Wing?
[10:45:49] articlequality
[10:46:42] and enwiki-damaging
[10:46:48] Machine-Learning-Team, MW-1.43-notes (1.43.0-wmf.17; 2024-08-06), OKR-Work, Patch-For-Review: Deploy Modernized Recommendation API to LiftWing - https://phabricator.wikimedia.org/T371465#10061115 (klausman) One thing to note is that with the current scheme (plus the slash-fixing patch above), we...
[10:48:17] these models are designed to work on revisions of specific wikis, so they won't work with just any project. For example, enwiki-damaging can be used to get predictions only for revisions of English Wikipedia, and so on.
[10:48:17] You can find the available models here https://meta.wikimedia.org/wiki/Machine_learning_models
[10:49:44] ok, do you know how to use a similar model? The content of my wikis is close to Wikipedia's content
[10:52:25] for example, can I use Language Identification https://meta.wikimedia.org/wiki/Machine_learning_models/Proposed/Language_Identification ?
[10:56:32] * klausman afk for lunch
[10:57:53] unfortunately it wouldn't work. these models are trained on data from each wiki, so they are specific to them.
There are models that are language agnostic like revertrisk https://api.wikimedia.org/wiki/Lift_Wing_API/Reference/Get_reverted_risk_language_agnostic_prediction but still these operate on language wikis (and wikidata)
[10:58:39] language identification works because it has nothing to do with a wiki. it just detects the language based on a string. The other models contact the MediaWiki API to get the content of the article/revision
[11:09:10] +
[11:26:17] Machine-Learning-Team, MW-1.43-notes (1.43.0-wmf.17; 2024-08-06), OKR-Work, Patch-For-Review: Deploy Modernized Recommendation API to LiftWing - https://phabricator.wikimedia.org/T371465#10061224 (kevinbazira) @klausman, +1 on using `https://api.wikimedia.org/service/lw/recommendation/` as the AP...
[11:27:19] * isaranto lunch!
[11:39:00] (PS1) Nik Gkountas: Fix "most popular" recommendation endpoint [research/recommendation-api] - https://gerrit.wikimedia.org/r/1062374
[11:39:46] (CR) CI reject: [V:-1] Fix "most popular" recommendation endpoint [research/recommendation-api] - https://gerrit.wikimedia.org/r/1062374 (owner: Nik Gkountas)
[11:41:41] (PS2) Nik Gkountas: Fix "most popular" recommendation endpoint [research/recommendation-api] - https://gerrit.wikimedia.org/r/1062374
[11:43:29] FIRING: ErrorBudgetBurn: - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[11:50:21] That's an interesting alert
[11:52:14] I'll put in a silence to keep the noise down
[12:11:28] thanks!
[12:43:08] isaranto, klausman - o/ re: error budget - I warned a while ago that Keith was testing Pyrra with the articlequality enwiki model server; it is not something that we added, just a test that needs to be refined etc.
I suggested at the time to follow up with Keith, same thing now :)
[12:45:54] roger that
[12:46:22] ack, will reach out
[13:04:24] Morning all
[13:11:29] \ο
[13:43:34] Lift-Wing, Machine-Learning-Team: Request to host reference needed on Lift Wing - https://phabricator.wikimedia.org/T372405 (XiaoXiao-WMF) NEW
[13:49:39] (CR) Sbisson: [C:+1] Fix "most popular" recommendation endpoint [research/recommendation-api] - https://gerrit.wikimedia.org/r/1062374 (owner: Nik Gkountas)
[13:59:54] Machine-Learning-Team, Automoderator, Moderator-Tools-Team: Perform a load test for Multilingual Revert Risk on LiftWing - https://phabricator.wikimedia.org/T372298#10061786 (isarantopoulos) @Samwalton9-WMF What is the expected load coming from Automoderator for large wikis?
[14:02:41] Lift-Wing, Machine-Learning-Team: Request to update Readability model on Lift Wing - https://phabricator.wikimedia.org/T369712#10061796 (isarantopoulos) p:Triage→Medium
[14:21:52] Machine-Learning-Team, Automoderator, Moderator-Tools-Team: Perform a load test for Multilingual Revert Risk on LiftWing - https://phabricator.wikimedia.org/T372298#10061846 (Samwalton9-WMF) @KCVelaga_WMF may be able to quantify further, but by default we request a score for every new edit in the mai...
[14:24:18] Machine-Learning-Team, Automoderator, Moderator-Tools-Team: Perform a load test for Multilingual Revert Risk on LiftWing - https://phabricator.wikimedia.org/T372298#10061852 (Samwalton9-WMF) I just recalled that we did get some data on this in T352026.
[14:32:58] Machine-Learning-Team, Goal: Goal 1: Non-technical users can make a request to a Hugging Face Large Language Model that uses an inference optimization engine in production. - https://phabricator.wikimedia.org/T371395#10061890 (calbon) Infra - Setting up the puppet roles - Can't commit puppet roles until...
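A hedged sketch of the distinction drawn in the LiftWing exchange with nic25 above: wiki-specific models such as enwiki-damaging take a revision ID from one particular wiki, revertrisk-language-agnostic still needs a Wikimedia language wiki, and language identification only needs a string. The endpoint pattern is assumed from the public Lift Wing usage docs, the `langid` model name and the `text` payload key are assumptions, `build_request` is a hypothetical helper, and 12345 is a placeholder revision ID. The requests are built but not sent.

```python
import json
import urllib.request

# Assumed public Lift Wing endpoint pattern behind the API Gateway; check the
# Lift Wing API reference on api.wikimedia.org before relying on it.
LIFTWING_BASE = "https://api.wikimedia.org/service/lw/inference/v1/models"


def build_request(model: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to a Lift Wing model."""
    return urllib.request.Request(
        f"{LIFTWING_BASE}/{model}:predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Wiki-specific model: only meaningful for English Wikipedia revision IDs.
damaging = build_request("enwiki-damaging", {"rev_id": 12345})

# Language-agnostic revert risk still needs a Wikimedia language wiki, because
# the server fetches the revision from that wiki's MediaWiki API.
revertrisk = build_request(
    "revertrisk-language-agnostic", {"rev_id": 12345, "lang": "en"}
)

# Language identification only looks at the text it is given, so it does not
# matter which wiki (or non-Wikimedia site) the text came from.
langid = build_request(
    "langid", {"text": "Wings of Fire is a series of fantasy novels."}
)

# urllib.request.urlopen(langid) would send it (requires network access).
```

Because the wiki-specific models fetch content themselves from a Wikimedia wiki's MediaWiki API, there is no payload field that points them at a wiki from another family; only text-in models like language identification are wiki-independent.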
[14:33:14] Machine-Learning-Team, Goal: Goal 1: Non-technical users can make a request to a Hugging Face Large Language Model that is fast in production. - https://phabricator.wikimedia.org/T371395#10061902 (calbon)
[14:35:54] Machine-Learning-Team, Goal: Goal 2: People outside the ML team can ssh into an ml-lab machine, run a Jupyter Notebook, and run PyTorch powered by a GPU. - https://phabricator.wikimedia.org/T371396#10061921 (calbon) Update: - Waiting for ml-lab machines to be delivered to the eqiad data center.
[15:03:23] as you understand I'm deeply troubled about the namespace naming :D
[15:33:45] ahhhh okok nice, I didn't see an answer so I thought I'd ask
[15:33:46] nice work!
[15:40:22] (CR) Eamedina: [C:+1] Fix "most popular" recommendation endpoint [research/recommendation-api] - https://gerrit.wikimedia.org/r/1062374 (owner: Nik Gkountas)
[16:14:05] aiko: I like your suggestion with a twist. I would just focus on the entity on which the model operates (e.g. article) and not the input, as this may vary (article name or page_id for the article)
[16:14:57] so perhaps we can go with `article-models` and `revision-models` for now. wdyt?
I can just go ahead and create the first one since we're going to use it now
[16:15:16] if there isn't a better naming suggestion, that is
[16:18:19] (CR) Santhosh: [C:+2] Fix "most popular" recommendation endpoint [research/recommendation-api] - https://gerrit.wikimedia.org/r/1062374 (owner: Nik Gkountas)
[16:19:01] (Merged) jenkins-bot: Fix "most popular" recommendation endpoint [research/recommendation-api] - https://gerrit.wikimedia.org/r/1062374 (owner: Nik Gkountas)
[16:19:33] I'm taking a swing at solving the issues in CI when trying to build the revscoring image https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/1062286
[17:05:39] (PS3) Ilias Sarantopoulos: revscoring: log payload in multiprocessing [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1062286
[17:05:57] I think I found a solution for now 🤞
[17:06:05] going afk folks, have a nice evening/rest of day
[17:09:18] isaranto: +1 I don't have a better naming suggestion lol
[17:09:39] have a nice evening!
[17:13:18] I'll go with that one then!
[17:13:38] Good evening o/
[19:41:02] Machine-Learning-Team, DC-Ops, ops-eqiad: Q#:rack/setup/install X - https://phabricator.wikimedia.org/T372432 (RobH) NEW
[19:41:23] Machine-Learning-Team, DC-Ops, ops-eqiad: Q1:rack/setup/install ml-serve1009-1011 (3x), ml-lab1001-1002 (2x), dse-k8s-worker1009 (1x) - https://phabricator.wikimedia.org/T372432#10062785 (RobH)
[19:43:23] Machine-Learning-Team, DC-Ops, ops-eqiad: Q1:rack/setup/install ml-serve1009-1011 (3x), ml-lab1001-1002 (2x), dse-k8s-worker1009 (1x) - https://phabricator.wikimedia.org/T372432#10062790 (RobH)
[19:44:33] Machine-Learning-Team, DC-Ops, ops-eqiad: Q1:rack/setup/install ml-serve1009-1011 (3x), ml-lab1001-1002 (2x), dse-k8s-worker1009 (1x) - https://phabricator.wikimedia.org/T372432#10062796 (RobH) a:klausman @klausman: Would you, or someone on your team, please update the puppet repo for these new h...