[01:07:57] Machine-Learning-Team, Add-Link, Growth-Team, User-notice: Deploy "add a link" to 12th round of wikis - https://phabricator.wikimedia.org/T308137 (kevinbazira)
[01:08:38] Machine-Learning-Team, Add-Link, Growth-Team, User-notice: Deploy "add a link" to 12th round of wikis - https://phabricator.wikimedia.org/T308137 (kevinbazira) 23/23 models were trained successfully in the 12th round of wikis.
[06:56:05] Machine-Learning-Team, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (Marostegui)
[07:00:29] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (Marostegui) m1-master and m2-master proxies failed over
[07:01:09] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (Marostegui)
[07:21:54] (PS2) Elukey: python: remove FIXME and refactor some exceptions [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/895132 (https://phabricator.wikimedia.org/T329032)
[07:22:19] (CR) CI reject: [V: -1] python: remove FIXME and refactor some exceptions [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/895132 (https://phabricator.wikimedia.org/T329032) (owner: Elukey)
[07:23:19] (CR) Elukey: "recheck" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/895132 (https://phabricator.wikimedia.org/T329032) (owner: Elukey)
[07:24:03] (CR) Elukey: "Folks I added a change to events.py, based on some feedback that Andrew Otto gave me in a previous CR (so that we don't build again docker" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/895132 (https://phabricator.wikimedia.org/T329032) (owner: Elukey)
[07:35:07] hello folks
[07:35:17] https://integration.wikimedia.org/ci/job/inference-services-pipeline-outlink/296/console seems broken for a generic pipeline problem
[07:35:51] Machine-Learning-Team, serviceops-radar, Language-Team (Language-2023-January-March): Hosting machine request for machine translation - https://phabricator.wikimedia.org/T329971 (santhosh)
[07:36:13] could it be https://gerrit.wikimedia.org/r/c/integration/config/+/894640 ?
[07:37:32] Machine-Learning-Team, serviceops-radar, Language-Team (Language-2023-January-March): Hosting machine request for machine translation - https://phabricator.wikimedia.org/T329971 (santhosh) Created a parent ticket https://phabricator.wikimedia.org/T331505 to track the overall progress
[07:38:36] mmm probably not
[07:39:56] asked in #wikimedia-releng
[08:01:57] o/
[08:02:47] * elukey commuting
[08:02:53] o/
[08:14:55] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 9 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (MoritzMuehlenhoff)
[08:59:52] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 9 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (Marostegui)
[09:00:20] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 9 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (Marostegui)
[09:01:56] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 9 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (Marostegui)
[09:09:21] (CR) Hashar: "recheck after revert of https://gerrit.wikimedia.org/r/c/integration/pipelinelib/+/895673 for CI test error | T331497" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/895132 (https://phabricator.wikimedia.org/T329032) (owner: Elukey)
[09:12:44] (CR) AikoChou: [C: +1] "Looks good to me!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/895132 (https://phabricator.wikimedia.org/T329032) (owner: Elukey)
[09:17:48] (CR) Ilias Sarantopoulos: [C: +1] "+1 for the relaxed versions" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/895132 (https://phabricator.wikimedia.org/T329032) (owner: Elukey)
[09:25:25] aiko: o/ thanks!
[09:26:04] (CR) Elukey: [C: +2] python: remove FIXME and refactor some exceptions [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/895132 (https://phabricator.wikimedia.org/T329032) (owner: Elukey)
[09:35:46] (Merged) jenkins-bot: python: remove FIXME and refactor some exceptions [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/895132 (https://phabricator.wikimedia.org/T329032) (owner: Elukey)
[09:42:50] Machine-Learning-Team: Delete old ml-related docker images that are deprecated - https://phabricator.wikimedia.org/T331513 (elukey)
[09:45:20] Machine-Learning-Team: Delete old ml-related docker images that are deprecated - https://phabricator.wikimedia.org/T331513 (elukey) Candidates for deletion: https://docker-registry.wikimedia.org/wikimedia/machinelearning-liftwing-inference-services-articlequality/tags/ https://docker-registry.wikimedia.org/...
[09:48:41] is there a retention policy in our docker registry? e.g. delete images that have not been pulled for X months etc. Searched wikitech but didn't find anything https://wikitech.wikimedia.org/wiki/Docker-registry
[09:50:21] not really, IIUC we keep them until a specific cleanup is triggered
[09:57:03] Grazie!
[10:00:49] Machine-Learning-Team: Delete old ml-related docker images that are deprecated - https://phabricator.wikimedia.org/T331513 (isarantopoulos) Thanks for raising this. I confirm, we are not using the above images anymore.
[10:02:43] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 9 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (Clement_Goubert)
[10:05:57] Machine-Learning-Team: Delete old ml-related docker images that are deprecated - https://phabricator.wikimedia.org/T331513 (achou) And also this https://docker-registry.wikimedia.org/wikimedia/machinelearning-liftwing-inference-services/tags/ ?
[10:15:36] Machine-Learning-Team, Add-Link, Growth-Team, User-notice: Deploy "add a link" to 12th round of wikis - https://phabricator.wikimedia.org/T308137 (kevinbazira) Model evaluation has been completed and below are the backtesting results: | | Precision@0.5 | Recall@0.5 |mhrwiki | 0.93 | 0.34 |miwiki...
[10:54:27] deployed all the images to staging, httpbb works except nsfw
[10:54:37] I'll deploy later on to prod, and then we should be good :)
[10:58:06] Machine-Learning-Team, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (cmooney) Open→Resolved
[11:27:22] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 9 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (Marostegui)
[11:38:33] * elukey lunch
[11:57:24] (PS19) Ilias Sarantopoulos: feat: Create a migration endpoint between LiftWing/ORES [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/892998 (https://phabricator.wikimedia.org/T330414)
[12:03:22] I was also looking to add running unit tests in CI as an extra pre-commit step, which would be the same for all images, but pytest returns exit code 5 if it finds no tests (we don't have any in some images). So either we add some tests (even a dummy one) in all images or we manipulate the exit code on run
[12:03:46] anyway will revisit in another patch
[12:03:58] * isaranto goes to do the stuff he is supposed to be doing :)
[12:14:20] I had a question in the context of the ORES legacy service: IIUC wp10 is the articlequality model -> https://phabricator.wikimedia.org/T196240
[12:14:20] so would it be safe in the legacy service to "translate" wp10 to an articlequality liftwing call?
[12:14:38] I'm pasting the same question on phab for tracking
[12:17:40] Machine-Learning-Team, Patch-For-Review: Create ORES migration endpoint (ORES/Liftwing translation) - https://phabricator.wikimedia.org/T330414 (isarantopoulos) In some contexts we have a model named `wp10` e.g. -> https://ores.wikimedia.org/v3/scores/enwiki/ To the best of my knowledge this is the artic...
[13:11:17] Machine-Learning-Team, Patch-For-Review: Create ORES migration endpoint (ORES/Liftwing translation) - https://phabricator.wikimedia.org/T330414 (achou) > In the context of this API we are building is it safe to translate/redirect wp10 calls to articlequality on liftwing? From what I read wp10 is an alia...
[13:13:29] Machine-Learning-Team, Patch-For-Review: Create ORES migration endpoint (ORES/Liftwing translation) - https://phabricator.wikimedia.org/T330414 (isarantopoulos) Thanks @AikoChou ! That's what I understood but wanted to be sure :) . Will proceed with this
[13:24:39] isaranto: the name of wp10 comes from the Wikipedia:Version 1.0 Editorial Team https://en.wikipedia.org/wiki/Wikipedia:Content_assessment :)
[13:25:09] ack, thanks!
[13:42:18] Machine-Learning-Team, ORES, Wikimedia Enterprise: Investigate tools that use ORES - https://phabricator.wikimedia.org/T330854 (isarantopoulos) A bit overwhelming but I am creating a list with what and who uses ORES based on the following resources: https://www.mediawiki.org/wiki/ORES/Applications a...
[13:49:08] Good morning all!
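On the pytest point from 12:03: pytest does exit with code 5 when it collects no tests. A minimal sketch of the "manipulate the exit code on run" option could look like the following, assuming a small Python wrapper is acceptable as the CI/pre-commit entry point. The names `normalize_exit_code` and `run_pytest` are hypothetical and do not come from the inference-services repo.

```python
import subprocess
import sys

# pytest's documented exit code for "no tests were collected".
NO_TESTS_COLLECTED = 5


def normalize_exit_code(code: int) -> int:
    """Treat "no tests collected" as success; pass every other code through."""
    return 0 if code == NO_TESTS_COLLECTED else code


def run_pytest(args: list[str]) -> int:
    """Run pytest in a subprocess (hypothetical CI entry point) and normalize its exit code."""
    result = subprocess.run([sys.executable, "-m", "pytest", *args])
    return normalize_exit_code(result.returncode)
```

A wrapper script would end with something like `sys.exit(run_pytest(sys.argv[1:]))`. Real failures (exit code 1) and usage errors still fail the job; only the empty-collection case is relaxed, which avoids adding dummy tests to every image.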
[13:50:47] \o heyo Chris
[13:51:48] o/
[13:53:26] isaranto: from my POV the migration endpoint change is LGTM, but I don't want to stomp on Luca
[13:53:30] ...'s comment
[13:53:34] (also not on Luca...)
[14:04:37] klausman: thanks! Luca's comments were valid and will be addressed ✔️ (mostly talking about error/exception handling).
[14:05:14] the reason I requested a review before the app is fully done was just to make sure that every1 is on the same page and I don't go on and fully implement sth else
[14:05:38] Yeah, something entirely new is best reviewed incrementally
[14:15:25] klausman: just saw the video from Rob Miles u posted earlier. really interesting!
[14:15:58] isaranto: checking the code review now
[14:16:03] Yeah, it's magic incantations and power over genies :)
[14:16:53] one qs about https://phabricator.wikimedia.org/T330854 - are you using it to collect all ores tools or only the WME ones? In case the answer is the former it may be confusing when the enterprise folks read the task :)
[14:19:49] The task refers to all tools. yeah you're right. I just wrote it as an update to the task though
[14:20:12] I can delete the comment and repost it after we get a response from WME :)
[14:24:22] (CR) Elukey: [C: +1] "I think it is a good first version!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/892998 (https://phabricator.wikimedia.org/T330414) (owner: Ilias Sarantopoulos)
[14:38:41] (CR) Ilias Sarantopoulos: feat: Create a migration endpoint between LiftWing/ORES (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/892998 (https://phabricator.wikimedia.org/T330414) (owner: Ilias Sarantopoulos)
[14:44:41] Machine-Learning-Team, ORES, Wikimedia Enterprise: Investigate tools that use ORES - https://phabricator.wikimedia.org/T330854 (HShaikh) @elukey thanks for reaching out. @prabhat would be a good resource to talk to regarding any migrations related to moving from ores model. and how deeply Ores is int...
[14:46:44] Machine-Learning-Team, ORES, Wikimedia Enterprise: Investigate tools that use ORES - https://phabricator.wikimedia.org/T330854 (Ottomata) FWIW, there are about [[ https://grafana-rw.wikimedia.org/d/znIuUcsWz/eventstreams?orgId=1&refresh=1m&var-dc=codfw%20prometheus%2Fk8s&var-service=eventstreams&from...
[14:48:21] all model servers on kserve 0.10
[14:48:23] \o/
[14:51:09] \o/ 🎉
[14:51:24] congrats elukey: great work!
[14:53:50] Hooray!
[14:54:34] running httpbb to double check, hopefully only nsfw is broken
[14:55:21] yep!
[14:56:14] Machine-Learning-Team: Upgrade Kserve's k8s control plane to 0.10 - https://phabricator.wikimedia.org/T331114 (elukey) New control plane deployed on all clusters and tested!
[14:57:34] Machine-Learning-Team: Upgrade the inference-services repo codebase to kserve 0.10 (fastapi) - https://phabricator.wikimedia.org/T329032 (elukey) Task completed, all clusters upgraded to kserve 0.10. The nsfw model doesn't work but since it is experimental we'll follow up in T331416
[14:58:00] Machine-Learning-Team: Upgrade the inference-services repo codebase to kserve 0.10 (fastapi) - https://phabricator.wikimedia.org/T329032 (elukey) a:elukey
[15:01:49] Machine-Learning-Team, Data-Engineering-Planning, Edit-Review-Improvements-Integrated-Filters, Event-Platform Value Stream, and 2 others: Integration of Revert Risk Scores to Recent Changes as a filter - https://phabricator.wikimedia.org/T329071 (Ladsgroup) >>! In T329071#8662812, @elukey wrote:...
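For the wp10 question from 12:14 (answered on the task at 13:11): a minimal sketch of how a legacy endpoint might translate the `wp10` alias to `articlequality` before building a Lift Wing call. `MODEL_ALIASES`, `resolve_model_name` and `liftwing_model_id` are hypothetical names, and the `<wiki>-<model>` identifier format is an assumption made for illustration, not something taken from the patch under review.

```python
# Hypothetical alias table: legacy ORES model names -> canonical model names.
MODEL_ALIASES = {"wp10": "articlequality"}


def resolve_model_name(name: str) -> str:
    """Return the canonical model name, leaving non-aliased names untouched."""
    return MODEL_ALIASES.get(name, name)


def liftwing_model_id(wiki: str, model: str) -> str:
    """Build a model identifier; the "<wiki>-<model>" format is assumed here."""
    return f"{wiki}-{resolve_model_name(model)}"
```

Keeping the alias in one lookup table means a request for `wp10` and a request for `articlequality` end up on the same code path, so the legacy service does not need a separate branch for the old name.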
[15:06:14] Machine-Learning-Team, Data-Engineering, Event-Platform Value Stream: Create new mediawiki.page_links_change stream based on fragment/mediawiki/state/change/page - https://phabricator.wikimedia.org/T331399 (Ottomata)
[15:09:40] Machine-Learning-Team, Data-Engineering, Event-Platform Value Stream: Create new mediawiki.page_links_change stream based on fragment/mediawiki/state/change/page - https://phabricator.wikimedia.org/T331399 (Ottomata) @calbon This is probably a task that Event Platform will do, unless you want @AikoCh...
[15:16:57] Machine-Learning-Team, Data-Engineering, Research, Event-Platform Value Stream (Sprint 10): Design event schema for ML scores/recommendations on current page state - https://phabricator.wikimedia.org/T331401 (JArguello-WMF)
[15:22:58] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 9 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (cmooney)
[15:25:20] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 9 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (cmooney)
[15:56:44] Machine-Learning-Team, MediaWiki-extensions-ORES: EnWiki Recent Changes Page no longer displays damaging filters - https://phabricator.wikimedia.org/T331045 (calbon) Did it get renamed or something? I swear it used to be called "Damaging predictions" group filter
[15:58:00] mmm I can't see events flowing to kafka when I try to use the event functionality
[15:58:21] ah no wait maybe it is me being stupid
[15:58:23] let's see
[16:02:48] ok it works but I found a bug
[16:04:51] (PS1) Elukey: revscoring: relax schema version checks when retrieving the rev-id [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/895825
[16:06:49] Machine-Learning-Team, MediaWiki-extensions-ORES: EnWiki Recent Changes Page no longer displays damaging filters - https://phabricator.wikimedia.org/T331045 (isarantopoulos) Not 100% sure if I'm looking at the correct place but from what I [[ https://github.com/wikimedia/mediawiki-extensions-ORES/blame/1...
[16:07:34] (CR) Ilias Sarantopoulos: [C: +1] "LGTM!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/895825 (owner: Elukey)
[16:08:16] what bug? 🐞
[16:08:37] :D
[16:08:54] it is annoying to have the schema checks that we support in two places
[16:09:00] maybe we should refactor it
[16:09:04] anyway, not super urgent
[16:10:21] good point
[16:23:59] Machine-Learning-Team, Data-Engineering-Planning, Edit-Review-Improvements-Integrated-Filters, Event-Platform Value Stream, and 2 others: Integration of Revert Risk Scores to Recent Changes as a filter - https://phabricator.wikimedia.org/T329071 (elukey) >>! In T329071#8676678, @Ladsgroup wrote:...
[16:31:21] Machine-Learning-Team, Data-Engineering-Planning, Edit-Review-Improvements-Integrated-Filters, Event-Platform Value Stream, and 2 others: Integration of Revert Risk Scores to Recent Changes as a filter - https://phabricator.wikimedia.org/T329071 (Ottomata) > wouldn't be able to populate the Maria...
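On the schema-check duplication mentioned at 16:08: one way to keep the supported schemas in a single place is a shared helper that matches on the schema name and ignores the version suffix. This is only a sketch. `SUPPORTED_SCHEMA_PREFIXES`, `is_supported_schema` and `get_rev_id` are hypothetical names, and the event layout (a `$schema` string plus a top-level `rev_id`) is assumed for illustration, not taken from patch 895825.

```python
from typing import Any, Mapping, Optional

# Hypothetical single source of truth: schema names without a version suffix.
SUPPORTED_SCHEMA_PREFIXES = ("/mediawiki/revision/create/",)


def is_supported_schema(schema: str) -> bool:
    """Accept any version of a supported schema by matching on its prefix."""
    return schema.startswith(SUPPORTED_SCHEMA_PREFIXES)


def get_rev_id(event: Mapping[str, Any]) -> Optional[int]:
    """Return the revision id for supported events, or None otherwise.

    Assumes the event carries "$schema" and a top-level "rev_id".
    """
    if not is_supported_schema(str(event.get("$schema", ""))):
        return None
    return event.get("rev_id")
```

With a helper like this, both call sites would import `is_supported_schema` instead of each hard-coding its own list, so relaxing or tightening the version check happens in one place.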
[16:37:00] Machine-Learning-Team, Data-Engineering-Planning, Edit-Review-Improvements-Integrated-Filters, Event-Platform Value Stream, and 2 others: Integration of Revert Risk Scores to Recent Changes as a filter - https://phabricator.wikimedia.org/T329071 (diego) >The problem that I see with 1) is that we...
[16:37:33] Machine-Learning-Team, Data-Engineering, Research, Event-Platform Value Stream (Sprint 10): Design event schema for ML scores/recommendations on current page state - https://phabricator.wikimedia.org/T331401 (achou) +1 having separate schemas for classification models, embeddings, and recommendat...
[16:37:38] Machine-Learning-Team, API Platform, Platform Team Initiatives (API Gateway): API-Gateway: lift auth restriction for POST requests - https://phabricator.wikimedia.org/T331547 (elukey)
[16:37:49] klausman: I opened --^
[16:38:09] please add any comment/info that you want to it, it is just to kick off the conversation
[16:40:40] * elukey afk for a walk
[16:43:24] Machine-Learning-Team, Add-Link, Growth-Team (Current Sprint), User-notice: Deploy "add a link" to 6th round of wikis - https://phabricator.wikimedia.org/T304550 (Sgs) >>! In T304550#8669991, @Trizek-WMF wrote: > All models work fine except: > * **cbk-zam**: search returns add a link results, bu...
[16:45:53] Machine-Learning-Team, Data-Engineering-Planning, Edit-Review-Improvements-Integrated-Filters, Event-Platform Value Stream, and 2 others: Integration of Revert Risk Scores to Recent Changes as a filter - https://phabricator.wikimedia.org/T329071 (Ladsgroup) >>! In T329071#8676986, @elukey wrote:...
[16:54:16] elukey: thanks!
[17:02:45] Machine-Learning-Team, Data-Engineering-Planning, Edit-Review-Improvements-Integrated-Filters, Event-Platform Value Stream, and 2 others: Integration of Revert Risk Scores to Recent Changes as a filter - https://phabricator.wikimedia.org/T329071 (Ottomata) > much simpler via the suggested idea. I...
[17:27:31] Machine-Learning-Team, Data-Engineering-Planning, Edit-Review-Improvements-Integrated-Filters, Event-Platform Value Stream, and 2 others: Integration of Revert Risk Scores to Recent Changes as a filter - https://phabricator.wikimedia.org/T329071 (elukey) >>! In T329071#8677053, @Ladsgroup wrote:...
[17:30:31] * elukey afk!
[17:30:34] have a nice rest of the day folks
[18:17:13] Machine-Learning-Team: The nsfw model hangs in predict() after moving to Kserve 0.10 - https://phabricator.wikimedia.org/T331416 (Htriedman) @elukey not exactly sure what's going on here, but I can check into it and get back to you!
[18:19:58] Machine-Learning-Team, Data-Engineering, Research, Event-Platform Value Stream (Sprint 10): Design event schema for ML scores/recommendations on current page state - https://phabricator.wikimedia.org/T331401 (Ottomata) > Moreover, the explainer functionality is still in the exploratory phase in L...
[18:52:06] Machine-Learning-Team: The nsfw model hangs in predict() after moving to Kserve 0.10 - https://phabricator.wikimedia.org/T331416 (elukey) @Htriedman thanks a lot! If you want to test the docker image: https://docker-registry.wikimedia.org/wikimedia/machinelearning-liftwing-inference-services-nsfw/tags/ (the...