[08:07:28] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (Marostegui)
[08:16:36] Machine-Learning-Team, DBA, Data-Engineering, Infrastructure-Foundations, and 9 others: eqiad row C switches upgrade - https://phabricator.wikimedia.org/T331882 (Marostegui)
[08:20:57] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (Marostegui)
[08:24:35] Lift-Wing, Machine-Learning-Team: Move Revert-risk multilingual model from staging to production - https://phabricator.wikimedia.org/T333124 (achou)
[08:25:21] Lift-Wing, Machine-Learning-Team: Deploy Revert-risk wikidata model to ml-staging - https://phabricator.wikimedia.org/T333125 (achou)
[08:42:32] Lift-Wing, Machine-Learning-Team: Investigate Explainer for Revert-Risk model - https://phabricator.wikimedia.org/T330131 (achou)
[08:43:50] Machine-Learning-Team, DBA, Data-Engineering, Infrastructure-Foundations, and 9 others: eqiad row C switches upgrade - https://phabricator.wikimedia.org/T331882 (fgiunchedi)
[08:45:55] Lift-Wing, Machine-Learning-Team: Investigate Explainer for Revert-Risk model - https://phabricator.wikimedia.org/T330131 (achou) Tree SHAP is a white-box model, which means we need to load the model into the explainer. This makes it like another predictor, as it needs to perform all the tasks that the p...
[08:52:53] hello folks :)
[08:53:26] so after some review, the rc1 page_change topic produces new events only in codfw right now, like revision-create
[08:53:54] this is expected since we accept mediawiki edits only in one DC at the time
[08:54:11] when we switchback to eqiad we'll see events coming in from the eqiad topic
[08:54:24] so our streams will be served only by one Lift Wing cluster at the time
[08:54:37] Hi folks!
[08:54:47] hey isaranto! welcome back :)
[08:55:11] thnx! good to be back!
[08:56:58] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (fgiunchedi)
[08:57:43] the mediawiki.revision_score_drafttopic topic gets regular events in codfw, and lift wing looks ok
[08:57:50] so we have our first stream now :)
[08:58:20] I saw it, great stuff! 🎉
[08:58:33] catching up on stuff will return with questions if needed
[09:10:36] Machine-Learning-Team, Platform Team Workboards (Platform Engineering Reliability): Implement new mediawiki.revision-score streams with Lift Wing - https://phabricator.wikimedia.org/T328576 (elukey) It is ok to see events only in Lift Wing codfw because, as expected, page_change emits events only in the...
[09:13:16] Machine-Learning-Team: Investigate if/how to enable the swagger UI for InferenceService resources - https://phabricator.wikimedia.org/T332602 (elukey) a:elukey→None
[09:13:39] isaranto: one thing that we are going to do this week is a sort of planning for next quarter, ideally happening instead of the team meetings
[09:14:12] ack! I'll spend some time to prepare as well
[09:14:12] if possible we should create some tasks and place them in the related backlog column, but of course you are just got back so no rush :)
[10:02:06] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (hnowlan)
[10:02:24] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (ArielGlenn)
[10:03:09] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (hnowlan)
[10:41:05] * elukey afk! Lunch
[11:11:32] Machine-Learning-Team, Add-Link, Growth-Team, Chinese-Sites, User-notice: Deploy "add a link" to 14th round of wikis - https://phabricator.wikimedia.org/T308139 (kevinbazira) The conclusion on the backtesting results is that most of the languages look fine besides: - wuuwiki, zh_classicalwiki...
[11:14:59] Machine-Learning-Team, Language-Team, Epic: Migrate Content Translation Recommendation API to Lift Wing - https://phabricator.wikimedia.org/T308164 (Pginer-WMF)
[11:16:10] Machine-Learning-Team, Add-Link, Growth-Team, Chinese-Sites, User-notice: Deploy "add a link" to 14th round of wikis - https://phabricator.wikimedia.org/T308139 (kevinbazira)
[11:24:36] Machine-Learning-Team, DBA, Data-Engineering, Infrastructure-Foundations, and 9 others: eqiad row C switches upgrade - https://phabricator.wikimedia.org/T331882 (Jelto)
[11:34:08] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (Jelto)
[12:12:40] * isaranto afk lunch!
[13:08:54] hi folks!
[13:09:09] so I have to change the Redis password for the ORES score cache
[13:09:21] this will imply a little outage for us, but we can't really do things differently
[13:09:28] should be only a few mins
[13:09:59] Lift-Wing, Machine-Learning-Team: Improve Outlink topic model by using add-a-link model results for articles with few links - https://phabricator.wikimedia.org/T333159 (achou)
[14:04:05] Machine-Learning-Team, Platform Team Workboards (Platform Engineering Reliability): Implement new mediawiki.revision-score streams with Lift Wing - https://phabricator.wikimedia.org/T328576 (elukey) Verified with Joseph, the data can be seen in hive -> event database -> mediawiki_revision_score_drafttopi...
[14:29:35] PROBLEM - ORES worker production on ores.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 2007 bytes in 1.060 second response time https://wikitech.wikimedia.org/wiki/ORES
[14:30:09] Machine-Learning-Team, Data-Engineering, Research, Event-Platform Value Stream (Sprint 10): Design event schema for ML scores/recommendations on current page state - https://phabricator.wikimedia.org/T331401 (Ottomata) > If this is the case, we can update the schema version accordingly at that ti...
[14:38:59] RECOVERY - ORES worker production on ores.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 983 bytes in 1.190 second response time https://wikitech.wikimedia.org/wiki/ORES
[14:51:16] Hey , I opened a ticket https://phabricator.wikimedia.org/project/view/956/ so that we could have some more access to k8s needed for debugging.
[14:51:16] I was trying to check wether scale to zero works. e.g. on staging for ruwiki-goodfaith it seems to work as no pods exist but scale up doesnt work when I try to make a request - I just get 404
[14:53:05] at the moment I cannot get/describe the deployed `inferenceservices` so after we get some more access I will continue
[14:53:39] Lift-Wing, Machine-Learning-Team, SRE-Access-Requests: Machine Learning team - k8s resources ccess - https://phabricator.wikimedia.org/T333174 (isarantopoulos)
[14:53:48] weird, I am in the middle of a rollout, will check later
[14:54:29] not in a hurry
[14:55:06] Lift-Wing, Machine-Learning-Team, SRE-Access-Requests: Machine Learning team - k8s resources ccess - https://phabricator.wikimedia.org/T333174 (isarantopoulos)
[15:32:27] isaranto: ok I am back, what endpoint did you try?
[15:33:17] ruwiki good faith on ml-staging
[15:34:47] isaranto: we don't have ruwiki goodfaith in there, the one that you changed is zhwiki
[15:34:51] I just tried and it works
[15:34:58] I waited some seconds and I got a score
[15:36:51] elukey: 🤦‍♂️
[15:36:59] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/900239
[15:37:17] ru was in articlequality
[15:37:28] yep yep :)
[15:37:36] anyway, good thing that scale to zero works
[15:37:42] and it seems faster that I remembered
[15:39:00] yeah, now I see the pod is terminating. I just wanted to measure how much time it takes to spin up
[15:39:41] * elukey nods
[15:41:40] it took 9.27seconds to spin up the pod and return a response for editquality
[15:41:52] niiice
[15:42:32] yeah more or less what I remembered
[15:43:03] but if we have a giant binary model to pull from swift it may be another story
[15:43:14] def
[15:43:35] downloading + loading takes a lot of time
[15:43:40] both I mean take time
[15:48:34] * elukey bbiab
[16:42:17] going afk folks!
[16:42:23] have a nice rest of the day :)
[17:03:39] Machine-Learning-Team, ORES, Advanced-Search, All-and-every-Wikisource, and 69 others: Remove unnecessary targets definitions - https://phabricator.wikimedia.org/T328497 (Jdlrobson)
[22:09:46] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (Volans) >>! In T330165#8731601, @Stashbot wrote: > {nav icon=file, name=Mentioned in SAL (#wikimedia-operations), href=https://sal.toolfo...
[23:17:25] Machine-Learning-Team, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (colewhite)