[05:55:35] (CR) Kevin Bazira: [C: +1] Update wheels submodule with latest changes [services/ores/deploy] - https://gerrit.wikimedia.org/r/798894 (owner: AikoChou)
[06:36:06] good morning :)
[06:43:22] it is weird: since the 25th at around 9:30 UTC we have been seeing really high LIST latency for the k8s API
[06:43:25] https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&var-datasource=thanos&var-site=eqiad&var-cluster=k8s-mlserve&from=now-7d&to=now
[06:43:28] both codfw and eqiad
[06:43:43] it seems to match the 504s thrown by the API
[06:44:03] and in the logs I can see a lot of
[06:44:03] "List" url:/apis/networking.internal.knative.dev/v1alpha1/certificates (started: 2022-05-30 06:39:31.05167811 +0000 UTC m=+1785897.737582693) (total time: 3.003551571s):
[06:44:06] May 30 06:39:34 ml-serve-ctrl1002 kube-
[06:45:29] Tried to restart kube-api on ml-serve-ctrl1002
[06:47:31] the master is now on 1001 and I don't see errors anymore
[06:50:00] also restarted on ml-serve-ctrl2002
[06:50:01] hmm
[06:50:08] it looks like it is working in eqiad
[07:12:56] I found https://awesome-astra.github.io/docs/pages/tools/integration/feast/ for the integration between feast and cassandra
[07:13:08] it doesn't seem to be part of feast though, but rather an external plugin
[07:15:33] https://github.com/feast-dev/feast/pull/475 seems to be a failed attempt to add cassandra support to feast
[07:16:59] and https://github.com/feast-dev/feast/pull/1875 seems related to the Astra plugin above
[07:17:24] they are saying that 3rd party integrations should be in separate repos, like the Hive one
[07:17:27] sigh
[07:44:04] Lift-Wing, Epic, Machine-Learning-Team (Active Tasks), Patch-For-Review: Set up the ml-cache clusters - https://phabricator.wikimedia.org/T302232 (elukey) Reporting in here some details about what we discussed with Eric via email. The extended use cases that we are trying to implement are two bas...
[07:47:40] Lift-Wing, Epic, Machine-Learning-Team (Active Tasks), Patch-For-Review: Set up the ml-cache clusters - https://phabricator.wikimedia.org/T302232 (elukey) As discussed with Eric, I tried to get more info about the actual state of implementation of the Cassandra connector in Feast. https://github...
[07:47:48] added all info in --^
[07:49:44] Lift-Wing, Epic, Machine-Learning-Team (Active Tasks), Patch-For-Review: Set up the ml-cache clusters - https://phabricator.wikimedia.org/T302232 (elukey) The next step is trying to figure out if our use cases could be onboarded on AQS (so avoiding a new cluster) or if the ml-cache cluster is nee...
[07:49:59] there is also https://anchor.fm/featurestore/episodes/Kubeflow--FEAST-With-David-Aronchick-Co-creator-of-Kubeflow-e120k48
[07:50:07] will try to listen to it
[08:03:17] Machine-Learning-Team, Data-Services, Wikilabels, Cloud-VPS (Debian Stretch Deprecation), cloud-services-team (Kanban): Upgrade wikilabels databases to buster/bullseye - https://phabricator.wikimedia.org/T307389 (elukey) Thanks a lot for the ping! I think that this is an old project that we (...
[08:05:08] Machine-Learning-Team: [DSE Hackathon] Sounds of the Commons: Neural Audio Mashups - https://phabricator.wikimedia.org/T292306 (elukey) Open→Resolved a:elukey
[08:10:30] Machine-Learning-Team, ORES: Migrate ORES/Revscoring/etc. repos to Gitlab or Gerrit - https://phabricator.wikimedia.org/T264651 (elukey) Given the big investment in time required to move the current ORES repos to gitlab, I'd probably shift the focus to LiftWing-related repos. What do you think?
[08:33:09] Machine-Learning-Team, MediaWiki-extensions-ORES, ORES: Mismatch between ORES and MW score in ro.wp on the "damaging" model - https://phabricator.wikimedia.org/T299268 (elukey) Hi! Sorry for the late response, the task got buried among other reports! The MW API for recent changes seems not working a...
[08:58:41] Machine-Learning-Team, MediaWiki-extensions-ORES, ORES: Mismatch between ORES and MW score in ro.wp on the "damaging" model - https://phabricator.wikimedia.org/T299268 (achou) @Strainu I noticed the MW output you copied has a different rev_id with the ORES output. The ORES link points to a score of t...
[09:01:09] Machine-Learning-Team, MediaWiki-extensions-ORES, ORES: Mismatch between ORES and MW score in ro.wp on the "damaging" model - https://phabricator.wikimedia.org/T299268 (elukey) Open→Invalid I am inclined to close this as invalid, but in case something is missing please re-open and we'll keep...
[09:01:54] good work aiko :)
[09:59:13] Morning
[10:13:10] Oh that "mismatch" bug was subtle, I had to stare at Aiko's answer for a whole minute before I spotted the difference :D
[10:32:41] yeah Aiko did a great job :)
[10:41:46] :D
[13:28:04] kevinbazira_: you can deploy your articlequality changes if you want!
[13:28:56] thanks for the merge. working on the deployment now ...
[13:36:33] both eqiad and codfw deployments have been completed successfully.
[13:36:33] checking pods now ...
[13:38:20] NAME READY STATUS RESTARTS AGE
[13:38:21] frwiki-articlequality-predictor-default-5bcx6-deployment-dl6nx5 3/3 Running 0 5m13s
[13:38:21] frwikisourcewiki-artcbe553aa8aac2e82c274bbe1c54928b7-deploqtzp6 3/3 Running 0 5m14s
[13:38:21] all new pods are up and running. \o/
[13:58:30] super :)
[13:59:08] (CR) Elukey: [V: +2 C: +2] Update wheels submodule with latest changes [services/ores/deploy] - https://gerrit.wikimedia.org/r/798894 (owner: AikoChou)
[14:01:54] Machine-Learning-Team: Deploy revscoring 2.11.4 to ORES - https://phabricator.wikimedia.org/T309536 (elukey)
[14:02:35] aiko: created --^ to track the deploy of revscoring to ORES, let's write down a procedure etc. and schedule the deploy. No rush, even next week
[14:29:55] elukey: fyi, hugh and I bumped our sync to tomorrow just before the ML team meeting, since something rl has come up for Hugh. Upside is that I can give a summary of what we talked about from fresh(er) memory :)
[14:31:04] ack!
[15:59:43] have a good evening folks!