[09:15:07] (PS1) Bartosz Wójtowicz: revise-tone-task-generator: Use multiple workers in a single deployment. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1216754 (https://phabricator.wikimedia.org/T411758)
[10:24:09] Machine-Learning-Team, Essential-Work: Upgrade AMD GPU + torch version of ML Labs machines - https://phabricator.wikimedia.org/T410663#11443241 (gkyziridis) a:gkyziridis
[11:03:55] (CR) Gkyziridis: [C:+1] "LGTM!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1216754 (https://phabricator.wikimedia.org/T411758) (owner: Bartosz Wójtowicz)
[11:35:39] (PS10) Nik Gkountas: fix section suggestion fetching for single page collections [research/recommendation-api] - https://gerrit.wikimedia.org/r/1214540 (https://phabricator.wikimedia.org/T384485)
[11:52:57] (CR) Bartosz Wójtowicz: revise-tone-task-generator: Use multiple workers in a single deployment. (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1216754 (https://phabricator.wikimedia.org/T411758) (owner: Bartosz Wójtowicz)
[12:38:14] (PS1) Nik Gkountas: Fix continue offset and seed for section translation recommendations [research/recommendation-api] - https://gerrit.wikimedia.org/r/1216781 (https://phabricator.wikimedia.org/T384485)
[12:43:07] (CR) Gkyziridis: [C:+1] revise-tone-task-generator: Use multiple workers in a single deployment. (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1216754 (https://phabricator.wikimedia.org/T411758) (owner: Bartosz Wójtowicz)
[12:55:35] (CR) Bartosz Wójtowicz: [C:+2] revise-tone-task-generator: Use multiple workers in a single deployment. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1216754 (https://phabricator.wikimedia.org/T411758) (owner: Bartosz Wójtowicz)
[13:05:38] (Merged) jenkins-bot: revise-tone-task-generator: Use multiple workers in a single deployment. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1216754 (https://phabricator.wikimedia.org/T411758) (owner: Bartosz Wójtowicz)
[13:44:28] (CR) Sbisson: [C:+2] Fix continue offset and seed for section translation recommendations [research/recommendation-api] - https://gerrit.wikimedia.org/r/1216781 (https://phabricator.wikimedia.org/T384485) (owner: Nik Gkountas)
[13:45:58] (Merged) jenkins-bot: Fix continue offset and seed for section translation recommendations [research/recommendation-api] - https://gerrit.wikimedia.org/r/1216781 (https://phabricator.wikimedia.org/T384485) (owner: Nik Gkountas)
[13:47:43] (CR) Sbisson: [C:+2] fix section suggestion fetching for single page collections [research/recommendation-api] - https://gerrit.wikimedia.org/r/1214540 (https://phabricator.wikimedia.org/T384485) (owner: Nik Gkountas)
[13:48:22] (Merged) jenkins-bot: fix section suggestion fetching for single page collections [research/recommendation-api] - https://gerrit.wikimedia.org/r/1214540 (https://phabricator.wikimedia.org/T384485) (owner: Nik Gkountas)
[14:36:25] Machine-Learning-Team, MediaWiki-extensions-ORES, MediaWiki-Recent-changes, Moderator-Tools-Team, OKR-Work: Enable revert risk filters for the second batch of wikis: > 1000 AND <= 2000 monthly edits - https://phabricator.wikimedia.org/T411487#11444085 (Kgraessle)
[14:37:57] Machine-Learning-Team, MediaWiki-extensions-ORES, MediaWiki-Recent-changes, Moderator-Tools-Team, OKR-Work: Enable revert risk filters for the second batch of wikis: > 1000 AND <= 2000 monthly edits - https://phabricator.wikimedia.org/T411487#11444088 (Kgraessle)
[14:38:20] Machine-Learning-Team, Essential-Work: Upgrade AMD GPU + torch version of ML Labs machines - https://phabricator.wikimedia.org/T410663#11444092 (gkyziridis) ==== Update ==== I think that the issue is the version incompatibility of `torch` and `rocm` (pytorch `2.4.1` is very old for this family of models)...
[14:38:26] Machine-Learning-Team, MediaWiki-extensions-ORES, MediaWiki-Recent-changes, Moderator-Tools-Team, OKR-Work: Enable revert risk filters for first batch of wikis: < 1000 monthly edits - https://phabricator.wikimedia.org/T411485#11444093 (Kgraessle)
[14:39:18] Machine-Learning-Team, MediaWiki-extensions-ORES, MediaWiki-Recent-changes, Moderator-Tools-Team, OKR-Work: Enable revert risk filters for first batch of wikis: < 1000 monthly edits - https://phabricator.wikimedia.org/T411485#11444095 (Kgraessle)
[14:40:34] Machine-Learning-Team, MediaWiki-extensions-ORES, MediaWiki-Recent-changes, Moderator-Tools-Team, OKR-Work: Enable revert risk filters for the third batch of wikis: > 2000 AND <= 5000 monthly edits - https://phabricator.wikimedia.org/T411489#11444097 (Kgraessle)
[14:40:50] Machine-Learning-Team, MediaWiki-extensions-ORES, MediaWiki-Recent-changes, Moderator-Tools-Team, OKR-Work: Enable revert risk filters for the second batch of wikis: > 1000 AND <= 2000 monthly edits - https://phabricator.wikimedia.org/T411487#11444098 (Kgraessle)
[14:41:03] Machine-Learning-Team, MediaWiki-extensions-ORES, MediaWiki-Recent-changes, Moderator-Tools-Team, OKR-Work: Enable revert risk filters for the third batch of wikis: > 2000 AND <= 5000 monthly edits - https://phabricator.wikimedia.org/T411489#11444099 (Kgraessle)
[14:41:19] Machine-Learning-Team, MediaWiki-extensions-ORES, MediaWiki-Recent-changes, Moderator-Tools-Team, OKR-Work: Enable revert risk filters for the fourth batch of wikis: > 5000 AND <= 10000 monthly edits - https://phabricator.wikimedia.org/T411490#11444100 (Kgraessle)
[14:41:40] Machine-Learning-Team, MediaWiki-extensions-ORES, MediaWiki-Recent-changes, Moderator-Tools-Team, OKR-Work: Enable revert risk filters for the fifth batch of wikis: > 10000 AND <= 30000 monthly edits - https://phabricator.wikimedia.org/T411492#11444101 (Kgraessle)
[14:41:49] Machine-Learning-Team, MediaWiki-extensions-ORES, MediaWiki-Recent-changes, Moderator-Tools-Team, OKR-Work: Enable revert risk filters for the sixth batch of wikis: > 30000 AND <= 70000 monthly edits - https://phabricator.wikimedia.org/T411493#11444102 (Kgraessle)
[14:42:05] Machine-Learning-Team, MediaWiki-extensions-ORES, MediaWiki-Recent-changes, Moderator-Tools-Team, OKR-Work: Enable revert risk filters for the seventh batch of wikis: > 70000 AND <= 150000 monthly edits - https://phabricator.wikimedia.org/T411494#11444103 (Kgraessle)
[15:16:34] Machine-Learning-Team, MediaWiki-extensions-ORES, MediaWiki-Recent-changes, Moderator-Tools-Team, OKR-Work: Enable revert risk filters for first batch of wikis: < 1000 monthly edits - https://phabricator.wikimedia.org/T411485#11444179 (Kgraessle)
[15:22:49] (PS1) Umherirrender: Use PHP8 constructor property promotion syntax for dependency injection [extensions/ORES] - https://gerrit.wikimedia.org/r/1216820
[15:53:11] Machine-Learning-Team, Essential-Work: Upgrade AMD GPU + torch version of ML Labs machines - https://phabricator.wikimedia.org/T410663#11444307 (Isaac) Thanks @gkyziridis for digging into this! Out of curiosity, why not jump to the current stable versions (2.9.1 for torch and 6.4 for AMD)? I see you comm...
[16:08:44] FIRING: LiftWingServiceErrorRate: ...
[16:08:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=recommendation-api-ng&var-backend=recommendation-api-ng-main.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[16:13:44] RESOLVED: LiftWingServiceErrorRate: ...
[16:13:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=recommendation-api-ng&var-backend=recommendation-api-ng-main.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[16:37:49] (PS1) Sbisson: Extract and finetune HTTPX config [research/recommendation-api] - https://gerrit.wikimedia.org/r/1216830
[16:41:24] (CR) Nik Gkountas: [C:+2] Extract and finetune HTTPX config [research/recommendation-api] - https://gerrit.wikimedia.org/r/1216830 (owner: Sbisson)
[16:42:03] (Merged) jenkins-bot: Extract and finetune HTTPX config [research/recommendation-api] - https://gerrit.wikimedia.org/r/1216830 (owner: Sbisson)
[17:08:51] Machine-Learning-Team, MediaWiki-extensions-ORES, MediaWiki-Recent-changes, PersonalDashboard, and 3 others: Enable revertrisk filters in thwiki - https://phabricator.wikimedia.org/T409438#11444656 (Samwalton9-WMF)
[21:52:24] (CR) Jforrester: [C:+2] Use PHP8 constructor property promotion syntax for dependency injection [extensions/ORES] - https://gerrit.wikimedia.org/r/1216820 (owner: Umherirrender)
[22:07:28] (Merged) jenkins-bot: Use PHP8 constructor property promotion syntax for dependency injection [extensions/ORES] - https://gerrit.wikimedia.org/r/1216820 (owner: Umherirrender)