[05:22:50] Machine-Learning-Team, Gerrit, Language-Team, serviceops-radar: Create Gerrit repository for /services/machinetranslation and migrate code from Gitlab - https://phabricator.wikimedia.org/T331256 (KartikMistry)
[06:53:37] Machine-Learning-Team, serviceops-radar, Language-Team (Language-2023-January-March): Hosting machine request for machine translation - https://phabricator.wikimedia.org/T329971 (KartikMistry)
[06:53:54] Machine-Learning-Team, Gerrit, serviceops-radar, Language-Team (Language-2023-January-March): Create Gerrit repository for /services/machinetranslation and migrate code from Gitlab - https://phabricator.wikimedia.org/T331256 (KartikMistry)
[08:19:10] Machine-Learning-Team, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (fgiunchedi)
[08:22:00] hello folks
[08:22:32] I have almost everything ready for kserve 0.10, and I see 0.10.1 :D
[08:22:40] https://github.com/kserve/kserve/releases
[08:23:12] from the changelog I think that we could wait, nothing really urgent in my opinion (I thought some horror security release but it seems not the case)
[08:29:12] hello!!!
[08:32:11] elukey: I'll write up some unit tests and submit the patch for calls with aiohttp later
[08:32:34] super!
[08:32:41] I am adding people to the kserve 0.10 code reviews
[08:33:22] ack
[08:45:49] Machine-Learning-Team, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (MoritzMuehlenhoff)
[09:04:52] (CR) Ilias Sarantopoulos: [C: +1] "LGTM!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894006 (https://phabricator.wikimedia.org/T329032) (owner: Elukey)
[09:05:21] (CR) Ilias Sarantopoulos: [C: +1] "LGTM!"
[machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894005 (https://phabricator.wikimedia.org/T329032) (owner: Elukey)
[09:06:18] \o
[09:06:28] (CR) Ilias Sarantopoulos: [C: +1] "LGTM!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894007 (https://phabricator.wikimedia.org/T329032) (owner: Elukey)
[09:06:29] elukey: https://slimtoolkit.org This might prove interesting at some point.
[09:06:33] Nice work Luca!
[09:09:20] (CR) Klausman: [C: +1] nsfw: upgrade to Kserve 0.10 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894006 (https://phabricator.wikimedia.org/T329032) (owner: Elukey)
[09:09:50] (CR) Klausman: [C: +1] revert-risk: upgrade to Kserve 0.10 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894005 (https://phabricator.wikimedia.org/T329032) (owner: Elukey)
[09:10:10] (CR) Klausman: [C: +1] revscoring: remove unnecessary aiohttp session cleanup [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894004 (https://phabricator.wikimedia.org/T329032) (owner: Elukey)
[09:12:07] (CR) Klausman: [C: +1] outlink: upgrade to Kserve 0.10 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894007 (https://phabricator.wikimedia.org/T329032) (owner: Elukey)
[09:16:07] thanks for the reviews!
[09:21:03] (CR) Kevin Bazira: [C: +1] nsfw: upgrade to Kserve 0.10 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894006 (https://phabricator.wikimedia.org/T329032) (owner: Elukey)
[09:26:06] (CR) Kevin Bazira: [C: +1] revscoring: remove unnecessary aiohttp session cleanup [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894004 (https://phabricator.wikimedia.org/T329032) (owner: Elukey)
[09:26:46] https://github.com/kserve/kserve/pull/2718/files is very interesting
[09:26:49] isaranto: --^
[09:26:59] we could try it in the future and see how it goes
[09:29:06] klausman: https://github.com/kserve/kserve/issues/2257 this is interesting for the api-gateway context, not sure if better/worse from what we have
[09:29:13] (new things in 0.10.1)
[09:29:36] checking
[09:29:37] (CR) Elukey: [C: +2] revscoring: remove unnecessary aiohttp session cleanup [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894004 (https://phabricator.wikimedia.org/T329032) (owner: Elukey)
[09:29:48] Ack
[09:34:37] elukey: it does look interesting. Not that we strictly _need_ path-based routing. It would have simplified the APIGW effort, probably. Maybe. Not sure how much benefit it would be today.
[09:37:02] it is an option for the future, if needed
[09:37:19] Aye.
[09:38:58] (Merged) jenkins-bot: revscoring: remove unnecessary aiohttp session cleanup [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894004 (https://phabricator.wikimedia.org/T329032) (owner: Elukey)
[09:48:05] Machine-Learning-Team, serviceops-radar, Language-Team (Language-2023-January-March): Hosting machine request for machine translation - https://phabricator.wikimedia.org/T329971 (akosiaris) I had a chat with @elukey on Friday regarding this. To summarize, nothing of the above changes and for now, we...
[09:49:36] Machine-Learning-Team, Gerrit, serviceops-radar, Language-Team (Language-2023-January-March), Patch-For-Review: Create Gerrit repository for /services/machinetranslation and migrate code from Gitlab - https://phabricator.wikimedia.org/T331256 (hashar) I have created the Gerrit repo `mediawiki...
[09:51:03] isaranto: o/ there seems to be an interesting change in how blubber works now, not sure if it is due to the buildx layout or not
[09:51:22] we have source rules in all docker images like: python/*.py
[09:51:41] previously it didn't work recursively, so the "revscoring" subdir wasn't picked up
[09:52:04] but now I see that all images are rebuilt if we change any file in that dir
[09:52:30] see https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/894004/
[09:53:56] Is there an exclude directive or something like that?
[09:55:39] Hmm I wasn't aware of the previous behavior.. going to take a look
[09:56:02] But yeah if we have specific instructions it doesn't make sense to rebuild for every change
[09:56:35] maybe I am misremembering, the blubber copy works fine, I don't see the revscoring .py files on other docker images
[10:02:46] Machine-Learning-Team, Patch-For-Review: Implement new mediawiki.revision-score streams with Lift Wing - https://phabricator.wikimedia.org/T328576 (Ottomata) > Does it make sense if we create version 3.x of the revision-score schema Yes that makes sense. And since these are new streams anyway, there is...
[10:10:04] Machine-Learning-Team, Patch-For-Review: Upgrade Kserve's k8s control plane to 0.10 - https://phabricator.wikimedia.org/T331114 (elukey)
[10:10:16] Machine-Learning-Team, Patch-For-Review: Upgrade Kserve's k8s control plane to 0.10 - https://phabricator.wikimedia.org/T331114 (elukey) a:elukey
[10:12:55] ml-staging-codfw has the new control plane (0.10), going to test it in a bit with some deployments
[10:13:14] (CR) Elukey: [C: +2] revert-risk: upgrade to Kserve 0.10 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894005 (https://phabricator.wikimedia.org/T329032) (owner: Elukey)
[10:18:05] (Merged) jenkins-bot: revert-risk: upgrade to Kserve 0.10 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894005 (https://phabricator.wikimedia.org/T329032) (owner: Elukey)
[10:42:51] mmm something is wrong with revert risk, I forgot to add the python3-distutils package in its blubber config
[10:42:56] but now when I test it I see
[10:43:07] File "/opt/lib/python/site-packages/knowledge_integrity/models/revertrisk.py", line 11, in
[10:43:10] import xgboost as xgb # type: ignore
[10:43:12] ModuleNotFoundError: No module named 'xgboost'
[10:43:19] we have
[10:43:20] knowledge_integrity[revertrisk-multilingual] @ git+https://gitlab.wikimedia.org/repos/research/knowledge_integrity.git
[10:43:31] so I guess that we always get the latest and greatest
[10:43:41] That sounds dangerous
[10:43:43] but the repo uses poetry, not sure if pip likes it
[10:43:54] multi-lingual is still experimental so.. borderline :D
[10:50:42] mmm in theory from https://github.com/python-poetry/poetry/issues/321 pip should support poetry
[10:57:19] also I see some nvidia packages..
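Note on the "latest and greatest" concern at 10:43: a pip direct reference of the form `name @ git+URL` with no `@ref` suffix installs whatever the repository's default branch points to at build time, whereas `git+URL@<commit>` fixes the revision. The following is only a minimal sketch of how a requirement line could be checked for a commit pin. The helper names `git_ref` and `is_pinned_to_commit` are hypothetical, and the parsing assumes the `name @ git+...` spacing shown in the log rather than the full PEP 508 grammar.

```python
import re
from typing import Optional


def git_ref(requirement: str) -> Optional[str]:
    """Return the ref in a 'name @ git+URL[@ref]' line, or None if there is none.

    Hypothetical helper: only handles the 'name @ git+...' spacing seen above.
    """
    _, sep, url = requirement.partition("@ git+")
    if not sep:
        raise ValueError("not a git direct reference")
    # Drop any '#egg=...' style fragment and surrounding whitespace.
    url = url.split("#", 1)[0].strip()
    # Skip the scheme and the authority (which may itself contain 'user@host').
    _, _, rest = url.partition("://")
    _, _, path = rest.partition("/")
    _, at, ref = path.partition("@")
    return ref if at else None


def is_pinned_to_commit(requirement: str) -> bool:
    """True only when the ref looks like a full 40-character commit hash."""
    ref = git_ref(requirement)
    return ref is not None and re.fullmatch(r"[0-9a-f]{40}", ref) is not None


unpinned = (
    "knowledge_integrity[revertrisk-multilingual] @ "
    "git+https://gitlab.wikimedia.org/repos/research/knowledge_integrity.git"
)
print(is_pinned_to_commit(unpinned))
```

A branch or tag ref would still count as unpinned here, since both can move; that strictness is a design choice of this sketch, not something decided in the channel.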
[10:57:20] sigh
[10:57:23] ok i'll open a task
[10:57:27] It's like Linux distros in 1998: too damn many packaging systems :)
[11:26:45] * klausman lunch
[11:27:53] I also get
[11:27:53] AttributeError: module '__main__' has no attribute 'MultilingualRevertRiskModel'
[11:28:08] that is probably https://gitlab.wikimedia.org/repos/research/knowledge_integrity/-/blob/mykola/multilingual_user_independent/knowledge_integrity/models/revertrisk_multilingual/model.py#L386
[11:28:12] mmm
[11:28:17] ok I am going to lunch, then I'll try to work on RR :)
[11:28:22] * elukey lunch
[11:54:02] sry was afk for a while. finished with my patch. shall I help with RR? (or even take it on if u have other stuff to do)
[12:11:17] just dealing with CI a bit
[12:23:38] o/ I'll have a look on the RR problem
[13:01:17] (PS11) Ilias Sarantopoulos: feat: Create a migration endpoint between LiftWing/ORES [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/892998 (https://phabricator.wikimedia.org/T330414)
[13:02:36] (CR) Ilias Sarantopoulos: feat: Create a migration endpoint between LiftWing/ORES (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/892998 (https://phabricator.wikimedia.org/T330414) (owner: Ilias Sarantopoulos)
[13:04:34] there it is --^
[13:05:01] thinking of adding a Readme markdown on how to run etc
[13:06:43] hope the size of the patch is not intimidating. actual new code is approx 150 lines.
the rest is configuration and unit tests
[13:24:13] Machine-Learning-Team, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (BTullis)
[13:24:17] also added ci pipelines for the above https://gerrit.wikimedia.org/r/c/integration/config/+/894640
[13:25:15] taking a break and planning on continuing my work checking on ORES stuff (bots, usage etc)
[13:25:41] elukey: if u need me to do stuff around kserve 0.10 lemme know, I'm always available
[13:41:50] isaranto: are you planning on adding Prom metrics to the migration endpoint? Or does FastAPI do that automagically?
[13:55:50] (CR) Hashar: "recheck after CI change https://gerrit.wikimedia.org/r/c/integration/config/+/894640" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/892998 (https://phabricator.wikimedia.org/T330414) (owner: Ilias Sarantopoulos)
[13:56:13] (CR) CI reject: [V: -1] feat: Create a migration endpoint between LiftWing/ORES [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/892998 (https://phabricator.wikimedia.org/T330414) (owner: Ilias Sarantopoulos)
[13:58:41] klausman: ofc we should. It doesn't add them automatically afaik.
but I need to check how prom metrics are handled in other apps in wmf as I am unaware (I mean if there are specific standards etc)
[13:59:24] (CR) Ilias Sarantopoulos: "recheck" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/892998 (https://phabricator.wikimedia.org/T330414) (owner: Ilias Sarantopoulos)
[14:07:52] (PS12) Ilias Sarantopoulos: feat: Create a migration endpoint between LiftWing/ORES [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/892998 (https://phabricator.wikimedia.org/T330414)
[14:11:39] now it works 🎉
[14:21:54] Elukey I am going to try to make the meeting on T329071 but I might still be dropping my kids off at school
[14:34:33] Machine-Learning-Team, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (BTullis)
[14:36:24] isaranto: o/ back, don't worry for RR, it was more a braindump, I'll research a bit with Aiko :)
[14:36:31] chrisalbon: ack! No problem if you can't make it
[14:37:02] Also good morning all!
[14:37:10] Elukey thanks
[14:37:21] moooorning!
[14:37:36] isaranto: re prometheus - we usually add specific annotations to pods about what port/path they are served on, and the prometheus master gets them automatically
[14:38:00] noiice
[14:38:14] aiko: o/
[14:39:22] elukey: also if u want to discuss about asyncio ping me anytime.
perhaps after u take a look at the patch
[14:39:33] yes yes definitely
[14:41:58] aiko: so this is the error that I am getting with RR-multi-lingual:
[14:41:59] https://phabricator.wikimedia.org/P45052
[14:42:09] (PS13) Ilias Sarantopoulos: feat: Create a migration endpoint between LiftWing/ORES [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/892998 (https://phabricator.wikimedia.org/T330414)
[14:42:10] Machine-Learning-Team, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (MatthewVernon)
[14:42:39] is it me not using the correct model? (maybe)
[14:43:34] also I had to add xgboost to requirements.txt, I see it as an extra in the KI repo
[14:43:48] we should probably pin the docker images to specific commits
[14:54:39] (PS14) Ilias Sarantopoulos: feat: Create a migration endpoint between LiftWing/ORES [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/892998 (https://phabricator.wikimedia.org/T330414)
[14:55:05] added full instructions in README.md on how to run locally or from a statbox --^
[14:56:22] nice!
[15:00:43] elukey: o/ looks like you didn't use the correct model
[15:01:54] aiko: same thing for xgboost?
[15:02:19] are you testing the revert-risk model? or the multilingual one?
[15:02:33] https://docker-registry.wikimedia.org/wikimedia/machinelearning-liftwing-inference-services-revertrisk/tags/
[15:02:47] or https://docker-registry.wikimedia.org/wikimedia/machinelearning-liftwing-inference-services-revertrisk-multilingual/tags/ ?
[15:02:59] aiko: I am building multi-lingual from blubber
[15:03:43] the multilingual model doesn't need xgboost
[15:04:10] they use a different lib called catboost
[15:04:20] you can see https://gitlab.wikimedia.org/repos/research/knowledge_integrity/-/blob/main/pyproject.toml#L23
[15:05:07] trying to rebuild without it, I don't find the original issue
[15:07:40] elukey: https://phabricator.wikimedia.org/P45052 > Line 8 uses /opt/lib/python/site-packages/knowledge_integrity/models/revertrisk.py, that's not right if you're testing the multilingual one
[15:08:35] should use https://gitlab.wikimedia.org/repos/research/knowledge_integrity/-/blob/main/knowledge_integrity/models/revertrisk_multilingual/model.py
[15:09:23] have you set the MODEL_NAME to revertrisk-multilingual
[15:10:15] ah the env variable?
[15:10:20] ok probably I missed it
[15:12:02] yeah that was it
[15:12:06] thanks aiko :)
[15:12:33] nice :)
[15:12:49] (PS3) Elukey: nsfw: upgrade to Kserve 0.10 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894006 (https://phabricator.wikimedia.org/T329032)
[15:12:51] (PS4) Elukey: outlink: upgrade to Kserve 0.10 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894007 (https://phabricator.wikimedia.org/T329032)
[15:12:53] (PS1) Elukey: blubber: add python3-distutils to revert-risk configs [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894663 (https://phabricator.wikimedia.org/T329032)
[15:13:13] aiko: the only thing missing is https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/894663
[15:13:21] needed by kserve, the rest now works :)
[15:14:24] (CR) AikoChou: [C: +1] blubber: add python3-distutils to revert-risk configs [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894663 (https://phabricator.wikimedia.org/T329032) (owner: Elukey)
[15:15:47] thanks :)
[15:17:56] (CR) Elukey: [C:
+2] blubber: add python3-distutils to revert-risk configs [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894663 (https://phabricator.wikimedia.org/T329032) (owner: Elukey)
[15:20:54] (Merged) jenkins-bot: blubber: add python3-distutils to revert-risk configs [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894663 (https://phabricator.wikimedia.org/T329032) (owner: Elukey)
[15:26:30] (CR) AikoChou: [C: +1] outlink: upgrade to Kserve 0.10 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894007 (https://phabricator.wikimedia.org/T329032) (owner: Elukey)
[15:26:54] (CR) AikoChou: [C: +1] nsfw: upgrade to Kserve 0.10 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894006 (https://phabricator.wikimedia.org/T329032) (owner: Elukey)
[15:27:26] aiko: <3 thanks!
[15:32:50] (CR) Elukey: [C: +2] nsfw: upgrade to Kserve 0.10 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894006 (https://phabricator.wikimedia.org/T329032) (owner: Elukey)
[15:34:19] (Merged) jenkins-bot: nsfw: upgrade to Kserve 0.10 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894006 (https://phabricator.wikimedia.org/T329032) (owner: Elukey)
[15:36:59] Machine-Learning-Team, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (Jelto)
[15:45:43] (PS5) Elukey: outlink: upgrade to Kserve 0.10 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894007 (https://phabricator.wikimedia.org/T329032)
[15:47:02] (CR) Elukey: "Added the python3-distutils dep in the last patch, missed it while testing, the rest looks the same :)" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894007 (https://phabricator.wikimedia.org/T329032)
(owner: Elukey)
[15:52:55] * isaranto afk for a short walk
[16:03:47] Machine-Learning-Team, Add-Link, Growth-Team, User-notice: Deploy "add a link" to 12th round of wikis - https://phabricator.wikimedia.org/T308137 (kevinbazira) a:kevinbazira
[16:32:44] Machine-Learning-Team, ORES, Advanced-Search, All-and-every-Wikisource, and 73 others: Remove unnecessary targets definitions - https://phabricator.wikimedia.org/T328497 (MPhamWMF)
[16:37:01] (PS6) Elukey: WIP - outlink: upgrade to Kserve 0.10 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894007 (https://phabricator.wikimedia.org/T329032)
[16:37:28] isaranto: o/
[16:38:02] I found something weird when testing outlink, preprocess/predict/postprocess now take the headers attribute as well
[16:38:17] see for example https://github.com/kserve/kserve/pull/2713/files
[16:38:26] have you found the same problem with revscoring?
[16:39:08] ah yes snap, then it is me not testing things right
[16:39:11] sigh nevermind
[16:41:47] (PS7) Elukey: outlink: upgrade to Kserve 0.10 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894007 (https://phabricator.wikimedia.org/T329032)
[16:41:49] (PS1) Elukey: nsfw,revert-risk: update method signature for Kserve 0.10 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894695 (https://phabricator.wikimedia.org/T329032)
[16:42:01] * elukey hides in shame for the testing done horribly
[16:44:14] (CR) Elukey: "Added some changes in method signatures after testing, sorry folks!"
[machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894007 (https://phabricator.wikimedia.org/T329032) (owner: Elukey)
[16:45:59] (CR) CI reject: [V: -1] nsfw,revert-risk: update method signature for Kserve 0.10 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894695 (https://phabricator.wikimedia.org/T329032) (owner: Elukey)
[16:46:19] Back
[16:46:47] elukey: ack!
[16:47:32] (PS2) Elukey: nsfw,revert-risk: update method signature for Kserve 0.10 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894695 (https://phabricator.wikimedia.org/T329032)
[16:47:51] hopefully the last changes
[16:48:38] * isaranto gives elukey a pat on the back as he has done similar things many times
[16:48:41] sometimes I really miss some static typing
[16:48:45] 😃
[16:50:00] We can enforce some static typing with mypy (or at least emulate static typing behavior as it is not a thing for interpreted languages)
[16:58:06] (CR) Ilias Sarantopoulos: [C: +1] nsfw,revert-risk: update method signature for Kserve 0.10 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894695 (https://phabricator.wikimedia.org/T329032) (owner: Elukey)
[16:59:12] elukey: my apologies I should have caught the above --^ but it has been some while since I did the changes for revscoring... Should have documented the changes required going for kserve 0.10.
note taken
[17:00:23] (CR) Ilias Sarantopoulos: [C: +1] outlink: upgrade to Kserve 0.10 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894007 (https://phabricator.wikimedia.org/T329032) (owner: Elukey)
[17:02:31] isaranto: nono please I didn't test it correctly, and found out only later, it is completely my bad :)
[17:02:49] sometimes I try to do too many things at once and this is the result :)
[17:13:23] I realized that we are not testing the nsfw model in httpbb
[17:13:24] mmm
[17:13:31] I am testing it locally and it seems hanging
[17:14:29] (CR) Elukey: [C: +2] outlink: upgrade to Kserve 0.10 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894007 (https://phabricator.wikimedia.org/T329032) (owner: Elukey)
[17:15:40] lemme check.. 👀
[17:17:03] ok remembered. I never tested it cause there is an issue with apple silicon (`qemu: uncaught target signal 6 (Aborted) - core dumped`). was waiting to test it on ml-staging directly
[17:18:13] (Merged) jenkins-bot: outlink: upgrade to Kserve 0.10 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/894007 (https://phabricator.wikimedia.org/T329032) (owner: Elukey)
[17:19:34] isaranto: ah yes also the input required is a b64 image that may be rather huge :D
[17:20:07] I am trying to see if upgrading tensorflow would help.
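Note on the signature change discussed from 16:38 onward (preprocess/predict/postprocess also receiving headers): the sketch below illustrates why a hook written for the older signature breaks once the caller starts passing headers, and how type hints give a checker such as mypy a chance to flag the mismatch before runtime. It deliberately does not import kserve; `BaseModel`, `OldStyleModel`, `NewStyleModel` and `call_like_server` are hypothetical stand-ins, and the exact way the real server passes headers is an assumption.

```python
from typing import Any, Dict, Optional

Headers = Optional[Dict[str, str]]


class BaseModel:
    """Hypothetical stand-in for the framework base class (not kserve itself)."""

    def predict(self, request: Dict[str, Any], headers: Headers = None) -> Dict[str, Any]:
        raise NotImplementedError


class OldStyleModel(BaseModel):
    # Written against the older signature: no 'headers' parameter.
    # A type checker is expected to report this override as incompatible.
    def predict(self, request: Dict[str, Any]) -> Dict[str, Any]:  # type: ignore[override]
        return {"predictions": request}


class NewStyleModel(BaseModel):
    # Updated signature: accepts the optional headers argument.
    def predict(self, request: Dict[str, Any], headers: Headers = None) -> Dict[str, Any]:
        return {"predictions": request}


def call_like_server(model: BaseModel, request: Dict[str, Any], headers: Headers) -> Dict[str, Any]:
    """Assumed calling convention: the server always passes headers to the hook."""
    return model.predict(request, headers)


print(call_like_server(NewStyleModel(), {"rev_id": 1}, {"content-type": "application/json"}))

try:
    call_like_server(OldStyleModel(), {"rev_id": 1}, {})
except TypeError as exc:
    # The old signature only fails when the hook is actually invoked.
    print(f"old-style hook failed: {exc}")
```

The `# type: ignore[override]` comment is there only so the sketch itself passes a type check; removing it is what would surface the problem statically.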
[17:20:49] as far as I understand we can try the model with whatever image
[17:20:57] I mean for QA reasons
[17:24:20] Machine-Learning-Team, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (herron)
[17:27:39] ok so in staging it is super quick
[17:28:25] not sure, will restart tomorrow :)
[17:30:02] I'm checking as well
[17:30:44] going afk for today folks, see you tomorrow o/
[17:33:19] \o
[17:35:21] Machine-Learning-Team, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (ssingh)
[17:46:01] bye luka!
[17:46:27] unfortunately just bumping tensorflow doesn't help 💫
[18:09:59] tested nsfw on staging and both prod clusters. works fine!
[18:10:43] I encoded a random image (MNIST 😛) to base64 and tried it. tomorrow I'll add some httpbb tests for it
[19:08:46] added the tests here. tested on all clusters, work fine! -> https://gerrit.wikimedia.org/r/c/operations/puppet/+/894714
[19:08:50] cya tomorrow!
[19:15:18] Machine-Learning-Team, Add-Link, Growth-Team (Current Sprint), User-notice: Deploy "add a link" to 6th round of wikis - https://phabricator.wikimedia.org/T304550 (Trizek-WMF) All models work fine except: * **cbk-zam**: search returns add a link results, but the API returns "Unable to process req...
[20:02:03] Machine-Learning-Team, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (herron)
[23:19:51] Machine-Learning-Team, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=786ee8c7-4753-4e2d-96f9-8b55b691ff09) set by bking@cumin2002 for 1 day...
[23:20:56] Machine-Learning-Team, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=f9f1bd07-4af1-41e3-82b7-3ab0f2ff8672) set by bking@cumin2002 for 1 day...
[23:22:21] Machine-Learning-Team, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (bking)
[23:25:12] Machine-Learning-Team, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (RKemper)
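Note on the nsfw test input mentioned at 17:19 and 18:10 (a base64-encoded image): the following is a minimal sketch of how image bytes could be turned into a JSON request body using only the standard library. The payload shape (`instances` with an `image` key) and the helper name `build_payload` are assumptions for illustration; the log does not say what the nsfw model actually expects.

```python
import base64
import json


def build_payload(image_bytes: bytes) -> str:
    """Encode raw image bytes as base64 and wrap them in a JSON body.

    The {"instances": [{"image": ...}]} layout is a hypothetical example.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"instances": [{"image": encoded}]})


# Placeholder bytes stand in for a real image file read with open(path, "rb").
fake_image = b"\x89PNG placeholder bytes"
payload = build_payload(fake_image)

# Base64 grows the data by roughly a third, which is why a large image
# makes for a rather huge request body.
print(len(fake_image), len(payload))
```

Decoding the `image` field with `base64.b64decode` on the receiving side returns the original bytes unchanged.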