[04:31:26] Machine-Learning-Team: Host WikiGPT on Toolforge - https://phabricator.wikimedia.org/T328398 (kevinbazira)
[07:55:28] Machine-Learning-Team: get a GPU on Lift Wing - https://phabricator.wikimedia.org/T327923 (elukey) >>! In T327923#8571148, @Isaac wrote: > Super excited by this given that Research has been exploring more advanced transformer models that strongly benefit from GPUs not just as training but at prediction time...
[07:55:47] Machine-Learning-Team: Get a GPU on Lift Wing - https://phabricator.wikimedia.org/T327923 (elukey)
[08:26:53] Machine-Learning-Team, Patch-For-Review: Upgrade ml clusters to kserve 0.9 - https://phabricator.wikimedia.org/T325528 (elukey) The revscoring model servers have been migrated to kserve 0.10 to circumvent some issues with numpy, and avoid retraining all the models. The next steps are the following: 1)...
[09:05:36] hey folks the new revscoring images are deployed on all revscoring model servers in staging :)
[09:12:35] nice! from my side we are ready for the python upgrade as well (now that we have a way to test all the endpoints)
[09:12:55] since everything is going to be deployed together we can wait to test it a bit
[09:26:28] isaranto: I added a comment to the kserve 0.9 upgrade task - I think that we could finish the migration of the model servers to 0.9 (without py upgrade), upgrade k8s and then finish the py 3.9 upgrade (and possibly fully migrate to kserve 0.10)
[09:26:32] how does it sound?
[09:26:44] trying to line up the upgrades to have also some stability
[09:27:35] (basically rr and outlink are missing from 0.9 IIRC)
[09:27:54] for the rest of the model servers, yes. for revscoring we should release it now with kserve 0.9 since we have already merged everything and it works
[09:28:23] with 0.10 ?
[09:28:36] if so yes +1, I added a specific mention to it
[09:28:42] I'm going to create the tickets for upgrading python in rr and outlink to keep track in the future
[09:28:51] super thanks
[09:28:54] yes, exactly as u wrote in the comment.
[09:29:40] also on httpbb I'm doing the changes Lazarus requested - add a json_payload field in yaml
[09:29:51] saw it thanks!
[09:30:05] really nice that things are flowing between us and sre
[09:30:29] <3
[09:38:44] I tested all staging servers and they serve fine :smil
[09:38:50] 😄
[09:38:57] Machine-Learning-Team, artificial-intelligence, revscoring: Update revscoring dependencies to fix security reports - https://phabricator.wikimedia.org/T325366 (elukey) Stalled→Open a:isarantopoulos Task completed by Ilias as part of the above task :)
[09:39:45] Machine-Learning-Team: Remove hack from ML's blubber files - https://phabricator.wikimedia.org/T324658 (elukey) Done! Thanks Aiko and Ilias :)
[09:41:40] super
[10:04:42] (PS1) Elukey: Update Revert Risk's requirements.txt to support kserve 0.9 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/885286 (https://phabricator.wikimedia.org/T325528)
[10:05:04] all right logging off folks, will check later for pings o/
[11:28:05] * klausman lunch
[12:38:30] (CR) Kevin Bazira: [C: +1] Avoid sharing the same aiohttp session in rr and outlink [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/884924 (owner: Elukey)
[12:54:10] (PS8) Ilias Sarantopoulos: test: liftwing manual testing on deployment server [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/884292 (https://phabricator.wikimedia.org/T327787)
[12:54:49] (PS9) Ilias Sarantopoulos: test: liftwing manual testing on deployment server [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/884292 (https://phabricator.wikimedia.org/T327787)
[12:56:43] (CR) Ilias Sarantopoulos: test: liftwing manual testing on deployment server (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/884292 (https://phabricator.wikimedia.org/T327787) (owner: Ilias Sarantopoulos)
[12:57:39] (PS10) Ilias Sarantopoulos: test: liftwing manual testing on deployment server [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/884292 (https://phabricator.wikimedia.org/T327787)
[13:05:03] (PS11) Ilias Sarantopoulos: test: liftwing manual testing on deployment server [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/884292 (https://phabricator.wikimedia.org/T327787)
[13:11:35] (CR) AikoChou: "I'd like to test this. I think outlink and rr may not suffer from the same problem as revscoring models did, because outlink and rr update" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/884924 (owner: Elukey)
[13:18:41] Machine-Learning-Team: Test revscoring model servers on Lift Wing - https://phabricator.wikimedia.org/T323624 (isarantopoulos) A brief description on how to enable MP has been added on LiftWing's Wikitech page along with a link to this task https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/KServe
[13:20:44] Machine-Learning-Team, Patch-For-Review: [revscoring] Upgrade python from 3.7 to 3.9 in docker images - https://phabricator.wikimedia.org/T325657 (isarantopoulos)
[13:23:43] Machine-Learning-Team: [outlink] Upgrade python from 3.7 to 3.9 in docker images - https://phabricator.wikimedia.org/T328438 (isarantopoulos)
[13:23:55] Machine-Learning-Team, Patch-For-Review: [revscoring] Upgrade python from 3.7 to 3.9 in docker images - https://phabricator.wikimedia.org/T325657 (isarantopoulos)
[13:25:24] Machine-Learning-Team: [revertrisk] Upgrade python from 3.7 to 3.9 in docker images - https://phabricator.wikimedia.org/T328439 (isarantopoulos)
[13:26:10] I split the python upgrade task and added the ones for outlink and revertrisk to the liftwing backlog
[13:56:34] Machine-Learning-Team: Get a GPU on Lift Wing - https://phabricator.wikimedia.org/T327923 (isarantopoulos) I'm trying to find whether kserve supports sharing GPU among model servers. What seems promising on this topic is the [[ https://kserve.github.io/website/0.10/modelserving/mms/modelmesh/overview/ | Mode...
[13:58:46] Machine-Learning-Team, Infrastructure-Foundations, SRE-tools, Patch-For-Review: httpbb with HTTP POSTs and json payload - https://phabricator.wikimedia.org/T328280 (isarantopoulos) a:isarantopoulos
[13:59:13] Machine-Learning-Team, Infrastructure-Foundations, SRE-tools: httpbb doesn't support integers in the POST's body - https://phabricator.wikimedia.org/T328120 (isarantopoulos) a:isarantopoulos
[14:03:27] Machine-Learning-Team, Infrastructure-Foundations, SRE-tools, Patch-For-Review: httpbb with HTTP POSTs and json payload - https://phabricator.wikimedia.org/T328280 (isarantopoulos) After discussing during the review with @RLazarus we went with the second approach. In the aforementioned patch the...
[14:11:37] Morning all
[14:14:27] o/
[14:32:05] Machine-Learning-Team: Host WikiGPT on Toolforge - https://phabricator.wikimedia.org/T328398 (kevinbazira) WikiGPT is now up and running. You can see it here: https://wiki-gpt.toolforge.org/
[14:41:13] WoW kevinbazira: great work!
[14:47:49] Thanks isaranto: chrisalbon did most of the work :)
[14:47:56] since I am not that familiar with tool-forge...Any other sources other than the wikitech page? https://wikitech.wikimedia.org/wiki/Help:Toolforge
[14:48:30] actually I'm trying to find where the code for this app lives.
[14:48:58] I've worked with it before and wrote documentation that I still follow to date: https://phabricator.wikimedia.org/T282429#7074243
[14:49:09] from the wikitech page I ssh in toolforge and then hit `become TOOL_NAME` but can't find WikiGPT in there
[14:50:06] aa nevermind it is `wiki-gpt`. thanks a lot for the phab resource 🤗
[14:50:57] if and when u can, could u add me to the group `tools.wiki-gpt`?
[14:53:28] Machine-Learning-Team: Host WikiGPT on Toolforge - https://phabricator.wikimedia.org/T328398 (Zache) Is there some documentation on what is WikiGPT and how it is technically implemented? In a perfect world with technical information, I mean how I can install it locally. :)
[14:54:37] great. please share your toolforge username so that I can add you as a maintainer.
[14:54:46] isaranto
[14:56:26] When I enter "isaranto", toolforge says "no results found". Could you please create a toolforge account if you haven't yet.
[14:57:35] hmm it could be Ilias Sarantopoulos then
[14:57:50] Now trying "Ilias Sarantopoulos"
[14:58:18] done, you've been added as a maintainer
[14:58:58] please don't put the code on GitHub as it has private keys we don't want in public
[14:59:33] definitely, thanks!
[15:03:14] o/ I'm afraid I need to skip today's meeting and rest earlier due to my eyes. See you tmr folks!
[15:04:05] No problem
[15:04:12] hope you get well soon.
[15:33:00] Machine-Learning-Team, Infrastructure-Foundations, SRE-tools: httpbb doesn't support integers in the POST's body - https://phabricator.wikimedia.org/T328120 (isarantopoulos) @elukey I closed this task since your change has already been merged and deployed.
[15:34:10] Machine-Learning-Team, artificial-intelligence, revscoring: Update revscoring dependencies to fix security reports - https://phabricator.wikimedia.org/T325366 (calbon) Open→Resolved
[15:34:22] Machine-Learning-Team: Host WikiGPT on Toolforge - https://phabricator.wikimedia.org/T328398 (calbon) Open→Resolved
[15:34:41] Machine-Learning-Team: Test revscoring model servers on Lift Wing - https://phabricator.wikimedia.org/T323624 (calbon) Open→Resolved
[15:34:44] Machine-Learning-Team, Patch-For-Review: Test ML model-servers with Benthos - https://phabricator.wikimedia.org/T320374 (calbon)
[15:35:00] Machine-Learning-Team: Create a pre-commit hook for inference-services repo - https://phabricator.wikimedia.org/T325198 (calbon) Open→Resolved
[16:10:45] Machine-Learning-Team, DBA, Data-Engineering-Planning, Data-Persistence, and 10 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (EChetty)
[16:14:06] Machine-Learning-Team, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (EChetty)
[16:53:11] heading out now \o
[17:10:51] night Tobias!
[17:16:28] Machine-Learning-Team, Infrastructure-Foundations, SRE-tools: httpbb doesn't support integers in the POST's body - https://phabricator.wikimedia.org/T328120 (Aklapper) @isarantopoulos: Hi, this task is still open. If this task is resolved, please set the task status to `resolved`. Thanks a lot!
[17:33:10] Machine-Learning-Team, Infrastructure-Foundations, SRE-tools: httpbb doesn't support integers in the POST's body - https://phabricator.wikimedia.org/T328120 (RLazarus) Open→Resolved
[19:15:11] Machine-Learning-Team, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (colewhite)
[22:05:02] I think the wikigpt demo is getting overwhelmed by traffic. I guess that is a good problem to have.
[22:16:58] Machine-Learning-Team: WikiGPT Experiment - https://phabricator.wikimedia.org/T328494 (calbon)
[22:20:26] Machine-Learning-Team: WikiGPT Experiment - https://phabricator.wikimedia.org/T328494 (calbon)
[22:20:58] Machine-Learning-Team: WikiGPT Experiment - https://phabricator.wikimedia.org/T328494 (calbon) Also can someone add me to the toolforge group?
[22:24:48] Machine-Learning-Team: WikiGPT Experiment - https://phabricator.wikimedia.org/T328494 (calbon)