[08:02:37] good morning :)
[08:12:31] (PS1) Elukey: Update torch and joblid dependencies [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/868616
[08:17:49] (CR) CI reject: [V: -1] Update torch and joblid dependencies [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/868616 (owner: Elukey)
[08:21:35] knowledge-integrity 0.1.0 depends on torch==1.10.1
[08:21:36] ah lovely
[08:26:30] * elukey files a pull request
[08:28:31] * elukey also learns about poetry
[08:51:33] (CR) Elukey: [C: +1] "LGTM, left a nit but nothing blocking, you decide what's best :)" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/868131 (https://phabricator.wikimedia.org/T325199) (owner: AikoChou)
[08:59:11] opened https://gitlab.wikimedia.org/repos/research/knowledge_integrity/-/merge_requests/10 for the research team
[08:59:25] not too sure who to ping though
[09:20:09] (PS2) Elukey: Update torch and joblid dependencies [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/868616
[09:21:06] (CR) CI reject: [V: -1] Update torch and joblid dependencies [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/868616 (owner: Elukey)
[09:28:04] (PS3) Elukey: Update torch and joblid dependencies [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/868616
[09:41:04] (CR) Ilias Sarantopoulos: [C: +2] revscoring: delete individual revscoring images [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/868433 (https://phabricator.wikimedia.org/T323586) (owner: Ilias Sarantopoulos)
[09:48:18] (Merged) jenkins-bot: revscoring: delete individual revscoring images [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/868433 (https://phabricator.wikimedia.org/T323586) (owner: Ilias Sarantopoulos)
[09:50:29] (PS4) Elukey: Update torch and joblid dependencies [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/868616
[09:55:48] (CR) CI reject: [V: -1] Update torch and joblid dependencies [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/868616 (owner: Elukey)
[09:57:08] ERROR: Could not install packages due to an OSError: [Errno 28] No space left on device: '/tmp/pip-target-au362gg1/lib/python/torch/jit'
[09:57:11] * elukey cries in a corner
[09:57:23] (CR) Elukey: "recheck" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/868616 (owner: Elukey)
[10:04:03] ok now it works :)
[10:19:40] (CR) Kevin Bazira: [V: +1] "LGTM!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/868131 (https://phabricator.wikimedia.org/T325199) (owner: AikoChou)
[10:44:35] \o
[10:44:49] elukey: working on a docker image update for nllb
[10:45:12] klausman: o/ sorry for the brutal news on a friday :(
[10:45:15] ERROR: Cannot install torch==1.13.1+cu116 and torchaudio==0.12.1+cu113 because these package versions have conflicting dependencies
[10:45:18] yaaaay
[10:45:56] ah yes sure, I think we need torch audio with cu116 as well
[10:46:15] no idea what the difference is
[10:47:03] I wish it was easier to find the specific deps of a given pypi package+version
[10:48:31] Machine-Learning-Team, Research: Update torch's settings in the Knowledge Integrity repo - https://phabricator.wikimedia.org/T325349 (elukey)
[10:48:41] Machine-Learning-Team, Research: Update torch's settings in the Knowledge Integrity repo - https://phabricator.wikimedia.org/T325349 (elukey) p:Triage→High
[10:48:52] also opened https://phabricator.wikimedia.org/T325349 for the research team
[10:54:52] elukey: btw, while I'm hacking around in this, I'll also add a --nopush flag to deploy.py
[10:58:44] ack
[11:03:57] Pushing first test with new torch* packages to staging now
[11:08:16] nice :)
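Note on the 10:45 resolver error: the two pins disagree on the build variant (`+cu116` vs `+cu113`) and, most likely also, on the release line, since a torchaudio wheel is built against one specific torch release. The pair that later worked on staging (11:28) was `torch==1.13.1+cu116` with `torchaudio==0.13.1+cu116`. The Python sketch below checks a pin pair for both mismatches before an image is built. `parse_pin` and `torch_pair_problems` are hypothetical helpers, only exact `==` pins are understood, and the "torch 1.N goes with torchaudio 0.N" rule is an assumption to verify against PyTorch's own compatibility table.

```python
import re

# Exact pins such as "torch==1.13.1+cu116": name, numeric release, optional local build tag.
_PIN_RE = re.compile(
    r"^(?P<name>[A-Za-z0-9_.-]+)==(?P<release>[0-9]+(?:\.[0-9]+)*)(?:\+(?P<local>[A-Za-z0-9.]+))?$"
)


def parse_pin(pin):
    """Split an exact '==' pin into (name, release tuple, local tag or None)."""
    match = _PIN_RE.match(pin.strip())
    if match is None:
        raise ValueError(f"not an exact '==' pin: {pin!r}")
    release = tuple(int(part) for part in match.group("release").split("."))
    return match.group("name").lower(), release, match.group("local")


def _fmt(release):
    return ".".join(str(part) for part in release)


def torch_pair_problems(torch_pin, torchaudio_pin):
    """List reasons why a torch/torchaudio pin pair is unlikely to resolve together."""
    _, torch_rel, torch_local = parse_pin(torch_pin)
    _, audio_rel, audio_local = parse_pin(torchaudio_pin)
    problems = []
    # Both wheels should come from the same build variant (cu116, cu113, cpu, ...).
    if torch_local != audio_local:
        problems.append(f"build tags differ: {torch_local} vs {audio_local}")
    # ASSUMPTION: in the torch 1.x series, torchaudio 0.N.x is built against torch 1.N.x.
    if len(torch_rel) >= 2 and torch_rel[0] == 1 and audio_rel[:2] != (0, torch_rel[1]):
        problems.append(
            f"torchaudio {_fmt(audio_rel)} is not the companion of torch {_fmt(torch_rel)}"
        )
    return problems


print(torch_pair_problems("torch==1.13.1+cu116", "torchaudio==0.12.1+cu113"))
print(torch_pair_problems("torch==1.13.1+cu116", "torchaudio==0.13.1+cu116"))
```

The first call reports both mismatches; the second, matching the pair that worked on staging, returns an empty list.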
[11:28:49] elukey: staging with torch==1.13.1+cu116 torchaudio==0.13.1+cu116 looks fine so far. I'll do more testing
[11:39:37] \o/
[11:39:47] super thanks a lot for the quick fix
[11:39:57] going to lunch, lemme know if I can test/help later on
[11:40:33] ack
[12:02:13] o/
[12:03:43] <- lunch
[12:09:38] looking why the pipeline failed https://gitlab.wikimedia.org/repos/research/knowledge_integrity/-/merge_requests/10/pipelines
[14:08:24] aiko: o/ I am going to update the pull request, Muniza said that we don't need torch
[14:10:59] Whhhhhaaat did I walk into on my first day back
[14:12:31] wwhaaaat?
[14:12:39] :D
[14:12:50] Lol
[14:13:45] hope you're better Chris!
[14:13:55] I am!
[14:14:03] That kicked my butt
[14:15:59] I was reading the two threads on slack but it looks like it all got resolved right as I woke up
[14:16:28] elukey: I'm not sure if transformers use torch.. left a comment asking Muniza
[14:17:51] aiko: ah ok.. is there any code in the ki repo that uses torch?
[14:18:07] chrisalbon: only a security drill on friday, nothing big :
[14:18:08] :)
[14:18:19] but better safe than sorry
[14:19:54] aiko: which transformers? I see it is an xgboost model and pytorch isn't used anywhere in the repo indeed
[14:19:56] elukey: I don't think so
[14:20:44] elukey: yeah looks no pytorch code in the repo
[14:21:15] aiko: nevermind I understood you mean the transformers package
[14:22:25] so if we don't use pytorch code in the repo that should be fine, right?
[14:23:35] yes!
[14:23:37] (PS5) Elukey: Remove torch and update joblid dependencies [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/868616
[14:24:02] updated my code review to remove torch from the revert risk deps as well
[14:24:10] because I was thinking maybe when we install transformers, it'll install torch automatically?
[14:25:23] chrisalbon: one of the things that I value the most on github is "dependabot"
[14:25:37] it scans for requirements.txt files and reports vulnerabilities
[14:25:46] if we migrate fully to gitlab we should try to have something similar
[14:25:54] Yeah I love it
[14:26:08] I need to open several tasks, we should also update revscoring at some point
[14:26:21] not high priority but..
[14:26:49] in this case revert risk is fine, if we don't use torch in there no rush
[14:27:10] but NLLB is a different beast :D (I think that Tobias already updated the staging endpoint)
[14:27:17] Why do we need to update revscoring?
[14:27:33] (CR) AikoChou: [C: +1] "let's try it! :)" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/868616 (owner: Elukey)
[14:28:05] (CR) Ilias Sarantopoulos: [C: +1] Remove torch and update joblid dependencies [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/868616 (owner: Elukey)
[14:28:51] chrisalbon: there are some open vulnerabilities from dependabot.. nothing really exploitable afaics, but I'd prefer to fix them if possible. It should be a matter of bumping the deps and rebuild
[14:29:14] my idea is to fix revscoring upstream (run tests etc..) and update only lift wing
[14:29:38] no real threat for ORES atm, but we should keep our deps updated in my opinion
[14:31:05] (CR) Elukey: [C: +2] Remove torch and update joblid dependencies [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/868616 (owner: Elukey)
[14:31:08] Okay cool
[14:32:26] elukey: once we move to gitlab we can setup dependabot to automatically update vulnerabilities and issue a new merge request. I'm pretty sure that would be possible. Have worked with a similar setting on github and it's great because it can open a PR, run test etc and u can just validate it works
[14:32:42] lovely
[14:33:40] the more we automate the better
[14:34:24] (Merged) jenkins-bot: Remove torch and update joblid dependencies [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/868616 (owner: Elukey)
[14:37:56] elukey: near as I can tell, the updated image works fine™. But I am uneasy about switching prod to it on a Friday afternoon
[14:41:10] klausman: yeah I know, but we have a critical vulnerability, there is not that much we can do.. We can ping CTX people and ask for their review
[14:43:43] Pinged Santhosh on Slack
[14:44:07] the endpoint is pass protected so we are kinda safe-ish, but..
[14:52:03] elukey: I just realized it's already 20:30 for Santhosh (and Kartik), so they won't be seeing my ping before Monday
[14:52:23] (PS4) AikoChou: outlink: fix mwapi session host headers [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/868131 (https://phabricator.wikimedia.org/T325199)
[14:52:47] (PS5) AikoChou: outlink: fix mwapi session host headers [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/868131 (https://phabricator.wikimedia.org/T325199)
[14:53:57] klausman: yeah I imagined :(
[14:54:10] anybody else from content translation that we can ping?
[14:54:20] Maybe Niklas
[14:55:26] I'll try
[14:55:57] ah, you were quicker
[14:56:10] ah TIL Niklas is a manager now!
[14:56:26] (CR) AikoChou: [C: +2] "Thanks for the review. :)" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/868131 (https://phabricator.wikimedia.org/T325199) (owner: AikoChou)
[14:56:42] I worked with him a while ago on a nasty memcached key size issue
[14:56:58] ah, always good to have existing personal connection
[14:57:14] really great experience, even if nailing down the issue was horrible
[14:58:28] (the size of a special CTX's key's payload was easily maxing out the tx bw of the mediawiki memcached servers, at the time 1Gbps, for few seconds causing timeouts etc..
[14:58:31] )
[15:02:03] (Merged) jenkins-bot: outlink: fix mwapi session host headers [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/868131 (https://phabricator.wikimedia.org/T325199) (owner: AikoChou)
[15:02:29] elukey: keys should not be that big :D
[15:07:30] we learned the hard way :D
[15:07:37] it was around 800KBs IIRC
[15:13:56] klausman: I'd say that we can deploy if your tests are good, so we have more time to check etc. before the weekend
[15:13:59] what do you think?
[15:14:32] Yeah, I think that's a good idea. I'd just feel bad if you then come back Monday to a fire and an angry mob :-/
[15:15:14] hopefully not but don't worry, in case we'll deal with it :)
[15:15:39] All right. I'll flip the Endpoint config for prod
[15:17:11] Done. Endpoint is in Updating Mode
[15:20:05] nice thanks <3
[15:20:31] chrisalbon: o/ when you have a moment https://phabricator.wikimedia.org/T324567#8463820
[15:25:12] Ok, Prod endpoint is switched and working fine
[15:26:25] Machine-Learning-Team: Update revscoring dependencies to fix security reports - https://phabricator.wikimedia.org/T325366 (elukey)
[15:30:43] klausman: great work thanks!
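On the dependabot idea (14:25 to 14:32): at its core, such a scanner reads pinned requirements and compares them with known-safe versions. A minimal Python sketch of that comparison follows. It is not how dependabot works internally, only exact `==` pins are checked, and the `MIN_SAFE` table is hypothetical: the torch floor mirrors the 1.13.1 upgrade discussed in this log, so the advisory remains the authoritative source for the affected range.

```python
import re

# HYPOTHETICAL minimum safe versions; the torch entry mirrors the 1.13.1 upgrade in this log.
MIN_SAFE = {"torch": (1, 13, 1)}

# Leading "name==X.Y.Z" of a requirements.txt line; a "+cu116" local tag is simply ignored.
_REQ_RE = re.compile(r"^\s*([A-Za-z0-9_.-]+)\s*==\s*([0-9]+(?:\.[0-9]+)*)")


def flag_outdated(requirements_text, min_safe=MIN_SAFE):
    """Return (name, pinned, minimum) for every '==' pin older than its minimum."""
    findings = []
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        match = _REQ_RE.match(line)
        if not match:
            continue  # ranges, URLs and options are out of scope for this sketch
        name = match.group(1).lower().replace("_", "-")
        pinned = tuple(int(part) for part in match.group(2).split("."))
        minimum = min_safe.get(name)
        # Tuple comparison orders versions component by component.
        if minimum is not None and pinned < minimum:
            findings.append((name, pinned, minimum))
    return findings


print(flag_outdated("torch==1.10.1\njoblib==1.2.0  # pinned\ntransformers>=4.0\n"))
```

A real tool would also need proper PEP 440 parsing (pre-releases, epochs) and an advisory feed instead of a hand-written table.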
[15:30:54] Machine-Learning-Team: Update wikilabel's dependencies - https://phabricator.wikimedia.org/T325367 (elukey)
[15:52:47] Machine-Learning-Team, Patch-For-Review: Fix translatewiki-reverted and frwikisource-articlequality isvcs - https://phabricator.wikimedia.org/T324567 (calbon) Yeah let's remove them for now. My guess is that these were the start of models that never made it to production.
[15:55:20] elukey: is there a CVE-# for the torch issue?
[15:55:42] didn't see it in the dependabot alert
[15:55:47] alright.
[15:56:09] ah no https://github.com/advisories/GHSA-47fc-vmwq-366v
[15:56:13] CVE #?
[15:56:22] CVE-2022-45907
[15:56:47] chrisalbon: https://nvd.nist.gov/vuln/detail/CVE-2022-45907
[15:57:22] ah, cool
[15:58:18] elukey: pushed my changes to my GH branch
[16:10:34] super
[16:14:18] Machine-Learning-Team: Update revscoring dependencies to fix security reports - https://phabricator.wikimedia.org/T325366 (elukey) Filed https://github.com/wikimedia/revscoring/pull/526 Next steps: * run revscoring's test and see if they pass (a couple might still be broken, so please run them before/after...
[16:14:44] stepping afk for ~30 mins
[16:17:35] elukey: just saw Muniza's latest comment https://gitlab.wikimedia.org/repos/research/knowledge_integrity/-/merge_requests/10
[16:21:37] elukey: I think I'll need to figure out transformers' requirements and what's the best way for us
[16:34:54] I'm running the prod image on ml-sandbox
[16:35:22] I got "Message: None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used." :(
[16:50:37] then we should fall back to using a specific url with poetry. However it would be helpful if we had the code that produces the binary (aka trains the model)
[16:50:59] Alright, I'm heading out. \o Everyone, have some nice and peaceful holidays and get safely into the new year. See you in January!
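The 16:35 warning means the transformers install in the prod image found none of its deep-learning backends, which appears to answer the 14:24 question: installing transformers on its own did not bring torch in. A quick presence check to run inside an image before deploying can be sketched in Python with `importlib.util.find_spec`, which locates a top-level module without importing it. This is an assumption-level sketch: it does not replicate transformers' own detection logic and does not check the TensorFlow >= 2.0 requirement.

```python
import importlib.util

# Top-level import names for the three backends named in the transformers warning.
BACKENDS = {"PyTorch": "torch", "TensorFlow": "tensorflow", "Flax": "flax"}


def available_backends():
    """Map each backend label to whether its module can be found (nothing is imported)."""
    return {
        label: importlib.util.find_spec(module) is not None
        for label, module in BACKENDS.items()
    }


if __name__ == "__main__":
    found = available_backends()
    print(found)
    if not any(found.values()):
        # Same situation as the 16:35 warning: only tokenizers/configs would be usable.
        print("no deep-learning backend found; transformers models will not load")
```

Presence alone says nothing about versions; pairing this with a version check would be the next step.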
[16:51:47] o/ Tobias, have a great time!
[16:53:48] on the poetry subject:
[16:53:48] I added these two parts but I can't do poetry update locally. I'm having multiple problems with Apple Silicon :sad:
[16:53:48] ```
[16:53:48] torch = {version= "1.13.1", source= "pytorch"}
[16:53:48] [[tool.poetry.source]]
[16:53:48] name = "pytorch"
[16:53:48] url = "https://download.pytorch.org/whl/cpu"
[16:53:49] default = false
[16:53:49] secondary = true
[16:53:50] ```
[16:56:50] Machine-Learning-Team, revscoring: Update revscoring dependencies to fix security reports - https://phabricator.wikimedia.org/T325366 (Aklapper)
[16:58:48] klausman: have a nice holiday!
[16:59:49] isaranto: I can try
[17:02:58] mmm weird I get errors for links like https://download.pytorch.org/whl/cpu/transformers/
[17:03:05] Access Denied
[17:03:26] anyway, I think this can be fixed on monday
[17:03:30] cc: isaranto, aiko
[17:04:16] I'm looking at https://huggingface.co/transformers/v3.5.1/installation.html#installation-with-pip
[17:04:37] it says "Alternatively, for CPU-support only, you can install 🤗 Transformers and PyTorch in one line with: `pip install transformers[torch]`"
[17:04:46] really?!
[17:05:14] elukey: yeah let's fix it on Monday
[17:07:22] ah nice!
[17:07:34] basically what Muniza suggested
[17:08:25] well another version since the one suggested brings in torch 1.11.0
[17:11:21] we also need to upgrade our transformers version
[17:11:50] that supports torch 1.13.1
[17:13:27] elukey: the access denied errors don't mean anything. I saw that as well but turns out poetry makes a call to all the repositories specified. Which is useless
[17:13:28] ```
[17:13:28] All package sources (including secondary sources) will be searched during the package lookup process. These network requests will occur for all sources, regardless of if the package is found at one or more sources.
[17:13:28] ```
[17:13:28] https://python-poetry.org/docs/repositories/#project-configuration
[17:15:39] ack thanks
[17:16:06] okok so let's call it a day, the major concern was NLLB but we are ok (thanks to Tobias)
[17:16:29] going afk for the weekend folks! Have a nice break :)
[17:17:32] bye Luca, have a lovely weekend :)
[17:19:20] ciao! I am logging off too
[17:20:00] bye Ilias, you too! have a nice weekend :)
[18:11:30] night all
[18:21:36] Machine-Learning-Team, Wikilabels: Update wikilabel's dependencies - https://phabricator.wikimedia.org/T325367 (Aklapper) [Please add codebase project tags - thanks!]
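Related to the 10:47 wish and the `transformers[torch]` suggestion at 17:04: for a package that is already installed, the standard library's `importlib.metadata.requires()` returns its declared requirements, and the ones that only come with an extra carry an `extra == "..."` marker. The Python sketch below groups them by extra. `group_by_extra` and `installed_requirements` are hypothetical helpers, markers other than `extra` are ignored, and this only inspects installed distributions, not arbitrary versions on PyPI.

```python
import re
from importlib import metadata

# Captures the extra name from markers such as: extra == "torch"
_EXTRA_RE = re.compile(r"""extra\s*==\s*["']([^"']+)["']""")


def group_by_extra(requirements):
    """Group requirement strings by the extra that enables them ("" = always installed).

    Other markers (python_version, sys_platform, ...) are ignored in this sketch.
    """
    groups = {}
    for req in requirements:
        spec, _, marker = req.partition(";")
        match = _EXTRA_RE.search(marker)
        groups.setdefault(match.group(1) if match else "", []).append(spec.strip())
    return groups


def installed_requirements(dist_name):
    """Requirements declared by an installed distribution, grouped by extra."""
    # requires() returns None when nothing is declared and raises
    # metadata.PackageNotFoundError when the distribution is not installed.
    return group_by_extra(metadata.requires(dist_name) or [])
```

In an environment that has transformers installed, `installed_requirements("transformers").get("torch")` lists what the `torch` extra adds; if torch shows up only under extras and never under the `""` key, a plain `pip install transformers` will not install it.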