[00:17:32] (CR) CI reject: [V:-1] Remove a duplication of selectors in ext.ores.highlighter [extensions/ORES] - https://gerrit.wikimedia.org/r/1052157 (owner: Ebrahim)
[06:10:01] o/ good morning
[06:42:42] * isaranto afk be back in an hour
[06:58:19] Machine-Learning-Team, Patch-For-Review: Support building and running of articlequality model-server locally - https://phabricator.wikimedia.org/T368875#9955703 (kevinbazira) Open→Resolved
[06:58:23] Machine-Learning-Team, Patch-For-Review: Support building and running of articlequality model-server locally - https://phabricator.wikimedia.org/T368875#9955706 (kevinbazira) a:kevinbazira
[07:38:52] Machine-Learning-Team: Reorganize LiftWing isvcs repo structure to improve maintainability - https://phabricator.wikimedia.org/T369344 (kevinbazira) NEW
[07:52:08] * isaranto back
[07:55:33] hi folks!
[07:55:49] knative images deployed on staging, let's see how they go
[07:56:49] a little hiccup that I found - changing the net-istio config-map (removing the example) triggered updates to a lot of pods, and afaics it seems that the net-istio webhook pods were not available to answer TLS calls to validate etc..
[07:57:02] the first time the deployment failed, the second it succeeded
[07:57:30] I recall that we had a similar issue with the knative webhook, and IIRC we solved it by increasing the readiness probe a little
[08:00:10] from kubectl describe pod I don't see a readiness probe configured for the net-istio webhook, so maybe it is a default veeery quick and string
[08:00:13] *strict
[08:05:13] hey Luca!
[08:05:55] ack! thanks for taking care of that
[08:09:00] elukey: is there anything we should do to check? run load tests or sth similar?
[08:09:52] nono I think this is more related to when we change configmaps that are pushed in more places
[08:10:18] we can test prod, if the deploy doesn't go through we can add some extra readiness tolerance
[08:10:43] (Basically if the webhook isn't available soon after the deployment then helm considers it failed etc..)
[08:10:56] ok, clear!
[09:17:52] (CR) Matěj Suchánek: [WIP] Add AbuseFilter variable for revertrisk score (3 comments) [extensions/ORES] - https://gerrit.wikimedia.org/r/1051837 (https://phabricator.wikimedia.org/T364705) (owner: Kosta Harlan)
[10:46:03] * isaranto afk lunch
[11:43:46] (CR) Ladsgroup: [C:+2] "try again" [extensions/ORES] - https://gerrit.wikimedia.org/r/1052157 (owner: Ebrahim)
[11:47:12] (Merged) jenkins-bot: Remove a duplication of selectors in ext.ores.highlighter [extensions/ORES] - https://gerrit.wikimedia.org/r/1052157 (owner: Ebrahim)
[12:32:35] I found a way to fix the requirements in the hf image and create less of a mess when upgrading the package versions using a pip constraints file https://pip.pypa.io/en/stable/user_guide/#constraints-files
[12:33:19] I'm trying to make it work with blubber at the moment cause there is no native support (not the same way that we define requirements.txt)
[12:40:55] oh nevermind it won't work
[13:21:44] Machine-Learning-Team: Simplify dependencies in hf image - https://phabricator.wikimedia.org/T369359 (isarantopoulos) NEW
[13:23:10] Machine-Learning-Team: Simplify dependencies in hf image - https://phabricator.wikimedia.org/T369359#9956678 (isarantopoulos)
[13:34:54] Machine-Learning-Team: Simplify dependencies in hf image - https://phabricator.wikimedia.org/T369359#9956694 (isarantopoulos) In a previous [[ https://phabricator.wikimedia.org/T357986#9679664 | iteration ]] I wrongly thought that this behavior was done because of `torch` and `torch-rocm` having different m...
[13:35:10] I figured it out -^, it was much simpler than I thought
[13:46:34] nice :) what a rollercoaster for a Friday :)
[13:48:12] PYTHONPATH making life confusing was a constant in a previous job. Fortunately, for unrelated reasons, we switched the whole codebase to Go, which solved that problem
[13:49:56] * isaranto nods
[14:11:38] I need some help in the production-images repo (iirc I've had this issue in the past but don't remember what I did to solve it).
[14:12:06] when I run `docker-pkg -c config.yaml build images/` no images are built although I haven't built them all
[14:12:37] or more specifically the command I remember I was using `docker-pkg -c config.yaml build images/ --select "*pytorch*"`
[14:19:31] sec, let me access my secondary memory (.bash_history) :)
[14:21:15] That command should work, AFAICT. But maybe you need to remove the already-built images in your local repo?
[14:21:31] IIRC, docker-pkg will only build entirely-absent image versions
[14:22:22] I'm trying to build amd-pytorch-common image which doesn't exist
[14:22:43] let me do a local experiment with current HEAD
[14:22:45] thanks for the answer Tobias! I'll just delete my local images to be sure
[14:25:58] nooo it doesn't work, and now I have to redownload all my docker images :(
[14:26:07] damn
[14:26:13] I can't get it to build, either
[14:26:14] going to try if there's anything fancy going on with docker-pkg installation
[14:32:34] isaranto: is the changelog correctly bumped to a new version?
[14:33:58] I couldn't bump it cause I hadn't built the amd-pytorch-image https://phabricator.wikimedia.org/P65870
[14:34:50] now I don't have any image locally. hmm I was puzzled about how this works
[14:35:02] maybe if I just download the amd-pytorch-common image, then update the changelog and retry it would work
[14:35:20] can you share the full diff?
[14:35:24] otherwise it is difficult
[14:35:46] one trick to force a local build is to comment out the docker-registry in config.yaml. But naturally, for an actual update/bump the changelog needs an update and _then_ docker-pkg will DTRT
[14:36:10] It's just a bit strict about not building images that it sees as already published.
[14:36:44] elukey: which diff? the git diff?
[14:37:33] isaranto: the diff for the production-images repo that you are trying to build
[14:38:40] from P65870 it seems that you have a change lined up
[14:39:14] ack
[14:39:58] this is the diff https://phabricator.wikimedia.org/P65870#263864
[14:40:21] I managed to solve it by commenting out the docker registry
[14:40:44] sorry for the hassle folks!
[14:41:32] I needed the amd-pytorch-common image which was not built ofc because it had no change and for some reason downloading it manually didn't work
[14:41:44] thanks both!
[14:42:30] ah interesting trick, nice!
[14:44:11] I'm gonna keep some notes, I think I faced the same or similar issue 1-2 months ago
[14:45:04] `Fool me once, shame on you. Fool me twice, shame on me`
[14:46:24] The commented-out registry does mean that everything that the image needs for building needs to already be local, unless it's on the default registry (i.e. not the WMF one)
[14:57:39] Machine-Learning-Team: Simplify dependencies in hf image - https://phabricator.wikimedia.org/T369359#9956896 (isarantopoulos) I'm trying the above change in a new version of base pytorch image (`docker-registry.wikimedia.org/amd-pytorch23:2.3.0rocm6.0-3`) and then use that one for the huggingface image. I'll...
[15:07:36] (CR) Jdlrobson: [C:+1] Remove a duplication of selectors in ext.ores.highlighter [extensions/ORES] - https://gerrit.wikimedia.org/r/1052157 (owner: Ebrahim)
[16:14:07] ah, I see some nvidia packages installed again
[16:14:07] pff
[16:14:24] anyway, will continue, I think this direction will work
[16:14:33] logging off folks, have a nice weekend!
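Note on the docker-pkg thread above (14:11 to 14:46): the behaviour described there, where only image versions not seen as already published get built and commenting out the registry forces a local build, can be pictured with a small toy model. This is only a sketch of the selection logic as described in the chat, not docker-pkg's actual code or API; the function name `images_to_build`, its parameters, and the image names in the usage note are all hypothetical.

```python
from typing import Optional


def images_to_build(
    changelog_versions: dict[str, str],
    published: Optional[set[str]],
) -> list[str]:
    """Toy model (hypothetical, not docker-pkg itself) of the selection rule.

    changelog_versions maps an image name to the version at the top of its
    changelog. published is the set of "name:version" references the registry
    already has, or None to model a config with the registry commented out.
    """
    # With no registry configured nothing counts as published,
    # so every image becomes a build candidate.
    known = published if published is not None else set()
    candidates = (f"{name}:{version}" for name, version in changelog_versions.items())
    # Only references that are entirely absent from the registry are built.
    return sorted(ref for ref in candidates if ref not in known)
```

With hypothetical images `a` and `b`, `images_to_build({"a": "1.0-1", "b": "2.0-1"}, {"a:1.0-1", "b:2.0-1"})` selects nothing, which matches the "no images are built" symptom. Bumping `a` to `1.0-2` in its changelog selects only `a:1.0-2`, and passing `None` for the registry selects both images.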