[06:34:51] (PS7) Elukey: ores-legacy: run tests in ci [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/923599 (owner: Ilias Sarantopoulos)
[06:36:03] o/
[06:36:33] isaranto: kalimera! I tried to make the ores-legacy tests run in CI, it works locally
[06:36:35] (CR) CI reject: [V: -1] ores-legacy: run tests in ci [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/923599 (owner: Ilias Sarantopoulos)
[06:37:41] ah no of course something is off with paths
[06:37:42] uff
[06:41:38] (PS8) Elukey: ores-legacy: run tests in ci [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/923599 (owner: Ilias Sarantopoulos)
[06:42:18] (CR) CI reject: [V: -1] ores-legacy: run tests in ci [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/923599 (owner: Ilias Sarantopoulos)
[06:57:22] (PS9) Elukey: ores-legacy: run tests in ci [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/923599 (owner: Ilias Sarantopoulos)
[06:58:19] (CR) CI reject: [V: -1] ores-legacy: run tests in ci [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/923599 (owner: Ilias Sarantopoulos)
[06:58:52] (PS10) Elukey: ores-legacy: run tests in ci [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/923599 (owner: Ilias Sarantopoulos)
[07:02:26] finally, it seems to be working
[07:02:44] the only nit that bothers me is that running `tox` in the ores-legacy dir triggers both envs to run, and ci fails
[07:02:50] if I run `tox -e local` all good
[07:02:57] I am probably missing a setting
[07:03:30] I didn't test the ores-legacy image/code with the new paths, they should work but I need to double check
[07:03:42] if the change is horrible we can revert to its original state
[07:03:51] (need to run some errands)
[07:14:28] \o Morning
[07:15:28] elukey: I don't think there
is an easy way to make tox only run with -e local if started by a user (i.e. not CI/Jenkins). How long does it usually take? if it's not too much, maybe we could put it in a commit hook, or a pre-push/review hook
[07:16:06] Since CI doesn't care about git hooks, we could use -e local there and not affect non-human users.
[07:45:10] morning!
[07:45:19] it is weird though, I expect tox to be able to do it
[07:46:22] the tests are quick for the moment
[07:46:37] they work in CI and locally, so it could be a first start
[07:53:14] The only other way (besides -e local) I know of limiting envs is removing them from tox.ini
[07:53:52] The final way would be to have a local wrapper script that detects what repo you're in and adds -e local to the tox call. But that seems brittle and hackish
[08:00:51] klausman: tox has a special setting called "envlist" that is meant to execute envs if -e is not specified, but it doesn't work afaics
[08:02:58] Huh.
[08:03:08] It has always worked for me, but it's been a while
[08:03:35] mmm maybe it is only for python version
[08:03:41] so weird
[08:03:56] anyway, I'll test the rest of the changes, maybe Ilias has some ideas
[08:05:29] So what state does the repo have to be in for it to fail?
[08:06:14] (CR) Elukey: "Do we want to add https://github.com/github/gitignore/blob/main/Python.gitignore ?" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/924076 (owner: Elukey)
[08:06:47] even with 923599 patched in, I can just run `tox` and it does the usual tests fine
[08:07:51] klausman: where are you running tox from? I mean the dir
[08:08:05] root of the git checkout
[08:08:15] if you chdir to ores-legacy you'll see it running two times
[08:08:28] ok, checking...
[08:08:37] better, the ci run fails
[08:08:50] because of the requirements-test.txt different paths
[08:08:58] tox -e local works
[08:09:04] ah, yes, I can repro now.
Let me try a few things
[08:09:52] elukey: it's envlist, not env_list
[08:10:13] o/
[08:10:31] ah nice!
[08:10:38] it works now, thanks klausman
[08:10:42] commenting on the change
[08:10:49] (PS11) Elukey: ores-legacy: run tests in ci [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/923599 (owner: Ilias Sarantopoulos)
[08:10:49] nice work elukey: for me it was still failing with logging. I saw you moved some files
[08:10:57] so probably it is better!
[08:11:01] (CR) Klausman: ores-legacy: run tests in ci (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/923599 (owner: Ilias Sarantopoulos)
[08:11:12] With no underscore, it works fine here
[08:11:13] isaranto: I need to check if it works, lemme know if it is a horror or not
[08:11:18] klausman: fixed yes
[08:11:58] (CR) Klausman: ores-legacy: run tests in ci (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/923599 (owner: Ilias Sarantopoulos)
[08:15:16] (CR) Klausman: ores-legacy: run tests in ci (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/923599 (owner: Ilias Sarantopoulos)
[08:18:56] Machine-Learning-Team, API Platform, Anti-Harassment, Cloud-Services, and 19 others: Migrate PipelineLib repos to GitLab - https://phabricator.wikimedia.org/T332953 (kostajh)
[08:25:15] (CR) Ilias Sarantopoulos: ores-legacy: run tests in ci (5 comments) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/923599 (owner: Ilias Sarantopoulos)
[08:25:49] no horror at all!
I just added a comment and a question
[08:30:57] (PS2) Ilias Sarantopoulos: Add .gitignore [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/924076 (owner: Elukey)
[08:31:29] (CR) Ilias Sarantopoulos: Add .gitignore (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/924076 (owner: Elukey)
[08:32:18] TIL about .git/info/exclude -> https://docs.github.com/en/get-started/getting-started-with-git/ignoring-files
[08:32:30] so I can do my own stuff there if I want
[08:33:53] Nice find, I've been wondering about that.
[08:34:20] I knew (some) hooks are not part of the repo, but I wasn't aware there is more non-committed local config
[08:42:32] I also find the global gitignore a nice thing
[08:43:13] Yes, I use that for files specific to my tooling, like vim swapfiles
[08:43:37] (though making vim use one subdir in my homedir for swapfiles is my preferred way of handling them, for other reasons)
[08:50:08] (CR) Elukey: ores-legacy: run tests in ci (2 comments) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/923599 (owner: Ilias Sarantopoulos)
[08:56:42] (PS12) Elukey: ores-legacy: run tests in ci [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/923599 (owner: Ilias Sarantopoulos)
[08:57:08] (CR) Elukey: ores-legacy: run tests in ci (2 comments) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/923599 (owner: Ilias Sarantopoulos)
[08:58:48] (PS13) Elukey: ores-legacy: run tests in ci [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/923599 (owner: Ilias Sarantopoulos)
[08:58:54] (CR) Elukey: ores-legacy: run tests in ci (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/923599 (owner: Ilias Sarantopoulos)
[09:00:04] (CR) Klausman: [C: +1] ores-legacy: run tests in ci
[machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/923599 (owner: Ilias Sarantopoulos)
[09:00:48] (PS3) Elukey: Add .gitignore [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/924076
[09:00:50] (PS14) Elukey: ores-legacy: run tests in ci [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/923599 (owner: Ilias Sarantopoulos)
[09:01:01] (CR) Elukey: Add .gitignore (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/924076 (owner: Elukey)
[09:01:11] portability! my mind was stuck and couldn't find the word for it
[09:01:27] nice Tobias
[09:02:31] (CR) Elukey: ores-legacy: run tests in ci (4 comments) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/923599 (owner: Ilias Sarantopoulos)
[09:02:54] ok the crs are updated, I just need to test ores-legacy on a stat100x to verify that all works
[09:15:05] isaranto: what if we add a little Makefile to automate the README.md stat100x testing?
[09:15:27] something that creates the venv, activates it and runs uvicorn etc..
[09:16:17] (back in a few)
[09:18:13] 👌 SGTM!
[09:20:40] I can do that!
[09:23:37] Note that on the stat boxes you will need pip install --upgrade pip
[09:23:41] er, no
[09:23:48] well, yes, but not what I meant
[09:23:59] https://wikitech.wikimedia.org/wiki/HTTP_proxy proxies is what you need
[09:27:42] u mean in order to install through pip right?
[09:29:34] Yep
[09:30:05] As the app woll be calling LW, the no_proxy var is probably important
[09:30:08] will*
[09:30:35] cool!
I already have these scripts on a statbox so will put everything in a Makefile
[09:31:24] One thing I wonder about for the Makefile is how to make the sourcing of the venv activate "stick" between recipes
[09:35:37] (CR) Kevin Bazira: [C: +1] Add .gitignore [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/924076 (owner: Elukey)
[09:45:01] <- early lunch
[09:54:22] elukey: o/ I've created the model card for revert-risk multilingual already > https://api.wikimedia.org/wiki/API_reference/Service/Lift_Wing/Get_revertrisk_multilingual_prediction sorry I only posted the link on IRC but didn't update the Phab task ~"~
[09:58:18] haven't done the return object
[10:10:37] aiko: ahhh okok feel free to update the links with your page and delete mine! sorry
[10:10:51] when you have a moment I'd need to ask you some questions about revert risk :)
[10:11:39] klausman: the no_proxy var should be set automatically on stat boxes IIRC
[10:11:53] Oh, nice
[10:12:11] Can confirm it is
[10:12:54] isaranto: sorry I didn't mean that you have to do it, I can create it as well :)
[10:13:00] it was more like "do you like the idea?"
[10:14:36] I can work on it now since I need to test the new code :)
[10:19:56] a no it's fine
[10:20:01] feel free to do it
[10:20:23] I stumbled upon the issue klausman mentioned
[10:28:45] mmm that is?
[10:28:49] The proxy vars?
[10:29:09] keeping the same environment between each make recipe. perhaps activating it on each step
[10:29:26] yeah I do an activate every recipe
[10:29:55] Yes, they are separate shells, so the shell environment is reset every time
[10:30:14] Per-recipe, the shell persists
[10:32:06] * isaranto lunch
[10:32:51] yep the code works fine!
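Editor's sketch: the venv activation not "sticking" comes down to each shell process having its own environment. The Python snippet below illustrates only that general principle, assuming a POSIX /bin/sh. `DEMO_VENV_FLAG` and `run_in_fresh_shell` are made-up names, and whether the lines of a single recipe share one shell depends on the make implementation and its settings (GNU Make has `.ONESHELL` for that).

```python
import os
import subprocess


def run_in_fresh_shell(command: str) -> str:
    """Run a command in a new /bin/sh process and return its stripped stdout."""
    # Remove the demo variable from the inherited environment so the result
    # does not depend on whatever the calling shell happens to export.
    env = {k: v for k, v in os.environ.items() if k != "DEMO_VENV_FLAG"}
    result = subprocess.run(
        command, shell=True, env=env, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


# Export and read inside the same shell: the variable is visible.
same_shell = run_in_fresh_shell(
    'export DEMO_VENV_FLAG=on; echo "${DEMO_VENV_FLAG:-unset}"'
)

# Export in one shell, read in another: the export does not carry over.
run_in_fresh_shell("export DEMO_VENV_FLAG=on")
next_shell = run_in_fresh_shell('echo "${DEMO_VENV_FLAG:-unset}"')

print(same_shell, next_shell)
```

Re-activating the venv in every step, as described in the chat, or calling the venv's interpreter by its path, avoids depending on state left behind by an earlier shell.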
[10:33:16] (CR) Elukey: "Tested the code on stat1004 with the following:" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/923599 (owner: Ilias Sarantopoulos)
[10:39:06] Machine-Learning-Team, ORES, artificial-intelligence, ML-Governance, Documentation: Create data transclusion template for ORES model cards - https://phabricator.wikimedia.org/T337723 (kevinbazira)
[10:39:54] Machine-Learning-Team, ORES, artificial-intelligence, ML-Governance, Documentation: Create data transclusion template for ORES model cards - https://phabricator.wikimedia.org/T337723 (kevinbazira)
[10:48:05] (PS1) Elukey: ores-legacy: add Makefile to automate testing on stat100x nodes [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/924479
[10:48:30] all right here the patch
[10:51:41] (PS2) Elukey: ores-legacy: add Makefile to automate testing on stat100x nodes [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/924479
[10:52:23] * elukey lunch!
[11:08:58] (CR) Klausman: ores-legacy: add Makefile to automate testing on stat100x nodes (3 comments) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/924479 (owner: Elukey)
[11:10:04] (CR) Klausman: [C: +1] Add .gitignore [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/924076 (owner: Elukey)
[11:23:17] (CR) Ilias Sarantopoulos: [C: +1] "Nice work!"
[machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/924479 (owner: Elukey)
[12:15:25] (CR) Elukey: ores-legacy: add Makefile to automate testing on stat100x nodes (2 comments) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/924479 (owner: Elukey)
[12:15:47] (PS3) Elukey: ores-legacy: add Makefile to automate testing on stat100x nodes [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/924479
[12:16:02] (CR) Elukey: ores-legacy: add Makefile to automate testing on stat100x nodes (2 comments) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/924479 (owner: Elukey)
[12:16:48] isaranto: lemme know if https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/923599/14 is ready to go or not, there is still your open comment about the __init__ imports
[12:18:22] aiko: when you have a moment for some revert risk questions lemme know
[12:19:51] (CR) Ilias Sarantopoulos: [C: +1] Add .gitignore [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/924076 (owner: Elukey)
[12:20:50] (CR) Ilias Sarantopoulos: [C: +1] ores-legacy: add Makefile to automate testing on stat100x nodes (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/924479 (owner: Elukey)
[12:21:42] (CR) Elukey: [V: +2 C: +2] Add .gitignore [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/924076 (owner: Elukey)
[12:25:16] (CR) Ilias Sarantopoulos: ores-legacy: run tests in ci (3 comments) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/923599 (owner: Ilias Sarantopoulos)
[12:25:39] elukey: I want to preemptively mention the logging we want to do in the future as a hard dependency for the services. Is that just Logstash or is it more involved?
[12:25:42] elukey: yes I am ok!
I was confused with the message about testing on statbox
[12:26:19] (CR) Klausman: [C: +1] ores-legacy: add Makefile to automate testing on stat100x nodes (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/924479 (owner: Elukey)
[12:26:52] klausman: can you give me more context?
[12:27:29] So with kserve 0.11, we want to add request logging, right? I want to put the deps for that in the doc already, so we don't have to change it immediately.
[12:28:32] (CR) Ilias Sarantopoulos: [C: +1] "One last os.path.join left, other than that free to merge!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/923599 (owner: Ilias Sarantopoulos)
[12:28:57] * isaranto taking a small break before meetings
[12:29:06] request logging is broken in 0.10, it will be automatically added with 0.11, a basic access log to stdout. In 0.11 we also have a new arg to pass to the model server with the access log format, that will give us more flexibility (like stating the user agent etc..)
[12:29:37] the logs will be shipped automatically to logstash, we'll just need to parse them on the other side (json is still not really supported) and make a dashboard
[12:29:55] Ok, thanks!
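Editor's sketch: parsing a plain-text access log on the receiving side could look roughly like the Python below. The line format used here (client, quoted request line, status code, quoted user agent) is purely hypothetical, as are the names `ACCESS_LOG_RE` and `parse_access_line` and the sample line; the real format depends on what is passed to the kserve model server.

```python
import re
from typing import Optional

# Hypothetical format: <client> "<METHOD> <path> HTTP/<version>" <status> "<user agent>"
ACCESS_LOG_RE = re.compile(
    r'^(?P<client>\S+) '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/(?P<http_version>[\d.]+)" '
    r'(?P<status>\d{3}) '
    r'"(?P<user_agent>[^"]*)"$'
)


def parse_access_line(line: str) -> Optional[dict]:
    """Return the fields of one access-log line, or None if it does not match."""
    match = ACCESS_LOG_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Store the status as an integer so a dashboard can bucket by 2xx/4xx/5xx.
    fields["status"] = int(fields["status"])
    return fields


# Invented sample line, only for illustration.
sample = '10.0.0.1 "POST /v1/models/example:predict HTTP/1.1" 200 "curl/8.0"'
parsed = parse_access_line(sample)
unparsed = parse_access_line("not an access log line")

print(parsed)
print(unparsed)
```

Returning None for lines that do not match keeps unrelated stdout output (stack traces, startup messages) from breaking the parsing step.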
[12:32:13] (PS15) Elukey: ores-legacy: run tests in ci [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/923599 (owner: Ilias Sarantopoulos)
[12:32:15] (PS4) Elukey: ores-legacy: add Makefile to automate testing on stat100x nodes [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/924479
[12:32:22] (CR) Elukey: ores-legacy: run tests in ci (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/923599 (owner: Ilias Sarantopoulos)
[12:34:27] (CR) Elukey: [C: +2] ores-legacy: run tests in ci [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/923599 (owner: Ilias Sarantopoulos)
[12:34:50] (CR) Elukey: [V: +2 C: +2] ores-legacy: add Makefile to automate testing on stat100x nodes [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/924479 (owner: Elukey)
[12:35:01] thanks all for the reviews!
[12:35:34] thank you for all the work :)
[12:44:26] elukey: incidentally, not having marked a target as .PHONY bit me in the rear end this very weekend :)
[12:45:35] :)
[12:46:19] Also did a bunch of edits and updates to the slo doc just now.
[12:46:55] I think the "compound view" of LW/IS SLOs as we discussed is feasible. Still a lot of question marks about useful/credible numbers though.
[12:47:42] those will come with time, we can start with some and soft-launch the SLO
[12:47:48] test it in a quarter etc..
[12:48:06] it is also a shift in the team's mentality and focus, we'll need some time to adjust
[12:48:11] Agreed
[13:14:16] Morning all
[13:14:43] o/
[13:32:29] morniin
[13:32:43] o/
[13:34:15] elukey: o/ what questions about revert risk?
[13:36:36] \o
[13:36:46] elukey: cadvisor fell over on ores1007, taking a look.
[13:37:57] Huh.
The config provides a flag cadvisor doesn't know about (-listen)
[13:41:58] cadvisor seems to only be installed on 1007, not on any of the other eqiad machines
[13:43:51] It was installed yesterday!?
[13:44:04] Ok, not touching anything for now
[13:46:54] Machine-Learning-Team, ORES, artificial-intelligence, ML-Governance, Documentation: Create data transclusion template for ORES model cards - https://phabricator.wikimedia.org/T337723 (kevinbazira) A data template has been created and can be found here: https://meta.wikimedia.org/wiki/Machine_...
[13:47:05] Ah, likely related to https://phabricator.wikimedia.org/T108027
[14:00:22] aiko: o/ is there a model card for language agnostic?
[14:59:08] Machine-Learning-Team: Fix Regular Expression in API GW config for revert risk - https://phabricator.wikimedia.org/T337378 (klausman) Open→Resolved
[14:59:16] Machine-Learning-Team, MinT: Shut down and deconfigure NLLB setup on AWS - https://phabricator.wikimedia.org/T337369 (klausman) In progress→Resolved
[15:38:06] folks I created:
[15:38:18] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/924544 to scale up revert risk
[15:38:35] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/924545 to raise the api gateway's auth rate limits for it
[15:38:52] (to what WME asked, namely 200k req/hour
[15:38:53] )
[15:38:57] lemme know what you think :)
[15:43:02] they seem fine to me
[15:44:49] I was thinking if we could even have minReplicas less than 5 just to figure out what's needed, but this is just a thought I had
[16:00:07] ah yes I scaled up more to be on the safe side, we can scale down later on if we feel so
[16:00:21] I have just a feeling that 5 is more conservative
[16:00:50] ack
[16:01:01] klausman: do you have time to roll out the patches tomorrow?? :)
[16:01:17] (and also quickly test if the scale up works)
[16:31:34] (going afk, see you folks tomorrow!)
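Editor's sketch: a back-of-the-envelope conversion of the numbers mentioned above, 200k req/hour and 5 replicas. It assumes the rate limit is fully used and that traffic is spread evenly across replicas; both are simplifications, and a rate limit is an upper bound rather than measured load.

```python
REQUESTS_PER_HOUR = 200_000  # rate limit mentioned in the chat
REPLICAS = 5                 # minReplicas value mentioned in the chat
SECONDS_PER_HOUR = 3600


def per_second(requests_per_hour: float) -> float:
    """Convert an hourly request budget into requests per second."""
    return requests_per_hour / SECONDS_PER_HOUR


total_rps = per_second(REQUESTS_PER_HOUR)
# Assumption: the load balancer spreads requests evenly over all replicas.
per_replica_rps = total_rps / REPLICAS

print(f"{total_rps:.1f} req/s total, {per_replica_rps:.1f} req/s per replica")
```

Under those assumptions the limit works out to about 55.6 req/s overall, or about 11.1 req/s per replica.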
[16:36:41] sure
[16:36:52] \o
[16:40:27] o/
[18:42:57] Machine-Learning-Team, Data-Engineering-Planning, Event-Platform Value Stream, Patch-For-Review: Add a new outlink topic stream for EventGate main - https://phabricator.wikimedia.org/T328899 (achou) I found two problems while testing the following Change-Prop staging config: ` outlink-top...
[19:15:42] Machine-Learning-Team, Data-Engineering-Planning, Event-Platform Value Stream, Patch-For-Review: Add a new outlink topic stream for EventGate main - https://phabricator.wikimedia.org/T328899 (Ottomata) Ah, yes, you'll need to filter out canary events. We need better docs on this. I'm [[ https:/...
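Editor's sketch: filtering canary events on the consumer side might look like the Python below. It assumes canary events are marked by `meta.domain` being set to `"canary"`, which is an assumption about the Event Platform convention and not something stated in the chat. The sample events and the helper name `is_canary` are invented for illustration.

```python
from typing import Any, Dict, List


def is_canary(event: Dict[str, Any]) -> bool:
    """Return True if the event looks like a canary (assumed marker: meta.domain)."""
    # Tolerate events without a meta block, or with meta set to None.
    meta = event.get("meta") or {}
    return meta.get("domain") == "canary"


# Invented sample events, only for illustration.
events: List[Dict[str, Any]] = [
    {"meta": {"domain": "en.wikipedia.org"}, "page_title": "Example"},
    {"meta": {"domain": "canary"}, "page_title": "Canary"},
    {"page_title": "No meta block"},
]

real_events = [event for event in events if not is_canary(event)]
print(len(real_events))
```

Events with no `meta` block are kept here; whether to drop them instead is a choice for the consumer.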