[09:01:24] Machine-Learning-Team, Phabricator, Release-Engineering-Team: Github's wikimedia/ores not mirroring to Gerrit's scoring /ores /ores - https://phabricator.wikimedia.org/T311390 (hashar) In the Gerrit sshd logs (`gerrit1001` `/var/log/gerrit/sshd_log`): ` [2022-06-28T08:36:42.332Z] 3aca3ca9 [SSHD] pha...
[09:14:33] Machine-Learning-Team, Phabricator, Release-Engineering-Team: Github's wikimedia/ores not mirroring to Gerrit's scoring /ores /ores - https://phabricator.wikimedia.org/T311390 (hashar) rORES replication config is at https://phabricator.wikimedia.org/source/ores/manage/uris/ {F35283163 size=full} P...
[09:19:33] Machine-Learning-Team, Phabricator, Release-Engineering-Team: Github's wikimedia/ores not mirroring to Gerrit's scoring /ores /ores - https://phabricator.wikimedia.org/T311390 (hashar) From the URI configuration history at https://phabricator.wikimedia.org/source/ores/uri/view/20533/ > @MarcoAureli...
[09:35:12] Machine-Learning-Team, Phabricator, Release-Engineering-Team: Github's wikimedia/ores not mirroring to Gerrit's scoring /ores /ores - https://phabricator.wikimedia.org/T311390 (hashar) The Phabricator replication from GitHub to Gerrit has been broken since September 2019 at least since the configure...
[09:38:55] \o
[09:39:53] elukey: can I bother you about testing staging? I recall you had a ready-made json file and curl command line to do simple endpoint testing. Also, I wonder what we want to do with deployment charts in the context of the staging cluster, since none of what runs there will be all that permanent.
[09:41:02] o/ sur
[09:41:05] *sure
[09:41:47] I would use deployment-charts anyway with a staging specific config, so that we'll test a model/isvc deployment end-to-end
[09:42:14] for the json + curl command I can give it to you but we'd need to deploy one isvc
[09:43:39] IIRC in the `services` dir there are special values like `values-staging.yaml`
[09:43:49] but I didn't check how are they picked up
[09:45:03] I'll have a look
[09:45:17] it is probably in the helmfile.yaml config of the services
[09:46:05] My main concern is that deploying everything that we have on -serve also on -staging likely will be a capacity problem
[09:46:42] Machine-Learning-Team, Phabricator, Release-Engineering-Team: Github's wikimedia/ores not mirroring to Gerrit's scoring /ores /ores - https://phabricator.wikimedia.org/T311390 (Aklapper) >>! In T311390#8032141, @hashar wrote: > I am really tempted to take that opportunity to phase out the {nav Githu...
[09:48:02] klausman: we should probably deploy only few things in staging, to test major things and isvcs config
[09:48:22] I agree that all isvcs in production shouldn't be on ml-staging
[09:48:34] but maybe one for each kind/type (article quality, edit quality, etc..)
[09:48:59] Yeah, that sounds good.
[09:49:03] super
[09:49:29] Not sure yet if we want enwiki for most of them, or something else to be not blind to differences. I guess time will tell what wikis are useful
[09:50:16] Trying really hard to avoid the "It works for enwiki, so it works everywhere" trap :)
[09:51:06] Machine-Learning-Team, Phabricator, Release-Engineering-Team: Github's wikimedia/ores not mirroring to Gerrit's scoring /ores /ores - https://phabricator.wikimedia.org/T311390 (hashar) I have triggered a mirror push on phab1001: ` name=/srv/phab/phabricator/bin/repository mirror --verbose ORES Pushi...
[10:06:19] the ml-cache codfw cluster is up on bullseye :)
[10:08:06] going afk, ttl!
[12:59:29] elukey: do we have something like pcc for deployment charts? It would be really need to be able to run a diff against staging/prod before submitting a change
[12:59:52] (I mean, I can hand-edit/-copy the files to the deployment machine, but that seems less than safe)
[14:04:10] klausman: in theory https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/809149 should show us a diff when there is one
[14:30:32] Lift-Wing, Machine-Learning-Team (Active Tasks): Load test the Lift Wing cluster - https://phabricator.wikimedia.org/T296173 (calbon)
[14:30:45] Lift-Wing, Machine-Learning-Team (Active Tasks): Test Ray worker in Kserve - https://phabricator.wikimedia.org/T309624 (calbon) Open→Resolved
[14:50:53] Hm, I don't see anything in the diff. Does it maybe not work since no version is deployed at all atm?
[14:52:15] I was about to say that I think something doesn't work
[14:52:29] we are missing something probably, we should see a diff
[14:57:10] Do we maybe need to add staging at the bottom of revscoring-articlequality/helmfile.yaml?
[14:58:08] Trying that now
[14:59:22] Unrelatedly, did you see the PSU failure thing on ores2007?
[15:03:15] Lift-Wing, Epic, Machine-Learning-Team (Active Tasks), Patch-For-Review: Set up the ml-cache clusters - https://phabricator.wikimedia.org/T302232 (elukey) >>! In T302232#8031653, @Eevans wrote: >>>! In T302232#8030200, @elukey wrote: >> @lbowmaker hi! I reviewed https://www.mediawiki.org/wiki/Pla...
[15:03:27] nvm, it went back to OK just now
[15:04:14] elukey: that addition to revscoring-articlequality/helmfile.yaml seems to have done the trick: https://integration.wikimedia.org/ci/job/helm-lint/7597/console
[15:05:45] Lift-Wing, Epic, Machine-Learning-Team (Active Tasks), Patch-For-Review: Set up the ml-cache clusters - https://phabricator.wikimedia.org/T302232 (elukey) >>! In T302232#8030372, @lbowmaker wrote: > @elukey - in an ideal world we would like to abstract you from a lot of the underlying details of...
[15:06:20] klausman: yeah I think they are doing some PSU work in codfw, Papaul mentioned it yesterday
[15:06:29] ah, ack
[15:08:14] klausman: ah I may have found what's missing
[15:08:44] there is an entry called "environments" at the bottom of the articlequality's helmfile.yaml
[15:09:11] we need to add "ml-staging-codfw" to it
[15:09:24] read what I have sent at 16:57 and 17:04 :)
[15:10:32] ahhhh
[15:10:47] sorry I lost it between wikibugs updates
[15:10:58] super yes
[15:11:23] I am going to take a little walk before my next meeting, ttl
[15:11:29] aye, \o
[15:15:13] Doing the same, for groceries :)
[15:40:42] Lift-Wing, Epic, Machine-Learning-Team (Active Tasks), Patch-For-Review: Set up the ml-cache clusters - https://phabricator.wikimedia.org/T302232 (Eevans) >>! In T302232#8033571, @elukey wrote: >>>! In T302232#8031653, @Eevans wrote: >>>>! In T302232#8030200, @elukey wrote: >>> @lbowmaker hi! I r...
[15:43:03] Lift-Wing, Epic, Machine-Learning-Team (Active Tasks), Patch-For-Review: Set up the ml-cache clusters - https://phabricator.wikimedia.org/T302232 (Eevans) >>! In T302232#8033574, @elukey wrote: >>>! In T302232#8030372, @lbowmaker wrote: >> @elukey - in an ideal world we would like to abstract you...
[16:06:45] and merged!
[16:07:45] Um. I may have broke the jenkins gate-and-merge
[16:13:41] nah, it's ok.
[16:13:48] But the pods are crashlooping
[16:16:13] And I can't get the logs :(
[16:21:25] Oh well, something for tomorrow-me
[16:21:27] \o
[17:13:13] Night all!