[03:54:36] o/ good morning!
[03:55:41] Getting an early start, just a heads-up that I’ll be offline for a couple of hours around midday.
[04:17:48] Lift-Wing, Machine-Learning-Team, EditCheck, Editing-team (Tracking): Create SLO dashboard for tone (peacock) check model - https://phabricator.wikimedia.org/T390706#10952768 (isarantopoulos) After a brief IRC discussion with the team I have updated the page to mention only successful requests fo...
[05:08:40] I made a patch to remove the pytorch 2.1.2+ROCm5.6 image from the production-images repo as it is no longer used anywhere https://gerrit.wikimedia.org/r/c/operations/docker-images/production-images/+/1164329
[05:09:23] I'd like to do the same with the rest of the images but for now pytorch 2.2, 2.3 & 2.5 are still used
[05:58:14] good morning!
[06:20:13] o/
[06:57:29] Good morning
[07:59:13] * isaranto afk bbl
[08:31:09] o/ https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1164271 if anyone has a moment
[08:39:17] ^ Left a small comment from my side 🙌
[09:45:59] thanks for the review! updated it :)
[10:03:12] +1 from my end! Thnx for working on this Aiko
[10:17:05] +1 from my side as well, thank you 🙌
[11:03:27] klausman: o/ as FYI I just synced admin_ng for ml-staging-codfw
[11:03:45] ack. what was the occasion/diff?
[11:05:03] I am adding the debmonitor user to staging envs, and there was some stuff related to new endpoints, etc.
[11:05:08] nothing out of the ordinary
[11:06:01] the debmonitor user will allow us to poll all the containers and their images, so we'll hopefully have a good breakdown of what images are running in a cluster in debmonitor
[11:06:04] all WIP :)
[11:06:47] :+1:
[11:07:15] Thanks for that. I am sometimes a bit worried about old images in the Docker registry one day blowing something up in a very bad way
[11:07:36] (not to mention the wasted resources)
[11:08:13] and for us it is a real pain if a vulnerability gets out for a Debian package
[11:08:27] there is the case of the golang binaries, but that is another can of worms :D
[11:08:52] I have half a proposal about that in my brain. I think it's _relatively_ easy to solve
[11:09:13] good to hear :)
[11:59:04] this sounds great! can we also check what images are running or only SREs?
[12:06:43] I think it should be available to everybody, will check
[12:12:52] thank you!
[12:13:24] btw my wifi is really slow so you may see some of my messages at random times
[15:04:08] I wanted to deploy this edit-check change in experimental ns https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1164271, but helmfile diff shows something unexpected. The change for edit-check is correct, updating the image version and adding USE_METADATA env
[15:04:25] but the change should also remove unnecessary edit-check services previously tested (edit-check-cpu and edit-check-gpu). They aren't being removed, instead their image versions are changed to older versions
[15:05:04] the full helmfile diff is here https://phabricator.wikimedia.org/P78719. I saw the current services were deployed 16 days ago. Could this unexpected diff be related to Tobias rebooting all of the staging machines on 11/06?
[15:05:19] would it be safe to proceed with sync and then manually delete the unnecessary edit-check services?
[15:09:36] just noting the issue here. we can address it next Monday
[16:18:43] Machine-Learning-Team, ORES, Temporary accounts, Trust and Safety Product Team: No edits will be shown in the recent change with "Very Likely bad faith" ORES filter - https://phabricator.wikimedia.org/T398066 (SCP-2000) NEW
[16:20:18] Machine-Learning-Team, ORES, Temporary accounts, Trust and Safety Product Team: RecentChanges with "Very Likely bad faith" ORES filter don't show Temporary accounts' edits - https://phabricator.wikimedia.org/T398066#10954791 (SCP-2000)