Unauthorized

[05:30:39] <isaranto>	 good morning o/
[06:09:40] <bartosz>	 good morning ☀️ 
[06:53:07] <ozge_>	 Good morning
[08:37:10] <isaranto>	 I think I need to create a new Wikipedia page named "Browser Tab Clutter". I see it is missing :P
[09:46:06] <wikibugs>	 10Lift-Wing, 06Machine-Learning-Team, 10EditCheck, 10Editing-team (Tracking): Create SLO dashboard for tone (peacock) check model - https://phabricator.wikimedia.org/T390706#10949656 (10isarantopoulos) Thanks a lot for the helpful comments Luca and sorry for the delayed response here.  regarding operations...
[09:47:16] <isaranto>	 klausman: (and rest of team). what do you think about the above --^? Shall we just use 20x response codes in our latency SLO calculation? I think it does make more sense, just want to hear if anyone has any different thoughts on this
[09:56:20] <klausman>	 I agree. 500s are obviously out, and neither 400 nor 300 is "productive traffic" in the strictest sense, so they would water down the SLO
[10:19:46] <isaranto>	 sgtm, thank you!
[11:33:54] <bartosz>	 To me it also sounds more informative to calculate latency only for 2xx responses. If we'd include 3xx and 4xx, throwing a lot of  them could greatly improve our latency numbers :D
[12:15:18] <isaranto>	 agree!
[13:21:16] <kart_>	 elukey, klausman What's next step we should do for issue with s3 + machinetranslation?
[13:21:53] <elukey>	 I think that Tobias was in the process of fixing the swift account
[13:21:54] <klausman>	 I am waiting for the ok from DP to restart the FEs so the username change is active, and then we can do a quick test from a statbox
[13:22:15] <kart_>	 Noted. Thanks!
[13:37:46] <urbanecm>	 Hey hey! What is the liftwing equivalent of https://ores.wikimedia.org/v3/scores? i noticed Growth has code like https://github.com/wikimedia/mediawiki-extensions-GrowthExperiments/blob/master/maintenance/importOresTopics.php#L283, which tries to determine whether articletopic can be generated.
[13:38:02] <urbanecm>	 or does the new model work on _any_ wiki?
[13:40:25] <georgekyz>	 Probably stupid question: I see that in inference-services we are using `use-system-site-packages: false` in blubber. I am trying to build an image similar to that one from inference-services (but without kserve of course). So I am using the same base image which has torch in it. My question is: Do I need to set `use-system-site-packages: true` in order to use the pre-installed libraries from the base image? I am using almost the same 
[13:40:25] <georgekyz>	 blubber from inference-services but when the kokkuri pipeline runs I see that the torch is not being used.
[13:52:13] <bartosz>	 georgekyz: Are you creating a new Python venv in the new image? AFAIK the `use-system-site-packages` translates to adding the `--system-site-packages` flag to the venv creation, which allows it to use packages from the global Python installation. However, I'd be wondering if `torch` is actually inside the global Python installation of the base image, or if it's in another venv in your base image 🤔 
[13:53:14] <bartosz>	 If torch is installed in a different venv, setting this flag to true probably won't solve it :( 
[13:54:45] <georgekyz>	 bartosz: thnx for the quick response mate! I got confused a little bit with the paths. I am using `docker-registry.wikimedia.org/amd-pytorch23:2.3.0rocm6.0-3-20250511` as a base image and I was recieving an error that module torch not found. 
[13:55:01] <georgekyz>	 But I think that I had messed up the paths
[13:56:44] <georgekyz>	 More preciesely I was getting:
[13:56:47] <georgekyz>	 https://www.irccloud.com/pastebin/GLB5duxB/
[13:57:34] <bartosz>	 georgekyz: Ah I also got a little confused and thought that you might be using some specific image built in inference-services as your base
[13:58:09] <georgekyz>	 probably because I was using the `Trainer` object from transformers that was needed the 'accelerate>=0.26.0'` but it was huge and I couldn't push that image to registry. So I am trying tomake a very lightweight image and finally achieve to push it :P
[13:59:05] <georgekyz>	 Now I built a new one without the `Trainer` object from transformers and build my custom training loop so we go oldschool :P
[14:15:19] <klausman>	 thanos-fe's have been restarted. No change yet on statbox s3cmds. I think the extra swift command needs doing, but I don't know where it needs to be run
[14:25:45] <elukey>	 done it, does it work now?
[14:30:16] <elukey>	 because I see ERROR: Bucket 'wmf-ml-models' does not exist
[14:30:17] <elukey>	 mmm
[14:32:08] <klausman>	 Same here,
[14:32:24] <klausman>	 s3:/// is just empty
[14:37:56] <elukey>	 now it lists an empty bucket, I think it depends from what account the command to allow the bucket is run from 
[14:38:07] <elukey>	 on thanos-fe1004.eqiad.wmnet
[14:38:20] <klausman>	 I'm running on stat1011.
[14:38:31] <elukey>	 root@thanos-fe1004:/etc/swift# source /etc/swift/account_AUTH_machinetranslation.env 
[14:38:34] <elukey>	 root@thanos-fe1004:/etc/swift# swift list
[14:38:36] <elukey>	 wmf-ml-models
[14:38:39] <elukey>	 wmf-ml-models+segments
[14:39:19] <elukey>	 if I use account_AUTH_mlserve.env the output is the same
[14:40:20] <klausman>	 if I set the same env vars as that env file on stat1011 and then run swift list, I get:
[14:40:31] <klausman>	 Auth GET failed: https://thanos-swift.discovery.wmnet/auth/v1.0 401 Unauthorized  [first 60 chars of response] b'<html><h1>Unauthorized</h1><p>This server could not verify t'
[14:45:59] <elukey>	 yeah but that env file is for swift, how are you using it on stat1011?
[14:46:19] <klausman>	 There is a swift cli util there as well
[14:47:00] <klausman>	 though I dunno if that is supposed tow ork or it being broken is "as desired"
[14:51:14] <wikibugs>	 06Machine-Learning-Team, 06Moderator-Tools-Team: AI/ML Infrastructure Request: Persist historical revert risk multilingual model scores for threshold analysis - https://phabricator.wikimedia.org/T397187#10950918 (10SSalgaonkar-WMF) @DMburugu thank you so much for this helpful response!! It totally makes sense...
[14:51:34] <elukey>	 I think that the account may be configured in the wrong way
[14:51:37] <elukey>	 see in swift.yaml
[14:51:38] <elukey>	   machinetranslation:
[14:51:38] <elukey>	     access:       '.admin'
[14:51:38] <elukey>	     account_name: 'AUTH_machinetranslation'
[14:51:38] <elukey>	     auth:         'https://thanos-swift.discovery.wmnet'
[14:51:40] <elukey>	     user:         'machinetranslation:prod'
[14:51:43] <elukey>	 vs
[14:51:51] <elukey>	   mlserve_ro:
[14:51:51] <elukey>	     access:       ''
[14:51:51] <elukey>	     account_name: 'AUTH_mlserve'
[14:51:51] <elukey>	     auth:         'https://thanos-swift.discovery.wmnet'
[14:51:51] <elukey>	     user:         'mlserve:ro'
[14:51:54] <elukey>	     stats_enabled: 'no'
[14:52:02] <elukey>	 I think that machinetranslation should look like --^
[14:52:11] <elukey>	 and then get the +r permissions
[14:53:22] <klausman>	 you have a point. I have a meeting until 17:00, will make a patch afterwards
[15:03:12] <klausman>	 elukey: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1164235
[15:07:38] <elukey>	 klausman: let's wait for Matthew, because in this case I am not sure how it is best to proceed
[15:07:40] <isaranto>	 urbanecm: o/ the Liftwing equivalent for the old articletopic model can be accessed by using either an internal endpoint (suitable for Mediawiki) https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Usage#Example_usage_of_internal_endpoint or via the api gateway for any other service https://api.wikimedia.org/wiki/Lift_Wing_API/Reference/Get_revscoring_articletopic_prediction
[15:08:02] <elukey>	 because I suspect that every account has its own view of the buckets
[15:08:05] <klausman>	 elukey: ack. will add/edit the stuff you commented on
[15:08:21] <elukey>	 I mean at this point we could use directly mlserve:ro
[15:08:23] <elukey>	 if it is the case
[15:09:11] <isaranto>	 urbanecm: there is also a newer model that can be used named articletopic outlink https://api.wikimedia.org/wiki/Lift_Wing_API/Reference/Get_articletopic_outlink_prediction https://meta.wikimedia.org/wiki/Machine_learning_models/Production/Language_agnostic_link-based_article_topic
[15:10:29] <klausman>	 elukey: a separate account is nice in theory, but the distinct is getting pretty thin now
[15:10:38] <klausman>	 distinction*
[15:47:52] <urbanecm>	 isaranto: thanks for the clarification! that makes sense. 
[16:05:33] * isaranto afk!
[21:31:53] <wikibugs>	 06Machine-Learning-Team, 06DC-Ops, 10ops-eqiad, 06SRE: Q4:rack/setup/install ml-serve101[2345] - https://phabricator.wikimedia.org/T393948#10952217 (10Jclark-ctr) 05Open→03Resolved a:03Jclark-ctr
[21:32:31] <wikibugs>	 06Machine-Learning-Team, 06DC-Ops, 10ops-eqiad, 06SRE: Q4:rack/setup/install ml-serve101[2345] - https://phabricator.wikimedia.org/T393948#10952221 (10Jclark-ctr) 05Resolved→03Open accidentally resolved ticket  instead of assigning to my self
[21:33:09] <wikibugs>	 06Machine-Learning-Team, 06DC-Ops, 10ops-eqiad, 06SRE: Q4:rack/setup/install ml-serve101[2345] - https://phabricator.wikimedia.org/T393948#10952225 (10Jclark-ctr)