[07:22:43] <ozge_>	 Good morning.
[07:33:24] <bartosz>	 morning! 
[08:06:52] <wikibugs>	 06Machine-Learning-Team, 05Goal: Q2 FY2025-26 Goal: Deploy Add-a-link v2 models to production - https://phabricator.wikimedia.org/T408790#11343521 (10OKarakaya-WMF)
[08:18:22] <dpogorzelski>	 morning
[08:27:46] <dpogorzelski>	 do we have miro or other similar products ?
[08:37:36] <elukey>	 we do have miro, but we need to ask for a license via OIT IIRC
[08:38:28] <elukey>	 dpogorzelski: are the root credentials working fine etc..?
[08:38:51] <dpogorzelski>	 yep, just poking around the new gpu node :) 
[08:39:02] <dpogorzelski>	 kubctl etc also works from deployment
[08:39:17] <dpogorzelski>	 so i should be good
[08:39:49] <elukey>	 perfect :)
[08:45:00] <dpogorzelski>	 what's the k8s env where ml-serve1012 was added?
[08:47:45] <dpogorzelski>	 basically what is the kube_env namespace/env combo to poke the right cluster :) 
[08:50:42] <elukey>	 ml-serve-eqiad
[08:51:48] <elukey>	 you can use `kube-env admin ml-serve-eqiad` from root on deploy2002 to have a broader access (careful, it grants you access to all namespaces etc..)
[08:55:53] <dpogorzelski>	 so kube-env can only load admin via sudo but sudo doesn't have kube-env.sh loaded via /etc/profile.d 
[08:56:04] <dpogorzelski>	 what am i missing ?
[09:01:54] <elukey>	 you need to sudo -i before
[09:02:07] <elukey>	 in a root session, you can use kube-env admin
[09:02:29] <elukey>	 this is what I usually d
[09:02:34] <elukey>	 *do
[09:07:39] <bartosz>	 I'm seeing that some of our staging deployments are setting `monitoring.enabled: false` in their values file and some are keeping it true. If set to true, it adds `prometheus.io/scrape: true` annotation in pods/inferenceservices
[09:07:54] <bartosz>	 However, I also see that we use `prometheus.kserve.io/scrape` annotation in our services. Some staging deployments have `monitoring.enabled: false`, but do inherit the `prometheus.kserve.io/scrape: true` from the production values file and I can see their metrics in Grafana :D 
[09:08:06] <bartosz>	 So I’m wondering - do we actually utilize the `prometheus.io/...` annotations?
[09:17:55] <dpogorzelski>	 aiko: does the llm image have a chart I could use?
[09:19:38] <dpogorzelski>	 aha i see one
[09:20:35] <dpogorzelski>	 somethig weird gets into the root session after sudo -i since my backspace becomes space :) 
[09:20:50] <dpogorzelski>	 (ghostty/zsh)
[09:23:31] <wikibugs>	 06Machine-Learning-Team, 06Discovery-Search (2025.10.20 - 2025.11.07): Initial task generation and ingestion to Cassandra and Search weight tags - https://phabricator.wikimedia.org/T408533#11343742 (10achou) **Update**  I've collected articles in English (en), French (fr), Arabic (ar), and Japanese (ja), then...
[09:42:47] <elukey>	 bartosz: In theory they are used, have you checked if when we use `monitoring.enabled: false` we do see metrics from kserve?
[09:42:58] <dpogorzelski>	 terminfo gets lost after sudo -i, that's the reason. in general i'm not a super fan of shared admin jump hosts. having the capability to act independently on the "owned" (sub)domain of resources would remove some of the unnecessary friction imo. a PKI via Vault that could dispatch on demand, short lived credentials, like a signed cert for k8s, based on the user's identity would be perfect 
[09:44:14] <elukey>	 dpogorzelski: definitely, but we haven't done it so far. The root admin kube-env is used only by SREs when needed, otherwise you can use the per-user kube-env that is not shared
[09:44:39] <elukey>	 we have plans to add Vault in the future but at the moment we need to balance our needs vs the complexity that it will bring
[09:45:15] <dpogorzelski>	 👍
[09:45:20] <elukey>	 to summarize - I didn't mean that you need to always use kube-env admin, it was just a suggestion for when you need to debug cluster-level things
[09:46:26] <wikibugs>	 06Machine-Learning-Team, 06Data-Persistence, 10Data-Persistence-Design-Review, 06Growth-Team, and 3 others: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task - https://phabricator.wikimedia.org/T401021#11343856 (10achou) @dcausse Thanks a lot! I found it was also missing $schema. (...
[09:48:04] <dpogorzelski>	 yep :) 
[09:54:32] <bartosz>	 elukey: Hmm it seems to me  that we see (or don't see :D) the same number of kserve metrics on staging, regardless of `monitoring.enabled: true/false`. I've checked the Kserve dashboard (https://grafana.wikimedia.org/goto/rP-WTczDR?orgId=1) and Kserve Inference Services dashboard (https://grafana.wikimedia.org/goto/dIinTckvR?orgId=1)
[09:59:42] <dpogorzelski>	 what is the stat1004 host?
[10:05:46] <elukey>	 bartosz: then it is probably not being used by the inference-services chart, so we can probably remove it. I can try to check later on to double check, but you noticed a diff right?
[10:06:05] <elukey>	 dpogorzelski: old host that has been decommed, it was part of the stat10xx series
[10:06:17] <dpogorzelski>	 kk
[10:12:45] <dpogorzelski>	 aiko: when `❯ pip install -r src/models/llm/requirements.txt` i get `ERROR: bitsandbytes-1.0.0-py3-none-manylinux_2_24_x86_64.whl is not a supported wheel on this platform.` how do you workaround this on mac?
[10:13:16] <bartosz>	 elukey: thank you, I'm happy to double check later as well, it's not too urgent. I don't think I noticed any diff, I stumbled onto it as I'm setting up a new inference service on staging and was wondering what should I set in the `monitoring.enabled` value. So I started investigating it..
[10:22:59] <aiko>	 dpogorzelski: I think we didn't test it on mac. bitsandbytes is for llm quantization, we were testing it on ml-lab
[10:28:38] <wikibugs>	 (03CR) 10Gkyziridis: [C:03+1] "LGTM! THNX!" [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1201558 (https://phabricator.wikimedia.org/T406179) (owner: 10Kevin Bazira)
[10:29:13] <aiko>	 probably we can start without bitsandbytes
[10:32:43] <wikibugs>	 06Machine-Learning-Team: Create a notebook for revise tone structured task generation logic - https://phabricator.wikimedia.org/T405324#11343944 (10achou) 05Open→03Resolved Done. This is the [[ https://gitlab.wikimedia.org/repos/machine-learning/exploratory-notebook/-/blob/main/tone-check/task_generation...
[11:48:11] <wikibugs>	 (03CR) 10Nik Gkountas: "I'm wondering if it's really desired to remove the dependency on CXServer languagepairs. The recommendation API at this point only serves " [research/recommendation-api] - 10https://gerrit.wikimedia.org/r/1201739 (https://phabricator.wikimedia.org/T405000) (owner: 10Sbisson)
[11:48:40] <wikibugs>	 (03CR) 10Nik Gkountas: [C:04-1] "Giving -1 for visibility" [research/recommendation-api] - 10https://gerrit.wikimedia.org/r/1201739 (https://phabricator.wikimedia.org/T405000) (owner: 10Sbisson)
[11:56:46] <wikibugs>	 (03CR) 10Nik Gkountas: [C:03+2] Improve periodic update flow and error handling [research/recommendation-api] - 10https://gerrit.wikimedia.org/r/1201718 (https://phabricator.wikimedia.org/T406854) (owner: 10Sbisson)
[11:58:12] <wikibugs>	 (03Merged) 10jenkins-bot: Improve periodic update flow and error handling [research/recommendation-api] - 10https://gerrit.wikimedia.org/r/1201718 (https://phabricator.wikimedia.org/T406854) (owner: 10Sbisson)
[12:10:15] <wikibugs>	 (03CR) 10Kevin Bazira: [C:03+2] revertrisk-wikidata: add model-server for Graph2Text model [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1201558 (https://phabricator.wikimedia.org/T406179) (owner: 10Kevin Bazira)
[12:11:29] <wikibugs>	 (03Merged) 10jenkins-bot: revertrisk-wikidata: add model-server for Graph2Text model [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/1201558 (https://phabricator.wikimedia.org/T406179) (owner: 10Kevin Bazira)
[13:06:24] <dpogorzelski>	 ok so back to basics:
[13:06:24] <dpogorzelski>	 * is this the service we want to deploy on top of the node with the new GPU? https://gerrit.wikimedia.org/r/plugins/gitiles/machinelearning/liftwing/inference-services/+/refs/heads/main/src/models/llm/
[13:06:24] <dpogorzelski>	 * if yes, what's the corresponding helm chart and what is the container full name?
[13:18:56] <dpogorzelski>	 elukey: in case i want to add a taint toleration what's the procedure step by step? ideally i would be able to perform quick iterations locally where i just have the yaml spec, make a change, kubectl apply and see if it works
[13:27:42] <elukey>	 dpogorzelski: I'd start from the other way around, namely checking the deployment-charts repo and helmfile
[13:28:08] <elukey>	 not sure if you already checked those, but in there we store the current deployments for the various ml-services
[13:28:25] <elukey>	 you can find a breakdown for each namespace under helmfile.d/ml-services/etc..
[13:28:35] <dpogorzelski>	 yep, there is an llm chart but the image name doesn't sound right and it seems there is a deployment which is 56d old :) 
[13:29:05] <dpogorzelski>	 at least there is a pod deployed to the llm namespace in k8s
[13:29:33] <elukey>	 so that llm is not a chart, but an helmfile config
[13:29:55] <elukey>	 the chart that we use should be inference-services
[13:30:06] <dpogorzelski>	 https://www.irccloud.com/pastebin/5HV1RYKH/
[13:30:13] <elukey>	 that creates the appropriate InferenceService resources etc..
[13:30:20] <dpogorzelski>	 kk
[13:30:37] <dpogorzelski>	 can't inferm from the name above if this is the container that would be produced from https://gerrit.wikimedia.org/r/plugins/gitiles/machinelearning/liftwing/inference-services/+/refs/heads/main/src/models/llm/
[13:31:26] <dpogorzelski>	 i think this could have been just a deployment spec parametrized via Gitlab's CI
[13:32:04] <elukey>	 we don't have gitlab auto-deploys to k8s at the moment :)
[13:32:14] <elukey>	 anyway, the above link that you pasted produces https://docker-registry.wikimedia.org/wikimedia/machinelearning-liftwing-inference-services-llm/tags/
[13:32:28] <elukey>	 you can find the other images etc.. in the main docker-registry page
[13:32:46] <dpogorzelski>	 ok perfect so those helm files seem to be correct
[13:32:51] <dpogorzelski>	 i would just have to add taints
[13:33:37] <elukey>	 yes, and that part is new because we never done it.. In theory I'd expect to find some option in the InferenceService CRD related to tollerations
[13:33:49] <elukey>	 that will be passed to all layers etc..
[13:34:16] <dpogorzelski>	 what does the inferenceservice crd do?
[13:34:47] <elukey>	 this is the part that I was trying to explain, before starting to deploy stuff via kubectl apply :D
[13:35:20] <elukey>	 we use kserve to manage most of the things deployed on ml-serve clusters, that runs a controller in the kserve namespace
[13:35:44] <elukey>	 that controller is responsible to handle the InferenceService resources, that are based on the related CRD
[13:36:16] <elukey>	 if you check the inference-services repo, you'll find at some point the definition of InferenceService buried in the templates
[13:36:56] <elukey>	 it is a way to set up knative-serving resources, istio resources, etc.. without explicitly managing them
[13:38:27] <dpogorzelski>	 i'm familiar with knative but not with kserve. how do they relate to each other?
[13:38:44] <dpogorzelski>	 does kserve require knative?
[13:38:59] <elukey>	 in our setup yes
[13:39:32] <elukey>	 at the time knative and istio were required to run it, not sure if nowadays there are less constraints
[13:40:10] <elukey>	 anyway, we usually don't need to explicitly create any istio/knative resource, we just need to tune some settings etc..
[13:40:10] <dpogorzelski>	 seems they are only if you want request based scaling
[13:40:24] <aiko>	 dpogorizelski: I think we can test the aya model
[13:40:33] <aiko>	 https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1101000
[13:40:44] <aiko>	 this one is 8B version once deployed in ml-staging with old gpu
[13:40:50] <aiko>	 https://gerrit.wikimedia.org/r/plugins/gitiles/machinelearning/liftwing/inference-services/+/refs/heads/main/src/models/llm/aya/aya.py
[13:41:03] <aiko>	 we tried 32B but did not succeed
[13:41:12] <aiko>	 https://phabricator.wikimedia.org/T379052#10394897
[13:41:33] <dpogorzelski>	 cool
[13:41:52] <aiko>	 and we concluded that aya-expanse-32B model can be hosted on LiftWing but do serve it efficiently we'll need to use vllm image
[13:42:08] <aiko>	 https://phabricator.wikimedia.org/T391941
[13:44:50] <dpogorzelski>	 so if i wanted to extract this block https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1101000/4/helmfile.d/ml-services/experimental/values-ml-staging-codfw.yaml#81 into it's own file and keep iterating over it what would be the procedure? would every change/iteration require a separate gerrit submit+review etc?
[13:47:15] <dpogorzelski>	 if i iterated on the deployment host it would get overridden by any change coming from gerrit and I can't iterate locally on my machine afaik
[13:48:01] <elukey>	 since it is technically production I'd suggest to start with filing code changes, get them reviewed, merged and deployed via helmfile to ml-serve-eqiad
[13:48:16] <elukey>	 so you'll learn the process and familiarize with the tools
[13:48:46] <elukey>	 to iterate quickly there may be some hacks that we could use, like copying the deployment-charts repo from /srv to your home dir and modify files in there
[13:49:07] <elukey>	 usually it works but it is not recommended, unless the use case is really straightforward
[13:49:32] <aiko>	 in production, yes, every change/iteration require a gerrit submit+review. only in experimental namespace in ml-staging, we can edit an isvc 
[13:49:34] <dpogorzelski>	 the thing is a change like adding a tain can literally take days to test out with current workflow
[13:49:34] <aiko>	 https://wikimedia.slack.com/archives/G01A0FNPLG4/p1718809932974299
[13:51:12] <elukey>	 dpogorzelski: every change that you file for deployment charts trigger a CI diff that you can inspect, before merging. If we find the right place in the InferenceService CRD where tollerations are added it should be quick
[13:55:35] <dpogorzelski>	 `kubectl patch configmap config-features -n knative-serving -p '{"data": {"kubernetes.podspec-nodeselector": "enabled", "kubernetes.podspec-tolerations": "enabled", "kubernetes.podspec-affinity": "enabled"}}'`
[13:55:55] <dpogorzelski>	 https://github.com/kserve/kserve/issues/730#issuecomment-1145718894
[13:57:36] <dpogorzelski>	 will check the configmap after the meeting
[13:59:11] <elukey>	 nice it seems promising! that config map is configured in deployment-charts, I hope that the setting is available in our version
[13:59:39] <dpogorzelski>	 must be, the issue was from 2022 :) 
[14:07:11] <elukey>	 our version is very old, fingers crossed 
[14:41:00] <wikibugs>	 (03CR) 10Sbisson: "As far as I can tell, CX is enabled on all Wikipedias. See https://gerrit.wikimedia.org/r/plugins/gitiles/operations/mediawiki-config/+/re" [research/recommendation-api] - 10https://gerrit.wikimedia.org/r/1201739 (https://phabricator.wikimedia.org/T405000) (owner: 10Sbisson)
[15:06:40] <wikibugs>	 06Machine-Learning-Team, 06Data-Persistence, 10Data-Persistence-Design-Review, 06Growth-Team, and 3 others: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task - https://phabricator.wikimedia.org/T401021#11345086 (10Ottomata) > Yep, we'll use mediawiki.page_content_change.v1. I think...
[15:10:11] <aiko>	 dpogorzelski: I forgot that we have this Phab ticket https://phabricator.wikimedia.org/T403599 we can use this one
[15:15:37] <dpogorzelski>	 kk
[15:21:07] <dpogorzelski>	 i imagine that contents of `values-ml-staging-codfw` apply to codfw but `values.yaml` if for eqiad?
[15:22:57] <elukey>	 values.yaml applies to all clusters, then you can have specific overrides like for staging etc..
[15:23:30] <aiko>	 values.yaml for production, and we have eqiad and codfw
[15:23:31] <dpogorzelski>	 aha not sure where i looked at that makes sense
[15:23:31] <elukey>	 in every helmfile.yaml you can see the chain of importance in templates->values
[15:23:39] <aiko>	 staging only on codfw
[15:26:25] <aiko>	 how we deploy: https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Deploy#How_to_deploy
[15:45:57] <dpogorzelski>	 assuming knative is 1.7 that feature should be in
[15:46:03] <dpogorzelski>	 the configmap doesn't have those values
[15:46:09] <dpogorzelski>	 should i just shove them in?
[15:46:31] <dpogorzelski>	 on eqiad ml cluster that is
[15:53:39] <elukey>	 they are configured in deployment-charts/helmfile.d/adming_ng
[15:53:43] <elukey>	 *admin_ng
[15:54:07] <elukey>	 that is another helmfile chain meant to take care of all the cluster-level controllers/etc.. that are different from regular services
[15:54:34] <elukey>	 we have per cluster or common configs, in theory we could start from staging (filing a code review, merge, deploy etc..)
[15:54:39] <elukey>	 see if it works, and then move to prod
[15:54:45] <elukey>	 wdyt?
[15:54:57] <dpogorzelski>	 sure
[16:02:52] <dpogorzelski>	 hmmm the deployment charts repo when cloned with the commit hook, the hook doesn't work out of the box need to check
[16:15:54] <wikibugs>	 (03PS2) 10Sbisson: Validate language codes using sitematrix [research/recommendation-api] - 10https://gerrit.wikimedia.org/r/1201739 (https://phabricator.wikimedia.org/T405000)
[16:37:55] <wikibugs>	 (03PS3) 10Sbisson: Validate language codes using sitematrix [research/recommendation-api] - 10https://gerrit.wikimedia.org/r/1201739 (https://phabricator.wikimedia.org/T405000)
[20:53:25] <wikibugs>	 (03CR) 10Nik Gkountas: [C:03+2] "So, we have two lists of "valid" language codes, cxserver languages and sitematrix languages (code === 'wiki' and not closed). Currently, " [research/recommendation-api] - 10https://gerrit.wikimedia.org/r/1201739 (https://phabricator.wikimedia.org/T405000) (owner: 10Sbisson)
[20:54:16] <wikibugs>	 (03Merged) 10jenkins-bot: Validate language codes using sitematrix [research/recommendation-api] - 10https://gerrit.wikimedia.org/r/1201739 (https://phabricator.wikimedia.org/T405000) (owner: 10Sbisson)