[07:22:43] Good morning.
[07:33:24] morning!
[08:06:52] Machine-Learning-Team, Goal: Q2 FY2025-26 Goal: Deploy Add-a-link v2 models to production - https://phabricator.wikimedia.org/T408790#11343521 (OKarakaya-WMF)
[08:18:22] morning
[08:27:46] do we have miro or other similar products?
[08:37:36] we do have miro, but we need to ask for a license via OIT, IIRC
[08:38:28] dpogorzelski: are the root credentials working fine etc..?
[08:38:51] yep, just poking around the new gpu node :)
[08:39:02] kubectl etc. also works from deployment
[08:39:17] so i should be good
[08:39:49] perfect :)
[08:45:00] what's the k8s env where ml-serve1012 was added?
[08:47:45] basically, what is the kube_env namespace/env combo to poke the right cluster :)
[08:50:42] ml-serve-eqiad
[08:51:48] you can use `kube-env admin ml-serve-eqiad` from root on deploy2002 to have broader access (careful, it grants you access to all namespaces etc..)
[08:55:53] so kube-env can only load admin via sudo, but sudo doesn't have kube-env.sh loaded via /etc/profile.d
[08:56:04] what am i missing?
[09:01:54] you need to sudo -i first
[09:02:07] in a root session, you can use kube-env admin
[09:02:29] this is what I usually do
[09:07:39] I'm seeing that some of our staging deployments set `monitoring.enabled: false` in their values file and some keep it true. If set to true, it adds the `prometheus.io/scrape: true` annotation to pods/inferenceservices
[09:07:54] However, I also see that we use the `prometheus.kserve.io/scrape` annotation in our services. Some staging deployments have `monitoring.enabled: false` but inherit `prometheus.kserve.io/scrape: true` from the production values file, and I can see their metrics in Grafana :D
[09:08:06] So I'm wondering - do we actually use the `prometheus.io/...` annotations?
[09:17:55] aiko: does the llm image have a chart I could use?
[09:19:38] aha, i see one
[09:20:35] something weird happens in the root session after sudo -i: my backspace becomes a space :)
[09:20:50] (ghostty/zsh)
[09:23:31] Machine-Learning-Team, Discovery-Search (2025.10.20 - 2025.11.07): Initial task generation and ingestion to Cassandra and Search weight tags - https://phabricator.wikimedia.org/T408533#11343742 (achou) **Update** I've collected articles in English (en), French (fr), Arabic (ar), and Japanese (ja), then...
[09:42:47] bartosz: In theory they are used. Have you checked whether we still see metrics from kserve when `monitoring.enabled: false` is set?
[09:42:58] terminfo gets lost after sudo -i, that's the reason. In general I'm not a super fan of shared admin jump hosts; having the capability to act independently on the "owned" (sub)domain of resources would remove some of the unnecessary friction, imo. A PKI via Vault that could dispatch on-demand, short-lived credentials - like a signed cert for k8s based on the user's identity - would be perfect
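For reference, the access flow described above amounts to something like this (a minimal sketch; only `kube-env admin ml-serve-eqiad` from a root session on deploy2002 is confirmed above - the TERM workaround for the broken backspace is an assumption):

```
# on deploy2002; kube-env.sh is not loaded for plain sudo, so open a root session
sudo -i
# assumption: re-exporting a common terminfo entry works around the backspace issue
export TERM=xterm-256color
# shared admin access -- careful, this grants access to all namespaces
kube-env admin ml-serve-eqiad
kubectl get nodes | grep ml-serve1012
```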
[09:44:14] dpogorzelski: definitely, but we haven't done it so far. The root admin kube-env is used only by SREs when needed; otherwise you can use the per-user kube-env, which is not shared
[09:44:39] we have plans to add Vault in the future, but at the moment we need to balance our needs against the complexity it will bring
[09:45:15] šŸ‘
[09:45:20] to summarize - I didn't mean that you always need to use kube-env admin, it was just a suggestion for when you need to debug cluster-level things
[09:46:26] Machine-Learning-Team, Data-Persistence, Data-Persistence-Design-Review, Growth-Team, and 3 others: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task - https://phabricator.wikimedia.org/T401021#11343856 (achou) @dcausse Thanks a lot! I found it was also missing $schema. (...
[09:48:04] yep :)
[09:54:32] elukey: Hmm, it seems to me that we see (or don't see :D) the same number of kserve metrics on staging regardless of `monitoring.enabled: true/false`. I've checked the Kserve dashboard (https://grafana.wikimedia.org/goto/rP-WTczDR?orgId=1) and the Kserve Inference Services dashboard (https://grafana.wikimedia.org/goto/dIinTckvR?orgId=1)
[09:59:42] what is the stat1004 host?
[10:05:46] bartosz: then it is probably not being used by the inference-services chart, so we can probably remove it. I can try to double-check later, but you noticed a diff, right?
[10:06:05] dpogorzelski: old host that has been decommed, it was part of the stat10xx series
[10:06:17] kk
[10:12:45] aiko: when I `pip install -r src/models/llm/requirements.txt` I get `ERROR: bitsandbytes-1.0.0-py3-none-manylinux_2_24_x86_64.whl is not a supported wheel on this platform.` - how do you work around this on mac?
[10:13:16] elukey: thank you, I'm happy to double-check later as well, it's not too urgent. I don't think I noticed any diff; I stumbled onto it while setting up a new inference service on staging and wondering what I should set in the `monitoring.enabled` value. So I started investigating it..
[10:22:59] dpogorzelski: I think we didn't test it on mac. bitsandbytes is for llm quantization, we were testing it on ml-lab
[10:28:38] (CR) Gkyziridis: [C:+1] "LGTM! THNX!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1201558 (https://phabricator.wikimedia.org/T406179) (owner: Kevin Bazira)
[10:29:13] probably we can start without bitsandbytes
[10:32:43] Machine-Learning-Team: Create a notebook for revise tone structured task generation logic - https://phabricator.wikimedia.org/T405324#11343944 (achou) Open→Resolved Done. This is the [[ https://gitlab.wikimedia.org/repos/machine-learning/exploratory-notebook/-/blob/main/tone-check/task_generation...
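Until then, a possible local workaround on mac (untested sketch; it assumes bitsandbytes is pinned on its own line in the requirements file and that the server can start without importing it):

```
# install everything except the GPU-only bitsandbytes wheel, which has no macOS build
grep -v '^bitsandbytes' src/models/llm/requirements.txt > /tmp/requirements-mac.txt
pip install -r /tmp/requirements-mac.txt
```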
[11:48:11] (CR) Nik Gkountas: "I'm wondering if it's really desired to remove the dependency on CXServer languagepairs. The recommendation API at this point only serves " [research/recommendation-api] - https://gerrit.wikimedia.org/r/1201739 (https://phabricator.wikimedia.org/T405000) (owner: Sbisson)
[11:48:40] (CR) Nik Gkountas: [C:-1] "Giving -1 for visibility" [research/recommendation-api] - https://gerrit.wikimedia.org/r/1201739 (https://phabricator.wikimedia.org/T405000) (owner: Sbisson)
[11:56:46] (CR) Nik Gkountas: [C:+2] Improve periodic update flow and error handling [research/recommendation-api] - https://gerrit.wikimedia.org/r/1201718 (https://phabricator.wikimedia.org/T406854) (owner: Sbisson)
[11:58:12] (Merged) jenkins-bot: Improve periodic update flow and error handling [research/recommendation-api] - https://gerrit.wikimedia.org/r/1201718 (https://phabricator.wikimedia.org/T406854) (owner: Sbisson)
[12:10:15] (CR) Kevin Bazira: [C:+2] revertrisk-wikidata: add model-server for Graph2Text model [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1201558 (https://phabricator.wikimedia.org/T406179) (owner: Kevin Bazira)
[12:11:29] (Merged) jenkins-bot: revertrisk-wikidata: add model-server for Graph2Text model [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1201558 (https://phabricator.wikimedia.org/T406179) (owner: Kevin Bazira)
[13:06:24] ok so back to basics:
[13:06:24] * is this the service we want to deploy on top of the node with the new GPU? https://gerrit.wikimedia.org/r/plugins/gitiles/machinelearning/liftwing/inference-services/+/refs/heads/main/src/models/llm/
[13:06:24] * if yes, what's the corresponding helm chart, and what is the container's full name?
[13:18:56] elukey: in case i want to add a taint toleration, what's the procedure step by step? ideally i would be able to perform quick iterations locally where i just have the yaml spec, make a change, kubectl apply, and see if it works
[13:27:42] dpogorzelski: I'd start from the other way around, namely checking the deployment-charts repo and helmfile
[13:28:08] not sure if you already checked those, but that's where we store the current deployments for the various ml-services
[13:28:25] you can find a breakdown for each namespace under helmfile.d/ml-services/etc..
[13:28:35] yep, there is an llm chart but the image name doesn't sound right, and it seems there is a deployment which is 56d old :)
[13:29:05] at least there is a pod deployed to the llm namespace in k8s
[13:29:33] so that llm is not a chart, but a helmfile config
[13:29:55] the chart that we use should be inference-services
[13:30:06] https://www.irccloud.com/pastebin/5HV1RYKH/
[13:30:13] that creates the appropriate InferenceService resources etc..
[13:30:20] kk
[13:30:37] can't infer from the name above whether this is the container that would be produced from https://gerrit.wikimedia.org/r/plugins/gitiles/machinelearning/liftwing/inference-services/+/refs/heads/main/src/models/llm/
[13:31:26] i think this could have been just a deployment spec parametrized via Gitlab's CI
[13:32:04] we don't have gitlab auto-deploys to k8s at the moment :)
[13:32:14] anyway, the link that you pasted produces https://docker-registry.wikimedia.org/wikimedia/machinelearning-liftwing-inference-services-llm/tags/
[13:32:28] you can find the other images etc.. on the main docker-registry page
[13:32:46] ok perfect, so those helm files seem to be correct
[13:32:51] i would just have to add taints
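The same mapping can also be confirmed from the cluster side (a small sketch; `isvc` is the standard KServe short name for InferenceService, and the jsonpath just lists the images the pods are running):

```
# list the InferenceService objects in the llm namespace and their pod images
kubectl -n llm get isvc
kubectl -n llm get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'
```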
[13:33:37] yes, and that part is new because we've never done it.. In theory I'd expect to find some option in the InferenceService CRD related to tolerations
[13:33:49] that will be passed down to all layers etc..
[13:34:16] what does the inferenceservice crd do?
[13:34:47] this is the part that I was trying to explain, before starting to deploy stuff via kubectl apply :D
[13:35:20] we use kserve to manage most of the things deployed on the ml-serve clusters; it runs a controller in the kserve namespace
[13:35:44] that controller is responsible for handling the InferenceService resources, which are based on the related CRD
[13:36:16] if you check the inference-services repo, you'll eventually find the definition of InferenceService buried in the templates
[13:36:56] it is a way to set up knative-serving resources, istio resources, etc.. without explicitly managing them
[13:38:27] i'm familiar with knative but not with kserve. how do they relate to each other?
[13:38:44] does kserve require knative?
[13:38:59] in our setup, yes
[13:39:32] at the time, knative and istio were required to run it; not sure if nowadays there are fewer constraints
[13:40:10] anyway, we usually don't need to explicitly create any istio/knative resources, we just need to tune some settings etc..
[13:40:10] seems they are only needed if you want request-based scaling
[13:40:24] dpogorzelski: I think we can test the aya model
[13:40:33] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1101000
[13:40:44] this one is the 8B version, deployed in ml-staging with the old gpu
[13:40:50] https://gerrit.wikimedia.org/r/plugins/gitiles/machinelearning/liftwing/inference-services/+/refs/heads/main/src/models/llm/aya/aya.py
[13:41:03] we tried 32B but did not succeed
[13:41:12] https://phabricator.wikimedia.org/T379052#10394897
[13:41:33] cool
[13:41:52] and we concluded that the aya-expanse-32B model can be hosted on LiftWing, but to serve it efficiently we'll need to use the vllm image
[13:42:08] https://phabricator.wikimedia.org/T391941
[13:44:50] so if i wanted to extract this block https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1101000/4/helmfile.d/ml-services/experimental/values-ml-staging-codfw.yaml#81 into its own file and keep iterating on it, what would be the procedure? would every change/iteration require a separate gerrit submit+review etc?
[13:47:15] if i iterated on the deployment host, it would get overridden by any change coming from gerrit, and i can't iterate locally on my machine afaik
[13:48:01] since it is technically production, I'd suggest starting with filing code changes, getting them reviewed, merged, and deployed via helmfile to ml-serve-eqiad
[13:48:16] so you'll learn the process and familiarize yourself with the tools
[13:48:46] to iterate quickly there may be some hacks that we could use, like copying the deployment-charts repo from /srv to your home dir and modifying files there
[13:49:07] usually it works, but it is not recommended unless the use case is really straightforward
[13:49:32] in production, yes, every change/iteration requires a gerrit submit+review. only in the experimental namespace in ml-staging can we edit an isvc directly
[13:49:34] the thing is, a change like adding a taint can literally take days to test out with the current workflow
[13:49:34] https://wikimedia.slack.com/archives/G01A0FNPLG4/p1718809932974299
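For the taint question, a hedged sketch of what the change might look like: KServe's v1beta1 predictor embeds pod-spec fields, so a toleration would presumably go there. The taint key and service name are made-up placeholders, this only works once the knative feature flags discussed below are enabled, and direct `kubectl apply` is only viable in the experimental namespace on ml-staging, per the advice above:

```
# experimental/ml-staging only; production changes go through gerrit + helmfile
kubectl apply -f - <<'EOF'
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llm-gpu-test            # hypothetical name
  namespace: experimental
spec:
  predictor:
    tolerations:
      - key: example.wikimedia.org/gpu   # placeholder: match the taint on the new GPU node
        operator: Exists
        effect: NoSchedule
    containers:
      - name: kserve-container
        image: docker-registry.wikimedia.org/wikimedia/machinelearning-liftwing-inference-services-llm:latest  # tag is a placeholder
EOF
```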
[13:51:12] dpogorzelski: every change that you file for deployment-charts triggers a CI diff that you can inspect before merging. If we find the right place in the InferenceService CRD where tolerations are added, it should be quick
[13:55:35] `kubectl patch configmap config-features -n knative-serving -p '{"data": {"kubernetes.podspec-nodeselector": "enabled", "kubernetes.podspec-tolerations": "enabled", "kubernetes.podspec-affinity": "enabled"}}'`
[13:55:55] https://github.com/kserve/kserve/issues/730#issuecomment-1145718894
[13:57:36] will check the configmap after the meeting
[13:59:11] nice, it seems promising! that configmap is configured in deployment-charts; I hope the setting is available in our version
[13:59:39] must be, the issue was from 2022 :)
[14:07:11] our version is very old, fingers crossed
[14:41:00] (CR) Sbisson: "As far as I can tell, CX is enabled on all Wikipedias. See https://gerrit.wikimedia.org/r/plugins/gitiles/operations/mediawiki-config/+/re" [research/recommendation-api] - https://gerrit.wikimedia.org/r/1201739 (https://phabricator.wikimedia.org/T405000) (owner: Sbisson)
[15:06:40] Machine-Learning-Team, Data-Persistence, Data-Persistence-Design-Review, Growth-Team, and 3 others: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task - https://phabricator.wikimedia.org/T401021#11345086 (Ottomata) > Yep, we'll use mediawiki.page_content_change.v1. I think...
[15:10:11] dpogorzelski: I forgot that we have this Phab ticket https://phabricator.wikimedia.org/T403599 - we can use this one
[15:15:37] kk
[15:21:07] i imagine that the contents of `values-ml-staging-codfw` apply to codfw, but `values.yaml` is for eqiad?
[15:22:57] values.yaml applies to all clusters; then you can have specific overrides, like for staging etc..
[15:23:30] values.yaml is for production, and we have eqiad and codfw
[15:23:31] aha, not sure where i looked, that makes sense
[15:23:31] in every helmfile.yaml you can see the chain of precedence in templates->values
[15:23:39] staging is only on codfw
[15:26:25] how we deploy: https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Deploy#How_to_deploy
[15:45:57] assuming knative is 1.7, that feature should be in
[15:46:03] the configmap doesn't have those values
[15:46:09] should i just shove them in?
[15:46:31] on the eqiad ml cluster, that is
[15:53:39] they are configured in deployment-charts/helmfile.d/admin_ng
[15:54:07] that is another helmfile chain meant to take care of all the cluster-level controllers etc.. that are different from regular services
[15:54:34] we have per-cluster or common configs; in theory we could start from staging (filing a code review, merge, deploy etc..)
[15:54:39] see if it works, and then move to prod
[15:54:45] wdyt?
[15:54:57] sure
[16:02:52] hmmm, when the deployment-charts repo is cloned with the commit hook, the hook doesn't work out of the box; need to check
[16:15:54] (PS2) Sbisson: Validate language codes using sitematrix [research/recommendation-api] - https://gerrit.wikimedia.org/r/1201739 (https://phabricator.wikimedia.org/T405000)
[16:37:55] (PS3) Sbisson: Validate language codes using sitematrix [research/recommendation-api] - https://gerrit.wikimedia.org/r/1201739 (https://phabricator.wikimedia.org/T405000)
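Before filing the admin_ng change, the live state of those flags can be checked read-only (a sketch; the staging cluster alias follows the `kube-env admin` pattern shown earlier and is an assumption):

```
# start from staging, per the plan above, then repeat on ml-serve-eqiad
sudo -i
kube-env admin ml-staging-codfw
kubectl -n knative-serving get configmap config-features -o yaml \
  | grep -E 'podspec-(tolerations|nodeselector|affinity)'
# the actual change goes through gerrit: deployment-charts/helmfile.d/admin_ng
```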
[20:53:25] (CR) Nik Gkountas: [C:+2] "So, we have two lists of "valid" language codes, cxserver languages and sitematrix languages (code === 'wiki' and not closed). Currently, " [research/recommendation-api] - https://gerrit.wikimedia.org/r/1201739 (https://phabricator.wikimedia.org/T405000) (owner: Sbisson)
[20:54:16] (Merged) jenkins-bot: Validate language codes using sitematrix [research/recommendation-api] - https://gerrit.wikimedia.org/r/1201739 (https://phabricator.wikimedia.org/T405000)
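For the record, the sitematrix side of that check can be reproduced from the command line (hedged sketch; the jq filter encodes an assumption about the API shape, namely that Wikipedia sites have `code == "wiki"` and that closed wikis carry a `closed` flag):

```
# list open Wikipedia dbnames from sitematrix, mirroring the validation above
curl -s 'https://meta.wikimedia.org/w/api.php?action=sitematrix&format=json' \
  | jq -r '.sitematrix | to_entries[] | .value.site? // empty | .[]
           | select(.code == "wiki" and (has("closed") | not)) | .dbname'
```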