[01:47:44] FIRING: LiftWingServiceErrorRate: ...
[01:47:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-damaging&var-backend=ptwiki-damaging-predictor-default.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[05:15:54] (CR) KartikMistry: "recheck" [research/recommendation-api] - https://gerrit.wikimedia.org/r/1103355 (owner: Sbisson)
[05:16:45] (CR) KartikMistry: [C:-1] "Script seems duplicated outside scripts/ dir!" [research/recommendation-api] - https://gerrit.wikimedia.org/r/1103355 (owner: Sbisson)
[05:47:44] FIRING: LiftWingServiceErrorRate: ...
[05:47:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-damaging&var-backend=ptwiki-damaging-predictor-default.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[06:17:44] RESOLVED: LiftWingServiceErrorRate: ...
[06:17:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-damaging&var-backend=ptwiki-damaging-predictor-default.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[06:41:16] Lift-Wing, Machine-Learning-Team, OKR-Work: Create event stream for article-country model-server hosted on LiftWing - https://phabricator.wikimedia.org/T382295 (kevinbazira) NEW
[08:52:13] hello!
[08:55:28] so flash attention worked on LW \o/
[08:56:05] but bitsandbytes didn't and it is probably because it doesn't have access to rocm
[08:56:35] I got this error again and again https://phabricator.wikimedia.org/T379052#10394897
[09:18:44] FIRING: LiftWingServiceErrorRate: ...
[09:18:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=revscoring-articlequality&var-backend=nlwiki-articlequality-predictor-default.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[09:23:44] RESOLVED: LiftWingServiceErrorRate: ...
[09:23:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=revscoring-articlequality&var-backend=nlwiki-articlequality-predictor-default.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[11:12:17] Machine-Learning-Team: [LLM] Use Flash attention 2 for GPU inference - https://phabricator.wikimedia.org/T371344#10408723 (isarantopoulos) I rebuilt a wheel with pytorch 2.5.1(rocm) using `python setup.py bdist_wheel` and was able to successfully deploy it on Lift Wing.
[11:37:53] (PS1) Ilias Sarantopoulos: llm: set flash_attn through env var [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1104981
[11:49:16] (CR) CI reject: [V:-1] llm: set flash_attn through env var [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1104981 (owner: Ilias Sarantopoulos)
[11:52:29] (PS2) Ilias Sarantopoulos: llm: set flash_attn through env var [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1104981
[11:53:28] I want to test now bitsandbytes without flash attn to see if it will be ok (as it was before)
[11:53:41] I submitted a patch so that we can enable/disable all this from env vars
[12:25:09] (CR) Kevin Bazira: [C:+1] llm: set flash_attn through env var [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1104981 (owner: Ilias Sarantopoulos)
[12:51:00] (CR) Ilias Sarantopoulos: [C:+2] llm: set flash_attn through env var [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1104981 (owner: Ilias Sarantopoulos)
[12:51:47] (Merged) jenkins-bot: llm: set flash_attn through env var [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1104981 (owner: Ilias Sarantopoulos)
[13:14:48] * isaranto afk - lunch
[13:57:43] (PS1) Ilias Sarantopoulos: llm: add inference_mode and g++ [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1105002 (https://phabricator.wikimedia.org/T377848)
[14:02:56] Updating rec-api..
[14:25:53] (PS5) Sbisson: Smoke test to check deployments [research/recommendation-api] - https://gerrit.wikimedia.org/r/1103355
[14:26:16] (CR) Sbisson: "oops" [research/recommendation-api] - https://gerrit.wikimedia.org/r/1103355 (owner: Sbisson)
[14:26:41] (CR) CI reject: [V:-1] Smoke test to check deployments [research/recommendation-api] - https://gerrit.wikimedia.org/r/1103355 (owner: Sbisson)
[14:29:31] (CR) Sbisson: "recheck" [research/recommendation-api] - https://gerrit.wikimedia.org/r/1103355 (owner: Sbisson)
[14:40:21] Lift-Wing, Machine-Learning-Team: Build and Publish ROCm-Compatible Python Packages - https://phabricator.wikimedia.org/T381859#10409542 (isarantopoulos) p:Triage→High
[14:41:48] Machine-Learning-Team: [LLM] ML-lab benchmarking - https://phabricator.wikimedia.org/T382343 (achou) NEW
[14:42:05] 🙌
[14:48:05] Lift-Wing, Machine-Learning-Team: [LLM] Lift Wing load testing - https://phabricator.wikimedia.org/T377225#10409579 (achou)
[14:56:21] Lift-Wing, Machine-Learning-Team: Build and Publish ROCm-Compatible Python Packages - https://phabricator.wikimedia.org/T381859#10409597 (isarantopoulos) I have created the following repository for this work on gitlab [[ https://gitlab.wikimedia.org/repos/machine-learning/rocm-wheelhouse | https://gitlab...
[15:31:18] Machine-Learning-Team: [LLM] ML-lab benchmarking - https://phabricator.wikimedia.org/T382343#10409796 (isarantopoulos) a:kevinbazira
[15:53:16] (CR) Kevin Bazira: [C:+1] llm: add inference_mode and g++ [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1105002 (https://phabricator.wikimedia.org/T377848) (owner: Ilias Sarantopoulos)
[16:19:38] Machine-Learning-Team: [LLM] ML-lab benchmarking - https://phabricator.wikimedia.org/T382343#10409985 (kevinbazira) **optimum-benchmark** In the meeting, we discussed finding a way to easily run the [[ https://github.com/huggingface/optimum-benchmark | HF optimum benchmark ]] on ml-lab. I built a tool to en...
[16:55:25] (CR) Ilias Sarantopoulos: [C:+2] llm: add inference_mode and g++ [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1105002 (https://phabricator.wikimedia.org/T377848) (owner: Ilias Sarantopoulos)
[16:59:22] (Merged) jenkins-bot: llm: add inference_mode and g++ [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1105002 (https://phabricator.wikimedia.org/T377848) (owner: Ilias Sarantopoulos)
[17:26:38] (PS1) Ilias Sarantopoulos: Revert "llm: add inference_mode and g++" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1105039
[17:41:19] (CR) Ilias Sarantopoulos: [C:+2] Revert "llm: add inference_mode and g++" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1105039 (owner: Ilias Sarantopoulos)
[17:50:14] (Merged) jenkins-bot: Revert "llm: add inference_mode and g++" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1105039 (owner: Ilias Sarantopoulos)
[17:56:33] I reverted the previous patch cause I changed the wrong blubberfile
[17:56:41] will fix it tomorrow
[17:56:45] going afk folks, have a nice evening o/
[19:53:03] (CR) Umherirrender: [C:+2] "Resubmit" [extensions/ORES] (REL1_41) - https://gerrit.wikimedia.org/r/1103508 (owner: Libraryupgrader)
[21:51:24] (CR) Umherirrender: [C:+2] "Resubmit" [extensions/ORES] (REL1_42) - https://gerrit.wikimedia.org/r/1103509 (owner: Libraryupgrader)
[23:26:14] (CR) Umherirrender: "recheck" [extensions/ORES] - https://gerrit.wikimedia.org/r/1104146 (owner: Libraryupgrader)
[23:49:33] (CR) Umherirrender: [C:+2] "Resubmit" [extensions/ORES] (REL1_43) - https://gerrit.wikimedia.org/r/1103510 (owner: Libraryupgrader)