[00:07:44] FIRING: LiftWingServiceErrorRate: ...
[00:07:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-damaging&var-backend=itwiki-damaging-predictor.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[00:27:44] RESOLVED: LiftWingServiceErrorRate: ...
[00:27:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-damaging&var-backend=itwiki-damaging-predictor.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[12:01:33] Machine-Learning-Team, DC-Ops, ops-eqiad, SRE: Degraded RAID on ml-serve1001 - https://phabricator.wikimedia.org/T422382#11818956 (klausman) I think we can run this machine on a single disk until its replacement arrives. Even if it dies entirely, we have enough serving capacity in eqiad to handl...
[12:28:29] (PS1) AikoChou: revscoring: add threshold to elapsed_time logging [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1270917 (https://phabricator.wikimedia.org/T416384)
[12:34:15] Lift-Wing, Machine-Learning-Team (Q4 FY2025-26): Investigate enabling gRPC in LiftWing model servers - https://phabricator.wikimedia.org/T421903#11819290 (klausman) To clarify a basic assumption I have: gRPC only works over HTTP/2, and HTTP/2 is always TLS-encrypted, i.e. there is no way to speak gRPC ov...
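The 12:28:29 change title ("add threshold to elapsed_time logging") does not show how the threshold is implemented. As a rough illustration of the general idea, logging a call's elapsed time only when it exceeds a threshold so that fast requests do not flood the logs, a minimal Python sketch might look like the following. The decorator name `log_elapsed_time`, the logger name `inference` and the 0.5 s default are hypothetical assumptions, not code from change 1270917.

```python
import functools
import logging
import time

# Hypothetical logger name; the real service may use a different one.
logger = logging.getLogger("inference")


def log_elapsed_time(threshold_s: float = 0.5):
    """Hypothetical decorator: log a call's wall-clock duration only when it
    is at least threshold_s seconds (0.5 s is an illustrative default)."""

    def decorator(func):
        @functools.wraps(func)  # keep the wrapped function's name/docstring
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs on success and on exceptions, so slow failures
                # are logged too.
                elapsed = time.perf_counter() - start
                if elapsed >= threshold_s:
                    logger.info("%s took %.3fs", func.__name__, elapsed)

        return wrapper

    return decorator
```

Because the message is emitted at INFO, the logger (or an ancestor) has to be configured at INFO or lower for it to appear, for example via `logging.basicConfig(level=logging.INFO)`. The sketch only covers synchronous functions; an `async def` handler would need an `async` wrapper that awaits the call.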
[12:52:48] Lift-Wing, Machine-Learning-Team (Q4 FY2025-26): Investigate enabling gRPC in LiftWing model servers - https://phabricator.wikimedia.org/T421903#11819416 (elukey) >>! In T421903#11819290, @klausman wrote: > To clarify a basic assumption I have: gRPC only works over HTTP/2, and HTTP/2 is always TLS-encryp...
[13:14:46] Lift-Wing, Semantic Search, Essential-Work, Machine-Learning-Team (Q4 FY2025-26): Transparent DNS Routing for LiftWing Services (eqiad vs Multi-DC) - https://phabricator.wikimedia.org/T422253#11819575 (isarantopoulos)
[13:17:03] dpogorzelski, klausman o/ heads up: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1270873 - I need to test how clusters will work without cert-manager trying to renew certs. It is something that we'll need to do before rolling out the new PKI discovery intermediate (basically used by all LVS/k8s services) to limit the blast radius of potential problems. I haven't pulled the trigger for ml-staging since I didn't want to disrupt
[13:17:03] testing, lemme know when it is a good time. I'll need cert-manager stopped for half a day or more to verify that certs are not issued.
[13:30:43] (PS1) AikoChou: python/logging_utils: add configurable framework logger level overrides [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1270939 (https://phabricator.wikimedia.org/T416384)
[15:24:27] Hi!
[15:30:03] Lift-Wing, Machine-Learning-Team (Q4 FY2025-26): Investigate enabling gRPC in LiftWing model servers - https://phabricator.wikimedia.org/T421903#11820342 (klausman) After a clarifying chat with Luca about the intricacies of gRPC, HTTP/2 etc, I now have a better picture of what will need building (probabl...
[15:40:47] Hi everyone. I'm an administrator on the Japanese Wikipedia (jawiki). To combat increasingly sophisticated vandalism, I realized that relying solely on Edit Filters (AbuseFilter) is no longer enough.
I also noticed that the existing LiftWing models often produce false positives due to the unique linguistic structure of Japanese.
[15:40:48] Because of this, I decided to do a Full Fine-Tuning (FFT) of the gemma-2b-jpn-it model. I built a dataset of about 700,000 items by extracting past reverted vandalism edits alongside high-quality constructive edits on JAWP, and trained the model on it.
[15:41:16] Using Direct Preference Optimization (DPO) and some custom techniques, the model is now set up to append an "AI: Possible Vandalism?" tag to recent changes. I've started a trial run, and it's successfully detecting vandalism in near real-time.
[15:41:17] You can see it in action at this link: https://ja.wikipedia.org/wiki/%E7%89%B9%E5%88%A5:%E6%9C%80%E8%BF%91%E3%81%AE%E6%9B%B4%E6%96%B0?hidecategorization=1&hideWikibase=1&tagfilter=AI%3A+Possible+Vandalism%3F&limit=500&days=30&urlversion=2
[15:41:17] If anyone is interested in this machine learning project, please feel free to send me a Wikimail!
[16:04:46] Infinite0694: very interesting work! I'll let others comment, but the main blocker may be the model's license https://ai.google.dev/gemma/terms
[16:05:49] have you already checked https://meta.wikimedia.org/wiki/Machine_learning_models/Production/Language-agnostic_revert_risk or https://meta.wikimedia.org/wiki/Machine_learning_models ?
[16:06:23] ah snap, jp is not in the supported languages, but maybe it could be?
[16:38:00] I believe this project fully aligns with the 'safe and responsible use of AI' principles set by Gemma and Google. In fact, it's arguably a model use case for open-source AI. However, I agree that publishing or redistributing the model weights could present some challenges.
[16:38:01] You make a very valid point, though. Failing to carefully examine the TOS (Terms of Service) could lead to trouble later on. If this chat is active, I would be very interested in discussing this licensing aspect with you all.
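The 15:41:16 message mentions Direct Preference Optimization without further detail. For readers unfamiliar with it, here is a minimal Python sketch of the standard per-pair DPO loss. It assumes each argument is the summed log-probability of a whole response under either the model being trained (the policy) or a frozen reference copy, and it uses an illustrative `beta` of 0.1. It does not reflect the "custom techniques" mentioned, nor how the ~700,000-item dataset was turned into chosen/rejected pairs; neither is described in the log.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for a single (chosen, rejected) preference pair.

    Each *_logp argument is the summed log-probability of a full response,
    under either the trainable policy or the frozen reference model.
    beta=0.1 is an illustrative default, not a value taken from the log.
    """
    # Implicit reward of each response: policy log-prob minus reference log-prob.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_reward - rejected_reward)
    # The loss is -log(sigmoid(z)) = softplus(-z). Both branches are
    # algebraically equal; the split only avoids overflow in math.exp.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

When the policy behaves exactly like the reference, `z` is 0 and the loss is log 2 (about 0.693). The loss falls toward 0 as the policy prefers the chosen response more strongly than the reference does, and it grows when the policy prefers the rejected one.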
[16:47:56] I have read through that page. In the long run, I think my approach could be applied to build language-specific models that can eventually be merged for multilingual Meta applications.
[16:47:57] However, because I am not a native speaker of any language other than Japanese, I am not fully aware of the kinds of vandalism typical of other wikis. Furthermore, due to my limited server and GPU resources, I anticipate needing to rely on WMF's resources to scale this in the future.
[16:48:00] P.S. The current accuracy of this AI patrol on the Japanese Wikipedia has been really promising so far!
[20:17:41] Infinite0694: That's really cool, I have passed this info on to our Wikimedia Foundation employee Slack room as well
[20:35:13] thank you!!!!