[00:07:44] <jinxer-wm>	 FIRING: LiftWingServiceErrorRate: ...
[00:07:44] <jinxer-wm>	 LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-damaging&var-backend=itwiki-damaging-predictor.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[00:27:44] <jinxer-wm>	 RESOLVED: LiftWingServiceErrorRate: ...
[00:27:44] <jinxer-wm>	 LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-damaging&var-backend=itwiki-damaging-predictor.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[12:01:33] <wikibugs>	 Machine-Learning-Team, DC-Ops, ops-eqiad, SRE: Degraded RAID on ml-serve1001 - https://phabricator.wikimedia.org/T422382#11818956 (klausman) I think we can run this machine on a single disk until its replacement arrives. Even if it dies entirely, we have enough serving capacity in eqiad to handl...
[12:28:29] <wikibugs>	 (PS1) AikoChou: revscoring: add threshold to elapsed_time logging [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1270917 (https://phabricator.wikimedia.org/T416384)
[12:34:15] <wikibugs>	 Lift-Wing, Machine-Learning-Team (Q4 FY2025-26): Investigate enabling gRPC in LiftWing model servers - https://phabricator.wikimedia.org/T421903#11819290 (klausman) To clarify a basic assumption I have: gRPC only works over HTTP/2, and HTTP/2 is always TLS-encrypted, i.e. there is no way to speak gRPC ov...
[12:52:48] <wikibugs>	 Lift-Wing, Machine-Learning-Team (Q4 FY2025-26): Investigate enabling gRPC in LiftWing model servers - https://phabricator.wikimedia.org/T421903#11819416 (elukey) >>! In T421903#11819290, @klausman wrote: > To clarify a basic assumption I have: gRPC only works over HTTP/2, and HTTP/2 is always TLS-encryp...
[13:14:46] <wikibugs>	 Lift-Wing, Semantic Search, Essential-Work, Machine-Learning-Team (Q4 FY2025-26): Transparent DNS Routing for LiftWing Services (eqiad vs Multi-DC) - https://phabricator.wikimedia.org/T422253#11819575 (isarantopoulos)
[13:17:03] <elukey>	 dpogorzelski, klausman o/ heads up: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1270873 - I need to test how clusters will work without cert-manager trying to renew certs, it is something that we'll need to do before rolling out the new PKI discovery intermediate (basically used by all LVS/k8s services) to limit the blast radius of potential problems. I haven't pulled the trigger for ml-staging since I didn't want to disrupt 
[13:17:03] <elukey>	 testing, lemme know when it is a good time. I'll need half a day or more of cert-manager stop to verify that certs are not issued.
[13:30:43] <wikibugs>	 (PS1) AikoChou: python/logging_utils: add configurable framework logger level overrides [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1270939 (https://phabricator.wikimedia.org/T416384)
[15:24:27] <Infinite0694>	 Hi!
[15:30:03] <wikibugs>	 Lift-Wing, Machine-Learning-Team (Q4 FY2025-26): Investigate enabling gRPC in LiftWing model servers - https://phabricator.wikimedia.org/T421903#11820342 (klausman) After a clarifying chat with Luca about the intricacies of gRPC, HTTP/2 etc, I now have a better picture of what will need building (probabl...
[15:40:47] <Infinite0694>	 Hi everyone. I'm an administrator on the Japanese Wikipedia (jawiki). To combat increasingly sophisticated vandalism, I realized that relying solely on Edit Filters (AbuseFilter) is no longer enough. I also noticed that the existing LiftWing models often produce false positives due to the unique linguistic structure of Japanese.
[15:40:48] <Infinite0694>	 Because of this, I decided to do a Full Fine-Tuning (FFT) on the gemma-2b-jpn-it model. I built a dataset of about 700,000 items by extracting past reverted vandalism edits alongside high-quality constructive edits on JAWP, and trained the model on it.
[15:41:16] <Infinite0694>	 Using Direct Preference Optimization (DPO) and some custom techniques, the model is now set up to append an "AI: Possible Vandalism?" tag to recent changes. I've started a trial run, and it's successfully detecting vandalism in near real-time.
[15:41:17] <Infinite0694>	 You can see it in action at this link: https://ja.wikipedia.org/wiki/%E7%89%B9%E5%88%A5:%E6%9C%80%E8%BF%91%E3%81%AE%E6%9B%B4%E6%96%B0?hidecategorization=1&hideWikibase=1&tagfilter=AI%3A+Possible+Vandalism%3F&limit=500&days=30&urlversion=2
[15:41:17] <Infinite0694>	 If anyone is interested in this machine learning project, please feel free to send me a Wikimail!
[16:04:46] <elukey>	 Infinite0694: very interesting work! I'll let others comment, but the main blocker may be the model's license https://ai.google.dev/gemma/terms
[16:05:49] <elukey>	 have you already checked https://meta.wikimedia.org/wiki/Machine_learning_models/Production/Language-agnostic_revert_risk or https://meta.wikimedia.org/wiki/Machine_learning_models ?
[16:06:23] <elukey>	 ah snap, jp is not in the supported languages, but it may become one?
[16:38:00] <Infinite0694>	 I believe this project fully aligns with the 'safe and responsible use of AI' principles set by Gemma and Google. In fact, it's arguably a model use case for open-source AI. However, I agree that publishing or redistributing the model weights could present some challenges.
[16:38:01] <Infinite0694>	 You make a very valid point, though. Failing to carefully examine the TOS (Terms of Service) could lead to trouble later on. If this chat is active, I would be very interested in discussing this licensing aspect with you all.
[16:47:56] <Infinite0694>	 I have read through that page. In the long run, I think my approach could be applied to build language-specific models that can eventually be merged for multilingual Meta applications.
[16:47:57] <Infinite0694>	 However, because I am not a native speaker of languages other than Japanese, I am not fully aware of the characteristic vandalism prevalent on other wikis. Furthermore, due to my own server and GPU constraints, I anticipate needing to rely on WMF's resources to scale this in the future.
[16:48:00] <Infinite0694>	 P.S. The current accuracy of this AI patrol on the Japanese Wikipedia has been really promising so far!
[20:17:41] <inflatador>	 Infinite0694 That's really cool, I have passed on this info to our Wikimedia Foundation employee Slack room as well
[20:35:13] <Infinite0694>	 thank you!!!!