[09:24:11] @jr_mime is your code reviewable? How is it setup? Would you be hosting it?
[09:25:06] My software/system accreditor hat has many questions
[11:38:53] The "frontend" is private, the ML part is called Cynthia, also private, but it uses ORES, you can Google that as I can't post links
[11:39:03] And yes we will be hosting it
[11:41:15] Frontend is private can do one, either it be as open source as possible or it can stay as far away as possible. There is no reason to not open source. And why ORES? ORES is legacy software
[11:41:38] The revscoring models were moved to LiftWing
[11:42:36] Mainly because it's not sanitized and would reveal some old methods of how I was calculating scores, which I want to get rid of anyway
[11:42:53] This setup is old, liftwing didn't exist back then
[11:42:55] It has old code is not an excuse
[11:43:05] @jr_mime ORES is gone come the end of the year
[11:43:15] You need to migrate like now
[11:43:32] I poked the guy that manages the ML part about it
[11:43:51] I will not endorse any system being used anywhere near globally that isn't something we can audit the code of
[11:44:30] There's also a number of wikis I imagine we'd opt out
[11:44:42] @jr_mime
[11:45:34] I can certainly grant whoever needs access to it, and wikis being opt out is no problem either
[11:45:54] Can just filter them out from the IRC feed
[11:47:49] @jr_mime it should be open source by default
[11:48:11] But I'd like to see a review of the actual code
[11:48:26] I don't like the sound of something involved with fandom and not transparent
[11:48:42] Ahh I get you lol
[11:48:51] I'm quite happy to look at the code though and review it
[11:49:16] Yeah no problem
[11:49:26] I'd also rather see it showing data first
[11:49:39] Like can we see some data of your scores and whether an edit was reverted
[11:49:44] So we can see it will work
[11:50:00] @jr_mime what is the code on so you can give me access
[11:50:20] I'll look after I've ran some errands
[11:51:48] [1/2] On fandom, we only revert IPs that hit a specific threshold, so it wouldn't do this here yet, it's not as advanced as wikimedia sadly.
[11:51:49] [2/2] We can still run it and score edits for sure
[11:52:07] It's typescript for frontend, python for ML/ores
[11:52:21] Well whatever you are using, we'll need to have it run dry first
[11:52:33] But I mean where the code hosted?
[11:52:42] Oh github
[11:52:43] So I can give you a username to add
[11:52:50] Use rhinosf1 then
[11:53:14] Cynthia actually only uses revscoring - we've implemented our own version of ORES for serving the API endpoints. (The API endpoints are internal)
[11:54:13] We won't be able to move our own model to LiftWing, as it's very MediaWiki specific and it connects directly to their data lake. So we will have to stick to revscoring and self-hosting our model.
[11:54:16] @noreplyz ah, so you are just using the revscoring model
[11:54:24] That's fine
[11:54:31] We trained our own model 😄
[11:54:32] but yes
[11:54:36] Ok
[11:55:19] @noreplyz are you on your PC
[11:55:32] I will look at the code, the python I'll probably do a detailed review
[11:55:40] The typescript probably just static analysis a
[12:01:49] [1/2] One thing to note about the typescript is that I'm in no capacity to be a real programmer, this is just a hobby of mine, so the code is most likely unbearable to you. It's using things it probably shouldn't (like messenger instead of sockets), so much code repetitions, functions that aren't built correctly, but it works. I'll be setting up a new node process as I call them, which wou
[12:01:49] [2/2] ld just connect to IRC, create the msg embed like we have in main.ts, and then call the Cynthia service to get the scores. The wikigg.ts process is a great example of what simple setup we'll setup at first (not reading RC obviously)
[12:02:01] Now I can't seem to invite you on mobile, give me a few mins
[12:02:34] @noreplyz can you give him read to ores?
[12:02:50] If you want to, your code 🙂
[12:03:06] The Cynthia code is here: github.com/soap-team/cynthia/tree/master/api though we will not be exposing the API to the public.
[12:18:53] @rhinosf1 I sent you an invite to the Xiphos repo, that would be the "frontend" as I call it, or more like the processing/discord bot
[12:43:42] @jr_mime no CI is a blocker
[12:43:57] What's CI?
[12:45:25] @jr_mime continuous integration
[12:45:30] Testing & linting
[12:45:39] Right got it
[12:45:46] I can run static analysis now of your code
[12:45:54] But it should be ran on commit
[12:46:05] Yeah no worries I see
[12:47:09] It was worth it to check if there was interest anyway. I don't plan on going that advanced with this, so I guess that would be where we don't implement this
[12:49:47] @jr_mime CI is generally easy to implement and it prevents you introducing bugs
[12:54:58] I'm looking at it
[13:09:08] @jr_mime or @noreplyz can you approve me running the Snyk app
[13:13:10] and sonarcloud
[13:19:27] @noreplyz please DM
[13:19:40] Np, lemme start a group chat
[13:38:00] A discord bot posting to a feed channel not hosted by us is LOW risk. There's no attack vector really. In terms of code written to the standard of being used as on-wiki bot / OATHAuth tool / or anything that has any risk vector, I would say it is HIGH risk and would recommend against any expansion beyond a read only feed.
[13:58:39] Hey @CVT - Sorry for ping, quick question, how many edits do you rollback per day/week which are deemed to be vandalism?
[14:14:16] Hi, I'm Global sysop, i do about 100 rollbacks a week
[14:15:11] Are they obvious vandalism, like inserting bad words, blanking of pages, replacing words with those "funny" kid words?
[14:30:49] [1/3] hmm...
[14:30:49] [2/3] Much of the vandalism I find is simple and kids-ish
[14:30:49] [3/3] Note, however, that Hoaxing vandalism, Subtle vandalism, and Format vandalism etc... are mostly addressed by local administrators and are rarely found by CVT
[14:32:31] Got it, so would you think it would be hard for machine learning to find the vandalism you actually deal with?
[14:33:02] If it's simple/kids-ish, might be hard for a bot to see it as vandalism as it doesn't have enough information to check the whole thing, that's what I'm thinking would be the problem here
[14:47:34] I believe that more than half of kids-ish vandalism can be detected by machine learning, but there are many wikis on Miraheze that contain distinctive content, so false positives may occur frequently
[14:48:27] kk perfect thanks