[16:20:02] https://github.com/miraheze/RequestSSL/pull/40#pullrequestreview-2063857578 a:EyesSHAKE: [16:20:30] That's actually not a bad review from a bot... [16:24:19] lol it caught a typo of mine in a method name... [16:31:26] AI code review? [16:31:29] 🧐 [16:32:03] Yep I quite like it lol [16:32:43] Hmmm [16:32:48] Can you run it on my PR? [16:33:06] I was just about to lol [16:33:21] It might blow up [16:33:26] 😉 [16:33:39] If I haven't hit the hourly rate limit trying to run on a bunch of old PRs yet lol [16:33:54] It’s my code. It’ll do that anyway [16:34:06] Is it specifically geared towards MediaWiki? [16:35:03] Nope, but it reads the entire repo to find out what to do. [16:38:08] Hm, what’s the price and limit [16:38:36] It's free for public repos [16:40:13] [1/2] https://github.com/coderabbitai/ai-pr-reviewer [16:40:13] [2/2] Is a slightly older version of the code [16:40:29] I assume ChatGPT limits apply since it pipes the review through there [16:41:59] Hm. Wonder if they could use Copilot, idk if it tends to be better or not [16:43:07] I wish it had options to choose which model [16:43:13] Claude > any other model [16:43:47] Hm? [16:46:55] @cosmicalpha it makes a poem with its code reviews [16:46:58] I love this thing [16:47:29] [1/8] >>> 🐰✨ [16:47:30] [2/8] In the land of wikis, where comments flow, [16:47:30] [3/8] A new rule appears to help them grow. [16:47:30] [4/8] No more repeats, no echoes in the hall, [16:47:30] [5/8] Each word unique, standing tall. [16:47:31] [6/8] So let your thoughts be fresh, anew, [16:47:31] [7/8] In the wiki world, for me and you. [16:47:31] [8/8] 🌟📜 [16:50:15] What? [16:50:28] What is that from? [16:53:25] This is the review on CA’s PR for CW to prevent duplicate comments [16:54:02] Erm [17:30:27] can code review by an AI be truly trusted? [17:31:12] No [17:50:42] No one said it’s the only thing we are relying on [17:53:48] [1/62] On a dark office evening, [17:53:48] [2/62] Sat down in my chair. 
[17:53:49] [3/62] Sharp smell of stale coffee [17:53:49] [4/62] Circling round in the air. [17:53:49] [5/62] Suddenly on the webpage [17:53:50] [6/62] There came a flickering light. [17:53:50] [7/62] My head grew heavy, and my sight grew dim; [17:53:50] [8/62] I had to stop for the night. [17:53:50] [9/62] There it was in the link list: [17:53:51] [10/62] "Edit page; you'll do well" [17:53:51] [11/62] And I was thinking to myself: [17:53:52] [12/62] This could be Heaven or this could be Hell! [17:53:52] [13/62] Then it lit up the quickbar, [17:53:53] [14/62] And it showed me the way. [17:53:53] [15/62] There were pages begging for clean-up; [17:53:54] [16/62] I thought I heard them say: [17:53:54] [17/62] Welcome to the Hotel Wikipedia [17:53:55] [18/62] Such a lovely place [17:53:55] [19/62] So much empty space [17:53:56] [20/62] Plenty of work at the Hotel Wikipedia [17:53:56] [21/62] Any time of year [17:53:57] [22/62] You can find us here... [17:53:57] [23/62] Its structure's maze-passage twisted; [17:53:58] [24/62] No one knows where it ends. [17:53:58] [25/62] It's got a lot of money mirror sites, [17:53:59] [26/62] That it calls friends. [17:53:59] [27/62] And in the dance of the pages [17:54:00] [28/62] Editors sweat - [17:54:00] [29/62] Some change to remember, [17:54:01] [30/62] Some change to forget. [17:54:01] [31/62] So I chose Contributions, [17:54:02] [32/62] Tell me, what have I done? [17:54:02] [33/62] And it said: [17:54:03] [34/62] This is all that you've been good for, here, [17:54:03] [35/62] since two thousand and one. [17:54:04] [36/62] And still those pages beg changes [17:54:04] [37/62] From far away, [17:54:05] [38/62] Keep you up in the middle of the night [17:54:05] [39/62] Just to hear them say... 
[17:54:06] [40/62] Welcome to the Hotel Wikipedia [17:54:06] [41/62] Such a lovely place [18:00:44] I LOVE HOTEL WIKIPEDIA [18:02:10] https://meta.wikimedia.org/wiki/Hotel_Wikipedia [18:08:26] One song Miraheze karaoke [18:16:12] Time to set up Hotel Miraheze for a quick extra source of income [18:16:16] Who wants to be our first investor? 😄 [18:31:22] Me [18:33:15] Yeah they definitely won't be relied upon but it does provide some pretty decent feedback [18:33:32] https://github.com/miraheze/CreateWiki/pull/511#pullrequestreview-2064141701 @pixldev [18:33:38] Oh boy [18:34:03] nah the feedback on my RequestSSL PR was pretty bad [18:34:13] "make sure you actually are using the services" [18:35:10] The comment about the User::newSystemUser doesn't apply [18:35:39] https://github.com/miraheze/CreateWiki/pull/511#discussion_r1605409202 but I did though.. :( the XSS one is good, assuming it’s right that MediaWiki doesn't escape stuff [18:35:44] huh? It is a step by step. It does apply actually and something I'd recommend myself. Duplicating code is bad. Using variables to store it once is good. [18:36:03] Also, is canEditRequest() a real function I just missed? [18:36:14] due to how the logic works it will only ever be called once anyway [18:36:27] No it is saying to make it [18:36:32] so there are no multiple instantiations [18:36:56] the code also is checking for DNS failures [18:36:57] Ah [18:37:01] 1/10 review [18:37:02] Well yes that isn't the problem. Readability is what it was asking about. [18:37:24] Also it seems confused on the purpose. It thinks it’s for all users not just reviewers [18:37:33] "to avoid multiple instantiations" https://github.com/miraheze/RequestSSL/pull/40#discussion_r1605255611 [18:37:53] Tell it that in comments, it will learn in the org btw [18:38:01] I assume it’ll become more familiar with the MediaWiki codebase as we yell at it yeah [18:38:49] Yes it can learn and I'm building custom instructions for it also to train it on MW standards. 
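The "avoid multiple instantiations" suggestion being debated above is the usual store-it-once refactor: call the lookup one time, keep the result in a local, and reuse it. A generic sketch (hypothetical names for illustration only, not the actual RequestSSL code):

```python
# Hypothetical illustration of the review suggestion: repeated service
# lookups scattered through a method, versus one lookup stored in a local.
class Services:
    """Stand-in for a service container; counts how often a factory is fetched."""
    def __init__(self):
        self.calls = 0

    def get_user_factory(self):
        self.calls += 1
        return "UserFactory"

services = Services()

# Before: each use repeats the lookup inline.
a = services.get_user_factory()
b = services.get_user_factory()
assert services.calls == 2  # two lookups for two uses

# After: one lookup, stored once, reused everywhere; easier to read too.
services.calls = 0
user_factory = services.get_user_factory()
a, b = user_factory, user_factory
assert services.calls == 1  # single lookup regardless of uses
```

As noted in the chat, the win here is mostly readability rather than performance when the call only ever happens once anyway.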
[18:39:05] Just like me [18:39:51] "and improve code clarity." that is the part I agree with. I agree with you that it isn't 100% right on the other parts. [18:40:53] [1/8] >>> 🐇 [18:40:53] [2/8] A threshold now we set with care, [18:40:53] [3/8] To manage wikis, fair and square. [18:40:54] [4/8] A warning comes when counts are high, [18:40:54] [5/8] To keep our requests from reaching the sky. [18:40:54] [6/8] With thoughtful code and messages clear, [18:40:54] [7/8] Our system runs with less to fear. [18:40:55] [8/8] 📝✨ [18:40:58] This alone is enough to keep it ngl [18:45:44] Hmm the XSS thing it found seems to be valid though @pixldev since it uses raw => true, it would be hard to produce but still technically valid I believe. [18:47:15] Can a username trigger XSS? [18:47:31] But ye better to be safe [19:57:40] yeah I pretty much have a zero tolerance policy for LLMs so forgive me for being extremely firm in expressing my dissent here [19:58:38] LLMs waste an enormous amount of physical resources and have literally no ability to understand any subject matter, only spew out tokens, and any good feedback is hidden among a sea of bad or subtly wrong feedback [19:59:17] testing on small PRs that are easy to review is not a great test; LLMs are best when problems are within their extremely limited knowledge set. go outside that even slightly and I'm sure it'll spew nonsense [20:00:14] have genuinely wanted to get more involved on the tech side but had an enormous amount of stuff crop up IRL which is why I haven't even really paid attention to these channels >< [20:00:49] so sorry for starting off being firm and negative [20:01:59] The tool is simply that, a tool. It won't replace developer due diligence, but it will help in pointing out where things could be improved. 
[20:03:08] I think that labelling it a tool is incredibly misleading because it is categorically unable to do what it claims to do [20:03:27] LLMs cannot analyse anything, end of story [20:03:50] They can analyse, the issue imo is with the comprehension of it [20:04:10] although at this level the exact term doesn't matter [20:05:02] no, saying that LLMs can comprehend or analyse anything is categorically misunderstanding what LLMs do. they spew out a probabilistic sequence of tokens that immediately follow a previous sequence of tokens. just because these tokens resemble suggestions doesn't mean that they have any understanding of the actual context at hand [20:05:04] that is your opinion. [20:05:11] nobody asked it to analyse everything. [20:05:26] that's literally what it is claiming to do, analyse the code and provide feedback [20:05:39] like what is it doing if not analysis [20:05:41] there's a difference between analysing a commit and "analysing everything" [20:05:52] apparently ChatGPT has been taught to generate valid wikitext templates... [20:06:07] or at least the code works [20:06:10] Like I've just said, it's to assist with picking out small things that could be improved, not write MediaWiki 1.44 [20:06:42] I mean, you have to understand the project and everything it does in order to provide useful feedback. maybe not everything, all the time, but if you want it to provide feedback for any arbitrary PR, it will have to do that over the course of every PR submitted [20:07:24] [1/2] > it will have to do that over the course of every PR submitted [20:07:24] [2/2] good job it doesn't need sleep then. [20:07:35] I don't see how that nullifies any of my points [20:07:50] a single human who knows nothing of the project could provide feedback to every PR but I would say that's more harmful than helpful [20:08:19] Feedback that's wrong can be ignored [20:08:39] Because developers make mistakes sometimes. 
I really don't know how you are not understanding, it's to pick up on small things that can be changed. [20:08:55] I knew like nothing about the CW codebase and still managed to catch an issue that flew by both the Director and Deputy director of tech [20:08:58] my entire point is it literally cannot pick up on the small things. the small things are exactly the things it misses [20:09:05] it picks up big things only [20:09:16] We use tools all of the time to gain insight [20:09:36] All have to be trialed [20:09:43] that's your opinion but I'm clearly talking to myself so I'm not engaging with the conversation anymore [20:09:48] Some we scrap, some we keep [20:10:18] We also need to see if it gets any better when we feed it more data about the codebase [20:10:22] I don't think dismissing it because it includes AI is a good idea [20:10:36] it's not my opinion that LLMs cannot do things. it is an objective fact about how they work. the fact that people claim that they're "analysing" or "understanding" or "comprehending" or "responding" to anything is an anthropomorphism done intentionally to market them as a valuable tool [20:11:32] if it were simply a harmless thing, I would not be responding so strongly. but Microsoft alone was responsible for 30% of all global water usage in 2023 cooling server racks in data centres training AI. all for just, ChatGPT to maybe, sometimes provide feedback to your code [20:11:40] I agree it has its limitations [20:11:45] clearly has not been listening, the tool has already caught things that were missed... [20:12:12] But it has provided additional insight and detected things that have been missed [20:12:35] were they missed? or did everyone just ignore the PR because it was labelled as incomplete, waiting for it to be finished before providing feedback? [20:12:56] like, that's literally why I didn't even bother looking over it. 
it was labelled as partial and incomplete [20:13:07] They were missed [20:13:31] I think that's a strong assertion, since in the absence of someone proving they had reviewed the PR already, we cannot possibly know whether someone would have caught the mistakes [20:13:32] The code on @reception123's PR is mostly duplicated from another script [20:13:44] I have reviewed it before [20:13:53] okay, I did not know this, so, that's fair [20:13:56] [1/4] To separate out intertwined lines of debate here, your complaint seems to be threefold: [20:13:56] [2/4] * LLMs are resource-intensive and wasteful [20:13:56] [3/4] * LLMs should not be used as a substitute for human oversight [20:13:57] [4/4] * LLMs underperform human review/give limited benefit [20:14:14] I have been meaning to move it to a Python package to avoid the duplication [20:14:17] thank you, NotAracham. that summarises my points nicely [20:14:37] And @pixldev's was reviewed [20:14:49] It did catch a valid XSS [20:14:52] again, I'm not denying the fact that it can provide good feedback sometimes. I am denying the claim that it will provide more benefit than harm over the course of its usage [20:14:54] yeah [20:15:11] We can't decide that on a single PR [20:15:20] Your view is prejudicial [20:15:27] I am categorically stating that all LLM/ChatGPT-based tools will provide more harm than benefit over the course of their usage [20:15:35] Yes you are [20:15:41] And I disagree [20:15:46] They have a place [20:16:03] you're saying that 30% of all global water usage is enough to justify its current, extremely limited benefits? [20:16:05] We can't stop big tech from running all these AIs and there's not really any harm directly to us. 
I am not going to argue if AI may have a net negative on the world but frankly we can't do much about it here [20:16:26] Where did you read that [20:16:59] The water is being wasted anyways, if anything it would be more wrong to let it be wasted for nothing instead of using it, even if we'd rather it not be wasted at all [20:17:08] that's a pretty defeatist mentality [20:17:18] if something doesn't provide a lot of benefit you can choose to just not use it on principle [20:18:07] Ethics and sustainability are worth considering [20:18:10] [1/5] I'm very much agreed on the first two points of debate, but my thoughts [20:18:10] [2/5] * For most models, the performance per watt is improving substantially but there's absolutely a long road to go towards anything I'd say is ethical consumption [20:18:10] [3/5] * I would like more information on this water usage element, not quite aligned with my understanding of resource intensiveness... [20:18:11] [4/5] * Non-profit status is at least negating misuse of funds, as our usage today is free to my understanding [20:18:11] [5/5] * At least at present, intended use is not as a replacement for human oversight (or wholesale code creation) but as another tool to ensure safe and performant code goes out the door. [20:18:20] But this is a very black and white debate at the moment [20:18:32] okay, I definitely was wrong there, I should be using proper statistics [20:18:39] Microsoft increased its own water usage 30%, not globally [20:18:48] CodeRabbit Pro is free for open source [20:18:55] that's far less horrific [20:19:17] however, they did use 22 billion litres of water in 2022, which is still an extremely large amount [20:19:31] Still worth keeping in mind, though I'd like to understand nature of use, e.g. if it's for cooling purposes is it actually destructive use? 
[20:19:38] global levels, however, are on the order of 10^15 L/year [20:19:44] whereas 22 billion is 10^9 [20:19:52] Destructive use of water isn't really possible [20:20:06] uhhhh, thermodynamics would like to have a word [20:20:26] freshwater is definitely expendable [20:21:01] and note that this is also used to offload heat, which ultimately ends up in the atmosphere, contributing to climate change [20:21:07] Poor choice of terms on my part, but what I mean is things like industrial usages where heavy remediation is required to return output to something approaching potable. [20:21:37] right, and note that heat is arguably more concerning since 22 billion litres of water has… a massive thermal capacity [20:23:12] [1/2] again, I've been arguing so strongly on this because all of the large companies have been undergoing a massive campaign to make people believe that their exponentially increasing resource use on LLMs is creating "AI", something that can actually think and process things, which is categorically untrue. and I do not want to cede any ground on this, because all it does is feed into their [20:23:13] [2/2] scams and profits and resource waste [20:25:20] [1/3] Agreed that most AI claims are crypto-level scammy on what it can actually do versus claims, but I have to get back to other things this afternoon. [20:25:20] [2/3] Neat relatively-recent paper on resource usage of LLMs for those that find the topic interesting and aren't scared off by studies. [20:25:21] [3/3] https://arxiv.org/pdf/2310.03003 [20:26:19] (notwithstanding Microsoft increasing water usage, it's not like it's going to waste, the water will be cooled and used again) [20:26:25] the LLM isn't drinking the water [20:26:39] The bits, they are thirsty. 
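The orders of magnitude quoted above can be sanity-checked with a quick calculation. This is a rough sketch: the 22 billion litre figure is the one cited in the chat, while the ~4 × 10^15 L/year global freshwater withdrawal and the 4186 J/(kg·K) specific heat of water are assumed ballpark constants, not figures from the conversation:

```python
# Rough sanity check of the magnitudes discussed above.
MS_WATER_L = 22e9            # ~22 billion litres (Microsoft, 2022), per the chat
GLOBAL_WITHDRAWAL_L = 4e15   # assumed global freshwater withdrawal, ~4 x 10^15 L/year

share = MS_WATER_L / GLOBAL_WITHDRAWAL_L
print(f"Share of global withdrawal: {share:.6%}")
# a few ten-thousandths of a percent, consistent with the 10^9 vs 10^15 point

# Thermal capacity: energy needed to raise that much water by 1 K.
# Assumes 1 L of water is ~1 kg and specific heat ~4186 J/(kg*K).
SPECIFIC_HEAT_J_PER_KG_K = 4186
joules_per_kelvin = MS_WATER_L * 1.0 * SPECIFIC_HEAT_J_PER_KG_K
print(f"Energy per 1 K temperature rise: {joules_per_kelvin:.2e} J")
# on the order of 10^14 J per kelvin, i.e. "a massive thermal capacity"
```

This supports both sides of the exchange: the volume is tiny relative to global withdrawal, yet the heat that volume can carry is still enormous in absolute terms.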
[20:26:58] [1/2] don't wanna interrupt this but just to check, where would the ->escaped() go, idk if it may mess with the parse() [20:26:58] [2/2] https://cdn.discordapp.com/attachments/1006789349498699827/1241124827025375302/XJ0Vi6q.png?ex=66490f11&is=6647bd91&hm=2eb10638ab915bab3df4edf2915ed4cefae2ddbe156e45e76d071f74dee97685& [20:27:05] nah, you can interrupt [20:27:27] you wouldn't use parse and escape together [20:27:27] shouldn't even tbh [20:27:28] one or the other [20:27:47] hmmm [20:28:14] what would the right way to parse but also escape any possible bad chars in a username be? [20:29:18] You would generally use "parse" on `Html::rawElement` [20:29:45] which would take care of passing it through the sanitizer [20:29:50] oh [20:30:18] https://cdn.discordapp.com/attachments/1006789349498699827/1241125668755079188/8CXtc2h.png?ex=66490fda&is=6647be5a&hm=8c7ad9abf88e1ae56f54cab8f4526e65d9433e7e60e710d8ae397bbec413524a& [20:31:01] [1/4] you only generally need `->escaped()` if you're constructing your own html like: [20:31:01] [2/4] ```$html = "<pre>This is a text block, hello" . wfMessage('whatever', $user)->escaped() . "</pre>"; [20:31:01] [3/4] ``` [20:31:02] [4/4] or whatever, that is a horrid example [20:31:29] also, sorry for the debate that was more heated than productive [20:31:57] although I will still stand by my points, I don't think it's productive to argue them more here, especially at the moment [20:40:52] [1/2] also as an aside, I should probably hold myself more accountable here instead of just putting this off, but one of the things I was hoping to go about doing is a bigger deep-dive into the tech stack for MH/WT and actually putting stuff up on the tech wiki in more detail. we have a lot of sparse documentation and it's not really linked directly to the code, and there's also a lot of stuff [20:40:52] [2/2] that's probably not documented at all, and I was hoping to try and make a dent on what we have so it's easier for folks to contribute to stuff [20:41:00] that's a pretty large project though, which is why I've hesitated to get started [20:49:42] there's a certain point where that can become actually counterproductive in terms of what should be public information [20:50:03] how so? I thought effectively all of the wiki stack was public information, since even the Puppet config is open-source [20:50:26] like I'm not gonna put admin passwords on the wiki, especially since I don't have them [20:50:46] I disagree tbh, obviously there's private documentation and credentials [20:50:53] but overall I see no harm [20:51:03] (I am completely omitting credentials here, which feels obvious to say, but will say regardless) [20:51:37] tech stack info should honestly just be for tech folks [20:52:16] (that is why it goes in the tech namespace :p) [20:55:59] Absolutely a worthy endeavor, it's a hard thing to get volunteer hours for but makes lives easier for all [20:56:20] yeah, although my excuse right now is I'm still unemployed, for better or worse [20:56:22] :p [21:01:00] That’s your opinion. 
I disagree, and as a farm and technology team we do it differently [21:01:35] This goes against our fundamental principles [21:01:46] yeah, as mentioned, most of the stack's info is made public; the means to modify the stack is still gated behind trusted individuals, but what it does is pretty open [21:02:49] not to mention that if things were more documented, it would help individuals hosting their own wikis learn more about some of the things required for scaling up to a larger size, even though the jump to a wiki farm is not a jump many can make [21:09:37] @pixldev @rhinosf1 noted [21:12:35] https://tenor.com/bVclm.gif [22:41:24] [1/2] @notaracham ... Can you have a look at this? [22:41:24] [2/2] https://meta.miraheze.org/wiki/Community_portal#Wiki_request_contains_%22invalid_character%22 [23:51:14] @koreirose @notaracham what is the issue there? [23:51:39] gm? [23:52:08] Not sure why you were pinging collei, but the linked user above was getting an "invalid character" error any time they tried to submit a wiki request, no matter which way they went about it. [23:53:07] I've created the wiki on their behalf as they've already explained what they wanted sufficiently, but that mostly just works around the problem. [23:54:37] lol sorry darn auto select [23:55:45] Sorry for random ping, Collei
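The escape-then-wrap ordering discussed earlier (around `->escaped()`, `Html::rawElement`, and the username XSS concern) can be sketched in a minimal Python analogue. The names here are hypothetical stand-ins, not MediaWiki's actual API; the point being illustrated is only the ordering: escape the untrusted value first, then wrap it in trusted markup, and never feed unescaped user input into a raw-HTML helper:

```python
import html

def raw_element(tag: str, inner_html: str) -> str:
    """Analogue of a raw-HTML helper: trusts inner_html as already-safe markup."""
    return f"<{tag}>{inner_html}</{tag}>"

def render_username(username: str) -> str:
    # Escape the untrusted value FIRST, then wrap it in trusted markup.
    safe = html.escape(username)
    return raw_element("b", safe)

# A malicious "username" stays inert instead of injecting a tag:
print(render_username('<img src=x onerror=alert(1)>'))
# -> <b>&lt;img src=x onerror=alert(1)&gt;</b>
print(render_username('alice'))
# -> <b>alice</b>
```

Doing it in the other order (wrapping first, escaping the whole string afterwards) would mangle the trusted markup, which is why the chat's advice is one or the other per value, never both on the same string.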