[18:09:27] Amir1: regarding Spam-blacklist and the perf-improved AbuseFilter version, what's our answer for the MediaWiki distributions out there?
[18:09:37] Afaik SpamBlacklist is very widely installed and defaults to Meta for its source
[18:09:49] if we trim that down to avoid double filtering, that would break third parties, right?
[18:10:18] "This extension comes with MediaWiki 1.21 and above."
[18:11:29] https://www.mediawiki.org/wiki/Extension:SpamBlacklist#Examples
[18:11:39] enwiki as well. those are the most commonly recommended ones
[19:15:54] Krinkle: I think we can announce it on mediawiki-l if needed, but I don't know how widely it's used. I know many websites have their own. Worst case, they can copy-paste the old version into their setups
[19:16:28] Amir1: We don't have to break it, though. Afaik enwiki/metawiki haven't migrated to AbuseFilter yet.
[19:17:01] If we hold back migration on those wikis until the regex mode is ready in AF as well, then we can do a clean switch and leave the pages as-is for compat with LTS MediaWiki deployments.
[19:17:17] SpamBlacklist is bundled by default, so in theory this is protecting up to each of the 10,000+ MW installs out there.
[19:18:35] I'm planning to add support for regex soon, so the rough estimate is that I will deploy the change in two or three weeks
[19:18:45] cool.
[19:19:01] I honestly don't think it's feasible to hold this change for years for LTS installations to switch
[19:19:13] they can copy-paste
[19:19:17] I do think we still need an announcement indeed, and some more brainstorming about the default MW tarball release and existing sites. Possibly different strategies for unmodified installs vs sites that have their own blacklist.
[19:19:37] we can keep the Spam-blacklist pages protected and frozen as-is, no?
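[Editor's note, not part of the chat: a minimal sketch of the fetch-and-match approach being discussed. Assumptions not confirmed in the log: the on-wiki blacklist page is one regex fragment per line, `#` starts a comment, and the fragments are combined into a single alternation matched against external URLs in an edit, which is why a large list makes the per-edit regex check slow. All function names below are illustrative, not SpamBlacklist's actual code.]

```python
import re

# Default source quoted later in the log (19:23:28); not fetched in this
# offline example, shown only for context.
BLACKLIST_URL = ("https://meta.wikimedia.org/w/index.php"
                 "?title=Spam_blacklist&action=raw&sb_ver=1")

def load_blacklist(text: str) -> "re.Pattern[str]":
    """Strip comments and blank lines, then join the regex fragments into
    one big alternation. Matching cost grows with the list, which is the
    performance concern raised in the chat."""
    fragments = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # simplification: '#' in a regex isn't handled
        if line:
            fragments.append(line)
    return re.compile("|".join(fragments), re.IGNORECASE)

def is_spam(pattern: "re.Pattern[str]", url: str) -> bool:
    """True if any blacklist fragment matches somewhere in the URL."""
    return pattern.search(url) is not None

# Offline usage example with two inline fragments instead of a download:
pattern = load_blacklist("example-casino\\.com  # comment\nbuy-?cheap-pills")
assert is_spam(pattern, "http://example-casino.com/promo")
assert not is_spam(pattern, "https://en.wikipedia.org/")
```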
[19:19:43] Will announce it
[19:20:19] It'll be confusing, admins might edit it, and as long as the software stays deployed, it's going to do the slow regex on every edit
[19:21:00] We should be in a shape where the next tarball comes with AbuseFilter enabled to consume Meta's (then new) entries by default. And to turn off SpamBlacklist automatically on upgrade as well. It might be easiest to hollow out the extension repo for SpamBlacklist since afaik we have no strategy for removing an extension on an existing install.
[19:21:02] if you have some data that it is widely used, sure. We can find a way.
[19:21:34] I assume they'll make API calls?
[19:22:02] The tricky case is people who have custom blacklist entries. They'll lose their protection on upgrade by default until they migrate into the new system.
[19:22:13] Amir1: it's bundled and enabled by default with every MW tarball install.
[19:22:25] I think we'd need data to prove the inverse in this case.
[19:23:28] https://meta.wikimedia.org/w/index.php?title=Spam_blacklist&action=raw&sb_ver=1 is the default source
[19:24:02] We can simply undeploy it, move it out of the tarball, and let people migrate at their own pace
[19:24:09] I'll query Hadoop for that
[19:25:25] I assume we'll want to keep MediaWiki distributed with spam protection by default, which would mean AF needs a config var to source its MediaWiki:.json file from Meta-Wiki like SpamBlacklist does
[19:25:49] and similarly a 15-min cache for it, although I think we already have enough caching to not need an additional cache
[19:26:21] maybe a CACHE_ANYTHING entry given it's an HTTP download, so we'd want it cached even if all other cache is off, since cache is off by default indeed.
[19:26:29] (akin to SpamBlacklist today)
[19:26:36] or MainStash, probably simpler
[19:34:16] I looked at reqs; maybe I'm doing it wrong, but we didn't have a single req yesterday
[19:34:19] https://www.irccloud.com/pastebin/gJG8sQ8x/
[19:35:46] Amir1: path is without query
[19:35:51] path = /w/index.php
[19:36:05] ah, let me do it again
[19:36:52] seems to be about 50K per day
[19:37:03] in Turnilo https://w.wiki/72vq
[19:37:13] (400*128)
[19:37:25] https://usercontent.irccloud-cdn.com/file/rohoey1U/Screenshot%202023-07-13%20at%2020.36.17.png
[19:37:38] Typing "Spam" immediately completed the full sb_ver=1 URL query
[19:37:47] hmm, not too small, not too large
[19:38:29] it's once per 15 minutes, all MW installs, for windows where there are non-zero edits during that time window.
[19:39:09] I'll do an announcement and then keep it frozen with a massive comment on the top
[19:39:48] grouping by IP shows most don't fetch more than 5 times per day.
[19:40:03] and some less than once per day if not edited, I guess
[19:50:19] https://meta.wikimedia.org/wiki/MediaWiki:Editnotice-0-Spam_blacklist
[19:50:20] :)
[20:05:25] I've also created https://meta.wikimedia.org/wiki/MediaWiki:Blockedexternaldomains-summary (prepended to Special:Blockedexternaldomains) and https://meta.wikimedia.org/wiki/MediaWiki:Editnotice-0-BlockedExternalDomains.json
[20:10:39] Thanks. I will see what I can do to make this less of an issue
[22:19:12] duesen: From today's train triage, this fatal may be relevant for your team to help gain familiarity. Ref T341595. Lucas has suggested two possible fixes already. It's locally reproducible/verifiable too, I think (incl. via the ApiSandbox GUI).
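[Editor's note, not part of the chat: the 15-minute cache discussed above can be sketched as a simple TTL wrapper around the HTTP fetch. In MediaWiki this would be MainStash or a CACHE_ANYTHING-backed object cache rather than an in-process dict; the dict, the `fetch_remote` stub, and the counter below are all illustrative stand-ins.]

```python
CACHE_TTL = 15 * 60   # seconds: refetch the remote list at most every 15 minutes
_cache = {}           # stand-in for MainStash / a CACHE_ANYTHING ObjectCache
fetch_count = 0       # instrumentation for this example only

def fetch_remote(url: str) -> str:
    """Placeholder for the real HTTP download of the blocklist page."""
    global fetch_count
    fetch_count += 1
    return "example-casino\\.com"

def get_blocklist(url: str, now: float) -> str:
    """Return cached blocklist text, hitting the network at most once per TTL."""
    entry = _cache.get(url)
    if entry is not None and now - entry[0] < CACHE_TTL:
        return entry[1]           # fresh: served from cache, no HTTP request
    text = fetch_remote(url)      # at most one fetch per 15-minute window
    _cache[url] = (now, text)
    return text

# Usage: three lookups, but only two actual fetches.
url = "https://meta.wikimedia.org/w/index.php?title=Spam_blacklist&action=raw&sb_ver=1"
get_blocklist(url, now=0)        # miss -> fetch #1
get_blocklist(url, now=60)       # hit, within the 15-minute window
get_blocklist(url, now=16 * 60)  # expired -> fetch #2
```

This gives a ceiling of 24 × 60 / 15 = 96 fetch windows per install per day, consistent with the per-IP counts in the chat (most installs fetch fewer than 5 times per day, since fetches only happen in windows with edits).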
[22:19:13] T341595: TypeError: Argument 2 passed to ContentModelChange::doContentModelChange() must be of the type string, null given, called in /srv/mediawiki/php-1.41.0-wmf.16/includes/api/ApiChangeContentModel.php on line 82 - https://phabricator.wikimedia.org/T341595
[23:32:44] Krinkle: thank you for being so mindful of the non-Wikimedia wikis and [[meta:Spam blacklist]] :)