[06:09:55] I have this really weird situation where MW seems to have a hash collision on two pages; it shows the exact same page on both, and when you purge cache on one of them, they switch places if you purged the one that was not displaying
[06:10:09] tried disabling opcache, did not help
[06:11:22] is correct
[06:12:18] is not
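When two titles serve the exact same rendered page, one cheap early sanity check is to look for literal duplicate rows in the page table. A minimal sketch against the standard MediaWiki schema (table prefix omitted); with a healthy unique index on (page_namespace, page_title) it should return no rows:

    -- Find titles that appear more than once in the page table.
    SELECT page_namespace, page_title, COUNT(*) AS copies
    FROM page
    GROUP BY page_namespace, page_title
    HAVING COUNT(*) > 1;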
[06:13:57] wait, it seems weirder than I thought
[06:15:52] is the parser cache key the same? That would be super weird
[06:16:20] do you have a link to the wiki?
[06:16:49] bawolff: exactly, the key name is the same
[06:17:22] bawolff: I am hosting this and people complained yesterday about other pages, but today the triumphant examples are https://azurlane.koumakan.jp/Kaga and https://azurlane.koumakan.jp/Belfast
[06:18:04] both pages show the same content
[06:18:24] this is MW 1.36.1, I have yet to upgrade to .2 but I doubt that is the reason
[06:19:00] I updated PHP just now (to 7.4.25) and completely restarted the jail running php-fpm
[06:20:23] Both show the same history and raw text, so it's unlikely to be the parser cache's fault
[06:20:45] bawolff: let me hit purge cache
[06:20:57] oh interesting, https://azurlane.koumakan.jp/w/index.php?title=Kaga&action=info gives an exception
[06:20:59] bawolff: try both now
[06:21:39] oh, exceptions are always nice, going to enable monolog
[06:21:58] Maybe something wrong with the text table or however we store revisions now
[06:23:19] Interesting: https://azurlane.koumakan.jp/w/index.php?title=Belfast&action=info does not show an exception, but it does show a page_id of 912, which based on https://azurlane.koumakan.jp/w/index.php?curid=912 is Kaga
[06:23:34] yeah, I think there is a schema error again
[06:24:25] however Kaga should be page id 479
[06:24:42] yeah, I agree, seems likely something is messed up in the schema
[06:25:53] there is a whole ton of "ParserCache.WARNING: Duplicate get(): "azurlane_wiki-mediawiki-:pcache:idoptions:37404" fetched 2 times", which seems to indicate that the parser cache table does not have proper constraints?
[06:27:15] the exception is "Revision 1075 belongs to page ID 479, the provided Title object belongs to page ID 912"
[06:28:28] * Remilia goes to SQL dump this
[06:28:36] for backup purposes
[06:28:46] https://azurlane.koumakan.jp/w/api.php?action=query&titles=Belfast|Kaga&prop=info|revisions&rvprop=ids|flags|size|slotsize|sha1|slotsha1|comment|roles has different sha1s, so I think the revision table is OK
[06:29:14] Maybe the revision table has both pointing to the same text table id
[06:34:54] Remilia: So I think the field to look at is content.content_address in the DB
[06:35:27] bawolff: thank you, going to check that
[06:35:36] the revisions table seems fine
[06:36:09] It's probably in the format of tt:
[06:37:38] although I'm not sure what could have happened to make all of them wrong for a page
[06:40:40] content_address for these SHAs differs
[06:41:36] I took the last SHA checksums from the revision table for the two page IDs in question and select'd where content_sha1=...
[06:42:25] huh. Weird. I'm not sure what's happening.
[06:42:55] I guess the text table could have multiple entries that are the same
[06:43:38] To rule out a weird caching issue, you could try setting $wgMainCacheType = CACHE_NONE; $wgParserCacheType = CACHE_NONE;
[06:44:21] I tried $wgMainCacheType = CACHE_NONE; with APC disabled completely
[06:45:51] wait
[06:47:48] bawolff: changing $wgParserCacheType to NONE fixes this; I thought it used mainCacheType and did not check
[06:48:04] but the exceptions are not gone
[06:48:21] no, $wgParserCacheType doesn't follow the main cache type; if it's not set it defaults to CACHE_ANYTHING, which uses whatever cache backend is available
[06:48:59] think I will go install that PgSQL GUI thing :|
[06:49:36] https://azurlane.koumakan.jp/Belfast now has stuff outside the main content area wrong afaict. I don't think it did before
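For reference, checking content_address per revision (as suggested above) means walking the multi-content-revision tables. A sketch assuming the standard MediaWiki 1.35+ schema, using the two page IDs quoted in the log:

    -- Which stored blob does each revision of the two pages point at?
    -- revision -> slots -> content; content_address is normally 'tt:<id>'.
    SELECT r.rev_id, r.rev_page, c.content_address, c.content_sha1
    FROM revision AS r
    JOIN slots    AS s ON s.slot_revision_id = r.rev_id
    JOIN content  AS c ON c.content_id = s.slot_content_id
    WHERE r.rev_page IN (479, 912)  -- Kaga and Belfast, per the log
    ORDER BY r.rev_page, r.rev_id;

If revisions of different pages resolved to the same content_address, that would implicate the storage layer; here the addresses differed, consistent with the revision data being intact.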
[06:52:01] oh, I think I see what you mean about revisions being wrong
[06:53:59] Maybe something wrong with the revision.rev_page field or slots.slot_revision_id, but I'm not sure
[06:54:00] bawolff: the history pages for Kaga and Isokaze list the changes for Belfast and Ying Swei
[06:54:41] and I am trying to figure out if this is a consequence of the parser issue, or of schema problems
[06:55:09] The history pages should not involve the parser afaik
[06:55:49] thanks for your help, I am going to investigate with pgAdmin for now since a pure SQL shell is a bit of a pain for this
[06:56:35] I wonder if there is a maintenance script to rebuild the revisions table
[07:00:05] My go-to next step for debugging, if this were my issue, would probably be to enable the mediawiki debug log, capture all the SQL queries, run them, and see at what point a query returns the wrong thing
[07:02:20] yeah I have monolog enabled, just did not turn on query logging yet as that would absolutely slam the server with I/O
[07:02:55] this wiki gets ≈ 15-100 rps depending on moon phase
[07:03:31] yeah, a common trick is to do something like if( isset( $_GET['debug'] ) ) {... so that you can log for yourself but not everyone (or do it based on IP, etc.)
[07:06:13] oh
[07:06:18] thank you again
[07:10:13] maybe I'll just grab the 8-day db backup history and spin the wiki up in a VM on my PC
[08:03:53] bawolff: I found the offending query
[08:05:25] Remilia: what is it?
[08:06:16] WikiPage::pageData, SELECT page_id,page_namespace,page_title,page_restrictions,page_is_redirect,page_is_new,page_random,page_touched,page_links_updated,page_latest,page_len,page_content_model FROM "page" WHERE page_namespace = 0 AND page_title = (title of a page) LIMIT 1
[08:06:38] without the LIMIT 1 it gives two rows, one of which is correct
[08:07:31] but this really makes no sense
[08:07:55] is my postgresql broken
[08:08:27] how does a query for page_title = 'Isokaze' match 'Ying_Swei'
[08:09:19] bawolff: many thanks nevertheless, I think I will go rebuild postgresql
[08:09:27] this is insane
[08:09:46] wow
[08:10:07] I can safely say I have never heard of that happening before
[08:23:35] thanks for the IP filter suggestion for logging, I added some inet_pton()s to match against my prefix and only include the monolog settings for that now
[10:59:30] Hello and BIG thanks for the awesome software and great support. I have a question this time. Is it possible to search the wikitext source instead of the rendered page?
[11:00:11] I'm asking because this would be useful while we cheer on the people making that central repository thing for refs (was it Wikicite?). Do that well and editors will benefit: redundant work removed, the quantity of ref details improved, and overall we'll be able to see the bigger picture of what editors are claiming to base their edits on.
[18:34:43] for Iamthehuman1, the answer is yes if you use something like CirrusSearch: https://www.mediawiki.org/wiki/Help:CirrusSearch#Insource
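To make that answer concrete: CirrusSearch's insource: operator searches the raw wikitext rather than the rendered page, in either literal or regex form, per the Help page linked above. The phrase and template below are just illustrative examples:

    insource:"Wikicite"        literal match against the wikitext source
    insource:/\{\{[Cc]ite/     regex form (expensive; combine with other filters)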