[07:47:59] marostegui: sobanski: To make things with ParserCache more complicated, I'm planning to deploy a change today or early next week that would reduce entries by 10% (by my estimates, that'd be roughly 30 times the number of entries caused by Discussion Tools)
[07:48:13] unless you tell me to hold off :D
[07:50:57] Would it reduce all entries? I didn't exactly get the bit in brackets, about 30 times...
[07:52:41] Overall, I am now hesitant to make any changes to the pc environment before we're running stable on the new HW in both DCs for a while. Could you coordinate this with kormat?
[07:53:02] See also https://phabricator.wikimedia.org/T288998#7293988 ;)
[07:56:01] I'm basically deploying a change that effectively reduces the entries added by wikidata to next to zero
[07:56:23] I did some measurements in T280599#7242881
[07:56:23] T280599: Reduce DiscussionTools' usage of the parser cache - https://phabricator.wikimedia.org/T280599
[07:57:06] out of the 550M PC entries we have, 55M of them are wikidata
[08:01:12] of course these just stop being added and the old entries slowly get cleaned up, meaning it'll take a month to be properly cleaned up
[08:15:20] 3 weeks, currently
[08:16:37] Amir1: can you point me to the task for your wikidata/pc change?
[08:17:01] kormat: T285987 ^^
[08:17:01] T285987: Do not generate full html parser output at the end of Wikibase edit requests - https://phabricator.wikimedia.org/T285987
[08:17:22] ah yeah, it's 22 days now
[08:19:30] so, in an ideal world we'd have one major change running against pc at a time
[08:20:00] currently we have: new h/w, discussion tools, whatever performance team is doing that caused the major disk usage spike, and your thing
[08:20:39] Amir1: given that you're the only one promising to _reduce_ the data, however, i'm inclined to say go ahead nowish
[08:21:22] Thanks. I'll keep track to make sure nothing goes bad. I can wait until your hw work is done
[08:21:26] What's the ETA on that?
[08:26:28] Amir1: here's the rough timeline for h/w + discussion tools: https://phabricator.wikimedia.org/T280599
[08:26:37] oh, plus raising retention
[08:27:54] okay noted
[08:27:57] thanks!
[08:28:24] I will ping you once it passes code review and gets ready to be deployed
[08:28:47] which I think will take a bit
[08:38:20] 👍
[10:08:44] sobanski: remember i was saying it would be too expensive to analyse PC usage? turns out i was being pessimistic. https://phabricator.wikimedia.org/P17046
[10:12:44] kormat: cool
[11:04:30] marostegui: I have good news and bad news about the flaggedrevs template table (the 168GB one)
[11:05:01] the good news is that I found out what's wrong: it's 99.99% useless data and can be deleted
[11:05:17] it's basically templatelinks, but for the whole history of the wiki.
[12:29:33] Amir1: my calculations say that wikibase (`wb=3` - i'm assuming) entries in parsercache account for 22% of rows, and 34% of bytes
[12:30:08] kormat: commons doesn't count :D
[12:30:12] you need to remove that
[12:30:21] commons has this weird MCR thingy
[12:30:52] you never make things simple, do you
[12:31:00] oh and another thing about discussion tools, I will tell you soon
[12:32:20] ok, new figures: 5.8% of rows, 7.3% of bytes
[12:33:13] not too bad :D
[12:35:33] * kormat sobs
[12:38:44] sobanski: fun fact: the space analysis i've been doing for DT is completely pointless
[12:38:59] they are not measuring what i thought they were. at all.
[12:40:29] the bad news: there's no way at all for us to calculate the amount of space that DT is using. the good news: whatever it is, it's likely to be far smaller than the 7% figure i came up with.
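
For context, a minimal sketch of the kind of per-table aggregation that could produce row/byte share figures like the ones quoted above — this is not the actual query kormat ran. The pc000..pc255 table names and the `keyname` column match the shell one-liner quoted later in the log; the host, credentials, the `value` column and the `%!wb=3%` pattern are assumptions.

```python
# Rough sketch only: share of parsercache rows/bytes whose key matches a
# pattern, summed across the pc000..pc255 tables on one pc host.
# Host, credentials, column names and pattern are all assumptions.
import pymysql

conn = pymysql.connect(host="pc1009", user="analyst", password="...",
                       database="parsercache")

pattern = "%!wb=3%"   # assumed marker for wikibase entries
total_rows = total_bytes = match_rows = match_bytes = 0

with conn.cursor() as cur:
    for n in range(256):
        table = f"pc{n:03d}"
        # totals for this shard table
        cur.execute(f"SELECT COUNT(*), COALESCE(SUM(LENGTH(value)), 0) FROM {table}")
        rows, size = cur.fetchone()
        total_rows += rows
        total_bytes += size
        # rows/bytes for keys matching the pattern
        cur.execute(
            f"SELECT COUNT(*), COALESCE(SUM(LENGTH(value)), 0) FROM {table} "
            "WHERE keyname LIKE %s", (pattern,))
        rows, size = cur.fetchone()
        match_rows += rows
        match_bytes += size

print(f"rows:  {100 * match_rows / total_rows:.1f}%")
print(f"bytes: {100 * match_bytes / total_bytes:.1f}%")
```

Note this scans every row of every table, which is exactly the expense being discussed above; it's only meant to show the shape of the calculation.
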
[12:43:14] I recommend that after a week, we should turn the parser option for that wiki to default enabled; that would reduce the size of PC
[12:43:33] avoiding too much fragmentation
[12:45:46] Amir1: can you tell if that's been done for, say, arwiki?
[12:46:10] yeah, I can check
[12:48:46] nope, it's set to null https://github.com/wikimedia/mediawiki-extensions-DiscussionTools/blob/613b0a9b27a7dd639fe1d5a88c4116ba6570e358/includes/Hooks/ParserHooks.php#L109
[12:49:28] but it shouldn't be enabled at the same time as the config enabling of dt, because PC entries need to expire
[12:54:43] damn, i was doing so well understanding you up till right now :)
[12:54:50] ah
[12:55:08] you're saying that dt shouldn't be enabled for a wiki at the same time as it's added to the default set?
[12:55:17] for... reasons
[12:58:00] yeah, because then the user might get the old entry (saved as canonical) which wouldn't work here
[12:58:45] ok.. so we'd want to wait a full retention period (3 weeks) before doing that?
[12:58:59] for talk pages it's one week
[12:59:12] 💡
[13:02:58] Amir1: thanks for your patience <3
[13:03:40] https://wikitech.wikimedia.org/wiki/Parser_cache says it's 10 days. I wrote that (-_-) but it was based on what Timo told me (^_^). Or in other words, I would recommend double checking, unless Amir1 actually checked it right now, in which case I should update that page.
[13:04:05] kormat: seriously, this doesn't need any patience, you'll be tested on patience in a couple of months :D
[13:04:12] let me double check
[13:04:16] :D
[13:05:22] $wgDiscussionToolsTalkPageParserCacheExpiry = 86400 * 10;
[13:05:35] ten days is correct, I don't know why I thought it was seven
[14:51:11] Amir1: at the db level, is there a way to correlate a dt and non-dt version of a page?
[14:51:50] they should have the same page id
[14:51:56] the int value in the key
[14:52:19] hmm. that's what i was _hoping_, but so far it hasn't paid off
[14:53:06] how so?
[14:53:18] * Amir1 likes payments
[14:53:34] well, i'm doing a query to see, say, all keys like `enwiki:%dtreply%`
[14:53:55] then select one of those, and query for keys like `%`
[14:54:00] er, with trailing %
[14:54:08] and i only find the original dtreply entry
[14:54:28] i've tried this about 10 times so far
[14:55:22] oh, I think it's because it's consistent hashing, so it'll end up on another host
[14:55:25] right?
[14:55:27] (i'm assuming the int is what maps an entry to a pc shard + table)
[14:55:33] 😬
[14:55:46] everything is terrible
[14:55:51] I think it's a hash of the whole key (can't say for sure though)
[14:55:55] welcome to Wikimedia :D
[14:56:29] it used to be worse, it used to be a hash of the ip
[14:58:28] kormat: in the test db environment, is dbctl meant to work?
[14:58:35] Emperor: nope
[14:58:47] ah, OK :)
[14:59:24] Emperor: it relies on etcd, which we don't have set up in pontoon
[14:59:33] kormat: so if I want to find out which hosts are running databases on which ports...?
[14:59:50] [I mean, I could just ssh into each machine in turn and ask it, but]
[15:00:09] Emperor: `sudo cumin O:mariadb::misc::db_inventory` in this case
[15:00:19] they're all single-instance hosts, running on 3316
[15:00:34] `sudo db-replication-tree zarcillo0` is probably closer to what you'd want
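
For readers following along, here is an illustrative breakdown of the `pcache:idhash` keys being discussed above and quoted just below. This is not MediaWiki's own key-handling code; it only splits the key text as shown in the log, and the `-0` suffix after the page id is left uninterpreted.

```python
# Illustrative parsing of a parsercache "idhash" key, based purely on the
# key strings quoted in this log. Not MediaWiki code.
def parse_idhash_key(key: str) -> dict:
    wiki, _pcache, kind, rest = key.split(":", 3)   # e.g. enwiki, pcache, idhash, ...
    id_part, *options = rest.split("!")             # "14596159-0", then option fragments
    page_id = int(id_part.split("-")[0])            # should be page_id from the page table
    return {"wiki": wiki, "kind": kind, "page_id": page_id, "options": options}

print(parse_idhash_key(
    "enwiki:pcache:idhash:14596159-0!dtreply=1!thumbsize=7!tmh-videojs!responsiveimages=0"))
# {'wiki': 'enwiki', 'kind': 'idhash', 'page_id': 14596159,
#  'options': ['dtreply=1', 'thumbsize=7', 'tmh-videojs', 'responsiveimages=0']}
```
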
[15:01:21] Amir1: good news! the id number is not unique ;)
[15:01:37] :D
[15:01:42] maybe it's scoped to the wiki
[15:01:49] it should be the page id
[15:01:56] page_id from page
[15:02:10] is now when i should nod like i understand?
[15:02:59] at least i'm guessing that `enwiki:pcache:idhash:14596159-0!dtreply=1!thumbsize=7!tmh-videojs!responsiveimages=0` and `svwiki:pcache:idhash:4596159-0!canonical` are different things ;)
[15:03:53] yup they are :D
[15:04:28] the first one is the PC entry of Talk:Babrra_massacre in enwiki
[15:05:25] what a fortuitous random page
[15:05:26] the other being the pc entry of Piriqueta_aurea in svwiki
[15:11:13] terrible thing i'm currently doing:
[15:11:14] `$ time for table in pc{000..255}; do out=$(mysql.py -BN -h pc1009 parsercache -e "select keyname from $table where keyname like 'enwiki:%4596159%'"); [ -n "$out" ] && { echo $table: $out; }; done`
[15:11:35] new discovery: there are things which aren't idhash entries. like `enwiki:pcache:idoptions:14596159`
[15:13:46] these are parser output values of a page, e.g. a wikidata item (for fast retrieval)
[15:14:18] kormat: what I did was basically dump the keys and the size of the values to a txt file and have fun with it in python
[15:14:59] * kormat stares suspiciously at this use of the word "fun"
[15:21:25] Amir1: success!
[15:26:54] https://phabricator.wikimedia.org/P15202#87344
[16:05:17] sobanski: FYI: https://phabricator.wikimedia.org/P17050 (cc: Amir1 for fact-checking)
[16:08:15] kormat: looks correct, one nitpick: the canonical entry exists only if there has been an edit to the page, not all the time. which is weird, because if I make an edit with DT enabled, it'll create a canonical entry for me, which would be useless
[16:08:42] Amir1: oh - simply viewing the page doesn't create a canonical entry?
[16:09:36] it would if you're not logged in or all your settings are default
[16:09:53] oh lordy. ok.
[16:09:54] but otherwise, it looks for your desired output and creates only that
[16:10:38] (at least that's on paper; in reality, I think we do the rendering multiple times during an edit, god knows what's happening)
[16:11:21] Amir1: do you mean that if you (with DT enabled, because that's how you roll) make an edit it'll create _2_ PC entries? canonical + your custom version
[16:11:53] yup
[16:12:10] wooonderful
[16:12:23] one due to the edit (coming from DerivedPageDataUpdater::doParserCacheUpdate)
[16:12:30] one when you refresh the page
[16:13:06] I mean you can make it with a gadget and close the page and never refresh it and no one looks at it, then it'll only be the canonical one
[16:13:45] * kormat winces, tries not to think about it
[16:24:24] Amir1: post updated to include the canonical qualification
[16:24:37] Thanks <3
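
As a footnote, here is a minimal sketch loosely modelled on the dump-and-analyse approach Amir1 describes above (keys and value sizes dumped to a text file, then processed in python) — not his actual script. The file name `pc_keys.tsv` and the two-column TSV layout (keyname, value size) are assumptions. It groups idhash entries by (wiki, page id) and counts pages carrying both a canonical and a dtreply=1 rendering, i.e. the double-entry case discussed at the end of the log. Per the hashing discussion earlier, variants of the same page may live on different pc hosts, so a single-host dump will likely under-count.

```python
# Sketch: from a dump of parsercache keys, find pages that have both a
# canonical and a dtreply=1 entry. Input format is an assumption:
# one "keyname<TAB>value_size" line per cache entry.
import csv
from collections import defaultdict

entries = defaultdict(set)   # (wiki, page_id) -> set of option strings

with open("pc_keys.tsv", newline="") as fh:
    for keyname, _size in csv.reader(fh, delimiter="\t"):
        parts = keyname.split(":", 3)
        if len(parts) != 4 or parts[2] != "idhash":
            continue                          # skip idoptions and other key kinds
        wiki, _, _, rest = parts
        id_part, _, opts = rest.partition("!")
        page_id = id_part.split("-")[0]
        entries[(wiki, page_id)].add(opts)

both = [
    (wiki, page_id)
    for (wiki, page_id), opts in entries.items()
    if any(o == "canonical" for o in opts)
    and any("dtreply=1" in o for o in opts)
]
print(f"pages with both a canonical and a dtreply entry: {len(both)}")
```
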