[07:47:59] marostegui: sobanski: To make things with ParserCache more complicated, I'm planning to deploy a change today or early next week that would reduce entries by 10% (by my estimates, that'd be roughly 30 times the number of entries caused by Discussion Tools)
[07:48:13] unless you tell me to hold off :D
[07:50:57] Would it reduce all entries? I didn't exactly get the bit in brackets, about 30 times...
[07:52:41] Overall, I am now hesitant to make any changes to the pc environment before we're running stable on the new HW in both DCs for a while. Could you coordinate this with kormat?
[07:53:02] See also https://phabricator.wikimedia.org/T288998#7293988 ;)
[07:56:01] I'm basically deploying a change that effectively reduces the entries added by wikidata to next to zero
[07:56:23] I did some measurements in T280599#7242881
[07:56:23] T280599: Reduce DiscussionTools' usage of the parser cache - https://phabricator.wikimedia.org/T280599
[07:57:06] out of the 550M PC entries we have, 55M of them are wikidata
[08:01:12] of course these just stop being added and the old entries slowly get cleaned up, meaning it'll take a month to be properly cleaned up
[08:15:20] 3 weeks, currently
[08:16:37] Amir1: can you point me to the task for your wikidata/pc change?
[08:17:01] kormat: T285987 ^^
[08:17:01] T285987: Do not generate full html parser output at the end of Wikibase edit requests - https://phabricator.wikimedia.org/T285987
[08:17:22] ah yeah, it's 22 days now
[08:19:30] so, in an ideal world we'd have one major change running against pc at a time
[08:20:00] currently we have: new h/w, discussion tools, whatever performance team is doing that caused the major disk usage spike, and your thing
[08:20:39] Amir1: given that you're the only one promising to _reduce_ the data, however, i'm inclined to say go ahead nowish
[08:21:22] Thanks. I'll keep track to make sure nothing goes bad. I can wait until your hw work is done
[08:21:26] What's the ETA on that?
[08:26:28] Amir1: here's the rough timeline for h/w + discussion tools: https://phabricator.wikimedia.org/T280599
[08:26:37] oh, plus raising retention
[08:27:54] okay noted
[08:27:57] thanks!
[08:28:24] I will ping you once it passes code review and gets ready to be deployed
[08:28:47] which I think will take a bit
[08:38:20] 👍
[10:08:44] sobanski: remember i was saying it would be too expensive to analyse PC usage? turns out i was being pessimistic. https://phabricator.wikimedia.org/P17046
[10:12:44] kormat: cool
[11:04:30] marostegui: I have good news and bad news about the flaggedrevs template table (the 168GB one)
[11:05:01] the good news is that I found out what's wrong: it's 99.99% useless data and can be deleted
[11:05:17] it's basically templatelinks, but for the whole history of the wiki.
[12:29:33] Amir1: my calculations say that wikibase (`wb=3` - i'm assuming) entries in parsercache account for 22% of rows, and 34% of bytes
[12:30:08] kormat: commons doesn't count :D
[12:30:12] you need to remove that
[12:30:21] commons has this weird MCR thingy
[12:30:52] you never make things simple, do you
[12:31:00] oh and another thing about discussion tools, I will tell you soon
[12:32:20] ok, new figures: 5.8% of rows, 7.3% of bytes
[12:33:13] not too bad :D
[12:35:33] * kormat sobs
[12:38:44] sobanski: fun fact: the space analysis i've been doing for DT is completely pointless
[12:38:59] they are not measuring what i thought they were. at all.
[12:40:29] the bad news: there's no way at all for us to calculate the amount of space that DT is using. the good news: whatever it is, it's likely to be far smaller than the 7% figure i came up with.
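
For context, a minimal sketch of the kind of per-table aggregation that could produce row/byte share figures like the ones quoted above — this is not the actual query kormat ran. The pc000..pc255 table names and the `keyname` column match the shell one-liner quoted later in the log; the host, credentials, the `value` column and the `%!wb=3%` pattern are assumptions.

```python
# Rough sketch only: share of parsercache rows/bytes whose key matches a
# pattern, summed across the pc000..pc255 tables on one pc host.
# Host, credentials, column names and pattern are all assumptions.
import pymysql

conn = pymysql.connect(host="pc1009", user="analyst", password="...",
                       database="parsercache")

pattern = "%!wb=3%"   # assumed marker for wikibase entries
total_rows = total_bytes = match_rows = match_bytes = 0

with conn.cursor() as cur:
    for n in range(256):
        table = f"pc{n:03d}"
        # totals for this shard table
        cur.execute(f"SELECT COUNT(*), COALESCE(SUM(LENGTH(value)), 0) FROM {table}")
        rows, size = cur.fetchone()
        total_rows += rows
        total_bytes += size
        # rows/bytes for keys matching the pattern
        cur.execute(
            f"SELECT COUNT(*), COALESCE(SUM(LENGTH(value)), 0) FROM {table} "
            "WHERE keyname LIKE %s", (pattern,))
        rows, size = cur.fetchone()
        match_rows += rows
        match_bytes += size

print(f"rows:  {100 * match_rows / total_rows:.1f}%")
print(f"bytes: {100 * match_bytes / total_bytes:.1f}%")
```

Note this scans every row of every table, which is exactly the expense being discussed above; it's only meant to show the shape of the calculation.
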
[12:43:14] I recommend that after a week, we should turn the parser option for that wiki to default enabled; that would reduce the size of PC
[12:43:33] avoiding too much fragmentation
[12:45:46] Amir1: can you tell if that's been done for, say, arwiki?
[12:46:10] yeah, I can check
[12:48:46] nope, it's set to null https://github.com/wikimedia/mediawiki-extensions-DiscussionTools/blob/613b0a9b27a7dd639fe1d5a88c4116ba6570e358/includes/Hooks/ParserHooks.php#L109
[12:49:28] but it shouldn't be enabled at the same time as the config enabling of dt, because PC entries need to expire
[12:54:43] damn, i was doing so well understanding you up till right now :)
[12:54:50] ah
[12:55:08] you're saying that dt shouldn't be enabled for a wiki at the same time as it's added to the default set?
[12:55:17] for... reasons
[12:58:00] yeah, because then the user might get the old entry (saved as canonical) which wouldn't work here
[12:58:45] ok.. so we'd want to wait a full retention period (3 weeks) before doing that?
[12:58:59] for talk pages it's one week
[12:59:12] 💡
[13:02:58] Amir1: thanks for your patience <3
[13:03:40] https://wikitech.wikimedia.org/wiki/Parser_cache says it's 10 days. I wrote that (-_-) but it was based on what Timo told me (^_^). Or in other words, I would recommend double checking, unless Amir1 actually checked it right now, in which case I should update that page.
[13:04:05] kormat: seriously, this doesn't need any patience, you'll be tested on patience in a couple of months :D
[13:04:12] let me double check
[13:04:16] :D
[13:05:22] $wgDiscussionToolsTalkPageParserCacheExpiry = 86400 * 10;
[13:05:35] ten days is correct, I don't know why I thought it was seven
[14:51:11] Amir1: at the db level, is there a way to correlate a dt and non-dt version of a page?
[14:51:50] they should have the same page id
[14:51:56] the int value in the key
[14:52:19] hmm. that's what i was _hoping_, but so far it hasn't paid off
[14:53:06] how so?
[14:53:18] * Amir1 likes payments
[14:53:34] well, i'm doing a query to see, say, all keys like `enwiki:%dtreply%`
[14:53:55] then select one of those, and query for keys like `%`
[14:54:00] er, with trailing %
[14:54:08] and i only find the original dtreply entry
[14:54:28] i've tried this about 10 times so far
[14:55:22] oh, I think it's because it's consistent hashing, so it'll end up on another host
[14:55:25] right?
[14:55:27] (i'm assuming the int is what maps an entry to a pc shard + table)
[14:55:33] 😬
[14:55:46] everything is terrible
[14:55:51] I think it's a hash of the whole key (can't say for sure though)
[14:55:55] welcome to Wikimedia :D
[14:56:29] it used to be worse, it used to be a hash of the ip
[14:58:28] kormat: in the test db environment, is dbctl meant to work?
[14:58:35] Emperor: nope
[14:58:47] ah, OK :)
[14:59:24] Emperor: it relies on etcd, which we don't have set up in pontoon
[14:59:33] kormat: so if I want to find out which hosts are running databases on which ports...?
[14:59:50] [I mean, I could just ssh into each machine in turn and ask it, but]
[15:00:09] Emperor: `sudo cumin O:mariadb::misc::db_inventory` in this case
[15:00:19] they're all single-instance hosts, running on 3316
[15:00:34] `sudo db-replication-tree zarcillo0` is probably closer to what you'd want
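
For readers following along, here is an illustrative breakdown of the `pcache:idhash` keys being discussed above and quoted just below. This is not MediaWiki's own key-handling code; it only splits the key text as shown in the log, and the `-0` suffix after the page id is left uninterpreted.

```python
# Illustrative parsing of a parsercache "idhash" key, based purely on the
# key strings quoted in this log. Not MediaWiki code.
def parse_idhash_key(key: str) -> dict:
    wiki, _pcache, kind, rest = key.split(":", 3)   # e.g. enwiki, pcache, idhash, ...
    id_part, *options = rest.split("!")             # "14596159-0", then option fragments
    page_id = int(id_part.split("-")[0])            # should be page_id from the page table
    return {"wiki": wiki, "kind": kind, "page_id": page_id, "options": options}

print(parse_idhash_key(
    "enwiki:pcache:idhash:14596159-0!dtreply=1!thumbsize=7!tmh-videojs!responsiveimages=0"))
# {'wiki': 'enwiki', 'kind': 'idhash', 'page_id': 14596159,
#  'options': ['dtreply=1', 'thumbsize=7', 'tmh-videojs', 'responsiveimages=0']}
```
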
[15:01:21] Amir1: good news! the id number is not unique ;)
[15:01:37] :D
[15:01:42] maybe it's scoped to the wiki
[15:01:49] it should be the page id
[15:01:56] page_id from page
[15:02:10] is now when i should nod like i understand?
[15:02:59] at least i'm guessing that `enwiki:pcache:idhash:14596159-0!dtreply=1!thumbsize=7!tmh-videojs!responsiveimages=0` and `svwiki:pcache:idhash:4596159-0!canonical` are different things ;)
[15:03:53] yup they are :D
[15:04:28] the first one is the PC entry of Talk:Babrra_massacre in enwiki
[15:05:25] what a fortuitous random page
[15:05:26] the other being the pc entry of Piriqueta_aurea in svwiki
[15:11:13] terrible thing i'm currently doing:
[15:11:14] `$ time for table in pc{000..255}; do out=$(mysql.py -BN -h pc1009 parsercache -e "select keyname from $table where keyname like 'enwiki:%4596159%'"); [ -n "$out" ] && { echo $table: $out; }; done`
[15:11:35] new discovery: there are things which aren't idhash entries. like `enwiki:pcache:idoptions:14596159`
[15:13:46] these are parser output values of a page, e.g. a wikidata item (for fast retrieval)
[15:14:18] kormat: what I did was basically dump the keys and the size of the values to a txt file and have fun with it in python
[15:14:59] * kormat stares suspiciously at this use of the word "fun"
[15:21:25] Amir1: success!
[15:26:54] https://phabricator.wikimedia.org/P15202#87344
[16:05:17] sobanski: FYI: https://phabricator.wikimedia.org/P17050 (cc: Amir1 for fact-checking)
[16:08:15] kormat: looks correct, one nitpick: the canonical entry exists only if there has been an edit to the page, not all the time. which is weird, because if I make an edit with DT enabled, it'll create a canonical entry for me, which would be useless
[16:08:42] Amir1: oh - simply viewing the page doesn't create a canonical entry?
[16:09:36] it would if you're not logged in or all your settings are default
[16:09:53] oh lordy. ok.
[16:09:54] but otherwise, it looks for your desired output and creates only that
[16:10:38] (at least that's on paper; in reality, I think we do the rendering multiple times during an edit, god knows what's happening)
[16:11:21] Amir1: do you mean that if you (with DT enabled, because that's how you roll) make an edit it'll create _2_ PC entries? canonical + your custom version
[16:11:53] yup
[16:12:10] wooonderful
[16:12:23] one due to the edit (coming from DerivedPageDataUpdater::doParserCacheUpdate)
[16:12:30] one when you refresh the page
[16:13:06] I mean you can make it with a gadget and close the page and never refresh it and no one looks at it, then it'll only be the canonical one
[16:13:45] * kormat winces, tries not to think about it
[16:24:24] Amir1: post updated to include the canonical qualification
[16:24:37] Thanks <3
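
As a footnote, here is a minimal sketch loosely modelled on the dump-and-analyse approach Amir1 describes above (keys and value sizes dumped to a text file, then processed in python) — not his actual script. The file name `pc_keys.tsv` and the two-column TSV layout (keyname, value size) are assumptions. It groups idhash entries by (wiki, page id) and counts pages carrying both a canonical and a dtreply=1 rendering, i.e. the double-entry case discussed at the end of the log. Per the hashing discussion earlier, variants of the same page may live on different pc hosts, so a single-host dump will likely under-count.

```python
# Sketch: from a dump of parsercache keys, find pages that have both a
# canonical and a dtreply=1 entry. Input format is an assumption:
# one "keyname<TAB>value_size" line per cache entry.
import csv
from collections import defaultdict

entries = defaultdict(set)   # (wiki, page_id) -> set of option strings

with open("pc_keys.tsv", newline="") as fh:
    for keyname, _size in csv.reader(fh, delimiter="\t"):
        parts = keyname.split(":", 3)
        if len(parts) != 4 or parts[2] != "idhash":
            continue                          # skip idoptions and other key kinds
        wiki, _, _, rest = parts
        id_part, _, opts = rest.partition("!")
        page_id = id_part.split("-")[0]
        entries[(wiki, page_id)].add(opts)

both = [
    (wiki, page_id)
    for (wiki, page_id), opts in entries.items()
    if any(o == "canonical" for o in opts)
    and any("dtreply=1" in o for o in opts)
]
print(f"pages with both a canonical and a dtreply entry: {len(both)}")
```
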