[22:06:42] TimStarling: I'm curious if you know of any historical reasons not to enable wgCompressRevisions by default?
[22:07:59] 3rd party users inspecting the text table using their own tools, old_text LIKE '%foo%' etc.
[22:09:06] there is the documented caveat about zlib support
[22:09:46] right. I suppose if in theory PHP is compiled without zlib, it'd be important for wgCompressRevisions=true to still degrade to storing uncompressed.
[22:10:20] would being able to turn wgCompressRevisions off still suffice for third parties that want to use ReplaceText and such? I don't have a sense of how common this is.
[22:14:49] probably
[22:15:00] is there a maintenance script that decompresses compressed rows?
[22:15:21] that might be useful if you find yourself with a partially compressed wiki and want it to be uncompressed
[22:15:32] right, if post-upgrade you realize the default changed
[22:15:50] I was just looking at compressOld vs recompressTracked
[22:17:32] from what I can tell, one is meant to replace the other, where compressOld is presumably more naive/very slow for a large farm, and recompressTracked does it in two phases that are easy to parallelise and continue. Not sure why we need both, maybe we don't?
[22:17:55] (and of course T106386)
[22:17:56] T106386: Compress data at external storage - https://phabricator.wikimedia.org/T106386
[22:17:58] I used compressOld on rationalwiki recently, it still works
[22:18:30] I'm assuming this task is mainly for using 'gzipconcat' specifically since we do compress already afaik.
[22:18:36] recompressTracked requires external storage and is generally more difficult to set up
[22:19:02] Hm.. I thought the use of ES was transparent.
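The zlib fallback and the "partially compressed wiki" case discussed above can be sketched as follows. This is a minimal illustration in Python, not MediaWiki's code: it assumes each stored row carries a list of flags (in the spirit of old_flags) that records whether that row was compressed, so compressed and uncompressed rows can coexist. The function names and the `have_zlib` parameter are hypothetical, and the on-disk format here (plain `zlib.compress`) is not claimed to match what MediaWiki writes.

```python
try:
    import zlib
except ImportError:  # interpreter built without zlib
    zlib = None


def store_revision(text, compress_revisions=True, have_zlib=zlib is not None):
    """Return (data, flags) for one row; hypothetical helper, not a MediaWiki API."""
    data = text.encode("utf-8")
    flags = ["utf-8"]
    # Degrade to uncompressed storage when zlib is missing, even if
    # compression was requested in the configuration.
    if compress_revisions and have_zlib:
        data = zlib.compress(data)
        flags.append("gzip")
    return data, flags


def load_revision(data, flags):
    """Read a row back; the per-row flag decides whether to decompress."""
    if "gzip" in flags:
        data = zlib.decompress(data)
    return data.decode("utf-8")


# A mixed table: one row written with compression, one without.
rows = [
    store_revision("first revision text " * 20),
    store_revision("second revision text", have_zlib=False),
]
for data, flags in rows:
    print(flags, len(data), load_revision(data, flags)[:24])
```

Under that assumption, a "decompress everything" maintenance pass amounts to calling `load_revision` on each flagged row and writing it back with `compress_revisions=False`. It also shows why a raw `old_text LIKE '%foo%'` query stops working on compressed rows: the stored bytes are no longer the text.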
[22:19:12] diff compression gives a much better compression ratio than concat
[22:20:07] I have heard https://pecl.php.net/package/xdiff is unmaintained, but it is super awesome and we should probably just update it if we want to recompress
[22:20:31] the problem with concatenating then gzipping is the 32KB dictionary size
[22:20:37] right we have both compress and diff as options.
[22:20:41] concat* and diff.
[22:20:57] if the article is longer than 32KB you're pretty much back to single revision compression
[22:21:41] yeah, that makes sense. But concat zip is "easier" and more portable I suppose, assuming pure PHP diffing would be hard. but then again, neither concat nor diff is proposed as the default, so with diff we can afford to do something that isn't easy out of the box or available by default.
[22:21:54] it'd probably still save a lot, but yeah diff makes sense
[22:22:35] xdiff is an exercise in implementing computer science papers, it's pretty heavy going
[22:23:02] we have a pure PHP version of its patch algorithm which we use in production, to display the text it compressed, that was easy enough
[22:24:04] LZMA would work but there is no PHP extension for it
[22:24:20] and decompression would require the library, unlike xdiff
[22:25:17] TimStarling: do we have non-zero blobs stored with diff history blob then? I thought we either didn't use it in prod or since relocated them with plain gzip.
[22:25:46] but given we ran compressOld at least once in our history, with you, I guess we used diff then
[22:25:47] makes sense
[22:26:06] and we just copied the old data around as-is.
[22:26:10] during host migrations
[22:27:31] yes we have many diff history blobs
[22:28:03] compression ratio was ~95%
[22:35:31] git internally seems to use xdiff as well, or a fork of it anyway
[22:38:30] although the latest official release seems to have a zero-byte NEWS file, and a 2 line entry in ChangeLog from 2006.
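The 32KB point made earlier in the log can be checked with a small experiment. The sketch below is an illustration only, under stated assumptions: it uses Python's `zlib` (DEFLATE, 32KB window) rather than PHP, the "revisions" are synthetic pseudo-text built from a random vocabulary, and `make_revision` and `concat_ratio` are hypothetical helper names. It compares the size of two concatenated, nearly identical revisions against the size of one revision compressed alone.

```python
import random
import zlib


def make_revision(size, seed):
    """Build deterministic pseudo-text of at least `size` bytes (synthetic data)."""
    rng = random.Random(seed)
    letters = "abcdefghijklmnopqrstuvwxyz"
    vocab = [
        "".join(rng.choice(letters) for _ in range(rng.randint(3, 9)))
        for _ in range(2000)
    ]
    words = []
    length = 0
    while length < size:
        word = rng.choice(vocab)
        words.append(word)
        length += len(word) + 1
    return " ".join(words).encode("ascii")


def concat_ratio(size):
    """Compressed size of rev1+rev2 divided by compressed size of rev1 alone."""
    rev1 = make_revision(size, seed=1)
    rev2 = rev1 + b" one small edit"  # second revision: tiny change
    single = len(zlib.compress(rev1, 9))
    both = len(zlib.compress(rev1 + rev2, 9))
    return both / single


small = concat_ratio(8 * 1024)     # revision fits inside the 32KB window
large = concat_ratio(100 * 1024)   # revision is much longer than the window
print(f"8KB revisions:   concat/single = {small:.2f}")
print(f"100KB revisions: concat/single = {large:.2f}")
```

A ratio near 1 means the second revision was stored almost for free; a ratio near 2 means it cost about as much as compressing it on its own. With short revisions the earlier copy is still inside the window when the later copy is encoded, while with long ones it has already scrolled out, which is the behaviour that makes diff-based storage attractive for long articles.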
[22:40:37] just to confirm, http://www.xmailserver.org/xdiff-lib.html is the correct upstream?
[22:40:46] seems so
[22:41:10] it's what php/pecl-text-xdiff and git/git both point to
[22:41:22] I don't see a debian package for xdiff
[22:41:26] libxdiff that is
[22:41:38] although there is an abandoned fork at https://packages.debian.org/search?searchon=sourcenames&keywords=mgdiff
[22:43:00] yes that upstream is linked from https://github.com/php/pecl-text-xdiff
[22:45:52] * legoktm adds to list of interesting projects
[22:45:57] the CI for php-xdiff installs its dependency from https://windows.php.net/downloads/pecl/deps/
[22:46:09] that's not at all weird.
[22:46:13] no non-Windows CI coverage
[23:02:58] that's where the auto-built LuaSandbox DLL gets lua5.1 from fwiw