[02:51:38] ref T108434, https://phabricator.wikimedia.org/T108434#10672231
[02:51:40] T108434: Some rows (from the year 2004) in SQL databases have text in latin1 encoding - https://phabricator.wikimedia.org/T108434
[02:51:57] wrote `mwscript ~/findLegacyEncodingRows.php --wiki frwiki --table actor` to find bad entries.
[02:52:06] https://gerrit.wikimedia.org/r/1130770
[23:47:49] Amir1: Ran on testwiki incl comment table this time, found non-zero results, which I did not expect on testwiki
[23:48:01] they're all cases where we seem to have trimmed comments incorrectly, I think?
[23:48:49] Afaik we don't trim comments directly, since that would be surprising I think. If too much data is submitted via API or via UI workaround (or lack of maxlength support, in theory), we presumably reject the edit
[23:49:03] but these are auto-generated summaries where we include an excerpt of the page content
[23:49:13] and that excerpt is of course trimmed to the first 100 chars or whatever
[23:58:21] Everything in mediawiki is a rabbit hole
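(Editor's sketch, not from the log: the script name `findLegacyEncodingRows.php` suggests scanning for rows whose stored bytes are not valid UTF-8, which is the usual heuristic for spotting legacy latin1 text. A minimal Python illustration of that heuristic, with a hypothetical function name:)

```python
def looks_like_legacy_latin1(raw: bytes) -> bool:
    """Heuristic: bytes that fail strict UTF-8 decoding are likely
    legacy latin1-encoded text left over from an old schema."""
    try:
        raw.decode("utf-8")
        return False  # valid UTF-8, nothing to fix
    except UnicodeDecodeError:
        return True   # not UTF-8; plausibly latin1

# latin1-encoded "café" is not valid UTF-8
print(looks_like_legacy_latin1("café".encode("latin1")))  # True
# UTF-8-encoded "café" decodes cleanly
print(looks_like_legacy_latin1("café".encode("utf-8")))   # False
```

Note the limitation: pure-ASCII latin1 text is also valid UTF-8, so this only flags rows containing non-ASCII bytes.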
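(Editor's note, an assumption rather than something stated in the log: one way "trimmed incorrectly" comments could end up flagged as bad encoding is a byte-level truncation that splits a multibyte UTF-8 character, leaving an invalid byte sequence. A small sketch of that failure mode, with a hypothetical helper:)

```python
def naive_trim_valid(text: str, max_bytes: int) -> bool:
    """Trim at a raw byte boundary and report whether the result
    still decodes as UTF-8. A cut mid-character makes it invalid."""
    raw = text.encode("utf-8")[:max_bytes]
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

summary = "résumé " * 20   # each 'é' is 2 bytes; 9 bytes per repetition
print(naive_trim_valid(summary, 101))  # False: cut lands inside an 'é'
print(naive_trim_valid("abc", 2))      # True: ASCII can be cut anywhere
```

Trimming on character (or grapheme) boundaries instead of bytes avoids producing these invalid rows in the first place.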