[06:59:05] User:沈澄心 has just put up a couple of interesting test cases for Z11040. An example is Z17022, which Python currently passes but JavaScript fails. Since this function is fairly widely used, we should decide which language we want to keep at this version of "String length", and which to send to a new function (name?). I take it the Python matches better with a UTF-32 code point?
[07:15:04] We should do a string to code point list composition. I’ll look at that later. I guess the other function is length in bytes? (re @Toby: User:沈澄心 has just put up a couple of interesting test cases for Z11040. An example is Z17022 which Python currently passes, but ...)
[07:19:06] Oh, hang on… it worked first time 😏 (re @Toby: User:沈澄心 has just put up a couple of interesting test cases for Z11040. An example is Z17022 which Python currently passes, but ...)
[07:37:51] There are lots of details here: https://hsivonen.fi/string-length/
[07:49:32] Z12257 (protected) is doing nothing (re @Toby: User:沈澄心 has just put up a couple of interesting test cases for Z11040. An example is Z17022 which Python currently passes, but ...)
[15:35:36] Ah! My moment to shine! I was just reviewing our types for Byte and Code point, and boy am I glad that they're not open for business yet. And I read the Unicode standard earlier this year in order to answer this and related questions.
[15:40:51] I think we need several 'length' functions on strings:
[15:40:52] 1. Length as in the number of bytes, maybe split by normalization (i.e. whether NFC or NFD) and by Unicode encoding (UTF-8, UTF-16, ...)
[15:40:54] 2. Length as in the number of code points, maybe split by normalization
[15:40:55] 3. Length in the number of glyphs (which, I would argue, is probably the thing most users may want)
[15:40:57] 4. Possibly length in the number of glyphs and invisible markers
[23:38:12] So is the Python function giving us #1 with UTF-32, or #2? Or are they always the same? (re @vrandecic: I think we need several 'length' functions on strings: 1. Length as in the number of bytes, maybe split by normalization (i.e. w...)
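For reference, a rough sketch of how some of these lengths differ on one small string, using only the Python standard library. `length_report` and its dictionary keys are made-up names for illustration, not anything that exists on Wikifunctions. Python's `len()` on a `str` counts code points, while JavaScript's `.length` counts UTF-16 code units, which is presumably why the two implementations disagree on characters outside the Basic Multilingual Plane. Glyph counting (#3 and #4) is left out here because it needs grapheme segmentation, which the standard library does not provide.

```python
import unicodedata


def length_report(text: str) -> dict:
    """Return several candidate 'lengths' of text.

    Hypothetical helper for illustration only.
    """
    nfc = unicodedata.normalize("NFC", text)
    nfd = unicodedata.normalize("NFD", text)
    return {
        # Number of code points, as typed and under each normalization form.
        "code_points": len(text),
        "code_points_nfc": len(nfc),
        "code_points_nfd": len(nfd),
        # Number of bytes under different encodings (no byte order mark).
        "utf8_bytes": len(text.encode("utf-8")),
        "utf32_bytes": len(text.encode("utf-32-le")),
        # UTF-16 code units: two bytes each; this is what JavaScript's
        # String.prototype.length reports.
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
    }


# Precomposed "é" (U+00E9) followed by the emoji U+1F60F.
sample = "\u00e9\U0001F60F"
report = length_report(sample)
for name, value in report.items():
    print(f"{name}: {value}")
```

On this sample the code point count is 2, but the UTF-16 code unit count is 3 because the emoji needs a surrogate pair. The UTF-32 byte count is always four times the code point count, so #1 with UTF-32 and #2 differ by a constant factor rather than being equal.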