[00:11:48] I helped someone I know to submit an implementation, but they accidentally left a print statement in it, which returned some non-existent error codes. https://tools-static.wmflabs.org/bridgebot/18a11c93/file_57053.jpg
[00:55:45] but that's nothing to do with mediawiki. regardless of whether you apply nfc, nfd or neither, they're supposed to behave and be rendered the same way (re @mahir256: the answer (and the comment on the answer) to https://stackoverflow.com/questions/58559390/ describe the problem)
[01:00:19] https://www.unicode.org/reports/tr53/ mentions using u+034f (combining grapheme joiner) when the order should be different. that seems to work with noto sans arabic, but I don't have many other arabic fonts (that I recognise as being arabic) that I can test with (mixed results with the ones I do have, but I don't know how good they are in general either)
[02:48:17] https://tools-static.wmflabs.org/bridgebot/2b0e3410/file_57056.jpg
[02:48:17] https://tools-static.wmflabs.org/bridgebot/697f2731/file_57057.jpg
[02:51:39] Amiri on the left and PakType on the right, both render the same way with or without the combining grapheme joiner
[02:51:40] the main dilemma though is that it should not be necessary to include that when the shadd should always come first when typing, and the normalization does the opposite. if you go back and edit this you have to delete two characters and put the shadd back in if you want to keep that and just drop kasra/zer which is counterintuitive
[03:51:04] I'd appreciate additional eyes across the tests I've just added to Z11193. For example, when you "remove interpunction" would you expect to keep or remove emojis?
[03:57:06] I imagine it would depend what your goal is, and what counts as punctuation is language-specific anyway
[03:58:06] also Z574 (re @Toby: I helped someone I know to submit a Python implementation, but they accidentally left a print statement in it, which returned a ...)
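For anyone following the normalization thread above (the 00:55:45 to 02:51:40 messages), here is a minimal Python sketch of the reordering being described. It assumes beh (U+0628) as an arbitrary base letter and uses only the standard `unicodedata` module; the helper name `codepoints` is made up for illustration. Kasra has canonical combining class 32 and shadda has 33, so canonical ordering moves kasra in front of shadda under both NFC and NFD, while a combining grapheme joiner (class 0) between them blocks the swap. This only shows the stored code point order, not how any particular font renders it.

```python
import unicodedata

BEH = "\u0628"     # ARABIC LETTER BEH, an arbitrary base letter for the demo
SHADDA = "\u0651"  # ARABIC SHADDA, canonical combining class 33
KASRA = "\u0650"   # ARABIC KASRA, canonical combining class 32
CGJ = "\u034F"     # COMBINING GRAPHEME JOINER, canonical combining class 0


def codepoints(text):
    """Return the code points of text as a readable 'U+XXXX ...' string."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


typed = BEH + SHADDA + KASRA           # shadda first, as typed
with_cgj = BEH + SHADDA + CGJ + KASRA  # same, with CGJ separating the marks

for label, text in (("typed", typed), ("with CGJ", with_cgj)):
    for form in ("NFC", "NFD"):
        normalized = unicodedata.normalize(form, text)
        print(f"{label:9} {form}: {codepoints(normalized)}")
```

The "typed" rows come out with kasra before shadda, and the "with CGJ" rows keep the typed order, which matches the editing annoyance described at 02:51:40.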
[04:00:31] Agreed, so we really need a few functions to satisfy each different aim? Regarding language-specific, are these punctuation characters that just don't get used in another language, or characters that are punctuation in one but alphabetic in another? (re @Nikki: I imagine it would depend what your goal is, and what counts as punctuation is language-specific anyway)
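As a rough illustration of the "keep or remove emojis" question, here is a small Python sketch. It is not the Z11193 implementation; `strip_punctuation` and its `also_symbols` flag are hypothetical names, and the rule it assumes is "punctuation means Unicode general category P*". Under that rule emojis survive, because most of them are category So (symbol), and they only disappear if symbols are dropped as well.

```python
import unicodedata


def strip_punctuation(text, also_symbols=False):
    """Drop characters whose Unicode general category starts with 'P'.

    With also_symbols=True, categories starting with 'S' (which covers
    most emojis, but also currency and math signs) are dropped too.
    """
    drop = ("P", "S") if also_symbols else ("P",)
    return "".join(
        ch for ch in text if unicodedata.category(ch)[0] not in drop
    )


print(strip_punctuation("Hello, world! 😀"))
print(strip_punctuation("Hello, world! 😀", also_symbols=True))
print(strip_punctuation("don't"))
```

The last call shows the language-specific side of it: the ASCII apostrophe is category Po, so a purely category-based rule turns "don't" into "dont", which may or may not be what a given function is meant to do. Emoji sequences that rely on variation selectors or zero-width joiners would also leave those characters behind under this rule, since they are not in the P or S categories.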