[13:46:58] [telegram] Does anyone know how to get JavaScript regexes to handle Unicode characters beyond FFFF?
[13:47:05] [telegram] If you do, resolving this bug should be easy:
[13:47:06] [telegram] https://phabricator.wikimedia.org/T297351
[13:47:49] [telegram] does decomposing into upper/lower half surrogates (à la UCS-2) not work? (re @amire80: Does anyone know how to get JavaScript regexes to handle Unicode characters beyond FFFF?)
[13:47:50] [telegram] (But maybe I'm naïve, as it often happens.)
[13:48:18] [telegram] Maybe it does, but I haven't tried it because I'm not smart enough to know how is it done. (re @mahir256: does decomposing into upper/lower half surrogates (à la UCS-2) not work?)
[13:49:19] [telegram] https://stackoverflow.com/q/3744721
[13:50:28] [telegram] huh, that error doesn't make sense to me even if it doesn't recognise the characters, since it's not the only text in the field
[14:16:56] [telegram] in my browser console, text.match(/[\uD800-\uDFFF]/) is true, so it appears it's already using surrogate pairs... removing that range would allow any non-bmp character, excluding certain non-bmp ranges could be done by matching the surrogate pairs but first we'd need to decide which ranges to allow or block (re @amire80: Maybe it does, but I haven't tried it because I'm not smart enough to know how is it done.)
[14:36:31] [telegram] What games are you all playing, can you teach me
[14:36:46] [telegram] \uD806[\uDCA0-\uDCF2\uDCFF] should match a warang citi character, but it's actually looking for things to *exclude*, so [\uD800-\uDBFF][^\uDC00-\uDFFF] should match a high surrogate followed by anything other than a low surrogate, [^\uD800-\uDBFF][\uDC00-\uDFFF] should match a low surrogate preceded by anything other than a high surrogate, \uD806[^\uDCA0-\uDCF2\uDCFF] should match anything that starts with the sa
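
Note on the patterns discussed in this log: the sketch below tries them out in plain JavaScript. It is only an illustration and is not taken from the code behind T297351. The helper and constant names (`toSurrogatePair`, `warangCitiUnits`, `warangCitiPoints`, `loneSurrogate`) and the sample string are hypothetical. The lone-surrogate pattern swaps the `[^...]` class from the chat for a negative lookahead, on the assumption that a high surrogate at the very end of the string should also be caught. The `u` flag variant assumes an ES2015-capable engine.

```javascript
// Hypothetical helpers for illustration; not the code from T297351.

// Split a code point above U+FFFF into its UTF-16 surrogate pair.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;
  const high = 0xd800 + (offset >> 10);  // top 10 bits of the offset
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits of the offset
  return [high, low];
}

// Without the `u` flag a regex sees UTF-16 code units, so a character
// beyond U+FFFF must be written as high surrogate + low surrogate.
// U+118A0..U+118F2 and U+118FF all share the high surrogate \uD806.
const warangCitiUnits = /\uD806[\uDCA0-\uDCF2\uDCFF]/;

// With the `u` flag the regex sees whole code points, so the same
// range can be written directly.
const warangCitiPoints = /[\u{118A0}-\u{118F2}\u{118FF}]/u;

// Unpaired surrogates: a high surrogate not followed by a low one, or
// a low surrogate at the start or after something that is not a high one.
const loneSurrogate =
  /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?:^|[^\uD800-\uDBFF])[\uDC00-\uDFFF]/;

// Made-up sample text containing U+118A0 among other characters.
const sample = "abc \u{118A0} def";

console.log(toSurrogatePair(0x118a0).map((u) => u.toString(16))); // [ 'd806', 'dca0' ]
console.log(warangCitiUnits.test(sample));   // true
console.log(warangCitiPoints.test(sample));  // true
console.log(loneSurrogate.test(sample));     // false
console.log(loneSurrogate.test("abc\uD806")); // true
```

Which ranges to allow or block is still the open question from the chat; the two Warang Citi patterns only show that either spelling can express such a range once it is chosen.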