[01:01:24] TimStarling: Can you think of reasons that it might be infeasible or intractable for upstream php to change the pcre unicode behaviour on Mac? Like maybe Apple (or Homebrew) are responsible for doing this in ways mostly outside their control, or a compat angle that I'm not seeing? RE: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1174827
[01:03:42] it seems pcre is embedded in php-src, and in php_pcre.c it seems upstream quite clearly only sets _UTF when /u is used. Yet something is clearly making it behave differently.
[01:04:06] I do note that Homebrew's packages compile php with `--with-external-pcre`.
[01:12:07] Homebrew tends to use the latest versions of dependencies and indeed uses the latest pcre2 (10.45). I suppose it might be that pcre2 itself changed and that others would get this later. But I don't see anything obvious in the changelog about either 00A0 being added somewhere, or UTF/UCP being enabled by default in some way.
[01:14:16] I'm guessing this is coming from elsewhere. There are some references to "user-defined character tables", which I suspect refers to the operating system and/or another library, either dynamically linked or read from some /etc/ path. My trail runs cold there, though, as I can't seem to find how or where this happens.
[01:16:05] I think PCRE does use the locale by default
[01:17:57] https://www.pcre.org/original/doc/html/pcreapi.html#SEC14
[01:20:56] actually PHP uses PCRE2 now, but it's the same. And PHP does call pcre2_maketables()
[01:23:59] in PHP git master see php_pcre.c line 753
[01:24:19] A trivial patch but it would need an RFC
[01:26:12] I checked the RFC list and there doesn't seem to be any existing proposal for this
[01:28:11] I see on Gerrit you said "lobby upstream" -- but you know I have experience with this with https://wiki.php.net/rfc/strtolower-ascii
[01:30:14] the RFC process is clear enough, it just requires work
[01:31:36] PHP RFCs typically require the code to already be written and on GitHub, so you will have people nitpicking both your PR and the text of the RFC
[01:34:23] ack, I'm aware of your past RFC, hence asking you about whether this is feasible to pursue :)
[01:34:32] happy to try my hand at it
[01:35:35] ok, so if pcre_maketables is responsible, that means it's functions like isspace() that are doing the work? https://linux.die.net/man/3/isspace
[01:36:31] and Darwin is making this return true for non-breaking space when a UTF-8 locale is active, but e.g. GNU is not? /m tests hypothesis
[01:39:26] yes PCRE uses isspace() -- https://github.com/php/php-src/blob/master/ext/pcre/pcre2lib/pcre2_maketables.c#L135
[01:40:55] g++> printf("%d\n", isspace(0xA0));
[01:40:55] b'0'
[01:40:55] g++> setlocale(LC_ALL, "C.UTF-8");
[01:40:55] g++> printf("%d\n", isspace(0xA0));
[01:40:55] b'1
[01:41:23] yes I think an RFC is feasible