[10:27:25] A ton of things use IPUtils because of how IPs double as usernames (so e.g. if you want to check whether a user is truly anon or a normal user not autocreated for some reason, you need to check if their username is an IP). [10:28:07] CentralAuth needs to check whether the username its hooks are called on is an actual user or someone is looking at an IP talk page, etc [10:28:34] the IP range stuff is only used for blocks and CU, I think [11:07:51] IP range is used for XFF to determine the current IP for edit attribution, MyContribs, MyTalk, Logstash etc [11:08:11] Since we don't list every IP separately [11:08:46] That's probably the most voluminous call in terms of CPU cycles [11:09:25] Eg every request in Setup.php to compute the first call to wgRequest->getIP [11:11:53] https://performance.wikimedia.org/arclamp/svgs/daily/2025-11-12.excimer-wall.index.svgz?x=115.9&y=1205&s=getIP [11:16:53] Although the attribution cost is somewhat unfair since it's incurred only once so whatever calls it first gets the flame graph blame. [15:20:34] The IPUtils library and the way it is used in MediaWiki is very inefficient. There's a lot of redundant validation. [15:20:49] here's an example picked at random: https://github.com/wikimedia/mediawiki/blob/ba39f4f2/includes/HTMLForm/Field/HTMLUsersMultiselectField.php#L40 [15:21:06] if ( IPUtils::isIPAddress( $user ) ) { $parsedIPRange = IPUtils::parseRange( $user ); .... } [15:21:11] seems reasonable: validate then parse [15:21:58] take the case when $user is an ipv6 address. internally parseRange will: [15:23:04] IPUtils::toHex -> IPUtils::isIPv6 -> IPUtils::convertIPv6ToRawHex -> IPUtils::sanitizeIP -> IPUtils::isIPAddress (again!), IPUtils::isIPv4 (for exclusion) [15:24:08] if only there were ways to nerd snipe people to improve these situations [15:24:28] !bash if only there were ways to nerd snipe people to improve these situations [15:24:28] Lucas_WMDE: Stored quip at https://bash.toolforge.org/quip/Gn7RfZoBffdvpiTrbDzY [15:32:02] the library internally uses its public APIs in a nested fashion. since those APIs are public they treat all input with suspicion. the library could have private methods for e.g. converting to hex a value that has already been validated as an IPv6 address. but I don't know if I like that -- it's good for this library to be extremely defensive. [15:35:04] a better approach would be to have an IP object that the library parses/validates into its most specific form [15:35:14] Callers could then do $ip->isRange(), $ip->isIPv6(), etc., which would just read internal state instead of re-running validation [15:57:44] reverse flame graphs mainly show IPSet::newFromJson and IPSet->addCidr. https://performance.wikimedia.org/arclamp/svgs/daily/2025-11-12.excimer.api.reversed.svgz?s=IPUtils%7CIPSet [15:58:26] I guess these are quite fragmented so not proof of anything [16:02:20] newFromJson would almost certainly just be TrustedXFF [16:02:27] Yeah [16:04:24] There's a slightly counter-intuitive change there where we actually went from opcache'd php array *to* json. [16:04:35] https://gerrit.wikimedia.org/r/c/mediawiki/extensions/TrustedXFF/+/803268 [16:04:45] But given the prominence of addCidr this makes sense. [16:05:05] because it also went from unprocessed to pre-processed data [16:05:25] before is php array of unparsed inputs and the json state is fully parsed already so it gets to skip all that [16:05:38] https://gerrit.wikimedia.org/g/mediawiki/libs/IPUtils/+/227fb406d92d8d49f4db4c71112f8a1baedb628e/src/IPSet.php#248 [16:06:21] I was looking at preSend : isIPAddress -> 0.60%, isIPv4 -> 0.38%, isIPv6 -> 0.34%, sanitizeIP -> 0.50%, toHex -> 0.11% [16:06:52] https://performance.wikimedia.org/arclamp/svgs/daily/2025-11-12.excimer.all.fn-PreSend.reversed.svgz?s=IPSet%7CIPUtils [16:08:31] isIPAddress could probably be completely eliminated if there was an IP object with a constructor that combined validation+parsing by trying isIPv4() ? yes -> valid, no -> isIPv6()? yes -> valid, no -> invalid IP. [16:08:50] PreSend btw is not from reqStart to send, but preSend itself (i.e. the step right before send where we run presend deferred updates) if that makes sense. [16:09:10] ohhh, TIL. [16:09:58] it's a zoom-ined view because "all" will cut the peaks of that slice. It's a basically onlyh showing frames that contain "::preOutputCommit" [16:10:09] similar to what we do for EditAction [16:11:39] it's still significant though, I mean 7% of IPSet+IPUtils there in that reversed graph [16:12:06] but yeah it's 7% of the PreSend time, not the overall response time. [21:00:14] just in-process cache it? [21:00:48] not that many IPs get processed in a single request [21:01:11] an IP object is for sure nicer, but it would require sooo much refactoring [22:36:26] I suppose it's worth benchmarking whether the overhead of something like MapCacheLRU::get/set for each such call outperforms the overhead we're trying to save. [22:36:42] It could, but it's not obvious to me that it will. [22:37:56] rendering of action=history and e.g. talk pages with anon participants will be among examples with lots of different IPs/cache misses. [23:41:56] caching isn't free. the indirect costs of caching something are hard to measure, because they're smeared around. Time in IP* goes down, but what fell out of the in-process cache (or the CPU's caches) because you stuffed a bunch of extra data [23:48:11] for the example I pasted earlier (the one in HTMLUsersMultiselectField.php), you can avoid some of the extra work by skipping the IPUtils::isIPAddress() check and just attempting to IPUtils::parseRange() directly, since it does the validation as well (and returns a falsey value on invalid inputs). I haven't done a systematic audit of IPUtils usage in core but there are certainly more places [23:48:11] where this kind of change could be made [23:51:00] an IP object would be easier to hold correctly, so it will be a better bulwark against future regressions. But optimizing individual call-sites has the advantage of being a local change that's easy to apply and review, and maybe you capture most of the opportunity that way and call it a day