[01:13:38] https://wikis.world/@mediawiki/110545572936572178 :D
[13:17:18] Krinkle: https://gerrit.wikimedia.org/r/c/operations/dns/+/930293 and see also https://people.wikimedia.org/~jameel/ProbeAnalysis/files/plots/plots1/baw/ :D
[13:17:33] we're going to keep an eye on time to first contentful paint after we push this
[13:18:49] cdanis: the parenthetical number, that's sample count?
[13:18:52] yep
[13:18:54] k
[13:19:08] and the datacenters are ordered by their current order in geo-maps
[13:19:20] so if it doesn't look like a descending staircase, something is off :)
[13:19:26] oh neat!
[13:19:36] https://people.wikimedia.org/~jameel/ProbeAnalysis/files/plots/plots1/baw/Thailand%20(TH).png
[13:19:47] indeed!
[13:19:52] what does the Ø mean on the dreamers row?
[13:20:02] those are outliers that are outside the inter-quartile range
[13:20:13] https://people.wikimedia.org/~jameel/ProbeAnalysis/files/plots/plots1/baw/Bosnia%20and%20Herzegovina%20(BA).png
[13:20:40] hm.. so this is p10, 25, 50, 75, 99 and then outliers or something like that?
[13:22:32] close to that
[13:22:34] https://www.simplypsychology.org/wp-content/uploads/boxplot-outliers.png
[13:23:03] ah I see, the non-outlier lines are more akin to a standard deviation than percentiles themselves
[13:23:07] yeah
[13:23:22] cool
[14:57:01] Krinkle: any concerns/comments about the patch, or lgtu?
[15:23:55] cdanis: geo-maps change?
[15:27:11] yes :)
[15:27:58] I'm not sure I can comment. Basing it on latency seems right to me. Two questions that come to mind are: 1) Did we have a cut-off for sample count below which we discard the result? (i.e. stick to RIPE). And 2) Were there cases where there was significant overlap? If so, how did we handle that? Like did we pick median, fastest, slowest. Depending on how big the ranges are, it may be worthwhile to pick one with a higher minimum if it means
[15:27:58] the pages will still load faster for most users in that country.
[15:28:31] That is to say, I'm guessing connectivity is not always the same for all ISPs in a country, or whatever other reason there may be for big differences.
[15:29:00] E.g. if it's bimodal, the second hump may be more interesting.
[15:30:13] re 1) not a formalized one, but these have a reasonably large sample size, and unlike RIPE Atlas data they do match the demographics of our actual users
[15:30:32] and 2) these are the cases where we're pretty convinced that one choice is better-or-equal for ~all users
[15:30:43] we're going to answer the harder questions as we get more data :)
[15:30:49] k
[15:31:17] the criterion used here was that all of 25th, 50th, and 75th were better
[15:31:38] right
[15:31:44] that's a nice starting point indeed
[15:32:18] I'm not sure I'd be as confident with <200 data points.
[15:32:31] I have a higher opinion of RIPE probes than the average desktop/mobile device in whatever random state it is.
[15:32:49] hmm, fair enough
[15:32:53] there were several I clicked on in that directory that had relatively few data points
[15:33:51] For our ASReport, we normalized by cpu benchmark. i.e. bucket latencies by cpu bench, and pick one cpu bucket for all metrics in the report for a given country. https://performance.wikimedia.org/asreport/
[15:34:31] that implicitly took care of many proxy factors like battery percentage, theoretical cpu frequency, background tasks, etc.
[15:36:52] for the countries in the current patch: KE only has 154 data points, but there are even fewer RIPE Atlas probes there -- only 19 probes online
[15:37:25] the rest all have >200 data points
[15:37:28] how frequently do those probes collect data?
[15:37:58] btw, are our sample counts page views or pings? i.e. is it 3x?
[15:38:19] sample counts are reports -- so yeah, 3 pulses per report
[15:38:30] ok, so we don't discard 2/3
[15:38:42] discard/aggregate
[15:38:59] yeah, these are all from the fetch performance api, and this is request_time
[15:39:08] which didn't really vary across pulses
[15:40:10] RIPE Atlas probes are collecting data every 240s, although it isn't clear to me that all the probes are actually involved
[15:40:52] for example https://atlas.ripe.net/measurements/11645085/
[15:42:03] in that one example there's currently one probe from KE reporting data
[15:43:07] .... and that one is actually an anchor, not a probe!
[15:43:24] oh I was looking at the anchoring mesh measurement
[15:43:44] searching google for that term brings me to https://github-wiki-see.page/m/RIPE-Atlas-Community/ripe-atlas-tips-and-tricks/wiki/Anchoring-Measurements
[15:43:50] which is the most 90s page I've seen all year
[15:44:06] ahahaha
[15:44:13] yes, that's the measurement for all anchors pinging each other
[15:44:17] I'm expecting there to be a "Buy this book to get rich" button somewhere down that page after 200 pages of scrolling
[15:44:24] anchors are the devices that are supposed to be in the 'core' of the network, in datacenters
[15:44:36] hm
[15:44:45] https://github.com/RIPE-Atlas-Community/ripe-atlas-tips-and-tricks/wiki/Anchoring-Measurements is better
[15:44:49] not sure what's up with that proxy domain
[15:45:00] we actually have 0 probes from KE currently participating in the ping against our esams anchor
[15:45:35] interesting, so the absolute numbers from those pings are going to be faster than a typical consumer, basically only measuring from the nearest major IX to the DC.
[15:46:00] but if they ping each DC, perhaps still good enough, assuming no differences between ISPs, so yeah, maybe not as valuable indeed.
[15:46:09] anyway, our dataset is awesome :)
[15:46:12] :D
[15:46:29] are we planning some kind of anonymised machine-readable publishing of the dataset as well?
[15:46:32] I say let's just deploy this and also watch how time to meaningful paint changes in those countries
[15:46:38] eventually I hope so yes!
[15:46:56] something at https://analytics.wikimedia.org/published/datasets/performance/ that we publish annually or quarterly could be pretty cool for others to use and build whatever interesting stuff with
[15:47:02] yeah totally
[15:48:19] RE: Github 90s page - apparently GitHub decided to noindex all their /wiki pages due to spam, so this third-party domain proxies it and lets search engines index it anyway
[15:48:33] https://github-wiki-see.page/
[15:49:48] it's deliberately ugly to throw shit at github
[15:50:08] and as an open-culture-friendly way to encourage people who find these to link to the original once they found it
[15:50:17] interesting way to go about it, works I guess xD
[15:53:21] Krinkle> I'm expecting there to be a "Buy this book to get rich" button somewhere -- that gives me $DAYJOB-1 flashbacks. Keynetics was the parent company of both Kount (the thing I spent most of my time on) and Clickbank (the thing that unleashed a lot of long, ugly "buy my ebook" pitches on the world)
[20:43:15] Krinkle: https://phabricator.wikimedia.org/T337431#8936498 some numbers 🎉
[20:55:03] Amir1: nice. I assume these are wikitext edits with JS disabled, or in some other way rule out a stashing/preparsing race with edit stash?
[20:55:49] (that's how I did the excimer profile a few days ago)
[21:01:55] In profile `…2e`, SpamBlacklist::filter shows 62ms/49ms. In profile `…23` it shows 193ms/106ms. That suggests an increase. Am I looking at the wrong method?
[21:03:31] FilteredActionsHandler::blockedDomainFilter is 2ms before, and 16ms after. That's impressive indeed!
[21:04:01] I was about to say, that's for the spamblacklist itself
[21:04:42] and most of that 16ms is just loading the list in mwdebug that doesn't have a warm apcu cache
[21:05:53] Aye, but do we know why SpamBlacklist::filter increased?
[21:05:54] I wonder if the domain should be LTR here https://fa.wikipedia.org/wiki/%D9%88%DB%8C%DA%98%D9%87:BlockedExternalDomains
[21:06:00] let me check
[21:06:14] I guess also cold cache because of recent edit?
[21:06:26] (recent edit on MediaWiki:Spam-blacklist)
[21:07:36] 90ms is just reloading the mediawiki spam blacklist page
[21:08:24] that leaves ~100ms/20ms
[21:09:31] k
[21:36:13] Amir1: I've finished my minor stack meanwhile with a few touch ups. https://gerrit.wikimedia.org/r/c/mediawiki/extensions/AbuseFilter/+/930710/ (3 patches)
[21:38:47] thanks. I'll review them asap
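
Note on the geo-maps selection rule mentioned at 15:31:17 ("all of 25th, 50th, and 75th were better"): a minimal sketch of how such a check might look, not the actual analysis code. It assumes "better" means strictly lower request_time, and the 200-sample cutoff is hypothetical (the log says no cutoff was formalized). The function names and the inclusive quantile method are illustrative choices.

```python
from statistics import quantiles

# Hypothetical cutoff; the discussion only mentions ~200 as a comfort level.
MIN_SAMPLES = 200


def quartiles(samples):
    """Return (p25, p50, p75) of a list of latency samples."""
    q1, q2, q3 = quantiles(samples, n=4, method="inclusive")
    return q1, q2, q3


def clearly_better(candidate, current, min_samples=MIN_SAMPLES):
    """True if the candidate datacenter beats the current one at p25, p50 and p75.

    Assumes lower latency is better. Returns False when either side has
    too few samples to trust the comparison.
    """
    if len(candidate) < min_samples or len(current) < min_samples:
        return False
    return all(c < b for c, b in zip(quartiles(candidate), quartiles(current)))
```

A distribution with a fast median but a slow upper quartile fails this check, which is the overlap case raised at 15:27:58; handling it would need a different rule.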