[10:42:09] errand+lunch
[13:57:55] * pfischer just learned that the SUP’s HTTP client does not request compression
[14:06:13] o/
[14:06:48] hm... don't remember setting accept-encoding explicitly in the past so might be the case in other places too
[14:06:50] o/
[14:08:18] I wanted to introduce a metric for response sizes and when implementing the test, I noticed that my gzipped payload is not decoded properly. Might be an issue with how I instruct wiremock to send the response… 👀
[14:10:44] hm... I thought that gzip encoding would be pretty much standard and well supported in wiremock
[14:11:03] but curious to see what we do in other places like the wdqs updater
[14:11:48] By using aResponse().withBodyFile() I do not see any headers like “content-type” nor “content-encoding”
[14:14:55] hm... wdqs does use two different http clients (apache and jetty)...
[14:19:31] the sync apache http client used by wdqs does have a "disableContentCompression()" method which suggests that it's on by default, will double-check by actually running it
[14:24:59] dcausse is it OK to restart blazegraph on the graph split hosts? I just merged your patch re: throttling
[14:25:19] inflatador: thanks! yes please go ahead :)
[14:29:13] dcausse ACK, just restarted bg on the test hosts
[14:31:47] thx!
[14:44:07] yes, for the apache http client 4, accept-encoding: gzip,deflate is added transparently; the pending wiremock config/mappings do not reference anything regarding compression so it might happen completely transparently
[14:44:57] According to the wiremock docs, gzip has to be disabled explicitly and we do not do that.
[14:45:53] The apache HTTP async client 5 (HC5) does not do anything automagically about encoding.
[14:54:42] Can confirm that wiremock picks up a requested content-encoding correctly. Our client simply does not request it
[14:54:45] sigh... til https://issues.apache.org/jira/browse/HTTPCLIENT-1822
[14:57:57] perhaps envoy can take care of this? might not be terribly hard to implement something on top of hc5 tho
[15:03:10] No, for sure, I’m already on it
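A minimal sketch of the two pieces discussed above, assuming the test uses WireMock's Java DSL against the HC5 async client; the class name GzipSketch, the method names, and the /api path are invented for illustration and are not taken from the SUP code. It forces a gzip-encoded stub response and gunzips the entity bytes by hand, since the HC5 async client neither requests nor decompresses gzip on its own.

    // Hedged sketch, not the actual SUP test code.
    import static com.github.tomakehurst.wiremock.client.WireMock.*;

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.util.zip.GZIPInputStream;
    import java.util.zip.GZIPOutputStream;

    class GzipSketch {
        // Stub an endpoint that always replies with a gzip-compressed body and the matching
        // header, so the client-side decoding path is exercised no matter what the server
        // would otherwise do transparently.
        static void stubGzipResponse(byte[] plainBody) throws IOException {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
                gz.write(plainBody);
            }
            stubFor(get(urlEqualTo("/api"))
                .willReturn(aResponse()
                    .withHeader("Content-Type", "application/json")
                    .withHeader("Content-Encoding", "gzip")
                    .withBody(buf.toByteArray())));
        }

        // The HC5 async client hands back the entity bytes as-is, so after adding
        // "Accept-Encoding: gzip" to the request explicitly, the caller has to gunzip
        // the response body itself.
        static byte[] gunzip(byte[] compressed) throws IOException {
            try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
                return in.readAllBytes();
            }
        }
    }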
[15:48:14] errand
[16:00:30] \o
[16:13:10] o/
[17:03:25] :(
[17:05:12] done some poking around, we can probably just lie there and then add a signature verification. It perhaps isn't the best idea ever, but should work
[17:06:38] not sure though... wonder if we would have to go through some sort of external verification
[17:09:47] * ebernhardson looks at https://meta.wikimedia.org/wiki/Special:WikiSets/14 and wonders if perhaps there are more wikis opted out of global bots than remain
[17:21:13] workout, back in ~40
[17:25:13] There's a discussion on the enwiki village pump about regexes and templates. It's an interesting power user use case, and I've just added some discussion and hints for less brutal queries (they called regex searching "kill-the-wiki", which isn't wrong!). Feel free to add/correct anything. h/t to quiddity for the heads up! https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Pages_with_most_transclusions_of_a_template
[17:27:56] regex searching won't kill the wiki :P Sure it's slow, takes up compute, but the search cluster is up to ~2300 cores and we haven't increased the regex limits since it was ~750. We gate regexes at 10 total in parallel across the cluster; even on the largest wikis that will only add up to ~300 threads used
[17:28:28] it will take forever, and then say sorry, you don't get more results but there might be more results :)
[17:29:07] it does seem like there is an idea in there somewhere, that i've seen a few times, to index more information about the parsing process and make it queryable
[17:31:18] their regex does look a bit crazy though :)
[17:34:06] I guess that if we indexed templates with a manual frequency a la weighted_tags this would solve their problem
[17:35:25] can we range query the weighted tags?
[17:35:40] oh, there is some custom query iirc
[17:36:39] yea, the term freq token filter
[17:36:40] if we did not add it, it should be pretty straightforward
[17:37:01] err, term freq filter query :)
[17:37:41] you could even get a score and have a kind of sort for free
[17:37:45] lol, I know it won't literally kill the wiki, but having a query time out is a new experience for non-regex searchers.
[17:38:01] might be something to consider; as long as the parser can tell us about all transclusions it shouldn't be too hard to wire it all through
[17:39:28] yes, btw I wonder if our stuff is compatible with parsoid and if not, when it will be time for us to switch
[17:41:14] oh, indeed. I suppose i haven't seen much on that lately but they must be making progress re-integrating that
[18:07:51] back
[18:23:11] dr0ptp4kt: the queries from Andrew are at stat1005:~andrewtavis-wmde/tasks/T349512_representative_wikidata_query_sample/blazegraph_queries_sample_2023_12_21.csv
[18:24:01] and found a bug in the QueryRecorder, "java.lang.IllegalArgumentException: Comparison method violates its general contract", seems like I don't know how to write a comparator :(
[18:24:38] dinner
[18:26:25] * ebernhardson toys around with implementing a custom SessionProvider that works off configuration. In theory it might be reasonably simple...
[18:27:21] in a quick review it looks like all but one method is a noop, we only need to implement SessionProvider::provideSessionInfo. Maybe
[18:27:41] (noop for our use case of no persistent sessions, no remembering across requests, etc.)
[19:08:58] lunch, back in time for pairing
[19:25:37] back
[19:50:30] hmm, this seems to work. custom session provider is registered and action=query&meta=userinfo returns the expected user when the header is provided from a configured network range
[20:40:37] heads up ryankemper, i'll set up some time to talk varnish and lvs and stuff for the wdqs test hosts (i see you're active on https://phabricator.wikimedia.org/T351650 and related tasks) - mainly to ensure i'm up to date and to see if there's any place support may be needed
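On the [18:24:01] comparator exception: TimSort throws "Comparison method violates its general contract" when a comparator is not transitive or consistent, and a classic cause is a subtraction-based comparison that can overflow. Below is a minimal sketch of that failure mode and the usual fix, assuming the QueryRecorder orders recorded queries by some numeric field; the RecordedQuery type and the durationMillis field are made up for illustration and are not the actual code.

    // Hedged sketch, not the actual QueryRecorder code.
    import java.util.Comparator;

    record RecordedQuery(String query, long durationMillis) {}

    class QueryComparators {
        // Broken: (int) (a - b) silently overflows for large longs, so the ordering is not
        // transitive and java.util.TimSort can throw the "general contract" exception.
        static final Comparator<RecordedQuery> BROKEN =
            (a, b) -> (int) (a.durationMillis() - b.durationMillis());

        // Safe: delegate to the JDK's overflow-free long comparison.
        static final Comparator<RecordedQuery> SAFE =
            Comparator.comparingLong(RecordedQuery::durationMillis);
    }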