[12:48:24] \o
[13:17:37] dr0ptp4kt, pfischer: where are we on the validation of the new SPARQL endpoints? It seems a bit late to publish a communication this week, but could we plan on next Monday?
[14:14:40] gehel: I have not made any progress on that end, I’m still busy with the weighted tags
[14:20:38] ebernhardson: Regarding the rev_based flag of page_change_weighted_tags events: This is a parameter we have to pass through all levels of indirection, which feels wrong since it’s only relevant in case of writing to the stream. So even if we want to hide the decision between job queue vs event bus, we’d still have to make that fact transparent to users of the `CirrusSearch` API.
[14:21:14] pfischer: hmm, that does seem awkward
[14:22:28] pfischer: random (probably bad) idea, wrap the tags+flag into a model class and pass the model around?
[14:24:11] ebernhardson: Hm. That parameter object would be used instead of the current parameters of `CirrusSearch.updateWeightedTags(…)`?
[14:25:29] pfischer: plausibly, although also just passing an extra parameter down through isn't terrible. I suppose a model class would only be worthwhile if it could also simplify or make the api less error-prone in some way
[14:29:15] ebernhardson: Sure, after all, there aren’t so many users, according to codesearch. Without named parameters in PHP, API users would have to specify the default values by themselves, before overriding the last, new parameter $revBased.
[14:33:45] afk school run, back in 30
[14:43:17] gehel: I can’t make it to our retro today.
[14:43:47] pfischer: ack
[14:44:00] reminder: we'll have Karen and Lani as guests in our retro
[15:02:03] inflatador: if you want to join our retro - https://meet.google.com/eki-rafx-cxi
[16:06:39] workout, back in ~40
[16:59:22] back
[18:28:32] lunch, back in ~40
[18:30:10] kicked off rerenders for the problem with indexing wrong redirects (T372446). Pulled a set of pages that should hopefully cover it from the last cirrus dump into hadoop, will see how pushing an extra 413k pages through the topic will work out
[18:30:11] T372446: Special:Search intitle search has weird redirect behavior - https://phabricator.wikimedia.org/T372446
[18:30:16] rate limited to 100 rerenders/sec
[18:54:14] :( the rerender script needs some retries and reporting built in...died after ~30 min and not much hint of where it was to be able to continue :(
[18:54:44] i guess alternatively, could script around the top level. split the input file into a few hundred files that take a minute or two each, instead of a big bulk operation that fails and restarts from the beginning
[19:01:43] back
[19:48:54] ebernhardson: did you run this as a backfill job?
[19:50:33] pfischer: no, i updated the cirrus-rerender project to read from ndjson files, and generated an ndjson file using spark from the latest index dump
[19:51:01] so it produces events to the regular rerender stream
[19:57:38] i do wonder a bit why we get so many more broken connections these days, in sup and here. Maybe it's an artifact of the k8s transition but it used to be rare to see things like 'Connection reset by peer' when querying mediawiki from in the DC
[20:05:40] meh, so now it failed on a 404 against /w/api.php :S
[20:07:25] and for extra funsies, the url is mw-ai-int.discovery.wmnet, the host is in the headers and not reported in the exception message.
[20:07:34] mw-api-int.discovery.wmnet even
[20:07:56] i guess i'll split this to a file per wiki just to make retries more reasonable than redoing the whole thing..
[22:29:54] interesting...https://www.elastic.co/blog/elasticsearch-is-open-source-again
[22:29:57] (AGPL)
[22:38:28] lol
[22:43:27] :sad trombone: Shay is being pretty fancy there in claiming that he was always FOSS. The AGPL was sitting right there when he picked the SSPL as his way to try and fight Amazon.
[22:44:17] "the market confusion their offering was causing" is working pretty hard to stand in for "we were not selling as many SaaS subscriptions as we wanted"
[22:47:22] I don't see this blog explaining how the licenses will be reconciled. Would our use case in WMCS still fall under the SSPL and thus non-Libre/non-OSI or can we host a SaaS elastic cluster under AGPL?
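
Note on the rerender failures discussed at 18:54 and 20:07: the idea of splitting the input per wiki and adding retries plus progress reporting could look roughly like the sketch below. This is only an illustration, not the cirrus-rerender code. The function names (`split_by_wiki`, `with_retries`, `run_resumable`), the `wiki` field in each ndjson record, the `send` callback and the checkpoint file are all hypothetical, and the sketch assumes that re-sending a rerender event for a page is harmless.

```python
import json
import time
from collections import defaultdict
from pathlib import Path
from typing import Callable, Iterable


def split_by_wiki(lines: Iterable[str], key: str = "wiki") -> dict[str, list[dict]]:
    """Group ndjson records by wiki so each group can be retried on its own.

    The "wiki" field name is an assumption about the record layout.
    """
    groups: dict[str, list[dict]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the input
        record = json.loads(line)
        groups[record[key]].append(record)
    return dict(groups)


def with_retries(action: Callable[[], None], attempts: int = 3,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> None:
    """Run action, retrying with exponential backoff; re-raise on the last failure."""
    for attempt in range(1, attempts + 1):
        try:
            action()
            return
        except Exception as exc:
            if attempt == attempts:
                raise
            print(f"attempt {attempt}/{attempts} failed: {exc!r}, retrying")
            sleep(base_delay * 2 ** (attempt - 1))


def run_resumable(groups: dict[str, list[dict]],
                  send: Callable[[dict], None],
                  done_file: Path,
                  attempts: int = 3,
                  base_delay: float = 1.0,
                  sleep: Callable[[float], None] = time.sleep) -> list[str]:
    """Send every record, one wiki at a time, skipping wikis already completed.

    Completed wikis are appended to done_file, so a restarted run continues
    where the previous one stopped. Returns the wikis that still failed.
    """
    done = set(done_file.read_text().split()) if done_file.exists() else set()
    failed: list[str] = []
    for wiki, records in sorted(groups.items()):
        if wiki in done:
            continue
        try:
            for record in records:
                with_retries(lambda r=record: send(r), attempts, base_delay, sleep)
        except Exception as exc:
            # Report where it stopped instead of aborting the whole run.
            print(f"{wiki}: giving up after {attempts} attempts: {exc!r}")
            failed.append(wiki)
            continue
        with done_file.open("a") as fh:
            fh.write(wiki + "\n")
        print(f"{wiki}: {len(records)} records sent")
    return failed
```

A wiki that fails part-way is not recorded as done, so the next run re-sends all of its records; that is why the sketch relies on rerenders being safe to repeat. Rate limiting (the 100 rerenders/sec mentioned at 18:30:16) is left out and would sit inside `send`.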