[06:28:26] hi, there are some phan issues with Flow, I assume they relate to the ElasticSearch upgrade. The stubs were seemingly out of date, so I tried removing them (since we clone Elastica for phan CI) but that fails (see https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Flow/+/831242)
[07:14:13] kostajh: thanks for reporting! I assume that dcausse or ebernhardson will be able to have a look today
[07:15:38] gehel: sure. I'm confused as to why a bunch of things wouldn't be broken in production based on that change.
[07:17:37] btw I built an arm64 image of ElasticSearch 7 and updated the docs in https://gitlab.wikimedia.org/kharlan/wmf-elasticsearch-arm64/ and https://www.mediawiki.org/wiki/MediaWiki-Docker/Configuration_recipes/ElasticSearch
[07:18:12] kostajh: thanks!
[07:19:25] for Flow I suppose it's not activated in prod, I believe it was an attempt to provide better search for message threads but never got done completely IIRC
[07:33:42] aha
[07:35:11] T78789 is resolved, but unclear what that actually means
[07:35:12] T78789: U9. Search indexed Flow data - https://phabricator.wikimedia.org/T78789
[07:50:21] Hi, I’ll be traveling to Montpellier today so I’ll have limited connectivity
[07:52:48] pfischer: hey, safe travels!
[09:12:10] ejoseph: damn, I got distracted! I'll be there in 3'
[09:15:40] ejoseph: I'm there! Sorry
[09:24:43] Hi Search team! We previously discussed getting an index on outgoing link count; I'd like to understand how much work that would be for you. It's a nice-to-have feature for us (or rather, the problem it tries to solve is important for us, but despite having done some validation work, we aren't sure if it's actually going to solve that problem) so I want to avoid requesting something that's a ton of work for you and then might turn out to be a kind of throwaway thing on our side.
[09:25:54] (The relevant task is T301096.)
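For illustration only: a minimal Python sketch of the kind of underlinkedness signal being asked about, assuming the bytes-per-outgoing-link ratio (byte_size/outgoing_links) that dcausse suggests later in this log. The Article type, its field names, the helper functions, and the threshold are all hypothetical, not the CirrusSearch index schema; per the discussion, outgoing_links would count links to other wiki pages (interwiki included) but not external links.

```python
import math
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    byte_size: int       # hypothetical field: page size in bytes
    outgoing_links: int  # hypothetical field: links to other wiki pages (interwiki included, external links excluded)


def bytes_per_link(article: Article) -> float:
    """Bytes of text per outgoing internal link; a higher value means more underlinked."""
    if article.outgoing_links == 0:
        # No internal links at all: treat the page as maximally underlinked.
        return math.inf
    return article.byte_size / article.outgoing_links


def underlinked(articles: list[Article], threshold: float) -> list[str]:
    """Titles whose ratio is at or above a hypothetical threshold, most underlinked first."""
    scored = sorted(((bytes_per_link(a), a.title) for a in articles), reverse=True)
    return [title for ratio, title in scored if ratio >= threshold]


articles = [
    Article("A", byte_size=12_000, outgoing_links=10),   # 1200 bytes per link
    Article("B", byte_size=12_000, outgoing_links=200),  # 60 bytes per link, e.g. a navbox inflating the count
    Article("C", byte_size=3_000, outgoing_links=0),     # no internal links
]
print(underlinked(articles, threshold=500))
```

Article B shows the template concern raised in the thread: a navigation template that adds many links pushes the ratio down, so the page is simply not selected, which is the "won't be boosted" outcome tgr_ considers acceptable.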
[09:25:54] T301096: Add a link: prioritize suggestions of underlinked articles - https://phabricator.wikimedia.org/T301096
[09:26:56] That's a question for dcausse / ebernhardson. My limited understanding is that weighted tags might be a good way to store such info. And the data should be available in our main indexing tasks. So some work to add it to our current indexing pipeline, but probably a reasonable complexity.
[09:27:37] That being said, our next project is to rework our indexing pipeline, so now might not be the best time to introduce additional changes.
[09:35:34] Oh, I see that Erik already replied on the ticket. I'll add this to our triage meeting today, so that we can have some discussion.
[09:58:47] Thanks! FWIW we considered using weighted tags, but adding those is a one-time thing, so if we added some sort of underlinkedness metric as a tag weight then 1) it would be based on the state of the article at the time of tagging and would not get updated over time as the article changes; 2) changing the metric would require recalculating all tags. It's a viable plan B but less appealing.
[10:16:20] tgr_: no objections from me to adding a new field like that; the sole question perhaps would be to make sure that templates with a high number of links don't mess up your ratio: byte_size/outgoing_links
[10:18:56] note that this number (if using outgoing_links) will be the links to other wiki pages (interwiki included), not external links
[10:20:39] lunch
[10:23:01] Yeah we are using outgoing internal links as a rough metric for an underdeveloped / poor quality article. (Hm, we didn't even consider using the ORES article quality rating, that's probably a lot more robust. But it only works on a small number of wikis so that would only be a small part of the solution.)
Good point about templates; will think about it but I think that won't be an issue - if the link ratio is artificially high that means the page won't be boosted even though it would be a good page for newcomers, which is fine as long as there are enough other good pages that get boosted. Most underdeveloped pages probably don't have navigation templates.
[10:25:44] (The context is that patrollers find the link recommendation edits kind of frivolous, and for articles where an experienced editor already went through and linked the important terms, the model often ends up recommending links that aren't really relevant. Our hypothesis is that by doing link recommendations for underdeveloped articles where experienced editors did not do much wikifying / formatting / linking yet, recommendations will be more relevant and the edits will be seen as more useful by patrollers.)
[10:26:20] Erik investigated using ores wp10 (article quality) in search ranking but it proved not that useful, so we don't have it in the indices; that does not mean it's not useful for other use cases though, but yes, lack of coverage makes it difficult to generalize
[10:27:29] Oh, I thought all ORES data goes into weighted tags. Is that only done for topics?
[10:28:04] tgr_: no, it's only ores topics and drafttopic as of now :/
[10:28:37] tgr_: makes sense, from the search POV a new integer to index as doc value should be a no-brainer; questions will be more around how to tune your threshold and combine it with other search signals if you need to
[10:29:58] Yeah, but that's something we can play around with on our own.
[10:44:35] errand
[10:56:18] dcausse (and/or others): wcqs100[12] seem to be flapping in icinga: "CRITICAL - degraded: The following units failed: wcqs-updater.service"
[10:56:31] I haven't looked into it at all, but something seems fishy!
[11:33:29] hi search team, is the ElasticSearch 7 upgrade finished?
Based on https://phabricator.wikimedia.org/T314189#8200929, it seems so?
[11:34:52] ah, based on T308676, you're on week 3 of the rollout?
[11:34:53] T308676: Elasticsearch 7.10.2 rollout plan - https://phabricator.wikimedia.org/T308676
[11:35:02] kostajh: yep, still ongoing
[12:03:14] looking at wcqs
[12:09:02] seems to be a bug at the RDF level :(
[12:09:36] https://commons.wikimedia.org/wiki/Special:EntityData/M69231551.ttl & https://commons.wikimedia.org/wiki/Special:EntityData/M122879987.ttl share the same sdcs:M122879987-8CE609A7-7191-4190-BA08-A1D1D1753E51 "statement"
[12:11:17] will have to relax the consumer to allow that :(
[12:14:15] dcausse: thanks!
[12:52:51] greetings
[12:54:52] o/
[12:55:20] dcausse sounds like you're aware of the WCQS issues, if I can do anything to help LMK
[13:00:41] inflatador: we'll have to deploy a new version soon, working on a quick patch
[13:00:53] root cause is T317530
[13:00:53] T317530: MediaInfo does seem to allow entities to share same statement IDs - https://phabricator.wikimedia.org/T317530
[13:03:25] ACK, can help deploy or puppet merge if needed
[13:08:11] gehel: if you have a couple minutes: https://gerrit.wikimedia.org/r/c/wikidata/query/rdf/+/831538
[13:08:39] In the product / tech offsite. I'll try during the next break
[13:15:43] * dcausse realizes he does not have wcqs in his IRC highlights...
[14:04:17] dcausse: reviewed. LGTM (except for the CI failure, but that seems unrelated)
[14:04:54] gehel: thanks!
[14:05:00] yes it's lacking space
[14:05:44] will prep a release, this should include some fixes by Erik related to oauth as well
[15:01:32] inflatador: deploying this https://gerrit.wikimedia.org/r/c/wikidata/query/deploy/+/831591 should fix a problem with wcqs oauth and the updater service systemd alert
[15:01:57] triage meeting is starting: https://meet.google.com/eki-rafx-cxi (dcausse, ejoseph, ryankemper)
[16:34:35] meh, mjolnir failed last week with NaN's in the bucketizer.
I had written a thing to look for NaN's late Friday, now reviewing the output and it didn't find any :P
[17:32:59] back
[19:00:12] dcausse: wcqs deploy's all done. looks like the updaters are happy
[19:02:16] nice
[19:25:00] We ran into a weird scap issue; releng has requested that we run the wdqs scap deploy again. Should be a no-op, but FYI I am running scap deploy again shortly
[20:00:05] ^^ this is done, everything appears to have gone smoothly
[21:05:44] meh, things always vary :P hax'd up refinery_drop_mediawiki_snapshots to run without forking it ... expect their snapshot regex is ^snapshot='[0-9]{4}-[0-9]{2}(-[0-9]{2})?'$ and we use snapshot=20220801
[21:05:53] s/expect/except/
[21:12:26] * ebernhardson simply forks it and calls it a day...whatever
[21:28:48] (☞゚ヮ゚)☞
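To make the snapshot mismatch concrete, a small Python check of the pattern quoted above against the snapshot=20220801 partition naming. The pattern is copied verbatim from the log, not from the refinery source. RELAXED_SNAPSHOT_RE and the matches helper are hypothetical and for illustration only; they are not what was done here (the script was simply forked).

```python
import re

# The snapshot pattern as quoted in the log for refinery_drop_mediawiki_snapshots.
REFINERY_SNAPSHOT_RE = re.compile(r"^snapshot='[0-9]{4}-[0-9]{2}(-[0-9]{2})?'$")

# Hypothetical relaxed pattern: also accept an unquoted YYYYMMDD value.
RELAXED_SNAPSHOT_RE = re.compile(
    r"^snapshot=(?:'[0-9]{4}-[0-9]{2}(?:-[0-9]{2})?'|[0-9]{8})$"
)


def matches(pattern: re.Pattern, partition: str) -> bool:
    """True if the whole partition name is accepted by the pattern."""
    return pattern.match(partition) is not None


for partition in ["snapshot='2022-08'", "snapshot='2022-08-01'", "snapshot=20220801"]:
    print(partition,
          matches(REFINERY_SNAPSHOT_RE, partition),
          matches(RELAXED_SNAPSHOT_RE, partition))
```

The quoted pattern only accepts dash-separated YYYY-MM or YYYY-MM-DD values, so an eight-digit snapshot=20220801 partition is never selected.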