[08:55:39] Trey314159: who should we contact if we want to write a blog post about the work that DPE SRE did on Airflow?
[08:58:26] dcausse: IIUC, none of the jenkins/zuul jobs configured via integration/config currently deploys snapshots. Should we do that? I still have trouble figuring out how {project}-maven-release jobs get associated with projects (by looking at integration/config). E.g. wikimedia-event-utilities-maven-release exists in jenkins but layout.yaml only associates maven-java8/11 jobs (clean install) with this project but no release. See
[08:58:26] https://gerrit.wikimedia.org/r/plugins/gitiles/integration/config/+/refs/heads/master/zuul/layout.yaml
[08:59:37] pfischer: did we ever deploy snapshots anywhere?
[09:00:06] I don't think I ever relied on snapshots being on archiva
[09:00:30] dcausse: apparently no, only the gitlab pipelines do so (deploying to the gitlab package registry)
[09:00:49] they deploy snapshots on every patch merged?
[09:01:00] Yes.
[09:01:02] oh ok
[09:01:44] But we don’t have to, it’s just a habit. If we do not usually use snapshots, I can omit that.
[09:02:46] I don't have strong opinions on this, I'm generally not a big fan of snapshots and it's been decades since I stopped depending on snapshots
[09:07:16] personally I'm not opposed to them, so if you see value in them please go for it
[09:20:21] Alright, let’s save some disk space then and ditch deployment of snapshots. As I said, it’s just a habit from previous workplaces.
[09:46:06] Lunch
[09:58:30] lunch
[10:27:16] hello dcausse, gehel: while we have been trying to find out why searching using mul labels of Wikidata items had apparently not been behaving as expected on beta wikidata (i.e. does not seem to work, https://phabricator.wikimedia.org/T392058), we have discovered that the culprit is our code.
[10:27:16] We expect that Wikidata is not affected because it didn't have its Elastic index re-created since the introduction of the problematic logic - Beta did, and then the search broke.
[10:27:16] While we look into fixing this, could you please give us Germans a heads up whenever you plan a re-index of Wikidata's production elastic? This is not to say not to do re-indexing, as it might be needed for good reasons, but as we expect it to lead to degraded functionality, knowing in advance would help us deal with it. All a temporary arrangement only.
[11:54:23] leszek_wmde: we expect to do a full reindex of commons / wikidata as soon as the opensearch migration is completed. This probably means in a couple of weeks.
[11:55:23] ebernhardson: I saw T392620.
Do we actually use kibana on the relforge cluster? Or should we just drop it?
[11:55:24] T392620: Replace kibana with opensearch-dashboards on relforge instances - https://phabricator.wikimedia.org/T392620
[12:01:55] leszek_wmde: looking
[12:07:30] leszek_wmde: will reindex beta, but do you have a link to the problematic code?
[12:09:54] leszek_wmde: reindexing does not seem to introduce the mul code so I suspect there's a config issue on wikidata@beta
[12:12:44] nvm, had to read your message till the end, you haven't fixed the issue yet so that's expected
[12:13:53] going to link T392058 from our re-indexing task as I suspect it's the task you'll be using to fix that issue
[12:13:53] T392058: Unexpected Behavior: Unable to find items by mul terms - https://phabricator.wikimedia.org/T392058
[12:14:00] dcausse: yes, reindexing actually does break mul support
[12:16:58] gehel: noted, a couple of weeks. hopefully it would be fixed by then
[13:14:02] o/
[13:20:21] just merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/838182 (envoy for cirrussearch), so I think that means https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/838270 is next?
[13:24:08] inflatador: looking
[13:29:27] gehel, blog post processes are always changing, so for the Tech Blog, I'd follow the on-wiki instructions: https://www.mediawiki.org/wiki/Wikimedia_technical_blog_editorial_guidelines
[13:48:51] Trey314159: thanks!
[14:03:02] gehel: i specifically use the dev console on it from time to time, it's essentially a gui for writing elastic queries. Can run it locally though, that's what i stood up since that wasn't working
[15:28:25] * ebernhardson ponders how to decide when exactly we flip the highlighting config for intitle vs insource
[15:40:20] or maybe we don't? I guess this already assumes the extra plugin and cirrus highlighter are available, it just further assumes we have a recent version with no extra config
[15:47:12] I'm not sure... since it's automatically detected by the extra plugin, not sure how you flip an option flag from php
[15:47:34] the query auto-detects, but we have to set `flavor` in the highlighter to lucene_extended or lucene_anchored
[15:47:44] it doesn't know about the trigram field
[15:48:17] but when will you flip the flavor?
[15:48:19] but i think it can just hardcode, intitle always does anchored, insource always does extended, and set up the mapping
[15:48:40] basically an abstract method on BaseRegexFeature when it builds the HL fields
[15:49:19] we might run with slightly inconsistent highlights the time we reindex, no?
[15:50:19] hmm, yea during the initial deploy it will be handling the regexes slightly differently, but users don't know they can use anchors or char classes so maybe it's ok?
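For context on the flavor-flipping discussion above, a minimal sketch of the idea in Python (CirrusSearch itself is PHP; the field, option, and function names here are illustrative assumptions, not the real API): each regex feature hardcodes its highlighter flavor, anchored for intitle and extended for insource, when it builds the highlight-field config.

```python
# Minimal sketch, not CirrusSearch code: pick the highlighter flavor per regex
# feature when building the highlight-field config. All names are hypothetical.

def build_highlight_field(feature: str, field: str) -> dict:
    """Return a highlight-field config with the flavor chosen by feature."""
    # intitle regexes are anchored to the whole title; insource ones are not,
    # so they use the extended (unanchored) flavor.
    flavor = "lucene_anchored" if feature == "intitle" else "lucene_extended"
    return {field: {"options": {"flavor": flavor}}}

# intitle highlights against the title field, insource against the
# trigram-backed source text field (field names are placeholders).
print(build_highlight_field("intitle", "title"))
print(build_highlight_field("insource", "source_text"))
```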
[15:51:06] essentially reboot the cluster with the new plugins, then ship the patch that does the mapping and highlight query building, then until the reindex comes around it will be inconsistent
[15:52:11] yes I think it's completely fine
[15:54:11] workout, back in ~40
[16:17:37] searching all content ns on crossproject searches, we somehow have to make the assumption that searching default namespaces on the host wiki means they want default namespaces on the target crossproject
[16:18:33] which then means you no longer have the ability to explicitly search only NS_MAIN on cross projects (NS_MAIN being often the default set of ns to search on)
[16:18:57] hmm
[16:19:09] Special:Search might know that we run a non-advanced search and could possibly pass a flag
[16:19:14] the api I'm not so sure
[16:19:38] It seems like a reasonable assumption that if you are searching the default namespaces of the current wiki, we should search the default namespaces of the remote?
[16:19:51] even if you manually selected them somehow
[16:20:16] yes I agree, I'll try and ponder a better solution if someone complains
[16:21:06] i almost wonder...it kinda feels like the kind of feature where google or something might simply not include the secondary results for a non-standard search
[16:21:23] but they have a very different concept of secondary results than we do
[16:21:34] yes...
[16:22:14] and i suppose we still want commons file namespace searches
[16:23:37] hmm so of course...load everything into my mw dev env and i'm not getting the matches on extended syntax...probably something simple
[16:24:38] searching for files actually works nicely with crossproject searches, you get only local files on target wikis
[16:25:07] oh, so i'm just silly. case sensitivity applies. Everything appears to work :)
[16:25:17] \o/
[16:26:46] almost embarrassing how simple most of this code is, it's spread out but otherwise nothing overly complicated
[16:28:19] :)
[16:29:16] I was hoping for some crazy automata rewrites/manipulation but no, only "simple" string replacements :P
[16:29:26] i did look into that...but it was complicated :P
[16:31:38] fyi I'm going to attend an opensearch community meetup in ~30mins to see what's in there, haven't received much interest in my unified highlighter PR so far :(
[16:32:17] tbf it still does not pass CI, but that's because of flaky tests :(
[16:33:11] Could maybe poke the opensearch channel on the relevancy slack? Or i suppose opensearch might have its own chat thing, but maybe someone on relevancy would be able to push it along
[16:33:42] oh right, forgot about this slack workspace, somehow I stopped watching it... :/
[16:34:13] the slack web app sends me there once a week when okta logs me out. It's quite curious, the wiki slack just reloads as the relevancy slack
[16:35:38] apparently there is an opensearch slack as well: https://opensearch.org/slack/
[16:53:00] hmm...193 fixtures with the same trigram_anchored change...i wonder if somehow the fixtures could have a "base" fixture and a diff, such that sweeping changes don't touch all the diffs
[16:53:49] but i worry getting too fancy with how fixtures are stored and loaded might be overcomplicating things
[16:54:39] back
[17:01:37] * inflatador joins opensearch Slack
[17:04:00] Gotta love Slack...it doesn't want to sign me in on the desktop app and of course the help is useless
[17:05:15] lol that opensearch meetup was two hours ago...
I should learn how to read dates :P
[17:05:28] :)
[17:08:56] annd...the solution is to try the same thing 3 or 4 times
[17:10:00] oh well, it's a cliché because it works ;P
[18:05:44] heading out
[18:10:17] .o/
[18:14:20] hmm, the template counting query is not great for big pages :S something like Template:Infobox has >3M usages, and i think this query has to visit each one to count them
[18:14:49] test query takes 1.5-2.5s :S
[19:08:47] back from lunch
[19:33:24] * ebernhardson isn't sure why he can't trigger glent from the cirrusFallbackProfile url param...
[19:57:19] CR to bring on our first eqiad OpenSearch hosts if anyone has time to look: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1142636
[20:05:12] ah, thank you for the +1!
[20:10:08] https://wiki.eth0.nl/index.php/LackRack this looks pretty cool. Bonus points for shoehorning "Levenshtein distance" into a conversation about furniture
[20:17:58] OK, our first eqiad OpenSearch host (cirrussearch1111) is up and receiving shards!
[20:21:04] we're getting there!
[20:33:46] Yeah, it's happening! Should I make a mwconfig patch to turn CODFW back on?
[20:34:14] The envoy patch was merged and I also see https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/838270 , if there is anything else we need to do before turning on CODFW let me know
[20:40:17] oh nice, i'll test it out and then schedule deployment for the config
[20:45:10] hmm, nope i did something wrong :S ports 620[345] all forward to the same cluster
[21:04:42] I've been playing around with MediaWiki logs and noticed there are massive poolcounter queue full error spikes from CirrusSearch
[21:04:45] https://logstash.wikimedia.org/goto/353d1472844177dd1f80990b9f7ca76b
[21:05:01] sorry if that's unimportant or known, didn't find a relevant task and figured it's worth mentioning
[21:05:29] totally worth mentioning! I hadn't noticed
[21:08:43] tgr_ same here, we have gotten some alerts for individual hosts with full queues, but it sounds like we need to look into adding or tweaking alerts for the pool counter
[21:21:53] if i'm reading this puppet right...when search-chi and search-omega both declare search.discovery.wmnet as upstream, they both get the svc_name `search`, and only the first gets declared
[21:22:45] it worked before because search.svc.eqiad.wmnet becomes "${service}_eqiad", but the discovery endpoints don't get the service name prefix
[21:25:14] i wonder if the intent was for something like omega/psi/chi to each have their own discovery entry, or if it's an oversight
[22:45:18] Hmm, that sounds like a problem re: envoy config. Should we roll back that patch?
[22:46:05] I guess it doesn't matter until we set up ATS and discovery
[23:02:59] I'm out for the day, but I created T393523 to look at the pool counter issues.
[23:03:01] T393523: Investigate CirrusSearch "Pool queue is full" errors - https://phabricator.wikimedia.org/T393523
[23:03:31] We also just added 4 brand-new cirrussearch hosts to eqiad, so hopefully that'll make a difference
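To make the envoy naming collision from 21:21 easier to follow, a toy Python illustration (not the actual puppet logic; the naming rule is inferred from the messages above and is an assumption): per-datacenter upstreams get a "${service}_eqiad"-style name, while discovery upstreams fall back to the bare host prefix, so search-chi and search-omega both map to "search" and only the first declaration survives.

```python
# Toy illustration of the svc_name collision described above; not the real
# puppet code, and the naming rule below is an assumption based on the chat.

def svc_name(service: str, upstream: str) -> str:
    if upstream.endswith(".svc.eqiad.wmnet"):
        # Per-datacenter endpoints get a per-service prefix.
        return f"{service}_eqiad"
    # Discovery endpoints reportedly do not, so the name is just "search".
    return upstream.split(".")[0]

upstreams: dict[str, str] = {}
for service in ("search-chi", "search-omega"):
    name = svc_name(service, "search.discovery.wmnet")
    # Both services produce the same key, so only the first one is declared.
    upstreams.setdefault(name, service)

print(upstreams)  # {'search': 'search-chi'}
```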