[00:05:07] ebernhardson: shipping https://gerrit.wikimedia.org/r/c/operations/puppet/+/756724 now
[08:42:02] any objection if I release&deploy wdqs (attempt to fix metric issues with the wcqs updater)?
[09:27:33] manually deploying the script fixes the metric and the logging issue for wcqs
[11:00:19] hi folks, FYI there's CirrusSearchJVMGCOldPoolFlatlined firing on basically all elastic hosts afaics
[11:00:27] dcausse: is that the issue you are referring to ?
[11:00:45] godog: no, but was looking into this
[11:00:49] also the alert email is stuck in moderation of discovery-alerts@ because size is > 40kb
[11:00:50] not sure why it loops
[11:01:16] can't understand why it's triggered when looking at the graph in prometheus...
[11:01:23] the server seems fine...
[11:02:49] mmhh I'm wondering if it is because of the new prometheus hosts I provisioned
[11:02:57] I'm taking a look too dcausse
[11:03:03] also it's strange that IRC only reports 2025
[11:03:27] looking at alerts.wmo I see other others
[11:03:31] s/others/hosts
[11:03:55] that's expected, alerts are grouped when firing "at the same time" to not spam notifications
[11:04:04] ah
[11:04:08] ok
[11:04:11] though there's the instance name in 'summary' so it looks like only 2025
[11:04:26] ok I see
[11:05:15] I was tempted to relax the for: 1m to for: 10m since this alert is looking at the past 24hours of data we don't need much reactivity
[11:08:18] ok I've disabled alerting for the new instances, see if that changes anything
[11:08:35] ok
[11:10:13] yeah that was it
[11:10:17] sorry for the noise!
[11:10:33] np! thanks for taking care of this! :)
[11:11:17] lunch
[14:02:24] FYI I'm proceeding with flipping search certs in https://gerrit.wikimedia.org/r/c/operations/puppet/+/756595
[14:09:49] greetings
[14:10:01] godog ACK
[14:13:21] o/
[14:51:06] ok the cert itself works, but I ran into an issue with relforge and its icinga https check
[14:51:28] I'm thinking about a solution
[14:52:34] the problem is that I've changed elasticsearch::tlsproxy to use search.svc.$site.wmnet as server_name, which obviously doesn't work for relforge
[14:54:06] or at least that's what I think the problem is, still investigating
[14:54:20] godog: thanks!
[14:54:32] godog interesting, I see the alerts. I'm still a n00b, but let me know. The alert does sound like it's just missing an altname
[14:56:41] inflatador: sure! I'll explain the problem out loud
[14:57:13] I changed the parameters to elasticsearch::tlsproxy with https://gerrit.wikimedia.org/r/c/operations/puppet/+/756595/5/modules/profile/manifests/elasticsearch/cirrus.pp#68
[14:57:37] though what I didn't realize is that relforge also uses profile::elasticsearch::cirrus
[14:57:51] (☞゚ヮ゚)☞ DO IT!
[14:58:32] so now relforge hosts try to verify search.svc.$site.wmnet against their certs, which are for relforge.svc.$site.wmnet instead
[14:58:41] https://icinga.wikimedia.org/cgi-bin/icinga/config.cgi?type=services&item_name=relforge1003^Elasticsearch+HTTPS+for+relforge-eqiad
[15:03:21] need to take a short break, I think sth like https://gerrit.wikimedia.org/r/c/operations/puppet/+/757013 will "fix" it
[15:03:31] :eyes
[15:06:53] I'm going to be AFK for the next ~90m or so, but I did give the +1
[15:11:39] thank you inflatador !
[15:25:51] dcausse: vagrant up seems to be stuck here https://usercontent.irccloud-cdn.com/file/zepevvuj/Screenshot%202022-01-25%20at%204.24.24%20PM.png
[15:27:03] ejoseph: it's downloading I guess? can you check with "ps" and see whether git is downloading something?
[15:49:11] \o
[15:49:54] * ebernhardson likes `htop`, `iotop` and `iftop` in addition to ps to figure out why something looks stuck.
[15:51:08] if only there was stucktop we could use
[15:51:54] volans: lol, there actually might be opportunity there :)
[15:52:16] is UpdateSearchIndexConfig.php supposed to give a lot of "Index is unknown retrying..."
[15:52:23] https://www.irccloud.com/pastebin/vWOIq7fy/
[15:52:58] RhinosF1: looking
[15:52:59] RhinosF1: nope, it shouldn't. Are you using elastic 6?
[15:53:53] RhinosF1: basically, you aren't getting reasonable responses from your elasticsearch. A request for /_nodes is not returning an array with a `nodes` result.
[15:54:01] the Undefined index on ConfigUtils seems weird
[15:55:04] ebernhardson: 7
[15:55:12] RhinosF1: you have to use 6
[15:55:24] RhinosF1: it should have failed index creation with a line that says that
[15:56:05] i suppose this might not have got far enough, thats part of creating an index
[15:57:00] should fail earlier but I guess it could not even understand the elastic version
[16:01:07] ebernhardson: weird
[16:02:20] 7 is finally on the roadmap, couple months, but today it's locked to 6
[16:05:11] right
[16:11:43] how are links to rdf query service UI that include a query constructed? Like if you want to link someone to query results
[16:12:34] i'm sure i've seen them but not finding one now :P
[16:15:33] oh...do they really use query fragments? That will never work with redirects :(
[16:18:22] oh indeed that's how the UI is rebuilt :/
[16:19:12] in theory it should be fine if they already have logged in, the non-interactive redirect loop will keep the fragment. But if they land on a login page at mediawiki that will eat the fragment
[16:20:07] at least it will work right for bots using the /sparql/... apis
[16:22:06] I think I've seen such behavior on other sites, you click a link from an email and then you have to login but then you have to re-click again, not ideal but not unheard of
[16:22:47] right, the reason i wanted to fix it is because other sites do it and it's a bad experience :P
[16:22:53] :)
[16:22:54] but this will have to be good enough
[16:23:38] anyways, last bit to test is user blocking, but otherwise i think wcqs is ready
[16:24:10] cool!
[16:28:56] dcausse: separate question, should a query_string query for "massacres" against title and title.plain still stem? Related to a ticket where intitle:"Massacres" is returning things without the plural
[16:29:47] dcausse: specifically with the quotes, i was expecting cirrus to drop the non-plain fields when providing the query but we don't and it seems to work on my vagrant, but then testing against enwiki i can find results that seem like stemming happened
[16:30:02] i guess, what i mean is the easy fix is do it in cirrus. But is query_string query supposed to be doing it?
[16:30:16] ebernhardson: I thought it would search for the plain field only but remember a recent ticket where someone complained
[16:30:44] dcausse: right i'm looking into that ticket and not understanding whats coming out of cirrus :) On my vagrant it works but on enwiki i can find examples that dont match
[16:31:19] ok i can look more, i guess i still don't have a great understanding of what query_string query is going to do. It has a lot of magic built in
[16:31:22] strange
[16:33:31] there were some changes to the regexes recently, I would have looked at a regression there but if vagrant works as expected that seems to suggest that some query building components are not mixing well together...
[16:33:58] intitle is a bit special as it retains the searched term in the main query
[16:35:32] maybe i'm being too simple in vagrant, i only made the two pages with plural and non and seeing if they come back. hmm
[16:35:49] https://en.wikipedia.org/w/index.php?search=intitle%3A%22test%22&title=Special:Search&profile=advanced&fulltext=1&ns0=1&cirrusDumpQuery
[16:36:16] the filter on the title field explicitly adds the "title" field
[16:36:34] so if the stem is in the redirect or the title it'll find it
[16:36:45] but not in the content
[16:37:09] dcausse: hmm, wont the must and the filter combine as an AND, it has to match both?
[16:37:19] * ebernhardson must be very rusty with elastic :P
[16:37:55] no a query_string with fields=[] has to match in either one of these it's like a cross_field IIRC
[16:38:05] oh
[16:38:33] all words must be there but the field does not matter
[16:39:13] forget about all words it's a phrase
[16:39:51] this is where I'm surprised
[16:40:14] I thought we would only keep the two plains (title.plain, redirect.plain) here
[16:40:20] but no :/
[16:40:28] dcausse: i was also expecting that, but in the code the only time we do that is when we find a ? or *
[16:40:32] might be rare enough that we never noticed?
[16:41:24] dcausse: hmm, maybe? If we think that's the case i can expand the handling for ? and * to also do it for quoted values. Was my first thought but i wanted to figure out why it was happening before papering over it
[16:41:26] rare as in where stem is only present in the title/redirect and never in the body as plain
[16:41:42] yea that makes sense
[16:41:53] it feels like a bug to me
[16:42:07] so perhaps we can fix and see if others complain and revert? :)
[16:42:22] lol, will see what happens
[16:54:11] OK, finally back from doctor
[17:56:28] * ebernhardson sighs at how much our integration test depends on particular term statistics
[17:59:32] :/
[18:00:11] there are some words you don't want to add/remove in the test pages indeed :)
[18:01:37] i do wonder how to fix that...I went to a talk once for how someone else does this, and they only put 2 or 3 documents in the index at a time to avoid it...seems so time consuming though
[18:05:04] yes... but not sure that tests we have on relevancy are that useful
[18:05:39] it feels sometimes we just want to make sure that e.g. incoming_links are properly counted
[18:05:44] i may have also been blaming the wrong thing, this time around it looks like the edit was in a batch by itself and just didn't make it in the 10s timeout
[18:05:58] i suppose i just don't trust the stats :P
[18:06:03] :)
[18:07:16] whooohoo, got VMWare fusion for ARM on the Mac going, local debian VM is ready. Next step, those docker aliases ebernhardson mentioned yesterday ;)
[18:07:24] :)
[18:25:02] ejoseph: https://gerrit.wikimedia.org/r/c/search/extra-analysis needs a rebase but otherwise it should be good to go
[18:25:09] meh
[18:25:24] ejoseph: good link: https://gerrit.wikimedia.org/r/c/search/extra-analysis/+/737026
[18:42:29] quick lunch, back in ~30
[18:48:03] dinner
[19:12:54] back
[19:15:39] ebernhardson LMK if you have time to teach me some docker tricks today, I couldn't get the zuul test thing to work but that's probably due to my lack of docker skillz
[19:16:02] also, this looks like a fun little search engine project: https://search.marginalia.nu/
[19:17:10] inflatador: sure, we can this afternoon. I have a meeting in 10min, but rest of the day i'm available
[19:17:45] inflatador: say, 1pm PST (3pm CST)
[19:20:13] works for me, I'll send an invite
[19:58:36] lunch
[20:54:09] back
[23:48:41] ahh heres another fun one. wcqs2* have `-Dwdqs.event-sender-filter.event-gate-sparql-query-stream=wcqs-external.sparql-query` in the live command line options, but in kafka we have eqiad.cqs-external.sparql-query and same for codfw :(
[23:49:03] codfw and eqiad are probably same, i only checked eqiad since my requests route there
[23:49:18] err, i checked codfw....some day i'll learn to type
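
(editor's note) A minimal sketch of the mismatch described in the last few messages: it checks whether a configured stream name has a matching datacenter-prefixed Kafka topic. The `<datacenter>.<stream>` naming convention is an assumption inferred from the topic names quoted above, and `missing_topics` is a hypothetical helper, not part of wdqs or of any Kafka client.

```python
def missing_topics(stream, topics, datacenters=("eqiad", "codfw")):
    """Return the expected per-datacenter topic names that are absent from `topics`.

    Assumes topics are named "<datacenter>.<stream>"; that convention is
    inferred from the chat, not confirmed.
    """
    expected = [f"{dc}.{stream}" for dc in datacenters]
    present = set(topics)
    return [name for name in expected if name not in present]


# Values quoted in the chat; the codfw topic name is assumed from "same for codfw".
configured_stream = "wcqs-external.sparql-query"
kafka_topics = [
    "eqiad.cqs-external.sparql-query",
    "codfw.cqs-external.sparql-query",
]

# Any name listed here is a topic the configured stream would expect but Kafka lacks.
print(missing_topics(configured_stream, kafka_topics))
```

With the values from the chat, both expected topics come back as missing, because the existing topics start with `cqs-external` rather than `wcqs-external`.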