[11:13:05] lunch
[13:40:51] \o
[13:43:08] o/
[13:46:24] I think I'm missing a dashboard in gitlab like the one I have in gerrit showing incoming/outgoing review requests; just found one review request that's been sitting in gitlab for 2+ weeks...
[13:49:18] there is probably something related to being assigned to MRs? But assigning doesn't feel the same to me
[13:51:04] yes, actually I was assigned as a reviewer, so it should have been obvious to me
[13:51:23] but we also tend to just ping in comments
[13:52:16] it's partly because there can only be one assignment, but i don't want to just say "this is your problem now". Usually i want to give a few options and let whoever has the time do it
[13:52:25] can't assign 2 reviewers sadly, yes...
[14:17:19] I guess we have https://gitlab.wikimedia.org/groups/repos/search-platform/-/merge_requests which lists MRs across all search platform projects; this misses other projects we contribute to, but better than nothing
[14:32:57] Hi folks! I have a quick question about search... when searching for templates in VE there are two requests sent to the action api, one with `generator=search` with the search terms plus an asterisk, and one with `generator=prefixsearch` with just the search terms
[14:32:57] I can see and understand the prefix query, but I'm not really sure what's going on with the search-terms-plus-asterisk case. Here's an example: https://en.wikipedia.org/w/api.php?action=query&format=json&generator=search&formatversion=2&gsrsearch=black%20dog*&cirrusDumpQuery
[14:32:57] Is this just a kind of approximation of prefix search using a regex match on title.plain and all.plain for the last search term?
[14:36:17] hmm, though the results seem to be just effectively a prefix search on the last search term...
[14:36:48] cormacparle: foo* is searching for all terms starting with foo over all the text fields
[14:37:11] the title.plain:dog*^20 is just a small boost to rank title matches higher
[14:37:31] unsure why VE also included this fulltext search query tho...
[14:38:24] ranking for queries with special syntax is generally not great
[14:38:52] yeah, it doesn't really seem to add much over just using prefixsearch
[14:39:09] I mean - just using prefixsearch seems like it would be just as good
[14:40:04] would have to dig into phab/code history to understand why, I guess; there might have been a valid reason at some point, but I was not aware that it was doing such queries
[14:40:45] maybe they also want the search to be able to match on the category of the template? would that make sense?
[14:42:06] perhaps?
[14:42:23] ok that's grand - thanks for your help David!
[14:42:33] I can ask the VE team if I need to find out more
[14:43:32] cormacparle: perhaps somewhere in T271802 ?
[14:43:33] T271802: Better search - "wildcard word" rather than "title prefix" search, for template search dialogs (Template findability) - https://phabricator.wikimedia.org/T271802
[14:44:47] so they not only want title matches, they want matches in the body as well
[14:45:46] but since the fulltext api is not by definition a completion api, they append a * to take into account the fact that the user might not have finished typing the last word
[14:46:43] aha!
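(A rough sketch of the two requests discussed above, generator=search with a trailing * versus generator=prefixsearch, useful for comparing their result sets side by side. It mirrors the example URL only; VE's actual requests likely add more parameters, such as a Template namespace filter, which are omitted here.)

```python
# Rough sketch: issue the two action API queries discussed above and compare
# the titles they return. Parameters beyond the example URL are omitted.
import requests

API = "https://en.wikipedia.org/w/api.php"

def search_with_wildcard(terms: str) -> list[str]:
    # Fulltext search with a trailing * appended to the user's input.
    params = {
        "action": "query", "format": "json", "formatversion": "2",
        "generator": "search", "gsrsearch": terms + "*",
    }
    pages = requests.get(API, params=params).json().get("query", {}).get("pages", [])
    return [p["title"] for p in pages]

def prefix_search(terms: str) -> list[str]:
    # Plain prefix search on the same input.
    params = {
        "action": "query", "format": "json", "formatversion": "2",
        "generator": "prefixsearch", "gpssearch": terms,
    }
    pages = requests.get(API, params=params).json().get("query", {}).get("pages", [])
    return [p["title"] for p in pages]

if __name__ == "__main__":
    terms = "black dog"
    print("search + *:  ", search_with_wildcard(terms))
    print("prefixsearch:", prefix_search(terms))
```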
[14:56:22] I'll be 10' late for retro
[15:02:02] retro is starting: https://meet.google.com/eki-rafx-cxi, cc: ebernhardson, ryankemper, inflatador
[15:03:36] gehel: working on cookbook stuff, won't be there
[15:03:44] ryankemper: ack
[15:54:40] dinner
[16:22:48] related ticket for user behaviour insights: https://github.com/opensearch-project/OpenSearch/issues/15354
[17:35:51] * ebernhardson feels almost lost without all my bash history from deploy1003... time to copy it over :P
[17:37:52] pfischer: unsure if you've seen, but there are interesting discussions happening at T374341 (relates to spark <-> event platform)
[17:37:53] T374341: [SPIKE] how can we support Spark producer/consumers in Event Platform - https://phabricator.wikimedia.org/T374341
[17:39:12] dcausse: did you do anything to create topics in kafka initially? staging fails because cirrussearch.update_pipeline.update.private.rc0 doesn't exist in kafka
[17:39:26] i guess i can find and use the kafka-topics thing
[17:40:01] or i guess i can maybe deploy the producer first... can try that
[17:41:07] ebernhardson: yes, sadly it has to see at least one message :/
[17:42:10] hopefully we have access to office wiki to trigger an event, if private wiki edits are rare
[17:42:23] yea, i'm testing that now, seems reasonable
[17:43:03] the 5 minute window means a bit of waiting, but not the end of the world
[17:43:12] yes...
[17:43:30] some kafka tooling is only available on the kafka nodes iirc
[17:54:59] howdy
[17:59:04] not seeing the private source show up :S made edits at :42 and :49 on officewiki, they should have processed by now. More investigation needed
[18:04:25] dcausse: pfischer: if you can implement a spark kafka sink that uses event platform stuff like the flink stuff does... I will celebrate you and think of you kindly! :D :D
[18:08:30] oh, i'm a dummy... the updater can only read from eqiad or codfw, and since we only have staging in eqiad it was reading eqiad... fixing
[18:15:36] hmm, that means the eqiad private topic won't be created auto-magically though
[18:15:58] i guess i can just produce a dummy event manually
[18:23:40] we're completely switched over to CODFW, right? Just wondering... load avg on the WDQS hosts is suspiciously low. Particularly in CODFW
[18:24:00] https://grafana.wikimedia.org/goto/wGq2J_RHg?orgId=1
[18:27:41] inflatador: yes, although i'm not 100% sure on how that affects wdqs routing. Internal requests will be codfw, but external i'm not sure
[18:38:35] There is a final approval for the deployment of Temp Accounts. We've discussed this before, but just to make sure: we don't have anything to do, we're ready (mostly: we don't care, we don't use identification anyway). Is this correct?
[18:40:34] ebernhardson talked it over in #sre and the load avg dip is from when they re-activated EQIAD as secondary. So I guess that spike is to be expected
[18:41:03] if we assume that most of the traffic goes thru eqiad even when it's secondary, that is
[18:53:01] gehel: afaik, that is correct. We simply don't care
[19:03:38] inflatador: yes, at a switchover (every 6 months) we run on a single DC for one week, then we re-enable multi-DC; this explains why wdqs@codfw load is so low now
[19:10:20] I wonder if we could make the kafka sources not fail if the topic doesn't exist...
[19:10:50] iirc flink should be able to discover topics/partitions on the fly
[19:17:01] there probably is some way; it's a bit tedious to require them to already exist, but i suppose it would also catch errors
[19:59:59] sigh.. i keep missing the obvious things.
but still not sure what's going on :P producer logs show it added readers for codfw.mediawiki.page_change.v1, but i failed to notice the other reader is eqiad.mediawiki.page_change.private.v1
[20:00:21] but why would it not be applying the same topic prefix filter?
[20:00:51] it's not like there is separate code for private streams, it's all in the sources list and processed the same way
[20:03:13] oh, it's not even that... it's that we don't have officewiki in the staging wiki filter
[20:03:27] * ebernhardson has been looking in the wrong directions for at least an hour :P
[20:04:26] i guess that's why we take lunch breaks, come back after not looking at a thing and find something different :)
[20:06:02] I once solved a problem simply by getting out of my chair, don't underestimate a good break
[20:08:28] ya, it's a good strategy
[20:42:52] so now the question is... why doesn't mirrormaker like this :P Docs say appropriately prefixed topics should be mirrored. kafka-main in eqiad has eqiad.cirrussearch.update_pipeline.update.private.rc0, kafka-main in codfw has codfw.cirrussearch.update_pipeline.update.private.rc0. But no mirroring
[20:53:51] also didn't make it to jumbo
[20:56:29] ottomata: any idea why new topics might not get mirrored by mirrormaker? eqiad.cirrussearch.update_pipeline.update.private.rc0 was created in main-eqiad, but isn't found in jumbo or main-codfw
[20:57:07] all i'm finding in puppet is simple regexes which imply it should be fine
[20:57:41] was the topic created by producing a message? or manual topic create?
[20:58:13] ottomata: producing a message
[20:58:21] hm
[20:58:31] there is a single message in the topic. Same for a codfw-prefixed topic in main-codfw, didn't mirror into eqiad
[21:00:28] ottomata: i did cheat slightly, the events were produced with kafkacat instead of with a normal producer, to ensure they would exist before i start flink (and because otherwise the eqiad-prefixed one wouldn't get a message until switchover)
[21:01:08] sure, shouldn't matter...
[21:02:41] the fact that none of the stuff is flowing in any direction is quite strange
[21:03:06] is it possible that a single message somehow isn't enough to flush the producer batch buffer? it shouldn't be, it should be time based as well
[21:03:26] these were produced ~30 min ago, should have been long enough
[21:03:45] linger.ms=1000
[21:03:45] yeah
[21:04:34] oh ho...
[21:05:23] hm ?
[21:05:24] Sep 26 07:59:04 kafka-main2006 systemd[1]: /lib/systemd/system/kafka-mirror-main-eqiad_to_main-codfw@0.service:29: Ignoring unknown escape sequences: "^eqiad\..+"
[21:05:39] no that is still man hours ago
[21:05:41] many
[21:06:00] nevermind
[21:08:59] in all the mirror makers there looks to have been a consumer rebalance at about 23:30
[21:09:02] for timing, eqiad was created ~20:32, codfw minutes later. I was doubtful it means anything, but in the mirrormaker graphs at the same time "consumer group join time" gets a bunch of new lines that go away
[21:09:07] https://grafana-rw.wikimedia.org/d/000000521/kafka-mirrormaker?orgId=1&refresh=5m&var-datasource=codfw+prometheus%2Fops&var-lag_datasource=eqiad+prometheus%2Fops&var-mirror_name=main-eqiad_to_main-codfw&from=1727381343170&to=1727384943170&viewPanel=16
[21:09:18] haha
[21:09:23] we are on the same track
[21:09:25] :)
[21:10:11] i'm not sure!
[21:10:24] it felt like too many lines for my one-event topics :P
[21:10:32] haha indeed!
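(A rough sketch of the manual workaround discussed above: produce a single dummy event so the prefixed topic exists before the Flink job starts, then check whether it shows up on the destination cluster. It uses the kafka-python client rather than the kafkacat invocation actually used, and the broker addresses and message body are placeholders, not the real event payload.)

```python
# Sketch, assuming auto topic creation is enabled on the cluster: produce one
# dummy message so the prefixed topic exists, then check mirroring.
import json
from kafka import KafkaProducer, KafkaConsumer  # kafka-python

MAIN_EQIAD = ["kafka-main-eqiad.example:9092"]  # placeholder broker list
MAIN_CODFW = ["kafka-main-codfw.example:9092"]  # placeholder broker list
TOPIC = "eqiad.cirrussearch.update_pipeline.update.private.rc0"

# One message is enough for the topic to be created, as long as the producer
# batch is actually flushed (hence the explicit flush()).
producer = KafkaProducer(bootstrap_servers=MAIN_EQIAD)
producer.send(TOPIC, json.dumps({"dummy": True}).encode("utf-8"))
producer.flush()

# A new topic should eventually be picked up by MirrorMaker and appear on the
# destination cluster; if it never does, the prefix regex and consumer
# rebalancing are the first things to check.
codfw_topics = KafkaConsumer(bootstrap_servers=MAIN_CODFW).topics()
print(TOPIC, "mirrored to codfw:", TOPIC in codfw_topics)
```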
[21:10:36] well, i suppose
[21:10:42] mirror maker is a consumer
[21:10:49] if a new topic or partition is added to the list of things it is consuming
[21:10:54] it will probably trigger a rebalance?
[21:11:00] i suppose it's possible
[21:11:11] i have to run for the day. ebernhardson: if you like you could enable canary events and have stuff auto-produced to your topics... buuut that does not solve the mystery
[21:11:37] ottomata: ok, i'll check that out, i have to double check that we filter canary events here
[21:11:58] they were added to event stream config ~2 days ago
[21:16:39] oh, these are still rc0 and (i think) provided directly in the jar, they aren't in the schemas repo yet. I suppose i'll add a task to see if they are stable enough to move
[22:12:08] Not sure I completely trust this graph yet, but I think I have a way to break down CPU utilization between the categories and main WDQS graphs: https://grafana.wikimedia.org/goto/o4H23lRHR?orgId=1
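(A minimal sketch of the consumer-side canary filtering mentioned above. It assumes the Event Platform convention that canary events carry meta.domain == "canary"; that convention should be verified against the documentation before relying on it.)

```python
# Sketch: drop canary events when consuming a stream, assuming canary events
# are marked with meta.domain == "canary" (assumption to verify).
import json

def is_canary(event: dict) -> bool:
    return event.get("meta", {}).get("domain") == "canary"

def filter_canary(raw_messages):
    """Yield decoded events, skipping canary events."""
    for raw in raw_messages:
        event = json.loads(raw)
        if not is_canary(event):
            yield event

# Example with inline test data (one ordinary event, one canary):
msgs = [
    json.dumps({"meta": {"domain": "office.wikimedia.org"}, "page_id": 1}),
    json.dumps({"meta": {"domain": "canary"}, "page_id": 2}),
]
print([e["page_id"] for e in filter_canary(msgs)])  # -> [1]
```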