[06:51:49] o/ dcausse: I started a spreadsheet that gathers msg rates and sizes for Janis: https://docs.google.com/spreadsheets/d/1Fp44MdLxUVlxi03MBD_64m0zQErny-9jUD5C6RGf_bU/edit?usp=sharing Is there any other topic that needs to be added?
[07:32:51] pfischer: o/ looking
[08:40:23] dcausse, pfischer, ebernhardson (and other Java engineers): I've started a Slack thread in #wmf-java about serialVersionUID (based on https://gitlab.wikimedia.org/repos/search-platform/cirrus-streaming-updater/-/merge_requests/5/diffs). Your inputs would be welcomed.
[08:46:11] gehel: 👀
[10:19:41] lunch
[10:52:29] lunch + errands
[13:40:18] o/
[14:25:12] https://gerrit.wikimedia.org/r/c/operations/puppet/+/940160 puppet patch for the VM stuff, please take a look if you can...
[14:36:03] nm, got a review already
[14:53:29] \o
[14:57:20] ebernhardson: if you have a chance today, can you look at Cindy? My tests pass—yay! But other tests are failing, and I'm out of my depth—not sure if my tests are interfering, or if Cindy is just in a bad mood. Gerrit 938329
[14:57:35] Trey314159: sure, indeed Cindy's error reporting is meh :)
[14:58:21] I looked into replacing the reporting with an HTML generator that we could show results from, but the outputs of those are also terrible ... the problem seems to be that the testing framework (cucumber) is just terrible at reporting
[14:58:37] all the tests have names that are like 200+ characters long ... it's just meh
[14:59:19] Thanks! I read the log, but couldn't see anything to indicate what matched that shouldn't have, and the specific test is unclear to me... not like the one I was working on and not immediately clear from reading, so I really don't know what happened.
[15:00:41] o/
[15:07:13] another, less critical puppet patch if anyone has time: https://gerrit.wikimedia.org/r/c/operations/puppet/+/940180
[15:16:23] thanks dcausse
[15:52:33] Does ES score matches in (section) headlines higher than matches in text body? In other words, is it aware of the structure of a wiki page?
[15:53:20] pfischer: it knows some structure, we have the headlines broken out into a separate field that gets matched and probably has more weight than the text field
[15:53:35] the content prior to the first heading (the opening text) is also extracted and given more weight
[15:53:44] workout, back in ~40
[15:53:45] but otherwise, we don't know, for example, what text is in what section
[15:56:01] huh, while trying to get a demonstration query I found one where it returns the same page twice :S
[15:56:04] https://www.mediawiki.org/w/index.php?search=insource&title=Special:Search&profile=advanced&fulltext=1&ns0=1&ns12=1&ns100=1&ns102=1&ns104=1&ns106=1
[15:57:33] oh, I guess it's not the same.. I wonder why we have a Help:CirrusSearch and a Help:CirrusSearch/en
[16:14:32] 👀
[16:26:00] Thanks, Erik!
[16:41:01] back
[17:32:50] working on the WDQS categories lag stuff https://phabricator.wikimedia.org/T342060#9021835 . all hosts seem to be running the categories update services, but lag is still 12 hrs and climbing. I ran `loadCategoriesDaily.sh` on wdqs2013 but it didn't seem to help, any suggestions?
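For context on where that lag number comes from: as discussed just below, it is derived from the oldest `schema:dateModified` triple in the categories namespace, so it can be reproduced by hand. A minimal sketch, assuming a local Blazegraph categories endpoint at `http://localhost:9999/bigdata/namespace/categories/sparql` (the URL is an assumption, not taken from the log):

```python
# Hand-rolled approximation of the categories lag check: the lag is simply
# "now" minus the oldest schema:dateModified triple in the categories
# namespace. The endpoint URL below is an assumption for a local WDQS host.
import datetime

import requests

ENDPOINT = "http://localhost:9999/bigdata/namespace/categories/sparql"  # assumed
QUERY = """
PREFIX schema: <http://schema.org/>
SELECT (MIN(?date) AS ?mindate) WHERE { ?wiki schema:dateModified ?date }
"""


def categories_lag_hours() -> float:
    resp = requests.get(
        ENDPOINT,
        params={"query": QUERY},
        headers={"Accept": "application/sparql-results+json"},
        timeout=30,
    )
    resp.raise_for_status()
    mindate = resp.json()["results"]["bindings"][0]["mindate"]["value"]
    oldest = datetime.datetime.fromisoformat(mindate.replace("Z", "+00:00"))
    now = datetime.datetime.now(datetime.timezone.utc)
    return (now - oldest).total_seconds() / 3600


if __name__ == "__main__":
    print(f"categories lag: {categories_lag_hours():.1f}h")
```

Because the dump itself sets dateModified at export time (as explained below), a fresh import still reports roughly the age of the dump, which matches the ~12h figure seen here.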
[17:33:10] by "all hosts I mean the new hosts, wdqs20[13-22] [17:33:38] inflatador: hmm, sec i'll check how that check works, but it might be that the timestamp is embedded in the dump and not something set to the time of import [17:34:34] ebernhardson FWiW, running the nagios check manually `/usr/local/lib/nagios/plugins/check_categories.py --lag` gives the same lag time [17:36:19] inflatador: ebernhardson: if we only load categories as a single daily dump, wouldn't 0 so the query is `SELECT (min(?date?) as ?mindate) { ?wiki schema:dateModified ?data }`. Neither of the load scripts make any mention of the dateModified triple. [17:36:47] If i grab a random tiny dump, like https://dumps.wikimedia.org/other/categoriesrdf/daily/latest/acewiki-20230720-daily.sparql.gz then embedded in it is a DELETE and INSERT for the dateModified [17:37:05] so basically ,we should expect the date modified to be set to the time the dump was made, so doing a new import and getting 12h of lag should be expected [17:37:45] ryankemper: i guess i was thinking the question was if we should expect the dateModified to be aligned with import or export time [17:38:46] I'm more concerned that the lag keeps climbing, but maybe I'm missing something [17:38:56] it should climb at the same rate that time passes [17:38:59] The lag is going to climb until tomorrow's daily dump [17:39:00] 1hr/hr [17:39:09] There's no real "updater" process with categories it's just a dumb dump load [17:39:20] With wikidata we have an actual updater b/c of the massive volume of events, comparatively speaking [17:39:46] yea categories are static-enough for the purposes here that we didn't invest much in the update process [17:39:48] In that case, do we care about lag < 24h? [17:40:04] inflatador: no, and that's why icinga is happy in that state IIRC [17:40:05] nope, probably accept lag up to 36h to avoid false positives? [17:43:09] Per ryankemper I guess we're already not alerting on that. I was just looking at what needs to change before we can pool the new servers. Looks like all of them except 2021 (which doesn't answer that query) are ready [17:43:22] Anyway, I'm heading to lunch but we can try & pool at today's pairing [17:43:26] kk [17:53:10] pfischer: made some adjustments to https://docs.google.com/spreadsheets/d/1Fp44MdLxUVlxi03MBD_64m0zQErny-9jUD5C6RGf_bU/edit#gid=0 but still needs some tweaks on the scenario with and without page content [17:54:16] ebernhardson: if you have a moment could you double check that the numbers are not totally off (esp. for existing topics) [17:58:14] dcausse: at first look, that all seems around what i would expect [17:59:43] thanks! [18:01:14] dinner [18:26:20] back [18:31:40] Will be at pairing in 2m, rebooting router [18:33:27] ACK [19:32:14] Trey314159: looks like the search for 'PhD' returns 'PageWithAcronyms' but isn't expected to [19:32:51] ebernhardson: That was an early problem that I fixed. [19:33:04] Trey314159: hmm, thats in the most recent fail for the patch [19:33:31] That's weird. 
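A quick way to confirm what the analysis chain actually does with "PhD" versus "phd" is the `_analyze` API; a rough sketch below, where the index name and target field are assumptions rather than anything from the log:

```python
# See how the index's analysis chain tokenizes "PhD" vs "phd".
# The index name ("wiki_content") and field ("text") are assumptions.
import requests

for term in ("PhD", "phd"):
    resp = requests.get(
        "http://localhost:9200/wiki_content/_analyze",  # assumed index
        json={"field": "text", "text": term},
        timeout=30,
    )
    resp.raise_for_status()
    tokens = [t["token"] for t in resp.json()["tokens"]]
    print(f"{term!r} -> {tokens}")
```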
[19:33:37] the full explain looks like this: https://phabricator.wikimedia.org/P49619
[19:33:46] suggests PhD was split into two terms, ph and d
[19:34:05] I've set Cindy to run again, but it should be the same commit hash
[19:34:49] Trey314159: yup, just verified, this is using the 677e30b commit, which is the latest (PS7)
[19:35:09] there are other failures there, sadly those are timeouts waiting for things to be properly initialized, which we've never managed to fully resolve
[19:35:19] but the PhD one won't be fixed by re-running the same code
[19:35:52] But patch 7 has "phd" in the search https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CirrusSearch/+/938329/7/tests/integration/features/acronym_fixer.feature
[19:38:07] hmm, indeed it does... that's odd.
[19:39:38] well, I guess wait a half hour and see what the re-run does
[19:39:45] The diff between patch 6 and 7 has nothing to do with Cindy's test, but in patch 6 Cindy fails with "@clean @api @update @move Moved pages that switch indexes are removed from their old index if they leave a redirect" and in patch 7 Cindy fails with a BUNCH of errors.
[19:40:05] yea those are sadly expected, you just have to re-run it
[19:40:13] a bunch of errors means it failed to initialize something
[19:40:40] usually that means we edited 30 pages, waited a minute and a half, and those pages weren't in the index
[19:40:55] Is there a chance something is cached and not cleaned up between runs? That's the only thing I can think of that explains the PhD/phd problem.
[19:41:15] not really, all the containers are destroyed between runs
[19:41:25] all that remains is the checked out code and the LocalSettings
[19:42:26] So weird
[19:44:02] (BTW, thanks for looking into this. More eyes—and in this case better eyes!—are always helpful!)
[19:44:38] sometimes I wonder if we should make it wait even longer... but it seems inane to already wait so long
[19:46:37] Well, how much does waiting 3 minutes instead of 1.5 cost in the grand scheme of things, vs the time we spend wondering why Cindy failed, or knowing why and rerunning it and hoping things work out? I think I just talked myself into voting for upping 1.5 to 5!
[19:46:59] :P
[19:47:53] An alternate solution is to ignore the cirrussearch update pipeline, since we won't be using it in a few months, and instead dump some content directly into the indexes
[19:48:08] but then again, external users will likely continue to use the embedded cirrussearch updates
[19:48:53] but that still doesn't quite work, because the mediawiki database has to agree, or we turn on the dev options that ignore the database (potentially moving the tests further and further from being representative)
[19:48:55] so... I dunno :P
[19:54:47] Trey314159: it's happy now, I did nothing but click the X :p
[19:55:08] Ugh... weird... thanks!
[19:55:34] I guess there's always time for one more retry.
[19:56:24] yea, Cindy is disappointing in that regard... it's never been able to reliably run all of the initialization every time. It's not really clear why, it could just be that mediawiki performance is degrading over time? I dunno
[19:56:30] I would hope not, but who knows
[19:56:57] it could also be a real race condition of some sort, that we just pretend is a problem with the timing
[20:07:57] Is there an easy way to check that the initialization failed and bail early?
[20:10:24] in the log it says something like `Failed initializing : @suggest`
[20:10:53] as for bailing early, no, that's not possible with the way the underlying testing system is designed
[20:11:05] each file, according to the framework, is its own test and knows nothing about the others
[20:11:36] I had to hack a ridiculous extra process that we communicate with over unix sockets to be able to only initialize tags once instead of repeating it for each test file
[20:11:53] * ebernhardson is generally not impressed with cucumberjs :
[20:11:55] :P
[20:18:59] I agree with your assessment of cucumber...
[21:19:30] * ebernhardson wonders why some of the packageFiles in extension.json have a plain filename, and others define some callback and provide the filename to the callback
[21:35:28] and the answer is... magic that elides them in certain contexts
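Circling back to pfischer's 15:52 question about headings versus body text: a toy illustration of the per-field weighting ebernhardson describes, with heading and opening_text boosted above the plain text field. The endpoint, index name, and boost values are made up for illustration, and the real CirrusSearch fulltext query is considerably more elaborate than a single multi_match.

```python
# Toy query against a CirrusSearch-like index showing per-field boosts:
# headings and the opening text carry more weight than the page body.
# Endpoint, index name, and boost values are illustrative only.
import json

import requests

query = {
    "query": {
        "multi_match": {
            "query": "example search terms",
            # heading/opening_text weighted above the plain text field
            "fields": ["text", "opening_text^2", "heading^3", "title^5"],
        }
    },
    "size": 5,
}

resp = requests.get(
    "http://localhost:9200/wiki_content/_search",  # assumed index name
    headers={"Content-Type": "application/json"},
    data=json.dumps(query),
    timeout=30,
)
resp.raise_for_status()
for hit in resp.json()["hits"]["hits"]:
    print(f'{hit["_score"]:8.3f}  {hit["_source"].get("title")}')
```

Boosting the heading and opening_text fields is the mechanism behind "matches in headlines score higher", even though, as noted in the chat, per-section structure beyond that is not indexed.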