[10:02:54] Errand + early lunch
[10:54:04] lunch
[13:17:44] search/cirrus-streaming-updater tests are failing, I guess because they depend on external state:
[13:17:45] search/cirrus-streaming-updater
[13:17:52] org.wikimedia.eventutilities.core.util.ResourceLoadingException: Failed loading resource. (resource: https://schema.wikimedia.org/repositories/primary/jsonschema/development/mediawiki/page/change/latest)
[13:18:14] with changes made to the JSON schema not being gated to ensure downstream users still pass, maybe
[13:18:37] or maybe it is all nonsense :)
[13:19:01] hashar: this one has moved to GitLab and we probably forgot to archive it
[13:21:15] :-(
[13:21:45] I will archive it
[13:22:32] thanks and sorry about that :/
[13:27:46] you'd want to update the parent pom, similar to https://gerrit.wikimedia.org/r/c/search/cirrus-streaming-updater/+/975805
[13:28:10] 1.69 updates the maven-javadoc-plugin to let it find `javadoc` when running under Java 9+ (i.e. under Java 11)
[13:29:32] hashar: thanks for the heads up, it's good practice to upgrade the parent whenever possible
[13:39:29] o/
[13:57:07] Data reload finished for wikidata, it's on lexemes now
[13:57:12] Wikidata dump loaded in 25 days, 13:32:17.263762
[13:57:56] \o/ Finally! And it is a success!
[13:59:02] nice
[13:59:07] 25 days, wow
[14:00:36] last import (2023-02-02) was ~14 days
[14:01:06] At least it didn't fail a bunch of times like the last import
[14:05:43] inflatador: could you update T241128 once done on wdqs1022?
[14:05:45] T241128: EPIC: Reduce the time needed to do the initial WDQS import - https://phabricator.wikimedia.org/T241128
[14:06:33] maybe with a short note pointing to T336443 to possibly explain why the time is worse than previous attempts
[14:06:34] T336443: Investigate performance differences between wdqs2022 and older hosts - https://phabricator.wikimedia.org/T336443
[14:08:53] but I'm a bit concerned about achieving a reload of the splits in < 10 days with this host
[14:09:04] dcausse: ACK, will do
[14:25:37] dcausse: no worries. The Gerrit > GitLab migrations are a bit inconsistent :)
[14:36:08] hmm, this makes me think we should try enabling the performance CPU governor
[14:38:41] Erik found some weirdness in disk-related metrics at T336443#8845469
[14:38:42] T336443: Investigate performance differences between wdqs2022 and older hosts - https://phabricator.wikimedia.org/T336443
[14:39:09] but yes, this perf degradation might be a risk for the project
[14:39:37] I don't think it is the root cause, but it might help performance
[14:40:36] do we have an idea of the root cause? can clock frequency explain a 40% perf decrease?
[14:41:42] That's what we concluded last time, since the old hosts were also using the powersave governor
[14:42:11] Hard to say without running another reload using the governor, so I think I'll try that
[14:42:22] Probably use 1024 since it's pretty far behind anyway
[15:54:55] back
[15:55:25] looks like lexemes finished. The cookbook crashed but I don't think it hurt anything
[16:41:39] Hi, all. I was just running UpdateSearchIndexConfig.php for one of my wikis that uses Amazon's OpenSearch Elasticsearch engine and got an error when it tried deleting an index while an automated snapshot was running. The error output says such an Elasticsearch error is always a CirrusSearch bug, but I would think in this case it's not really, but I
[16:41:39] wanted to ask to be sure. https://gist.github.com/justinclloyd/c1189262c07f0f55629b8a434b70601a
[16:54:18] justinl: hmm, the "Always a cirrussearch bug" basically means that it's some condition we didn't program any particular handling for.
[16:54:49] justinl: so in this case, indeed we don't have anything that handles the index not being deleted as it was expected to be
[16:55:33] Cool, thanks for the clarification. I just restarted the script and it should be fine, just bad timing that the automated snapshot started during that run.
[16:56:35] yeah, seems like one of those things that happens. I'm not sure what the fix would be, I suppose it could wait around until it is able to delete the thing.
[16:57:44] dcausse: I don't quite follow, what's the array bit in your CR? Is it about parsing a YAML with an array as the top-level structure?
[16:57:57] (instead of a map)
[16:59:24] * ebernhardson is separately surprised to find that "y" and "n" are the canonical YAML boolean values, true/false are aliases
[17:02:04] ebernhardson: sorry, my bad, I misread the error message, I thought it meant "map values unsupported" and was expecting "map/array values unsupported"
[17:03:11] dcausse: ahh, ok, I can try to make the text a bit clearer. It's supposed to be saying that YAML loaded some unsupported type (nested map/list mostly)
[17:04:12] not sure it's needed, mostly me reading this too quickly
[17:04:58] in theory we could make a custom Constructor implementation (like SafeConstructor) that fails to parse...but this seemed easier :)
[17:07:29] what you have here is totally fine imo :)
[17:08:00] have the boolean condition added, I had forgotten about those. Just running the test suite and it should be ready to go
[17:16:02] WDQS HW performance ticket at https://phabricator.wikimedia.org/T351662 . Does anyone have any ideas for benchmarking triples ingestion, short of running a full data reload?
[17:16:50] inflatador: mostly just that. If we think the problem is a particular thing (in this case IO capacity), it might also be reasonable to directly measure that and see if there is a difference
[17:18:10] I've used fio in the past...never quite clear about the right config though. Will ask SREs
[17:22:35] https://wikitech.wikimedia.org/wiki/Kafka/Kafka-main-raid-performance-testing-2019 might be useful
[17:23:46] inflatador: seems reasonable, and doesn't look too hard to run a test
[18:10:07] meh...Caused by: java.lang.IllegalStateException: weighted_tags already set
[18:19:06] :/
[18:20:45] best guess is we are merging twice, but haven't yet figured out where
[18:22:46] curiously, and not sure if this is relevant, but the error comes from fetch failure routing
[18:23:33] * ebernhardson wonders if it failed regular encoding, got sent to fetch failure, then failed encoding a second time
[18:41:22] No entries in the data reload cookbook logs since Oct 16th...hmm
[19:03:32] so basically, the above was correct. The fetch failed with badrevision, the revision looks to have been deleted (rev_id now found in the archive table). But the fetch failure encoding failed because weighted_tags was loaded into both the weightedTags and fields properties
[19:04:34] (which it then interpreted as being loaded from separate places needing merging, and we don't merge weighted_tags that late in the pipeline)
[19:23:33] lunch, back in ~40
[19:27:41] * ebernhardson finds assert methods like `isNotExactlyInstanceOf` somewhat amusing
[19:54:54] ryankemper: oops, I killed that tab a bit too fast. Enjoy the rest of the day!
[19:55:07] gehel: :P likewise!
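For context on the YAML discussion above ([16:57]–[17:08]): a minimal sketch of the kind of check being described, assuming SnakeYAML as the parser. The class and message names are illustrative, not the actual code from the CR — it only shows loading a document and rejecting values that are nested maps/lists rather than simple scalars.

```java
import java.util.List;
import java.util.Map;
import org.yaml.snakeyaml.Yaml;

public class ScalarOnlyYamlLoader {
    /** Load a YAML mapping whose values must all be simple scalars. */
    public static Map<String, Object> load(String doc) {
        Object parsed = new Yaml().load(doc);
        if (!(parsed instanceof Map)) {
            // e.g. a document with an array (sequence) at the top level
            throw new IllegalArgumentException(
                "expected a map at the top level, got: " + typeName(parsed));
        }
        for (Map.Entry<?, ?> e : ((Map<?, ?>) parsed).entrySet()) {
            Object v = e.getValue();
            if (v instanceof Map || v instanceof List) {
                // The "unsupported type" case: the parser produced a nested map/list.
                throw new IllegalArgumentException(
                    "unsupported value type for key '" + e.getKey() + "': " + typeName(v));
            }
        }
        @SuppressWarnings("unchecked")
        Map<String, Object> result = (Map<String, Object>) parsed;
        return result;
    }

    private static String typeName(Object o) {
        return o == null ? "null" : o.getClass().getSimpleName();
    }
}
```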
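On the `weighted_tags already set` exception ([18:10]–[19:04]): a hypothetical sketch of the failure mode, not the actual cirrus-streaming-updater classes — a field container that refuses a second write for fields it has no merge strategy for, which trips when the same field arrives via two paths (a dedicated property and the generic fields map).

```java
import java.util.HashMap;
import java.util.Map;

public class UpdateFieldsBuilder {
    private final Map<String, Object> fields = new HashMap<>();

    public UpdateFieldsBuilder set(String name, Object value) {
        if (fields.containsKey(name)) {
            // The equivalent of "weighted_tags already set": this late in the
            // pipeline there is no merge strategy for the field, so a second
            // write is treated as a programming error.
            throw new IllegalStateException(name + " already set");
        }
        fields.put(name, value);
        return this;
    }

    public Map<String, Object> build() {
        return new HashMap<>(fields);
    }
}

// If an event exposes weighted_tags both as a dedicated property and inside
// its generic field map, encoding it would call set() twice and throw:
//   new UpdateFieldsBuilder()
//       .set("weighted_tags", tagsFromProperty)
//       .set("weighted_tags", tagsFromFieldsMap);  // IllegalStateException
```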
[20:24:28] heh, the transcoder test passes a .equals() test, but only because the encoding process mutated an underlying data structure, so we aren't comparing against the initially loaded event
[20:24:37] the round trip
[22:01:13] happy Thanksgiving...see y'all in a week!
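On the transcoder .equals() observation at [20:24]: a hypothetical, self-contained illustration of the pitfall (toy types, not the project's transcoder). An encoder that mutates its input in place can make a round-trip comparison against the original instance pass, while comparing against a copy taken before encoding shows the real difference.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class MutatingEncoderDemo {
    // Toy encode step: sorts the tags in place as a side effect, then joins them.
    static String encode(List<String> tags) {
        Collections.sort(tags);               // mutation of the caller's data
        return String.join(",", tags);
    }

    static List<String> decode(String encoded) {
        return new ArrayList<>(List.of(encoded.split(",")));
    }

    public static void main(String[] args) {
        List<String> loaded = new ArrayList<>(List.of("b", "a"));
        List<String> snapshot = new ArrayList<>(loaded);     // copy taken before encoding

        List<String> roundTripped = decode(encode(loaded));  // encode() reorders `loaded`

        System.out.println(roundTripped.equals(loaded));     // true, but only via mutation
        System.out.println(roundTripped.equals(snapshot));   // false: the comparison that matters
    }
}
```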