[08:45:59] dcausse: there is a discussion with Desiree, Emil and Joseph about Distributed Graphs at 2:30pm today. I'll forward the invite to you as you probably have a lot of valuable input into this!
[08:46:24] This is very last minute and on a Friday, so feel free to decline!
[08:48:49] gehel: won't be able to make it sadly... I have an appointment at the hospital at 2pm for my son
[08:49:09] Good luck with the hospital!
[08:49:25] but curious to know how the discussion goes
[08:50:09] I'm not exactly sure. I think this is about re-visiting the possibility of having general sharding support in an RDF backend.
[08:50:43] I vaguely remember that Virtuoso has some sharding support but that it wasn't really usable. Do you remember why?
[08:52:10] it can't be "general", I remember discussions with Ahmon about that, I don't remember anything related to virtuoso tho
[09:02:56] I think the best on the market is Stardog, would be nice to ask Andrea about its sharding capabilities but I think it's all manual
[10:23:39] lunch+errand
[14:34:46] \o
[14:36:53] o/
[14:40:29] dcausse: was able to read the cirrus dumps without executors dying (in a small test case) by setting `--conf spark.executor.extraJavaOptions=-XX:MaxDirectMemorySize=128M --conf spark.executor.memoryOverhead=512m`
[14:41:04] using spark3 and their new memory metrics, could see that the direct buffer pools were going to 500M+ and then yarn was killing it, probably because by default it only had 384M allocated for memory overhead
[14:41:05] ebernhardson: thanks! was able too with 8G + 4G overhead :)
[14:41:37] yea that works too, but i feel bad using tons of memory :)
[14:42:06] i'm not sure what the downsides of limiting direct buffer pools are, could probably expand them and the memory overhead at the same rate
[14:43:13] going to try with your options, I definitely feel bad using so much mem :)
[14:44:12] with spark3 can turn on the new metrics with `--conf spark.metrics.conf.*.sink.console.class="org.apache.spark.metrics.sink.ConsoleSink" --conf spark.metrics.conf.*.sink.console.period=5`. Or even period=1, but then you can't use the pyspark shell because it's constantly emitting stuff :)
[14:44:59] i suppose i should stuff these in the discovery analytics wikitech page somewhere
[14:46:33] 22/10/13 16:59:06 WARN BlockManagerMasterEndpoint: No more replicas available for rdd_8_1881 !
[14:46:46] just kidding, don't kick me :P
[14:46:51] lol
[14:59:07] it's already spitting way too much info to the console (jupyter result cell) for me, I don't remember it being like that a year ago, perhaps because I use the wmfdata module
[14:59:57] ahh perhaps, i haven't looked into what wmfdata is doing
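A sketch of those executor settings plus the console metrics sink set from a pyspark session rather than on the CLI; only the `spark.*` keys and values quoted above come from the chat, the app name is illustrative, and whether they take effect from an already-running notebook kernel depends on how the session gets launched:

```python
# Minimal sketch: cap executor direct buffers and raise the YARN memory overhead,
# mirroring the --conf flags from the chat. App name is hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("cirrus-dump-read-test")  # illustrative name
    # Keep direct (off-heap NIO) buffers small so they fit in the overhead budget.
    .config("spark.executor.extraJavaOptions", "-XX:MaxDirectMemorySize=128M")
    # YARN kills executors that exceed heap + overhead; the default overhead is
    # max(384M, 10% of executor memory), so raise it explicitly.
    .config("spark.executor.memoryOverhead", "512m")
    # Optional: the Spark 3 console metrics sink, emitting every 5 seconds.
    .config("spark.metrics.conf.*.sink.console.class",
            "org.apache.spark.metrics.sink.ConsoleSink")
    .config("spark.metrics.conf.*.sink.console.period", "5")
    .getOrCreate()
)
```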
[15:21:04] random unrelated question, does WDQS still end up limiting the edit rate on wikidata.org?
[15:21:15] I'll be skipping unmeeting (yet again). ERC meeting conflicting...
[15:21:44] ebernhardson: it's here: https://grafana-rw.wikimedia.org/d/000000170/wikidata-edits?orgId=1&refresh=1m
[15:22:42] the max lag graph, if "wikibase-queryservice" reaches the 10s threshold it slows down bot edits
[15:23:27] seemed to have happened on 16/9
[15:23:27] dcausse: so looks quite good lately. Mostly i'm trying to prepare some answers for a meeting with selena later, answering the question of what work are we proud of. I think we stopped limiting wikidata edits, but wanted to double check :)
[15:23:48] yes, it's definitely better :)
[15:24:22] it still slows down edits but mainly because we don't know how to ignore depooled servers from the various metrics we use
[15:24:56] not because of the "updater" itself
[15:25:04] ahh, ok yea that makes sense
[16:04:40] And I'm off for vacation! Have fun!
[16:31:29] Ohia!
[16:31:30] Anyone remember seeing a failure like this before in CI ever?
[16:31:31] 1) CirrusSearch\Search\RescoreBuilderTest::testTermBoosts with data set "multiple statements" (0.1, array(array(-2, -7)), array(array(-0.2, array(array('P31=Q1234'))), array(-0.7, array(array('P279=Q345')))))
[16:31:48] where expected 'weight' => -0.7 but got 'weight' => -0.7000000000000001
[16:40:12] (I'm doing something all the way back on REL1_37)
[16:42:12] welcome to floating points
[16:46:23] :D, indeed, looking at the test it hasn't been patched in master, but also doesn't appear to occur there
[16:46:34] I might just remove the test from the 1.37 branch
[17:02:01] addshore: hmm, we have some things that try to limit the precision of recorded fixture-based testing,
[17:02:22] addshore: it was recently changed, in the last month or two, i'm not sure if this particular test uses that though (looking)
[17:02:24] I can only imagine this issue doesn't exist on master anymore
[17:02:39] but I made https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CirrusSearch/+/842851 and https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CirrusSearch/+/842852 xD
[17:03:02] Just for REL1_37. If you do manage to figure out an alternative small fix or something to backport I can look at that too!
[17:03:53] oh! i just realized how old REL1_37 is :P
[17:04:50] ebernhardson: yeah :P
[17:05:15] I was gonna wait for one or both of these patches to be green and then just merge the one that seems least evil
[17:05:25] yea the recent changes probably wouldn't be useful, also this one isn't fixture based, rather it's using some direct math
[17:06:27] i'm guessing this has something to do with these tests running php 7.4 now too, rather than whatever we were at at the branch cut time
[17:08:14] addshore: if it has a recent enough assertEquals, there is a $delta argument you could set to 0.01 or something: public static function assertEquals($expected, $actual, string $message = '', float $delta = 0.0, int $maxDepth = 10, bool $canonicalize = false, bool $ignoreCase = false): void
[17:08:27] i don't know when that was added
[17:08:52] i think that applies to array comparisons
[17:10:31] hmm, phpunit assertEquals or somewhere else?
[17:10:35] phpunit
[17:11:03] I don't see it in the docs? https://phpunit.readthedocs.io/en/9.5/assertions.html#assertequals
[17:11:10] oh, assertEqualsWithDelta ?
[17:11:21] hmm, maybe i have old phpunit :)
[17:11:42] I'll throw a patch up using assertEqualsWithDelta and see what happens too
[17:13:23] yup, that should be the one. There is this: "The optional $delta parameter of assertEquals() is deprecated and will be removed in PHPUnit 9. Refactor your test to use assertEqualsWithDelta() instead"
[17:14:07] * ebernhardson should apparently update deps
[17:14:25] :D
[17:14:38] well, let's see what happens with that one too https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CirrusSearch/+/842853
[17:14:57] if that one ends up green i'll merge it, otherwise fall back to one of the others
[17:15:03] sounds good
[17:17:29] wrote a ticket for searchability too in case it crops up again (but i doubt it) xD i'm working in the past here
[17:17:30] https://phabricator.wikimedia.org/T320827
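The CI failure discussed above is a textbook double-precision rounding artifact: multiplying the 0.1 boost by the -7 weight from the data set does not give exactly -0.7. A minimal Python sketch of the effect and of the tolerance-based comparison that assertEqualsWithDelta() provides on the PHPUnit side; the test class and delta value here are illustrative, not taken from CirrusSearch:

```python
# Demonstrates the rounding error from the CI failure and a delta-based check.
# This mirrors PHPUnit's assertEqualsWithDelta($expected, $actual, $delta) in
# Python; only the values (boost 0.1, weight -7) come from the failing data set.
import math
import unittest


class TermBoostRoundingTest(unittest.TestCase):
    def test_term_boost_weight(self):
        expected = -0.7
        actual = 0.1 * -7  # evaluates to -0.7000000000000001 in IEEE 754 doubles

        # Exact equality fails, which is what the CI job reported:
        self.assertNotEqual(expected, actual)

        # Comparing with a small delta (PHPUnit: assertEqualsWithDelta) passes:
        self.assertAlmostEqual(expected, actual, delta=0.0001)

        # math.isclose is the plain-Python equivalent of a relative tolerance:
        self.assertTrue(math.isclose(expected, actual, rel_tol=1e-9))


if __name__ == "__main__":
    unittest.main()
```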
[21:15:52] still don't understand this one... latency spiked on eqiad again. Similar to before, in per-index stats only commonswiki_file is showing outlier behaviour, with a big spike in `time taken in cpu seconds per second` from 50 to ~300. But shard queries per second stays the same
[21:17:36] somehow using 6x the resources for the same # of shard queries. For comparison, enwiki_content sees around 5k shard qps using 350 cpu sec/sec, commonswiki_file around 2k using normally 50 cpu sec/sec, but currently 300
[21:20:02] implies issuing similar query rates, but the received queries are much more expensive... may need to dig into the backend logs to understand what's different
[21:24:32] the last 50 slow log entries in logstash are all for the query "the" :P
[21:25:54] will have to write something to query the backend logs and see if it's really something as simple as lots of queries for very simple words
[21:27:04] or if those queries are always commons and happen to hit the slow log once things are struggling
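A rough sketch of the kind of tally the last two messages describe, assuming the slow-log entries have been exported from logstash as JSON lines; the file name and the `query`, `took_ms` and `index` field names are guesses about the export shape, not the real CirrusSearch/Elasticsearch slow-log schema:

```python
# Hypothetical sketch: group exported slow-log entries by query string to see
# whether simple terms like "the" dominate, and whether they are commons-only.
# Field names are assumptions about the export format, not the real schema.
import json
from collections import Counter, defaultdict

query_counts = Counter()
took_by_query = defaultdict(list)
indices_by_query = defaultdict(set)

with open("slowlog_export.jsonl") as f:  # hypothetical export file
    for line in f:
        entry = json.loads(line)
        q = entry.get("query", "")
        query_counts[q] += 1
        took_by_query[q].append(entry.get("took_ms", 0))
        indices_by_query[q].add(entry.get("index", "unknown"))

# Print the ten most frequent queries with their average latency and indices.
for q, n in query_counts.most_common(10):
    avg_took = sum(took_by_query[q]) / max(len(took_by_query[q]), 1)
    print(f"{n:6d}x  avg {avg_took:7.1f}ms  indices={sorted(indices_by_query[q])}  {q!r}")
```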