[08:45:59] dcausse: there is a discussion with Desiree, Emil and Joseph about Distributed Graphs at 2:30pm today. I'll forward the invite to you as you probably have a lot of valuable input into this!
[08:46:24] This is very last minute and on a Friday, so feel free to decline!
[08:48:49] gehel: won't be able to make it sadly... I have an appointment at the hospital at 2pm for my son
[08:49:09] Good luck with the hospital!
[08:49:25] but curious to know how the discussion goes
[08:50:09] I'm not exactly sure. I think this is about re-visiting the possibility of having general sharding support in an RDF backend.
[08:50:43] I vaguely remember that Virtuoso has some sharding support but that it wasn't really usable. Do you remember why?
[08:52:10] it can't be "general", I remember discussions with Ahmon about that, I don't remember anything related to virtuoso tho
[09:02:56] I think the best on the market is Stardog, would be nice to ask Andrea about its sharding capabilities but I think it's all manual
[10:23:39] lunch+errand
[14:34:46] \o
[14:36:53] o/
[14:40:29] dcausse: was able to read the cirrus dumps without executors dying (in a small test case) by setting `--conf spark.executor.extraJavaOptions=-XX:MaxDirectMemorySize=128M --conf spark.executor.memoryOverhead=512m`
[14:41:04] using spark3 and their new memory metrics, could see that the direct buffer pools were going to 500M+ and then yarn was killing it, probably because by default it only had 384M allocated for memory overhead
[14:41:05] ebernhardson: thanks! was able too with 8G + 4G overhead :)
[14:41:37] yea that works too, but i feel bad using tons of memory :)
[14:42:06] i'm not sure what the downsides of limiting direct buffer pools are, could probably expand them and the memory overhead at the same rate
[14:43:13] going to try with your options, I definitely feel bad using so much mem :)
[14:44:12] with spark3 can turn on the new metrics with `--conf spark.metrics.conf.*.sink.console.class="org.apache.spark.metrics.sink.ConsoleSink" --conf spark.metrics.conf.*.sink.console.period=5`. Or even period=1, but then you can't use the pyspark shell because it's constantly emitting stuff :)
[14:44:59] i suppose i should stuff these in the discovery analytics wikitech page somewhere
[14:46:33] 22/10/13 16:59:06 WARN BlockManagerMasterEndpoint: No more replicas available for rdd_8_1881 !
[14:46:46] just kidding, don't kick me :P
[14:46:51] lol
[14:59:07] it's already spitting way too much info to the console (jupyter result cell) for me, I don't remember it being like that a year ago, perhaps because I use the wmfdata module
[14:59:57] ahh perhaps, i haven't looked into what wmfdata is doing
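A sketch of those executor settings plus the console metrics sink set from a pyspark session rather than on the CLI; only the `spark.*` keys and values quoted above come from the chat, the app name is illustrative, and whether they take effect from an already-running notebook kernel depends on how the session gets launched:

```python
# Minimal sketch: cap executor direct buffers and raise the YARN memory overhead,
# mirroring the --conf flags from the chat. App name is hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("cirrus-dump-read-test")  # illustrative name
    # Keep direct (off-heap NIO) buffers small so they fit in the overhead budget.
    .config("spark.executor.extraJavaOptions", "-XX:MaxDirectMemorySize=128M")
    # YARN kills executors that exceed heap + overhead; the default overhead is
    # max(384M, 10% of executor memory), so raise it explicitly.
    .config("spark.executor.memoryOverhead", "512m")
    # Optional: the Spark 3 console metrics sink, emitting every 5 seconds.
    .config("spark.metrics.conf.*.sink.console.class",
            "org.apache.spark.metrics.sink.ConsoleSink")
    .config("spark.metrics.conf.*.sink.console.period", "5")
    .getOrCreate()
)
```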
[15:21:04] random unrelated question, does WDQS still end up limiting the edit rate on wikidata.org?
[15:21:15] I'll be skipping unmeeting (yet again). ERC meeting conflicting...
[15:21:44] ebernhardson: it's here: https://grafana-rw.wikimedia.org/d/000000170/wikidata-edits?orgId=1&refresh=1m
[15:22:42] the max lag graph, if "wikibase-queryservice" reaches the 10s threshold it slows down bot edits
[15:23:27] seemed to have happened on 16/9
[15:23:27] dcausse: so looks quite good lately. Mostly i'm trying to prepare some answers for a meeting with selena later, answering the question of what work are we proud of. I think we stopped limiting wikidata edits, but wanted to double check :)
[15:23:48] yes, it's definitely better :)
[15:24:22] it still slows down edits but mainly because we don't know how to ignore depooled servers from the various metrics we use
[15:24:56] not because of the "updater" itself
[15:25:04] ahh, ok yea that makes sense
[16:04:40] And I'm off for vacation! Have fun!
[16:31:29] Ohia!
[16:31:30] Anyone remember seeing a failure like this before in CI ever?
[16:31:31] 1) CirrusSearch\Search\RescoreBuilderTest::testTermBoosts with data set "multiple statements" (0.1, array(array(-2, -7)), array(array(-0.2, array(array('P31=Q1234'))), array(-0.7, array(array('P279=Q345')))))
[16:31:48] where expected 'weight' => -0.7 but got 'weight' => -0.7000000000000001
[16:40:12] (I'm doing something all the way back on REL1_37)
[16:42:12] welcome to floating points
[16:46:23] :D, indeed, looking at the test it hasn't been patched in master, but also doesn't appear to occur there
[16:46:34] I might just remove the test from the 1.37 branch
[17:02:01] addshore: hmm, we have some things that try to limit the precision of recorded fixture-based testing,
[17:02:22] addshore: it was recently changed, in the last month or two, i'm not sure if this particular test uses that though (looking)
[17:02:24] I can only imagine this issue doesn't exist on master anymore
[17:02:39] but I made https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CirrusSearch/+/842851 and https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CirrusSearch/+/842852 xD
[17:03:02] Just for REL1_37. If you do manage to figure out an alternative small fix or something to backport I can look at that too!
[17:03:53] oh! i just realized how old REL1_37 is :P
[17:04:50] ebernhardson: yeah :P
[17:05:15] I was gonna wait for one or both of these patches to be green and then just merge the one that seems least evil
[17:05:25] yea the recent changes probably wouldn't be useful, also this one isn't fixture based, rather it's using some direct math
[17:06:27] i'm guessing this has something to do with these tests running php 7.4 now too, rather than whatever we were at at the branch cut time
[17:08:14] addshore: if it has a recent enough assertEquals, there is a $delta argument you could set to 0.01 or something: public static function assertEquals($expected, $actual, string $message = '', float $delta = 0.0, int $maxDepth = 10, bool $canonicalize = false, bool $ignoreCase = false): void
[17:08:27] i don't know when that was added
[17:08:52] i think that applies to array comparisons
[17:10:31] hmm, phpunit assertEquals or somewhere else?
[17:10:35] phpunit
[17:11:03] I don't see it in the docs? https://phpunit.readthedocs.io/en/9.5/assertions.html#assertequals
[17:11:10] oh, assertEqualsWithDelta ?
[17:11:21] hmm, maybe i have old phpunit :)
[17:11:42] I'll throw a patch up using assertEqualsWithDelta and see what happens too
[17:13:23] yup, that should be the one. There is this: "The optional $delta parameter of assertEquals() is deprecated and will be removed in PHPUnit 9. Refactor your test to use assertEqualsWithDelta() instead"
[17:14:07] * ebernhardson should apparently update deps
[17:14:25] :D
[17:14:38] well, let's see what happens with that one too https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CirrusSearch/+/842853
[17:14:57] if that one ends up green i'll merge it, otherwise fall back to one of the others
[17:15:03] sounds good
[17:17:29] wrote a ticket for searchability too in case it crops up again (but i doubt it) xD i'm working in the past here
[17:17:30] https://phabricator.wikimedia.org/T320827
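The CI failure discussed above is a textbook double-precision rounding artifact: multiplying the 0.1 boost by the -7 weight from the data set does not give exactly -0.7. A minimal Python sketch of the effect and of the tolerance-based comparison that assertEqualsWithDelta() provides on the PHPUnit side; the test class and delta value here are illustrative, not taken from CirrusSearch:

```python
# Demonstrates the rounding error from the CI failure and a delta-based check.
# This mirrors PHPUnit's assertEqualsWithDelta($expected, $actual, $delta) in
# Python; only the values (boost 0.1, weight -7) come from the failing data set.
import math
import unittest


class TermBoostRoundingTest(unittest.TestCase):
    def test_term_boost_weight(self):
        expected = -0.7
        actual = 0.1 * -7  # evaluates to -0.7000000000000001 in IEEE 754 doubles

        # Exact equality fails, which is what the CI job reported:
        self.assertNotEqual(expected, actual)

        # Comparing with a small delta (PHPUnit: assertEqualsWithDelta) passes:
        self.assertAlmostEqual(expected, actual, delta=0.0001)

        # math.isclose is the plain-Python equivalent of a relative tolerance:
        self.assertTrue(math.isclose(expected, actual, rel_tol=1e-9))


if __name__ == "__main__":
    unittest.main()
```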
[21:15:52] still don't understand this one... latency spiked on eqiad again. Similar to before, in per-index stats only commonswiki_file is showing outlier behaviour, with a big spike in `time taken in cpu seconds per second` from 50 to ~300. But shard queries per second stays the same
[21:17:36] somehow using 6x the resources for the same # of shard queries. For comparison, enwiki_content sees around 5k shard qps using 350 cpu sec/sec, commonswiki_file around 2k using normally 50 cpu sec/sec, but currently 300
[21:20:02] implies issuing similar query rates, but the received queries are much more expensive... may need to dig into the backend logs to understand what's different
[21:24:32] the last 50 slow log entries in logstash are all for the query "the" :P
[21:25:54] will have to write something to query the backend logs and see if it's really something as simple as lots of queries for very simple words
[21:27:04] or if those queries are always commons and happen to hit the slow log once things are struggling
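A rough sketch of the kind of tally the last two messages describe, assuming the slow-log entries have been exported from logstash as JSON lines; the file name and the `query`, `took_ms` and `index` field names are guesses about the export shape, not the real CirrusSearch/Elasticsearch slow-log schema:

```python
# Hypothetical sketch: group exported slow-log entries by query string to see
# whether simple terms like "the" dominate, and whether they are commons-only.
# Field names are assumptions about the export format, not the real schema.
import json
from collections import Counter, defaultdict

query_counts = Counter()
took_by_query = defaultdict(list)
indices_by_query = defaultdict(set)

with open("slowlog_export.jsonl") as f:  # hypothetical export file
    for line in f:
        entry = json.loads(line)
        q = entry.get("query", "")
        query_counts[q] += 1
        took_by_query[q].append(entry.get("took_ms", 0))
        indices_by_query[q].add(entry.get("index", "unknown"))

# Print the ten most frequent queries with their average latency and indices.
for q, n in query_counts.most_common(10):
    avg_took = sum(took_by_query[q]) / max(len(took_by_query[q]), 1)
    print(f"{n:6d}x  avg {avg_took:7.1f}ms  indices={sorted(indices_by_query[q])}  {q!r}")
```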