[09:57:27] dcausse: I'm not sure how to properly describe my scenario in cucumber - what are those annotations at the beginning of each .feature file?
[09:59:07] zpapierski: they're like before/after junit init/cleanup steps
[09:59:23] they're defined in support/hooks.js
[09:59:43] so for you there'll likely be a hook feeding the data
[10:00:52] e.g. @suggest, it creates a bunch of pages and then calls the cirrus-suggest-index API
[10:02:42] well, you perhaps don't need that actually, since the data you need does not rely on the wiki database... simply initializing it might be enough; all you need is some scripts under tests/jenkins (most likely resetMwv.sh)
[10:04:26] so your scenario might just assume that the data is there
[10:05:33] could be as simple as "When I query completion search for f then foo is the first api result"
[10:07:08] you'll then have to define a support function for this in e.g. step_definitions/search_steps.js
[10:08:05] the tricky part is finding a "sentence" that won't conflict with existing regexes in other step definitions
[10:18:44] I already defined the steps themselves, I understood that far - I just missed that hook thing, thanks
[10:18:56] I think they're not conflicting
[10:22:27] lunch
[11:53:26] meal 3 break
[13:22:45] dcausse: how can I force a cindy run? it didn't run on my latest change
[13:23:02] (or didn't report, not sure)
[13:29:20] zpapierski: I'm debugging cindy at the moment
[13:29:50] sometimes if the build is broken badly cindy won't even run
[13:29:57] ah, ok :)
[13:30:05] that wouldn't be surprising
[13:30:43] might happen if the maint scripts to initialize the env are broken, for instance
[13:31:28] huh, can't log into the cindy host
[13:31:57] cirrus-integ.eqiad.wmflabs
[13:32:13] "Connection closed by UNKNOWN port 65535" - familiar?
[13:32:23] zpapierski: you should rebase your patch, it's unlikely to work from its current base
[13:32:35] ok
[13:33:51] "Connection closed by UNKNOWN port 65535" - what is this, ssh?
[13:33:56] yep
[13:34:26] ssh config issues, bastion for wmflabs might not be set up properly?
[13:34:37] I'm debugging my ssh connection
[13:35:09] but I'm logging into other wmcs instances without issue
[13:36:04] I think I see this error when I wait too long before entering my passphrase into the ssh-agent
[13:36:39] but maybe not...
[13:38:48] Warning: sizeof(): Parameter must be an array or an object that implements Countable in /vagrant/mediawiki/extensions/CirrusSearch/maintenance/UpdateQueryCompletionIndex.php on line 63
[13:38:52] zpapierski: ^
[13:38:57] huh
[13:39:01] thx
[13:39:58] cindy is running, let's see if it reports something (sadly it won't report such errors, only the failed tests)
[13:42:05] I messed something up in the script, don't know yet why (copied from a working source)
[13:42:29] hmm, I probably shouldn't do that per wiki
[13:44:22] ok, there must be a reason why the rest of the resources are provided from the root
[13:44:25] will do the same
[13:55:40] ok, let's see if that helps
[14:13:12] zpapierski: https://www.mediawiki.org/wiki/JetBrains_IDEs#MediaWiki_code_style might help, MW code style loves spaces
[14:13:48] ahh, thanks for that - I started to change them manually
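For reference, a minimal sketch of what the completion-search step discussed above (10:05-10:08) could look like in step_definitions/search_steps.js. The cucumber.js When/Then registration and chai's expect are real APIs, but the exact import style, the step wording, the searchApi.completionSearch helper, and the world-object layout are assumptions for illustration, not the actual CirrusSearch test code. Anchored regexes are one way to avoid the step-conflict problem mentioned at 10:08.

```javascript
// Hypothetical sketch for step_definitions/search_steps.js.
// Helper names (this.searchApi, this.apiResponse) are illustrative only.
const { When, Then } = require( 'cucumber' );
const { expect } = require( 'chai' );

// Anchored regexes (^...$) make collisions with regexes registered by
// other step definition files less likely.
When( /^I query completion search for (.+)$/, async function ( query ) {
	// Assumes the world object exposes an API client; keep the raw
	// response around so later "Then" steps can inspect it.
	this.apiResponse = await this.searchApi.completionSearch( query );
} );

Then( /^(.+) is the first api result$/, function ( title ) {
	// Assumes a list=prefixsearch-shaped response.
	const results = this.apiResponse.query.prefixsearch;
	expect( results ).to.not.be.empty;
	expect( results[ 0 ].title ).to.equal( title );
} );
```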
[14:22:49] zpapierski, ebernhardson1: T259674 and T258738 are ready for dev but assigned to you - is this something you plan to work on soon? happy to help and tackle one of these
[14:22:50] T259674: Ship query completion indices from analytics to prod clusters - https://phabricator.wikimedia.org/T259674
[14:22:50] T258738: Build query-clicks dataset from SearchSatisfaction logging - https://phabricator.wikimedia.org/T258738
[14:24:25] I think that would be awesome
[14:24:47] we should have everything we need for that
[14:24:56] need to relocate, be back for triaging
[14:33:56] break
[15:50:34] * ebernhardson1 will someday figure out why his laptop freezes for ~2 sec whenever starting the hadoop integration environment
[16:24:09] dinner
[16:49:50] ryankemper: is now an okay time to do some reindexing?
[16:50:08] it may run for the whole week...
[16:50:19] Trey314159: fire away
[16:50:28] Cool! Thanks!
[18:48:30] heh, started reindexing and cloudelastic started complaining :) I wouldn't worry about it yet though
[21:50:13] Is the Wikidata Query Service implemented with sharded Blazegraph servers? If not, is there a reason why that wasn't pursued?
[22:54:01] hare: what would the partition function be for sharding wikidata?
[22:54:33] I have not the slightest idea
[22:56:32] classically, "sharding" means separating data by some dimension into separate buckets
[22:57:09] for a multi-tenant app, "customer" is a classic shard discriminator
[22:57:39] but for a thing like wikidata it could be tricky to find a dimension to cut on
[22:57:44] I think Wikidata Query Service is single-tenant?
[22:57:46] Right
[22:58:03] the different parts of a triple (subject, property, object) might be one way
[22:58:17] unless I guess blazegraph has some map/reduce distributed functionality
[22:58:27] That's what I assumed
[22:59:56] I haven't been deep into blazegraph, but my recollection from ~5 years ago was that it did not have any distributed query system. I remember it being a "scale up" app, not a "scale out" app
[23:01:03] If that's how it was five years ago that's probably how it is now
[23:01:11] I'm considering another approach that uses Redis
[23:01:40] I am pretty sure Redis supports sharding, but I don't know if it's as turnkey as I would want it to be
[23:02:11] T206560 is probably relevant generally to blazegraph things
[23:02:11] T206560: [Epic] Evaluate alternatives to Blazegraph - https://phabricator.wikimedia.org/T206560
[23:02:14] The challenge is supporting a query service at the scale of Wikidata but without ultra-large servers
[23:03:07] I really hope they find something; Blazegraph has been in a very sorry state since Amazon poached it
[23:03:25] It really does feel like interacting with a black box
[23:03:35] "the scale of Wikidata but without ultra-large servers" - so an NP-hard problem?
[23:05:10] Perhaps...
[23:06:47] There's the indexing strategy, then there's knowing which shard the relevant data is in...
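To make the partition-function question above concrete, here is a toy sketch of hashing triples by subject. It is purely illustrative and says nothing about how Blazegraph or WDQS actually store data; the shard count, item IDs, and predicates are made up. The point is that the choice of dimension matters: subject-anchored lookups stay on one shard, while object-side lookups and multi-hop traversals have to ask every shard and join the results somewhere.

```javascript
// Toy illustration only: hash-partition triples by their subject IRI.
const crypto = require( 'crypto' );

function shardFor( subject, nShards ) {
	// Stable hash of the subject, e.g. "wd:Q42".
	const digest = crypto.createHash( 'sha1' ).update( subject ).digest();
	return digest.readUInt32BE( 0 ) % nShards;
}

// "wd:Q42 wdt:P31 wd:Q5" lands on the shard owning wd:Q42, so a query
// anchored on that subject hits a single shard...
console.log( shardFor( 'wd:Q42', 100 ) );

// ...but a pattern like "?person wdt:P31 wd:Q5" (all humans) has no fixed
// subject to hash, so every shard must be queried and the partial results
// merged by some coordinator.
```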
[23:06:58] I really don't know of any useful graph database that doesn't hold the graph in RAM in some way to allow full traversal
[23:07:36] so you can have 100 small servers or one huge server, but ultimately you need it all in RAM pretty much all the time
[23:08:31] and 100 small servers adds the need for query coordination servers to aggregate the partial results from the shards
[23:09:13] The main benefit would be if your dataset/index occupies more RAM than can fit on a machine
[23:10:11] Facebook also uses plain MySQL/memcache for their graphs https://engineering.fb.com/2013/06/25/core-data/tao-the-power-of-the-graph/
[23:10:40] facebook also has ~40 engineers who write custom storage engines for mysql :)
[23:10:59] i also built a copy of some facebook graph tech. Scales horizontally on elasticsearch servers, but has significant drawbacks compared to SPARQL: https://wikitech.wikimedia.org/wiki/Tool:Wikibase_Unicorn
[23:11:20] (minimal, POC copy :P)
[23:12:43] * ryankemper takes a note to check that out later, sounds neat
[23:14:39] hare: but yup, it is definitely like interacting w/ a black box. We've basically accepted we have to move off blazegraph long-term on the team, it's just a question of exactly when that happens and what we move to. To your point, it is basically abandonware as far as amazon is concerned
[23:16:40] I don't know enough about the internals to really say if there's a way to partition it out like you're saying, but I think bryan's point about needing to store a bunch of stuff in RAM for efficient traversal is definitely the case
[23:17:05] so from an operational perspective it's definitely a scale-up-not-out service, and given that we know we're not going to stay on it forever, trying to figure out ways to make it more horizontally scalable probably wouldn't have a great return on investment
[23:17:26] (although arguably if there were a way to magically make it scale like elasticsearch does, that would presumably obviate the need to actually move off it... but I digress)
[23:18:16] from my prior research, the problem isn't just storing it in RAM but communication between steps in the database. To have a fast graph database it needs to look up all the edges in memory, not send a network request that comes back 2ms later. This fan-out of computation between nodes ends up dominating the compute time.
[23:20:48] the difference with elasticsearch is all shards are independent. In elasticsearch each shard does its thing and reports to a coordinator. The coordinator gets the responses and does a sort/slice. With a distributed graph, each step in the graph that isn't within the shard has to go over the network - lots of communication
[23:34:14] I think I come here and ask about this every three months or so, so I appreciate your continued consultation
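A rough back-of-the-envelope sketch of the fan-out point made at 23:18-23:20: all of the numbers below are invented, and the model is deliberately crude; it only illustrates why sequential per-hop network round trips dominate a distributed graph traversal while a search-style scatter/gather pays the network cost roughly once.

```javascript
// Invented numbers, purely to illustrate the shape of the problem.
const NETWORK_RTT_MS = 2;      // cross-node round trip ("comes back 2ms later")
const LOCAL_LOOKUP_MS = 0.001; // in-memory edge lookup

// Distributed graph traversal: every edge that leaves the local shard costs
// a network round trip, and the hops happen one after another.
function traversalCostMs( hops, edgesPerHop, remoteFraction ) {
	const remoteEdges = hops * edgesPerHop * remoteFraction;
	const localEdges = hops * edgesPerHop * ( 1 - remoteFraction );
	return remoteEdges * NETWORK_RTT_MS + localEdges * LOCAL_LOOKUP_MS;
}

// Search-style scatter/gather: shards work independently and in parallel,
// so the coordinator pays roughly one round trip plus a cheap merge.
function scatterGatherCostMs( shards, hitsPerShard ) {
	return NETWORK_RTT_MS + shards * hitsPerShard * LOCAL_LOOKUP_MS;
}

console.log( traversalCostMs( 3, 100, 0.99 ).toFixed( 1 ), 'ms for a 3-hop graph traversal' );
console.log( scatterGatherCostMs( 100, 20 ).toFixed( 1 ), 'ms for a 100-shard scatter/gather' );
```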