[06:27:41] My wellness is mostly used up by offsetting deficiencies of my medical plan...
[09:40:31] lunch, see you later!
[10:34:27] lunch+errand
[11:39:14] lunch
[13:44:49] dcausse and others too: A wikidata question: what is the 'context' column?
[13:44:50] I can see it's basically the item which is being described, ?item ?prop [?propStatement ?value], but I am a bit confused what context means for properties.
[13:44:50] How deep of a link can 'context' indicate? Any docs will be super helpful!
[13:46:04] tanny411: where did you see this 'context'?
[13:46:47] Oh, in the wikidata dataset. The columns are context, subject, predicate and object
[13:46:53] oh
[13:47:46] "context" here refers to "quads" as opposed to triples
[13:48:27] the wikidata dataset does not use them; we use it in the hive rdf table to help with grouping per entity
[13:48:55] say you want all triples related to an entity, you can filter on context = Q123
[13:49:14] instead of following reification paths
[13:49:29] hmm.. don't know much about quads. I can see how it will work for items, not sure about properties.
[13:50:30] a quad is just a triple with one additional value ("context" here)
[13:51:17] Okay
[13:51:26] for properties it's basically the same: context = P31
[13:52:16] context = Q42 helps you reconstruct most of what https://www.wikidata.org/wiki/Special:EntityData/Q42.ttl?flavor=dump produces
[13:52:36] same for context = P31 with https://www.wikidata.org/wiki/Special:EntityData/P31.ttl?flavor=dump
[13:53:36] this would be terribly costly to extract otherwise
[13:54:13] Ah, that's helpful.
[13:56:00] Thanks!
[13:57:40] the only parts that are not well attached are values & references (prefix v: and ref: in the ttl output). These can belong to multiple entities and thus cannot be assigned to a particular one, so here the context is equal to http://wikiba.se/ontology#Reference or http://wikiba.se/ontology#Value
[14:00:06] when working on this dataset it's generally wise to isolate a smaller dataset to work with asap and call DataFrame.cache(), so that spark will have a lot less data to work with when doing more fine-grained filtering/grouping
[14:22:17] Yes, understood!
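An illustrative sketch of the filter-on-context-then-cache pattern described above. The table name and the exact value stored in the context column (bare Q-id vs. full entity URI) are assumptions here, not details confirmed in the discussion.

```python
# Sketch only: "discovery.wikidata_rdf" and the URI form used in the context
# column are assumptions; adjust to the actual hive rdf table.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("wikidata-rdf-context-example").getOrCreate()

rdf = spark.read.table("discovery.wikidata_rdf")

# Grab every quad attached to one entity via the context column, instead of
# following reification paths, then cache the much smaller DataFrame before
# doing more fine-grained filtering/grouping.
q42 = rdf.where(F.col("context") == "<http://www.wikidata.org/entity/Q42>").cache()

# Follow-up work now runs against the cached subset only.
q42.groupBy("predicate").count().show()
```

The same pattern applies to properties (context = P31), while values and references end up under the wikiba.se ontology Reference/Value contexts as noted above.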
[15:05:46] ebernhardson: the python build should be fixed. Should we merge those 2 CRs on Mjolnir and see how it breaks?
[15:11:07] gehel: sure
[16:59:31] dinner
[17:01:24] * ebernhardson wasn't sure what a comma splice is... good someone did :)
[18:06:16] ryankemper: fishing for a review on https://gerrit.wikimedia.org/r/693205. Basically have the daemon listen on two new topics we will produce to.
[18:12:01] ebernhardson: looking
[18:16:30] ebernhardson: looks good, I went ahead and merged it
[18:18:45] should we merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/697836/1 now, or is there some testing that should be done first?
[18:19:58] ah I guess presumably there would need to be another patch on the producer side before we're ready to remove the old topics
[18:20:13] ryankemper: not yet. We need this config, then I'll ship a mjolnir patch which also changes the swift container of the same name to match these new names, then airflow stuff to produce to the new swift container / kafka topic. Then finally this one.
[18:20:39] ack
[18:20:56] part of this is required (having a second topic); the renaming I'm just doing because if I'm creating a second topic it might as well have names reflecting what they do
[22:40:34] the reimages won't be complete today, but I'm pretty confident that every downed wdqs host will be reimaged, so we'll be back at full capacity thankfully
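For the daemon change discussed above (having it listen on two new topics), a minimal sketch of multi-topic consumption with kafka-python; the topic names, broker address, and group id are placeholders, not the actual mjolnir configuration.

```python
# Placeholder names throughout: this only illustrates subscribing a single
# consumer to two topics, not the real mjolnir daemon setup.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "mjolnir.example-topic-a",
    "mjolnir.example-topic-b",
    bootstrap_servers="localhost:9092",
    group_id="example-daemon",
    value_deserializer=lambda raw: raw.decode("utf-8"),
)

# A single poll loop receives messages from both topics.
for message in consumer:
    print(message.topic, message.value)
```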