[06:27:41] My wellness is mostly used up by offsetting deficiencies of my medical plan...
[09:40:31] lunch, see you later!
[10:34:27] lunch+errand
[11:39:14] lunch
[13:44:49] dcausse and others too: A wikidata question: what is the 'context' column?
[13:44:50] I can see it's basically the item which is being described, ?item ?prop [?propStatement ?value], but I am a bit confused what context means for properties.
[13:44:50] How deep of a link can 'context' indicate? Any docs will be super helpful!
[13:46:04] tanny411: where did you see this 'context'?
[13:46:47] Oh, in the wikidata dataset. The columns are context, subject, predicate and object
[13:46:53] oh
[13:47:46] "context" here refers to "quads" as opposed to triples
[13:48:27] the wikidata dataset does not use them; we use it in the hive rdf table to help with grouping per entity
[13:48:55] say you want all triples related to an entity, you can filter on context = Q123
[13:49:14] instead of following reification paths
[13:49:29] hmm.. don't know much about quads. I can see how it will work for items, not sure about properties.
[13:50:30] a quad is just a triple with one additional value ("context" here)
[13:51:17] Okay
[13:51:26] for properties it's basically the same: context = P31
[13:52:16] context = Q42 helps you reconstruct most of what https://www.wikidata.org/wiki/Special:EntityData/Q42.ttl?flavor=dump produces
[13:52:36] same for context = P31 with https://www.wikidata.org/wiki/Special:EntityData/P31.ttl?flavor=dump
[13:53:36] this would be terribly costly to extract otherwise
[13:54:13] Ah, that's helpful.
[13:56:00] Thanks!
[13:57:40] the only parts that are not well attached are values & references (prefix v: and ref: in the ttl output). These can belong to multiple entities and thus cannot be assigned to a particular one, so here the context is equal to http://wikiba.se/ontology#Reference or http://wikiba.se/ontology#Value
[14:00:06] when working on this dataset it's generally wise to isolate a smaller dataset to work with asap and call DataFrame.cache(), so that spark will have a lot less data to work with when doing more fine-grained filtering/grouping
[14:22:17] Yes, understood!
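An illustrative sketch of the filter-on-context-then-cache pattern described above. The table name and the exact value stored in the context column (bare Q-id vs. full entity URI) are assumptions here, not details confirmed in the discussion.

```python
# Sketch only: "discovery.wikidata_rdf" and the URI form used in the context
# column are assumptions; adjust to the actual hive rdf table.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("wikidata-rdf-context-example").getOrCreate()

rdf = spark.read.table("discovery.wikidata_rdf")

# Grab every quad attached to one entity via the context column, instead of
# following reification paths, then cache the much smaller DataFrame before
# doing more fine-grained filtering/grouping.
q42 = rdf.where(F.col("context") == "<http://www.wikidata.org/entity/Q42>").cache()

# Follow-up work now runs against the cached subset only.
q42.groupBy("predicate").count().show()
```

The same pattern applies to properties (context = P31), while values and references end up under the wikiba.se ontology Reference/Value contexts as noted above.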
[15:05:46] ebernhardson: the python build should be fixed. Should we merge those 2 CRs on Mjolnir and see how it breaks?
[15:11:07] gehel: sure
[16:59:31] dinner
[17:01:24] * ebernhardson wasn't sure what a comma splice is... good someone did :)
[18:06:16] ryankemper: fishing for a review on https://gerrit.wikimedia.org/r/693205. Basically have the daemon listen on two new topics we will produce to.
[18:12:01] ebernhardson: looking
[18:16:30] ebernhardson: looks good, I went ahead and merged it
[18:18:45] should we merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/697836/1 now, or is there some testing that should be done first?
[18:19:58] ah I guess presumably there would need to be another patch on the producer side before we're ready to remove the old topics
[18:20:13] ryankemper: not yet. We need this config, then I'll ship a mjolnir patch which also changes the swift container of the same name to match these new names, then airflow stuff to produce to the new swift container / kafka topic. Then finally this one.
[18:20:39] ack
[18:20:56] part of this is required (having a second topic); the renaming I'm just doing because if I'm creating a second topic it might as well have names reflecting what they do
[22:40:34] the reimages won't be complete today, but I'm pretty confident that every downed wdqs host will be reimaged, so we'll be back at full capacity thankfully
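For the daemon change discussed above (having it listen on two new topics), a minimal sketch of multi-topic consumption with kafka-python; the topic names, broker address, and group id are placeholders, not the actual mjolnir configuration.

```python
# Placeholder names throughout: this only illustrates subscribing a single
# consumer to two topics, not the real mjolnir daemon setup.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "mjolnir.example-topic-a",
    "mjolnir.example-topic-b",
    bootstrap_servers="localhost:9092",
    group_id="example-daemon",
    value_deserializer=lambda raw: raw.decode("utf-8"),
)

# A single poll loop receives messages from both topics.
for message in consumer:
    print(message.topic, message.value)
```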