[06:29:29] hello folks!
[06:29:42] an-airflow1001 is having some problems with logs again
[06:30:04] I can drop old logs if nobody is around, otherwise I can wait
[06:30:07] dcausse: --^
[06:52:41] dropped :)
[06:59:21] elukey: thanks :)
[09:41:20] dcausse: do you remember where I can get the media entity namespace, or perhaps remember what it is?
[09:44:21] zpapierski: mediainfo entities do not have their own namespace, they are part of a "revision slot" (MCR: Multi content revision)
[09:44:44] should be the FILE namespace
[09:44:54] hmm, I know too little about MCR
[09:44:55] for the entities we care about
[09:44:55] thanks
[09:45:50] so, I assume 6
[09:46:32] yes
[10:21:22] lunch
[10:24:52] lunch too
[12:12:13] zpapierski: fyi, there are some remotely related discussions in T230862
[12:12:13] T230862: Create a way to filter only WB-related changes from Commons recentchanges - https://phabricator.wikimedia.org/T230862
[12:12:55] thanks for the heads - I was literally thinking about that
[12:14:33] s/heads/heads up/
[12:15:09] it would be nice to have this for events, I'm guessing this was only done for RC api
[12:19:28] yes I think so, it'd be great to have some hints in the events indeed
[12:19:51] I looked for anything, but I didn't find anything useful
[12:20:17] yes not sure the usecase was ever needed until now
[12:20:34] btw - do you know if pageId is something I can rely on to be the same as entity ID from SDC?
[12:21:04] commons MID is simply "M" + page_id
[12:21:08] so yes
[12:21:13] terrific
[12:22:22] the tricky part is that a revision might not be structured data related but still impacts the RDF output (the "schema:version" triple is about the revision itself)
[12:22:53] I assumed that this will simply produce empty diffs - isn't that the case?
[12:22:56] aa
[12:23:00] except the version
[12:23:22] we could ignore that, not sure we should, though
[12:23:34] taking all revisions after the mediainfo slot has been created is a good approach
[12:23:42] it matches what the dump will produce
[12:24:20] well, it requires less logic anyway, so I'm all for that :)
[12:24:52] well not sure it requires less logic :/
[12:25:32] doesn't it? new revisions will produce at least a single triple change, with the version
[12:25:39] how is it different from what we do now?
[12:25:59] you don't want to produce RDF if the entity has no mediainfo slot
[12:26:18] and you won't have this entity in the initial state
[12:26:41] ah, ok - but that's the case whether we'd create new updates for version change or not
[12:27:10] and the API, from what I understand, 404s when no mediaslot has been created
[12:27:37] so from that perspective, it should be invisible to the updater, at least after we deal with knowing what 404 means
[12:27:44] (I know, that's not a small task)
[12:28:19] relying on 404 is perhaps possible but might not be enough
[12:28:34] how so?
[12:30:56] say File1 revision 1 has no mediainfo slot: it won't be present in the RDF dumps, thus unknown to the flink state
[12:31:57] you receive File1 revision 2 (parent: 1), the event will be buffered thinking that revision 1 has been misordered
[12:32:13] creating a lot of timers I'm afraid
[12:32:20] I see
[12:32:42] I wonder whether we shouldn't simply keep the revision for media items, even if there are no mediainfo triples
[12:33:07] the dump process would not be able to do that I think
[12:33:34] dump probably not, but at least we'd limit impact
[12:34:12] so we'd need a hint in the events
[12:34:27] or we need an extra MW-call prior to the reordering
[12:37:29] That's true, some additional info is needed
[12:39:39] calling MW is always going to decrease the quality of the stream (eventual consistency/network errors)
[12:43:38] not to mention additional latency :(
[12:44:28] indeed
[12:46:27] otoh, not super sure how hints in events should be added - just informing about MCR slot isn't enough, if we plan to keep version change on each revision, but only after mediainfo slot is created
[12:46:42] or maybe it is...
[12:47:18] if a given change returns 404 on request and the triggering event wasn't mediainfo slot related
[12:47:46] it should be because mediaslot doesn't yet exist (unless eventual consistency)
[12:48:20] in any case, even in case of eventual consistency, we at most lose a version bump, which shouldn't be a big deal
[12:48:29] I feel like I'm missing something here
[12:48:41] anyway, break for now
[12:49:19] gehel, dcausse : are we skipping today's sync? everyone from WMDE seems to be out
[12:49:40] fine by me
[12:49:46] makes no sense to keep it :/
[12:49:49] thanks for checking!
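The buffering concern raised at 12:30-12:32 can be illustrated with a toy model. This is only a sketch under stated assumptions: `RevisionEvent`, `ToyReorderState`, and the rule "emit only when the parent revision matches the last known revision" are hypothetical stand-ins for illustration, not the actual Flink updater logic.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RevisionEvent:
    # Hypothetical event shape: entity ID, revision ID, parent revision ID.
    entity: str
    revision: int
    parent: Optional[int]


@dataclass
class ToyReorderState:
    # Last revision known per entity (in the real system this would be
    # seeded from the RDF dumps; here it is just a dict).
    last_seen: Dict[str, int] = field(default_factory=dict)
    # Events held back because their parent revision is not known yet.
    buffered: List[RevisionEvent] = field(default_factory=list)

    def process(self, event: RevisionEvent) -> str:
        known = self.last_seen.get(event.entity)
        # Emit if this is a first revision or directly follows the known one.
        if event.parent is None or known == event.parent:
            self.last_seen[event.entity] = event.revision
            return "emitted"
        # Otherwise assume the parent was misordered and wait for it.
        self.buffered.append(event)
        return "buffered"


# Revision 1 had no mediainfo slot, so it never reached the state.
state = ToyReorderState()
outcome_without_parent = state.process(RevisionEvent("M1", 2, 1))

# Same event when revision 1 is already in the state.
seeded = ToyReorderState(last_seen={"M1": 1})
outcome_with_parent = seeded.process(RevisionEvent("M1", 2, 1))
```

In this toy model the first case is buffered indefinitely, because revision 1 will never arrive; that is the situation a hint in the events or an extra MW call would have to resolve.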
[12:51:13] it depends on what we want: 1/ getting only mediainfo related changes: the revision history seems harder to reconstruct, 2/ getting all revisions only after the mediainfo slot has been created: we're in line with the dumps but the events must inform us that a slot is available
[13:19:13] dcausse, zpapierski: did you receive a task to grade (text parsing, java solution)? I can't see your scorecards.
[13:19:37] gehel: no, only a python notebook recently
[13:20:42] I'm pinging Amanda about it.
[13:21:08] I received two emails tho, but the first one is "Sorry, but we ran into an error loading this page." and thought it was the same as the one I received sometime later (the python one)
[13:30:18] gehel: same here
[13:54:39] relocating
[14:48:03] zpapierski, dcausse: I've sent you the task by email
[14:48:11] thx
[15:01:30] ryankemper, ebernhardson: triage?
[15:01:39] mpham: ^
[15:01:58] hi, sorry, i'm out today!
[15:02:09] mpham: soo sorry!
[15:59:21] dinner
[16:22:41] relocating (and probably out for the day)
[16:23:30] gehel: can I have some invite to the event you mentioned? I don't see it in mine or staff calendar
[18:00:26] waiting on reviews of https://gerrit.wikimedia.org/r/c/labs/private/+/715570 and https://gerrit.wikimedia.org/r/c/operations/puppet/+/715569 from sre but besides that the TLS certs for wcqs should be good to go (wrt https://gerrit.wikimedia.org/r/c/operations/puppet/+/713958/)
[19:02:39] ebernhardson: sorry, I'm a few minutes late
[19:02:55] apparently i should join now :)
[19:03:00] :9
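For reference, the two facts confirmed earlier in the log (File namespace is 6, and a Commons MID is "M" + page_id) can be written down as a small sketch. The function names and the `page_namespace` event field are assumptions made for illustration; they are not taken from the updater code or from a confirmed event schema.

```python
FILE_NAMESPACE = 6  # Commons File namespace, as confirmed at 09:45-09:46


def mediainfo_id(page_id: int) -> str:
    """Build the MediaInfo entity ID: "M" followed by the page ID."""
    if page_id <= 0:
        raise ValueError("page_id must be a positive integer")
    return f"M{page_id}"


def page_id_from_mid(mid: str) -> int:
    """Inverse of mediainfo_id: strip the leading "M" and parse the number."""
    if not mid.startswith("M") or not mid[1:].isdigit():
        raise ValueError(f"not a MediaInfo ID: {mid!r}")
    return int(mid[1:])


def is_file_page_event(event: dict) -> bool:
    """Check whether an event concerns a page in the File namespace.

    `page_namespace` is a hypothetical field name used for this sketch.
    """
    return event.get("page_namespace") == FILE_NAMESPACE


example_mid = mediainfo_id(12345)
```

Note that a namespace check alone does not tell whether a mediainfo slot exists for the page, which is the open question in the discussion above.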