[08:01:58] hare: can't really tell you about runUpdate.sh, we actually never tested the old updater with WCQS (the new one will be tested soon), but if you want to load the data - this is what we use https://github.com/wikimedia/wikidata-query-rdf/blob/master/dist/src/script/wcqs-data-reload.sh
[08:02:43] as for exposing the mutation stream publicly, that's true, we have plans for that - T294133, can't tell how far off we are though
[08:02:44] T294133: Expose rdf-streaming-updater.mutation content through EventStreams - https://phabricator.wikimedia.org/T294133
[09:01:58] ejoseph: meeting? https://meet.google.com/ukb-kgxq-gvq
[09:44:42] zpapierski: let me know when you're available
[10:14:17] ejoseph: I'll grab a coffee and we can connect
[10:14:38] ok
[10:31:32] I'm here - https://meet.google.com/emi-rhtq-bws
[10:44:07] ejoseph: once you're back around, let me know
[10:48:59] code with me?
[10:51:12] ok
[11:02:37] lunch
[13:27:34] break
[14:22:14] Thank you all for your help with this
[15:11:38] My hope is that with my work I take some of the pressure off you all
[15:38:09] \o
[15:42:01] o/
[15:59:09] Props on making the munge process multithreaded by the way, my CPU is singing
[16:00:01] :)
[16:23:06] this one's easy to multithread, but unfortunately indexing in Blazegraph can't be :(
[16:29:59] * ebernhardson is partially amazed they managed to single-thread the indexing :P
[16:30:26] Graph databases are particularly wonky, aren't they? Can't be neatly sharded like a relational structure
[16:33:17] Yea, they don't have nice clean boundaries like relational dbs. i suppose i'm more surprised because the server space has been going with more and more cores; single-threading writes cuts them out of so much possibility
[16:34:51] cuts who out, exactly? nobody's maintaining that anymore :)
[16:35:19] :P
[16:37:42] zpapierski: for the 404 stuff, as far as i can tell from review we need to pass some sort of "is recent event" (eventTime ~= ingestionTime?) marker through the event stream, and then do more retries in the existing GenerateEntityDiffPatchOperation::fetchAsync?
[16:38:10] should we replace the existing retry handling with the Retyer used in the consumer?
[16:38:14] Retryer
[16:39:29] Retryer still feels incorrect :)
[16:39:35] it retries on 404s?
[16:39:53] in any ways
[16:39:59] currently it retries all RetryableException in a fast loop
[16:40:01] I'd say you're correct
[16:40:36] Using the Retryer the way the consumer does, we would instead declare some sort of wait between requests, and a wall clock timeout i suppose
[16:40:39] actually, wait - that's processing time, not ingestion time
[16:40:47] if you're doing T279698, yes, but you should use the processing time, not the ingestion time
[16:40:48] T279698: WDQS should retry when getting 404s - https://phabricator.wikimedia.org/T279698
[16:41:40] yea, that's the one. It does currently retry; i suspect the problem is just that it's a fast-retry loop, so it probably fails 5 times in 30ms each and gives up
[16:42:13] yeah, that will never work, but we could add a custom back-off to that
[16:43:00] I guess that's the question with Retryer: since that's used on the consumer side, are we trying to normalize and use similar code everywhere, or should this be one-off?
[16:43:20] could always add a sleep based on the retry count instead
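(A minimal sketch of the sleep-per-retry-count idea above, in Scala; the helper name, attempt count, and doubling schedule are illustrative assumptions, not the updater's actual code.)

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

// Retry an operation, sleeping between attempts with the delay doubling each
// time. Hypothetical sketch - not the existing retryableFetch loop.
@tailrec
def retryWithBackoff[T](attemptsLeft: Int, backoffMs: Long)(op: => T): T =
  Try(op) match {
    case Success(result) => result
    case Failure(_) if attemptsLeft > 1 =>
      Thread.sleep(backoffMs) // blocks the calling thread between attempts
      retryWithBackoff(attemptsLeft - 1, backoffMs * 2)(op)
    case Failure(e) => throw e
  }

// usage (fetchRevision is a stand-in for the real fetch call):
// retryWithBackoff(attemptsLeft = 5, backoffMs = 500)(fetchRevision(entityId))
```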
[16:43:36] back-off factor? isn't there one already
[16:43:37] ?
[16:43:51] no, it's just a tail recursive function that calls itself on exception
[16:43:56] ah, I see
[16:44:16] did not know that, thought that it would sleep a bit
[16:44:51] so the question is whether to generalize and put it at the Retryer level?
[16:45:12] we could leverage timers
[16:45:41] i wasn't sure yet if we should use the same retryer config the consumer is using, didn't look that close yet :) Suspect it would be different, but not entirely sure
[16:46:21] But yes, i suppose i saw a generalized retry handling and then also one-off implementations of retry handling, and was wondering if we should prefer the generalized solution so everything looks similar-ish
[16:46:24] the Retryer seems fairly generic and has a "wait" strategy
[16:46:36] but there might be impedance mismatches, i haven't written anything with Retryer yet :)
[16:46:58] first time I've opened this class :)
[16:47:15] that name feels weird :)
[16:47:24] lol, ok so it wasn't all that intentional to have a generalized retryer :)
[16:47:24] so I think it should only be for the flink app
[16:47:50] the other updater relies on 404 to say it's a delete
[16:48:06] so you'll most certainly hit a contradiction at some point
[16:49:12] so I think it's safer/easier to retry 404 only for the flink operator
[16:49:18] ok
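(One way the "retry 404 only in the flink operator" idea above could be expressed: treat a 404 as retryable only while the triggering event is recent enough that the revision may not be visible yet, and let older 404s propagate so delete handling stays untouched. The exception type, grace period, and helper name here are assumptions for illustration, not the project's classes.)

```scala
import java.time.{Duration, Instant}

// Hypothetical marker for a fetch that came back 404; not the updater's real exception.
final case class RevisionNotFound(entityId: String) extends RuntimeException(entityId)

// Retry a 404 only while the event is "recent", i.e. the revision may simply
// not have replicated yet; anything older is treated as a genuine miss.
def shouldRetry404(failure: Throwable,
                   eventTime: Instant,
                   now: Instant,
                   grace: Duration = Duration.ofSeconds(30)): Boolean =
  failure match {
    case _: RevisionNotFound => Duration.between(eventTime, now).compareTo(grace) <= 0
    case _                   => false
  }
```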
[16:49:37] it currently retries all Retryable, not just 404. checking what those are..
[16:50:07] there's also a retry mechanism at the http client level
[16:51:55] right, it's just recurring on retryableFetch now
[16:52:09] hmm, so many levels of fun :) I'm not sure about retrying 404 at the client level though, it would have to target specific urls to avoid adding a bunch of delay elsewhere
[16:52:25] (or maybe that client is only used for this one call, i dunno :P)
[16:52:35] it's not, at least not underneath
[16:52:38] yes, seems dangerous to alter the client
[16:52:42] it's the old repository
[16:52:52] we just wrapped around it
[16:53:13] i can just extend what's there, not particularly hard. Just wasn't sure if that's what we wanted
[16:54:33] anyway, timers should work here
[16:54:47] actually I'm not sure if that's the case
[16:54:55] not in an async operator
[16:55:04] they can't have state
[16:55:11] yeah, that's what made me pause
[16:55:42] hmm, since this is already inside an async future isn't all that handled elsewhere already?
[16:55:47] wdym they can't have state?
[16:56:09] ebernhardson: it is, but apparently the standard scala Future library doesn't have schedulers
[16:56:11] it's a limitation of the async operator
[16:56:17] oh
[16:59:23] timers require state handling?
[16:59:50] yes, I think so
[17:02:17] yeah, I think they are different types of functions
[17:04:05] yes, looks like it's a RichFunction which does not give you access to the timer service, only getRuntimeContext (which actually throws UnsupportedException)
[17:05:02] it feels weird that this doesn't let you schedule another execution after some time, feels like a very probable use case, retrying after some time without blocking a thread
[17:05:53] unless they assume the async client should handle that
[17:05:57] (maybe it's true)
[17:06:07] yes, from their API it's up to you to do the concurrent task
[17:06:55] they provide no "async mechanism" other than keeping track of what was sent/received
[17:09:10] scala doesn't have any timer, but java does, we could probably use that one
[17:13:24] sounds easy enough, looks like all flink cares about with async is that we eventually call the functions on their ResultFuture
[17:14:16] yep
[17:14:58] i also suspect this isn't currently doing anything async :P
[17:15:04] I wonder though - creating a separate ScheduledExecutorService just for retries feels weird
[17:15:57] imo it's fine to just sleep
[17:17:01] there's java.util.Timer that could be handy to avoid ScheduledExecutorService
[17:20:23] since the pipeline is blocked by the completion of these tasks I see no reason not to sleep, this executor service is not going to do more work if we don't sleep
[17:21:27] really? I thought that other revisions might finish what they're doing?
[17:22:40] the executor service should have enough threads to do all the concurrent tasks this async operator is configured to accept
[17:22:47] past that it'll block
[17:22:59] it looks like the Timer will use a single background thread anyways. Might as well sleep?
[17:23:00] because we care about ordering
[17:23:19] does ordering matter for separate entities?
[17:23:59] no, but we don't have a partition per entity
[17:24:08] right
[17:24:44] but ordering doesn't mean that it's blocking
[17:25:06] it's blocking until the concurrency limit of the async operator is reached
[17:25:07] it's only blocking the results, not the operation
[17:25:09] ryankemper: can we skip our meeting later today? This week is pretty full.
[17:26:52] dcausse: I'm confused (normal for concurrency) - if we have 4 operations that all will wait 10s because of eventual consistency, should it take 20 sec to finish?
[17:27:01] on a size-2 pool?
[17:27:32] so there's the concurrency limit of the async operator
[17:27:50] wouldn't it get down to 10s + small overhead if we release the thread while waiting?
[17:27:50] that will block if it has pending_tasks > concurrency_limit
[17:28:13] then there's the executor service
[17:28:38] worst case is always pending_tasks threads running
[17:29:02] so if the executor service has more threads than pending_tasks it cannot starve
[17:29:43] actually it's 2 * pending_tasks because we fetch 2 revisions per change
[17:32:25] ok, so if the limit is 2 now for async, and we add 10s waits at worst, won't that block our pipeline?
[17:33:15] ah, it isn't
[17:33:19] it's actually 60
[17:33:37] so, we apparently already have akka. is something like this viable: https://gist.github.com/viktorklang/9414163
[17:33:50] (it's a one-liner futures retry)
[17:34:12] fine by me :)
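(The gist linked above is roughly this shape: a non-blocking retry of a Scala Future that uses akka's scheduler, via akka.pattern.after, to delay the next attempt instead of sleeping a thread. A sketch assuming an ActorSystem is already available; the attempt count and delay policy are illustrative.)

```scala
import akka.actor.ActorSystem
import akka.pattern.after
import scala.concurrent.Future
import scala.concurrent.duration._

// Retry a Future-producing operation up to `attempts` times, scheduling each
// new attempt after `delay` on the akka scheduler rather than blocking a thread.
def retry[T](attempts: Int, delay: FiniteDuration)(op: () => Future[T])
            (implicit system: ActorSystem): Future[T] = {
  import system.dispatcher
  op().recoverWith {
    case _ if attempts > 1 =>
      after(delay, system.scheduler)(retry(attempts - 1, delay)(op))
  }
}

// usage: retry(attempts = 5, delay = 2.seconds)(() => fetchAsync(entityId, revision))
// where fetchAsync is a stand-in for the real async fetch.
```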
[17:34:57] anyway, need to drop off, I'll ponder the complexities of concurrency later
[17:35:08] like when I want to induce a headache...
[17:35:19] Dinner time
[17:42:31] dinner
[17:52:16] gehel: no worries!
[17:54:57] ryankemper: thanks!
[18:31:18] * ebernhardson isn't sure what to think of scala allowing ⇒ where => would be expected
[23:12:04] * ebernhardson sighs at a test case that stalls out and does nothing in maven, but runs fine in IDEA