[10:41:42] errand+lunch
[14:19:00] .o/
[14:35:13] o/
[14:40:16] is "o/" a greeting?
[14:40:36] new person here - I thought I'd use one of my "obvious question tokens" :D
[14:43:20] dcausse i'm catching up wrt your plan to expose rdf streams via the EventStreams service. There are wrinkles around datacenter switchovers and managing state - and reconnect - of SSE clients. ottomata mentioned you have some plans on how to address that. Do you have any pointers? Very curious about your approach.
[14:43:28] Infra SRE had a similar requirement https://phabricator.wikimedia.org/T376014#10253022
[14:45:06] gmodena indeed! my IRC client has some macros that do it called "wave1, wave2" etc
[14:45:43] inflatador ack! nice :)
[14:48:48] gmodena: I have not written that down yet, but the rough idea is to introduce a notion of "replicated topic" so that you can fall back to timestamp on the corresponding topic if they don't match
[14:50:01] should be transparent to the client hopefully
[14:50:46] but my approach is to solve an issue regarding streams that are active/active double compute
[14:50:51] dcausse makes sense - with timestamp fallbacks delivery semantics would be at least once, right?
[14:51:08] hopefully yes
[14:51:30] i think single compute is less of an issue in this specific case, since we don't switch those over
[14:52:55] yes but I'm curious how that works if you use offsets in that case with single compute, client -> http://eventstream@eqiad, then http://eventstream switches to codfw
[14:53:21] so you'll get the same set of topics but from a different kafka cluster for which the offsets are unlikely to match
[14:59:07] That remains tricky. IIRC ottomata pointed me to https://kafka.apache.org/25/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#offsetsForTimes-java.util.Map- (to be used in conjunction with last-timestamp-id marker)
[14:59:36] but tbh I did not think this problem through yet
[15:08:16] yes me neither...
need to learn a bit more about http eventstreams
[15:11:50] happy to brainstorm if you'd like. On paper I'm (still) one of the EventStreams maintainers =)
[15:14:53] sure, thanks! :)
[15:28:25] dcausse: I've invited you to the WE3.1 steering committee, but feel free to skip if you have more interesting things to do. This should be mostly informational and I can fill you in afterward if anything comes up (ebernhardson too)
[15:28:48] gehel: sounds good, I'll skip then :)
[15:56:43] dcausse I applied puppet and restarted blazegraph on wdqs1011 if you want to check it out
[15:59:06] inflatador: thanks! looking
[16:00:30] the wed meeting is one hour later than usual?
[16:00:44] looks like it, i was kinda planning on normal time :)
[16:00:55] yes me too, will join
[16:02:38] Trey314159, gmodena we're in https://meet.google.com/eki-rafx-cxi?authuser=0 if you're interested
[16:04:28] gmodena: https://gitlab.wikimedia.org/repos/data-engineering/kafkasse/#kafkasse
[16:04:44] > KafkaSSE can be configured to always use timestamps instead of offsets in the EventSource event
[16:04:44] id field via the useTimestampForId option. If this option is true, each EventSource event id
[16:04:44] (which automatically is used for the Last-Event-ID header) will be set with the Kafka message
[16:04:44] timestamp instead of offset. This is less precise than using offsets, but is better if
[16:04:44] you need to hide the underlying Kafka cluster's message offsets to support multi-DC.
[16:05:19] the Last-Event-ID header given back on each connection request will look like the example here
[16:05:19] https://gitlab.wikimedia.org/repos/data-engineering/kafkasse/#kafkasse
[16:05:21] oops
[16:05:25] here:
[16:05:25] https://gitlab.wikimedia.org/repos/data-engineering/kafkasse/#notes-on-kafka-consumer-state
[16:05:31] but with timestamp instead of offset
[16:05:50] I messed up my calendar. I thought there was a P&T meeting today in place of the Wednesday meeting.
[16:06:09] looks like blazegraph is running with the new arg `-Dwdqs.event-sender-filter.graph-name=wikidata_full` on wdqs1011
[16:06:16] as long as the topics and partitions used in each dc are the same, then the switch should be transparent. maybe you'll get some duplicates because of discrepancies in the timestamp offset lookup
[16:06:24] but. for this use case, where the topics will not be the same...
[16:06:45] eventstreams or, maybe kafkasse, will need some changes
[16:13:26] I'm running puppet across the w[cd]qs hosts and will do a rolling restart on all hosts shortly thereafter (we need to restart BG anyway, see T377938)
[16:14:45] @dcausse what if...you just picked one of the topics and never used the other? or maybe made eventstreams configurably use a provided topic lookup function instead of this code? https://gitlab.wikimedia.org/repos/data-engineering/eventstreams/-/blob/master/routes/stream.js?ref_type=heads#L366
[16:15:48] if you just chose one, e.g. eqiad. topic in eventstreamconfig, or hardcoded and overridden in eventstreams service values file config to set topic, then it should just work.
[16:16:14] you'd then be relying on the single compute of only one dc...but this seems okay for this purpose, no?
[16:17:27] ottomata: yes but this does not solve the switch-over problem I think?
[16:30:18] inflatador: all good for wdqs1011, I can see the new field in the events now
[16:30:59] dcausse ACK, I'm restarting wcqs now. If you think I should wait before restarting all of wdqs let me know. Otherwise I'll start once wcqs is done
[16:31:42] inflatador: no, should be good to go for all the nodes
[16:32:20] dcausse ACK, will move ahead then
[16:48:04] workout, back in ~40
[18:46:55] dinner
[19:05:04] lunch, back in ~40
[21:16:34] well...been back for several hrs, but taking another break now ;)
[22:08:16] oh yeah, and been back here
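
Note on the timestamp fallback discussed above: the idea of resuming on a different Kafka cluster by timestamp rather than by offset can be sketched roughly as below. This is a toy simulation, not EventStreams or KafkaSSE code. The in-memory logs, the `example.topic` name, and the `offset_for_time` and `resume` helpers are all hypothetical, and the Last-Event-ID field names are assumptions modelled on the KafkaSSE README. The lookup rule (earliest offset whose timestamp is greater than or equal to the requested one) mirrors the documented behaviour of `KafkaConsumer.offsetsForTimes`.

```python
import json
from bisect import bisect_left

# Hypothetical logs for the same topic-partition in two datacenters.
# Each record is (offset, timestamp_ms). The timestamps line up, the offsets do not.
EQIAD_LOG = [(100, 1000), (101, 2000), (102, 3000), (103, 4000)]
CODFW_LOG = [(7, 1000), (8, 2000), (9, 3000), (10, 4000)]


def offset_for_time(log, timestamp_ms):
    """Return the earliest offset whose timestamp is >= timestamp_ms, or None.

    Assumes the log is sorted by timestamp, as in this toy example.
    """
    timestamps = [ts for _, ts in log]
    i = bisect_left(timestamps, timestamp_ms)
    return log[i][0] if i < len(log) else None


def resume(log, last_event_id):
    """Return the records a reconnecting client would receive from this log."""
    assignment = json.loads(last_event_id)[0]
    start = offset_for_time(log, assignment["timestamp"])
    if start is None:
        return []
    return [record for record in log if record[0] >= start]


# The client last saw the eqiad message with timestamp 2000 (offset 101 there).
# Assumed Last-Event-ID shape: a JSON array of topic/partition/timestamp objects.
last_event_id = json.dumps(
    [{"topic": "example.topic", "partition": 0, "timestamp": 2000}]
)

# After a switchover the client reconnects to codfw with the same header.
replayed = resume(CODFW_LOG, last_event_id)
print(replayed)
```

In this sketch the message with timestamp 2000 is delivered a second time after the switch, which is the at-least-once behaviour mentioned in the conversation. It only works when both clusters carry the same topics and partitions, which is the caveat raised at 16:06:16.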