[08:57:32] dcausse: Just a quick question regarding your cross-index-move (https://gitlab.wikimedia.org/repos/search-platform/cirrus-streaming-updater/-/merge_requests/142/diffs#4724ccfa42b4c4396ecfede12d9b7661a5e702c4_0_16): Is there a reason (object references causing issues, etc.) against using UpdateEvent#toBuilder on source in DeleteCrossIndexMovedPage?
[08:58:41] pfischer: you mean event.setTargetDocument(source.getMovedFrom().toBuilder().build()); ?
[08:59:49] I believe there's no issue with our current pipeline, but on second thought it sounded weird to share the same ref between two events
[09:00:57] esp. since the object is mutable
[09:05:48] Yeah, I remember that Erik ran into a reference issue once. I meant line 16 in DeleteCrossIndexMovedPage, where the existing source is cloned to represent the PAGE_DELETE
[09:09:20] ah, you meant possibly reusing & mutating the source object after a first call to "collect"
[09:10:33] perhaps it "works" if serialization happens synchronously behind the collect method, but this sounds dangerous
[09:11:07] it would certainly be a problem if we ever enable object re-use (https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/execution/execution_configuration/)
[09:14:20] Okay, that’s what I wanted to understand. It’s probably cleaner to start with a fresh instance instead of unsetting unwanted references. Thanks!
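(Editor's note: a minimal, self-contained sketch of the aliasing hazard discussed above. `Event` and its fields are hypothetical stand-ins for the real UpdateEvent, and a plain List plays the role of Flink's Collector. In a real Flink job without object re-use, a serialization boundary between operators can mask the bug, which is exactly why it may appear to "work".)

```java
// Editor's sketch (hypothetical names): demonstrates why sharing one mutable
// event instance across two collect() calls is unsafe. With Flink's
// enableObjectReuse(), or chained operators that defer serialization, the
// first emitted record can still alias the object we mutate afterwards.
import java.util.ArrayList;
import java.util.List;

public class ObjectReuseHazard {

    static final class Event {
        String changeType;
        Event(String changeType) { this.changeType = changeType; }
        // Stand-in for a generated toBuilder().build(): a defensive copy.
        Event copy() { return new Event(changeType); }
    }

    public static void main(String[] args) {
        List<Event> collected = new ArrayList<>(); // plays the role of Collector#collect

        // UNSAFE: emit, then mutate the same reference and emit again.
        Event shared = new Event("PAGE_DELETE");
        collected.add(shared);
        shared.changeType = "REV_BASED_UPDATE"; // silently rewrites the record emitted above
        collected.add(shared);
        System.out.println(collected.get(0).changeType); // REV_BASED_UPDATE, not PAGE_DELETE

        // SAFE: each emitted record gets its own instance (fresh clone first).
        collected.clear();
        Event source = new Event("REV_BASED_UPDATE");
        Event delete = source.copy();
        delete.changeType = "PAGE_DELETE";
        collected.add(delete);
        collected.add(source);
        System.out.println(collected.get(0).changeType); // PAGE_DELETE, as intended
    }
}
```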
[09:55:12] hm.. seems like the consumer@cloudelastic is in a crashloop (java.lang.RuntimeException: SplitFetcher thread 0 received unexpected exception while polling the records)
[09:55:40] root cause is in the saneitizer, around org.wikimedia.discovery.cirrus.updater.consumer.sanity.SiteInfoMaxPageIdLookup.apply fetching some data from mediawiki
[09:57:53] pfischer: I see you deployed the flink app a couple of minutes before, could this be related to your deploy?
[10:04:31] https://logstash.wikimedia.org/app/discover#/doc/0fade920-6712-11eb-8327-370b46f9e7a5/ecs-k8s-1-1.11.0-6-2024.26?id=R4JFXpAB0bw1V4rvppVg
[10:05:44] something I don't get... java.util.concurrent.TimeoutException (extends Exception) is thrown but I don't see it anywhere in the method signature...
[10:07:23] ah, it's because of @SneakyThrows in HttpClientFactory.java:217
[10:10:14] seems like it's the in-process rate limiter that is timing out
[10:12:21] perhaps we should wrap that TimeoutException in an IOException to give the caller a chance to retry?
[10:17:01] lunch
[10:41:45] dcausse: Yes, I wanted to see the effect of reduced client-side rate-limiting, since with 500 req/s per instance we still got 429s.
[10:45:24] Obviously, we can’t go too low with the hard-coded timeout of 5s to obtain a rate-limit permit
[10:45:44] I re-deployed with a higher overall client-side rate limit
[11:26:16] Seems to work with 500 req/s overall (250 per instance)
[13:40:39] dcausse / ryankemper: can we consider T349069 done?
[13:40:40] T349069: Design and implement a WDQS data-reload mechanism that sources its data from HDFS instead of the snapshot servers - https://phabricator.wikimedia.org/T349069
[13:41:07] gehel: yes I think so
[13:41:26] yup
[14:11:20] dcausse: would you have time to jump in a meet to review the closing of our Graph Split KR?
[14:11:36] gehel: sure
[14:11:42] https://meet.google.com/bzt-hyht-sqh
[14:15:31] ryankemper: do we have more work on T364077?
[14:15:31] T364077: Adapt the wdqs data-transfer cookbook to operate with federated subgraphs - https://phabricator.wikimedia.org/T364077
[15:40:19] going offline, have a nice weekend
[15:48:23] have a good weekend!
[17:16:47] gehel: I suspect the ticket is done, but I haven't run a test data-transfer cookbook yet, so I'll do that before marking it done
[18:37:33] ryankemper: thanks!
[19:58:21] dr0ptp4kt: I hit my data cap, so I'm on 2G speeds. Killer.sh is responsive, but I will dial into Meet on the phone; IRC preferred over Slack/Meet for comms
[20:33:08] inflatador: https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/
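(Editor's note: a sketch of the @SneakyThrows surprise and the proposed IOException wrapping discussed at 10:05–10:12 above. The real HttpClientFactory's rate limiter is not shown in the log; Guava's RateLimiter and all class/method names here are assumptions for illustration only.)

```java
// Editor's sketch (assumed APIs): how Lombok's @SneakyThrows lets a checked
// TimeoutException escape a method whose signature declares nothing, and the
// proposed fix of wrapping it in an IOException so retrying callers can act.
// Guava's RateLimiter stands in for the real in-process rate limiter.
import java.io.IOException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

import com.google.common.util.concurrent.RateLimiter;
import lombok.SneakyThrows;

public class RateLimitedFetcher {

    private final RateLimiter limiter = RateLimiter.create(250.0); // 250 req/s per instance

    // Current shape: the checked TimeoutException propagates at runtime, but
    // @SneakyThrows keeps it out of the signature, so neither javac nor the
    // caller ever sees it -- it surfaces as an unexpected crash upstream.
    @SneakyThrows
    public String fetchSneaky(String url) {
        if (!limiter.tryAcquire(5, TimeUnit.SECONDS)) { // hard-coded 5s permit timeout
            throw new TimeoutException("timed out waiting for a rate-limit permit: " + url);
        }
        return doHttpGet(url);
    }

    // Proposed alternative: surface the timeout as an IOException so callers
    // that already retry on I/O failures get a chance to retry this case too.
    public String fetchRetriable(String url) throws IOException {
        if (!limiter.tryAcquire(5, TimeUnit.SECONDS)) {
            throw new IOException(
                new TimeoutException("timed out waiting for a rate-limit permit: " + url));
        }
        return doHttpGet(url);
    }

    private String doHttpGet(String url) throws IOException {
        return ""; // placeholder for the real HTTP call against MediaWiki
    }
}
```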