[09:48:57] weekly update posted: https://wikitech.wikimedia.org/wiki/Search_Platform/Weekly_Updates/2023-11-03
[13:51:47] rebooting cloudelastic1005 to clear up the GC alerts we've been getting
[14:43:46] ebernhardson: I created a naive alternative serializer based on generic kryo serializer; writing a full-blown custom serializer that delegates to an underlying row-serializer turned out rather tricky.
[14:43:48] https://gitlab.wikimedia.org/repos/search-platform/cirrus-streaming-updater/-/merge_requests/53
[14:48:40] pfischer this ticket came up on my radar; not sure if it's relevant but sharing just in case https://phabricator.wikimedia.org/T338231
[15:00:15] \o
[15:06:00] pfischer: sadly i've had no luck narrowing down the source event, the logs did include one partially serialized form in one of the errors but it wasn't clear what it was for. wikiid was a blank string :S I'm hoping that's more in the bad ser/de area because i couldn't find a related event
[15:25:59] i suppose i could set up a test case that doesn't typically run and takes a test file from kafkacat, shouldn't be too hard to collect all the events and run them through
[16:00:55] workout, back in ~40
[16:51:13] back
[16:52:40] \o
[16:52:45] lunch
[18:36:31] sorry, been back
[18:39:20] compare-clusters.py is bombing out on mwmaint2002...error is `TypeError: memoryview: a bytes-like object is required, not 'str'` , and the error comes from `"/usr/lib/python3.7/subprocess.py"` ... too old of a python or what?
[18:39:48] inflatador: sadly yea, that was written with python2
[18:41:23] ebernhardson np, will take a crack at updating.
[18:45:16] appointment, back in ~90m
[19:29:19] * ebernhardson manages to run producer in intellij debug while reading from kafka-main (via grepplabs/kafka-proxy + socks5). And now realize that UpdateEvent.KEY_SELECTOR NPE is because the UpdateEvent itself is null :S
[19:54:53] certainly some form of corruption. An UpdateEvent with meta.uri of `https://de.wikipedia.org/wiki/Benutzer_Diskussion:Qcomp/Bell-Test` has a pageTitle that includes plenty of nulls and contains `r_Diskussion:Qcomp/Bel` prior to the first null. Something before the page title read too much
[20:04:46] but of course, find the event by the request_id, run it through a test case to deserialize from json and round-trip through the internal serialization ... and all is fine :P
[20:06:49] feels like multiple threads writing to the same buffer, or other weirdness...hmm
[20:10:45] ebernhardson: congrats on the kafka-main-2-local setup. I was hoping for something like that but when I tried it the last time a few years ago, that didn't work since the brokers advertised themselves under different domain names. Will look into that proxy.
[20:11:57] ebernhardson: What do you mean by, the pageTitle contained nulls?
[20:12:22] pfischer: i mean using the debug inspector on the pageTitle in UpdateEvent had at least 15 or 20 null chars in it
[20:13:07] but i managed to pull that same event out of kafka and run it through a simple test round trip and everything is fine. Suggests it's not the content itself, but something else
[20:15:23] mostly by setting breakpoints on the exceptions it throws, then going up the stack to PojoSerializer. I suppose this is mid-deserialization, maybe flink is re-using an object?
[20:15:35] Hm, serializers may support reuse of objects, would be interesting to know, if the PojoSerializer
[20:16:01] ... does that. Same thought
[20:22:42] approximate process of setting up proxy: https://gitlab.wikimedia.org/-/snippets/102
[20:24:28] back
[20:25:28] M3,Mnuch1n,$2
[20:27:11] ebernhardson: thanks. According to the docs, object reuse must be enabled explicitly via ExecutionConfig.enableObjectReuse() I can't look at the code right now, but afaik we don't call this method
[20:29:50] pfischer: indeed, we don't
[20:35:26] welp, better change that pw ;)
[20:35:31] Just a wild guess: Maybe the S3/swift backend leads to the corruption. Could we deploy with local disks as savepoint store?
[20:36:21] pfischer: hmm, i suppose i'm not sure but i'm assuming since i didn't configure anything that's what it does when running the application locally
[20:37:41] that was my home desktop...KVM fail
[20:37:59] inflatador: happens :) wasn't quite enough special characters to be perl
[21:06:13] ;P
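
Note on the 18:39 `compare-clusters.py` error: a `memoryview: a bytes-like object is required, not 'str'` raised from `subprocess.py` is what Python 3 typically reports when a `str` is handed to `communicate()` on a pipe opened in binary mode, which python2-era code does routinely. The script itself is not shown in the log, so this is only a guess at the cause. A minimal sketch of the usual fix follows; `pipe_through` and the `UPPER` child command are hypothetical names made up for illustration, not anything from the real script.

```python
import subprocess
import sys


def pipe_through(cmd, text):
    """Send `text` to the stdin of `cmd` and return its stdout as a str.

    Hypothetical helper: in Python 3 pipes carry bytes by default, so the
    str is encoded before writing and the output decoded after reading.
    (Passing text=True to Popen, available on 3.7+, is an alternative.)
    """
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    out, _ = proc.communicate(text.encode("utf-8"))
    return out.decode("utf-8")


# Hypothetical child process that upper-cases whatever arrives on stdin.
UPPER = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]

if __name__ == "__main__":
    print(pipe_through(UPPER, "abc"))
```

If the python2 code passes strings to `communicate()` in several places, encoding at each call site (or switching the `Popen` calls to text mode) should be enough for this particular error; other python2 idioms in the script would still need their own fixes.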