[07:23:00] thinking about this JsonRowSerializationSchema issue, Erik was right, it might not have been a classloading issue in the end but rather the use of == for these constants; flink should either use .equals() or BasicTypeInfo should implement readResolve() to return these constants...
[08:34:28] Yes, at least the docs should state that type information should not be created after the graph has been created
[10:33:14] lunch
[12:59:07] .o/
[13:14:16] o/
[13:39:07] dcausse still working thru the flink config for helm. For values unique to codfw (like kafka brokers), should we put these in values.yaml and override? Or should we add them in each file (values.yaml, values-staging, etc.)?
[13:40:49] inflatador: there could be two layers of overrides if we want: values.yaml -> values-eqiad.yaml -> values-eqiad-wikidata.yaml
[13:42:11] kafka brokers might be put in values-eqiad.yaml such that they are re-used for wikidata (wdqs) and commons (wcqs)
[13:43:23] dcausse cool, I was indeed setting it up to have 2 layers of overrides.
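A sketch of the two-layer override described above (13:40), with hypothetical keys and broker names — each file only declares the keys it overrides, and later files win:

```yaml
# values.yaml — defaults shared by every deployment
kafka:
  brokers: []
---
# values-eqiad.yaml — site-level override, re-used by wdqs and wcqs
kafka:
  brokers:
    - kafka-main1001.eqiad.wmnet:9092
---
# values-eqiad-wikidata.yaml — release-level override on top of the site layer
consumer:
  topic: eqiad.mediawiki.revision-create
```

Helm merges values files in the order they are passed (e.g. `helm install ... -f values.yaml -f values-eqiad.yaml -f values-eqiad-wikidata.yaml`), with the right-most file taking precedence for any key set in more than one layer.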
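Going back to the readResolve() suggestion at 07:23 — a minimal sketch of the pattern, using a hypothetical TypeInfoConstant class in place of Flink's BasicTypeInfo. During deserialization, readResolve() substitutes the canonical constant for the freshly created instance, so reference comparisons with == keep working even after an object round-trips through serialization:

```java
import java.io.*;

// Hypothetical stand-in for a type-info constant such as BasicTypeInfo.STRING_TYPE_INFO.
public class TypeInfoConstant implements Serializable {
    private static final long serialVersionUID = 1L;

    public static final TypeInfoConstant STRING = new TypeInfoConstant("STRING");

    private final String name;

    private TypeInfoConstant(String name) {
        this.name = name;
    }

    // Without readResolve(), deserialization produces a fresh instance and
    // `deserialized == STRING` is false; with it, the canonical constant is
    // substituted and == comparisons against the constant remain valid.
    private Object readResolve() throws ObjectStreamException {
        return STRING;
    }

    public static void main(String[] args) throws Exception {
        // Serialize the constant...
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(STRING);
        }
        // ...and deserialize it: readResolve() hands back the same instance.
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            TypeInfoConstant back = (TypeInfoConstant) ois.readObject();
            System.out.println(back == STRING); // prints "true"
        }
    }
}
```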
Will use values-eqiad/codfw for those config keys
[15:40:35] i'll hop onto the call within 20 minutes (will be mostly in listen-while-code mode), but am going to take a little breather as we just wrapped another meeting
[16:07:40] workout, back in ~40
[16:39:01] going offline
[16:57:54] back
[17:16:02] lunch, back in ~1h
[18:01:01] back
[18:37:33] heh, consumer fell over because it couldn't write to the fetch error topic
[18:38:44] not sure how that's happening though, topics have always been auto-created
[18:49:44] oh, earlier on (and not captured in the flink ui): java.lang.OutOfMemoryError: Java heap space
[18:49:58] from kafka-producer-network-thread
[18:53:10] huh, also the producer is being created with plaintext even though it's set to ssl in values-staging.yaml :S fun times :)
[19:18:25] ┐('~`)┌
[19:45:48] oh look, my test page in relforge now says streaming updater :)
[19:55:23] we didn't really talk about next steps... i suppose we'll turn a few more wikis on in the producer but leave the consumer on testwiki
[20:27:41] ^ㅂ^
[20:28:33] I think the goal was to have a single wiki working in staging? After that, I'll have to check
[20:28:58] but if we've reached that milestone, congratulations!
[20:31:01] i dunno if it's entirely working, but it does send a write from end to end :)
[20:41:01] {◕ ◡ ◕}
[20:41:45] I guess we need to think about the next steps then... would you consider https://phabricator.wikimedia.org/T347075 complete? (Fine if not)
[20:48:44] hmm, i guess yes. it's deployed
[20:49:04] oh, i guess rerender events aren't on yet
[20:49:37] or maybe it is, i see a patch to turn it on for testwiki merged yesterday
[20:51:21] hmm, i only see canary events in kafka though. Not sure if it's because testwiki is low volume or it's not working
[21:02:03] can at least see that the EventBusBridge handler is registered, nothing coming out of EventBus on testwiki in logstash
[21:08:02] Assuming that works, what would we do next?
Maybe add a few more low-volume wikis to the test case or something?
[21:12:51] yea, something like that, mid-volume wikis. Enough that the updater is regularly doing something
[21:29:55] Should I look up how many kafka events the wdqs streaming updater gets vs cirrus search? I'm just wondering about how much compute we might need
[21:48:32] thanks for making the dashboard BTW https://grafana.wikimedia.org/d/jKqki4MSk/cirrus-streaming-updater?orgId=1
[21:48:57] it's a start anyway ;) I'll get a task up for that too
[22:04:43] inflatador: hard to say, re compute. The wdqs updater has to do a variety of actual work on the data, the cirrus updater is a bit simpler in that respect
[22:09:21] ebernhardson understood, I'll get a ticket up for resource estimation by EoW. Definitely SRE work, but if you think of any good ways to measure/load-test whatever, I'm all ears ;)