[07:11:38] dcausse: thanks for the fix on wdqs! And sorry you had to work so late. Please take half a day to recover!
[10:22:31] lunch
[13:16:57] o/
[15:44:18] Just destroyed rdf-streaming-updater in dse-k8s; patches for its permanent removal forthcoming
[16:01:54] workout, back in ~40
[16:59:12] sorry, been back
[17:39:47] lunch, back in ~40
[17:49:21] * ebernhardson is surprised to see the producer up to checkpoint 700+ with 0 restarts...so it might not be working but it's not dying anymore
[18:08:57] yea flink web ui confirms, 0 bytes received on all sources
[18:20:05] aww, the ticket to allow changing logging levels at runtime in flink was declined as out of scope
[18:20:52] apparently there is some way to change the configmap of a running container and have log4j pick that up...sounds fun :)
[18:29:40] sigh, the answer is I'm not very smart :P This is staging-eqiad, it's reading eqiad.*, and there are obviously no edits there...
[18:30:09] * ebernhardson finally realized this after finding data in prometheus verifying it has registered consumers for the correct topics
[18:40:50] sorry, been back
[18:44:57] meh, the example events in the schema repo have $schema with a leading /, the production events for outlinks do not. But the production events for mediawiki page changes do have the leading / :P
[19:43:00] curious...i can see the consumer lag for the producer moving up and down in prometheus, so it really looks like it's consuming the inputs. But the UI doesn't show any data moving, and there's nothing in the output topic..
[20:07:32] weird...are you sure you're looking at the right metrics?
[20:07:57] I guess it would be tough to mess that up on a net-new service
[20:24:43] the problem turns out to be that metrics connections are getting blocked by networkpolicy
[20:24:56] well, at least i think that's it :) If so this patch will fix the metrics: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/966926
[20:28:08] maybe i should make it conditional at first though, then we can deploy without risking breaking the other flink-app
[20:37:57] * ebernhardson still can't explain why the output topics are empty if the consumer offsets are moving and flink isn't complaining
[20:45:43] :eyes
[20:47:55] should be conditional now, waiting on ci to verify that it only changes the chart version number for the other apps
[20:48:17] it's not actually clear to me when the chart version number needs to change...i've kinda been incrementing it on any patch that touches the chart
[20:49:45] If it's only the rdf-streaming-updater in k8s we don't care...in process of undeploying that guy anyway
[20:50:32] it will change the chart version number in mw-page-content-change-enrich, but nothing changes for them
[20:55:11] Yeah, looks good...+1'd
[20:58:17] thanks
[21:12:57] NPE from UpdateEventMerger, i guess that's back to progress :) Doubt that's related to the metrics, but the metrics now give 0's instead of `...`
[21:15:36] * ebernhardson needs to learn how to get debug information out of this...no clue what the events that failed look like
[22:35:00] * ebernhardson ponders adding some kind of context or provenance object to UpdateEvent that we can aggregate debug information into to include with errors post-event merge
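
The provenance idea in the last message could look roughly like the sketch below. It is only an illustration of the pattern, not the actual updater code: the `UpdateEvent`, `Provenance`, and `SourceRef` classes here are hypothetical stand-ins with made-up fields, and `merge_events` is an assumed helper, not the real `UpdateEventMerger`. The point is that each event carries references to the input records it came from, and a merge concatenates them so an error raised after the merge can still name its inputs.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SourceRef:
    """Hypothetical pointer back to one input record (topic, partition, offset)."""
    topic: str
    partition: int
    offset: int


@dataclass(frozen=True)
class Provenance:
    """Hypothetical debug context that travels with an event."""
    sources: Tuple[SourceRef, ...] = ()

    def merge(self, other: "Provenance") -> "Provenance":
        # Merging keeps every input reference from both sides, in order.
        return Provenance(self.sources + other.sources)


@dataclass(frozen=True)
class UpdateEvent:
    """Simplified stand-in; the real event type has different fields."""
    entity_id: str
    revision: int
    provenance: Provenance


def merge_events(first: UpdateEvent, second: UpdateEvent) -> UpdateEvent:
    """Assumed merge rule: keep the newest revision, aggregate provenance."""
    if first.entity_id != second.entity_id:
        raise ValueError(
            f"cannot merge {describe(first)} with {describe(second)}"
        )
    return UpdateEvent(
        entity_id=first.entity_id,
        revision=max(first.revision, second.revision),
        provenance=first.provenance.merge(second.provenance),
    )


def describe(event: UpdateEvent) -> str:
    """Render an event with its input references, for use in error messages."""
    refs = ", ".join(
        f"{s.topic}[{s.partition}]@{s.offset}" for s in event.provenance.sources
    )
    return f"{event.entity_id} rev {event.revision} from {refs}"
```

With something like this in place, an exception handler around the merge step could log `describe(event)` and show which topics and offsets produced the failing event.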