[10:21:10] lunch
[13:14:17] o/
[14:07:50] dcausse I couldn't get the savepoint to restore on Friday, created https://phabricator.wikimedia.org/T345957 . I think you saw this already, but it hangs forever trying to restore from savepoint
[14:08:32] I still have suspicions about dse-k8s-services cluster health, but was wondering if there's a way to validate a savepoint?
[14:09:38] inflatador: if the savepoint was borked flink would scream I think, if it hangs I suspect some network issues accessing its dependencies
[14:12:00] dcausse sounds reasonable, esp. since Erik and I saw some DNS issues in that env. I'll focus on that for now
[14:21:24] inflatador: happy to pair on this if you want
[14:22:54] brouberol sorry, had to reschedule our 1x1, forgot I have weekly mtg at that time
[14:23:41] no worries
[14:29:38] inflatador/ryankemper: I'm doing some experiments about T342361 on wdqs1009. It is downtimed at the moment, with puppet disabled. If you need, you can re-enable. I'll try to get that completed by this evening
[14:29:39] T342361: Examine/refactor WDQS startup scripts - https://phabricator.wikimedia.org/T342361
[14:31:17] pfischer, dcausse: SUP meeting in https://meet.google.com/pup-xwxi-oqw
[14:51:05] errand
[15:47:26] dcausse: could you have a look at https://gerrit.wikimedia.org/r/c/schemas/event/primary/+/854572/, please? So far Gabriele only +1ed but he's currently the only one with +2-permissions. I pinged him to ask how to move on with this patch.
[15:47:48] pfischer: sure
[16:04:30] pfischer: re Gabriele's comment we could still use errored_stream_name in place of original_event_stream I suppose, but I agree that since we now late fetch we've lost track of the actual original event
[16:08:15] wondering if we should wait for the "late fetch" patch to stabilize before trying to consolidate this?
[16:13:16] actually we'll have to revisit this a bit esp. when you think about the process that'll have to read this stream and ship reconciliation events
[16:26:13] dinner
[16:49:11] dcausse do you remember why you were advised to use values-dse-k8s-eqiad.yaml instead of just values.yaml in the dse-k8s-services dir? I'm getting advice from e-lukey to get rid of that values-dse-k8s-eqiad.yaml file
[16:50:10] also, have a patch up for adding egress rules that should hopefully unblock us https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/956474/
[17:03:44] having some kind of ear issue that's making me dizzy...going to take the rest of the day off
[17:09:27] inflatador: what's the concern? I think the general idea (not 100% sure) was to make the values.yaml work in development environments, and have values-dse-... to add in all the prod specific stuff
[17:11:21] ahh, well take the day and we can continue tomorrow :)
[17:41:49] inflatador: like what Erik said and also because there might be different envs in wikikube (eqiad vs codfw vs staging) and different apps (wdqs vs wcqs via helm releases?) and experimenting with how to re-use/extend some of these values might be interesting to learn while we're in dse-k8s
[17:42:01] inflatador: take care, hope you get better soon!
[19:00:49] inflatador: good luck! I hope you feel better tomorrow!