[09:37:50] weekly update is out: https://wikitech.wikimedia.org/wiki/Search_Platform/Weekly_Updates/2023-10-06
[10:24:18] dcausse: I’m working on using flink’s async retry instead of our own. Now I’m wondering: how much sense does it make for a maximum lag of late events to influence the retry decision in a late-fetch scenario where the events are delayed by the preceding window length anyway? That maximum lag would always have to be > the window size to be meaningful.
[10:26:33] And what do we do with overly late events? Do we simply drop them?
[10:39:11] pfischer: the window won't delay all the events, only the first one that "opens" the window will be delayed by the window time, an event assigned to the window right before it closes would still have an event time "close" to the processing time, I agree that it now becomes less likely to have events with an age < lag_to_retry_on_missing_rev
[10:40:18] regarding late events entering the window they still pass thru (the late sideoutput of the window is re-injected in the main datastream)
[10:41:23] if you believe they could cause harm another approach can be taken by routing them to another kafka topic to feed a "reconciliation" mechanism
[13:13:42] o/
[14:50:29] \o
[14:53:05] o/
[15:06:27] dcausse looking at https://phabricator.wikimedia.org/T342149#9230869 now, I'll try pinging service ops in the ticket... also asked in k8s-sig IRC
[15:12:38] hopefully we don't need special handling for cluster upgrades, since that was the whole point of using ZK instead of native k8s HA ;(
[15:22:08] looks like there is a cookbook for cluster upgrades: https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/k8s/upgrade-cluster.py
[16:05:38] Workout, back in ~40
[16:49:40] back
[17:00:04] o/
[17:03:16] inflatador: think we can get the secret put in place in the puppet private repo today?
[17:09:04] ebernhardson sorry, I guess I forgot. Is there a gerrit link or do I just need to commit it to the repo?
[17:10:29] related to https://phabricator.wikimedia.org/T342620 it seems?
[17:10:58] inflatador: sadly i can't see or make any changes to puppet private, best i can do is point you to the hiera keys i see used in puppet
[17:11:54] understood, I just want to make sure we're on the same page. It should be identical to https://gerrit.wikimedia.org/r/c/labs/private/+/949944/ ...I can commit the creds no problem
[17:12:12] Just wondering if there's something else we need to do to get it to render in the right place for k8s to consume
[17:12:13] most likely the existing secret key is in profile::thanos::swift::account_keys::search_update_pipeline and it needs to be copied to profile::kubernetes::deployment_server_secrets::services_secrets. I'm not sure of the exact format of services_secrets, but there are probably examples to look at
[17:12:35] inflatador: putting it in deployment_server_secrets will get it rendered into /etc/helmfile-defaults/private/ and the helmfile.yaml then refers to it
[17:13:10] ACK, will get to work on that
[17:15:21] if you run into some issues we can video call or whatever to figure it out
[17:16:27] ACK, maybe around 1 PM your time if I can't figure it out by then?
[17:25:57] sure
[17:38:23] OK, I think I got it now...CR to public "private" repo forthcoming
[17:39:44] looks like a lot of our stuff was never committed there...bah
[17:41:34] (or anyone else's for that matter)
[17:49:08] * inflatador begins to wonder if it's worth it to do a public "private" repo update
[17:51:33] inflatador: i suspect this one wouldn't do anything, usually the public "private" repo is for things that puppet has to look up so CI doesn't fail
[17:51:43] or for cloud
[17:52:06] ah, I thought it was important to keep track of what was where...in that case I'll skip it
[18:09:39] ebernhardson OK, committed. Going AFK but should be back in ~1h
[18:11:27] inflatador: thanks!
[18:47:40] so IIRC the private repo that exists only on disk on the puppetmasters is where the actual secrets live, but some changes also require a parallel patch to the "public private" repo that just has dummy data for hiera and the like
[18:50:31] oh yeah just realized e.bernhardson already said that :P
[18:50:46] ryankemper , I thought it was all changes, but I was wrong about that
[18:51:06] ebernhardson how'd it go, were you able to pull the secret?
[18:59:26] inflatador: oh lemme check if they are there now
[18:59:49] inflatador: hmm, no cirrus-streaming-updater directory in /etc/helmfile-defaults/private. :(
[19:01:02] ah, I called it 'search-update-pipeline' as that's what it was called in thanos. Let me fix that up
[19:01:44] of course that's not there either ;(
[19:02:59] i suppose the directory name is arbitrary, i simply put the string in the helmfile. But it looked like convention was to match the service name
[19:03:42] looks like I need to roll back anyway
[19:03:46] https://meet.google.com/qtf-yoqt-tmu if you want to join
[19:04:11] ryankemper ^^ Let's skip prom training this wk in favor of fixing the repo
[19:04:33] inflatador: was about to propose the same
[20:51:26] deployed cirrus-streaming-updater to staging, no success getting anything to run. Not sure why yet. But it's something :)
[21:08:59] I poked in k8s-sig IRC but I'm sure it'll be a few days