[12:52:26] <elukey>	 jayme: o/ I was reading T353233, do you think that we should review ml-serve's control plane too?
[12:54:31] <jayme>	 elukey: you probably have way less nodes - but won't hurt to check I'd say
[13:02:27] <elukey>	 jayme: ack, anything relevant that I have to pay attention to? I'll try to review and possibly expand, we'll get 8 more nodes on each cluster next Q
[13:02:38] <claime>	 Don't let typha break :p
[13:02:51] <elukey>	 that is the goal :D
[13:03:11] <elukey>	 I mean besides memory/cpu usage etc..
[13:55:45] <jayme>	 elukey: I was only looking at CPU / Mem usage on the control planes really and saw big CPU spikes during mw deployments for example
[13:56:04] <jayme>	 plus an elevated base CPU usage with the cluster growing
[13:56:40] <jayme>	 we'll probably switch from ganeti to hardware nodes soon'ish - also because of the IOPS requirements for etcd
[14:12:35] <elukey>	 ack thanks!
[15:59:29] * inflatador wonders if it would be possible to create a "hi iops" ganeti tier...at least, something that's not RAID-5 ;)
[16:27:50] <klausman>	 jayme: I have a question about change 983191. While the kubeControllers explicitly have their CPU limit set to ~, the typha ones inherit the limit from the main.yaml file (which is also ~). Is there a reason why the limit is explicitly set for one but not the other?
[16:28:37] <akosiaris>	 FYI: for https://phabricator.wikimedia.org/T352906 I have bumped the global envoy image version to 1.23.10-2-s4-20231203 (it includes latest patches + WMF CAs). In a merely tangential change, I have bumped all charts to utilize the newer patch levels (x.y.Z) of mesh.configuration in order to utilize the CA bundle that includes public CAs and not
[16:28:37] <akosiaris>	 just WMF CAs
[16:28:56] <akosiaris>	 I've upgraded all of wikikube services, don't be alarmed if you notice those 2 changes in your own services
[16:30:05] <akosiaris>	 inflatador: the RAID5 thing there is just so there is some redundancy and we don't end up with a major and confusing to debug issue if 1 disk fails. It can be any RAID level we want, including JBOD, raid6, raid10, etc
[16:31:08] <akosiaris>	 we do have redundancy on the VM level, we can always just failover it to the secondary node and restore the service ofc. RAID5 isn't there for service redundancy.
[16:31:24] <elukey>	 akosiaris: for my own ignorance - ca-certificates.crt contains public and WMF-internal CA certs?
[16:31:42] <inflatador>	 akosiaris ACK, do we have ganeti tiers already? Or is that just a possibility?
[16:35:22] <akosiaris>	 inflatador: yes. The PoP ganeti clusters, IIRC have RAID1 (and just 2 disks) cause they are supposed to have just a couple of core services, essential to a PoP.
[16:35:35] <akosiaris>	 elukey: yes
[16:36:06] <elukey>	 super