[00:20:05] serviceops, SRE, Traffic, envoy, Patch-For-Review: Upgrade Envoy to supported version - https://phabricator.wikimedia.org/T300324 (RLazarus) As in T300324#7752134, I've rolled out all the k8s services where Envoy version was the only diff. We're now up to 1.18 everywhere, except for k8s servi...
[00:20:15] serviceops, SRE, Traffic, envoy, Patch-For-Review: Upgrade Envoy to supported version - https://phabricator.wikimedia.org/T300324 (RLazarus)
[06:46:18] good morning folks
[06:46:28] going to reimage kubernetes1007 with https://gerrit.wikimedia.org/r/c/operations/puppet/+/770440
[07:06:42] (way easier to reimage non-vm nodes :D)
[07:57:32] folks kubernetes1007 back in service
[07:58:12] I just restarted two appservers (mw144[8,9]) since there were some deployment issues, it seemed all opcache related (opcache info for one of the nodes https://phabricator.wikimedia.org/P22926)
[07:58:28] while doing it I noticed that 3 appservers are currently depooled in eqiad
[07:58:49] https://config-master.wikimedia.org/pybal/eqiad/api-https
[07:59:21] but I don't find why in phab/SAL
[08:07:25] anyway, I am going to reimage kubernetes1008 :D
[08:55:22] 1008 done!
uncordoned and repooled
[08:55:38] 7 left to be reimaged
[09:09:10] serviceops, Maps: Index integrity check on maps cluster - https://phabricator.wikimedia.org/T304405 (Jgiannelos)
[09:34:15] jayme: look what I found when trying to look up the error that I am getting when building istio
[09:34:18] https://github.com/istio/istio/issues/32978
[09:34:19] loooool
[09:34:28] this is why it seemed familiar
[09:35:59] of course the branch seems not there anymore
[09:38:32] the extra commit should be https://github.com/istio/istio/commit/98a6168b48fd02c2e3d7cc819067d50044004ba5
[09:54:15] the funny thing is that the 1.9.5-patch branch that we use in the istio-build docker image is not available anymore on github :(
[10:34:09] serviceops, Maps, Product-Infrastructure-Team-Backlog, Patch-For-Review, User-jijiki: Cleanup kartographer default styles in mediawiki config - https://phabricator.wikimedia.org/T298249 (Jgiannelos)
[10:35:34] oh, well. That's funny and very unfortunate
[10:35:56] elukey: do you have it locally by chance?
[10:37:06] (I don't)
[10:37:24] jayme: nope, but the extra commit is available, I am building the istio 1.9.5 release with it
[10:37:30] it should be the same
[10:37:37] ah, okay
[10:38:02] should we migrate to a more up-to-date version though? yes :D
[10:38:27] serviceops, Infrastructure-Foundations, SRE, Patch-For-Review: Cert renewal for {appserver,api}.svc.{eqiad,codfw}.wmnet - https://phabricator.wikimedia.org/T304237 (JMeybohm)
[10:39:48] serviceops, Infrastructure-Foundations, SRE, Patch-For-Review: Cert renewal for {appserver,api}.svc.{eqiad,codfw}.wmnet - https://phabricator.wikimedia.org/T304237 (JMeybohm) Certs have been renewed (with cergen managed ones). Thanks @Joe for pairing!
[10:56:43] serviceops, Infrastructure-Foundations, SRE, Patch-For-Review: Cert renewal for {appserver,api}.svc.{eqiad,codfw}.wmnet - https://phabricator.wikimedia.org/T304237 (Volans) Thanks!
I think we can now destroy the ones in the Puppet CA mentioned in T304237#7790839 at this point.
[12:54:46] serviceops, SRE, Znuny, Patch-For-Review: Move VTRS db passwords to a different hiera location - https://phabricator.wikimedia.org/T303272 (jbond) The change has been made on the private repo ` git show b9303238 [12:52:...
[14:09:48] serviceops, SRE, Wikimedia-production-error: PHP7 corruption reports in 2020-2022 (Call on wrong object, etc.) - https://phabricator.wikimedia.org/T245183 (Krinkle)
[14:10:31] serviceops, SRE, Wikimedia-production-error: PHP7 corruption reports in 2020-2022 (Call on wrong object, etc.) - https://phabricator.wikimedia.org/T245183 (Krinkle)
[14:11:03] serviceops, SRE, Wikimedia-production-error: PHP7 corruption reports in 2020-2022 (Call on wrong object, etc.) - https://phabricator.wikimedia.org/T245183 (Krinkle)
[15:29:37] folks quick poll
[15:29:57] I am wondering if calico::cni in puppet could be generalized
[15:30:11] so that multiple cni plugins can be added, etc..
[15:30:20] (in my case, the istio-cni ones)
[15:31:15] ideally the path /etc/cni/net.d/10-calico.conflist could have a more generic name, I'd need to chain calico and istio
[15:31:27] but if I change that name I'd probably need to restart the kubelets
[15:31:48] and I am not entirely sure if this can be done safely without draining etc.. first
[15:32:01] (never done it so if anybody has experience lemme know :)
[15:32:12] otherwise I can generalize the puppet class and keep the old name
[15:32:14] with some commend
[15:32:18] *comment
[15:32:24] (the old filename I mean)
[15:36:00] or I can simply keep the calico::cni class and add istio-specific bits
[15:36:05] for the moment at least
[15:36:12] what is the general preference?
[15:39:07] I'd opt for generalizing. AFAIK /restarting/ kubelet does not affect happy pods (should really test that in staging-codfw :)). But I've no idea if that also applies if things around cni change...
[15:40:32] I guess that's an easy thing to test as well. Nevertheless I would make sure to restart kubelet one-by-one to not wreck multiple nodes if something goes wrong
[15:43:19] definitely, we can do one-by-one
[15:43:51] ok so I'll try to come up with something that can be rolled out per-cluster
[16:20:51] do you even need to restart the kubelets?
[16:21:09] * akosiaris trying to remember
[16:22:00] akosiaris: o/ I supposed so if we change the config file no?
[16:22:24] or does it pick what it finds under /etc/cni/.. every time a pod starts?
[16:22:34] yeah, I don't think you need to. I think it does the latter
[16:22:41] ah nice
[16:22:52] also, no need to change the name, you can just add 10-istio.conf
[16:24:02] e.g. look at a podman /etc/cni/net.d I have in a personal box
[16:24:04] akosiaris: but I'd need to chain it to the calico plugin IIUC, so in my tests I added it in the same file
[16:24:34] 87-podman-bridge.conflist
[16:24:34] 87-podman-ptp.conflist
[16:25:56] hmm if you want to chain it, then yeah you might need to alter the plugins: [] array
[16:26:51] I don't necessarily want to chain it :D, but https://istio.io/latest/docs/setup/additional-setup/cni/ seems to indicate that it is meant to be chained with calico
[16:27:04] (or other similar tools)
[16:28:05] the install-cni daemonset-horror that we discussed a while ago parses a target calico cni config and injects istio cni json
[16:28:12] The Istio CNI plugin operates as a chained CNI plugin. This means its configuration is added to the existing CNI plugins configuration as a new configuration list element.
See the CNI specification reference for further details
[16:28:18] yeah needs to be a chained plugin
[16:30:36] I'll try to come up with a proposal for a more generic cni config
[16:30:39] elukey: in that case, it probably is cleaner to amend calico::cni with a parameter like $istio_enabled defaulting to false
[16:30:47] and add it to the list if it is enabled
[16:31:50] akosiaris: jayme keeps pushing me to write more generic code so I blindly trust his opinion
[16:31:54] since istio-cni is a dependent entity (can't be used on its own), that's probably ok
[16:33:29] I am ok to extend the calico-cni class with some extra istio-related options, jayme wdyt?
[16:37:29] fine with me as well (if what akosiaris says is correct :)). And also: lol :D
[16:40:23] ack I'll try to send a code change, if it is too horrible we can change approach
[16:41:12] nothing is ever horrible with you :p
[16:42:29] too kind <3
[16:42:45] totally unrelated: I send some CRs your way :D
[17:19:49] serviceops, Wikimedia-Developer-Portal, Service-deployment-requests: New Service Request: developer-portal - https://phabricator.wikimedia.org/T297140 (bd808) {T297167} has been completed with a "low" risk assessment.
[17:26:08] serviceops, Wikimedia-Developer-Portal, Service-deployment-requests: New Service Request: developer-portal - https://phabricator.wikimedia.org/T297140 (bd808)
[17:26:24] serviceops, Wikimedia-Developer-Portal, Service-deployment-requests: New Service Request: developer-portal - https://phabricator.wikimedia.org/T297140 (bd808)
[17:27:28] serviceops, Wikimedia-Developer-Portal, Goal, Service-deployment-requests: New Service Request: developer-portal - https://phabricator.wikimedia.org/T297140 (bd808)
[17:30:53] serviceops, Wikimedia-Developer-Portal, Goal, Service-deployment-requests: New Service Request: developer-portal - https://phabricator.wikimedia.org/T297140 (bd808) This deployment could be an early adopter of {T290966}.
[17:34:45] serviceops, Wikimedia-Developer-Portal, Goal, Service-deployment-requests: New Service Request: developer-portal - https://phabricator.wikimedia.org/T297140 (bd808) @akosiaris just a ping to let you know that this should be ready to move forward from the point of view of the site being ready for...
[20:41:23] serviceops, SRE, Wikidata, wdwb-tech: Hourly read spikes against s8 resulting in occasional user-visible latency & error spikes - https://phabricator.wikimedia.org/T264821 (LGoto)
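
Note on the CNI chaining idea discussed between 15:29 and 16:40: a rough Python sketch of what "add istio-cni to the plugins list if it is enabled" could look like. This is only an illustration of the approach, not the actual calico::cni puppet change. The conflist contents, the istio-cni entry, and the chain_plugin helper are all hypothetical placeholders; a real 10-calico.conflist carries many more settings.

```python
import copy
import json

# Hypothetical, trimmed-down calico conflist (real files have more fields).
CALICO_CONFLIST = {
    "name": "k8s-pod-network",
    "cniVersion": "0.3.1",
    "plugins": [
        {"type": "calico", "ipam": {"type": "calico-ipam"}},
    ],
}

# Hypothetical istio-cni entry; the values here are placeholders.
ISTIO_CNI_PLUGIN = {"type": "istio-cni", "log_level": "info"}


def chain_plugin(conflist, plugin, enabled=False):
    """Return a copy of conflist with plugin appended to "plugins" if enabled.

    Mirrors the idea of an $istio_enabled flag defaulting to false: the
    calico config is left unchanged unless the flag is set. The input
    conflist is never modified in place.
    """
    result = copy.deepcopy(conflist)
    if not enabled:
        return result
    existing_types = [p.get("type") for p in result["plugins"]]
    # Skip the append if a plugin of the same type is already chained.
    if plugin["type"] not in existing_types:
        result["plugins"].append(copy.deepcopy(plugin))
    return result


if __name__ == "__main__":
    chained = chain_plugin(CALICO_CONFLIST, ISTIO_CNI_PLUGIN, enabled=True)
    print(json.dumps(chained, indent=2))
```

The chained plugin goes at the end of the list so that calico still sets up the pod network first, which matches the "new configuration list element" wording quoted from the Istio docs at 16:28:12.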