[07:39:48] <elukey>	 hello folks, I am powercycling parse1012, there was a cpu error in the racadm getsel
[07:41:08] <elukey>	 also depooled it
[07:43:21] <elukey>	 I don't see anything weird for the moment but I'll leave the pool action to you (in case you want to double check)
[09:08:09] <claime>	 elukey: thanks <3
[10:00:45] <elukey>	 folks if nobody opposes I'd merge https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/886862 and roll it out
[10:01:06] <elukey>	 it seems really harmless and causing a ton of spam 
[10:01:19] <elukey>	 (also pods are trying and failing to use non-kafka nodes)
[10:04:17] <jayme>	 fine by me
[10:05:15] <elukey>	 ok proceeding
[10:16:27] <claime>	 regarding parse1012, SOP for this issue (according to Dell anyway) is to update the firmware, clear the log, and see if it happens again
[10:42:49] <elukey>	 change to eventgate-logging-external rolled out, all good afaics
[10:45:39] <claime>	 jayme: opinion on what to do with parse1012?
[10:47:40] <elukey>	 claime: if you want to be extra sure, we can open a task for the ops-eqiad folks to upgrade the firmware, or we can repool and keep watching it (if it happens again soon we can depool it again and cut a task to dcops, likely to follow up with dell)
[10:48:05] <claime>	 elukey: Isn't there an upgrade-firmware cookbook we can run ourselves?
[10:48:20] <elukey>	 ah really?
[10:48:23] <claime>	 I think so
[10:48:26] <claime>	 Let me check
[10:48:53] <claime>	 cgoubert@cumin1001:~/cookbooks$ sudo cookbook -l | grep firm
[10:48:55] <claime>	     |   `-- sre.hardware.upgrade-firmware
[10:48:57] <claime>	 yup
[10:49:22] <elukey>	 very nice, then I think we can try it
[10:49:30] <elukey>	 let's check previous usages of it in phab
[10:49:33] <elukey>	 just to be sure
[10:49:36] <claime>	 yep
[10:50:21] <jayme>	 claime: sorry, I have zero context currently
[10:50:38] <claime>	 jayme: no problem.
[10:52:48] <elukey>	 what I'd do is the following - repool the node and keep it monitored, these issues may happen from time to time. If it happens again, we can cut a task to ops-eqiad and ask them what the best option is
[10:53:11] <elukey>	 I am worried about randomly upgrading firmware without them knowing
[10:53:20] <claime>	 Understandably
[10:53:45] <wikibugs>	 serviceops, Foundational Technology Requests, Prod-Kubernetes, Shared-Data-Infrastructure, Kubernetes: Kubernetes v1.23 multi master setup is broken - https://phabricator.wikimedia.org/T329826 (JMeybohm)
[10:53:55] <claime>	 Let's do that then, I'll repool it
[10:54:14] <wikibugs>	 serviceops, Foundational Technology Requests, Prod-Kubernetes, Shared-Data-Infrastructure, Kubernetes: Kubernetes v1.23 multi master setup is broken - https://phabricator.wikimedia.org/T329826 (JMeybohm) p:Triage→High
[10:54:28] <elukey>	 super, I rechecked getsel on idrac and nothing new popped up
[11:05:25] <wikibugs>	 serviceops, Kubernetes: Add a second control-plane to wikikube staging clusters - https://phabricator.wikimedia.org/T329827 (JMeybohm) p:Triage→High
[11:19:10] <volans>	 pro-tip: cookbook -lv also gives you a one-line description of the cookbook :)
[11:19:39] <claime>	 thanks :D
[11:29:16] <wikibugs>	 serviceops, Data-Persistence, SRE, Datacenter-Switchover, Patch-For-Review: spicerack.mysql_legacy errors on get_core_masters_heartbeats when checking x2 - https://phabricator.wikimedia.org/T329533 (Clement_Goubert) The above patch removes `x2` from the core databases, and removes the now unu...
[11:40:14] <wikibugs>	 serviceops, Data-Persistence, SRE, Datacenter-Switchover, Patch-For-Review: spicerack.mysql_legacy errors on get_core_masters_heartbeats when checking x2 - https://phabricator.wikimedia.org/T329533 (Ladsgroup) Yes, that's the way we should do it given Manuel's comment above and my basic under...
[11:47:27] <wikibugs>	 serviceops, Foundational Technology Requests, Prod-Kubernetes, Shared-Data-Infrastructure, and 2 others: Update Kubernetes clusters to v1.23 - https://phabricator.wikimedia.org/T307943 (CDanis)
[11:49:25] <wikibugs>	 serviceops, Data-Persistence, SRE, Datacenter-Switchover, Patch-For-Review: spicerack.mysql_legacy errors on get_core_masters_heartbeats when checking x2 - https://phabricator.wikimedia.org/T329533 (Clement_Goubert) Thanks @Ladsgroup once the spicerack release is done I'll test the cookbook p...
[11:54:16] <wikibugs>	 serviceops, Data-Persistence, Toolhub, Datacenter-Switchover: What should happen to Toolhub during the 2023 DC switch? - https://phabricator.wikimedia.org/T329319 (Clement_Goubert) That seems good to me, as long as you're ok with the downtimes and maintenances.
[16:39:22] <wikibugs>	 serviceops, Foundational Technology Requests, Prod-Kubernetes, Shared-Data-Infrastructure, and 2 others: Kubernetes v1.23 multi master setup is broken - https://phabricator.wikimedia.org/T329826 (JMeybohm)
[17:29:31] <wikibugs>	 serviceops, Data-Persistence, Toolhub, Datacenter-Switchover: What should happen to Toolhub during the 2023 DC switch? - https://phabricator.wikimedia.org/T329319 (bd808) >>! In T329319#8621299, @Clement_Goubert wrote: > That seems good to me, as long as you're ok with the downtimes and maintenan...
[22:49:46] <wikibugs>	 serviceops, SRE, Traffic: Upgrade envoyproxy to 1.16.2 - https://phabricator.wikimedia.org/T271407 (BCornwall) Envoy seems to be on 1.18.2 now. Can this be closed, or was there any other deployment need this ticket addresses?