[07:30:29] serviceops, Infrastructure-Foundations, Prod-Kubernetes, SRE, and 3 others: Site: codfw 1 VM request for staging-codfw kube-apiserver - https://phabricator.wikimedia.org/T363310#9747135 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host kubestagemast...
[08:33:53] serviceops, Infrastructure-Foundations, Prod-Kubernetes, SRE, and 3 others: Site: codfw 1 VM request for staging-codfw kube-apiserver - https://phabricator.wikimedia.org/T363310#9747162 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host kubestagemaster20...
[11:11:12] serviceops, Infrastructure-Foundations, Prod-Kubernetes, SRE, and 3 others: Site: codfw 1 VM request for staging-codfw kube-apiserver - https://phabricator.wikimedia.org/T363310#9747539 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jayme@cumin1002 for hosts: `kubestagemaster20...
[11:18:26] serviceops, Infrastructure-Foundations, Prod-Kubernetes, SRE, and 3 others: Site: codfw 1 VM request for staging-codfw kube-apiserver - https://phabricator.wikimedia.org/T363310#9747564 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host kubestagemast...
[11:50:31] serviceops, Infrastructure-Foundations, Prod-Kubernetes, SRE, and 3 others: Site: codfw 1 VM request for staging-codfw kube-apiserver - https://phabricator.wikimedia.org/T363310#9747660 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host kubestagemaster20...
[13:05:37] serviceops, CirrusSearch, Discovery-Search: Completion suggester can promote a bad build - https://phabricator.wikimedia.org/T363521#9747865 (dcausse) tagging @serviceops for help regarding the connectivity issue and this new `delayed connect error: 113` error
[13:07:29] serviceops, CirrusSearch, Discovery-Search: Completion suggester can promote a bad build - https://phabricator.wikimedia.org/T363521#9747888 (akosiaris) >>! In T363521#9746443, @EBernhardson wrote: > Looking at the `Host overview` dashboard for mwmaint1002 for today can see that there were intermitte...
[13:26:43] serviceops, CirrusSearch, Discovery-Search: Completion suggester can promote a bad build - https://phabricator.wikimedia.org/T363521#9747912 (akosiaris) {F48789159} This is pretty concerning. What we also see is `unknown: Status code 503; upstream connect error or disconnect/reset before headers. re...
[13:28:48] serviceops, CirrusSearch, Discovery-Search: Completion suggester can promote a bad build - https://phabricator.wikimedia.org/T363521#9747932 (akosiaris) This https://sal.toolforge.org/log/TXIJEo8BGiVuUzOdIZbf lines up perfectly with the beginning of the errors, so I just reverted it.
[13:31:39] dcausse: with the revert, I think that the logstash dashboard you linked no longer has errors
[13:31:52] last one apparently was at 13:28:06
[13:32:26] looks like the mitigation worked,
[13:32:27] akosiaris: thanks! so definitely something weird happening on these new hosts
[13:32:32] yup
[13:32:40] inflatador: ^
[13:32:45] akosiaris dcausse do you mind if we keep discussion in #sre ? 3 rooms at once not great ;)
[13:32:53] sorry :}
[13:32:57] oh, sorry, my bad
[14:59:24] serviceops, Prod-Kubernetes, Kubernetes: Migrate wikikube control planes to hardware nodes - https://phabricator.wikimedia.org/T353464#9748138 (JMeybohm) I ran a couple of very basic benchmarks (commands in the attached filed) against single node etcd instances running on: - A mediawiki application...
[15:21:02] serviceops, SRE: Container Image policy for non-k8s uses - https://phabricator.wikimedia.org/T357441#9748210 (BTullis) I won't reopen this ticket, but I would like to draw your collective attention to this ticket, if I may: {T363558} The use-case is very similar to that discussed here, but the questi...
[15:25:51] serviceops, Infrastructure-Foundations, Prod-Kubernetes, SRE, and 3 others: Site: codfw 1 VM request for staging-codfw kube-apiserver - https://phabricator.wikimedia.org/T363310#9748260 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jayme@cumin1002 for hosts: `kubestagemaster20...
[15:29:59] serviceops, Release-Engineering-Team, Scap, Patch-For-Review: scap should optionally display helmfile diffs for review - https://phabricator.wikimedia.org/T362717#9748270 (CodeReviewBot) dancy merged https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/295 Optionally collect, display,...
[16:27:22] serviceops, Release-Engineering-Team, Scap: scap should optionally display helmfile diffs for review - https://phabricator.wikimedia.org/T362717#9748533 (Scott_French) Open→Resolved Many thanks to @dancy for the review. I think I'm happy with this simple solution for now. The fact that t...
[22:55:15] serviceops, SRE, Patch-For-Review: upgrade deployment servers to bullseye / add bullseye support to puppet role - https://phabricator.wikimedia.org/T363415#9750086 (Dzahn) Open→In progress a:Dzahn