[08:07:11] kamila_: claime: you have sent some wip/exercise changes for operations/mediawiki-config from early February. Looks like they can now be abandoned https://gerrit.wikimedia.org/r/q/project:operations/mediawiki-config+status:open+message:exercise :)
[08:07:32] no harm / no rush :]
[08:26:42] hashar: ack, will do, thanks
[08:41:45] hashar: well since they have never been corrected because j.oe forgot, yeah, I could, or someone could take it upon themself to correct them.
[08:42:28] serviceops, MediaWiki-Engineering, Sustainability (Incident Followup): Cache mw-mcrouter service ClusterIP in apcu cache - https://phabricator.wikimedia.org/T363186#9739418 (jijiki) p:Triage→High I am marking this as High Priority because the current status is: * codfw is using a mcrouter da...
[08:47:27] claime: up to you :) I have casually encountered those changes and thought they might have forgotten after some pairing / training session
[08:48:14] You're not wrong that they've been forgotten x)
[08:48:57] Done.
[09:15:33] serviceops, MoveComms-Support, MW-on-K8s, SRE, and 2 others: Move 100% of external traffic to Kubernetes (excluding Votewiki and Commons) - https://phabricator.wikimedia.org/T362323#9739549 (Clement_Goubert)
[10:25:21] serviceops, Prod-Kubernetes, Kubernetes: Co-locate kube-apiserver and etcd on new staging control plane nodes - https://phabricator.wikimedia.org/T363307 (JMeybohm) NEW
[10:43:30] serviceops, Infrastructure-Foundations, Prod-Kubernetes, SRE, and 2 others: Site: codfw 1 VM request for staging-codfw kube-apiserver - https://phabricator.wikimedia.org/T363310 (JMeybohm) NEW
[10:43:59] serviceops, Infrastructure-Foundations, Prod-Kubernetes, SRE, and 2 others: Site: codfw 1 VM request for staging-codfw kube-apiserver - https://phabricator.wikimedia.org/T363310#9739865 (JMeybohm)
[11:06:15] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Allow to address Kubernets API servers from NetworkPolicy - https://phabricator.wikimedia.org/T287491#9739912 (jijiki)
[11:22:26] serviceops, Prod-Kubernetes, Kubernetes: Migration to containerd and away from docker - https://phabricator.wikimedia.org/T362408#9739929 (akosiaris) >>! In T362408#9712356, @JMeybohm wrote: > @akosiaris could you please double check in your test environment that containerd will still enforce the def...
[11:23:19] serviceops, Prod-Kubernetes, Patch-For-Review: PodSecurityPolicies will be deprecated with Kubernetes 1.21 - https://phabricator.wikimedia.org/T273507#9739926 (akosiaris) Adding as info since it was requested in T362408#9712356 Just started up a simple k0s cluster with 1 controller and 1 worker ` r...
[11:30:22] serviceops, Infrastructure-Foundations, Prod-Kubernetes, SRE, and 2 others: Site: codfw 1 VM request for staging-codfw kube-apiserver - https://phabricator.wikimedia.org/T363310#9739949 (MoritzMuehlenhoff) Looks good.
We can't disable DRBD on instance creation currently, simply add it as usual an...
[11:44:32] serviceops, Prod-Kubernetes, Patch-For-Review: PodSecurityPolicies will be deprecated with Kubernetes 1.21 - https://phabricator.wikimedia.org/T273507#9739995 (akosiaris) Adding for bookwork ` root@containerd:~# ctr version Client: Version: 1.6.20~ds1 Revision: 1.6.20~ds1-1+b1 Go version: go1...
[12:06:56] serviceops, Observability-Logging, Patch-For-Review: Logs from containers sometimes not visible in logstash - https://phabricator.wikimedia.org/T357616#9740096 (fgiunchedi) The bandaid is in place (restart `rsyslog.service` every 4 hours, the 4 is a magic number, it can be tweaked). Let's see how we...
[12:32:11] serviceops, Prod-Kubernetes, Kubernetes: Co-locate kube-apiserver and etcd on new staging control plane nodes - https://phabricator.wikimedia.org/T363307#9740250 (JMeybohm)
[13:34:58] serviceops, MediaWiki-Engineering, Sustainability (Incident Followup): Cache mw-mcrouter service ClusterIP in apcu cache - https://phabricator.wikimedia.org/T363186#9740514 (MSantos) Moving to radar on our side. Please, let me know if there's any action we should take on this ticket.
[13:59:58] serviceops, Prod-Kubernetes, Kubernetes: Migration to containerd and away from docker - https://phabricator.wikimedia.org/T362408#9740630 (akosiaris)
[14:00:09] serviceops, Prod-Kubernetes, Kubernetes: Migration to containerd and away from docker - https://phabricator.wikimedia.org/T362408#9740635 (akosiaris)
[14:07:07] serviceops, Patch-For-Review: Update cache.mrouter modules in deployment-charts - https://phabricator.wikimedia.org/T355237#9740648 (jijiki) That is my doing, I shouldn't have marked this task as resolved. While I was doing some other work, I found that the definition of `cache.mcrouter.deployment` was k...
[14:17:19] serviceops, MW-on-K8s: Handle sidecar containers in one-off Kubernetes jobs - https://phabricator.wikimedia.org/T348284#9740671 (Clement_Goubert) `sidecar-controller` regularly gets OOMKilled ([[ https://logstash.wikimedia.org/goto/4b3fee4d74ff235a44ebc183665cbd27 | Logstash ]], [[ https://grafana.wikime...
[14:38:10] serviceops, Prod-Kubernetes, Patch-For-Review: PodSecurityPolicies will be deprecated with Kubernetes 1.21 - https://phabricator.wikimedia.org/T273507#9740718 (akosiaris) I 've also just run kubelet 1.23 in standalone mode talking to containerd and indeed processes in containers run with `cri-contain...
[14:38:35] serviceops, Data-Platform-SRE (2024.04.15 - 2024.05.05), Kubernetes, Patch-For-Review: Fix rendering issue in modules.app.job when cronjobs are enabled and private values are defined - https://phabricator.wikimedia.org/T362954#9740710 (jijiki) @brouberol thank you for finding this. While I had sp...
[14:41:42] serviceops, envoy, observability, Patch-For-Review: Envoy should listen on ipv6 and ipv4 - https://phabricator.wikimedia.org/T255568#9740720 (akosiaris) Since mesh.configuration 1.7, envoy on WikiKube and other kubernetes clusters listens on IPv6 and IPv4 for both the TLS terminator and the servi...
[15:10:23] serviceops, MW-on-K8s, TimedMediaHandler, Patch-For-Review, Video: Port videoscaling to kubernetes - https://phabricator.wikimedia.org/T355292#9740846 (hnowlan)
[17:42:49] serviceops, MW-on-K8s: Handle sidecar containers in one-off Kubernetes jobs - https://phabricator.wikimedia.org/T348284#9741705 (RLazarus) Thanks. At present the controller //monitors// all namespaces, but ignores pods other than in `mw-script`. So if I were estimating memory usage I'd base it on the tot...
[19:49:55] serviceops, DC-Ops, ops-eqiad: Q4:rack/setup/install parsoidtest1001 - https://phabricator.wikimedia.org/T363399 (RobH) NEW
[19:50:18] serviceops, DC-Ops, ops-eqiad: Q4:rack/setup/install parsoidtest1001 - https://phabricator.wikimedia.org/T363399#9742501 (RobH)
[19:52:07] serviceops: parsoidtest1001 implementation tracking - https://phabricator.wikimedia.org/T363402 (RobH) NEW
[19:52:46] serviceops: parsoidtest1001 implementation tracking - https://phabricator.wikimedia.org/T363402#9742534 (RobH)
[19:55:51] serviceops, Patch-For-Review: etcdmirror does not recover from a cleared waitIndex - https://phabricator.wikimedia.org/T358636#9742604 (Scott_French)
[20:04:18] hi akosiaris jayme -- as discussed at the k8s sig, proposal at https://phabricator.wikimedia.org/T363407
[21:03:23] serviceops, SRE: upgrade deployment servers to bullseye / add bullseye support to puppet role - https://phabricator.wikimedia.org/T363415 (Dzahn) NEW
[21:04:16] serviceops, SRE: upgrade deployment servers to bullseye / add bullseye support to puppet role - https://phabricator.wikimedia.org/T363415#9742836 (Dzahn)
[21:53:10] serviceops, SRE, Data Products (Data Products Sprint 12), Service-deployment-requests: Commons Impact Metrics AQS 2.0 Deployment to Staging and Production - https://phabricator.wikimedia.org/T361835#9742947 (Scott_French) Open→In progress Thanks, all, for the details shared thus far. Whi...
[22:45:49] serviceops, Patch-For-Review: etcdmirror does not recover from a cleared waitIndex - https://phabricator.wikimedia.org/T358636#9743031 (Scott_French) Open→In progress All patches to support the migration described in T358636#9699378 are ready. Many thanks to @Volans for the reviews. The one mino...