[08:52:39] serviceops, Prod-Kubernetes, Shared-Data-Infrastructure, Kubernetes: Update Kubernetes clusters to >1.25 - https://phabricator.wikimedia.org/T341984#9595007 (JMeybohm) With the next k8s upgrade we already have the following dependency problems: - We need to migrate to containerd before moving to...
[11:18:36] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9595448 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin2002 for host mw2314.codfw.wmnet with OS bullseye
[11:18:43] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9595449 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin2002 for host mw2315.codfw.wmnet with OS bullseye
[11:18:49] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9595450 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin2002 for host mw2316.codfw.wmnet with OS bullseye
[11:18:59] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9595451 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin2002 for host mw2320.codfw.wmnet with OS bullseye
[11:19:05] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9595452 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin2002 for host mw2321.codfw.wmnet with OS bullseye
[11:19:18] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9595453 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin2002 for host mw2322.codfw.wmnet with OS bullseye
[11:27:55] serviceops, Data-Platform-SRE (2024.03.04 - 2024.03.24): openjdk and spark Docker images fail to build from sources - https://phabricator.wikimedia.org/T358866#9595466 (BTullis)
[11:34:09] serviceops, Data-Platform-SRE (2024.03.04 - 2024.03.24): openjdk and spark Docker images fail to build from sources - https://phabricator.wikimedia.org/T358866#9595476 (BTullis) p:Triage→Medium
[11:55:15] serviceops, Data-Platform-SRE (2024.03.04 - 2024.03.24): openjdk and spark Docker images fail to build from sources - https://phabricator.wikimedia.org/T358866#9595559 (MoritzMuehlenhoff) This looks like a bug indeed. The Java 11/17 packages in Bullseye/Bookworm _do_ ship /usr/share/binfmts in openjdk-...
[12:02:01] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9595578 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin2002 for host mw2320.codfw.wmnet with OS bullseye completed: - mw23...
[12:04:02] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9595603 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin2002 for host mw2316.codfw.wmnet with OS bullseye completed: - mw23...
[12:04:22] serviceops, Data-Platform-SRE (2024.03.04 - 2024.03.24): openjdk and spark Docker images fail to build from sources - https://phabricator.wikimedia.org/T358866#9595606 (MoritzMuehlenhoff) This is already fixed in the Debian packaging repo, but not yet released: https://evolvis.org/plugins/scmgit/cgi-bin/...
[12:06:05] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9595615 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin2002 for host mw2322.codfw.wmnet with OS bullseye completed: - mw23...
[12:08:39] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9595627 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin2002 for host mw2314.codfw.wmnet with OS bullseye completed: - mw23...
[12:11:46] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9595628 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin2002 for host mw2315.codfw.wmnet with OS bullseye completed: - mw23...
[12:13:15] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9595630 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin2002 for host mw2321.codfw.wmnet with OS bullseye completed: - mw23...
[12:44:54] serviceops, Infrastructure-Foundations, SRE: ferm sometimes fails to restart on Kubernetes workers via xtables lock held by kube-proxy - https://phabricator.wikimedia.org/T354855#9595712 (Clement_Goubert) Open→Resolved a:Clement_Goubert Deployed, puppet now restarts ferm.service if the sy...
[12:53:13] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9595751 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin2002 for host mw1350.eqiad.wmnet with OS bullseye
[12:53:20] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9595752 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin2002 for host mw1351.eqiad.wmnet with OS bullseye
[12:53:26] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9595753 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin2002 for host mw1352.eqiad.wmnet with OS bullseye
[12:53:34] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9595754 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin2002 for host mw1353.eqiad.wmnet with OS bullseye
[12:53:41] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9595755 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin2002 for host mw1354.eqiad.wmnet with OS bullseye
[13:29:06] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9595868 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin2002 for host mw1350.eqiad.wmnet with OS bullseye completed: - mw13...
[13:31:54] serviceops, MW-on-K8s: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9595882 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin2002 for host mw1352.eqiad.wmnet with OS bullseye completed: - mw1352 (**PASS**) - Down...
[13:33:39] serviceops, MW-on-K8s: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9595888 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin2002 for host mw1351.eqiad.wmnet with OS bullseye completed: - mw1351 (**PASS**) - Down...
[13:36:43] serviceops, MW-on-K8s: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9595897 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin2002 for host mw1353.eqiad.wmnet with OS bullseye completed: - mw1353 (**PASS**) - Down...
[13:39:23] serviceops, MW-on-K8s: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9595910 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin2002 for host mw1354.eqiad.wmnet with OS bullseye completed: - mw1354 (**PASS**) - Down...
[13:51:47] serviceops, Content-Transform-Team, MW-on-K8s, SRE, and 3 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9595953 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=63abb5d8-03a7-48ae-abcc-214900c13c28) set by akosiaris@cumin1002 for 2:0...
[14:13:07] serviceops, Content-Transform-Team, MW-on-K8s, SRE, and 3 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9596041 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by akosiaris@cumin1002 for host parse1010.eqiad.wmnet with OS bullseye
[14:23:49] serviceops: Build and deploy LuaSandbox 4.1.2 - https://phabricator.wikimedia.org/T353414#9596098 (Joe) Re-uploaded the packages to the right components.
[14:28:24] serviceops: php7.4-fpm-multiversion-base Docker image fails to build - https://phabricator.wikimedia.org/T358867#9596106 (Joe) p:Triage→High a:Joe
[14:39:04] serviceops, Data-Platform-SRE (2024.03.04 - 2024.03.24): openjdk and spark Docker images fail to build from sources - https://phabricator.wikimedia.org/T358866#9596142 (BTullis) a:BTullis→MoritzMuehlenhoff I've applied the workaround for the JRE image and built it successfully. Now I'm running `b...
[14:45:33] serviceops, Content-Transform-Team, MW-on-K8s, SRE, and 3 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9596165 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host parse1010.eqiad.wmnet with OS bullseye c...
[15:00:57] serviceops, Patch-For-Review: php7.4-fpm-multiversion-base Docker image fails to build - https://phabricator.wikimedia.org/T358867#9596297 (Joe) Open→Resolved
[15:09:00] serviceops, CommRel-Specialists-Support, User-notice: CommRel support for Northward Datacentre Switchover (March 2024) - https://phabricator.wikimedia.org/T358233#9596330 (jijiki) @Trizek-WMF as per our off-phabricator discussion, the major change is that this is not a procedure we test anymore, but...
[15:15:52] serviceops: Multiple images fail to build from sources - https://phabricator.wikimedia.org/T350366#9596366 (bking) ^^ Working on the above flink-operator change in T358879 .
[15:17:12] serviceops, CommRel-Specialists-Support, User-notice: CommRel support for Northward Datacentre Switchover (March 2024) - https://phabricator.wikimedia.org/T358233#9596383 (Trizek-WMF)
[15:18:21] serviceops, CommRel-Specialists-Support, User-notice: CommRel support for Northward Datacentre Switchover (March 2024) - https://phabricator.wikimedia.org/T358233#9596390 (Trizek-WMF)
[15:19:25] serviceops, CommRel-Specialists-Support, User-notice: CommRel support for Northward Datacentre Switchover (March 2024) - https://phabricator.wikimedia.org/T358233#9596391 (Trizek-WMF) The message was updated to remove the idea of a test. As everything is okay, I can continue with the next steps.
[15:23:20] serviceops, CommRel-Specialists-Support, User-notice: CommRel support for Northward Datacentre Switchover (March 2024) - https://phabricator.wikimedia.org/T358233#9596410 (jijiki) Looks alright!
[15:29:28] serviceops, Content-Transform-Team, MW-on-K8s, SRE, and 3 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9596436 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by akosiaris@cumin1002 for host parse1011.eqiad.wmnet with OS bullseye
[15:32:15] serviceops, Infrastructure-Foundations, SRE, ARM support: Adoption of aarch64 (aka arm64) in WMF production? (SRE Summit 2022 Session) - https://phabricator.wikimedia.org/T320811#9596480 (MoritzMuehlenhoff) p:Triage→Medium
[16:03:17] serviceops, Content-Transform-Team, MW-on-K8s, SRE, and 3 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9596708 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host parse1011.eqiad.wmnet with OS bullseye c...
[16:04:59] serviceops, MW-on-K8s, Release Pipeline: Pushes to docker-registry fail for images with compressed layers of size >1GB - https://phabricator.wikimedia.org/T288198#9596715 (elukey) @akosiaris Hi! Getting back to the issue, this time in a different form T359067. We have followed your suggestion for the...
[16:05:32] serviceops, Content-Transform-Team, MW-on-K8s, SRE, and 3 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9596718 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by akosiaris@cumin1002 for host parse1020.eqiad.wmnet with OS bullseye
[16:08:06] jayme: o/ (when you have time) - I opened https://phabricator.wikimedia.org/T359067 to discuss with you and your team how it is best to proceed (without tearing down the registry :D). I see from irc logs that you already talked about the issue with Aiko, this is why I summoned you :)
[16:11:44] elukey: will take a look
[16:20:54] <3
[16:21:09] I also pinged Alex in the other task that was opened to serviceops
[16:39:19] serviceops, Content-Transform-Team, MW-on-K8s, SRE, and 3 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9597019 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host parse1020.eqiad.wmnet with OS bullseye c...
[16:41:08] serviceops, Content-Transform-Team, MW-on-K8s, SRE, and 3 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9597042 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by akosiaris@cumin1002 for host parse1021.eqiad.wmnet with OS bullseye
[17:08:02] serviceops, Prod-Kubernetes, SRE: Kubernetes apiserver probe failures on restart - https://phabricator.wikimedia.org/T358936#9597218 (RLazarus)
[17:14:47] serviceops, Content-Transform-Team, MW-on-K8s, SRE, and 3 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9597298 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host parse1021.eqiad.wmnet with OS bullseye c...
[17:16:34] serviceops, Content-Transform-Team, MW-on-K8s, SRE, and 3 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9597325 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by akosiaris@cumin1002 for host parse1022.eqiad.wmnet with OS bullseye
[17:34:29] serviceops, Thumbor, Patch-For-Review, Structured-Data-Backlog (Current Work): Thumbor's use of poolcounter is rate limiting Kubernetes IPs - https://phabricator.wikimedia.org/T339863#9597447 (MarkTraceur)
[17:34:44] serviceops, Thumbor, Patch-For-Review, Structured-Data-Backlog (Current Work): Upgrade Thumbor to bullseye - https://phabricator.wikimedia.org/T336881#9597448 (MarkTraceur)
[17:36:53] serviceops, Content-Transform-Team, MW-on-K8s, SRE, and 3 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9597456 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by akosiaris@cumin1002 for host parse1012.eqiad.wmnet with OS bullseye
[17:49:28] serviceops, Content-Transform-Team, MW-on-K8s, SRE, and 3 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9597506 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host parse1022.eqiad.wmnet with OS bullseye c...
[17:52:56] serviceops, Content-Transform-Team, MW-on-K8s, SRE, and 3 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9597521 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by akosiaris@cumin1002 for host parse1023.eqiad.wmnet with OS bullseye
[18:09:33] serviceops, Content-Transform-Team, MW-on-K8s, SRE, and 3 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9597590 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host parse1012.eqiad.wmnet with OS bullseye c...
[18:26:26] serviceops, Content-Transform-Team, MW-on-K8s, SRE, and 3 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9597622 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host parse1023.eqiad.wmnet with OS bullseye c...
[18:27:03] serviceops, Content-Transform-Team, MW-on-K8s, SRE, and 3 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9597623 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by akosiaris@cumin1002 for host parse1024.eqiad.wmnet with OS bullseye
[18:40:50] serviceops, Content-Transform-Team, MW-on-K8s, SRE, and 3 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9597703 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host parse1024.eqiad.wmnet with OS bullseye e...
[21:09:03] serviceops: etcdmirror does not recover from a cleared waitIndex - https://phabricator.wikimedia.org/T358636#9598268 (Scott_French) Last Friday, I put together a simple stress test for etcd-mirror, with the goal of measuring replication delay under a range of background write rates from an "antagonist" workl...