[00:24:01] serviceops, SRE, Patch-For-Review: upgrade deployment servers to bullseye / add bullseye support to puppet role - https://phabricator.wikimedia.org/T363415#9762658 (Dzahn) The gervert issue (can't find gerrit dsh group) appears to come from https://gerrit.wikimedia.org/r/c/operations/software/gerrit/...
[08:14:53] serviceops, ops-codfw, SRE, Patch-For-Review: Degraded RAID on mw2382 - https://phabricator.wikimedia.org/T362938#9763009 (JMeybohm) >>! In T362938#9761317, @jnuche wrote: > Scap failed to connect to this host today during the MediaWiki train while trying to preload the MW image: > `15:08:17 /usr...
[08:25:13] serviceops, ops-codfw, SRE, Patch-For-Review: Degraded RAID on mw2382 - https://phabricator.wikimedia.org/T362938#9763050 (jnuche) >> Would it be possibly to remove it temporarily from the list of K8s workers while work is done on it? > > Will do...but I think the right thing to do here is to f...
[08:51:14] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Wikikube staging clusters are out of IPv4 Pod IP's - https://phabricator.wikimedia.org/T345823#9763123 (JMeybohm) staging-eqiad has been migrated to `/28` blocks as well
[08:51:24] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Wikikube staging clusters are out of IPv4 Pod IP's - https://phabricator.wikimedia.org/T345823#9763124 (JMeybohm) Open→Resolved
[09:00:41] serviceops, Commons, MediaWiki-File-management, SRE, and 2 others: Frequent "Error: 429, Too Many Requests" errors on pages with many (>50) thumbnails - https://phabricator.wikimedia.org/T266155#9763167 (IagoQnsi) >>! In T266155#9485678, @Bawolff wrote: > Just trying to think up solutions - if th...
[09:38:41] serviceops, ops-codfw, SRE: Degraded RAID on mw2382 - https://phabricator.wikimedia.org/T362938#9763294 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=e3dd1140-411c-45b4-a1c6-3961f47c4f12) set by jayme@cumin1002 for 7 days, 0:00:00 on 1 host(s) and their services with reason: Degra...
[11:15:52] serviceops, MoveComms-Support, MW-on-K8s, SRE, and 2 others: Move 100% of external traffic to Kubernetes (excluding Votewiki and Commons) - https://phabricator.wikimedia.org/T362323#9763469 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1002 for host mw1371....
[11:17:27] serviceops, MoveComms-Support, MW-on-K8s, SRE, and 2 others: Move 100% of external traffic to Kubernetes (excluding Votewiki and Commons) - https://phabricator.wikimedia.org/T362323#9763472 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1002 for host mw1409....
[11:21:29] serviceops, MoveComms-Support, MW-on-K8s, SRE, and 2 others: Move 100% of external traffic to Kubernetes (excluding Votewiki and Commons) - https://phabricator.wikimedia.org/T362323#9763477 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1002 for host mw1435....
[11:21:30] serviceops, MoveComms-Support, MW-on-K8s, SRE, and 2 others: Move 100% of external traffic to Kubernetes (excluding Votewiki and Commons) - https://phabricator.wikimedia.org/T362323#9763478 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1002 for host mw1399....
[11:21:32] serviceops, MoveComms-Support, MW-on-K8s, SRE, and 2 others: Move 100% of external traffic to Kubernetes (excluding Votewiki and Commons) - https://phabricator.wikimedia.org/T362323#9763479 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1002 for host mw1405....
[11:42:12] hello folks!
[11:42:26] In a bit I'll switch lift wing codfw services to mw-api-int-ro
[11:44:52] cool
[11:49:21] serviceops, Data-Persistence: Sessionstore's discovery TLS cert will expire before end of May 2024 - https://phabricator.wikimedia.org/T363996 (elukey) NEW
[11:49:30] for awareness --^
[11:51:59] serviceops, MoveComms-Support, MW-on-K8s, SRE, and 2 others: Move 100% of external traffic to Kubernetes (excluding Votewiki and Commons) - https://phabricator.wikimedia.org/T362323#9763643 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1002 for host mw1371.eqia...
[11:53:59] serviceops, MoveComms-Support, MW-on-K8s, SRE, and 2 others: Move 100% of external traffic to Kubernetes (excluding Votewiki and Commons) - https://phabricator.wikimedia.org/T362323#9763647 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1002 for host mw1409.eqia...
[11:54:16] serviceops, Data-Persistence: Sessionstore's discovery TLS cert will expire before end of May 2024 - https://phabricator.wikimedia.org/T363996#9763646 (MoritzMuehlenhoff) This certificate doesn't show up anywhere in certificate.manifests.d for cergen, though?
[11:56:00] serviceops, MoveComms-Support, MW-on-K8s, SRE, and 2 others: Move 100% of external traffic to Kubernetes (excluding Votewiki and Commons) - https://phabricator.wikimedia.org/T362323#9763650 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1002 for host mw1405.eqia...
[11:57:27] serviceops, MoveComms-Support, MW-on-K8s, SRE, and 2 others: Move 100% of external traffic to Kubernetes (excluding Votewiki and Commons) - https://phabricator.wikimedia.org/T362323#9763652 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1002 for host mw1435.eqia...
[12:01:12] serviceops, MoveComms-Support, MW-on-K8s, SRE, and 2 others: Move 100% of external traffic to Kubernetes (excluding Votewiki and Commons) - https://phabricator.wikimedia.org/T362323#9763678 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1002 for host mw1399.eqia...
[12:23:19] serviceops, collaboration-services, Infrastructure-Foundations, Puppet-Core, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619#9763711 (MoritzMuehlenhoff)
[12:29:03] serviceops, Prod-Kubernetes, Kubernetes: Wikikube staging clusters are out of IPv4 Pod IP's - https://phabricator.wikimedia.org/T345823#9763715 (cmooney) Not sure if it might be worth taking a step back and weighing up what's happening here? As I understand it there is a /24 IPv4 allocation for...
[13:25:45] serviceops, Data-Persistence: Sessionstore's discovery TLS cert will expire before end of May 2024 - https://phabricator.wikimedia.org/T363996#9763825 (elukey)
[13:31:18] serviceops, Prod-Kubernetes, Kubernetes: Wikikube staging clusters are out of IPv4 Pod IP's - https://phabricator.wikimedia.org/T345823#9763845 (JMeybohm) >>! In T345823#9763715, @cmooney wrote: > Not sure if it might be worth taking a step back and weighing up what's happening here? > > As I un...
[13:53:07] serviceops, Data-Persistence: Sessionstore's discovery TLS cert will expire before end of May 2024 - https://phabricator.wikimedia.org/T363996#9763940 (JMeybohm) p:Triage→High a:JMeybohm >>! In T363996#9763646, @MoritzMuehlenhoff wrote: > This certificate doesn't show up anywhere in certifica...
[14:07:09] FYI, netbox sync is showing me a diff for parse1002 being set to fixed in netbox, I'm merging that along
[14:25:35] serviceops, Data-Persistence: Sessionstore's discovery TLS cert will expire before end of May 2024 - https://phabricator.wikimedia.org/T363996#9764067 (elukey) The cert is here: ` elukey@puppetmaster1001:/srv/private$ find -name *session* | grep -v cassandra ./modules/secret/secrets/ssl/sessionstore.dis...
[15:21:46] serviceops, MediaWiki-Engineering, MediaWiki-libs-BagOStuff, MediaWiki-Platform-Team, Sustainability (Incident Followup): Cache mw-mcrouter service ClusterIP in apcu cache - https://phabricator.wikimedia.org/T363186#9764256 (jijiki)
[15:35:38] serviceops, MediaWiki-Engineering, MediaWiki-libs-BagOStuff, MediaWiki-Platform-Team, Sustainability (Incident Followup): Cache mw-mcrouter service ClusterIP in apcu cache - https://phabricator.wikimedia.org/T363186#9764301 (jijiki)
[15:35:39] serviceops, MW-on-K8s, MediaWiki-Platform-Team (Radar): mcrouter daemonset on mw-on-k8s - https://phabricator.wikimedia.org/T346690#9764302 (jijiki)
[15:43:36] as FYI Lift Wing codfw is now using mw-api-int-ro, I'll do eqiad on Monday if nothing explodes :)
[15:43:41] latency is good so far
[15:52:24] serviceops, Data-Persistence: Sessionstore's discovery TLS cert will expire before end of May 2024 - https://phabricator.wikimedia.org/T363996#9764372 (elukey) @Eevans IIUC kask terminates TLS by itself for session store, is it right? Would it be a problem to move to the `mesh` k8s module, namely to use...