[09:02:34] I'm seeking a reviewer/input to move mw edit failures alert to prometheus (the mw part is done, metrics are in prometheus already) https://gerrit.wikimedia.org/r/c/operations/alerts/+/991007
[09:02:47] the graphite removal part is at https://gerrit.wikimedia.org/r/c/operations/puppet/+/991008
[09:37:46] serviceops, SRE, ops-codfw: Broken CPU on mw2394 - https://phabricator.wikimedia.org/T354193 (Clement_Goubert) Repooled, thank you @Jhancock.wm
[10:33:45] serviceops, Infrastructure-Foundations, Prod-Kubernetes, SRE, and 2 others: Update puppet's topology.kubernetes.io/zone logic to take into account the new setup - https://phabricator.wikimedia.org/T352893 (Clement_Goubert) Open→Resolved >>! In T352893#9471788, @ayounsi wrote: > Nice !! >...
[10:34:03] serviceops, Infrastructure-Foundations, Prod-Kubernetes, SRE, netops: Test IP-renumbering on kubestage2002.codfw.wmnet - https://phabricator.wikimedia.org/T352883 (Clement_Goubert) In progress→Resolved
[11:32:51] serviceops, conftool, Patch-For-Review: requestctl should fail with error if fails parsing yaml file - https://phabricator.wikimedia.org/T355256 (Clement_Goubert) Open→In progress p:Triage→Medium a:Clement_Goubert
[12:19:47] serviceops, Infrastructure-Foundations, Release-Engineering-Team (Radar): Allow release engineering to delete images - https://phabricator.wikimedia.org/T354786 (Clement_Goubert) p:Triage→Low
[12:47:23] serviceops, MW-on-K8s, SRE, Traffic, Release-Engineering-Team (Seen): Move 40% of mediawiki external requests to mw on k8s - https://phabricator.wikimedia.org/T355532 (Clement_Goubert)
[12:48:40] serviceops, MW-on-K8s, SRE, Traffic, Release-Engineering-Team (Seen): Move 40% of mediawiki external requests to mw on k8s - https://phabricator.wikimedia.org/T355532 (Clement_Goubert) p:Triage→High
[12:53:06] serviceops, Patch-For-Review: Remove tls-proxy cpu limits on eventstreams - https://phabricator.wikimedia.org/T345243 (Clement_Goubert)
[12:53:08] serviceops, Prod-Kubernetes, Kubernetes: Limit the concurrency of envoy in service mesh - https://phabricator.wikimedia.org/T354532 (Clement_Goubert)
[12:53:10] serviceops, MW-on-K8s: mw-on-k8s tls-proxy container CPU throttling at low average load - https://phabricator.wikimedia.org/T344814 (Clement_Goubert)
[12:53:28] serviceops, Prod-Kubernetes, Kubernetes: Limit the concurrency of envoy in service mesh - https://phabricator.wikimedia.org/T354532 (Clement_Goubert)
[12:53:30] serviceops, MW-on-K8s: mw-on-k8s tls-proxy container CPU throttling at low average load - https://phabricator.wikimedia.org/T344814 (Clement_Goubert)
[12:53:34] serviceops, Patch-For-Review: Remove tls-proxy cpu limits on eventgate - https://phabricator.wikimedia.org/T345244 (Clement_Goubert)
[12:59:32] serviceops, MW-on-K8s, Quality-and-Test-Engineering-Team, SRE: Move testwiki over to mw-on-k8s - https://phabricator.wikimedia.org/T355534 (Clement_Goubert)
[13:40:36] serviceops, CirrusSearch, Discovery-Search, Data-Platform-SRE ( 2023/24 Q3 Milestone 2): Requesting permission to enable kafka log compaction for page_rerender on kafka-main - https://phabricator.wikimedia.org/T354794 (Gehel)
[13:40:44] serviceops, Prod-Kubernetes, Data-Platform-SRE ( 2023/24 Q3 Milestone 2), Kubernetes, Patch-For-Review: Improve how we address outside k8s infrastructure from within charts (e.g. network policies) - https://phabricator.wikimedia.org/T331894 (Gehel)
[13:41:49] serviceops, Infrastructure-Foundations, Puppet-Core, SRE, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (Gehel)
[14:27:08] serviceops, CirrusSearch, Discovery-Search, Data-Platform-SRE (2024.01.22 - 2024.02.11): Requesting permission to enable kafka log compaction for page_rerender on kafka-main - https://phabricator.wikimedia.org/T354794 (Gehel) p:Triage→High
[14:27:26] serviceops, Prod-Kubernetes, Data-Platform-SRE (2024.01.22 - 2024.02.11), Kubernetes, Patch-For-Review: Improve how we address outside k8s infrastructure from within charts (e.g. network policies) - https://phabricator.wikimedia.org/T331894 (Gehel) p:Triage→Medium
[14:56:35] serviceops, Data-Engineering, Data-Platform-SRE, SRE, and 3 others: Upgrade Kafka to from 1.x to later version - https://phabricator.wikimedia.org/T300102 (brouberol)
[15:55:46] serviceops, MW-on-K8s, SRE, Patch-For-Review: Reclaim jobrunner hardware for k8s - https://phabricator.wikimedia.org/T354791 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1002 for host mw1486.eqiad.wmnet with OS bullseye
[15:55:56] serviceops, MW-on-K8s, SRE, Patch-For-Review: Reclaim jobrunner hardware for k8s - https://phabricator.wikimedia.org/T354791 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1002 for host mw1495.eqiad.wmnet with OS bullseye
[16:05:51] serviceops, collaboration-services, GitLab (CI & Job Runners): Create a staging apt repository for CI-based builds of Debian packages - https://phabricator.wikimedia.org/T347004 (MoritzMuehlenhoff)
[16:31:11] serviceops, MW-on-K8s, SRE: Reclaim jobrunner hardware for k8s - https://phabricator.wikimedia.org/T354791 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1002 for host mw1486.eqiad.wmnet with OS bullseye completed: - mw1486 (**PASS**) - Downtimed on Icinga/Alertma...
[16:39:10] serviceops, MW-on-K8s, SRE: Reclaim jobrunner hardware for k8s - https://phabricator.wikimedia.org/T354791 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1002 for host mw1495.eqiad.wmnet with OS bullseye completed: - mw1495 (**WARN**) - Downtimed on Icinga/Alertma...
[23:47:14] serviceops, SRE: Scap Error - https://phabricator.wikimedia.org/T355622 (Mstyles)