[08:21:28] serviceops, SRE, Patch-For-Review, Sustainability (Incident Followup): Modernize etcd tlsproxy certificate management - https://phabricator.wikimedia.org/T307382 (MoritzMuehlenhoff)
[08:34:38] serviceops, ops-eqiad: Broken PSU on mw1435 - https://phabricator.wikimedia.org/T332117 (MoritzMuehlenhoff)
[08:34:52] serviceops, ops-eqiad: Broken PSU on mw1435 - https://phabricator.wikimedia.org/T332117 (MoritzMuehlenhoff) p:Triage→Medium
[08:38:26] serviceops, ops-codfw: Broken PSU on parse2004 - https://phabricator.wikimedia.org/T332119 (MoritzMuehlenhoff)
[08:38:29] serviceops, ops-codfw: Broken PSU on parse2004 - https://phabricator.wikimedia.org/T332119 (MoritzMuehlenhoff) p:Triage→Medium
[09:49:43] hnowlan: are you doing device-analytics stuff?
[09:50:37] I just failed to deploy it to staging because of a port allocation error
[10:13:30] thumbor failed to deploy as well
[10:14:03] serviceops, Prod-Kubernetes, Kubernetes: Refactor common_templates/0.2/default-network-policy-conf.yaml into a GlobalNetworkPolicy - https://phabricator.wikimedia.org/T275035 (JMeybohm) I've deployed the following services after updating their charts removing default-network-policy in staging: - ap...
[10:16:49] jayme: yep, have a fix for that https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/898820
[10:16:54] thumbor I'm less certain about
[10:18:11] won't you run device-analytics behind ingress anyways?
[10:18:34] no, I figured the traffic levels might be of concern
[10:19:02] ah, I must admit I'm not in the topic
[10:19:47] I thought it's internal stuff for querying IP data outside of the user path
[10:21:00] nah it'll be handling some of the status urls for AQS
[10:21:32] ack
[10:27:27] I'm deploying device-analytics namespace stuff to wikikube clusters, need to run helmfile anyways
[10:41:23] cool
[11:13:26] serviceops, MW-on-K8s, SRE, Traffic, and 2 others: Migrate testwikidata to Kubernetes - https://phabricator.wikimedia.org/T331268 (Clement_Goubert) In progress→Resolved `test.wikidata.org` will now be progressively moved to mw-on-k8s as puppet runs happen. Feel free to open subtasks for a...
[11:15:37] serviceops, Prod-Kubernetes: cert-manager created multiple CertificateRequest objects with the same certificate-revision - https://phabricator.wikimedia.org/T304092 (JMeybohm) Open→Resolved >>! In T304092#8693216, @JMeybohm wrote: > This is running in wikikube and ml-staging now. Will give it a c...
[11:16:03] serviceops, Prod-Kubernetes, Kubernetes: Update cert-manager to 1.10.x - https://phabricator.wikimedia.org/T325292 (JMeybohm) Open→Resolved cert-manager is on 1.10.1 on all clusters now.
[11:16:05] serviceops, Foundational Technology Requests, Prod-Kubernetes, Shared-Data-Infrastructure, and 2 others: Update Kubernetes clusters to v1.23 - https://phabricator.wikimedia.org/T307943 (JMeybohm)
[11:38:53] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Migrate charts away from deprecated typology annotations - https://phabricator.wikimedia.org/T325066 (JMeybohm) Merged changes to coredns config and the calico chart. Deployment of the latter I will piggyback on https://gerrit.wikimedia.o...
[11:49:18] serviceops, SRE: Migrate conf2* hosts to bullseye - https://phabricator.wikimedia.org/T332010 (Clement_Goubert) p:Triage→Medium
[11:49:24] serviceops, SRE: Migrate dragonfly-supernodes to bullseye - https://phabricator.wikimedia.org/T332011 (Clement_Goubert) p:Triage→Medium
[11:49:30] serviceops: Migrate kafka-main to bullseye - https://phabricator.wikimedia.org/T332013 (Clement_Goubert) p:Triage→Medium
[11:50:01] serviceops: Migrate poolcounter hosts to bullseye - https://phabricator.wikimedia.org/T332015 (Clement_Goubert) p:Triage→Medium
[11:50:05] serviceops: Migrate docker registry hosts to bullseye - https://phabricator.wikimedia.org/T332016 (Clement_Goubert) p:Triage→Medium
[11:52:04] serviceops, Machine-Learning-Team, SRE, Language-Team (Language-2023-January-March), Service-deployment-requests: New Service Deployment Request: NNLB-200 for machine translation - https://phabricator.wikimedia.org/T329971 (Clement_Goubert) p:Triage→Medium
[11:53:29] serviceops, RESTbase Sunsetting, Epic, Platform Engineering Roadmap: Replace usage of RESTbase parsoid endpoints - https://phabricator.wikimedia.org/T328559 (Clement_Goubert) p:Triage→Medium
[11:56:46] serviceops, mwcli: Create /nonexistent directory for nobody user in golang images - https://phabricator.wikimedia.org/T331209 (Clement_Goubert) A possible workaround could be to add the correct stanza to `/etc/gitconfig`
[11:58:03] serviceops, Sustainability (Incident Followup): Expand upon Kask/Sessionstore documentation - https://phabricator.wikimedia.org/T320398 (Clement_Goubert) p:Triage→Medium
[11:59:21] serviceops, Foundational Technology Requests, Prod-Kubernetes, Shared-Data-Infrastructure, Kubernetes: etcd cluster reimage strategies to use with the K8s upgrade cookbook - https://phabricator.wikimedia.org/T330060 (Clement_Goubert) p:Triage→Medium
[12:02:04] serviceops, Release-Engineering-Team, mwcli, serviceops-collab: Create /nonexistent directory for nobody user in golang images - https://phabricator.wikimedia.org/T331209 (Clement_Goubert) Tagging #release-engineering-team and #serviceops-collab teams because it involves CI and gitlab.
[12:05:41] serviceops, SRE: ICU transition towards ICU 67 - https://phabricator.wikimedia.org/T329491 (Clement_Goubert) p:Triage→Medium
[12:09:55] serviceops, Release-Engineering-Team, mwcli, serviceops-collab: Create /nonexistent directory for nobody user in golang images - https://phabricator.wikimedia.org/T331209 (Clement_Goubert) p:Triage→Medium
[12:24:52] serviceops, MW-on-K8s, SRE, Traffic, and 2 others: Migrate testwikidata to Kubernetes - https://phabricator.wikimedia.org/T331268 (Lucas_Werkmeister_WMDE) FWIW, I tried running our browser test suite against Wikidata. ` $ nvm use 14 $ node --version v14.19.1 $ MW_SERVER=https://test.wikidata.org...
[13:32:49] serviceops, Prod-Kubernetes, Kubernetes: Migrate charts away from deprecated typology annotations - https://phabricator.wikimedia.org/T325066 (JMeybohm) coredns change applied to all clusters
[14:23:23] serviceops, DBA, Data Pipelines, Data-Engineering-Planning, and 9 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (ssingh)
[14:26:06] serviceops, DBA, Data Pipelines, Data-Engineering-Planning, and 9 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (herron)
[14:26:40] serviceops, DBA, Data Pipelines, Data-Engineering-Planning, and 9 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (ssingh)
[15:00:40] serviceops, DBA, Data Pipelines, Data-Engineering-Planning, and 9 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (bking)
[15:02:00] serviceops, DBA, Data Pipelines, Data-Engineering-Planning, and 9 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (ayounsi)
[15:21:37] serviceops, MW-on-K8s, Scap: Add a flag to scap to force updating the /etc/helmfile-defaults/mediawiki/release/* files - https://phabricator.wikimedia.org/T332187 (Clement_Goubert)
[15:22:06] serviceops, MW-on-K8s, Scap: Add a flag to scap to force updating the /etc/helmfile-defaults/mediawiki/release/* files - https://phabricator.wikimedia.org/T332187 (Clement_Goubert) p:Triage→High
[17:07:34] serviceops: Migrate poolcounter hosts to bullseye - https://phabricator.wikimedia.org/T332015 (akosiaris)
[17:09:17] serviceops, SRE: Migrate conf2* hosts to bullseye - https://phabricator.wikimedia.org/T332010 (akosiaris)
[18:11:11] serviceops, DBA, Data Pipelines, Data-Engineering-Planning, and 9 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (Eevans)
[18:58:59] serviceops, Sustainability (Incident Followup): Alert on Kask error rate - https://phabricator.wikimedia.org/T320401 (BCornwall) Open→Declined T327960 alerts on 500s, so this has already been implemented.
[19:37:54] serviceops, SRE, ops-eqiad: Broken PSU on mw1435 - https://phabricator.wikimedia.org/T332117 (Jclark-ctr) Open→Resolved a:Jclark-ctr Reseated powercord