[10:14:31] serviceops, Foundational Technology Requests, Prod-Kubernetes, Shared-Data-Infrastructure, and 2 others: Metrics changes with Kubernetes v1.23 - https://phabricator.wikimedia.org/T322919 (Jelto) I used the above list to grep through `operations/alerts` and found one usage of a deprecated metric i...
[10:38:05] serviceops, Thumbor, Thumbor Migration, Platform Team Workboards (Platform Engineering Reliability): byte/str TypeError during svg conversion - https://phabricator.wikimedia.org/T325150 (Clement_Goubert)
[10:49:34] serviceops, Thumbor, Thumbor Migration, Platform Team Workboards (Platform Engineering Reliability): byte/str TypeError during svg conversion - https://phabricator.wikimedia.org/T325150 (hnowlan) Open→Resolved
[10:49:43] serviceops, SRE, Thumbor, Thumbor Migration, and 2 others: Migrate thumbor to Kubernetes - https://phabricator.wikimedia.org/T233196 (hnowlan)
[10:50:03] elukey: updated changeprop is deployed on stagings!
[10:58:55] hnowlan: nice!!! \o/
[11:00:00] I've defaulted to trace on them btw so there will be (some) more logs
[11:01:55] thanks a lot for all the help! We are fixing an issue with the liftwing pods in staging, after that I'll try to test another event and see how it goes!
[11:02:44] not changeprop specific but we should probably actively encourage the removal of puppet_ca_crt usage from charts (gonna do api-gateway today)
[11:04:04] hnowlan: now that you mentioned api-gateway - how is it able to contact liftwing's api without the updated ca bundle?
[11:04:29] it should in theory return the same issue as changeprop did
[11:04:35] ... huh, yeah
[11:05:26] oh wait, we just distribute the puppet_ca_crt in configmaps, but we *use* wmf-certificates
[11:05:36] thought I was losing my grip on reality there for a second
[11:06:16] okok perfect then it makes sense :D
[11:16:28] going to pool thumbor in eqiad for a little bit
[11:25:56] serviceops, Observability-Tracing: Add ingress to aux-k8s - https://phabricator.wikimedia.org/T325178 (Clement_Goubert)
[11:26:00] serviceops, Observability-Tracing: Rename aux-k8s-ingress service to k8s-ingress-aux - https://phabricator.wikimedia.org/T327756 (Clement_Goubert) Open→In progress p:Triage→Medium
[11:27:31] looks like the healthchecking fixes have fixed the pooling issues
[11:27:52] depooling for bug fixes
[12:09:00] I fixed up my DNS change for adding reverse DNS entries for staging-codfw. https://gerrit.wikimedia.org/r/c/operations/dns/+/883226
[12:09:37] I'd appreciate any more reviews, but if we're OK to go ahead we can see if it fixes the datahub issue.
[12:42:54] serviceops, Infrastructure-Foundations, SRE, SRE-Access-Requests, Patch-For-Review: allow mw-deployers to disable puppet on mwdebug hosts - https://phabricator.wikimedia.org/T305979 (Clement_Goubert) FYI this created warnings in `cross-validate-accounts`, CR incoming.
[12:44:17] serviceops, Infrastructure-Foundations, SRE, SRE-Access-Requests, Patch-For-Review: allow mw-deployers to disable puppet on mwdebug hosts - https://phabricator.wikimedia.org/T305979 (MoritzMuehlenhoff) >>! In T305979#8557028, @Clement_Goubert wrote: > FYI this created warnings in `cross-valid...
[12:51:51] serviceops, Infrastructure-Foundations, SRE, SRE-Access-Requests, Patch-For-Review: allow mw-deployers to disable puppet on mwdebug hosts - https://phabricator.wikimedia.org/T305979 (Clement_Goubert) >>! In T305979#8557029, @MoritzMuehlenhoff wrote: >>>! In T305979#8557028, @Clement_Goubert w...
[14:01:06] serviceops, Commons, MediaWiki-File-management, SRE, and 3 others: Frequent "Error: 429, Too Many Requests" errors on pages with many (>50) thumbnails - https://phabricator.wikimedia.org/T266155 (TheDJ) The above patch should cause native lazy loading of images by the browser. This will cause few...
[15:10:46] serviceops, Data-Persistence, SRE, Datacenter-Switchover, Performance-Team (Radar): March 2023 Datacenter Switchover - https://phabricator.wikimedia.org/T327920 (Clement_Goubert)
[15:11:12] serviceops, Data-Persistence, SRE, Datacenter-Switchover, Performance-Team (Radar): March 2023 Datacenter Switchover - https://phabricator.wikimedia.org/T327920 (Clement_Goubert)
[15:16:22] serviceops, Data-Persistence, SRE, Datacenter-Switchover, Performance-Team (Radar): March 2023 Datacenter Switchover - https://phabricator.wikimedia.org/T327920 (Clement_Goubert)
[15:25:25] serviceops, Data-Persistence, SRE, Datacenter-Switchover, Performance-Team (Radar): March 2023 Datacenter Switchover - https://phabricator.wikimedia.org/T327920 (Clement_Goubert) p:Triage→Medium
[15:35:29] serviceops, MW-on-K8s, Datacenter-Switchover: Prepare mw-on-k8s for switchover - https://phabricator.wikimedia.org/T327924 (Clement_Goubert)
[15:46:51] serviceops, Data-Engineering, Data-Persistence, Discovery-Search, and 9 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (ayounsi)
[15:55:57] serviceops, DBA, Data-Engineering, Data-Persistence, and 10 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (Marostegui) I'll check our db-related hosts and I'll get back to you tomorrow
[15:58:45] Reverse DNS didn't fix the datahub issue :-( Back to the drawing board.
[16:38:41] serviceops, Commons, MediaWiki-File-management, SRE, and 3 others: Frequent "Error: 429, Too Many Requests" errors on pages with many (>50) thumbnails - https://phabricator.wikimedia.org/T266155 (PatchDemoBot) Test wiki **created** on [[ https://patchdemo.wmflabs.org | Patch demo ]] by TheDJ usin...
[16:44:15] serviceops, Infrastructure-Foundations, SRE, SRE-Access-Requests: allow mw-deployers to disable puppet on mwdebug hosts - https://phabricator.wikimedia.org/T305979 (Dzahn) Thanks @Clement_Goubert and @Muehlenhoff for follow-ups.
[17:23:50] In order to get api-gateway away from aforementioned k8s puppetca dependencies, I'd like to include wmf-certificates in the fluent-bit image: https://gerrit.wikimedia.org/r/c/operations/docker-images/production-images/+/883605/
[17:24:55] ( related https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/883636)
[17:32:29] hnowlan: multiple other images do it already in docker-images/production-images repo. Go ahead
[17:33:01] that being said, you got a typo ;-)
[17:34:49] akosiaris: thanks, good catch!
[17:38:16] serviceops, MW-on-K8s, Datacenter-Switchover: Prepare mw-on-k8s for March 2023 Datacenter Switchover - https://phabricator.wikimedia.org/T327924 (Jdforrester-WMF)
[18:42:13] serviceops, DBA, Data-Engineering, Data-Persistence, and 10 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (RKemper)
[18:45:16] serviceops, DBA, Data-Engineering, Data-Persistence, and 10 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (RKemper)
[18:50:39] serviceops, DBA, Data-Engineering, Data-Persistence, and 10 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (LSobanski)