[00:49:46] serviceops, Prod-Kubernetes, SRE: Kubernetes apiserver probe failures on restart - https://phabricator.wikimedia.org/T358936#9686689 (ssingh) This happened today as well, at 00:35 UTC, when we were paged for this: ` 00:35:41 <+jinxer-wm> (ProbeDown) firing: Service kubemaster2001:6443 has failed pr...
[08:12:02] serviceops, ChangeProp, Content-Transform-Team, Lift-Wing, and 6 others: Selectively disable changeprop functionality that is no longer used - https://phabricator.wikimedia.org/T361483#9687158 (akosiaris)
[08:19:14] serviceops, Patch-For-Review: Improve etcdmirror shutdown behavior - https://phabricator.wikimedia.org/T361762#9687196 (Volans) nice finding!
[09:14:51] serviceops, collaboration-services, Infrastructure-Foundations, Puppet-Core, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619#9687499 (MoritzMuehlenhoff)
[09:54:25] serviceops, collaboration-services, Infrastructure-Foundations, Puppet-Core, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619#9687715 (MoritzMuehlenhoff)
[10:06:46] serviceops, Content-Transform-Team-WIP, Page Content Service, RESTBase Sunsetting, Patch-For-Review: Update mobileapps k8s deployment chart for Cassandra credentials - https://phabricator.wikimedia.org/T350507#9687740 (Jgiannelos) From staging: ` { "status": 500, "type": "internal_error",...
[10:13:36] serviceops, collaboration-services, Infrastructure-Foundations, Puppet-Core, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619#9687751 (MoritzMuehlenhoff)
[10:39:23] serviceops, Content-Transform-Team-WIP, Page Content Service, RESTBase Sunsetting, Patch-For-Review: Update mobileapps k8s deployment chart for Cassandra credentials - https://phabricator.wikimedia.org/T350507#9687847 (Jgiannelos) Things look better on staging: * For a fresh request (expecte...
[10:41:36] serviceops, [DEPRECATED] wdwb-tech, Citoid, Content-Transform-Team-WIP, and 10 others: Migrate node-based services in production to node18 - https://phabricator.wikimedia.org/T349118#9687858 (Nikerabbit)
[12:59:22] serviceops, iPoid-Service, Observability-Logging, Patch-For-Review: Logs from containers sometimes not visible in logstash - https://phabricator.wikimedia.org/T357616#9688075 (JMeybohm) Open→Resolved a:JMeybohm The restarts do work properly and we've not seen "Too many open...
[13:09:31] serviceops, Content-Transform-Team-WIP, Mobile-Content-Service, RESTBase Sunsetting, and 3 others: Setup allowed list for MCS decom - https://phabricator.wikimedia.org/T340036#9688380 (akosiaris) I guess it's about time I ask if it is ok to remove those exceptions now and return 403 to...
[13:10:03] serviceops, ChangeProp, Content-Transform-Team, Lift-Wing, and 6 others: Selectively disable changeprop functionality that is no longer used - https://phabricator.wikimedia.org/T361483#9688388 (akosiaris) Next up. `mobile-sections`. It's deprecated per T328036 for a long time now. I 'll remove ru...
[13:10:53] serviceops, SRE, Data Products (Data Products Sprint 11), Service-deployment-requests: Commons Impact Metrics AQS 2.0 Deployment to Staging and Production - https://phabricator.wikimedia.org/T361835 (WDoranWMF) NEW
[13:10:59] serviceops, SRE, Data Products (Data Products Sprint 11), Service-deployment-requests: Commons Impact Metrics AQS 2.0 Deployment to Staging and Production - https://phabricator.wikimedia.org/T361835#9688437 (WDoranWMF) p:Triage→High
[13:11:17] serviceops, SRE, Data Products (Data Products Sprint 11), Service-deployment-requests: Commons Impact Metrics AQS 2.0 Deployment to Staging and Production - https://phabricator.wikimedia.org/T361835#9688441 (WDoranWMF)
[13:11:37] serviceops, ChangeProp, Content-Transform-Team, Lift-Wing, and 6 others: Selectively disable changeprop functionality that is no longer used - https://phabricator.wikimedia.org/T361483#9688445 (akosiaris) >>! In T361483#9680093, @elukey wrote: >>>! In T361483#9680024, @akosiaris wrote: >>>>! In T...
[14:36:58] serviceops, SRE, Data Products (Data Products Sprint 11), Service-deployment-requests: Commons Impact Metrics AQS 2.0 Deployment to Staging and Production - https://phabricator.wikimedia.org/T361835#9688647 (hnowlan)
[14:43:03] serviceops, ChangeProp, Content-Transform-Team, Lift-Wing, and 5 others: Selectively disable changeprop functionality that is no longer used - https://phabricator.wikimedia.org/T361483#9688741 (SLopes-WMF)
[14:47:23] serviceops, Content-Transform-Team-WIP, Page Content Service, RESTBase Sunsetting, Patch-For-Review: Update mobileapps k8s deployment chart for Cassandra credentials - https://phabricator.wikimedia.org/T350507#9688831 (Eevans) carltondance
[15:31:54] serviceops, Release-Engineering-Team, Scap, Patch-For-Review, Sustainability (Incident Followup): scap should check if it is running within a tmux/screen - https://phabricator.wikimedia.org/T361724#9689119 (CodeReviewBot) jiji opened https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requ...
[15:33:28] serviceops, SRE, Data Products (Data Products Sprint 11), Service-deployment-requests: Commons Impact Metrics AQS 2.0 Deployment to Staging and Production - https://phabricator.wikimedia.org/T361835#9689123 (hnowlan) What external paths should we be routing to what internal paths for this service?
[16:01:21] serviceops, Patch-For-Review: Package latest version of prometheus-memcached-exporter (v0.14.1) - https://phabricator.wikimedia.org/T350807#9689265 (CodeReviewBot) jiji opened https://gitlab.wikimedia.org/repos/sre/prometheus-memcached-exporter/-/merge_requests/1 package prometheus-memcached-exporter fo...
[16:07:41] serviceops, SRE, Data Products (Data Products Sprint 11), Service-deployment-requests: Commons Impact Metrics AQS 2.0 Deployment to Staging and Production - https://phabricator.wikimedia.org/T361835#9689309 (Scott_French) Additionally, two timeline questions: * When do you anticipate having a min...
[16:23:14] serviceops, SRE, Data Products (Data Products Sprint 11), Service-deployment-requests: Commons Impact Metrics AQS 2.0 Deployment to Staging and Production - https://phabricator.wikimedia.org/T361835#9689413 (Scott_French) a:Scott_French
[17:39:37] serviceops, Patch-For-Review: Improve etcdmirror shutdown behavior - https://phabricator.wikimedia.org/T361762#9689838 (Scott_French) This will be deployed as part of T358636.
[17:39:50] serviceops, Patch-For-Review: Improve etcdmirror shutdown behavior - https://phabricator.wikimedia.org/T361762#9689839 (Scott_French) In progress→Resolved
[18:11:35] serviceops, Patch-For-Review: etcdmirror does not recover from a cleared waitIndex - https://phabricator.wikimedia.org/T358636#9689981 (Scott_French) Thanks, Riccardo! (both for the follow-up here and code reviews) Next steps: * Release a new etcd-mirror package (a process I can largely follow from the...
[18:29:15] serviceops, Release-Engineering-Team, Scap, Wikimedia-Incident: Helm was left in limbo due to interrupted deployment/rollback - https://phabricator.wikimedia.org/T361720#9690071 (dancy)
[18:29:40] serviceops, Release-Engineering-Team, Scap, Wikimedia-Incident: Helm was left in limbo due to interrupted deployment/rollback - https://phabricator.wikimedia.org/T361720#9690063 (dancy) Resolved→Open Scap already has a check for helm releases in `pending-upgrade` state but it looks like i...
[18:58:14] serviceops, Release-Engineering-Team, Scap, Patch-For-Review, Sustainability (Incident Followup): scap should check if it is running within a tmux/screen - https://phabricator.wikimedia.org/T361724#9690204 (dancy) There is a tmux/screen check for `scap stage-train`, but nothing else. This c...
[20:04:55] serviceops, Release-Engineering-Team, Scap, Patch-For-Review, Sustainability (Incident Followup): scap should check if it is running within a tmux/screen - https://phabricator.wikimedia.org/T361724#9690603 (hashar) I might have used `screen` back in the old days (like in 2005 or so) and might...
[20:19:23] serviceops, Release-Engineering-Team, Scap, Wikimedia-Incident: Helm was left in limbo due to interrupted deployment/rollback - https://phabricator.wikimedia.org/T361720#9690655 (dancy) p:Unbreak!→Medium
[20:34:11] serviceops, MediaWiki-General, MediaWiki-libs-Stats, observability, and 2 others: MediaWiki Prometheus support - https://phabricator.wikimedia.org/T240685#9690698 (lmata)
[20:41:48] serviceops, Release-Engineering-Team, Scap, Patch-For-Review, Wikimedia-Incident: Helm was left in limbo due to interrupted deployment/rollback - https://phabricator.wikimedia.org/T361720#9690727 (CodeReviewBot) dancy opened https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merg...
[20:59:58] serviceops, Release-Engineering-Team, Scap, Patch-For-Review, Wikimedia-Incident: Helm was left in limbo due to interrupted deployment/rollback - https://phabricator.wikimedia.org/T361720#9690805 (CodeReviewBot) dancy opened https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/267...
[21:01:35] serviceops, Release-Engineering-Team, Scap, Patch-For-Review, Wikimedia-Incident: Helm was left in limbo due to interrupted deployment/rollback - https://phabricator.wikimedia.org/T361720#9690811 (CodeReviewBot) dancy merged https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merg...