[09:40:07] serviceops, Language and Product Localization: Migrate language_and_product_localization jobs to mw-cron - https://phabricator.wikimedia.org/T388539#10732731 (Clement_Goubert)
[09:40:17] serviceops, Language and Product Localization: Migrate language_and_product_localization jobs to mw-cron - https://phabricator.wikimedia.org/T388539#10732732 (Clement_Goubert) Open→In progress
[09:47:52] serviceops, MW-on-K8s: Move mwscript wrapper from base image to copy on build - https://phabricator.wikimedia.org/T391665 (Clement_Goubert) NEW
[09:48:16] serviceops, MW-on-K8s: Move mwscript wrapper from base image to copy on build - https://phabricator.wikimedia.org/T391665#10732755 (Clement_Goubert) Open→In progress p:Triage→High
[10:34:21] serviceops: docker-registry.wikimedia.org keeps serving bad blobs - https://phabricator.wikimedia.org/T390251#10732891 (elukey) Thanks a lot for the ton of good details Scott! Reviewing your comments made me wonder something high level, lemme know what you think about it. We are currently facing two issues:...
[10:43:08] serviceops, MW-on-K8s: Deploy statsd-exporter to mw-cron - https://phabricator.wikimedia.org/T391672 (Clement_Goubert) NEW
[10:43:14] serviceops, MW-on-K8s: Deploy statsd-exporter to mw-cron - https://phabricator.wikimedia.org/T391672#10732912 (Clement_Goubert) Open→In progress p:Triage→High
[10:59:52] serviceops, Content-Transform-Team, MediaWiki-Engineering, OKR-Work, Web Team Essential Work 2025: Transition parsoidtest1001 to PHP 8.1 - https://phabricator.wikimedia.org/T380485#10732947 (Krinkle)
[11:00:55] serviceops, MW-on-K8s, Patch-For-Review: Move mwscript wrapper from base image to copy on build - https://phabricator.wikimedia.org/T391665#10732949 (Clement_Goubert)
[11:01:02] serviceops, MW-on-K8s, Patch-For-Review: Move mwscript wrapper from base image to copy on build - https://phabricator.wikimedia.org/T391665#10732950 (Clement_Goubert)
[11:02:36] serviceops, MW-on-K8s, SRE Observability, Patch-For-Review: Deploy statsd-exporter to mw-cron - https://phabricator.wikimedia.org/T391672#10732953 (Clement_Goubert)
[11:03:30] serviceops, MW-on-K8s, SRE Observability, Patch-For-Review: Deploy statsd-exporter to mw-cron - https://phabricator.wikimedia.org/T391672#10732954 (Clement_Goubert)
[11:03:32] serviceops, MW-on-K8s, Observability-Metrics, Patch-For-Review: Create a per-release deployment of statsd-exporter for mw-on-k8s - https://phabricator.wikimedia.org/T365265#10732955 (Clement_Goubert)
[11:12:48] serviceops, MW-on-K8s: Contact team responsible for a job on failure in mwcron - https://phabricator.wikimedia.org/T377964#10732961 (Clement_Goubert) →Duplicate dup:T385709
[11:12:52] serviceops, MW-on-K8s, Observability-Alerting, Patch-For-Review, SRE Observability (FY2024/2025-Q3): Periodic job alerting - https://phabricator.wikimedia.org/T385709#10732963 (Clement_Goubert)
[11:13:46] serviceops, MW-on-K8s, Observability-Alerting, Patch-For-Review, SRE Observability (FY2024/2025-Q3): Periodic job alerting - https://phabricator.wikimedia.org/T385709#10732965 (Clement_Goubert) All teams and tags are now added to AlertManager. I'm keeping the task open to attach any further c...
[11:13:58] serviceops, MW-on-K8s, Observability-Alerting, Patch-For-Review, SRE Observability (FY2024/2025-Q3): Periodic job alerting - https://phabricator.wikimedia.org/T385709#10732966 (Clement_Goubert) Open→In progress
[11:16:24] serviceops, MW-on-K8s: Create a logstash dashboard for mediawiki periodic jobs - https://phabricator.wikimedia.org/T385594#10732982 (Clement_Goubert) Open→Resolved The logstash dashboard exists at https://w.wiki/DmpP Resolving this task, but there will be tasks created for mediawiki dev team...
[11:17:24] serviceops, MediaWiki-Platform-Team: Migrate "startupregistrystats" maintenance script to k8s-mw-cron (mediawiki-platform-team) - https://phabricator.wikimedia.org/T388540#10732986 (Clement_Goubert) Open→In progress
[11:18:08] serviceops, CampaignEvents, Campaigns-Product-Team, MW-on-K8s, Patch-For-Review: Migrate CampaignEvents jobs to mw-cron - https://phabricator.wikimedia.org/T385867#10732988 (Clement_Goubert) Open→In progress
[11:19:28] serviceops, MediaWiki-Page-derived-data: Migrate MediaWiki-Page-derived-data jobs to mw-cron - https://phabricator.wikimedia.org/T388530#10732991 (hnowlan) These jobs are running with the "--dfn-only" flag set so we should at the very least rename these jobs or update the description to outline that they...
[13:04:07] serviceops, collaboration-services, Prod-Kubernetes, Data-Platform-SRE (2025.03.22 - 2025.04.11), Kubernetes: top-level config key environments must be defined before releases in helmfile.yaml - https://phabricator.wikimedia.org/T387836#10733203 (Gehel)
[13:04:31] serviceops, collaboration-services, Prod-Kubernetes, Data-Platform-SRE (2025.03.22 - 2025.04.11), and 2 others: Fix installed key in dependend helmfile releases - https://phabricator.wikimedia.org/T387837#10733205 (Gehel)
[13:19:55] serviceops, Discovery-Search, Analytics-Data-Problem, Data-Platform-SRE (2025-04-12 - 2025-05-02): Search Update Pipeline requests to Action API are logged as coming from 127.0.0.1 - https://phabricator.wikimedia.org/T388855#10733422 (Gehel)
[13:27:15] serviceops, Discovery-Search (2025.04.11 - 2025.05.02): Migrate discovery-search jobs to mw-cron - https://phabricator.wikimedia.org/T388538#10733527 (Gehel)
[15:04:06] serviceops, DC-Ops, ops-codfw, SRE: hw troubleshooting: hard down for wikikube-worker2142 - https://phabricator.wikimedia.org/T391341#10733816 (Jhancock.wm) Open→Resolved a:Papaul→Jhancock.wm @Clement_Goubert arrived and replaced. ran provisioning cookbook and it pings now. L...
[15:04:43] serviceops, DC-Ops, ops-codfw, SRE: hw troubleshooting: hard down for wikikube-worker2142 - https://phabricator.wikimedia.org/T391341#10733819 (Clement_Goubert) Thanks for the resuscitation!
[15:23:11] serviceops, DC-Ops, ops-codfw, SRE: hw troubleshooting: hard down for wikikube-worker2142 - https://phabricator.wikimedia.org/T391341#10733862 (ops-monitoring-bot) pool host wikikube-worker2142.codfw.wmnet by cgoubert@cumin1002 with reason: None
[15:23:16] serviceops, DC-Ops, ops-codfw, SRE: hw troubleshooting: hard down for wikikube-worker2142 - https://phabricator.wikimedia.org/T391341#10733863 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.pool-depool-node started by cgoubert@cumin1002 pool for host wikikube-worker2142.codfw.wmnet completed...
[16:11:35] serviceops, Growth-Team, GrowthExperiments, MW-on-K8s, Patch-For-Review: Migrate GrowthExperiments maintenance jobs to mw-cron - https://phabricator.wikimedia.org/T385782#10734012 (hnowlan)
[18:25:24] serviceops: docker-registry.wikimedia.org keeps serving bad blobs - https://phabricator.wikimedia.org/T390251#10734580 (Scott_French) Thanks, Luca! This morning, I checked the swift-proxy access logs on ms-fe2010 and ms-fe2012, which are the two hosts emitting 416 responses in the 16:06 minute per the `swif...
[22:14:49] serviceops, collaboration-services, Infrastructure-Foundations, Puppet-Core, and 3 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619#10735259 (Aklapper) In progress→Open Resetting task status from "In Progress" to "Open" as this task has been "in progress" for m...
[22:22:20] serviceops, Data-Engineering, EventStreams, Observability-Tracing, and 3 others: eventstreams regularly uses more than 95% of its memory limit - https://phabricator.wikimedia.org/T357005#10735487 (Aklapper) In progress→Open Resetting task status from "In Progress" to "Open" as this task h...