[00:00:05] <jouncebot>	 Deploy window Web Team deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250114T0000)
[00:05:29] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: httpbb_kubernetes_mw-wikifunctions_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:10:06] <jinxer-wm>	 FIRING: MediaWikiLoginFailures: Elevated MediaWiki centrallogin failures (centralauth_error_nologinattempt) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?viewPanel=3 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLoginFailures
[00:10:29] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: httpbb_kubernetes_mw-wikifunctions_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:15:06] <jinxer-wm>	 RESOLVED: MediaWikiLoginFailures: Elevated MediaWiki centrallogin failures (centralauth_error_nologinattempt) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?viewPanel=3 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLoginFailures
[00:18:49] <icinga-wm>	 PROBLEM - Router interfaces on cr1-drmrs is CRITICAL: CRITICAL: host 185.15.58.128, interfaces up: 57, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[00:19:41] <icinga-wm>	 PROBLEM - BGP status on cr1-drmrs is CRITICAL: BGP CRITICAL - AS6939/IPv4: Idle - HE, AS13030/IPv4: Idle - Init7, AS13030/IPv6: Idle - Init7, AS6939/IPv6: Idle - HE https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[00:19:48] <wikibugs>	 (CR) Btullis: "There is an interesting discussion about this here: https://wikimedia.slack.com/archives/C055QGPTC69/p1736582228252779" [deployment-charts] - https://gerrit.wikimedia.org/r/1109705 (https://phabricator.wikimedia.org/T380620) (owner: Brouberol)
[00:25:46] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T383383#10456548 (phaultfinder)
[00:38:25] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1110886
[00:38:25] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1110886 (owner: TrainBranchBot)
[00:42:49] <icinga-wm>	 RECOVERY - Router interfaces on cr1-drmrs is OK: OK: host 185.15.58.128, interfaces up: 58, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[00:43:45] <icinga-wm>	 RECOVERY - BGP status on cr1-drmrs is OK: BGP OK - up: 103, down: 4, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[00:46:03] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - ASunknown/IPv6: Connect https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[00:50:36] <wikibugs>	 (CR) Eevans: [C:+2] cassandra: rotate target_version 'dev' to '4.x' [puppet] - https://gerrit.wikimedia.org/r/1109767 (https://phabricator.wikimedia.org/T380420) (owner: Eevans)
[00:54:41] <wikibugs>	 (PS2) Eevans: cassandra: set target_dev to 4.x (no-op) [puppet] - https://gerrit.wikimedia.org/r/1109768 (https://phabricator.wikimedia.org/T380420)
[00:54:43] <icinga-wm>	 PROBLEM - BGP status on cr1-drmrs is CRITICAL: BGP CRITICAL - AS6939/IPv4: Idle - HE, AS6939/IPv6: Idle - HE, AS13030/IPv6: Idle - Init7, AS13030/IPv4: Idle - Init7 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[00:54:49] <icinga-wm>	 PROBLEM - Router interfaces on cr1-drmrs is CRITICAL: CRITICAL: host 185.15.58.128, interfaces up: 57, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[00:55:13] <wikibugs>	 (CR) Eevans: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1109768 (https://phabricator.wikimedia.org/T380420) (owner: Eevans)
[00:57:22] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1110886 (owner: TrainBranchBot)
[01:08:08] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1110888
[01:08:08] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1110888 (owner: TrainBranchBot)
[01:10:41] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T383383#10456602 (phaultfinder)
[01:27:39] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1110888 (owner: TrainBranchBot)
[01:32:38] <jinxer-wm>	 FIRING: CirrusSearchHighOldGCFrequency: Elasticsearch instance elastic2069-production-search-psi-codfw is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[01:42:49] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 207, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[01:43:49] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 208, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[02:08:00] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/1.44.0-wmf.12 [core] (wmf/1.44.0-wmf.12) - https://gerrit.wikimedia.org/r/1110894 (https://phabricator.wikimedia.org/T382363)
[02:08:02] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/1.44.0-wmf.12 [core] (wmf/1.44.0-wmf.12) - https://gerrit.wikimedia.org/r/1110894 (https://phabricator.wikimedia.org/T382363) (owner: TrainBranchBot)
[02:28:30] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/1.44.0-wmf.12 [core] (wmf/1.44.0-wmf.12) - https://gerrit.wikimedia.org/r/1110894 (https://phabricator.wikimedia.org/T382363) (owner: TrainBranchBot)
[02:34:37] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T383383#10456703 (phaultfinder)
[02:36:42] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:46:15] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - ASunknown/IPv6: Active https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[03:00:05] <jouncebot>	 Deploy window Automatic branching of MediaWiki, extensions, skins, and vendor – see Heterogeneous_deployment/Train_deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250114T0300)
[03:01:42] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:14:38] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T383383#10456733 (phaultfinder)
[04:00:05] <jouncebot>	 Deploy window Automatic deployment of MediaWiki, extensions, skins, and vendor to testwikis only – see Heterogeneous_deployment/Train_deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250114T0400)
[04:01:47] <wikibugs>	 (PS1) TrainBranchBot: testwikis to 1.44.0-wmf.12 [mediawiki-config] - https://gerrit.wikimedia.org/r/1110900 (https://phabricator.wikimedia.org/T382363)
[04:01:49] <wikibugs>	 (CR) TrainBranchBot: [C:+2] testwikis to 1.44.0-wmf.12 [mediawiki-config] - https://gerrit.wikimedia.org/r/1110900 (https://phabricator.wikimedia.org/T382363) (owner: TrainBranchBot)
[04:02:37] <wikibugs>	 (Merged) jenkins-bot: testwikis to 1.44.0-wmf.12 [mediawiki-config] - https://gerrit.wikimedia.org/r/1110900 (https://phabricator.wikimedia.org/T382363) (owner: TrainBranchBot)
[04:03:02] <logmsgbot>	 !log mwpresync@deploy2002 Started scap sync-world: testwikis to 1.44.0-wmf.12  refs T382363
[04:03:06] <stashbot>	 T382363: 1.44.0-wmf.12 deployment blockers - https://phabricator.wikimedia.org/T382363
[04:10:29] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: httpbb_kubernetes_mw-wikifunctions_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:24:44] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T383383#10456773 (phaultfinder)
[04:54:00] <logmsgbot>	 !log mwpresync@deploy2002 Finished scap sync-world: testwikis to 1.44.0-wmf.12  refs T382363 (duration: 50m 57s)
[04:54:03] <stashbot>	 T382363: 1.44.0-wmf.12 deployment blockers - https://phabricator.wikimedia.org/T382363
[04:54:04] <wikibugs>	 (PS2) Jdlrobson: Stop expanding sections by default on Wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/1107964 (https://phabricator.wikimedia.org/T376446)
[05:00:05] <jouncebot>	 Deploy window Automatic removal of all obsolete MediaWiki versions from the deployment and bare metal servers (except the most-recent obsolete version) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250114T0500)
[05:03:08] <logmsgbot>	 !log mwpresync@deploy2002 Pruned MediaWiki: 1.44.0-wmf.6 (duration: 03m 06s)
[05:05:23] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - ASunknown/IPv4: Connect https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[05:32:38] <jinxer-wm>	 FIRING: CirrusSearchHighOldGCFrequency: Elasticsearch instance elastic2069-production-search-psi-codfw is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[06:09:21] <icinga-wm>	 PROBLEM - BGP status on cr2-codfw is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast, AS64605/IPv4: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[06:10:23] <icinga-wm>	 PROBLEM - BGP status on cr4-ulsfo is CRITICAL: BGP CRITICAL - AS64605/IPv4: Idle - Anycast, AS64605/IPv6: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[06:11:59] <icinga-wm>	 PROBLEM - BGP status on cr1-eqiad is CRITICAL: BGP CRITICAL - AS64605/IPv4: Active - Anycast, AS64605/IPv6: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[06:12:39] <jinxer-wm>	 RESOLVED: CirrusSearchHighOldGCFrequency: Elasticsearch instance elastic2069-production-search-psi-codfw is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[06:15:04] <wikibugs>	 (CR) Giuseppe Lavagetto: [C:+1] "LGTM. Let me know when you want to merge this." [puppet] - https://gerrit.wikimedia.org/r/993010 (owner: Reedy)
[06:22:38] <jinxer-wm>	 FIRING: CirrusSearchHighOldGCFrequency: Elasticsearch instance elastic2069-production-search-psi-codfw is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[06:24:38] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T383383#10456848 (phaultfinder)
[06:25:27] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - ASunknown/IPv6: Active https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[06:35:31] <wikibugs>	 (PS1) TChin: Eventstreams: Bump image, use service-utils [deployment-charts] - https://gerrit.wikimedia.org/r/1111105 (https://phabricator.wikimedia.org/T361769)
[06:49:42] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T383383#10456854 (phaultfinder)
[06:57:42] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job routinator in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[07:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250114T0700)
[07:00:05] <jouncebot>	 marostegui, Amir1, and arnaudb: It is that lovely time of the day again! You are hereby commanded to deploy Primary database switchover. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250114T0700).
[07:02:16] <wikibugs>	 (PS2) Anzx: knwiki, knwikisource, knwikitionary, knwikiquote: update logo, wordmark [mediawiki-config] - https://gerrit.wikimedia.org/r/1111106 (https://phabricator.wikimedia.org/T382802)
[07:02:42] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job routinator in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[07:03:11] <wikibugs>	 (CR) Anzx: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/1111106 (https://phabricator.wikimedia.org/T382802) (owner: Anzx)
[07:03:31] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, January 14 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [mediawiki-config] - https://gerrit.wikimedia.org/r/1111106 (https://phabricator.wikimedia.org/T382802) (owner: Anzx)
[07:07:38] <jinxer-wm>	 RESOLVED: CirrusSearchHighOldGCFrequency: Elasticsearch instance elastic2069-production-search-psi-codfw is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[07:21:42] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.k8s.pool-depool-node depool for host mw[2412-2415].codfw.wmnet
[07:24:03] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw[2412-2415].codfw.wmnet
[07:24:58] <wikibugs>	 (PS4) Anzx: hiwikisource: logo fix [mediawiki-config] - https://gerrit.wikimedia.org/r/1111109 (https://phabricator.wikimedia.org/T310961)
[07:25:08] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, January 14 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [mediawiki-config] - https://gerrit.wikimedia.org/r/1111109 (https://phabricator.wikimedia.org/T310961) (owner: Anzx)
[07:25:29] <wikibugs>	 (CR) Jelto: [C:+2] Rename mw241[2-5] to wikikube-worker22[12-15] [puppet] - https://gerrit.wikimedia.org/r/1110822 (https://phabricator.wikimedia.org/T377877) (owner: Jelto)
[07:28:40] <wikibugs>	 (PS3) Anzx: knwiki, knwikisource, knwiktionary, knwikiquote: update logo, wordmark [mediawiki-config] - https://gerrit.wikimedia.org/r/1111106 (https://phabricator.wikimedia.org/T382802)
[07:29:28] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from mw2412 to wikikube-worker2212
[07:29:31] <wikibugs>	 (PS5) Anzx: hiwikisource: logo fix [mediawiki-config] - https://gerrit.wikimedia.org/r/1111109 (https://phabricator.wikimedia.org/T310961)
[07:29:49] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[07:30:21] <icinga-wm>	 PROBLEM - BGP status on cr2-codfw is CRITICAL: BGP CRITICAL - AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitorin
[07:30:21] <icinga-wm>	 status
[07:32:39] <icinga-wm>	 PROBLEM - BGP status on cr1-codfw is CRITICAL: BGP CRITICAL - AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitorin
[07:32:39] <icinga-wm>	 status
[07:33:10] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2412 to wikikube-worker2212 - jelto@cumin1002"
[07:33:53] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2412 to wikikube-worker2212 - jelto@cumin1002"
[07:33:53] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[07:33:54] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2212
[07:34:07] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2212
[07:34:28] <wikibugs>	 (PS4) Anzx: knwiki, knwikisource, knwiktionary, knwikiquote: update logo, wordmark [mediawiki-config] - https://gerrit.wikimedia.org/r/1111106 (https://phabricator.wikimedia.org/T382802)
[07:34:45] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2412 to wikikube-worker2212
[07:35:41] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from mw2413 to wikikube-worker2213
[07:35:48] <wikibugs>	 (PS6) Anzx: hiwikisource: logo fix [mediawiki-config] - https://gerrit.wikimedia.org/r/1111109 (https://phabricator.wikimedia.org/T310961)
[07:36:02] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[07:39:26] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2413 to wikikube-worker2213 - jelto@cumin1002"
[07:39:40] <jinxer-wm>	 FIRING: [2x] KubernetesRsyslogDown: rsyslog on mw2414:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[07:39:41] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2413 to wikikube-worker2213 - jelto@cumin1002"
[07:39:41] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[07:39:42] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2213
[07:39:53] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2213
[07:40:32] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2413 to wikikube-worker2213
[07:43:25] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from mw2414 to wikikube-worker2214
[07:43:38] <jinxer-wm>	 FIRING: CirrusSearchHighOldGCFrequency: Elasticsearch instance elastic2069-production-search-psi-codfw is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[07:43:46] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[07:47:17] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2414 to wikikube-worker2214 - jelto@cumin1002"
[07:47:43] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2414 to wikikube-worker2214 - jelto@cumin1002"
[07:47:43] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[07:47:43] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2214
[07:48:00] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2214
[07:48:39] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2414 to wikikube-worker2214
[07:49:08] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from mw2415 to wikikube-worker2215
[07:49:29] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[07:52:50] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2415 to wikikube-worker2215 - jelto@cumin1002"
[07:53:10] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2415 to wikikube-worker2215 - jelto@cumin1002"
[07:53:10] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[07:53:10] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2215
[07:53:23] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2215
[07:54:02] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2415 to wikikube-worker2215
[07:54:19] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2212.codfw.wmnet wikikube-worker2213.codfw.wmnet wikikube-worker2214.codfw.wmnet wikikube-worker2215.codfw.wmnet on all recursors
[07:54:22] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2212.codfw.wmnet wikikube-worker2213.codfw.wmnet wikikube-worker2214.codfw.wmnet wikikube-worker2215.codfw.wmnet on all recursors
[07:56:23] <wikibugs>	 (CR) Filippo Giunchedi: "LGTM, though I'll let Clement vote" [puppet] - https://gerrit.wikimedia.org/r/1110872 (https://phabricator.wikimedia.org/T370527) (owner: Andrea Denisse)
[07:57:41] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1023 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P72020 and previous config saved to /var/cache/conftool/dbconfig/20250114-075741-root.json
[08:00:05] <jouncebot>	 Amir1, Urbanecm, and awight: Time to do the UTC morning backport window deploy. Don't look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250114T0800).
[08:00:05] <jouncebot>	 ottomata, gmodena and anzx: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[08:00:16] <gmodena>	 o/
[08:00:29] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, DC-Ops: InterfaceSpeedError - https://phabricator.wikimedia.org/T382485#10456880 (Marostegui) Open→Resolved Looks good! thank you! ` root@es1043:~# mii-tool eno8303 eno8303: negotiated 1000baseT-FD flow-control, link ok `
[08:01:25] <anzx>	 o/
[08:02:46] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2212.codfw.wmnet with OS bookworm
[08:02:56] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2212
[08:05:18] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[08:07:04] <wikibugs>	 (PS2) Marostegui: orchestrator.conf.json.erb: Update whitelist [puppet] - https://gerrit.wikimedia.org/r/1110819
[08:07:04] <wikibugs>	 (PS1) Marostegui: es1044: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/1111160
[08:08:37] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2212 - jelto@cumin1002"
[08:08:47] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2212 - jelto@cumin1002"
[08:08:47] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[08:08:47] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2212.codfw.wmnet 59.32.192.10.in-addr.arpa 9.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[08:08:50] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2212.codfw.wmnet 59.32.192.10.in-addr.arpa 9.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[08:08:51] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2212
[08:09:30] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2212
[08:09:30] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2212
[08:10:23] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2213.codfw.wmnet with OS bookworm
[08:10:29] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: httpbb_kubernetes_mw-wikifunctions_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:10:34] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2213
[08:10:50] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[08:12:47] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1023 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P72021 and previous config saved to /var/cache/conftool/dbconfig/20250114-081246-root.json
[08:14:14] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2213 - jelto@cumin1002"
[08:14:19] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2213 - jelto@cumin1002"
[08:14:19] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[08:14:19] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2213.codfw.wmnet 60.32.192.10.in-addr.arpa 0.6.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[08:14:22] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2213.codfw.wmnet 60.32.192.10.in-addr.arpa 0.6.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[08:14:22] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2213
[08:14:34] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2213
[08:14:34] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2213
[08:15:08] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2214.codfw.wmnet with OS bookworm
[08:15:19] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2214
[08:15:45] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[08:17:02] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, January 14 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploy" [mediawiki-config] - https://gerrit.wikimedia.org/r/1105879 (https://phabricator.wikimedia.org/T374021) (owner: Stevemunene)
[08:17:14] <wikibugs>	 (CR) Filippo Giunchedi: [C:+2] prometheus: migrate ops instance to prometheus::instances (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1108746 (https://phabricator.wikimedia.org/T371087) (owner: Filippo Giunchedi)
[08:17:45] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, January 14 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploy" [mediawiki-config] - https://gerrit.wikimedia.org/r/1105878 (https://phabricator.wikimedia.org/T377956) (owner: Stevemunene)
[08:18:36] <wikibugs>	 (CR) Muehlenhoff: "That won't work :-) Couple of comments inline how to clean that out." [puppet] - https://gerrit.wikimedia.org/r/1110845 (https://phabricator.wikimedia.org/T238230) (owner: Ottomata)
[08:19:25] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2214 - jelto@cumin1002"
[08:19:29] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2214 - jelto@cumin1002"
[08:19:29] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[08:19:30] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2214.codfw.wmnet 61.32.192.10.in-addr.arpa 1.6.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[08:19:33] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2214.codfw.wmnet 61.32.192.10.in-addr.arpa 1.6.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[08:19:33] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2214
[08:19:44] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2214
[08:19:44] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2214
[08:21:48] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2215.codfw.wmnet with OS bookworm
[08:21:55] <moritzm>	 !log installing perl security updates
[08:21:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:21:59] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2215
[08:22:06] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[08:25:34] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2215 - jelto@cumin1002"
[08:25:39] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2215 - jelto@cumin1002"
[08:25:39] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[08:25:39] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2215.codfw.wmnet 62.32.192.10.in-addr.arpa 2.6.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[08:25:42] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2215.codfw.wmnet 62.32.192.10.in-addr.arpa 2.6.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[08:25:42] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2215
[08:25:43] <hashar>	 anzx: gmodena: did you get your patch deployed?
[08:25:57] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2215
[08:25:57] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2215
[08:26:05] <hashar>	 damn bot
[08:26:10] <hashar>	 that is so verbose
[08:26:23] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2212.codfw.wmnet with reason: host reimage
[08:26:25] <wikibugs>	 (03CR) 10Gmodena: [C:03+1] Revert^2 "config: remove eventbus instrumentation setting" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1110777 (owner: 10Ottomata)
[08:27:20] <hashar>	 gmodena: is there anything specific to do for your patch? I recognize that is a config change I reverted solely because it got merged while I was deploying the train. But besides that I don't know what it does
[08:27:32] <hashar>	 looks like that is just clean up?
[08:27:39] <gmodena>	 hashar not deployed yet. Normally I'd do it myself, but this is a patch you previously reverted (it landed during a deployment train) and I wanted to ask for an ack
[08:27:46] <gmodena>	 hashar correct, it's a cleanup
[08:27:47] <hashar>	 ah yeah
[08:27:49] <hashar>	 please do!
[08:27:52] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1023 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P72022 and previous config saved to /var/cache/conftool/dbconfig/20250114-082751-root.json
[08:28:06] <gmodena>	 hashar ack. I'll deploy
[08:28:17] <hashar>	 this way I can review the other two patches. Thank you!
[08:28:30] <gmodena>	 hashar np. thanks for checking in!
[08:28:53] <wikibugs>	 (03CR) 10Brouberol: [C:03+1] "Nicely done!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1110883 (https://phabricator.wikimedia.org/T383430) (owner: 10Btullis)
[08:29:38] <wikibugs>	 (03CR) 10Hashar: [C:03+1] "I will deploy it. Thank you for the patch." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111109 (https://phabricator.wikimedia.org/T310961) (owner: 10Anzx)
[08:29:58] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by gmodena@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1110777 (owner: 10Ottomata)
[08:30:43] <wikibugs>	 (03Merged) 10jenkins-bot: Revert^2 "config: remove eventbus instrumentation setting" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1110777 (owner: 10Ottomata)
[08:31:13] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2213.codfw.wmnet with reason: host reimage
[08:31:31] <logmsgbot>	 !log gmodena@deploy2002 Started scap sync-world: Backport for [[gerrit:1110777|Revert^2 "config: remove eventbus instrumentation setting"]]
[08:32:32] <wikibugs>	 (03CR) 10Hashar: [C:03+1] "I will deploy it. Thank you for the patch." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111106 (https://phabricator.wikimedia.org/T382802) (owner: 10Anzx)
[08:32:55] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2212.codfw.wmnet with reason: host reimage
[08:34:54] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2214.codfw.wmnet with reason: host reimage
[08:36:49] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2213.codfw.wmnet with reason: host reimage
[08:37:07] <wikibugs>	 (03PS9) 10Filippo Giunchedi: prometheus: k8s instances migration to prometheus::instances [puppet] - 10https://gerrit.wikimedia.org/r/1108772 (https://phabricator.wikimedia.org/T371087)
[08:37:07] <wikibugs>	 (03PS2) 10Filippo Giunchedi: prometheus: add initial lv size to prometheus::instances [puppet] - 10https://gerrit.wikimedia.org/r/1109680 (https://phabricator.wikimedia.org/T371087)
[08:38:53] <logmsgbot>	 !log gmodena@deploy2002 otto, gmodena: Backport for [[gerrit:1110777|Revert^2 "config: remove eventbus instrumentation setting"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[08:40:31] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2214.codfw.wmnet with reason: host reimage
[08:40:33] <wikibugs>	 (03CR) 10Filippo Giunchedi: [V:03+1] "PCC SUCCESS (CORE_DIFF 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4791/co" [puppet] - 10https://gerrit.wikimedia.org/r/1108772 (https://phabricator.wikimedia.org/T371087) (owner: 10Filippo Giunchedi)
[08:42:57] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1023 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P72023 and previous config saved to /var/cache/conftool/dbconfig/20250114-084256-root.json
[08:43:47] <logmsgbot>	 !log gmodena@deploy2002 otto, gmodena: Continuing with sync
[08:45:55] <icinga-wm>	 RECOVERY - Router interfaces on cr1-drmrs is OK: OK: host 185.15.58.128, interfaces up: 58, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[08:51:52] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2212.codfw.wmnet with OS bookworm
[08:53:24] <logmsgbot>	 !log gmodena@deploy2002 Finished scap sync-world: Backport for [[gerrit:1110777|Revert^2 "config: remove eventbus instrumentation setting"]] (duration: 21m 52s)
[08:55:02] <logmsgbot>	 !log jynus@cumin1002 START - Cookbook sre.hosts.downtime for 1:00:00 on dbprov2006.codfw.wmnet with reason: os upgrade
[08:55:17] <logmsgbot>	 !log jynus@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbprov2006.codfw.wmnet with reason: os upgrade
[08:55:37] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2213.codfw.wmnet with OS bookworm
[08:58:02] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1023 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P72024 and previous config saved to /var/cache/conftool/dbconfig/20250114-085802-root.json
[08:58:11] <wikibugs>	 (03PS1) 10Brouberol: airflow: replace the scheduler liveness check by a tcpSocket probe [deployment-charts] - 10https://gerrit.wikimedia.org/r/1111162 (https://phabricator.wikimedia.org/T383651)
[08:58:59] <logmsgbot>	 !log jelto@cumin1002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker2215.codfw.wmnet with OS bookworm
[08:59:22] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2215.codfw.wmnet with OS bookworm
[08:59:25] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2215
[08:59:25] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2215
[08:59:29] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2214.codfw.wmnet with OS bookworm
[08:59:50] <wikibugs>	 (03CR) 10Btullis: [C:03+1] airflow: replace the scheduler liveness check by a tcpSocket probe [deployment-charts] - 10https://gerrit.wikimedia.org/r/1111162 (https://phabricator.wikimedia.org/T383651) (owner: 10Brouberol)
[09:01:27] <wikibugs>	 (03CR) 10Brouberol: [C:03+2] airflow: replace the scheduler liveness check by a tcpSocket probe [deployment-charts] - 10https://gerrit.wikimedia.org/r/1111162 (https://phabricator.wikimedia.org/T383651) (owner: 10Brouberol)
[09:05:57] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
[09:06:07] <anzx>	 hashar: if you are going to deploy, I am here
[09:06:24] <wikibugs>	 (03CR) 10Tiziano Fogli: [C:03+1] prometheus: add initial lv size to prometheus::instances [puppet] - 10https://gerrit.wikimedia.org/r/1109680 (https://phabricator.wikimedia.org/T371087) (owner: 10Filippo Giunchedi)
[09:06:25] <hashar>	 yeah I will but the other one is still going on :/
[09:06:31] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
[09:06:33] <icinga-wm>	 RECOVERY - BGP status on cr1-drmrs is OK: BGP OK - up: 107, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[09:07:41] <anzx>	 ok, I thought the other one was finished
[09:08:23] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
[09:09:03] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
[09:09:12] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
[09:09:34] <wikibugs>	 (03PS1) 10Hashar: gerrit: restore IP addresses in ssh_known_hosts [puppet] - 10https://gerrit.wikimedia.org/r/1111163 (https://phabricator.wikimedia.org/T303828)
[09:09:40] <hashar>	 let me check
[09:09:48] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
[09:09:51] <hashar>	 gmodena: your deployment is still going on, isn't it?
[09:09:53] <wikibugs>	 (03CR) 10CI reject: [V:04-1] gerrit: restore IP addresses in ssh_known_hosts [puppet] - 10https://gerrit.wikimedia.org/r/1111163 (https://phabricator.wikimedia.org/T303828) (owner: 10Hashar)
[09:10:25] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply
[09:10:52] <wikibugs>	 (03PS2) 10Hashar: gerrit: restore IP addresses in ssh_known_hosts [puppet] - 10https://gerrit.wikimedia.org/r/1111163 (https://phabricator.wikimedia.org/T303828)
[09:10:57] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply
[09:11:11] <wikibugs>	 (03CR) 10Hashar: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1111163 (https://phabricator.wikimedia.org/T303828) (owner: 10Hashar)
[09:11:40] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
[09:12:03] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
[09:13:45] <wikibugs>	 (03PS2) 10Brouberol: airflow: Allow specific task pods to access the kube-api [deployment-charts] - 10https://gerrit.wikimedia.org/r/1110883 (https://phabricator.wikimedia.org/T383430) (owner: 10Btullis)
[09:15:51] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2215.codfw.wmnet with reason: host reimage
[09:19:39] <wikibugs>	 (03CR) 10Tiziano Fogli: [C:03+1] thanos-query: write active queries to file [puppet] - 10https://gerrit.wikimedia.org/r/1110798 (https://phabricator.wikimedia.org/T383570) (owner: 10Filippo Giunchedi)
[09:19:47] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2215.codfw.wmnet with reason: host reimage
[09:21:34] <wikibugs>	 (03PS1) 10Fabfur: Added new stream config for haproxy_requestctl [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111166 (https://phabricator.wikimedia.org/T383392)
[09:22:24] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Added new stream config for haproxy_requestctl [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111166 (https://phabricator.wikimedia.org/T383392) (owner: 10Fabfur)
[09:22:36] <icinga-wm>	 PROBLEM - BGP status on cr1-drmrs is CRITICAL: BGP CRITICAL - AS13030/IPv4: Connect - Init7, AS13030/IPv6: Active - Init7 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[09:24:21] <hashar>	 Finished scap sync-world: Backport for [[gerrit:1110777|Revert^2 "config: remove eventbus instrumentation setting"]] (duration: 21m 52s)
[09:24:22] <hashar>	 from the logs
[09:24:27] <hashar>	 at 8:53:24 UTC
[09:24:34] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] es1044: Enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/1111160 (owner: 10Marostegui)
[09:24:42] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] orchestrator.conf.json.erb: Update whitelist [puppet] - 10https://gerrit.wikimedia.org/r/1110819 (owner: 10Marostegui)
[09:24:44] <hashar>	 which of course I have missed in the above log spam
[09:25:08] <hashar>	 anzx: I am doing your patches
[09:25:12] <anzx>	 ok
[09:25:36] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by hashar@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111106 (https://phabricator.wikimedia.org/T382802) (owner: 10Anzx)
[09:25:36] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by hashar@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111109 (https://phabricator.wikimedia.org/T310961) (owner: 10Anzx)
[09:26:20] <wikibugs>	 (03Merged) 10jenkins-bot: knwiki, knwikisource, knwiktionary, knwikiquote: update logo, wordmark [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111106 (https://phabricator.wikimedia.org/T382802) (owner: 10Anzx)
[09:26:22] <wikibugs>	 (03Merged) 10jenkins-bot: hiwikisource: logo fix [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111109 (https://phabricator.wikimedia.org/T310961) (owner: 10Anzx)
[09:26:49] <logmsgbot>	 !log hashar@deploy2002 Started scap sync-world: Backport for [[gerrit:1111106|knwiki, knwikisource, knwiktionary, knwikiquote: update logo, wordmark (T382802)]], [[gerrit:1111109|hiwikisource: logo fix (T310961)]]
[09:26:55] <stashbot>	 T382802: Update wordmark and logo width for knwiki , knwikisource , knwikiquote , knwiktionary - https://phabricator.wikimedia.org/T382802
[09:26:55] <stashbot>	 T310961: Site logo cropped/not fully displayed on some projects - https://phabricator.wikimedia.org/T310961
[09:27:08] <wikibugs>	 (03PS2) 10Fabfur: Added new stream config for haproxy_requestctl [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111166 (https://phabricator.wikimedia.org/T383392)
[09:29:14] <icinga-wm>	 PROBLEM - Router interfaces on cr1-drmrs is CRITICAL: CRITICAL: host 185.15.58.128, interfaces up: 57, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[09:29:27] <wikibugs>	 (03PS1) 10Marostegui: instances.yaml: Add es1044 [puppet] - 10https://gerrit.wikimedia.org/r/1111168 (https://phabricator.wikimedia.org/T382569)
[09:29:56] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] instances.yaml: Add es1044 [puppet] - 10https://gerrit.wikimedia.org/r/1111168 (https://phabricator.wikimedia.org/T382569) (owner: 10Marostegui)
[09:31:32] <logmsgbot>	 !log hashar@deploy2002 anzx, hashar: Backport for [[gerrit:1111106|knwiki, knwikisource, knwiktionary, knwikiquote: update logo, wordmark (T382802)]], [[gerrit:1111109|hiwikisource: logo fix (T310961)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[09:31:35] <anzx>	 hashar: checking
[09:31:36] <hashar>	 anzx: changes are on the test servers if you wanna test them
[09:31:39] <hashar>	 \o/
[09:31:48] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Add es1044 to dbctl depooled T382569', diff saved to https://phabricator.wikimedia.org/P72025 and previous config saved to /var/cache/conftool/dbconfig/20250114-093147-marostegui.json
[09:31:51] <stashbot>	 T382569: Productionize es104[1-6] - https://phabricator.wikimedia.org/T382569
[09:32:17] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1044 (re)pooling @ 1%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P72026 and previous config saved to /var/cache/conftool/dbconfig/20250114-093216-root.json
[09:33:16] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool es1022 T382569', diff saved to https://phabricator.wikimedia.org/P72027 and previous config saved to /var/cache/conftool/dbconfig/20250114-093315-marostegui.json
[09:33:39] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on es[1022,1043].eqiad.wmnet with reason: cloning
[09:33:54] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on es[1022,1043].eqiad.wmnet with reason: cloning
[09:34:00] <anzx>	 hashar: looks good
[09:35:31] <logmsgbot>	 !log hashar@deploy2002 anzx, hashar: Continuing with sync
[09:36:14] <icinga-wm>	 RECOVERY - Router interfaces on cr1-drmrs is OK: OK: host 185.15.58.128, interfaces up: 58, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[09:36:17] <wikibugs>	 (03PS1) 10Marostegui: es1044: Remove note [puppet] - 10https://gerrit.wikimedia.org/r/1111172
[09:36:36] <icinga-wm>	 RECOVERY - BGP status on cr1-drmrs is OK: BGP OK - up: 106, down: 1, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[09:36:39] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] es1044: Remove note [puppet] - 10https://gerrit.wikimedia.org/r/1111172 (owner: 10Marostegui)
[09:38:04] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2215.codfw.wmnet with OS bookworm
[09:38:24] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] Switch Presto access to nftables-compatible firewall settings [puppet] - 10https://gerrit.wikimedia.org/r/1109411 (owner: 10Muehlenhoff)
[09:40:40] <wikibugs>	 (03CR) 10Volans: [C:03+2] enum: remove type hints [software/spicerack] - 10https://gerrit.wikimedia.org/r/1110778 (owner: 10Volans)
[09:42:30] <wikibugs>	 (03CR) 10Marostegui: [C:03+1] P:conftool: allow the parsercache section flavor [puppet] - 10https://gerrit.wikimedia.org/r/1110880 (https://phabricator.wikimedia.org/T383324) (owner: 10Scott French)
[09:43:01] <wikibugs>	 (03CR) 10Jelto: "looks mostly good. But I don't understand the difference between `kubectl$(K8S_VERSION)' and `kubectl-$(K8S_VERSION)'. Why do we need both" [debs/kubernetes] (v1.23) - 10https://gerrit.wikimedia.org/r/1109458 (https://phabricator.wikimedia.org/T341984) (owner: 10JMeybohm)
[09:43:10] <logmsgbot>	 !log hashar@deploy2002 Finished scap sync-world: Backport for [[gerrit:1111106|knwiki, knwikisource, knwiktionary, knwikiquote: update logo, wordmark (T382802)]], [[gerrit:1111109|hiwikisource: logo fix (T310961)]] (duration: 16m 21s)
[09:43:14] <stashbot>	 T382802: Update wordmark and logo width for knwiki , knwikisource , knwikiquote , knwiktionary - https://phabricator.wikimedia.org/T382802
[09:43:15] <stashbot>	 T310961: Site logo cropped/not fully displayed on some projects - https://phabricator.wikimedia.org/T310961
[09:43:17] <wikibugs>	 (03PS3) 10Hashar: gerrit: restore IP addresses in ssh_known_hosts [puppet] - 10https://gerrit.wikimedia.org/r/1111163 (https://phabricator.wikimedia.org/T303828)
[09:43:22] <logmsgbot>	 !log vgutierrez@cumin1002 START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_codfw and A:cp
[09:43:23] <anzx>	 hashar: could you run https://www.irccloud.com/pastebin/e9jWrN4j/
[09:43:51] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool pc4 T383398', diff saved to https://phabricator.wikimedia.org/P72028 and previous config saved to /var/cache/conftool/dbconfig/20250114-094350-marostegui.json
[09:43:54] <stashbot>	 T383398: Reorganize and clean existing pc1-pc5 sections - https://phabricator.wikimedia.org/T383398
[09:43:55] <jelto>	 !log homer 'lsw1-c3-codfw*' commit 'T377877'
[09:43:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:43:59] <stashbot>	 T377877: Migrate wikikube-codfw to containerd - https://phabricator.wikimedia.org/T377877
[09:44:18] <wikibugs>	 (03CR) 10Hashar: "PCC failed with:" [puppet] - 10https://gerrit.wikimedia.org/r/1111163 (https://phabricator.wikimedia.org/T303828) (owner: 10Hashar)
[09:44:24] <wikibugs>	 06SRE, 10Ceph, 06Infrastructure-Foundations, 10netops, 13Patch-For-Review: Configure DSCP marking for cloudceph* hosts - https://phabricator.wikimedia.org/T371501#10457073 (10cmooney) >>! In T371501#10453986, @dcaro wrote: > We still have to restart all the osd daemon processes to pick up the config chan...
[09:44:29] <wikibugs>	 (03CR) 10Hashar: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1111163 (https://phabricator.wikimedia.org/T303828) (owner: 10Hashar)
[09:44:40] <logmsgbot>	 !log root@cumin1002 START - Cookbook sre.puppet.renew-cert for dbprov2006.codfw.wmnet: Renew puppet certificate - root@cumin1002
[09:44:44] <hashar>	 anzx: yes I will
[09:44:45] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 5:00:00 on pc[2014-2016].codfw.wmnet,pc1016.eqiad.wmnet with reason: reorganizing pc4
[09:44:50] <wikibugs>	 (03PS2) 10Muehlenhoff: Presto: Remove ferm support [puppet] - 10https://gerrit.wikimedia.org/r/1109412
[09:44:57] <hashar>	 I thought the logo got purged automagically
[09:45:01] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on pc[2014-2016].codfw.wmnet,pc1016.eqiad.wmnet with reason: reorganizing pc4
[09:45:12] <wikibugs>	 (03CR) 10Gmodena: [C:03+1] "LGTM." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111166 (https://phabricator.wikimedia.org/T383392) (owner: 10Fabfur)
[09:45:47] <hashar>	 anzx: done
[09:46:53] <anzx>	 hashar: thank you 
[09:47:10] <hashar>	 \o/
[09:47:10] <jelto>	 !log homer 'cr*codfw*' commit 'T377877'
[09:47:12] <logmsgbot>	 !log root@cumin1002 END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for dbprov2006.codfw.wmnet: Renew puppet certificate - root@cumin1002
[09:47:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:47:22] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1044 (re)pooling @ 2%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P72030 and previous config saved to /var/cache/conftool/dbconfig/20250114-094722-root.json
[09:47:54] <icinga-wm>	 RECOVERY - BGP status on cr1-codfw is OK: BGP OK - up: 120, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[09:48:32] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1109412 (owner: 10Muehlenhoff)
[09:48:38] <wikibugs>	 (03PS3) 10Volans: api: allow to abort before run() [software/spicerack] - 10https://gerrit.wikimedia.org/r/1105351 (https://phabricator.wikimedia.org/T365454)
[09:48:43] <wikibugs>	 (03PS4) 10Hashar: gerrit: restore IP addresses in ssh_known_hosts [puppet] - 10https://gerrit.wikimedia.org/r/1111163 (https://phabricator.wikimedia.org/T303828)
[09:48:51] <wikibugs>	 (03CR) 10Hashar: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1111163 (https://phabricator.wikimedia.org/T303828) (owner: 10Hashar)
[09:50:45] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2212-2215].codfw.wmnet
[09:50:46] <wikibugs>	 (03PS1) 10Marostegui: mariadb: Reorganize pc4 [puppet] - 10https://gerrit.wikimedia.org/r/1111174 (https://phabricator.wikimedia.org/T383398)
[09:50:48] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2212-2215].codfw.wmnet
[09:51:53] <wikibugs>	 (03Merged) 10jenkins-bot: enum: remove type hints [software/spicerack] - 10https://gerrit.wikimedia.org/r/1110778 (owner: 10Volans)
[09:52:18] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 10Prod-Kubernetes, and 2 others: Relabel codfw kubernetes nodes - https://phabricator.wikimedia.org/T383595#10457092 (10Jelto)
[09:53:10] <wikibugs>	 (03CR) 10Btullis: [C:03+1] "Super, thanks." [puppet] - 10https://gerrit.wikimedia.org/r/1109412 (owner: 10Muehlenhoff)
[09:53:21] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Promote pc2014 to codfw pc4 master dbmaint T383398', diff saved to https://phabricator.wikimedia.org/P72031 and previous config saved to /var/cache/conftool/dbconfig/20250114-095320-marostegui.json
[09:53:25] <stashbot>	 T383398: Reorganize and clean existing pc1-pc5 sections - https://phabricator.wikimedia.org/T383398
[09:53:28] <wikibugs>	 (03CR) 10Btullis: [C:03+2] airflow: Allow specific task pods to access the kube-api [deployment-charts] - 10https://gerrit.wikimedia.org/r/1110883 (https://phabricator.wikimedia.org/T383430) (owner: 10Btullis)
[09:54:04] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repool pc4 T383398', diff saved to https://phabricator.wikimedia.org/P72032 and previous config saved to /var/cache/conftool/dbconfig/20250114-095404-marostegui.json
[09:54:17] <wikibugs>	 (03CR) 10Jelto: Support multiple kubernetes-client versions (031 comment) [debs/kubernetes] (v1.23) - 10https://gerrit.wikimedia.org/r/1109458 (https://phabricator.wikimedia.org/T341984) (owner: 10JMeybohm)
[09:54:58] <wikibugs>	 (03Merged) 10jenkins-bot: airflow: Allow specific task pods to access the kube-api [deployment-charts] - 10https://gerrit.wikimedia.org/r/1110883 (https://phabricator.wikimedia.org/T383430) (owner: 10Btullis)
[09:56:22] <wikibugs>	 (03CR) 10Jelto: [C:03+1] "lgtm" [debs/kubernetes] (v1.31) - 10https://gerrit.wikimedia.org/r/1109672 (https://phabricator.wikimedia.org/T341984) (owner: 10JMeybohm)
[09:58:26] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "Looks good" [debs/kubernetes] (v1.23) - 10https://gerrit.wikimedia.org/r/1109458 (https://phabricator.wikimedia.org/T341984) (owner: 10JMeybohm)
[09:58:54] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] mariadb: Reorganize pc4 [puppet] - 10https://gerrit.wikimedia.org/r/1111174 (https://phabricator.wikimedia.org/T383398) (owner: 10Marostegui)
[09:59:57] <logmsgbot>	 !log btullis@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
[09:59:58] <wikibugs>	 (03CR) 10CI reject: [V:04-1] api: allow to abort before run() [software/spicerack] - 10https://gerrit.wikimedia.org/r/1105351 (https://phabricator.wikimedia.org/T365454) (owner: 10Volans)
[10:00:15] <logmsgbot>	 !log btullis@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
[10:01:47] <wikibugs>	 (03PS1) 10Marostegui: mariadb: Reorganize pc5 [puppet] - 10https://gerrit.wikimedia.org/r/1111179 (https://phabricator.wikimedia.org/T383398)
[10:01:48] <wikibugs>	 (03PS4) 10Volans: api: allow to abort before run() [software/spicerack] - 10https://gerrit.wikimedia.org/r/1105351 (https://phabricator.wikimedia.org/T365454)
[10:02:28] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1044 (re)pooling @ 3%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P72033 and previous config saved to /var/cache/conftool/dbconfig/20250114-100227-root.json
[10:02:37] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] Presto: Remove ferm support [puppet] - 10https://gerrit.wikimedia.org/r/1109412 (owner: 10Muehlenhoff)
[10:02:53] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] mariadb: Reorganize pc5 [puppet] - 10https://gerrit.wikimedia.org/r/1111179 (https://phabricator.wikimedia.org/T383398) (owner: 10Marostegui)
[10:03:02] <logmsgbot>	 !log vgutierrez@cumin1002 END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_codfw and A:cp
[10:03:09] <marostegui>	 moritzm: ok to merge?
[10:03:23] <moritzm>	 yes, please
[10:03:30] <marostegui>	 moritzm: merging
[10:03:33] <moritzm>	 thx
[10:03:37] <marostegui>	 :*
[10:05:22] <logmsgbot>	 !log vgutierrez@cumin1002 START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_codfw and A:cp
[10:11:05] <wikibugs>	 (03PS1) 10Muehlenhoff: Switch an-test-presto1001 to nftables [puppet] - 10https://gerrit.wikimedia.org/r/1111180
[10:12:28] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1111180 (owner: 10Muehlenhoff)
[10:14:48] <wikibugs>	 (03CR) 10Jelto: [C:03+1] "lgtm, I tested the checksum part locally and it works as expected" [debs/calico] (v3.29) - 10https://gerrit.wikimedia.org/r/1109671 (https://phabricator.wikimedia.org/T341984) (owner: 10JMeybohm)
[10:15:50] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - ASunknown/IPv6: Active https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[10:16:49] <wikibugs>	 (03CR) 10Hashar: [C:03+1] "The PCC shows contint being updated which is updating `/var/lib/zuul/.ssh/known_hosts`" [puppet] - 10https://gerrit.wikimedia.org/r/1111163 (https://phabricator.wikimedia.org/T303828) (owner: 10Hashar)
[10:16:56] <wikibugs>	 (03PS1) 10Jcrespo: dbbackups: Review and update grants for dump user on codfw [puppet] - 10https://gerrit.wikimedia.org/r/1111182
[10:17:01] <wikibugs>	 (03PS1) 10Marostegui: site.pp: Reorganize pc sections [puppet] - 10https://gerrit.wikimedia.org/r/1111183 (https://phabricator.wikimedia.org/T383398)
[10:17:33] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1044 (re)pooling @ 4%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P72034 and previous config saved to /var/cache/conftool/dbconfig/20250114-101732-root.json
[10:18:04] <wikibugs>	 (CR) Marostegui: [C:+2] site.pp: Reorganize pc sections [puppet] - https://gerrit.wikimedia.org/r/1111183 (https://phabricator.wikimedia.org/T383398) (owner: Marostegui)
[10:19:03] <wikibugs>	 (CR) CI reject: [V:-1] dbbackups: Review and update grants for dump user on codfw [puppet] - https://gerrit.wikimedia.org/r/1111182 (owner: Jcrespo)
[10:20:24] <wikibugs>	 (CR) JMeybohm: Support multiple kubernetes-client versions (1 comment) [debs/kubernetes] (v1.23) - https://gerrit.wikimedia.org/r/1109458 (https://phabricator.wikimedia.org/T341984) (owner: JMeybohm)
[10:20:30] <wikibugs>	 (PS2) Jcrespo: dbbackups: Review and update grants for m1 dump user on codfw [puppet] - https://gerrit.wikimedia.org/r/1111182 (https://phabricator.wikimedia.org/T373579)
[10:21:52] <wikibugs>	 (CR) Clément Goubert: [C:+1] profile::mediawiki::common: Remove obsolete DSH group check [puppet] - https://gerrit.wikimedia.org/r/1110872 (https://phabricator.wikimedia.org/T370527) (owner: Andrea Denisse)
[10:22:50] <wikibugs>	 (PS1) Marostegui: valid_sections.pp: Add pc6 and pc7 [puppet] - https://gerrit.wikimedia.org/r/1111184 (https://phabricator.wikimedia.org/T383234)
[10:23:09] <wikibugs>	 (PS3) Jcrespo: dbbackups: Review and update grants for m1 dump user on codfw [puppet] - https://gerrit.wikimedia.org/r/1111182 (https://phabricator.wikimedia.org/T373579)
[10:26:54] <wikibugs>	 (PS1) Marostegui: dbproxy2006: Change m2 master [puppet] - https://gerrit.wikimedia.org/r/1111185 (https://phabricator.wikimedia.org/T373579)
[10:27:38] <logmsgbot>	 !log vgutierrez@cumin1002 END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_codfw and A:cp
[10:27:43] <wikibugs>	 (CR) Marostegui: "root@cumin1002:~# host 10.192.28.6" [puppet] - https://gerrit.wikimedia.org/r/1111185 (https://phabricator.wikimedia.org/T373579) (owner: Marostegui)
[10:27:59] <wikibugs>	 (PS1) Jelto: Rename mw237[3-6] to wikikube-worker22[16-19] [puppet] - https://gerrit.wikimedia.org/r/1111187 (https://phabricator.wikimedia.org/T377877)
[10:30:30] <wikibugs>	 (CR) Jcrespo: [C:+1] dbproxy2006: Change m2 master [puppet] - https://gerrit.wikimedia.org/r/1111185 (https://phabricator.wikimedia.org/T373579) (owner: Marostegui)
[10:30:42] <wikibugs>	 (CR) Marostegui: [C:+2] dbproxy2006: Change m2 master [puppet] - https://gerrit.wikimedia.org/r/1111185 (https://phabricator.wikimedia.org/T373579) (owner: Marostegui)
[10:32:38] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1044 (re)pooling @ 5%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P72035 and previous config saved to /var/cache/conftool/dbconfig/20250114-103238-root.json
[10:33:37] <wikibugs>	 (CR) Cathal Mooney: "One comment in-line on the separate NAT source.  But happy for this to proceed overall." [puppet] - https://gerrit.wikimedia.org/r/1105036 (https://phabricator.wikimedia.org/T383261) (owner: FNegri)
[10:34:15] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 5:00:00 on db2235.codfw.wmnet with reason: upgrade
[10:34:29] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2235.codfw.wmnet with reason: upgrade
[10:35:23] <marostegui>	 !log Reboot db2235 m5 codfw master
[10:35:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:37:10] <wikibugs>	 (CR) Cathal Mooney: [C:+2] Validators: Allow an interface to be called just "irb" on a device [software/netbox-extras] - https://gerrit.wikimedia.org/r/1105346 (https://phabricator.wikimedia.org/T371088) (owner: Cathal Mooney)
[10:37:30] <logmsgbot>	 !log vgutierrez@cumin1002 START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqsin and A:cp
[10:37:58] <wikibugs>	 (PS1) Marostegui: wmnet: Update pc4-master CNAME [dns] - https://gerrit.wikimedia.org/r/1111189 (https://phabricator.wikimedia.org/T383398)
[10:38:50] <icinga-wm>	 PROBLEM - MariaDB Replica IO: m5 on db2160 is CRITICAL: CRITICAL slave_io_state Slave_IO_Running: No, Errno: 2003, Errmsg: error reconnecting to master repl2024@db2235.codfw.wmnet:3306 - retry-time: 60 maximum-retries: 100000 message: Cant connect to server on db2235.codfw.wmnet (111 Connection refused) https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[10:39:29] <wikibugs>	 (Merged) jenkins-bot: Validators: Allow an interface to be called just "irb" on a device [software/netbox-extras] - https://gerrit.wikimedia.org/r/1105346 (https://phabricator.wikimedia.org/T371088) (owner: Cathal Mooney)
[10:39:30] <marostegui>	 ^ expected (I downtimed it)
[10:39:50] <marostegui>	 Ah, I made a typo for the hostname and that's why it alerted
[10:39:54] <marostegui>	 anyway, will recover soon
[10:41:21] <wikibugs>	 (CR) Marostegui: [C:+2] wmnet: Update pc4-master CNAME [dns] - https://gerrit.wikimedia.org/r/1111189 (https://phabricator.wikimedia.org/T383398) (owner: Marostegui)
[10:41:26] <logmsgbot>	 !log marostegui@dns1006 START - running authdns-update
[10:41:48] <icinga-wm>	 RECOVERY - MariaDB Replica IO: m5 on db2160 is OK: OK slave_io_state Slave_IO_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[10:43:06] <logmsgbot>	 !log marostegui@dns1006 END - running authdns-update
[10:43:39] <logmsgbot>	 !log marostegui@dns1006 START - running authdns-update
[10:45:23] <logmsgbot>	 !log marostegui@dns1006 END - running authdns-update
[10:46:47] <wikibugs>	 (CR) JMeybohm: "Not sure what you're referring to but let me try to explain my idea:" [debs/kubernetes] (v1.23) - https://gerrit.wikimedia.org/r/1109458 (https://phabricator.wikimedia.org/T341984) (owner: JMeybohm)
[10:47:29] <wikibugs>	 (PS2) JMeybohm: Support multiple kubernetes-client versions [debs/kubernetes] (v1.23) - https://gerrit.wikimedia.org/r/1109458 (https://phabricator.wikimedia.org/T341984)
[10:47:30] <wikibugs>	 (Abandoned) Vgutierrez: liberica: liberica got renamed to libericad [puppet] - https://gerrit.wikimedia.org/r/1099216 (owner: Vgutierrez)
[10:47:44] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1044 (re)pooling @ 10%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P72036 and previous config saved to /var/cache/conftool/dbconfig/20250114-104743-root.json
[10:51:27] <wikibugs>	 (CR) Vgutierrez: [C:-1] "you got some syntax error per https://integration.wikimedia.org/ci/job/alerts-pipeline-test/2216/console" [alerts] - https://gerrit.wikimedia.org/r/1110843 (owner: CDobbins)
[10:52:50] <wikibugs>	 sre-alert-triage, Infrastructure-Foundations: Alert in need of triage: BGP status (instance cr2-eqord) - https://phabricator.wikimedia.org/T383302#10457281 (cmooney) a:cmooney I fired off a mail to Planters Telecom Collective asking if they still needed the sessions.  I'll remove if they don't come b...
[10:57:57] <wikibugs>	 (PS1) Fabfur: hiera: add haproxykafka to codfw [puppet] - https://gerrit.wikimedia.org/r/1111193 (https://phabricator.wikimedia.org/T378578)
[11:00:04] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250114T1100)
[11:02:49] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1044 (re)pooling @ 25%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P72037 and previous config saved to /var/cache/conftool/dbconfig/20250114-110248-root.json
[11:05:48] <logmsgbot>	 !log vgutierrez@cumin1002 END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqsin and A:cp
[11:06:54] <wikibugs>	 (PS3) JMeybohm: Update to kubernetes v1.31.4 [debs/kubernetes] (v1.31) - https://gerrit.wikimedia.org/r/1109672 (https://phabricator.wikimedia.org/T341984)
[11:09:22] <wikibugs>	 SRE, Traffic, Patch-For-Review: varnishmtail metric loss due to mtail not reading from pipe fast enough - https://phabricator.wikimedia.org/T293879#10457334 (fgiunchedi) I'm untagging o11y here since things seem stable and there's no action ATM, please reach out if things change!
[11:10:26] <wikibugs>	 (PS1) Michael Große: fix(tracking): TimingMetric:observe records milliseconds [extensions/GrowthExperiments] (wmf/1.44.0-wmf.12) - https://gerrit.wikimedia.org/r/1111196 (https://phabricator.wikimedia.org/T383208)
[11:10:52] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, January 14 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploy" [extensions/GrowthExperiments] (wmf/1.44.0-wmf.12) - https://gerrit.wikimedia.org/r/1111196 (https://phabricator.wikimedia.org/T383208) (owner: Michael Große)
[11:13:25] <logmsgbot>	 !log jynus@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on dbprov1003.eqiad.wmnet with reason: os upgrade
[11:13:39] <logmsgbot>	 !log jynus@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbprov1003.eqiad.wmnet with reason: os upgrade
[11:14:20] <logmsgbot>	 !log jynus@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on db2239.codfw.wmnet with reason: reboot
[11:14:46] <logmsgbot>	 !log jynus@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2239.codfw.wmnet with reason: reboot
[11:14:56] <wikibugs>	 (PS1) Marostegui: instances.yaml: Remove es1020 from dbctl [puppet] - https://gerrit.wikimedia.org/r/1111198 (https://phabricator.wikimedia.org/T383578)
[11:15:37] <wikibugs>	 (CR) Marostegui: [C:+2] instances.yaml: Remove es1020 from dbctl [puppet] - https://gerrit.wikimedia.org/r/1111198 (https://phabricator.wikimedia.org/T383578) (owner: Marostegui)
[11:16:48] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Remove es1020 from dbctl for decommission T383578', diff saved to https://phabricator.wikimedia.org/P72038 and previous config saved to /var/cache/conftool/dbconfig/20250114-111647-marostegui.json
[11:16:52] <stashbot>	 T383578: decommission es1020.eqiad.wmnet - https://phabricator.wikimedia.org/T383578
[11:17:54] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1044 (re)pooling @ 50%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P72039 and previous config saved to /var/cache/conftool/dbconfig/20250114-111754-root.json
[11:25:44] <wikibugs>	 (PS1) Marostegui: mariadb: Remove es1020 [puppet] - https://gerrit.wikimedia.org/r/1111199 (https://phabricator.wikimedia.org/T383578)
[11:27:12] <wikibugs>	 (PS1) Muehlenhoff: presto::server: Specify ports as integers, not strings [puppet] - https://gerrit.wikimedia.org/r/1111200
[11:27:27] <wikibugs>	 (PS2) Muehlenhoff: presto::server: Specify ports as integers, not strings [puppet] - https://gerrit.wikimedia.org/r/1111200
[11:28:35] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.decommission for hosts es1020.eqiad.wmnet
[11:30:14] <wikibugs>	 (CR) Marostegui: [C:+2] mariadb: Remove es1020 [puppet] - https://gerrit.wikimedia.org/r/1111199 (https://phabricator.wikimedia.org/T383578) (owner: Marostegui)
[11:33:00] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1044 (re)pooling @ 75%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P72040 and previous config saved to /var/cache/conftool/dbconfig/20250114-113259-root.json
[11:33:15] <wikibugs>	 (PS1) Vgutierrez: Add missing includes for private1-d8-codfw reverse zones [dns] - https://gerrit.wikimedia.org/r/1111201
[11:34:33] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.dns.netbox
[11:37:17] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1111200 (owner: Muehlenhoff)
[11:37:57] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: es1020.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1002"
[11:38:24] <wikibugs>	 (PS3) Muehlenhoff: presto::server: Specify ports as integers, not strings [puppet] - https://gerrit.wikimedia.org/r/1111200
[11:38:36] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: es1020.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1002"
[11:38:36] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[11:38:37] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts es1020.eqiad.wmnet
[11:39:16] <wikibugs>	 ops-eqiad, DBA, DC-Ops, decommission-hardware: decommission es1020.eqiad.wmnet - https://phabricator.wikimedia.org/T383578#10457409 (Marostegui) a:Marostegui→None
[11:39:37] <wikibugs>	 ops-eqiad, DBA, DC-Ops, decommission-hardware: decommission es1020.eqiad.wmnet - https://phabricator.wikimedia.org/T383578#10457415 (Marostegui) This is ready for #dc-ops
[11:40:47] <wikibugs>	 (CR) JMeybohm: [C:+1] Rename mw237[3-6] to wikikube-worker22[16-19] [puppet] - https://gerrit.wikimedia.org/r/1111187 (https://phabricator.wikimedia.org/T377877) (owner: Jelto)
[11:41:22] <wikibugs>	 (CR) Volans: [C:+1] "Include LGTM and the files are already generated." [dns] - https://gerrit.wikimedia.org/r/1111201 (owner: Vgutierrez)
[11:42:10] <wikibugs>	 (CR) Fabfur: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1111193 (https://phabricator.wikimedia.org/T378578) (owner: Fabfur)
[11:42:12] <wikibugs>	 (CR) JMeybohm: [V:+2 C:+2] Update to calico v3.29.1 [debs/calico] (v3.29) - https://gerrit.wikimedia.org/r/1109671 (https://phabricator.wikimedia.org/T341984) (owner: JMeybohm)
[11:42:29] <wikibugs>	 (CR) Mvolz: [C:-1] rest-gateway: add params to config, rework citoid path matching (2 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/973362 (https://phabricator.wikimedia.org/T329049) (owner: Hnowlan)
[11:43:38] <jinxer-wm>	 FIRING: CirrusSearchHighOldGCFrequency: Elasticsearch instance elastic2069-production-search-psi-codfw is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[11:45:07] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/1110873 (https://phabricator.wikimedia.org/T383271) (owner: JHathaway)
[11:46:10] <wikibugs>	 SRE, SRE-Access-Requests, cloud-services-team (FY2024/2025-Q3-Q4), Patch-For-Review: Add permissions for Komla to run WMCS cookbooks - https://phabricator.wikimedia.org/T379159#10457419 (joanna_borun) Approved
[11:48:05] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1044 (re)pooling @ 100%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P72041 and previous config saved to /var/cache/conftool/dbconfig/20250114-114804-root.json
[11:50:59] <wikibugs>	 (CR) Ladsgroup: [C:+1] "\o/" [puppet] - https://gerrit.wikimedia.org/r/1111184 (https://phabricator.wikimedia.org/T383234) (owner: Marostegui)
[11:51:54] <wikibugs>	 (CR) Vgutierrez: [C:+2] Add missing includes for private1-d8-codfw reverse zones [dns] - https://gerrit.wikimedia.org/r/1111201 (owner: Vgutierrez)
[11:52:51] <logmsgbot>	 !log vgutierrez@dns1004 START - running authdns-update
[11:53:25] <wikibugs>	 (CR) Marostegui: [C:+2] valid_sections.pp: Add pc6 and pc7 [puppet] - https://gerrit.wikimedia.org/r/1111184 (https://phabricator.wikimedia.org/T383234) (owner: Marostegui)
[11:54:00] <icinga-wm>	 PROBLEM - Host dbproxy1025 is DOWN: PING CRITICAL - Packet loss = 100%
[11:54:38] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1111200 (owner: Muehlenhoff)
[11:54:44] <logmsgbot>	 !log vgutierrez@dns1004 END - running authdns-update
[11:55:02] <icinga-wm>	 RECOVERY - Host dbproxy1025 is UP: PING OK - Packet loss = 0%, RTA = 0.32 ms
[11:55:44] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.mysql.pool db2212 gradually with 4 steps - Maint over
[11:57:58] <wikibugs>	 (PS4) Muehlenhoff: presto::server: Specify ports as integers, not strings [puppet] - https://gerrit.wikimedia.org/r/1111200
[11:58:19] <wikibugs>	 SRE, Infrastructure-Foundations: Improve how we generate DNS entries from Netbox - https://phabricator.wikimedia.org/T362985#10457455 (cmooney)
[11:58:57] <wikibugs>	 (CR) Volans: [C:+2] "Merging, last PS just fixed a typo in a docstring." [software/spicerack] - https://gerrit.wikimedia.org/r/1105351 (https://phabricator.wikimedia.org/T365454) (owner: Volans)
[11:59:46] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1111200 (owner: Muehlenhoff)
[12:01:17] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "Looks good" [debs/kubernetes] (v1.23) - https://gerrit.wikimedia.org/r/1109458 (https://phabricator.wikimedia.org/T341984) (owner: JMeybohm)
[12:01:45] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on pc[2015,2017].codfw.wmnet,pc[1014-1015,1017].eqiad.wmnet with reason: maintenance
[12:02:02] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc[2015,2017].codfw.wmnet,pc[1014-1015,1017].eqiad.wmnet with reason: maintenance
[12:02:35] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool pc5 eqiad codfw dbmaint T383398', diff saved to https://phabricator.wikimedia.org/P72043 and previous config saved to /var/cache/conftool/dbconfig/20250114-120234-marostegui.json
[12:02:38] <stashbot>	 T383398: Reorganize and clean existing pc1-pc5 sections - https://phabricator.wikimedia.org/T383398
[12:08:05] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repool pc5 T383398', diff saved to https://phabricator.wikimedia.org/P72044 and previous config saved to /var/cache/conftool/dbconfig/20250114-120804-marostegui.json
[12:08:09] <stashbot>	 T383398: Reorganize and clean existing pc1-pc5 sections - https://phabricator.wikimedia.org/T383398
[12:10:07] <wikibugs>	 (Merged) jenkins-bot: api: allow to abort before run() [software/spicerack] - https://gerrit.wikimedia.org/r/1105351 (https://phabricator.wikimedia.org/T365454) (owner: Volans)
[12:10:29] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: httpbb_kubernetes_mw-wikifunctions_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:11:53] <wikibugs>	 (PS1) Marostegui: pc1014: Move it to pc4 [puppet] - https://gerrit.wikimedia.org/r/1111205 (https://phabricator.wikimedia.org/T383398)
[12:12:58] <wikibugs>	 (PS3) Giuseppe Lavagetto: ClusterConfig: add support for dumps trait [mediawiki-config] - https://gerrit.wikimedia.org/r/1109108 (https://phabricator.wikimedia.org/T382947)
[12:12:58] <wikibugs>	 (PS3) Giuseppe Lavagetto: Use a bespoke database configuration for dumps [mediawiki-config] - https://gerrit.wikimedia.org/r/1109109 (https://phabricator.wikimedia.org/T382947)
[12:13:00] <wikibugs>	 (CR) Marostegui: [C:+2] pc1014: Move it to pc4 [puppet] - https://gerrit.wikimedia.org/r/1111205 (https://phabricator.wikimedia.org/T383398) (owner: Marostegui)
[12:16:06] <wikibugs>	 (PS3) Volans: api: allow to skip the START log to SAL [software/spicerack] - https://gerrit.wikimedia.org/r/1105666 (https://phabricator.wikimedia.org/T324655)
[12:16:20] <wikibugs>	 (PS1) Btullis: airflow: Use the existing labels for kubernetes and spark operators [deployment-charts] - https://gerrit.wikimedia.org/r/1111206 (https://phabricator.wikimedia.org/T383430)
[12:16:41] <wikibugs>	 (PS1) Marostegui: site.pp: Reorganize pc4 and pc5 [puppet] - https://gerrit.wikimedia.org/r/1111207 (https://phabricator.wikimedia.org/T383398)
[12:17:33] <wikibugs>	 (CR) Marostegui: [C:+2] site.pp: Reorganize pc4 and pc5 [puppet] - https://gerrit.wikimedia.org/r/1111207 (https://phabricator.wikimedia.org/T383398) (owner: Marostegui)
[12:18:01] <wikibugs>	 SRE, Scap, serviceops-radar: Introduce state to Scap - https://phabricator.wikimedia.org/T209881#10457507 (jijiki) Open→Invalid With #mw-on-k8s, this task is invalid.
[12:18:36] <wikibugs>	 (PS2) Btullis: airflow: Use the existing labels for kubernetes and spark operators [deployment-charts] - https://gerrit.wikimedia.org/r/1111206 (https://phabricator.wikimedia.org/T383430)
[12:18:47] <godog>	 jouncebot: now and next
[12:18:47] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 41 minute(s)
[12:19:13] <godog>	 jouncebot: bro :(
[12:20:42] <wikibugs>	 (CR) Filippo Giunchedi: [V:+1 C:+2] prometheus: k8s instances migration to prometheus::instances [puppet] - https://gerrit.wikimedia.org/r/1108772 (https://phabricator.wikimedia.org/T371087) (owner: Filippo Giunchedi)
[12:23:19] <wikibugs>	 SRE, Scap, serviceops-radar: Introduce state to Scap - https://phabricator.wikimedia.org/T209881#10457532 (jijiki)
[12:24:43] <wikibugs>	 SRE, serviceops: Canaries canaries canaries - https://phabricator.wikimedia.org/T210143#10457535 (jijiki) Open→Resolved a:jijiki I think this is resolved.
[12:26:40] <wikibugs>	 SRE, Scap, serviceops, Goal: SRE FY2019 Q3:TEC6: First steps towards Canary Deployments - https://phabricator.wikimedia.org/T213156#10457542 (jijiki) Open→Resolved a:jijiki Main goal was T282148.
[12:30:04] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Switch magru01 to managed /var/lib/ganeti/known_hosts [puppet] - https://gerrit.wikimedia.org/r/1109092 (https://phabricator.wikimedia.org/T309724) (owner: Muehlenhoff)
[12:32:47] <wikibugs>	 (CR) Filippo Giunchedi: [V:+1 C:+2] "Done" [puppet] - https://gerrit.wikimedia.org/r/1108772 (https://phabricator.wikimedia.org/T371087) (owner: Filippo Giunchedi)
[12:34:08] <wikibugs>	 (CR) Filippo Giunchedi: [C:+2] prometheus: add initial lv size to prometheus::instances [puppet] - https://gerrit.wikimedia.org/r/1109680 (https://phabricator.wikimedia.org/T371087) (owner: Filippo Giunchedi)
[12:34:15] <wikibugs>	 (PS3) Filippo Giunchedi: prometheus: add initial lv size to prometheus::instances [puppet] - https://gerrit.wikimedia.org/r/1109680 (https://phabricator.wikimedia.org/T371087)
[12:37:57] <wikibugs>	 (CR) Filippo Giunchedi: [V:+2 C:+2] prometheus: add initial lv size to prometheus::instances [puppet] - https://gerrit.wikimedia.org/r/1109680 (https://phabricator.wikimedia.org/T371087) (owner: Filippo Giunchedi)
[12:38:19] <wikibugs>	 (PS1) Marostegui: installserver: Do not format es1042 [puppet] - https://gerrit.wikimedia.org/r/1111210 (https://phabricator.wikimedia.org/T382569)
[12:41:07] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2212 gradually with 4 steps - Maint over
[12:41:45] <wikibugs>	 (CR) Marostegui: [C:+2] installserver: Do not format es1042 [puppet] - https://gerrit.wikimedia.org/r/1111210 (https://phabricator.wikimedia.org/T382569) (owner: Marostegui)
[12:41:45] <wikibugs>	 (CR) Jelto: [C:+1] "ack thanks for the clarification! Makes sense now." [debs/kubernetes] (v1.23) - https://gerrit.wikimedia.org/r/1109458 (https://phabricator.wikimedia.org/T341984) (owner: JMeybohm)
[12:42:51] <wikibugs>	 (CR) Ladsgroup: [C:+1] P:conftool: allow the parsercache section flavor [puppet] - https://gerrit.wikimedia.org/r/1110880 (https://phabricator.wikimedia.org/T383324) (owner: Scott French)
[12:44:06] <wikibugs>	 (PS1) Marostegui: mariadb: Remove db2128 [puppet] - https://gerrit.wikimedia.org/r/1111211 (https://phabricator.wikimedia.org/T383572)
[12:44:29] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.decommission for hosts db2128.codfw.wmnet
[12:44:45] <wikibugs>	 (CR) Marostegui: [C:+2] mariadb: Remove db2128 [puppet] - https://gerrit.wikimedia.org/r/1111211 (https://phabricator.wikimedia.org/T383572) (owner: Marostegui)
[12:49:04] <wikibugs>	 (PS1) Muehlenhoff: Fix Cumin alias [puppet] - https://gerrit.wikimedia.org/r/1111213
[12:49:07] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.dns.netbox
[12:50:50] <wikibugs>	 (CR) JMeybohm: [C:+2] Support multiple kubernetes-client versions [debs/kubernetes] (v1.23) - https://gerrit.wikimedia.org/r/1109458 (https://phabricator.wikimedia.org/T341984) (owner: JMeybohm)
[12:52:34] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2128.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1002"
[12:52:48] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2128.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1002"
[12:52:49] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[12:52:49] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2128.codfw.wmnet
[12:53:02] <wikibugs>	 ops-codfw, DBA, DC-Ops, decommission-hardware, Patch-For-Review: decommission db2128.codfw.wmnet - https://phabricator.wikimedia.org/T383572#10457620 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by marostegui@cumin1002 for hosts: `db2128.codfw.wmnet` - db2128.codfw.wmnet (**...
[12:53:04] <wikibugs>	 ops-codfw, DBA, DC-Ops, decommission-hardware, Patch-For-Review: decommission db2128.codfw.wmnet - https://phabricator.wikimedia.org/T383572#10457621 (Marostegui) a:Marostegui→None
[12:53:20] <wikibugs>	 ops-codfw, DBA, DC-Ops, decommission-hardware, Patch-For-Review: decommission db2128.codfw.wmnet - https://phabricator.wikimedia.org/T383572#10457626 (Marostegui) This is ready for #dc-ops
[12:54:47] <wikibugs>	 (CR) Jelto: "I left a comment in-line" [puppet] - https://gerrit.wikimedia.org/r/1110813 (https://phabricator.wikimedia.org/T341984) (owner: JMeybohm)
[12:55:43] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - ASunknown/IPv6: Connect https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[12:57:16] <wikibugs>	 (CR) Jelto: [C:+1] "lgtm" [puppet] - https://gerrit.wikimedia.org/r/1109704 (https://phabricator.wikimedia.org/T341984) (owner: JMeybohm)
[12:58:05] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs2013 is CRITICAL: PYBAL CRITICAL - CRITICAL - k8s-ingress-ml-staging_31443: Servers ml-staging2002.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[12:58:32] <wikibugs>	 (PS1) Awight: Switch to explicit numbering for Parsoid footnote markers [extensions/Cite] (wmf/1.44.0-wmf.12) - https://gerrit.wikimedia.org/r/1111215 (https://phabricator.wikimedia.org/T382310)
[12:59:05] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs2013 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[12:59:10] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.k8s.pool-depool-node depool for host mw[2373-2376].codfw.wmnet
[12:59:14] <wikibugs>	 (CR) Awight: [C:-2] "Cherry-pick to wmf/1.44.0-wmf.12 is scheduled for 20 January" [extensions/Cite] (wmf/1.44.0-wmf.12) - https://gerrit.wikimedia.org/r/1111215 (https://phabricator.wikimedia.org/T382310) (owner: Awight)
[13:00:04] <jouncebot>	 Deploy window Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250114T1300)
[13:01:06] <wikibugs>	 (CR) Ladsgroup: Add new file tables to WMCS views (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1110046 (https://phabricator.wikimedia.org/T383491) (owner: Ladsgroup)
[13:01:30] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw[2373-2376].codfw.wmnet
[13:03:51] <wikibugs>	 (PS3) NMW03: Add azwiki to mobile-anon-talk dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/1109694 (https://phabricator.wikimedia.org/T383394)
[13:04:06] <wikibugs>	 (CR) Jelto: [C:+2] Rename mw237[3-6] to wikikube-worker22[16-19] [puppet] - https://gerrit.wikimedia.org/r/1111187 (https://phabricator.wikimedia.org/T377877) (owner: Jelto)
[13:04:18] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, January 14 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploy" [mediawiki-config] - https://gerrit.wikimedia.org/r/1109694 (https://phabricator.wikimedia.org/T383394) (owner: NMW03)
[13:04:46] <Nemoralis>	 @jouncebot: next
[13:04:52] <Nemoralis>	 jouncebot: next
[13:04:52] <jouncebot>	 In 0 hour(s) and 55 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250114T1400)
[13:06:03] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from mw2373 to wikikube-worker2216
[13:06:24] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[13:06:51] <icinga-wm>	 PROBLEM - BGP status on cr2-codfw is CRITICAL: BGP CRITICAL - AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitorin
[13:06:51] <icinga-wm>	 status
[13:09:07] <icinga-wm>	 PROBLEM - BGP status on cr1-codfw is CRITICAL: BGP CRITICAL - AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitorin
[13:09:07] <icinga-wm>	 status
[13:09:46] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2373 to wikikube-worker2216 - jelto@cumin1002"
[13:10:05] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2373 to wikikube-worker2216 - jelto@cumin1002"
[13:10:05] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[13:10:06] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2216
[13:10:23] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2216
[13:11:02] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2373 to wikikube-worker2216
[13:11:18] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from mw2374 to wikikube-worker2217
[13:11:39] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[13:15:40] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2374 to wikikube-worker2217 - jelto@cumin1002"
[13:15:58] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2374 to wikikube-worker2217 - jelto@cumin1002"
[13:15:58] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[13:15:58] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2217
[13:16:12] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2217
[13:16:40] <jinxer-wm>	 FIRING: KubernetesRsyslogDown: rsyslog on mw2375:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw2375 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[13:16:51] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2374 to wikikube-worker2217
[13:17:31] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from mw2375 to wikikube-worker2218
[13:17:42] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[13:21:40] <jinxer-wm>	 FIRING: KubernetesRsyslogDown: rsyslog on mw2376:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw2376 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[13:22:03] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2375 to wikikube-worker2218 - jelto@cumin1002"
[13:22:25] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2375 to wikikube-worker2218 - jelto@cumin1002"
[13:22:25] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[13:22:26] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2218
[13:22:47] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2218
[13:23:26] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2375 to wikikube-worker2218
[13:24:27] <icinga-wm>	 PROBLEM - Check unit status of httpbb_kubernetes_mw-parsoid_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-parsoid_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[13:24:28] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.rename from mw2376 to wikikube-worker2219
[13:24:49] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[13:28:18] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2376 to wikikube-worker2219 - jelto@cumin1002"
[13:28:33] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2376 to wikikube-worker2219 - jelto@cumin1002"
[13:28:34] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[13:28:34] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2219
[13:28:47] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2219
[13:29:26] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2376 to wikikube-worker2219
[13:29:38] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2216.codfw.wmnet wikikube-worker2217.codfw.wmnet wikikube-worker2218.codfw.wmnet wikikube-worker2219.codfw.wmnet on all recursors
[13:29:42] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2216.codfw.wmnet wikikube-worker2217.codfw.wmnet wikikube-worker2218.codfw.wmnet wikikube-worker2219.codfw.wmnet on all recursors
[13:29:49] <wikibugs>	 (03CR) 10Btullis: [C:03+1] "Thanks." [puppet] - 10https://gerrit.wikimedia.org/r/1111200 (owner: 10Muehlenhoff)
[13:30:29] <jinxer-wm>	 FIRING: [5x] SystemdUnitFailed: httpbb_kubernetes_mw-wikifunctions_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:30:47] <wikibugs>	 (03CR) 10Btullis: [C:03+1] airflow-research: disable the airflow systemd services [puppet] - 10https://gerrit.wikimedia.org/r/1109714 (https://phabricator.wikimedia.org/T380620) (owner: 10Brouberol)
[13:31:08] <wikibugs>	 (03CR) 10Btullis: [C:03+1] "Didn't we already do this?" [puppet] - 10https://gerrit.wikimedia.org/r/1109714 (https://phabricator.wikimedia.org/T380620) (owner: 10Brouberol)
[13:32:43] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2216.codfw.wmnet with OS bookworm
[13:32:54] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2216
[13:34:03] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[13:34:35] <wikibugs>	 (03CR) 10Brouberol: "Turns out we did it for search, but not research" [puppet] - 10https://gerrit.wikimedia.org/r/1109714 (https://phabricator.wikimedia.org/T380620) (owner: 10Brouberol)
[13:34:42] <wikibugs>	 (03CR) 10Brouberol: [C:03+2] airflow-research: disable the airflow systemd services [puppet] - 10https://gerrit.wikimedia.org/r/1109714 (https://phabricator.wikimedia.org/T380620) (owner: 10Brouberol)
[13:35:55] <wikibugs>	 (03CR) 10Brouberol: [C:03+1] airflow: Use the existing labels for kubernetes and spark operators [deployment-charts] - 10https://gerrit.wikimedia.org/r/1111206 (https://phabricator.wikimedia.org/T383430) (owner: 10Btullis)
[13:37:26] <wikibugs>	 (03CR) 10Btullis: [C:03+2] airflow: Use the existing labels for kubernetes and spark operators [deployment-charts] - 10https://gerrit.wikimedia.org/r/1111206 (https://phabricator.wikimedia.org/T383430) (owner: 10Btullis)
[13:37:45] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2216 - jelto@cumin1002"
[13:37:50] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2216 - jelto@cumin1002"
[13:37:50] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[13:37:50] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2216.codfw.wmnet 145.48.192.10.in-addr.arpa 5.4.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[13:37:53] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2216.codfw.wmnet 145.48.192.10.in-addr.arpa 5.4.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[13:37:54] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2216
[13:38:13] <wikibugs>	 (03PS3) 10David Caro: ceph::conf: allow passing min_delay option [puppet] - 10https://gerrit.wikimedia.org/r/1109454 (https://phabricator.wikimedia.org/T371501)
[13:38:13] <wikibugs>	 (03PS1) 10David Caro: toolforge::prometheus: remove frontproxy-redis [puppet] - 10https://gerrit.wikimedia.org/r/1111221
[13:38:13] <wikibugs>	 (03PS1) 10David Caro: toolforge::proxy: remove absenting statement [puppet] - 10https://gerrit.wikimedia.org/r/1111222
[13:38:52] <wikibugs>	 (03PS2) 10David Caro: toolforge::prometheus: remove frontproxy-redis [puppet] - 10https://gerrit.wikimedia.org/r/1111221
[13:38:52] <wikibugs>	 (03PS2) 10David Caro: toolforge::proxy: remove absenting statement [puppet] - 10https://gerrit.wikimedia.org/r/1111222
[13:39:16] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2216
[13:39:16] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2216
[13:39:36] <wikibugs>	 (03Merged) 10jenkins-bot: airflow: Use the existing labels for kubernetes and spark operators [deployment-charts] - 10https://gerrit.wikimedia.org/r/1111206 (https://phabricator.wikimedia.org/T383430) (owner: 10Btullis)
[13:39:55] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2217.codfw.wmnet with OS bookworm
[13:40:06] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2217
[13:40:14] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[13:40:34] <wikibugs>	 (03PS3) 10David Caro: toolforge::proxy: remove absenting statement [puppet] - 10https://gerrit.wikimedia.org/r/1111222 (https://phabricator.wikimedia.org/T314664)
[13:40:57] <wikibugs>	 (03PS1) 10Brouberol: airflow-research: fix typo [puppet] - 10https://gerrit.wikimedia.org/r/1111223 (https://phabricator.wikimedia.org/T380620)
[13:40:59] <wikibugs>	 (03PS3) 10David Caro: toolforge::prometheus: remove frontproxy-redis [puppet] - 10https://gerrit.wikimedia.org/r/1111221 (https://phabricator.wikimedia.org/T314664)
[13:41:07] <wikibugs>	 (03PS4) 10David Caro: toolforge::proxy: remove absenting statement [puppet] - 10https://gerrit.wikimedia.org/r/1111222 (https://phabricator.wikimedia.org/T314664)
[13:41:39] <wikibugs>	 (03CR) 10Brouberol: [C:03+2] airflow-research: fix typo [puppet] - 10https://gerrit.wikimedia.org/r/1111223 (https://phabricator.wikimedia.org/T380620) (owner: 10Brouberol)
[13:42:49] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2218.codfw.wmnet with OS bookworm
[13:43:35] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2217 - jelto@cumin1002"
[13:43:39] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2217 - jelto@cumin1002"
[13:43:40] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[13:43:40] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2217.codfw.wmnet 146.48.192.10.in-addr.arpa 6.4.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[13:43:43] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2217.codfw.wmnet 146.48.192.10.in-addr.arpa 6.4.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[13:43:43] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2217
[13:43:55] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2217
[13:43:55] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2217
[13:44:09] <logmsgbot>	 !log btullis@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
[13:44:15] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2218
[13:44:23] <logmsgbot>	 !log btullis@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
[13:44:34] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[13:44:41] <jayme>	 !log imported kubernetes 1.23.14-5 to bullseye/bookworm-wikimedia - T341984
[13:44:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:44:44] <stashbot>	 T341984: Update Kubernetes clusters to >1.25 - https://phabricator.wikimedia.org/T341984
[13:46:16] <wikibugs>	 10ops-eqiad, 06Data-Persistence, 06DC-Ops, 10RESTBase: Q3:rack/setup/install restbase104[345] - https://phabricator.wikimedia.org/T383673 (10RobH) 03NEW
[13:46:37] <wikibugs>	 10ops-eqiad, 06Data-Persistence, 06DC-Ops, 10RESTBase: Q3:rack/setup/install restbase104[345] - https://phabricator.wikimedia.org/T383673#10457774 (10RobH)
[13:47:08] <wikibugs>	 10ops-eqiad, 06Data-Persistence, 06DC-Ops, 10RESTBase: Q3:rack/setup/install restbase104[345] - https://phabricator.wikimedia.org/T383673#10457777 (10RobH) a:03Eevans Please note the workflow for racking tasks has changed this fiscal year, and we now require the puppet updates from the sub-team receiving...
[13:47:50] <wikibugs>	 (03PS1) 10Kamila Součková: kubernetes: rename mw141[4-6,9] -> kubernetes-worker10[99-01] [puppet] - 10https://gerrit.wikimedia.org/r/1111225 (https://phabricator.wikimedia.org/T365571)
[13:47:58] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2218 - jelto@cumin1002"
[13:48:02] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2218 - jelto@cumin1002"
[13:48:02] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[13:48:02] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2218.codfw.wmnet 147.48.192.10.in-addr.arpa 7.4.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[13:48:05] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2218.codfw.wmnet 147.48.192.10.in-addr.arpa 7.4.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[13:48:06] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2218
[13:48:17] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2218
[13:48:17] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2218
[13:48:23] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2219.codfw.wmnet with OS bookworm
[13:48:33] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2219
[13:48:48] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.netbox
[13:50:39] <wikibugs>	 (03PS1) 10Daimona Eaytoy: test(2)wiki: Explicitly assign event organizer rights to all users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111227 (https://phabricator.wikimedia.org/T376822)
[13:50:44] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - ASunknown/IPv6: Active https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[13:50:48] <jayme>	 !log imported calico 3.29.1-1 to bookworm-wikimedia - T341984
[13:50:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:50:52] <stashbot>	 T341984: Update Kubernetes clusters to >1.25 - https://phabricator.wikimedia.org/T341984
[13:51:07] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, January 14 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploy" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111227 (https://phabricator.wikimedia.org/T376822) (owner: 10Daimona Eaytoy)
[13:51:19] <wikibugs>	 (03CR) 10CI reject: [V:04-1] test(2)wiki: Explicitly assign event organizer rights to all users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111227 (https://phabricator.wikimedia.org/T376822) (owner: 10Daimona Eaytoy)
[13:51:33] <jinxer-wm>	 FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST certificaterequests) on k8s-mlstaging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s-mlstaging&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[13:52:22] <wikibugs>	 (03CR) 10JMeybohm: [C:03+2] Update to kubernetes v1.31.4 [debs/kubernetes] (v1.31) - 10https://gerrit.wikimedia.org/r/1109672 (https://phabricator.wikimedia.org/T341984) (owner: 10JMeybohm)
[13:53:23] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2219 - jelto@cumin1002"
[13:53:27] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2219 - jelto@cumin1002"
[13:53:28] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[13:53:28] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2219.codfw.wmnet 148.48.192.10.in-addr.arpa 8.4.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[13:53:31] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2219.codfw.wmnet 148.48.192.10.in-addr.arpa 8.4.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[13:53:31] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2219
[13:53:53] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2219
[13:53:53] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2219
[13:54:21] <wikibugs>	 (03PS2) 10Daimona Eaytoy: test(2)wiki: Explicitly assign event organizer rights to all users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111227 (https://phabricator.wikimedia.org/T376822)
[13:54:39] <wikibugs>	 (03PS1) 10Btullis: airflow: Allow the scheduler to patch existing pods [deployment-charts] - 10https://gerrit.wikimedia.org/r/1111230 (https://phabricator.wikimedia.org/T380621)
[13:55:26] <wikibugs>	 (03CR) 10Brouberol: [C:03+1] "Spot on" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1111230 (https://phabricator.wikimedia.org/T380621) (owner: 10Btullis)
[13:56:12] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2216.codfw.wmnet with reason: host reimage
[13:56:33] <jinxer-wm>	 RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST certificaterequests) on k8s-mlstaging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s-mlstaging&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[13:56:48] <wikibugs>	 (03CR) 10Btullis: [C:03+2] airflow: Allow the scheduler to patch existing pods [deployment-charts] - 10https://gerrit.wikimedia.org/r/1111230 (https://phabricator.wikimedia.org/T380621) (owner: 10Btullis)
[13:57:44] <jayme>	 !log imported kubernetes 1.31.4-1 to bookworm-wikimedia - T341984
[13:57:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:57:47] <stashbot>	 T341984: Update Kubernetes clusters to >1.25 - https://phabricator.wikimedia.org/T341984
[13:58:15] <wikibugs>	 (03Merged) 10jenkins-bot: airflow: Allow the scheduler to patch existing pods [deployment-charts] - 10https://gerrit.wikimedia.org/r/1111230 (https://phabricator.wikimedia.org/T380621) (owner: 10Btullis)
[13:58:27] <wikibugs>	 (03CR) 10CDanis: [C:03+1] P:conftool: allow the parsercache section flavor [puppet] - 10https://gerrit.wikimedia.org/r/1110880 (https://phabricator.wikimedia.org/T383324) (owner: 10Scott French)
[14:00:05] <jouncebot>	 Lucas_WMDE, Urbanecm, and TheresNoTime: UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250114T1400). Please do the needful.
[14:00:05] <jouncebot>	 Daimona, steve_munene, MichaelG_WMF, and Nemoralis: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[14:00:12] <Nemoralis>	 o/
[14:00:15] <Lucas_WMDE>	 o/
[14:00:16] * MichaelG_WMF is here :)
[14:01:05] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2217.codfw.wmnet with reason: host reimage
[14:01:11] <MichaelG_WMF>	 My change only adjusts how timing metrics are being tracked. There is probably nothing to test there.
[14:01:16] <stevemunene>	 Hi Lucas_WMDE, checking whether https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1105878 and https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1105879 are in plan for today's window
[14:01:20] <Daimona>	 o/
[14:01:26] <logmsgbot>	 !log btullis@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
[14:01:39] <logmsgbot>	 !log btullis@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
[14:01:45] <wikibugs>	 (03CR) 10FNegri: [C:03+2] Add komla to wmcs-roots [puppet] - 10https://gerrit.wikimedia.org/r/1087919 (https://phabricator.wikimedia.org/T379159) (owner: 10FNegri)
[14:02:20] <logmsgbot>	 !log vgutierrez@cumin1002 START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqsin and A:cp
[14:02:31] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2216.codfw.wmnet with reason: host reimage
[14:02:43] <Lucas_WMDE>	 stevemunene: yes, they’re in the deployment calendar
[14:02:45] <Lucas_WMDE>	 I can deploy!
[14:03:04] <wikibugs>	 06SRE, 10SRE-Access-Requests, 10cloud-services-team (FY2024/2025-Q3-Q4), 13Patch-For-Review: Add permissions for Komla to run WMCS cookbooks - https://phabricator.wikimedia.org/T379159#10457879 (10fnegri) 05Open→03Resolved
[14:03:29] <stevemunene>	 Nice thanks Lucas_WMDE cc dcausse 
[14:03:35] <dcausse>	 o/
[14:03:43] <Lucas_WMDE>	 let’s start with Daimona 
[14:03:52] <Lucas_WMDE>	 and I think I’d like to deploy those changes separately
[14:04:04] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1109842 (https://phabricator.wikimedia.org/T383154) (owner: 10Daimona Eaytoy)
[14:04:24] <Lucas_WMDE>	 ^ this one looks a bit larger than I’d be comfortable with deploying together with the other one ^^
[14:04:38] <Lucas_WMDE>	 (fortunately the large CampaignEvents config changes will soon be history anyway)
[14:04:48] <wikibugs>	 (03Merged) 10jenkins-bot: Enable CampaignEvents extension on idwiki, itwiki, mswiki, and plwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1109842 (https://phabricator.wikimedia.org/T383154) (owner: 10Daimona Eaytoy)
[14:05:17] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Started scap sync-world: Backport for [[gerrit:1109842|Enable CampaignEvents extension on idwiki, itwiki, mswiki, and plwiki (T383154)]]
[14:05:21] <stashbot>	 T383154: Release CampaignEvents extension to Indonesian, Italian, Malay, and Polish Wikipedia - https://phabricator.wikimedia.org/T383154
[14:05:24] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2218.codfw.wmnet with reason: host reimage
[14:05:44] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2217.codfw.wmnet with reason: host reimage
[14:05:46] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - No response from remote host 208.80.154.197 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[14:06:13] <wikibugs>	 07Puppet, 10SRE-swift-storage, 10SRE-tools, 06DC-Ops, and 2 others: RAID monitoring on new hardware spec requires new or updated user space cli tool - https://phabricator.wikimedia.org/T377853#10457893 (10elukey) Some tests to see if JBOD could be forced directly from the OS without rebooting into BIOS:  `...
[14:08:43] <icinga-wm>	 PROBLEM - mailman archives on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[14:08:48] <icinga-wm>	 PROBLEM - mailman list info on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[14:09:23] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2218.codfw.wmnet with reason: host reimage
[14:09:41] <wikibugs>	 (03PS2) 10Ottomata: admin - remove and deprecate unused eventlogging groups [puppet] - 10https://gerrit.wikimedia.org/r/1110845 (https://phabricator.wikimedia.org/T238230)
[14:10:58] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/1110845 (https://phabricator.wikimedia.org/T238230) (owner: 10Ottomata)
[14:11:01] <wikibugs>	 (03CR) 10Ottomata: "Okay!  I updated the README too to fix the docs then." [puppet] - 10https://gerrit.wikimedia.org/r/1110845 (https://phabricator.wikimedia.org/T238230) (owner: 10Ottomata)
[14:11:43] <wikibugs>	 (03PS2) 10TChin: mw-content-history-reconcile-enrich: Add HA storageDir and Ceph egress [deployment-charts] - 10https://gerrit.wikimedia.org/r/1109448 (https://phabricator.wikimedia.org/T375176)
[14:11:48] <icinga-wm>	 RECOVERY - mailman list info on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 8923 bytes in 6.382 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[14:12:16] <wikibugs>	 (03PS1) 10Hashar: scap: do not show logo when cleaning old versions [puppet] - 10https://gerrit.wikimedia.org/r/1111233 (https://phabricator.wikimedia.org/T303828)
[14:12:34] <icinga-wm>	 RECOVERY - mailman archives on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 53367 bytes in 0.103 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[14:15:06] <wikibugs>	 (03PS3) 10TChin: mw-content-history-reconcile-enrich: Add HA storageDir and Ceph egress [deployment-charts] - 10https://gerrit.wikimedia.org/r/1109448 (https://phabricator.wikimedia.org/T375176)
[14:15:12] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde, daimona: Backport for [[gerrit:1109842|Enable CampaignEvents extension on idwiki, itwiki, mswiki, and plwiki (T383154)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[14:15:16] <stashbot>	 T383154: Release CampaignEvents extension to Indonesian, Italian, Malay, and Polish Wikipedia - https://phabricator.wikimedia.org/T383154
[14:15:32] <wikibugs>	 (03CR) 10Hashar: "Ref: https://phabricator.wikimedia.org/T303828#10456908" [puppet] - 10https://gerrit.wikimedia.org/r/1111233 (https://phabricator.wikimedia.org/T303828) (owner: 10Hashar)
[14:15:50] <wikibugs>	 (03CR) 10TChin: mw-content-history-reconcile-enrich: Add HA storageDir and Ceph egress (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1109448 (https://phabricator.wikimedia.org/T375176) (owner: 10TChin)
[14:16:00] <Lucas_WMDE>	 diff to api.php?action=query&meta=siteinfo&siprop=usergroups|restrictions&format=json&formatversion=2 on the four described wikis looks good to me FWIW
[14:16:24] <Lucas_WMDE>	 (using https://github.com/lucaswerkmeister/home/blob/main/.bashrc.d/wikimedia-debug-diff)
[14:17:58] <logmsgbot>	 !log jelto@cumin1002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker2219.codfw.wmnet with OS bookworm
[14:18:13] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2219.codfw.wmnet with OS bookworm
[14:18:16] <Lucas_WMDE>	 Daimona: can you test the change on mwdebug?
[14:18:17] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2219
[14:18:17] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2219
[14:19:42] <wikibugs>	 (03PS4) 10Jcrespo: dbbackups: Review and update grants for m1 dump user on codfw [puppet] - 10https://gerrit.wikimedia.org/r/1111182 (https://phabricator.wikimedia.org/T373579)
[14:19:54] <wikibugs>	 (03PS1) 10Ssingh: sre.dns.admin: update show to use CookbookInitSuccess [cookbooks] - 10https://gerrit.wikimedia.org/r/1111236
[14:21:10] <Nemoralis>	 :eyes:
[14:21:35] <Daimona>	 Lucas_WMDE: looks good to me, thanks!
[14:21:44] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde, daimona: Continuing with sync
[14:21:46] <Lucas_WMDE>	 ok!
[14:21:55] <wikibugs>	 (03PS2) 10Ssingh: sre.dns.admin: update show to use CookbookInitSuccess [cookbooks] - 10https://gerrit.wikimedia.org/r/1111236
[14:22:25] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2216.codfw.wmnet with OS bookworm
[14:23:00] <wikibugs>	 (03CR) 10Andrew Bogott: [C:03+1] "it sure has" [puppet] - 10https://gerrit.wikimedia.org/r/1111222 (https://phabricator.wikimedia.org/T314664) (owner: 10David Caro)
[14:23:47] <wikibugs>	 (03CR) 10Andrew Bogott: [C:03+1] toolforge::prometheus: remove frontproxy-redis [puppet] - 10https://gerrit.wikimedia.org/r/1111221 (https://phabricator.wikimedia.org/T314664) (owner: 10David Caro)
[14:24:26] <icinga-wm>	 RECOVERY - Check unit status of httpbb_kubernetes_mw-parsoid_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-parsoid_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[14:24:35] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, January 14 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111166 (https://phabricator.wikimedia.org/T383392) (owner: 10Fabfur)
[14:25:29] <jinxer-wm>	 FIRING: [5x] SystemdUnitFailed: httpbb_kubernetes_mw-wikifunctions_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:26:08] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2217.codfw.wmnet with OS bookworm
[14:26:27] <Lucas_WMDE>	 stevemunene: I’m guessing it’s probably okay to deploy your two changes together? (once we get to them)
[14:26:31] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, January 15 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#depl" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111166 (https://phabricator.wikimedia.org/T383392) (owner: 10Fabfur)
[14:27:04] <logmsgbot>	 !log root@cumin1002 START - Cookbook sre.puppet.renew-cert for dbprov1003.eqiad.wmnet: Renew puppet certificate - root@cumin1002
[14:28:12] <stevemunene>	 Yes it is Lucas_WMDE cc dcausse 
[14:28:31] <Lucas_WMDE>	 ok
[14:28:34] <wikibugs>	 (03CR) 10CI reject: [V:04-1] sre.dns.admin: update show to use CookbookInitSuccess [cookbooks] - 10https://gerrit.wikimedia.org/r/1111236 (owner: 10Ssingh)
[14:28:38] <dcausse>	 +1
[14:28:51] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2218.codfw.wmnet with OS bookworm
[14:29:17] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Finished scap sync-world: Backport for [[gerrit:1109842|Enable CampaignEvents extension on idwiki, itwiki, mswiki, and plwiki (T383154)]] (duration: 23m 59s)
[14:29:20] <stashbot>	 T383154: Release CampaignEvents extension to Indonesian, Italian, Malay, and Polish Wikipedia - https://phabricator.wikimedia.org/T383154
[14:29:25] <wikibugs>	 (03CR) 10Ssingh: "Failure is to be expected but I will rebase and ask for review when the Spicerack change is deployed." [cookbooks] - 10https://gerrit.wikimedia.org/r/1111236 (owner: 10Ssingh)
[14:29:51] <logmsgbot>	 !log root@cumin1002 END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for dbprov1003.eqiad.wmnet: Renew puppet certificate - root@cumin1002
[14:30:09] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111227 (https://phabricator.wikimedia.org/T376822) (owner: 10Daimona Eaytoy)
[14:30:44] <wikibugs>	 (03CR) 10Volans: [C:03+2] api: allow to skip the START log to SAL [software/spicerack] - 10https://gerrit.wikimedia.org/r/1105666 (https://phabricator.wikimedia.org/T324655) (owner: 10Volans)
[14:30:52] <wikibugs>	 (03Merged) 10jenkins-bot: test(2)wiki: Explicitly assign event organizer rights to all users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111227 (https://phabricator.wikimedia.org/T376822) (owner: 10Daimona Eaytoy)
[14:31:05] <wikibugs>	 (03PS1) 10Btullis: airflow: revert the change to the kube-api networkpolicy [deployment-charts] - 10https://gerrit.wikimedia.org/r/1111237 (https://phabricator.wikimedia.org/T380621)
[14:31:21] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Started scap sync-world: Backport for [[gerrit:1111227|test(2)wiki: Explicitly assign event organizer rights to all users (T376822)]]
[14:31:24] <stashbot>	 T376822: Configure the CampaignEvents extension to use the event-organizer group by default - https://phabricator.wikimedia.org/T376822
[14:31:32] <wikibugs>	 (03CR) 10Jelto: "lgtm, nit: I used T377876 as the task for eqiad renames and reimages" [puppet] - 10https://gerrit.wikimedia.org/r/1111225 (https://phabricator.wikimedia.org/T365571) (owner: 10Kamila Součková)
[14:31:40] <wikibugs>	 (03CR) 10Jelto: [C:03+1] kubernetes: rename mw141[4-6,9] -> kubernetes-worker10[99-01] [puppet] - 10https://gerrit.wikimedia.org/r/1111225 (https://phabricator.wikimedia.org/T365571) (owner: 10Kamila Součková)
[14:32:34] <wikibugs>	 (03CR) 10Ottomata: [C:03+1] Eventstreams: Bump image, use service-utils [deployment-charts] - 10https://gerrit.wikimedia.org/r/1111105 (https://phabricator.wikimedia.org/T361769) (owner: 10TChin)
[14:33:52] <wikibugs>	 (03CR) 10FNegri: Revert "Block PAWS workers nodes from all UDP traffic other than DNS & NTP" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1105036 (https://phabricator.wikimedia.org/T383261) (owner: 10FNegri)
[14:34:54] <icinga-wm>	 PROBLEM - mailman list info on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[14:35:19] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2219.codfw.wmnet with reason: host reimage
[14:35:44] <icinga-wm>	 RECOVERY - mailman list info on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 8922 bytes in 0.190 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[14:36:42] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:36:53] <wikibugs>	 (03CR) 10Ottomata: [C:03+2] admin - remove and deprecate unused eventlogging groups (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1110845 (https://phabricator.wikimedia.org/T238230) (owner: 10Ottomata)
[14:37:00] <wikibugs>	 (03CR) 10Btullis: [C:03+2] airflow: revert the change to the kube-api networkpolicy [deployment-charts] - 10https://gerrit.wikimedia.org/r/1111237 (https://phabricator.wikimedia.org/T380621) (owner: 10Btullis)
[14:37:17] <wikibugs>	 10ops-eqiad, 06SRE, 10Ceph, 10Cloud-VPS, and 2 others: cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643#10458049 (10fnegri)
[14:37:47] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 daimona, lucaswerkmeister-wmde: Backport for [[gerrit:1111227|test(2)wiki: Explicitly assign event organizer rights to all users (T376822)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[14:37:50] <stashbot>	 T376822: Configure the CampaignEvents extension to use the event-organizer group by default - https://phabricator.wikimedia.org/T376822
[14:38:00] <Nemoralis>	 Lucas_WMDE: how many minutes will it take to reach my patch in the deployment list? I have to leave in ~10 minutes
[14:38:06] <wikibugs>	 (03PS1) 10Marostegui: production-parsercache.sql.erb: Add new sections [puppet] - 10https://gerrit.wikimedia.org/r/1111238 (https://phabricator.wikimedia.org/T383234)
[14:38:26] <wikibugs>	 (03CR) 10Marostegui: "This is a noop,no grants are changing" [puppet] - 10https://gerrit.wikimedia.org/r/1111238 (https://phabricator.wikimedia.org/T383234) (owner: 10Marostegui)
[14:38:44] <wikibugs>	 (03CR) 10Brouberol: [C:03+1] airflow: revert the change to the kube-api networkpolicy [deployment-charts] - 10https://gerrit.wikimedia.org/r/1111237 (https://phabricator.wikimedia.org/T380621) (owner: 10Btullis)
[14:38:44] <wikibugs>	 (03Merged) 10jenkins-bot: airflow: revert the change to the kube-api networkpolicy [deployment-charts] - 10https://gerrit.wikimedia.org/r/1111237 (https://phabricator.wikimedia.org/T380621) (owner: 10Btullis)
[14:39:03] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2219.codfw.wmnet with reason: host reimage
[14:39:22] <Lucas_WMDE>	 Nemoralis: we definitely won’t have time for it then, sorry :(
[14:39:24] <wikibugs>	 (03CR) 10Ladsgroup: production-parsercache.sql.erb: Add new sections (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1111238 (https://phabricator.wikimedia.org/T383234) (owner: 10Marostegui)
[14:39:34] <Lucas_WMDE>	 10 minutes isn’t enough to finish this deployment and start yours even if we jump the rest of the queue
[14:39:42] <wikibugs>	 (03CR) 10Gehel: [C:03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1110862 (https://phabricator.wikimedia.org/T380937) (owner: 10Bking)
[14:39:57] <Lucas_WMDE>	 Daimona: AFAICT the only difference is that the rights get reordered, i.e. effectively a no-op ^^
[14:39:57] <wikibugs>	 (03CR) 10Marostegui: production-parsercache.sql.erb: Add new sections (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1111238 (https://phabricator.wikimedia.org/T383234) (owner: 10Marostegui)
[14:39:59] <Lucas_WMDE>	 can you confirm?
[14:40:08] <wikibugs>	 (03PS2) 10Marostegui: production-parsercache.sql.erb: Add new sections [puppet] - 10https://gerrit.wikimedia.org/r/1111238 (https://phabricator.wikimedia.org/T383234)
[14:40:15] <Nemoralis>	 Lucas_WMDE: ok, I will reschedule my patch to the late backport window, thanks :D
[14:40:17] <wikibugs>	 (03CR) 10Marostegui: production-parsercache.sql.erb: Add new sections (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1111238 (https://phabricator.wikimedia.org/T383234) (owner: 10Marostegui)
[14:41:16] <Lucas_WMDE>	 I’ll start gate-and-submit for the backport already
[14:41:18] <Lucas_WMDE>	 jouncebot: next
[14:41:18] <jouncebot>	 In 1 hour(s) and 18 minute(s): SRE Collaboration Services office hours (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250114T1600)
[14:41:25] <wikibugs>	 (03Merged) 10jenkins-bot: api: allow to skip the START log to SAL [software/spicerack] - 10https://gerrit.wikimedia.org/r/1105666 (https://phabricator.wikimedia.org/T324655) (owner: 10Volans)
[14:41:27] <wikibugs>	 (03PS1) 10Ottomata: configcluster.yaml - remove eventlogging from profile::etcd::tlsproxy::acls [puppet] - 10https://gerrit.wikimedia.org/r/1111239 (https://phabricator.wikimedia.org/T238230)
[14:41:28] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C:03+2] "starting gate-and-submit ahead of deployment" [extensions/GrowthExperiments] (wmf/1.44.0-wmf.12) - 10https://gerrit.wikimedia.org/r/1111196 (https://phabricator.wikimedia.org/T383208) (owner: 10Michael Große)
[14:41:49] <logmsgbot>	 !log btullis@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
[14:42:15] <wikibugs>	 (03CR) 10Ottomata: "Moritz, I'm not sure if this is the right thing to do. We can abandon if we should just leave this." [puppet] - 10https://gerrit.wikimedia.org/r/1111239 (https://phabricator.wikimedia.org/T238230) (owner: 10Ottomata)
[14:43:06] <Daimona>	 ok, great :)
[14:43:31] <Lucas_WMDE>	 is it okay to deploy then?
[14:46:02] <MichaelG_WMF>	 Lucas_WMDE: mine (GrowthExperiments) is okay to deploy, but not sure if you were asking me or Daimona 
[14:46:09] <Lucas_WMDE>	 I’m asking Daimona 
[14:46:16] <Daimona>	 Yup, okay to deploy, sorry!
[14:46:17] <Lucas_WMDE>	 still sitting at the “continue with sync?” prompt
[14:46:19] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 daimona, lucaswerkmeister-wmde: Continuing with sync
[14:46:21] <Lucas_WMDE>	 ok, thanks!
[14:46:48] <marostegui>	 Amir1: happy with https://gerrit.wikimedia.org/r/c/operations/puppet/+/1111238 ?
[14:46:49] <wikibugs>	 (03CR) 10JMeybohm: [C:03+1] shellbox-syntaxhighlight: 1 eqiad replica on 8.1 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1087579 (https://phabricator.wikimedia.org/T377038) (owner: 10Scott French)
[14:47:19] <wikibugs>	 (03CR) 10Ladsgroup: [C:03+1] "Thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/1111238 (https://phabricator.wikimedia.org/T383234) (owner: 10Marostegui)
[14:47:21] <logmsgbot>	 !log vgutierrez@cumin1002 END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqsin and A:cp
[14:47:27] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] production-parsercache.sql.erb: Add new sections [puppet] - 10https://gerrit.wikimedia.org/r/1111238 (https://phabricator.wikimedia.org/T383234) (owner: 10Marostegui)
[14:47:58] <Amir1>	 Thanks marostegui. If you see it somewhere, let's just change it to pcX or remove the list altogether (depending on the case)
[14:48:44] <godog>	 jouncebot: next
[14:48:45] <jouncebot>	 In 1 hour(s) and 11 minute(s): SRE Collaboration Services office hours (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250114T1600)
[14:48:49] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C:03+2] thanos-query: write active queries to file [puppet] - 10https://gerrit.wikimedia.org/r/1110798 (https://phabricator.wikimedia.org/T383570) (owner: 10Filippo Giunchedi)
[14:48:58] <wikibugs>	 (03PS1) 10Herron: wip [puppet] - 10https://gerrit.wikimedia.org/r/1111241
[14:50:00] <Lucas_WMDE>	 godog: I’m currently deploying, and if it’s okay I’d probably like to overrun the window
[14:50:07] <Lucas_WMDE>	 (as there are still some changes pending)
[14:50:47] <godog>	 Lucas_WMDE: ack thank you that's fine
[14:50:53] <Lucas_WMDE>	 ok :)
[14:51:41] <wikibugs>	 (03PS1) 10Brouberol: airflow: revert to having the scheduling using an http check [deployment-charts] - 10https://gerrit.wikimedia.org/r/1111245 (https://phabricator.wikimedia.org/T380620)
[14:52:40] <wikibugs>	 (03PS5) 10Jcrespo: dbbackups: Review and update grants for m1 dump user on codfw [puppet] - 10https://gerrit.wikimedia.org/r/1111182 (https://phabricator.wikimedia.org/T373579)
[14:52:48] <wikibugs>	 (03CR) 10Btullis: [C:03+1] airflow: revert to having the scheduling using an http check [deployment-charts] - 10https://gerrit.wikimedia.org/r/1111245 (https://phabricator.wikimedia.org/T380620) (owner: 10Brouberol)
[14:53:35] <wikibugs>	 (03PS2) 10Brouberol: airflow: revert to having the scheduling using an http check [deployment-charts] - 10https://gerrit.wikimedia.org/r/1111245 (https://phabricator.wikimedia.org/T380620)
[14:53:47] <wikibugs>	 (03CR) 10Stevemunene: "looks good!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1111245 (https://phabricator.wikimedia.org/T380620) (owner: 10Brouberol)
[14:53:47] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Finished scap sync-world: Backport for [[gerrit:1111227|test(2)wiki: Explicitly assign event organizer rights to all users (T376822)]] (duration: 22m 26s)
[14:53:51] <stashbot>	 T376822: Configure the CampaignEvents extension to use the event-organizer group by default - https://phabricator.wikimedia.org/T376822
[14:54:05] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1105879 (https://phabricator.wikimedia.org/T374021) (owner: 10Stevemunene)
[14:54:06] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1105878 (https://phabricator.wikimedia.org/T377956) (owner: 10Stevemunene)
[14:54:37] <wikibugs>	 (03CR) 10CI reject: [V:04-1] airflow: revert to having the scheduling using an http check [deployment-charts] - 10https://gerrit.wikimedia.org/r/1111245 (https://phabricator.wikimedia.org/T380620) (owner: 10Brouberol)
[14:54:50] <wikibugs>	 (03Merged) 10jenkins-bot: Make WikibaseQualityConstraints use split-graph query service [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1105879 (https://phabricator.wikimedia.org/T374021) (owner: 10Stevemunene)
[14:54:53] <wikibugs>	 (03Merged) 10jenkins-bot: Make WikimediaCampaignEvents use split-graph query service [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1105878 (https://phabricator.wikimedia.org/T377956) (owner: 10Stevemunene)
[14:55:19] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Started scap sync-world: Backport for [[gerrit:1105879|Make WikibaseQualityConstraints use split-graph query service (T374021)]], [[gerrit:1105878|Make WikimediaCampaignEvents use split-graph query service (T377956)]]
[14:55:24] <stashbot>	 T374021: Make WikibaseQualityConstraints use split-graph query service - https://phabricator.wikimedia.org/T374021
[14:55:24] <stashbot>	 T377956: Make WikimediaCampaignEvents use split-graph query service - https://phabricator.wikimedia.org/T377956
[14:57:06] <wikibugs>	 (03PS3) 10Brouberol: airflow: revert to having the scheduling using an http check [deployment-charts] - 10https://gerrit.wikimedia.org/r/1111245 (https://phabricator.wikimedia.org/T380620)
[14:58:12] <logmsgbot>	 !log btullis@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
[14:58:33] <dcausse>	 Daimona: o/ is there a special or API we could use to test a change to the sparql endpoints used by the WikimediaCampaignEvents extension?
[14:58:50] <dcausse>	 s/special/special page/
[14:59:12] <godog>	 Lucas_WMDE: please LMK once you are done and I'll finish the rollout of my change in eqiad
[14:59:28] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2219.codfw.wmnet with OS bookworm
[14:59:29] <Lucas_WMDE>	 can do
[14:59:41] <Lucas_WMDE>	 I can also take a break in between if you want
[14:59:42] <wikibugs>	 (03CR) 10Brouberol: [C:03+2] airflow: revert to having the scheduling using an http check [deployment-charts] - 10https://gerrit.wikimedia.org/r/1111245 (https://phabricator.wikimedia.org/T380620) (owner: 10Brouberol)
[14:59:45] <Lucas_WMDE>	 (though idk how long the rollout takes ^^)
[14:59:53] <wikibugs>	 (03CR) 10Majavah: [C:03+1] "whoops" [puppet] - 10https://gerrit.wikimedia.org/r/1111221 (https://phabricator.wikimedia.org/T314664) (owner: 10David Caro)
[14:59:56] <wikibugs>	 (03CR) 10Majavah: [C:03+1] toolforge::proxy: remove absenting statement [puppet] - 10https://gerrit.wikimedia.org/r/1111222 (https://phabricator.wikimedia.org/T314664) (owner: 10David Caro)
[15:00:18] <Daimona>	 dcausse: hi! Yep, you can test it here: https://meta.wikimedia.org/wiki/Special:AllEvents?tab=form-tabs-1 (also on other wikis with the CampaignEvents extension enabled)
[15:00:30] <dcausse>	 thx!
[15:01:26] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 stevemunene, lucaswerkmeister-wmde: Backport for [[gerrit:1105879|Make WikibaseQualityConstraints use split-graph query service (T374021)]], [[gerrit:1105878|Make WikimediaCampaignEvents use split-graph query service (T377956)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[15:01:31] <stashbot>	 T374021: Make WikibaseQualityConstraints use split-graph query service - https://phabricator.wikimedia.org/T374021
[15:01:31] <stashbot>	 T377956: Make WikimediaCampaignEvents use split-graph query service - https://phabricator.wikimedia.org/T377956
[15:01:47] <Lucas_WMDE>	 I can test the WikibaseQualityConstraints part
[15:01:50] <godog>	 Lucas_WMDE: please go ahead, thank you though
[15:01:55] <dcausse>	 Lucas_WMDE: thanks!
[15:02:05] <godog>	 rollout on my end is quick but it can impact monitoring queries
[15:03:05] <wikibugs>	 (03Merged) 10jenkins-bot: fix(tracking): TimingMetric:observe records milliseconds [extensions/GrowthExperiments] (wmf/1.44.0-wmf.12) - 10https://gerrit.wikimedia.org/r/1111196 (https://phabricator.wikimedia.org/T383208) (owner: 10Michael Große)
[15:03:18] <Daimona>	 BTW, apologies but I can't test the CampaignEvents stuff because I'm overwhelmed with meetings today :)
[15:03:35] <Daimona>	 (in the context of split-graph migration)
[15:03:57] <Daimona>	 But do ping me if anything looks wrong and I'll reserve some time to take a look
[15:04:07] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
[15:04:16] <Lucas_WMDE>	 hm, https://www.wikidata.org/wiki/Special:ConstraintReport/Q4115189 stops showing the distinct-values constraint violation when I turn on WikimediaDebug :/
[15:04:28] <Lucas_WMDE>	 dcausse: do you know if the split query service is lagging behind more, perhaps?
[15:04:32] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
[15:04:48] * Lucas_WMDE tries to query the split services manually
[15:04:59] <dcausse>	 Lucas_WMDE: no it should not...
[15:05:14] <Lucas_WMDE>	 you’re right, https://w.wiki/CigH finds it just fine
[15:05:21] <Lucas_WMDE>	 hm
[15:05:28] * Lucas_WMDE peeks at logstash
[15:05:30] <dcausse>	 constraints are checked via a job?
[15:05:41] <dcausse>	 so perhaps not easily testable?
[15:05:46] <Lucas_WMDE>	 no, they’re checked live
[15:05:50] <dcausse>	 ok
[15:05:53] <Lucas_WMDE>	 and the special page also bypasses the cache that’s normally there
[15:06:07] <Lucas_WMDE>	 (they used to be checked via jobs too but that’s currently disabled I believe)
[15:06:28] <dcausse>	 ack
[15:06:36] <Lucas_WMDE>	 no messages from server:www.wikidata.org in logstash mwdebug at all o_O
[15:06:42] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:07:12] <dcausse>	 so for the campaigns I can navigate just fine through Special:AllEvents on meta while hitting a debug server
[15:07:13] <Lucas_WMDE>	 https://www.wikidata.org/wiki/Special:ConstraintReport/Q35017419 doesn’t show the duplicate on the sandbox either
[15:07:17] <Lucas_WMDE>	 so it’s broken in both directions, it seems
[15:07:23] <dcausse>	 :/
[15:07:48] <Lucas_WMDE>	 I think the WBQC part needs a revert (and then further investigation)
[15:07:52] <Lucas_WMDE>	 wondering what the better way to do this is
[15:08:00] <Lucas_WMDE>	 roll out both changes now and then revert the WBQC part
[15:08:07] <dcausse>	 ack
[15:08:17] <Lucas_WMDE>	 or abort the current deployment and then deploy the WBQC revert (which would include syncing the second change)
[15:08:26] <Lucas_WMDE>	 that might be faster, actually. only one sync-world instead of two
[15:09:04] <Lucas_WMDE>	 (I was first thinking, it’s fine to still deploy the WBQC change and only revert it afterwards because the breakage isn’t critical, but it would be slower that way anyways ^^)
[15:09:42] <dcausse>	 Lucas_WMDE: I think the duplication cannot work cross graph
[15:10:06] <dcausse>	 wait perhaps it can
[15:10:20] * dcausse needs to look at the codebase again
[15:10:48] * Lucas_WMDE looks at the code
[15:10:58] <Lucas_WMDE>	 hm
[15:11:02] <Lucas_WMDE>	 I think you might be right
[15:11:25] <Lucas_WMDE>	 we’re not just looking for entities with value X, we’re looking for entities with the same value as the base entity
[15:11:28] <Lucas_WMDE>	 so they need to be in the same graph
[15:11:53] <Lucas_WMDE>	 >.<
[15:12:01] <Lucas_WMDE>	 we need to properly serialize the value we’re looking for into the query
[15:12:23] <Lucas_WMDE>	 (and use Wikibase’s RdfBuilder stuff for that, instead of the ridiculous home-grown getRdfLiteral() method that I slapped together in what I think was my first month or two at WMDE lol)
[15:12:29] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Sync cancelled.
[15:12:31] <dcausse>	 so it relies on the query service to be updated to work properly
[15:12:54] <wikibugs>	 (03PS1) 10Lucas Werkmeister (WMDE): Revert "Make WikibaseQualityConstraints use split-graph query service" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111253 (https://phabricator.wikimedia.org/T374021)
[15:13:10] <wikibugs>	 (03CR) 10Hnowlan: [C:03+1] shellbox-syntaxhighlight: all eqiad replicas on 8.1 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1087580 (https://phabricator.wikimedia.org/T377038) (owner: 10Scott French)
[15:13:16] <wikibugs>	 (03PS4) 10Herron: thanos-rule: manage retention setting [puppet] - 10https://gerrit.wikimedia.org/r/1111241 (https://phabricator.wikimedia.org/T352756)
[15:13:17] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111253 (https://phabricator.wikimedia.org/T374021) (owner: 10Lucas Werkmeister (WMDE))
[15:13:59] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "Make WikibaseQualityConstraints use split-graph query service" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111253 (https://phabricator.wikimedia.org/T374021) (owner: 10Lucas Werkmeister (WMDE))
[15:14:58] <Lucas_WMDE>	 ah, and the backport merged in the meantime
[15:15:03] <Lucas_WMDE>	 so MichaelG_WMF this deployment will include that :)
[15:15:15] <MichaelG_WMF>	 YaY 😊
[15:15:38] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Started scap sync-world: Backport for [[gerrit:1111253|Revert "Make WikibaseQualityConstraints use split-graph query service" (T374021)]], [[gerrit:1105878|Make WikimediaCampaignEvents use split-graph query service (T377956)]]
[15:15:43] <stashbot>	 T374021: Make WikibaseQualityConstraints use split-graph query service - https://phabricator.wikimedia.org/T374021
[15:15:43] <stashbot>	 T377956: Make WikimediaCampaignEvents use split-graph query service - https://phabricator.wikimedia.org/T377956
[15:17:54] <wikibugs>	 (03PS1) 10Filippo Giunchedi: site: add prometheus200[78] [puppet] - 10https://gerrit.wikimedia.org/r/1111256 (https://phabricator.wikimedia.org/T383232)
[15:18:20] <Lucas_WMDE>	 dcausse: I left a comment in T374021
[15:18:38] <Lucas_WMDE>	 (not sure if that should actually be in that task or a separate new task, we’ll see)
[15:19:17] <dcausse>	 Lucas_WMDE: thanks! sure, I'll followup in phan
[15:19:23] <dcausse>	 phab*
[15:19:48] <Lucas_WMDE>	 phan 🤝 phab: often need followups
[15:21:16] <wikibugs>	 (03CR) 10Dzahn: [C:03+2] scap: do not show logo when cleaning old versions [puppet] - 10https://gerrit.wikimedia.org/r/1111233 (https://phabricator.wikimedia.org/T303828) (owner: 10Hashar)
[15:21:31] <wikibugs>	 (03CR) 10Filippo Giunchedi: [V:03+1] "PCC SUCCESS (CORE_DIFF 4): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4795/co" [puppet] - 10https://gerrit.wikimedia.org/r/1111256 (https://phabricator.wikimedia.org/T383232) (owner: 10Filippo Giunchedi)
[15:21:44] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 stevemunene, lucaswerkmeister-wmde: Backport for [[gerrit:1111253|Revert "Make WikibaseQualityConstraints use split-graph query service" (T374021)]], [[gerrit:1105878|Make WikimediaCampaignEvents use split-graph query service (T377956)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[15:21:49] <stashbot>	 T374021: Make WikibaseQualityConstraints use split-graph query service - https://phabricator.wikimedia.org/T374021
[15:21:49] <stashbot>	 T377956: Make WikimediaCampaignEvents use split-graph query service - https://phabricator.wikimedia.org/T377956
[15:23:04] <Lucas_WMDE>	 now https://www.wikidata.org/wiki/Special:ConstraintReport/Q4115189 *only* shows the constraint violation on mwdebug o_O
[15:23:16] <Lucas_WMDE>	 but I guess that’s better than the other way around ^^
[15:23:20] <Lucas_WMDE>	 MichaelG_WMF: can you test your change?
[15:23:37] <Lucas_WMDE>	 (I probably should’ve aborted the scap backport and instead added your change URL to the arguments so it would be included in all the messages, meh)
[15:23:47] <Lucas_WMDE>	 (though at that point the SAL would probably start to get truncated by IRC anyway)
[15:24:05] <MichaelG_WMF>	 Lucas_WMDE: no, not really. It is just a change to how we record metrics with the new system
[15:24:11] <wikibugs>	 10ops-codfw, 06SRE, 06DBA, 06DC-Ops, 10decommission-hardware: decommission db2128.codfw.wmnet - https://phabricator.wikimedia.org/T383572#10458286 (10Jhancock.wm) 05Open→03Resolved a:03Jhancock.wm
[15:24:12] <Lucas_WMDE>	 ok now https://www.wikidata.org/wiki/Special:ConstraintReport/Q4115189 is working both with and without WikimediaDebug which is expected
[15:24:14] <Lucas_WMDE>	 oh right
[15:24:17] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 stevemunene, lucaswerkmeister-wmde: Continuing with sync
[15:24:19] <Lucas_WMDE>	 syncing then
[15:25:28] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 07Kubernetes: hw troubleshooting: Comm Error: backplane 0 for wikikube-worker2192.codfw.wmnet - https://phabricator.wikimedia.org/T383339#10458299 (10Jhancock.wm) a:05Papaul→03Jhancock.wm
[15:25:44] <wikibugs>	 10ops-codfw, 06SRE, 10SRE-swift-storage, 06DC-Ops: Frequent disk resets on ms-be2075 - https://phabricator.wikimedia.org/T382707#10458301 (10Jhancock.wm) a:03Jhancock.wm
[15:25:57] <wikibugs>	 10ops-codfw, 06SRE, 10SRE-swift-storage, 06DC-Ops: Degraded RAID due to failed sdy on ms-be2075 - https://phabricator.wikimedia.org/T383530#10458302 (10Jhancock.wm) a:03Jhancock.wm
[15:26:19] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 10:00:00 on db[2133,2160,2233].codfw.wmnet with reason: cloning
[15:26:34] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db[2133,2160,2233].codfw.wmnet with reason: cloning
[15:27:38] <marostegui>	 !log Stop in sync db2133 db2233 m2 codfw dbmaint T373579
[15:27:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:27:43] <stashbot>	 T373579: Productionize db22[21-40] - https://phabricator.wikimedia.org/T373579
[15:28:09] <wikibugs>	 (03CR) 10Hnowlan: [C:03+1] mediawiki: enable mesh telemetry in mercurius [deployment-charts] - 10https://gerrit.wikimedia.org/r/1110818 (owner: 10Scott French)
[15:29:14] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06serviceops: Move kafka-main2010 within the same rack - https://phabricator.wikimedia.org/T381788#10458314 (10Jhancock.wm)
[15:31:41] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Finished scap sync-world: Backport for [[gerrit:1111253|Revert "Make WikibaseQualityConstraints use split-graph query service" (T374021)]], [[gerrit:1105878|Make WikimediaCampaignEvents use split-graph query service (T377956)]] (duration: 16m 03s)
[15:31:52] <stashbot>	 T374021: Make WikibaseQualityConstraints use split-graph query service - https://phabricator.wikimedia.org/T374021
[15:31:53] <stashbot>	 T377956: Make WikimediaCampaignEvents use split-graph query service - https://phabricator.wikimedia.org/T377956
[15:32:03] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] Fix Cumin alias [puppet] - 10https://gerrit.wikimedia.org/r/1111213 (owner: 10Muehlenhoff)
[15:33:25] <wikibugs>	 (03PS1) 10Marostegui: db2233.yaml: Make it master [puppet] - 10https://gerrit.wikimedia.org/r/1111258 (https://phabricator.wikimedia.org/T373579)
[15:33:41] <Lucas_WMDE>	 !log previous deployment also included [[gerrit:1111196|fix(tracking): TimingMetric:observe records milliseconds]] (T383208)
[15:33:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:33:44] <stashbot>	 T383208: StatsLib timings MUST be recorded as milliseconds - https://phabricator.wikimedia.org/T383208
[15:33:49] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] db2233.yaml: Make it master [puppet] - 10https://gerrit.wikimedia.org/r/1111258 (https://phabricator.wikimedia.org/T373579) (owner: 10Marostegui)
[15:34:01] <Lucas_WMDE>	 okay, I think that’s the deployment window done
[15:34:09] <Lucas_WMDE>	 !log UTC afternoon backport+config window done
[15:34:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:34:26] <Lucas_WMDE>	 godog: fyi ^
[15:34:51] <Lucas_WMDE>	 hm, logspam watch has some “PHP Warning: Stats: Cannot add labels to a metric containing samples for 'update_mentee_data_seconds'”
[15:34:56] <Lucas_WMDE>	 MichaelG_WMF: could that be related to your change?
[15:35:00] <Lucas_WMDE>	 (but it looks like it stopped again)
[15:35:04] * Lucas_WMDE looks at logstash
[15:35:21] <MichaelG_WMF>	 Mh, is that _new_?
[15:35:35] <Lucas_WMDE>	 maybe not
[15:35:36] <MichaelG_WMF>	 This is a known issue with our existing code,
[15:35:41] <Lucas_WMDE>	 last occurrence 15:23
[15:35:58] <Lucas_WMDE>	 looks like it started Jan 7
[15:36:06] <wikibugs>	 (03PS1) 10Brouberol: airflow: ensure the pooler URI uses a terninated FQDN [deployment-charts] - 10https://gerrit.wikimedia.org/r/1111259 (https://phabricator.wikimedia.org/T383651)
[15:36:07] <wikibugs>	 (03PS1) 10Phuedx: Beta Cluster: Update MetricsPlatform extension config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111260 (https://phabricator.wikimedia.org/T381964)
[15:36:09] <wikibugs>	 (03PS1) 10Phuedx: Enable MetricsPlatform extension everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111261 (https://phabricator.wikimedia.org/T381964)
[15:36:11] <wikibugs>	 (03PS1) 10Phuedx: testwiki: Enable MetricsPlatform stream config fetching and merging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111262 (https://phabricator.wikimedia.org/T381964)
[15:36:11] <Lucas_WMDE>	 and really ramped up Jan 10
[15:36:31] <Lucas_WMDE>	 so probably unrelated, and just showed up at the top of logspam-watch coincidentally
[15:36:40] <MichaelG_WMF>	 yes, this is something recently introduced and that part is also touched by my change, but my change should not affect that in particular
[15:36:53] <MichaelG_WMF>	 (we're working on a fix)
[15:36:56] <Lucas_WMDE>	 ok
[15:37:22] <Lucas_WMDE>	 and it’s all on mwmaint2002, so I guess the spike in the logs is just from whenever the maintenance script runs
[15:37:30] <Lucas_WMDE>	 (every 3 hours, judging by “last 24 hours” in logstash)
[15:37:36] <wikibugs>	 (03PS1) 10Marostegui: db2133: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/1111263 (https://phabricator.wikimedia.org/T373579)
[15:38:22] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06serviceops: Move kafka-main2010 within the same rack - https://phabricator.wikimedia.org/T381788#10458388 (10JMeybohm) Hi @Jhancock.wm -  I could do next Monday (20th January) 15:30Z, would that work for you?
[15:38:27] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] db2133: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/1111263 (https://phabricator.wikimedia.org/T373579) (owner: 10Marostegui)
[15:40:04] <wikibugs>	 (03CR) 10Andrea Denisse: [C:03+2] profile::mediawiki::common: Remove obsolete DSH group check [puppet] - 10https://gerrit.wikimedia.org/r/1110872 (https://phabricator.wikimedia.org/T370527) (owner: 10Andrea Denisse)
[15:40:29] <wikibugs>	 (03CR) 10Btullis: [C:03+1] airflow: ensure the pooler URI uses a terninated FQDN [deployment-charts] - 10https://gerrit.wikimedia.org/r/1111259 (https://phabricator.wikimedia.org/T383651) (owner: 10Brouberol)
[15:40:57] <godog>	 Lucas_WMDE: ack thx
[15:42:07] <wikibugs>	 07Puppet, 10SRE-swift-storage, 10SRE-tools, 06DC-Ops, and 2 others: RAID monitoring on new hardware spec requires new or updated user space cli tool - https://phabricator.wikimedia.org/T377853#10458416 (10elukey) I fear that this SAS controller doesn't support JBOD unless it is configured via BIOS, so real...
[15:42:33] <wikibugs>	 (03CR) 10Brouberol: [C:03+2] airflow: ensure the pooler URI uses a terninated FQDN [deployment-charts] - 10https://gerrit.wikimedia.org/r/1111259 (https://phabricator.wikimedia.org/T383651) (owner: 10Brouberol)
[15:43:19] <wikibugs>	 (03PS1) 10Gerrit maintenance bot: mariadb: Promote db1184 to s1 master [puppet] - 10https://gerrit.wikimedia.org/r/1111264 (https://phabricator.wikimedia.org/T383689)
[15:43:30] <wikibugs>	 (03PS1) 10CDanis: urldownloader: scrub outbound privacy-sensitive hdrs [puppet] - 10https://gerrit.wikimedia.org/r/1111265 (https://phabricator.wikimedia.org/T340552)
[15:43:38] <jinxer-wm>	 FIRING: CirrusSearchHighOldGCFrequency: Elasticsearch instance elastic2069-production-search-psi-codfw is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[15:43:45] <wikibugs>	 (03PS1) 10Gerrit maintenance bot: mariadb: Promote db2212 to s1 master [puppet] - 10https://gerrit.wikimedia.org/r/1111266 (https://phabricator.wikimedia.org/T383690)
[15:43:50] <wikibugs>	 (03PS1) 10Gerrit maintenance bot: wmnet: Update s1-master alias [dns] - 10https://gerrit.wikimedia.org/r/1111267 (https://phabricator.wikimedia.org/T383690)
[15:44:48] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 10Prod-Kubernetes, and 2 others: Relabel codfw kubernetes nodes - https://phabricator.wikimedia.org/T383595#10458451 (10Jhancock.wm) 05Open→03Resolved a:03Jhancock.wm
[15:44:50] <jelto>	 !log homer 'lsw1-d3-codfw*' commit 'T377877'
[15:44:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:44:54] <stashbot>	 T377877: Migrate wikikube-codfw to containerd - https://phabricator.wikimedia.org/T377877
[15:45:33] <jelto>	 !log homer 'cr*codfw*' commit 'T377877'
[15:45:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:46:25] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
[15:46:29] <icinga-wm>	 RECOVERY - BGP status on cr1-codfw is OK: BGP OK - up: 112, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[15:46:51] <wikibugs>	 (03PS2) 10Kamila Součková: kubernetes: rename mw141[4-6,9] -> kubernetes-worker10[99-01] [puppet] - 10https://gerrit.wikimedia.org/r/1111225 (https://phabricator.wikimedia.org/T377876)
[15:46:58] <wikibugs>	 (03CR) 10David Caro: [C:03+2] toolforge::proxy: remove absenting statement [puppet] - 10https://gerrit.wikimedia.org/r/1111222 (https://phabricator.wikimedia.org/T314664) (owner: 10David Caro)
[15:47:00] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
[15:47:01] <wikibugs>	 (03CR) 10David Caro: [C:03+2] toolforge::prometheus: remove frontproxy-redis [puppet] - 10https://gerrit.wikimedia.org/r/1111221 (https://phabricator.wikimedia.org/T314664) (owner: 10David Caro)
[15:47:14] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2216-2219].codfw.wmnet
[15:47:17] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2216-2219].codfw.wmnet
[15:48:06] <wikibugs>	 10ops-codfw, 06DC-Ops, 10Prod-Kubernetes, 06serviceops, 07Kubernetes: Relabel codfw kubernetes nodes - https://phabricator.wikimedia.org/T383691 (10Jelto) 03NEW
[15:48:42] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.k8s.pool-depool-node depool for host mw[1414-1416,1419].eqiad.wmnet
[15:49:55] <wikibugs>	 (03CR) 10Kamila Součková: [C:03+2] kubernetes: rename mw141[4-6,9] -> kubernetes-worker10[99-01] [puppet] - 10https://gerrit.wikimedia.org/r/1111225 (https://phabricator.wikimedia.org/T377876) (owner: 10Kamila Součková)
[15:50:54] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw[1414-1416,1419].eqiad.wmnet
[15:52:13] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.rename from mw1414 to wikikube-worker1098
[15:52:33] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.dns.netbox
[15:53:20] <moritzm>	 !log import prometheus-mysqld-exporter 0.13.0-1~bpo11+1 to the main component of bullseye-wikimedia (import from bullseye-backports which is going away) T383557
[15:53:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:53:23] <stashbot>	 T383557: Deprecate use of bullseye-backports - https://phabricator.wikimedia.org/T383557
[15:53:59] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - AS64601/IPv4: Active - kubernetes-eqiad, AS64601/IPv6: Active - kubernetes-eqiad, AS64601/IPv6: Active - kubernetes-eqiad, AS64601/IPv6: Active - kubernetes-eqiad, AS64601/IPv4: Active - kubernetes-eqiad, AS64601/IPv4: Active - kubernetes-eqiad, AS64601/IPv4: Active - kubernetes-eqiad, AS64601/IPv6: Active - kubernetes-eqiad https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[15:53:59] <icinga-wm>	 PROBLEM - BGP status on cr1-eqiad is CRITICAL: BGP CRITICAL - AS64601/IPv4: Active - kubernetes-eqiad, AS64601/IPv6: Active - kubernetes-eqiad, AS64601/IPv6: Active - kubernetes-eqiad, AS64601/IPv6: Active - kubernetes-eqiad, AS64601/IPv6: Active - kubernetes-eqiad, AS64601/IPv4: Active - kubernetes-eqiad, AS64601/IPv4: Active - kubernetes-eqiad, AS64601/IPv4: Active - kubernetes-eqiad https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[15:54:53] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.rename from mw1415 to wikikube-worker1099
[15:54:57] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.rename from mw1416 to wikikube-worker1100
[15:55:05] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.rename from mw1419 to wikikube-worker1101
[15:55:24] <wikibugs>	 (03PS1) 10Muehlenhoff: No longer import prometheus-mysqld-exporter from bullseye-backports [puppet] - 10https://gerrit.wikimedia.org/r/1111269 (https://phabricator.wikimedia.org/T383557)
[15:56:17] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1414 to wikikube-worker1098 - kamila@cumin1002"
[15:56:25] <logmsgbot>	 !log kamila@cumin1002 END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1414 to wikikube-worker1098 - kamila@cumin1002"
[15:56:25] <logmsgbot>	 !log kamila@cumin1002 END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
[15:56:29] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.dns.netbox
[15:56:30] <logmsgbot>	 !log kamila@cumin1002 END (FAIL) - Cookbook sre.hosts.rename (exit_code=99) from mw1414 to wikikube-worker1098
[15:57:53] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1111269 (https://phabricator.wikimedia.org/T383557) (owner: 10Muehlenhoff)
[15:58:35] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.rename from mw1414 to wikikube-worker1098
[15:59:26] <wikibugs>	 (03PS1) 10Jelto: Rename mw23[69-72] to wikikube-worker222[0-3] [puppet] - 10https://gerrit.wikimedia.org/r/1111271 (https://phabricator.wikimedia.org/T377877)
[15:59:42] <wikibugs>	 07Puppet, 10MW-on-K8s, 10Observability-Alerting, 10SRE Observability (FY2024/2025-Q3): Clean up "git repo needs merge" checks - https://phabricator.wikimedia.org/T370530#10458592 (10lmata)
[15:59:44] <wikibugs>	 10SRE-swift-storage, 10Observability-Alerting, 10SRE Observability (FY2024/2025-Q3): Remove load_average check for ms-be/thanos-be - https://phabricator.wikimedia.org/T370526#10458593 (10lmata)
[16:00:05] <jouncebot>	 eoghan, jelto, arnoldokoth, and mutante: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for SRE Collaboration Services office hours deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250114T1600).
[16:00:13] <wikibugs>	 07sre-alert-triage, 10SRE Observability (FY2024/2025-Q3): Alert in need of triage: AlertLintProblem (instance localhost:9123) - https://phabricator.wikimedia.org/T354255#10458603 (10lmata)
[16:00:51] <wikibugs>	 06SRE, 10observability, 10Observability-Logging, 13Patch-For-Review, 10SRE Observability (FY2024/2025-Q3): ossl rsyslog errors post-migration - https://phabricator.wikimedia.org/T351710#10458606 (10lmata)
[16:00:53] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1415 to wikikube-worker1099 - kamila@cumin1002"
[16:01:19] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1415 to wikikube-worker1099 - kamila@cumin1002"
[16:01:19] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:01:19] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1099
[16:01:45] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.dns.netbox
[16:02:35] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1099
[16:03:14] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1415 to wikikube-worker1099
[16:03:29] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: ClusterConfig: add support for dumps trait (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1109108 (https://phabricator.wikimedia.org/T382947) (owner: 10Giuseppe Lavagetto)
[16:04:34] <wikibugs>	 (03PS4) 10Giuseppe Lavagetto: ClusterConfig: add support for dumps trait [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1109108 (https://phabricator.wikimedia.org/T382947)
[16:04:34] <wikibugs>	 (03PS4) 10Giuseppe Lavagetto: Use a bespoke database configuration for dumps [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1109109 (https://phabricator.wikimedia.org/T382947)
[16:05:30] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1414 to wikikube-worker1098 - kamila@cumin1002"
[16:05:34] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1414 to wikikube-worker1098 - kamila@cumin1002"
[16:05:34] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:05:35] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1098
[16:05:39] <wikibugs>	 (03CR) 10BCornwall: [C:03+1] wmnet: Update s1-master alias [dns] - 10https://gerrit.wikimedia.org/r/1111267 (https://phabricator.wikimedia.org/T383690) (owner: 10Gerrit maintenance bot)
[16:06:37] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.dns.netbox
[16:07:14] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1098
[16:07:53] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1414 to wikikube-worker1098
[16:09:06] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:09:06] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1100
[16:09:41] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.dns.netbox
[16:10:39] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1100
[16:11:18] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1416 to wikikube-worker1100
[16:12:03] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:12:04] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1101
[16:13:18] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1101
[16:13:57] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1419 to wikikube-worker1101
[16:14:03] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker1098.eqiad.wmnet wikikube-worker1099.eqiad.wmnet wikikube-worker1100.eqiad.wmnet wikikube-worker1101.eqiad.wmnet on all recursors
[16:14:07] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1098.eqiad.wmnet wikikube-worker1099.eqiad.wmnet wikikube-worker1100.eqiad.wmnet wikikube-worker1101.eqiad.wmnet on all recursors
[16:15:53] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1099.eqiad.wmnet with OS bookworm
[16:15:57] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker1099
[16:15:58] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1099
[16:16:04] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1100.eqiad.wmnet with OS bookworm
[16:16:08] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker1100
[16:16:08] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1100
[16:16:17] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1101.eqiad.wmnet with OS bookworm
[16:16:20] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker1101
[16:16:21] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1101
[16:16:26] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker1098.eqiad.wmnet with OS bookworm
[16:16:30] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker1098
[16:16:31] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1098
[16:17:17] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: "The change LGTM. I'd like to see also the addition of some httpbb tests, though." [puppet] - 10https://gerrit.wikimedia.org/r/1109196 (https://phabricator.wikimedia.org/T377187) (owner: 10Gergő Tisza)
[16:20:32] <wikibugs>	 (03CR) 10Gergő Tisza: "The tests are in https://gerrit.wikimedia.org/r/c/operations/puppet/+/1099339. Would you prefer them squashed in one commit?" [puppet] - 10https://gerrit.wikimedia.org/r/1109196 (https://phabricator.wikimedia.org/T377187) (owner: 10Gergő Tisza)
[16:21:08] <wikibugs>	 06SRE: contint1002 - puppet failure - https://phabricator.wikimedia.org/T383699 (10Dzahn) 03NEW
[16:21:29] <wikibugs>	 06SRE: contint1002 - puppet failure -  value returned from k8s::fetch_clusters has wrong type - https://phabricator.wikimedia.org/T383699#10458781 (10Dzahn)
[16:24:01] <wikibugs>	 06SRE: contint1002 - puppet failure -  value returned from k8s::fetch_clusters has wrong type - https://phabricator.wikimedia.org/T383699#10458790 (10Dzahn) maybe caused by https://gerrit.wikimedia.org/r/c/operations/puppet/+/1108772  ?
[16:25:55] <wikibugs>	 (03CR) 10Dzahn: "does it seem possible this caused puppet breakage like "Error while evaluating a Function Call, value returned from k8s::fetch_clusters ha" [puppet] - 10https://gerrit.wikimedia.org/r/1108772 (https://phabricator.wikimedia.org/T371087) (owner: 10Filippo Giunchedi)
[16:27:26] <wikibugs>	 06SRE: contint1002 - puppet failure -  value returned from k8s::fetch_clusters has wrong type - https://phabricator.wikimedia.org/T383699#10458807 (10Dzahn)
[16:29:37] <wikibugs>	 06SRE, 06collaboration-services, 10observability: contint1002 - puppet failure -  value returned from k8s::fetch_clusters has wrong type - https://phabricator.wikimedia.org/T383699#10458860 (10Dzahn)
[16:31:59] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1099.eqiad.wmnet with reason: host reimage
[16:32:09] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1098.eqiad.wmnet with reason: host reimage
[16:32:39] <wikibugs>	 (03CR) 10CDanis: [C:03+1] Added new stream config for haproxy_requestctl [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1111166 (https://phabricator.wikimedia.org/T383392) (owner: 10Fabfur)
[16:33:43] <wikibugs>	 06SRE, 06collaboration-services, 10observability: contint1002 - puppet failure -  value returned from k8s::fetch_clusters has wrong type - https://phabricator.wikimedia.org/T383699#10458884 (10Dzahn) The line in question is:   `     $kubernetes_clusters = k8s::fetch_clusters()  `  which is a function describ...
[16:35:43] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1099.eqiad.wmnet with reason: host reimage
[16:38:46] <wikibugs>	 06SRE, 06collaboration-services, 10observability: contint*- puppet failure -  value returned from k8s::fetch_clusters has wrong type - https://phabricator.wikimedia.org/T383699#10458922 (10Dzahn)
[16:39:11] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1098.eqiad.wmnet with reason: host reimage
[16:40:04] <wikibugs>	 06SRE, 06collaboration-services, 10observability: contint*- puppet failure -  value returned from k8s::fetch_clusters has wrong type - https://phabricator.wikimedia.org/T383699#10458941 (10Dzahn) https://puppetboard.wikimedia.org/nodes?status=failed
[16:40:24] <wikibugs>	 (03CR) 10Dzahn: "https://puppetboard.wikimedia.org/nodes?status=failed" [puppet] - 10https://gerrit.wikimedia.org/r/1108772 (https://phabricator.wikimedia.org/T371087) (owner: 10Filippo Giunchedi)
[16:47:56] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06serviceops: Move kafka-main2010 within the same rack - https://phabricator.wikimedia.org/T381788#10458970 (10Jhancock.wm) We are off on the 20th in the US. but the rest of the week is good for me.
[16:48:38] <wikibugs>	 (03CR) 10Kamila Součková: [C:03+1] Rename mw23[69-72] to wikikube-worker222[0-3] [puppet] - 10https://gerrit.wikimedia.org/r/1111271 (https://phabricator.wikimedia.org/T377877) (owner: 10Jelto)
[16:49:15] <wikibugs>	 (03CR) 10Thcipriani: "Is there more context for this? The only relevant task I could find was from a few years ago (https://phabricator.wikimedia.org/T283607)." [puppet] - 10https://gerrit.wikimedia.org/r/1110867 (owner: 10CDanis)
[16:50:45] <wikibugs>	 (03PS4) 10Scott French: shellbox-syntaxhighlight: 1 eqiad replica on 8.1 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1087579 (https://phabricator.wikimedia.org/T377038)
[16:50:45] <wikibugs>	 (03PS4) 10Scott French: shellbox-syntaxhighlight: all eqiad replicas on 8.1 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1087580 (https://phabricator.wikimedia.org/T377038)
[16:50:45] <wikibugs>	 (03PS4) 10Scott French: shellbox-syntaxhighlight: 1 codfw replica on 8.1 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1087581 (https://phabricator.wikimedia.org/T377038)
[16:50:45] <wikibugs>	 (03PS4) 10Scott French: shellbox-syntaxhighlight: all replicas on PHP 8.1 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1087582 (https://phabricator.wikimedia.org/T377038)
[16:53:48] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1099.eqiad.wmnet with OS bookworm
[17:58:16] <wikibugs>	 10ops-codfw, 06SRE, 10SRE-swift-storage, 06DC-Ops: Frequent disk resets on ms-be2075 - https://phabricator.wikimedia.org/T382707#10459407 (10MatthewVernon) ` Jan 13 01:04:10 ms-be2075 kernel: [462667.760590] megaraid_sas 0000:18:00.0: 18530 (790045449s/0x0020/DEAD) - Fatal firmware error: Line 977 in ../.....
[18:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250114T1800)
[18:00:37] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1101.eqiad.wmnet with reason: host reimage
[18:04:01] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1100.eqiad.wmnet with OS bookworm
[18:04:04] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1101.eqiad.wmnet with reason: host reimage
[18:04:29] <wikibugs>	 (03PS1) 10Majavah: hieradata: Bump striker-tools to 2025-01-13-165415-production [puppet] - 10https://gerrit.wikimedia.org/r/1111292
[18:04:50] <wikibugs>	 (03CR) 10Cwhite: [C:03+2] "PCC OK: https://puppet-compiler.wmflabs.org/output/1109188/4797/" [puppet] - 10https://gerrit.wikimedia.org/r/1109188 (https://phabricator.wikimedia.org/T353912) (owner: 10Cwhite)
[18:06:29] <wikibugs>	 (03CR) 10Majavah: [C:03+2] hieradata: Bump striker-tools to 2025-01-13-165415-production [puppet] - 10https://gerrit.wikimedia.org/r/1111292 (owner: 10Majavah)
[18:06:49] <logmsgbot>	 !log kamila@deploy2002 Finished scap sync-world: enable auth.wikimedia.org (duration: 17m 55s)
[18:07:19] <icinga-wm>	 PROBLEM - Check unit status of httpbb_kubernetes_mw-web_hourly on cumin1002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-web_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[18:10:29] <jinxer-wm>	 FIRING: [5x] SystemdUnitFailed: httpbb_kubernetes_mw-web_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:12:08] <wikibugs>	 (03CR) 10Andrea Denisse: [C:03+1] "PCC results:  https://puppet-compiler.wmflabs.org/output/1111285/4799/" [puppet] - 10https://gerrit.wikimedia.org/r/1111285 (https://phabricator.wikimedia.org/T383699) (owner: 10Filippo Giunchedi)
[18:12:11] <wikibugs>	 (03CR) 10Andrea Denisse: [C:03+2] ci: remove 'prometheus' section from kubernetes::clusters [puppet] - 10https://gerrit.wikimedia.org/r/1111285 (https://phabricator.wikimedia.org/T383699) (owner: 10Filippo Giunchedi)
[18:12:39] <kamila_>	 ^ that's me, I broke httpbb on metal
[18:13:35] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service wdqs1012:443 has failed probes (http_wdqs_external_sparql_endpoint_search_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1012:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[18:15:20] <wikibugs>	 (03PS1) 10Dduvall: ci: Install memcached for MediaWiki success cache [puppet] - 10https://gerrit.wikimedia.org/r/1111295 (https://phabricator.wikimedia.org/T383243)
[18:18:29] <wikibugs>	 06SRE, 06collaboration-services, 10observability, 13Patch-For-Review: contint*- puppet failure -  value returned from k8s::fetch_clusters has wrong type - https://phabricator.wikimedia.org/T383699#10459468 (10andrea.denisse) 05Open→03Resolved a:03andrea.denisse I merged and applied patch #1111285...
[18:18:31] <wikibugs>	 06SRE, 06Traffic, 13Patch-For-Review: Define a schema for analytics pipeline ingestion - https://phabricator.wikimedia.org/T383392#10459472 (10nshahquinn-wmf) Viewing and editing this task is not actually restricted.
[18:20:48] <wikibugs>	 06SRE, 10Ceph, 06Infrastructure-Foundations, 10netops, 13Patch-For-Review: Configure DSCP marking for cloudceph* hosts - https://phabricator.wikimedia.org/T371501#10459488 (10dcaro) Just finished restarting all the osd daemons, all the traffic should now be tagged correctly 👍
[18:22:20] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1101.eqiad.wmnet with OS bookworm
[18:26:23] <moritzm>	 !log installing rsync security updates on bookworm
[18:26:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:26:37] <wikibugs>	 06SRE, 06collaboration-services, 10observability, 13Patch-For-Review: contint*- puppet failure -  value returned from k8s::fetch_clusters has wrong type - https://phabricator.wikimedia.org/T383699#10459497 (10Dzahn) Thanks for the quick response and fix. I confirm puppet works again :)
[18:26:42] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, January 14 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1099338 (https://phabricator.wikimedia.org/T380574) (owner: 10Gergő Tisza)
[18:30:56] <wikibugs>	 (03PS2) 10Dduvall: ci: Install memcached for MediaWiki success cache [puppet] - 10https://gerrit.wikimedia.org/r/1111295 (https://phabricator.wikimedia.org/T383243)
[18:31:23] <swfrench-wmf>	 jouncebot: nowandnext
[18:31:23] <jouncebot>	 For the next 0 hour(s) and 28 minute(s): MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250114T1800)
[18:31:23] <jouncebot>	 In 0 hour(s) and 28 minute(s): MediaWiki train - Utc-7 Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250114T1900)
[18:34:12] <wikibugs>	 (03CR) 10Scott French: [C:03+2] mediawiki: enable mesh telemetry in mercurius [deployment-charts] - 10https://gerrit.wikimedia.org/r/1110818 (owner: 10Scott French)
[18:34:45] <swfrench-wmf>	 in the remainder of the infra window, I'm going to deploy some metrics collection fixes for mw-videoscaler
[18:35:43] <wikibugs>	 (03CR) 10Dduvall: "Tested via a cherry-pick on `integration-puppetserver-01` and puppet run on `integration-castor05`." [puppet] - 10https://gerrit.wikimedia.org/r/1111295 (https://phabricator.wikimedia.org/T383243) (owner: 10Dduvall)
[18:35:49] <wikibugs>	 (03CR) 10Dduvall: [C:03+1] ci: Install memcached for MediaWiki success cache [puppet] - 10https://gerrit.wikimedia.org/r/1111295 (https://phabricator.wikimedia.org/T383243) (owner: 10Dduvall)
[18:36:39] <wikibugs>	 (03Merged) 10jenkins-bot: mediawiki: enable mesh telemetry in mercurius [deployment-charts] - 10https://gerrit.wikimedia.org/r/1110818 (owner: 10Scott French)
[18:41:51] <logmsgbot>	 !log swfrench@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-videoscaler: apply
[18:41:59] <logmsgbot>	 !log swfrench@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-videoscaler: apply
[18:45:04] <wikibugs>	 (03PS1) 10DCausse: search: add alerts for weighted_tags indexing throughput [alerts] - 10https://gerrit.wikimedia.org/r/1111300 (https://phabricator.wikimedia.org/T373459)
[18:46:43] <wikibugs>	 (03CR) 10CI reject: [V:04-1] search: add alerts for weighted_tags indexing throughput [alerts] - 10https://gerrit.wikimedia.org/r/1111300 (https://phabricator.wikimedia.org/T373459) (owner: 10DCausse)
[18:49:46] <logmsgbot>	 !log swfrench@deploy2002 helmfile [codfw] START helmfile.d/services/mw-videoscaler: apply
[18:49:51] <logmsgbot>	 !log swfrench@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-videoscaler: apply
[18:49:53] <wikibugs>	 (03PS2) 10DCausse: search: add alerts for weighted_tags indexing throughput [alerts] - 10https://gerrit.wikimedia.org/r/1111300 (https://phabricator.wikimedia.org/T373459)
[18:53:43] <wikibugs>	 (03PS1) 10Herron: wip [puppet] - 10https://gerrit.wikimedia.org/r/1111303
[18:55:41] <logmsgbot>	 !log swfrench@deploy2002 Started scap sync-world: k8s-only deploy to clear noop chart version diffs
[18:57:56] <logmsgbot>	 !log swfrench@deploy2002 Finished scap sync-world: k8s-only deploy to clear noop chart version diffs (duration: 02m 15s)
[19:00:05] <jouncebot>	 thcipriani and thcipriani: Time to do the MediaWiki train - Utc-7 Version deploy. Don't look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250114T1900).
[19:00:14] <thcipriani>	 oh noes
[19:00:16] <brennen>	 o/
[19:00:36] <thcipriani>	 looks like I didn't beat the automated deployment calendar run
[19:00:40] <brennen>	 heh
[19:00:42] <thcipriani>	 I'll fix that after this meeting
[19:00:52] <brennen>	 i'm taking a quick walk over to the post office pre-train, will go ahead here in ~5.
[19:01:00] <thcipriani>	 <3
[19:05:29] <jinxer-wm>	 FIRING: [5x] SystemdUnitFailed: httpbb_kubernetes_mw-web_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:06:40] <wikibugs>	 (PS1) Scott French: shellbox-syntaxhighlight: revert eqiad to PHP 7.4 [deployment-charts] - https://gerrit.wikimedia.org/r/1111309
[19:06:43] <swfrench-wmf>	 ^ FYI, that's a "just in case" patch. no issues encountered so far :)
[19:07:19] <icinga-wm>	 RECOVERY - Check unit status of httpbb_kubernetes_mw-web_hourly on cumin1002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-web_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[19:12:38] <brennen>	 !log 1.44.0-wmf.12 train (T382363): no current blockers, rolling to group0
[19:12:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:12:42] <stashbot>	 T382363: 1.44.0-wmf.12 deployment blockers - https://phabricator.wikimedia.org/T382363
[19:14:38] <wikibugs>	 (PS1) TrainBranchBot: group0 to 1.44.0-wmf.12 [mediawiki-config] - https://gerrit.wikimedia.org/r/1111311 (https://phabricator.wikimedia.org/T382363)
[19:14:40] <wikibugs>	 (CR) TrainBranchBot: [C:+2] group0 to 1.44.0-wmf.12 [mediawiki-config] - https://gerrit.wikimedia.org/r/1111311 (https://phabricator.wikimedia.org/T382363) (owner: TrainBranchBot)
[19:15:29] <wikibugs>	 (Merged) jenkins-bot: group0 to 1.44.0-wmf.12 [mediawiki-config] - https://gerrit.wikimedia.org/r/1111311 (https://phabricator.wikimedia.org/T382363) (owner: TrainBranchBot)
[19:17:14] <wikibugs>	 Puppet, SRE, Data-Engineering-Radar: modules/udp2log/manifests/instance/monitoring.pp has unreachable code - https://phabricator.wikimedia.org/T152104#10459746 (Ottomata)
[19:17:43] <wikibugs>	 Puppet, SRE, Data-Engineering-Radar: modules/udp2log/manifests/instance/monitoring.pp has unreachable code - https://phabricator.wikimedia.org/T152104#10459748 (Ottomata) Data-Engineering no longer operates udp2log.  SRE should feel free to decline this task at will.
[19:26:29] <icinga-wm>	 PROBLEM - Check unit status of httpbb_kubernetes_mw-api-int_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-api-int_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[19:28:30] <jinxer-wm>	 RESOLVED: [2x] ProbeDown: Service wdqs1012:443 has failed probes (http_wdqs_external_sparql_endpoint_search_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1012:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[19:29:06] <logmsgbot>	 !log brennen@deploy2002 rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.12  refs T382363
[19:29:10] <stashbot>	 T382363: 1.44.0-wmf.12 deployment blockers - https://phabricator.wikimedia.org/T382363
[19:29:43] <jinxer-wm>	 FIRING: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS
[19:30:29] <jinxer-wm>	 FIRING: [5x] SystemdUnitFailed: httpbb_kubernetes_mw-wikifunctions_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:33:58] <jinxer-wm>	 FIRING: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs1012:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
[19:34:43] <jinxer-wm>	 RESOLVED: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS
[19:38:58] <jinxer-wm>	 RESOLVED: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs1012:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
[19:42:03] <moritzm>	 !log installing rsync security updates on bullseye
[19:42:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:43:39] <jinxer-wm>	 FIRING: CirrusSearchHighOldGCFrequency: Elasticsearch instance elastic2069-production-search-psi-codfw is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[19:53:20] <wikibugs>	 (CR) JHathaway: [C:+2] postfix: increase message size limit from 10MiB to 50MiB [puppet] - https://gerrit.wikimedia.org/r/1110873 (https://phabricator.wikimedia.org/T383271) (owner: JHathaway)
[19:57:11] <wikibugs>	 SRE, DNS, MediaWiki-Platform-Team, Traffic, SUL3: Set up auth.wikimedia.org - https://phabricator.wikimedia.org/T377187#10460143 (Tgr)
[19:59:37] <wikibugs>	 (PS1) Clare Ming: Experiment Platform Instrument Configuration: Deploying to staging [deployment-charts] - https://gerrit.wikimedia.org/r/1111318 (https://phabricator.wikimedia.org/T374957)
[20:01:22] <logmsgbot>	 !log cdanis@cumin2002 START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - cdanis@cumin2002"
[20:01:24] <logmsgbot>	 !log cdanis@cumin2002 START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: [not really into teleological thinking] - cdanis@cumin2002
[20:01:55] <logmsgbot>	 !log cdanis@cumin2002 END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: [not really into teleological thinking] - cdanis@cumin2002
[20:01:57] <logmsgbot>	 !log cdanis@cumin2002 END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - cdanis@cumin2002"
[20:03:07] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, January 14 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [mediawiki-config] - https://gerrit.wikimedia.org/r/1106739 (owner: Tacsipacsi)
[20:03:47] <wikibugs>	 (PS1) Clare Ming: Experiment Platform Instrument Configuration: Deploying to production [deployment-charts] - https://gerrit.wikimedia.org/r/1111321 (https://phabricator.wikimedia.org/T374957)
[20:04:29] <wikibugs>	 SRE, Infrastructure-Foundations, Mail, Patch-For-Review: Message sizes exceeding limits - https://phabricator.wikimedia.org/T383271#10460171 (jhathaway) a:jhathaway
[20:04:37] <wikibugs>	 (PS2) Clare Ming: Experiment Platform Instrument Configuration: Deploying to staging [deployment-charts] - https://gerrit.wikimedia.org/r/1111318 (https://phabricator.wikimedia.org/T374957)
[20:08:55] <wikibugs>	 (CR) Santiago Faci: [C:+2] Experiment Platform Instrument Configuration: Deploying to staging [deployment-charts] - https://gerrit.wikimedia.org/r/1111318 (https://phabricator.wikimedia.org/T374957) (owner: Clare Ming)
[20:08:56] <wikibugs>	 SRE, Infrastructure-Foundations, Mail, Patch-For-Review: Message sizes exceeding limits - https://phabricator.wikimedia.org/T383271#10460184 (jhathaway) Open→Resolved @DSeyfert_WMF this appears to be a regression in our mail servers when migrating from Exim to Postfix. Exim had a defa...
[20:08:56] <wikibugs>	 (CR) Santiago Faci: [C:+2] Experiment Platform Instrument Configuration: Deploying to production [deployment-charts] - https://gerrit.wikimedia.org/r/1111321 (https://phabricator.wikimedia.org/T374957) (owner: Clare Ming)
[20:09:52] <wikibugs>	 (Merged) jenkins-bot: Experiment Platform Instrument Configuration: Deploying to staging [deployment-charts] - https://gerrit.wikimedia.org/r/1111318 (https://phabricator.wikimedia.org/T374957) (owner: Clare Ming)
[20:10:03] <wikibugs>	 (Merged) jenkins-bot: Experiment Platform Instrument Configuration: Deploying to production [deployment-charts] - https://gerrit.wikimedia.org/r/1111321 (https://phabricator.wikimedia.org/T374957) (owner: Clare Ming)
[20:11:09] <icinga-wm>	 PROBLEM - mailman list info on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[20:11:51] <icinga-wm>	 PROBLEM - mailman archives on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[20:11:59] <icinga-wm>	 RECOVERY - mailman list info on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 8922 bytes in 0.193 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[20:12:43] <icinga-wm>	 RECOVERY - mailman archives on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 53367 bytes in 0.107 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[20:12:46] <wikibugs>	 (PS2) CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - https://gerrit.wikimedia.org/r/1110843
[20:13:00] <wikibugs>	 (CR) Scott French: "Thanks for the reviews, all!" [puppet] - https://gerrit.wikimedia.org/r/1110880 (https://phabricator.wikimedia.org/T383324) (owner: Scott French)
[20:13:06] <wikibugs>	 (CR) Scott French: [C:+2] P:conftool: allow the parsercache section flavor [puppet] - https://gerrit.wikimedia.org/r/1110880 (https://phabricator.wikimedia.org/T383324) (owner: Scott French)
[20:13:57] <wikibugs>	 (CR) CI reject: [V:-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - https://gerrit.wikimedia.org/r/1110843 (owner: CDobbins)
[20:14:44] <logmsgbot>	 !log cjming@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
[20:15:06] <logmsgbot>	 !log cjming@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
[20:15:32] <logmsgbot>	 !log cjming@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
[20:15:58] <logmsgbot>	 !log cjming@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
[20:18:02] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, January 14 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [mediawiki-config] - https://gerrit.wikimedia.org/r/1109694 (https://phabricator.wikimedia.org/T383394) (owner: NMW03)
[20:20:51] <wikibugs>	 (PS3) CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - https://gerrit.wikimedia.org/r/1110843
[20:22:02] <wikibugs>	 (CR) CI reject: [V:-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - https://gerrit.wikimedia.org/r/1110843 (owner: CDobbins)
[20:22:21] <icinga-wm>	 PROBLEM - BGP status on pfw1-codfw is CRITICAL: BGP CRITICAL - AS64600/IPv4: Connect - PyBal https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[20:22:37] <wikibugs>	 (CR) Herron: [V:+1] "Thanks for having a look!  Yes, in fact I looked into this route initially but min-time/max-time supports different formats from tsdb.rete" [puppet] - https://gerrit.wikimedia.org/r/1111241 (https://phabricator.wikimedia.org/T352756) (owner: Herron)
[20:24:29] <wikibugs>	 (CR) Bking: [C:+2] cloudelastic: remove cloudelastic100[56] from conftool, add 101[12] [puppet] - https://gerrit.wikimedia.org/r/1110862 (https://phabricator.wikimedia.org/T380937) (owner: Bking)
[20:25:29] <jinxer-wm>	 FIRING: [5x] SystemdUnitFailed: httpbb_kubernetes_mw-wikifunctions_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:25:41] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - ASunknown/IPv6: Active https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[20:26:29] <icinga-wm>	 RECOVERY - Check unit status of httpbb_kubernetes_mw-api-int_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-api-int_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[20:28:21] <icinga-wm>	 RECOVERY - BGP status on pfw1-codfw is OK: BGP OK - up: 7, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[20:28:28] <wikibugs>	 (PS4) CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - https://gerrit.wikimedia.org/r/1110843
[20:29:39] <wikibugs>	 (CR) CI reject: [V:-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - https://gerrit.wikimedia.org/r/1110843 (owner: CDobbins)
[20:30:38] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T383383#10460247 (phaultfinder)
[20:30:49] <icinga-wm>	 PROBLEM - ElasticSearch health check for shards on 9400 on cloudelastic1006 is CRITICAL: CRITICAL - elasticsearch http://localhost:9400/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9400): Read timed out. (read timeout=4) https://wikitech.wikimedia.org/wiki/Search%23Administration
[20:30:49] <icinga-wm>	 PROBLEM - ElasticSearch health check for shards on 9400 on cloudelastic1005 is CRITICAL: CRITICAL - elasticsearch http://localhost:9400/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9400): Read timed out. (read timeout=4) https://wikitech.wikimedia.org/wiki/Search%23Administration
[20:31:43] <wikibugs>	 SRE, DNS, MediaWiki-Platform-Team, Traffic, SUL3: Set up auth.wikimedia.org - https://phabricator.wikimedia.org/T377187#10460255 (Tgr) Open→Resolved Working as expected:  * https://auth.wikimedia.org/enwiki/wiki/Special:UserLogin, https://auth.wikimedia.org/dewiki/wiki/Special:Use...
[20:32:11] <logmsgbot>	 !log bking@cumin2002 conftool action : set/pooled=yes:weight=10; selector: name=cloudelastic1011.eqiad.wmnet
[20:32:16] <wikibugs>	 (PS5) CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - https://gerrit.wikimedia.org/r/1110843
[20:32:47] <logmsgbot>	 !log bking@cumin2002 conftool action : set/pooled=yes:weight=10; selector: name=cloudelastic1012.eqiad.wmnet
[20:33:28] <wikibugs>	 (CR) CI reject: [V:-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - https://gerrit.wikimedia.org/r/1110843 (owner: CDobbins)
[20:35:34] <wikibugs>	 (PS1) Subramanya Sastry: Turn on Parsoid Read Views on test2wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1111325 (https://phabricator.wikimedia.org/T378645)
[20:35:37] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T383383#10460273 (phaultfinder)
[20:38:39] <jinxer-wm>	 RESOLVED: CirrusSearchHighOldGCFrequency: Elasticsearch instance elastic2069-production-search-psi-codfw is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[20:40:07] <icinga-wm>	 PROBLEM - ElasticSearch health check for shards on 9600 on cloudelastic1006 is CRITICAL: CRITICAL - elasticsearch http://localhost:9600/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9600): Read timed out. (read timeout=4) https://wikitech.wikimedia.org/wiki/Search%23Administration
[20:40:25] <icinga-wm>	 PROBLEM - ElasticSearch health check for shards on 9200 on cloudelastic1005 is CRITICAL: CRITICAL - elasticsearch http://localhost:9200/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9200): Read timed out. (read timeout=4) https://wikitech.wikimedia.org/wiki/Search%23Administration
[20:40:25] <icinga-wm>	 PROBLEM - ElasticSearch health check for shards on 9200 on cloudelastic1006 is CRITICAL: CRITICAL - elasticsearch http://localhost:9200/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9200): Read timed out. (read timeout=4) https://wikitech.wikimedia.org/wiki/Search%23Administration
[20:40:31] <icinga-wm>	 PROBLEM - ElasticSearch health check for shards on 9600 on cloudelastic1005 is CRITICAL: CRITICAL - elasticsearch http://localhost:9600/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9600): Read timed out. (read timeout=4) https://wikitech.wikimedia.org/wiki/Search%23Administration
[20:41:47] <wikibugs>	 (PS1) Bking: cloudelastic: remove references to cloudelastic hosts before 1007 [puppet] - https://gerrit.wikimedia.org/r/1111326 (https://phabricator.wikimedia.org/T380937)
[20:42:17] <wikibugs>	 (CR) Bking: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1111326 (https://phabricator.wikimedia.org/T380937) (owner: Bking)
[20:42:33] <wikibugs>	 (PS2) Ryan Kemper: cloudelastic: decom cloudelastic100[5,6] [puppet] - https://gerrit.wikimedia.org/r/1111326 (https://phabricator.wikimedia.org/T380937) (owner: Bking)
[20:42:38] <wikibugs>	 (CR) Ryan Kemper: [C:+1] cloudelastic: decom cloudelastic100[5,6] [puppet] - https://gerrit.wikimedia.org/r/1111326 (https://phabricator.wikimedia.org/T380937) (owner: Bking)
[20:43:52] <wikibugs>	 (PS6) CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - https://gerrit.wikimedia.org/r/1110843
[20:44:35] <wikibugs>	 (CR) Urbanecm: "Should now be in production." [mediawiki-config] - https://gerrit.wikimedia.org/r/1105420 (https://phabricator.wikimedia.org/T379522) (owner: Michael Große)
[20:44:39] <wikibugs>	 (CR) Urbanecm: [C:+1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/1105420 (https://phabricator.wikimedia.org/T379522) (owner: Michael Große)
[20:45:04] <wikibugs>	 (CR) Bking: [C:+2] cloudelastic: decom cloudelastic100[5,6] [puppet] - https://gerrit.wikimedia.org/r/1111326 (https://phabricator.wikimedia.org/T380937) (owner: Bking)
[20:45:04] <wikibugs>	 (CR) CI reject: [V:-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - https://gerrit.wikimedia.org/r/1110843 (owner: CDobbins)
[20:46:51] <wikibugs>	 (PS7) CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - https://gerrit.wikimedia.org/r/1110843
[20:48:02] <wikibugs>	 (CR) CI reject: [V:-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - https://gerrit.wikimedia.org/r/1110843 (owner: CDobbins)
[20:48:08] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.decommission for hosts cloudelastic[1005-1006].eqiad.wmnet
[20:48:09] <icinga-wm>	 PROBLEM - mailman list info on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[20:48:59] <icinga-wm>	 RECOVERY - mailman list info on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 8922 bytes in 0.195 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[20:53:13] <wikibugs>	 (PS8) CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - https://gerrit.wikimedia.org/r/1110843
[20:54:26] <wikibugs>	 (CR) CI reject: [V:-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - https://gerrit.wikimedia.org/r/1110843 (owner: CDobbins)
[20:55:20] <wikibugs>	 (PS9) CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - https://gerrit.wikimedia.org/r/1110843
[20:55:44] <wikibugs>	 (CR) Arlolra: [C:+1] Turn on Parsoid Read Views on test2wiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1111325 (https://phabricator.wikimedia.org/T378645) (owner: Subramanya Sastry)
[20:56:33] <wikibugs>	 (CR) CI reject: [V:-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - https://gerrit.wikimedia.org/r/1110843 (owner: CDobbins)
[20:56:37] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.dns.netbox
[21:00:04] <jouncebot>	 RoanKattouw, Urbanecm, cjming, TheresNoTime, and kindrobot: How many deployers does it take to do UTC late backport window deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250114T2100).
[21:00:05] <jouncebot>	 fabfur, tgr, tacsipacsi, and Nemoralis: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[21:00:45] <cjming>	 o/
[21:00:48] <cjming>	 i can deploy
[21:01:09] <tgr|away>	 o/ I'll add one more config patch in a sec
[21:01:18] <cjming>	 np!
[21:01:52] <cjming>	 fabfur: are you around?
[21:02:39] <cjming>	 tgr: are you a self-deployer?  happy to do them for you - up to you
[21:03:01] <tacsipacsi>	 My patch is not urgent, so if you run out of time, it’s okay to delay it. After all, it’s been broken for years without anyone complaining. :)
[21:03:08] <cjming>	 lol
[21:03:33] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudelastic[1005-1006].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin2002"
[21:04:10] <tacsipacsi>	 Well, broken in that people were sent to a soft redirect. But that’s still annoying.
[21:04:11] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudelastic[1005-1006].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin2002"
[21:04:11] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[21:04:13] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudelastic[1005-1006].eqiad.wmnet
[21:04:23] <cjming>	 tgr: shall i start with your first patch?
[21:04:48] <wikibugs>	 (PS3) Gergő Tisza: SUL3: Add auth domain to URL tests [mediawiki-config] - https://gerrit.wikimedia.org/r/1099338 (https://phabricator.wikimedia.org/T380574)
[21:05:39] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1098-1101].eqiad.wmnet
[21:05:40] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T383383#10460385 (phaultfinder)
[21:05:41] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1098-1101].eqiad.wmnet
[21:07:06] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Prod-Kubernetes, and 2 others: Relabel eqiad kubernetes nodes - https://phabricator.wikimedia.org/T383620#10460390 (kamila)
[21:08:28] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by cjming@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1099338 (https://phabricator.wikimedia.org/T380574) (owner: Gergő Tisza)
[21:09:18] <tgr|away>	 cjming: thanks! I'll self-deploy, want to do some extended testing. Can wait until all the other patches are deployed.
[21:09:41] <cjming>	 doh - i just started your first patch - sorry
[21:09:46] <tgr|away>	 though that one patch doesn't matter much
[21:09:56] <wikibugs>	 (Merged) jenkins-bot: SUL3: Add auth domain to URL tests [mediawiki-config] - https://gerrit.wikimedia.org/r/1099338 (https://phabricator.wikimedia.org/T380574) (owner: Gergő Tisza)
[21:10:29] <logmsgbot>	 !log cjming@deploy2002 Started scap sync-world: Backport for [[gerrit:1099338|SUL3: Add auth domain to URL tests (T380574)]]
[21:10:33] <stashbot>	 T380574: Add SUL3 authentication domain to deploy canary checks - https://phabricator.wikimedia.org/T380574
[21:10:37] <tgr|away>	 in all honesty I'm not sure what it does :) we don't seem to use that file anymore, there is a puppet file that's actually used for URL tests but I figured better to keep this one in sync
[21:11:06] <cjming>	 tgr: ok - i'll let you handle whatever other patches you add to the queue
[21:11:16] <tgr|away>	 thanks!
[21:12:00] <wikibugs>	 (PS1) Gergő Tisza: Enable SUL3 on test wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1111330 (https://phabricator.wikimedia.org/T383729)
[21:16:33] <cjming>	 tgr: for your 1st patch tho -- can it be tested? up on mwdebug
[21:17:20] <logmsgbot>	 !log cjming@deploy2002 cjming, tgr: Backport for [[gerrit:1099338|SUL3: Add auth domain to URL tests (T380574)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[21:17:23] <stashbot>	 T380574: Add SUL3 authentication domain to deploy canary checks - https://phabricator.wikimedia.org/T380574
[21:18:31] <tgr|away>	 cjming: no
[21:18:47] <cjming>	 i'll just sync then
[21:18:55] <tgr|away>	 if it's still in use, scap will run the test automatically, I imagine
[21:19:01] <logmsgbot>	 !log cjming@deploy2002 cjming, tgr: Continuing with sync
[21:19:58] <wikibugs>	 (PS2) Jforrester: Turn on Parsoid Read Views on test2wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1111325 (https://phabricator.wikimedia.org/T378645) (owner: Subramanya Sastry)
[21:20:01] <wikibugs>	 (CR) Jforrester: [C:+1] Turn on Parsoid Read Views on test2wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1111325 (https://phabricator.wikimedia.org/T378645) (owner: Subramanya Sastry)
[21:20:26] <tgr|away>	 in theory these are URLs pinged during the scap canary check but I think they have been replaced by modules/profile/files/httpbb/appserver/ in puppet
[21:20:47] <cjming>	 cool - gtk
[21:26:54] <cjming>	 tacsipacsi: i'll do yours next
[21:26:59] <wikibugs>	 (PS10) CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - https://gerrit.wikimedia.org/r/1110843
[21:27:04] <tacsipacsi>	 Thanks!
[21:27:07] <wikibugs>	 (PS2) Tacsipacsi: Fix links pointing to m:Help:Export [mediawiki-config] - https://gerrit.wikimedia.org/r/1106739
[21:28:10] <wikibugs>	 (CR) CI reject: [V:-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - https://gerrit.wikimedia.org/r/1110843 (owner: CDobbins)
[21:28:29] <logmsgbot>	 !log cjming@deploy2002 Finished scap sync-world: Backport for [[gerrit:1099338|SUL3: Add auth domain to URL tests (T380574)]] (duration: 18m 00s)
[21:28:33] <stashbot>	 T380574: Add SUL3 authentication domain to deploy canary checks - https://phabricator.wikimedia.org/T380574
[21:29:33] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by cjming@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1106739 (owner: Tacsipacsi)
[21:30:19] <wikibugs>	 (Merged) jenkins-bot: Fix links pointing to m:Help:Export [mediawiki-config] - https://gerrit.wikimedia.org/r/1106739 (owner: Tacsipacsi)
[21:30:45] <logmsgbot>	 !log cjming@deploy2002 Started scap sync-world: Backport for [[gerrit:1106739|Fix links pointing to m:Help:Export]]
[21:35:24] <cjming>	 tacsipacsi: up on test servers if you want to verify
[21:35:41] <logmsgbot>	 !log cjming@deploy2002 tacsipacsi, cjming: Backport for [[gerrit:1106739|Fix links pointing to m:Help:Export]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[21:36:22] <tacsipacsi>	 Thanks! Checked the URLs mentioned on Gerrit, and they look good.
[21:36:28] <cjming>	 cool ! syncing
[21:36:31] <logmsgbot>	 !log cjming@deploy2002 tacsipacsi, cjming: Continuing with sync
[21:37:52] <cjming>	 Nemoralis: are you around?
[21:38:28] <cjming>	 fabfur: are you around?
[21:39:48] <Nemoralis>	 o/
[21:39:53] <Nemoralis>	 I had a patch in this window
[21:39:58] <Nemoralis>	 jouncebot: now
[21:39:58] <jouncebot>	 For the next 0 hour(s) and 20 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250114T2100)
[21:40:07] <cjming>	 Nemoralis: i'll do your patch next
[21:40:36] <wikibugs>	 (PS4) NMW03: Add azwiki to mobile-anon-talk dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/1109694 (https://phabricator.wikimedia.org/T383394)
[21:42:14] <wikibugs>	 ops-eqiad, SRE, Ceph, Cloud-VPS, and 2 others: cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643#10460491 (wiki_willy) Hi @dcaro - because this was taking so long, I escalated this up to our account team again last week...and they came back tod...
[21:43:46] <logmsgbot>	 !log cjming@deploy2002 Finished scap sync-world: Backport for [[gerrit:1106739|Fix links pointing to m:Help:Export]] (duration: 13m 00s)
[21:44:14] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by cjming@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1109694 (https://phabricator.wikimedia.org/T383394) (owner: NMW03)
[21:44:58] <wikibugs>	 (Merged) jenkins-bot: Add azwiki to mobile-anon-talk dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/1109694 (https://phabricator.wikimedia.org/T383394) (owner: NMW03)
[21:45:29] <logmsgbot>	 !log cjming@deploy2002 Started scap sync-world: Backport for [[gerrit:1109694|Add azwiki to mobile-anon-talk dblist (T383394)]]
[21:45:32] <stashbot>	 T383394: Enable talk for mobile anon users on azwiki - https://phabricator.wikimedia.org/T383394
[21:48:28] <swfrench-wmf>	 !log deployed conftool 4.2.0 fleet-wide as of ~ 20:00 UTC (previously 4.1.0)
[21:48:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:50:14] <wikibugs>	 SRE, Ceph, Infrastructure-Foundations, netops, Patch-For-Review: Configure DSCP marking for cloudceph* hosts - https://phabricator.wikimedia.org/T371501#10460515 (cmooney) >>! In T371501#10459488, @dcaro wrote: > Just finished restarting all the osd daemons, all the traffic should now being t...
[21:51:56] <cjming>	 Nemoralis: on test servers if you can verify - lmk if/when to sync
[21:52:33] <logmsgbot>	 !log cjming@deploy2002 nmw03, cjming: Backport for [[gerrit:1109694|Add azwiki to mobile-anon-talk dblist (T383394)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[21:52:37] <stashbot>	 T383394: Enable talk for mobile anon users on azwiki - https://phabricator.wikimedia.org/T383394
[21:53:06] <Nemoralis>	 sure one sec
[21:54:54] <Nemoralis>	 cjming: LGTM
[21:55:01] <cjming>	 great - syncing
[21:55:04] <logmsgbot>	 !log cjming@deploy2002 nmw03, cjming: Continuing with sync
[21:55:45] <cjming>	 fabfur: last call
[21:55:57] <wikibugs>	 (PS3) Andrea Denisse: wmcs: Migrate network saturation alerts to the alerts.git repository [alerts] - https://gerrit.wikimedia.org/r/1111328 (https://phabricator.wikimedia.org/T328502)
[21:59:47] <wikibugs>	 (PS11) CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - https://gerrit.wikimedia.org/r/1110843
[22:00:05] <jouncebot>	 Deploy window Web Team deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250114T2200)
[22:00:59] <wikibugs>	 (CR) CI reject: [V:-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - https://gerrit.wikimedia.org/r/1110843 (owner: CDobbins)
[22:02:03] <wikibugs>	 (PS1) JHathaway: kafka_shipper: when disabled, don't render templates [puppet] - https://gerrit.wikimedia.org/r/1111336
[22:02:17] <wikibugs>	 (CR) JHathaway: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1111336 (owner: JHathaway)
[22:02:24] <wikibugs>	 (CR) CI reject: [V:-1] kafka_shipper: when disabled, don't render templates [puppet] - https://gerrit.wikimedia.org/r/1111336 (owner: JHathaway)
[22:02:32] <logmsgbot>	 !log cjming@deploy2002 Finished scap sync-world: Backport for [[gerrit:1109694|Add azwiki to mobile-anon-talk dblist (T383394)]] (duration: 17m 03s)
[22:02:36] <stashbot>	 T383394: Enable talk for mobile anon users on azwiki - https://phabricator.wikimedia.org/T383394
[22:02:45] <cjming>	 tgr: all yours
[22:03:52] <tgr|away>	 thanks cjming!
[22:03:59] <wikibugs>	 (PS12) CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - https://gerrit.wikimedia.org/r/1110843
[22:05:10] <wikibugs>	 (CR) CI reject: [V:-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - https://gerrit.wikimedia.org/r/1110843 (owner: CDobbins)
[22:06:25] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by tgr@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1111330 (https://phabricator.wikimedia.org/T383729) (owner: Gergő Tisza)
[22:06:51] <wikibugs>	 (PS13) CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - https://gerrit.wikimedia.org/r/1110843
[22:07:09] <wikibugs>	 (Merged) jenkins-bot: Enable SUL3 on test wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1111330 (https://phabricator.wikimedia.org/T383729) (owner: Gergő Tisza)
[22:07:37] <logmsgbot>	 !log tgr@deploy2002 Started scap sync-world: Backport for [[gerrit:1111330|Enable SUL3 on test wikis (T383729)]]
[22:07:41] <stashbot>	 T383729: SUL3 Phase 0: Account creation and login on test wikis - https://phabricator.wikimedia.org/T383729
[22:08:04] <wikibugs>	 (CR) CI reject: [V:-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - https://gerrit.wikimedia.org/r/1110843 (owner: CDobbins)
[22:14:01] <logmsgbot>	 !log tgr@deploy2002 tgr: Backport for [[gerrit:1111330|Enable SUL3 on test wikis (T383729)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[22:14:01] <wikibugs>	 (PS2) JHathaway: kafka_shipper: when disabled, don't render templates [puppet] - https://gerrit.wikimedia.org/r/1111336
[22:14:05] <stashbot>	 T383729: SUL3 Phase 0: Account creation and login on test wikis - https://phabricator.wikimedia.org/T383729
[22:14:19] <wikibugs>	 (CR) JHathaway: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1111336 (owner: JHathaway)
[22:15:14] <wikibugs>	 ops-eqiad, DC-Ops, decommission-hardware, Data-Platform-SRE (2025.01.11 - 2025.01.31): decommission cloudelastic100[5-6] - https://phabricator.wikimedia.org/T380937#10460555 (bking)
[22:15:46] <wikibugs>	 (PS4) Andrea Denisse: wmcs: Migrate network saturation alerts to the alerts.git repository [alerts] - https://gerrit.wikimedia.org/r/1111328 (https://phabricator.wikimedia.org/T328502)
[22:16:29] <wikibugs>	 (PS1) Andrea Denisse: wmcs: Migrate iowait stalling alerts to the alerts.git repository [alerts] - https://gerrit.wikimedia.org/r/1111338 (https://phabricator.wikimedia.org/T328502)
[22:18:20] <wikibugs>	 (PS14) CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - https://gerrit.wikimedia.org/r/1110843
[22:18:32] <wikibugs>	 SRE, Infrastructure-Foundations, netops, observability, Observability-Alerting: Alertmanager rule for network interface errors? - https://phabricator.wikimedia.org/T335350#10460558 (cmooney) Open→Resolved a:cmooney >>! In T335350#10456238, @andrea.denisse wrote: > Hi @cmooney,...
[22:19:32] <wikibugs>	 (CR) CI reject: [V:-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - https://gerrit.wikimedia.org/r/1110843 (owner: CDobbins)
[22:24:11] <logmsgbot>	 !log tgr@deploy2002 Sync cancelled.
[22:29:10] <wikibugs>	 (PS1) Andrea Denisse: wmcs: Remove Puppet files for migrated Prometheus alerts [puppet] - https://gerrit.wikimedia.org/r/1111340 (https://phabricator.wikimedia.org/T328502)
[22:29:10] <wikibugs>	 (CR) Andrea Denisse: "To be merged once 1111328 and 1111338 are merged." [puppet] - https://gerrit.wikimedia.org/r/1111340 (https://phabricator.wikimedia.org/T328502) (owner: Andrea Denisse)
[22:31:11] <wikibugs>	 (PS15) CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - https://gerrit.wikimedia.org/r/1110843
[22:31:45] <tzatziki>	 !log removing 8 files for legal compliance
[22:31:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:32:23] <wikibugs>	 (CR) CI reject: [V:-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - https://gerrit.wikimedia.org/r/1110843 (owner: CDobbins)
[22:33:17] <wikibugs>	 (PS16) CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - https://gerrit.wikimedia.org/r/1110843
[22:34:28] <wikibugs>	 (CR) CI reject: [V:-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - https://gerrit.wikimedia.org/r/1110843 (owner: CDobbins)
[22:37:24] <wikibugs>	 (PS17) CDobbins: alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - https://gerrit.wikimedia.org/r/1110843
[22:38:35] <wikibugs>	 (CR) CI reject: [V:-1] alerts: add alert for ferm_mss_cfg Prometheus metric [alerts] - https://gerrit.wikimedia.org/r/1110843 (owner: CDobbins)
[22:43:26] <wikibugs>	 (PS1) TrainBranchBot: Revert "Enable SUL3 on test wikis" [mediawiki-config] - https://gerrit.wikimedia.org/r/1111341
[22:43:26] <wikibugs>	 (CR) TrainBranchBot: "tgr@deploy2002 created a revert of this change as Ibe92907f535345a3ac1a266a4a86ede4f8d2887f" [mediawiki-config] - https://gerrit.wikimedia.org/r/1111330 (https://phabricator.wikimedia.org/T383729) (owner: Gergő Tisza)
[22:44:09] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by tgr@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1111341 (owner: TrainBranchBot)
[22:44:54] <wikibugs>	 (Merged) jenkins-bot: Revert "Enable SUL3 on test wikis" [mediawiki-config] - https://gerrit.wikimedia.org/r/1111341 (owner: TrainBranchBot)
[22:45:21] <logmsgbot>	 !log tgr@deploy2002 Started scap sync-world: Backport for [[gerrit:1111341|Revert "Enable SUL3 on test wikis"]]
[22:49:53] <logmsgbot>	 !log tgr@deploy2002 tgr, trainbranchbot: Backport for [[gerrit:1111341|Revert "Enable SUL3 on test wikis"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[22:49:59] <tzatziki>	 !log removing 5 files for legal compliance
[22:50:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:50:40] <logmsgbot>	 !log tgr@deploy2002 tgr, trainbranchbot: Continuing with sync
[22:53:44] <wikibugs>	 (PS1) Gergő Tisza: Yet more authentication domain overrides [mediawiki-config] - https://gerrit.wikimedia.org/r/1111343 (https://phabricator.wikimedia.org/T383729)
[22:58:21] <logmsgbot>	 !log tgr@deploy2002 Finished scap sync-world: Backport for [[gerrit:1111341|Revert "Enable SUL3 on test wikis"]] (duration: 12m 59s)
[23:01:00] <tgr|away>	 !log UTC late deploys done
[23:01:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:03:32] <wikibugs>	 (PS1) Gergő Tisza: Add entry point names to all entry points under w/ [mediawiki-config] - https://gerrit.wikimedia.org/r/1111344 (https://phabricator.wikimedia.org/T383729)
[23:14:44] <tzatziki>	 !log removing 2 files for legal compliance
[23:14:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:20:59] <wikibugs>	 (PS1) Chlod Alejandro: Increase Nuke max age to 90 days (attempt 2) [mediawiki-config] - https://gerrit.wikimedia.org/r/1111350 (https://phabricator.wikimedia.org/T380846)
[23:23:58] <wikibugs>	 (CR) Gergő Tisza: "`$wgFavicon` / `$wgAppleTouchIcon` are URLs, the script fetches them and outputs the content. `wmfStaticStreamFile()` expects a disk path." [mediawiki-config] - https://gerrit.wikimedia.org/r/1075211 (https://phabricator.wikimedia.org/T374997) (owner: Bartosz Dziewoński)
[23:30:30] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service wdqs1013:443 has failed probes (http_wdqs_external_sparql_endpoint_search_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1013:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[23:54:31] <tzatziki>	 !log removing 2 files for legal compliance
[23:54:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log