[00:03:15] <jinxer-wm>	 RESOLVED: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-parsoid releases routed via main (k8s) 2.5s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[00:03:32] <jinxer-wm>	 FIRING: SystemdUnitFailed: wmf_auto_restart_exim4.service on crm2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:04:24] <jinxer-wm>	 FIRING: SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:08:32] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: wmf_auto_restart_krb5-admin-server.service on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:08:44] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1149515
[00:08:44] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1149515 (owner: TrainBranchBot)
[00:09:24] <jinxer-wm>	 FIRING: [8x] SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetmaster1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:20:20] <icinga-wm>	 PROBLEM - mailman archives on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[00:20:32] <icinga-wm>	 PROBLEM - mailman list info on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[00:26:22] <icinga-wm>	 PROBLEM - mailman list info ssl expiry on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[00:28:20] <icinga-wm>	 RECOVERY - mailman list info ssl expiry on lists1004 is OK: OK - Certificate lists.wikimedia.org will expire on Thu 07 Aug 2025 09:25:51 PM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[00:29:44] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1149515 (owner: TrainBranchBot)
[00:31:16] <icinga-wm>	 RECOVERY - mailman archives on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 53942 bytes in 5.022 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[00:31:24] <icinga-wm>	 RECOVERY - mailman list info on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 8922 bytes in 0.192 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[00:44:14] <icinga-wm>	 PROBLEM - Disk space on releases1003 is CRITICAL: DISK CRITICAL - /srv/docker/overlay2/01c301b6a6ce2fdf1479f0bbef44ebac3b7144826f5cc0f0e1e0faf3c58880ef/merged is not accessible: Permission denied https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=releases1003&var-datasource=eqiad+prometheus/ops
[00:46:16] <icinga-wm>	 RECOVERY - Disk space on centrallog2002 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=centrallog2002&var-datasource=codfw+prometheus/ops
[01:04:14] <icinga-wm>	 RECOVERY - Disk space on releases1003 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=releases1003&var-datasource=eqiad+prometheus/ops
[01:48:37] <jinxer-wm>	 FIRING: [3x] HelmReleaseBadStatus: Helm release kube-system/calico on k8s-staging@eqiad in state failed - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency  - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[01:56:54] <wikibugs>	 (PS1) Raymond Ndibe: toolforge:prometheus:: add components-api scrape endpoint [puppet] - https://gerrit.wikimedia.org/r/1149533 (https://phabricator.wikimedia.org/T394276)
[01:59:41] <jinxer-wm>	 FIRING: [6x] ConfdResourceFailed: confd resource _srv_config-master_pybal_eqiad_search-https.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed
[02:08:32] <jinxer-wm>	 FIRING: SystemdUnitFailed: curator_actions_apifeatureusage_eqiad.service on apifeatureusage1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[03:18:32] <jinxer-wm>	 FIRING: NetworkDeviceAlarmActive: Alarm active on cr2-codfw - https://wikitech.wikimedia.org/wiki/Network_monitoring#Juniper_alarm - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DNetworkDeviceAlarmActive
[03:41:57] <jinxer-wm>	 FIRING: CalicoKubeControllersDown: Calico Kubernetes Controllers not running - https://wikitech.wikimedia.org/wiki/Calico#Kube_Controllers - TODO - https://alerts.wikimedia.org/?q=alertname%3DCalicoKubeControllersDown
[03:57:36] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.hosts.rename from elastic1090 to cirrussearch1090
[03:57:45] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.hosts.reimage for host cirrussearch1089.eqiad.wmnet with OS bullseye
[03:57:48] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.dns.netbox
[03:57:49] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.hosts.move-vlan for host cirrussearch1089
[03:57:49] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1089
[04:03:24] <logmsgbot>	 ryankemper@cumin2002 rename (PID 271136) is awaiting input
[04:03:32] <jinxer-wm>	 FIRING: SystemdUnitFailed: wmf_auto_restart_exim4.service on crm2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:04:04] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1090 to cirrussearch1090 - ryankemper@cumin2002"
[04:07:09] <logmsgbot>	 ryankemper@cumin2002 rename (PID 271136) is awaiting input
[04:07:22] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1090 to cirrussearch1090 - ryankemper@cumin2002"
[04:07:23] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[04:07:23] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.dns.wipe-cache cirrussearch1090 on all recursors
[04:07:26] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1090 on all recursors
[04:07:27] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1090
[04:08:32] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: wmf_auto_restart_krb5-admin-server.service on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:09:25] <jinxer-wm>	 FIRING: [8x] SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetmaster1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:10:31] <logmsgbot>	 ryankemper@cumin2002 rename (PID 271136) is awaiting input
[04:10:40] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1090
[04:11:20] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1090 to cirrussearch1090
[04:16:20] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1089.eqiad.wmnet with reason: host reimage
[04:20:05] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1089.eqiad.wmnet with reason: host reimage
[04:21:13] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.dns.wipe-cache cirrussearch1090.eqiad.wmnet on all recursors
[04:21:16] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1090.eqiad.wmnet on all recursors
[04:21:45] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.hosts.reimage for host cirrussearch1090.eqiad.wmnet with OS bullseye
[04:21:49] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.hosts.move-vlan for host cirrussearch1090
[04:21:50] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1090
[04:35:45] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1090.eqiad.wmnet with reason: host reimage
[04:39:36] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1090.eqiad.wmnet with reason: host reimage
[04:46:11] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1089.eqiad.wmnet with OS bullseye
[05:06:32] <wikibugs>	 (CR) Marostegui: "Please test again to see if it is fixed." [cookbooks] - https://gerrit.wikimedia.org/r/1131954 (https://phabricator.wikimedia.org/T363665) (owner: Federico Ceratto)
[05:06:42] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:09:26] <wikibugs>	 (PS1) Marostegui: instances.yaml: Remove db1183 from dbctl [puppet] - https://gerrit.wikimedia.org/r/1149536 (https://phabricator.wikimedia.org/T394507)
[05:12:07] <wikibugs>	 (CR) Marostegui: [C:+2] instances.yaml: Remove db1183 from dbctl [puppet] - https://gerrit.wikimedia.org/r/1149536 (https://phabricator.wikimedia.org/T394507) (owner: Marostegui)
[05:13:39] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Remove db1183 from dbctl T394507', diff saved to https://phabricator.wikimedia.org/P76410 and previous config saved to /var/cache/conftool/dbconfig/20250523-051339-marostegui.json
[05:13:43] <stashbot>	 T394507: decommission db1183 - https://phabricator.wikimedia.org/T394507
[05:15:30] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1090.eqiad.wmnet with OS bullseye
[05:16:45] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.hosts.rename from elastic1091 to cirrussearch1091
[05:16:57] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.dns.netbox
[05:22:34] <logmsgbot>	 ryankemper@cumin2002 rename (PID 308701) is awaiting input
[05:23:16] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1091 to cirrussearch1091 - ryankemper@cumin2002"
[05:24:55] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1091 to cirrussearch1091 - ryankemper@cumin2002"
[05:24:55] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[05:24:55] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.dns.wipe-cache cirrussearch1091 on all recursors
[05:24:59] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1091 on all recursors
[05:25:00] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1091
[05:25:00] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.hosts.rename from elastic1092 to cirrussearch1092
[05:25:13] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.dns.netbox
[05:28:03] <logmsgbot>	 ryankemper@cumin2002 rename (PID 308701) is awaiting input
[05:28:15] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1091
[05:28:41] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1092 to cirrussearch1092 - ryankemper@cumin2002"
[05:28:55] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1091 to cirrussearch1091
[05:31:47] <logmsgbot>	 ryankemper@cumin2002 rename (PID 311165) is awaiting input
[05:32:01] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1092 to cirrussearch1092 - ryankemper@cumin2002"
[05:32:01] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[05:32:02] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.dns.wipe-cache cirrussearch1092 on all recursors
[05:32:05] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1092 on all recursors
[05:32:06] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1092
[05:32:42] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1092
[05:33:22] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1092 to cirrussearch1092
[05:38:18] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.dns.wipe-cache cirrussearch1092.eqiad.wmnet on all recursors
[05:38:21] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1092.eqiad.wmnet on all recursors
[05:38:24] <wikibugs>	 (PS1) Marostegui: mariadb: Decommission db1183 [puppet] - https://gerrit.wikimedia.org/r/1149538 (https://phabricator.wikimedia.org/T394507)
[05:38:26] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.dns.wipe-cache cirrussearch1091.eqiad.wmnet on all recursors
[05:38:29] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1091.eqiad.wmnet on all recursors
[05:38:55] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.hosts.reimage for host cirrussearch1092.eqiad.wmnet with OS bullseye
[05:39:00] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.hosts.move-vlan for host cirrussearch1092
[05:39:00] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1092
[05:39:01] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.hosts.reimage for host cirrussearch1091.eqiad.wmnet with OS bullseye
[05:39:05] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.hosts.move-vlan for host cirrussearch1091
[05:39:05] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1091
[05:39:28] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.decommission for hosts db1183.eqiad.wmnet
[05:39:46] <wikibugs>	 SRE, SRE-tools, collaboration-services, Infrastructure-Foundations, and 2 others: Create a cookbook to automate gerrit's switchover - https://phabricator.wikimedia.org/T260666#10851232 (ABran-WMF)
[05:39:58] <wikibugs>	 (CR) Marostegui: [C:+2] mariadb: Decommission db1183 [puppet] - https://gerrit.wikimedia.org/r/1149538 (https://phabricator.wikimedia.org/T394507) (owner: Marostegui)
[05:46:14] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.dns.netbox
[05:48:37] <jinxer-wm>	 FIRING: [3x] HelmReleaseBadStatus: Helm release kube-system/calico on k8s-staging@eqiad in state failed - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency  - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[05:49:25] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1183.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1002"
[05:49:43] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1183.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1002"
[05:49:43] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[05:49:44] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1183.eqiad.wmnet
[05:50:15] <wikibugs>	 ops-eqiad, DBA, DC-Ops, decommission-hardware: decommission db1183 - https://phabricator.wikimedia.org/T394507#10851251 (Marostegui) a:Marostegui→None
[05:50:23] <wikibugs>	 ops-eqiad, DBA, DC-Ops, decommission-hardware: decommission db1183 - https://phabricator.wikimedia.org/T394507#10851256 (Marostegui) Ready for DC-Ops
[05:52:56] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1091.eqiad.wmnet with reason: host reimage
[05:55:57] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1091.eqiad.wmnet with reason: host reimage
[05:56:05] <wikibugs>	 (CR) Arnaudb: [C:+1] "thanks!" [puppet] - https://gerrit.wikimedia.org/r/1149488 (https://phabricator.wikimedia.org/T393723) (owner: Dzahn)
[05:56:24] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1092.eqiad.wmnet with reason: host reimage
[05:59:31] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1092.eqiad.wmnet with reason: host reimage
[05:59:41] <jinxer-wm>	 FIRING: [6x] ConfdResourceFailed: confd resource _srv_config-master_pybal_eqiad_search-https.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed
[06:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250523T0600)
[06:01:42] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[06:08:32] <jinxer-wm>	 FIRING: SystemdUnitFailed: curator_actions_apifeatureusage_eqiad.service on apifeatureusage1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:17:34] <wikibugs>	 (PS1) Marostegui: wmnet: Add pc8-master CNAME [dns] - https://gerrit.wikimedia.org/r/1149539 (https://phabricator.wikimedia.org/T394260)
[06:17:55] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1092.eqiad.wmnet with OS bullseye
[06:17:59] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
[06:18:38] <wikibugs>	 (CR) Marostegui: [C:+2] wmnet: Add pc8-master CNAME [dns] - https://gerrit.wikimedia.org/r/1149539 (https://phabricator.wikimedia.org/T394260) (owner: Marostegui)
[06:18:41] <logmsgbot>	 !log marostegui@dns1006 START - running authdns-update
[06:19:25] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.hosts.rename from elastic1094 to cirrussearch1094
[06:19:29] <logmsgbot>	 !log marostegui@dns1006 END - running authdns-update
[06:19:39] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.dns.netbox
[06:23:21] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1091.eqiad.wmnet with OS bullseye
[06:24:01] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1094 to cirrussearch1094 - ryankemper@cumin2002"
[06:24:21] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.hosts.rename from elastic1093 to cirrussearch1093
[06:24:26] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1094 to cirrussearch1094 - ryankemper@cumin2002"
[06:24:26] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[06:24:27] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.dns.wipe-cache cirrussearch1094 on all recursors
[06:24:30] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1094 on all recursors
[06:24:31] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1094
[06:24:34] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.dns.netbox
[06:26:30] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
[06:26:55] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1094
[06:27:35] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1094 to cirrussearch1094
[06:28:16] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1093 to cirrussearch1093 - ryankemper@cumin2002"
[06:29:49] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
[06:30:49] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1093 to cirrussearch1093 - ryankemper@cumin2002"
[06:30:50] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[06:30:50] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.dns.wipe-cache cirrussearch1093 on all recursors
[06:30:53] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1093 on all recursors
[06:30:54] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1093
[06:32:07] <icinga-wm>	 PROBLEM - OpenSearch unassigned shard check - 9200 on relforge1010 is CRITICAL: CRITICAL - .kibana_1[0](2025-05-19T22:19:28.041Z), .kibana_1[0](2025-05-19T22:19:50.630Z), frwiki_content[0](2025-05-19T22:19:50.631Z), frwiki_content[0](2025-05-19T22:19:28.041Z) https://wikitech.wikimedia.org/wiki/Search%23Administration
[06:32:07] <icinga-wm>	 PROBLEM - OpenSearch unassigned shard check - 9200 on relforge1009 is CRITICAL: CRITICAL - frwiki_content[0](2025-05-19T22:19:50.631Z), frwiki_content[0](2025-05-19T22:19:28.041Z), .kibana_1[0](2025-05-19T22:19:28.041Z), .kibana_1[0](2025-05-19T22:19:50.630Z) https://wikitech.wikimedia.org/wiki/Search%23Administration
[06:32:07] <icinga-wm>	 PROBLEM - OpenSearch unassigned shard check - 9200 on relforge1008 is CRITICAL: CRITICAL - frwiki_content[0](2025-05-19T22:19:50.631Z), frwiki_content[0](2025-05-19T22:19:28.041Z), .kibana_1[0](2025-05-19T22:19:28.041Z), .kibana_1[0](2025-05-19T22:19:50.630Z) https://wikitech.wikimedia.org/wiki/Search%23Administration
[06:33:07] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
[06:33:43] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
[06:33:58] <logmsgbot>	 ryankemper@cumin2002 rename (PID 342766) is awaiting input
[06:34:09] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1093
[06:34:17] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
[06:34:49] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1093 to cirrussearch1093
[06:34:59] <icinga-wm>	 ACKNOWLEDGEMENT - OpenSearch unassigned shard check - 9200 on relforge1008 is CRITICAL: CRITICAL - frwiki_content[0](2025-05-19T22:19:50.631Z), frwiki_content[0](2025-05-19T22:19:28.041Z), .kibana_1[0](2025-05-19T22:19:28.041Z), .kibana_1[0](2025-05-19T22:19:50.630Z) Ryan Kemper this can wait till tomorrow https://wikitech.wikimedia.org/wiki/Search%23Administration
[06:34:59] <icinga-wm>	 ACKNOWLEDGEMENT - OpenSearch unassigned shard check - 9200 on relforge1009 is CRITICAL: CRITICAL - frwiki_content[0](2025-05-19T22:19:50.631Z), frwiki_content[0](2025-05-19T22:19:28.041Z), .kibana_1[0](2025-05-19T22:19:28.041Z), .kibana_1[0](2025-05-19T22:19:50.630Z) Ryan Kemper this can wait till tomorrow https://wikitech.wikimedia.org/wiki/Search%23Administration
[06:34:59] <icinga-wm>	 ACKNOWLEDGEMENT - OpenSearch unassigned shard check - 9200 on relforge1010 is CRITICAL: CRITICAL - .kibana_1[0](2025-05-19T22:19:28.041Z), .kibana_1[0](2025-05-19T22:19:50.630Z), frwiki_content[0](2025-05-19T22:19:50.631Z), frwiki_content[0](2025-05-19T22:19:28.041Z) Ryan Kemper this can wait till tomorrow https://wikitech.wikimedia.org/wiki/Search%23Administration
[06:35:24] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
[06:35:57] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.hosts.reimage for host cirrussearch1093.eqiad.wmnet with OS bullseye
[06:36:01] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.hosts.move-vlan for host cirrussearch1093
[06:36:02] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1093
[06:36:06] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.hosts.reimage for host cirrussearch1094.eqiad.wmnet with OS bullseye
[06:36:10] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.hosts.move-vlan for host cirrussearch1094
[06:36:10] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1094
[06:36:21] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
[06:38:16] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
[06:47:33] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
[06:48:03] <wikibugs>	 (PS1) Muehlenhoff: Switch krb1001 to insetup role [puppet] - https://gerrit.wikimedia.org/r/1149540 (https://phabricator.wikimedia.org/T390863)
[06:48:32] <jinxer-wm>	 FIRING: [4x] HelmReleaseBadStatus: Helm release kube-system/calico on k8s-staging@eqiad in state failed - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency  - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[06:48:41] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/1149488 (https://phabricator.wikimedia.org/T393723) (owner: Dzahn)
[06:50:46] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1093.eqiad.wmnet with reason: host reimage
[06:51:52] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
[06:53:06] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
[06:54:09] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1093.eqiad.wmnet with reason: host reimage
[06:55:30] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1094.eqiad.wmnet with reason: host reimage
[06:56:15] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
[07:00:01] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
[07:00:05] <jouncebot>	 Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250523T0700)
[07:00:19] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Switch krb1001 to insetup role [puppet] - https://gerrit.wikimedia.org/r/1149540 (https://phabricator.wikimedia.org/T390863) (owner: Muehlenhoff)
[07:01:08] <logmsgbot>	 !log akosiaris@deploy1003 helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
[07:02:54] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1094.eqiad.wmnet with reason: host reimage
[07:03:32] <jinxer-wm>	 FIRING: [4x] HelmReleaseBadStatus: Helm release kube-system/calico on k8s-staging@eqiad in state failed - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency  - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[07:07:45] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
[07:10:30] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
[07:11:07] <logmsgbot>	 !log akosiaris@deploy1003 helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
[07:11:19] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
[07:12:21] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
[07:14:24] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: wmf_auto_restart_krb5-admin-server.service on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:14:48] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
[07:15:35] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
[07:16:57] <jinxer-wm>	 RESOLVED: CalicoKubeControllersDown: Calico Kubernetes Controllers not running - https://wikitech.wikimedia.org/wiki/Calico#Kube_Controllers - TODO - https://alerts.wikimedia.org/?q=alertname%3DCalicoKubeControllersDown
[07:18:32] <jinxer-wm>	 FIRING: NetworkDeviceAlarmActive: Alarm active on cr2-codfw - https://wikitech.wikimedia.org/wiki/Network_monitoring#Juniper_alarm - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DNetworkDeviceAlarmActive
[07:21:25] <logmsgbot>	 !log fceratto@cumin1002 START - Cookbook sre.mysql.sanitarium_restart
[07:21:44] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
[07:22:40] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
[07:23:24] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1093.eqiad.wmnet with OS bullseye
[07:23:30] <wikibugs>	 (CR) Federico Ceratto: "Started another test run - the logging into Phabricator is showing the number: https://phabricator.wikimedia.org/T363665#10851328" [cookbooks] - https://gerrit.wikimedia.org/r/1131954 (https://phabricator.wikimedia.org/T363665) (owner: Federico Ceratto)
[07:25:44] <wikibugs>	 (CR) Marostegui: "I think it is more useful if it shows the hostname and the instance restarted. Is that doable?" [cookbooks] - https://gerrit.wikimedia.org/r/1131954 (https://phabricator.wikimedia.org/T363665) (owner: Federico Ceratto)
[07:25:58] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
[07:26:33] <logmsgbot>	 !log fceratto@cumin1002 END (PASS) - Cookbook sre.mysql.sanitarium_restart (exit_code=0)
[07:26:37] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
[07:26:42] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
[07:26:52] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
[07:28:32] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1094.eqiad.wmnet with OS bullseye
[07:30:55] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
[07:32:01] <wikibugs>	 (PS1) Muehlenhoff: Default the Kerberos role to nftables [puppet] - https://gerrit.wikimedia.org/r/1149542 (https://phabricator.wikimedia.org/T390863)
[07:32:07] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
[07:33:32] <wikibugs>	 (CR) Federico Ceratto: "Yes - do we want a Phab update for each instance restart or one for each host?" [cookbooks] - https://gerrit.wikimedia.org/r/1131954 (https://phabricator.wikimedia.org/T363665) (owner: Federico Ceratto)
[07:34:14] <wikibugs>	 (CR) Marostegui: "This shouldn't be a common operation, so I think each host and instance is good for now. We can be verbose about this." [cookbooks] - https://gerrit.wikimedia.org/r/1131954 (https://phabricator.wikimedia.org/T363665) (owner: Federico Ceratto)
[07:35:27] <wikibugs>	 ops-eqiad, SRE, SRE-swift-storage, Ceph, DC-Ops: Disk (sde) failed in moss-be1002 - https://phabricator.wikimedia.org/T395103 (MatthewVernon) NEW
[07:35:37] <wikibugs>	 ops-eqiad, SRE, SRE-swift-storage, Ceph, DC-Ops: Disk (sde) failed in moss-be1002 - https://phabricator.wikimedia.org/T395103#10851369 (MatthewVernon) p:Triage→High
[07:36:55] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.hosts.rename from elastic1108 to cirrussearch1108
[07:37:07] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.dns.netbox
[07:38:50] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.hosts.rename from elastic1095 to cirrussearch1095
[07:41:31] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1108 to cirrussearch1108 - ryankemper@cumin2002"
[07:41:49] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1108 to cirrussearch1108 - ryankemper@cumin2002"
[07:41:49] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[07:41:49] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.dns.wipe-cache cirrussearch1108 on all recursors
[07:41:52] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1108 on all recursors
[07:41:53] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1108
[07:42:05] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.dns.netbox
[07:42:06] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1108
[07:42:46] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1108 to cirrussearch1108
[07:42:47] <wikibugs>	 (PS1) Brouberol: datahub: make nocode migration job resources configurable [deployment-charts] - https://gerrit.wikimedia.org/r/1149543
[07:44:26] <wikibugs>	 (CR) Joal: [C:+1] datahub: make nocode migration job resources configurable [deployment-charts] - https://gerrit.wikimedia.org/r/1149543 (owner: Brouberol)
[07:45:03] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
[07:45:27] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1095 to cirrussearch1095 - ryankemper@cumin2002"
[07:45:32] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1095 to cirrussearch1095 - ryankemper@cumin2002"
[07:45:32] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[07:45:33] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.dns.wipe-cache cirrussearch1095 on all recursors
[07:45:36] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1095 on all recursors
[07:45:37] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1095
[07:45:52] <wikibugs>	 (PS2) Brouberol: datahub: make nocode migration job resources configurable [deployment-charts] - https://gerrit.wikimedia.org/r/1149543
[07:46:45] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
[07:48:29] <wikibugs>	 (CR) Brouberol: [C:+2] datahub: make nocode migration job resources configurable [deployment-charts] - https://gerrit.wikimedia.org/r/1149543 (owner: Brouberol)
[07:48:32] <jinxer-wm>	 FIRING: [4x] HelmReleaseBadStatus: Helm release eventgate-analytics/canary on k8s-staging@eqiad in state pending-rollback - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency  - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[07:48:40] <logmsgbot>	 ryankemper@cumin2002 rename (PID 380448) is awaiting input
[07:49:31] <wikibugs>	 (PS1) Muehlenhoff: Fix auto restart for alertmanager-irc-relay [puppet] - https://gerrit.wikimedia.org/r/1149544
[07:49:42] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1149544 (owner: Muehlenhoff)
[07:49:53] <wikibugs>	 (CR) Btullis: [C:+1] datahub: make nocode migration job resources configurable [deployment-charts] - https://gerrit.wikimedia.org/r/1149543 (owner: Brouberol)
[07:50:07] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
[07:50:27] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
[07:50:48] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1095
[07:51:29] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1095 to cirrussearch1095
[07:51:55] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
[07:53:15] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
[07:55:23] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
[07:56:05] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
[07:57:49] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.hosts.reimage for host cirrussearch1108.eqiad.wmnet with OS bullseye
[07:57:53] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.hosts.reimage for host cirrussearch1095.eqiad.wmnet with OS bullseye
[07:57:54] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.hosts.move-vlan for host cirrussearch1108
[07:57:55] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1108
[07:57:57] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.hosts.move-vlan for host cirrussearch1095
[07:57:57] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1095
[07:58:42] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
[07:59:58] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
[08:03:32] <jinxer-wm>	 FIRING: SystemdUnitFailed: wmf_auto_restart_exim4.service on crm2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:04:44] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
[08:09:25] <jinxer-wm>	 FIRING: [8x] SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetmaster1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:11:47] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1095.eqiad.wmnet with reason: host reimage
[08:12:57] <wikibugs>	 (PS1) Brouberol: datahub: increase resources accross the board [deployment-charts] - https://gerrit.wikimedia.org/r/1149602
[08:14:47] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1108.eqiad.wmnet with reason: host reimage
[08:14:51] <wikibugs>	 ops-eqiad, SRE, SRE-swift-storage, Data-Persistence, and 2 others: Q4:rack/setup/install thanos-be100[6-9] - https://phabricator.wikimedia.org/T392909#10851408 (MatthewVernon) @Jclark-ctr the problem (at least on thanos-be1006 where I started) is that the disks aren't visible to the operating sys...
[08:15:12] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1095.eqiad.wmnet with reason: host reimage
[08:15:41] <wikibugs>	 (CR) Btullis: [C:+1] datahub: increase resources accross the board [deployment-charts] - https://gerrit.wikimedia.org/r/1149602 (owner: Brouberol)
[08:15:58] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
[08:17:35] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.hosts.rename from elastic1109 to cirrussearch1109
[08:17:47] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.dns.netbox
[08:18:32] <jinxer-wm>	 FIRING: [3x] HelmReleaseBadStatus: Helm release eventgate-analytics/canary on k8s-staging@eqiad in state pending-rollback - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency  - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[08:19:07] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1108.eqiad.wmnet with reason: host reimage
[08:19:25] <logmsgbot>	 !log fceratto@cumin1002 START - Cookbook sre.mysql.sanitarium_restart
[08:20:04] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
[08:21:09] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
[08:21:21] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1109 to cirrussearch1109 - ryankemper@cumin2002"
[08:21:31] <wikibugs>	 (CR) Brouberol: [C:+2] datahub: increase resources accross the board [deployment-charts] - https://gerrit.wikimedia.org/r/1149602 (owner: Brouberol)
[08:24:05] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1109 to cirrussearch1109 - ryankemper@cumin2002"
[08:24:05] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[08:24:05] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.dns.wipe-cache cirrussearch1109 on all recursors
[08:24:09] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1109 on all recursors
[08:24:09] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1109
[08:24:20] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1109
[08:24:24] <logmsgbot>	 !log fceratto@cumin1002 END (PASS) - Cookbook sre.mysql.sanitarium_restart (exit_code=0)
[08:24:59] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1109 to cirrussearch1109
[08:25:51] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
[08:27:09] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
[08:27:12] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
[08:29:05] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
[08:29:08] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
[08:31:13] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
[08:32:56] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.hosts.reimage for host cirrussearch1109.eqiad.wmnet with OS bullseye
[08:33:00] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.hosts.move-vlan for host cirrussearch1109
[08:33:01] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1109
[08:33:03] <wikibugs>	 (PS1) Majavah: conftool-data: Add x3 wiki replica backend services [puppet] - https://gerrit.wikimedia.org/r/1149603 (https://phabricator.wikimedia.org/T390954)
[08:33:04] <wikibugs>	 (PS13) Federico Ceratto: sanitarium_restart.py: restart Sanitarium hosts [cookbooks] - https://gerrit.wikimedia.org/r/1131954 (https://phabricator.wikimedia.org/T363665)
[08:33:04] <wikibugs>	 (PS1) Majavah: P:wmcs::cloudlb: Add x3 wiki replica backend service [puppet] - https://gerrit.wikimedia.org/r/1149604 (https://phabricator.wikimedia.org/T390954)
[08:33:06] <wikibugs>	 (PS1) Majavah: hieradata: cloudlb: Move x3 VIP to new x3 backend [puppet] - https://gerrit.wikimedia.org/r/1149605 (https://phabricator.wikimedia.org/T390954)
[08:35:28] <wikibugs>	 (PS1) Majavah: definitions: Add port for x3 wiki replica backend [homer/public] - https://gerrit.wikimedia.org/r/1149606 (https://phabricator.wikimedia.org/T390954)
[08:39:11] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1095.eqiad.wmnet with OS bullseye
[08:40:21] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
[08:42:12] <logmsgbot>	 !log mvernon@cumin1002 START - Cookbook sre.hosts.reimage for host thanos-be1006.eqiad.wmnet with OS bullseye
[08:42:22] <wikibugs>	 ops-eqiad, SRE, SRE-swift-storage, Data-Persistence, and 2 others: Q4:rack/setup/install thanos-be100[6-9] - https://phabricator.wikimedia.org/T392909#10851473 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mvernon@cumin1002 for host thanos-be1006.eqiad.wmnet with OS bul...
[08:42:37] <wikibugs>	 (PS1) Brouberol: datahub-next: increase resources accross the board [deployment-charts] - https://gerrit.wikimedia.org/r/1149608 (https://phabricator.wikimedia.org/T395057)
[08:43:21] <wikibugs>	 (CR) Btullis: [C:+1] datahub-next: increase resources accross the board [deployment-charts] - https://gerrit.wikimedia.org/r/1149608 (https://phabricator.wikimedia.org/T395057) (owner: Brouberol)
[08:44:07] <wikibugs>	 (CR) Brouberol: [C:+2] datahub-next: increase resources accross the board [deployment-charts] - https://gerrit.wikimedia.org/r/1149608 (https://phabricator.wikimedia.org/T395057) (owner: Brouberol)
[08:44:56] <wikibugs>	 (PS1) JMeybohm: Update admin_ng fixtures to reflect puppet changes [deployment-charts] - https://gerrit.wikimedia.org/r/1149609 (https://phabricator.wikimedia.org/T378429)
[08:49:07] <wikibugs>	 (CR) Federico Ceratto: "Updated and did another run." [cookbooks] - https://gerrit.wikimedia.org/r/1131954 (https://phabricator.wikimedia.org/T363665) (owner: Federico Ceratto)
[08:49:16] <wikibugs>	 (PS1) FNegri: Revert "Failover all dumps traffic to clouddumps1002" [puppet] - https://gerrit.wikimedia.org/r/1149610
[08:50:11] <wikibugs>	 (CR) Marostegui: "The last !log now shows: https://phabricator.wikimedia.org/T363665#10851443" [cookbooks] - https://gerrit.wikimedia.org/r/1131954 (https://phabricator.wikimedia.org/T363665) (owner: Federico Ceratto)
[08:51:43] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1108.eqiad.wmnet with OS bullseye
[08:52:41] <wikibugs>	 (CR) Clément Goubert: [C:+1] deployment_server: Call into the mwscript helper from mwscript-k8s [puppet] - https://gerrit.wikimedia.org/r/1148490 (https://phabricator.wikimedia.org/T378479) (owner: RLazarus)
[08:53:33] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1109.eqiad.wmnet with reason: host reimage
[08:56:03] <wikibugs>	 (CR) Majavah: [C:+1] Revert "Failover all dumps traffic to clouddumps1002" [puppet] - https://gerrit.wikimedia.org/r/1149610 (owner: FNegri)
[08:56:17] <wikibugs>	 (CR) FNegri: [C:+2] Revert "Failover all dumps traffic to clouddumps1002" [puppet] - https://gerrit.wikimedia.org/r/1149610 (owner: FNegri)
[08:57:36] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1109.eqiad.wmnet with reason: host reimage
[09:01:51] <wikibugs>	 (CR) JMeybohm: [C:+2] Update admin_ng fixtures to reflect puppet changes [deployment-charts] - https://gerrit.wikimedia.org/r/1149609 (https://phabricator.wikimedia.org/T378429) (owner: JMeybohm)
[09:02:06] <wikibugs>	 ops-eqiad, SRE, DC-Ops, cloud-services-team (FY2024/2025-Q3-Q4): Temperature Inlet Temp issue on clouddumps1001:9290 - https://phabricator.wikimedia.org/T383723#10851565 (fnegri) I reverted my [change from last month](https://gerrit.wikimedia.org/r/c/operations/puppet/+/1131051) and moved bac...
[09:05:14] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
[09:08:22] <wikibugs>	 (Merged) jenkins-bot: Update admin_ng fixtures to reflect puppet changes [deployment-charts] - https://gerrit.wikimedia.org/r/1149609 (https://phabricator.wikimedia.org/T378429) (owner: JMeybohm)
[09:10:12] <wikibugs>	 (CR) Tiziano Fogli: [C:+1] Fix auto restart for alertmanager-irc-relay [puppet] - https://gerrit.wikimedia.org/r/1149544 (owner: Muehlenhoff)
[09:10:24] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
[09:10:33] <logmsgbot>	 !log mvernon@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be1006.eqiad.wmnet with reason: host reimage
[09:14:20] <logmsgbot>	 !log mvernon@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be1006.eqiad.wmnet with reason: host reimage
[09:18:50] <wikibugs>	 (PS1) Marostegui: es2035: Migrate to MariaDB 10.11 [puppet] - https://gerrit.wikimedia.org/r/1149612 (https://phabricator.wikimedia.org/T394469)
[09:18:54] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool es2035 T394469', diff saved to https://phabricator.wikimedia.org/P76411 and previous config saved to /var/cache/conftool/dbconfig/20250523-091853-marostegui.json
[09:18:58] <stashbot>	 T394469: Migrate es6 to MariaDB 10.11 - https://phabricator.wikimedia.org/T394469
[09:19:04] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
[09:19:23] <wikibugs>	 (PS1) Clément Goubert: mediawiki: Add netpol for prometheus HTTP [deployment-charts] - https://gerrit.wikimedia.org/r/1149613 (https://phabricator.wikimedia.org/T388538)
[09:19:27] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2035.codfw.wmnet with reason: Maintenance
[09:20:37] <wikibugs>	 (CR) CI reject: [V:-1] mediawiki: Add netpol for prometheus HTTP [deployment-charts] - https://gerrit.wikimedia.org/r/1149613 (https://phabricator.wikimedia.org/T388538) (owner: Clément Goubert)
[09:21:10] <wikibugs>	 (CR) Marostegui: [C:+2] es2035: Migrate to MariaDB 10.11 [puppet] - https://gerrit.wikimedia.org/r/1149612 (https://phabricator.wikimedia.org/T394469) (owner: Marostegui)
[09:23:09] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1109.eqiad.wmnet with OS bullseye
[09:25:17] <wikibugs>	 (PS2) Clément Goubert: mediawiki: Add netpol for prometheus HTTP [deployment-charts] - https://gerrit.wikimedia.org/r/1149613 (https://phabricator.wikimedia.org/T388538)
[09:27:14] <wikibugs>	 ops-magru, DC-Ops, Observability-Metrics, Patch-For-Review, SRE Observability (FY2024/2025-Q3): missing pdu infos for magru - https://phabricator.wikimedia.org/T387231#10851628 (tappof) Sure @RobH, nothing will catch fire because of this patch (or maybe it will, since we’re talking about elec...
[09:27:20] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
[09:27:44] <wikibugs>	 (PS1) Clément Goubert: mw::maintenance::cirrussearch: Skip s8 [puppet] - https://gerrit.wikimedia.org/r/1149614 (https://phabricator.wikimedia.org/T388538)
[09:27:51] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
[09:28:09] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
[09:28:31] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
[09:28:39] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
[09:28:51] <wikibugs>	 (CR) CI reject: [V:-1] mw::maintenance::cirrussearch: Skip s8 [puppet] - https://gerrit.wikimedia.org/r/1149614 (https://phabricator.wikimedia.org/T388538) (owner: Clément Goubert)
[09:30:15] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2035 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76412 and previous config saved to /var/cache/conftool/dbconfig/20250523-093015-root.json
[09:31:17] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host mc-misc2002.codfw.wmnet
[09:31:42] <wikibugs>	 (CR) Hnowlan: mediawiki: Add netpol for prometheus HTTP (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1149613 (https://phabricator.wikimedia.org/T388538) (owner: Clément Goubert)
[09:32:02] <wikibugs>	 (PS1) Brouberol: airflow: grant analytics airflow instances access to schema.deployment.wmnet [deployment-charts] - https://gerrit.wikimedia.org/r/1149617 (https://phabricator.wikimedia.org/T392668)
[09:32:16] <wikibugs>	 (CR) Clément Goubert: mediawiki: Add netpol for prometheus HTTP (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1149613 (https://phabricator.wikimedia.org/T388538) (owner: Clément Goubert)
[09:34:21] <logmsgbot>	 !log fceratto@deploy1003 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[09:34:40] <logmsgbot>	 !log fceratto@deploy1003 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[09:34:58] <wikibugs>	 SRE, LDAP-Access-Requests: Grant Access to ops-limited for lsobanski - https://phabricator.wikimedia.org/T395110 (LSobanski) NEW
[09:36:14] <wikibugs>	 (PS2) Brouberol: airflow: grant airflow instances access to schema.deployment.wmnet [deployment-charts] - https://gerrit.wikimedia.org/r/1149617 (https://phabricator.wikimedia.org/T392668)
[09:36:28] <wikibugs>	 (CR) Joal: [C:+1] "Thank you :)" [deployment-charts] - https://gerrit.wikimedia.org/r/1149617 (https://phabricator.wikimedia.org/T392668) (owner: Brouberol)
[09:37:16] <wikibugs>	 (PS3) Clément Goubert: mediawiki: Add netpol for prometheus HTTP [deployment-charts] - https://gerrit.wikimedia.org/r/1149613 (https://phabricator.wikimedia.org/T388538)
[09:40:08] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-misc2002.codfw.wmnet
[09:40:52] <wikibugs>	 (PS1) MVernon: Set new-style storage for new thanos backends [puppet] - https://gerrit.wikimedia.org/r/1149619 (https://phabricator.wikimedia.org/T392908)
[09:41:35] <wikibugs>	 (CR) Hnowlan: [C:+1] mediawiki: Add netpol for prometheus HTTP [deployment-charts] - https://gerrit.wikimedia.org/r/1149613 (https://phabricator.wikimedia.org/T388538) (owner: Clément Goubert)
[09:42:48] <wikibugs>	 (CR) Brouberol: [C:+2] airflow: grant airflow instances access to schema.deployment.wmnet [deployment-charts] - https://gerrit.wikimedia.org/r/1149617 (https://phabricator.wikimedia.org/T392668) (owner: Brouberol)
[09:44:57] <wikibugs>	 (Abandoned) Aqu: airflow-analytics-test: Temporarily Disable DataHub plugin [deployment-charts] - https://gerrit.wikimedia.org/r/1149456 (owner: Aqu)
[09:45:21] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2035 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76413 and previous config saved to /var/cache/conftool/dbconfig/20250523-094520-root.json
[09:47:58] <wikibugs>	 (CR) Clément Goubert: [C:+2] mediawiki: Add netpol for prometheus HTTP (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1149613 (https://phabricator.wikimedia.org/T388538) (owner: Clément Goubert)
[09:50:12] <logmsgbot>	 !log cgoubert@deploy1003 Started scap sync-world: 1149613: mediawiki: Add netpol for prometheus HTTP - T388538
[09:50:16] <stashbot>	 T388538: Migrate discovery-search jobs to mw-cron - https://phabricator.wikimedia.org/T388538
[09:52:16] <logmsgbot>	 !log cgoubert@deploy1003 Finished scap sync-world: 1149613: mediawiki: Add netpol for prometheus HTTP - T388538 (duration: 03m 11s)
[09:52:25] <logmsgbot>	 !log isaranto@deploy1003 helmfile [staging] START helmfile.d/services/api-gateway: sync
[09:52:38] <logmsgbot>	 !log isaranto@deploy1003 helmfile [staging] DONE helmfile.d/services/api-gateway: sync
[09:55:57] <wikibugs>	 (CR) Clément Goubert: "recheck" [puppet] - https://gerrit.wikimedia.org/r/1149614 (https://phabricator.wikimedia.org/T388538) (owner: Clément Goubert)
[09:57:20] <wikibugs>	 (PS1) Clément Goubert: Revert^2 "mw::maintenance: Migrate wikidata-updateQueryServiceLag to mw-cron" [puppet] - https://gerrit.wikimedia.org/r/1149623 (https://phabricator.wikimedia.org/T388538)
[09:58:06] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
[09:58:32] <wikibugs>	 (CR) Hnowlan: [C:+1] Revert^2 "mw::maintenance: Migrate wikidata-updateQueryServiceLag to mw-cron" [puppet] - https://gerrit.wikimedia.org/r/1149623 (https://phabricator.wikimedia.org/T388538) (owner: Clément Goubert)
[09:58:35] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
[09:58:37] <wikibugs>	 (PS2) Clément Goubert: mw::maintenance::cirrussearch: Skip s8 [puppet] - https://gerrit.wikimedia.org/r/1149614 (https://phabricator.wikimedia.org/T388538)
[09:58:49] <wikibugs>	 (PS1) Hnowlan: trafficserver: restbaseless reading lists API for ~group0 [puppet] - https://gerrit.wikimedia.org/r/1149624 (https://phabricator.wikimedia.org/T384891)
[09:58:52] <wikibugs>	 (PS1) Hnowlan: trafficserver: restbaseless reading lists API for all wikis [puppet] - https://gerrit.wikimedia.org/r/1149625 (https://phabricator.wikimedia.org/T384891)
[09:59:41] <jinxer-wm>	 FIRING: [6x] ConfdResourceFailed: confd resource _srv_config-master_pybal_eqiad_search-https.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed
[10:00:27] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2035 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76414 and previous config saved to /var/cache/conftool/dbconfig/20250523-100026-root.json
[10:00:40] <wikibugs>	 (CR) Clément Goubert: [C:+2] Revert^2 "mw::maintenance: Migrate wikidata-updateQueryServiceLag to mw-cron" [puppet] - https://gerrit.wikimedia.org/r/1149623 (https://phabricator.wikimedia.org/T388538) (owner: Clément Goubert)
[10:00:51] <wikibugs>	 (CR) Federico Ceratto: [C:+1] "I've done a quick review basic syntax checking." [puppet] - https://gerrit.wikimedia.org/r/1149619 (https://phabricator.wikimedia.org/T392908) (owner: MVernon)
[10:01:34] <logmsgbot>	 !log isaranto@deploy1003 helmfile [codfw] START helmfile.d/services/api-gateway: apply
[10:01:45] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
[10:01:56] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
[10:02:38] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
[10:02:49] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
[10:02:57] <logmsgbot>	 !log isaranto@deploy1003 helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
[10:03:13] <wikibugs>	 (CR) MVernon: [C:+2] Set new-style storage for new thanos backends [puppet] - https://gerrit.wikimedia.org/r/1149619 (https://phabricator.wikimedia.org/T392908) (owner: MVernon)
[10:03:50] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
[10:04:06] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
[10:04:48] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
[10:05:25] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
[10:05:32] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
[10:06:58] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
[10:07:01] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
[10:07:03] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
[10:07:05] <wikibugs>	 (CR) DCausse: [C:+1] "thanks!" [puppet] - https://gerrit.wikimedia.org/r/1149614 (https://phabricator.wikimedia.org/T388538) (owner: Clément Goubert)
[10:07:51] <logmsgbot>	 !log cgoubert@deploy1003 helmfile [eqiad] START helmfile.d/services/mw-cron: apply
[10:08:30] <logmsgbot>	 !log cgoubert@deploy1003 helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
[10:08:32] <jinxer-wm>	 FIRING: SystemdUnitFailed: curator_actions_apifeatureusage_eqiad.service on apifeatureusage1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:08:43] <logmsgbot>	 !log jgiannelos@deploy1003 helmfile [staging] START helmfile.d/services/mobileapps: sync
[10:08:51] <logmsgbot>	 !log jgiannelos@deploy1003 helmfile [staging] DONE helmfile.d/services/mobileapps: sync
[10:09:24] <logmsgbot>	 !log fceratto@deploy1003 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[10:10:24] <logmsgbot>	 !log fceratto@deploy1003 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[10:11:04] <wikibugs>	 (CR) Clément Goubert: [C:+2] mw::maintenance::cirrussearch: Skip s8 [puppet] - https://gerrit.wikimedia.org/r/1149614 (https://phabricator.wikimedia.org/T388538) (owner: Clément Goubert)
[10:12:03] <wikibugs>	 (PS2) Hnowlan: trafficserver: restbaseless reading lists API for ~group1 [puppet] - https://gerrit.wikimedia.org/r/1149624 (https://phabricator.wikimedia.org/T384891)
[10:12:31] <wikibugs>	 (CR) Hnowlan: [C:+2] mw::maintenance: migrate remaining translation notifications jobs [puppet] - https://gerrit.wikimedia.org/r/1149426 (https://phabricator.wikimedia.org/T388539) (owner: Hnowlan)
[10:14:44] <logmsgbot>	 !log isaranto@deploy1003 helmfile [eqiad] START helmfile.d/services/api-gateway: apply
[10:15:06] <logmsgbot>	 !log isaranto@deploy1003 helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
[10:15:32] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2035 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76415 and previous config saved to /var/cache/conftool/dbconfig/20250523-101532-root.json
[10:15:58] <wikibugs>	 (PS1) Brouberol: airflow: deploy a tiny toolbox allowing users to test task networking [deployment-charts] - https://gerrit.wikimedia.org/r/1149628 (https://phabricator.wikimedia.org/T392668)
[10:16:00] <claime>	 hnowlan: my puppet run on deploy pulled your lpl cronjob changes
[10:16:09] <claime>	 so i'll deploy them with the cirrus one
[10:16:20] <logmsgbot>	 !log cgoubert@deploy1003 helmfile [eqiad] START helmfile.d/services/mw-cron: apply
[10:16:57] <logmsgbot>	 !log cgoubert@deploy1003 helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
[10:18:07] <jinxer-wm>	 FIRING: MediaWikiElevatedUnknownLogins: Elevated number of failed login attempts (unknown device and IP) via mw-api-ext - TODO - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?from=now-6h&orgId=1&to=now&viewPanel=26 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiElevatedUnknownLogins
[10:18:19] <hnowlan>	 claime: thanks 
[10:18:51] <logmsgbot>	 !log mvernon@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1002"
[10:18:56] <claime>	 hnowlan: deployed
[10:19:23] <wikibugs>	 (PS2) Brouberol: airflow: deploy a tiny toolbox allowing users to test task networking [deployment-charts] - https://gerrit.wikimedia.org/r/1149628 (https://phabricator.wikimedia.org/T392668)
[10:21:06] <wikibugs>	 (CR) Arnaudb: [C:+2] admin: add jdlrobson to deployment group [puppet] - https://gerrit.wikimedia.org/r/1149488 (https://phabricator.wikimedia.org/T393723) (owner: Dzahn)
[10:21:27] <hnowlan>	 cool
[10:21:56] <logmsgbot>	 mvernon@cumin1002 reimage (PID 1162774) is awaiting input
[10:24:22] <wikibugs>	 (CR) Btullis: [C:+1] airflow: deploy a tiny toolbox allowing users to test task networking [deployment-charts] - https://gerrit.wikimedia.org/r/1149628 (https://phabricator.wikimedia.org/T392668) (owner: Brouberol)
[10:24:44] <wikibugs>	 (PS1) Clément Goubert: mw::maintenance::purge_securepoll: Only run on securepollglobal.dblist [puppet] - https://gerrit.wikimedia.org/r/1149629 (https://phabricator.wikimedia.org/T388542)
[10:25:23] <wikibugs>	 (CR) Brouberol: [C:+2] airflow: deploy a tiny toolbox allowing users to test task networking [deployment-charts] - https://gerrit.wikimedia.org/r/1149628 (https://phabricator.wikimedia.org/T392668) (owner: Brouberol)
[10:27:37] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs2013 is CRITICAL: PYBAL CRITICAL - CRITICAL - ml-staging-ctrl_6443: Servers ml-staging-ctrl2002.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[10:27:55] <wikibugs>	 (PS1) Majavah: P:openstack: Move away from manual @resolve syntax [puppet] - https://gerrit.wikimedia.org/r/1149630
[10:27:55] <wikibugs>	 (PS1) Majavah: P:openstack: Pass ports as numbers to ferm [puppet] - https://gerrit.wikimedia.org/r/1149631
[10:28:35] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs2013 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[10:29:46] <wikibugs>	 (PS1) Gkyziridis: ml-services: edit-check latest image deployment [deployment-charts] - https://gerrit.wikimedia.org/r/1149632 (https://phabricator.wikimedia.org/T394779)
[10:29:56] <wikibugs>	 (PS1) Effie Mouzeli: WIP: adding mcrouter [deployment-charts] - https://gerrit.wikimedia.org/r/1149633
[10:30:38] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2035 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76417 and previous config saved to /var/cache/conftool/dbconfig/20250523-103038-root.json
[10:31:02] <moritzm>	 !log importing ferm 2.5.1-4+wmf13u1 T391083
[10:31:03] <wikibugs>	 (PS1) Ilias Sarantopoulos: httpbb(liftwing): add edit-check tests [puppet] - https://gerrit.wikimedia.org/r/1149634 (https://phabricator.wikimedia.org/T394779)
[10:31:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:31:07] <wikibugs>	 (CR) CI reject: [V:-1] WIP: adding mcrouter [deployment-charts] - https://gerrit.wikimedia.org/r/1149633 (owner: Effie Mouzeli)
[10:31:08] <stashbot>	 T391083: Prepare our custom installer and the base layer for Trixie - https://phabricator.wikimedia.org/T391083
[10:31:42] <wikibugs>	 (CR) Majavah: [V:+1] "PCC SUCCESS (CORE_DIFF 4): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/5670/co" [puppet] - https://gerrit.wikimedia.org/r/1149631 (owner: Majavah)
[10:32:22] <wikibugs>	 (PS2) Effie Mouzeli: WIP: adding mcrouter [deployment-charts] - https://gerrit.wikimedia.org/r/1149633
[10:33:07] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/1149631 (owner: Majavah)
[10:33:33] <wikibugs>	 (CR) CI reject: [V:-1] WIP: adding mcrouter [deployment-charts] - https://gerrit.wikimedia.org/r/1149633 (owner: Effie Mouzeli)
[10:33:56] <wikibugs>	 (CR) JMeybohm: "Nice one!" [puppet] - https://gerrit.wikimedia.org/r/1149505 (https://phabricator.wikimedia.org/T395052) (owner: Scott French)
[10:35:04] <wikibugs>	 (CR) Gkyziridis: [C:+1] "LGTM! Thank you" [puppet] - https://gerrit.wikimedia.org/r/1149634 (https://phabricator.wikimedia.org/T394779) (owner: Ilias Sarantopoulos)
[10:35:04] <wikibugs>	 (CR) Majavah: [V:+1] "PCC SUCCESS (CORE_DIFF 6): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/5671/co" [puppet] - https://gerrit.wikimedia.org/r/1149630 (owner: Majavah)
[10:35:36] <logmsgbot>	 !log mvernon@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1002"
[10:35:36] <logmsgbot>	 !log mvernon@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be1006.eqiad.wmnet with OS bullseye
[10:35:48] <wikibugs>	 (CR) Hnowlan: [C:+1] mw::maintenance::purge_securepoll: Only run on securepollglobal.dblist [puppet] - https://gerrit.wikimedia.org/r/1149629 (https://phabricator.wikimedia.org/T388542) (owner: Clément Goubert)
[10:35:49] <wikibugs>	 ops-eqiad, SRE, SRE-swift-storage, Data-Persistence, and 2 others: Q4:rack/setup/install thanos-be100[6-9] - https://phabricator.wikimedia.org/T392909#10851798 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mvernon@cumin1002 for host thanos-be1006.eqiad.wmnet with OS bullsey...
[10:36:10] <wikibugs>	 (CR) Clément Goubert: [C:+2] mw::maintenance::purge_securepoll: Only run on securepollglobal.dblist [puppet] - https://gerrit.wikimedia.org/r/1149629 (https://phabricator.wikimedia.org/T388542) (owner: Clément Goubert)
[10:36:26] <logmsgbot>	 !log mvernon@cumin1002 START - Cookbook sre.hosts.reimage for host thanos-be1007.eqiad.wmnet with OS bullseye
[10:36:35] <wikibugs>	 ops-eqiad, SRE, SRE-swift-storage, Data-Persistence, and 2 others: Q4:rack/setup/install thanos-be100[6-9] - https://phabricator.wikimedia.org/T392909#10851802 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mvernon@cumin1002 for host thanos-be1007.eqiad.wmnet with OS bul...
[10:37:57] <wikibugs>	 (PS3) Effie Mouzeli: WIP: adding mcrouter [deployment-charts] - https://gerrit.wikimedia.org/r/1149633
[10:39:14] <wikibugs>	 (CR) CI reject: [V:-1] WIP: adding mcrouter [deployment-charts] - https://gerrit.wikimedia.org/r/1149633 (owner: Effie Mouzeli)
[10:42:02] <logmsgbot>	 !log cgoubert@deploy1003 helmfile [eqiad] START helmfile.d/services/mw-cron: apply
[10:42:24] <logmsgbot>	 !log cgoubert@deploy1003 helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
[10:42:27] <wikibugs>	 (PS6) JMeybohm: k8s.pool-depool-node: Add support to downtime/remove downtime [cookbooks] - https://gerrit.wikimedia.org/r/1114000 (https://phabricator.wikimedia.org/T341984)
[10:45:37] <claime>	 !log Manual run of purge-securepollvotedata - T388542
[10:45:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:45:42] <stashbot>	 T388542: Migrate trust_and_safety_product_team jobs to mw-cron - https://phabricator.wikimedia.org/T388542
[10:45:44] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2035 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76418 and previous config saved to /var/cache/conftool/dbconfig/20250523-104543-root.json
[10:46:40] <wikibugs>	 (PS1) Brouberol: Enable talking to schema.discovery.wmnet via the service mesh [deployment-charts] - https://gerrit.wikimedia.org/r/1149638 (https://phabricator.wikimedia.org/T392668)
[10:47:08] <wikibugs>	 (CR) JMeybohm: k8s.pool-depool-node: Add support to downtime/remove downtime (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/1114000 (https://phabricator.wikimedia.org/T341984) (owner: JMeybohm)
[10:47:19] <wikibugs>	 (CR) Ilias Sarantopoulos: "@tklausmann@wikimedia.org could you review and merge please? I don't have +2 access." [puppet] - https://gerrit.wikimedia.org/r/1149634 (https://phabricator.wikimedia.org/T394779) (owner: Ilias Sarantopoulos)
[10:47:59] <wikibugs>	 (PS7) JMeybohm: k8s.pool-depool-node: Add support to downtime/remove downtime [cookbooks] - https://gerrit.wikimedia.org/r/1114000 (https://phabricator.wikimedia.org/T341984)
[10:48:10] <wikibugs>	 (PS1) Brouberol: airflow: disable hardcoded networkpolicy in favor of the service mesh [deployment-charts] - https://gerrit.wikimedia.org/r/1149639 (https://phabricator.wikimedia.org/T392668)
[10:50:16] <wikibugs>	 SRE, Infrastructure-Foundations: Integrate Bookworm 12.11 point update - https://phabricator.wikimedia.org/T394489#10851841 (MoritzMuehlenhoff)
[10:51:05] <wikibugs>	 (CR) Ilias Sarantopoulos: ml-services: edit-check latest image deployment (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1149632 (https://phabricator.wikimedia.org/T394779) (owner: Gkyziridis)
[10:51:59] <wikibugs>	 (CR) Gkyziridis: [C:+1] httpbb(liftwing): add edit-check tests (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1149634 (https://phabricator.wikimedia.org/T394779) (owner: Ilias Sarantopoulos)
[10:52:27] <wikibugs>	 (CR) Btullis: [C:+1] Enable talking to schema.discovery.wmnet via the service mesh [deployment-charts] - https://gerrit.wikimedia.org/r/1149638 (https://phabricator.wikimedia.org/T392668) (owner: Brouberol)
[10:52:41] <wikibugs>	 (CR) Btullis: [C:+1] airflow: disable hardcoded networkpolicy in favor of the service mesh [deployment-charts] - https://gerrit.wikimedia.org/r/1149639 (https://phabricator.wikimedia.org/T392668) (owner: Brouberol)
[10:53:32] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/1149630 (owner: Majavah)
[10:53:59] <wikibugs>	 (PS8) JMeybohm: k8s.pool-depool-node: Add support to downtime/remove downtime [cookbooks] - https://gerrit.wikimedia.org/r/1114000 (https://phabricator.wikimedia.org/T341984)
[10:55:09] <wikibugs>	 (CR) Joal: [C:+1] Enable talking to schema.discovery.wmnet via the service mesh [deployment-charts] - https://gerrit.wikimedia.org/r/1149638 (https://phabricator.wikimedia.org/T392668) (owner: Brouberol)
[10:55:35] <logmsgbot>	 !log jayme@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host kubestage2004.codfw.wmnet
[10:55:36] <logmsgbot>	 !log jayme@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage2004.codfw.wmnet
[10:56:08] <wikibugs>	 (PS2) Gkyziridis: ml-services: edit-check latest image deployment [deployment-charts] - https://gerrit.wikimedia.org/r/1149632 (https://phabricator.wikimedia.org/T394779)
[10:56:19] <wikibugs>	 (CR) Ilias Sarantopoulos: httpbb(liftwing): add edit-check tests (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1149634 (https://phabricator.wikimedia.org/T394779) (owner: Ilias Sarantopoulos)
[10:56:54] <logmsgbot>	 !log jayme@cumin1002 START - Cookbook sre.k8s.pool-depool-node depool for host kubestage2004.codfw.wmnet
[10:57:20] <wikibugs>	 (CR) Joal: [C:+1] airflow: disable hardcoded networkpolicy in favor of the service mesh [deployment-charts] - https://gerrit.wikimedia.org/r/1149639 (https://phabricator.wikimedia.org/T392668) (owner: Brouberol)
[10:57:40] <logmsgbot>	 !log jayme@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestage2004.codfw.wmnet
[10:58:00] <wikibugs>	 (CR) Majavah: [V:+1 C:+2] P:openstack: Move away from manual @resolve syntax [puppet] - https://gerrit.wikimedia.org/r/1149630 (owner: Majavah)
[10:59:46] <logmsgbot>	 !log mvernon@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be1007.eqiad.wmnet with reason: host reimage
[10:59:47] <wikibugs>	 (PS2) Majavah: P:openstack: Pass ports as numbers to ferm [puppet] - https://gerrit.wikimedia.org/r/1149631
[11:00:05] <jouncebot>	 Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250523T0700)
[11:00:05] <logmsgbot>	 !log fceratto@deploy1003 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[11:00:05] <jouncebot>	 jelto, arnoldokoth, and mutante: May I have your attention please! GitLab version upgrades. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250523T1100)
[11:00:25] <logmsgbot>	 !log fceratto@deploy1003 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[11:01:15] <logmsgbot>	 !log mvernon@cumin1002 START - Cookbook sre.hosts.reimage for host thanos-be1008.eqiad.wmnet with OS bullseye
[11:01:28] <wikibugs>	 ops-eqiad, SRE, SRE-swift-storage, Data-Persistence, and 2 others: Q4:rack/setup/install thanos-be100[6-9] - https://phabricator.wikimedia.org/T392909#10851852 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mvernon@cumin1002 for host thanos-be1008.eqiad.wmnet with OS bul...
[11:03:24] <wikibugs>	 (CR) Gkyziridis: [C:+1] httpbb(liftwing): add edit-check tests (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1149634 (https://phabricator.wikimedia.org/T394779) (owner: Ilias Sarantopoulos)
[11:03:48] <logmsgbot>	 !log jayme@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host kubestage2004.codfw.wmnet
[11:03:51] <logmsgbot>	 !log jayme@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage2004.codfw.wmnet
[11:03:53] <logmsgbot>	 !log mvernon@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be1007.eqiad.wmnet with reason: host reimage
[11:04:32] <wikibugs>	 (CR) Majavah: [C:+2] P:openstack: Pass ports as numbers to ferm [puppet] - https://gerrit.wikimedia.org/r/1149631 (owner: Majavah)
[11:05:48] <wikibugs>	 (PS9) JMeybohm: k8s.pool-depool-node: Add support to downtime/remove downtime [cookbooks] - https://gerrit.wikimedia.org/r/1114000 (https://phabricator.wikimedia.org/T341984)
[11:06:31] <logmsgbot>	 !log jayme@cumin1002 START - Cookbook sre.k8s.pool-depool-node depool for host kubestage2004.codfw.wmnet
[11:06:38] <wikibugs>	 (CR) Ilias Sarantopoulos: httpbb(liftwing): add edit-check tests (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1149634 (https://phabricator.wikimedia.org/T394779) (owner: Ilias Sarantopoulos)
[11:06:47] <logmsgbot>	 !log jayme@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestage2004.codfw.wmnet
[11:07:14] <logmsgbot>	 !log jayme@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host kubestage2004.codfw.wmnet
[11:07:17] <logmsgbot>	 !log jayme@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage2004.codfw.wmnet
[11:10:41] <wikibugs>	 (PS10) JMeybohm: k8s.pool-depool-node: Add support to downtime/remove downtime [cookbooks] - https://gerrit.wikimedia.org/r/1114000 (https://phabricator.wikimedia.org/T341984)
[11:10:56] <logmsgbot>	 !log gkyziridis@deploy1003 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
[11:17:41] <wikibugs>	 (CR) Ilias Sarantopoulos: [C:+1] "LGTM!" [deployment-charts] - https://gerrit.wikimedia.org/r/1149632 (https://phabricator.wikimedia.org/T394779) (owner: Gkyziridis)
[11:18:32] <jinxer-wm>	 FIRING: SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:18:32] <jinxer-wm>	 FIRING: NetworkDeviceAlarmActive: Alarm active on cr2-codfw - https://wikitech.wikimedia.org/wiki/Network_monitoring#Juniper_alarm - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DNetworkDeviceAlarmActive
[11:23:11] <logmsgbot>	 !log mvernon@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be1008.eqiad.wmnet with reason: host reimage
[11:26:21] <logmsgbot>	 !log mvernon@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be1008.eqiad.wmnet with reason: host reimage
[11:37:17] <wikibugs>	 (PS1) Vgutierrez: systemd::timer: Allow setting FixedRandomDelay [puppet] - https://gerrit.wikimedia.org/r/1149647 (https://phabricator.wikimedia.org/T395001)
[11:37:19] <wikibugs>	 (PS1) Vgutierrez: systemd::timer::job: Allow setting accuracy and fixed_random_delay [puppet] - https://gerrit.wikimedia.org/r/1149648 (https://phabricator.wikimedia.org/T395001)
[11:39:22] <wikibugs>	 (CR) Vgutierrez: [V:+1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/5672/console" [puppet] - https://gerrit.wikimedia.org/r/1149647 (https://phabricator.wikimedia.org/T395001) (owner: Vgutierrez)
[11:41:08] <wikibugs>	 (CR) Vgutierrez: [V:+1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/5673/console" [puppet] - https://gerrit.wikimedia.org/r/1149648 (https://phabricator.wikimedia.org/T395001) (owner: Vgutierrez)
[11:47:37] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.wikireplicas.update-views
[11:49:35] <logmsgbot>	 !log mvernon@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1002"
[11:51:23] <logmsgbot>	 !log mvernon@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1002"
[11:51:24] <logmsgbot>	 !log mvernon@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be1007.eqiad.wmnet with OS bullseye
[11:51:37] <wikibugs>	 ops-eqiad, SRE, SRE-swift-storage, Data-Persistence, and 2 others: Q4:rack/setup/install thanos-be100[6-9] - https://phabricator.wikimedia.org/T392909#10851937 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mvernon@cumin1002 for host thanos-be1007.eqiad.wmnet with OS bullsey...
[11:51:42] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0)
[11:54:08] <logmsgbot>	 !log mvernon@cumin1002 START - Cookbook sre.hosts.reimage for host thanos-be1009.eqiad.wmnet with OS bullseye
[11:54:18] <wikibugs>	 ops-eqiad, SRE, SRE-swift-storage, Data-Persistence, and 2 others: Q4:rack/setup/install thanos-be100[6-9] - https://phabricator.wikimedia.org/T392909#10851939 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mvernon@cumin1002 for host thanos-be1009.eqiad.wmnet with OS bul...
[11:54:53] <logmsgbot>	 !log mvernon@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1002"
[11:55:12] <logmsgbot>	 !log mvernon@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1002"
[11:55:12] <logmsgbot>	 !log mvernon@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be1008.eqiad.wmnet with OS bullseye
[11:55:28] <wikibugs>	 ops-eqiad, SRE, SRE-swift-storage, Data-Persistence, and 2 others: Q4:rack/setup/install thanos-be100[6-9] - https://phabricator.wikimedia.org/T392909#10851943 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mvernon@cumin1002 for host thanos-be1008.eqiad.wmnet with OS bullsey...
[11:56:06] <wikibugs>	 (PS2) Vgutierrez: systemd::timer::job: Allow setting accuracy and fixed_random_delay [puppet] - https://gerrit.wikimedia.org/r/1149648 (https://phabricator.wikimedia.org/T395001)
[11:58:38] <wikibugs>	 (PS1) Vgutierrez: varnish: Deploy edge uniques experiment fetcher [puppet] - https://gerrit.wikimedia.org/r/1149651 (https://phabricator.wikimedia.org/T395001)
[12:01:12] <wikibugs>	 (CR) CI reject: [V:-1] varnish: Deploy edge uniques experiment fetcher [puppet] - https://gerrit.wikimedia.org/r/1149651 (https://phabricator.wikimedia.org/T395001) (owner: Vgutierrez)
[12:03:33] <jinxer-wm>	 FIRING: SystemdUnitFailed: wmf_auto_restart_exim4.service on crm2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:04:07] <wikibugs>	 (PS2) Vgutierrez: varnish: Deploy edge uniques experiment fetcher [puppet] - https://gerrit.wikimedia.org/r/1149651 (https://phabricator.wikimedia.org/T395001)
[12:05:45] <wikibugs>	 (PS3) Vgutierrez: varnish: Deploy edge uniques experiment fetcher [puppet] - https://gerrit.wikimedia.org/r/1149651 (https://phabricator.wikimedia.org/T395001)
[12:08:51] <wikibugs>	 (CR) CI reject: [V:-1] varnish: Deploy edge uniques experiment fetcher [puppet] - https://gerrit.wikimedia.org/r/1149651 (https://phabricator.wikimedia.org/T395001) (owner: Vgutierrez)
[12:09:25] <jinxer-wm>	 FIRING: [8x] SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetmaster1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:18:32] <jinxer-wm>	 FIRING: [2x] HelmReleaseBadStatus: Helm release eventgate-analytics/canary on k8s-staging@eqiad in state pending-rollback - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s-staging&var-namespace=eventgate-analytics - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[12:20:03] <wikibugs>	 (PS4) Vgutierrez: varnish: Deploy edge uniques experiment fetcher [puppet] - https://gerrit.wikimedia.org/r/1149651 (https://phabricator.wikimedia.org/T395001)
[12:21:42] <logmsgbot>	 !log mvernon@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be1009.eqiad.wmnet with reason: host reimage
[12:21:57] <wikibugs>	 (CR) Vgutierrez: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1149651 (https://phabricator.wikimedia.org/T395001) (owner: Vgutierrez)
[12:25:32] <logmsgbot>	 !log mvernon@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be1009.eqiad.wmnet with reason: host reimage
[12:29:12] <wikibugs>	 (CR) Gkyziridis: [C:+2] ml-services: edit-check latest image deployment [deployment-charts] - https://gerrit.wikimedia.org/r/1149632 (https://phabricator.wikimedia.org/T394779) (owner: Gkyziridis)
[12:32:41] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Data-Platform-SRE (2025.05.02 - 2025.05.23): Upgrade an-worker hard drives from 4TB to 8TB (group 4 - rack F3) - https://phabricator.wikimedia.org/T390171#10852117 (Jclark-ctr) sorry for spam forgot to update ticket number on running reimage on another host
[12:33:24] <logmsgbot>	 !log gkyziridis@deploy1003 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
[12:35:40] <wikibugs>	 (PS2) AOkoth: doc: swap doc1003 with doc1004 [puppet] - https://gerrit.wikimedia.org/r/1149469
[12:36:44] <wikibugs>	 (CR) AOkoth: "Ack. I think I did it in my head. 😕" [puppet] - https://gerrit.wikimedia.org/r/1149469 (owner: AOkoth)
[12:37:12] <logmsgbot>	 !log gkyziridis@deploy1003 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'edit-check' for release 'main' .
[12:37:32] <wikibugs>	 ops-eqiad, SRE, SRE-swift-storage, Data-Persistence, and 2 others: Q4:rack/setup/install thanos-be100[6-9] - https://phabricator.wikimedia.org/T392909#10852125 (Jclark-ctr) @MatthewVernon  Thank you for your assistance. In most cases, I am able to image a server and have it successfully pass Pupp...
[12:37:34] <logmsgbot>	 !log gkyziridis@deploy1003 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
[12:39:17] <wikibugs>	 (CR) Brouberol: [C:+2] Enable talking to schema.discovery.wmnet via the service mesh [deployment-charts] - https://gerrit.wikimedia.org/r/1149638 (https://phabricator.wikimedia.org/T392668) (owner: Brouberol)
[12:40:20] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
[12:40:59] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
[12:44:57] <logmsgbot>	 !log mvernon@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1002"
[12:45:37] <logmsgbot>	 !log mvernon@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1002"
[12:45:39] <logmsgbot>	 !log mvernon@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be1009.eqiad.wmnet with OS bullseye
[12:45:48] <wikibugs>	 ops-eqiad, SRE, SRE-swift-storage, Data-Persistence, and 2 others: Q4:rack/setup/install thanos-be100[6-9] - https://phabricator.wikimedia.org/T392909#10852139 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mvernon@cumin1002 for host thanos-be1009.eqiad.wmnet with OS bullsey...
[12:50:41] <wikibugs>	 ops-eqiad, SRE, SRE-swift-storage, Data-Persistence, and 2 others: Q4:rack/setup/install thanos-be100[6-9] - https://phabricator.wikimedia.org/T392909#10852153 (MatthewVernon) >>! In T392909#10852125, @Jclark-ctr wrote: > @MatthewVernon  Thank you for your assistance. In most cases, I am able to...
[12:52:00] <wikibugs>	 ops-eqiad, SRE, SRE-swift-storage, Data-Persistence, and 2 others: Q4:rack/setup/install thanos-be100[6-9] - https://phabricator.wikimedia.org/T392909#10852155 (Jclark-ctr) Open→Resolved Thanks for your assistance
[12:53:03] <wikibugs>	 (CR) Hashar: "I was challenging the need to support an alternate repo since apparently the sole usage could switch from `peer` to `origin` which would t" [puppet] - https://gerrit.wikimedia.org/r/1148267 (owner: Volans)
[12:53:40] <wikibugs>	 (CR) Hashar: [C:+1] git::clone: set given remote name on initial cloning [puppet] - https://gerrit.wikimedia.org/r/1148267 (owner: Volans)
[12:53:52] <wikibugs>	 SRE, LDAP-Access-Requests: Grant Access to ops-limited for lsobanski - https://phabricator.wikimedia.org/T395110#10852165 (ABran-WMF) Open→Resolved a:ABran-WMF {T395094} and {T395110} done
[12:54:04] <wikibugs>	 SRE, LDAP-Access-Requests: Grant Access to ops-limited for sdeckelmann-wmf - https://phabricator.wikimedia.org/T395094#10852171 (ABran-WMF) Open→Resolved a:ABran-WMF {T395094} and {T395110} done
[12:54:25] <logmsgbot>	 !log fnegri@cumin1002 START - Cookbook sre.wikireplicas.update-views
[12:57:26] <logmsgbot>	 fnegri@cumin1002 update-views (PID 1213905) is awaiting input
[13:02:23] <wikibugs>	 (PS14) Federico Ceratto: sanitarium_restart.py: restart Sanitarium hosts [cookbooks] - https://gerrit.wikimedia.org/r/1131954 (https://phabricator.wikimedia.org/T363665)
[13:02:24] <logmsgbot>	 !log fceratto@cumin1002 START - Cookbook sre.mysql.sanitarium_restart
[13:04:04] <wikibugs>	 (CR) Federico Ceratto: "Updated as described" [cookbooks] - https://gerrit.wikimedia.org/r/1131954 (https://phabricator.wikimedia.org/T363665) (owner: Federico Ceratto)
[13:04:32] <wikibugs>	 SRE, SRE-Access-Requests, Continuous-Integration-Infrastructure, LDAP-Access-Requests, Jenkins: Grant Jenkins admin rights to Peter Hedenskog (QTE) - https://phabricator.wikimedia.org/T394749#10852258 (hashar) >>! In T394749#10850867, @Dzahn wrote: > @hashar So it requires 2 things, membe...
[13:04:33] <wikibugs>	 SRE, SRE-swift-storage: Q4 Thanos hardware refresh - https://phabricator.wikimedia.org/T391352#10852259 (MatthewVernon)
[13:07:17] <logmsgbot>	 !log fceratto@cumin1002 END (PASS) - Cookbook sre.mysql.sanitarium_restart (exit_code=0)
[13:08:08] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Data-Platform-SRE (2025.05.24 - 2025.06.13): Q2:rack/setup/install an-worker11[78-86] - https://phabricator.wikimedia.org/T377878#10852293 (Gehel)
[13:08:41] <wikibugs>	 ops-eqiad, DC-Ops, Data-Platform-SRE (2025.05.24 - 2025.06.13): Q3: an-worker data volumes HDD upgrade tracking task - https://phabricator.wikimedia.org/T385485#10852317 (Gehel)
[13:09:09] <wikibugs>	 ops-eqiad, SRE, Data-Platform, DC-Ops, Data-Platform-SRE (2025.05.24 - 2025.06.13): Q2:rack/setup/install kafka-jumbo10[16-18] - https://phabricator.wikimedia.org/T377874#10852329 (Gehel)
[13:09:19] <wikibugs>	 sre-alert-triage, Data-Platform-SRE (2025.05.24 - 2025.06.13): Alert in need of triage: MegaRAID (instance an-worker1135) - https://phabricator.wikimedia.org/T394632#10852335 (Gehel)
[13:09:30] <wikibugs>	 sre-alert-triage, Data-Platform-SRE (2025.05.24 - 2025.06.13): Alert in need of triage: PuppetFailure (instance an-worker1068:9100) - https://phabricator.wikimedia.org/T392554#10852341 (Gehel)
[13:09:56] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Data-Platform-SRE (2025.05.24 - 2025.06.13): Upgrade an-worker hard drives from 4TB to 8TB (group 4 - rack F3) - https://phabricator.wikimedia.org/T390171#10852345 (Gehel)
[13:11:20] <wikibugs>	 Puppet, Beta-Cluster-Infrastructure, CirrusSearch, Discovery-Search, Data-Platform-SRE (2025.05.24 - 2025.06.13): Puppet failing on deployment-cirrussearch{12,13,14}.deployment-prep.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T393924#10852377 (Gehel)
[13:11:30] <wikibugs>	 ops-codfw, SRE, DC-Ops, Data-Platform-SRE (2025.05.24 - 2025.06.13): SSD firmware update for cirrussearch211[0-5] - https://phabricator.wikimedia.org/T394432#10852381 (Gehel)
[13:12:08] <wikibugs>	 SRE, Data-Engineering, Data-Engineering-Radar, Infrastructure-Foundations, Data-Platform-SRE (2025.05.24 - 2025.06.13): Rebuild Spark images with Bookworm / bullseye-backports deprecation - https://phabricator.wikimedia.org/T390139#10852395 (Gehel)
[13:12:14] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Data-Platform-SRE (2025.05.24 - 2025.06.13): Upgrade an-worker hard drives from 4TB to 8TB (group 5 - rack F1) - https://phabricator.wikimedia.org/T390172#10852399 (Gehel)
[13:12:32] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Data-Platform-SRE (2025.05.24 - 2025.06.13): Upgrade an-worker hard drives from 4TB to 8TB (group 6 - rack E7) - https://phabricator.wikimedia.org/T390173#10852401 (Gehel)
[13:12:38] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Data-Platform-SRE (2025.05.24 - 2025.06.13): Upgrade an-worker hard drives from 4TB to 8TB (group 10 - multiple racks - singletons) - https://phabricator.wikimedia.org/T390178#10852405 (Gehel)
[13:12:44] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Data-Platform-SRE (2025.05.24 - 2025.06.13): Upgrade an-worker hard drives from 4TB to 8TB (group 7 - rack E6) - https://phabricator.wikimedia.org/T390174#10852403 (Gehel)
[13:12:50] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Data-Platform-SRE (2025.05.24 - 2025.06.13): Upgrade an-worker hard drives from 4TB to 8TB (group 8 - rack E5) - https://phabricator.wikimedia.org/T390175#10852407 (Gehel)
[13:13:05] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Data-Platform-SRE (2025.05.24 - 2025.06.13): Upgrade an-worker hard drives from 4TB to 8TB (group 9 - rack E3) - https://phabricator.wikimedia.org/T390176#10852409 (Gehel)
[13:13:11] <wikibugs>	 SRE, DC-Ops, Data-Platform-SRE (2025.05.24 - 2025.06.13): Bring relforge100[89] into production - https://phabricator.wikimedia.org/T389957#10852412 (Gehel)
[13:13:57] <wikibugs>	 sre-alert-triage, Data-Platform-SRE (2025.05.24 - 2025.06.13): Alert in need of triage: Dell PowerEdge RAID Controller (instance an-presto1016) - https://phabricator.wikimedia.org/T382714#10852430 (Gehel)
[13:15:33] <wikibugs>	 (CR) Marostegui: [C:+1] sanitarium_restart.py: restart Sanitarium hosts [cookbooks] - https://gerrit.wikimedia.org/r/1131954 (https://phabricator.wikimedia.org/T363665) (owner: Federico Ceratto)
[13:16:25] <logmsgbot>	 fnegri@cumin1002 update-views (PID 1213905) is awaiting input
[13:18:44] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to deploy for KCVelaga - https://phabricator.wikimedia.org/T395125 (KCVelaga_WMF) NEW
[13:18:46] <tgr>	 deploying a security patch
[13:21:21] <wikibugs>	 SRE, SRE-Access-Requests, Data-Platform-SRE: Requesting access to deploy for KCVelaga - https://phabricator.wikimedia.org/T395125#10852497 (KCVelaga_WMF) @Ahoelzl this might need your approval as well. Please see this [[ https://wikimedia.slack.com/archives/CSV483812/p1747735432060449 | Slack discuss...
[13:26:45] <wikibugs>	 (PS1) Slyngshede: CAS: 7.2.2 [software/cas-overlay-template] - https://gerrit.wikimedia.org/r/1149665
[13:26:48] <wikibugs>	 (PS1) Brouberol: admin/data: add kcvelaga to the deployment group [puppet] - https://gerrit.wikimedia.org/r/1149666 (https://phabricator.wikimedia.org/T393998)
[13:33:50] <wikibugs>	 (PS2) Brouberol: admin/data: add kcvelaga to the deployment group [puppet] - https://gerrit.wikimedia.org/r/1149666 (https://phabricator.wikimedia.org/T393998)
[13:36:51] <wikibugs>	 (CR) Fabfur: varnish: Deploy edge uniques experiment fetcher (4 comments) [puppet] - https://gerrit.wikimedia.org/r/1149651 (https://phabricator.wikimedia.org/T395001) (owner: Vgutierrez)
[13:38:51] <wikibugs>	 (CR) Federico Ceratto: [C:+2] sanitarium_restart.py: restart Sanitarium hosts [cookbooks] - https://gerrit.wikimedia.org/r/1131954 (https://phabricator.wikimedia.org/T363665) (owner: Federico Ceratto)
[13:38:56] <wikibugs>	 (CR) Fabfur: [C:+1] systemd::timer: Allow setting FixedRandomDelay [puppet] - https://gerrit.wikimedia.org/r/1149647 (https://phabricator.wikimedia.org/T395001) (owner: Vgutierrez)
[13:39:46] <wikibugs>	 (CR) Fabfur: [C:+1] systemd::timer::job: Allow setting accuracy and fixed_random_delay [puppet] - https://gerrit.wikimedia.org/r/1149648 (https://phabricator.wikimedia.org/T395001) (owner: Vgutierrez)
[13:40:40] <wikibugs>	 (CR) Alexandros Kosiaris: [C:+1] "Overall LGTM, minus the comment by Janis." [puppet] - https://gerrit.wikimedia.org/r/1149505 (https://phabricator.wikimedia.org/T395052) (owner: Scott French)
[13:42:15] <wikibugs>	 (CR) Vgutierrez: varnish: Deploy edge uniques experiment fetcher (3 comments) [puppet] - https://gerrit.wikimedia.org/r/1149651 (https://phabricator.wikimedia.org/T395001) (owner: Vgutierrez)
[13:45:40] <wikibugs>	 (Merged) jenkins-bot: sanitarium_restart.py: restart Sanitarium hosts [cookbooks] - https://gerrit.wikimedia.org/r/1131954 (https://phabricator.wikimedia.org/T363665) (owner: Federico Ceratto)
[13:45:54] <tgr>	 !log deployed private mitigation for T395073
[13:45:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:47:08] <logmsgbot>	 !log fnegri@cumin1002 START - Cookbook sre.wikireplicas.update-views
[13:48:07] <jinxer-wm>	 RESOLVED: MediaWikiElevatedUnknownLogins: Elevated number of failed login attempts (unknown device and IP) via mw-api-ext - TODO - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?from=now-6h&orgId=1&to=now&viewPanel=26 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiElevatedUnknownLogins
[13:48:26] <wikibugs>	 (CR) Ssingh: [C:+1] systemd::timer: Allow setting FixedRandomDelay [puppet] - https://gerrit.wikimedia.org/r/1149647 (https://phabricator.wikimedia.org/T395001) (owner: Vgutierrez)
[13:49:56] <wikibugs>	 (CR) Ssingh: [C:+1] systemd::timer::job: Allow setting accuracy and fixed_random_delay [puppet] - https://gerrit.wikimedia.org/r/1149648 (https://phabricator.wikimedia.org/T395001) (owner: Vgutierrez)
[13:50:08] <wikibugs>	 (CR) Fabfur: varnish: Deploy edge uniques experiment fetcher (3 comments) [puppet] - https://gerrit.wikimedia.org/r/1149651 (https://phabricator.wikimedia.org/T395001) (owner: Vgutierrez)
[13:51:32] <wikibugs>	 (PS8) Tiziano Fogli: pdb_resource_exporter: add puppetdb resource exporter to puppedb [puppet] - https://gerrit.wikimedia.org/r/1143600
[13:54:13] <wikibugs>	 (CR) Tiziano Fogli: "Functionality was tested successfully on the Pontoon environment." [puppet] - https://gerrit.wikimedia.org/r/1143600 (owner: Tiziano Fogli)
[13:54:26] <wikibugs>	 ops-eqiad, SRE, SRE-swift-storage, Data-Persistence, DC-Ops: Q4:rack/setup/install ms-be109[2-5] - https://phabricator.wikimedia.org/T393104#10852603 (MatthewVernon) @Jclark-ctr @wiki_willy any idea when that might be or if there's anywhere else these servers could go? Once they're installed...
[13:55:39] <wikibugs>	 (PS1) Brouberol: airflow: update kadmin server [deployment-charts] - https://gerrit.wikimedia.org/r/1149673
[13:55:39] <wikibugs>	 (PS1) Brouberol: blunderbuss: update kadmin server [deployment-charts] - https://gerrit.wikimedia.org/r/1149674
[13:55:39] <wikibugs>	 (PS1) Brouberol: spark-history: update kadmin server [deployment-charts] - https://gerrit.wikimedia.org/r/1149675
[13:55:40] <wikibugs>	 (PS1) Brouberol: superset: update kadmin server [deployment-charts] - https://gerrit.wikimedia.org/r/1149676
[13:55:57] <wikibugs>	 (CR) Fabfur: varnish: Deploy edge uniques experiment fetcher (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1149651 (https://phabricator.wikimedia.org/T395001) (owner: Vgutierrez)
[13:56:30] <logmsgbot>	 !log fceratto@deploy1003 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[13:56:52] <logmsgbot>	 !log fceratto@deploy1003 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[13:59:41] <jinxer-wm>	 FIRING: [6x] ConfdResourceFailed: confd resource _srv_config-master_pybal_eqiad_search-https.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed
[14:00:10] <wikibugs>	 (CR) Btullis: [C:+1] airflow: update kadmin server [deployment-charts] - https://gerrit.wikimedia.org/r/1149673 (owner: Brouberol)
[14:00:20] <wikibugs>	 (CR) Btullis: [C:+1] blunderbuss: update kadmin server [deployment-charts] - https://gerrit.wikimedia.org/r/1149674 (owner: Brouberol)
[14:00:36] <wikibugs>	 (CR) Btullis: [C:+1] spark-history: update kadmin server [deployment-charts] - https://gerrit.wikimedia.org/r/1149675 (owner: Brouberol)
[14:00:46] <wikibugs>	 (CR) Btullis: [C:+1] superset: update kadmin server [deployment-charts] - https://gerrit.wikimedia.org/r/1149676 (owner: Brouberol)
[14:00:59] <wikibugs>	 (CR) Brouberol: [C:+2] airflow: update kadmin server [deployment-charts] - https://gerrit.wikimedia.org/r/1149673 (owner: Brouberol)
[14:01:02] <wikibugs>	 (CR) Brouberol: [C:+2] blunderbuss: update kadmin server [deployment-charts] - https://gerrit.wikimedia.org/r/1149674 (owner: Brouberol)
[14:01:05] <wikibugs>	 (CR) Brouberol: [C:+2] spark-history: update kadmin server [deployment-charts] - https://gerrit.wikimedia.org/r/1149675 (owner: Brouberol)
[14:01:07] <wikibugs>	 (CR) Brouberol: [C:+2] superset: update kadmin server [deployment-charts] - https://gerrit.wikimedia.org/r/1149676 (owner: Brouberol)
[14:03:02] <wikibugs>	 (Merged) jenkins-bot: airflow: update kadmin server [deployment-charts] - https://gerrit.wikimedia.org/r/1149673 (owner: Brouberol)
[14:03:11] <wikibugs>	 (Merged) jenkins-bot: blunderbuss: update kadmin server [deployment-charts] - https://gerrit.wikimedia.org/r/1149674 (owner: Brouberol)
[14:03:12] <wikibugs>	 (Merged) jenkins-bot: spark-history: update kadmin server [deployment-charts] - https://gerrit.wikimedia.org/r/1149675 (owner: Brouberol)
[14:03:27] <wikibugs>	 (Merged) jenkins-bot: superset: update kadmin server [deployment-charts] - https://gerrit.wikimedia.org/r/1149676 (owner: Brouberol)
[14:03:28] <wikibugs>	 (CR) Ssingh: "Looks good except for the comments. I also don't have much of an opinion on the fetcher.py file right now -- I think once you finalize it " [puppet] - https://gerrit.wikimedia.org/r/1149651 (https://phabricator.wikimedia.org/T395001) (owner: Vgutierrez)
[14:05:03] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
[14:05:04] <logmsgbot>	 fnegri@cumin1002 update-views (PID 1268512) is awaiting input
[14:05:55] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
[14:06:44] <wikibugs>	 (PS5) Vgutierrez: varnish: Deploy edge uniques experiment fetcher [puppet] - https://gerrit.wikimedia.org/r/1149651 (https://phabricator.wikimedia.org/T395001)
[14:06:55] <wikibugs>	 (CR) Vgutierrez: varnish: Deploy edge uniques experiment fetcher (3 comments) [puppet] - https://gerrit.wikimedia.org/r/1149651 (https://phabricator.wikimedia.org/T395001) (owner: Vgutierrez)
[14:07:26] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
[14:07:38] <wikibugs>	 (PS6) Vgutierrez: varnish: Deploy edge uniques experiment fetcher [puppet] - https://gerrit.wikimedia.org/r/1149651 (https://phabricator.wikimedia.org/T395001)
[14:07:53] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset: apply
[14:08:32] <jinxer-wm>	 FIRING: SystemdUnitFailed: curator_actions_apifeatureusage_eqiad.service on apifeatureusage1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:09:45] <logmsgbot>	 !log bking@cumin2002 conftool action : set/pooled=no; selector: name=elastic1096.eqiad.wmnet|elastic1097.eqiad.wmnet|elastic1098.eqiad.wmnet|elastic1099.eqiad.wmnet|elastic1100.eqiad.wmnet|elastic1101.eqiad.wmnet|elastic1102.eqiad.wmnet|elastic1107.eqiad.wmnet|elastic1110.eqiad.wmnet
[14:10:02] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
[14:10:28] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
[14:11:11] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
[14:12:08] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
[14:13:26] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
[14:14:03] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
[14:14:31] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
[14:15:10] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
[14:15:19] <wikibugs>	 (PS9) Tiziano Fogli: pdb_resource_exporter: add puppetdb resource exporter to puppedb [puppet] - https://gerrit.wikimedia.org/r/1143600
[14:16:13] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
[14:16:46] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
[14:17:50] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply
[14:18:27] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply
[14:18:57] <wikibugs>	 SRE, cloud-services-team, DC-Ops: Supporting new hardware in older debian releases - https://phabricator.wikimedia.org/T301162#10852638 (taavi) Open→Resolved I don't see any specific problems here that need addressing so closing in order to get this to stop lingering on our workboard.
[14:19:25] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
[14:19:38] <logmsgbot>	 !log fnegri@cumin1002 END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=99)
[14:20:03] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
[14:22:34] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
[14:23:07] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
[14:23:22] <wikibugs>	 ops-magru, DC-Ops, Observability-Metrics, Patch-For-Review, SRE Observability (FY2024/2025-Q3): missing pdu infos for magru - https://phabricator.wikimedia.org/T387231#10852644 (RobH) >>! In T387231#10851628, @tappof wrote: > Sure @RobH, nothing will catch fire because of this patch (or maybe...
[14:24:10] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply
[14:24:40] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply
[14:25:34] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
[14:26:09] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
[14:26:41] <wikibugs>	 (PS1) Fabfur: external_cloud_vendors: temporary commented Azure fetch [puppet] - https://gerrit.wikimedia.org/r/1149681 (https://phabricator.wikimedia.org/T395127)
[14:27:46] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
[14:28:19] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
[14:29:54] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
[14:30:22] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
[14:30:36] <wikibugs>	 (CR) Ssingh: [C:+1] varnish: Deploy edge uniques experiment fetcher [puppet] - https://gerrit.wikimedia.org/r/1149651 (https://phabricator.wikimedia.org/T395001) (owner: Vgutierrez)
[14:31:53] <wikibugs>	 (CR) Vgutierrez: [C:+1] external_cloud_vendors: temporary commented Azure fetch [puppet] - https://gerrit.wikimedia.org/r/1149681 (https://phabricator.wikimedia.org/T395127) (owner: Fabfur)
[14:32:27] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
[14:32:43] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
[14:35:00] <wikibugs>	 (PS1) Andrea Denisse: alerts: Change team receiving alerts [alerts] - https://gerrit.wikimedia.org/r/1149682 (https://phabricator.wikimedia.org/T395117)
[14:35:11] <wikibugs>	 (CR) Andrea Denisse: [V:+2 C:+2] alerts: Change team receiving alerts [alerts] - https://gerrit.wikimedia.org/r/1149682 (https://phabricator.wikimedia.org/T395117) (owner: Andrea Denisse)
[14:35:24] <wikibugs>	 (CR) Fabfur: [C:+2] external_cloud_vendors: temporary commented Azure fetch [puppet] - https://gerrit.wikimedia.org/r/1149681 (https://phabricator.wikimedia.org/T395127) (owner: Fabfur)
[14:44:25] <jinxer-wm>	 FIRING: [8x] SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetmaster1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:45:22] <wikibugs>	 (CR) Klausman: [C:+1] profile::prometheus::k8s: drop terminated pod targets (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1149505 (https://phabricator.wikimedia.org/T395052) (owner: Scott French)
[14:46:02] <wikibugs>	 (CR) Klausman: [C:+1] httpbb(liftwing): add edit-check tests [puppet] - https://gerrit.wikimedia.org/r/1149634 (https://phabricator.wikimedia.org/T394779) (owner: Ilias Sarantopoulos)
[14:58:25] <wikibugs>	 (PS1) Bking: cirrussearch: add cirrussearch row E/remove elastic row F [puppet] - https://gerrit.wikimedia.org/r/1149687 (https://phabricator.wikimedia.org/T388610)
[14:59:38] <wikibugs>	 (CR) Bking: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1149687 (https://phabricator.wikimedia.org/T388610) (owner: Bking)
[15:01:44] <wikibugs>	 (PS2) Scott French: Profile::Mediawiki_deployment: add 'clusters' field [puppet] - https://gerrit.wikimedia.org/r/1148480 (https://phabricator.wikimedia.org/T388761)
[15:02:00] <wikibugs>	 (CR) Scott French: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1148480 (https://phabricator.wikimedia.org/T388761) (owner: Scott French)
[15:06:42] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:07:29] <wikibugs>	 (PS1) Muehlenhoff: Record LDAP access for ericmill [puppet] - https://gerrit.wikimedia.org/r/1149688
[15:08:46] <wikibugs>	 (CR) Btullis: [C:+1] cirrussearch: add cirrussearch row E/remove elastic row F [puppet] - https://gerrit.wikimedia.org/r/1149687 (https://phabricator.wikimedia.org/T388610) (owner: Bking)
[15:09:48] <wikibugs>	 (CR) Bking: [C:+2] cirrussearch: add cirrussearch row E/remove elastic row F [puppet] - https://gerrit.wikimedia.org/r/1149687 (https://phabricator.wikimedia.org/T388610) (owner: Bking)
[15:09:58] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Record LDAP access for ericmill [puppet] - https://gerrit.wikimedia.org/r/1149688 (owner: Muehlenhoff)
[15:10:15] <wikibugs>	 (PS5) Scott French: profile::prometheus::k8s: drop terminated pod targets [puppet] - https://gerrit.wikimedia.org/r/1149505 (https://phabricator.wikimedia.org/T395052)
[15:11:53] <wikibugs>	 (CR) Cathal Mooney: [C:+1] systemd::timer: Allow setting FixedRandomDelay [puppet] - https://gerrit.wikimedia.org/r/1149647 (https://phabricator.wikimedia.org/T395001) (owner: Vgutierrez)
[15:11:59] <wikibugs>	 (CR) Cathal Mooney: [C:+1] systemd::timer::job: Allow setting accuracy and fixed_random_delay [puppet] - https://gerrit.wikimedia.org/r/1149648 (https://phabricator.wikimedia.org/T395001) (owner: Vgutierrez)
[15:15:16] <logmsgbot>	 !log bking@cumin2002 conftool action : set/pooled=yes:weight=10; selector: name=cirrussearch.*.eqiad.wmnet
[15:15:44] <wikibugs>	 (CR) Klausman: [C:+1] profile::prometheus::k8s: drop terminated pod targets [puppet] - https://gerrit.wikimedia.org/r/1149505 (https://phabricator.wikimedia.org/T395052) (owner: Scott French)
[15:16:42] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:17:22] <wikibugs>	 (PS1) Vgutierrez: hiera: Use GTS staging account in acmechief-test2001 [puppet] - https://gerrit.wikimedia.org/r/1149692
[15:17:26] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.rename from elastic1096 to cirrussearch1096
[15:17:38] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.dns.netbox
[15:17:45] <wikibugs>	 (CR) Scott French: "Thank you all for the reviews!" [puppet] - https://gerrit.wikimedia.org/r/1149505 (https://phabricator.wikimedia.org/T395052) (owner: Scott French)
[15:18:33] <jinxer-wm>	 FIRING: SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:18:33] <jinxer-wm>	 FIRING: NetworkDeviceAlarmActive: Alarm active on cr2-codfw - https://wikitech.wikimedia.org/wiki/Network_monitoring#Juniper_alarm - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DNetworkDeviceAlarmActive
[15:18:46] <wikibugs>	 (CR) Vgutierrez: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1149692 (owner: Vgutierrez)
[15:19:09] <wikibugs>	 (PS1) Fabfur: external_cloud_vendors: fix Azure prefix fetch [puppet] - https://gerrit.wikimedia.org/r/1149693 (https://phabricator.wikimedia.org/T395127)
[15:19:47] <wikibugs>	 (CR) Clément Goubert: [C:+1] Profile::Mediawiki_deployment: add 'clusters' field [puppet] - https://gerrit.wikimedia.org/r/1148480 (https://phabricator.wikimedia.org/T388761) (owner: Scott French)
[15:19:58] <wikibugs>	 (PS2) Fabfur: external_cloud_vendors: fix Azure prefix fetch [puppet] - https://gerrit.wikimedia.org/r/1149693 (https://phabricator.wikimedia.org/T395127)
[15:20:20] <wikibugs>	 (PS3) Fabfur: external_cloud_vendors: fix Azure prefix fetch [puppet] - https://gerrit.wikimedia.org/r/1149693 (https://phabricator.wikimedia.org/T395127)
[15:21:17] <wikibugs>	 (CR) JHathaway: systemd::timer: Allow setting FixedRandomDelay (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1149647 (https://phabricator.wikimedia.org/T395001) (owner: Vgutierrez)
[15:21:27] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1096 to cirrussearch1096 - bking@cumin2002"
[15:21:33] <wikibugs>	 (CR) CI reject: [V:-1] external_cloud_vendors: fix Azure prefix fetch [puppet] - https://gerrit.wikimedia.org/r/1149693 (https://phabricator.wikimedia.org/T395127) (owner: Fabfur)
[15:22:06] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1096 to cirrussearch1096 - bking@cumin2002"
[15:22:06] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:22:06] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.dns.wipe-cache cirrussearch1096 on all recursors
[15:22:09] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1096 on all recursors
[15:22:11] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1096
[15:22:18] <wikibugs>	 (CR) Scott French: [C:+2] Profile::Mediawiki_deployment: add 'clusters' field [puppet] - https://gerrit.wikimedia.org/r/1148480 (https://phabricator.wikimedia.org/T388761) (owner: Scott French)
[15:22:22] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1096
[15:23:02] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1096 to cirrussearch1096
[15:23:36] <wikibugs>	 (CR) Btullis: "I'd like to think that we won't need this now, so I would be tempted not to proceed with it. But feel free to convince me otherwise." [deployment-charts] - https://gerrit.wikimedia.org/r/1149013 (https://phabricator.wikimedia.org/T394459) (owner: Brouberol)
[15:23:37] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.reimage for host cirrussearch1096.eqiad.wmnet with OS bullseye
[15:23:41] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.move-vlan for host cirrussearch1096
[15:23:42] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1096
[15:26:34] <wikibugs>	 (CR) Ssingh: [C:+1] "nit in commit message." [puppet] - https://gerrit.wikimedia.org/r/1149692 (owner: Vgutierrez)
[15:31:13] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.rename from elastic1097 to cirrussearch1097
[15:31:27] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.dns.netbox
[15:31:31] <wikibugs>	 (CR) Vgutierrez: external_cloud_vendors: fix Azure prefix fetch (3 comments) [puppet] - https://gerrit.wikimedia.org/r/1149693 (https://phabricator.wikimedia.org/T395127) (owner: Fabfur)
[15:32:42] <wikibugs>	 (CR) Ssingh: [C:+1] hiera: Use GTS staging account in acmechief-test2001 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1149692 (owner: Vgutierrez)
[15:33:05] <wikibugs>	 (CR) Scott French: "Alright, [0] has been merged, so you should be good to update this patch to include the `clusters` field. That said, [1] has not yet been " [puppet] - https://gerrit.wikimedia.org/r/1148203 (https://phabricator.wikimedia.org/T389786) (owner: Brouberol)
[15:33:09] <wikibugs>	 (CR) Vgutierrez: [C:+2] hiera: Use GTS staging account in acmechief-test2001 [puppet] - https://gerrit.wikimedia.org/r/1149692 (owner: Vgutierrez)
[15:34:28] <wikibugs>	 (PS4) Fabfur: external_cloud_vendors: fix Azure prefix fetch [puppet] - https://gerrit.wikimedia.org/r/1149693 (https://phabricator.wikimedia.org/T395127)
[15:34:42] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1097 to cirrussearch1097 - bking@cumin2002"
[15:35:07] <wikibugs>	 (CR) Fabfur: external_cloud_vendors: fix Azure prefix fetch (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1149693 (https://phabricator.wikimedia.org/T395127) (owner: Fabfur)
[15:35:52] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1097 to cirrussearch1097 - bking@cumin2002"
[15:35:52] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:35:53] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.dns.wipe-cache cirrussearch1097 on all recursors
[15:35:56] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1097 on all recursors
[15:35:57] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1097
[15:36:09] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1097
[15:36:49] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1097 to cirrussearch1097
[15:38:11] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1096.eqiad.wmnet with reason: host reimage
[15:38:19] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.reimage for host cirrussearch1097.eqiad.wmnet with OS bullseye
[15:38:23] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.move-vlan for host cirrussearch1097
[15:38:23] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1097
[15:42:13] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1096.eqiad.wmnet with reason: host reimage
[15:42:22] <wikibugs>	 SRE-swift-storage, Ceph, Data-Persistence, serviceops: Onboard the Docker Registry to apus - https://phabricator.wikimedia.org/T394476#10852861 (MatthewVernon) A few TB of quota shouldn't be a problem; how many objects per bucket are you looking at? We get better performance out of fewer larger o...
[15:45:30] <wikibugs>	 SRE-swift-storage, Thumbor: Gradually drop all thumbnails as a one-off clean up - https://phabricator.wikimedia.org/T379942#10852874 (MatthewVernon) @Ladsgroup can you let me know when one of the current batch has finished, please? Now we've done the thumbnail defrag stuff, I'd like to re-asses (for the...
[15:52:57] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1097.eqiad.wmnet with reason: host reimage
[15:53:58] <wikibugs>	 SRE, SRE-Access-Requests: Requesting production SSH key update for Joseph Seddon - https://phabricator.wikimedia.org/T393579#10852902 (Seddon) @BCornwall @Eevans @Dzahn: Apologies for the delay!   I've posted at https://wikitech.wikimedia.org/wiki/User:Seddon_(WMF)/public_keys
[15:56:14] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1097.eqiad.wmnet with reason: host reimage
[15:57:43] <icinga-wm>	 ACKNOWLEDGEMENT - Host cirrussearch2115 is DOWN: PING CRITICAL - Packet loss = 100% Brian_King rebooted for fw update
[16:01:55] <wikibugs>	 (PS2) Vgutierrez: systemd::timer: Allow setting FixedRandomDelay [puppet] - https://gerrit.wikimedia.org/r/1149647 (https://phabricator.wikimedia.org/T395001)
[16:01:55] <wikibugs>	 (PS3) Vgutierrez: systemd::timer::job: Allow setting accuracy and fixed_random_delay [puppet] - https://gerrit.wikimedia.org/r/1149648 (https://phabricator.wikimedia.org/T395001)
[16:01:55] <wikibugs>	 (PS7) Vgutierrez: varnish: Deploy edge uniques experiment fetcher [puppet] - https://gerrit.wikimedia.org/r/1149651 (https://phabricator.wikimedia.org/T395001)
[16:02:11] <icinga-wm>	 PROBLEM - Check unit status of httpbb_kubernetes_mw-api-ext-next_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-api-ext-next_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[16:02:28] <wikibugs>	 (CR) Vgutierrez: systemd::timer: Allow setting FixedRandomDelay (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1149647 (https://phabricator.wikimedia.org/T395001) (owner: Vgutierrez)
[16:03:33] <jinxer-wm>	 FIRING: SystemdUnitFailed: wmf_auto_restart_exim4.service on crm2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:04:02] <wikibugs>	 (PS2) Effie Mouzeli: WIP: profile::kubernetes::node: Add script to pull and mount latest mw [puppet] - https://gerrit.wikimedia.org/r/1148905 (https://phabricator.wikimedia.org/T276994)
[16:04:52] <wikibugs>	 (PS3) Effie Mouzeli: WIP: profile::kubernetes::node: Add script to pull and mount latest mw [puppet] - https://gerrit.wikimedia.org/r/1148905 (https://phabricator.wikimedia.org/T276994)
[16:06:05] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1096.eqiad.wmnet with OS bullseye
[16:07:10] <wikibugs>	 (CR) JHathaway: [C:+1] systemd::timer: Allow setting FixedRandomDelay [puppet] - https://gerrit.wikimedia.org/r/1149647 (https://phabricator.wikimedia.org/T395001) (owner: Vgutierrez)
[16:07:24] <wikibugs>	 (CR) JHathaway: [C:+1] systemd::timer::job: Allow setting accuracy and fixed_random_delay [puppet] - https://gerrit.wikimedia.org/r/1149648 (https://phabricator.wikimedia.org/T395001) (owner: Vgutierrez)
[16:07:31] <icinga-wm>	 PROBLEM - Check unit status of push_cross_cluster_settings_9600 on cirrussearch2115 is CRITICAL: CRITICAL: Status of the systemd unit push_cross_cluster_settings_9600 https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[16:09:25] <jinxer-wm>	 FIRING: [8x] SystemdUnitFailed: httpbb_kubernetes_mw-api-ext-next_hourly.service on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:17:31] <icinga-wm>	 RECOVERY - Check unit status of push_cross_cluster_settings_9600 on cirrussearch2115 is OK: OK: Status of the systemd unit push_cross_cluster_settings_9600 https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[16:17:32] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1097.eqiad.wmnet with OS bullseye
[16:18:33] <jinxer-wm>	 FIRING: [2x] HelmReleaseBadStatus: Helm release eventgate-analytics/canary on k8s-staging@eqiad in state pending-rollback - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s-staging&var-namespace=eventgate-analytics - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[16:23:14] <wikibugs>	 SRE, SRE-Access-Requests, Data-Platform-SRE, Patch-For-Review: Requesting access to deploy for KCVelaga - https://phabricator.wikimedia.org/T395125#10852994 (BTullis) Hmm. The `deployment` group brings a lot of power with it, though. I'm not sure that all of our possible Airflow developers would...
[16:26:53] <wikibugs>	 (PS1) Tchanders: Temp accounts: Allow sysop/steward to grant and revoke IP reveal [mediawiki-config] - https://gerrit.wikimedia.org/r/1149699 (https://phabricator.wikimedia.org/T393615)
[16:27:52] <wikibugs>	 (PS2) Tchanders: Temp accounts: Allow sysop/steward to grant and revoke IP reveal [mediawiki-config] - https://gerrit.wikimedia.org/r/1149699 (https://phabricator.wikimedia.org/T390942)
[16:34:03] <wikibugs>	 (CR) JJMC89: "Stewards have `userrights` (locally) and `userrights-interwiki` (on metawiki, where the changes would actually be done), so this should no" [mediawiki-config] - https://gerrit.wikimedia.org/r/1149699 (https://phabricator.wikimedia.org/T390942) (owner: Tchanders)
[16:35:21] <wikibugs>	 (CR) Dreamy Jazz: "+1 to this." [mediawiki-config] - https://gerrit.wikimedia.org/r/1149699 (https://phabricator.wikimedia.org/T390942) (owner: Tchanders)
[16:39:54] <wikibugs>	 SRE-swift-storage, MediaWiki-Uploading, Wikimedia-production-error: UploadChunkFileException: Error storing file in '{chunkPath}': backend-fail-internal; local-swift-codfw - https://phabricator.wikimedia.org/T395049#10853050 (MatthewVernon) I had a look in the swift logs for the associated item (per...
[16:43:39] <wikibugs>	 (CR) Dreamy Jazz: mw::maintenance::purge_securepoll: Only run on securepollglobal.dblist (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1149629 (https://phabricator.wikimedia.org/T388542) (owner: Clément Goubert)
[16:47:03] <wikibugs>	 SRE, LDAP-Access-Requests: Grant Access to ops-limited for sdeckelmann-wmf - https://phabricator.wikimedia.org/T395094#10853133 (ABran-WMF) Resolved→Open @wiki_willy mentioned to me that @SDeckelmann-WMF needed to access netbox as well, I'm handing this over to @Dzahn if this is time sensitive as...
[16:51:09] <wikibugs>	 (PS3) Tchanders: Temp accounts: Allow sysop/steward to grant and revoke IP reveal [mediawiki-config] - https://gerrit.wikimedia.org/r/1149699 (https://phabricator.wikimedia.org/T390942)
[16:51:45] <wikibugs>	 (PS4) Tchanders: Temp accounts: Allow sysop to grant and revoke IP reveal [mediawiki-config] - https://gerrit.wikimedia.org/r/1149699 (https://phabricator.wikimedia.org/T390942)
[17:00:01] <wikibugs>	 SRE, LDAP-Access-Requests: Grant Access to ops-limited for sdeckelmann-wmf - https://phabricator.wikimedia.org/T395094#10853174 (Dzahn) a:ABran-WMF→Dzahn
[17:02:11] <icinga-wm>	 RECOVERY - Check unit status of httpbb_kubernetes_mw-api-ext-next_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-api-ext-next_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[17:02:24] <wikibugs>	 (PS5) Tchanders: Temp accounts: Allow sysop/steward to grant and revoke IP reveal [mediawiki-config] - https://gerrit.wikimedia.org/r/1149699 (https://phabricator.wikimedia.org/T390942)
[17:02:46] <wikibugs>	 (CR) Tchanders: "I've removed the changes for stewards from the default." [mediawiki-config] - https://gerrit.wikimedia.org/r/1149699 (https://phabricator.wikimedia.org/T390942) (owner: Tchanders)
[17:04:25] <jinxer-wm>	 FIRING: [8x] SystemdUnitFailed: httpbb_kubernetes_mw-api-ext-next_hourly.service on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:05:16] <wikibugs>	 (CR) JJMC89: "> As far as I can tell, it seems that they would need this assigning for metawiki, and that would allow them to grant/revoke the group at " [mediawiki-config] - https://gerrit.wikimedia.org/r/1149699 (https://phabricator.wikimedia.org/T390942) (owner: Tchanders)
[17:06:05] <wikibugs>	 (CR) Dreamy Jazz: "https://meta.wikimedia.org/wiki/Special:ListGroupRights says that stewards have the `userrights` permission, which allows the user to "Edi" [mediawiki-config] - https://gerrit.wikimedia.org/r/1149699 (https://phabricator.wikimedia.org/T390942) (owner: Tchanders)
[17:06:43] <wikibugs>	 (CR) Dreamy Jazz: "Also to what JJMCC89 said." [mediawiki-config] - https://gerrit.wikimedia.org/r/1149699 (https://phabricator.wikimedia.org/T390942) (owner: Tchanders)
[17:09:15] <wikibugs>	 (CR) Dreamy Jazz: "AFAICS, it seems that `::getGroupsChangeableBy` returns all groups if the user //locally// has the `userrights` permission even if the cha" [mediawiki-config] - https://gerrit.wikimedia.org/r/1149699 (https://phabricator.wikimedia.org/T390942) (owner: Tchanders)
[17:13:37] <wikibugs>	 SRE, LDAP-Access-Requests: Grant Access to ops-limited for sdeckelmann-wmf - https://phabricator.wikimedia.org/T395094#10853218 (Dzahn) "netbox access" can still mean some different things.  It requires membership in one of these LDAP groups:  'nda' - partial read-only access (this group is typically use...
[17:19:58] <wikibugs>	 (PS6) Tchanders: Temp accounts: Allow sysop to grant and revoke IP reveal [mediawiki-config] - https://gerrit.wikimedia.org/r/1149699 (https://phabricator.wikimedia.org/T390942)
[17:20:32] <wikibugs>	 (CR) Tchanders: "Ah, I see `userrights` allows you to change all groups. I thought it allowed you to change whichever ones were added to `$wgAddGroups`." [mediawiki-config] - https://gerrit.wikimedia.org/r/1149699 (https://phabricator.wikimedia.org/T390942) (owner: Tchanders)
[17:21:35] <wikibugs>	 (CR) Dreamy Jazz: [C:+1] Temp accounts: Allow sysop to grant and revoke IP reveal [mediawiki-config] - https://gerrit.wikimedia.org/r/1149699 (https://phabricator.wikimedia.org/T390942) (owner: Tchanders)
[17:24:19] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.rename from elastic1098 to cirrussearch1098
[17:24:31] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.dns.netbox
[17:27:54] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1098 to cirrussearch1098 - bking@cumin2002"
[17:28:48] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1098 to cirrussearch1098 - bking@cumin2002"
[17:28:48] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[17:28:49] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.dns.wipe-cache cirrussearch1098 on all recursors
[17:28:52] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1098 on all recursors
[17:28:53] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1098
[17:29:21] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1098
[17:30:01] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1098 to cirrussearch1098
[17:30:10] <wikibugs>	 SRE, LDAP-Access-Requests: Grant Access to ops-limited for sdeckelmann-wmf - https://phabricator.wikimedia.org/T395094#10853269 (Dzahn) @SDeckelmann-WMF Hey, so.. I checked and you already have membership in the "wmf" LDAP group.  So that means you should be able to login on https://netbox.wikimedia.org...
[17:30:44] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.reimage for host cirrussearch1098.eqiad.wmnet with OS bullseye
[17:30:48] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.move-vlan for host cirrussearch1098
[17:30:48] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1098
[17:32:22] <wikibugs>	 SRE, LDAP-Access-Requests: Grant Access to ops-limited for sdeckelmann-wmf - https://phabricator.wikimedia.org/T395094#10853279 (Dzahn) Open→Resolved resolving!   But if there is any issue or more is needed feel free to just reopen it, or we can.   (Given the US holiday on Monday and the rota...
[17:32:54] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.rename from elastic1099 to cirrussearch1099
[17:33:06] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.dns.netbox
[17:36:15] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1099 to cirrussearch1099 - bking@cumin2002"
[17:36:55] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1099 to cirrussearch1099 - bking@cumin2002"
[17:36:55] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[17:36:56] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.dns.wipe-cache cirrussearch1099 on all recursors
[17:36:59] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1099 on all recursors
[17:37:00] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1099
[17:37:12] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1099
[17:37:35] <wikibugs>	 SRE, LDAP-Access-Requests: Grant Access to ops-limited for sdeckelmann-wmf - https://phabricator.wikimedia.org/T395094#10853304 (SDeckelmann-WMF) Thanks! I can definitely login to netbox, but all of the objects are locked. I'm following the SRE tutorials, so if there's maybe something I missed earlie...
[17:37:51] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1099 to cirrussearch1099
[17:38:22] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.reimage for host cirrussearch1099.eqiad.wmnet with OS bullseye
[17:38:26] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.move-vlan for host cirrussearch1099
[17:38:26] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1099
[17:50:06] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1098.eqiad.wmnet with reason: host reimage
[17:53:41] <wikibugs>	 SRE, LDAP-Access-Requests: Grant Access to ops-limited for sdeckelmann-wmf - https://phabricator.wikimedia.org/T395094#10853350 (Dzahn) Hi Selena, could you link me to the tutorial you are following?
[17:53:49] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1098.eqiad.wmnet with reason: host reimage
[17:57:23] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1099.eqiad.wmnet with reason: host reimage
[17:58:33] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: curator_actions_apifeatureusage_eqiad.service on apifeatureusage1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:59:41] <jinxer-wm>	 FIRING: [6x] ConfdResourceFailed: confd resource _srv_config-master_pybal_eqiad_search-https.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed
[18:02:01] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1099.eqiad.wmnet with reason: host reimage
[18:21:11] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1099.eqiad.wmnet with OS bullseye
[18:28:00] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1098.eqiad.wmnet with OS bullseye
[18:42:59] <icinga-wm>	 PROBLEM - mailman list info on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[18:43:07] <icinga-wm>	 PROBLEM - mailman archives on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[18:43:51] <icinga-wm>	 RECOVERY - mailman list info on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 8922 bytes in 0.190 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[18:43:57] <icinga-wm>	 RECOVERY - mailman archives on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 53941 bytes in 0.123 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[18:44:48] <wikibugs>	 (PS1) Ebernhardson: Turn on glent m1 AB test [mediawiki-config] - https://gerrit.wikimedia.org/r/1149720 (https://phabricator.wikimedia.org/T262612)
[18:51:46] <logmsgbot>	 bking@cumin2002 rename (PID 689782) is awaiting input
[18:54:29] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.rename from elastic1100 to cirrussearch1100
[18:54:43] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.dns.netbox
[18:56:39] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to deployment for jdlrobson - https://phabricator.wikimedia.org/T393723#10853518 (Dzahn) Hey @Jdlrobson @Jdlrobson-WMF you have a user on the deployment servers now.   You should now be able to run the `scap spiderpig-otp` command on them to get the access code...
[19:00:18] <logmsgbot>	 bking@cumin2002 rename (PID 689782) is awaiting input
[19:05:43] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1100 to cirrussearch1100 - bking@cumin2002"
[19:06:00] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1100 to cirrussearch1100 - bking@cumin2002"
[19:06:00] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[19:06:00] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to deployment for jdlrobson - https://phabricator.wikimedia.org/T393723#10853523 (Dzahn) In progress→Resolved a:Dzahn feel free to reopen if you run into any issues, cheers!
[19:06:01] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.dns.wipe-cache cirrussearch1100 on all recursors
[19:06:04] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1100 on all recursors
[19:06:04] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1100
[19:06:17] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1100
[19:06:57] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1100 to cirrussearch1100
[19:08:14] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.reimage for host cirrussearch1100.eqiad.wmnet with OS bullseye
[19:08:18] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.move-vlan for host cirrussearch1100
[19:08:19] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1100
[19:09:10] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.rename from elastic1101 to cirrussearch1101
[19:09:22] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.dns.netbox
[19:13:18] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1101 to cirrussearch1101 - bking@cumin2002"
[19:15:22] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1101 to cirrussearch1101 - bking@cumin2002"
[19:15:22] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[19:15:22] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.dns.wipe-cache cirrussearch1101 on all recursors
[19:15:25] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1101 on all recursors
[19:15:26] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1101
[19:15:46] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1101
[19:16:25] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1101 to cirrussearch1101
[19:18:33] <jinxer-wm>	 FIRING: SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:18:38] <jinxer-wm>	 FIRING: NetworkDeviceAlarmActive: Alarm active on cr2-codfw - https://wikitech.wikimedia.org/wiki/Network_monitoring#Juniper_alarm - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DNetworkDeviceAlarmActive
[19:22:04] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.reimage for host cirrussearch1101.eqiad.wmnet with OS bullseye
[19:22:09] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.move-vlan for host cirrussearch1101
[19:22:10] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1101
[19:23:42] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1100.eqiad.wmnet with reason: host reimage
[19:26:37] <wikibugs>	 SRE, SRE-Access-Requests: Requesting production SSH key update for Joseph Seddon - https://phabricator.wikimedia.org/T393579#10853594 (Dzahn)
[19:27:02] <wikibugs>	 SRE, SRE-Access-Requests: Requesting production SSH key update for Joseph Seddon - https://phabricator.wikimedia.org/T393579#10853597 (Dzahn) Thank you. Verified :)
[19:27:14] <wikibugs>	 SRE, SRE-Access-Requests: Requesting production SSH key update for Joseph Seddon - https://phabricator.wikimedia.org/T393579#10853598 (Dzahn) a:Seddon→None
[19:27:27] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1100.eqiad.wmnet with reason: host reimage
[19:30:35] <wikibugs>	 (PS1) Dzahn: admin: replace SSH key for seddon [puppet] - https://gerrit.wikimedia.org/r/1149736 (https://phabricator.wikimedia.org/T393579)
[19:36:16] <icinga-wm>	 PROBLEM - Disk space on centrallog2002 is CRITICAL: DISK CRITICAL - free space: /srv 83448MiB (3% inode=99%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=centrallog2002&var-datasource=codfw+prometheus/ops
[19:41:54] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1101.eqiad.wmnet with reason: host reimage
[19:42:52] <wikibugs>	 SRE-swift-storage, Thumbor: Gradually drop all thumbnails as a one-off clean up - https://phabricator.wikimedia.org/T379942#10853674 (Ladsgroup) Sure. In eqiad it's running and it'll take a while
[19:45:40] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1101.eqiad.wmnet with reason: host reimage
[19:46:47] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.rename from elastic1102 to cirrussearch1102
[19:47:00] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.dns.netbox
[19:47:01] <wikibugs>	 (CR) Andrea Denisse: [C:+1] "LGTM, thank you!" [puppet] - https://gerrit.wikimedia.org/r/1149544 (owner: Muehlenhoff)
[19:50:14] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1102 to cirrussearch1102 - bking@cumin2002"
[19:52:48] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1102 to cirrussearch1102 - bking@cumin2002"
[19:52:48] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[19:52:48] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.dns.wipe-cache cirrussearch1102 on all recursors
[19:52:52] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1102 on all recursors
[19:52:52] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1102
[19:55:56] <logmsgbot>	 bking@cumin2002 rename (PID 718256) is awaiting input
[19:57:23] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1100.eqiad.wmnet with OS bullseye
[19:58:51] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1102
[19:59:32] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1102 to cirrussearch1102
[20:00:10] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.reimage for host cirrussearch1102.eqiad.wmnet with OS bullseye
[20:00:14] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.move-vlan for host cirrussearch1102
[20:00:15] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1102
[20:03:33] <jinxer-wm>	 FIRING: SystemdUnitFailed: wmf_auto_restart_exim4.service on crm2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:07:40] <logmsgbot>	 bking@cumin2002 rename (PID 727000) is awaiting input
[20:09:45] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.rename from elastic1107 to cirrussearch1107
[20:09:57] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.dns.netbox
[20:10:31] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1101.eqiad.wmnet with OS bullseye
[20:13:25] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1107 to cirrussearch1107 - bking@cumin2002"
[20:14:52] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1102.eqiad.wmnet with reason: host reimage
[20:15:00] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1107 to cirrussearch1107 - bking@cumin2002"
[20:15:01] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[20:15:01] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.dns.wipe-cache cirrussearch1107 on all recursors
[20:15:04] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1107 on all recursors
[20:15:05] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1107
[20:15:15] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1107
[20:15:56] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1107 to cirrussearch1107
[20:16:07] <wikibugs>	 (CR) Andrea Denisse: [C:+1] "LGTM, thank you!" [cookbooks] - https://gerrit.wikimedia.org/r/1114000 (https://phabricator.wikimedia.org/T341984) (owner: JMeybohm)
[20:17:01] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.reimage for host cirrussearch1107.eqiad.wmnet with OS bullseye
[20:17:06] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.move-vlan for host cirrussearch1107
[20:17:06] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1107
[20:18:22] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1102.eqiad.wmnet with reason: host reimage
[20:18:33] <jinxer-wm>	 FIRING: [2x] HelmReleaseBadStatus: Helm release eventgate-analytics/canary on k8s-staging@eqiad in state pending-rollback - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s-staging&var-namespace=eventgate-analytics - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[20:34:17] <logmsgbot>	 bking@cumin2002 reimage (PID 732063) is awaiting input
[20:37:05] <logmsgbot>	 !log bking@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch1107.eqiad.wmnet with OS bullseye
[20:37:31] <wikibugs>	 (PS1) PipelineBot: mobileapps: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/1149748
[20:37:54] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.dns.wipe-cache cirrussearch1107.eqiad.wmnet on all recursors
[20:37:57] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1107.eqiad.wmnet on all recursors
[20:39:41] <jinxer-wm>	 FIRING: [6x] ConfdResourceFailed: confd resource _srv_config-master_pybal_eqiad_search-https.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed
[20:43:20] <logmsgbot>	 bking@cumin2002 reimage (PID 744874) is awaiting input
[20:43:48] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.reimage for host cirrussearch1107.eqiad.wmnet with OS bullseye
[20:43:52] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.move-vlan for host cirrussearch1107
[20:43:52] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1107
[20:44:18] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1102.eqiad.wmnet with OS bullseye
[20:44:41] <jinxer-wm>	 RESOLVED: [6x] ConfdResourceFailed: confd resource _srv_config-master_pybal_eqiad_search-https.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed
[20:51:37] <wikibugs>	 (PS1) PipelineBot: mobileapps: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/1149751
[20:59:58] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1107.eqiad.wmnet with reason: host reimage
[21:03:19] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1107.eqiad.wmnet with reason: host reimage
[21:03:49] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to deployment for jdlrobson - https://phabricator.wikimedia.org/T393723#10853772 (Jdlrobson-WMF) Thanks all! Looking forward to trying this out next week!
[21:04:25] <jinxer-wm>	 FIRING: [7x] SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetmaster2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:16:31] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host apus-be2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[21:19:40] <logmsgbot>	 jhancock@cumin2002 provision (PID 762713) is awaiting input
[21:26:24] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to contint-roots for Corvus - https://phabricator.wikimedia.org/T395167 (Dzahn) NEW
[21:27:09] <wikibugs>	 SRE, SRE-Access-Requests, Continuous-Integration-Infrastructure (Zuul upgrade): Requesting access to contint-roots for Corvus - https://phabricator.wikimedia.org/T395167#10853814 (Dzahn)
[21:28:10] <wikibugs>	 SRE, SRE-Access-Requests, Continuous-Integration-Infrastructure (Zuul upgrade): Requesting access to contint-roots for Corvus - https://phabricator.wikimedia.org/T395167#10853830 (Dzahn) The contint-roots group will be used for access to new VMs created in T394819.  @Corvus adding your realname is op...
[21:28:46] <wikibugs>	 SRE, SRE-Access-Requests, Continuous-Integration-Infrastructure (Zuul upgrade): Requesting access to contint-roots for Corvus - https://phabricator.wikimedia.org/T395167#10853833 (Dzahn) @Corvus Please take a look at L3 and sign it if you are comfortable with it.
[21:30:33] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1107.eqiad.wmnet with OS bullseye
[21:32:36] <wikibugs>	 SRE, SRE-Access-Requests, Data-Platform-SRE, Patch-For-Review: Requesting access to deploy for KCVelaga - https://phabricator.wikimedia.org/T395125#10853851 (Dzahn) This sounds like a good idea. We already have other groups like that, gerrit-deployers, zuul-deployers, research-deployers, platform...
[21:35:39] <wikibugs>	 SRE, SRE-Access-Requests, Continuous-Integration-Infrastructure (Zuul upgrade): Requesting access to contint-roots for Corvus - https://phabricator.wikimedia.org/T395167#10853866 (Reedy)
[21:36:25] <wikibugs>	 SRE, SRE-Access-Requests, Continuous-Integration-Infrastructure (Zuul upgrade): Requesting access to contint-roots for Corvus - https://phabricator.wikimedia.org/T395167#10853867 (thcipriani)
[21:37:01] <wikibugs>	 SRE, SRE-Access-Requests, Continuous-Integration-Infrastructure (Zuul upgrade): Requesting access to contint-roots for Corvus - https://phabricator.wikimedia.org/T395167#10853869 (thcipriani) Approved as `contint-roots` approver.
[21:38:22] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting production SSH key update for Joseph Seddon - https://phabricator.wikimedia.org/T393579#10853873 (Dzahn) Stalled→In progress
[21:49:23] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host apus-be2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[21:50:50] <wikibugs>	 SRE, SRE-Access-Requests, Infrastructure-Foundations, netbox: Selena can't see objects in Netbox despite having wmf group membership - https://phabricator.wikimedia.org/T395172#10853926 (Dzahn)
[21:53:13] <wikibugs>	 SRE, LDAP-Access-Requests: Grant Access to ops-limited for sdeckelmann-wmf - https://phabricator.wikimedia.org/T395094#10853935 (Dzahn) We chatted a bit about this and it now seems like this is a bug or outdated docs, because she can login but not see any objects despite being in the wmf group.  I cr...
[21:56:35] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.elasticsearch.ban Banning hosts: cirrussearch2110*,cirrussearch2111* for T394543 - bking@cumin2002
[21:56:37] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: cirrussearch2110*,cirrussearch2111* for T394543 - bking@cumin2002
[21:56:39] <stashbot>	 T394543: SSD firmware update not working in firmware cookbook - https://phabricator.wikimedia.org/T394543
[21:56:50] <wikibugs>	 SRE, SRE-Access-Requests, Infrastructure-Foundations, netbox: Selena can't see objects in Netbox despite having wmf group membership - https://phabricator.wikimedia.org/T395172#10853940 (Dzahn)
[21:58:00] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host apus-be2004.codfw.wmnet with OS bookworm
[21:58:13] <wikibugs>	 ops-codfw, SRE, SRE-swift-storage, DC-Ops: Q4:rack/setup/install apus-be2004 - https://phabricator.wikimedia.org/T392845#10853942 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host apus-be2004.codfw.wmnet with OS bookworm
[21:58:33] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: curator_actions_apifeatureusage_eqiad.service on apifeatureusage1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:07:39] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on cirrussearch[2110-2111].codfw.wmnet with reason: firmware update
[22:07:51] <wikibugs>	 ops-codfw, SRE, DC-Ops, Data-Platform-SRE (2025.05.24 - 2025.06.13): SSD firmware update for cirrussearch211[0-5] - https://phabricator.wikimedia.org/T394432#10853952 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=ba793dbe-d223-4f3a-8163-69d6fc192e7f) set by bking@cumin2002 for...
[22:22:57] <wikibugs>	 SRE-tools, DC-Ops, Infrastructure-Foundations: SSD firmware update not working in firmware cookbook - https://phabricator.wikimedia.org/T394543#10853990 (bking) Hey Volans, I flipped the script a bit to make it a bit more readable...  > Anyway, let's look at the future :)  >  > Today I've run some te...
[22:34:35] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on apus-be2004.codfw.wmnet with reason: host reimage
[22:37:22] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on apus-be2004.codfw.wmnet with reason: host reimage
[22:55:28] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[22:58:34] <logmsgbot>	 jhancock@cumin2002 reimage (PID 782301) is awaiting input
[23:01:11] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[23:01:12] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host apus-be2004.codfw.wmnet with OS bookworm
[23:01:24] <wikibugs>	 ops-codfw, SRE, SRE-swift-storage, DC-Ops: Q4:rack/setup/install apus-be2004 - https://phabricator.wikimedia.org/T392845#10854041 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host apus-be2004.codfw.wmnet with OS bookworm completed: - apus-be2004 (**P...
[23:02:09] <wikibugs>	 ops-codfw, SRE, SRE-swift-storage, DC-Ops: Q4:rack/setup/install apus-be2004 - https://phabricator.wikimedia.org/T392845#10854042 (Jhancock.wm) Open→Resolved
[23:02:47] <wikibugs>	 ops-codfw, SRE, SRE-swift-storage, DC-Ops: Q4:rack/setup/install apus-be2004 - https://phabricator.wikimedia.org/T392845#10854046 (Jhancock.wm) @MatthewVernon @Jclark-ctr thanks for troubleshooting on the other one.  This one is ready to go
[23:18:33] <jinxer-wm>	 FIRING: SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:18:38] <jinxer-wm>	 FIRING: NetworkDeviceAlarmActive: Alarm active on cr2-codfw - https://wikitech.wikimedia.org/wiki/Network_monitoring#Juniper_alarm - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DNetworkDeviceAlarmActive
[23:23:36] <wikibugs>	 ops-eqiad, SRE, SRE-swift-storage, Data-Persistence, DC-Ops: Q4:rack/setup/install ms-be109[2-5] - https://phabricator.wikimedia.org/T393104#10854075 (wiki_willy) Hi @MatthewVernon - I just replied back to your email with a more in-depth explanation.  The short answer though is that we need m...
[23:38:44] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1149786
[23:38:44] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1149786 (owner: TrainBranchBot)
[23:50:49] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1149786 (owner: TrainBranchBot)