[00:05:41] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: wmf_auto_restart_prometheus-nginx-exporter.service on urldownloader1004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:05:41] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: wmf_auto_restart_prometheus-nginx-exporter.service on urldownloader1004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:50:26] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: wmf_auto_restart_prometheus-nginx-exporter.service on urldownloader1004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:34:56] <wikibugs>	 netbox, netops, DNS, Infrastructure-Foundations, and 2 others: Missing includes in DNS repo from Netbox-generated snippets - https://phabricator.wikimedia.org/T422115#11792933 (ayounsi) What would be a good day to alert about those ? Or even better, not even need an alert ?
[09:50:41] <jinxer-wm>	 FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-nginx-exporter.service on urldownloader2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:05:26] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: wmf_auto_restart_prometheus-nginx-exporter.service on urldownloader2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:01:33] <wikibugs>	 netops, Infrastructure-Foundations, Patch-For-Review: esams: upgrade routers & switches (2026) - https://phabricator.wikimedia.org/T416450#11793409 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=5c0f0433-7743-49c9-b411-8f120a9f337d) set by ayounsi@cumin1003 for 1:00:00 on 3 host(s)...
[13:30:44] <wikibugs>	 netops, Infrastructure-Foundations, Patch-For-Review: esams: upgrade routers & switches (2026) - https://phabricator.wikimedia.org/T416450#11793801 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=3c9a80b3-71a3-420a-bae4-d8cf79e5188e) set by ayounsi@cumin1003 for 0:30:00 on 3 host(s)...
[13:53:20] <wikibugs>	 netops, Infrastructure-Foundations, Patch-For-Review: esams: upgrade routers & switches (2026) - https://phabricator.wikimedia.org/T416450#11793947 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=b81e6903-f6db-4d47-bfbb-a5bff02b24fa) set by ayounsi@cumin1003 for 2:00:00 on 3 host(s)...
[14:31:38] <wikibugs>	 CAS-SSO, Infrastructure-Foundations: CAS login page overflows on iOS Safari (iPhone 16e) - https://phabricator.wikimedia.org/T422203#11794193 (SLyngshede-WMF) p:Triage→Medium a:SLyngshede-WMF
[14:31:42] <wikibugs>	 CAS-SSO, Infrastructure-Foundations: CAS login page overflows on iOS Safari (iPhone 16e) - https://phabricator.wikimedia.org/T422203#11794195 (LSobanski) p:Medium→Low a:SLyngshede-WMF→None
[14:32:48] <wikibugs>	 netops, Infrastructure-Foundations, SRE: Re-IP eqiad private baremetal hosts to new per-rack vlans/subnets - https://phabricator.wikimedia.org/T421704#11794201 (LSobanski) p:Triage→Low
[14:32:50] <wikibugs>	 netops, Infrastructure-Foundations: Create public vlans in eqiad and codfw - https://phabricator.wikimedia.org/T422043#11794213 (ayounsi) p:Triage→Medium
[14:38:30] <elukey>	 topranks: o/
[14:39:05] <elukey>	 I am seeing a decrease in memcached errors in the past 3/4 days: https://phabricator.wikimedia.org/T420223#11794128
[14:39:39] <elukey>	 And I checked the top 4 pods ending up in errors: for each one, the corresponding wikikube worker is in the old private c/d vlan
[14:40:33] <elukey>	 could it be that Arzhel moving wikikube workers to the new vlans is the source of the improvement?
[14:40:34] <elukey>	 somehow
[14:42:42] <topranks>	 not sure - did he move some hosts?  maybe as a test?  I know clem is moving wikikube-worker1273 right now 
[14:43:03] <claime>	 I'm moving one
[14:43:14] <claime>	 I don't think those arzhel moved are in prod
[14:44:04] <claime>	 but could it be that depooling the ones that were resolved the issue?
[14:44:09] <claime>	 """resolved"""
[14:44:26] <claime>	 My plan is to move one host to the new vlan, reintegrate it to prod and see if it gets the issue again
[14:48:06] <elukey>	 we depooled them a while ago, not sure though if Effie depooled more during the past days
[14:49:07] <claime>	 Apparently there's a host on the new vlan already
[14:49:15] <claime>	 We should check if it has errors or not
[14:55:20] <claime>	 It wasn't pooled, but I just pooled it, we'll see when workloads are deployed on it if anything happens
[14:55:25] <claime>	 wikikube-worker1347.eqiad.wmnet
[14:57:40] <wikibugs>	 netops, Infrastructure-Foundations, Patch-For-Review: esams: upgrade routers & switches (2026) - https://phabricator.wikimedia.org/T416450#11794425 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=918af1bd-6f87-4ec9-acb7-39622f38db7c) set by ayounsi@cumin1003 for 0:30:00 on 3 host(s)...
[15:55:51] <wikibugs>	 netops, Infrastructure-Foundations, SRE: cr1-esams failed upgrade - https://phabricator.wikimedia.org/T422525 (cmooney) NEW p:Triage→Medium
[15:56:01] <wikibugs>	 netops, Infrastructure-Foundations, SRE: cr1-esams failed upgrade - https://phabricator.wikimedia.org/T422525#11794809 (cmooney)
[15:56:07] <wikibugs>	 netops, Infrastructure-Foundations, Patch-For-Review: esams: upgrade routers & switches (2026) - https://phabricator.wikimedia.org/T416450#11794810 (cmooney)
[16:47:16] <wikibugs>	 netops, Infrastructure-Foundations, SRE: cr1-esams failed upgrade - https://phabricator.wikimedia.org/T422525#11795258 (cmooney)
[16:49:27] <wikibugs>	 netops, Infrastructure-Foundations, SRE: cr1-esams failed upgrade - https://phabricator.wikimedia.org/T422525#11795284 (cmooney)