[00:04:29] RESOLVED: HAProxyRestarted: HAProxy server restarted on cp3068:9100 - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/gQblbjtnk/haproxy-drilldown?orgId=1&var-site=esams%20prometheus/ops&var-instance=cp3068&viewPanel=10 - https://alerts.wikimedia.org/?q=alertname%3DHAProxyRestarted
[00:08:25] FIRING: [5x] SystemdUnitFailed: logrotate.service on cp3067:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:08:40] FIRING: [3x] SystemdUnitFailed: logrotate.service on cp3068:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:08:40] FIRING: [3x] SystemdUnitFailed: logrotate.service on cp3068:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:15:03] logrotate was still complaining on cp[3068,3070-3071].esams.wmnet, it should be fixed now
[08:18:25] RESOLVED: [3x] SystemdUnitFailed: logrotate.service on cp3068:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:35:36] hi folks! For https://phabricator.wikimedia.org/T367978 I'd need to rollout a new version of glibc fleetwide, that doesn't really require immediate roll restarts of anything. I wanted to ask what you prefer to do - I can deploy it on some canary nodes first, and possibly restart (I guess) ha-proxy/ats/varnish to verify that everything looks good.
After that I could simply complete the rollout
[08:35:42] and the updates will be picked up anytime
[08:35:44] how does it sound?
[08:37:15] of course not today, next week :)
[08:42:23] :)
[08:42:53] elukey: I'd say target two hosts (one per cluster) in codfw or ulsfo
[08:43:17] one in text and another one in upload
[08:51:12] all right makes sense, I'll do it next week and then I'll ping again the chan before doing anything
[08:51:19] sounds like a plan
[08:51:42] till the end of July I'm only working on EU afternoons (with some exceptions like today)
[08:51:50] but you can ping fabfur in the EU mornings
[08:52:35] sure!
[09:28:05] ack!
[13:11:20] netops, Infrastructure-Foundations: Move a server within the same row script not working - https://phabricator.wikimedia.org/T368148 (Papaul) NEW
[13:11:36] netops, Infrastructure-Foundations: Move a server within the same row script not working - https://phabricator.wikimedia.org/T368148#9913358 (Papaul) p:Triage→Medium
[13:30:25] netops, Infrastructure-Foundations, SRE, Patch-For-Review: No unicast IP ranges announced to peers from eqdfw - https://phabricator.wikimedia.org/T367439#9913434 (cmooney) Gonna copy some of the discussion from the patch here as I think it's easier for discussion and a record of what we decide:...
[16:27:34] Traffic, DC-Ops, ops-eqsin, SRE: Q4: install PCIe NVMe SSDs into eqsin text cp50(1[789]|2[01234] - https://phabricator.wikimedia.org/T365763#9913903 (BCornwall)
[17:23:01] Traffic, DC-Ops, ops-codfw, serviceops: lvs2011 Memory failure on slot B1 - https://phabricator.wikimedia.org/T368165 (BCornwall) NEW
[17:23:40] Traffic, DC-Ops, ops-codfw, serviceops: lvs2011 Memory failure on slot B1 - https://phabricator.wikimedia.org/T368165#9914057 (BCornwall) p:Triage→High
[17:24:44] Traffic, DC-Ops, ops-codfw, serviceops: lvs2011 Memory failure on slot B1 - https://phabricator.wikimedia.org/T368165#9914063 (BCornwall)
[17:49:03] netops, Data-Persistence, Data-Platform-SRE, DBA, and 3 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e5-eqiad - https://phabricator.wikimedia.org/T365986#9914102 (Ottomata)
[22:41:47] netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e6-eqiad - https://phabricator.wikimedia.org/T365987#9914610 (cmooney) Open→Resolved
[23:14:28] Traffic, SRE, Patch-For-Review: Anycast NTP and update the list of timeservers for P:systemd::timesyncd - https://phabricator.wikimedia.org/T366360#9914657 (Dwisehaupt) Frack config has been updated to use the new ntp-[abc].anycast.wmnet servers. The previous dnsXXXX and ntp.anycast.wmnet entries hav...