[09:04:48] netops, Infrastructure-Foundations, SRE: Decom cookbook: delete virtual interfaces from device - https://phabricator.wikimedia.org/T398412 (cmooney) NEW p:Triage→Low
[09:08:32] FYI, as part of the bookworm update, the 2x2 clusters need to be temporarily switched to single node clusters, for the VM switches there will be brief anycast monitoring blips for doh3004 and durum3004
[09:11:10] wrong host names, though: should be doh6002 and durum6002
[09:11:53] :)
[09:24:57] volans: re T392851 I've manually enabled IPMI on cp2043 to try to get some progress there, now reimage seems to be blocked in the same way as sretest2006 (https://phabricator.wikimedia.org/T392851#10965714)
[09:24:58] T392851: Q4:rack/setup/install cp20[43-58] codfw - https://phabricator.wikimedia.org/T392851
[09:26:20] + the additional error of not being able to find any NIC
[09:29:50] vgutierrez: ack, in meeting right now, I can have a look later, I'm catching up also from yesterday that I was out, I saw the task updates
[09:30:12] sure just let me know if I can help
[09:31:32] clearly idrac10 has brought some backward incompatibility changes
[09:37:15] the pain of new toys
[09:37:25] yep
[09:38:42] netops, Infrastructure-Foundations, netbox, SRE: Decom cookbook: delete virtual interfaces from device - https://phabricator.wikimedia.org/T398412#10966696 (ayounsi)
[09:39:39] netops, Infrastructure-Foundations, netbox, SRE: Decom cookbook: delete virtual interfaces from device - https://phabricator.wikimedia.org/T398412#10966702 (ayounsi) option 2 lgtm!
[09:49:01] FIRING: AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on durum6002:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=drmrs&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted
[09:53:23] Traffic, Patch-For-Review: Stop issuing RSA certificates - https://phabricator.wikimedia.org/T398020#10966750 (Vgutierrez) Open→Resolved a:Vgutierrez
[09:54:01] FIRING: [2x] AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on doh6002:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=drmrs&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted
[09:59:00] RESOLVED: [2x] AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on doh6002:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=drmrs&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted
[11:49:56] FIRING: [2x] SLOMetricAbsent: haproxy-combined - https://slo.wikimedia.org/?search=haproxy-combined - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[11:50:02] FIRING: SLOMetricAbsent: varnish-combined drmrs - https://slo.wikimedia.org/?search=varnish-combined - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[11:53:20] FIRING: SLOMetricAbsent: varnish-combined drmrs - https://slo.wikimedia.org/?search=varnish-combined - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[12:08:39] netops, Infrastructure-Foundations: lsw1-a8-codfw: fpc0 PFE Statistics received unknown trigger (type Semaphore, id 0) - https://phabricator.wikimedia.org/T398433 (ayounsi) NEW
[12:10:12] netops, Infrastructure-Foundations: lsw1-a8-codfw: fpc0 PFE Statistics received unknown trigger (type Semaphore, id 0) - https://phabricator.wikimedia.org/T398433#10967385 (ayounsi) the upside is that there are only 3 hosts on that switch at the moment: * db2146 * wikikube-worker2046 * wikikube-worker204...
[12:44:56] RESOLVED: [2x] SLOMetricAbsent: haproxy-combined - https://slo.wikimedia.org/?search=haproxy-combined - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[12:48:21] RESOLVED: SLOMetricAbsent: varnish-combined drmrs - https://slo.wikimedia.org/?search=varnish-combined - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[12:50:02] RESOLVED: SLOMetricAbsent: varnish-combined drmrs - https://slo.wikimedia.org/?search=varnish-combined - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[13:00:28] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: eqiad: second frack parent tracking task - https://phabricator.wikimedia.org/T392006#10967614 (Jclark-ctr) @RobH Can we close this task now that a decision has been made?
[13:10:39] doh6002/durum6002 will be unavailable for a second transition step for a while
[13:12:38] no worries, thanks
[13:19:00] FIRING: AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on durum6002:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=drmrs&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted
[13:24:00] RESOLVED: AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on durum6002:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=drmrs&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted
[13:27:30] FIRING: [2x] AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on doh6002:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=drmrs&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted
[13:30:47] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, ops-eqiad: InboundInterfaceErrors reports for fasw2-c1a-eqiad:9804 frmon1002 ge-0/0/11 - https://phabricator.wikimedia.org/T398442 (Jgreen) NEW
[13:32:30] RESOLVED: [2x] AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on doh6002:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=drmrs&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted
[13:48:00] FIRING: AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on doh6002:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=drmrs&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted
[13:53:00] FIRING: [2x] AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on doh6002:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=drmrs&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted
[13:58:00] RESOLVED: [2x] AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on doh6002:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=drmrs&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted
[14:09:09] moritzm: btw we lost metrics in drmrs for a whole hour
[14:11:17] yeah, it's kinda unavoidable with the current 2x2 setup when we need to do reimages
[14:11:57] the drmrs refresh is up in the next year, then we can reinstall it with routed Ganeti to avoid these pains
[15:48:10] Traffic, Liberica, Patch-For-Review: Switch to katran as forwarding plane on non-core DCs - https://phabricator.wikimedia.org/T396561#10968457 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=a07dbafa-c59a-4244-84d2-35dacdc58bb2) set by vgutierrez@cumin1002 for 1 day, 0:00:00 on 1 ho...
[15:57:15] Traffic, Liberica, Patch-For-Review: Switch to katran as forwarding plane on non-core DCs - https://phabricator.wikimedia.org/T396561#10968522 (Vgutierrez)
[19:33:01] Traffic, MediaWiki-Core-AuthManager, MediaWiki-Platform-Team: [WE5.5.3] Decide how to expose session information to infrastructure layers in front of MediaWiki - https://phabricator.wikimedia.org/T394012#10969542 (Tgr) >>! In T394012#10929064, @Tgr wrote: > My first stab at how the roadmap for option...
[21:09:33] Traffic, MobileFrontend, MediaWiki-Platform-Team (Radar), MW-1.45-notes (1.45.0-wmf.8; 2025-07-01), Patch-For-Review: MobileFrontend should declare "X-Subdomain" variance via "Vary" response header - https://phabricator.wikimedia.org/T390929#10970005 (Krinkle)