[01:50:24] Traffic, MW-on-K8s, SRE, serviceops, and 2 others: Serve production traffic via Kubernetes - https://phabricator.wikimedia.org/T290536 (Quiddity)
[02:25:18] Traffic, MW-on-K8s, SRE, serviceops, Patch-For-Review: Find a sensible way to direct traffic to mw-on-k8s - https://phabricator.wikimedia.org/T331318 (Krinkle)
[06:51:16] Traffic, SRE-Sprint-Week-Sustainability-March2023, envoy, serviceops, Sustainability (Incident Followup): Raw "upstream connect error or disconnect/reset before headers. reset reason: overflow" error message shown to users during outage - https://phabricator.wikimedia.org/T287983 (Joe) Ope...
[07:48:00] Traffic, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (Marostegui)
[08:08:50] netops, Infrastructure-Foundations, SRE, cloud-services-team (FY2022/2023-Q3): Configure cloudsw1-b1-codfw and migrate cloud hosts in codfw B1 to it - https://phabricator.wikimedia.org/T327919 (cmooney) @papaul looks good to me. I can do them any day this week except today (Tuesday), so whenever...
[08:15:30] fabfur: nice you see you around ;)
[08:20:57] thanks :)
[08:38:02] netops, Infrastructure-Foundations, SRE, cloud-services-team (FY2022/2023-Q3): Configure cloudsw1-b1-codfw and migrate cloud hosts in codfw B1 to it - https://phabricator.wikimedia.org/T327919 (aborrero) >>! In T327919#8732605, @cmooney wrote: > > @aborrero are we ok to proceed with theis second...
[08:55:35] netops, Infrastructure-Foundations, SRE, netbox, Patch-For-Review: Represent sub-interface and bridge device assocations in Netbox - https://phabricator.wikimedia.org/T296832 (cmooney) >>! In T296832#8729881, @Volans wrote: > Looks ok to me too, I'm no sure about all the details involved if w...
[08:58:09] netops, Analytics-Radar, Infrastructure-Foundations: Errors for ifup@ens5.service after rebooting Ganeti VMs - https://phabricator.wikimedia.org/T273026 (aborrero) Resolved→Open This happened to me today in a couple of hardware servers, see {T333281} and {T333282}.
[09:03:19] netops, Infrastructure-Foundations, SRE, netbox, Patch-For-Review: Represent sub-interface and bridge device assocations in Netbox - https://phabricator.wikimedia.org/T296832 (cmooney) >>! In T296832#8729881, @Volans wrote: > Looks ok to me too, I'm no sure about all the details involved if w...
[09:03:38] Traffic, ops-codfw: cp2035 IPMI and management console issues - https://phabricator.wikimedia.org/T333312 (Vgutierrez)
[09:04:28] Traffic, ops-codfw: cp2035 IPMI and management console issues - https://phabricator.wikimedia.org/T333312 (Vgutierrez) p:Triage→Medium
[09:09:03] netops, Analytics-Radar, Infrastructure-Foundations: Errors for ifup@ens5.service after rebooting Ganeti VMs - https://phabricator.wikimedia.org/T273026 (cmooney) @aborrero do you have more details on what happened with those? I'm not sure the symptoms are the same. In the Ganeti case the hyperviso...
[09:11:45] netops, Analytics-Radar, Infrastructure-Foundations: Errors for ifup@ens5.service after rebooting Ganeti VMs - https://phabricator.wikimedia.org/T273026 (aborrero) >>! In T273026#8732900, @cmooney wrote: > @aborrero do you have more details on what happened with those? > > I'm not sure the symptoms...
[09:26:39] netops, Infrastructure-Foundations, SRE: Homer unable to commit config to cloudsw1-b1-codfw (QFX5120 21.4R3.16) - https://phabricator.wikimedia.org/T333316 (cmooney) p:Triage→Medium
[09:26:53] netops, Infrastructure-Foundations, SRE: Homer unable to commit config to cloudsw1-b1-codfw (QFX5120 21.4R3.16) - https://phabricator.wikimedia.org/T333316 (cmooney)
[09:27:00] netops, Infrastructure-Foundations, SRE, cloud-services-team (FY2022/2023-Q3): Configure cloudsw1-b1-codfw and migrate cloud hosts in codfw B1 to it - https://phabricator.wikimedia.org/T327919 (cmooney)
[09:32:31] (SystemdUnitFailed) firing: ipmiseld.service Failed on cp2035:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status?orgId=1&forceLogin&editPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:35:04] netops, Infrastructure-Foundations, SRE: Homer unable to commit config to cloudsw1-b1-codfw (QFX5120 21.4R3.16) - https://phabricator.wikimedia.org/T333316 (cmooney) Logs from switch at during operation: `lines=20 Mar 28 09:28:50 cloudsw1-b1-codfw sshd[11342]: WARNING: could not open /etc/ssh/moduli...
[09:36:18] netops, Infrastructure-Foundations, SRE: Homer unable to commit config to cloudsw1-b1-codfw (QFX5120 21.4R3.16) - https://phabricator.wikimedia.org/T333316 (cmooney)
[09:45:17] Traffic, SRE, ops-codfw: cp2035 IPMI and management console issues - https://phabricator.wikimedia.org/T333312 (Vgutierrez) Unable to reset the management card: ` root@cp2035:~# bmc-device --cold-reset; echo $?
ipmi_cmd_cold_reset: driver timeout 1 `
[09:46:14] Traffic, SRE, ops-codfw: cp2035 IPMI and management console issues - https://phabricator.wikimedia.org/T333312 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=07b8190f-1479-43ea-ba98-63f852f30e9e) set by vgutierrez@cumin1001 for 2 days, 0:00:00 on 1 host(s) and their services with r...
[09:51:58] Traffic, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (BTullis)
[10:32:36] netops, Analytics-Radar, Infrastructure-Foundations: Errors for ifup@ens5.service after rebooting Ganeti VMs - https://phabricator.wikimedia.org/T273026 (cmooney) >>! In T273026#8732916, @aborrero wrote: > I don't know exactly what happened. > > My hunch is that the systemd service has been in faile...
[10:42:14] netops, Infrastructure-Foundations, Prod-Kubernetes, SRE, and 2 others: Agree strategy for Kubernetes BGP peering to top-of-rack switches - https://phabricator.wikimedia.org/T306649 (cmooney) @ayounsi thanks for the response. Overall I've no objection so let's proceed. I agree in terms of addin...
[10:46:14] netops, Infrastructure-Foundations, SRE, observability: Investigate Junos Prometheus exporter - https://phabricator.wikimedia.org/T333210 (cmooney) Thanks for the task, does indeed look like a useful tool that could simplify adding additional monitoring without having to modify the LibreNMS codeb...
[10:46:31] netops, Infrastructure-Foundations, SRE, observability: Investigate Junos Prometheus exporter - https://phabricator.wikimedia.org/T333210 (cmooney) a:cmooney
[11:28:14] Traffic, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (BTullis)
[11:30:36] Traffic, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (fnegri) I "depooled" dbproxy1019 by following the procedure at https://wikitech.wikimedia.org/w/index.php?title=Portal:Data_Services/Admin/Runbooks/Dep...
[12:29:17] Traffic, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (ayounsi)
[12:41:02] netops, Infrastructure-Foundations, SRE, observability: Investigate Junos Prometheus exporter - https://phabricator.wikimedia.org/T333210 (fgiunchedi) I took a quick look at the exporter and looks good to me too! Also +1 on the general testing/deployment plan re: SSH from a quick read through th...
[12:50:47] Traffic, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (ssingh)
[12:56:32] vgutierrez: I added you as reviewer on https://gerrit.wikimedia.org/r/c/operations/puppet/+/900700 if you have some time
[12:58:31] Traffic, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (ops-monitoring-bot) akosiaris@cumin1001 - Cookbook cookbooks.sre.discovery.datacenter depool all active/active services in eqiad: eqiad row B switches...
[13:02:40] Traffic, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (BTullis)
[13:18:02] Traffic, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (ops-monitoring-bot) akosiaris@cumin1001 - Cookbook cookbooks.sre.discovery.datacenter depool all active/active services in eqiad: eqiad row B switches...
[13:22:56] Traffic, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (ssingh)
[13:35:57] Traffic, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (BTullis)
[13:41:11] Traffic, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (Jelto)
[13:44:25] Traffic, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (herron)
[13:46:13] Traffic, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (Jelto)
[13:49:47] Traffic, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=4c1e12e1-9d5e-4447-880a-f0ec09133a64) set by ayounsi@cumin1001 for 2:00:00 on 249 host...
[13:54:09] Traffic, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (jbond)
[13:55:49] Traffic, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (MatthewVernon)
[13:56:37] Traffic, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (BTullis)
[13:57:16] Traffic, SRE, Upstream: HAProxy 2.6.10 crashing in the text cluster - https://phabricator.wikimedia.org/T332796 (Vgutierrez) 2.6.12 has been released https://www.mail-archive.com/haproxy@formilux.org/msg43371.html including the patch that we've been testing in text@ulsfo
[13:59:37] Traffic, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (fgiunchedi)
[14:02:04] Traffic, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (jbond)
[14:26:59] Traffic, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (fgiunchedi)
[14:32:54] Traffic, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (ops-monitoring-bot) akosiaris@cumin1001 - Cookbook cookbooks.sre.discovery.datacenter pool all active/active services in eqiad: eqiad row B switches up...
[14:36:52] netops, Infrastructure-Foundations, SRE, cloud-services-team (FY2022/2023-Q3): Configure cloudsw1-b1-codfw and migrate cloud hosts in codfw B1 to it - https://phabricator.wikimedia.org/T327919 (Papaul) @cmooney can we do this on Thursday ? Can we also do the other batches(3-4) on the same day?
[14:48:05] Traffic, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (ops-monitoring-bot) akosiaris@cumin1001 - Cookbook cookbooks.sre.discovery.datacenter pool all active/active services in eqiad: eqiad row B switches up...
[14:49:59] netops, Infrastructure-Foundations, SRE: eqiad/codfw virtual-chassis upgrades - https://phabricator.wikimedia.org/T327248 (ayounsi)
[14:50:16] Traffic, DBA, Data-Engineering, Infrastructure-Foundations, and 9 others: eqiad row C switches upgrade - https://phabricator.wikimedia.org/T331882 (ayounsi)
[14:50:32] Traffic, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (ayounsi)
[14:50:42] netops, Infrastructure-Foundations, SRE: eqiad/codfw virtual-chassis upgrades - https://phabricator.wikimedia.org/T327248 (ayounsi)
[14:57:54] bblack, vgutierrez, thanks for the review on https://gerrit.wikimedia.org/r/c/operations/puppet/+/900700/30..31 ! I think the last comment has been addressed by Jameel
[15:00:01] Traffic, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (ayounsi) The switch upgrade itself went smoothly as well, like the other rows. One issue was that gerrit1001 was missing from the list. This is becaus...
[15:23:28] netops, Infrastructure-Foundations, Observability-Alerting: Bonded interface setup for alert hosts - https://phabricator.wikimedia.org/T333371 (herron) p:Triage→Medium
[15:50:04] (PyBalBGPUnstable) firing: PyBal BGP sessions on instance lvs1018 are failing - TODO - https://grafana.wikimedia.org/d/000000488/pybal-bgp?var-datasource=eqiad%20prometheus/ops&var-server=lvs1018 - https://alerts.wikimedia.org/?q=alertname%3DPyBalBGPUnstable
[15:51:10] Traffic, DBA, Data-Engineering, Infrastructure-Foundations, and 9 others: eqiad row C switches upgrade - https://phabricator.wikimedia.org/T331882 (ayounsi)
[15:53:44] (VarnishPrometheusExporterDown) firing: Varnish Exporter on instance cp1082:9331 is unreachable - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/000000304/varnish-dc-stats?viewPanel=17 - https://alerts.wikimedia.org/?q=alertname%3DVarnishPrometheusExporterDown
[15:54:57] Traffic, Data-Engineering, Data-Persistence, Discovery-Search, and 6 others: eqiad row D switches upgrade - https://phabricator.wikimedia.org/T333377 (ayounsi)
[15:55:24] netops, Infrastructure-Foundations, SRE: eqiad/codfw virtual-chassis upgrades - https://phabricator.wikimedia.org/T327248 (ayounsi)
[15:55:31] Traffic, Data-Engineering, Data-Persistence, Discovery-Search, and 6 others: eqiad row D switches upgrade - https://phabricator.wikimedia.org/T333377 (ayounsi)
[15:55:40] netops, Infrastructure-Foundations, SRE: eqiad/codfw virtual-chassis upgrades - https://phabricator.wikimedia.org/T327248 (ayounsi)
[16:02:26] hmm seems like cp1082 didn't come back up
[16:02:26] netops, Infrastructure-Foundations, Observability-Alerting, SRE: Bonded interface setup for alert hosts - https://phabricator.wikimedia.org/T333371 (ayounsi) See guidelines on https://wikitech.wikimedia.org/wiki/Wikimedia_network_guidelines#Servers_uplinks but it's usually not worth it.
We only...
[16:04:37] was it intended to just be a network port outage, or was it powered off or something?
[16:04:45] sorry I haven't been following closely
[16:05:44] no it wasn't powered off, just network
[16:05:48] I jumped on the console, it's still alive
[16:05:59] it has some repeated errors coming out of dmesg about the nic though
[16:06:10] [4622313.647703] bnxt_en 0000:3b:00.0 enp59s0f0np0: TX timeout detected, starting reset task!
[16:06:13] ^ every ~6 seconds
[16:06:30] werid
[16:06:37] weird even
[16:06:46] I'll try software reset of iface from console, see what happens
[16:07:14] ok thanks!
[16:07:31] I see nothing in getsel so definitely not some other hw failure (and the host is up too so)
[16:08:29] has some other weird errors too, and soft down->up didn't do anything useful
[16:08:32] Mar 28 16:07:44 cp1082 kernel: [4622420.663103] bnxt_en 0000:3b:00.0 enp50: [0]: rx{fw_ring: 1 prod: 22} rx_agg{fw_ring: 13 agg_prod: 1ffc sw_agg_ffc}
[16:08:47] we may have just triggered a driver bug with the NIC link blip somehow
[16:08:57] can I try a reboot? depooled?
[16:09:01] it's depooled yp
[16:09:15] the NIC firmware matches the other cp hosts that were affected and they all have the same version
[16:09:21] no issues on them though
[16:15:10] bblack: seems like the good ol' reboot worked :)
[16:15:31] going to reschedule the checks, let it rest for a bit and try to see what happened, and will pool again (you can leave that to me)
[16:17:12] ok cool
[16:17:54] we use a lot of funky kernel-level customization of nic stuff on these hosts
[16:18:21] I'm not shocked that once in a while, in some odd situation with a switch reboot, we sometimes trigger some driver bug that's too rare for most to care about
[16:18:43] as long as it's rare, we can let it go :)
[16:19:10] (VarnishPrometheusExporterDown) resolved: Varnish Exporter on instance cp1082:9331 is unreachable - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/000000304/varnish-dc-stats?viewPanel=17 - https://alerts.wikimedia.org/?q=alertname%3DVarnishPrometheusExporterDown
[16:19:43] Traffic, DBA, Data-Engineering, Infrastructure-Foundations, and 9 others: eqiad row C switches upgrade - https://phabricator.wikimedia.org/T331882 (ayounsi)
[16:27:25] :P
[16:36:39] Traffic, DBA, Data-Engineering, Infrastructure-Foundations, and 9 others: eqiad row C switches upgrade - https://phabricator.wikimedia.org/T331882 (MatthewVernon)
[16:37:01] Traffic, DBA, Data-Engineering, Infrastructure-Foundations, and 9 others: eqiad row C switches upgrade - https://phabricator.wikimedia.org/T331882 (MatthewVernon)
[17:10:39] netops, Infrastructure-Foundations, Observability-Alerting, SRE: Bonded interface setup for alert hosts - https://phabricator.wikimedia.org/T333371 (herron) Open→Declined Thanks, fwiw I added a talk topic on wiki in hopes that link redundancy can be explored the next time switch upgrades/...
[17:57:50] <_joe_> hi, can i be temporarily be invited to your private channel?
[18:38:56] netops, Infrastructure-Foundations, Observability-Alerting, SRE: Bonded interface setup for alert hosts - https://phabricator.wikimedia.org/T333371 (cmooney) Yeah I tend to agree, with one top-of-rack switch two connections only protects against link failure (as they both land on the same switch)...
[18:45:34] Domains, Traffic, SRE: Register wiki(m|p)edia.ro - https://phabricator.wikimedia.org/T222080 (BCornwall)
[18:46:21] Domains, Traffic, SRE: Register wiki(m|p)edia.ro - https://phabricator.wikimedia.org/T222080 (BCornwall) Open→Stalled p:Medium→Low
[19:13:51] netops, Infrastructure-Foundations, Observability-Alerting, SRE: Bonded interface setup for alert hosts - https://phabricator.wikimedia.org/T333371 (herron) >>! In T333371#8736041, @cmooney wrote: > In the case of a server failure do the alert hosts fail over? Not automatically at the present...
[19:29:27] netops, Infrastructure-Foundations, SRE, cloud-services-team (FY2022/2023-Q3): Configure cloudsw1-b1-codfw and migrate cloud hosts in codfw B1 to it - https://phabricator.wikimedia.org/T327919 (cmooney) >>! In T327919#8735024, @Papaul wrote: > @cmooney can we do this on Thursday ? Can we also do...
[19:49:51] Traffic, DBA, Data-Engineering, Infrastructure-Foundations, and 9 others: eqiad row C switches upgrade - https://phabricator.wikimedia.org/T331882 (Eevans)
[19:50:00] (PyBalBGPUnstable) firing: PyBal BGP sessions on instance lvs1018 are failing - TODO - https://grafana.wikimedia.org/d/000000488/pybal-bgp?var-datasource=eqiad%20prometheus/ops&var-server=lvs1018 - https://alerts.wikimedia.org/?q=alertname%3DPyBalBGPUnstable
[19:51:41] Traffic, Data-Engineering, Data-Persistence, Discovery-Search, and 7 others: eqiad row D switches upgrade - https://phabricator.wikimedia.org/T333377 (Eevans)
[19:54:21] Traffic, DNS, SRE, Wikimedia-Language-setup, Patch-For-Review: Chinese subdomain redirect improvements - https://phabricator.wikimedia.org/T86915 (BCornwall)
[19:54:44] Traffic, DNS, SRE, Wikimedia-Language-setup, Patch-For-Review: Chinese subdomain redirect improvements - https://phabricator.wikimedia.org/T86915 (BCornwall) I've updated the description to accurately reflect the current issues. Note that per T230382 there are no longer minnan/zh-cfr aliases.
[20:08:42] Traffic, DNS, SRE, Wikimedia-Language-setup, Patch-For-Review: zh-min-nan.wikinews.org redirects to unprefixed incubator - https://phabricator.wikimedia.org/T86915 (BCornwall)
[20:09:08] Traffic, DNS, SRE, Wikimedia-Language-setup, Patch-For-Review: zh-min-nan.wikinews.org redirects to unprefixed incubator - https://phabricator.wikimedia.org/T86915 (BCornwall) Further trimmed some stuff as T173966 is tracking the redirects.
[20:31:43] Wikimedia-Apache-configuration, DNS, SRE, Traffic-Icebox: Like nan.wikipedia.org, redirect other nan.*.org to the proper zh-min-nan.*.org domains - https://phabricator.wikimedia.org/T173966 (BCornwall) Open→Resolved a:BCornwall Thank you for the patch and for your patience, @Fomafix!...
[20:46:57] Traffic, Commons, SRE: Specific PNG thumbnail of SVG file is outdated / stuck (European caching cluster) - https://phabricator.wikimedia.org/T333042 (Lionel_Scheepmans) Hi folks. I'm in front of a very strange phenomenon probably linked to this bug, and this time it concerns a PDF File. So. Go to...
[20:53:39] Traffic, DBA, Data Pipelines, Data-Engineering-Planning, and 10 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (colewhite)
[20:56:20] Traffic, Data-Engineering, Data-Persistence, Discovery-Search, and 7 others: eqiad row D switches upgrade - https://phabricator.wikimedia.org/T333377 (colewhite)
[21:05:33] Traffic, Infrastructure-Foundations, SRE, Patch-For-Review: Create and deploy per-CDN-site DNS domains - https://phabricator.wikimedia.org/T332025 (BCornwall) Open→Resolved a:BCornwall Thanks @JameelKaisar for the patch! Looks like this is resolved. If this was in error, please feel f...
[21:05:35] Traffic, Infrastructure-Foundations, SRE: GeoIP mapping experiments - https://phabricator.wikimedia.org/T332024 (BCornwall)
[21:07:08] Traffic, Infrastructure-Foundations, SRE, Patch-For-Review: Create and deploy per-CDN-site DNS domains - https://phabricator.wikimedia.org/T332025 (BCornwall) a:BCornwall→JameelKaisar
[21:07:59] Traffic, Infrastructure-Foundations, SRE: GeoIP mapping experiments - https://phabricator.wikimedia.org/T332024 (BCornwall) Hi, @CDanis. Thanks for creating this ticket. Would you mind expanding on the nature of the report? Thanks!
[21:11:00] Traffic, Observability-Metrics, Patch-For-Review: Add prometheus-https load balancer - https://phabricator.wikimedia.org/T326657 (BCornwall) Open→In progress p:Triage→Low a:herron
[23:50:00] (PyBalBGPUnstable) firing: PyBal BGP sessions on instance lvs1018 are failing - TODO - https://grafana.wikimedia.org/d/000000488/pybal-bgp?var-datasource=eqiad%20prometheus/ops&var-server=lvs1018 - https://alerts.wikimedia.org/?q=alertname%3DPyBalBGPUnstable
[23:57:37] Traffic, SRE, ops-codfw: cp2035 IPMI and management console issues - https://phabricator.wikimedia.org/T333312 (Jhancock.wm) Open→Resolved a:Jhancock.wm confirmed with Sukhe that it was depoooled. worked remotely with Papaul to update the idrac and the bios.