[07:23:18] Traffic, Observability-Metrics, Patch-For-Review: Add prometheus-https load balancer - https://phabricator.wikimedia.org/T326657 (fgiunchedi) I don't think we're done yet, trafficserver is still using hostnames and not `prometheus.svc` records
[09:06:36] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154 (Fabfur)
[09:07:12] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154 (Fabfur) Started working on `purged` and `prometheus-rdkafka-exporter`
[09:08:23] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154 (Fabfur)
[09:55:51] Traffic, SRE: Recompile fifo-log-demux with hardening options - https://phabricator.wikimedia.org/T342900 (Fabfur) Same can be done on the `prometheus-rdkafka-exporter` package (https://gerrit.wikimedia.org/r/admin/repos/operations/software/prometheus-rdkafka-exporter,general)
[10:07:53] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154 (Fabfur) Both `purged` and `prometheus-rdkafka-exporter` are ready for review, and eventually inclusion in wmf repositories. Considering that the `purged` package builds in Bookworm with `pr...
[13:45:44] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154 (Fabfur)
[14:31:24] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154 (Fabfur) The following packages are ready to be imported into bookworm-wikimedia: * fifo-log-demux * file-read-backwards * prometheus-rdkafka-exporter * prometheus-varnishkafka-exporter See...
[14:59:20] Traffic, observability: HAProxy metrics go down on config reload - https://phabricator.wikimedia.org/T343000 (Vgutierrez)
[14:59:27] Traffic, observability: HAProxy metrics go down on config reload - https://phabricator.wikimedia.org/T343000 (Vgutierrez) p:Triage→Medium
[15:03:55] Traffic, observability: HAProxy metrics go down on config reload - https://phabricator.wikimedia.org/T343000 (Vgutierrez) I'm wondering if reducing the hard-stop-after window from 5m to something smaller than the scrape time from prometheus (once a minute) could get rid of this. What are your thoughts @fg...
[15:14:06] Traffic, observability: HAProxy metrics go down on config reload - https://phabricator.wikimedia.org/T343000 (fgiunchedi) I'm not familiar with haproxy reload architecture, however your theory seems sound to me @Vgutierrez!
[15:58:47] Traffic, Performance-Team, SRE, SRE-swift-storage, Patch-For-Review: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (MatthewVernon) Here are a couple of rough graphs - frequency distribution of thumbnails (served by swift on 24 July, and all thum...
[17:41:12] netops, Infrastructure-Foundations, SRE, SRE-tools, Patch-For-Review: Setup zero touch provisioning (ZTP) for network devices - https://phabricator.wikimedia.org/T336485 (cmooney) I've done some work on this to allow for serving the JunOS image as part of the process. In the initial commits...
[18:57:12] (LVSHighCPU) firing: (2) The host lvs3005:9100 has at least its CPU 1 saturated - https://bit.ly/wmf-lvscpu - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs3005 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighCPU
[18:57:43] oh
[18:57:46] hmm
[19:02:12] (LVSHighCPU) resolved: (8) The host lvs3005:9100 has at least its CPU 1 saturated - https://bit.ly/wmf-lvscpu - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs3005 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighCPU