[00:12:56] Traffic, Patch-For-Review: Upgrade Varnish from 6.0 to 7.1 - https://phabricator.wikimedia.org/T378737#10660046 (BCornwall)
[00:20:00] FIRING: PurgedHighBacklogQueue: Large backlog queue for purged on cp4047:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=ulsfo%20prometheus/ops&var-instance=cp4047 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighBacklogQueue
[00:20:38] ^I imagine this is related to robh's taking the host down for bios upgrades
[00:20:49] indeed, he silenced it :)
[00:21:58] I'll downtime the host
[00:29:09] FIRING: [12x] LVSHighCPU: The host lvs3008:9100 has at least its CPU 0 saturated - https://bit.ly/wmf-lvscpu - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs3008 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighCPU
[00:44:09] RESOLVED: [12x] LVSHighCPU: The host lvs3008:9100 has at least its CPU 0 saturated - https://bit.ly/wmf-lvscpu - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs3008 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighCPU
[00:49:09] FIRING: [12x] LVSHighCPU: The host lvs3008:9100 has at least its CPU 0 saturated - https://bit.ly/wmf-lvscpu - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs3008 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighCPU
[00:51:00] Traffic: Seeing dropped packets on esams - https://phabricator.wikimedia.org/T389575 (AlexisJazz) NEW
[00:52:53] Traffic: 40% packetloss on esams - https://phabricator.wikimedia.org/T389575#10660123 (AlexisJazz)
[00:54:09] RESOLVED: [12x] LVSHighCPU: The host lvs3008:9100 has at least its CPU 0 saturated - https://bit.ly/wmf-lvscpu - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs3008 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighCPU
[00:55:08] Traffic: 40% packet loss on ESAMS - https://phabricator.wikimedia.org/T389575#10660124 (AlexisJazz)
[00:55:09] FIRING: [12x] LVSHighCPU: The host lvs3008:9100 has at least its CPU 0 saturated - https://bit.ly/wmf-lvscpu - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs3008 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighCPU
[00:58:28] Traffic, DC-Ops, ops-esams, SRE: 40% packet loss on ESAMS - https://phabricator.wikimedia.org/T389575#10660127 (AlexisJazz)
[01:05:00] Traffic, DC-Ops, ops-esams, SRE: 40% packet loss on ESAMS - https://phabricator.wikimedia.org/T389575#10660132 (AlexisJazz) If I specifically ping ESAMS through my VPN, again packet loss: ` # ping text-lb.esams.wikimedia.org -c10 PING text-lb.esams.wikimedia.org (185.15.59.224) 56(84) bytes of da...
[01:15:32] Traffic, DC-Ops, ops-esams, SRE: 40% packet loss on ESAMS - https://phabricator.wikimedia.org/T389575#10660148 (Dylsss) I'm also having some pretty severe packet loss to esams. ` ping phabricator.wikimedia.org -c10 PING phabricator.wikimedia.org (185.15.59.224) 56(84) bytes of data. 64 bytes from...
[01:27:39] Traffic, DC-Ops, ops-esams, SRE: 40% packet loss on ESAMS - https://phabricator.wikimedia.org/T389575#10660166 (BCornwall) Open→In progress p:Triage→Unbreak!
[01:34:24] RESOLVED: [12x] LVSHighCPU: The host lvs3008:9100 has at least its CPU 0 saturated - https://bit.ly/wmf-lvscpu - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs3008 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighCPU
[01:50:04] Traffic, DC-Ops, ops-esams, SRE: 40% packet loss on ESAMS - https://phabricator.wikimedia.org/T389575#10660196 (BCornwall) In progress→Resolved a:BCornwall Thank you for the report. We've looked into the issue and now the network is behaving properly again.
[04:48:09] FIRING: LVSHighRX: Excessive RX traffic on lvs1019:9100 (eno1np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs1019 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[04:53:09] RESOLVED: LVSHighRX: Excessive RX traffic on lvs1019:9100 (eno1np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs1019 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[08:35:09] FIRING: LVSHighRX: Excessive RX traffic on lvs1019:9100 (eno1np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs1019 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[08:40:09] RESOLVED: LVSHighRX: Excessive RX traffic on lvs1019:9100 (eno1np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs1019 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[09:02:31] Traffic, SRE, WikimediaDebug, Developer Productivity, Patch-For-Review: Let X-Analytics response header pass through with WikimediaDebug - https://phabricator.wikimedia.org/T305794#10660809 (Krinkle)
[09:02:32] Traffic, Data-Engineering (Q3 2025 January 1st - March 31th), DPE HAProxy Migration: Update HAProxyKafka kafka-timestamp type - https://phabricator.wikimedia.org/T389521#10660812 (Fabfur) We tried (many thanks to @brouberol ) to explicitly set the DLQ topic to `LogAppend` and seems to work (meaning...
[09:07:16] Traffic, Data-Engineering (Q3 2025 January 1st - March 31th), DPE HAProxy Migration: Update HAProxyKafka kafka-timestamp type - https://phabricator.wikimedia.org/T389521#10660833 (Fabfur)
[09:07:19] Traffic, Data-Engineering, Data-Engineering-Radar, MediaWiki-Platform-Team (Radar), Patch-For-Review: New software: haproxykafka - https://phabricator.wikimedia.org/T370668#10660834 (Fabfur)
[09:08:13] Traffic, Data-Engineering (Q3 2025 January 1st - March 31th), DPE HAProxy Migration: Update HAProxyKafka kafka-timestamp type - https://phabricator.wikimedia.org/T389521#10660835 (brouberol) > Problem here is that producer overrides are ignored Ish. When the producer set a timestamp type override t...
[09:13:10] Traffic, Data-Engineering (Q3 2025 January 1st - March 31th), DPE HAProxy Migration: Update HAProxyKafka kafka-timestamp type - https://phabricator.wikimedia.org/T389521#10660855 (brouberol) ` brouberol@kafka-jumbo1014:~$ kafka topics --topic webrequest_text --alter --config message.timestamp.type=L...
[09:18:27] Traffic, Data-Engineering (Q3 2025 January 1st - March 31th), DPE HAProxy Migration: Update HAProxyKafka kafka-timestamp type - https://phabricator.wikimedia.org/T389521#10660861 (brouberol) Traffic on `webrequest_text` is stable. I'm going to apply the config change on `webrequest_upload` now.
[09:19:30] Traffic, Data-Engineering (Q3 2025 January 1st - March 31th), DPE HAProxy Migration: Update HAProxyKafka kafka-timestamp type - https://phabricator.wikimedia.org/T389521#10660865 (brouberol) ` brouberol@kafka-jumbo1014:~$ kafka topics --topic webrequest_upload --alter --config message.timestamp.type...
[09:19:38] Traffic, Data-Engineering (Q3 2025 January 1st - March 31th), DPE HAProxy Migration: Update HAProxyKafka kafka-timestamp type - https://phabricator.wikimedia.org/T389521#10660866 (brouberol)
[09:32:21] Traffic, Data-Engineering (Q3 2025 January 1st - March 31th), DPE HAProxy Migration: Update HAProxyKafka kafka-timestamp type - https://phabricator.wikimedia.org/T389521#10660881 (Fabfur) I'd say that this is now done, will wait confirmation from @JAllemandou to check that on their side all is fine...
[09:35:59] Traffic, Data-Engineering (Q3 2025 January 1st - March 31th), DPE HAProxy Migration: Update HAProxyKafka kafka-timestamp type - https://phabricator.wikimedia.org/T389521#10660889 (JAllemandou) Actually we need to change `webrequest_frontend_text` and `webrequest_frontend_upload` topics, as those are...
[09:39:05] Traffic, Data-Engineering (Q3 2025 January 1st - March 31th), DPE HAProxy Migration: Update HAProxyKafka kafka-timestamp type - https://phabricator.wikimedia.org/T389521#10660893 (brouberol) Sure, I can do that. Should I also remove the topic config override on the `webrequest_errors`, `webrequest_t...
[09:43:03] Traffic, Data-Engineering (Q3 2025 January 1st - March 31th), DPE HAProxy Migration: Update HAProxyKafka kafka-timestamp type - https://phabricator.wikimedia.org/T389521#10660921 (Fabfur) >>! In T389521#10660893, @brouberol wrote: > Sure, I can do that. Should I also remove the topic config override...
[09:44:17] Traffic, Data-Engineering (Q3 2025 January 1st - March 31th), DPE HAProxy Migration: Update HAProxyKafka kafka-timestamp type - https://phabricator.wikimedia.org/T389521#10660937 (brouberol) `
[09:48:54] Traffic, Data-Engineering (Q3 2025 January 1st - March 31th), DPE HAProxy Migration: Update HAProxyKafka kafka-timestamp type - https://phabricator.wikimedia.org/T389521#10660964 (Fabfur) Can confirm that both topics now have messages with `tstype: logappend` Thanks for all the work (again)!
[09:50:12] Traffic, Data-Engineering (Q3 2025 January 1st - March 31th), DPE HAProxy Migration: Update HAProxyKafka kafka-timestamp type - https://phabricator.wikimedia.org/T389521#10660966 (JAllemandou) Thanks folks!
[09:56:25] netops, Infrastructure-Foundations, SRE, Data-Platform-SRE (2025.03.22 - 2025.04.11): Add QoS markings to profile Hadoop/HDFS analytics traffic - https://phabricator.wikimedia.org/T381389#10660991 (Gehel)
[11:24:07] netops, Infrastructure-Foundations, SRE, Data-Platform-SRE (2025.03.22 - 2025.04.11): Add QoS markings to profile Hadoop/HDFS analytics traffic - https://phabricator.wikimedia.org/T381389#10661391 (cmooney) >>! In T381389#10583616, @xcollazo wrote: > @cmooney, should we move forward with this pat...
[11:32:58] Traffic, Data-Engineering (Q3 2025 January 1st - March 31th), DPE HAProxy Migration: Update HAProxyKafka kafka-timestamp type - https://phabricator.wikimedia.org/T389521#10661402 (Fabfur) Open→Resolved I think this can be closed now and leave it as reference for this Kafka peculiarity, i...
[14:39:33] Traffic, Patch-For-Review: Private TLS material (TLS keys) should be stored in volatile storage only - https://phabricator.wikimedia.org/T384227#10662162 (Fabfur)
[20:40:39] Hi, I'm wondering if varnish -> ats is unencrypted. I ask because I'm looking into creating a varnish -> ats setup. We only run varnish across two servers and don't route requests to the server that holds the cached object, so content gets cached twice.
[20:46:54] there is no encryption anywhere within those bits.
[20:47:19] there is TLS when ATS talks to a backend server
[20:47:33] but not between varnish and ATS
[20:53:13] Once single-backend is enabled for all dcs it wouldn't make sense to have encryption since it'll pass through the same server anyway
[21:08:54] sukhe brett: ah thanks!
[21:18:23] Hello Wikimedia Traffic. I'm hoping I can find someone here to purge the following URL from the CDN cache: https://spiderpig.wikimedia.org/api/whoami
[21:28:26] dancy: see -ops, done
[21:30:17] Many thanks!
[21:34:09] Traffic, Patch-For-Review: Upgrade Varnish from 6.0 to 7.1 - https://phabricator.wikimedia.org/T378737#10664216 (BCornwall)
[22:53:19] Traffic, Data-Engineering: GeoDNS: Pipeline from event.development_network_probe to operations/dns.git - https://phabricator.wikimedia.org/T380626#10664431 (CDobbins) A concern that's been brought up is that some of the results from the Probenet data are unexpected (e.g., Paraguay [[ https://gerrit.wikim...
[23:56:03] We host mobile UI (MobileFrontend) on the same domain as the desktop site and use https://www.mediawiki.org/wiki/Extension:MobileFrontend/Configuring_browser_auto-detection#Detection_using_Varnish:_same_domain_for_desktop/mobile_site. Does anyone know how you'd do that in ATS? Or is doing it in Varnish adequate?