[08:05:09] FIRING: LVSHighRX: Excessive RX traffic on lvs1019:9100 (eno1np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs1019 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[08:15:09] RESOLVED: LVSHighRX: Excessive RX traffic on lvs1019:9100 (eno1np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs1019 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[08:49:57] Traffic, DPE HAProxy Migration: [HAProxy migration] Some 200 requests in VK are logged as 400 in HAProxy - https://phabricator.wikimedia.org/T387451#10677308 (JAllemandou) a:JAllemandou→None This task is a reference for the traffic team. Removing myself as assignee, and pinging @Fabfur :)
[08:51:16] Traffic, DPE HAProxy Migration: [HAProxy migration] Some 200 requests in VK are logged as 400 in HAProxy - https://phabricator.wikimedia.org/T387451#10677311 (Fabfur) a:Fabfur
[09:07:00] Traffic, Data-Engineering (Q3 2025 January 1st - March 31th): Migrate Benthos `webrequest_sampled_live` to feed from HAProxy data - https://phabricator.wikimedia.org/T390029 (JAllemandou) NEW
[09:07:51] netops, Infrastructure-Foundations, Observability-Alerting, Patch-For-Review: Migrate network icinga alerts to gNMI/prometheus - https://phabricator.wikimedia.org/T388641#10677359 (ayounsi)
[09:07:53] netops, Infrastructure-Foundations, SRE: Upgrade core routers to Junos 23.4R2 - https://phabricator.wikimedia.org/T364092#10677360 (ayounsi)
[09:17:47] netops, Infrastructure-Foundations, SRE: WMCS Eqiad: Enable IPv6 in cloud vrf on switches - https://phabricator.wikimedia.org/T389958#10677390 (aborrero) >>! In T389958#10674585, @cmooney wrote: > @aborrero as discussed we can possibly arrange a window for Thurs Mar 27th to carry out the remaining st...
[09:23:48] netops, Infrastructure-Foundations, Observability-Alerting, Patch-For-Review: Migrate network icinga alerts to gNMI/prometheus - https://phabricator.wikimedia.org/T388641#10677414 (ayounsi) At least the BFD metrics are not exposed in Junos 21.2, possibly only starting in 22.3 (https://apps.junipe...
[09:25:12] netops, Infrastructure-Foundations, Observability-Alerting, Patch-For-Review: Migrate network icinga alerts to gNMI/prometheus - https://phabricator.wikimedia.org/T388641#10677417 (ayounsi)
[09:25:43] netops, Infrastructure-Foundations, SRE, IPv6: WMCS Eqiad: Enable IPv6 in cloud vrf on switches - https://phabricator.wikimedia.org/T389958#10677419 (taavi)
[09:27:02] netops, Infrastructure-Foundations, SRE, IPv6: WMCS Eqiad: Enable IPv6 in cloud vrf on switches - https://phabricator.wikimedia.org/T389958#10677421 (taavi)
[09:35:33] netops, Infrastructure-Foundations, Observability-Alerting, Patch-For-Review: Migrate network icinga alerts to gNMI/prometheus - https://phabricator.wikimedia.org/T388641#10677444 (ayounsi)
[09:45:22] Traffic, Data-Engineering (Q3 2025 January 1st - March 31th): Migrate Benthos `webrequest_sampled_live` to feed from HAProxy data - https://phabricator.wikimedia.org/T390029#10677468 (JAllemandou)
[09:50:38] netops, Infrastructure-Foundations, Observability-Alerting, Patch-For-Review: Migrate network icinga alerts to gNMI/prometheus - https://phabricator.wikimedia.org/T388641#10677498 (ayounsi) For alarms: https://apps.juniper.net/telemetry-explorer/select-software?software=Junos%20OS&release=21.2R3&...
[10:00:14] Traffic, Data-Engineering (Q3 2025 January 1st - March 31th): Migrate Benthos `webrequest_sampled_live` to feed from HAProxy data - https://phabricator.wikimedia.org/T390029#10677545 (elukey) Thanks a lot for the heads up! I am checking the Benthos [[ https://gerrit.wikimedia.org/r/plugins/gitiles/opera...
[10:04:57] Traffic, Data-Engineering (Q3 2025 January 1st - March 31th): Migrate Benthos `webrequest_sampled_live` to feed from HAProxy data - https://phabricator.wikimedia.org/T390029#10677578 (JAllemandou) Nice catch @elukey :) This behavior (no `dt`) doesn't exist anymore with HAProxy. However there still are ba...
[10:09:32] netops, Infrastructure-Foundations, SRE, IPv6: WMCS Eqiad: Enable IPv6 in cloud vrf on switches - https://phabricator.wikimedia.org/T389958#10677602 (aborrero) announcement: https://lists.wikimedia.org/hyperkitty/list/cloud-announce@lists.wikimedia.org/thread/LX6KDZMQHEL3NZ3DMWQERI2O3YVSDDKM/
[11:00:53] netops, Infrastructure-Foundations, Observability-Alerting, Patch-For-Review: Migrate network icinga alerts to gNMI/prometheus - https://phabricator.wikimedia.org/T388641#10677770 (cmooney) >>! In T388641#10677498, @ayounsi wrote: > For alarms: https://apps.juniper.net/telemetry-explorer/select-s...
[11:10:27] netops, Infrastructure-Foundations, SRE, IPv6: WMCS Eqiad: Enable IPv6 in cloud vrf on switches - https://phabricator.wikimedia.org/T389958#10677790 (cmooney)
[11:12:38] Is anyone aware of any changes to varnishkafka lately that might be causing the processes to quit on reload, just after midnight? Ref: T390031
[11:12:39] T390031: Multiple varnishkafka service failures - https://phabricator.wikimedia.org/T390031
[11:12:54] btullis yes, bre.tt is already working on it
[11:13:11] https://gitlab.wikimedia.org/repos/sre/varnishkafka/-/merge_requests/5
[11:13:25] fabfur: Thanks.
[11:13:30] 👍
[11:14:41] fabfur: OK, many thanks. I will close my ticket as a duplicate. Has this been discussed with anyone in Data Engineering, to your knowledge?
[11:14:57] not yet, I think
[11:16:18] OK, thanks. I started a Slack thread. https://wikimedia.slack.com/archives/C05RHK7PS6Q/p1742981592455269 - We might need to file an incident report to account for the data loss.
[11:17:12] netops, Infrastructure-Foundations, Observability-Alerting, Patch-For-Review: Migrate network icinga alerts to gNMI/prometheus - https://phabricator.wikimedia.org/T388641#10677814 (fgiunchedi) In this case we could indeed alert on `gnmi_system_alarms_alarm_state_id` and Prometheus will issue aler...
[11:37:52] Traffic, Patch-For-Review: varnishkafka 1.1.0-5 exits on SIGHUP - https://phabricator.wikimedia.org/T389978#10677881 (BTullis)
[11:38:25] Traffic, Data-Engineering, Data-Platform-SRE (2025.03.22 - 2025.04.11), Patch-For-Review: varnishkafka 1.1.0-5 exits on SIGHUP - https://phabricator.wikimedia.org/T389978#10677885 (BTullis)
[11:53:20] Wikimedia-Apache-configuration, Discovery-Search, serviceops, SRE, and 3 others: www.wikipedia.org: prefilling the search box with the "search" URL parameter does not work - https://phabricator.wikimedia.org/T318285#10677935 (Clement_Goubert) The search box now populates correctly. Question left...
[11:53:38] Wikimedia-Apache-configuration, Discovery-Search, serviceops, SRE, and 3 others: www.wikipedia.org: prefilling the search box with the "search" URL parameter does not work - https://phabricator.wikimedia.org/T318285#10677936 (Clement_Goubert) Open→Resolved
[12:02:25] netops, Infrastructure-Foundations, SRE: Classify ceph traffic flows for network prioritization - https://phabricator.wikimedia.org/T390044 (cmooney) NEW p:Triage→Low
[12:03:30] netops, Infrastructure-Foundations, SRE: Classify ceph traffic flows for network prioritization - https://phabricator.wikimedia.org/T390044#10677969 (cmooney)
[12:56:37] netops, Infrastructure-Foundations: Enable gNMI on SRX devices and fasw - https://phabricator.wikimedia.org/T390052 (ayounsi) NEW
[12:56:51] netops, Infrastructure-Foundations: Enable gNMI on SRX devices and fasw - https://phabricator.wikimedia.org/T390052#10678232 (ayounsi)
[12:56:54] netops, Infrastructure-Foundations, Observability-Alerting, Patch-For-Review: Migrate network icinga alerts to gNMI/prometheus - https://phabricator.wikimedia.org/T388641#10678231 (ayounsi)
[13:46:36] netops, Infrastructure-Foundations, Observability-Alerting, Patch-For-Review: Migrate network icinga alerts to gNMI/prometheus - https://phabricator.wikimedia.org/T388641#10678444 (ayounsi) @cmooney another question is that if the service is not present on the device, for example BFD where BFD is...
[13:47:33] netops, Infrastructure-Foundations, Observability-Alerting, Patch-For-Review: Migrate network icinga alerts to gNMI/prometheus - https://phabricator.wikimedia.org/T388641#10678447 (ayounsi)
[13:48:10] netops, Infrastructure-Foundations, Observability-Alerting, Patch-For-Review: Migrate network icinga alerts to gNMI/prometheus - https://phabricator.wikimedia.org/T388641#10678463 (ayounsi)
[15:03:33] netops, Infrastructure-Foundations, Data-Engineering (Q3 2025 January 1st - March 31th): Update `netflow` retention strategy in Druid (too much data) - https://phabricator.wikimedia.org/T387839#10679023 (BTullis) Please could someone expedite this, if possible? We still have some alerts that are flag...
[15:37:46] netops, Infrastructure-Foundations, Data-Engineering (Q3 2025 January 1st - March 31th): Update `netflow` retention strategy in Druid (too much data) - https://phabricator.wikimedia.org/T387839#10679226 (JAllemandou) a:JAllemandou
[16:10:31] Traffic, Data-Engineering, Data-Platform-SRE (2025.03.22 - 2025.04.11), Patch-For-Review: varnishkafka 1.1.0-5 exits on SIGHUP - https://phabricator.wikimedia.org/T389978#10679443 (BCornwall) In progress→Resolved This appears to be
[16:11:30] Traffic, Patch-For-Review: Upgrade Varnish from 6.0 to 7.1 - https://phabricator.wikimedia.org/T378737#10679454 (BCornwall)
[16:18:48] netops, Infrastructure-Foundations, ops-drmrs: cr1-drmrs to asw1-b12-drmrs link down - https://phabricator.wikimedia.org/T389071#10679496 (RobH) > We'll be installing the new optics into the original ports, and removing the old optics and patch. > > So please remove the optic patch D0100B and the op...
[17:02:51] netops, Infrastructure-Foundations, ops-drmrs: cr1-drmrs to asw1-b12-drmrs link down - https://phabricator.wikimedia.org/T389071#10679800 (ayounsi) just got off the phone with the tech, I made a small mistake it was port 1 on cr1, so he called me to double check. He is going to do the patching, updat...
[17:52:45] Traffic: Unify CDN ats/haproxy/vanrish upgrade cookbooks - https://phabricator.wikimedia.org/T390094#10680100 (BCornwall) Open→In progress
[17:56:09] Traffic: Unify CDN ats/haproxy/vanrish upgrade cookbooks - https://phabricator.wikimedia.org/T390094#10680121 (BCornwall) p:Triage→Low
[17:56:50] Traffic: Unify CDN ats/haproxy/varnish upgrade cookbooks - https://phabricator.wikimedia.org/T390094#10680125 (Krinkle)
[17:57:36] Traffic: Unify CDN ats/haproxy/varnish upgrade cookbooks - https://phabricator.wikimedia.org/T390094#10680126 (BCornwall) @Volans I was originally implementing option 1 but I found the complexity to be a little much. In particular, how would we be able to set the grace sleeps dynamically? Is that simple to do?
[19:47:19] Traffic: Unify CDN ats/haproxy/varnish upgrade cookbooks - https://phabricator.wikimedia.org/T390094#10680449 (Volans) Thanks @BCornwall for the moving to a task, easier to discuss :) I think Traffic should decide which UI/UX you prefer for this kind of operation and we can go into implementation details for...
[22:14:13] netops, Infrastructure-Foundations, ops-drmrs: cr1-drmrs to asw1-b12-drmrs link down - https://phabricator.wikimedia.org/T389071#10681006 (cmooney) p:High→Low Happy to say all looks good following the replacement patch and optics being installed this evening: ` cmooney@cr1-drmrs> show interfa...
[22:51:23] Traffic, Patch-For-Review: Upgrade Varnish from 6.0 to 7.1 - https://phabricator.wikimedia.org/T378737#10681117 (BCornwall)