[01:44:06] Traffic, Maps, SRE: Allow Wikimedia Maps usage on Wikidata for Firefox (Browser extension) - https://phabricator.wikimedia.org/T398588#11466958 (Aklapper) Open→Declined
[01:48:05] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Servers exposing incorrect LLDP info - https://phabricator.wikimedia.org/T250367#11466974 (Papaul) a:Papaul→ayounsi @ayounsi assigned back to you since you are working on it. thanks
[04:51:29] netops, Infrastructure-Foundations, SRE: InboundInterfaceErrors alerts firing for Nokia switches on v25.10.1 - https://phabricator.wikimedia.org/T412733#11467201 (Papaul) I took a quick look at this before getting the support ticket going on. On lsw1-e2-codfw we have ` Frame length statistics for m...
[08:58:04] netops, Infrastructure-Foundations, SRE: InboundInterfaceErrors alerts firing for Nokia switches on v25.10.1 - https://phabricator.wikimedia.org/T412733#11467826 (ayounsi) My guess is that SR-Linux < 25 doesn't have stats for mgmt0 (either not implemented yet or a bug), with the upgrade we've started...
[10:00:29] FIRING: HAProxyRestarted: HAProxy server restarted on cp7009:9100 - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/gQblbjtnk/haproxy-drilldown?orgId=1&var-site=magru%20prometheus/ops&var-instance=cp7009&viewPanel=10 - https://alerts.wikimedia.org/?q=alertname%3DHAProxyRestarted
[10:01:07] ^^pls ignore
[10:05:29] RESOLVED: HAProxyRestarted: HAProxy server restarted on cp7009:9100 - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/gQblbjtnk/haproxy-drilldown?orgId=1&var-site=magru%20prometheus/ops&var-instance=cp7009&viewPanel=10 - https://alerts.wikimedia.org/?q=alertname%3DHAProxyRestarted
[10:11:43] FIRING: HaproxyKafkaNoMessages: Unexpected rate of produced HaproxyKafka messages by cp7009 - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaNoMessages - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-site=magru&var-instance=cp7009 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaNoMessages
[10:25:40] netops, Infrastructure-Foundations, Observability-Alerting, SRE: Improve port-utilisation alerting to take QoS into account - https://phabricator.wikimedia.org/T384052#11468080 (ayounsi) We can set the rule now as non-paging to start collecting data and test it. So we can gain trust in it before...
[10:26:43] RESOLVED: [2x] HaproxyKafkaNoMessages: Unexpected rate of produced HaproxyKafka messages by cp7009 - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaNoMessages - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-site=magru&var-instance=cp7009 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaNoMessages
[11:04:08] 👋 I am trying to get some numbers for parsoid performance on idwiki using wmf.webrequest data. Is it safe to assume that wiki article reads are all in `webrequest_source='text'` and not upload ?
[11:06:41] I think so
[11:11:43] FIRING: HaproxyKafkaNoMessages: Unexpected rate of produced HaproxyKafka messages by cp7009 - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaNoMessages - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-site=magru&var-instance=cp7009 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaNoMessages
[11:27:38] FIRING: [3x] LVSRealserverMSS: Unexpected MSS value on 195.200.68.240:443 @ cp7009 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=magru&var-cluster=cache_upload - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[11:31:43] RESOLVED: [2x] HaproxyKafkaNoMessages: Unexpected rate of produced HaproxyKafka messages by cp7009 - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaNoMessages - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-site=magru&var-instance=cp7009 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaNoMessages
[11:32:38] RESOLVED: [4x] LVSRealserverMSS: Unexpected MSS value on 195.200.68.240:443 @ cp7009 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=magru&var-cluster=cache_upload - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[11:40:37] netops, Infrastructure-Foundations, Observability-Alerting, SRE: Improve port-utilisation alerting to take QoS into account - https://phabricator.wikimedia.org/T384052#11468298 (fgiunchedi) >>! In T384052#11462541, @cmooney wrote: > > https://grafana.wikimedia.org/goto/YOk1qBMDg > > In terms of...
[11:41:43] FIRING: HaproxyKafkaNoMessages: Unexpected rate of produced HaproxyKafka messages by cp7009 - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaNoMessages - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-site=magru&var-instance=cp7009 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaNoMessages
[11:57:58] hi vgutierrez, I asked above about https://gerrit.wikimedia.org/r/c/operations/puppet/+/1218817 and explained a bit what I'm trying to do with that - what do you think?
[11:59:28] hi milimetric, v.gutierrez is on PTO currently, I can have a look at this asap
[14:32:26] netops, Infrastructure-Foundations, Observability-Alerting, SRE: Improve port-utilisation alerting to take QoS into account - https://phabricator.wikimedia.org/T384052#11468772 (cmooney) >>! In T384052#11468080, @ayounsi wrote: > We can set the rule now as non-paging to start collecting data and...
[14:41:43] RESOLVED: [2x] HaproxyKafkaNoMessages: Unexpected rate of produced HaproxyKafka messages by cp7009 - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaNoMessages - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-site=magru&var-instance=cp7009 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaNoMessages
[14:43:38] FIRING: [4x] LVSRealserverMSS: Unexpected MSS value on 195.200.68.240:443 @ cp7009 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=magru&var-cluster=cache_upload - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[14:48:38] RESOLVED: [4x] LVSRealserverMSS: Unexpected MSS value on 195.200.68.240:443 @ cp7009 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=magru&var-cluster=cache_upload - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[14:51:43] FIRING: HaproxyKafkaNoMessages: Unexpected rate of produced HaproxyKafka messages by cp7009 - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaNoMessages - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-site=magru&var-instance=cp7009 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaNoMessages
[14:56:43] RESOLVED: [2x] HaproxyKafkaNoMessages: Unexpected rate of produced HaproxyKafka messages by cp7009 - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaNoMessages - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-site=magru&var-instance=cp7009 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaNoMessages
[15:00:57] netops, Traffic, DC-Ops, Infrastructure-Foundations, and 2 others: ULSFO: switch refresh - https://phabricator.wikimedia.org/T408510#11468976 (ayounsi)
[15:01:43] FIRING: HaproxyKafkaNoMessages: Unexpected rate of produced HaproxyKafka messages by cp7009 - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaNoMessages - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-site=magru&var-instance=cp7009 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaNoMessages
[15:04:01] thanks very much fabfur, like I was saying no rush, just didn't want it to get lost
[15:31:43] RESOLVED: [2x] HaproxyKafkaNoMessages: Unexpected rate of produced HaproxyKafka messages by cp7009 - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaNoMessages - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-site=magru&var-instance=cp7009 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaNoMessages
[15:48:43] FIRING: HaproxyKafkaNoMessages: Unexpected rate of produced HaproxyKafka messages by cp7009 - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaNoMessages - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-site=magru&var-instance=cp7009 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaNoMessages
[15:53:43] RESOLVED: [2x] HaproxyKafkaNoMessages: Unexpected rate of produced HaproxyKafka messages by cp7009 - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaNoMessages - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-site=magru&var-instance=cp7009 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaNoMessages
[15:54:04] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, Thumbor: Propose a new set of standard thumbnail sizes - https://phabricator.wikimedia.org/T412971 (MatthewVernon) NEW
[16:29:29] FIRING: HAProxyRestarted: HAProxy server restarted on cp7009:9100 - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/gQblbjtnk/haproxy-drilldown?orgId=1&var-site=magru%20prometheus/ops&var-instance=cp7009&viewPanel=10 - https://alerts.wikimedia.org/?q=alertname%3DHAProxyRestarted
[16:32:25] FIRING: SystemdUnitFailed: haproxy.service on cp7009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:32:38] FIRING: [4x] LVSRealserverMSS: Unexpected MSS value on 195.200.68.240:443 @ cp7009 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=magru&var-cluster=cache_upload - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[16:34:29] RESOLVED: HAProxyRestarted: HAProxy server restarted on cp7009:9100 - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/gQblbjtnk/haproxy-drilldown?orgId=1&var-site=magru%20prometheus/ops&var-instance=cp7009&viewPanel=10 - https://alerts.wikimedia.org/?q=alertname%3DHAProxyRestarted
[16:37:25] RESOLVED: SystemdUnitFailed: haproxy.service on cp7009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:37:38] RESOLVED: [4x] LVSRealserverMSS: Unexpected MSS value on 195.200.68.240:443 @ cp7009 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=magru&var-cluster=cache_upload - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[16:45:43] FIRING: HaproxyKafkaNoMessages: Unexpected rate of produced HaproxyKafka messages by cp7009 - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaNoMessages - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-site=magru&var-instance=cp7009 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaNoMessages
[17:06:07] netops, Infrastructure-Foundations, SRE: InboundInterfaceErrors alerts firing for Nokia switches on v25.10.1 - https://phabricator.wikimedia.org/T412733#11469757 (Papaul) Ticket 05304338 has been submitted with Nokia
[17:10:43] RESOLVED: [2x] HaproxyKafkaNoMessages: Unexpected rate of produced HaproxyKafka messages by cp7009 - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaNoMessages - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-site=magru&var-instance=cp7009 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaNoMessages
[17:33:38] netops, Infrastructure-Foundations, SRE: InboundInterfaceErrors alerts firing for Nokia switches on v25.10.1 - https://phabricator.wikimedia.org/T412733#11469916 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=4ac5ae06-34f5-425c-b0df-bc77a3758cd3) set by cmooney@cumin1003 for 2:00:0...
[17:51:15] netops, Infrastructure-Foundations, SRE, Patch-For-Review: InboundInterfaceErrors alerts firing for Nokia switches on v25.10.1 - https://phabricator.wikimedia.org/T412733#11469994 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=ec73e489-e95a-4824-ad67-a99943eae0e7) set by cmoone...
[17:51:43] netops, Infrastructure-Foundations, SRE, Patch-For-Review: InboundInterfaceErrors alerts firing for Nokia switches on v25.10.1 - https://phabricator.wikimedia.org/T412733#11470001 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=98bc0d0a-c3e1-4862-b66a-e386322de608) set by cmoone...
[18:15:18] netops, Infrastructure-Foundations, SRE, Patch-For-Review: InboundInterfaceErrors alerts firing for Nokia switches on v25.10.1 - https://phabricator.wikimedia.org/T412733#11470088 (cmooney) >>! In T412733#11467826, @ayounsi wrote: > My guess is that SR-Linux < 25 doesn't have stats for mgmt0 (eit...
[18:23:55] netops, Infrastructure-Foundations, SRE, Patch-For-Review: InboundInterfaceErrors alerts firing for Nokia switches on v25.10.1 - https://phabricator.wikimedia.org/T412733#11470106 (cmooney) @papaul lswtest-d8-eqiad is upgraded to v25.10.1 now for you. {F71107154 width=500}
[18:54:47] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: mr1-codfw: add second uplink to lsw1-a2-codfw - https://phabricator.wikimedia.org/T410717#11470232 (Jhancock.wm) if we use 1G copper, we don't need to order anything. I can probably get it pre-ran tomorrow. Then papaul or I can conne...
[19:27:00] netops, Infrastructure-Foundations, SRE: InboundInterfaceErrors alerts firing for Nokia switches on v25.10.1 - https://phabricator.wikimedia.org/T412733#11470334 (Papaul) We are seeing the same error on lswtest-d8 in eqiad ` in-error-packets 2466 `