[01:06:25] FIRING: SystemdUnitFailed: ncmonitor.service on ncmonitor1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:06:56] oh snap
[01:06:58] lol, rate limit
[01:07:11] reset-failed applied
[01:11:25] RESOLVED: SystemdUnitFailed: ncmonitor.service on ncmonitor1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:13:43] FIRING: [15x] HaproxyKafkaSocketDroppedMessages: Unexpected rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[02:18:43] FIRING: [44x] HaproxyKafkaSocketDroppedMessages: Unexpected rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[02:23:43] RESOLVED: [47x] HaproxyKafkaSocketDroppedMessages: Unexpected rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[08:05:07] Hello, would 08:15 UTC be a good time to roll this https://gerrit.wikimedia.org/r/c/operations/puppet/+/1178834 out, or would it be more appropriate to schedule it for later?
[08:07:44] stevemunene: ack for me
[08:15:02] Ack, fabfur. One more question: for this specific setup, i.e. adding a new DC for an existing service `k8s-ingress-dse`, does the line "you just need to change the state of your service to lvs_setup:" still apply?
[08:15:59] not very expert about this, sorry. I think the doc should be pretty up to date, but I summon vgutierrez too!
[08:17:11] stevemunene: nope, you shouldn't do that or you'd disable monitoring for eqiad
[08:19:04] and FWIW https://gerrit.wikimedia.org/r/c/operations/puppet/+/1178834 should be a NOOP for LVS
[08:19:30] you're just creating the data on conftool so you can pool servers for that service
[08:22:42] Ack, so just a merge is required for this
[08:35:04] indeed
[08:35:18] merge it and pool the realservers
[08:58:27] Thanks vgutierrez fabfur :)
[08:58:38] no problem
[08:58:38] 👍
[09:02:51] netops, Ganeti, Infrastructure-Foundations: magru: move sandbox vlan to routed Ganeti - https://phabricator.wikimedia.org/T402372 (ayounsi) NEW
[09:07:54] netops, Ganeti, Infrastructure-Foundations, Patch-For-Review: magru: move sandbox vlan to routed Ganeti - https://phabricator.wikimedia.org/T402372#11101245 (ayounsi) vlan `sandbox1-b3-magru` deleted as well as its switch IRB. IP ranges converted to the `virtual-machines` role: https://netbox.wik...
[09:12:09] netops, Ganeti, Infrastructure-Foundations, Patch-For-Review: magru: move sandbox vlan to routed Ganeti - https://phabricator.wikimedia.org/T402372#11101278 (ayounsi)
[09:12:47] netops, Ganeti, Infrastructure-Foundations, Patch-For-Review: magru: move sandbox vlan to routed Ganeti - https://phabricator.wikimedia.org/T402372#11101279 (ayounsi)
[09:12:53] netops, Ganeti, Infrastructure-Foundations, Patch-For-Review: magru: move sandbox vlan to routed Ganeti - https://phabricator.wikimedia.org/T402372#11101280 (ayounsi)
[10:33:20] Traffic, HaproxyKafka, Data-Platform-SRE (2025.08.16 - 2025.09.05), Patch-For-Review: Replicate current low-message alerting from VarnishKafka - https://phabricator.wikimedia.org/T391810#11101630 (BTullis)
[11:11:19] Traffic, MediaWiki-extensions-QuickInstantCommons, MediaWiki-File-management, MediaWiki-Platform-Team, and 5 others: Make InstantCommons and other uses of ForeignApiRepo use WMF policy-compliant user agents - https://phabricator.wikimedia.org/T400881#11101868 (Bugreporter) >>! In T400881#11100318...
[13:16:43] FIRING: [23x] HaproxyKafkaSocketDroppedMessages: Unexpected rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[13:18:29] ^ yeah
[13:18:47] I couldn't connect to enwiki for a second
[13:21:43] FIRING: [40x] HaproxyKafkaSocketDroppedMessages: Unexpected rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[13:26:43] RESOLVED: [42x] HaproxyKafkaSocketDroppedMessages: Unexpected rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[14:30:42] Traffic, MW-on-K8s, serviceops, SRE, Release-Engineering-Team (Seen): Serve production traffic via Kubernetes - https://phabricator.wikimedia.org/T290536#11102470 (Clement_Goubert) In progress→Resolved
[14:48:58] Traffic, SRE, Patch-For-Review: apt-staging: add headers to prevent CDN caching - https://phabricator.wikimedia.org/T402284#11102663 (MoritzMuehlenhoff)
[14:49:01] netops, Ganeti, Infrastructure-Foundations, Patch-For-Review: magru: move sandbox vlan to routed Ganeti - https://phabricator.wikimedia.org/T402372#11102664 (MoritzMuehlenhoff) p:Triage→Medium
[16:38:26] netops, fundraising-tech-ops, Infrastructure-Foundations: Move pfw1b-codfw to rack F5 - https://phabricator.wikimedia.org/T401297#11103043 (cmooney) @papaul just FYI the diagram you did is now not accessible for other users for some reason
[18:34:43] FIRING: [10x] HaproxyKafkaSocketDroppedMessages: Unexpected rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[18:39:31] esams
[18:39:43] FIRING: [26x] HaproxyKafkaSocketDroppedMessages: Unexpected rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[18:44:43] RESOLVED: [26x] HaproxyKafkaSocketDroppedMessages: Unexpected rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[19:11:20] Traffic, Beta-Cluster-Infrastructure: Beta cluster is slow. ISDN slow. - https://phabricator.wikimedia.org/T402430 (AlexisJazz) NEW
[19:11:45] oh hey ISDN
[19:14:05] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: Eqiad: new structured cabling needed between cages to eqiad 2025/6 switch refresh - https://phabricator.wikimedia.org/T402432 (cmooney) NEW p:Triage→Medium
[19:14:26] Traffic, Beta-Cluster-Infrastructure: Beta cluster is slow. ISDN slow. - https://phabricator.wikimedia.org/T402430#11103654 (AlexisJazz)
[19:15:52] but ISDN is super-fast.... hook it to my veins!
[19:18:26] !
[19:18:52] Traffic, Beta-Cluster-Infrastructure: Beta cluster is slow. ISDN slow. - https://phabricator.wikimedia.org/T402430#11103707 (cmooney) I don't believe this is the beta cluster itself. ` root@nyc ~ # time wget "https://upload.wikimedia.beta.wmcloud.org/wikipedia/commons/7/75/Cute_grey_kitten.jpg" --2025-...
[19:28:33] Traffic, Beta-Cluster-Infrastructure: Beta cluster is slow. ISDN slow. - https://phabricator.wikimedia.org/T402430#11103848 (AlexisJazz) Open→Invalid Looks like it's an issue with my provider. Using a VPN (without changing country) fixes it.
[19:33:09] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: Eqiad: new structured cabling needed between cages to eqiad 2025/6 switch refresh - https://phabricator.wikimedia.org/T402432#11103889 (cmooney)
[19:38:15] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: Eqiad: new structured cabling needed between cages to eqiad 2025/6 switch refresh - https://phabricator.wikimedia.org/T402432#11103905 (cmooney) Ok so I spoke to traffic and while they are close to ditching the need for L2 adjacency...