[06:29:09] FIRING: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[06:39:09] RESOLVED: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[06:50:39] FIRING: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[07:05:39] RESOLVED: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[08:47:09] FIRING: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[08:57:09] RESOLVED: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[10:30:12] netops, Infrastructure-Foundations, SRE: Manage VRRP priority from Netbox - https://phabricator.wikimedia.org/T381873#10478784 (cmooney) Open→Resolved a:cmooney This is all complete and I've set priorities in Netbox to balance traffic from the 4 legacy rows in eqiad across the CRs there.
[10:44:56] netops, Infrastructure-Foundations, SRE: Improve Eqiad outbound traffic balance - https://phabricator.wikimedia.org/T384253#10478825 (cmooney) FWIW I have made the same change in codfw for routes learnt from eqord (Chicago). Locally-learnt routes will now be preferred unless the AS-Path from Chicago...
[11:15:20] netops, Infrastructure-Foundations, SRE: Dec 2024: cr3-ulsfo errors on et-0/0/0 link from cr4 - https://phabricator.wikimedia.org/T384288 (cmooney) NEW p:Triage→Medium
[11:15:36] netops, Infrastructure-Foundations, SRE: Dec 2024: cr3-ulsfo errors on et-0/0/0 link from cr4 - https://phabricator.wikimedia.org/T384288#10478967 (cmooney)
[11:16:42] netops, Infrastructure-Foundations, SRE: Dec 2024: cr3-ulsfo errors on et-0/0/0 link from cr4 - https://phabricator.wikimedia.org/T384288#10478971 (cmooney)
[11:54:09] FIRING: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[11:59:09] RESOLVED: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[12:11:09] FIRING: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[12:15:00] FIRING: PurgedHighEventLag: High event process lag with purged on cp5025:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=eqsin%20prometheus/ops&var-instance=cp5025 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[12:20:00] FIRING: [16x] PurgedHighEventLag: High event process lag with purged on cp5017:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[12:26:09] RESOLVED: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[12:52:47] netops, Infrastructure-Foundations, observability, SRE: LibreNMS reporting no routes learnt from doh/durum Anycast peers at various POPs - https://phabricator.wikimedia.org/T384258#10479262 (cmooney) So looking at a specific peer - 2620:0:863:1:198:35:26:6 on cr4-ulsfo - I can see the SNMP 'index...
[13:00:00] FIRING: [22x] PurgedHighEventLag: High event process lag with purged on cp5017:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[13:05:00] RESOLVED: [29x] PurgedHighEventLag: High event process lag with purged on cp5017:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[13:39:23] netops, Infrastructure-Foundations, observability, SRE: LibreNMS reporting no routes learnt from doh/durum Anycast peers at various POPs - https://phabricator.wikimedia.org/T384258#10479515 (cmooney) >>! In T384258#10477783, @ssingh wrote: > Might be a red herring: The only thing I see that might...
[14:11:09] FIRING: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[14:21:09] RESOLVED: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[14:44:34] netops, Infrastructure-Foundations, observability, SRE: LibreNMS reporting no routes learnt from doh/durum Anycast peers at various POPs - https://phabricator.wikimedia.org/T384258#10479889 (Volans) If I understand the db structure correctly that should convert into this query: ` select * from b...
[14:45:36] netops, Infrastructure-Foundations, SRE: Dec 2024: cr3-ulsfo errors on et-0/0/0 link from cr4 - https://phabricator.wikimedia.org/T384288#10479894 (RobH) @cmooney, I'm updating the order task, but this was delivered in December so I can open a remote hands to get it fixed. Do we need to schedule th...
[15:23:37] netops, Infrastructure-Foundations, observability, SRE: LibreNMS reporting no routes learnt from doh/durum Anycast peers at various POPs - https://phabricator.wikimedia.org/T384258#10480120 (cmooney) Thanks @volans you have helped me a lot with this and given me confidence to look at the DB. I s...
[15:27:26] Traffic, Maps, SRE: Allow Wikimedia Maps usage on schoolwiki.in - https://phabricator.wikimedia.org/T383210#10480125 (jcrespo) I believe this is something to be handled by #traffic at varnish level, more than a maps task. Is this something you handle (I am not familiar with the process) @Vgutierrez @...
[15:28:01] Traffic, Maps, SRE: Allow Wikimedia Maps usage on schoolwiki.in - https://phabricator.wikimedia.org/T383210#10480128 (jcrespo) p:Triage→High
[15:29:34] Traffic, Maps, SRE: Allow Wikimedia Maps usage on schoolwiki.in - https://phabricator.wikimedia.org/T383210#10480136 (ssingh) Thanks @jcrespo; Traffic will take care of it. @MSantos: This requires your approval before we can continue. Thanks.
[15:31:58] Traffic, Maps, SRE: Allow Wikimedia Maps usage on schoolwiki.in - https://phabricator.wikimedia.org/T383210#10480174 (ssingh) a:ssingh
[15:35:19] netops, Infrastructure-Foundations, observability, SRE: LibreNMS reporting no routes learnt from doh/durum Anycast peers at various POPs - https://phabricator.wikimedia.org/T384258#10480208 (cmooney) It also appears we are getting values populated for AcceptedPrefixes for IPv6 peers for some devi...
[15:40:34] Traffic, Citoid, Editing-team, RESTBase Sunsetting, serviceops: Switchover plan from restbase to api gateway for Citoid - https://phabricator.wikimedia.org/T361576#10480245 (hnowlan) https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/973362 has been merged (wrong ticket tagged in...
[15:42:30] Traffic, DNS, SRE: Verify Wikipedia's Bluesky account - https://phabricator.wikimedia.org/T384332#10480253 (jcrespo) I believe authentication on Bluesky happens through DNS. Adding #DNS and #Traffic for awareness. I can handle this, as we did it to authenticate the search engine consoles. @LPasqual...
[15:44:17] Traffic, DNS, SRE: Verify Wikipedia's Bluesky account - https://phabricator.wikimedia.org/T384332#10480259 (jcrespo) p:Triage→Medium
[15:44:20] Traffic, DNS, SRE: Verify Wikipedia's Bluesky account - https://phabricator.wikimedia.org/T384332#10480260 (jcrespo) a:jcrespo
[15:53:28] greetings traffic, I'd like to roll out a patch to introduce a new ATS Lua script (https://gerrit.wikimedia.org/r/1082581) for the upcoming migration to PHP 8.1.
[15:53:28] tl;dr - the script should do nothing, as its behavior is gated by the presence of a cookie that nothing yet sets.
[15:53:28] my plan would be to disable puppet on all cache-text, pilot on a single host, and then apply globally (incrementally).
[15:53:28] any objections to my doing that today or preferences on timing? perhaps the 18:00 UTC hour?
[15:54:01] swfrench-wmf: that works, thanks
[15:54:14] the NA folks will be around (brett, myself)
[15:55:04] sukhe: awesome, thank you!
[15:56:26] sukhe: the last time I rolled out changes like these, the recommendation was to `run-puppet-agent` on the rest of the fleet (`cumin -b11`), rather than simply reenable puppet. is that still the case?
[15:56:46] (the benefit being that it would make issues applying obvious vs. letting the puppet timer run into them)
[15:57:24] swfrench-wmf: yes please. the idea is that if something goes wrong, you will see it during that immediate cumin run
[15:58:11] sukhe: great, thanks for confirming
[15:59:09] I'll follow up here as the hour approaches before switching to -operations
[15:59:44] thanks <3
[16:00:54] Traffic, DNS, SRE: Verify Wikipedia's Bluesky account - https://phabricator.wikimedia.org/T384332#10480336 (LPasqual_WMF) Thank you for such a quick reply, @jcrespo. Here's the info you requested: Host: _atproto Type: TXT Value: did=did:plc:plla3i7zproko3ekdnkoykhe And a screenshot, just in case: {...
[16:14:08] netops, Infrastructure-Foundations, observability, SRE: LibreNMS reporting no routes learnt from doh/durum Anycast peers at various POPs - https://phabricator.wikimedia.org/T384258#10480398 (cmooney) Running the poller manually on netmon1003 I can also see it's getting the right value back, but i...
[16:19:30] Traffic, DNS, SRE, Patch-For-Review: Verify Wikipedia's Bluesky account - https://phabricator.wikimedia.org/T384332#10480414 (jcrespo) @LPasqual_WMF The deploy for `@wikipedia.org` should already be working, but don't be surprised if you get an error (there could be ~5 minutes of cache), if it ha...
[16:33:05] Traffic, DNS, SRE: Verify Wikipedia's Bluesky account - https://phabricator.wikimedia.org/T384332#10480451 (LPasqual_WMF) @jcrespo Happy to say it is already working! [[ https://bsky.app/profile/wikipedia.org | @wikipedia.org ]] is live. Thanks, Jaime and team. I'll follow up with a separate ticket...
[16:36:17] Traffic, DNS, SRE: Verify Wikipedia's Bluesky account - https://phabricator.wikimedia.org/T384332#10480475 (jcrespo) Open→Resolved
[16:39:57] Traffic, SRE, Data-Engineering (Q3 2024 January 1st - March 31th), Patch-For-Review: Refine add_is_wmf_domain TransformFunction fails if no source field exists - https://phabricator.wikimedia.org/T383914#10480486 (Ahoelzl)
[17:09:35] netops, Infrastructure-Foundations, SRE: Configure gnmic to collect data from routers at network pops - https://phabricator.wikimedia.org/T384345 (cmooney) NEW p:Triage→Medium
[17:24:09] FIRING: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[17:37:53] netops, Infrastructure-Foundations, SRE: Dec 2024: cr3-ulsfo errors on et-0/0/0 link from cr4 - https://phabricator.wikimedia.org/T384288#10480844 (cmooney) >>! In T384288#10479894, @RobH wrote: > I'm assuming we need to schedule it, and we should give them a couple days notice if we want a set sched...
[17:49:09] RESOLVED: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[18:00:46] sukhe: FYI, getting started on this now
[18:00:58] swfrench-wmf: noted! gl
[18:01:11] thanks :)
[18:35:48] netops, Infrastructure-Foundations, SRE: Productionize gnmic network telemetry pipeline - https://phabricator.wikimedia.org/T369384#10481199 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=d0f01fc7-5a29-49c5-8292-aebad021ff73) set by cmooney@cumin1002 for 2:00:00 on 1 host(s) and th...
[18:44:27] swfrench-wmf: going good I am guessing?
[18:45:02] sukhe: yup, thank you! on the last batch of running / reenabling the agent :)
[18:45:34] the timestamp format in the trafficserver error logs is amazing ... e.g., `20250121.15h26m48s`
[18:46:08] fancy :D
[18:46:31] I don't even know what one might call that, lol
[18:50:44] sukhe: all done for real now
[18:51:03] thanks for running it!
[18:51:16] no problem at all! :)
[18:51:31] Traffic, Data-Engineering (Q3 2024 January 1st - March 31th), Experimentation Lab Radar: Cookie % has been rejected because it is foreign and does not have the "Partitioned" attribute - https://phabricator.wikimedia.org/T375256#10481296 (Ahoelzl)
[18:51:38] Traffic, Data-Engineering (Q3 2024 January 1st - March 31th), Essential-Work, Experimentation Lab Radar: Cookie % has been rejected because it is foreign and does not have the "Partitioned" attribute - https://phabricator.wikimedia.org/T375256#10481297 (Ottomata) > @mforns to confirm whether this...
[18:52:11] Traffic, Data-Engineering (Q3 2024 January 1st - March 31th), Essential-Work, Experimentation Lab Radar: Cookie % has been rejected because it is foreign and does not have the "Partitioned" attribute - https://phabricator.wikimedia.org/T375256#10481301 (Ahoelzl) a:mforns
[19:34:09] FIRING: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[19:39:09] RESOLVED: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[19:43:52] netops, Infrastructure-Foundations, SRE: Productionize gnmic network telemetry pipeline - https://phabricator.wikimedia.org/T369384#10481570 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=26b7dbb9-1906-4b10-a433-cc2ffb6bdb61) set by cmooney@cumin1002 for 2:00:00 on 1 host(s) and th...
[22:39:09] FIRING: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[22:44:09] RESOLVED: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[22:52:09] FIRING: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[23:07:09] RESOLVED: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[23:20:49] denisse: What's the status of lvs2013 and thanos-swift's hammering of it?
[23:20:59] Having a hard time finding the phab task
[23:22:27] brett: I'm unsure, let me check.
[23:35:58] brett: https://phabricator.wikimedia.org/T383147
[23:37:18] thanks!