[06:12:47] Traffic, SRE: Move contact info detection at the edge to a lua module - https://phabricator.wikimedia.org/T414300 (Joe) NEW
[06:13:08] Traffic, SRE: Move contact info detection at the edge to a lua module - https://phabricator.wikimedia.org/T414300#11510940 (Joe) p:Triage→High a:Joe
[06:47:12] netops, Infrastructure-Foundations, SRE: rancid: message has lines too long for transport - https://phabricator.wikimedia.org/T410606#11510983 (ayounsi) Even more odd is that it flaps, even when no change is being done on the device. One mail it will remove it, another mail it will re-add it. Nothing...
[08:42:44] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Servers exposing incorrect LLDP info - https://phabricator.wikimedia.org/T250367#11511124 (ayounsi) > Is sretest2003 the only one that shows this behavior, or do we have others? I am particularly interested in if you were able to set the...
[09:00:14] netops, fundraising-tech-ops, Infrastructure-Foundations: Remove pfw configuration related to former pybal/LVS service - https://phabricator.wikimedia.org/T414015#11511149 (ayounsi) Open→Resolved All done!
[10:17:15] Traffic: upgrade to HAProxy 2.8.18 - https://phabricator.wikimedia.org/T414318 (Vgutierrez) NEW
[10:17:24] Traffic: upgrade to HAProxy 2.8.18 - https://phabricator.wikimedia.org/T414318#11511482 (Vgutierrez) p:Triage→Medium
[10:23:37] netops, Infrastructure-Foundations, SRE: Offline script - adjust to work with fundraising - https://phabricator.wikimedia.org/T414321 (cmooney) NEW p:Triage→Medium
[10:35:39] Traffic: upgrade to HAProxy 2.8.18 - https://phabricator.wikimedia.org/T414318#11511567 (Vgutierrez)
[10:52:44] Traffic, SRE: All github action tests of Pywikibot fails due to 429 status code (TOO MANY REQUESTS) - https://phabricator.wikimedia.org/T414173#11511679 (Xqt) Resolved→Open This is not solved yet for Pywikibot tests.
A significant number of tests are still failing, and I have not been able to fin...
[11:04:41] Traffic, SRE: All github action tests of Pywikibot fails due to 429 status code (TOO MANY REQUESTS) - https://phabricator.wikimedia.org/T414173#11511748 (Xqt)
[11:07:34] Traffic, SRE: All github action tests of Pywikibot fails due to 429 status code (TOO MANY REQUESTS) - https://phabricator.wikimedia.org/T414173#11511756 (Xqt) >>! In T414173#11509876, @Benwing2 wrote: > i dunno why pywikibot is having issues with retry-after or why it's ending up as a float. My bot has...
[12:11:51] netops, Infrastructure-Foundations, SRE: Offline script - adjust to work with fundraising - https://phabricator.wikimedia.org/T414321#11511995 (cmooney)
[12:59:05] Yo traffic peeps, we're working on a change to the rest-gateway that would make it return Retry-After on 503/504 https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1224937 Do y'all have any opinions about that?
[13:34:43] FIRING: HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-site=eqiad&var-instance=cp1114&viewPanel=panel-19 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[13:49:43] RESOLVED: HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-site=eqiad&var-instance=cp1114&viewPanel=panel-19 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[13:51:55] FIRING: [2x] MaxConntrack: Max conntrack at 81.11% on ncredir3005:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack -
https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[13:55:19] someone is having some fun with ncredir
[13:56:55] RESOLVED: [3x] MaxConntrack: Max conntrack at 81.11% on ncredir3005:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[13:57:25] FIRING: [2x] MaxConntrack: Max conntrack at 86% on ncredir3005:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[14:02:25] RESOLVED: [3x] MaxConntrack: Max conntrack at 86% on ncredir3005:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[14:16:45] we'd go ahead and upgrade the durum* nodes not in routed ganeti also to Bird 2.18, ok?
[15:06:53] Traffic, serviceops, Epic, FY2025-26 KR 5.1, OKR-Work: Log rate limits from rest-gateway in webrequests - https://phabricator.wikimedia.org/T414349 (Clement_Goubert) NEW
[15:07:23] Traffic, serviceops, Epic, FY2025-26 KR 5.1, OKR-Work: Log rate limits from rest-gateway in webrequests - https://phabricator.wikimedia.org/T414349#11512639 (Clement_Goubert) p:Triage→Medium
[16:19:01] Traffic, Commons: HTTP 429 error on original image requests on Commons (iOS app by default hiding the Referrer header) - https://phabricator.wikimedia.org/T413570#11512977 (TheDJ) I ran into this today, when right clicking to download an original with "Download links file as". Seems that doesn't send a r...
[18:27:34] Traffic, SRE: Wiki Education Dashboard being rate-limited for OAuth login and token fetching - https://phabricator.wikimedia.org/T414114#11513615 (Ragesoss) @Joe checking my Sentry logs, I see we're still getting 429 for some types of queries, including Commons API queries and fetching page content (...
[20:17:34] Traffic, DC-Ops, ops-codfw, SRE, Patch-For-Review: Q4:rack/setup/install cp20[43-58] codfw - https://phabricator.wikimedia.org/T392851#11514052 (Jhancock.wm) @ssingh do you need assistance getting these reimaged?
[20:36:55] FIRING: [2x] MaxConntrack: Max conntrack at 92.68% on ncredir3005:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[20:37:37] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, Thumbor: Propose a new set of standard thumbnail sizes - https://phabricator.wikimedia.org/T412971#11514147 (TheDJ) >>! In T412971#11498230, @AntiCompositeNumber wrote: > Special:NewFiles doesn't appear to be as bad as it was a few ye...
[20:41:55] RESOLVED: [2x] MaxConntrack: Max conntrack at 100% on ncredir3005:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[20:42:56] FIRING: MaxConntrack: Max conntrack at 90.76% on ncredir3005:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[20:47:56] RESOLVED: [3x] MaxConntrack: Max conntrack at 98.44% on ncredir3005:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack