[00:03:44] A friend told me that today making requests using Python 2's `urllib` to the MW API became very slow, but using curl or urllib2 are fine
[00:03:57] Not sure if it's related to some of the frontend cache changes or not
[00:04:24] But just wanted to note that since it was weird/hard to figure out
[00:04:58] (by very slow I mean 20s for a request that responds near instantly otherwise)
[00:20:30] Traffic, netops, Infrastructure-Foundations, SRE: Remove static routes for LVS VIPs from core routers - https://phabricator.wikimedia.org/T300877 (BBlack) I can fill in the scenario/story part a bit! For background: * Technically, LVS and pybal are separate things running on the same server. L...
[06:56:56] (EdgeTrafficDrop) firing: 69% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=codfw&var-cache_type=text - https://alerts.wikimedia.org
[07:01:56] (EdgeTrafficDrop) resolved: 69% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=codfw&var-cache_type=text - https://alerts.wikimedia.org
[08:51:22] Traffic, netops, Infrastructure-Foundations, SRE: Remove static routes for LVS VIPs from core routers - https://phabricator.wikimedia.org/T300877 (akosiaris) >>! In T300877#7677259, @BBlack wrote: > I can fill in the scenario/story part a bit! For background: > > * Without static routes, if pyb...
[08:55:44] Traffic, netops, Infrastructure-Foundations, SRE: Remove static routes for LVS VIPs from core routers - https://phabricator.wikimedia.org/T300877 (Volans) For the human-generated part that seems easy to prevent automating the process via a cookbook that can have all the checks and fail safes need...
[09:01:14] Traffic, netops, Infrastructure-Foundations, SRE: Remove static routes for LVS VIPs from core routers - https://phabricator.wikimedia.org/T300877 (akosiaris) >>! In T300877#7677666, @Volans wrote: > For the human-generated part that seems easy to prevent automating the process via a cookbook that...
[09:16:05] netops, Infrastructure-Foundations, SRE, SRE-tools, and 3 others: Investigate Capirca - https://phabricator.wikimedia.org/T273865 (ayounsi) In progress→Resolved We're now using Capirca to manage most of our router ACLs
[09:22:44] Traffic, netops, Infrastructure-Foundations, SRE: Remove static routes for LVS VIPs from core routers - https://phabricator.wikimedia.org/T300877 (akosiaris) Re-reading my reply, I realized I may appear pro having those static routes (I am actually not) whereas my intent was to just provide a dat...
[09:23:00] Hi traffic - I'd like to add a low-traffic, non-paging LVS service (https://gerrit.wikimedia.org/r/c/operations/puppet/+/759260) if you have no objections
[09:30:32] on a lovely friday? ;P
[09:30:35] morning jayme
[09:30:54] vgutierrez: morning :)
[09:31:25] I can as well wait till monday if you think that's better
[09:31:58] hmmm
[09:32:23] so you're configuring LVS for port 30443 but healthchecking 30080?
[09:32:36] 30443 is still the port of the service right?
[09:33:17] Yes.
30080 is the same service just without TLS
[09:34:21] As the service does not respond to its discovery name, I need to check a different port/come up with a way to have it respond to discovery
[09:35:12] ultimately it will respond (via 30443) to a bunch of different FQDNs (SNI) - its discovery name is more like a placeholder
[09:43:27] ok
[09:44:57] I plan to come up with something more straightforward/understandable regarding the probes :)
[09:54:15] hmmm which cluster should respond to port 30443?
[09:54:23] kubernetes1001 is timing out for me from lvs1015
[09:55:23] ouch, kubernetes-staging :)
[09:55:56] vgutierrez@lvs1015:~$ nc -w3 -zv kubestage1003.eqiad.wmnet 30443
[09:55:56] nc: connect to kubestage1003.eqiad.wmnet port 30443 (tcp) timed out: Operation now in progress
[09:55:56] nc: connect to kubestage1003.eqiad.wmnet port 30443 (tcp) failed: Connection refused
[09:56:11] hmm IPv6 timeouts and IPv4 gets refused (or the other way around)?
[10:12:48] legoktm: ^^ I've found the very same 20s issue
[10:18:06] hum..you got me there vgutierrez. Double checking ... in codfw this works, though
[10:19:49] Traffic, SRE: Problem loading thumbnail images due to Envoy (HTTP/1.0 clients getting '426 Upgrade Required') - https://phabricator.wikimedia.org/T300366 (Vgutierrez) so... right now HTTP/1.0 requests from PHP 7.3 are technically working but there is some obvious issue as those requests are really slow....
[10:40:43] !log rebalancing row A in ganeti/eqiad, all nodes of that row are now running Buster T296721
[10:40:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:40:48] T296721: Migrate eqiad Ganeti cluster to Buster - https://phabricator.wikimedia.org/T296721
[10:44:50] vgutierrez: I obviously refrain from adding the LVS service today.
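On the `nc` output: `nc` prints one line per address the name resolves to, so a timeout on one line and a refusal on the other usually means the two address families behave differently. A timeout commonly indicates packets being silently dropped or filtered, while "Connection refused" means the host answered with a reset because nothing is listening on that port. A minimal Python sketch that labels each resolved address by family (the function name is hypothetical, and it only approximates what `nc -zv` reports):

```python
import errno
import socket


def probe_each_address(host, port, timeout=3.0):
    """Try a TCP connect to every address `host` resolves to.

    Returns a list of (family_name, address, result) tuples, where result is
    "open", "refused", "timeout", or an errno name / error string.
    """
    results = []
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    for family, socktype, proto, _canonname, sockaddr in infos:
        family_name = "IPv6" if family == socket.AF_INET6 else "IPv4"
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)  # comparable to nc's -w3
        try:
            sock.connect(sockaddr)
            result = "open"
        except socket.timeout:
            # No answer at all within the timeout: often a filtered path.
            result = "timeout"
        except ConnectionRefusedError:
            # The host replied with a RST: reachable, but nothing listening.
            result = "refused"
        except OSError as exc:
            result = errno.errorcode.get(exc.errno, str(exc))
        finally:
            sock.close()
        results.append((family_name, sockaddr[0], result))
    return results
```

Run from the LVS host as, e.g., `probe_each_address("kubestage1003.eqiad.wmnet", 30443)`, this answers the "IPv6 timeouts and IPv4 gets refused (or the other way around)?" question directly, because each result carries its address family.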
Will take a step back and do my homework - thanks for checking
[10:45:17] jayme: no problem, let me know if you need anything else from our side
[10:45:55] will ping you again when I think that I DTRT next time :D
[10:58:05] Traffic, SRE: Problem loading thumbnail images due to Envoy (HTTP/1.0 clients getting '426 Upgrade Required') - https://phabricator.wikimedia.org/T300366 (Vgutierrez) the described issue has been reported to upstream on https://github.com/envoyproxy/envoy/issues/19821
[11:02:44] Traffic, SRE, Upstream: Problem loading thumbnail images due to Envoy (HTTP/1.0 clients getting '426 Upgrade Required') - https://phabricator.wikimedia.org/T300366 (Aklapper)
[14:50:27] netops, Data-Engineering, Infrastructure-Foundations, Product-Analytics, and 2 others: Maybe restrict domains accessible by webproxy - https://phabricator.wikimedia.org/T300977 (Ottomata)
[15:57:29] netops, Infrastructure-Foundations, SRE, Traffic-Icebox, Patch-For-Review: Create Generalised blocking strategy - https://phabricator.wikimedia.org/T270618 (MoritzMuehlenhoff) > I think we could also take the decision to not bother with this additional complexity and take the stance that if so...
[16:19:09] Traffic, SRE, Upstream: Problem loading thumbnail images due to Envoy (HTTP/1.0 clients getting '426 Upgrade Required') - https://phabricator.wikimedia.org/T300366 (Vgutierrez) this no longer seems to be related to HTTP/1.0 as the following code also triggers the issue: `lang=php
netops, Data-Engineering, Infrastructure-Foundations, Product-Analytics, and 2 others: Maybe restrict domains accessible by webproxy - https://phabricator.wikimedia.org/T300977 (jbond) Thanks for creating this task Andrew, Just wanted to copy paste the following from the parent task in-case there...
[17:42:55] vgutierrez: yay, but also :(
[17:43:02] thanks for looking into it
[17:43:53] That will be considered into our evaluation of course