[03:17:53] Traffic, Performance-Team, SRE, SRE-swift-storage, and 2 others: Progressive Multi-DC roll out - https://phabricator.wikimedia.org/T279664 (tstarling) >>! In T279664#8122731, @Joe wrote: > Do we expect that to happen regularly on a high percentage of requests? If 17% of all requests need to make...
[03:47:18] Traffic, Performance-Team, SRE, SRE-swift-storage, and 2 others: Progressive Multi-DC roll out - https://phabricator.wikimedia.org/T279664 (tstarling) >>! In T279664#8123041, @MatthewVernon wrote: > Without that, I'm not sure what we can do to work around the fact that MW doesn't reliably write/d...
[07:25:53] brett, bblack, vgutierrez: https://gerrit.wikimedia.org/r/c/operations/dns/+/816028 has been merged but not deployed and it's blocking a change I'm deploying. Should I deploy it or revert it so you can take care of it?
[07:35:09] I reverted it as it's better to have someone from Traffic around in case there is any issue with deploying such change
[09:01:36] XioNoX: we're bringing some bgp sessions (k8s <-> core routers) down again in codfw due to shutdown of k8s nodes. As suggested yesterday I create a downtime for BGP Status check in icinga until end of maintenance, 21:00Z today. Are you fine with that?
[09:02:18] jelto: yep!
[09:02:26] thanks for the head's up
[09:08:44] 19:00Z / UTC, sorry. I was confusing time zones :)
[10:19:26] <_joe_> hello traffic folks! I have a couple things for you!
[10:20:09] <_joe_> 1) I'd like to merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/819510, it had a +1 from valentin, it passes the tests I added. Is it ok for me to merge it?
[10:21:37] <_joe_> 2) Tomorrow night (my time, 16:30 UTC) we have a scheduled maintenance on codfw rack D8, which means conf2006 will be rebooted. We'll need to move pybals away from it ahead of time
[10:21:37] D8: Add basic .arclint that will handle pep8 and pylint checks - https://phabricator.wikimedia.org/D8
[10:21:53] <_joe_> and move some of them back to conf2004 too
[10:22:13] <_joe_> I can prepare the patches, but I'd appreciated if some of you would take care of the pybal restarts
[11:01:14] netops, Infrastructure-Foundations, SRE, netbox: Netbox Juniper report - https://phabricator.wikimedia.org/T306238 (ayounsi)
[13:29:40] _joe_: 1) since valentin has +1, please feel free to, 2) happy to take care of it!
[13:30:08] +1ed
[13:30:38] <_joe_> sukhe: thanks
[15:47:22] XioNoX: Ugh, sorry about that, for some reason I thought it was auto-deployed
[15:49:06] brett: let's talk about deploying it
[16:04:08] Hey, I'm noticing that the nsX.wikimedia.org servers do not have their fingerprints published on https://wikitech.wikimedia.org/wiki/Help:SSH_Fingerprints - Any issues with including them there?
[16:07:36] brett: the nsX names/addresses are virtual addresses, the actual servers (dns*/authdns*) they're currently routed to are listed on the config-master.wikimedia.org fingerprint lists
[16:08:57] aha, thanks taavi, that's very helpful
[16:11:26] for posterity, https://wikitech.wikimedia.org/wiki/Help:SSH_Fingerprints#Collecting_or_updating_fingerprints
[17:29:56] (HAProxyEdgeTrafficDrop) firing: 69% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=codfw&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[17:39:56] (HAProxyEdgeTrafficDrop) resolved: 60% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=codfw&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[19:00:16] (VarnishTrafficDrop) firing: Varnish traffic in esams has dropped 56.301316044535994% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[19:00:56] (HAProxyEdgeTrafficDrop) firing: 52% request drop in text@esams during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=esams&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[19:05:16] (VarnishTrafficDrop) resolved: (2) Varnish traffic in esams has dropped 48.430847481972485% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[19:05:56] (HAProxyEdgeTrafficDrop) resolved: 54% request drop in text@esams during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=esams&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[22:15:18] bblack: I'm increasingly fond of the idea of having querysort put 'title=' into the first position, and thus match the canonical URLs that MediaWiki has been generating since time immemorial. It's a lot simpler to roll out -- no change to MW, no increase in the volume of PURGEs to account for a new canonical form, and fewer cache misses during the rollout since the normalized form will
[22:15:20] match the most popular form
[22:15:29] WDYT?
[22:21:35] what I don't like about it is that it makes title a special case, and "Special cases aren't special enough to break the rules."
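
For readers following the querysort proposal above, here is a minimal Python sketch of the idea: sort the query parameters so that differently ordered URLs normalize to one cache key, but keep 'title=' in first position so the result matches the form MediaWiki already generates. This is an illustration only, not the querysort code that runs on the caches; the function name `normalize_query` and the `first_key` parameter are hypothetical, and the sketch assumes parameters are compared as raw, undecoded strings.

```python
def normalize_query(url: str, first_key: str = "title") -> str:
    """Return url with its query parameters in a canonical order.

    Hypothetical helper: parameters named first_key come first, the rest
    are sorted as raw strings. Nothing is decoded or re-encoded.
    """
    base, sep, query = url.partition("?")
    if not sep or not query:
        # No query string, so there is nothing to reorder.
        return url

    # Drop empty segments produced by stray '&' characters.
    params = [p for p in query.split("&") if p]

    def sort_key(param: str):
        name = param.split("=", 1)[0]
        # False sorts before True, so first_key parameters lead;
        # everything else falls back to plain string order.
        return (name != first_key, param)

    return base + "?" + "&".join(sorted(params, key=sort_key))


if __name__ == "__main__":
    # Two orderings of the same request collapse to a single form.
    print(normalize_query("/w/index.php?oldid=5&action=history&title=Foo"))
    print(normalize_query("/w/index.php?action=history&title=Foo&oldid=5"))
```

Both calls print `/w/index.php?title=Foo&action=history&oldid=5`. The special case raised in the reply is visible in `sort_key`: a plain sort would need no knowledge of any parameter name, whereas this version hard-codes one.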