[04:32:25] Traffic, SRE, SRE-swift-storage, Thumbor: Cache thumbs in our caching infrastructure (e.g. ATS) - https://phabricator.wikimedia.org/T345334 (Midleading) Open→Stalled Thumbor is currently heavily overloaded (T337649). As a result, traffic to thumbor should be reduced as much as possible un...
[07:44:38] netops, Infrastructure-Foundations, Prod-Kubernetes, SRE, and 2 others: Update puppet's topology.kubernetes.io/zone logic to take into account the new setup - https://phabricator.wikimedia.org/T352893 (ayounsi) Nice !! The v6 one is probably just a fluke, we should investigate it only if it happ...
[09:46:40] (VarnishHighThreadCount) firing: (8) Varnish's thread count on cp5017:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[09:51:21] btullis: I'm off today, please ping fabfur to deploy the revert after the ongoing incident has been solved
[09:51:39] 👍
[09:51:40] (VarnishHighThreadCount) firing: (10) Varnish's thread count on cp5017:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[09:51:41] vgutierrez: Will do, thanks.
[09:54:12] fabfur: I'm checking with aqu and joal whether they are ready for the revert to be deployed immediately. Will get back to you asap.
[09:56:40] (VarnishHighThreadCount) firing: (11) Varnish's thread count on cp5017:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[09:58:07] fabfur: The revert is ready to be deployed, whenever is convenient for you: https://gerrit.wikimedia.org/r/c/operations/puppet/+/991563 - Thanks for your help.
[10:00:32] btullis: looks good to me, but I'll wait a bit because we're investigating also a traffic spike (see security)
[10:01:17] OK, thanks. This can wait for that to be resolved.
[10:01:40] (VarnishHighThreadCount) firing: (11) Varnish's thread count on cp5017:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[10:03:22] btullis: ack, whenever you want
[10:05:08] fabfur: ok, shall I submit and puppet-merge, or would you rather drive it?
[10:06:16] usually I always thought that if you merge on gerrit you also take care of do the merge on puppetmaster, but if you want I can do it, no prob at all!
[10:06:40] (VarnishHighThreadCount) firing: (11) Varnish's thread count on cp5017:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[10:09:51] Yes, that's what I normally do too, but I just wanted to check whether or not you have any special procedures in place for deploying varnish changes. I've submitted and merged.
[10:10:25] I think is fine, do you already have something to monitor the change?
[10:11:06] Btullis for a revert like this one nothing crazy.. if it's a new feature we usually disable puppet on A:cp and let puppet run on a single host per cluster first
[10:11:40] (VarnishHighThreadCount) firing: (9) Varnish's thread count on cp5017:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[10:11:56] vgutierrez: Ack, thanks.
[10:16:40] (VarnishHighThreadCount) resolved: (7) Varnish's thread count on cp5017:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[12:36:14] netops, Infrastructure-Foundations, SRE: Migrate IP gateway for public1-a-codfw to spine switches - https://phabricator.wikimedia.org/T351532 (cmooney) p:Medium→Low Going to delay this for now. We have enough disruptive changes planned not to burden wider SRE with this one in the next few we...
[12:36:18] netops, Infrastructure-Foundations, SRE: Migrate IP gateway for private1-b-codfw to spine switches - https://phabricator.wikimedia.org/T351534 (cmooney) p:Triage→Low Going to delay this for now. We have enough disruptive changes planned not to burden wider SRE with this one in the next few w...
[12:54:15] Traffic, SRE, SRE-swift-storage, Thumbor: Cache thumbs in our caching infrastructure (e.g. ATS) - https://phabricator.wikimedia.org/T345334 (hnowlan) >>! In T345334#9471632, @Midleading wrote: > Thumbor is currently heavily overloaded (T337649). As a result, traffic to thumbor should be reduced a...
[12:58:21] Traffic, SRE, SRE-swift-storage, Thumbor: Cache thumbs in our caching infrastructure (e.g. ATS) - https://phabricator.wikimedia.org/T345334 (taavi) Stalled→Open
[13:46:00] netops, Infrastructure-Foundations, SRE: Codfw row A/B top-of-rack switch refresh - https://phabricator.wikimedia.org/T327938 (cmooney)
[13:46:06] Traffic, netops, Infrastructure-Foundations, SRE: Add new codfw private vlan sub-interfaces to lvs2013 and lvs2014 - https://phabricator.wikimedia.org/T348225 (cmooney) Open→Resolved Done under {{T348218}}
[18:12:50] thanks v.gutierriez and b.tullis and f.abfur for rewinding the deployment to buy time for the fix to downstream processing.
[18:28:12] yw!
[19:14:33] Traffic: Synchronize and rotate TCP Fastopen keys for various use-cases - https://phabricator.wikimedia.org/T355446 (BBlack) p:Triage→Medium
[19:17:05] Traffic: Synchronize and rotate TCP Fastopen keys for various use-cases - https://phabricator.wikimedia.org/T355446 (BBlack) We discussed this in #Traffic earlier this week, and I ended up implementing what I think is a reasonable solution already, so now I've made this ticket for the paper trail and to cove...