[13:09:13] folks you might be interested in a patch we have prepped to revert to kernel defaults for the size of the "ip fragment" buffers
[13:09:14] https://gerrit.wikimedia.org/r/c/operations/puppet/+/992682
[13:09:35] we had been overriding the defaults to mitigate a vulnerability, but all our hosts are now patched for that
[13:10:15] defaults to us seem large, but we figure simpler to go with them than to maintain the override, they won't cause us any problems
[14:01:56] defaults seems correct to me in the meta view for now. No point randomly customizing without a solid rationale.
[14:04:40] (VarnishHighThreadCount) firing: Varnish's thread count on cp5024:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/wiU3SdEWk/cache-host-drilldown?viewPanel=99&var-site=eqsin&var-instance=cp5024 - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[14:24:40] (VarnishHighThreadCount) resolved: Varnish's thread count on cp5024:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/wiU3SdEWk/cache-host-drilldown?viewPanel=99&var-site=eqsin&var-instance=cp5024 - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[16:09:41] topranks: maybe I misunderstood at first. will re-read ticket and comment there
[16:10:13] bblack: our conclusion was to use defaults
[16:11:25] 4MB is a lot of packets to be holding on to waiting for more fragments
[16:11:46] the actual stats show us having nothing close in production so I don't think we need to worry or try to over-customize things
[16:12:50] I guess I understand this backwards or something
[16:13:08] 4MB would be the defaults, but we're sticking to the customization we had (~256K) for high thresh, right?
[16:13:53] https://gerrit.wikimedia.org/r/c/operations/puppet/+/992680/1/modules/base/manifests/kernel.pp
[16:18:52] sry confusion reigns
[16:19:08] we chatted again and decided to revert to defaults.
[16:19:16] that patch was abandoned
[16:19:51] oh right ok
[16:20:11] makes sense now, sorry :)
[16:20:18] ah sorry I pasted wrong link
[16:20:21] https://gerrit.wikimedia.org/r/c/operations/puppet/+/992682
[16:21:14] nah my bad
[16:29:43] Does anyone know where the DNS config for cloudelastic.wikimedia.org lives? FWiW this domain doesn't go thru discovery. I can't find its records in the dns repo
[16:30:18] inflatador: try netbox
[16:30:35] https://netbox.wikimedia.org/ipam/ip-addresses/3500/ see DNS name here
[16:31:56] Traffic, MW-on-K8s, SRE, serviceops, Release-Engineering-Team (Seen): Move 40% of mediawiki external requests to mw on k8s - https://phabricator.wikimedia.org/T355532 (Clement_Goubert)
[16:32:19] Thanks sukhe . Working on migrating this service to go thru discovery
[16:32:34] inflatador: happy to help, ping us anytime
[16:32:41] (for discovery even)
[16:33:13] Awesome, thanks
[16:36:22] inflatador: what are you planning to do? we don't have cloudelastic in codfw
[16:37:18] taavi T355617 has more details, just migrating cloudelastic from public to private IPs
[16:37:18] T355617: Migrate cloudelastic from public to private IPs - https://phabricator.wikimedia.org/T355617
[16:37:31] Traffic: Improve ncmonitor's packaging - https://phabricator.wikimedia.org/T354988 (CodeReviewBot) brett merged https://gitlab.wikimedia.org/repos/sre/ncmonitor/-/merge_requests/1 Packaging/repo improvements
[16:37:59] we aren't touching codfw at all
[16:38:34] https://phabricator.wikimedia.org/T355720#9483823
[16:39:00] the nodes are moving to private ips, the cloudelastic.wikimedia.org is not
[16:39:05] correct
[16:40:20] we're going to walk thru w/a canary and test VIP first, but I'm planning on adding to discovery, the same way we do query.wikidata.org for example
[16:41:07] My understanding is if we go thru ATS, we'll have a frontend cert that matches *.wikimedia.org
[16:41:15] but if that's not the case LMK, happy to adjust the plan
[16:42:50] Acme-chief, Traffic, Patch-For-Review: Create automation for registered MarkMonitor DNS and acme-chief/ncredir - https://phabricator.wikimedia.org/T355189 (CodeReviewBot) brett opened https://gitlab.wikimedia.org/repos/sre/ncmonitor/-/merge_requests/3 Add configuration, user-supplied conf file/path
[16:49:38] Traffic: Improve ncmonitor's packaging - https://phabricator.wikimedia.org/T354988 (BCornwall) In progress→Resolved
[16:49:41] Acme-chief, Traffic, Patch-For-Review: Create automation for registered MarkMonitor DNS and acme-chief/ncredir - https://phabricator.wikimedia.org/T355189 (BCornwall)
[16:50:16] Acme-chief, Traffic, Patch-For-Review: Create automation for registered MarkMonitor DNS and acme-chief/ncredir - https://phabricator.wikimedia.org/T355189 (BCornwall)
[16:50:29] Acme-chief, Traffic, Patch-For-Review: Puppetize deployment of ncmonitor - https://phabricator.wikimedia.org/T355190 (BCornwall) Open→In progress p:Triage→Medium
[16:57:38] (LVSRealserverMSS) firing: (4) Unexpected MSS value on 185.15.59.226:443 @ ncredir3004 - TODO - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=esams&var-cluster=ncredir - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[16:59:55] yikes
[17:02:40] (LVSRealserverMSS) resolved: (4) Unexpected MSS value on 185.15.59.226:443 @ ncredir3004 - TODO - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=esams&var-cluster=ncredir - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[17:58:12] netops, Data-Persistence, Infrastructure-Foundations, SRE, ops-codfw: Migrate servers in codfw rack B5 from asw-b5-codfw to lsw1-b5-codfw - https://phabricator.wikimedia.org/T355549 (cmooney)
[18:03:48] netops, Data-Persistence, Infrastructure-Foundations, SRE, ops-codfw: Migrate servers in codfw rack B5 from asw-b5-codfw to lsw1-b5-codfw - https://phabricator.wikimedia.org/T355549 (Marostegui) @cmooney will you issue a downtime before the maintenance for each host?
[20:20:02] Traffic, Automoderator, Data Products, Product-Analytics, and 2 others: Add revision ID to X-Analytics header - https://phabricator.wikimedia.org/T346350 (Dogu) a:Dogu
[21:02:59] netops, Data-Persistence, Infrastructure-Foundations, SRE, ops-codfw: Migrate servers in codfw rack B5 from asw-b5-codfw to lsw1-b5-codfw - https://phabricator.wikimedia.org/T355549 (hashar) + @jnuche from release engineering who knows even more about Jenkins than me :-) `contint2002` hosts...