[01:25:35] Traffic, DNS, Patch-For-Review: add wikimedia.social to WMF DNS (was: Update DNS records for mastodon.wikimedia.org) - https://phabricator.wikimedia.org/T337586 (ssingh) Open→Resolved ` dig wikimedia.social NS +short ns1.wikimedia.org. ns0.wikimedia.org. ns2.wikimedia.org. ` ` dig wikimedia....
[05:10:34] Traffic, DNS, Patch-For-Review: add wikimedia.social to WMF DNS (was: Update DNS records for mastodon.wikimedia.org) - https://phabricator.wikimedia.org/T337586 (Dzahn) Thanks a lot @ssingh :)
[05:17:44] (VarnishHighThreadCount) firing: (3) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[05:22:44] (VarnishHighThreadCount) firing: (8) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[05:27:44] (VarnishHighThreadCount) firing: (8) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[05:32:44] (VarnishHighThreadCount) resolved: (8) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[07:29:07] Traffic, Anti-Harassment, Data-Engineering, SRE, and 2 others: Include User-Agent Client Hints in WebRequest logs - https://phabricator.wikimedia.org/T337947 (kostajh)
[08:08:07] Traffic, SRE, Patch-For-Review: port 80 paging on scheduled single host maintenance in text@esams - https://phabricator.wikimedia.org/T339898 (Vgutierrez) `counterexample vgutierrez@prometheus3002:~$ curl -w @- -o /dev/null --resolve www.wikipedia.org:80:91.198.174.192 -s http://www.wikipedia.org <<'...
[09:32:13] (VarnishHighThreadCount) firing: (8) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[09:36:51] (VarnishHighThreadCount) firing: (8) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[09:41:45] (VarnishHighThreadCount) firing: (8) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[09:46:45] (VarnishHighThreadCount) resolved: (8) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[12:04:07] (VarnishHighThreadCount) firing: (3) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[12:09:16] (VarnishHighThreadCount) firing: (7) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[12:12:44] (VarnishHighThreadCount) firing: (7) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[12:17:45] (VarnishHighThreadCount) resolved: (7) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[12:39:51] hello folks
[12:40:01] if you are ok I'd roll out pki to vk instances in codfw
[12:40:14] https://gerrit.wikimedia.org/r/c/operations/puppet/+/931498
[12:41:30] elukey: please go ahead
[12:42:00] vgutierrez: mmm role::cache::text for codfw in hiera disappeared :D
[12:42:29] elukey: yeah, it was consolidated
[12:42:37] cause right now it's applied globally
[12:42:41] so no need to tell between DCs
[12:43:10] you can blame fabfur
[12:43:20] that's https://gerrit.wikimedia.org/r/c/operations/puppet/+/929989
[12:43:33] ahhhh
[12:43:53] can I recreate it, just to support the migration?
[12:43:57] of course
[12:46:30] thanks! patch updated, running pcc again
[12:49:41] vgutierrez: when you have a moment, could you double check https://gerrit.wikimedia.org/r/c/operations/puppet/+/931498 ?
[12:49:50] it is trivial but better safe than sorry
[12:50:48] <3
[12:57:23] interesting, I didn't expect so much difference between eqiad and codfw for vk traffic
[12:57:51] I assumed that frontend traffic hitting codfw was more
[13:07:18] (VarnishHighThreadCount) firing: (8) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[13:12:14] (VarnishHighThreadCount) firing: (11) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[13:15:07] Traffic, Mobile-Content-Service, Product-Infrastructure-Team-Backlog-Deprecated, SRE, serviceops: Setup allowed list for MCS decom - https://phabricator.wikimedia.org/T340036 (MSantos)
[13:15:45] Traffic, Mobile-Content-Service, Product-Infrastructure-Team-Backlog-Deprecated, SRE, serviceops: Setup allowed list for MCS decom - https://phabricator.wikimedia.org/T340036 (MSantos) p:Triage→High
[13:16:44] (VarnishHighThreadCount) firing: (12) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[13:16:55] Traffic, Mobile-Content-Service, Product-Infrastructure-Team-Backlog-Deprecated, SRE, serviceops: Setup allowed list for MCS decom - https://phabricator.wikimedia.org/T340036 (MSantos)
[13:17:55] Traffic, Mobile-Content-Service, Product-Infrastructure-Team-Backlog-Deprecated, SRE, serviceops: Setup allowed list for MCS decom - https://phabricator.wikimedia.org/T340036 (MSantos)
[13:19:45] Traffic, Mobile-Content-Service, Product-Infrastructure-Team-Backlog-Deprecated, RESTbase Sunsetting, and 2 others: Setup allowed list for MCS decom - https://phabricator.wikimedia.org/T340036 (MSantos)
[13:21:44] (VarnishHighThreadCount) firing: (13) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[13:25:25] Traffic, Infrastructure-Foundations, SRE, Patch-For-Review: decide on an aggregation function to combine multiple probes into a single measurement - https://phabricator.wikimedia.org/T337318 (JameelKaisar) You can see Box Plots and Violin Plots of the per-country latency data here: ` stat1009.e...
[13:26:44] (VarnishHighThreadCount) firing: (13) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[13:29:27] all codfw vk instances migrated!
[13:31:44] (VarnishHighThreadCount) firing: (10) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[13:36:44] (VarnishHighThreadCount) firing: (8) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[13:38:33] Traffic, Mobile-Content-Service, Product-Infrastructure-Team-Backlog-Deprecated, RESTbase Sunsetting, and 2 others: Setup allowed list for MCS decom - https://phabricator.wikimedia.org/T340036 (akosiaris) So, we need something to identify those users. wikiwand, if I understand the usage of the MC...
[13:41:44] (VarnishHighThreadCount) resolved: (7) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[13:45:44] (VarnishHighThreadCount) firing: (8) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[13:52:58] Traffic, Mobile-Content-Service, Product-Infrastructure-Team-Backlog-Deprecated, RESTbase Sunsetting, and 2 others: Setup allowed list for MCS decom - https://phabricator.wikimedia.org/T340036 (MSantos) >>! In T340036#8952804, @akosiaris wrote: > So, we need something to identify those users. wik...
[13:55:29] Traffic, Mobile-Content-Service, Product-Infrastructure-Team-Backlog-Deprecated, RESTbase Sunsetting, and 2 others: Setup allowed list for MCS decom - https://phabricator.wikimedia.org/T340036 (akosiaris) >>! In T340036#8952836, @MSantos wrote: > Sounds great to me, no objections. Cool. Do we ha...
[13:55:44] (VarnishHighThreadCount) firing: (8) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[14:00:45] (VarnishHighThreadCount) firing: (8) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[14:02:07] any work going on in codfw? we just got a nel report for tcp resets from text-lb.codfw.wikimedia.org (already seems to be recovering)
[14:05:45] (VarnishHighThreadCount) resolved: (8) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[14:06:58] jbond: hmmm not besides elukey migrating varnishkafka
[14:07:34] yep exactly but completed half an hour ago (and touching only varnishkafka)
[14:07:35] vgutierrez: ack it was only a small peak so could equally be some isp rebooting a cgnat or something
[14:07:37] but I'm guessing that PROBLEM - BGP status on cr1-codfw is CRITICAL: BGP CRITICAL - AS1299/IPv4: Connect - Telia https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status could have triggered that
[14:07:53] yep --^
[14:07:57] ahh yes thanks
[14:49:15] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 2 others: Move cloud vps ns-recursor IPs to host/row-independent addressing - https://phabricator.wikimedia.org/T307357 (aborrero)
[15:01:32] hey vgutierrez - could you give https://gerrit.wikimedia.org/r/c/operations/puppet/+/929674 another look when you have time please?
[15:01:46] yeah.. I was checking it
[15:03:05] https://www.irccloud.com/pastebin/amDydVKI/
[15:03:16] endpoint looks healthy now in terms of DNS and TLS
[15:05:15] vgutierrez: You running pcc or shall I?
[15:05:25] against..? :)
[15:05:32] https://gerrit.wikimedia.org/r/c/operations/puppet/+/931947/5
[15:05:59] brett: I did it for PS4 and PS5 is just a rebase on top of the previous change
[15:06:56] but 4 → 5 seems to be an underscore fix?
[15:07:20] It seems I'm missing something, I'll get out of your way
[15:08:45] brett: nope, you're right
[15:08:57] rerunning right now :)
[15:15:27] hnowlan: happy to help you moving forward with that CR
[15:15:50] vgutierrez: thank you! I am PTO for the next 2 days so I might hold off on merging it until Monday if that's okay
[15:16:08] hnowlan: perfect
[19:34:05] Traffic, Infrastructure-Foundations, SRE: decide on an aggregation function to combine multiple probes into a single measurement - https://phabricator.wikimedia.org/T337318 (JameelKaisar) Update: - Instead of trimming bottom 10 %, we are trimming bottom 5 % only. - We are plotting Box plots as we...