[09:56:29] Traffic, Infrastructure-Foundations: Q4:magru VM tracking task - https://phabricator.wikimedia.org/T364016#9773896 (MoritzMuehlenhoff)
[13:02:56] there are some initial results that the BR datacenter is better for some interesting locations, like `SH` (St Helena, Ascension and Tristan da Cunha)
[13:05:32] sample size 4, and that's also a case where I'd expect subdivision to matter a *lot* if it is accurate, but still
[13:07:45] ah the islands, interesting!
[13:08:00] personally I had forgotten all about them, given the focus on the main continent in a way
[13:08:38] well -- better than eqiad, which in retrospect I don't think we're using for SH
[13:09:49] yeah, it would be esams by default
[13:10:13] anyway I'll generalize what I have a bit and I'll look for cases where magru is better than whatever the current mapping is
[13:10:58] > generic-map 91.232.198.4
[13:10:58] generic-map => 91.232.198.4/20 => esams, drmrs, eqiad, codfw, ulsfo, eqsin, magru
[13:11:11] yeah, I don't know but I was expecting eqiad
[13:11:22] it's part of AF continent
[13:11:29] and if you look at submarinecablemap it should be for sure
[13:11:40] that being said, I totally believe that magru is better than eqiad and should be higher in that list
[13:12:12] the list as it exists right now in a way is just simply appending magru to everything, except prepending it for the magru subnets
[13:12:16] yeah
[13:13:18] https://phabricator.wikimedia.org/F50517022
[13:14:01] this incorporates all data since we turned on magru measurements
[13:14:09] the sample sizes for each country will also give you an idea of the magnitude of userbase there
[13:14:32] nice!
[13:14:40] FK == Falkland ?
[13:14:43] fabfur: yes
[13:14:49] and a very small sample size ofc
[13:14:49] interesting
[13:14:57] there are still some interesting cases, like in AR magru is mostly faster, except when it isn't
[13:15:03] same for PE
[13:16:02] but in BR and UA, clearly a win
[13:16:36] I guess for close cases such as PE, we go down the subdivision route?
[13:17:10] because maybe North PE vs South PE is a factor in that (given how CO is better routed to eqiad for example)
[13:20:28] the subdivision data provided by maxmind has a lot lower accuracy than the country data (outside of US)
[13:21:42] https://i.imgur.com/98r6P1t.png
[13:21:44] and the same issue :)
[13:22:18] I'm hopeful that adding more transits and peerings will help
[13:23:04] no updates on that yet
[13:23:26] at least on the transit (IX.br is another story :)
[14:12:01] Traffic, Infrastructure-Foundations, Patch-For-Review: Q4:magru VM tracking task - https://phabricator.wikimedia.org/T364016#9774471 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by filippo@cumin1002 for hosts: `prometheus7001.magru.wmnet` - prometheus7001.magru.wmnet (**WARN**) -...
[14:25:58] Traffic, Patch-For-Review: replace mtail with benthos on ncredir instances - https://phabricator.wikimedia.org/T362776#9774490 (Vgutierrez) Open→Resolved a:Vgutierrez
[14:27:17] Traffic, Data-Platform-SRE, serviceops: Investigate why pools.json does not match https://config-master.wikimedia.org/pybal/${datacenter}/${service} T363702 - https://phabricator.wikimedia.org/T364037#9774498 (Volans) Fixing tags
[14:45:06] Traffic, Infrastructure-Foundations, Patch-For-Review: Q4:magru VM tracking task - https://phabricator.wikimedia.org/T364016#9774569 (fgiunchedi) I've tried installing `prometheus7001` today with help from @Muehlenhoff although there's no console and some pxe/tftp interaction with install7001 is susp...
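Note on the comparison proposed at 13:10:13 ("look for cases where magru is better than whatever the current mapping is"): the following is a minimal sketch of one way such a check could look, not the script used in the channel. The function name `magru_wins`, the choice of medians, the `min_samples` threshold, the placeholder country codes, and every latency value are assumptions made up for illustration.

```python
from collections import defaultdict
from statistics import median


def magru_wins(samples, current_mapping, candidate="magru", min_samples=3):
    """Return countries where the candidate DC has a lower median RTT.

    samples: iterable of (country, datacenter, rtt_ms) tuples.
    current_mapping: {country: datacenter currently preferred}.
    Countries with fewer than min_samples points for either DC are skipped,
    since tiny samples (like the "sample size 4" case) are not conclusive.
    """
    by_key = defaultdict(list)
    for country, dc, rtt_ms in samples:
        by_key[(country, dc)].append(rtt_ms)

    wins = {}
    for country, current_dc in current_mapping.items():
        cur = by_key.get((country, current_dc), [])
        cand = by_key.get((country, candidate), [])
        if len(cur) < min_samples or len(cand) < min_samples:
            continue
        cur_med, cand_med = median(cur), median(cand)
        if cand_med < cur_med:
            wins[country] = (current_dc, cur_med, cand_med)
    return wins


# Synthetic data only: placeholder countries XX/YY/ZZ, invented RTTs.
samples = [
    ("XX", "eqiad", 140.0), ("XX", "eqiad", 150.0), ("XX", "eqiad", 160.0),
    ("XX", "magru", 20.0), ("XX", "magru", 30.0), ("XX", "magru", 25.0),
    ("YY", "eqiad", 70.0), ("YY", "eqiad", 80.0), ("YY", "eqiad", 75.0),
    ("YY", "magru", 110.0), ("YY", "magru", 120.0), ("YY", "magru", 100.0),
    ("ZZ", "esams", 300.0), ("ZZ", "magru", 250.0),  # too few points
]
current_mapping = {"XX": "eqiad", "YY": "eqiad", "ZZ": "esams"}
result = magru_wins(samples, current_mapping)
print(result)
```

A median is used here because a few slow outliers would skew a mean; for close cases like the ones mentioned for AR and PE, comparing full distributions would likely be more informative than a single number.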
[14:49:57] sukhe: FYI this too needs to be updated for magru, I'll leave it to you when it's a good time for it ;)
[14:50:00] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/profile/templates/cumin/aliases.yaml.erb#6
[14:50:48] thanks, will do shortly and walk through the entire list
[14:53:21] Traffic, Movement-Insights: Disable Chrome Private Prefetch Proxy - https://phabricator.wikimedia.org/T364126#9774609 (KOfori) Open→Stalled p:Medium→Triage a:KOfori→OSefu-WMF We're going to put this on hold for now.
[14:53:22] i can take that
[14:53:28] fabfur: thanks please do
[14:53:43] please check for all Traffic services
[14:53:50] vgutierrez: FYI re: ncredir I noticed the prometheus job 'ncredir' is down, which makes sense now since there's no mtail anymore, the job should be removed from prometheus/ops
[14:54:46] godog: will do, thanks for the reminder
[14:54:48] for services in all DCs you can use the each loop
[14:54:56] without having to do them one by one
[14:55:06] vgutierrez: sure np
[14:55:23] volans: fabfur: yeah, lvs-* etc, better to do site
[14:55:37] as in, do the loop we are doing over in other places
[14:57:08] ack
[15:01:02] should be as simple as https://gerrit.wikimedia.org/r/c/operations/puppet/+/1028523
[15:10:41] fabfur: ship it!
[15:10:49] https://puppet-compiler.wmflabs.org/output/1028523/2287/cumin1002.eqiad.wmnet/index.html
[15:10:58] 👍
[15:35:22] Traffic, Data-Platform-SRE, Patch-For-Review: LVS hosts: Monitor/alert on when pooled nodes are outside broadcast domain - https://phabricator.wikimedia.org/T363702#9774796 (CodeReviewBot) bking opened https://gitlab.wikimedia.org/repos/search-platform/sre/lvs_l2_checker/-/merge_requests/1 LVS: moni...
[15:35:55] sukhe vgutierrez topranks I've got the basic skeleton of an L2 pybal check script, would appreciate y'all's feedback if you have time. https://gitlab.wikimedia.org/repos/search-platform/sre/lvs_l2_checker . Created an empty MR if you wanna discuss there: https://gitlab.wikimedia.org/repos/search-platform/sre/lvs_l2_checker/-/merge_requests/1
[15:37:19] inflatador: ok, thanks, will check and discuss there!
[15:37:57] already found the dreaded 'FDQN' typo ;P
[15:41:45] I pulled a copy of pybal.conf onto my homedir cumin2002 (no secrets in the file) and I've been testing from there
[15:43:31] btw -- just realized I was overcounting sample sizes as shown on the graph -- what was shown was the total number of points for that origin for ALL target datacenters, not just the current one -- so multiplied by 4 for most of these charts
[17:33:24] Traffic, DC-Ops, ops-magru: Q4:rack/setup/install cp70[01-16] - https://phabricator.wikimedia.org/T362729#9775267 (RobH) Open→Resolved a:RobH
[17:33:33] netops, Traffic, DC-Ops, Infrastructure-Foundations, ops-magru: Q4:rack/setup/install magru misc servers - https://phabricator.wikimedia.org/T362730#9775264 (RobH) Open→Resolved a:RobH
[19:05:41] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw row C/D upgrade racking task - https://phabricator.wikimedia.org/T360789#9775442 (Papaul)
[19:58:42] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw row C/D upgrade racking task - https://phabricator.wikimedia.org/T360789#9775697 (Papaul)
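Note on the overcounting correction at 15:43:31: the difference between counting points per origin across all target datacenters and counting them per (origin, target) pair can be sketched as below. This is an illustration only; the function name `sample_sizes`, the list of four targets, the placeholder origin `XX`, and the point counts are assumptions, not data from the channel.

```python
from collections import Counter


def sample_sizes(points):
    """Count measurement points two ways.

    points: list of (origin, target_datacenter) tuples, one per measurement.
    per_origin lumps all targets together (the overcount described above);
    per_pair counts only the points for one specific target.
    """
    per_origin = Counter(origin for origin, _target in points)
    per_pair = Counter(points)
    return per_origin, per_pair


# Hypothetical setup: 4 target datacenters, 5 points each from origin "XX".
targets = ["eqiad", "codfw", "esams", "magru"]
points = [("XX", target) for target in targets for _ in range(5)]

per_origin, per_pair = sample_sizes(points)
print(per_origin["XX"])             # count across all targets
print(per_pair[("XX", "magru")])    # count for the magru chart alone
```

With four targets measured equally often, the per-origin count is four times the per-pair count, which matches the "multiplied by 4" factor mentioned for most of the charts.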