[09:21:30] interesting link
[09:24:32] <_joe_> Emperor: yeah, I considered moving mediawiki's container to use non-burstable configs and enable the intel cpu pinning. That article is pretty thorough and has some stuff I didn't know.
[15:32:32] Is an SRE around to supervise a deploy (see _security)?
[15:43:13] (resolved)
[17:25:25] Both of the hiera keys, profile::openstack::{codfw1dev,eqiad1}::cloudgw::dmz_cidr, have entries for sodium.wikimedia.org; does anyone know if those are necessary? majavah, I was wondering whether I need to add the new mirror, mirror1001.wikimedia.org, to that list, but I am not sure.
[17:37:36] jhathaway: yes please. But then we also need an update in homer/public
[17:38:44] arturo: thanks, would it be possible to explain why it is needed? I am having trouble visualizing the architecture.
[17:38:58] yes, I can explain it
[17:39:59] there are also references to sodium there for analytics, fwiw
[17:41:59] jhathaway: that hiera key contains a list of NAT exceptions for Cloud VPS egress traffic, meaning that the destination addresses listed there will see the actual Cloud VPS VMs' internal IP addresses instead of the general cloud egress NAT address
[17:42:40] ah, that was the reverse of my mental model, thanks
[17:43:04] jhathaway: a few more hints here: https://wikitech.wikimedia.org/wiki/Cross-Realm_traffic_guidelines
[17:44:01] jhathaway: in particular, sodium is listed here https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/notes/Service_predictions_for_cross_realm_situation as potential low-hanging fruit
[17:44:07] arturo: thanks, I'll give that a read over
[17:44:30] so it sounds like I should add mirror1001 to that list for both keys
[17:46:00] jhathaway: I think so, yes. But I don't remember off the top of my head why that particular exception is there. I'm genuinely curious now :-P
[17:46:35] I assume it's so we know which cloud boxes are pulling from our mirror, so we can point to a specific VM?
[17:47:39] but the usual reason for that is to detect misbehaving stuff. In this case a misbehaving VM would... download a bunch of apt packages?
[17:48:32] I have a proposal: don't add it, and let's see what happens
[17:50:55] jhathaway: do you have a ticket associated with the sodium migration?
[17:51:00] (phab)
[17:53:31] T286898 ?
[17:53:31] T286898: Setup new mirror server (mirror1001.wikimedia.org) - https://phabricator.wikimedia.org/T286898
[17:53:44] yes, https://phabricator.wikimedia.org/T286898
[17:54:31] jhathaway: I just created T298042
[17:54:31] T298042: evaluate & drop cloud NAT exceptions for APT repositories - https://phabricator.wikimedia.org/T298042
[17:54:49] which should be enough paper trail to do the experiment I'm proposing
[17:54:54] arturo: thanks
[18:02:21] jhathaway: I will think about this patch in the next few days https://gerrit.wikimedia.org/r/c/operations/puppet/+/748771 and merge it when ready. If that happens, then the answer to your original question becomes "no, not required"
[18:03:39] arturo: ok thanks
[18:06:59] BTW given mirror1001 has a public IP, no need for changes in homer/public
[18:07:06] (I think)
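For readers skimming the log, a minimal sketch of the kind of hiera change discussed above. The real value format of the cloudgw::dmz_cidr key is not shown in this conversation, so the list layout and the addresses below are assumptions for illustration only (documentation-range IPs standing in for the real hosts):

    # Illustrative sketch only -- the actual key format in operations/puppet may differ.
    # Destinations listed here see the real Cloud VPS VM addresses on egress,
    # instead of the shared cloud NAT address.
    profile::openstack::eqiad1::cloudgw::dmz_cidr:
      - 198.51.100.10   # hypothetical stand-in for sodium.wikimedia.org
      - 198.51.100.11   # hypothetical stand-in for mirror1001.wikimedia.org (the addition in question)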
[18:08:42] but sodium has 3 references in homer/public
[18:09:02] 2 for analytics, one for cloud, so those might need to be updated
[18:09:53] in particular, the analytics one, I think, is the one that allows analytics hosts to contact the mirror
[18:10:00] from a first look
[18:10:07] jhathaway: ^^^
[18:10:50] volans: thanks, which repo has the references?
[18:11:08] operations/homer/public
[18:11:18] https://wikitech.wikimedia.org/wiki/Homer#Editing_the_public_repository
[18:11:34] https://gerrit.wikimedia.org/r/c/operations/homer/public/+/748774
[18:11:37] sent a patch for that too
[18:11:52] both changes go together
[18:11:53] I can help you through it if this is your first time running homer, jhathaway
[18:13:48] volans: okay thanks
[18:15:39] arturo, volans: so the current plan is to cut over mirror1001 without the above changes, and then add them if we need them later
[18:15:56] the changes for analytics are surely needed
[18:21:01] jhathaway: Yes. I think we're good on the cloud side. We don't need anything specific here. Will ping you otherwise
[18:21:31] volans: ok
[19:25:10] 19:22:34 rake aborted!
[19:25:11] 19:22:34 KeyError: key not found: "PARALLEL_PID_FILE"
[19:25:22] ^ CI failure on puppet, but looks like possibly something generic/intermittent?
[19:27:29] it passed on recheck, but there's probably something to look into there on CI reliability in general
[19:27:43] https://integration.wikimedia.org/ci/job/operations-puppet-tests-buster-docker/37151/console was my console output for the failure
[21:29:13] I am trying to grok policies/cr-analytics.inc; does anyone have pointers to any diagrams?
[21:29:44] i.e. network diagrams of traffic between analytics & prod?
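On the homer/public side, the edit being discussed amounts to swapping sodium's address for mirror1001's wherever the old mirror is referenced (two analytics references, one cloud). A rough sketch of what such a definition change could look like follows; the file and key names are assumptions for illustration, and the real layout should be checked against https://wikitech.wikimedia.org/wiki/Homer#Editing_the_public_repository and the patch at https://gerrit.wikimedia.org/r/c/operations/homer/public/+/748774:

    # Illustrative sketch only -- real file and key names in operations/homer/public may differ.
    # The idea: wherever the old mirror's address appears in a prefix list
    # (e.g. one the analytics filters use to allow reaching the APT mirror),
    # replace it with mirror1001's address.
    apt_mirrors:
      # - 198.51.100.10/32   # hypothetical sodium.wikimedia.org address (removed)
      - 198.51.100.11/32     # hypothetical mirror1001.wikimedia.org address (added)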