[07:27:53] netops, Infrastructure-Foundations, Prod-Kubernetes, SRE: Agree strategy for Kubernetes BGP peering to top-of-rack switches - https://phabricator.wikimedia.org/T306649 (elukey) Quick question about how to proceed. Would it make sense to start testing adding manual labels in the ml-serve-eqiad clu...
[08:08:27] netops, Infrastructure-Foundations, Prod-Kubernetes, SRE: Agree strategy for Kubernetes BGP peering to top-of-rack switches - https://phabricator.wikimedia.org/T306649 (cmooney) @elukey yes I think that makes sense, no need to hold off on testing. Your suggested label naming makes sense so let's...
[09:05:57] netbox, Infrastructure-Foundations: Make more extensive use of Netbox custom fields - https://phabricator.wikimedia.org/T305126 (cmooney) Two other things which we might want to add custom fields on interfaces: - Flag to indicate a "dynamic neighbor" BGP peer should be configured for the interfaces IP su...
[10:30:18] SRE-tools, DC-Ops, Infrastructure-Foundations: sre.hosts.reimage: wait reboot time timeout on aqs nodes - https://phabricator.wikimedia.org/T307260 (Volans) We looked at the logs with John and Papaul during our last meeting and agreed that it took a long time for mdadm+mkfs to create the software rai...
[13:04:44] Puppet, Infrastructure-Foundations, SRE, Patch-For-Review, User-jbond: replace all puppet crons with systemd timers - https://phabricator.wikimedia.org/T273673 (Zabe)
[14:05:59] topranks, XioNoX: I was chatting with jelto and a gitlab-related question came up. They will need a dedicated public IP specifically used for the SSH service. Given that the model will be that only one host will be active at any point in time for the SSH service, instead of wasting one IP per host, would it be possible to use a single IP that could be migrated at will? I was
[14:06:05] thinking of the anycast prefixes for example.
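The constraint behind this question is mostly about routing. A minimal sketch, assuming RFC 5737 documentation prefixes as stand-ins for the real eqiad and codfw public ranges (which, as noted in the log, are different) and made-up names for the four planned hosts: an ordinary unicast address belongs to one site's range, so a manual failover can move it between the two hosts in that site but not to the other site. Crossing sites needs either a DNS change to another address or an address announced from both sites, such as the anycast prefix suggested here. The last lines count the public IPv4 addresses each option would use for the SSH service alone.

```python
import ipaddress
from typing import Optional

# Hypothetical per-site public ranges: RFC 5737 documentation prefixes,
# NOT the real eqiad/codfw allocations.
SITE_RANGES = {
    "eqiad": ipaddress.ip_network("192.0.2.0/24"),
    "codfw": ipaddress.ip_network("198.51.100.0/24"),
}

# 2 GitLab hosts per site, as described in the log (host names are made up).
HOSTS = {"gitlab-a": "eqiad", "gitlab-b": "eqiad", "gitlab-c": "codfw", "gitlab-d": "codfw"}


def site_of(address: str) -> Optional[str]:
    """Return the site whose public range contains the address, if any."""
    ip = ipaddress.ip_address(address)
    for site, net in SITE_RANGES.items():
        if ip in net:
            return site
    return None


def can_float_to(service_ip: str, host: str) -> bool:
    """A unicast service IP can only move to hosts in the site that routes its range."""
    return site_of(service_ip) == HOSTS[host]


service_ip = "192.0.2.10"  # hypothetical SSH service address from the "eqiad" range
for host, site in HOSTS.items():
    print(host, site, can_float_to(service_ip, host))

# Public IPv4 addresses needed just for the SSH service under each option.
options = {
    "one dedicated IP per host": len(HOSTS),                # 4
    "one IP per site (e.g. behind LVS)": len(SITE_RANGES),  # 2
    "one floating IP announced from both sites (e.g. anycast)": 1,
}
for name, count in options.items():
    print(count, name)
```

The sketch deliberately ignores how the address is announced; it only shows why "1 IP spread across both DCs" needs something beyond a plain per-site unicast address.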
[14:06:52] to clarify: wasting in the sense that they will have 4 hosts (2 in codfw and 2 in eqiad) and that would mean reserving 4 IPs while only 1 will ever be used at any given time
[14:06:52] volans: In theory, yes.
[14:07:06] hmm. Could it be put behind LVS?
[14:07:41] was going to say this ^ :)
[14:07:42] I guess maybe a better question is what is the failover mechanism? What determines which one is live at a given time?
[14:07:42] I'll let jelto reply on this one :)
[14:07:53] yeah cool, I'm sure we can work something out
[14:07:58] topranks: AFAIK for now it will be manual failover
[14:08:25] Ok thanks
[14:08:45] yes, manual failover, which also involves some puppet code changes; no automatic failover
[14:10:15] The codfw and eqiad public IP ranges are different, so the requirement for 1 IP spread across both DCs is a little tricky
[14:10:56] A DNS-based manual change might be just as easy: give them all their own IP, or 1 IP per DC behind LVS
[14:11:35] I probably need to think it through further, might even be good to have a quick sync on it
[14:13:21] and a diagram of the different flows; usually the recommended way is a real server on a private IP, exposed to the internet via LVS
[14:18:25] topranks: we can also have a quick chat about that, sure. My goal is that we have a second public IP for SSH. There is no automated failover currently. This is a manual process involving puppet patches.
[14:18:25] We can also have the server on a private IP and add a second public IP for end-user git/ssh access. Currently the machines also have a public address for the main interface, which is not strictly needed.
[14:21:56] why second IP?
[14:22:25] to have SSH to gitlab on port 22 without interfering with the host
[14:22:32] Because we want to separate the machine's SSH daemon from the end-user SSH daemon, which is used to clone repos
[14:23:28] makes sense!
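The reason for the second address (host sshd and end-user git/ssh daemon both on port 22) can be illustrated with plain sockets. A minimal sketch, assuming Linux, where all of 127.0.0.0/8 is local so 127.0.0.1 and 127.0.0.2 can stand in for the host IP and the dedicated GitLab SSH IP, and an unprivileged ephemeral port standing in for port 22: two daemons can listen on the same port as long as each binds only its own address, which for OpenSSH is what `ListenAddress` in `sshd_config` controls. A listener on the wildcard address claims the port on every address, so it conflicts with both. The `listen_on` helper is made up for illustration.

```python
import errno
import socket

# Stand-ins (hypothetical): on Linux all of 127.0.0.0/8 is local, so two loopback
# addresses can play the host IP and the dedicated GitLab SSH IP without root.
HOST_IP = "127.0.0.1"     # machine's own address: admin sshd
SERVICE_IP = "127.0.0.2"  # dedicated address: end-user git/ssh daemon


def listen_on(address: str, port: int) -> socket.socket:
    """Open a TCP listener bound to one specific address, like sshd's ListenAddress."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((address, port))
        sock.listen()
    except OSError:
        sock.close()
        raise
    return sock


host_sshd = listen_on(HOST_IP, 0)          # port 0: the kernel picks a free port
port = host_sshd.getsockname()[1]          # stands in for port 22
gitlab_sshd = listen_on(SERVICE_IP, port)  # same port, different address: allowed
host_addr = host_sshd.getsockname()
gitlab_addr = gitlab_sshd.getsockname()

# A daemon listening on the wildcard address claims the port on every address,
# so it collides with the two address-specific listeners above.
try:
    listen_on("0.0.0.0", port).close()
    wildcard_conflict = False
except OSError as exc:
    wildcard_conflict = exc.errno == errno.EADDRINUSE

print(f"{host_addr} and {gitlab_addr} both listening; wildcard bind rejected: {wildcard_conflict}")

host_sshd.close()
gitlab_sshd.close()
```

In other words, the host's own sshd has to stop listening on all addresses before a second daemon can own port 22 on the service IP.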
[14:28:49] I agree allocating extra host IPs is not great here
[14:29:35] that also doesn't seem to fit the use case for the public anycast prefix (even though it should technically work)
[14:37:23] jelto: do you have any doc on how the system is currently set up?
[14:41:00] XioNoX: there is a one-liner regarding the network here: https://wikitech.wikimedia.org/wiki/GitLab#SSH_fingerprints. I can also draw something and add it there if you give me some minutes
[14:48:17] jelto: could be useful if you don't mind
[14:49:44] I see that each server already has 2 public host v4 IPs
[14:52:58] yes, the virtual Ganeti hosts gitlab1001 and gitlab2001 have a public host IPv4 and a second IPv4 address assigned. Now we want to migrate to physical machines and don't want to waste 8 public IPs for the 4 physical gitlab hosts. So we are looking for a better solution there :)
[14:53:47] jelto: thanks for thinking about that :)
[14:54:19] and being considerate about public v4 usage
[15:36:10] topranks: gentle reminder to run the dns.netbox cookbook due to the changes for cloudsw1-e4-eqiad.eqiad1 ;)
[15:56:30] volans: sorry about that!! thanks, I will run it now
[15:56:58] np, thanks! :)
[15:58:14] XioNoX, volans: I actually hit on something strange there with Jinja2/homer templates.
[15:58:21] Basically this issue:
[15:58:25] https://stackoverflow.com/questions/8818872/jinjas-loop-variable-is-not-available-in-include-d-templates
[15:58:59] The workaround in that does work - if I use an {% include %} statement
[15:59:38] But with our "section" macro the vars from the parent file were never usable
[15:59:42] topranks: what's the change/code in our code that fails?
[16:02:05] simplified version is basically this:
[16:02:06] https://phabricator.wikimedia.org/P27820
[16:02:28] I tried adding "with context" in the 'include' statement in the macro definition but it didn't make a difference
[16:02:42] It's no problem for me as long as we're happy with me using the 'include' rather than 'section' statement
[16:02:57] seemed odd though
[16:03:41] try to see if 'scoped' helps, see https://jinja.palletsprojects.com/en/3.1.x/templates/#block-nesting-and-scope
[16:03:50] the second part of the paragraph
[16:05:32] Can that be used with other constructs, not just "block"?
[16:05:40] I don't believe we use 'block' anywhere
[16:06:09] is section imported with context?
[16:06:47] the problem with using the include directly is the indentation
[16:06:54] section() does take care of that
[16:07:33] Yeah, the indentation isn't a massive concern here, but yep that is the difference
[16:07:43] I added "with context" in the macro; it made no difference
[16:07:58] I'm happy to stick with the include and move on unless anyone feels that's problematic
[16:08:50] give me another 5 and then for me the include is ok ;)
[16:09:18] yeah happy to get to the bottom of it, for curiosity's sake if nothing else
[16:09:59] I modified the macro line like this fwiw:
[16:10:00] {% include category_name + "/" + section_name + ".conf" with context %}
[16:10:08] but same result
[16:14:50] topranks: could you try this? https://phabricator.wikimedia.org/P27820#119659
[16:15:00] it *might*... or might not... work
[16:15:26] I *think* I tried it, may have been slightly different though, let me give it a whirl
[16:20:11] Doesn't seem to work unfortunately :(
[16:20:38] I think the culprit is the macro
[16:21:06] yeah seems so, but it's so simple it's kind of surprising
[16:21:29] I changed to "include" as a last-ditch "try anything", fully expecting I'd need to refactor all of the ospf.conf
[16:21:54] wasn't expecting it to work, but it did, so it seems to be the macro
[16:22:19] context is automatically passed to includes since v2.2
[16:22:34] netbox, Infrastructure-Foundations, Patch-For-Review: Upgrade Netbox to 3.2 - https://phabricator.wikimedia.org/T296452 (ayounsi) Spicerack module tested by editing `/etc/hosts` on cumin2002 and following https://wikitech.wikimedia.org/wiki/Spicerack#Test_newly_released_Spicerack_features. Cookbooks...
[16:23:20] I guess the macro breaks that chain; would have thought it'd work the same, as the macro simply does an include
[16:24:39] I guess you could put a comment there and implement the same filter as the macro there inline
[16:24:43] and be done with ti
[16:24:44] *it
[16:27:47] yeah I'll do that I think. The alternative is a lot of duplicate config, or a messy refactor of the ospf file.
[16:27:53] thanks for the help!
[16:28:17] no prob, sorry for not being helpful
[19:52:39] https://github.com/netbox-community/netbox/releases/tag/v3.2.3
[19:53:00] yeah
[20:43:02] nice :)
[20:43:16] I'm sure datacenters love these yokes :D
[20:43:16] https://github.com/netbox-community/netbox/issues/8805
[21:16:04] that is a pretty amazing design indeed!
[21:24:16] netops, Infrastructure-Foundations, SRE, fundraising-tech-ops: Upgrade pfw to Junos 20+ - https://phabricator.wikimedia.org/T295691 (Papaul) The Junos image is now on both pfw ` root@pfw3-eqiad% ls /var/tmp/junos-srxentedge-x86-64-20.4R3-S1.3.tgz /var/tmp/junos-srxentedge-x86-64-20.4R3-S1.3.tgz...
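The Jinja2 scoping issue from the homer discussion above can be reproduced in isolation. A minimal sketch, assuming simplified stand-ins for the homer templates (the real `section` macro, its filter and `ospf.conf` are not in the log, so `macros.j2`, `section_with`, the `area`/`areas` variables and the `indent(2, first=True)` filter are all hypothetical). An `{% include %}` placed directly in a `{% for %}` loop sees the loop variable. The same include wrapped in a macro does not, even with the macro imported `with context`, because the include only receives the macro's own locals (its arguments) plus the template context, not the caller's loop variables. Passing the value as a macro argument, or doing the include inline inside a `{% filter %}` block (roughly what was agreed at the end), both make it visible again.

```python
from jinja2 import DictLoader, Environment

# Hypothetical, simplified stand-ins for the homer templates.
TEMPLATES = {
    "macros.j2": (
        # Macro that just includes a sub-template (same shape as "section").
        "{% macro section(name) %}{% include name + '.conf' %}{% endmacro %}"
        # Variant that takes the needed value as an explicit argument.
        "{% macro section_with(name, area) %}{% include name + '.conf' %}{% endmacro %}"
    ),
    "ospf.conf": "area {{ area }}",
    # 1. Plain include inside the loop: the loop variable is visible.
    "direct.j2": "{% for area in areas %}{% include 'ospf.conf' %};{% endfor %}",
    # 2. Same include via the macro: the loop variable is NOT visible,
    #    even though the macro is imported "with context".
    "via_macro.j2": (
        "{% from 'macros.j2' import section with context %}"
        "{% for area in areas %}{{ section('ospf') }};{% endfor %}"
    ),
    # 3. Workaround A: pass the value in as a macro argument.
    "macro_arg.j2": (
        "{% from 'macros.j2' import section_with %}"
        "{% for area in areas %}{{ section_with('ospf', area) }};{% endfor %}"
    ),
    # 4. Workaround B: inline include, with the formatting done by a filter block.
    "inline_filter.j2": (
        "{% for area in areas %}"
        "{% filter indent(2, first=True) %}{% include 'ospf.conf' %}{% endfilter %};"
        "{% endfor %}"
    ),
}

env = Environment(loader=DictLoader(TEMPLATES))


def render(name: str) -> str:
    """Render one of the templates above with two example loop values."""
    return env.get_template(name).render(areas=[0, 1])


for name in ("direct.j2", "via_macro.j2", "macro_arg.j2", "inline_filter.j2"):
    print(f"{name:18} {render(name)!r}")
```

With the default `Undefined`, the macro case renders `area ;area ;` (the loop value is silently empty), which matches the "vars from the parent file were never usable" symptom; the other three render the loop values.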