[09:41:32] Traffic, Data-Platform-SRE (2024.05.27 - 2024.06.16), Patch-For-Review: Point datahub and datahub-next subdomains to traffic server - https://phabricator.wikimedia.org/T365668#9841202 (Gehel)
[14:25:14] Hi. I'm trying to get a new discovery record &c &c set up for my new ceph cluster ("apus"), which I need to do so I know what to put into the frontend setup. I'm a bit confused. I tried starting at https://wikitech.wikimedia.org/wiki/DNS/Discovery#Add_a_service_to_production but then got stuck pretty quickly on the need of IPs to put into the Wmflib::Service entry in hieradata/common/service.yaml
[14:25:37] [maybe I should have started at https://wikitech.wikimedia.org/wiki/LVS#Add_a_new_load_balanced_service but that seems to presuppose the service is ready to go]
[14:29:38] Emperor: so the question if I understand correctly is that you need to assign svc IPs for this?
[14:29:44] and then proceed with the rest of the steps?
[14:30:04] I _think_ so, but maybe I am holding this all backwards and should be starting elsewhere :-/
[14:30:52] I think the first step is that you need to assign the service IPs and then we can move forward
[14:31:00] https://wikitech.wikimedia.org/wiki/DNS/Netbox#How_to_manually_allocate_a_special_purpose_IP_address_in_Netbox
[14:31:22] OK, cool.
[14:31:38] basically, you will need to pick one of the IPs here and assign the DNS name .svc..wmnet
[14:31:56] and then run the netbox DNS cookbook to push those changes
[14:32:21] that will give us for example apus.svc.eqiad.wmnet pointing to say 10.2.2.foo
[14:33:36] that I think is step 0 and once you have that, you will need to figure out the "class" it goes behind (high-traffic1/2,low-traffic) and then follow https://wikitech.wikimedia.org/wiki/DNS/Discovery#Add_a_service_to_production
[14:34:09] if you don't want to touch Netbox, you can create a task for the IPs to be created I think and that's fine (sorry netops if it's not :)
[14:34:52] also happy to do it if it is helpful
[14:35:28] let me have a go.
[14:39:14] OK, I've made an IP in each, next is sre.dns.netbox cookbook.
[14:39:42] ye[
[14:39:44] p
[14:44:22] OK, that didn't catch fire, so now I need to update operations/dns with the new records too
[14:44:26] you will also need manual patch in the dns repo
[14:44:43] Emperor: yes, you can see some example commits but also the docs talk about that
[14:44:47] due to T270071
[14:44:48] T270071: SVC DNS zonefiles and source of truth - https://phabricator.wikimedia.org/T270071
[14:45:09] thanks volans!
[14:46:26] Emperor: happy to review
[14:51:13] sukhe: https://gerrit.wikimedia.org/r/c/operations/dns/+/1037095 is I think correct
[14:53:04] Emperor: we will also need the geoip/metafo record, depending on if this service will be active/active or active/passive
[14:53:28] and PTRs as well in templates/10.in-addr.arpa
[14:54:03] ack, thanks
[14:54:57] don't forget the ordering is important there, of the documented steps
[14:55:08] (of when you push the ops/dns change vs ops/puppet change)
[14:55:25] I think the puppet change goes first
[14:55:27] but let me check
[14:55:43] yeah it does, for new things
[14:55:46] reverse for removing things
[14:55:49] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/dns/+/refs/heads/master/utils/mock_etc/discovery-metafo-resources
[14:56:05] ^ this has the big warning text in it, maybe some shorter variant should exist on the wikitech page with the instructions.
[14:56:07] > Before adding the dns discovery entries you need to make sure the services is at least in state: production so the various definitions exist on the dns servers
[14:57:51] when we first created this interlocked stuff between ops/dns and ops/puppet for discovery, it's what made me try to push for just moving our ops/dns data into the puppet repo :P
[14:57:58] but there's lots of downsides to that model, too
[15:00:07] OK, but catch-22 here: it's hard (impossible?) for me to set up my service without the DNS entries (e.g. because I need to generate a TLS cert, the service needs to know what DNS entries to use and will probably fall over if those don't exist) and I need the IPs to put into hieradata/common/service.yaml; but I have to have it declared in state: production (which I think is in hieradata/common/service.yaml) before I can actually finish
[15:00:07] doing the DNS changes to make the hostnames work?
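A minimal sketch of the ordering rule stated in the messages above (puppet first for new things, the reverse for removals). The function name `merge_order` and the `"add"`/`"remove"` labels are hypothetical, chosen only for illustration; the sketch encodes nothing beyond what the chat says and is not a tool from either repository.

```python
def merge_order(action: str) -> list[str]:
    """Return the repo merge order for a discovery service change.

    Encodes only the rule quoted in the chat: the operations/puppet change
    goes first when adding a service, and the order is reversed when
    removing one. The action labels are illustrative assumptions.
    """
    order = ["operations/puppet", "operations/dns"]
    if action == "add":
        return order
    if action == "remove":
        return list(reversed(order))
    raise ValueError(f"unknown action: {action!r}")


if __name__ == "__main__":
    for action in ("add", "remove"):
        print(action, "->", " then ".join(merge_order(action)))
```

Writing the rule down as data makes it easy to see that a removal is the mirror image of an addition, which is the point the warning text in discovery-metafo-resources is making.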
[15:02:16] Emperor: the docs aren't that great yeah but:
[15:02:38] (and sorry, I had to do re-read of them)
[15:03:40] I am checking some of the existing patches to confirm
[15:04:11] ok so in https://gerrit.wikimedia.org/r/c/operations/dns/+/1037095
[15:04:18] let's just add the PTRs
[15:05:12] and then create the service.yaml entries
[15:05:17] for which you just need the IPs above that we created
[15:05:47] and then after that is done and merged everywhere, we will add the DNS discovery entries
[15:05:50] that looks fine IMO
[15:06:07] does that seem fine?
[15:07:33] sukhe: so https://gerrit.wikimedia.org/r/c/operations/dns/+/1037095 (just updated) to add the PTRs and hold off the geoip change until service.yaml &c is done ?
[15:07:50] if I've understood correctly, yes, I think that's plausible
[15:08:04] yep
[15:08:09] looking
[15:09:05] Emperor: on netbox, the DNS names need to be swapped
[15:09:12] 10.2.1.10 should be codfw
[15:09:16] 10.2.2.10 should be eqiad
[15:09:26] https://netbox.wikimedia.org/search/?q=apus&obj_type=
[15:09:54] your patch is correct but Netbox data needs to be swapped
[15:11:26] oh, darn it
[15:12:25] {{done}}
[15:14:34] looks good, you should re-run the netbox DNS cookbook I think and I will check the patch in the meantime
[15:23:46] Traffic, SRE: Anycast ns1.wikimedia.org - https://phabricator.wikimedia.org/T366193 (ssingh) NEW
[15:23:56] there are so many better ways this could be done than this pile of hackery, sorry :)
[15:24:08] https://xkcd.com/421/
[15:24:10] I promise to update the docs after this is done at least
[15:25:12] <3
[15:27:23] cookbook re-run, so I'll merge that CR now. Once that's merged, I need to do authdns-update on a DNS server?
[15:27:28] yes
[15:27:35] any one DNS server, such as dns1004
[15:30:19] {{done}}, thanks for the help!
I'll come back to the geoip record when nearer to going live
[15:30:40] hth, also we can review the service change for you so feel free to add me
[15:30:52] thanks :)
[15:32:39] Traffic, SRE: Anycast ns1.wikimedia.org - https://phabricator.wikimedia.org/T366193#9842720 (ssingh) p:Triage→Medium
[16:49:04] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw row C/D upgrade racking task - https://phabricator.wikimedia.org/T360789#9843219 (Papaul)
[18:12:44] dduvall: Sorry to abandon you like that - bedridden since friday. I should be available tomorrow.
[18:13:43] brett: oh man, sorry to hear that. no worries. sukhe helped me out with the deployment, so i think we're all good
[18:14:11] blubberoid has been completely removed from production and i have a patch to remove it from the deployment-charts repo
[18:23:42] 🎉
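
The Netbox mix-up at 15:09 (the eqiad and codfw names attached to each other's IPs) is the kind of thing a small pre-flight check might catch. The following is a rough sketch, not an existing Wikimedia tool: the per-site /24 ranges are an assumption inferred only from the two addresses quoted in the chat (10.2.1.10 for codfw, 10.2.2.10 for eqiad), and the helper names are hypothetical. It uses only Python's standard `ipaddress` module, whose `reverse_pointer` attribute also gives the PTR owner name that would go under 10.in-addr.arpa.

```python
import ipaddress

# ASSUMPTION: one /24 of service IPs per site, inferred solely from the two
# addresses mentioned in the chat. Real allocations may differ.
SITE_NETWORKS = {
    "codfw": ipaddress.ip_network("10.2.1.0/24"),
    "eqiad": ipaddress.ip_network("10.2.2.0/24"),
}


def site_for_name(dns_name: str) -> str:
    """Extract the site label from a name shaped like apus.svc.eqiad.wmnet."""
    labels = dns_name.rstrip(".").split(".")
    if len(labels) != 4 or labels[1] != "svc" or labels[3] != "wmnet":
        raise ValueError(f"not a service name of the expected shape: {dns_name!r}")
    return labels[2]


def name_matches_ip(dns_name: str, ip: str) -> bool:
    """True if the IP falls inside the (assumed) range of the name's site."""
    network = SITE_NETWORKS[site_for_name(dns_name)]
    return ipaddress.ip_address(ip) in network


def ptr_name(ip: str) -> str:
    """Reverse-DNS owner name for the address, for the PTR record."""
    return ipaddress.ip_address(ip).reverse_pointer


if __name__ == "__main__":
    # The swapped state first, then the corrected one.
    for name, ip in [
        ("apus.svc.eqiad.wmnet", "10.2.1.10"),
        ("apus.svc.eqiad.wmnet", "10.2.2.10"),
        ("apus.svc.codfw.wmnet", "10.2.1.10"),
    ]:
        status = "ok" if name_matches_ip(name, ip) else "MISMATCH"
        print(f"{name} -> {ip}: {status} (PTR {ptr_name(ip)})")
```

A check like this only compares a name against an address range, so it would not replace review of the actual Netbox data or the operations/dns patch.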