[08:24:54] moritzm: good morning, volans mentioned the `reprepro` doc for Jenkins could use some adjustment when they cut two releases. To recap, we had 2.375.3, they have released a security update as 2.375.4 AND a new LTS 2.387.1 [08:25:23] thus reprepro proposed to pick up the new LTS (which would lead to an upgrade) when I wanted the update that only introduced the security release [08:26:36] to prevent reprepro from updating every package in the component I came up with `--restrict=jenkins` and further down there is a mention of being able to pick a specific version with e.g. `--restrict-binary jenkins=2.100.2` [08:27:11] and apparently "one of the commands" is wrong but I don't have further details :] [08:27:45] source: https://wikitech.wikimedia.org/wiki/Jenkins#Get_the_package [08:27:54] the command: $ reprepro --restrict-binary jenkins=2.100.2 [08:27:55] at least the case of two releases is covered on https://wikitech.wikimedia.org/wiki/Jenkins#Upgrading :] I am guessing we need to adjust the command so one can copy paste [08:28:08] is not a valid command, as it's missing the actual reprepro command to execute [08:28:14] ahhh great :] [08:28:38] I didn't manage to use it effectively, in the end to make it work I temporarily modified the updates file adding a version restriction there and then restored the existing version [08:29:41] the man page states we can pass the version to the `--restrict` parameter as well and that faulty command is missing the `-C thirdparty/ci` so that might be the cause [08:30:23] this should be "reprepro -C thirdparty/ci --restrict-binary=jenkins=2.100-2 checkupdate buster-wikimedia" [08:30:42] the current example there simply lacks the command, the rest of the args are fine, I'll fix it up [08:31:13] moritzm: I tried that [08:31:17] from my history [08:31:18] maybe we can always pass the version we want and reduce the instructions to only use checkupdate and update? 
[08:31:19] reprepro -C thirdparty/ci --restrict=jenkins --noskipold checkupdate buster-wikimedia [08:31:29] the noskipold was because it did fail without [08:31:33] not finding anything [08:31:36] which would remove the exception of having to deal with two releases [08:31:38] it did not find anything anyway [08:32:16] but this should work: [08:32:17] reprepro -C thirdparty/ci --noskipold --restrict-binary=jenkins=2.100-2 checkupdate buster-wikimedia [08:32:54] for the correct LTS version instead of 2.100-2 ofc [08:33:10] JENKINS_VERSION=2.375.4 [08:33:10] cd /srv/wikimedia [08:33:10] reprepro -C thirdparty/ci --restrict jenkins=$JENKINS_VERSION checkupdate buster-wikimedia [08:33:10] reprepro -C thirdparty/ci --restrict jenkins=$JENKINS_VERSION update buster-wikimedia [08:33:11] moritzm: sorry wrong paste [08:33:13] reprepro -C thirdparty/ci --restrict=jenkins --restrict-binary jenkins=2.375.4 --noskipold checkupdate buster-wikimedia [08:33:16] this didn't work [08:33:34] (sorry for nagging you about this in the early morning :D ) [08:34:47] but your command has the additional --restrict jenkins which is already covered by the versioned equivalent [08:35:03] and why shouldn't it work? [08:35:18] applies the first restrict and ignores the second? [08:36:08] I think so, yes [08:36:13] :facepalm: [08:36:30] at least when I run it on apt1001 [08:36:40] it tells me that no new updates are needed [08:36:45] so that seems to work [08:36:48] I'll update the docs [08:38:20] also adding it to https://wikitech.wikimedia.org/wiki/Reprepro [08:38:46] which is _THE_ canonical place for docs on the internet for reprepro since it basically shows up first in most search results... 
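The working pattern from the discussion above can be captured as a copy-pasteable sketch; 2.375.4 is just the security release used in this conversation, and the key point is that `--restrict-binary` with an explicit version (not a bare `--restrict=jenkins`) pins the package:

```shell
# Hedged sketch of the corrected wikitech snippet, assuming it is run from
# /srv/wikimedia on the apt host (apt1001). Here we only assemble and print
# the two commands, since reprepro needs the production repo to actually run.
JENKINS_VERSION=2.375.4
for action in checkupdate update; do
  echo "reprepro -C thirdparty/ci --restrict-binary=jenkins=${JENKINS_VERSION} ${action} buster-wikimedia"
done
```

Running `checkupdate` first gives a dry-run preview of what `update` would pull in, which is exactly how the pinned version was verified on apt1001 above.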
[08:41:14] well done, that sounds well aligned with our value to collect the sum of all knowledge [08:41:45] 10netops, 10Infrastructure-Foundations, 10SRE, 10cloud-services-team (FY2022/2023-Q3): Configure cloudsw1-b1-codfw and migrate cloud hosts in codfw B1 to it - https://phabricator.wikimedia.org/T327919 (10cmooney) [08:42:05] 10netops, 10Infrastructure-Foundations, 10SRE, 10cloud-services-team (FY2022/2023-Q3): Configure cloudsw1-b1-codfw and migrate cloud hosts in codfw B1 to it - https://phabricator.wikimedia.org/T327919 (10cmooney) a:03cmooney [08:42:13] as for the reprepro commands being on https://wikitech.wikimedia.org/wiki/Jenkins#Get_the_package, I wanted an easy way to point SRE to the instructions so they can more or less copy paste without having to read through the whole of [[Reprepro]] or figure out the exact command [08:42:23] at the price of some duplication unfortunately [08:44:39] yeah, that's perfectly fine. It's more of a matter of also adding that info to the generic reprepro page, which currently lacks it [09:42:30] volans: moritzm: thank you to both of you :) [10:21:13] hi, I was looking for the cumin syntax to query all hosts of a given role contacts team, what was it again? 
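The role-contacts query asked about just above works through cumin's "owner" aliases; a hedged sketch with a mocked alias file (the real one lives at /etc/cumin/aliases.yaml on the cumin hosts, and the alias bodies below are placeholders, not the production definitions):

```shell
# Mock a minimal aliases file so the lookup can be shown locally; the
# right-hand-side values are illustrative placeholders only.
cat > /tmp/aliases.yaml <<'EOF'
owner-observability: 'placeholder host query for the Observability team'
owner-infrastructure-foundations: 'placeholder host query for I/F'
EOF
# Find the per-team aliases, as done on the real cumin hosts:
grep owner /tmp/aliases.yaml
# A matching read-only cumin run would then look like:
echo "sudo cumin 'A:owner-observability' 'uptime'"
```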
thank you [10:21:27] I checked https://wikitech.wikimedia.org/wiki/Cumin but no luck [10:25:36] godog: there are aliases [10:25:36] A:owner-observability [10:25:51] grep owner /etc/cumin/aliases.yaml [10:32:20] ah of course, the aliases, thank you volans [10:33:12] 84 hosts will be targeted: [10:33:12] 84 hosts will be targeted: [10:33:22] oops, but yeah that was surprising [10:37:02] :) [11:08:35] 10Puppet, 10Infrastructure-Foundations, 10Patch-For-Review, 10User-jbond: Prepare puppet master infrastructure for bullseye - https://phabricator.wikimedia.org/T285086 (10MoritzMuehlenhoff) 05Open→03Declined This task got replaced/superseded by https://phabricator.wikimedia.org/T330490 [11:14:26] 10netops, 10Infrastructure-Foundations, 10SRE, 10cloud-services-team (FY2022/2023-Q3): Configure cloudsw1-b1-codfw and migrate cloud hosts in codfw B1 to it - https://phabricator.wikimedia.org/T327919 (10cmooney) [11:25:08] 10netops, 10Infrastructure-Foundations, 10SRE, 10cloud-services-team (FY2022/2023-Q3): Configure cloudsw1-b1-codfw and migrate cloud hosts in codfw B1 to it - https://phabricator.wikimedia.org/T327919 (10cmooney) >>! In T327919#8664016, @aborrero wrote: > Please let me know if there is something I can do t... [11:46:47] 10netops, 10Infrastructure-Foundations, 10SRE, 10cloud-services-team (FY2022/2023-Q3): Configure cloudsw1-b1-codfw and migrate cloud hosts in codfw B1 to it - https://phabricator.wikimedia.org/T327919 (10aborrero) >>! In T327919#8679314, @cmooney wrote: >>>! In T327919#8664016, @aborrero wrote: >> Please l... [12:29:48] 10netbox, 10Infrastructure-Foundations: Should we have two versions of the Juniper QFX5120-48Y in Netbox? 
- https://phabricator.wikimedia.org/T331519 (10Aklapper) a:03cmooney [13:54:03] volans, slyngs: BTW, the sre.ganeti.remage failure from yesterday wasn't a failure, the creation of the VM was simply blocked by an ongoing VM migration as part of the ganeti1011 reimage, once that was completed, it proceeded just fine [13:54:33] glad to hear that :) [13:54:38] thx for closing the loop [13:57:14] Nice, because I broke my head on some OIDC and it's not really in a state where I'm able to fix anything :-) [14:15:36] 10netops, 10Infrastructure-Foundations, 10SRE, 10cloud-services-team (FY2022/2023-Q3): Configure cloudsw1-b1-codfw and migrate cloud hosts in codfw B1 to it - https://phabricator.wikimedia.org/T327919 (10cmooney) >>! In T327919#8679398, @aborrero wrote: > In the past we had problems with DHCP forwarding be... [14:22:25] 10netops, 10Infrastructure-Foundations, 10SRE, 10ops-codfw: Run 2x1G links from asw-b1-codfw to cloudsw1-b1-codfw - https://phabricator.wikimedia.org/T331470 (10Jhancock.wm) @cmooney I got these repatched as depicted in the links. Thanks for waiting. Please let me know if you need anything else! [14:30:45] 10netops, 10Infrastructure-Foundations, 10SRE, 10cloud-services-team (FY2022/2023-Q3): Configure cloudsw1-b1-codfw and migrate cloud hosts in codfw B1 to it - https://phabricator.wikimedia.org/T327919 (10cmooney) [14:31:01] 10netops, 10Infrastructure-Foundations, 10SRE, 10ops-codfw: Run 2x1G links from asw-b1-codfw to cloudsw1-b1-codfw - https://phabricator.wikimedia.org/T331470 (10cmooney) 05Open→03Resolved That's great Jenn thanks! All looking good and working now :) ` cmooney@cloudsw1-b1-codfw> show interfaces descrip... [15:23:05] 10Mail, 10Gerrit, 10Infrastructure-Foundations, 10Wikimedia-Mailing-lists: reviewer-bot is not working - https://phabricator.wikimedia.org/T331626 (10hashar) I know nothing about the `mediawiki-commits`, in mailman3 it doesn't show up at all and the archive shows 0 discussions at https://lists.wikimedia.org... 
[15:37:25] 10netops, 10Infrastructure-Foundations, 10SRE, 10cloud-services-team (FY2022/2023-Q3): Configure cloudsw1-b1-codfw and migrate cloud hosts in codfw B1 to it - https://phabricator.wikimedia.org/T327919 (10cmooney) Some updates on the physicals for the new cloudsw. The links to core routers are now up and c... [15:54:06] 10Mail, 10Gerrit, 10Infrastructure-Foundations, 10SRE, 10Wikimedia-Mailing-lists: reviewer-bot is not working - https://phabricator.wikimedia.org/T331626 (10hashar) I am guessing it is an issue with Mailman. https://grafana.wikimedia.org/d/GvuAmuuGk/mailman3 shows a large queue **since March 7th 14:12**:... [15:57:21] 10netbox, 10netops, 10Infrastructure-Foundations, 10SRE: Netbox Juniper report - https://phabricator.wikimedia.org/T306238 (10jbond) [15:57:32] 10CAS-SSO, 10Infrastructure-Foundations, 10SRE: Upgrade IDPs to CAS 6.6/Bullseye and enable webauthn - https://phabricator.wikimedia.org/T305518 (10jbond) [15:57:38] 10CAS-SSO, 10Infrastructure-Foundations, 10SRE: Enable OIDC in CAS - https://phabricator.wikimedia.org/T311999 (10jbond) 05Open→03Resolved a:03jbond [16:04:44] 10Mail, 10Gerrit, 10Infrastructure-Foundations, 10SRE, 10Wikimedia-Mailing-lists: reviewer-bot is not working - https://phabricator.wikimedia.org/T331626 (10hashar) Icinga says OK: mailman3 queues are below the limits, but there is an alert about the runners: PROCS CRITICAL: 13 processes with UID = 38 (... [16:15:32] 10Mail, 10Gerrit, 10Infrastructure-Foundations, 10SRE, 10Wikimedia-Mailing-lists: reviewer-bot is not working - https://phabricator.wikimedia.org/T331626 (10Marostegui) It looks like the restart I made fixed it or at least it is slowly going down: https://grafana.wikimedia.org/d/GvuAmuuGk/mailman3?orgId=... 
[16:16:15] 10Mail, 10Gerrit, 10Infrastructure-Foundations, 10SRE, 10Wikimedia-Mailing-lists: reviewer-bot is not working - https://phabricator.wikimedia.org/T331626 (10hashar) 05Open→03Resolved a:03hashar Mail should be emitted again, it will take a bit of time to clear the queue though. That can be monitored... [16:42:57] 10Mail, 10Gerrit, 10Infrastructure-Foundations, 10SRE, 10Wikimedia-Mailing-lists: reviewer-bot is not working - https://phabricator.wikimedia.org/T331626 (10hashar) >>! In T331626#8680354, @hashar wrote: > PROCS CRITICAL: 13 processes with UID = 38 (list), regex args '/usr/lib/mailman3/bin/runner' > Last... [17:16:23] 10Mail, 10Data-Engineering-Operations, 10Data-Engineering-Planning, 10SRE: Change the analytics-alerts email alias to a mailman distribution list - https://phabricator.wikimedia.org/T315486 (10BTullis) Sorry, these two patches are unrelated to this patch. Added by mistake. [18:04:09] volans: TIL about the DNS Netbox script [18:04:32] ?? [18:04:39] I added a "container" in Netbox as a bit of housekeeping earlier [18:05:00] https://netbox.wikimedia.org/ipam/prefixes/648/ [18:05:16] so the pending diff is you [18:05:17] rename from 57.15.185.in-addr.arpa [18:05:18] rename to 0-27.57.15.185.in-addr.arpa [18:05:20] But I now see that affects the DNS script [18:05:22] yeah exactly [18:05:24] yeah :D [18:05:27] I was running it in noop [18:05:42] but it might also affect the ops/dns repo [18:05:44] the include [18:05:56] It will work with that change, if I patch the include in the authdns repo [18:06:07] you need to follow [18:06:07] https://wikitech.wikimedia.org/wiki/DNS/Netbox#Atomically_deploy_auto-generated_records_and_a_manual_change [18:06:17] I'm not sure it's needed though, I can just delete the container [18:06:24] that's up to you [18:06:25] :D [18:06:26] do you have any thoughts / guidelines on what's best here? 
ok [18:06:41] leave it with me thanks [18:07:00] I don't know by heart what ar.hel's policies were [18:07:08] on when to add a container and when not [18:07:09] sorry [18:07:44] yeah I was just doing it to group the existing WMCS ranges, felt it was better to have it [18:07:50] But I was only thinking of Netbox [18:07:57] good it happened so I understand how these get created [18:08:14] there is some logic on how to group reverse files [18:08:26] it was tricky, and the requirements have changed [18:08:29] so we can also revisit it [18:08:36] to reduce the number of imports [18:11:11] volans: yep, it's always gonna be tricky tbh [18:11:17] and it works quite well [18:11:48] you can see the logic in the source code [18:12:15] but the original requirement was to be able to slowly move subnets from manual to auto-managed by netbox [18:12:25] now that we are in netbox-driven mode [18:12:30] we could aggregate more I guess [18:12:39] hmm yeah that also makes sense [18:12:52] what have you decided for this specific subnet? 
moving smaller parts of bigger blocks, doing those sub-blocks (like not on the zone boundary for the whole /24 etc) [18:13:26] I'm gonna keep the container cos I think the allocations in Netbox are messier without [18:13:34] even though it means more work [18:13:37] ok [18:13:51] I also wasn't aware of '--skip-authdns-update' [18:13:57] that makes it relatively easy to do [18:14:35] :) [18:15:30] actually I think I'm gonna do a 180 on that decision :P [18:15:41] ahahah [18:15:50] existing manual entries have sub-delegation to WMCS name servers [18:16:21] I think it could get too complex, not worth it I think [18:19:32] ack [18:20:20] The file it wants to rename is the actual zone file in /zones too [18:20:32] Not something in /zones/netbox that has an include pointing to it [18:20:41] so it'd definitely break stuff if I try :) [18:22:38] I'm not sure I follow [18:22:45] ^^ sorry this is incorrect, the file is in /zones/netbox [18:22:48] yep [18:22:49] they both have the same name [18:22:52] ah [18:23:04] shouldn't be a problem though per se [18:23:06] cmooney@dns1001:/etc/gdnsd/zones$ find ./ -name 57.15.185.in-addr.arpa [18:23:06] ./57.15.185.in-addr.arpa [18:23:06] ./netbox/57.15.185.in-addr.arpa [18:23:15] 10Mail, 10Gerrit, 10Infrastructure-Foundations, 10SRE, 10Wikimedia-Mailing-lists: reviewer-bot is not working - https://phabricator.wikimedia.org/T331626 (10Legoktm) [18:23:16] volans: no, but it mixed me up [18:23:51] 10Mail, 10Infrastructure-Foundations, 10SRE, 10Wikimedia-Mailing-lists: Mailman hasn't delivered emails since 2023-03-07 14 UTC (was: reviewer-bot is not working) - https://phabricator.wikimedia.org/T331626 (10Legoktm) 05Resolved→03Open p:05Triage→03Medium a:05hashar→03Marostegui [18:28:54] 10Mail, 10Infrastructure-Foundations, 10SRE, 10Wikimedia-Mailing-lists, 10Wikimedia-Incident: Mailman hasn't delivered emails since 2023-03-07 14 UTC (was: reviewer-bot is not working) - https://phabricator.wikimedia.org/T331626 (10Legoktm) Re-opening 
just for tracking while we wait for the queue to go d... [18:31:08] 10Mail, 10Infrastructure-Foundations, 10SRE, 10Wikimedia-Mailing-lists, 10Wikimedia-Incident: Mailman hasn't delivered emails since 2023-03-07 14 UTC (was: reviewer-bot is not working) - https://phabricator.wikimedia.org/T331626 (10Legoktm) There are 2,936 emails in the out queue, it takes ~5.1 seconds t... [18:31:57] topranks: if you are still around [18:31:58] - description "subnet prod-xlink1-eqiad"; [18:31:58] + description "cloudsw1-c8-eqiad prod"; [18:32:03] is this your change in homer by any chance? [18:32:52] there is another similar one on the other cr in eqiad also I expect, sorry [18:33:01] np! [18:33:02] should I merge this? [18:33:07] please, thanks [18:33:18] thanks [18:35:26] - filter { [18:35:26] - input labs-in4; [18:35:26] - } [18:35:32] in4 [18:35:33] codfw [18:35:44] no don't merge that [18:35:44] going to merge [18:35:45] oh ok [18:35:47] :) [18:35:47] sorry [18:35:49] np [18:35:58] just say no, let me run it, what CR is that on? [18:36:14] this CR is for removing authdns2001 [18:36:19] ah sorry my bad [18:36:23] https://github.com/wikimedia/operations-homer-public/commit/47e9736b376811861de6bfc3272ee1441000df9b [18:36:28] yeah easiest thing to do is for you to merge that now [18:36:36] including your changes too? [18:36:38] I have a local patch I've not submitted to the repo (as more changes to go) [18:36:43] not sure how to selectively merge :P [18:36:44] it's harmless for you to proceed [18:36:48] yeah let's not bother [18:36:55] ok merging then, thanks [18:37:01] you do your bit - remove my addition - then I'll roll in behind you and add it again from my local setup [18:37:08] sorry for the confusion! [18:37:28] you will have similar on both CRs I think [18:37:31] yep [18:37:36] for cr*-codfw [18:37:40] eqiad was just that one change [18:38:21] ah - reminded I only changed it on one side, best do the other now :) [18:38:36] sukhe: do you have the authdns patch handy? 
topranks: https://github.com/wikimedia/operations-homer-public/commit/47e9736b376811861de6bfc3272ee1441000df9b [18:39:04] I didn't merge for codfw yet [18:39:10] just did eqiad [18:39:20] ok np [18:39:39] in general, as homer can be run by a lot of people, we should always have the homer repos in prod in sync with the live config, we do have the twice-daily email check exactly for that purpose [18:40:09] or it will either block other people or cause issues when merging/reverting changes [18:41:10] and according to the last report from this morning there are quite a few outstanding diffs [18:41:40] I have not been in this situation before since I rarely run homer now so I am not sure if we can skip selective commits [18:41:43] I think the answer is no [18:41:58] correct [18:42:06] volans: it's on me, I thought with Arzhel away I'd get away with submitting a single patch for my changes :) [18:42:15] caught red-handed [18:42:35] 10Mail, 10Infrastructure-Foundations, 10SRE, 10Wikimedia-Mailing-lists, 10Wikimedia-Incident: Mailman hasn't delivered emails since 2023-03-07 14 UTC (was: reviewer-bot is not working) - https://phabricator.wikimedia.org/T331626 (10Legoktm) Sent [[ https://lists.wikimedia.org/hyperkitty/list/listadmins@l... [18:42:43] not many touch the CRs but it happens, I'm doing a migration so it's multi-step, I was being lazy [18:43:00] yeah at least the authdns2001 change is not critical [18:43:23] but I will be doing a new one later for adding dns1003 and dns2003 [18:46:08] volans: in terms of the mail this morning I'll check them out. CR ones may be due to it running while I was doing the manual work to reboot the line cards in codfw. [18:46:25] ack [18:46:27] thanks [18:47:51] sukhe: what's the current situation with your changes in Homer, are you done making any for now? 
topranks: all good for now, pending the merging of the authdns2001 change in codfw [18:48:24] and then later (soonish, today), I will be adding dns1003 and dns2003 [18:48:29] that's the extent of it [18:48:38] ok, when do you expect to merge the authdns2001 change? [18:48:44] I can merge it right now [18:48:52] just was waiting on you, in case you wanted to revert [18:48:56] sorry this got confusing! [18:49:00] ok cool do that, and then I'll do my additions on top of that [18:49:08] haha yeah, on me sorry [18:49:09] ok, that will pull in your changes too [18:49:10] that's fine right? [18:50:29] topranks: pasting the diff, just a sec [18:50:32] better safe than sorry :) [18:50:38] https://phabricator.wikimedia.org/P45726 [18:52:51] sukhe: shit I think that will fail [18:53:18] don't push it now, I'll pull down your patch, add what's needed to not remove my bits, and run Homer [18:53:23] ok :) [18:53:28] I didn't push [18:53:28] problem with that is it wants to remove stuff that's referred to elsewhere in the config [18:53:29] all yours [18:53:35] ok! [18:58:12] sukhe: are you blocked until this is merged? I'm guessing no... we just have a down BGP peer that needs to be cleaned up [18:58:24] right, not blocked on this [18:58:32] but I might be blocked on bringing up dns1003 and dns2003 [18:58:42] when I add them to homer after commissioning them [18:58:51] if it complicates things, we can do that tomorrow as well [18:58:58] it's not critical since I know it's late for you already [18:59:07] so yeah, I am fine with skipping homer if you'd like [18:59:27] sukhe: nope I was planning on doing this work this evening anyway [18:59:32] ok! [19:00:01] some of this I need to double check our templates though, I missed something with one of the policies in terms of the automation [19:00:24] you fire ahead I'll have it in a healthy state soon enough for you to make the changes for dns1003 and dns2003 [19:00:54] ok! 
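For context, the sync discipline discussed here comes down to the standard Homer diff/commit loop; a hedged sketch where the device glob and commit message are examples, and the commands are only printed since they need the production hosts:

```shell
# Typical Homer sequence when repo and live config must stay in sync:
# preview the diff first, then commit with a message. Printed, not executed.
TARGET='cr*-codfw*'
echo "homer '${TARGET}' diff"
echo "homer '${TARGET}' commit 'Remove authdns2001 BGP config'"
```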
[19:01:11] yes it's going to be at least an hour or two and not critical to bring up BGP today [19:28:47] sukhe: all clean for you to run Homer when you need [19:28:57] topranks: thanks! [19:29:04] you offered your help for the dns1003 thing so I will take it :) [19:29:07] https://netbox.wikimedia.org/dcim/devices/1226/interfaces/ [19:29:11] no IP for the interface here [19:29:15] I will have to make some manual changes as part of my work - can't be avoided as the timing of changes needs to be too close to manage from Homer [19:29:17] if I run https://netbox.wikimedia.org/extras/scripts/interface_automation.ProvisionServerNetwork/ [19:29:21] it says cable ID assigned [19:29:22] so we may conflict again but ping me [19:29:25] * topranks looking [19:30:20] sukhe: so I think this is the bit v.olans and I were discussing earlier [19:30:21] or will the cookbook assign the IP? [19:30:42] The provisioning script will [19:30:56] I believe we need to delete the interfaces, apart from the 'mgmt' one [19:31:12] Can I do that now and run the provision script as a dry-run to see what it says? [19:31:16] please do [19:33:06] damn I messed up, I didn't note the cable label [19:33:17] anyway I'll take a note to get dc-ops to check it for us [19:33:36] I have the cable #! [19:33:37] 1971 [19:33:38] I remember it [19:33:46] sukhe: awesome! [19:34:54] sukhe: The DNS servers connect to the public vlan is that right? [19:34:59] yep [19:35:52] ok this is what I'm putting in, same as you had I'm guessing [19:35:53] https://usercontent.irccloud-cdn.com/file/8ke3lJCY/image.png [19:36:02] seems right! [19:36:09] I think the difference was that I didn't delete the interface [19:36:16] and thus it failed to provision it [19:36:29] does the dry-run work for you? 
No, I think that cable id is wrong [19:37:25] seems the cable with that label is connected to a ganeti host [19:37:25] https://netbox.wikimedia.org/dcim/interfaces/27139/trace/ [19:37:50] where you might have gone wrong is the quite confusing "object ID" that Netbox assigns to every element (including say a cable) [19:37:58] and the "label", which is an attribute of it that we set [19:38:05] I saw cable # [19:38:11] can you perhaps do 1917? [19:38:32] So for instance the cable with label 1971 has an object ID of 5705: [19:38:33] https://netbox.wikimedia.org/dcim/cables/5705/ [19:38:44] sukhe: yep perhaps, let me see for 1917 [19:38:58] no cable found [19:39:00] hmm [19:39:09] There is no 1917 alright [19:39:17] 14:04:59 < sukhe> Cable ID 1971 already assigned in eqiad. [19:39:22] So that *could* be the cable label, but I don't think we want to guess [19:39:22] from my earlier chat log [19:39:23] weird [19:39:33] (in another channel) [19:39:35] We'll put it through without a label and get DC-ops to confirm it when they are on site [19:41:19] topranks: bblack has the screen open [19:41:22] it was definitely 1971 [19:41:33] I see a termination B too in https://netbox.wikimedia.org/dcim/cables/5705/ [19:42:46] sukhe: I guess that leaves us with the same task then [19:43:07] One of the cables is mis-labeled, so we need DC-ops to check the physical [19:43:18] interesting [19:43:22] (if they both had label 1971 saved in Netbox) [19:43:25] topranks: https://netbox.wikimedia.org/dcim/cables/1972/ is authdns2001 [19:43:28] #1972 [19:43:55] Some more recent Netbox features potentially allow us to enforce a unique constraint for the labels [19:44:00] For now let's proceed [19:44:02] ok [19:44:12] And by proceed, I mean deal with the next issue I hit :) [19:44:14] https://usercontent.irccloud-cdn.com/file/D31Jt4gv/image.png [19:44:28] ha [19:44:54] this means we delete mgmt too? 
yeah I think so, this is what I hit before that I was talking about earlier [19:45:20] got it [19:45:33] right you said "all interfaces" [19:45:33] but we can just delete mgmt, we need to make a note of the mgmt IP in case it randomly allocates a new one [19:46:05] I'll open a task on the wider issue as to why this is needed, as per v.olan's comment there is provision for it in the code but obviously it's not working right [19:46:09] ok [19:46:21] 10.65.3.6/16 [19:47:10] yep [19:47:17] sukhe: Ok cool the dry-run worked [19:47:20] nice [19:47:24] let's try committing? [19:47:27] we have nothing to lose :P [19:47:32] If you want to do it again and tick the 'commit changes' box? [19:47:38] sure [19:47:43] leave the cable label box empty [19:49:13] it gave me a new cable # [19:49:18] but other than that committing? [19:49:21] I mean OK to commit? [19:49:49] yep fire ahead [19:50:14] many thanks! [19:50:17] seems like it worked [19:50:20] I will take care of the next one :) [19:50:23] It's giving you the object "id" for the cable in the Netbox database [19:50:25] enjoy your evening and thanks again [19:51:13] That object has a "label" field that has nothing in it: [19:51:15] https://netbox.wikimedia.org/dcim/cables/6207/ [19:51:33] The 'label' is actually a number we stick on the actual cable in the datacenter [19:51:55] I've often got mixed up myself, given they are both often 4-digit numbers :) [19:51:59] :P [19:52:03] sukhe: anyway for now we've more stuff to check [19:52:11] oh? [19:52:17] unlike last time I did it, the MGMT IP assigned is not the same as it was [19:52:17] before I can proceed with the cookbook? [19:52:20] so I'll change that now [19:52:21] yeah [19:52:24] wow ok [19:53:33] Ok that's done [19:53:58] Now, finally, I think you can run the cookbook :) [19:54:07] :D [19:54:07] thanks! [19:54:09] good luck to us all [19:54:15] +1 [19:54:19] haha [19:56:50] sukhe: actually I know where we went wrong [19:56:55] oh? 
* sukhe all ears [19:57:28] Those steps on wikitech say: [19:57:32] "You need to readd the DNS Name field for the management interface. " [19:57:37] But we missed that [19:58:06] which meant this condition failed: [19:58:08] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/software/netbox-extras/+/refs/heads/master/customscripts/interface_automation.py#1171 [19:58:34] wow [19:58:47] where is that done though? [20:01:07] You bring up the host interfaces, then click the IP assigned to the 'mgmt' interface [20:01:18] The page for the IP has a field "DNS name": [20:01:24] ok I will try for dns2003 :) [20:01:28] https://netbox.wikimedia.org/ipam/ip-addresses/12727/ [20:01:28] and hopefully it works; I'll report back here [20:01:32] cool thanks [20:01:43] interesting, I do remember this [20:01:45] sorry these kinds of changes are rare so they don't fit neatly with our automation [20:01:48] but you are right that I personally skipped this with dns1003 [20:01:54] in case you didn't know this is something of a hack :P [20:02:00] haha [20:02:51] The decommission cookbook removes that dns name, but the server provision script and reimage cookbooks need it later on :(
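The gotcha above (the provision script needing the mgmt IP's "DNS name" field) can be turned into a quick pre-flight check. A hedged sketch: the curl line is shown only as a comment (token and address are placeholders), the API response is mocked locally, and `dns_name` is the standard field on the Netbox `/api/ipam/ip-addresses/` endpoint:

```shell
# On a real system you would fetch the mgmt IP record first, e.g.:
#   curl -s -H "Authorization: Token $NETBOX_TOKEN" \
#     "https://netbox.wikimedia.org/api/ipam/ip-addresses/?address=10.65.3.6"
# Here the response is mocked so the check itself can be shown:
resp='{"results":[{"address":"10.65.3.6/16","dns_name":""}]}'
if echo "$resp" | grep -q '"dns_name": *""'; then
  echo "mgmt IP is missing its DNS name - re-add it before provisioning"
else
  echo "mgmt IP DNS name looks set"
fi
```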