[08:45:15] I'm seeking a quick look at https://gerrit.wikimedia.org/r/c/operations/puppet/+/939236 to bump the cadvisor rollout, following yesterday's bump
[09:03:07] ok going ahead, should be fine
[10:48:33] <_joe_> Is there any reason why we should keep a java 8 image based on stretch around?
[14:02:40] Is it safe to run the makevm cookbook again after it fails? I just tried it and got a bunch of seemingly unrelated DNS errors. Now it's asking me if I want to retry, skip or abort
[14:13:32] I skipped the DNS errors and now it's showing me a diff of unrelated DNS changes: https://phabricator.wikimedia.org/P49570 . Any suggestions?
[14:14:55] ^ topranks - This is related to the work you're currently doing, isn't it?
[14:15:04] inflatador: that's related to the work rob is doing in -dcops
[14:16:09] inflatador: you can accept it, it's no big deal
[14:16:17] ACK, thanks XioNoX !
[14:21:12] fabfur: ^^
[14:21:15] tnx
[14:21:29] hi inflatador
[14:21:41] I was decommissioning one host and suddenly this error popped up
[14:21:43] https://www.irccloud.com/pastebin/ougWcXBU/
[14:21:54] sorry the error was `Found 4 IPs for name 'flink-zk1001.eqiad.wmnet.', expected 1:`
[14:22:41] is it something you are involved with?
[14:28:46] fabfur LOL, looks like I'm getting your errors and you're getting mine. I think that is safe to ignore based on the makeVM cookbook output... old A/AAAA records should be replaced by the new VM that was just successfully built
[14:29:12] :) thanks
[14:31:09] Will file a bug for this and CC you
[14:31:18] 👍
[14:32:22] fabfur just to confirm, you were running the decommission cookbook?
[14:32:41] yes, for host lvs1013.eqiad.wmnet, if it matters
[14:32:54] ACK, got it
[14:33:24] maybe that's what's needed: https://phabricator.wikimedia.org/T341973 :)
[14:34:45] I'd argue that instead of locking, we just need a way to provision VMs and decommission physical hosts at the same time
[14:35:11] But if locking is easier, I'm fine w/it
[14:35:21] Could anyone please help me troubleshoot a networking failure on analytics1073? I tried a reimage. It failed to PXE boot, so now it's still booting its old buster install, but I can't get any packets out.
[14:36:04] Can't ping default gateway. Can't resolve anything. Link to the switch is up.
[14:36:49] I'm logged in through sol, but I'm not sure where to look next.
[14:38:45] btullis: I'd recommend jumping over to #wikimedia-dcops. I think they're a bit busy right now but should be able to help with the initial troubleshooting
[14:39:01] Thanks XioNoX - will do.
[14:39:27] I had something like this happen recently
[14:40:40] btullis: https://phabricator.wikimedia.org/T340055
[14:41:44] heads-up: Traffic is upgrading the internal recursors in core sites this week. this is mostly a heads-up because of the EDNS client subnet issues we saw once after the upgrade
[14:41:56] those have been resolved but if you see something, please let us know, thanks
[14:42:03] btullis: it was "resolved" by replacing the SFP-T adapter, and required a replacement with a specific brand
[14:42:30] btullis: and it happened a second time (different machine), same resolution
[14:44:13] urandom: Thanks. Interesting. I haven't upgraded any firmware yet. I've tried, but failed. Could well be the same as what you saw though.
[14:45:00] btullis: I think firmware was a red herring in my case
[14:45:12] whether you're seeing the same thing or not, I cannot say
[14:46:50] btullis: what happens if you powercycle? reboot without PXE?
[14:47:13] in my case the machine came back up with connectivity
[14:47:59] I'll try a really cold boot. So far it's booted to buster with no connectivity, but only warm boots.
[14:48:27] on the heels of the PXE attempt?
[14:48:39] Or a separate reboot afterward?
[14:48:41] both
[14:49:44] maybe have someone onsite un/re-plug the patch cable, that also worked for me (without a reboot)
[14:50:24] worked to reestablish connectivity, not to PXE boot
[14:51:28] hardware is the worst.
[14:51:42] ha
[14:52:02] sadly, there are few things that need to be tried here but yeah
[14:52:24] till you find the problem, it's an excruciating process
[14:52:33] I'll ask someone in eqiad to stroke it and whisper that it'll be ok.
[14:52:42] that's certainly one way :)
[14:55:14] 🕯️
[14:56:19] Cold boot didn't help. I'll make a ticket. Thanks for the help so far.
[14:57:18] OK, bug report filed for the cookbook stuff: https://phabricator.wikimedia.org/T342130
[14:58:10] 👍
[14:58:41] thx