[05:58:59] I am continually impressed with how much better our databases keep getting, you all are very awesome <3
[06:00:57] legoktm: awwww <3
[08:51:11] OK, I tried a reset of the iLO on ms-be1057 (reset /map1), and it still isn't responding to remote ipmitool. I've tried the things on https://wikitech.wikimedia.org/wiki/Management_Interfaces#Troubleshooting_Commands and it all seems OK other than that remote ipmitool doesn't work.
[08:58:13] likewise ms-be1058 - which means I can't reimage either host.
[08:58:30] I guess I'll ask for cold resets as last-chance-saloon, but...
[09:02:16] or just open a DC ops task and tag it ops-eqiad
[09:47:30] Emperor: Sometimes a power drain helps fix the issue; I would open a DC ops task with the ops-eqiad tag, as moritz.m said
[09:48:14] local ipmitool works? have you tried to reset the password?
[09:50:40] T310478
[09:50:40] T310478: hw troubleshooting: remote IPMI not working for ms-be105[7-8].eqiad.wmnet - https://phabricator.wikimedia.org/T310478
[09:51:25] yes, I think one of the things I did was set /map1/accounts1/root password=
[09:51:38] (I will just do it again to check)
[09:52:53] ack
[09:52:53] Yep, re-resetting the root password doesn't change the failure state (everything seems to work except remote IPMI)
[11:09:26] I am going to reboot all x2 hosts before they go live tomorrow
[11:09:33] And the same with the s6 candidate master that will go live tomorrow
[11:11:14] Emperor: one thing I'm not sure is documented: after a board reset, "Enable remote IPMI" was unchecked in the config on the web interface
[11:11:54] but most likely at that point it requires a power drain
[11:58:47] jaime if you can review this https://gerrit.wikimedia.org/r/c/operations/dns/+/805114 today, that'd be great (doesn't have to be now, just today)
[11:59:15] will do after lunch
[11:59:59] No rush, thank you!
[12:00:11] Will also send two patches later for the switchover tomorrow if you don't mind
[12:00:20] that's ok
[12:06:13] jynus: is that different to https://wikitech.wikimedia.org/wiki/Management_Interfaces#Is_remote_IPMI_enabled?
[12:07:04] Emperor: in theory no, but with HP weirdness, I cannot say (e.g. there may be additional root config or something)
[12:16:41] jynus: ah, in web-IPMI, I see 'IPMI/DCMI over LAN' is disabled
[12:16:55] that is the one I thought should be the same
[12:17:06] and is configured on first setup, but sometimes gets reset
[12:17:27] Ah, cool, I'm into ms-be1057 with ipmitool now. Super.
[12:17:29] but please don't take my word for it, I just remember enabling it
[12:17:51] should probably update that wikitech page as a thing to check on HP kit.
[12:18:18] it is one of those things where I probably did several things and didn't know what worked
[12:18:27] but +1 to document it
[12:18:59] Yeah, and on a previously-working HP system it's "enabled", so I think that's the issue here.
[12:20:40] I thought the ipmi command changed that
[12:23:19] Emperor: would that mean that https://wikitech.wikimedia.org/wiki/Management_Interfaces#Is_remote_IPMI_enabled? is not enough for HP?
[12:23:40] volans: indeed not - see the paragraph I've just added...
[12:25:55] ok, I'm not sure if there is an equivalent via ipmi-config or similar, but it's surely doable via the Redfish API :)
[12:26:54] and I bet via an iLO one too
[12:36:13] Mmm. But I can at least now plan to re-image these two systems :)
[12:42:38] Emperor: stub https://phabricator.wikimedia.org/P29656
[12:46:32] btw I'll wait for your ping when you want to discuss next steps on the fix-SSD cookbook ;)
[12:49:13] volans: yes, we should talk about that; sometime tomorrow afternoon maybe?
[12:49:41] SGTM
[12:50:18] shall I ping you on IRC, or would you like a more concrete time?
[12:50:34] ping works for me
[12:50:44] ack
[14:22:15] I am a bit overwhelmed by the backlog after only a week; please ping me if something needs a quicker response