[07:33:09] Amir1: that isn't showing me anything right now
[10:24:47] Got fixed
[10:53:48] I'm depooling and rebooting clouddb1018
[10:58:22] hmpf, s7 is lagging behind on clouddbs, the sanitarium is fine though
[11:02:46] !log restart backup* hosts
[11:02:46] jynus: Not expecting to hear !log here
[11:29:07] hmm clouddb1018 is not coming back up after reboot (I can ssh to mgmt and the console is blank)
[11:29:46] any tips?
[11:37:36] powercycle from the management
[11:38:01] then pray/file a ticket if it doesn't come back. Observe the entire post to see what went wrong.
[11:43:06] thanks, trying powercycle...
[11:43:21] what do you mean "the entire post"?
[11:43:38] as in, be on the console output
[11:43:44] after powercycle
[11:43:53] right
[11:43:57] that way you will see if it is fully toasted
[11:44:05] or tries to netboot
[11:44:10] or a disk fails
[11:44:14] or whatever :-D
[11:45:50] I guess technically it is not called POST for non PCs: https://en.wikipedia.org/wiki/Power-on_self-test
[11:46:35] ah-ha I get the acronym now :)
[11:47:16] console com2 is just blank, both before and after the powercycle
[11:47:28] I will open a ticket :/
[11:47:30] even a hard one?
[11:47:54] yeah, file one, you may have a bad motherboard
[11:48:18] or just needs some physical help in the best case
[11:52:45] I haven't tried the hard one, I guess it's "serveraction hardreset"?
[11:53:26] yep
[11:54:00] the first one simulates a press of the reset button, the second removes power fully and restarts it
[11:54:35] You can try to reset the idrac as well
[11:54:51] I mean, that seems to be working well
[11:54:59] as he is logged in :-D
[11:55:50] I've had the case where I could log in to the idrac but couldn't get the com2 console to work and had to hardreset + idrac reset to get it to work again
[11:56:02] oh, interesting
[11:56:55] what's the reset command?
[11:57:13] hardreset didn't help
[11:58:26] claime: resetting the idrac is "racreset"?
[11:59:13] dhinus: yeah
[11:59:27] thanks, trying
[12:02:37] console com2 is still blank after the racreset :/
[12:03:33] Then you may need dcops assistance
[12:03:46] yup, filing the task
[12:05:34] T367499
[12:05:34] T367499: hw troubleshooting: server fails to reboot for clouddb1018.eqiad.wmnet - https://phabricator.wikimedia.org/T367499
[12:10:58] how long do we keep mariadb binary logs for? i.e. how long do we have to bring that host back online?
[12:34:25] FIRING: SystemdUnitFailed: envoyproxy.service on moss-fe2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:49:25] RESOLVED: SystemdUnitFailed: envoyproxy.service on moss-fe2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:03:25] FIRING: SystemdUnitFailed: ceph-562c260e-2a60-11ef-9c1c-bc97e1bbace4@mgr.moss-fe2002.aenjec.service on moss-fe2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:08:25] RESOLVED: SystemdUnitFailed: ceph-562c260e-2a60-11ef-9c1c-bc97e1bbace4@mgr.moss-fe2002.aenjec.service on moss-fe2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed