[07:23:36] !log admin cloudvirt1023 seems to have gotten some hardware issue from racadm lclog view "System CPU Resetting.", rebooting and doing memory checks (T315718)
[07:23:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[07:23:40] T315718: NodeDown - https://phabricator.wikimedia.org/T315718
[07:39:56] !log admin cloudvirt1023 is back up, VMs are starting to recover (T315718)
[07:40:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[07:40:00] T315718: NodeDown - https://phabricator.wikimedia.org/T315718
[07:41:41] !log tools cloudvirt1023 down took out 3 workers, 1 control, and a grid exec and a weblight, they are taking long to restart, looking (T315718)
[07:41:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[07:43:32] !log tools rebooted tools-k8s-control-2, seemed stuck trying to wait for tools home (nfs?), after reboot came back up (T315718)
[07:43:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[07:44:07] !log tools all k8s nodes ready now \o/ (T315718)
[07:44:09] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[07:55:27] !log dwl after cloudvirt1023 reboot, the vm irc-buster does not seem to have rebooted correctly (no ssh, no console), rebooting (T315718)
[07:55:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Dwl/SAL
[07:55:30] T315718: 2022-08-20 NodeDown: cloudvirt1023 - https://phabricator.wikimedia.org/T315718
[08:04:40] !log dwl after cloudvirt1023 reboot, the vm irc-buster shows as running, but even after restart is not responsive through ssh nor console (T315718)
[08:04:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Dwl/SAL
[08:04:43] T315718: 2022-08-20 NodeDown: cloudvirt1023 - https://phabricator.wikimedia.org/T315718
[08:16:47] opened a task for the dwl vm issue, going back to weekend mode :)
[13:57:03] legoktm dhinus: btw, the terraform thing, what would making that work involve? is it something I could help with?
[13:58:14] or is it something that cloud VPS admins need to sort out?
[14:10:47] Haven't looked at it yet, I think one question is how would Terraform authenticate to the OpenStack API?
[14:12:11] Once authentication is sorted, Terraform should "just work" and it's only a matter of writing examples and maybe reusable modules.
[14:13:50] proc: yeah, the authentication is the main thing, actually opening up the firewall rules is simple
[14:14:37] openstack (the software cloud vps uses) has something called 'application credentials', which are basically per-user, per-project api keys
[14:15:47] I haven't tested those, but if they work with our custom auth code (TOTP support + something to block password logins on the api) I think those should be perfect for the terraform use case
[14:17:07] if you'd be interested in testing those and possibly writing some docs, I think we could just give you access to our testing environment
[14:17:19] Related: https://phabricator.wikimedia.org/T294195
[16:01:00] tbh I've never actually used terraform before. it's next on my list after I finish learning ansible...
[16:01:14] proc: also while you're here, you were pinged at https://en.wikipedia.org/wiki/Wikipedia:Bots/Noticeboard#Help!
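Editor's note on the 'application credentials' mentioned at 14:14:37: as a rough sketch of what such a key looks like on the wire, the snippet below builds the request body that OpenStack's identity service (Keystone v3) accepts for the `application_credential` auth method. It only constructs and prints the request and does not send it. The endpoint URL, the credential ID and the secret are placeholders that do not come from the log, and the helper names `build_token_request` and `token_endpoint` are made up for illustration. Whether this method works with Cloud VPS's custom auth code is exactly what the log says is still untested.

```python
import json

# Placeholder Keystone endpoint; an assumption, not the real Cloud VPS URL.
AUTH_URL = "https://keystone.example.org:5000/v3"


def build_token_request(cred_id: str, cred_secret: str) -> dict:
    """Build a Keystone v3 token request body for application-credential auth."""
    return {
        "auth": {
            "identity": {
                "methods": ["application_credential"],
                "application_credential": {
                    "id": cred_id,
                    "secret": cred_secret,
                },
            }
        }
    }


def token_endpoint(auth_url: str) -> str:
    """Return the token URL for a given Keystone v3 base URL."""
    return auth_url.rstrip("/") + "/auth/tokens"


if __name__ == "__main__":
    # Hypothetical ID and secret, shown only to illustrate the shape.
    body = build_token_request("example-id", "example-secret")
    print("POST", token_endpoint(AUTH_URL))
    print(json.dumps(body, indent=2))
```

No username, password or TOTP code appears in the body, which is why this kind of credential is attractive for a non-interactive client such as Terraform.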
[18:02:12] i cannot ssh to one of my instances:
[18:02:17] [annika@pbp ~]$ ssh irc-buster.dwl.eqiad1.wikimedia.cloud
[18:02:18] channel 0: open failed: connect failed: No route to host
[18:02:18] stdio forwarding failed
[18:02:18] kex_exchange_identification: Connection closed by remote host
[18:02:18] Connection closed by UNKNOWN port 65535
[18:04:32] the action log says it was stopped and started today by novaadmin, not sure if related
[18:06:38] annika: you should have had an email
[18:06:42] Your VM is broken
[18:06:47] Let me get the task for you
[18:07:07] annika: https://phabricator.wikimedia.org/T315720
[18:07:29] Cc dcaro_away
[18:08:09] Doesn't look like you were included on the task
[18:13:17] yeah, i was added later, that didn't trigger an email apparently
[18:13:21] thanks
[22:02:02] Hello
[22:03:20] Although it may be a bit abrupt, I wanted to ask if there are currently volunteers who can review my toolforge membership request?
[22:13:20] Someone will get to it soon. Feel free to hang around in case they have questions or you have any issues. Don't forget that it's the weekend so anyone responding would be doing so as a volunteer on their own time (and a few people with access do it as a volunteer rather than staff anyway)
[22:17:18] Okay
[23:20:23] !log tools.mjolnir Updated mjolnir from 1.2.1 to 1.5.0, now running on node16
[23:20:25] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.mjolnir/SAL
[23:26:47] small gotcha in migrating, `nodejs` no longer exists, it's just `node` now