[00:07:30] bstorm: back now
[00:08:21] 👋🏻
[00:08:23] Hey!
[00:08:24] bstorm: if you restore to the same place the files came from there should be no issue with keys
[00:08:34] Yes, but that place is not a working server anymore
[00:08:41] i see. uhm
[00:08:49] I cannot figure out if it just fried or if DCops actually threw it out :)
[00:08:51] I think the former
[00:09:23] I didn't want to try that process of stopping puppet and screwing with the config on my own when nobody else was around
[00:10:15] https://wikitech.wikimedia.org/wiki/Bacula#Restore_from_a_non-existent_host_(missing_private_key) is what you were looking at, right
[00:10:32] I want to restore the contents of /var/lib/grafana from cloudmetrics1002 (dead) to anywhere on the filesystem BUT /var/lib/grafana on cloudmetrics1001, basically
[00:10:34] Yep
[00:10:44] I know I need to create a config
[00:10:50] I haven't attempted that yet.
[00:11:01] I didn't want to risk the main backup server while everyone was on the weekend :)
[00:11:13] again at least not alone
[00:11:44] I'm not sure about how the retention works, but it looks like 30 days, which means I'm down to the last minute to restore this lol
[00:11:59] I just noticed today how badly we are missing lots of work on grafana-labs
[00:12:08] ok.. so.. I haven't exactly done that procedure but we can go through it together
[00:12:09] after our server failure
[00:12:14] Ok :)
[00:12:22] at least there is no writing involved on the backup server
[00:12:23] Doing it together sounds better than doing it alone at least
[00:12:24] just where you restore
[00:12:30] would a quick 'fix' be to somehow bump the retention for that backup to give more time for piddling?
[00:12:37] nope, you have to create a config file on the backup server
[00:13:09] mutante: I'll have to create a file by hand on /etc/bacula/clients.d
[00:13:16] The client is missing
[00:13:39] So it won't work, keys be damned, until I copy a file there
[00:13:51] The docs don't seem very clear on this, but having tried to restore, I'm sure it needs that
[00:14:08] And I think puppet will remove it so I have to stop puppet on backup1001
[00:14:18] You see why I was nervous? :)
[00:14:23] yea, I see that. first step is to check if that client config is already gone
[00:14:30] done
[00:14:42] https://www.irccloud.com/pastebin/fvOVHRqk/
[00:14:51] I disabled puppet on backup1001
[00:14:55] I can copy cloudmetrics1001's file
[00:15:03] Ok, creating a file then
[00:16:00] Ok, the client file now exists
[00:16:12] I copied it and simply changed the hostname
[00:16:17] ACK, "modify the name/host of the client to match the job to be restored."
[00:16:55] So now we need to restart a bacula daemon to pick up the config?
[00:17:09] the client file looks good to me
[00:17:20] "Then reload the bacula-dir daemon"
[00:17:31] I can do that
[00:17:37] thank you :)
[00:17:51] that failed
[00:17:54] logs
[00:18:22] permission denied?
[00:18:25] I'll check my file perms
[00:18:41] root:bacula is needed
[00:18:45] It was created as root:root
[00:18:47] fixing it fast
[00:19:09] ok restart again mutante?
[00:19:16] Should be better now :(
[00:19:17] sorry
[00:19:19] done, looks much better
[00:19:23] great
[00:19:30] let's add this to the docs..once we are done
[00:19:45] 👍🏻
[00:19:53] ok, I see some warnings about removed hosts but seems normal
[00:19:56] it's running
[00:19:59] That would really upset someone at 3am when things are on fire
[00:20:11] ok great
[00:20:34] do you want to copy the key from puppetmaster or should I
[00:20:39] needs to be pasted to where you restore
[00:20:41] Go right ahead :)
[00:20:44] ok
[00:20:54] cloudmetrics1001 is the destination
[00:21:10] I just need to make sure it does not restore over the existing files, and I'll be thrilled
[00:21:28] cloudmetrics1001.eqiad.wmnet to be more specific
[00:22:43] interestingly one of the 2 files can be read without root but not the other.. on it
[00:23:05] ah, well makes sense
[00:23:08] Great, thank you
[00:23:23] Yeah, as long as the one that can be read is the public key :)
[00:24:21] so we have to stop "bacula-fd"?
[00:24:30] ok, we have a temp-restore.pem there
[00:24:35] now stopping service
[00:24:38] puppet first
[00:24:44] Ok, and this is on the client?
[00:24:52] root@cloudmetrics1001:/root# puppet agent --disable
[00:25:02] got it
[00:25:34] root@cloudmetrics1001:/root# systemctl stop bacula-fd
[00:25:43] vi /etc/bacula/bacula-fd.conf
[00:26:02] ok, and then the restore should be doable, when that's done and back running, if I'm reading this right
[00:26:02] * mutante points config to temp key in /root
[00:26:10] 👍🏻
[00:26:43] looking for "PKI Keypair" not the other lines..
[00:27:20] I need to move this from root to /etc/bacula
[00:27:30] huh ok
[00:28:06] mv temp-restore.pem /etc/bacula/ssl/
[00:28:10] Now that I'm getting that this part is on the client, it's making sense
[00:28:36] ok, so now we can go back to backup1001 and run bconsole
[00:28:38] The doc could use, in bold "on the director" and "on the client" before each line :)
[00:28:50] yes, true
[00:28:55] Alright, shall I try that?
[00:29:01] sure, yes
[00:29:04] 5 -> restore ..
[00:29:06] ok
[00:29:26] then you get to a part where it builds a virtual filesystem you can move around in
[00:29:37] and then you have to "mark" the files you want
[00:29:44] got there
[00:29:50] well, let's hope the key part works
[00:29:54] :) yay
[00:30:09] gonna mark the grafana dir
[00:30:16] ok, mark all the files and then it's "done" somehow
[00:30:32] which is clearly more files than I need, but it won't blow away the freespace so more is better
[00:30:47] and then it should show you another time all the things.. what host to restore from, what host to restore to
[00:31:06] Ok, so I'll say "mod"
[00:31:11] to change host
[00:31:19] yes
[00:31:50] eh, wait a sec
[00:32:01] let me give bacula the right to read the temp key, heh
[00:32:01] looks ready to go
[00:32:11] yes please :)
[00:32:30] https://www.irccloud.com/pastebin/bdL68j83/
[00:32:39] That all looks good to me.
[00:32:43] ok, done, files in ./ssl/ all have same privs now
[00:32:45] whenever the client is ready
[00:32:47] thanks :)
[00:32:49] go ahead
[00:33:01] Job queued. JobId=340859
[00:33:11] I guess bacula doesn't believe in progress bars
[00:33:21] ok, cool.. now it's just about checking the destination directory
[00:33:29] did it already create all the file names but they are empty?
[00:33:42] I am not sure how long it takes
[00:33:52] sometimes fast sometimes not afair
[00:34:18] you can also try looking at jobs from bconsole
[00:34:54] what did you pick as destination path ?
[00:35:07] I left the default of /var/tmp/bacula-restores
[00:35:15] That doesn't seem to exist at the moment
[00:35:26] ack, yea, not yet
[00:36:00] oh, but we need to start the daemon again too
[00:36:08] checks
[00:36:12] I tried `status jobid=blah` and that didn't really help
[00:36:27] it didn't give me a job status per se
[00:36:38] root@cloudmetrics1001:/var/tmp# systemctl start bacula-fd
[00:36:39] :p
[00:37:02] heh
[00:37:06] guess that is needed to write the files
[00:37:22] docs don't say it though
[00:38:05] https://www.irccloud.com/pastebin/e4iD1TtD/
[00:38:26] I guess that's a socket
[00:38:28] can you tell it to run that again?
[00:38:41] https://www.irccloud.com/pastebin/P7BtJS4N/
[00:38:45] Yeah I can try that
[00:39:28] fwiw, /var/log/bacula exists on client but is empty
[00:40:16] [bstorm@cloudmetrics1001]:tmp $ systemctl status bacula-fd
[00:40:16] ● bacula-fd.service - Bacula File Daemon service
[00:40:16] Loaded: loaded (/lib/systemd/system/bacula-fd.service; enabled; vendor preset: enabled)
[00:40:16] Active: active (running) since Sat 2021-06-05 00:36:33 UTC; 3min 27s ago
[00:40:25] That looks promising anyway :)
[00:40:36] ah, yea, that is running
[00:40:48] but we need the director to try it again
[00:40:57] to connect to the client
[00:41:04] I am starting it...now
[00:41:09] cool
[00:41:33] root@cloudmetrics1001:/var/tmp# file bacula-restores/
[00:41:33] bacula-restores/: directory
[00:41:37] ^ this is new :)
[00:41:40] yay!
[00:41:46] status looks better on bconsole too
[00:41:56] Running Jobs:
[00:41:56] Reading: Full Restore job RestoreFiles JobId=340860 Volume="production0471"
[00:41:56] pool="Default" device="FileStorageProduction" (/srv/production) newbsr=0
[00:41:56] Files=545 Bytes=376,748,783 AveBytes/sec=41,860,975 LastBytes/sec=41,860,975
[00:41:56] FDReadSeqNo=6 in_msg=6 out_msg=16944 fd=5
[00:41:57] but wait until we see files are not empty
[00:42:03] yeah
[00:42:09] without the key we would also see this ..so far
[00:42:40] files do not look empty so far
[00:43:09] job finished
[00:43:18] looking great
[00:43:22] I can read content in clear text
[00:43:29] 484M bacula-restores/var/lib/grafana/
[00:43:32] great
[00:43:48] ok, I'll move it to /root so some random reboot doesn't erase it
[00:43:56] *nod*
[00:44:38] sure enough this is what you need ?
[00:45:06] This is what I was after :)
[00:45:18] If it doesn't work, I probably have no other recourse anyway
[00:45:28] Now we can try to restore the lost content on Monday
[00:45:33] oh, look "enable puppet again / let it start bacula-fd or start it yourself" is at the very end
[00:45:35] Thank you!
[00:45:42] Yeah, let's do that
[00:45:44] but .. either not true or we got away with it
[00:45:50] starting it early
[00:46:02] I'll enable puppet on the director, if you haven't already
[00:46:23] please do and remove your client config file
[00:46:30] I will shred the temp key on the client
[00:46:50] I'm running puppet agent to see if it deletes my file for me (and will remove it if not)
[00:47:10] partly because it should need to reload bacula-dir
[00:47:21] restarting bacula-fd on cloudmetrics1001 after removing temp key..and failed
[00:47:22] Notice: /Stage[main]/Bacula::Director/File[/etc/bacula/clients.d/cloudmetrics1002.eqiad.wmnet.conf]/ensure: removed
[00:47:28] oh that's fun :)
[00:47:30] because I need to remove my edit there as well
[00:47:34] that points to the key
[00:47:34] yeah...
[00:47:45] puppet should fix that?
[00:47:49] or maybe not
[00:48:04] yea, verifying that right now
[00:48:20] it removed the file, on the director, but I didn't see a refresh, so I'll restart that service
[00:48:35] do a reload
[00:48:39] all we did was a reload so far
[00:48:42] for the director
[00:48:46] Will do
[00:49:12] It's running after the reload
[00:49:15] and yea, puppet agent on cloudmetrics fixed the PKI Keypair line .. and started bacula-fd
[00:49:18] done
[00:49:25] 🎉
[00:49:30] Thank you for working with me on that
[00:49:35] nice. you're welcome
[00:49:45] All set :)
[00:49:48] have a good weekend
[00:49:55] * mutante logs out of servers. you too !:)
[00:53:38] * bstorm makes a reminder to update the docs too
[00:54:54] oh yea, thanks (something about chmod, then to remember to also revert edits in config everywhere and not sure if starting -fd only at the very end is correct now or not)
[00:55:41] but how would bacula write restored files with the file daemon :)
[00:55:53] so I don't think docs can be correct in that part
[00:56:03] s/with/without
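
For the docs update mentioned at the end of the log, here is a minimal sketch of the order of operations from this session, with each step tagged "on the director" or "on the client" as suggested at 00:28:38. The `Step` type, the `RESTORE_STEPS` list and `render` are hypothetical names made up for illustration; they are not part of Bacula or Puppet. Commands inside the strings are the ones quoted in the log, and steps for which no command was quoted are described in words only. One assumption to flag: bacula-fd is listed as started before the restore job runs, because that is what the session found necessary, not what the docs stated at the time.

```python
from typing import NamedTuple


class Step(NamedTuple):
    """One restore step, tagged with the host it runs on (hypothetical helper)."""
    where: str   # "director" (backup1001) or "client" (cloudmetrics1001)
    action: str  # what was done, in the wording or commands quoted in the log


# Order as it played out in this session; a sketch, not an authoritative procedure.
RESTORE_STEPS = [
    Step("director", "disable puppet"),
    Step("director", "copy a live client's file in /etc/bacula/clients.d "
                     "and change the hostname to the dead host"),
    Step("director", "make the new client file root:bacula "
                     "(root:root made the reload fail)"),
    Step("director", "reload bacula-dir"),
    Step("client", "puppet agent --disable"),
    Step("client", "systemctl stop bacula-fd"),
    Step("client", "paste the key from the puppetmaster as temp-restore.pem "
                   "in /etc/bacula/ssl/, readable by bacula"),
    Step("client", "edit 'PKI Keypair' in /etc/bacula/bacula-fd.conf "
                   "to point at the temp key"),
    Step("client", "systemctl start bacula-fd"),
    Step("director", "bconsole: restore, mark files, 'mod' the restore client, run"),
    Step("client", "move the restored files out of /var/tmp/bacula-restores"),
    Step("client", "remove the temp key, then run puppet again "
                   "(it fixed the PKI Keypair line and started bacula-fd)"),
    Step("director", "re-enable puppet (it removed the temporary client file), "
                     "then reload bacula-dir"),
]


def render(steps):
    """Return one numbered line per step, prefixed with where it runs."""
    return [f"{i}. [on the {s.where}] {s.action}" for i, s in enumerate(steps, 1)]


if __name__ == "__main__":
    print("\n".join(render(RESTORE_STEPS)))
```

Keeping the host as a separate field means the docs could render it in bold, or split the list into two columns, without re-reading the whole procedure.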