[09:32:01] Can I restart db2097? it seems to be memory-leaking slowly, after 6 months: https://grafana.wikimedia.org/goto/nY1Rm6pSz?orgId=1
[09:34:14] I am not using it
[09:34:36] I see no ongoing schema change on it, so I am going to do it quickly
[09:34:45] ok
[09:44:59] reboot done, all ok
[10:55:18] jynus: good morning, if you're around I have a quick question for you, I'm trying to restore the last backup for netboxdb to complete the test. But I want to restore it to the test host. I'm following the docs, and I'm at the "modify the restore job" step, and when I select "5: Restore Client" it gives me a list of hosts but the one I'd like is not there. Should I restore it to a cumin
[10:55:24] host and then scp it?
[10:55:41] I think netbox-dev2002 has no backups setup so possible that's why
[10:55:58] indeed, that's why- it needs to be configured as client
[10:57:46] ok, and out of curiosity why when selecting the host I had 325 hosts to pick from and when selecting the restore host it gives me only 109 of them?
[10:58:29] there are jobs on the db from 325 hosts
[10:58:41] the 109 are the current clients
[10:58:52] current vs history
[10:59:05] got it, make sense :D
[10:59:14] jobs on metadata vs current clients you can recover to
[10:59:40] (I cannot discard there could be some garbage ones, but one should be bigger than the other)
[11:00:03] sure sure
[11:02:43] jynus: I go
[11:02:46] *got
[11:02:48] 05-Feb 10:58 cumin2002.codfw.wmnet-fd JobId 551639: Error: Missing private key required to decrypt encrypted backup data.
[11:03:05] yep, we discussed that
[11:03:19] right because the different host
[11:03:22] I need to use the other key
[11:03:36] https://wikitech.wikimedia.org/wiki/Bacula#Restore_from_a_non-existent_host_(missing_private_key)
[11:06:24] This is why I insist so much about trying to do a recovery every so often
[11:07:03] it is not that hard- but in an emergency, the difference between doing it once before and never, in panic, is quite big
[11:07:09] I wonder if the recover to another host while the source host is still available could be simplified
[11:08:16] I'm currently doing it the other way around (restore on the source, then move it, it's small enough)
[11:08:21] yeah, there are plans to automate it
[11:08:54] but it always drops to the bottom of the barrel- given it doesn't happen often
[11:09:45] nice
[11:10:21] think it is "hard" for a reason - privacy, so it is a balance between both
[11:11:19] I am taking a coffee break, let me know later in private if I can help further
[11:11:51] I like that you are testing the recovery fully ❤️
[11:13:23] thanks! I think we're ok for now :)
[11:17:58] I'll restore the file to the db later, need to first check no one has temporary data they need in netbox-next ;)
[13:16:36] I am going to commit this if there are no objections https://gerrit.wikimedia.org/r/c/operations/software/+/997422
[13:26:43] marostegui: FYI we've updated the m2 maintenance docs regarding debmonitor, diff is https://wikitech.wikimedia.org/w/index.php?title=MariaDB%2Fmisc&diff=2145558&oldid=2126423
[13:27:02] oh thanks volans
[13:27:05] TL;DR as we're now deploying debmonitor as a deb package some path/name of unit has changed
[13:27:34] just that, the behaviour hasn't changed :D
[13:28:03] right, the only times we barely interact with debmonitor is when we switch m2
[13:28:13] but normally we don't even have to reload/restart anything
[13:28:24] yeah it should self-recover fairly quickly
[20:45:20] PROBLEM - MariaDB sustained replica lag on s8 on db1154 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1154&var-port=13318
[20:46:36] RECOVERY - MariaDB sustained replica lag on s8 on db1154 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1154&var-port=13318