[01:08:06] PROBLEM - MariaDB sustained replica lag on m1 on db1217 is CRITICAL: 5.2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1217&var-port=13321
[01:09:34] RECOVERY - MariaDB sustained replica lag on m1 on db1217 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1217&var-port=13321
[07:59:39] arturo: I was reading some stuff about the various OpenStack network options
[08:00:15] topranks: wrong channel? :P
[08:00:27] taavi: sure is :P
[08:00:29] sorry folks :)
[08:31:12] networks only appropriate for data-persistence if you're talking about pingfs ;-)
[09:41:47] Emperor: LOL!
[09:41:57] TIL https://code.kryo.se/pingfs/
[13:07:16] I suspect alcohol was involved there (re: pingfs)
[13:07:46] Emperor or urandom: do you know about the different thanos clusters enough to give me some brief introduction
[13:08:03] about the types, not the whole service
[13:08:26] e.g. titan vs ms-thanos vs some more?
[13:08:44] jynus: I only know that work was done recently to move the thanos-specific code from the "thanos cluster" (read: swift), to its own called titan
[13:09:00] ok, that is already more than what I knew ! :-D
[13:09:10] is there more cluster other than those 2?
[13:09:18] *are *clusters
[13:09:31] more than two in what context?
[13:09:34] thanos?
[13:09:37] yes
[13:09:55] I don't know... not that I'm aware of
[13:10:19] thanks, that is useful already! I knew you at least would have more context than I did
[13:10:42] I saw a gap on docs but for that will ask obs. team
[13:10:53] *I will.
[13:11:08] go.dog has been working on that split recently, I'm not sure what the status of it is
[13:11:26] I supposed he was the person to contact, but he is out this week
[13:11:46] so hoping you knew at least about that split (which you did)
[13:11:51] thank you!
[13:44:28] jynus: there is one thanos-swift cluster (spanning both DCs); the backends are regular swift backends, the frontends used to be both swift-frontend and thanos-frontend, but are being split so that the thanos-fe nodes will be just swift-frontends and the titan-* nodes will run the thanos software
[13:47:53] thank you, there was a brief thanos-query glitch and a restart on titan hosts fixed it, so it must be at least partially done
[13:52:42] Yes, I think the work is nearly complete if not actually complete yet (not entirely sure)
[13:55:15] jynus, what does it mean when I run a bacula restore job and it creates the files and dirs on the target but the files are empty?
[13:55:38] the file I'm trying to restore is /var/run/openldap-backup/backup.ldif on cloudservices2004-dev, e.g. job 529279
[13:55:58] it means that it was unable to decrypt the files
[13:56:21] as the metadata is not encrypted
[13:56:42] did you use the same key to decrypt as to take it?
[13:57:39] other option is that you backed up empty files, but the first case will show on logs as "wrong key" or so
[13:59:07] jynus: I was following the guide here: https://wikitech.wikimedia.org/wiki/Bacula#Restore_(aka_Panic_mode) <- no mention of keys or decryption there that I can see
[13:59:31] if it backed up an empty file would the job have 2908304 jobbytes?
[13:59:51] can you send the id of the restore- that way I can check the logs
[14:00:27] I don't think I have it since I ran the restore yesterday but I'll try again today and get it to you
[14:00:41] that's ok, give me an approximate time
[14:00:49] I can search it with that too
[14:01:34] 27-Sep 19:41 cloudservices2004-dev.codfw.wmnet-fd JobId 530842: Error: Missing private key required to decrypt encrypted backup data.
[14:01:37] around 14:30 my time, which is... 19:30 utc
[14:01:42] ^ I think it is this one
[14:01:46] probably
[14:02:09] If you search "Error: Missing private key required to decrypt encrypted backup data." on the wikitech page
[14:02:24] it will direct you to use the method: https://wikitech.wikimedia.org/wiki/Bacula#Restore_from_a_non-existent_host_(missing_private_key)
[14:02:37] backups are client-encrypted
[14:02:53] ok, that's a result of the client being reimaged then
[14:03:07] yep, you can use that backup method then :-D
[14:03:11] *recovery
[14:03:14] btw how did you find that log? I tried to read the messages but it was a firehose
[14:03:30] he he, no worries, it is hard- that is why I asked for an id
[14:03:35] ok
[14:03:51] so my next thing was to grep for "restore" on the yesterday log
[14:04:03] on: backup1001:/var/log/bacula/log
[14:04:15] ah, I was only using the cli
[14:04:18] I have meetings now but I'll try following that guide when I get a break. thank you!
[14:04:30] yep, that I am sure it will fix your issue
[14:04:36] awesome
[18:02:40] jynus: I got my file restored. Thanks!
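
Note on the log search described above (grepping the Bacula log for a restore and finding the "Missing private key" error): a minimal sketch in Python of the same kind of filtering. The line pattern is inferred only from the single error line quoted in this conversation, so it is an assumption; lines with any other shape are skipped. The names `LogEntry` and `find_entries` are hypothetical and are not part of Bacula.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Assumed line shape, taken from the one sample quoted in the chat:
# "27-Sep 19:41 <daemon> JobId 530842: Error: ..."
LINE_RE = re.compile(
    r"^(?P<date>\d{2}-[A-Z][a-z]{2}) (?P<time>\d{2}:\d{2}) "
    r"(?P<daemon>\S+) JobId (?P<job_id>\d+): (?P<message>.*)$"
)


class LogEntry(NamedTuple):
    date: str
    time: str
    daemon: str
    job_id: int
    message: str


def find_entries(
    lines: Iterable[str], needle: str, date: Optional[str] = None
) -> Iterator[LogEntry]:
    """Yield parsed entries whose message contains `needle` (case-insensitive).

    If `date` is given (e.g. "27-Sep"), only entries from that day are kept.
    Lines that do not match the assumed pattern are silently skipped.
    """
    wanted = needle.lower()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        if date is not None and match["date"] != date:
            continue
        if wanted not in match["message"].lower():
            continue
        yield LogEntry(
            date=match["date"],
            time=match["time"],
            daemon=match["daemon"],
            job_id=int(match["job_id"]),
            message=match["message"],
        )


if __name__ == "__main__":
    # The sample line is the one pasted in the conversation.
    sample = [
        "27-Sep 19:41 cloudservices2004-dev.codfw.wmnet-fd JobId 530842: "
        "Error: Missing private key required to decrypt encrypted backup data.",
    ]
    for entry in find_entries(sample, "missing private key", date="27-Sep"):
        print(entry.job_id, entry.time, entry.message)
```

To run it against a real file, pass an open file object instead of the sample list, for example the `/var/log/bacula/log` path mentioned above on the backup host. Filtering by day first and by message second mirrors the approach in the chat, where an approximate time was enough to narrow the search when the job id was not known.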