[04:28:16] * kormat swears at the time of day
[04:28:29] But this is good for you
[04:28:35] You start the week fresh and early
[04:28:42] With the morning breeze
[04:29:59] * kormat chooses not to respond, preferring to stay employed
[04:30:08] haha
[04:53:01] morning
[04:53:07] hey sobanski
[04:57:48] the banner is wrong
[04:57:51] it says 06:00 UTC
[04:58:41] I love how https://test-commons.wikimedia.org/wiki/Main_Page says "The wiki is scheduled to be closed and deleted in December 2019, so don't get too attached to it!"
[04:59:04] I think we had a task somewhere to delete it
[04:59:16] Or we wanted to create one and someone said: not yet
[04:59:20] Don't remember exactly
[07:06:56] Ugh, the scaffolders weren't kidding about arriving "first thing"
[07:23:11] Amir1: https://phabricator.wikimedia.org/T290057#7333689 pretty good!
[07:41:13] media backups of commons are 21% done (18 million files / over 80TB backed up) so far
[07:58:06] jynus: s4 codfw finished, so going to merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/715919
[07:58:15] ok
[08:55:49] I am going to do some maintenance on haproxy, so going to stop puppet on dbproxy*
[08:56:13] this is a maintenance we scheduled for today (valentin and myself)
[09:09:35] maintenance finished
[10:48:46] it's nearly lunchtime, so I'll hold off merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/716306 until after lunch
[10:50:17] Emperor: sounds good, and yeah, +1 to merging it today
[11:07:35] when you come back from your meeting, marostegui, this is how I am generating the stats: https://phabricator.wikimedia.org/P17225
[11:09:23] the idea is that I first normalize the crazy mw metadata into a normalized table (files), then process them as if it were a queue, and store the backed-up files' properties in another table (backups)
[11:37:56] jynus: thanks, that's useful
[11:38:01] question: what's wiki 392?
[11:38:42] oh, because the tables became quite big, I had to normalize some fields (wikis, file_status, backup_status)
[11:38:53] wiki 392 is commons
[11:38:57] ah gotcha
[11:39:14] there is an arbitrary identifier for each wiki on accessory tables
[11:39:31] there is some documentation about the tables on the design doc
[11:39:43] wow 194M already backed up
[11:39:43] but I have yet to do more operator-oriented docs
[11:39:47] and still 56M pending XD
[11:40:21] be careful because that's when mediawiki can get deceitful
[11:40:38] there can be multiple entries for each file backed up
[11:40:47] I deduplicate them before sending them to backups
[11:40:57] each revision for each file?
[11:41:05] so there is no 1:1 between each file and each backup
[11:41:25] it is more of an n:m
[11:41:33] with the higher number on the metadata side
[11:41:42] do you back up each revision or just the last one?
[11:41:43] marostegui, we back up "blobs"
[11:41:49] right
[11:41:59] and each blob can be multiple revisions of multiple files
[11:42:15] e.g. a file reverted multiple times between 2 versions
[11:43:14] so it will show multiple versions in the metadata, but in the storage and backups tables there will only be 2 of them
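
(The deduplication described above, where many file-revision metadata rows collapse into far fewer unique blobs keyed by sha256, could be sketched roughly as below. This is an illustrative Python sketch only: the row shapes, field names and the example wiki id are assumptions for illustration, not the real mediabackups schema or the P17225 queries.)

    # Sketch only: "files"/"backups" row shapes and field names are assumed.
    import hashlib
    from collections import defaultdict

    def sha256_of(path, chunk_size=1024 * 1024):
        """Hash a local copy of a file revision (the real tool reads from Swift)."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()

    def dedupe_revisions(file_rows):
        """Map many (wiki, title, revision) metadata rows to unique blobs.

        Returns {sha256: [rows...]}: each key is one blob to back up, however
        many revisions or titles point at it (the n:m relation above).
        """
        blobs = defaultdict(list)
        for row in file_rows:
            blobs[row["sha256"]].append(row)
        return blobs

    # A file reverted back and forth between two versions shows up as four
    # metadata rows, but only two distinct blobs reach the backups table.
    rows = [
        {"wiki": 392, "title": "Example.jpg", "revision": 1, "sha256": "aaa"},
        {"wiki": 392, "title": "Example.jpg", "revision": 2, "sha256": "bbb"},
        {"wiki": 392, "title": "Example.jpg", "revision": 3, "sha256": "aaa"},
        {"wiki": 392, "title": "Example.jpg", "revision": 4, "sha256": "bbb"},
    ]
    assert len(dedupe_revisions(rows)) == 2
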
[11:43:49] in a way I had to do backups and normalize the mw schema at the same time
[11:44:37] Ah I see, I get it
[11:44:45] Interesting
[11:45:03] I talked with amir briefly about this issue
[11:45:14] maybe some of it could be used to "fix" the mw schema
[11:45:56] it is not only about duplicates, also about writing the same image many times on the db and having no unique identifiers
[11:46:18] which was why I had to end up using new ones (sha256 hashes)
[11:48:53] Yeah, I was going to ask if this can be reused on mw in general
[11:49:24] not in general, but I would like to do some "things learned" to push the mw schema into a saner one
[11:49:40] you mean it is not sane now??? :p
[11:49:40] both for the database and for swift
[11:50:03] look, one day I will tell you all the edge cases and you will die of exhaustion
[11:50:06] :-)
[11:50:23] So once this first batch is backed up, what's the plan for the incrementals? have you got anything in mind?
[11:50:25] like using 3 different sharding functions
[11:50:48] or 2 hashing methods
[11:51:09] first I want to make a "full"/incremental backup
[11:51:36] so running the same essential process and seeing how I would update the metadata
[11:52:07] I know it will work for incrementals, but I am not sure how I should store the mediawiki metadata (all historical records? only the last status?)
[11:52:32] assume we will always have a db available and not store metadata?
[11:52:37] lots of questions
[11:53:00] but in the end, it should probably come from kafka, triggering a check for single files
[11:53:02] so once the first batch is done, are you in a position where you could re-run a backup job and it would only back up the new files?
[11:53:28] in general yes, but I need to decide how exactly - regarding metadata
[11:53:49] but yes, if I dropped the files table and reran it
[11:53:58] it would only back up the new files
[11:54:08] the question is how to merge the old and the new metadata?
[11:54:18] insert new and update the latest status of the file
[11:54:20] ?
[11:54:33] yeah maybe
[11:54:34] that needs discussion and will depend on how it is supposed to be used
[11:54:39] but it is to be discussed
[11:54:39] yeah
[11:54:48] maybe we should keep all the snapshots for redundancy
[11:54:54] e.g. a files_archive
[11:55:04] I have not gone deep on that
[11:55:14] so the design is prepared for incrementals
[11:55:29] but it is not implemented, it needs more thinking about what we want to do
[11:55:56] but it is more of a mw problem, storage is 100% prepared for it
[11:56:05] that's good yeah
[11:56:12] Glad to see we are doing a first run already
[11:56:26] I would also like to see a dashboard
[11:56:31] like with database backups
[11:56:37] to expose stats easily
[11:56:51] and there have been a lot of compromises that "we will see later"
[11:57:17] for example, consistency checking
[11:57:52] we shouldn't trust minio blindly - we need to check that what was uploaded can be downloaded and is also the same as the ground truth
[11:58:15] and what if swift itself has an inconsistency with the db?
[11:58:23] or between datacenters?
[11:58:42] I am sure that during outages, many files have become orphaned/lost/etc
[11:58:54] lots of work pending... step by step :-)
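
(The consistency checking mentioned above, re-reading what was uploaded and comparing it against the ground-truth hash recorded in the metadata db, could look roughly like this. A minimal sketch: the download(container, name) callable and the backups-row layout are assumptions, not the real mediabackups or minio/swift client interfaces.)

    # Sketch: re-download each backed-up blob and compare it against the
    # sha256 recorded in the metadata database.
    import hashlib

    def verify_blob(download, container, name, expected_sha256):
        """Return True if the stored object can be fetched and matches its hash."""
        data = download(container, name)  # assumed thin wrapper over the storage client
        return hashlib.sha256(data).hexdigest() == expected_sha256

    def verify_batch(download, backup_rows):
        """Yield rows whose stored copy is missing or does not match ground truth."""
        for row in backup_rows:
            try:
                ok = verify_blob(download, row["container"], row["name"], row["sha256"])
            except Exception:  # object missing or transient storage error
                ok = False
            if not ok:
                yield row
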
[11:59:00] hehe yeah, lots of work ahead
[11:59:06] but lots of work already done too
[11:59:17] not to mention offline backups
[11:59:41] that's for the 3 categories: general/bacula, dbs and media
[12:00:33] and people are already asking for the next big thing, dumps!
[12:00:49] and that is a completely different problem
[13:01:39] marostegui: awesome \o/ 50GB
[13:02:00] Yeah! Pretty cool!
[13:27:46] Emperor: sorry to ask but what's IYSWIM?
[13:28:49] if you see what i mean
[13:29:20] that one is at least kinda "standard" in UK English. ;)
[13:29:26] sorry, me and my acronyms
[13:29:37] MAMA
[13:29:45] 🐟
[13:29:47] I KILLED A MAN
[13:29:48] ah thanks
[13:29:56] acronyms are hard! :-(
[13:30:25] is there a rule to build them? cause it looks like pretty much anything can become one, no?
[13:31:03] There's an element of usage/custom - so some are very common, some were common in some community I was part of at some point in the past, and some are probably just me being weird...
[13:31:27] sobanski: :D
[13:31:51] hahaha
[13:32:15] marostegui: they're typically used for common subclauses. but of course you'd have to have a good feel for what subclauses are common in order to make a prediction..
[13:32:42] ('subclause' might not be the correct term. i'm not a word-smith)
[13:33:02] WYGIWYGAINGW is one of my favorites: https://acronyms.thefreedictionary.com/What+You+Get+Is+What+You%27re+Given+and+It%27s+No+Good+Whining
[13:33:07] wtfff
[13:33:42] Terry Pratchett, in a class of his own
[13:33:46] hahah
[13:33:51] pterry ftw
[13:34:09] marostegui: it's a riff off of WYSIWYG
[13:34:09] i think it would take me more time to actually write the acronym than the whole sentence (as a non-native English speaker)
[13:34:14] (What You See Is What You Get)
[13:34:54] I think I know that one from the html times
[13:35:30] yah
[13:37:05] ...and always remember that the F in RTFM stands for "fine"
[13:37:18] xddddddddddd
[13:38:57] * Emperor resolves https://phabricator.wikimedia.org/T289488
[13:39:07] I think that's my first resolved phabricator task...
[13:39:47] congratulations!
[13:40:46] \o/
[14:00:26] kormat: https://gerrit.wikimedia.org/r/c/operations/puppet/+/719120 look plausible?
[14:00:41] it certainly _looks_ like a url
[14:01:31] 🐟
[14:02:53] Emperor: oh. i got _very_ confused there for a minute. i `git grep`'d for the IP of pc1007, and found a hit in `modules/nutcracker/manifests/init.pp`
[14:03:05] but it turns out to be in an example comment
[14:06:25] weird, backup1001's root filesystem usage increased a lot since yesterday
[14:06:40] no data should be there, so I will check if we have a lot of error logs or something
[14:08:31] it could also be apt upgrades or something on /var
[14:10:45] none of that, interesting, I just learned that file attributes get temporarily written to /var/lib/bacula while a backup is ongoing
[14:10:58] and this backup has 7GB of file attributes!
[14:21:55] that's a lot of files :)
[14:38:23] I've filed T290437 for research
[14:38:24] T290437: CI backups on contint1001 generating 6GB of file metadata- not happening before- potentially slowing down or making impossible a recovery - https://phabricator.wikimedia.org/T290437
[14:41:16] while bacula doesn't have issues handling very large files or many small files, backup jobs larger than e.g. 300 GB with lots of small files tend to be quite slow for the client hosts to handle on recovery
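
(One way to anticipate jobs like the contint1001 one is to estimate a fileset's attribute spool before backing it up. A rough sketch under stated assumptions: the ~300 bytes per attribute record is a guess rather than a bacula constant, and the path is only an example; it simply walks the tree and counts files and bytes.)

    # Rough estimate of how much attribute data a backup job would spool
    # under /var/lib/bacula, to flag jobs likely to be slow to recover.
    import os

    BYTES_PER_FILE_ATTR = 300  # assumed average size of one attribute record

    def fileset_stats(root):
        """Walk a would-be fileset and return (file_count, total_bytes)."""
        count, total = 0, 0
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    total += os.lstat(path).st_size
                except OSError:  # file vanished mid-walk
                    continue
                count += 1
        return count, total

    if __name__ == "__main__":
        files, size = fileset_stats("/srv/jenkins")  # example path only
        print(f"{files} files, {size / 1e9:.1f} GB of data, "
              f"~{files * BYTES_PER_FILE_ATTR / 1e9:.1f} GB of attribute spool")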