[05:25:27] I am failing over pc1 master for codfw
[05:58:36] done
[05:58:40] Going to go for pc1 eqiad now
[05:58:47] And then will move pc2014 to pc2
[08:08:39] awesome. Thanks.
[08:15:00] Started reboot script on s8
[08:15:39] before doing s1 please sync with me
[08:15:56] Please review this when you get a chance https://gerrit.wikimedia.org/r/c/operations/dns/+/916895
[08:16:00] I want to failover m3
[08:16:16] sure. Thanks.
[08:16:29] thanks
[08:42:03] several snapshots have a reduced size today
[08:42:41] s1, s5, s6
[08:45:31] hmm, I don't remember any major change happening right now
[08:45:33] I think amir was running some scripts?
[08:45:43] I thought there were some clean ups?
[08:45:51] my script should increase the size right now
[08:45:55] right
[08:46:16] jynus: do you know what table?
[08:46:32] I can get you the info
[08:46:34] Maybe the quarterly clean up of flaggedtemplates kicked in
[08:46:45] Thanks
[08:47:00] but it is only on certain datacenters
[08:48:00] the decrease? Maybe some alter optimized tables
[08:48:05] codfw on s1, eqiad on s5, codfw on s6 and it happened between May 6, 2023, 4:48 a.m. and May 8, 2023, 4:57 a.m.
[08:50:43] jynus: can any of the RO external store hosts be rebooted or are there still backups ongoing?
[08:51:02] I don't regularly back up the RO hosts
[08:52:20] although restarting es1022, es2022, es1025 and/or es2025 today, before 0 hours, would be a win
[08:52:36] ok, taking care of that!
[08:53:03] that way they are not touched for the rest of the week
[09:16:23] it seems to be flaggedtemplates.ibd
[09:17:34] awesome.
I wrote a quarterly clean up for it in puppet
[09:17:58] https://phabricator.wikimedia.org/P47795
[09:18:02] Thanks for checking, way bigger changes will come soon
[09:19:08] wait, I am not so sure, the difference is only 50MB, but I see a difference of -700MB
[09:20:36] I might be wrong but those should be increases
[09:20:48] like revision only grows
[09:21:40] btw, s5 and s6 are so tiny 568.3 GB 🥺
[09:22:17] yes, not worried about the size
[09:22:39] but why did it shrink? Normally it doesn't, and that could mean a problem with backups
[09:23:11] if size_job_22643 is a later job than size_job_22626 (assuming id is incrementing), these are increases in size
[09:23:21] yes, I can confirm
[09:24:14] if we swap them, we should get the ones that shrunk
[09:24:30] I can do that as well, just don't want to step on your toes
[09:25:19] ok, I got better info on enwiki
[09:26:03] https://phabricator.wikimedia.org/P47795#193771
[09:31:49] <_joe_> hi, say a team needs a database to use with a new service in production, what is the process?
[09:31:56] <_joe_> a pointer to the documentation is enough :)
[09:32:07] Amir1: and this is s5 https://phabricator.wikimedia.org/P47795#193778
[09:34:33] My home connection is gone. On phone now. I'll check and let you know, Joe
[10:02:39] So I am not super-worried about the size changes, but I am going to keep digging to make sure all data that should be there is still there
[10:04:50] maybe just something ran that optimized tables as a side effect?
[10:05:17] we will see it better with tomorrow's logical backups
[10:34:41] okay, back-ish. It seems the biggest hit is on mgwiktionary. Maybe something got deleted there?
[10:37:31] _joe_: I don't think we have a doc for it tbh, there is https://wikitech.wikimedia.org/wiki/Creating_new_tables for new tables on mw that's quite similar.
Generally, just create a ticket with DBA and we can take a look
[10:37:53] <_joe_> Amir1: ok, thanks
[10:38:26] <_joe_> I hoped you had a standard request form to reduce the back and forth
[10:38:41] <_joe_> also, we really need that documentation :)
[10:38:53] <_joe_> and that's for all of you here, not just Amir1
[10:39:58] Yeah. If it's a service being built from ground-up, I'd like to help with the db design
[10:41:18] _joe_: https://wikitech.wikimedia.org/wiki/MariaDB#Database_creation_template ?
[10:41:57] They aren't great docs but something is there
[10:42:32] Why is that in the MariaDB docs? That's mostly for DBAs themselves :D
[10:42:43] that explains why I couldn't find it. Anyway
[10:42:45] <_joe_> ok thanks
[10:43:05] <_joe_> Amir1: the db is simple enough it shouldn't need much work, but yeah I think they already have a schema
[10:43:16] okay cool
[10:43:18] Amir1: I searched wikitech for QPS. I knew something existed.
[10:43:18] <_joe_> the dataset is ludicrously small though, about 5 GB IIRC
[10:56:03] _joe_: As long as you are fine accessing it via the dbproxies, it should be fine
[10:56:23] Just create a ticket with https://wikitech.wikimedia.org/wiki/MariaDB#Database_creation_template and I can get it done today or tomorrow
[11:00:54] jynus: I need to restart my browser, I'll be a minute late
[11:01:02] ok, np
[12:36:29] jynus: marostegui do we use OSC?
https://gerrit.wikimedia.org/g/operations/software/wmfmariadbpy/+/dbec737ec27df98ed256a24bd881b5094875914e/wmfmariadbpy/cli_admin/osc_host.py
[12:36:40] we don't
[12:37:41] Gonna drop it
[12:58:04] <_joe_> turns out they already created a task https://phabricator.wikimedia.org/T305114
[12:59:18] _joe_: https://phabricator.wikimedia.org/T305114#7949919
[12:59:52] <_joe_> marostegui: yeah it's almost time :D
[13:00:09] _joe_: But there are issues they need to fix: https://phabricator.wikimedia.org/T305114#8471286
[13:00:19] They need to come up with a strategy to delete the rows in batches
[14:37:36] Amir1: this is the query I do: https://github.com/wikimedia/operations-software-mediabackups/blob/master/mediabackups/cli/add_recent_uploads.py#L19 but I don't want to create a ticket because I can work with failures from time to time
[14:39:00] if in the future rcs get prioritized, I will report it, but not worth spending time on this when many other things are on fire
[15:36:33] over 115 million files backed up on both datacenters!
[15:42:45] @Amir1 o/ what can we do to move forward with the checkpoint storage (swift?) stuff https://phabricator.wikimedia.org/T330693 ? Not sure if urandom is on vaca, haven't heard from him in a while.
[16:33:41] ottomata: I just updated the ticket
[16:34:17] Thank you urandom !
[22:07:48] jynus: I found this thread while digging in mailing list archives for something else, and I thought you might get some joy from reading Jimmy's thoughts on backups from 2004.
https://lists.wikimedia.org/hyperkitty/list/wikitech-l@lists.wikimedia.org/message/VPWUWOWCWUB45TJJL4MMFQKQFX6QBCIY/
[22:09:03] actually, I have been reading many of those for: https://docs.google.com/presentation/d/1Bty9YV751Fr4f1tfkGLNLS_BVg3ljA1_nl42z_YoQmU
[22:09:32] including the many times dbs went down and lost data
[22:10:03] note my comment above was re: media backups, which are now almost real-time
[22:10:19] we have way more than those for backups in general
[22:10:44] nice presentation. I'm digging for citations related to early days of what is now WMCS for a hackathon presentation.
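
Editor's note on the 09:23 to 09:24 exchange: the comparison discussed there (take the per-file sizes of an earlier and a later backup job and list the files that shrank) can be sketched roughly as below. This is only an illustration, not the actual backup tooling. The function name `shrunk_files`, the dict-based inputs, and every file name and size in the sample data are hypothetical; only the two job ids come from the conversation.

```python
def shrunk_files(earlier, later):
    """Return {path: delta_bytes} for files that are smaller in `later`.

    Both arguments map a file path to its size in bytes for one backup job.
    A file missing from `later` is treated as size 0, so it shows up with
    its full earlier size as a negative delta.
    """
    deltas = {}
    for path, old_size in earlier.items():
        new_size = later.get(path, 0)
        if new_size < old_size:
            deltas[path] = new_size - old_size
    # Most negative delta (biggest shrink) first
    return dict(sorted(deltas.items(), key=lambda item: item[1]))


# Hypothetical per-file sizes for the two jobs, for illustration only
size_job_22626 = {"table_a.ibd": 900, "table_b.ibd": 5000}
size_job_22643 = {"table_a.ibd": 200, "table_b.ibd": 5100}

print(shrunk_files(size_job_22626, size_job_22643))
```

Calling the function with the arguments swapped lists the files that grew instead, which matches the "if we swap them" remark.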