[08:39:56] marostegui: you mentioned using db-compare with `main_tables.txt` - where do i find this file?
[08:41:41] one sec
[08:41:47] it is in the repo I think
[08:41:53] which 'the' repo? :P
[08:42:12] ah. is it actually `tables_to_check.txt`? :P
[08:42:13] https://github.com/wikimedia/operations-software/blob/master/dbtools/tables_to_check.txt
[08:42:17] haha yeah that
[08:42:39] they're so close, i can see how you got mixed up.
[08:43:41] I think the original name I had it locally was main_tables.txt
[08:44:03] https://github.com/wikimedia/operations-software/commit/82c6ad480d725a5748ed3b9591f4dd0ee20c4962
[08:44:06] see?!
[08:44:35] https://github.com/wikimedia/operations-software/commit/2e72675a049dae2c4ba099023c8aded235c30b9a
[08:44:54] I Wonder Who Renamed It :P
[08:44:58] :)
[10:12:21] kormat: in your Copious Free Time, I think https://gerrit.wikimedia.org/r/c/operations/puppet/+/714358 is deployable and https://gerrit.wikimedia.org/r/c/operations/puppet/+/715934 & https://gerrit.wikimedia.org/r/c/operations/software/+/715926 are worthy of your +1?
[10:12:26] * Emperor nearly typoed that as deplorable
[10:17:48] Emperor: awesome. will look at them now, before i get distracted again
[10:18:00] TY
[10:23:58] Emperor: LGTM'd first one
[10:26:50] * Emperor deploys and waits for 🔥
[10:27:22] Emperor: for the packaging one, i'd like to double-check the multi-instance change. as we don't (yet) have a multi-instance pontoon host, let's do it in prod*
[10:27:39] (specifically, let's make the changes to a core::multiinstance host in eqiad, with puppet disabled)
[10:29:52] OK; you think make the service update, systemctl daemon-reload and try stopping & starting one of the mariadbs? + removing the prometheus-mysqld-exporter package at the same time?
[10:30:24] very first step: disable puppet. then yep
[10:30:55] i'm not very concerned about testing the PME-not-installed scenario
[10:31:20] i just want to be reeeally sure that the .service change is correct.
[10:33:04] Do you want to nominate a victim?
[10:33:57] let's say... db1099
[10:34:36] we should also downtime the host for an hour, just so it doesn't noisy-up icinga
[10:34:45] (it won't actually _page_, as it's not in the active DC)
[10:37:03] downtime set
[10:38:51] if you haven't used it already, puppet disabling is: `sudo disable-puppet "Testing .service changes - mvernon - T289488"`
[10:38:51] T289488: Systemd enhancements for mariadb and prometheus-mysql-exporter - https://phabricator.wikimedia.org/T289488
[10:40:48] ta, done. Now making the edit to /lib/systemd/system/mariadb@.service
[10:43:11] OK, systemd didn't like %I but is happy with %i. It's almost like the docs SUCK
[10:44:19] Now let's test that stopping prometheus-mysqld-exporter@s1.service then stopping and then starting mariadb@s1.service results in joy
[10:44:37] 👍
[10:46:01] Flawless Victory
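For reference, the manual test on db1099 amounts to roughly the following shell sequence; this is only a sketch, since the actual edit to the unit file is whatever the pending change contains:

```
# Keep puppet from reverting the hand-edit while testing
# (the same message is needed again when re-enabling puppet).
sudo disable-puppet "Testing .service changes - mvernon - T289488"

# Hand-edit the template unit under test; per the log, %i worked where %I did not.
sudo vi /lib/systemd/system/mariadb@.service
sudo systemctl daemon-reload

# Exercise a single instance: stop the exporter, then stop and start mariadb for that section.
sudo systemctl stop prometheus-mysqld-exporter@s1.service
sudo systemctl stop mariadb@s1.service
sudo systemctl start mariadb@s1.service
sudo systemctl status mariadb@s1.service
# ...and remember to restart replication on that instance afterwards
# (the sustained-replica-lag alert later in the log is what happens otherwise).
```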
[10:47:42] no idea why %i vs %I should be an issue (thanks, systemd), but at least we caught it
[10:47:44] nice work :)
[10:47:45] Toasty!
[10:48:11] sobanski: ham and cheese?
[10:48:15] Which gives me confidence that an equivalent override for the pme@ service will give us linked stop/restart too (but I've not written that yet)
[10:48:38] kormat: https://mortalkombat.fandom.com/wiki/Toasty!
[10:48:38] kormat: presumably I should enable-puppet again; do you want me to undo the .service change first?
[10:48:46] aye. fortunately as that's happening in puppet, it's much easier to test
[10:49:20] Emperor: yes please re: .service change. enable-puppet (or `run-puppet -e`) will both require the same message used for disabling
[10:49:33] (but will print it out, so you don't have to remember it)
[10:51:23] done. Now I'll update my CR
[10:59:02] (done)
[11:00:17] marostegui: it seems we can drop flaggedimages next week. The train is stable and no issues so far.
[11:00:41] maybe we should get how much space it saves (from backups?)
[11:00:42] oh sweeeet
[11:00:54] Amir1: that will need to happen after the switchover though (the drop I mean)
[11:01:38] whatever works for you :D
[11:01:43] As long as I get my beer
[11:04:21] Amir1 is motivated by beer-pressure
[11:05:02] Anything Haraam, The more Haraam the better
[11:05:28] Amir1: https://phabricator.wikimedia.org/P17159
[11:05:46] that's a lot of digits
[11:06:33] 36GB? That doesn't make much sense
[11:06:45] https://phabricator.wikimedia.org/T289248#7304099
[11:07:43] Amir1: keep in mind that dumps are a lot smaller
[11:07:49] let me check the dump files
[11:07:57] Thanks <3
[11:08:41] https://phabricator.wikimedia.org/P17159#88018
[11:08:46] It could be the ibd files being bigger; one thing is that these cleanup scripts are also cleaning this table
[11:09:45] yeah, the dump files do not have fragmentation
[11:10:14] I can see the impact once we drop them in db host metrics
[11:10:41] I assume it'll be around 200GB in total. Not too much but useful
[11:14:42] Definitely useful
[11:14:50] Thanks for spending time on that
[11:44:37] Emperor: See -operations :)
[11:57:03] https://twitter.com/furioursus/status/1431434592376922122?s=19 :D
[11:58:16] PROBLEM - MariaDB sustained replica lag on s1 on db1099 is CRITICAL: 4268 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1099&var-port=13311
[12:01:49] ACKNOWLEDGEMENT - MariaDB sustained replica lag on s1 on db1099 is CRITICAL: 3679 ge 2 MVernon mea culpa. Replication restarted and is catching up https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1099&var-port=13311
[12:03:50] I did tell y'all I'd not worked with mysql before /o\
[12:04:16] anyhow, replication going again OK
[12:06:21] :)
[12:06:27] No worries, it is a very easy thing to forget
[12:08:23] Emperor: As long as you don't break the active DC, you wouldn't get the tshirt unfortunately
[12:08:40] oh we shrunk to sticker these days
[12:11:50] RECOVERY - MariaDB sustained replica lag on s1 on db1099 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1099&var-port=13311
[12:23:22] Emperor: sorry, that was my bad, never thought of it
[12:26:41] I will have my revenge!^W^W^W^W^W No worries :)
[12:29:20] Emperor: +1's on all your CRs, as penance
[12:40:27] :)
[12:47:38] marostegui: db-compare is looking very good for db2118. no issues found so far
[12:47:46] sweeeeet
[13:41:42] I've pushed a CR with the relevant multi-instance systemd incantations
[13:41:49] https://gerrit.wikimedia.org/r/c/operations/puppet/+/716306
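The "linked stop/restart" override for prometheus-mysqld-exporter@ mentioned at 10:48:15 is presumably what the CR above provides. A minimal by-hand sketch, assuming it is done with a PartOf= drop-in on the exporter template unit (the real change lives in puppet and may well be structured differently):

```
# Hypothetical drop-in; in production this would be managed by puppet, not written by hand.
# PartOf= makes systemd propagate stop/restart of mariadb@<section> to the exporter instance,
# using %i (the literal instance name), which is what worked in the test above.
sudo install -d /etc/systemd/system/prometheus-mysqld-exporter@.service.d
cat <<'EOF' | sudo tee /etc/systemd/system/prometheus-mysqld-exporter@.service.d/override.conf
[Unit]
PartOf=mariadb@%i.service
EOF
sudo systemctl daemon-reload
```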