[01:09:24] PROBLEM - MariaDB sustained replica lag on m1 on db1217 is CRITICAL: 7.8 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1217&var-port=13321
[01:10:48] RECOVERY - MariaDB sustained replica lag on m1 on db1217 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1217&var-port=13321
[06:31:59] thanks Platonides !
[07:51:38] https://mariadb.org/a-positive-new-chapter/
[08:20:08] I would write a blog post "MariaDB: why you should worry", but apparently every time I say something even remotely negative an army of people appears, so I will keep quiet
[09:13:06] https://toot.bike/@brie@infosec.exchange/111253467143797686 ? :)
[09:14:55] I pronounce it EZE KU ELE in Spanish, of course!
[09:21:35] that led me to wonder how you spelled it in languages like Ukrainian (where the alphabet doesn't have all the necessary letters). Disappointingly, the answer is "use the other alphabet" - https://uk.wikipedia.org/wiki/SQL
[12:36:18] could I silence db2109 alertmanager alert for a day or so, until things get fixed?
[12:37:13] arnaudb: ^
[12:37:31] oh yep; sorry!
[12:37:35] will downtime it
[12:38:31] not sure if that works, sadly
[12:38:56] but in any case, I will let you handle it
[12:41:44] should be all covered jynus, let me know if there is still noise coming your way
[12:42:36] thanks
[13:42:32] btullis: I think we're ready to take the first step toward https://phabricator.wikimedia.org/T347738, namely re-imaging a canary
[13:44:07] urandom: Ack. Sounds good to me. Would you like any help from me, or are you OK to proceed?
[13:44:09] I do not know if the aqs service will work ok on node 12, but my thinking was to just smoke test it. If it doesn't work then we can depool that node, and evaluate next steps (it should be fine to leave it in that state).
[13:44:35] Yep, agreed.
[13:45:36] btullis: you are welcome to join in at any time you like! :) (but otherwise I think I have it...)
[13:45:52] I don't know if you saw: https://gerrit.wikimedia.org/r/c/operations/puppet/+/965767
[13:46:54] fingers-crossed that works OK, the first node will be the test
[13:51:33] Yep, I saw that but didn't give it a proper review until now. Sorry for the delay.
[13:52:26] It'll pause in the installer, so I'm happy to give a second pair of eyes before you commit any changes to disk, if that helps.
[13:52:56] It gives a nice point to go back and tweak the reuse recipe if it doesn't look right.
[13:54:12] btullis: sure, I'll take you up on that
[13:54:23] 👍
[13:54:34] I have a meeting shortly (30 mins) and then I was planning to try thereafter
[15:08:18] btullis: aqs1010 is dropped to the partitioner waiting confirmation, if you want to have a look
[15:08:36] Will do now.
[15:15:36] Hmm. Looks safe, but I don't see the volumes themselves. I would expect to see the mountpoints for `/` and `/srv/cassandra-a`,`/srv/cassandra-b`.
[15:16:52] I think you're right
[15:19:16] I'm looking for a screenshot in phab of a previous reuse-parts-test but I can't find one.
[15:19:38] I think if we did press go-ahead, it would fail with a red screen saying 'no root partition defined'
[15:21:13] ...but it might be better to have another look at that recipe.
[15:27:12] I'm looking, but I can't see what would be wrong
[15:27:29] ofc, I cargo-culted this whole thing, and got help from Luca :)
[15:30:17] I might also be too cautious. I mean the partitions are totally mentioned in the patch: https://gerrit.wikimedia.org/r/c/operations/puppet/+/965767/3/modules/install_server/files/autoinstall/partman/custom/reuse-aqs-cassandra-8ssd-2srv.cfg#14
[15:30:59] Perhaps if we were to say OK, it would do the next step of 'configure software raid' and then return to this menu with the right volumes shown.
[15:33:21] when I reconnected, I (apparently) accepted & continued, and you were right, I got a "no root partition defined" error, and was made to go back
[15:33:54] OK, try 'configure software raid' and see what it says.
[15:34:11] Maybe it will recompute partitions atrer that.
[15:34:18] ...sfter that
[15:34:28] after that
[19:48:02] :'-DDDD https://phab.wmfusercontent.org/file/data/pgaekkuigqkxhyjcd7vh/PHID-FILE-antdrxb2ozahiupugrnc/backup_time.png
[19:50:12] ^ Mathew only 1 last email warning for this week yay