[00:52:59] i'm looking at toolsdb
[00:53:57] oom killed once again
[00:53:58] [3211718.996904] Out of memory: Killed process 725680 (mysqld) total-vm:64552308kB, anon-rss:63683220kB, file-rss:0kB, shmem-rss:0kB, UID:497 pgtables:125676kB oom_score_adj:-600
[00:55:03] last one was on nov 9th, it made it almost a month without crashing
[00:55:11] oops I just rebooted it out from under you sorry
[00:55:26] But I imagine you were about to do that anyway :)
[00:55:56] taavi: do you mind setting r/w after it comes up? I have food on the stove that's about to burn :)
[00:56:00] sure
[00:56:18] yeah I was just going through lots of email from the past week and happened to spot that particular one just as it got in
[00:56:22] thank you! I'll check back in a few to make sure it was an easy resolution
[00:58:24] mariadb is taking suspiciously long to even start
[00:58:54] there we go
[07:14:07] * dcaro paged
[07:16:57] oom killed too
[07:16:57] Dec 09 07:03:41 tools-db-1 systemd[1]: mariadb.service: A process of this unit has been killed by the OOM killer.
[07:18:29] I'm starting it again
[07:19:32] made it writable again
[07:30:54] created T353093 to follow up
[07:30:55] T353093: [toolsdb] MariaDB process is killed by OOM killer (December 2023) - https://phabricator.wikimedia.org/T353093
[07:34:17] everything seems stable, /me back to sleep
[07:34:42] I wonder if I'm getting charged by the oncall sms... it comes from a foreign phone number (+1 ...)
[11:33:45] Oh, paged again, but I'm not near a laptop
[11:34:13] I'll try to do something, but might take me a while
[11:50:25] Manually set the db as writable again, it was up and running :/
[16:32:35] I am in a car so not very available but I'm trying to restart the db yet again :/
[16:38:18] dcaro: are you awake/around by chance?
[16:38:37] and/or rook?
[16:38:57] Oh, ok. I'm around
[16:39:00] Let me look
[16:39:43] I'm on db-1 and mariadb won't start up
[16:39:47] am I on the wrong VM?
[16:39:56] (I rebooted it a few minutes ago)
[16:40:23] tools-db-1 was the one
[16:41:19] well /now/ it's running, did you do anything other than wait?
[16:41:26] just started it
[16:41:37] huh
[16:41:37] ok
[16:41:40] root@tools-db-1:~# systemctl start mariadb
[16:41:54] Is it ok if I leave you to it for a few? I have a realtor waiting on me.
[16:41:57] just set it as read-write
[16:42:00] ack
[16:42:00] well, maybe more than a few :/
[16:42:14] My mistake was maybe thinking that puppet would start it
[16:42:19] Thank you!
[16:44:26] yep, puppet does not touch it iirc (for safety reasons), and it also starts as ro
[16:57:09] I'm temporarily back! Are we all good?
[16:57:38] so far yes, I suspect that there's something making it crash (a user query someone sends)
[16:57:47] but not sure
[16:58:11] up and running for now, we can wait and see if it crashes again (it's 4 times today)
[17:01:01] only thing that comes to mind is enable swap, and instead of hard crashing, send an alert when the swap starts getting used so we can go and check what's going on
[17:01:21] (though the performance of the DB might get unusable, we might be able to find the cause for the crash)
[17:03:13] Ok! I'm going back to my day for now but we should talk about it more on Monday. Thanks for the rescue!
[17:03:32] 👍
[17:03:35] * dcaro same
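
The idea floated at 17:01:01 (enable swap, then alert as soon as swap starts being used instead of waiting for the OOM killer) could be prototyped roughly as below. This is only a sketch and not something taken from the log: the function names and the `threshold_kb` parameter are hypothetical, and it assumes a Linux host where `/proc/meminfo` reports `SwapTotal` and `SwapFree` in kB. How the alert is actually delivered is left out.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Values are kB for the fields that carry a kB unit.
    """
    values: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def swap_used_kb(meminfo: Dict[str, int]) -> int:
    """Swap currently in use, in kB (0 if no swap is configured)."""
    used = meminfo.get("SwapTotal", 0) - meminfo.get("SwapFree", 0)
    return max(used, 0)


def should_alert(meminfo: Dict[str, int], threshold_kb: int = 0) -> bool:
    """True once swap usage exceeds the (hypothetical) threshold."""
    return swap_used_kb(meminfo) > threshold_kb


def check_host(path: str = "/proc/meminfo") -> bool:
    """Read the live meminfo file and report whether an alert is due."""
    with open(path) as f:
        return should_alert(parse_meminfo(f.read()))
```

Calling `check_host()` periodically (from cron or a monitoring agent, for instance) would return `True` the first time any swap is in use. A nonzero `threshold_kb` may be less noisy, since the kernel can swap out a few idle pages even without real memory pressure.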