[07:28:00] Is thanos-be1003 ours or observability's. I am wondering if https://phabricator.wikimedia.org/T285662 is happening again?
[07:30:52] In any case I wonder if I should open a task for dcops, but will wait for feedback before doing it on my own?
[07:40:36] jynus: It's Complicated (we are moving to splitting thanos-the-software onto separate hardware); I'll have a look
[07:43:04] jynus: OOI, did you pick that up by the puppet-failed alert?
[07:44:36] no, it is generating a critical
[07:44:55] summary: DISK CRITICAL - /srv/swift-storage/sdn1 is not accessible: Input/output error
[07:45:44] It is not a worry if it is a known failure or WIP maintenance, but can I ack it?
[07:46:19] No, it's a real problem
[07:46:24] I see
[07:46:34] then I let you work, ignore me
[07:46:51] weird to see EIO without kern.log entries at all, though
[07:46:59] I just thought it was a disk failure
[07:47:06] (a regular, boring one)
[07:47:15] seems most likely, but it's an odd presentation
[07:48:05] the underlying device (/dev/sdl) is still there per lshw and blkid but the partition says EIO without any of the usual xfs errors you'd expect if the hardware was defunct
[07:49:15] ah, plenty of kernel errors yesterday, that'll do
[07:50:08] remount has cleared it, though.
[07:54:23] I saw it yesterday, but following the saying "friends don't ping friends about RAID/JBOD disk failures, phabricator generates an automatic ticket" I didn't give it much importance
[07:55:48] :-D
[08:00:02] my experience is that the failure described in T285662 is not uncommon on swift nodes (but as it happens, this was not one of those)
[08:00:03] T285662: Broken disk on thanos-be1003 but not reported / task not opened - https://phabricator.wikimedia.org/T285662
[08:00:53] I see
[08:01:49] thank you for taking care of that- I realize I lack so much context of things to be able to properly respond to so many alerts
[08:04:55] NP
[09:13:37] marostegui: independently of the date, do you want to do a refresh of a misc db backup taking and recovery at the end of this week?
[09:14:05] jynus: I won't be around for their migration if they want to do it on monday XD
[09:14:13] ok :-D
[09:14:24] but yeah
[09:14:24] offering it also for Amir1 BTW
[09:14:27] I'll check the docs
[09:14:33] And let you know if I have something not clear
[09:15:00] I will be literally in a plane :D
[09:15:04] it is a bit more complicated than usual because we take no snapshots of misc, so it would have to be added, and that may not be super clear
[09:15:08] ha ha
[10:04:31] I will never understand when mariadb decides to lock metadata and when it doesn't
[10:08:28] may I introduce you to sod's law? :)
[10:19:27] haha, fair
[10:30:10] * Emperor gets gitlab to do their bidding
[10:42:45] ...dcmd(1) is very useful
[13:46:29] es2025 backup finished correctly, so it is now free for maintenance
[13:56:40] awesome!
[16:37:43] jelto and others, I just saw some errors to: /w/rest.php/ru.wikipedia.org/v3/transform/wikitext/to/pagebundle/Main_Page
[16:37:53] ups, wrong channel