[13:11:48] dcaro: thanks. I did not get the page
[13:12:05] I guess that's normal per the oncall rotation
[13:31:41] it should be yep, if I catch it on time :)
[14:00:15] * dcaro paged, same as the weekend
[14:01:35] no error on the nfs server this time though
[14:02:04] and it recovered :/
[14:33:18] there's a couple workers that are having extra heavy load, and got some containers killed by OOM
[14:34:43] there's this: `[Mon Mar 17 13:59:26 2025] nfs: server tools-nfs.svc.tools.eqiad1.wikimedia.cloud not responding, still trying` in one of them
[14:42:15] could the errors in the NFS server be related to ceph?
[15:30:41] from what I've read around, it might be more network-related: someone hinted at changing tcp settings to increase windows and such, and someone else hinted that error -32 was a connection reset (though I did not see proof of it)
[15:31:02] of course, it might be ceph underneath slowing something down, and causing network issues
[15:31:17] but did not see any other issues around
[17:20:09] * arturo offline
[18:48:04] * dcaro off