[13:11:48] dcaro: thanks. I did not get the page
[13:12:05] I guess that's normal per the oncall rotation
[13:31:41] it should be yep, if I catch it on time :)
[14:00:15] * dcaro paged, same as the weekend
[14:01:35] no error on the nfs server this time though
[14:02:04] and it recovered :/
[14:33:18] there's a couple workers that are having extra heavy load, and got some containers killed by OOM
[14:34:43] there's this: `[Mon Mar 17 13:59:26 2025] nfs: server tools-nfs.svc.tools.eqiad1.wikimedia.cloud not responding, still trying` in one of them
[14:42:15] could the errors in the NFS server be related to ceph?
[15:30:41] from what I've read around, it might be more network-related: someone hinted at changing tcp settings to increase windows and such, and someone else hinted that error -32 was a connection reset (though I did not see proof of it)
[15:31:02] of course, it might be ceph underneath slowing something down, and causing network issues
[15:31:17] but did not see any other issues around
[17:20:09] * arturo offline
[18:48:04] * dcaro off