[08:54:47] morning!
[09:47:16] o/
[11:53:45] hmm I see this alert in #-cloud but I don't see it in alertmanager "Node tools-k8s-worker-nfs-58 has at least 12 procs in D state"
[11:54:15] and grafana also shows "1" as the latest value https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview?orgId=1&viewPanel=2&from=now-15m&to=now
[11:54:43] now it's "resolved" in #-cloud as well
[11:59:04] ok found the explanation: the alert triggers when avg_over_time[1h] > 12, for longer than 1 hour
[11:59:12] and there was a spike at 10:22 UTC
[11:59:31] now it's back to normal, so the alert only fired very briefly
[13:54:12] might be a problem between metricsinfra alertmanager and prod one, I saw alerts not passing through the last time we had nfs issues (not saying they are related, they might, but not sure how)
[14:09:09] dcaro: it could be actually, let's keep an eye on the following alerts in -cloud and double check if they appear on alerts.wm.o
[15:59:54] 👍
[17:25:41] * dcaro off, cya on monday!
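
Note on the 11:59:04 explanation: a rough sketch of how a rule of the shape "avg_over_time[1h] > 12, for longer than 1 hour" might behave around a spike. This is a simplified model, not the actual rule or Prometheus itself: the function name `alert_states`, its parameters, and the sample values are all hypothetical, samples are assumed to be evenly spaced, and the window and hold period are counted in samples rather than in time.

```python
from collections import deque


def alert_states(samples, window, threshold, hold):
    """Return one state per sample: 'inactive', 'pending' or 'firing'.

    samples   -- evenly spaced gauge readings (hypothetical data)
    window    -- number of samples averaged, standing in for the [1h] range
    threshold -- the condition is: rolling average > threshold
    hold      -- number of evaluations the condition must already have held
                 before the alert fires, standing in for the "for" duration
    """
    recent = deque(maxlen=window)  # keeps only the last `window` samples
    true_streak = 0                # consecutive evaluations with condition true
    states = []
    for value in samples:
        recent.append(value)
        # Early on, the average covers fewer than `window` samples.
        average = sum(recent) / len(recent)
        true_streak = true_streak + 1 if average > threshold else 0
        if true_streak == 0:
            states.append("inactive")
        elif true_streak <= hold:
            states.append("pending")
        else:
            states.append("firing")
    return states


# Made-up readings: a short spike, then back to 1.
readings = [1, 1, 1, 40, 40, 40, 40, 40, 1, 1, 1, 1]
for value, state in zip(readings, alert_states(readings, window=3, threshold=12, hold=2)):
    print(value, state)
```

In this toy run the state stays "firing" for two samples after the raw reading has already dropped back to 1, because the rolling average is still above the threshold. That may be why a dashboard can show "1" as the latest value while an alert of this shape is still active or has only just resolved.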