[18:13:05] <mutante>	 I now know how to replace all my Icinga check_http and check_tcp check variations with Prometheus/Alertmanager. But what I don't know yet is how you usually replace one of the NRPE process checks, i.e. whether a certain process is running.
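For the cases that survive review, a pgrep-style check is easy to reproduce. Below is a minimal Python sketch. It assumes a Linux host, because it scans `/proc/<pid>/comm`. It also assumes the result gets handed to node_exporter's textfile collector. The metric name `process_up` is hypothetical.

```python
import os


def process_running(name: str) -> bool:
    """Return True if any process has a comm name equal to `name` (Linux only)."""
    # The kernel truncates comm to 15 characters, so compare truncated names.
    target = name[:15]
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/meminfo
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == target:
                    return True
        except OSError:
            # The process exited between listdir() and open().
            continue
    return False


def render_metric(name: str, up: bool) -> str:
    """Format one gauge line in the Prometheus text exposition format.

    `process_up` is a hypothetical metric name chosen for this sketch.
    """
    return f'process_up{{name="{name}"}} {int(up)}\n'
```

To publish the result, you could write that line to a `*.prom` file in the directory passed to node_exporter's `--collector.textfile.directory`. Write to a temp file first and then rename it, so a scrape never sees a half-written file. An alert could then fire on `process_up == 0`.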
[18:14:53] <mutante>	 but then again, it's a good time to wonder if that's actually useful to check.. and maybe it's smarter to reduce it to just monitoring a public endpoint being up.. and not worry about processes. there are always some legit cases though, like "is zuul_merger running" for CI
[18:20:38] <herron>	 yeah, whenever possible it's better to check something that's more indicative of service health than pgrep. I've seen many times a process that is running but in a broken state
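To illustrate the difference, here is a minimal Python sketch of a probe that asks the service itself whether it responds, rather than asking the kernel whether a PID exists. It assumes the service exposes an HTTP(S) endpoint, namely whatever URL you would point check_http or a blackbox-style probe at. The function name is hypothetical.

```python
import http.client
import urllib.request


def endpoint_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if `url` answers with a 2xx or 3xx status within `timeout` seconds."""
    try:
        # urlopen follows redirects and raises HTTPError (an OSError) for 4xx/5xx.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, http.client.HTTPException, ValueError):
        # Refused connection, DNS/TLS failure, timeout, HTTP error status,
        # a malformed response, or a malformed URL all count as "down".
        return False
```

A process stuck in a broken state typically fails this check, even though pgrep would still find it.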
[18:25:07] <mutante>	 Yeah, I am writing something like this on a ticket for my team: that we should question each one. But I think there will be some left that we do want to keep, and for those the question remains what would replace that kind of check.
[18:25:38] <mutante>	 one example where I will just remove it: we already check that https is up on Etherpad, so I don't care to additionally know whether the process is running
[18:26:05] <mutante>	 one example where I think we need to keep it: checking that clamd has not crashed on VRTS
[18:55:09] <herron>	 we may be able to find a prometheus exporter for clamd that'd give up/down status along with some additional metrics too
[18:55:39] <herron>	 and in general s/clamd/$service
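Even without a ready-made exporter, a minimal clamd probe takes little code, because clamd answers a `PING` command with `PONG`. This Python sketch uses the null-terminated `z` command form. It assumes clamd listens on TCP; the `127.0.0.1:3310` default here is an assumption, and many installs use only a Unix socket. The function name is hypothetical.

```python
import socket


def clamd_up(host: str = "127.0.0.1", port: int = 3310, timeout: float = 3.0) -> bool:
    """Return True if clamd at host:port replies PONG to a PING command."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # The "z" prefix means the command and the reply are NUL-terminated.
            sock.sendall(b"zPING\0")
            reply = b""
            while not reply.endswith(b"\0"):
                chunk = sock.recv(64)
                if not chunk:
                    break  # peer closed the connection early
                reply += chunk
            return reply.rstrip(b"\0") == b"PONG"
    except OSError:
        # Connection refused, timeout (a hung daemon), or reset all mean "down".
        return False
```

You could expose the result as a gauge such as `clamd_up` (a hypothetical name). That gives an up/down signal that also catches a hung daemon, which a process-existence check would miss.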
[19:01:58] <mutante>	 oh, _that_ specific.. hm, ok!
[19:02:25] <mutante>	 well, there's going to be zuul, zuul-merger, jenkins, gerrit.. etc
[19:02:31] <mutante>	 I made a fresh ticket :)
[19:19:51] <herron>	 there are also systemd failed unit checks in place, which could probably supersede various process-name NRPE checks. but yeah, the more specific the better
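One way to approximate a failed-unit check is to parse `systemctl list-units --state=failed --plain --no-legend`, whose first column is the unit name. You can then narrow the result to the units a team cares about, in the spirit of "the more specific the better". Below is a minimal Python sketch; the watched unit names are hypothetical.

```python
import subprocess


def parse_failed_units(output: str) -> list[str]:
    """Extract unit names from `systemctl list-units --plain --no-legend` output."""
    # Each non-empty line starts with the unit name, followed by LOAD, ACTIVE, SUB, DESCRIPTION.
    return [line.split()[0] for line in output.splitlines() if line.strip()]


def failed_units() -> list[str]:
    """Ask systemd for currently failed units (requires systemctl on the host)."""
    result = subprocess.run(
        ["systemctl", "list-units", "--state=failed", "--plain", "--no-legend"],
        capture_output=True, text=True, check=True,
    )
    return parse_failed_units(result.stdout)


def failed_watched(failed: list[str], watched: set[str]) -> list[str]:
    """Keep only the failed units that appear on the team's watch list."""
    return [unit for unit in failed if unit in watched]
```

For example, `failed_watched(failed_units(), {"zuul-merger.service", "gerrit.service"})` would report only those two units when they fail.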
[19:21:38] <mutante>	 good points. thank you