[09:03:57] Hi! playing a bit with prometheus + systemd units, it seems that when the unit itself is inactive (dead), node-exporter will not generate a value for it (no state="inactive" with value 1), do you know if that is meant to be like that?
[09:25:18] dcaro: hi! I did a quick check with e.g. "node_systemd_unit_state{instance=~"thanos-fe1001.*",state="inactive"} == 1" and there are a few units inactive (dead) that are reported as inactive, so probably something else is distinguishing that unit from the reported ones
[09:25:47] interesting
[09:25:59] I'm using 'node_systemd_unit_state{instance=~".*labstore1004.*", name=~".*maintain.*"}'
[09:26:27] from the host:
[09:26:28] root@labstore1004:~# systemctl status maintain-dbusers.service
[09:26:31] ● maintain-dbusers.service - Maintain labsdb accounts
[09:26:33] Loaded: loaded (/lib/systemd/system/maintain-dbusers.service; static; vendor preset: enabled)
[09:26:35] Active: inactive (dead)
[09:26:50] it was manually stopped though, so maybe that's a difference?
[09:27:59] could be! perhaps diffing "systemctl show" of that unit and one that's reported might give you some hints
[09:29:26] good idea, looking
[09:36:37] it seems to be related to https://github.com/prometheus/node_exporter/issues/1082 , :/
[09:42:14] uuugghhh that looks like a deep rabbit hole :|
[09:48:54] I have another, different question: sometimes we have alerts with only an instance label, and sometimes with both instance (where the metric was fetched from) and hostname (the host it affects). Should we populate the hostname label automatically when it's not there? I ask because silencing alerts then becomes easier (a single silence instead of two)
[09:54:36] dcaro: I think using 'instance' to generally "do the right thing" is the path of least resistance, in this case perhaps making sure the affected host ends up in the instance label
[09:55:14] what are the exporters/systems affected in ?
[09:56:13] our current issue is with the openstack exporter, as it scrapes a different host than the one being affected by the alert
[09:56:55] (e.g. a neutron node being down is detected from the openstack side, but it affects a physical host)
[09:57:27] so if you take a cloudvirt physical host down, you have to silence the regular 'node down' alerts with instance, but the openstack-originating ones with 'hostname'
[09:58:05] ah yeah, similar pattern to what happens with blackbox exporter
[09:59:54] IMO the easiest solution is to make sure instance has the target hostname for openstack exporter metrics
[10:02:01] okok, I might come back with questions on how to do that though xd, as it probably has to be conditional (in the sense that not all openstack metrics will have a hostname label to override the instance label with)
[10:02:42] hmm... that will also "destroy" the information of which openstack instance was used to fetch the stats right?
[10:03:44] as described now yes, though you can relabel the openstack instance used to fetch into another label
[10:04:40] also I recommend opening a task, it is quite easy to miss stuff, at least on irc
[10:07:07] 👍
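
A minimal sketch of the "diffing systemctl show" check suggested above; maintain-dbusers.service is the unit from the log, while cron.service is only a placeholder for any unit that does show up in node_exporter's output:

    # compare the unit that is missing from node_exporter with one that is reported
    diff <(systemctl show maintain-dbusers.service) <(systemctl show cron.service)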
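
A minimal sketch of the conditional relabeling discussed for the openstack exporter job, assuming those metrics carry a 'hostname' label for the affected physical host; the 'exporter_instance' label name is only an example. It keeps the scraped openstack host in a separate label and overrides 'instance' only when a 'hostname' label is present:

    metric_relabel_configs:
      # keep a copy of which openstack exporter host the metric was scraped from
      - source_labels: [instance]
        target_label: exporter_instance
      # if the metric carries a hostname label, use it as instance;
      # metrics without a hostname label keep the original instance
      - source_labels: [hostname]
        regex: (.+)
        target_label: instance

With something like that in place, a single silence matching the physical host's instance label would cover both the regular 'node down' alerts and the openstack-originating ones.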