[11:45:17] FYI, kafka-logging1004 has had Puppet disabled since Jan 30 and was thus evicted from puppetdb. not sure why it was disabled, no reason was set. can that be re-enabled or is there an issue with the server?
[12:38:39] not sure, but the host isn't in service yet, it looks like?
[12:39:05] not sure re: being disabled, that is
[12:41:48] I'm re-enabling now fwiw
[12:44:38] ack, I logged in via the mgmt earlier and saw puppet disabled
[12:46:25] hi all, is there a reason why alert2001 is not in icinga https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=1&host=alert2001
[13:01:48] jbond: the usual historical reason
[13:02:29] because of how our puppetization works between exported resources and local resources for the alert hosts
[13:02:36] they don't see each other
[13:03:10] (this doesn't mean it shouldn't be resolved ;) )
[13:04:14] volans: I was looking at https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=cumin1001&service=Ensure+hosts+are+not+performing+a+change+on+every+puppet+run. Admittedly this should have been spotted much sooner, but this wasn't always the case
[13:05:32] eh...
[13:08:30] volans: the check for "performing+a+change+on+every+puppet+run" used to work but it has been failing (for about 4 months) with the error " UNKNOWN: Host alert2001 was not found in Icinga status"
[13:08:50] wow, it's clearly checked very often :/
[13:09:04] but why is it checking it in icinga status?
[13:09:19] do you have handy where the code for that check is?
[13:10:11] it looks for hosts with reports that are consistently failing but removes any host that has node.notifications_enabled and node.downtimed
[13:10:15] modules/profile/files/cumin/check_puppet_run_changes.py
[13:10:48] fyi I'm updating this and wondering if I should filter out alert2001, but also wanted to double check that this is intended and not an accidental change 4 months ago
[13:11:22] jbond: I guess we should skip the check on the downtime and notifications_enabled on alert2001
[13:11:58] or better, the secondary icinga hos
[13:12:00] *host
[13:12:16] yes, getting the non-active host is the better option and easy enough to do
[13:13:09] for that one, if it's failing puppet it is reported no matter what
[13:13:21] while normal hosts are reported only if not downtimed/disabled_notifications
[13:13:47] yes that's easy enough, still curious what changed though
[13:13:59] * volans is now curious on what changed when to trigger this, maybe when the check for downtime/disabled was introduced?
[13:14:32] it could also be some change in the way icinga_status works I guess
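
The filtering agreed on above could look roughly like the sketch below. It is not the code in check_puppet_run_changes.py: `HostStatus`, `hosts_to_report` and the `always_report` parameter are hypothetical names made up for illustration, and the Icinga status is assumed to be a plain mapping from hostname to its notification/downtime flags. Hosts in `always_report` (the secondary Icinga host) skip the Icinga lookup entirely and are reported whenever they show up as changing; every other host is reported only if notifications are enabled and it is not downtimed.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass
class HostStatus:
    """Hypothetical stand-in for one host's entry in the Icinga status."""
    notifications_enabled: bool
    downtimed: bool


def hosts_to_report(
    changing_hosts: Iterable[str],
    icinga_status: Dict[str, HostStatus],
    always_report: Set[str],
) -> List[str]:
    """Return the hosts that should be flagged by the check.

    changing_hosts: hosts whose Puppet reports keep showing changes.
    icinga_status: assumed mapping of hostname to its Icinga flags.
    always_report: hosts absent from Icinga status by design (assumption:
        the secondary Icinga host), reported without any lookup.
    """
    reported = []
    for host in sorted(changing_hosts):
        if host in always_report:
            # No downtime/notification lookup for the secondary Icinga host.
            reported.append(host)
            continue
        status = icinga_status.get(host)
        if status is None:
            # Mirrors the failure mode quoted in the log.
            raise KeyError(f"Host {host} was not found in Icinga status")
        if status.notifications_enabled and not status.downtimed:
            reported.append(host)
    return reported
```

With `always_report={"alert2001"}`, a changing alert2001 is listed even though it has no entry in the status mapping, while a downtimed host is dropped from the result.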