[07:55:41] ugh, puppet-facts-export failed with KeyError: 'trusted', have you seen this before ?
[07:59:07] lvs1014 doesn't have trusted fact it looks like
[08:00:34] and now it worked -.-
[08:01:37] I'm wondering if the host replacing its facts is racy with us reading them from puppetdb
[08:04:48] (scratch the lvs1014 thing, it isn't that host) but anyways not a new failure, there's a bunch of temporary directories left behind
[08:05:00] an-airflow1001 is running out of disk space, and it has about 11G of logs
[08:05:09] anyone knows what can be deleted or not?
[08:54:17] effie: that would be ebernhardson I think
[08:54:46] ah! thank you!
[08:55:49] yep, np, I also ran "apt-get clean" which pushed it slightly under the 95% alert threshold
[09:27:45] hashar: hey, I see there's a bunch of puppet error emails coming from integration Cloud VPS project, just fyi. we lately introduced a way to disable those alerts if they are not needed (https://gerrit.wikimedia.org/r/c/operations/puppet/+/712923)
[14:11:41] godog: random question, can we create tasks from icinga alerts automatically?
[14:11:59] (I know alertmanager can, but not sure icinga by itself)
[14:15:05] dcaro: we can yeah, raid tasks for example do
[14:17:58] godog: nice! let me try to find them
[14:18:45] is it icinga creating the task or the script itself?
[14:19:07] the script that acts as alert handler IIRC dcaro
[14:19:47] ahhh, I was hoping to be able to do it from icinga, so any check could potentially open a task
[14:20:13] wait, as alert handler, you mean the one that runs on the host to check the raid status?
[14:21:33] no not the nrpe check, the handler is what icinga runs on the alert host itself whenever the alert fires
[14:21:49] nice, that sounds more promising then
[14:22:07] remember the name/keyword for me to look around?
[14:22:13] yeah that's modules/icinga/files/raid_handler.py in puppet
[14:25:56] godog: do you think it would be possible to open a task for any critical icinga alert on cloud instances? Would that make more sense to do on alertmanager side? (maybe similar to team tags? as in, tag the alerts there somehow to trigger the task creation)
[14:27:45] dcaro: yeah certainly possible, I'd recommend investing time in AM rather than icinga
[14:28:49] godog: thanks! I might want to have a quick chat about it in the coming weeks :)
[14:29:52] dcaro: for sure, to get more context what are the problem(s) you are trying to address ? i.e. zooming out ?
[14:30:59] godog:sure, we are thinking on creating tasks for every alert we want to act on (that should be most of them), and page only on the ones that need immediate attention, that way we can keep track of counts, how were they handled in the past, etc.
[14:32:28] dcaro: ah ok got it, is this for the production alertmanager I take it? in other words alerts for cloud hosts in production ?
[14:32:55] happy to subscribe to a task too if/when you have one
[14:33:26] godog: yes, for now at least, sure, I'll make sure to subscribe you, currently is just random thoughts
[14:36:18] dcaro: sounds good to me, thanks! re: alert counts, all AM alerts that fire are also available in logstash with program:alertmanager-webhook-logger
[14:43:53] godog: that is awesome, thanks!
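
(Note: the idea discussed above, a task for every alert worth acting on and a page only for the urgent ones, could be sketched roughly as below. This is only an illustration, not the behaviour of raid_handler.py or of any Alertmanager configuration. The `Alert` class, the `plan_actions` function, the `severity` label, and its `task` / `page` values are all hypothetical names chosen for the example.)

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    """A firing alert with free-form labels (hypothetical structure)."""
    name: str
    labels: dict = field(default_factory=dict)


def plan_actions(alert: Alert) -> list:
    """Return the follow-up actions for a firing alert.

    Assumed policy: alerts labelled severity=task or severity=page
    get a task opened; only severity=page additionally pages someone.
    Anything else is ignored.
    """
    severity = alert.labels.get("severity", "")
    actions = []
    if severity in ("task", "page"):
        actions.append("open_task")
    if severity == "page":
        actions.append("page")
    return actions


if __name__ == "__main__":
    examples = [
        Alert("DiskSpaceLow", {"severity": "task"}),
        Alert("ServiceDown", {"severity": "page"}),
        Alert("Informational", {}),
    ]
    for alert in examples:
        print(alert.name, plan_actions(alert))
```

(Keeping the decision in a label means any check can opt in to task creation without needing its own handler script, which is the gap raised in the conversation.)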