[08:22:42] jayme: ack, just saw the patch, LGTM!
[08:24:05] re: hiera switch to silence alertmanager notifications for a host, that's indeed not in puppet nor planned/intended
[09:16:03] godog: why not planned/intended? That's a feature that people are relying on when setting up hosts, not only for the insetup roles, but also for example for DB hosts while they are provisioned with all the data.
[09:20:48] volans: the equivalent is to issue downtimes, which can be done at any time whether the host exists or not, unlike icinga, and that's why we have the "notifications enabled" flag in hiera
[09:21:03] which is basically a workaround to the icinga race
[09:23:40] the feature exists not because icinga can't downtime a host that doesn't exist, but because it's not known how long it will take to set up and to reduce the noise of expired downtimes/silences
[09:24:05] I'd suggest you get in touch with data-persistence as they are using that feature heavily in their workflow
[09:25:37] [unrelated] I noticed this channel is not logged in our usual irc logs
[09:25:41] ok thank you volans
[09:26:12] (or I'm looking at the wrong page, wait a sec)
[09:28:41] nevermind, my bad, old link
[09:28:48] it's logged fine
[10:06:42] godog: unfortunately the generic CertAlmostExpired is still matching the specific probes/modules I've created
[10:14:13] jayme: mmhh ok I'll have to think a little about what's the best solution, I'm open to suggestions too
[10:14:50] any chance we can extend the cert refresh period?
[10:20:03] I would like not to
[10:21:54] actually I would like to fix the current cert expiry and lower it to the 72h it was supposed to be initially
[10:26:26] godog: what is the purpose of the generic alert? AIUI manual probes do have their own anyways and the service catalog probes do use a different job definition (probes/service vs. probes/custom)
[10:27:40] if the generic one is supposed to catch the service catalog probes we could filter on that job rather than job=~"probes/.*"
[10:28:22] that's true yeah
[10:30:45] or use {job=~"probes/.*", job!="probes/custom"} to be sure not to lose anything
[10:33:50] jayme: I like that solution! would you mind sending out a review ?
[11:21:56] sure thing
[11:45:08] https://gerrit.wikimedia.org/r/c/operations/alerts/+/983164
[13:21:37] cheers jayme
[14:22:32] in karma, we are seeing alerts twice, one with receiver: default and one with receiver: wmcs-ircmail (or other wmcs- receivers)
[14:22:46] this doesn't seem to happen with sre alerts... are we doing something wrong? :)
[14:24:18] dhinus: it could be the karma issue
[14:24:29] dhinus: sadly it happens with any alert with multiple receivers, known bug in karma https://github.com/prymitive/karma/issues/5144
[14:25:03] I'll be reaching out to upstream directly and see if they at least have a direction for a fix, we might need to fix locally
[14:25:06] * volans was about to post the same link
[14:26:07] yeah quite misleading
[14:35:25] I found that bug but it seems to be for "multiple receivers" as in multiple custom ones?
[14:35:38] we only have one receiver, but I guess it inherits the "default" one?
[14:35:50] and why doesn't it happen on sre alerts, but apparently only on wmcs ones?
[14:36:43] dhinus: what's an example to look at ?
[14:37:20] I'm trying to find a good one, maybe I'm mistaken
[14:39:00] alertname=PuppetZeroResources yields two alerts with two receivers, which seems to be the bug above (5144 in github)
[14:39:52] alertname=NodeDown severity=warning yields two alerts, one with receiver "wmcs-ircmail" and one with receiver "default"
[14:40:05] but we only defined one receiver ("wmcs-ircmail")
[14:40:31] indeed, that's the bug, your receiver plus the default one
[14:45:24] but in the results for alertname=PuppetZeroResources I don't see any alerts with receiver=default
[14:45:33] does it happen only for alerts with severity=warning maybe?
[14:49:19] possible although I doubt it, dhinus would you mind posting links to alerts.w.o or screenshots to make sure we're looking at the same thing ?
[14:50:55] I'm also about to go into a meeting FWIW
[14:51:08] no prob, I will post some screenshots. do we have a phab about it?
[14:51:17] yes, looking
[14:54:26] can't find it now, although I remember posting on phab
[14:56:15] I will file a new one, then we can mark as duplicate if you find the other one :)
[14:56:27] thank you dhinus !
[15:03:56] godog: I think you just mentioned it in https://phabricator.wikimedia.org/T350694#9374529
[15:13:38] https://phabricator.wikimedia.org/T353457
[15:13:44] I added it as a subtask
[15:23:47] Is it possible to add custom headers to a blackbox check? I'm looking at https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/prometheus/manifests/blackbox/check/http.pp#38 and I don't see a param for that (keep in mind I'm not the greatest at puppet)
[15:24:16] like if I wanted the poller to send an 'Accept:foo' header in its request
[15:54:25] volans: oh yeah, I was thinking about a task specific for the bug like dhinus created
[15:54:29] anyways
[15:55:22] inflatador: not out of the box, though should be pretty easy to add a parameter and then merge it into $headers
[15:56:08] godog cool, I'll try and start a patch this week
[15:56:36] inflatador: cheers! please send it my way and I'm happy to take a lok
[15:56:39] look even
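Note on the 15:55:22 suggestion ("add a parameter and then merge it into $headers"): the idea can be sketched in Python as below. This is only an illustration of the merge, not the actual Puppet change. The function names, the `extra_headers` parameter, and the default `Host` value are hypothetical, since the log does not show what $headers contains. The output shape follows the blackbox_exporter http module layout, where headers live under `http.headers`.

```python
def merge_headers(base_headers, extra_headers=None):
    """Return a new dict in which caller-supplied headers override the defaults.

    The comparison is case-sensitive, so "accept" and "Accept" would end up
    as two separate entries; a real implementation may want to normalize.
    """
    merged = dict(base_headers)          # copy, so the defaults are not mutated
    merged.update(extra_headers or {})   # extra headers win on key conflicts
    return merged


def build_http_module(base_headers, extra_headers=None):
    """Build a dict shaped like a blackbox_exporter http module definition."""
    return {
        "prober": "http",
        "http": {"headers": merge_headers(base_headers, extra_headers)},
    }


# Hypothetical defaults; the real contents of $headers are not shown in the log.
defaults = {"Host": "example.org"}
module = build_http_module(defaults, {"Accept": "foo"})
print(module)
```

With no extra headers the defaults pass through unchanged, so existing checks would keep their current behaviour under this sketch.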