[07:10:45] dcaro_away: thanks for adding more alerts btw! I noticed a pitfall with e.g. https://gerrit.wikimedia.org/r/c/operations/alerts/+/811999 in which you don't want 'site' in 'expr' if the alerts are not deployed to thanos
[07:11:00] dcaro_away: I'm thinking CI/tests need to fail on that for sure
[07:37:44] godog: hey, let me check
[07:38:55] godog: hmm, can you elaborate? I thought that the alerts were deployed to thanos by default xd
[07:39:59] dcaro: heheh no, they are deployed to prometheus by default, you can opt-in for thanos though
[07:40:11] on a file basis that is
[07:40:21] basically this https://wikitech.wikimedia.org/wiki/Alertmanager#Local_and_global_alerts
[07:40:52] 👍
[07:41:17] I will want to opt-in for thanos yes, we should not page on openstack stuff from codfw
[07:41:56] can you instead just deploy the alerts to eqiad only?
[07:42:21] I was thinking on that too, what is the advantage?
[07:42:45] (not depending on an extra layer might be one xd)
[07:43:12] the wikitech section I linked above hints to reliability yeah
[07:43:43] also what should thanos do when prometheus is unreachable e.g. a site is offline ?
[07:44:13] if the eqiad site is offline, I will not get any alerts from the prometheus on eqiad either no?
[07:47:05] yeah site offline is an extreme example, you get the idea of thanos reaching out to all prometheus vs prometheus evaluating alerts locally
[07:48:32] is it ok if I split the alerts in two files, one for eqiad and one for codfw?
[07:48:45] yeah totally up to you
[08:01:21] nice, we got systemd units errors in \o/ :)
[08:02:08] woot woot! nicely done dcaro
[08:02:42] godog: question about tasks from alerts, we might be creating a bunch (anything that does not need immediate attention but needs checking), so I was wondering if we can customize the title, close the task when the alert resolves, link the task to the alert (so it shows up there), etc.
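Editor's sketch of the CI idea raised at 07:11 (fail when an alert that is evaluated locally by Prometheus references the 'site' label in its 'expr'). This is only an illustration under assumptions, not the check used in operations/alerts: the function names are hypothetical, rules are assumed to be already parsed into dicts with `alert` and `expr` keys, and whether a file is opted in to Thanos is passed in as a plain boolean rather than read from the file.

```python
import re

# Label matcher such as site="eqiad", site=~"eqiad|codfw", site!="codfw".
_SITE_MATCHER = re.compile(r'\bsite\s*(=~|!~|!=|=)')
# Aggregation/join clauses such as by (site, instance) or on(site).
_SITE_IN_CLAUSE = re.compile(r'\b(?:by|without|on|ignoring)\s*\(([^)]*)\)')


def expr_uses_site(expr):
    """Return True if the PromQL expression appears to reference the 'site' label."""
    if _SITE_MATCHER.search(expr):
        return True
    for match in _SITE_IN_CLAUSE.finditer(expr):
        labels = [label.strip() for label in match.group(1).split(",")]
        if "site" in labels:
            return True
    return False


def site_label_violations(rules, deployed_to_thanos):
    """List alert names whose expr uses 'site' in a file NOT deployed to Thanos.

    rules: list of dicts with 'alert' and 'expr' keys (assumed pre-parsed).
    deployed_to_thanos: hypothetical flag standing in for the per-file opt-in.
    """
    if deployed_to_thanos:
        return []
    return [rule["alert"] for rule in rules if expr_uses_site(rule["expr"])]
```

A test would fail whenever `site_label_violations(...)` returns a non-empty list for a file that has not opted in. The regular expressions are a heuristic: they do not parse PromQL, so a string value that happens to contain `site=` would be flagged as well.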
[08:05:06] dcaro: there's a lot to unpack, but yes you can have a custom title and yes the task will be resolved once no alerts are firing IIRC
[08:05:42] linking the task to the alert I don't think is possible atm (or at all probably, you could link the phab project though)
[08:06:42] godog: I got this task that was not closed: https://phabricator.wikimedia.org/T310911
[08:08:05] indeed, I stand corrected
[08:08:13] and was checking the source at https://github.com/knyar/phalerts/blob/master/phalerts.py
[08:10:03] dcaro: re: changing the title you can find an example at Change-Id: I65302dabc
[08:10:14] oh, that helps! thanks!
[08:10:26] a little messy at the moment unfortunately due to url quoting but heh
[08:11:01] quoting 100% correct is never easy xd
[08:11:43] heheh indeed, I thought I could get away with unquoted urls in the AM config but that's not the case
[13:28:43] godog: when creating tasks from alerts, it creates one task with all the instances of that alert, if I change the title like you pointed out, will it create one task for each different title it generates? (ex. per-instance)
[13:58:30] dcaro: yes that's my understanding
[14:03:46] okok, sent a patch to change the title for wmcs tasks 👍
[14:05:59] neat, I'll take a look
[14:12:13] yeah looks good but see my comment re: grouping
[14:13:26] dcaro: actually I wrote that all alerts won't have grouping but of course we could "de-group" only alerts that will open tasks, depending on the route configurations
[16:02:22] godog: random question, can I use the alert description as part of the task title?
[16:04:21] dcaro: probably, if the description is unique within the alert group then commonAnnotations.description should have it
[16:09:09] nice! I'll add that then too, it's useful normally (better than just the alert name)
[16:10:36] *nod* yeah I haven't tried it before but I can see it being useful
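Editor's sketch of the 13:28 question (does a per-instance title produce one task per instance?). It models the understanding stated at 13:58, that one task exists per distinct generated title; it is not phalerts or Alertmanager code. The function name, the alert name and the host names are hypothetical, and Python's `str.format` stands in for the real title templating.

```python
from collections import defaultdict


def tasks_by_title(alerts, title_template):
    """Group alerts by the title rendered for each one.

    alerts: list of label dicts, one per firing alert instance.
    title_template: str.format-style template (a stand-in, not Alertmanager syntax).
    Returns a dict mapping each distinct title to the alerts that would share a task.
    """
    tasks = defaultdict(list)
    for labels in alerts:
        tasks[title_template.format(**labels)].append(labels)
    return dict(tasks)


# Hypothetical alert instances for illustration only.
alerts = [
    {"alertname": "SystemdUnitDown", "instance": "host1"},
    {"alertname": "SystemdUnitDown", "instance": "host2"},
]

shared = tasks_by_title(alerts, "{alertname}")
per_instance = tasks_by_title(alerts, "{alertname} on {instance}")
```

With the title built from the alert name alone, both instances fall under a single title; once the instance is part of the title, each instance gets its own. The same reasoning applies to using the description in the title: it only helps if it is shared by every alert in the group, as noted at 16:04.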