[08:13:25] (SystemdUnitFailed) firing: thanos-compact.service on titan2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:13:50] (ThanosCompactIsDown) firing: Thanos component has disappeared. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/0cb8830a6e957978796729870f560cda/thanos-overview - https://alerts.wikimedia.org/?q=alertname%3DThanosCompactIsDown
[08:53:50] (ThanosCompactIsDown) resolved: Thanos component has disappeared. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/0cb8830a6e957978796729870f560cda/thanos-overview - https://alerts.wikimedia.org/?q=alertname%3DThanosCompactIsDown
[08:54:11] was a case of https://github.com/thanos-io/thanos/issues/6398 ^
[08:58:25] (SystemdUnitFailed) resolved: thanos-compact.service on titan2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:21:26] o/ we have some prometheus labels that bear the name of the site (e.g. "something-eqiad", "something-codfw"), is it possible to transclude something like {{$externalLabels.site}} in an alertmanager expression? i.e. expr: somemetric{somelabel="something-{{$externalLabels.site}}"} > 1?
[09:48:27] dcausse: hey, would you mind giving more context on the use case / what are you trying to achieve ?
[09:49:21] godog: sure, https://gerrit.wikimedia.org/r/c/operations/alerts/+/1009359
[09:50:30] thank you, checking
[09:51:25] I suggest to use a single for both DC using a pattern like job_name=~"cirrus-streaming-updater-producer-(eqiad|codfw)" but was wondering if we could not simply inject {{$externalLabels.site}} instead of using a pattern
[09:51:41] s/single/single alert definition/
[09:53:26] not afaik no, annotations will get their templates expanded but not expressions
[09:53:37] your solution with the pattern is the correct one
[09:53:50] ok thanks!
[09:54:11] sure np, thanks for reaching out dcausse
[09:55:00] but yeah no need to have two separate alerts with codfw/eqiad in the name, one alert e.g. CirrusProducerFlinkJobNotRunning will do
[09:58:08] godog: deploy-site/deploy-tag are global and placed only at the beginning of the file, no way to specialize them for a single metric?
[09:59:24] dcausse: correct yeah, they apply to the file as a whole
[09:59:34] makes sense, thanks!
[10:00:29] yw, for context the deploy-* thing is something I have added at alerts deploy time, i.e. not a prometheus/alertmanager feature
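
For reference, the outcome of the discussion above could look roughly like the rule below. This is only a sketch, not the content of the Gerrit change: the metric name `somemetric` and the `> 1` threshold are the placeholders from the question, and the group name, `for` duration, severity and summary wording are hypothetical. It shows the regex matcher doing the per-site work in `expr` (where templates are not expanded), while the template is used only in an annotation (where they are).

```yaml
# deploy-site / deploy-tag headers would go here, at the top of the file,
# and apply to the file as a whole. Their exact syntax is not shown in this
# log, so they are left out of this sketch.
groups:
  - name: cirrus_streaming_updater        # hypothetical group name
    rules:
      # One alert definition for both sites, no eqiad/codfw in the name.
      - alert: CirrusProducerFlinkJobNotRunning
        # Templates are not expanded in expressions, so a regex matcher
        # covers both sites instead of {{ $externalLabels.site }}.
        # "somemetric" and "> 1" are placeholders taken from the question.
        expr: somemetric{job_name=~"cirrus-streaming-updater-producer-(eqiad|codfw)"} > 1
        for: 5m                           # hypothetical duration
        labels:
          severity: warning               # hypothetical severity
        annotations:
          # Annotations do get template expansion, so the site can appear here.
          summary: "Flink job {{ $labels.job_name }} is not running ({{ $externalLabels.site }})"
```

Since each Prometheus instance only sees series from its own site, the alternation in the matcher would normally match a single job per instance, which is why one definition is enough.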