[00:45:25] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: curator_actions_cluster_wide.service on logging-sd1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:47:40] <XioNoX>	 good morning!! I need help on why this alert (or its test) is not passing properly : https://gerrit.wikimedia.org/r/c/operations/alerts/+/1127041 any idea ?
[08:53:34] <tappof>	 aloha XioNoX, I'll take a look
[08:56:03] <XioNoX>	 <3
[09:17:34] <tappof>	 XioNoX: I left some comments directly on Gerrit for simplicity
[09:20:17] <XioNoX>	 tappof: awesome, thanks!
[09:27:28] <XioNoX>	 tappof: interesting, running CI locally still doesn't see the issue as fixed, but in gerrit it's fine
[09:32:32] <slyngs>	 @godog Thank you for the input last week. The conntrack monitoring is not successfully removed from Icinga, with minimal noise.
[09:34:22] <XioNoX>	 slyngs: s/not/now/ ? :)
[09:34:30] <tappof>	 XioNoX: are you running the CI locally using docker?
[09:34:35] <XioNoX>	 tappof: yeah
[09:34:45] <slyngs>	 XioNoX: Yeeeah :-)
[09:34:48] <XioNoX>	 `docker run --entrypoint tox alerts-tests`
[09:35:30] <tappof>	 Did you rebuild the container after applying the latest changes?
[09:36:10] <tappof>	 XioNoX: 
[09:37:37] <XioNoX>	 ahh, no
[09:37:41] <tappof>	 :)
[09:37:51] <XioNoX>	 I didn't know that was needed
[09:38:09] <XioNoX>	 I thought it was just needed for the first run and then it would pick up whatever was local
[09:38:40] <tappof>	 The container does not bind-mount the current directory, so you need to rebuild it every time
[09:42:08] <XioNoX>	 noted
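The rebuild-then-run cycle tappof describes could be sketched roughly as below. Only the image name `alerts-tests` and the `docker run --entrypoint tox alerts-tests` invocation come from the log; the `docker build` arguments (a Dockerfile at the repository root) and the `run`, `rebuild_and_test`, and `DRY_RUN` names are assumptions made for illustration, not the documented workflow.

```shell
# Hypothetical helper, not part of the alerts repository.
# run: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# The image copies the working tree at build time (no bind mount),
# so rebuild before each test run to pick up local edits.
# Assumption: the Dockerfile lives in the current directory.
rebuild_and_test() {
  run docker build -t alerts-tests . &&
  run docker run --entrypoint tox alerts-tests
}
```

Calling `DRY_RUN=1 rebuild_and_test` prints the two commands without touching Docker, which is a cheap way to check them against the repository's own instructions first.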
[10:06:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: curator_actions_cluster_wide.service on logging-sd1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:58:57] <godog>	 slyngs: woot woot!! nicely done
[11:00:40] <godog>	 good point re: the fact that the container needs rebuilding, the docs are not clear on that point
[12:56:35] <XioNoX>	 is there a way to know if those alerts have triggered over the last few days https://gerrit.wikimedia.org/r/c/operations/alerts/+/1126966 ?
[12:57:45] <godog>	 XioNoX: yes, check out the alerts overview dashboard https://logstash.wikimedia.org/goto/f3e6181b03de7d5ca37a80e83990ae65
[13:07:10] <XioNoX>	 ah right! thx
[13:25:49] <XioNoX>	 godog: looks like it's working decently well: https://alerts.wikimedia.org/?q=%40cluster%3Dwikimedia.org&q=team%3Dsre&q=alertname%3DCoreRouterInterfaceDown https://phabricator.wikimedia.org/T389071 :)
[13:26:52] <godog>	 XioNoX: \o/ \o/ very cool
[13:27:33] <XioNoX>	 godog: I think a netops tag will be needed at some point, to have an overview :)
[13:30:43] <godog>	 XioNoX: indeed, can't chat now but happy to later
[13:30:58] <XioNoX>	 no rush
[13:52:58] <jinxer-wm>	 FIRING: PrometheusLowRetention: Prometheus k8s-aux is storing less than 20 days of data on prometheus2007:9911. - https://wikitech.wikimedia.org/wiki/Prometheus - https://grafana.wikimedia.org/d/GWvEXWDZk/prometheus-server?var-Prometheus=prometheus2007:9911 - https://alerts.wikimedia.org/?q=alertname%3DPrometheusLowRetention
[14:06:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: curator_actions_cluster_wide.service on logging-sd1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:46:25] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: curator_actions_cluster_wide.service on logging-sd1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed