[00:47:55] (SystemdUnitFailed) firing: (2) grafana-loki.service Failed on grafana2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:27:55] (SystemdUnitFailed) firing: (3) statograph_post.service Failed on alert1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:32:55] (SystemdUnitFailed) firing: (3) statograph_post.service Failed on alert1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:32:55] (SystemdUnitFailed) firing: (2) grafana-loki.service Failed on grafana2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:33:11] (SystemdUnitFailed) firing: (2) grafana-loki.service Failed on grafana2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:42:56] (SystemdUnitFailed) firing: (3) statograph_post.service Failed on alert1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:47:56] (SystemdUnitFailed) firing: (3) statograph_post.service Failed on alert1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:35:57] Please could someone with the Global Admin rights in VictorOps rename the `analytics` route key to `data-platform` for us? Context is: https://phabricator.wikimedia.org/T344202#9524747
[13:36:52] btullis: will do
[13:37:04] I'll be changing the Icinga/Alertmanager config to match and I doubt that it will trigger anything. Even if it does, it should only page me.
[13:37:09] godog: Many thanks.
[13:38:03] btullis: {{done}}
[13:38:22] 👍 Cheers.
[13:38:37] sure np!
[14:48:12] (SystemdUnitFailed) firing: (2) grafana-loki.service Failed on grafana2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:39:34] godog: I'm trying to find a clean way to have SystemdUnitFailed retry for a specific unit (prometheus-phpfpm-statustext-textfile.service) because it stays failed for a little bit after a php-fpm restart, is that do-able or do I need to try and work some systemd retry magic?
[15:39:55] (That's why we alertspam this unit failure during deployments btw)
[15:53:41] claime: mmhh interesting
[15:54:55] claime: I am in between meetings now, would you mind filing a task with e.g. the host and logs about this? I'll take a look tomorrow
[15:57:07] I have a meeting in a bit as well, I'll file a task afterwards
[15:57:27] cheers
[17:02:56] (SystemdUnitFailed) firing: (2) grafana-loki.service Failed on grafana2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:04:06] ^ Filed T357026 for that alert.
[17:04:07] T357026: grafana-loki.service Failed on grafana2001 - https://phabricator.wikimedia.org/T357026
[17:07:02] https://phabricator.wikimedia.org/T357028 filed for prometheus-phpfpm-statustext-textfile.service
[19:03:49] hi cwhite, sorry to inform you that there's a new train blocker T357050
[19:03:50] T357050: editResponseTime's port to statslib is not actually backwards-compatible - https://phabricator.wikimedia.org/T357050
[19:09:51] thanks for the heads up! looking now
[21:02:56] (SystemdUnitFailed) firing: curator_actions_cluster_wide.service Failed on logstash1026:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed