[00:04:25] (SystemdUnitFailed) firing: prune_old_srv_syslog_directories.service on centrallog2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:09:25] (SystemdUnitFailed) firing: (2) prune_old_srv_syslog_directories.service on centrallog1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:01:46] ^ taking a look.
[04:03:02] I removed the Prometheus and Thanos-BE log gzips older than 45 days, this should resolve the alert.
[04:04:57] * denisse ACK'd both alerts.
[08:00:55] (SystemdUnitFailed) resolved: (2) prune_old_srv_syslog_directories.service on centrallog1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:56:25] (SystemdUnitFailed) firing: generate-mysqld-exporter-config.service on prometheus2006:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:06:25] (SystemdUnitFailed) firing: (3) generate-mysqld-exporter-config.service on prometheus1005:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:11:39] fixing ↑
[12:14:48] thank you arnaudb
[12:15:37] denisse: thank you for taking a look at prune_old_srv_syslog_directories.service, the problem was non-empty directories that couldn't be deleted, check journalctl -u prune_old_srv_syslog_directories.service
[12:15:42] I fixed that earlier today
[12:21:25] (SystemdUnitFailed) firing: (3) generate-mysqld-exporter-config.service on prometheus1005:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:36:25] (SystemdUnitFailed) resolved: (3) generate-mysqld-exporter-config.service on prometheus1005:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:39:50] godog: Thanks Filippo, I'll take a look. :)
[14:41:07] denisse: yw
[14:58:25] (SystemdUnitFailed) firing: statograph_post.service on alert1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:03:25] (SystemdUnitFailed) resolved: statograph_post.service on alert1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:05:25] (SystemdUnitFailed) firing: thanos-query.service on titan1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:07:41] (PrometheusRuleEvaluationFailures) firing: (20) Prometheus rule evaluation failures (instance titan1001:17902) - https://wikitech.wikimedia.org/wiki/Prometheus - https://grafana.wikimedia.org/d/GWvEXWDZk/prometheus-server?var-datasource=eqiad%20prometheus%2Fops - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRuleEvaluationFailures
[15:07:52] (ThanosRuleHighRuleEvaluationFailures) firing: Thanos Rule is failing to evaluate rules. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/35da848f5f92b2dc612e0c3a0577b8a1/thanos-rule - https://alerts.wikimedia.org/?q=alertname%3DThanosRuleHighRuleEvaluationFailures
[15:10:25] (SystemdUnitFailed) resolved: thanos-query.service on titan1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:12:41] (PrometheusRuleEvaluationFailures) resolved: (20) Prometheus rule evaluation failures (instance titan1001:17902) - https://wikitech.wikimedia.org/wiki/Prometheus - https://grafana.wikimedia.org/d/GWvEXWDZk/prometheus-server?var-datasource=eqiad%20prometheus%2Fops - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRuleEvaluationFailures
[15:12:52] (ThanosRuleHighRuleEvaluationFailures) resolved: Thanos Rule is failing to evaluate rules. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/35da848f5f92b2dc612e0c3a0577b8a1/thanos-rule - https://alerts.wikimedia.org/?q=alertname%3DThanosRuleHighRuleEvaluationFailures
[15:22:06] Just noticed that the gitiles link for the alerts repo points to the puppet repo...any idea how to fix that? https://gerrit.wikimedia.org/r/q/project:operations/alerts
[15:24:20] inflatador: looks like it goes to the right place to me? "Browse: gitiles" -> https://gerrit.wikimedia.org/g/operations/alerts
[15:26:40] cwhite hmm, working for me now too...I guess pebkac? Sorry
[15:27:37] No worries! Thanks for double-checking :)