[03:13:08] (LogstashKafkaConsumerLag) firing: (2) Too many messages in logging-codfw for group logstash7-eqiad - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[07:13:08] (LogstashKafkaConsumerLag) firing: (2) Too many messages in logging-codfw for group logstash7-eqiad - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[11:13:08] (LogstashKafkaConsumerLag) firing: (2) Too many messages in logging-codfw for group logstash7-eqiad - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[11:59:40] (LogstashIndexingFailures) firing: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashIndexingFailures
[12:04:40] (LogstashIndexingFailures) resolved: (2) Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://alerts.wikimedia.org/?q=alertname%3DLogstashIndexingFailures
[15:13:08] (LogstashKafkaConsumerLag) firing: (2) Too many messages in logging-codfw for group logstash7-eqiad - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[15:49:32] godog: re: systemd unit icinga checks cleanup, they don't depend on the systemd service's monitoring_enabled being set to true?
[15:53:02] claime: good question, not AFAICT, nrpe::monitor_systemd_unit_state at least deploys a completely separate check, irrespective of systemd::service's $monitoring_enabled afaics
[15:53:16] super intuitive eh?
[15:53:21] ngh
[15:53:37] I'm glad the list of icinga checks is finite
[15:54:12] but then, if $monitoring_enabled is false (default), how does that affect the prometheus monitoring?
[15:54:29] (if you know, I can also rtfc)
[15:55:29] claime: I'm about to jump into a meeting, I'll take a closer look tomorrow
[15:55:46] no worries :)
[15:56:09] hehe ok!
[16:01:25] (SystemdUnitFailed) firing: (2) statograph_post.service Failed on alert1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:06:25] (SystemdUnitFailed) resolved: (2) statograph_post.service Failed on alert1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:45:21] Sharing for visibility: thanos-query probedown due to OOM of both eqiad titan frontends https://phabricator.wikimedia.org/T356788
[17:31:25] (SystemdUnitFailed) firing: vo-escalate.service Failed on alert1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:36:25] (SystemdUnitFailed) resolved: vo-escalate.service Failed on alert1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:55:57] !log wikikube: cordon nodes added earlier today in codfw
[17:55:58] kamila_: Not expecting to hear !log here
[17:56:26] eh, sorry, channel starts with o XD
[19:13:08] (LogstashKafkaConsumerLag) firing: (2) Too many messages in logging-codfw for group logstash7-eqiad - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[21:17:25] (SystemdUnitFailed) firing: vo-escalate.service Failed on alert1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:22:25] (SystemdUnitFailed) resolved: (2) vo-escalate.service Failed on alert1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:13:08] (LogstashKafkaConsumerLag) firing: (2) Too many messages in logging-codfw for group logstash7-eqiad - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[23:57:52] (LogstashKafkaConsumerLag) firing: (2) Too many messages in logging-codfw for group logstash7-eqiad - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag