[08:28:35] (ThanosSidecarNoConnectionToStartedPrometheus) firing: Thanos Sidecar cannot access Prometheus, even though Prometheus seems healthy and has reloaded WAL. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/b19644bfbf0ec1e108027cce268d99f7/thanos-sidecar - https://alerts.wikimedia.org/?q=alertname%3DThanosSidecarNoConnectionToStartedPrometheus [08:33:35] (ThanosSidecarNoConnectionToStartedPrometheus) resolved: Thanos Sidecar cannot access Prometheus, even though Prometheus seems healthy and has reloaded WAL. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/b19644bfbf0ec1e108027cce268d99f7/thanos-sidecar - https://alerts.wikimedia.org/?q=alertname%3DThanosSidecarNoConnectionToStartedPrometheus [08:41:35] (ThanosSidecarNoConnectionToStartedPrometheus) firing: Thanos Sidecar cannot access Prometheus, even though Prometheus seems healthy and has reloaded WAL. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/b19644bfbf0ec1e108027cce268d99f7/thanos-sidecar - https://alerts.wikimedia.org/?q=alertname%3DThanosSidecarNoConnectionToStartedPrometheus [08:46:35] (ThanosSidecarNoConnectionToStartedPrometheus) firing: (2) Thanos Sidecar cannot access Prometheus, even though Prometheus seems healthy and has reloaded WAL. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/b19644bfbf0ec1e108027cce268d99f7/thanos-sidecar - https://alerts.wikimedia.org/?q=alertname%3DThanosSidecarNoConnectionToStartedPrometheus [08:51:35] (ThanosSidecarNoConnectionToStartedPrometheus) resolved: (2) Thanos Sidecar cannot access Prometheus, even though Prometheus seems healthy and has reloaded WAL. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/b19644bfbf0ec1e108027cce268d99f7/thanos-sidecar - https://alerts.wikimedia.org/?q=alertname%3DThanosSidecarNoConnectionToStartedPrometheus [09:00:35] (ThanosSidecarNoConnectionToStartedPrometheus) firing: Thanos Sidecar cannot access Prometheus, even though Prometheus seems healthy and has reloaded WAL. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/b19644bfbf0ec1e108027cce268d99f7/thanos-sidecar - https://alerts.wikimedia.org/?q=alertname%3DThanosSidecarNoConnectionToStartedPrometheus [09:01:50] (ThanosSidecarNoConnectionToStartedPrometheus) firing: (2) Thanos Sidecar cannot access Prometheus, even though Prometheus seems healthy and has reloaded WAL. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/b19644bfbf0ec1e108027cce268d99f7/thanos-sidecar - https://alerts.wikimedia.org/?q=alertname%3DThanosSidecarNoConnectionToStartedPrometheus [09:05:35] (ThanosSidecarNoConnectionToStartedPrometheus) resolved: Thanos Sidecar cannot access Prometheus, even though Prometheus seems healthy and has reloaded WAL. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/b19644bfbf0ec1e108027cce268d99f7/thanos-sidecar - https://alerts.wikimedia.org/?q=alertname%3DThanosSidecarNoConnectionToStartedPrometheus [09:50:43] that's me ^ prometheus rolling restarts in progress [13:24:40] (LogstashIndexingFailures) firing: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashIndexingFailures [13:27:19] (LogstashKafkaConsumerLag) firing: Too many messages in logging-eqiad for group logstash7-eqiad - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag [13:29:55] (LogstashIndexingFailures) resolved: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashIndexingFailures [13:37:19] (LogstashKafkaConsumerLag) firing: (2) Too many messages in logging-eqiad for group apifeatureusage - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag [13:42:19] (LogstashKafkaConsumerLag) resolved: (2) Too many messages in logging-eqiad for group apifeatureusage - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag [16:56:35] (ThanosSidecarNoConnectionToStartedPrometheus) firing: (2) Thanos Sidecar cannot access Prometheus, even though Prometheus seems healthy and has reloaded WAL. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/b19644bfbf0ec1e108027cce268d99f7/thanos-sidecar - https://alerts.wikimedia.org/?q=alertname%3DThanosSidecarNoConnectionToStartedPrometheus [16:56:41] (PrometheusRuleEvaluationFailures) firing: (37) Prometheus rule evaluation failures (instance titan1001:17902) - https://wikitech.wikimedia.org/wiki/Prometheus - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRuleEvaluationFailures [16:56:52] (ThanosRuleHighRuleEvaluationFailures) firing: (2) Thanos Rule is failing to evaluate rules. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/35da848f5f92b2dc612e0c3a0577b8a1/thanos-rule - https://alerts.wikimedia.org/?q=alertname%3DThanosRuleHighRuleEvaluationFailures [17:01:35] (ThanosSidecarNoConnectionToStartedPrometheus) resolved: (2) Thanos Sidecar cannot access Prometheus, even though Prometheus seems healthy and has reloaded WAL. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/b19644bfbf0ec1e108027cce268d99f7/thanos-sidecar - https://alerts.wikimedia.org/?q=alertname%3DThanosSidecarNoConnectionToStartedPrometheus [17:01:41] (PrometheusRuleEvaluationFailures) resolved: (37) Prometheus rule evaluation failures (instance titan1001:17902) - https://wikitech.wikimedia.org/wiki/Prometheus - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRuleEvaluationFailures [17:01:52] (ThanosRuleHighRuleEvaluationFailures) firing: (2) Thanos Rule is failing to evaluate rules. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/35da848f5f92b2dc612e0c3a0577b8a1/thanos-rule - https://alerts.wikimedia.org/?q=alertname%3DThanosRuleHighRuleEvaluationFailures [17:01:56] hmmm, seeing gaps in thanos metrics... [17:02:07] maybe the memory upgrade is being performed? [17:02:54] https://grafana.wikimedia.org/goto/-iLX4LbSz?orgId=1 I saw it both in codfw and eqiad fwiw [17:03:43] cwhite: yea memory upgrade going now [17:03:49] and some alerts firing and resolving as prometheus is coming back up [17:03:59] I should have !log 1 sec [17:06:52] (ThanosRuleHighRuleEvaluationFailures) resolved: Thanos Rule is failing to evaluate rules. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/35da848f5f92b2dc612e0c3a0577b8a1/thanos-rule - https://alerts.wikimedia.org/?q=alertname%3DThanosRuleHighRuleEvaluationFailures [17:13:35] (ThanosSidecarNoConnectionToStartedPrometheus) firing: (2) Thanos Sidecar cannot access Prometheus, even though Prometheus seems healthy and has reloaded WAL. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/b19644bfbf0ec1e108027cce268d99f7/thanos-sidecar - https://alerts.wikimedia.org/?q=alertname%3DThanosSidecarNoConnectionToStartedPrometheus [17:13:52] (ThanosRuleHighRuleEvaluationFailures) firing: (2) Thanos Rule is failing to evaluate rules. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/35da848f5f92b2dc612e0c3a0577b8a1/thanos-rule - https://alerts.wikimedia.org/?q=alertname%3DThanosRuleHighRuleEvaluationFailures [17:13:56] (PrometheusRuleEvaluationFailures) firing: (21) Prometheus rule evaluation failures (instance titan1001:17902) - https://wikitech.wikimedia.org/wiki/Prometheus - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRuleEvaluationFailures [17:18:11] (PrometheusRuleEvaluationFailures) firing: (37) Prometheus rule evaluation failures (instance titan1001:17902) - https://wikitech.wikimedia.org/wiki/Prometheus - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRuleEvaluationFailures [17:18:35] (ThanosSidecarNoConnectionToStartedPrometheus) resolved: (2) Thanos Sidecar cannot access Prometheus, even though Prometheus seems healthy and has reloaded WAL. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/b19644bfbf0ec1e108027cce268d99f7/thanos-sidecar - https://alerts.wikimedia.org/?q=alertname%3DThanosSidecarNoConnectionToStartedPrometheus [17:18:50] (ThanosSidecarNoConnectionToStartedPrometheus) firing: (2) Thanos Sidecar cannot access Prometheus, even though Prometheus seems healthy and has reloaded WAL. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/b19644bfbf0ec1e108027cce268d99f7/thanos-sidecar - https://alerts.wikimedia.org/?q=alertname%3DThanosSidecarNoConnectionToStartedPrometheus [17:18:56] (PrometheusRuleEvaluationFailures) firing: (37) Prometheus rule evaluation failures (instance titan1001:17902) - https://wikitech.wikimedia.org/wiki/Prometheus - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRuleEvaluationFailures [17:19:05] (ThanosSidecarNoConnectionToStartedPrometheus) resolved: Thanos Sidecar cannot access Prometheus, even though Prometheus seems healthy and has reloaded WAL. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/b19644bfbf0ec1e108027cce268d99f7/thanos-sidecar - https://alerts.wikimedia.org/?q=alertname%3DThanosSidecarNoConnectionToStartedPrometheus [17:23:11] (PrometheusRuleEvaluationFailures) resolved: (37) Prometheus rule evaluation failures (instance titan1001:17902) - https://wikitech.wikimedia.org/wiki/Prometheus - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRuleEvaluationFailures [17:23:52] (ThanosRuleHighRuleEvaluationFailures) resolved: (2) Thanos Rule is failing to evaluate rules. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/35da848f5f92b2dc612e0c3a0577b8a1/thanos-rule - https://alerts.wikimedia.org/?q=alertname%3DThanosRuleHighRuleEvaluationFailures [17:44:06] ^ This seems to keep firing and resolving itself. I'll take a look at it if it happens again.