[00:47:25] FIRING: SystemdUnitFailed: curator_actions_cluster_wide.service on logstash2026:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:47:40] FIRING: SystemdUnitFailed: curator_actions_cluster_wide.service on logstash2026:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:47:40] FIRING: SystemdUnitFailed: curator_actions_cluster_wide.service on logstash2026:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:08:09] TIL mimirtool can analyze grafana using thanos/prometheus to see what metrics are used/unused https://grafana.com/docs/mimir/latest/manage/tools/mimirtool/
[12:47:40] FIRING: SystemdUnitFailed: curator_actions_cluster_wide.service on logstash2026:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:24:03] folks kafka-logging1004 is in rack E2 we're upgrading the switch in shortly, is that ok?
[13:38:30] godog: if you are around you might know about kafka-logging1004?
[13:39:36] topranks: I am yeah, afaik kafka should be fine with that and I don't know if there are explicit things to do, maybe herron does ?
[13:40:23] godog: ok thanks for the info!
[13:42:04] topranks: thx for the heads up yeah thats fine for the single host
[13:42:25] RESOLVED: SystemdUnitFailed: curator_actions_cluster_wide.service on logstash2026:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:44:09] herron: appreciate the info, I'll let you know when we're done here if there is anything needs checking
[14:24:50] herron: ok switch is upgraded, kafka-logging1004 should be back in the mix
[14:25:04] topranks: great thx having a look now
[14:26:44] topranks: yup looking good thx again!
[14:28:03] super, thanks for the help :)
[15:58:41] FIRING: LogstashKafkaConsumerLag: Too many messages in logging-eqiad for group logstash7-codfw - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[16:03:40] RESOLVED: LogstashKafkaConsumerLag: Too many messages in logging-eqiad for group logstash7-codfw - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[17:48:43] FIRING: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-sampler - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[18:53:43] RESOLVED: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-sampler - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[19:03:43] FIRING: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-sampler - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[19:23:43] RESOLVED: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-sampler - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[19:32:43] FIRING: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-sampler - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[20:37:43] RESOLVED: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-sampler - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[20:43:43] FIRING: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-sampler - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[21:28:43] RESOLVED: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-sampler - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[21:42:43] FIRING: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-sampler - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[21:47:43] RESOLVED: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-sampler - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[21:58:43] FIRING: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-sampler - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[23:18:43] RESOLVED: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-sampler - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[23:31:43] FIRING: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-sampler - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[23:36:43] RESOLVED: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-sampler - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[23:50:43] FIRING: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-sampler - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag