[05:34:48] FIRING: PuppetFailure: Puppet has failed on logging-hd2005:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [05:44:48] RESOLVED: PuppetFailure: Puppet has failed on logging-hd2005:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [13:55:25] FIRING: SystemdUnitFailed: opensearch_2@production-elk7-codfw.service on logging-hd2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [14:10:25] RESOLVED: SystemdUnitFailed: opensearch_2@production-elk7-codfw.service on logging-hd2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [14:42:37] FIRING: OpensearchClusterHealth: Opensearch cluster health reported as red - https://wikitech.wikimedia.org/wiki/Runbook - https://grafana.wikimedia.org/d/e7d7fa18-7bc3-4548-bb07-ef261a9d3b8b/opensearch-cluster-health?var-cluster=production-elk7-codfw - https://alerts.wikimedia.org/?q=alertname%3DOpensearchClusterHealth [15:07:40] FIRING: LogstashClusterStatus: OpenSearch reports cluster status is red. - https://wikitech.wikimedia.org/wiki/Logstash#Unassigned_Shards_and_Cluster_Status - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=panel-49 - https://alerts.wikimedia.org/?q=alertname%3DLogstashClusterStatus [15:12:40] FIRING: LogstashNoLogsIndexed: Logstash logs are not being indexed by Elasticsearch - https://wikitech.wikimedia.org/wiki/Logstash#No_logs_indexed - https://grafana.wikimedia.org/d/000000561/logstash?var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashNoLogsIndexed [15:17:40] FIRING: LogstashKafkaConsumerLag: Too many messages in logging-eqiad for group logstash7-codfw - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag [15:19:35] FIRING: ErrorBudgetBurn: logstash-availability codfw - https://slo.wikimedia.org/?search=logstash-availability - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [15:22:40] FIRING: [2x] LogstashKafkaConsumerLag: Too many messages in logging-codfw for group logstash7-codfw - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag [15:27:40] RESOLVED: LogstashNoLogsIndexed: Logstash logs are not being indexed by Elasticsearch - https://wikitech.wikimedia.org/wiki/Logstash#No_logs_indexed - https://grafana.wikimedia.org/d/000000561/logstash?var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashNoLogsIndexed [15:29:35] RESOLVED: ErrorBudgetBurn: logstash-availability codfw - https://slo.wikimedia.org/?search=logstash-availability - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [15:32:37] RESOLVED: OpensearchClusterHealth: Opensearch cluster health reported as red - https://wikitech.wikimedia.org/wiki/Runbook - https://grafana.wikimedia.org/d/e7d7fa18-7bc3-4548-bb07-ef261a9d3b8b/opensearch-cluster-health?var-cluster=production-elk7-codfw - https://alerts.wikimedia.org/?q=alertname%3DOpensearchClusterHealth [15:32:40] RESOLVED: LogstashClusterStatus: OpenSearch reports cluster status is red. - https://wikitech.wikimedia.org/wiki/Logstash#Unassigned_Shards_and_Cluster_Status - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=panel-49 - https://alerts.wikimedia.org/?q=alertname%3DLogstashClusterStatus [15:32:40] FIRING: [2x] LogstashKafkaConsumerLag: Too many messages in logging-codfw for group logstash7-codfw - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag [15:34:50] FIRING: [2x] ErrorBudgetBurn: logstash-availability codfw - https://slo.wikimedia.org/?search=logstash-availability - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [15:54:50] RESOLVED: ErrorBudgetBurn: logstash-availability codfw - https://slo.wikimedia.org/?search=logstash-availability - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [16:37:40] RESOLVED: LogstashKafkaConsumerLag: Too many messages in logging-eqiad for group logstash7-codfw - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag [16:39:10] FIRING: LogstashKafkaConsumerLag: Too many messages in logging-eqiad for group logstash7-codfw - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag [16:42:55] RESOLVED: LogstashKafkaConsumerLag: Too many messages in logging-eqiad for group logstash7-codfw - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag [16:49:10] FIRING: LogstashKafkaConsumerLag: Too many messages in logging-eqiad for group logstash7-codfw - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag [17:02:55] RESOLVED: LogstashKafkaConsumerLag: Too many messages in logging-eqiad for group logstash7-codfw - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag [19:34:55] cwhite /0lly , just a heads-up that I have a patch up to slightly change firewall behavior on opensearch hosts, feel free to review at your convenience: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1266372 [19:35:14] per conversation with taavi in #sre, this should keep us from having to manually reload ferm after every reimage [22:04:48] FIRING: PuppetFailure: Puppet has failed on logging-hd2005:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [22:14:48] RESOLVED: PuppetFailure: Puppet has failed on logging-hd2005:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure