[00:45:40] FIRING: LogstashUnassignedShards: OpenSearch reports unassigned shards outstanding for more than 24h. - https://wikitech.wikimedia.org/wiki/Logstash#Unassigned_Shards_and_Cluster_Status - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=panel-48 - https://alerts.wikimedia.org/?q=alertname%3DLogstashUnassignedShards
[00:45:52] FIRING: OpensearchUnassignedOrInitializingShardsRatio: Unassigned+Initializing shards in OpenSearch exceed the allowed ratio - https://wikitech.wikimedia.org/wiki/Runbook - https://grafana.wikimedia.org/d/e7d7fa18-7bc3-4548-bb07-ef261a9d3b8b/opensearch-cluster-health?var-cluster=production-elk7-codfw - https://alerts.wikimedia.org/?q=alertname%3DOpensearchUnassignedOrInitializingShardsRatio
[02:05:37] RESOLVED: OpensearchUnassignedOrInitializingShardsRatio: Unassigned+Initializing shards in OpenSearch exceed the allowed ratio - https://wikitech.wikimedia.org/wiki/Runbook - https://grafana.wikimedia.org/d/e7d7fa18-7bc3-4548-bb07-ef261a9d3b8b/opensearch-cluster-health?var-cluster=production-elk7-codfw - https://alerts.wikimedia.org/?q=alertname%3DOpensearchUnassignedOrInitializingShardsRatio
[03:47:37] FIRING: OpensearchClusterHealth: Opensearch cluster health reported as red - https://wikitech.wikimedia.org/wiki/Runbook - https://grafana.wikimedia.org/d/e7d7fa18-7bc3-4548-bb07-ef261a9d3b8b/opensearch-cluster-health?var-cluster=production-elk7-codfw - https://alerts.wikimedia.org/?q=alertname%3DOpensearchClusterHealth
[03:55:40] FIRING: LogstashNoLogsIndexed: Logstash logs are not being indexed by Elasticsearch - https://wikitech.wikimedia.org/wiki/Logstash#No_logs_indexed - https://grafana.wikimedia.org/d/000000561/logstash?var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashNoLogsIndexed
[03:59:34] FIRING: ErrorBudgetBurn: logstash-availability codfw - https://slo.wikimedia.org/?search=logstash-availability - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[04:10:40] RESOLVED: LogstashNoLogsIndexed: Logstash logs are not being indexed by Elasticsearch - https://wikitech.wikimedia.org/wiki/Logstash#No_logs_indexed - https://grafana.wikimedia.org/d/000000561/logstash?var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashNoLogsIndexed
[04:11:40] FIRING: LogstashKafkaConsumerLag: Too many messages in logging-eqiad for group logstash7-codfw - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[04:12:37] RESOLVED: OpensearchClusterHealth: Opensearch cluster health reported as red - https://wikitech.wikimedia.org/wiki/Runbook - https://grafana.wikimedia.org/d/e7d7fa18-7bc3-4548-bb07-ef261a9d3b8b/opensearch-cluster-health?var-cluster=production-elk7-codfw - https://alerts.wikimedia.org/?q=alertname%3DOpensearchClusterHealth
[04:14:34] FIRING: [2x] ErrorBudgetBurn: logstash-availability codfw - https://slo.wikimedia.org/?search=logstash-availability - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[04:36:40] RESOLVED: LogstashKafkaConsumerLag: Too many messages in logging-eqiad for group logstash7-codfw - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[04:39:34] RESOLVED: ErrorBudgetBurn: logstash-availability codfw - https://slo.wikimedia.org/?search=logstash-availability - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[04:45:55] FIRING: LogstashUnassignedShards: OpenSearch reports unassigned shards outstanding for more than 24h. - https://wikitech.wikimedia.org/wiki/Logstash#Unassigned_Shards_and_Cluster_Status - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=panel-48 - https://alerts.wikimedia.org/?q=alertname%3DLogstashUnassignedShards
[04:58:37] FIRING: OpensearchClusterHealth: Opensearch cluster health reported as red - https://wikitech.wikimedia.org/wiki/Runbook - https://grafana.wikimedia.org/d/e7d7fa18-7bc3-4548-bb07-ef261a9d3b8b/opensearch-cluster-health?var-cluster=production-elk7-codfw - https://alerts.wikimedia.org/?q=alertname%3DOpensearchClusterHealth
[05:03:37] RESOLVED: OpensearchClusterHealth: Opensearch cluster health reported as red - https://wikitech.wikimedia.org/wiki/Runbook - https://grafana.wikimedia.org/d/e7d7fa18-7bc3-4548-bb07-ef261a9d3b8b/opensearch-cluster-health?var-cluster=production-elk7-codfw - https://alerts.wikimedia.org/?q=alertname%3DOpensearchClusterHealth
[05:54:40] current status: opensearch in codfw (the standby) is unhappy with the amount of logs it is storing. I've removed a few of the oldest and largest indexes, but it's slow going because the cluster is sometimes evicting the hdd nodes while they're clearing space. I'm going to let it try to finish its recovery with the cleared space it has now and come back to it tomorrow.
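A minimal sketch of the "remove the oldest and largest indexes" step from the status message above. It assumes rows shaped like the JSON output of `GET _cat/indices?format=json&bytes=b&h=index,creation.date,store.size` (all values as strings, sizes in bytes, creation dates in epoch milliseconds). The function name `pick_indices_to_delete`, the rule of skipping dot-prefixed indices, and the sample index names are hypothetical, not the procedure actually used in this incident. It only selects candidates; deleting them would be a separate `DELETE /<index>` request.

```python
from typing import Dict, List

# Hypothetical sample rows, shaped like _cat/indices JSON output (values are strings).
SAMPLE_ROWS: List[Dict[str, str]] = [
    {"index": "logstash-2024.01.01", "creation.date": "1704067200000", "store.size": "500"},
    {"index": "logstash-2024.01.02", "creation.date": "1704153600000", "store.size": "900"},
    {"index": "logstash-2024.01.03", "creation.date": "1704240000000", "store.size": "300"},
    {"index": ".kibana_1", "creation.date": "1600000000000", "store.size": "10"},
]


def pick_indices_to_delete(rows: List[Dict[str, str]], bytes_to_free: int) -> List[str]:
    """Return index names, oldest first, until their combined size covers bytes_to_free."""
    # Assumption: never touch hidden/system indices (names starting with ".").
    candidates = [r for r in rows if not r["index"].startswith(".")]
    # Oldest first; among indices created at the same time, prefer the larger one.
    candidates.sort(key=lambda r: (int(r["creation.date"]), -int(r["store.size"])))
    chosen: List[str] = []
    freed = 0
    for row in candidates:
        if freed >= bytes_to_free:
            break  # enough space would already be reclaimed
        chosen.append(row["index"])
        freed += int(row["store.size"])
    return chosen


if __name__ == "__main__":
    print(pick_indices_to_delete(SAMPLE_ROWS, 1000))
```

Picking a small batch and then letting the cluster recover matches the "slow going" note: deleting everything at once is not what the sketch does, since it stops as soon as the target is covered.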
[06:54:37] FIRING: OpensearchClusterHealth: Opensearch cluster health reported as red - https://wikitech.wikimedia.org/wiki/Runbook - https://grafana.wikimedia.org/d/e7d7fa18-7bc3-4548-bb07-ef261a9d3b8b/opensearch-cluster-health?var-cluster=production-elk7-codfw - https://alerts.wikimedia.org/?q=alertname%3DOpensearchClusterHealth
[07:07:40] FIRING: LogstashKafkaConsumerLag: Too many messages in logging-eqiad for group logstash7-codfw - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[07:12:40] RESOLVED: LogstashKafkaConsumerLag: Too many messages in logging-eqiad for group logstash7-codfw - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[07:19:40] FIRING: LogstashClusterStatus: OpenSearch reports cluster status is red. - https://wikitech.wikimedia.org/wiki/Logstash#Unassigned_Shards_and_Cluster_Status - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=panel-49 - https://alerts.wikimedia.org/?q=alertname%3DLogstashClusterStatus
[07:24:37] RESOLVED: OpensearchClusterHealth: Opensearch cluster health reported as red - https://wikitech.wikimedia.org/wiki/Runbook - https://grafana.wikimedia.org/d/e7d7fa18-7bc3-4548-bb07-ef261a9d3b8b/opensearch-cluster-health?var-cluster=production-elk7-codfw - https://alerts.wikimedia.org/?q=alertname%3DOpensearchClusterHealth
[07:24:40] RESOLVED: LogstashClusterStatus: OpenSearch reports cluster status is red. - https://wikitech.wikimedia.org/wiki/Logstash#Unassigned_Shards_and_Cluster_Status - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=panel-49 - https://alerts.wikimedia.org/?q=alertname%3DLogstashClusterStatus
[08:59:40] FIRING: LogstashNoLogsIndexed: Logstash logs are not being indexed by Elasticsearch - https://wikitech.wikimedia.org/wiki/Logstash#No_logs_indexed - https://grafana.wikimedia.org/d/000000561/logstash?var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashNoLogsIndexed
[09:05:34] FIRING: ErrorBudgetBurn: logstash-availability codfw - https://slo.wikimedia.org/?search=logstash-availability - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[09:06:40] FIRING: LogstashKafkaConsumerLag: Too many messages in logging-eqiad for group logstash7-codfw - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[09:20:34] FIRING: [2x] ErrorBudgetBurn: logstash-availability codfw - https://slo.wikimedia.org/?search=logstash-availability - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[09:39:40] RESOLVED: LogstashNoLogsIndexed: Logstash logs are not being indexed by Elasticsearch - https://wikitech.wikimedia.org/wiki/Logstash#No_logs_indexed - https://grafana.wikimedia.org/d/000000561/logstash?var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashNoLogsIndexed
[09:40:34] FIRING: [2x] ErrorBudgetBurn: logstash-availability codfw - https://slo.wikimedia.org/?search=logstash-availability - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[10:05:34] RESOLVED: ErrorBudgetBurn: logstash-availability codfw - https://slo.wikimedia.org/?search=logstash-availability - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[10:41:40] RESOLVED: LogstashKafkaConsumerLag: Too many messages in logging-eqiad for group logstash7-codfw - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[10:42:10] FIRING: LogstashKafkaConsumerLag: Too many messages in logging-eqiad for group logstash7-codfw - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[10:51:55] RESOLVED: LogstashKafkaConsumerLag: Too many messages in logging-eqiad for group logstash7-codfw - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[18:16:09] opensearch in codfw finished rebalancing shards and is happy again.