[02:29:40] FIRING: LogstashKafkaConsumerLag: Too many messages in logging-eqiad for group logstash7-codfw - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[02:44:37] FIRING: OpensearchClusterHealth: Opensearch cluster health reported as red - https://wikitech.wikimedia.org/wiki/Runbook - https://grafana.wikimedia.org/d/e7d7fa18-7bc3-4548-bb07-ef261a9d3b8b/opensearch-cluster-health?var-cluster=production-elk7-codfw - https://alerts.wikimedia.org/?q=alertname%3DOpensearchClusterHealth
[02:44:40] FIRING: LogstashNoLogsIndexed: Logstash logs are not being indexed by Elasticsearch - https://wikitech.wikimedia.org/wiki/Logstash#No_logs_indexed - https://grafana.wikimedia.org/d/000000561/logstash?var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashNoLogsIndexed
[02:51:34] FIRING: ErrorBudgetBurn: logstash-availability codfw - https://slo.wikimedia.org/?search=logstash-availability - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[02:59:37] RESOLVED: OpensearchClusterHealth: Opensearch cluster health reported as red - https://wikitech.wikimedia.org/wiki/Runbook - https://grafana.wikimedia.org/d/e7d7fa18-7bc3-4548-bb07-ef261a9d3b8b/opensearch-cluster-health?var-cluster=production-elk7-codfw - https://alerts.wikimedia.org/?q=alertname%3DOpensearchClusterHealth
[02:59:40] RESOLVED: LogstashNoLogsIndexed: Logstash logs are not being indexed by Elasticsearch - https://wikitech.wikimedia.org/wiki/Logstash#No_logs_indexed - https://grafana.wikimedia.org/d/000000561/logstash?var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashNoLogsIndexed
[03:06:34] FIRING: [2x] ErrorBudgetBurn: logstash-availability codfw - https://slo.wikimedia.org/?search=logstash-availability - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[03:31:34] RESOLVED: ErrorBudgetBurn: logstash-availability codfw - https://slo.wikimedia.org/?search=logstash-availability - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[04:14:40] RESOLVED: LogstashKafkaConsumerLag: Too many messages in logging-eqiad for group logstash7-codfw - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[04:15:10] FIRING: LogstashKafkaConsumerLag: Too many messages in logging-eqiad for group logstash7-codfw - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[04:19:55] RESOLVED: LogstashKafkaConsumerLag: Too many messages in logging-eqiad for group logstash7-codfw - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[04:40:40] FIRING: LogstashKafkaConsumerLag: Too many messages in logging-eqiad for group logstash7-codfw - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[05:20:40] RESOLVED: LogstashKafkaConsumerLag: Too many messages in logging-eqiad for group logstash7-codfw - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[05:21:10] FIRING: LogstashKafkaConsumerLag: Too many messages in logging-eqiad for group logstash7-codfw - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[05:30:55] RESOLVED: LogstashKafkaConsumerLag: Too many messages in logging-eqiad for group logstash7-codfw - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[06:30:40] FIRING: LogstashNoLogsIndexed: Logstash logs are not being indexed by Elasticsearch - https://wikitech.wikimedia.org/wiki/Logstash#No_logs_indexed - https://grafana.wikimedia.org/d/000000561/logstash?var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashNoLogsIndexed
[06:34:34] FIRING: ErrorBudgetBurn: logstash-availability codfw - https://slo.wikimedia.org/?search=logstash-availability - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[06:37:40] FIRING: LogstashKafkaConsumerLag: Too many messages in logging-eqiad for group logstash7-codfw - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[06:49:34] FIRING: [2x] ErrorBudgetBurn: logstash-availability codfw - https://slo.wikimedia.org/?search=logstash-availability - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[07:44:37] FIRING: OpensearchClusterHealth: Opensearch cluster health reported as red - https://wikitech.wikimedia.org/wiki/Runbook - https://grafana.wikimedia.org/d/e7d7fa18-7bc3-4548-bb07-ef261a9d3b8b/opensearch-cluster-health?var-cluster=production-elk7-codfw - https://alerts.wikimedia.org/?q=alertname%3DOpensearchClusterHealth
[08:09:40] FIRING: LogstashClusterStatus: OpenSearch reports cluster status is red. - https://wikitech.wikimedia.org/wiki/Logstash#Unassigned_Shards_and_Cluster_Status - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=panel-49 - https://alerts.wikimedia.org/?q=alertname%3DLogstashClusterStatus
[08:25:40] RESOLVED: LogstashNoLogsIndexed: Logstash logs are not being indexed by Elasticsearch - https://wikitech.wikimedia.org/wiki/Logstash#No_logs_indexed - https://grafana.wikimedia.org/d/000000561/logstash?var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashNoLogsIndexed
[08:29:34] FIRING: [2x] ErrorBudgetBurn: logstash-availability codfw - https://slo.wikimedia.org/?search=logstash-availability - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[08:29:37] RESOLVED: OpensearchClusterHealth: Opensearch cluster health reported as red - https://wikitech.wikimedia.org/wiki/Runbook - https://grafana.wikimedia.org/d/e7d7fa18-7bc3-4548-bb07-ef261a9d3b8b/opensearch-cluster-health?var-cluster=production-elk7-codfw - https://alerts.wikimedia.org/?q=alertname%3DOpensearchClusterHealth
[08:29:40] RESOLVED: LogstashClusterStatus: OpenSearch reports cluster status is red. - https://wikitech.wikimedia.org/wiki/Logstash#Unassigned_Shards_and_Cluster_Status - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=panel-49 - https://alerts.wikimedia.org/?q=alertname%3DLogstashClusterStatus
[08:54:34] RESOLVED: ErrorBudgetBurn: logstash-availability codfw - https://slo.wikimedia.org/?search=logstash-availability - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[10:37:55] FIRING: LogstashKafkaConsumerLag: Too many messages in logging-eqiad for group logstash7-codfw - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[13:47:40] RESOLVED: LogstashKafkaConsumerLag: Too many messages in logging-eqiad for group logstash7-codfw - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[13:49:10] FIRING: LogstashKafkaConsumerLag: Too many messages in logging-eqiad for group logstash7-codfw - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[13:54:10] RESOLVED: LogstashKafkaConsumerLag: Too many messages in logging-eqiad for group logstash7-codfw - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[15:13:35] Hello. I'm planning to merge this change in a few minutes unless there are any objections: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1196942
[15:15:01] Oh, it needs a rebase first. Sorry.
[15:32:31] Looks good, thank you!
[15:33:44] cwhite: Thanks. Unfortunately, I think I need to update it because there's an error on bookworm.
[15:33:49] https://www.irccloud.com/pastebin/cR3Hz36C/
[15:33:59] I've seen this before and I'll send a patch.
[15:36:26] Similar to this T406148
[15:36:26] T406148: profile::bigtop::apt causes errors on bookworm relating to its handling of the 'Signed-by' field - https://phabricator.wikimedia.org/T406148
[15:40:18] Oh, maybe it's not quite what I've seen before.
[15:51:29] cwhite: I think that this should fix it. https://gerrit.wikimedia.org/r/c/operations/puppet/+/1202183
[15:52:47] Great! Thank you!
[16:50:43] I think that's all good now. I'll move on to the remaining three patches on T407199 if that's OK.
[16:50:43] T407199: Pin opensearch and logstash related package versions in puppet to avoid updates when we mirror the upstream repositories - https://phabricator.wikimedia.org/T407199
[16:54:00] cwhite: Would you rather merge this one, as it only touches logstash* hosts? https://gerrit.wikimedia.org/r/c/operations/puppet/+/1196023 Or are you happy for me to go ahead?
[17:31:45] btullis: Saw some unexpected changes in PCC. Is this patch independent of its parent change?
[18:18:25] Yeah, sorry. I tried to rebase it against production but it didn't work, so I'm going to have to do this opensearch one first. https://gerrit.wikimedia.org/r/c/operations/puppet/+/1196022?
[18:18:37] That's probably tomorrow for me, now.
[18:22:53] Works for me :)
[18:28:22] On second thoughts, it looks like the opensearch change is going to go out now, after all. inflatador is here for it, too.
[18:33:03] Looking ok. No changes on canaries logstash2034, logging-hd2004, datahubsearch1001, cirrussearch2093
[18:37:52] {◕ ◡ ◕}
[18:43:42] The PCC on opensearch-dashboards looks good now, too. https://gerrit.wikimedia.org/r/c/operations/puppet/+/1196023 - Shall we press go?
[18:52:57] btullis: I agree, PCC looks clean. Good to go!
[18:53:43] Ack, rolling out now.
[19:31:31] Hello observability folks! I'm hoping you can help me with T409339. In short, what logstash hostname should we be using in beta cluster now? Or maybe logging-logstash-02.logging.eqiad1.wikimedia.cloud just crashed?
[19:31:32] T409339: Scap can't connect to logging-logstash-02.logging.eqiad1.wikimedia.cloud in beta - https://phabricator.wikimedia.org/T409339
[22:04:48] FIRING: PuppetFailure: Puppet has failed on logging-hd2005:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[22:07:42] Followed up in -releng. The issue is corrected.
[22:44:48] RESOLVED: PuppetFailure: Puppet has failed on logging-hd2005:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure