[02:08:49] (PuppetZeroResources) firing: Puppet has failed generate resources on grafana1002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[02:38:57] (LogstashKafkaConsumerLag) firing: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[06:08:49] (PuppetZeroResources) firing: Puppet has failed generate resources on grafana1002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[06:38:57] (LogstashKafkaConsumerLag) firing: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[09:53:49] (PuppetZeroResources) resolved: Puppet has failed generate resources on grafana1002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[10:38:57] (LogstashKafkaConsumerLag) firing: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[11:25:40] mmhh the kafka lag is mw httpd access log, consuming from codfw unsurprisingly
[11:26:09] so no immediate impact on users since we're active in eqiad, still a problem
[14:23:08] godog: I'd like to merge the ProbeDown delay/alert_after change (https://gerrit.wikimedia.org/r/c/operations/puppet/+/991571). Do you suggest anything special when merging it? I'd run puppet on the prometheus hosts and a bunch of hosts which use blackbox checks, and make sure the ProbeDown alerts still look the same (in https://thanos.wikimedia.org/alerts for example).
[14:38:57] (LogstashKafkaConsumerLag) firing: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[14:59:31] jelto: yeah, running puppet on the hosts that use the check and on the prometheus hosts is good
[14:59:36] go ahead, LGTM
[14:59:45] ok thanks, I'll proceed
[18:38:58] (LogstashKafkaConsumerLag) firing: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
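For the ProbeDown verification step discussed above, one way to confirm the alerts "still look the same" after the alert_after change is to snapshot the set of firing ProbeDown alerts before and after running puppet and diff the two. A minimal Python sketch, not the team's actual procedure: it assumes the standard Alertmanager v2 API is reachable (the alerts.wikimedia.org base URL and read access are assumptions); the `/api/v2/alerts` path and the `filter` matcher syntax are standard Alertmanager.

```python
# Illustrative sketch only: list currently firing ProbeDown alerts via the
# standard Alertmanager v2 API, so the output can be diffed before and after
# the alert_after change is rolled out. The base URL is an assumption.
import json
import requests

ALERTMANAGER = "https://alerts.wikimedia.org"  # assumed base URL with API access


def firing_probedown_alerts(base_url: str = ALERTMANAGER) -> list[dict]:
    """Return the label sets of active ProbeDown alerts."""
    resp = requests.get(
        f"{base_url}/api/v2/alerts",
        params={"filter": 'alertname="ProbeDown"', "active": "true"},
        timeout=10,
    )
    resp.raise_for_status()
    return [alert["labels"] for alert in resp.json()]


if __name__ == "__main__":
    # Run once before and once after the change, then diff the two outputs.
    labels = firing_probedown_alerts()
    print(json.dumps(sorted(labels, key=lambda l: sorted(l.items())), indent=2))
```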
[19:29:46] Hello 0lly, I'm working on an alerts review and had some questions: what is the default retention period in Thanos? Is there any option to keep data longer than the default?
[20:31:43] inflatador: I think this is controlled by `profile::thanos::retention::(raw|5m|1h)`. Seems querying raw is limited to 54w, but maybe there's a way to access the compacted metrics? I'm not immediately seeing how to query the compacted metrics, but it might be worth asking g.odog.
[20:32:10] I know we store a lot of alert events in logs though.
[20:33:14] We start culling alert logs at 5 years.
[21:42:53] cwhite: thanks for the info, I'll reach out to g.odog tomorrow
[22:38:58] (LogstashKafkaConsumerLag) firing: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
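On the retention question above, a rough way to see how far back data is actually queryable is to issue instant queries at progressively older timestamps through the standard Prometheus-compatible `/api/v1/query` API that Thanos exposes. A minimal sketch under assumptions: the thanos.wikimedia.org query endpoint and the use of `up` as the probe metric are illustrative, and past the raw window Thanos may still answer from downsampled (5m/1h) data, so this measures overall queryability rather than raw retention specifically.

```python
# Illustrative sketch only: probe how far back data is still queryable by
# issuing instant queries at older and older timestamps. The endpoint URL is
# an assumption; /api/v1/query is the standard Prometheus/Thanos query API.
import time
import requests

THANOS_QUERY = "https://thanos.wikimedia.org"  # assumed query frontend URL


def has_data_at(weeks_ago: int, base_url: str = THANOS_QUERY) -> bool:
    """Return True if the `up` metric has any samples `weeks_ago` weeks back."""
    ts = time.time() - weeks_ago * 7 * 24 * 3600
    resp = requests.get(
        f"{base_url}/api/v1/query",
        params={"query": "up", "time": ts},
        timeout=30,
    )
    resp.raise_for_status()
    return bool(resp.json()["data"]["result"])


if __name__ == "__main__":
    # Walk backwards in time; the last "yes" roughly marks the retention horizon.
    for weeks in (4, 26, 52, 54, 60, 104):
        print(f"{weeks:>3} weeks ago: data={'yes' if has_data_at(weeks) else 'no'}")
```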