[02:08:49] (PuppetZeroResources) firing: Puppet has failed generate resources on grafana1002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[02:38:57] (LogstashKafkaConsumerLag) firing: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[06:08:49] (PuppetZeroResources) firing: Puppet has failed generate resources on grafana1002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[06:38:57] (LogstashKafkaConsumerLag) firing: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[09:53:49] (PuppetZeroResources) resolved: Puppet has failed generate resources on grafana1002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[10:38:57] (LogstashKafkaConsumerLag) firing: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[11:25:40] mmhh the kafka lag is mw httpd access log, consuming from codfw unsurprisingly
[11:26:09] so no immediate impact on users since we're active in eqiad, still a problem
[14:23:08] godog: I'd like to merge the ProbeDown delay/alert_after change (https://gerrit.wikimedia.org/r/c/operations/puppet/+/991571). Do you suggest anything special when merging it? I'd run puppet on the prometheus hosts and a bunch of hosts which use blackbox checks, and make sure the ProbeDown alerts still look the same (in https://thanos.wikimedia.org/alerts for example).
[14:38:57] (LogstashKafkaConsumerLag) firing: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[14:59:31] jelto: yeah, running puppet on the hosts that use the check and on the prometheus hosts is good
[14:59:36] go ahead, LGTM
[14:59:45] ok thanks, I'll proceed
[18:38:58] (LogstashKafkaConsumerLag) firing: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
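For the ProbeDown verification step discussed above, one way to confirm the alerts "still look the same" after the alert_after change is to snapshot the set of firing ProbeDown alerts before and after running puppet and diff the two. A minimal Python sketch, not the team's actual procedure: it assumes the standard Alertmanager v2 API is reachable (the alerts.wikimedia.org base URL and read access are assumptions); the `/api/v2/alerts` path and the `filter` matcher syntax are standard Alertmanager.

```python
# Illustrative sketch only: list currently firing ProbeDown alerts via the
# standard Alertmanager v2 API, so the output can be diffed before and after
# the alert_after change is rolled out. The base URL is an assumption.
import json
import requests

ALERTMANAGER = "https://alerts.wikimedia.org"  # assumed base URL with API access


def firing_probedown_alerts(base_url: str = ALERTMANAGER) -> list[dict]:
    """Return the label sets of active ProbeDown alerts."""
    resp = requests.get(
        f"{base_url}/api/v2/alerts",
        params={"filter": 'alertname="ProbeDown"', "active": "true"},
        timeout=10,
    )
    resp.raise_for_status()
    return [alert["labels"] for alert in resp.json()]


if __name__ == "__main__":
    # Run once before and once after the change, then diff the two outputs.
    labels = firing_probedown_alerts()
    print(json.dumps(sorted(labels, key=lambda l: sorted(l.items())), indent=2))
```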
[19:29:46] Hello 0lly, I'm working on an alerts review and had some questions: what is the default retention period in Thanos? Is there any option to keep data longer than the default?
[20:31:43] inflatador: I think this is controlled by `profile::thanos::retention::(raw|5m|1h)`. Seems querying raw is limited to 54w, but maybe there's a way to access the compacted metrics? I'm not immediately seeing how to query the compacted metrics, but it might be worth asking g.odog.
[20:32:10] I know we store a lot of alert events in logs though.
[20:33:14] We start culling alert logs at 5 years.
[21:42:53] cwhite: thanks for the info, I'll reach out to g.odog tomorrow
[22:38:58] (LogstashKafkaConsumerLag) firing: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
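On the retention question above, a rough way to see how far back data is actually queryable is to issue instant queries at progressively older timestamps through the standard Prometheus-compatible `/api/v1/query` API that Thanos exposes. A minimal sketch under assumptions: the thanos.wikimedia.org query endpoint and the use of `up` as the probe metric are illustrative, and past the raw window Thanos may still answer from downsampled (5m/1h) data, so this measures overall queryability rather than raw retention specifically.

```python
# Illustrative sketch only: probe how far back data is still queryable by
# issuing instant queries at older and older timestamps. The endpoint URL is
# an assumption; /api/v1/query is the standard Prometheus/Thanos query API.
import time
import requests

THANOS_QUERY = "https://thanos.wikimedia.org"  # assumed query frontend URL


def has_data_at(weeks_ago: int, base_url: str = THANOS_QUERY) -> bool:
    """Return True if the `up` metric has any samples `weeks_ago` weeks back."""
    ts = time.time() - weeks_ago * 7 * 24 * 3600
    resp = requests.get(
        f"{base_url}/api/v1/query",
        params={"query": "up", "time": ts},
        timeout=30,
    )
    resp.raise_for_status()
    return bool(resp.json()["data"]["result"])


if __name__ == "__main__":
    # Walk backwards in time; the last "yes" roughly marks the retention horizon.
    for weeks in (4, 26, 52, 54, 60, 104):
        print(f"{weeks:>3} weeks ago: data={'yes' if has_data_at(weeks) else 'no'}")
```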