[10:36:22] Will merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/989458 (removing a filter on a Thanos RR) in a bit
[10:39:30] ack klausman
[12:42:40] (LogstashKafkaConsumerLag) firing: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[12:47:55] (LogstashKafkaConsumerLag) resolved: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[12:48:25] (LogstashKafkaConsumerLag) firing: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[12:52:40] (LogstashKafkaConsumerLag) resolved: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[14:33:58] godog: something is puzzling. With the new RR, it seems the "le" label is just discarded, despite being in the sum by clause
[14:34:30] E.g. here, there is no le label
[14:34:33] https://thanos.wikimedia.org/graph?g0.expr=istio_sli_latency_request_duration_milliseconds_count%3Aincrease5m%7Bresponse_code%3D~%222..%22%2C%20destination_service_namespace%3D%22recommendation-api-ng%22%2C%20destination_canonical_service%3D%22recommendation-api-ng-main%22%7D&g0.tab=1&g0.stacked=0&g0.range_input=1h&g0.max_source_resolution=0s&g0.deduplicate=1&g0.partial_response=0&
[14:34:35] g0.store_matches=%5B%5D
[14:34:37] damn you, IRC
[14:37:33] Do RRs drop all labels that are unmentioned?
[14:43:03] godog: At any rate, this completely breaks our SLO calculations for latency, so I'm going to send a patch to try and fix it.
[14:46:55] klausman: checking
[14:47:23] I also realized that the "total # of requests" RR doesn't need all the buckets, we only care about the last one.
[14:53:01] klausman: mhh ok, can't quite figure out what was wrong previously, let me know if your latest change fixes things
[14:53:09] will do
[14:53:22] It seems weird that it would drop that label just because of the change
[15:02:46] indeed
[15:16:09] yeah, I am still not sure what is going on.
[15:16:51] I'm trying another patch, the .* didn't help
[15:17:16] I've also asked on the upstream IRC channel what might be wrong
[15:42:43] I think this may have been a brain fart on my side: the metrics have appeared now. Probably evaluation window or something. I'll keep investigating
[15:57:58] hah, ok thank you for the followup
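For context, a minimal sketch of the kind of recording rules under discussion, assuming a standard Prometheus/Thanos rule file and Istio's request-duration histogram as the source metric. Only the `istio_sli_latency_request_duration_milliseconds_count:increase5m` record name and the destination labels are visible in the log above; the bucket record name, the source metric, the group name, and the 5m window are assumptions, not the actual Wikimedia rules. The two points being made are that keeping `le` in the `by ()` clause is what preserves the per-bucket series, and that the total-requests rule only needs the last (`+Inf`) bucket, since that bucket already equals the histogram's total count.

```yaml
groups:
  - name: istio_sli_latency_recording_rules   # hypothetical group name
    rules:
      # Latency SLI: keep "le" in the by() clause so each histogram bucket
      # survives the aggregation (this is the label reported missing above).
      - record: istio_sli_latency_request_duration_milliseconds_bucket:increase5m   # assumed name
        expr: >
          sum by (le, response_code, destination_service_namespace, destination_canonical_service) (
            increase(istio_request_duration_milliseconds_bucket[5m])
          )
      # Total requests: the +Inf bucket already equals the total count,
      # so this rule only needs that last bucket rather than all of them.
      - record: istio_sli_latency_request_duration_milliseconds_count:increase5m
        expr: >
          sum by (response_code, destination_service_namespace, destination_canonical_service) (
            increase(istio_request_duration_milliseconds_bucket{le="+Inf"}[5m])
          )
```

An SLO latency ratio would then divide the buckets at or below the target threshold by the total, e.g. `istio_sli_latency_request_duration_milliseconds_bucket:increase5m{le="250"} / ignoring(le) istio_sli_latency_request_duration_milliseconds_count:increase5m` (the 250 ms threshold is hypothetical).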