[14:43:43] FIRING: BenthosKafkaConsumerLag: Too many messages in jumbo-eqiad for group benthos-webrequest-sampled-live-franz - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=jumbo-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[14:53:43] RESOLVED: BenthosKafkaConsumerLag: Too many messages in jumbo-eqiad for group benthos-webrequest-sampled-live-franz - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=jumbo-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[15:22:48] Hi! Traffic wants to fix up our haproxy restart detection. I have a CR up (https://gerrit.wikimedia.org/r/c/operations/alerts/+/1055498) but we're not sure whether to use rate() or increase(). Any opinions?
[15:27:02] brett: my own feeling is increase() is a better fit, without knowing anything about it!
[15:27:22] the "number of restarts per second" really isn't what you want to measure right?
[15:27:40] The docs seem to suggest it is, too: "Breaks in monotonicity (such as counter resets due to target restarts) are automatically adjusted for"
[15:27:57] But either way, the graph is pretty much the same either way
[15:28:08] s/ either way,//
[15:28:20] yeah it'll be the same shape, and it will work equally well to discover that there has been a restart
[15:29:09] but when there is one restart I prefer increase() returning integer '1', telling me there has been 1 restart in the time period
[15:29:18] rather than rate() telling me that there has been 1/300 restarts per second average over the last 5 mins
[15:29:26] Good point
[15:29:47] but perhaps see what the o11y folks think too they probably have more insight
[15:31:39] Your explanation is good, topranks - I would also use increase() for graphing a metric like restarts
[15:31:59] I'd use "> 0" instead of "!= 0".
[15:36:45] FWIW +1 to increase() in this case, mostly to topranks' point
[15:41:28] I think we're at +4 now, and I'll throw in a +5 for increase() 😆
[15:48:28] +6.
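
A rough sketch of the increase() vs rate() point made above, for anyone reading the log later. This is not Prometheus code and not the rule from the CR: the sample values, the 5-minute window, and the helper names simple_increase and simple_rate are all made up for illustration. Real Prometheus also extrapolates to the window boundaries, so its increase() can return non-integer values; that step is left out here.

```python
from typing import List, Tuple

# (timestamp in seconds, counter value); hypothetical scrape samples
Sample = Tuple[float, float]


def simple_increase(samples: List[Sample]) -> float:
    """Total counter growth across the samples.

    A drop in value is treated as a counter reset: the counter is assumed
    to have restarted from zero, so the new value counts as growth.
    No extrapolation to the window edges is done (unlike Prometheus).
    """
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total


def simple_rate(samples: List[Sample], window_seconds: float) -> float:
    """Average per-second growth over the window."""
    return simple_increase(samples) / window_seconds


# One restart recorded during a 5-minute (300 s) window.
samples = [(0.0, 3.0), (60.0, 3.0), (120.0, 4.0),
           (180.0, 4.0), (240.0, 4.0), (300.0, 4.0)]

print(simple_increase(samples))     # restarts in the window
print(simple_rate(samples, 300.0))  # restarts per second, averaged
```

Both values cross zero at the same moment, so a "> 0" comparison fires identically on either one; the difference is only in how readable the number is.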