[12:13:50] folks, nothing important, but I was trying to move the "RPKI" dashboard in Grafana into the "SRE Netops" folder; every time I do it, it seems to have worked, but when I refresh it's back in "General"
[12:29:43] FIRING: BenthosKafkaConsumerLag: Too many messages in jumbo-eqiad for group benthos-webrequest-sampled-live-franz - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=jumbo-eqiad&var-datasource=eqiad%20prometheus/ops&var-consumer_group=benthos-webrequest-sampled-live-franz - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[12:41:03] !restarting gnmic.service on netflow1002
[13:38:29] topranks: not sure what's going on off the top of my head, I'm looking at something else right now and will take a look later
[13:38:59] godog: thanks
[13:39:01] I'm investigating the BenthosKafkaConsumerLag alert above; looks like centrallog2002 has been high on system CPU usage for a couple of days
[13:39:35] btw I'm looking at the gnmic stuff since I merged my patch. it's working fine in magru but stats stopped in eqiad, not exactly sure what's wrong (a manual scrape is taking 13 seconds, so that should be ok)
[13:39:44] if I can't work it out shortly I'll revert my patch
[13:40:53] topranks: ack
[13:45:08] looks like mtail freaked out at some point around Jan 22 00:40, and the excessive CPU was slowing down benthos, causing the lag
[13:45:18] should be recovering shortly
[13:59:43] RESOLVED: BenthosKafkaConsumerLag: Too many messages in jumbo-eqiad for group benthos-webrequest-sampled-live-franz - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=jumbo-eqiad&var-datasource=eqiad%20prometheus/ops&var-consumer_group=benthos-webrequest-sampled-live-franz - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag