[00:10:43] FIRING: BenthosKafkaConsumerLag: Too many messages in jumbo-eqiad for group benthos-webrequest_live - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=jumbo-eqiad&var-datasource=eqiad%20prometheus/ops&var-consumer_group=benthos-webrequest_live - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[00:20:43] RESOLVED: BenthosKafkaConsumerLag: Too many messages in jumbo-eqiad for group benthos-webrequest_live - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=jumbo-eqiad&var-datasource=eqiad%20prometheus/ops&var-consumer_group=benthos-webrequest_live - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[13:13:11] hi folks, grafana is down, there is any maintenance going on?
[13:16:34] it's working again, but apparently w/o any interaction, I'm not seeing any restarts for Grafana or Apache e.g.
[14:29:09] There wasn't any maintenance that I'm aware of.
[19:52:22] heads up: if apache2 gets restarted on alert2002 it will break
[19:52:38] Syntax error on line 18 of /etc/apache2/sites-enabled/50-requestctl-wikimedia-org.conf
[19:52:51] SSLCertificateFile: file '/etc/acmecerts/icinga/live/rsa-2048.chained.crt' does not exist or is empty
[19:53:32] I noticed when trying to deploy an unrelated change to the apache config.
[19:53:45] luckily did not touch the prod server yet
[19:54:42] looks like acmechief stopped using RSA certs.. only ec-prime* certs there
[19:54:54] but apache config still references one
[19:55:14] yes that's right
[19:55:30] we missed the rsa one? please drop it
[19:55:34] so.. I reverted my change but .. it did not cause the error
[19:55:41] ^ brett
[19:55:44] yet its broken now
[19:56:13] you should be using ec-prime256v1.crt only nowadays
[19:56:38] maybe this is because icinga does not have an .erb template with the entire apache config like other services
[19:57:01] it seems to just use a couple httpd::conf snippets on top of the default from package
[19:57:15] or I am missing where it is
[19:57:28] thanks for confirming the RSA part needs to go, ack
[19:57:45] ack
[19:58:45] right, this is not an issue in the icinga-wikimedia-org.conf
[19:58:53] it is in the requestctl-wikimedia-org.conf
[19:59:41] that's dbd2685c58c94a27920a06e35a222de476b22cbf
[20:00:36] making it 'hidden' worked :p
[20:02:08] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1177486
[20:02:10] that should fix it
[20:02:31] beat me to it
[20:02:32] thanks, I was writing the same thing
[20:02:37] lol
[20:02:41] haha,we always do this :)
[20:04:15] CI is taking its time..
[20:04:44] deploying on alert2002
[20:05:25] cool
[20:08:18] alright! apache on alert2002 is up again.
[20:08:27] and prod icinga saved from a downtime event
[20:08:29] thanks
[20:08:44] will try to re-revert the change I had to add NEL headers
[20:08:52] Thanks folks.
[20:10:24] brett: can haz puppet-merge lock ?
[20:10:34] done now :)
[20:10:49] :) thanks
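
Note on the failure mode above: the Apache config still pointed at an RSA certificate file that was no longer on disk, so the breakage would only have surfaced at the next restart. A minimal sketch of a pre-restart check for that class of problem follows. It is an illustration, not the tooling used in this incident: the function name `find_missing_cert_files` is hypothetical, it only looks at the three directives listed in the regex, and it does not follow `Include` directives or replace a real config test such as `apache2ctl configtest`.

```python
import os
import re

# Matches e.g. "SSLCertificateFile /path/to/cert" at the start of a line.
# Commented-out lines do not match because "#" precedes the directive.
DIRECTIVE_RE = re.compile(
    r"^\s*(SSLCertificateFile|SSLCertificateKeyFile|SSLCertificateChainFile)\s+(\S+)",
    re.IGNORECASE,
)


def find_missing_cert_files(config_text):
    """Return (line_number, directive, path) for every referenced
    certificate or key file that does not exist or is empty."""
    problems = []
    for lineno, line in enumerate(config_text.splitlines(), start=1):
        match = DIRECTIVE_RE.match(line)
        if not match:
            continue
        directive = match.group(1)
        path = match.group(2).strip("\"'")  # tolerate quoted paths
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            problems.append((lineno, directive, path))
    return problems


if __name__ == "__main__":
    import sys

    # Usage (paths are whatever config files you want to check):
    #   python check_certs.py /etc/apache2/sites-enabled/*.conf
    exit_code = 0
    for conf_path in sys.argv[1:]:
        with open(conf_path, encoding="utf-8") as handle:
            for lineno, directive, path in find_missing_cert_files(handle.read()):
                print(f"{conf_path}:{lineno}: {directive} '{path}' does not exist or is empty")
                exit_code = 1
    sys.exit(exit_code)
```

Run against a site config before restarting, a check like this would flag a stale RSA path of the kind reported at 19:52:51 while the running Apache process is still serving.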