[07:40:37] looks /away
[07:40:42] yeah right
[09:33:02] FTR, I opened T393439 for the grafana-next issue I mentioned the other day
[09:33:03] T393439: Graphite data sources broken on grafana-next - https://phabricator.wikimedia.org/T393439
[09:36:43] neat -- thank you Lucas_WMDE
[11:50:33] my stale benthos alert is about to grumble again, just sent https://gerrit.wikimedia.org/r/c/operations/puppet/+/1142576 to clear it the lazy "I don't know why it's alerting on a non-existent CG and I don't care" way
[12:29:15] sure let's try it Raine, let me know how it goes
[13:03:20] ty <3
[13:35:43] godog: it worked \o/
[13:36:21] next time I'll just reset offsets on the existing CG :D
[13:39:39] ok sweet Raine ! if you have time please also update Benthos' docs on how to do so
[13:40:32] godog: will do next time I need to figure it out :D
[13:40:53] (I don't know off the top of my head, but iirc it's just running some command on the kafka host)
[13:41:23] hehehe ok
[13:46:13] FIRING: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-metrics - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops&var-consumer_group=benthos-mw-accesslog-metrics - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[13:48:38] excuse me? :D
[13:53:12] going down now... I really should migrate it to k8s so I can just add a few more replicas, it's in the queue...
[14:03:18] agreed, moving to k8s would be ideal
[15:33:59] RESOLVED: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-metrics - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops&var-consumer_group=benthos-mw-accesslog-metrics - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[15:37:42] godog: seems like on top of the long backlog, benthos is now having trouble with the same load it was previously happy with
[15:38:13] I can just increase buffering, so I don't mind, but FYI, something seems to have changed
[15:40:13] Raine: ack, thank you for letting me know, +1'd your buffering change
[15:41:27] ty <3
[15:43:43] FIRING: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-metrics - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops&var-consumer_group=benthos-mw-accesslog-metrics - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[16:36:55] FIRING: SystemdUnitFailed: stunnel4.service on grafana1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:45:06] ^ looking
[17:14:49] Hey Olly, I just joined the OpenSearch Slack and it looks like they have an RFC for OTLP integrations with OpenSearch. Linking here in case y'all are interested.
Slack: https://opensearch.slack.com/archives/C051JEH8MNU/p1745903657059589 RFC: https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/39707
[17:15:00] ^^ ccing cdanis
[18:08:43] FIRING: [2x] BenthosKafkaConsumerLag: Too many messages in jumbo-eqiad for group benthos-webrequest_live - TODO - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[18:08:48] FIRING: PuppetFailure: Puppet has failed on grafana1002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[18:09:06] ^ Working on the Grafana alert.
[18:26:11] inflatador: thanks, good to know
[18:36:55] RESOLVED: SystemdUnitFailed: stunnel4.service on grafana1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:56:18] RESOLVED: PuppetFailure: Puppet has failed on grafana1002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[21:23:43] RESOLVED: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-metrics - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops&var-consumer_group=benthos-mw-accesslog-metrics - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
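
(Note on the offset-reset discussion at 13:36-13:41: the usual way to reset a consumer group's offsets is the kafka-consumer-groups.sh tool with --reset-offsets, run against a broker while the group is inactive. Below is a minimal sketch of the same operation using the confluent_kafka Python client; the broker address and topic name are placeholders, not the actual logging-eqiad settings.)

# Sketch: reset a consumer group's committed offsets to the latest watermark
# so it skips its backlog. Assumes confluent_kafka is installed and that the
# group has no active members (i.e. the Benthos instance is stopped first).
from confluent_kafka import Consumer, TopicPartition

BROKER = "kafka-host.example:9092"   # placeholder bootstrap server
GROUP = "benthos-mw-accesslog-metrics"
TOPIC = "example-topic"              # placeholder topic name

consumer = Consumer({
    "bootstrap.servers": BROKER,
    "group.id": GROUP,
    "enable.auto.commit": False,
})

# Look up every partition of the topic, fetch its high watermark, and
# commit that offset on behalf of the group.
metadata = consumer.list_topics(TOPIC, timeout=10)
offsets = []
for partition_id in metadata.topics[TOPIC].partitions:
    _low, high = consumer.get_watermark_offsets(
        TopicPartition(TOPIC, partition_id), timeout=10)
    offsets.append(TopicPartition(TOPIC, partition_id, high))

consumer.commit(offsets=offsets, asynchronous=False)
consumer.close()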