[07:40:37] looks /away
[07:40:42] yeah right
[09:33:02] FTR, I opened T393439 for the grafana-next issue I mentioned the other day
[09:33:03] T393439: Graphite data sources broken on grafana-next - https://phabricator.wikimedia.org/T393439
[09:36:43] neat -- thank you Lucas_WMDE
[11:50:33] my stale benthos alert is about to grumble again, just sent https://gerrit.wikimedia.org/r/c/operations/puppet/+/1142576 to clear it the lazy "I don't know why it's alerting on a non-existent CG and I don't care" way
[12:29:15] sure let's try it Raine, let me know how it goes
[13:03:20] ty <3
[13:35:43] godog: it worked \o/
[13:36:21] next time I'll just reset offsets on the existing CG :D
[13:39:39] ok sweet Raine ! if you have time please also update Benthos' docs on how to do so
[13:40:32] godog: will do next time I need to figure it out :D
[13:40:53] (I don't know off the top of my head, but iirc it's just running some command on the kafka host)
[13:41:23] hehehe ok
[13:46:13] FIRING: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-metrics - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops&var-consumer_group=benthos-mw-accesslog-metrics - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[13:48:38] excuse me? :D
[13:53:12] going down now... I really should migrate it to k8s so I can just add a few more replicas, it's in the queue...
[14:03:18] agreed, moving to k8s would be ideal
[15:33:59] RESOLVED: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-metrics - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops&var-consumer_group=benthos-mw-accesslog-metrics - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[15:37:42] godog: seems like on top of the long backlog, benthos is now having trouble with the same load it was previously happy with
[15:38:13] I can just increase buffering, so I don't mind, but FYI, something seems to have changed
[15:40:13] Raine: ack, thank you for letting me know, +1'd your buffering change
[15:41:27] ty <3
[15:43:43] FIRING: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-metrics - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops&var-consumer_group=benthos-mw-accesslog-metrics - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[16:36:55] FIRING: SystemdUnitFailed: stunnel4.service on grafana1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:45:06] ^ looking
[17:14:49] Hey Olly, I just joined the OpenSearch Slack and it looks like they have an RFC for OTLP integrations with OpenSearch. Linking here in case y'all are interested.
Slack: https://opensearch.slack.com/archives/C051JEH8MNU/p1745903657059589 RFC: https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/39707
[17:15:00] ^^ ccing cdanis
[18:08:43] FIRING: [2x] BenthosKafkaConsumerLag: Too many messages in jumbo-eqiad for group benthos-webrequest_live - TODO - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[18:08:48] FIRING: PuppetFailure: Puppet has failed on grafana1002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[18:09:06] ^ Working on the Grafana alert.
[18:26:11] inflatador: thanks, good to know
[18:36:55] RESOLVED: SystemdUnitFailed: stunnel4.service on grafana1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:56:18] RESOLVED: PuppetFailure: Puppet has failed on grafana1002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[21:23:43] RESOLVED: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-metrics - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops&var-consumer_group=benthos-mw-accesslog-metrics - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
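
(Note on the offset-reset discussion at 13:36-13:41: the usual way to reset a consumer group's offsets is the kafka-consumer-groups.sh tool with --reset-offsets, run against a broker while the group is inactive. Below is a minimal sketch of the same operation using the confluent_kafka Python client; the broker address and topic name are placeholders, not the actual logging-eqiad settings.)

# Sketch: reset a consumer group's committed offsets to the latest watermark
# so it skips its backlog. Assumes confluent_kafka is installed and that the
# group has no active members (i.e. the Benthos instance is stopped first).
from confluent_kafka import Consumer, TopicPartition

BROKER = "kafka-host.example:9092"   # placeholder bootstrap server
GROUP = "benthos-mw-accesslog-metrics"
TOPIC = "example-topic"              # placeholder topic name

consumer = Consumer({
    "bootstrap.servers": BROKER,
    "group.id": GROUP,
    "enable.auto.commit": False,
})

# Look up every partition of the topic, fetch its high watermark, and
# commit that offset on behalf of the group.
metadata = consumer.list_topics(TOPIC, timeout=10)
offsets = []
for partition_id in metadata.topics[TOPIC].partitions:
    _low, high = consumer.get_watermark_offsets(
        TopicPartition(TOPIC, partition_id), timeout=10)
    offsets.append(TopicPartition(TOPIC, partition_id, high))

consumer.commit(offsets=offsets, asynchronous=False)
consumer.close()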