[10:51:43] FIRING: BenthosKafkaConsumerLag: Too many messages in jumbo-eqiad for group benthos-webrequest-sampled-live-franz - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=jumbo-eqiad&var-datasource=eqiad%20prometheus/ops&var-consumer_group=benthos-webrequest-sampled-live-franz - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[11:26:43] RESOLVED: BenthosKafkaConsumerLag: Too many messages in jumbo-eqiad for group benthos-webrequest-sampled-live-franz - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=jumbo-eqiad&var-datasource=eqiad%20prometheus/ops&var-consumer_group=benthos-webrequest-sampled-live-franz - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[12:54:43] FIRING: BenthosKafkaConsumerLag: Too many messages in jumbo-eqiad for group benthos-webrequest-sampled-live-franz - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=jumbo-eqiad&var-datasource=eqiad%20prometheus/ops&var-consumer_group=benthos-webrequest-sampled-live-franz - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[16:54:43] FIRING: BenthosKafkaConsumerLag: Too many messages in jumbo-eqiad for group benthos-webrequest-sampled-live-franz - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=jumbo-eqiad&var-datasource=eqiad%20prometheus/ops&var-consumer_group=benthos-webrequest-sampled-live-franz - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[18:49:43] RESOLVED: BenthosKafkaConsumerLag: Too many messages in jumbo-eqiad for group benthos-webrequest-sampled-live-franz - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=jumbo-eqiad&var-datasource=eqiad%20prometheus/ops&var-consumer_group=benthos-webrequest-sampled-live-franz - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[19:18:54] hi cwhite -- wanted to circle back with you at some point to discuss https://gerrit.wikimedia.org/r/c/operations/puppet/+/1112295
[19:40:43] FIRING: BenthosKafkaConsumerLag: Too many messages in jumbo-eqiad for group benthos-webrequest-sampled-live-franz - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=jumbo-eqiad&var-datasource=eqiad%20prometheus/ops&var-consumer_group=benthos-webrequest-sampled-live-franz - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[19:55:05] cdanis: for sure, let's chat
[19:58:23] The largest issue I see is accidental opt-ins without fully understanding the implications. A close second is a preference for logstash reacting to less-mutable traits.
[20:02:02] It's a goal to remove this method of ECS translation and make the pipeline more generic. To that end, there's a preference to limit its application to unmaintained and unowned code (think gitlab, et al.).
[20:06:23] Perhaps keying off something else would help toward these ends? Is there a way an application could announce that it is a nodejs/service-runner application?
[20:07:03] cwhite: sure, we could modify the label name to be service-runner-specific
[20:08:00] the label used there doesn't exist anywhere / isn't used by anything yet, so picking a different name is free :)
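A minimal sketch of the opt-in mechanism being discussed, in Logstash filter syntax. This is illustrative only, not the contents of the Gerrit change: the label name is the one quoted later in the conversation (and was expected to change to something service-runner-specific), and the tag name is a hypothetical placeholder.

```
# Hedged sketch: a pod label carries the name of the container whose logs
# should be run through the node/ECS translation. The event matches only
# when the label's value equals the event's own container name.
filter {
  if [kubernetes][labels][wmf_logging_node-to-ecs-container] == [kubernetes][container_name] {
    mutate {
      # hypothetical tag; downstream ECS-translation filters would key off it
      add_tag => [ "ecs_node_translate" ]
    }
  }
}
```

Both the label and the container name live in the service's helm chart, so (as noted below) the opt-in stays entirely under the deployer's control.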
[20:10:23] Cool :)
[20:10:43] RESOLVED: BenthosKafkaConsumerLag: Too many messages in jumbo-eqiad for group benthos-webrequest-sampled-live-franz - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=jumbo-eqiad&var-datasource=eqiad%20prometheus/ops&var-consumer_group=benthos-webrequest-sampled-live-franz - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[20:10:52] Any possibility we can also eliminate the array search?
[20:12:05] cwhite: sorry, on which line?
[20:12:40] `"wmf_logging_node-to-ecs-container" in [kubernetes][labels]` specifically
[20:13:15] oh
[20:13:52] sorry, that was me being Python-brained. is it safe to compare [kubernetes][labels][foo] == whatever if it's not guaranteed that kube.labels.foo exists?
[20:16:46] IIRC, it's safe as long as the parent object exists.
[20:17:02] Probably worth a double-check to be sure.
[20:18:32] * cwhite runs logstash-filter-verifier
[20:18:40] new PS is up assuming so :)
[20:22:00] filter-verifier seems to think they'll both compare true as empty strings, lol
[20:22:03] new PS uploading
[20:34:31] okay, the latest PS doesn't crash filter-verifier, nor does it touch any existing test case
[20:40:47] This feels very hacky
[20:41:43] What if this is a premature optimization? Maybe we start with `[kubernetes][container_name] == "my_app"`?
[20:45:54] cwhite: sorry, which part? the idea is that users would label their pods with a container name as the value
[20:48:25] Linking the value of one field to some other field, and having that link change the behavior. I'm not sure it's sustainable long-term and am concerned it will break in the future
[20:50:29] both container names and the labels are under the user's control in the helm chart
[20:50:51] This is a mechanism built into the logstash config. If we evaluate something other than logstash, it may not translate well.
[20:51:56] are we thinking about moving away from ELK for logs?
[20:53:03] we don't use elk ;)
[20:53:24] are we thinking about moving away from OLK for logs?
[20:53:29] but logstash is pretty inefficient.
[20:53:51] so yes, I'm interested in evaluating that possibility
[20:54:05] I see
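For reference, a hedged sketch of the pitfall logstash-filter-verifier surfaced at 20:22 and one way to guard against it. This is not the uploaded patchset, just an illustration under the assumption that the false match comes from both fields being absent and rendering as empty strings.

```
# Hedged sketch: if neither the label nor [kubernetes][container_name]
# exists, both sides of the == can evaluate as empty and match spuriously.
# Checking that the label is present before comparing avoids that.
filter {
  if [kubernetes][labels][wmf_logging_node-to-ecs-container] {
    if [kubernetes][labels][wmf_logging_node-to-ecs-container] == [kubernetes][container_name] {
      mutate {
        add_tag => [ "ecs_node_translate" ]   # same placeholder tag as in the sketch above
      }
    }
  }
}
```

cwhite's simpler alternative at 20:41 would instead hard-code the container name in the conditional (`[kubernetes][container_name] == "my_app"`), trading the label-based flexibility for a check that can't match on missing fields.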