[10:51:43] FIRING: BenthosKafkaConsumerLag: Too many messages in jumbo-eqiad for group benthos-webrequest-sampled-live-franz - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=jumbo-eqiad&var-datasource=eqiad%20prometheus/ops&var-consumer_group=benthos-webrequest-sampled-live-franz - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[11:26:43] RESOLVED: BenthosKafkaConsumerLag: Too many messages in jumbo-eqiad for group benthos-webrequest-sampled-live-franz - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=jumbo-eqiad&var-datasource=eqiad%20prometheus/ops&var-consumer_group=benthos-webrequest-sampled-live-franz - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[12:54:43] FIRING: BenthosKafkaConsumerLag: Too many messages in jumbo-eqiad for group benthos-webrequest-sampled-live-franz - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=jumbo-eqiad&var-datasource=eqiad%20prometheus/ops&var-consumer_group=benthos-webrequest-sampled-live-franz - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[16:54:43] FIRING: BenthosKafkaConsumerLag: Too many messages in jumbo-eqiad for group benthos-webrequest-sampled-live-franz - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=jumbo-eqiad&var-datasource=eqiad%20prometheus/ops&var-consumer_group=benthos-webrequest-sampled-live-franz - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[18:49:43] RESOLVED: BenthosKafkaConsumerLag: Too many messages in jumbo-eqiad for group benthos-webrequest-sampled-live-franz - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=jumbo-eqiad&var-datasource=eqiad%20prometheus/ops&var-consumer_group=benthos-webrequest-sampled-live-franz - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[19:18:54] hi cwhite -- wanted to circle back with you at some point to discuss https://gerrit.wikimedia.org/r/c/operations/puppet/+/1112295
[19:40:43] FIRING: BenthosKafkaConsumerLag: Too many messages in jumbo-eqiad for group benthos-webrequest-sampled-live-franz - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=jumbo-eqiad&var-datasource=eqiad%20prometheus/ops&var-consumer_group=benthos-webrequest-sampled-live-franz - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[19:55:05] cdanis: for sure, let's chat
[19:58:23] The largest issue I see is accidental opt-ins without fully understanding the implications. A close second is a preference for logstash reacting to less-mutable traits.
[20:02:02] It's a goal to remove this method of ECS translation and make the pipeline more generic. To that end, there's a preference to limit its application to unmaintained and unowned code (think gitlab, et al.).
[20:06:23] Perhaps keying off something else would help toward these ends? Is there a way an application could announce that it is a nodejs/service-runner application?
[20:07:03] cwhite: sure, we could modify the label name to be service-runner-specific
[20:08:00] the label used there doesn't exist anywhere / isn't used by anything yet, so picking a different name is free :)
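A minimal sketch of the opt-in mechanism being discussed, in Logstash filter syntax. This is illustrative only, not the contents of the Gerrit change: the label name is the one quoted later in the conversation (and was expected to change to something service-runner-specific), and the tag name is a hypothetical placeholder.

```
# Hedged sketch: a pod label carries the name of the container whose logs
# should be run through the node/ECS translation. The event matches only
# when the label's value equals the event's own container name.
filter {
  if [kubernetes][labels][wmf_logging_node-to-ecs-container] == [kubernetes][container_name] {
    mutate {
      # hypothetical tag; downstream ECS-translation filters would key off it
      add_tag => [ "ecs_node_translate" ]
    }
  }
}
```

Both the label and the container name live in the service's helm chart, so (as noted below) the opt-in stays entirely under the deployer's control.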
[20:10:23] Cool :)
[20:10:43] RESOLVED: BenthosKafkaConsumerLag: Too many messages in jumbo-eqiad for group benthos-webrequest-sampled-live-franz - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=jumbo-eqiad&var-datasource=eqiad%20prometheus/ops&var-consumer_group=benthos-webrequest-sampled-live-franz - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
[20:10:52] Any possibility we can also eliminate the array search?
[20:12:05] cwhite: sorry, on which line?
[20:12:40] `"wmf_logging_node-to-ecs-container" in [kubernetes][labels]` specifically
[20:13:15] oh
[20:13:52] sorry, that was me being Python-brained. is it safe to compare [kubernetes][labels][foo] == whatever if it's not guaranteed that kube.labels.foo exists?
[20:16:46] IIRC, it's safe as long as the parent object exists.
[20:17:02] Probably worth a double-check to be sure.
[20:18:32] * cwhite runs logstash-filter-verifier
[20:18:40] new PS is up assuming so :)
[20:22:00] filter-verifier seems to think they'll both compare true as empty strings, lol
[20:22:03] new PS uploading
[20:34:31] okay, the latest PS doesn't crash filter-verifier, nor does it touch any existing test case
[20:40:47] This feels very hacky
[20:41:43] What if this is a premature optimization? Maybe we start with `[kubernetes][container_name] == "my_app"`?
[20:45:54] cwhite: sorry, which part? the idea is that users would label their pods with a container name as the value
[20:48:25] Linking the value of one field to some other field, and having that link change the behavior. I'm not sure it's sustainable long-term and am concerned it will break in the future
[20:50:29] both container names and the labels are under the user's control in the helm chart
[20:50:51] This is a mechanism built into the logstash config. If we evaluate something other than logstash, it may not translate well.
[20:51:56] are we thinking about moving away from ELK for logs?
[20:53:03] we don't use elk ;)
[20:53:24] are we thinking about moving away from OLK for logs?
[20:53:29] but logstash is pretty inefficient.
[20:53:51] so yes, I'm interested in evaluating that possibility
[20:54:05] I see
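For reference, a hedged sketch of the pitfall logstash-filter-verifier surfaced at 20:22 and one way to guard against it. This is not the uploaded patchset, just an illustration under the assumption that the false match comes from both fields being absent and rendering as empty strings.

```
# Hedged sketch: if neither the label nor [kubernetes][container_name]
# exists, both sides of the == can evaluate as empty and match spuriously.
# Checking that the label is present before comparing avoids that.
filter {
  if [kubernetes][labels][wmf_logging_node-to-ecs-container] {
    if [kubernetes][labels][wmf_logging_node-to-ecs-container] == [kubernetes][container_name] {
      mutate {
        add_tag => [ "ecs_node_translate" ]   # same placeholder tag as in the sketch above
      }
    }
  }
}
```

cwhite's simpler alternative at 20:41 would instead hard-code the container name in the conditional (`[kubernetes][container_name] == "my_app"`), trading the label-based flexibility for a check that can't match on missing fields.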