[13:28:38] cwhite: godog: a quick question about https://gerrit.wikimedia.org/r/c/operations/alerts/+/788720 or rather about the existing implementation
[13:29:22] the elasticsearch data is mirrored between the two sites by logstash, right? so each elasticsearch_exporter sees the same data?
[13:29:29] and I guess that means we're already double-counting
[13:30:49] cdanis: yes that's correct, we're already double-counting
[13:31:21] okay
[13:31:36] not ideal but perhaps not too bad in this case I think
[13:31:40] I don't know why I didn't think of that when I wrote it initially, but did think of that when I put ~the same data exported by statograph
[13:31:53] the query that statograph runs is `sum(max by (type) (log_w3c_networkerror_type_doc_count{type=~"tcp.timed_out|tcp.address_unreachable"}))/60`
[13:32:10] with the max taking care of eliminating the double-count
[13:32:20] ah! smart
[13:32:55] so given that this patch is equivalent to the wrong current implementation, it's +1 -- but if you also just wanted to change the `sum` to `max` and halve the thresholds, that's also +1
[13:33:00] (and if not I'll make a followup to do that)
[13:33:37] thanks :)
[13:33:53] cdanis: yeah might as well fix the expression while I'm at it, will send another PS for review, thanks!
[13:43:41] hah also sum() isn't needed in this case since prometheus/thanos can deal with non-scalar expressions, even nicer
[14:05:46] hah
[14:05:48] yeah
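
For anyone reading along later, the double-counting discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the site labels (`site-a`, `site-b`), the sample values, and the helper names are all hypothetical, and the `/60` divisor simply mirrors the quoted statograph query (presumably converting a per-minute count to a per-second rate, which is an assumption here). It models two exporters reporting identical mirrored data, then compares a plain `sum` against `sum(max by (type) (...))`.

```python
# Hypothetical samples: each exporter sees the same mirrored data,
# so every `type` shows up once per site with an identical value.
samples = [
    {"site": "site-a", "type": "tcp.timed_out", "value": 120.0},
    {"site": "site-b", "type": "tcp.timed_out", "value": 120.0},
    {"site": "site-a", "type": "tcp.address_unreachable", "value": 60.0},
    {"site": "site-b", "type": "tcp.address_unreachable", "value": 60.0},
]


def naive_sum(samples):
    """Plain sum over all series: counts each mirrored value twice."""
    return sum(s["value"] for s in samples)


def max_by_type(samples):
    """Rough analogue of `max by (type) (...)`: one value per type."""
    best = {}
    for s in samples:
        current = best.get(s["type"], float("-inf"))
        best[s["type"]] = max(current, s["value"])
    return best


def deduplicated_rate(samples, window_seconds=60):
    """Rough analogue of `sum(max by (type) (...)) / 60`."""
    return sum(max_by_type(samples).values()) / window_seconds


double_counted = naive_sum(samples) / 60
deduplicated = deduplicated_rate(samples)
print(double_counted, deduplicated)
```

With identical data on both sites, the plain sum comes out at exactly twice the deduplicated value, which is why switching from `sum` to `max` goes together with halving the thresholds.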