[08:51:32] godog: cwhite: about events not being indexed by logstash/elasticsearch, there was the meta task https://phabricator.wikimedia.org/T240667
[08:51:57] seems the grafana board does list a stream of rejected events but the logstash query link on that task does not show any errors
[08:52:20] https://grafana.wikimedia.org/d/000000561/logstash ;)
[08:52:46] if you could amend that task to replace the https://logstash.wikimedia.org/goto/3283cc1372b7df18f26128163125cf45 with the proper dashboard, that would be great
[08:53:30] I did hit that one back in December I think, cause some kinds of events were for some time no longer logged after a new index got created at midnight
[08:53:43] so I guess we are missing events, which might affect how we monitor for mw errors
[08:54:24] (it is not like I have any idea how we can prevent the type mismatching for a given field in MediaWiki) :\
[08:55:34] bonus if the grafana view https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 gets a link to the logstash dashboard listing those events
[15:07:49] hashar: thanks for the heads up. I've amended the link in the task. As of earlier this week there is a link to the logstash dashboard from the grafana panel.
[15:29:59] cwhite: neat! I thought that maybe we can detect whether the messages come from mediawiki and thus tag those indexing errors with type:mediawiki. This way we would see them as part of the train log triage and get them fixed
[15:30:32] or maybe even make those train blockers. But maybe I have too many ideas
[15:38:22] That's a great idea. As it stands now, there isn't an easy way to query the "type" or "source" of a log message that failed due to a mapping error. We capture the error message that logstash generates, which has specifics of how it failed, and the raw text of the message that caused it. The challenge would be to classify the failed messages, even in situations of the message
[15:38:25] being unparseable.
[15:50:44] sounds tricky. I will have a look at the dashboard and see what I can figure out
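
A rough sketch of the classification idea discussed above, in case it helps whoever picks this up. It assumes the captured raw text is usually JSON carrying a "type" field (as in type:mediawiki); the function name, the regex fallback, and the "unknown" label are all hypothetical and not taken from the actual logstash setup. It tries a proper parse first and falls back to a pattern match so that truncated or otherwise unparseable messages can still be tagged when the field is visible in the raw text.

```python
import json
import re

# Hypothetical fallback: find a string-valued "type" key in text that is not valid JSON.
TYPE_PATTERN = re.compile(r'"type"\s*:\s*"([^"]+)"')


def classify_failed_event(raw_message):
    """Best-effort guess at the 'type' of a log event that failed indexing."""
    try:
        event = json.loads(raw_message)
    except (json.JSONDecodeError, TypeError):
        event = None  # unparseable input; handled by the fallback below

    if isinstance(event, dict):
        value = event.get("type")
        if isinstance(value, str) and value:
            return value

    # Parse failed or gave no usable type: scan the raw text instead.
    if isinstance(raw_message, str):
        match = TYPE_PATTERN.search(raw_message)
        if match:
            return match.group(1)

    return "unknown"
```

Events that come back as "mediawiki" could then be tagged so they show up in the train log triage; anything returned as "unknown" would still need a manual look.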