[16:30:40] would one of y'all o11y folks be available to help me find out what's happening to the messages I send to input/kafka/mediawiki-php-fpm-slowlog-{eqiad,codfw} ? according to the logstash dashboard they are being ingested, they have a field "program": "php-fpm-slowlog" and are tagged input-kafka-mediawiki-php-fpm-slowlog-eqiad by the ingestion, but I can't find them querying either the program
[16:30:42] or the tag
[16:31:14] Is there a way to see what filters a message goes through that could explain that?
[16:34:37] https://phabricator.wikimedia.org/P43187 < an excerpt of the messages directly from kafkacat
[16:34:40] claime: logstash sees no logs on those topics: https://grafana-rw.wikimedia.org/explore?orgId=1&left=%7B%22datasource%22:%22thanos%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22logstash_node_plugin_events_in_total%7Bplugin_id%3D%5C%22input%2Fkafka%2Fmediawiki-php-fpm-slowlog-eqiad%5C%22%7D%22%7D%5D,%22range%22:%7B%22from%22:%22now-24h%22,%22to%22:%22now%22%7D%7D
[16:36:21] https://grafana.wikimedia.org/goto/nVgkB7o4z?orgId=1 < I see input here
[16:36:49] sure enough, there's events there. hmmmm
[16:36:50] I'm missing something
[16:37:43] ah, I think they're getting dead-lettered
[16:37:59] have a look at the `dlq-*` index pattern
[16:38:17] `failed to parse field [process] of type [text] in document with id 'XVu9xYUBSpt2cI3LP2gs'. Preview of field's value: '{pid=8}'", "caused_by"=>{"type"=>"illegal_state_exception", "reason"=>"Can't get text on a START_OBJECT at 1:278"`
[16:38:43] * cwhite grumbles about schemas
[16:39:34] So we're sending it with the wrong schema?
[16:41:57] The event I pulled up (_id:0Fu9xYUBSpt2cI3LW3S0) is being treated as a legacy schema and is conflicting with the auto-generated logstash schema
[16:43:07] the error above means "I expected the `process` field to be a string, but got an object instead." This is because some other log producer defined the process field as a string when the index was generated.
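The mapping conflict described at 16:43:07 can be illustrated with a minimal sketch. This is not Elasticsearch or Logstash code: `INDEX_MAPPING`, `find_mapping_conflicts` and the sample event are hypothetical names made up for illustration, and the only facts taken from the log are that `process` is mapped as text and that the slowlog event sends it as an object containing `pid`.

```python
from __future__ import annotations

from typing import Any

# Assumed mapping: an earlier producer caused `process` to be mapped as text.
INDEX_MAPPING = {"process": "text", "program": "text"}


def find_mapping_conflicts(doc: dict[str, Any], mapping: dict[str, str]) -> list[str]:
    """Return fields that are mapped as text but carry an object in this document."""
    conflicts = []
    for field, value in doc.items():
        if mapping.get(field) == "text" and isinstance(value, dict):
            conflicts.append(field)
    return conflicts


# Hypothetical slowlog event shaped like the one rejected in the log above.
slowlog_event = {"program": "php-fpm-slowlog", "process": {"pid": 8}}
print(find_mapping_conflicts(slowlog_event, INDEX_MAPPING))
```

A document that fails this kind of check is what ends up in the `dlq-*` indices instead of the regular ones, which would explain why the queries on program and tag found nothing.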
[16:46:14] claime: is this a schema we've defined or is this the format that upstream provides?
[16:48:25] cwhite: I think it's ours, we have a rsyslog ruleset for it https://github.com/wikimedia/operations-deployment-charts/blob/master/charts/mediawiki/rsyslog/slowlog.ruleset
[16:51:06] claime: this is good https://github.com/wikimedia/operations-deployment-charts/blob/master/charts/mediawiki/rsyslog/templates.conf
[16:51:37] It looks to me like we're using the `syslog_cee_slowlog` template?
[16:55:14] it'd look like it since we don't have the constant(value="1.7.0" outname="ecs.version" format="jsonf") in the message
[16:55:25] maybe: https://github.com/wikimedia/operations-deployment-charts/blame/master/charts/mediawiki/templates/rsyslog/configmap.yaml.tpl#L40
[16:55:53] * claime facepalms
[16:56:14] Ok I'll try and switch that to just slowlog
[16:58:49] claime: the `slowlog` template looks close to ecs-compatible. I'm not sure `error.stack` will work. I'm certain the `kubernetes` top-level field will be dropped altogether.
[16:59:10] cwhite: they are dropped from the accesslogs so yeah
[17:01:17] normalized.dropped.no_such_field
[17:01:18] kubernetes, normalized_message, timestamp, _log
[17:01:37] (for tags:input-kafka-mediawiki-httpd-accesslog-* in ecs-*)
[17:02:24] Yep. If we want those, we'll need to define them.
[17:02:37] We'll probably want them :p
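The dropped-field behaviour discussed from 16:58:49 onward can be sketched roughly as follows. This is an assumption inferred only from the field name `normalized.dropped.no_such_field` in the log, not the actual Logstash filter: `KNOWN_TOP_LEVEL_FIELDS` and `normalize` are hypothetical, and the allow-list is a tiny stand-in for the real schema.

```python
from __future__ import annotations

from typing import Any

# Hypothetical allow-list; the real schema defines far more fields than this.
KNOWN_TOP_LEVEL_FIELDS = {"ecs", "message", "log", "error", "process", "labels"}


def normalize(event: dict[str, Any]) -> dict[str, Any]:
    """Keep only known top-level fields and record the names of the dropped ones."""
    kept = {k: v for k, v in event.items() if k in KNOWN_TOP_LEVEL_FIELDS}
    dropped = sorted(k for k in event if k not in KNOWN_TOP_LEVEL_FIELDS)
    if dropped:
        # Mirrors the shape of normalized.dropped.no_such_field seen in the log.
        kept["normalized"] = {"dropped": {"no_such_field": dropped}}
    return kept


# Hypothetical event carrying fields the schema does not define.
sample_event = {
    "message": "slow request",
    "kubernetes": {"namespace": "example"},
    "timestamp": "2023-01-18T16:30:40Z",
    "_log": "raw",
}
print(normalize(sample_event))
```

Under this reading, keeping `kubernetes` means adding it to the schema rather than changing the producer, which matches "If we want those, we'll need to define them."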