[14:52:51] hello folks!
[14:53:18] I filed https://gerrit.wikimedia.org/r/c/operations/puppet/+/941434 as proof of concept to start a discussion, maybe it is not the best way forward
[14:53:46] I realized that some logs from Istio Gateway pods (more specifically, istio-proxy containers) don't show up in my istio dashboard on logstash
[14:54:16] after some help from Filippo I gathered that we may be heavily sampling them
[14:55:02] so I was wondering if there is room to ingest access logs from those gateways from now on
[15:19:33] SGTM. Will add around 2.3M logs to the webrequest partition - there is room.
[15:19:42] 2.3M/day
[15:21:38] cwhite: o/ to the k8s partition or the webrequest one?
[15:23:30] good catch, you're right, the k8s partition
[15:23:44] still has room :)
[15:27:44] ah super, I thought I had made the wrong assumptions!
[15:27:46] thanks a lot!
[15:27:50] I think we'll have to watch it closely though. If this is the source of the incredibly bursty logs that occur regularly, we'll want to reconsider.
[15:28:05] definitely yes
[15:29:24] Someone on that stream dumps a ton of logs quickly and regularly, but we don't have visibility into who it is yet.
[15:29:45] It's the reason we had to change our SLO lol
[15:30:17] * elukey blames cloud-native things
[15:33:58] the new filter works nicely, thanks again!
[15:36:21] I'm going to reroute those logs back into the webrequest partition, but keep sampling disabled.
[15:36:50] simply because we treat webrequest logs differently
[15:48:12] ack!
[15:48:50] one question - I didn't see the new logs in codfw, is the change rolled out there or is it something that puppet takes care of? (so there may be some delay after merging etc.)
[15:49:44] I'm not sure what you mean. Do you mean logs on the codfw logstash cluster?
[15:50:31] yes sorry, I don't see them popping up in the logstash dashboard
[15:51:14] wondering if it is the last change
[15:51:15] mmmm
[15:51:32] codfw logstash cluster is backed up in kafka because of the bursty traffic I mentioned earlier. someone is blasting logs
[15:51:48] ahhh sorry I didn't get it
[15:52:20] I thought it was something that happened every now and then
[15:52:22] not now :)
[15:54:04] It happens regularly. See: `sum(rate(logstash_node_plugin_events_in_total{plugin="drop", instance=~"logstash10[23].*"}[5m]))`
[15:55:03] yeah I see the logs now
[15:55:29] thanks for the follow up :)
[16:52:24] cwhite: last thing I promise - I filed https://gerrit.wikimedia.org/r/c/operations/puppet/+/941455 for the logging clusters, it is not urgent but lemme know what you think about it in the next few days
[16:52:36] it requires a restart of the brokers to be applied
[16:54:07] (going afk, I'll read tomorrow!)
[16:56:03] added herro.n for review for when back in the office :)