[09:26:59] The 'Check for large files in client bucket' NRPE alerts (https://alerts.wikimedia.org/?q=team%3Dwmcs&q=alertname%3DCheck%20for%20large%20files%20in%20client%20bucket) are related to the move out of icinga? @godog?
[09:27:43] should I ignore them?
[09:28:22] dcaro: indeed, cc slyngs ^
[09:28:41] ack, thanks :)
[09:28:42] I think it'll eventually converge
[09:29:11] the puppet-exported-resource-icinga-and-host-sync dance
[09:30:07] I think it already resolved
[09:30:16] Or disappeared
[09:30:44] And a new one appeared :-)
[09:31:00] yeah we're basically looking at puppet runs on the icinga host
[09:31:17] i.e. when the existing unknowns clear
[09:36:23] Cool, so it's just the number that's terrifying :-)
[09:37:33] that's right yeah
[11:13:06] I wonder if it'd make sense to add a method to spicerack for wait for all alertmanager alerts for specific hosts to clear. so something like IcingaHosts.wait_for_optimal() but for AlertmanagerHosts
[11:16:00] That was part of the original ask, see T293209 for historical context. That's something clearly needed but IIRC there wasn't a clear solution at the time. Not sure if anything has changed.
[11:16:00] T293209: Spicerack: add support for Alertmanager - https://phabricator.wikimedia.org/T293209
[11:21:12] yeah we could probably match on instance =~ 'host:.*' as a good approximation
[11:54:13] FYI, there's a new grafana security issues, but doesn't affect us: https://grafana.com/security/security-advisories/cve-2023-6152/
[11:55:02] but let's still update to 9.5.x now that the OS update is complete? then we don't need to rush a 9.4->9.5 update when there's a new security issues which does affect us
[12:47:38] moritzm: ack, thanks for the heads up, yeah I'm opening a task for the 9.5 upgrade which I'm expecting to be quick/straightforward
[12:47:42] denisse: ^ FYI
[13:17:00] ack, thx
[14:58:28] Do we have a prometheus sandbox? I'm working thru requirements in T357537 and it might be helpful to test some of this outside of prod
[15:36:15] inflatador: we have a shared pontoon o11y stack, which does run prometheus, depending on the kind of testing you are after even production might be fine tho
[15:38:26] godog ACK, I just want to avoid grinding all changes thru gerrit/puppet. If you have details on the pontoon stack LMK, I don't think I'm brave enough to test in prod quite yet ;P
[15:38:39] if it's easy enough we could spin up our own pontoon stack
[16:03:53] yeah fair enough inflatador re: not wanting to grind changes through gerrit, I'll reach out tomorrow re: pontoon
[16:15:04] godog ACK . If you're available next wk we could push back and meet w/ ottomata as well. I think he has a much better idea of what we're trying to do ;)
[16:17:08] inflatador: sounds like a plan! feel free to send over an invite, calendar's open
[16:36:16] great, sent an invite for ~2 wks from now
[21:54:54] godog: is a-alert-01.monitoring.eqiad1.wikimedia.cloud intentionally trying to run a copy of logmsgbot?
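
The 11:13 to 11:21 exchange floats the idea of waiting until all Alertmanager alerts for a given host have cleared, approximating "alerts for this host" by matching the `instance` label against `host:.*`. A minimal sketch of that polling loop follows. Everything named here is hypothetical and not part of spicerack: `alerts_for_host`, `wait_for_clear`, and the injected `fetch_alerts` callable are illustrative only. The sketch assumes each alert is a dict with a `labels` mapping, which is the shape Alertmanager's v2 alerts endpoint returns; how the alerts are actually fetched is left to the caller.

```python
import re
import time
from typing import Callable, Dict, List

# Assumed alert shape: {"labels": {"instance": "<host>:<port>", ...}, ...}
Alert = Dict[str, Dict[str, str]]


def alerts_for_host(alerts: List[Alert], host: str) -> List[Alert]:
    """Return alerts whose 'instance' label looks like '<host>:<anything>'."""
    pattern = re.compile(re.escape(host) + r":.*")
    return [
        alert
        for alert in alerts
        if pattern.fullmatch(alert.get("labels", {}).get("instance", ""))
    ]


def wait_for_clear(
    fetch_alerts: Callable[[], List[Alert]],
    host: str,
    timeout: float = 300.0,
    interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until no alert matches the host; return False on timeout."""
    deadline = clock() + timeout
    while True:
        if not alerts_for_host(fetch_alerts(), host):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Injecting `fetch_alerts`, `sleep`, and `clock` keeps the loop testable without a live Alertmanager. As noted in the chat, the `instance` match is only an approximation: alerts that identify the host through some other label would be missed.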