[09:11:35] <jinxer-wm>	 (ThanosSidecarNoConnectionToStartedPrometheus) firing: Thanos Sidecar cannot access Prometheus, even though Prometheus seems healthy and has reloaded WAL. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/b19644bfbf0ec1e108027cce268d99f7/thanos-sidecar - https://alerts.wikimedia.org/?q=alertname%3DThanosSidecarNoConnectionToStartedPrometheus
[09:16:35] <jinxer-wm>	 (ThanosSidecarNoConnectionToStartedPrometheus) firing: (2) Thanos Sidecar cannot access Prometheus, even though Prometheus seems healthy and has reloaded WAL. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/b19644bfbf0ec1e108027cce268d99f7/thanos-sidecar - https://alerts.wikimedia.org/?q=alertname%3DThanosSidecarNoConnectionToStartedPrometheus
[09:21:35] <jinxer-wm>	 (ThanosSidecarNoConnectionToStartedPrometheus) resolved: (2) Thanos Sidecar cannot access Prometheus, even though Prometheus seems healthy and has reloaded WAL. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/b19644bfbf0ec1e108027cce268d99f7/thanos-sidecar - https://alerts.wikimedia.org/?q=alertname%3DThanosSidecarNoConnectionToStartedPrometheus
[15:49:52] <cdanis>	 hello o11y i think icinga is struggling to process events? https://grafana.wikimedia.org/d/rsCfQfuZz/icinga?orgId=1&var-datasource=codfw%20prometheus%2Fops&from=now-7d&to=now&viewPanel=4
[15:51:00] <denisse>	 cdanis: Thanks for the heads-up. Taking a look.
[15:51:28] <cdanis>	 icinga also sent two pages to vops via email but nothing posted on IRC
[15:52:22] <godog>	 yeah I noticed that too, we should be doing better on that front cdanis 
[15:52:28] <godog>	 i.e. ircecho restarted
[15:53:30] <godog>	 yeah I saw icinga-wm just posted sth
[16:05:29] <godog>	 I'm keeping an eye on that dashboard btw, maybe we can chalk this up to "icinga takes the same time to warm up as a container ship engine"
[16:10:06] <godog>	 the max latency check shot up yesterday on alert1001 too, defo unexpected though not failover-related
[16:11:28] <denisse>	 icinga-wm is sending messages correctly to -ops
[16:15:00] <denisse>	 The latency check seems to be consistent at 21.9s
[16:15:28] <denisse>	 Tho I'm not sure about the possible root cause for that.
[16:37:06] <denisse>	 Aside from the high latency (unrelated to the failover) everything looks good for the alert hosts.
[17:12:16] <topranks>	 We also have a strange scenario where our Icinga BFD check has been failing for the past 2+ hours 
[17:12:28] <topranks>	 The Python script seems to be failing to load the MIB 
[17:12:34] <topranks>	 https://phabricator.wikimedia.org/T359198
[17:12:46] <topranks>	 I don't have time to look further right now, but just FYI 
[17:13:16] <denisse>	 Thanks for filing the task, I'll take a look at it.
[17:17:03] <topranks>	 denisse: thanks! 
[17:19:12] <godog>	 looking too, thanks topranks 
[17:21:47] <godog>	 hah I think we were missing a 'download-mibs' invocation
[17:25:15] <topranks>	 ok yeah I was wondering
[17:25:35] <godog>	 ok should be better now topranks 
[17:25:51] <topranks>	 godog: ok great!  was the system reimaged or something?
[17:26:22] <denisse>	 topranks: Yes, we upgraded it from Buster to Bookworm, this also involved a Python 2 to Python 3 upgrade.
[17:26:28] <topranks>	 ah ok thanks 
[17:26:40] <topranks>	 and I guess the download-mibs wasn't part of the automation flow 
[17:26:56] <topranks>	 showing green across the board now!
[17:26:57] <topranks>	 thanks :)
[17:27:01] <denisse>	 You can find more info on the upgrade here: https://phabricator.wikimedia.org/T333615
[17:27:01] <denisse>	 Apologies for the inconvenience caused!
[17:27:13] <topranks>	 no probs, glad it wasn't anything too tricky :)
[17:30:46] <mutante>	 see this package on the alert hosts:
[17:30:53] <mutante>	 ii  snmp-mibs-downloader                 1.2     
[17:31:09] <mutante>	 my guess is this was supposed to download the missing MIB
[17:31:26] <mutante>	 but either lacked a timer or it was just downloading in the wrong/different path 
[17:31:33] <mutante>	 since the fix seemed to be a symlink?
[17:33:01] <topranks>	 I think the fix was to execute the downloader command that is installed with that package 
[17:33:40] <topranks>	 it's an odd package that one, it installs a tool that downloads mibs from the internet, but the tool isn't executed automatically after package installation 
[17:34:17] <denisse>	 topranks: One question regarding the MIBs, do you know how often they are updated? I'm wondering if a systemd timer would be ideal for it or if a one-time execution is the best approach.
[17:34:30] <topranks>	 they are never updated 
[17:34:31] <topranks>	 ever :)
[17:35:08] <mutante>	 systemd timer every Feb 29 :)
[17:35:21] <godog>	 yeah the installation paths changed between buster and bookworm, I did the quick and dirty thing of the symlink, the more proper fix is likely to ship /etc/snmp-mibs-downloader/snmp-mibs-downloader.conf with BASEDIR=/var/lib/snmp/mibs or understand why snimpy isn't loading from where snmp-mibs-downloader is downloading
[17:35:24] <topranks>	 I am open to correction, but I don't think those standard MIBs ever change, or have for many years 
[17:35:32] <denisse>	 Okay, I have another question. If they're never updated, would it be better for us to use quickdatacopy to sync them between the hosts?
[17:36:07] <mutante>	 if they are small I think it's just easier to pull on both/all machines. but either works
[17:36:11] <denisse>	 @godog Thanks for sharing your findings and for the fix. <3
[17:36:40] <topranks>	 denisse: I guess that's up to you.  They are quite small, just text files.
[17:36:43] <mutante>	 that way you dont need to define an active/source host in Hiera
[17:36:54] <topranks>	 probably feel free to do whatever the easiest thing is to get them on new hosts 
[17:37:40] <denisse>	 Okay, I'll work on a patch and keep you posted. Thanks.
[17:37:49] <godog>	 denisse: sure np
[17:41:40] <godog>	 ok I'm logging off for the day, ttyl
[17:42:43] <mutante>	 topranks: denisse: re: "was supposed to run the download tool". the download tool has config like this:
[17:42:46] <mutante>	 AUTOLOAD="rfc ianarfc iana"
[17:43:05] <mutante>	 maybe it means stuff can be added that is "auto loaded"?
[17:43:27] <topranks>	 mutante: I may have been mistaken tbh 
[17:43:44] <topranks>	 The BGP MIB is an IETF one, I believe covered by 'rfc'
[17:44:04] <topranks>	 godog can probably confirm if the tool ran, but just downloaded to a new location in bookworm (hence symlink)
[17:44:16] <topranks>	 or if he had to kick off the downloader and then also add the symlink 
[17:45:09] <mutante>	 ACK, you are right, if the symlink fixed it that is probably all
[18:56:54] <denisse>	 I can confirm it was a file path change. I can see calls to Exec["download-mibs {title}"].
[22:08:32] <mutante>	 :)
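
[editor's note] The discussion above comes down to two questions: where does the downloader's config say MIBs are stored (BASEDIR), and does the MIB the check needs actually exist under a directory the loader searches? The following is a minimal Python sketch of that check, not the fix that was deployed. The helper names (parse_conf, find_mib), the sample config text, and the MIB file name EXAMPLE-MIB are hypothetical; only the BASEDIR and AUTOLOAD keys and their values come from the log.

```python
import shlex
from pathlib import Path


def parse_conf(text):
    """Parse shell-style KEY=VALUE lines, skipping blanks and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        # shlex.split removes surrounding quotes much like a shell would.
        parts = shlex.split(raw, comments=True)
        settings[key.strip()] = parts[0] if parts else ""
    return settings


def find_mib(name, search_dirs):
    """Return the first file called `name` under any search dir, else None."""
    for directory in search_dirs:
        root = Path(directory)
        if not root.is_dir():
            continue  # a missing directory is a useful clue, not an error
        for candidate in sorted(root.rglob(name)):
            if candidate.is_file():
                return candidate
    return None


if __name__ == "__main__":
    # Sample config text; the two values are the ones quoted in the log.
    sample = 'BASEDIR=/var/lib/snmp/mibs\nAUTOLOAD="rfc ianarfc iana"\n'
    conf = parse_conf(sample)
    # EXAMPLE-MIB is a placeholder file name, not one taken from the log.
    found = find_mib("EXAMPLE-MIB", [conf["BASEDIR"]])
    print("download dir:", conf["BASEDIR"])
    print("MIB found at:", found if found else "not found")
```

Running find_mib against both the directory the downloader writes to and the directory the check reads from would show whether the files are missing altogether (download never ran) or merely in a different place (path change, as denisse confirmed at 18:56).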