[07:20:55] FIRING: [2x] SystemdUnitFailed: prometheus-ipmi-exporter.service on db1179:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:44:23] Am.ir1: let me know if I can help with anything this week. I'm going through arnau.db email
[07:50:29] the db1179 alert above is caused by https://phabricator.wikimedia.org/T368088#10001441
[07:50:57] as soon as I get an ok to update I'll do it, I guess the host was down when the upgrade happened (uptime 2 days)
[09:23:57] arnaudb: the silence I added last week for clouddb* is about to expire, shall I extend it for a bit longer?
[09:24:02] (that's the one named "Ignoring until team label is set correctly")
[09:27:58] dhinus: he's out on vacation
[09:28:25] ah sorry for the ping then :/ I will extend it
[09:32:31] the silence is to prevent the new alert "MysqlReplicationLagPtHeartbeat" from appearing in the data-persistence board if it's related to clouddb* hosts
[09:32:59] I will have a look if I can find the proper way to tag it with team=wmcs
[09:33:00] ack, thanks for the advance notice and I agree to exend
[09:33:03] *extend
[09:33:09] thx
[09:35:24] extended for 1 week
[09:56:52] volans: I repooled db1179 last night. I can depool it for a bit. What needs updating? The version doesn't seem to be kernel. Is it firmware?
[09:57:31] it's the prometheus-exporter version
[09:57:46] probably no need to depool at all
[09:58:00] I was just unsure if the exporter itself needs any extra step like config change or something
[09:58:01] ah okay, wanna go at it?
[09:58:18] I have no clue either. Only one way to find out :P
[09:58:44] I was hoping for an answer on the task, I'll ping the people involved in the upgrade to be sure :)
[10:35:55] FIRING: [2x] SystemdUnitFailed: prometheus-ipmi-exporter.service on db1179:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:39:43] mmmh I've updated the package and it is running fine, there is no systemd unit failed on the host
[10:46:40] ahh it's wmf_auto_restart_prometheus-ipmi-exporter, got it
[10:47:25] now it should be cleared
[10:50:55] RESOLVED: [2x] SystemdUnitFailed: prometheus-ipmi-exporter.service on db1179:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:51:03] thx jinxer-wm