[00:47:06] !log admin deploying a change so that openstack clients use tls endpoints: https://gerrit.wikimedia.org/r/c/operations/puppet/+/732738
[00:47:09] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[13:48:36] legoktm: how would you like to be notified if that counter stays the same for 24h?
[18:23:52] majavah: is email an option? otherwise IRC is fine
[18:25:23] legoktm: both of those and their combinations are available
[18:26:05] just email then
[18:26:18] which one?
[18:26:21] debian?
[18:27:19] yes please
[18:29:08] after the current run finishes I'll turn it off so we can test it
[18:29:21] that should be set now
[18:29:53] the prometheus rule I ended up going with is "increase(libup_runs[1d]) == 0"
[18:31:47] thank you :D this is great
[22:20:41] !log codesearch restarted everything to add new prometheus metrics endpoint: https://codesearch.wmcloud.org/_metrics
[22:20:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Codesearch/SAL
[22:22:05] majavah: ^^ I think we could do something like, if a backend is down for more than 15min, send an alert
[22:22:40] it automatically restarts, and the start up process can take a while, so we do need some time buffer before alerting
[22:39:24] !log codesearch one more restart of everything, switched hound container over to bullseye for newer git
[22:39:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Codesearch/SAL
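
The two alerts discussed in this log could be written as Prometheus alerting rules roughly like the sketch below. Only the expression `increase(libup_runs[1d]) == 0` and the 15 minute buffer come from the conversation. The group name, alert names, labels, annotations, and the `codesearch_backend_up` metric are hypothetical placeholders, since the log does not show what the codesearch metrics endpoint actually exposes or how the rules were deployed.

```yaml
# Sketch only: names other than libup_runs are assumptions, not taken from the log.
groups:
  - name: example-alerts            # hypothetical group name
    rules:
      # Fires when the libup_runs counter has not increased over the last day.
      - alert: LibupRunsStalled     # hypothetical alert name
        expr: increase(libup_runs[1d]) == 0
        labels:
          severity: warning         # assumed label
        annotations:
          summary: "libup_runs has not increased in 24h"

      # Fires only after a backend has been reported down for 15 minutes,
      # leaving time for the automatic restart and slow start-up.
      - alert: CodesearchBackendDown   # hypothetical alert name
        expr: codesearch_backend_up == 0   # hypothetical metric name
        for: 15m
        labels:
          severity: warning         # assumed label
        annotations:
          summary: "A codesearch backend has been down for more than 15 minutes"
```

The `for: 15m` clause is what provides the time buffer: the condition has to hold continuously for that long before the alert fires. Delivery by email or IRC would be configured separately in Alertmanager routing, which is not shown here.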