[08:15:20] <_joe_> godog: it looks like prometheus-k8s in codfw is down/unavailable
[08:15:42] _joe_: ack, checking
[08:16:10] <_joe_> you can e.g. see https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&refresh=30s&var-dc=codfw%20prometheus%2Fk8s&var-service=mediawiki&var-namespace=mw-web&var-release=main&var-container_name=All&var-site=
[08:16:21] <_joe_> if you select eqiad it works, else it doesn't
[08:18:14] indeed, I think it is the oom/crashloop issue I was mentioning yesterday at the meeting
[08:18:32] interestingly enough first time I see codfw affected, I'll bring it back up
[08:20:06] <_joe_> yes
[08:20:12] <_joe_> it's the same I think
[08:20:20] godog: thanks!
[08:20:23] <_joe_> and btw, right now we're doing backports so it can happen again
[08:20:33] for everyone else, context is at: https://phabricator.wikimedia.org/T354399
[08:22:57] good to know re: backports, will keep an eye on it
[08:25:21] we're back btw
[08:26:35] thanks
[08:27:49] np akosiaris
[08:27:50] upgrades aside, does it make sense to think about shard prometheus instances across more hardware nodes?
[08:28:00] sharding*
[08:29:47] yeah it might, at least for k8s since we're gaining experience on what it means to operate it at bigger scale