[07:10:16] I'll be rebooting bast1003/bast3005 in 20 minutes
[07:36:02] both bastions are back up now
[08:19:47] heads-up; I'll be rebooting deploy1002 in ~5 minutes
[08:32:08] and back up again
[09:17:19] is there a way to downtime a single icinga check across multiple hosts?
[09:19:47] I don't think so, programmatically, but what I do is search the check name on the web and usually it is quite easy
[09:19:51] (in this case, i want to downtime 'Check systemd state' for all prometheus hosts)
[09:23:23] jynus: ah. i didn't realise that you could search by check name. the 1860 results are a bit much, but still, this is useful. thanks!
[09:23:45] yeah, it works better when the check is only on a few hosts :-(
[09:36:27] kormat: yes
[09:36:30] the downtime cookbook
[09:37:05] volans: i couldn't see any flags on it for specifying a check
[09:37:11] (at least spicerack has the capability, checking if it was exposed there or not)
[09:38:40] kormat: so, it's not exposed to the cookbook, but it can be easily used from a REPL if needed, patches are welcome :)
[09:38:56] https://doc.wikimedia.org/spicerack/master/api/spicerack.icinga.html#spicerack.icinga.IcingaHosts.downtime_services
[10:53:50] vgutierrez: Am I OK to puppet-merge "Log emergency messages to disk" (887dc7cb96)?
[10:53:55] yup
[10:54:01] go ahead please
[10:54:13] done, thanks :)
[10:54:17] thx
[11:03:41] what again is the point of "predictable" network interface names if they keep changing with every major OS release? for a random Ganeti server:
[11:03:43] eno1 (stretch) -> ens3f0np0 (buster) -> enp175s0f0np0 (bullseye)
[11:04:26] looking forward to bookworm, maybe enX8493s0044m888dfm4fffe4 or so?
[11:04:29] not gonna defend the way predictable interface names are implemented, but in this case this may be more on ganeti and/or qemu moving the card to a different PCI slot
[11:05:27] sure, but if the PCI assignments are seemingly random that makes the whole assumption of predictability kinda moot...
[14:37:05] predictable as long as "HW" doesn't change
[14:37:11] with VMs that's kinda volatile as well
[14:38:05] but I'm not the one that's going to defend predictable interface names here :)
[14:49:30] how do operational metrics like QPS get from individual services to grafana these days? Where are they aggregated?
[14:54:49] ori: basically graphite is around for mediawiki, prometheus for ~everything else, and there's some global aggregation done by thanos
[14:55:22] * bd808 guesses about the same as godog has confirmed
[14:57:05] bd808: o/ great guess
[14:57:35] I was making assumptions based on what I've learned about the k8s cluster with Toolhub :)
[14:59:24] heheh, I hope that means you didn't have to deal with the graphite bits
[15:00:31] heh. no, just learned more about prometheus, which I have somehow been avoiding
[15:02:18] *nod* only tangentially related, but web access to the prometheus interface per-site is coming soon
[15:04:46] https://thanos.wikimedia.org/ got me what I needed for exploring.
[15:05:29] inevitable!
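As a rough illustration of the exploring ori mentions: Thanos Querier speaks the standard Prometheus HTTP query API, so a QPS-style query can be run against it directly. This is a sketch only; the metric and label names below are made-up placeholders, and thanos.wikimedia.org may sit behind access controls.

    import requests

    THANOS = "https://thanos.wikimedia.org/api/v1/query"

    # Hypothetical PromQL: per-second request rate over the last 5 minutes.
    # Substitute a real metric/label set taken from the Grafana dashboards.
    promql = 'sum(rate(http_requests_total{service="example"}[5m]))'

    resp = requests.get(THANOS, params={"query": promql}, timeout=10)
    resp.raise_for_status()
    for series in resp.json()["data"]["result"]:
        timestamp, value = series["value"]
        print(series["metric"], value)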
[15:05:50] * bd808 looks for the 1 timeline where he survives
[15:07:54] lolz
[15:13:53] <_joe_> ori: more in detail, services that use the node template all have prometheus metrics export baked in
[15:14:28] <_joe_> and they're standardized, so once your service is deployed to k8s, it takes a few clicks to have a grafana dashboard like
[15:14:53] <_joe_> https://grafana.wikimedia.org/d/5CmeRcnMz/mobileapps?orgId=1
[15:15:27] ak.osiaris documented the scraper config magic for pods recently too -- https://wikitech.wikimedia.org/wiki/Kubernetes/Metrics#Workload/Pod_metrics
[15:15:30] <_joe_> also the envoy that acts as a tls terminator/service proxy exports more metrics, also collected
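For context on "metrics export baked in": the template _joe_ refers to is the Node.js service template, which wires this up for you. Purely as an illustration of the pattern, here is a minimal Python sketch using the official prometheus_client library; the metric names and port are arbitrary, and the scraper side is configured as described on the wikitech page linked above.

    from prometheus_client import Counter, Histogram, start_http_server
    import random
    import time

    # Arbitrary example metrics; real services define their own.
    REQUESTS = Counter("app_requests_total", "Requests handled", ["method", "status"])
    LATENCY = Histogram("app_request_duration_seconds", "Request latency in seconds")

    def handle_request():
        """Stand-in for real request handling."""
        with LATENCY.time():
            time.sleep(random.uniform(0.01, 0.05))
        REQUESTS.labels(method="GET", status="200").inc()

    if __name__ == "__main__":
        start_http_server(9100)  # serves /metrics on :9100 for Prometheus to scrape
        while True:
            handle_request()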