[09:11:00] I found an interesting flaw in our backup metrics monitoring - stats for prometheus: https://grafana.wikimedia.org/d/413r2vbWk/bacula
[09:11:57] I send metrics there for the latest backup, as they are time-based metrics, so we can get a history of all backups produced
[09:12:48] the problem: full backups sometimes take so much time that, by the time they finish, an incremental backup is scheduled right after, shadowing the "last backup" metrics
[09:13:19] and because the incremental runs in less than 1 minute (the scraping interval), it hides all metrics for the full backup
[09:13:40] while I love prometheus, it is not the right model for non-time-based metrics
[09:13:58] so I will have to work around it by sending the latest metrics for each level of backup
[09:14:18] incrementals and fulls separately
[09:15:23] I'm not sure I see the issue - are you sending metrics timestamped with the start time of the backup, or the end time?
[09:16:03] scraping happens every minute
[09:16:22] so I send the state and data I have for the latest backup, finished or not
[09:17:20] Ah
[09:18:09] could you arrange to always send a metric when a backup finishes?
[09:18:10] so what happens is: scrape (nothing new), full finishes (with the final size, files backed up, errors if any, etc.), incremental runs in 0 seconds, scrape (sending data about the incremental only)
[09:18:51] because of the 1 minute granularity, what I sent was not enough
[09:19:10] and the stats for the full were more interesting than the incrementals
[09:19:47] Sure; I think arranging to send a metric when a backup finishes might be the answer? [I'm assuming this can be done with prometheus...]
[09:19:59] Emperor: that's not possible
[09:20:34] I mean, I have those metrics through other means (bacula client, logs, etc.)
[09:20:52] but our prometheus setup only allows for 1 minute metrics
[09:21:12] so I have to bend to that model
[09:21:50] in a push model I could send metrics only at the end, but prometheus is a pull-only model
[09:22:51] alternatively, I can send logs to opensearch and let grafana use only the logs, not prometheus
[09:23:11] logs are more flexible, and can do what you suggest
[09:24:37] not too worried, because in general monitoring options are plentiful; it's just that the nice graphs are not reliable until I fix how I send metrics
[09:32:24] there is a pushgateway - https://prometheus.io/docs/instrumenting/pushing/
[09:32:43] [which I guess would give you a separate set of metrics about completed jobs only, rather than about the running ones]
[09:34:56] that's setting up infra I don't want to maintain, it is just easier to split the metrics
[09:39:11] in any case, not a big deal, my point was more that prometheus is great, but doesn't fit well for all monitoring needs
[09:48:29] fair enough
[19:47:47] (SessionStoreOnNonDedicatedHost) firing: Sessionstore k8s pods are running on non-dedicated hosts - TODO - TODO - https://alerts.wikimedia.org/?q=alertname%3DSessionStoreOnNonDedicatedHost
[23:47:47] (SessionStoreOnNonDedicatedHost) firing: Sessionstore k8s pods are running on non-dedicated hosts - TODO - TODO - https://alerts.wikimedia.org/?q=alertname%3DSessionStoreOnNonDedicatedHost
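
[Editor's note] As an illustration of the workaround discussed above (exposing the latest stats per backup level so a fast incremental does not overwrite the preceding full's series between scrapes), here is a minimal sketch using the Python prometheus_client library. The metric names, labels, port, and record_job helper are hypothetical examples, not the actual Wikimedia Bacula exporter.

    from prometheus_client import Gauge, start_http_server
    import time

    # One time series per client and backup level ("full" / "incremental"),
    # so a later incremental only updates its own labels and the full's
    # last-run stats stay visible to the next scrape.
    last_backup_bytes = Gauge(
        'bacula_last_backup_bytes',
        'Bytes written by the most recent completed job, per client and level',
        ['client', 'level'])
    last_backup_end = Gauge(
        'bacula_last_backup_end_timestamp_seconds',
        'Unix time at which the most recent job of this level finished',
        ['client', 'level'])

    def record_job(client, level, job_bytes, end_ts):
        # Called when a job completes; the level label keeps fulls and
        # incrementals as separate series.
        last_backup_bytes.labels(client=client, level=level).set(job_bytes)
        last_backup_end.labels(client=client, level=level).set(end_ts)

    if __name__ == '__main__':
        start_http_server(9133)  # hypothetical exporter port
        # A real exporter would poll the Bacula director/catalog instead
        # of hard-coding a sample job.
        record_job('db1001', 'full', 120 * 1024**3, time.time())
        while True:
            time.sleep(60)

With per-level series, a Grafana panel can select level="full" for the history of full backups no matter how quickly incrementals follow. The pushgateway mentioned at 09:32 would be the push-model alternative (pushing job stats on completion), at the cost of running and maintaining extra infrastructure.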