[10:58:14] godog: thanks for the review on the gnmic stuff!
[10:58:44] with the new changes I'm a lot more confident in everything than before, let's see how it goes
[10:59:00] can I ask for one more quick review?
[10:59:01] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1114967
[11:01:36] topranks: sure you are welcome, I can only vote verified but anyways LGTM
[11:02:39] ok, thanks!
[11:02:39] that's weird you can only vote verified
[11:03:04] indeed, not sure what's going on
[11:38:47] see -sre
[14:20:07] FIRING: [2x] ErrorBudgetBurn: logging - logstash-availability - https://wikitech.wikimedia.org/wiki/Monitoring/ErrorBudgetBurn - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[14:27:41] FIRING: PrometheusRuleEvaluationFailures: Prometheus rule evaluation failures (instance titan1001:17902) - https://wikitech.wikimedia.org/wiki/Prometheus - https://grafana.wikimedia.org/d/GWvEXWDZk/prometheus-server?var-datasource=eqiad%20prometheus%2Fops - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRuleEvaluationFailures
[14:30:07] FIRING: [2x] ErrorBudgetBurn: logging - logstash-availability - https://wikitech.wikimedia.org/wiki/Monitoring/ErrorBudgetBurn - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[14:40:07] RESOLVED: [2x] ErrorBudgetBurn: logging - logstash-availability - https://wikitech.wikimedia.org/wiki/Monitoring/ErrorBudgetBurn - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[14:42:41] RESOLVED: PrometheusRuleEvaluationFailures: Prometheus rule evaluation failures (instance titan1001:17902) - https://wikitech.wikimedia.org/wiki/Prometheus - https://grafana.wikimedia.org/d/GWvEXWDZk/prometheus-server?var-datasource=eqiad%20prometheus%2Fops - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRuleEvaluationFailures
[16:40:15] hi dcaro ! was looking at grafana, and am curious if there is anything we )olly) can help with to facilitate the move to Prometheus for WMCS ceph dashboards? https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1 and https://grafana-rw.wikimedia.org/d/613dNf3Gz/wmcs-ceph-eqiad-performance?forceLogin=true&orgId=1
[16:40:40] -messed up the parens :(
[16:41:17] lmata: FYI he's out for few days ;)
[16:46:02] ah
[16:46:06] thanks volans!
[17:31:40] lmata: All that's left is getting the right metrics, that iirc have changed since the first migration. Ideally we would like to preserve the rack<->rack info more than the raw switch names, so might be a bit repetitive if that info is not in the labels (I think it was not), but yep, I'm away until mid Feb, feel free to take a stab at it in the meantime
[18:23:05] the interface description is in every metric, which has the remote device name, and the device names have the remote rack.
[18:24:45] Given we are talking about 6 interfaces between these racks in total, I don't think it's excessive to manually label the graphs if that is not suitable
[18:29:25] seems the existing ones already have that, and are based on bits
[18:30:05] lmata: I'll see if I can have a stab at some of them
[19:35:20] I added a copy of the discards graph based on the new metrics to that dashboard now
[19:35:33] won't have time to look at the rest but hopefully it demonstrates how to do the same thing
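As a rough illustration of the labelling idea discussed above (recovering the remote rack from the device name carried in each interface description, so graphs can show rack<->rack rather than raw switch names), here is a minimal Python sketch. The description format `Core: cloudsw1-d5-eqiad:et-0/0/54` and the `<role><n>-<rack>-<site>` device-name pattern are hypothetical assumptions chosen for illustration, not confirmed conventions; with only six inter-rack interfaces, a hand-written mapping may well be simpler in practice.

```python
import re
from typing import Optional

# ASSUMPTION: device names follow a hypothetical "<role><n>-<rack>-<site>"
# pattern, e.g. "cloudsw1-c8-eqiad". Group 2 captures the rack ("c8").
DEVICE_RE = re.compile(r"\b([a-z]+\d+)-([a-z]\d+)-([a-z]+\d*)\b")


def remote_rack(description: str) -> Optional[str]:
    """Return the rack of the first device name found in an interface description."""
    match = DEVICE_RE.search(description)
    return match.group(2) if match else None


def rack_pair_label(local_device: str, description: str) -> Optional[str]:
    """Build a 'local<->remote' rack label suitable for a graph legend.

    Returns None when either rack cannot be parsed, so the caller can fall
    back to a manually maintained label.
    """
    local = DEVICE_RE.search(local_device)
    remote = remote_rack(description)
    if local is None or remote is None:
        return None
    return f"{local.group(2)}<->{remote}"


if __name__ == "__main__":
    # Hypothetical example values, not real interface data.
    print(rack_pair_label("cloudsw1-c8-eqiad", "Core: cloudsw1-d5-eqiad:et-0/0/54"))
```

Returning None instead of guessing keeps the manual-labelling fallback mentioned in the log available for any interface whose description does not match the assumed pattern.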