[10:58:14] <topranks>	 godog: thanks for the review on the gnmic stuff!  
[10:58:44] <topranks>	 with the new changes I'm a lot more confident in everything than before, let's see how it goes 
[10:59:00] <topranks>	 can I ask for one more quick review?  
[10:59:01] <topranks>	 https://gerrit.wikimedia.org/r/c/operations/puppet/+/1114967
[11:01:36] <godog>	 topranks: sure you are welcome, I can only vote verified but anyways LGTM
[11:02:39] <topranks>	 ok, thanks!  
[11:02:39] <topranks>	 that's weird you can only vote verified 
[11:03:04] <godog>	 indeed, not sure what's going on
[11:38:47] <Emperor>	 see -sre
[14:20:07] <jinxer-wm>	 FIRING: [2x] ErrorBudgetBurn: logging - logstash-availability - https://wikitech.wikimedia.org/wiki/Monitoring/ErrorBudgetBurn   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[14:27:41] <jinxer-wm>	 FIRING: PrometheusRuleEvaluationFailures: Prometheus rule evaluation failures (instance titan1001:17902) - https://wikitech.wikimedia.org/wiki/Prometheus - https://grafana.wikimedia.org/d/GWvEXWDZk/prometheus-server?var-datasource=eqiad%20prometheus%2Fops - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRuleEvaluationFailures
[14:30:07] <jinxer-wm>	 FIRING: [2x] ErrorBudgetBurn: logging - logstash-availability - https://wikitech.wikimedia.org/wiki/Monitoring/ErrorBudgetBurn   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[14:40:07] <jinxer-wm>	 RESOLVED: [2x] ErrorBudgetBurn: logging - logstash-availability - https://wikitech.wikimedia.org/wiki/Monitoring/ErrorBudgetBurn   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[14:42:41] <jinxer-wm>	 RESOLVED: PrometheusRuleEvaluationFailures: Prometheus rule evaluation failures (instance titan1001:17902) - https://wikitech.wikimedia.org/wiki/Prometheus - https://grafana.wikimedia.org/d/GWvEXWDZk/prometheus-server?var-datasource=eqiad%20prometheus%2Fops - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRuleEvaluationFailures
[16:40:15] <lmata>	 hi dcaro ! was looking at grafana, and am curious if there is anything we )olly) can help with to facilitate the move to Prometheus for WMCS ceph dashboards? https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1 and https://grafana-rw.wikimedia.org/d/613dNf3Gz/wmcs-ceph-eqiad-performance?forceLogin=true&orgId=1
[16:40:40] <lmata>	 -messed up the parens :(
[16:41:17] <volans>	 lmata: FYI he's out for a few days ;)
[16:46:02] <lmata>	 ah
[16:46:06] <lmata>	 thanks volans!
[17:31:40] <dcaro>	 lmata: All that's left is getting the right metrics, which iirc have changed since the first migration. Ideally we would like to preserve the rack<->rack info more than the raw switch names, so it might be a bit repetitive if that info is not in the labels (I think it was not), but yep, I'm away until mid Feb, feel free to take a stab at it in the meantime
[18:23:05] <topranks>	 the interface description is in every metric, which has the remote device name, and the device names have the remote rack. 
[18:24:45] <topranks>	 Given we are talking about 6 interfaces between these racks in total, I don't think it's excessive to manually label the graphs if that is not suitable
[18:29:25] <topranks>	 seems the existing ones already have that, and are based on bits 
[18:30:05] <topranks>	 lmata: I'll see if I can have a stab at some of them 
[19:35:20] <topranks>	 I added a copy of the discards graph based on the new metrics to that dashboard now 
[19:35:33] <topranks>	 won't have time to look at the rest but hopefully it demonstrates how to do the same thing