[06:04:48] FIRING: PuppetFailure: Puppet has failed on logging-hd2005:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[06:14:48] RESOLVED: PuppetFailure: Puppet has failed on logging-hd2005:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[10:50:46] hi folks, FYI while looking into T411470 I realized I might as well fix T399807 so that's what https://gerrit.wikimedia.org/r/q/topic:%22bug/T399807%22 is about
[10:50:47] T411470: Page on cloudweb/horizon down - https://phabricator.wikimedia.org/T411470
[10:50:47] T399807: Allow team customization for service::catalog probes - https://phabricator.wikimedia.org/T399807
[12:16:15] Hi Filippo, it's very nice to see you here again. :3
[12:16:25] The patches LGTM, thank you!!
[12:36:32] thank you denisse <3 it is good to fix stuff I wanted to fix
[14:56:25] FIRING: SystemdUnitFailed: statograph_post.service on alert1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:03:19] FIRING: ThanosQueryInstantLatencyHigh: Thanos Query has high latency for queries. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/af36c91291a603f1d9fbdabdd127ac4a/thanos-query - https://alerts.wikimedia.org/?q=alertname%3DThanosQueryInstantLatencyHigh
[15:08:12] RESOLVED: [2x] ThanosQueryInstantLatencyHigh: Thanos Query has high latency for queries. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://alerts.wikimedia.org/?q=alertname%3DThanosQueryInstantLatencyHigh
[17:35:41] FIRING: [2x] PrometheusRuleEvaluationFailures: Prometheus rule evaluation failures (instance titan2001:17902) - https://wikitech.wikimedia.org/wiki/Prometheus - https://grafana.wikimedia.org/d/GWvEXWDZk/prometheus-server?var-datasource=codfw%20prometheus%2Fops - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRuleEvaluationFailures
[17:40:41] RESOLVED: [2x] PrometheusRuleEvaluationFailures: Prometheus rule evaluation failures (instance titan2001:17902) - https://wikitech.wikimedia.org/wiki/Prometheus - https://grafana.wikimedia.org/d/GWvEXWDZk/prometheus-server?var-datasource=codfw%20prometheus%2Fops - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRuleEvaluationFailures