[15:16:52] Is the prometheus-pushgateway out of service?
[15:26:32] good question... prometheus1005 seems to be acting up which is one of those hosts. investigating...
[15:29:00] Thanks! Can open a task if you like
[15:29:22] eoghan: please do. it looks like we're facing hardware issues
[15:29:43] Sure thing. Exactly what you want before a weekend!
[15:30:05] ^_^;
[15:33:01] https://phabricator.wikimedia.org/T362989
[15:38:54] Thanks cwhite!
[15:41:14] task is off to eqiad ops - working on failover now
[15:45:19] herron: you around?
[15:47:17] cwhite: yes
[15:47:35] just sent some patches your way for a look
[15:51:22] cwhite: +1d
[15:51:38] thanks - here we go
[16:08:13] eoghan: looking any better?
[17:41:06] by default, the blackbox exporter doesn't consider 4xx errors to be failing, does it?
[17:41:20] I mean "by default" as in how it's configured at WMF
[17:45:11] inflatador: it depends on the blackbox module you're using, but alerting on a 4xx response is most likely appropriate
[17:47:04] cwhite ACK, I'm seeing a lot of 403s returned by WDQS at the moment. Agreed that there's a problem, just wanted to make sure that would trigger alerts
[17:49:20] surveying the modules, it seems we have a few that expect http 400 responses: puppetmasters, urldownloaders, and squid
[18:03:39] inflatador: we also use it in some cases to expect a 302
[18:04:19] the parameter is "stats_matches" (of prometheus::blackbox::check::http)
[18:04:35] can be more than one value
[18:04:47] status_matches
[18:08:00] modules/prometheus/templates/blackbox_exporter/common.yml.erb for the list of "valid_status_codes"
[18:12:31] ah nice, thanks
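The question of whether a 4xx response fails a probe can be illustrated with a small Python sketch. It assumes the upstream blackbox exporter HTTP prober rule: when a module sets no `valid_status_codes`, only 2xx responses count as success, so a 403 from WDQS would fail the probe; when the list is set (as for the modules expecting 400 or 302), only the listed codes succeed and the 2xx default no longer applies. The helper name `probe_succeeded` is hypothetical, and the actual WMF modules in `modules/prometheus/templates/blackbox_exporter/common.yml.erb` may configure codes differently.

```python
from typing import Iterable, Optional


def probe_succeeded(status_code: int,
                    valid_status_codes: Optional[Iterable[int]] = None) -> bool:
    """Hypothetical helper approximating the blackbox exporter's status check.

    Assumes upstream behaviour: an explicit valid_status_codes list replaces
    the default, and with no list only 2xx responses are treated as success.
    """
    codes = list(valid_status_codes or [])
    if codes:
        # Explicit list (e.g. [400] or [302]): only these codes pass.
        return status_code in codes
    # No list configured: 2xx passes, so 3xx/4xx/5xx all fail the probe.
    return 200 <= status_code <= 299


# A 403 fails under the default, but passes a module that lists it explicitly.
print(probe_succeeded(403))         # False
print(probe_succeeded(403, [403]))  # True
```

Under this model, a module that expects a 302 would treat a plain 200 as a failure, which is why listing more than one value can be useful.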