[16:01:18] Hello team, someone asked on Slack if the WMF site was down (it's not, don't worry).
[16:01:19] From what I understand the site is managed by Automattic, therefore a downtime would not be visible in our Grafana instance.
[16:01:19] Do we have a place where we can see if https://wikimediafoundation.org is down?
[16:07:16] denisse: yes, we monitor wikimediafoundation.org from within the infra and reaching out to the internet, via "watchrat"
[16:07:38] which is an internal name for the "thing" that replaced ... wait for it ... watchmouse
[16:07:44] https://grafana.wikimedia.org/d/GYciEga7z/watchrat is the dashboard
[16:13:49] godog: Thanks a lot Filippo, I'll pass the word on the Slack thread.
[16:14:04] that dashboard shows no data to me :)
[16:14:18] Btw, watchmouse and watchrat are funny names. :)
[16:14:28] denisse: sure np!
[16:14:34] volans: I think it only shows the non-successful requests.
[16:14:41] yeah I think so too
[16:15:36] sure, but how does one know that it's all good vs we're not monitoring it anymore? :)
[16:15:44] But I do think that can be a little bit confusing. It may be a good idea to also show the successful requests, as that would not only make it less confusing but would also help us know if the daemon is working correctly.
[16:16:09] volans: currently it's split with http response >400 on the left and blackbox probe errors on the right
[16:16:59] we could show all the healthy probes yeah
[16:17:35] annnnd nerd snipe successful
[16:42:30] denisse: I've also considered asking Automattic about adding our same NEL response headers to get data that way, but it hasn't been high-priority
[16:42:38] https://wikitech.wikimedia.org/wiki/Network_Error_Logging
[17:02:52] hopefully the watchrat dash is a bit clearer about what's being checked now