[10:35:03] Hello! I have a problem with downtiming a host. It fails with "spicerack.icinga.IcingaStatusParseError: Unable to parse Icinga status".
[10:35:03] I looked at the command which is executed on the icinga host and this fails with
[10:35:03] jelto@alert1002:~$ sudo /usr/local/bin/icinga-status wikikube-worker2084.codfw.wmnet
[10:35:03] IcingaStatusParseError: corrupt status.dat: Failed to find downtime object
[10:35:03] Is there anything I can do about the corrupt status.dat file?
[10:36:36] jelto: is this https://gerrit.wikimedia.org/r/c/operations/puppet/+/1104967 ? cc volans
[10:37:05] I'm assuming so
[10:37:45] yes, I mentioned it to Riccardo earlier and that fixes it
[10:38:14] oh nice, yes that sounds like the issue I'm facing as well. Thanks for the quick help
[10:38:27] so far, the check has assumed that there's always one active downtime, but surprisingly none of our 2300 hosts and 100ish services currently has one
[10:39:35] interesting
[10:45:46] btw is it normal we don't have any downtime at all?
[10:45:47] seems weird
[10:55:29] I don't know if it is normal tbh
[10:56:05] btw I've run puppet on alert1002, all good
[10:56:21] thank you volans
[11:02:59] my cookbooks are unstuck as well, thank you!
[21:56:30] Hey 0lly! I've got an exporter running on 9194 and outputting metrics, but nothing in prometheus or thanos (ref https://phabricator.wikimedia.org/T374916#10411079 ). Based on the output of `/srv/prometheus/ops/targets/blazegraph_eqiad.yaml` on prometheus1006, it does look like the targets are configured correctly. Any suggestions where I should look next?
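
Note on the 10:38:27 message: the failure mode described there (a parser that treats "no downtime object found" as a corrupt file) can be illustrated with a minimal sketch. This is not the actual `icinga-status` script or the Gerrit change linked above. It assumes the Nagios-style `status.dat` layout of `blocktype {` / `key=value` / `}` blocks, and the function names `parse_blocks` and `downtimes_for` as well as the host names in `SAMPLE` are hypothetical. The point is only that zero downtime blocks is a valid state and should yield an empty list rather than a parse error.

```python
# Hypothetical sample in the assumed status.dat layout; host names are made up.
SAMPLE = """hoststatus {
\thost_name=example1001
\tcurrent_state=0
\t}

hostdowntime {
\thost_name=example1002
\tcomment=maintenance
\t}
"""


def parse_blocks(text):
    """Return a list of (block_type, fields) tuples from status.dat-like text."""
    blocks = []
    current = None  # (block_type, fields) while inside a block, else None
    for raw in text.splitlines():
        line = raw.strip()
        if current is None:
            # Only look for a block header when we are outside any block.
            if line.endswith(" {"):
                current = (line[:-2], {})
        elif line == "}":
            blocks.append(current)
            current = None
        elif "=" in line:
            key, _, value = line.partition("=")
            current[1][key] = value
    return blocks


def downtimes_for(text, host_name):
    """Return downtime blocks for host_name; an empty list is a valid result."""
    return [
        fields
        for block_type, fields in parse_blocks(text)
        if block_type in ("hostdowntime", "servicedowntime")
        and fields.get("host_name") == host_name
    ]
```

With this shape, a caller can tell "file parsed, host has no downtime" (empty list) apart from a file that genuinely could not be parsed, which is the distinction the original check was missing according to the discussion.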