[10:53:35] Hello, quick question. Do we have any existing or preferred method for generating alerts from arbitrary logfile content? For context: https://phabricator.wikimedia.org/T309649#8008480
[10:56:47] I can see that we have a `check_journal_pattern` but the log file I'm interested in can't be viewed with journalctl. I could just write a grep but I wonder if this is an issue that anyone else has faced already.
[11:21:44] btullis: thank you for providing context, I think we do have some examples in puppet already (looking for them) but before that, I'm wondering if you'd be best served by alerting on the backup image age instead?
[11:22:33] that way you are also catching the fetchimage service not running at all
[11:24:15] godog: Thanks, that's true but in this instance I think that the backup itself was created but the backup /source/ (`/srv/hadoop/name/current`) was stale.
[11:25:10] I think I might be able to get what I need by checking on the date mentioned in the file `/srv/hadoop/name/current/VERSION`
[11:25:31] btullis: yeah that wasn't clear, I meant the backup source indeed
[11:25:56] I'm looking for an example in puppet, pretty sure we have "export file age as a metric" already
[11:28:21] OK, thanks. I'll go with this approach then. Hopefully the VERSION file is kind of written atomically with a successful flush to disk of the fsImage.
[11:30:14] sigh, I was sure we had sth in puppet to export file mtime
[11:30:43] prometheus::node_file_flag is pretty close, though exports 0/1 if the file exists or not
[11:31:03] arguably it could/should export mtime when the file exists too
[11:35:54] Oh right, I wasn't aware that we could use timestamps like this in prometheus but it seems like a good idea.
[11:37:15] yeah! same way we're checking for cert expiration
[11:38:18] btullis: still can't find an example in puppet, I'm afraid there will be some work required either to get the existing "file based exporters" to DTRT or write a new one based on the existing ones
[11:39:08] personally I think node_file_flag could be adapted to export mtime too, a brand new one would work too of course
[11:39:54] No worries, thanks anyway. I think I'll just go with a simple `/usr/lib/nagios/plugins/check_file_age /srv/hadoop/name/current/VERSION` for now.
[11:41:18] ok!
[11:41:23] There's an additional gotcha that this only gets written on the standby server, so it'll fire when we do a role swap with the sre.hadoop.roll-restart-masters cookbook or similar.
[11:41:37] Anyway, thanks again for your help. 👍
[11:41:44] can't say that I endorse adding new icinga checks, but I see where you're coming from :)
[11:41:49] sure anytime! thanks for reaching out
[11:43:06] Ah well, if you put it like that, maybe I should tackle the exporter option :-)
[11:45:01] btullis: haha! I was being a little facetious, I understand you want to get a thing done and the simple solution will work (i.e. icinga) for now, so +1 to that
[11:46:37] need to step afk, brb
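
Note on the "export mtime" idea discussed above: a minimal sketch of what such a file-based exporter might look like, assuming it writes Prometheus text-format output to a file that something like the node_exporter textfile collector picks up. This is not the actual `prometheus::node_file_flag` implementation. The metric names `file_exists` and `file_mtime_seconds` and both function names are hypothetical, chosen only for illustration.

```python
import os
import tempfile


def render_metrics(path):
    """Return Prometheus text-format lines describing `path`.

    Metric names are hypothetical, not the ones node_file_flag uses.
    """
    # Escape backslashes and double quotes so the label value stays valid.
    label = path.replace("\\", "\\\\").replace('"', '\\"')
    try:
        mtime = os.stat(path).st_mtime
        exists = 1
    except FileNotFoundError:
        mtime = None
        exists = 0
    lines = [
        "# HELP file_exists 1 if the file exists, 0 otherwise.",
        "# TYPE file_exists gauge",
        f'file_exists{{path="{label}"}} {exists}',
    ]
    # Only export the timestamp when there is a file to stat.
    if mtime is not None:
        lines += [
            "# HELP file_mtime_seconds Last modification time as a Unix timestamp.",
            "# TYPE file_mtime_seconds gauge",
            f'file_mtime_seconds{{path="{label}"}} {mtime:.0f}',
        ]
    return "\n".join(lines) + "\n"


def write_atomically(text, dest):
    """Write to a temp file in the same directory, then rename it into
    place, so a scrape never reads a half-written file."""
    directory = os.path.dirname(os.path.abspath(dest))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    os.chmod(tmp, 0o644)  # mkstemp creates the file as 0600
    os.replace(tmp, dest)
```

With a metric like this, staleness could be alerted on with an expression along the lines of `time() - file_mtime_seconds{path="/srv/hadoop/name/current/VERSION"} > 7200`, where the two-hour threshold is only a placeholder. The role-swap gotcha mentioned at 11:41:23 would still apply, since the file is only written on the standby server.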