[09:53:24] <urbanecm>	 thanks chicocvenancio 
[15:18:00] <urbanecm>	 !log tools.phabbot Manually run `tools.phabbot@tools-sgebastion-07:~/phabbot$ ./new_wikis_handler.sh` to re-run the new wikis bot
[15:18:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.phabbot/SAL
[15:18:29] <urbanecm>	 !log tools.phabbot tools.phabbot@tools-sgebastion-07:~$ rm ~/*.err && rm ~/*.out
[15:18:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.phabbot/SAL
[15:58:45] <Spookreeeno>	 Amir1, legoktm: codesearch just crashed
[15:58:55] <Spookreeeno>	 16:54:04 <+wmcs-alerts> (WidespreadInstanceDown) firing: Widespread instances down in project codesearch   - https://prometheus-alerts.wmcloud.org
[15:59:19] <legoktm>	 O.o
[15:59:53] <Spookreeeno>	 widespread seems to go off even if only 1 instance
[16:00:01] <Spookreeeno>	 but it's got issues
[16:01:03] <legoktm>	 I'll be at my laptop in like an hour
[16:01:39] <Spookreeeno>	 K
[16:06:59] <majavah>	 !log codesearch hard reboot codesearch8 after OOM crash
[16:07:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Codesearch/SAL
[16:08:36] <majavah>	 legoktm: ^ I think that fixed it at least for now, console was full of "Out of memory: Kill process 23034 (houndd) score 711 or sacrifice child" or similar
[16:09:34] <legoktm>	 ty, it's probably time we move to a bigger instance 
[16:12:04] <AntiComposite>	 all y'alls writing too much code
[16:12:37] <majavah>	 happy to see https://wikitech.wikimedia.org/wiki/Nova_Resource:Metricsinfra useful, btw
[16:13:54] <majavah>	 if you want, I can send the metricsinfra alerts somewhere other than -cloud-feed, and if codesearch exposes prometheus metrics I can also make it alert on problems other than "an instance is down" or "an instance is failing to run puppet"
[16:30:50] <Spookreeeno>	 majavah: why does 1/1 instances trigger widespread alert
[16:31:45] <majavah>	 because it's some percentage threshold, and the rule was designed for projects like tools with many more instances
[16:34:55] <Spookreeeno>	 majavah: would it be easy to ignore it if instances == 1
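A minimal sketch of the behaviour discussed above: a percentage-threshold rule fires for a project with a single instance as soon as that instance is down, and a minimum-instance guard is one way to suppress that. The function name, the 50% threshold, and the `min_instances` parameter are assumptions for illustration; the log does not show the actual metricsinfra rule or its values.

```python
def widespread_down(up_flags, threshold=0.5, min_instances=2):
    """Return True if the share of down instances reaches the threshold.

    up_flags      -- one boolean per instance, True if the instance is up
    threshold     -- fraction of down instances that triggers the alert
                     (hypothetical value, not taken from the real rule)
    min_instances -- projects with fewer instances never trigger
                     (hypothetical guard, as suggested in the chat)
    """
    total = len(up_flags)
    if total < min_instances:
        # Too few instances for "widespread" to be meaningful.
        return False
    down = sum(1 for up in up_flags if not up)
    return down / total >= threshold


# Without the guard, one down instance out of one is 100% down and fires.
print(widespread_down([False], min_instances=1))
# With the guard, the single-instance project is ignored.
print(widespread_down([False]))
```

A single-instance project would then still need a separate per-instance "down" alert, since the guard only silences the widespread one.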
[16:37:23] <legoktm>	 majavah: is there an associated grafana instance or just for alerts?
[16:38:05] <majavah>	 legoktm: it's available in https://grafana-labs.wikimedia.org too
[16:38:54] <majavah>	 it's the "metricsinfra prometheus"
[16:45:30] <Spookreeeno>	 legoktm: if I can read openstack, there's enough quota left for a memory increase
[16:46:22] <Spookreeeno>	 Don't know if it can be changed live though
[17:31:52] <legoktm>	 I filed https://github.com/hound-search/hound/issues/410 upstream for prometheus metrics, we can probably add some of our own in though
[17:42:49] <legoktm>	 majavah: can this be used for Toolforge tools too?
[17:59:25] <majavah>	 legoktm: not really, metricsinfra is currently pretty much designed around cloud vps projects and I'm not expecting to change that any time soon
[18:00:19] <legoktm>	 Hmm, okay
[18:00:55] <legoktm>	 I've been wondering what if we just set up Prometheus/Grafana in a Toolforge tool itself
[18:02:03] <majavah>	 a toolforge tool prometheus cluster would probably be integrated with kubernetes, and I'd rather not mix it on the same prometheus instance that metricsinfra uses for cloud vps instances
[20:04:18] <legoktm>	 majavah: ok, I added https://libraryupgrader2.wmcloud.org/metrics - how do I go about getting it to alert if libup_runs doesn't increase within 24h?
[20:09:35] * legoktm is not going to try https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Monitoring#Adding_new_projects by myself
[20:43:21] <majavah>	 legoktm: I configured prometheus to scrape that, let's look at alerting tomorrow when it has stored some data and I've slept
[20:44:01] <majavah>	 https://prometheus.wmcloud.org/cloud/graph?g0.range_input=1h&g0.expr=libup_runs&g0.tab=0
[20:49:06] <legoktm>	 majavah: sounds good, thanks!
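A minimal sketch of the check asked about above: flag `libup_runs` as stale when the counter has not increased within the last 24 hours. This is plain Python over already-scraped samples, not the Prometheus rule that was eventually configured (the log does not show it). The function name and the `(timestamp, value)` sample format are assumptions for illustration.

```python
DAY_SECONDS = 24 * 3600


def counter_is_stale(samples, now, window=DAY_SECONDS):
    """Return True if a counter did not increase inside the window.

    samples -- iterable of (unix_timestamp, counter_value) pairs
    now     -- unix timestamp marking the end of the window
    window  -- length of the window in seconds
    """
    recent = sorted(
        (ts, value) for ts, value in samples if now - window <= ts <= now
    )
    if len(recent) < 2:
        # Not enough stored data to judge yet, so do not alert.
        return False
    values = [value for _, value in recent]
    # Any rise between consecutive samples counts as an increase;
    # a drop (counter reset) on its own does not.
    increased = any(b > a for a, b in zip(values, values[1:]))
    return not increased


# Three hourly samples with no change: stale.
print(counter_is_stale([(0, 5), (3600, 5), (7200, 5)], now=7200))
# The counter rose within the window: not stale.
print(counter_is_stale([(0, 5), (3600, 6)], now=3600))
```

Returning False when fewer than two samples exist matches the plan to wait until some data has been stored before setting up alerting.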