[06:07:37] !log tools.masto-collab Updated from 0b1e1a7 to ae62c97
[06:07:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.masto-collab/SAL
[10:29:54] taavi: FYI https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/41
[10:53:34] godog:
[10:54:47] I just opened T335943 and would appreciate your quick eyeball on it for any quick hints as to where to start looking :-P
[10:54:48] T335943: prometheus-openstack-exporter: collected data shows regular null intervals - https://phabricator.wikimedia.org/T335943
[10:59:06] arturo: did you check the prometheus logs on cloudmetrics? if it's timing out on the scrape it should show up there I think
[10:59:15] arturo: yeah I think you got it right, also what dcaro said
[10:59:34] IIRC the default scrape interval is one minute, and prometheus gives up if the scrape takes longer than that I think
[11:00:25] accessing the web interface via ssh tunnel will show you the errors too, if any
[11:01:39] I can't seem to find any relevant logs
[11:02:59] from the default prometheus config it seems the timeout is 10s xd
[11:03:01] https://www.irccloud.com/pastebin/2upSFTBK/
[11:03:40] ok, trying the web console. By default scrape logs are not active on the journal/system side apparently
[11:04:31] ok
[11:05:35] dcaro: interesting, yeah I can't recall if that's the timeout e.g. to wait for an answer, or whether the reply is there and it just takes a long time to produce the metrics
[11:05:52] at any rate >60s for metrics is definitely suspicious given the scrape interval
[11:06:58] we seem to set it to 120s for openstack, and scrape every 15m
[11:07:16] https://www.irccloud.com/pastebin/R2epJR87/
[11:07:58] interesting
[11:08:10] I need to go to lunch now, will read later and take another look
[11:08:22] thanks!
[11:26:46] also the scrape is configured for http://openstack.eqiad1.wikimediacloud.org:12345/metrics
[11:34:08] oh, so if the scrape is every 15m the gap in the data may be the scrape interval itself
[11:36:05] ok, now discovering this https://gerrit.wikimedia.org/r/c/operations/puppet/+/802434
[11:55:05] that rings a bell yes, iirc openstack got (or was getting) too overloaded and unstable because of that exporter
[11:55:50] https://gerrit.wikimedia.org/r/c/operations/puppet/+/802956 the revert
[11:56:15] as far as I remember the revert helped, maybe a.ndrewbogott remembers more
[14:03:16] arturo: did the revert of the revert help?
[14:10:47] godog: let me check
[14:28:18] looks like it!
[14:28:28] it == it helped
[14:36:37] yes!
[14:38:04] \o/
[14:39:41] godog: thanks for the assistance, I'll keep an eye on openstack to see what the impact of the new scrape interval is
[14:40:20] sure np! yeah hopefully it isn't too bad/expensive
[15:11:20] !log metricsinfra rebooting metricsinfra-prometheus-2 as it was unresponsive
[15:11:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Metricsinfra/SAL
[22:49:27] !log removed fullstack-* puppet reports on puppetmaster-02.cloudinfra-codfw1dev.codfw1dev.wikimedia.cloud and cloud-puppetmaster-03.cloudinfra.eqiad.wmflabs to free up disk space
[22:49:28] andrewbogott: Unknown project "removed"
[22:49:40] !log admin removed fullstack-* puppet reports on puppetmaster-02.cloudinfra-codfw1dev.codfw1dev.wikimedia.cloud and cloud-puppetmaster-03.cloudinfra.eqiad.wmflabs to free up disk space
[22:49:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
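
A minimal sketch of how the remaining gaps from T335943 could be cross-checked against the 15m scrape interval discussed above. It assumes Prometheus is reached on localhost:9090 (for example through the ssh tunnel mentioned around 11:00) and uses a hypothetical `up{job="openstack"}` selector for the exporter job; neither is taken from the actual cloudmetrics setup. Querying a range vector with an instant query returns raw samples at their real scrape timestamps (no lookback interpolation), so any gap well beyond 15 minutes means a scrape was genuinely missed rather than just being the expected spacing between samples:

```python
#!/usr/bin/env python3
# Sketch: pull the raw samples for a metric over the last 24h and report any
# gap between consecutive scrapes that is longer than the configured interval.
import time

import requests

PROM_URL = "http://localhost:9090"   # assumption: Prometheus reached via ssh tunnel
METRIC = 'up{job="openstack"}'       # hypothetical selector for the exporter job
SCRAPE_INTERVAL = 15 * 60            # 15m, as mentioned in the discussion above

# An instant query over a range vector returns the raw samples with their real
# scrape timestamps, which is what we want for gap detection.
resp = requests.get(
    f"{PROM_URL}/api/v1/query",
    params={"query": f"{METRIC}[24h]", "time": time.time()},
    timeout=30,
)
resp.raise_for_status()

for series in resp.json()["data"]["result"]:
    timestamps = [ts for ts, _ in series["values"]]
    for prev, cur in zip(timestamps, timestamps[1:]):
        gap = cur - prev
        # Allow some slack: only flag a gap if a whole scrape went missing.
        if gap > SCRAPE_INTERVAL * 1.5:
            print(f"{series['metric']}: missing scrape, {gap:.0f}s gap ending at {cur}")
```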