[09:13:41] godog: have you been working on ms-be2068? It's sending a lot of cron-spam
[09:16:56] Emperor: mmhh I fixed its provisioning earlier today, I'll take a look
[09:35:16] Emperor: not sure exactly what was up but I rebooted all the newly provisioned hosts, LGTM now
[09:35:20] thanks for the heads up
[09:35:51] Thanks for fixing :)
[11:23:25] I have a puzzle for any DBAs/database gurus around: we have a mysql database (the openstack one) with galera running in multimaster mode (currently 2 nodes), though only one of them is being used. For some days now the database has been misbehaving from time to time: it seems to stop writing things to disk (not completely, just slowly), and ends up choking the connections
[11:23:43] see https://grafana-rw.wikimedia.org/d/tN1aK6MGk/cloudcontrol-mysql?orgId=1&from=now-2d&to=now&var-server=cloudcontrol1005&var-port=9104&forceLogin=true for graphs (not all data is there, sorry)
[11:23:57] any ideas or things to investigate more in depth?
[11:26:51] dcaro: from a pure prometheus monitoring perspective, AFAICS those graphs have huge gaps (hours) so they won't be very useful for debugging
[11:30:01] yes, I know, I'm trying to solve that in the future. Those gaps were times when everything was "calm"; the three big chunks right before the gaps are data from when the machine started struggling. You can see there things like the innodb IOs going down, the threads spiking, and the connection problems spiking too
[11:30:29] I'm thinking that the innodb dirty pages might be related too (can't flush to disk), but the disk seems kind of idle
[12:03:47] upgrading s8 codfw primary to bullseye, stay tuned