[06:01:43] I am going to start repooling A1 codfw hosts
[06:09:19] Can someone check /usr/local/lib/nagios/plugins/check_mariadb? I think it's stopped working
[06:09:32] ~# /usr/local/lib/nagios/plugins/check_mariadb
[06:09:32] OK mariadb_running connected
[06:09:40] I don't think that is what is supposed to be shown
[06:09:46] I'd appreciate it if someone could take care of checking it
[06:26:02] I am going to start switching over the pc2 master
[07:14:01] All database hosts for today's B5 network operation are depooled
[07:53:13] right, the problem isn't check_mariadb but db-check-health
[07:53:44] arnaudb: I think this is part of the on-going thing? https://phabricator.wikimedia.org/P55649
[07:54:05] absolutely
[07:54:26] can you please take care of that? I really don't have any more space for things on my plate at the moment
[07:54:42] please retry your command
[07:54:55] ah, still the same
[07:54:55] it is broken :)
[07:54:56] on it
[07:55:12] that's possibly broken everywhere, or on lots of hosts
[07:55:18] yep
[07:55:31] I have been using db2136 for testing
[07:56:31] all good now, morning caveat: I upgraded the wrong package! :D
[07:56:55] great, I can see it now, and Icinga is also showing all green
[07:57:09] make sure the fix is spread everywhere!
[07:58:15] it's one of today's todos, yep!
[11:24:17] marostegui: okay if I do a schema change on the old s2 master?
[11:24:28] yep
[11:24:33] it is depooled
[11:24:37] awesome
[11:24:41] leave it depooled please
[11:24:51] as that host is part of the network operation that starts at 16:00 UTC
[11:25:03] sure
[11:25:59] actually, let me know when done, so I can upgrade its kernel
[11:26:44] sure
[11:51:52] marostegui: I'm done
[11:51:59] great, thanks
[11:53:21] shit, I just stopped the s2 master
[11:53:26] it is back now
[11:53:31] mariadb was stopped for a few seconds
[12:27:28] just a few seconds? We are not giving out stickers for a few seconds
[12:27:43] :)
[12:28:02] I have too many stickers already, no need for more!
[12:28:16] marostegui: I actually wanted to say some changes are happening regarding the RESTBase sunset, which is increasing reads on PC
[12:28:21] one just got deployed
[12:28:36] I switched over the pc2 master today too, right before s2
[12:28:41] https://phabricator.wikimedia.org/T344945#9488119
[12:31:03] I wasn't aware
[12:31:04] thanks
[12:32:05] writes shouldn't increase, which has usually been the pain point of PC
[16:45:30] urandom: restbase2013 seems unhappy for some reason
[16:45:49] I can ping it, but some alerts are firing for connection problems
[16:46:16] topranks: probably ok... that host has been decommissioned
[16:46:25] you mean cassandra alerts?
[16:46:27] ah ok
[16:46:33] alerts are like 'Expecting active but unit cassandra-a is inactive'
[16:47:07] oh... my bad, I had that under planned maintenance, but it looks like it expired
[16:47:20] 2014 too, but that downtime is still active, wth?
[16:48:04] no, I undowntimed everything in the rack a few mins ago after we confirmed connectivity was ok
[16:48:11] oh
[16:48:13] so I messed up, I think
[16:48:14] sorry
[16:48:19] no worries
[16:48:26] mystery solved!
[16:48:28] it didn't expire - I didn't consider that case
[16:48:30] sorry
[16:48:45] no no, it's fine
[16:49:00] it's a lot of plates to keep in the air at once
[16:49:09] yeah, I guess we just downtime again - can I leave that to you as you know what time is appropriate?
[16:49:16] I already did
[16:49:21] awesome - thanks!
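For context on the 16:46-16:49 exchange above, a minimal sketch of how one might confirm the unit state behind an alert like 'Expecting active but unit cassandra-a is inactive' and re-apply a downtime. The cookbook flags, duration and host query below are illustrative assumptions, not the exact invocation used that day.

    # Check the systemd unit named in the alert (run on the affected host):
    systemctl is-active cassandra-a          # prints "inactive" on the decommissioned node
    journalctl -u cassandra-a --since today --no-pager | tail -n 20

    # Re-apply a downtime so the known-inactive unit stops alerting.
    # Hypothetical invocation of the sre.hosts.downtime cookbook; flags,
    # duration and host query are assumptions for illustration only.
    sudo cookbook sre.hosts.downtime --hours 72 \
        -r "restbase2013 pending decommission" 'restbase2013*'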
[17:45:42] I am going to start repooling all the hosts involved in the network maintenance as everything looks stable
[17:47:25] marostegui: great, thanks for the help getting it across the line
[17:47:37] no worries
[17:48:11] the next set of moves starts Feb 6th, I'll drop a line about it
[17:49:24] I will switch back pc2 on Monday
[17:49:31] ok
[17:49:49] I need to deploy MW for that and there's the train now, so...
[17:49:56] urandom: fwiw I spoke to volans re downtime, next time I will set a more precise time when downtiming the rack and let it expire "naturally"
[17:50:08] which will leave any existing downtimes in place and avoid the issue I caused today
[17:50:17] topranks: ah, yeah, that would do it
[17:51:16] it's not ideal, but it's about the only option
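For the 17:45 repooling step, a minimal sketch of what pooling a host back with dbctl typically looks like; the instance name, pooling percentages and commit messages are placeholders and assumptions, not the exact commands run.

    # Bring a depooled instance back gradually (hypothetical instance name):
    dbctl instance db2136 pool -p 25
    dbctl config commit -m "Repool db2136 after network maintenance"
    # Once traffic and replication lag look healthy, go to full weight:
    dbctl instance db2136 pool -p 100
    dbctl config commit -m "db2136 fully repooled"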