[10:47:43] I am going to do a switchover on es1, es2 and es3, it should be a noop, but just mentioning here in case
[10:47:47] It will be done in around 1h
[10:47:53] Once I finish the warm up
[14:54:38] jynus: thank you, but I think no need to rush it 5 minutes before the actual start
[14:54:48] unless it's literally one command for you
[14:55:03] yes, I just want to make sure all people are in agreement
[14:55:13] e.g. not sure if manuel is still around
[14:55:25] is there a channel for the maintenance?
[14:56:13] we may also want to downtime some dbs for probable lag
[14:56:36] I'm around yep
[14:56:43] sort of
[14:56:58] I can take care, just want to make sure not 2 people try to do the same at the same time
[14:57:31] my plan was to stop the multi-instance replication when going read only/just before maintenance
[14:57:49] I am not sure which channel but I will randomly say #wikimedia-releng
[14:57:53] *the replication for m3 on the multi-instance hosts
[14:57:54] I am about to join a meetup for that
[14:58:52] already forwarding the message
[14:59:07] jynus: sure
[14:59:12] I'm here if something happens
[14:59:36] I am downtiming m3 lag for 12 hours too
[15:01:51] we are using -operations for the maintenance
[15:01:55] just joined the meetup
[15:02:00] forwarding the plan
[15:02:35] yes, that sounds good to us. so we can tell you when to go readonly?
[15:03:17] as soon as you go read only, I will run 2 commands (5 seconds)
[15:03:26] and you can continue as normal
[15:04:19] what's the ticket?
[15:35:47] I stopped replication on db1117, db2078
[15:37:13] mutante: I still go AFK now, as I guess things will take a bit
[15:37:32] feel free to call me if I don't answer- I will be around
[15:37:46] I am still around for another 30 minutes or so
[15:37:54] I may check later if things are finished
[15:38:07] jynus: ok, thank you
[15:38:13] and restart replication- otherwise manuel will do it on his morning
[15:38:18] it's right in the middle of it
[15:38:18] :)
[15:38:20] thanks jynus
[15:38:42] if possible send verbosely on phab or on email if we can or should not restart it
[15:38:44] they are checking "storage status" right now
[15:39:11] if we are not around
[15:40:06] (I think it would be ok to leave it disabled for a while, in case there is some issue not noticed for some hours)
[15:40:15] e.g. on codfw
[15:41:45] ok! ack
[15:41:59] they are in the middle of patching / .sql files
[15:42:06] will let you know asap
[15:42:34] Applying patch ... to host m3-master... a bunch of them. schema change
[15:42:57] eh, maybe not that. but patches
[15:47:50] (and sorry to be so verbose, but I like for everybody to be 100% aware of current status), rather than to have misunderstandings with too many hands at the same time 0:-)
[16:30:06] marostegui: so. did we miss out? or still here
[16:30:16] we just said we'd re-enable it again at this point
[16:30:51] but no harm either to keep it off..and we realize you are out now.. so don't worry about it
[16:33:51] I'm afk
[16:33:58] but I can enable it later
[16:35:34] ok, thank you. we are fine with either
[16:47:26] I've started db1117 replication, should be recovering lag now
[16:47:48] I will extend db2078 for 24 hours and we can decide when to reenable
[16:47:53] *downtime
[16:52:23] No rush on restarting it, but up to anyone involved to decide
[16:52:43] (I think it is not bad to leave it off for the night, just in case, though)
[16:59:06] thank you jynus !
ACK :)
[16:59:23] that sounds perfect to me
[16:59:43] db1117 is caught up
[16:59:56] cool
[17:02:20] jynus: yea, forwarded. we like it:) good night then
[17:02:25] thanks again
[20:02:43] there seems to be a minor error: https://phabricator.wikimedia.org/T310742 probably an easy fix, but to give more data points re: restarting replication on codfw
[21:03:48] I won't touch it then jynus and mutante until you give green light