[07:28:06] Amir1: reads to revision_actor_temp are stopped everywhere too, right?
[07:28:28] I want to rename it on one s1 and one s8 host
[07:31:25] marostegui: yes, it should have been stopped for a week now, but there might be something in mw not respecting the config (highly unlikely but not impossible)
[07:38:27] ah cool
[07:38:32] I will keep you posted
[07:38:56] will do it today and will give it till Monday
[07:48:10] SGTM
[07:56:50] marostegui: I'm revoking alter from wikiadmin everywhere now
[08:04:07] oh sweet
[08:04:41] with that I think we can resolve that task
[08:12:57] done now, except es. We have a tiny problem there. It's on 10.64 and 10.192 instead of a blanket 10.%. I know the former is better, but consistency :D
[08:18:43] aah and it's not all of es either, only es4 and es5
[08:19:45] * Amir1 dies a little inside
[08:35:45] I'll pretend that I didn't see this mess and just revoked alter from 10.192.% and 10.64.% there
[08:36:43] (especially since 10.% is not there at all, you need to create the user and grant every right needed, like on sys and p_s, which gets funnier because es3 for example doesn't have grants on p_s but es4 does)
[08:38:11] Amir1: Not sure I get what you mean
[08:38:21] Amir1: You mean that there's still wikiadmin@10.% with alter?
[08:38:57] no, I'm mostly talking about es4 and es3 having different grants and users
[08:39:06] ah yeah..
[08:39:10] I revoked it everywhere regardless of target
[08:40:00] coool
[08:40:27] I'm rerunning omg.py to update the report
[08:40:39] excellent
[08:50:08] marostegui: what should we do about T138208? Shall we resolve it?
[08:50:08] T138208: Connections to all db servers for wikidata as wikiadmin from snapshot, terbium - https://phabricator.wikimedia.org/T138208
[08:50:48] Amir1: we can close it and reopen it if we see it again
[08:51:00] SGTM
[08:58:25] so es5 just broke the assumption that "backups will take less than 24h to run": https://github.com/wikimedia/operations-software-wmfbackups/blob/7be71908bc7325ca9b5060cdcb657aa1c6b137aa/wmfbackups/BackupStatistics.py#L84
[08:59:29] I wonder if I should put a week there, or just drop the time constraint entirely
[08:59:31] time to make es6!! :P
[09:17:37] jynus: maybe 5 days? (like a working week)
[09:17:52] marostegui: too late :-)
[09:18:07] xdddd
[09:18:40] I ended up going with 7 days - I calculated that the longest es backup should take 3-4 days + some buffer days for "time until a human notices an error"
[09:18:54] yeah, sounds good!
[09:20:31] this was only for the backup statistics, so it was not a big deal (the backup completed fine; only the backup statistics failed to be gathered)
[09:42:41] ms-be2054 reimaged OK, ms-be2055 is playing "trash the filesystem in the installer"
[09:52:16] are there any scheduled jobs for DBs or backups which would prohibit a reboot of cumin2002 later today?
[09:53:08] none for me
[11:09:27] sigh. ms-be2055 is on its 6th attempt to get the SSDs to appear as a/b
[11:17:09] 8th..
[11:26:48] :(
[11:27:44] Emperor: a-b testing?
[11:45:11] FYI (dbas): T308120, although my guess is there are no short-term plans to embrace ipv6 for dbs
[11:45:12] T308120: Better support of ipv6 for transfer.py file transfer tool - https://phabricator.wikimedia.org/T308120
[11:46:13] jynus: 👍
[11:46:47] the fix will be easy - just opening the firewall on both stacks
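A minimal sketch of the per-pattern revocation discussed above (the wikiadmin accounts keyed on 10.%, 10.64.%, and 10.192.%). The connection setup, helper name, and the global *.* grant scope are illustrative assumptions, not the actual tooling that was run:

```python
# Hedged sketch: revoke ALTER from every wikiadmin@<pattern> that exists on a
# given host, tolerating the es4/es5 inconsistency where only the /16 patterns
# exist. Connection details and the *.* scope are assumptions; a REVOKE will
# error if the privilege was granted at a different level.
import pymysql

HOST_PATTERNS = ["10.%", "10.64.%", "10.192.%"]

def revoke_alter(db_host: str) -> None:
    conn = pymysql.connect(host=db_host, read_default_file="~/.my.cnf")
    try:
        with conn.cursor() as cur:
            for pattern in HOST_PATTERNS:
                # Only touch user@host pairs that actually exist on this host
                # (es3 and es4 have different users and grants).
                cur.execute(
                    "SELECT 1 FROM mysql.user WHERE User='wikiadmin' AND Host=%s",
                    (pattern,),
                )
                if cur.fetchone():
                    cur.execute("REVOKE ALTER ON *.* FROM 'wikiadmin'@%s", (pattern,))
    finally:
        conn.close()
```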
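A minimal sketch of the relaxed freshness window that replaced the "less than 24h" assumption in BackupStatistics.py above; the constant and function names are illustrative, not the actual wmfbackups code:

```python
# Hedged sketch: treat a backup as "current" for statistics-gathering if it
# started within the last 7 days (longest es backup takes 3-4 days, plus
# buffer for a human to notice an error), instead of within 24 hours.
from datetime import datetime, timedelta, timezone

MAX_BACKUP_AGE = timedelta(days=7)  # was effectively 1 day before es5 overran it

def is_current_backup(start_time: datetime) -> bool:
    """Return True if a backup started at start_time (an aware UTC datetime)
    is recent enough for its statistics to be gathered."""
    return datetime.now(timezone.utc) - start_time <= MAX_BACKUP_AGE
```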
[11:55:28] cumin2002 has been rebooted; now, what would be a good time with no scheduled DB/backup activity to reboot cumin1001?
[11:56:12] moritzm: same answer for me, but someone else should comment on potential ongoing db maintenance
[11:59:37] moritzm: I don't have anything for today but might start schema changes next week
[11:59:57] moritzm: let me check my stuff on cumin1001
[12:00:32] moritzm: sorry, currently mid-reimage on cumin1001, which is stuck on me trying to get ms-be2055 to boot up with its drives in enough of a sensible state that puppet will not fall over in a heap
[12:01:25] 11th reboot in progress
[12:01:36] :(
[12:02:02] ah, true, ongoing reimages would be affected, too
[12:04:31] the 11th reboot was the charm! puppet has run OK, but the playbook is going to wait for ~10m at this point before noticing
[12:04:34] yeah, I wouldn't expect this to be possible today, a lot of people have screens etc. there
[12:05:19] so my plan would rather be to pick a time for next week and send a heads-up, so if there's a window next Wednesday e.g. that works for everyone here, then I'd propose that one on the ops mailing list
[12:05:55] +1 for me
[12:06:10] moritzm: I'm on leave Thu/Fri/Mon so any of those is fine with me ;-)
[12:06:56] Next Wed is fine too, as long as I know (and remember to move to cumin200x for reimages)
[12:07:27] * kormat mourns all the state she just threw away
[12:07:45] ok, tentatively I'd pick next Wed 7:00 UTC, then
[12:08:09] I'll wait a little for further comments and otherwise send a mail later
[12:08:16] moritzm: 👍
[12:08:30] weds is a good day, from a DBA perspective.
[12:08:37] no backups or primary switchovers
[12:10:39] 07:00 is early enough that I'm unlikely to be awake and stupidly starting a reimage because I forgot, too...
[12:11:20] Emperor: besides, it's not like rebooting a cumin host in the middle is likely to make the reinstall substantially _less_ likely to succeed.
[12:12:21] xD
[12:12:38] haha :-)
[12:14:52] 😿
[12:26:54] jynus: At some point (not today, not next week) I would like to migrate db2078 (misc backups) to 10.6 to make sure everything keeps working fine with logical backups (which I think will be fine)
[12:27:04] Would you be ok with that migration?
[12:27:46] sure
[12:28:16] let's sync on the day when it's near, but any day other than tuesday should be ok
[12:28:23] sounds good
[12:28:40] I believe m2 takes longer to back up - up to midday, due to otrs
[12:29:56] do you foresee large changes in file format (for logical, or later for xtrabackup)?
[12:30:16] No, I think it will all be fine in terms of 10.4 -> 10.6 compatibility
[12:30:52] I think we should add an "in theory" there, just to be sure :-D
[12:32:18] heh
[12:33:31] there is one thing I also need to talk to you about regarding dbs, but it would be for next Q
[12:34:20] I think with the new space we will free up some space for database backups, so I will ask you the best way to reuse it (more retention of what? more frequent backups?)
[12:35:02] but I will wait first to confirm how much space I will have available
[12:35:41] sounds good, yep
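A hedged smoke test for the planned db2078 move to 10.6 discussed above: dump one small schema with mysqldump and fail loudly on a non-zero exit, as a quick check that logical backups keep working. The hostname and schema name are assumptions for illustration, not the real backup configuration:

```python
# Hedged sketch: verify logical backups still work against a migrated 10.6
# host by dumping one small schema and checking the exit status. The host
# and schema names below are illustrative assumptions.
import subprocess
import sys

def dump_ok(host: str, schema: str) -> bool:
    result = subprocess.run(
        ["mysqldump", "-h", host, "--single-transaction", schema],
        stdout=subprocess.DEVNULL,  # discard the dump; we only care that it succeeds
        stderr=subprocess.PIPE,
    )
    if result.returncode != 0:
        sys.stderr.write(result.stderr.decode())
    return result.returncode == 0

if __name__ == "__main__":
    # Hypothetical invocation against the migrated misc backup source.
    sys.exit(0 if dump_ok("db2078.codfw.wmnet", "heartbeat") else 1)
```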