[06:01:10] I am going to restart db1123 to clear the memory usage alert
[07:01:29] marostegui: morning
[07:01:39] hey
[07:05:42] marostegui: before doing the next section, shall I merge my patch?
[07:07:04] Amir1: yeah, go for it
[07:07:14] I am going to do basically s8 entirely with replicas = ['db2079','db1099:3318','db1101:3318','db1104','db1111','db1114','db1116:3318','db1126','db1167','db1171:3318','db1172','db1177','db1178','dbstore1005:3318']
[07:07:16] but yeah, merge it
[07:08:32] okay, let me modify your file and run dry
[07:08:56] let me rebase
[07:09:38] done
[07:09:41] cool
[07:09:53] Let me prepare the script
[07:11:06] Amir1: done, can you give it a review?
[07:11:14] sure
[07:11:37] looks good
[07:11:45] I am reviewing the dry-run
[07:13:26] the dry run also looks good
[07:13:33] and it handles both masters well (codfw and sanitarium master)
[07:15:18] yeah and depools/repools correctly, logs also looked fine (I deleted them though). I think we should reduce the downtime. What do you think?
[07:15:33] Yeah, not a big deal
[07:16:07] yeah, I think we can run it now, did you start it?
[07:16:15] yeah, it is now running
[07:16:27] you can tail the log at /home/marostegui/git/software/dbtools/auto_schema/logs
[07:16:29] \o/
[07:17:17] "Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 13 hosts with reason: Maintenance" :D
[07:17:33] that's actually something we might need to change
[07:17:48] Cause if a schema change takes 1 day, then once it arrives at the next host, it won't have downtime
[07:18:55] do you want it to automatically double (or triple) downtime for hosts with replicas?
[07:19:06] yeah, that could be a solution indeed
[07:19:13] we can double it for now
[07:20:22] cool. Add it to the list and prioritize them :D
[07:21:00] Added
[07:21:05] I will set some priorities later
[08:11:08] I have added stuff to the SRE Meeting doc, please add your updates too
[10:04:14] afk for coffee
[10:05:02] jynus: do you recall the key words for that ticket we had about dumps accessing main traffic servers?
[10:05:17] I have tried dumps, accessing, main...
[10:05:19] But no luck XD
[10:07:00] T138208 ?
[10:07:01] T138208: Connections to all db servers for wikidata as wikiadmin from snapshot, terbium - https://phabricator.wikimedia.org/T138208
[10:07:23] Yes that one!!!
[10:07:25] Thanks
[10:07:34] I don't know why I thought it had the word dumps and accessing in the title
[10:21:49] Amir1: for when you are back, I would appreciate a review on the script, I am going for s3, so if you can cross check the hosts that'd be nice
[10:22:03] No rush
[10:32:10] marostegui: that new mariadb release model looks like a bit of a nightmare for us
[10:32:29] yep it does
[10:32:39] we'll see how that actually goes
[10:33:44] marostegui: re: https://phabricator.wikimedia.org/T296930#7565753 .. codfw instances were depooled.. ?
[10:33:50] i never even thought to check
[10:34:46] Yeah, I depooled them
[10:34:49] not a big deal, no worries
[10:39:56] why would you do such a monstrous thing?
[10:48:05] back
[10:48:13] marostegui: sure
[10:48:42] regarding mariadb release changes, I suggest bumping T239814 :D
[10:48:42] T239814: Automate DB upgrades - https://phabricator.wikimedia.org/T239814
[10:50:49] marostegui: looks good, let's go :)
[10:50:57] it's gonna be fun on s3
[10:51:38] yeah!
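The exchange above (around 07:17-07:19) describes why a fixed 4-hour downtime window is too short once a schema change has to walk down a replication tree: by the time it reaches the last host, the downtime set at the start may have expired. Below is a minimal, hypothetical sketch of the doubling idea discussed there; it is not the actual auto_schema code, and the host names in the example topology are placeholders.

```python
# Hypothetical sketch of the "double the downtime for hosts with replicas"
# idea discussed above; not the real dbtools/auto_schema implementation.
from datetime import timedelta

BASE_DOWNTIME = timedelta(hours=4)


def downtime_for(host: str, replicas_of: dict[str, list[str]]) -> timedelta:
    """Return the downtime window to request for a host.

    Hosts that have their own downstream replicas (e.g. a codfw master or the
    sanitarium master) get a doubled window, because the schema change must
    finish on them before it even starts on the hosts below them.
    """
    if replicas_of.get(host):
        return BASE_DOWNTIME * 2
    return BASE_DOWNTIME


# Placeholder topology purely for illustration.
topology = {"db2079": ["db2080", "db2081"], "db1104": []}
print(downtime_for("db2079", topology))  # 8:00:00
print(downtime_for("db1104", topology))  # 4:00:00
```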
[11:49:51] marostegui: core multiinstance doesn't have the grant file :D
[11:50:03] https://www.irccloud.com/pastebin/FEV5NjrH/
[11:57:37] Amir1: Yeah, I think that's known
[11:57:54] if you don't mind, I'll fix it
[11:58:15] Sure, I think we need to rethink if we really want to have the grants file on the servers themselves, but yeah, let's make it consistent for now
[12:02:08] why does role::wmcs::openstack::codfw1dev::db have core grants? ::profile::mariadb::grants::core
[12:02:11] ugh
[12:03:55] I'll change it to ::profile::mariadb::grants::cloudinfra
[12:04:05] umm, that does not sound right either
[12:05:02] as far as I know, clouddb2001-dev.codfw.wmnet has a mediawiki database (labtestwikitech.wikimedia.org aka labtestwiki)
[12:05:07] yeah
[12:05:08] it does
[12:05:20] I can remove it for now
[12:05:28] didn't move that to s6?
[12:05:40] labswiki did
[12:05:44] oh no that was wikitech itself
[12:05:48] yeah
[12:06:26] yeah, labswiki is wikitech and lives in s6, but labtestwiki lives on clouddb2001
[12:06:56] which reminds me I need to apply the chemical schema change there too
[12:07:35] ah, it is there already
[12:08:26] honestly, it will need a full drift report
[12:08:55] yeah, I have been applying changes there for the last few years, but who knows what is missing
[12:20:14] db1123: MariaDB sustained replica lag on s3
[12:21:49] that's an icinga issue, there's no lag
[12:25:13] I have fixed that
[12:25:21] it was the nrpe
[14:51:04] afk for a bit
[15:46:45] enjoying some nice pierogis ^^
[15:58:37] Amir1: don't make Manuel jealous
[15:58:49] That was exactly why I wrote it :P
[15:58:50] We have some frozen ones!
[15:59:01] I mean homemade but frozen
[15:59:02] not as good as fresh ones :D
[15:59:29] Yeah, but you can not eat them with 22C outside
[15:59:32] And I do
[15:59:54] ugh. There is still snow here 😭
[16:26:22] https://phabricator.wikimedia.org/T277354#7566788
[16:26:22] :)
[16:26:27] Now, I am going offline!
[16:28:01] Nice!
[18:56:28] buys one of https://github.com/alevchuk/vim-clutch for Emperor
[21:56:47] I could use it for M-S :)
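The 12:20-12:25 exchange is an example of an alert (MariaDB sustained replica lag on db1123) that was a monitoring artifact from a stale NRPE check rather than real lag. A minimal sketch of cross-checking such an alert by asking the replica itself is shown below; it assumes network access with pymysql and monitoring credentials, and the host name, user, and password are placeholders, not production values or the actual check used here.

```python
# Minimal sketch: query the replica directly for its lag instead of trusting
# a possibly stale NRPE/Icinga check. All connection details are placeholders.
from typing import Optional

import pymysql


def replica_lag_seconds(host: str, user: str, password: str) -> Optional[int]:
    """Return Seconds_Behind_Master for the replica, or None if replication
    is not configured or the SQL thread is stopped."""
    conn = pymysql.connect(host=host, user=user, password=password,
                           cursorclass=pymysql.cursors.DictCursor)
    try:
        with conn.cursor() as cur:
            cur.execute("SHOW SLAVE STATUS")
            row = cur.fetchone()
            return row["Seconds_Behind_Master"] if row else None
    finally:
        conn.close()


if __name__ == "__main__":
    # Placeholder host and credentials for illustration only.
    print(replica_lag_seconds("db1123.example.wmnet", "monitor", "secret"))
```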