[08:52:18] godog: replacing a swift key is just updating private puppet and rolling-restarting the frontends; and then puppet, when it next runs on a host, should re-template any files containing the credential, right? But that will mean anyone using the old key will be locked out until puppet runs on the relevant system. This is apropos T296767; so presumably it will need some co-ordination (even if just finding the right query to use cumin to
[08:52:18] run-puppet-agent)
[08:56:49] Emperor: ow :( yes that's correct, though in some places the key is also duplicated in the software-specific hiera, so it'll need to be changed there too; post-restart things shouldn't start to fail right away IIRC, as the auth tokens have an expiration
[09:01:14] godog: you mean there are places where the secret is copied around by hand?
[09:01:40] [presumably those are the responsibility of whoever owns that software/repo?]
[09:02:40] yes, in the mw case there's a separate repo with the private settings
[09:03:33] I think we'll need to brainstorm and coordinate a little
[09:04:07] I can tell you how they used to do database password changes - if it helps
[09:04:40] jynus: thank you, yes that'd be useful
[09:04:46] +1
[09:04:54] not sure if it would apply, but the mw issues may be similar
[09:05:19] they temporarily create 2 users with the same permissions
[09:05:38] so, as mw apps don't switch to the new user instantly, they can connect with either of the 2
[09:05:51] once it has rolled forward, the old one is dropped
[09:06:13] it may have to be done twice if the same user name is wanted, e.g. user -> user2 -> user
[09:06:33] that way you have "all the time in the world" to do the migration
[09:06:51] plus monitoring for unexpected apps still connecting to the old user, for detection
[09:07:13] e.g. logging connections from the old user
[09:07:16] If we can avoid that much hassle, I'd be happier :)
[09:07:34] just sharing it, in case it was helpful :-D
[09:07:41] Oh, certainly, thanks :)
[09:08:14] I think I'll update the phab ticket to say "well, changing the credential is easy, but..." and see if the submitter has views on how to co-ordinate the rollover
[09:08:54] thanks, yeah it'd be nice to be able to use multiple users, currently not feasible/implemented for swift/mw/filebackend :|
[09:09:03] Emperor: SGTM
[09:09:16] no, they don't really use multiple users
[09:09:37] it is just that, as it takes a long time for a deploy to propagate,
[09:10:09] they do the user -> user2 deploy, and for minutes apps will continue using the user1 credentials
[09:10:29] e.g. maintenance scripts, crons, uploads, encodes, jobs, etc.
[09:11:01] let me send you the ticket, and you can read it - again, only in case it is useful
[09:11:39] jynus: ok! thanks
[09:11:53] https://phabricator.wikimedia.org/T201662
[09:12:23] that's the procedure for dbs - obviously it may not apply to swift - but something may be common due to mw
[09:14:17] yeah, the additional complication here would be adding the new user to all the swift ACLs; doable, but we can likely avoid it I think due to the auth token expiration
[09:15:14] that's great :-), less overhead!
[09:15:19] I'll be fighti^Wworking with puppet this morning, will be reading IRC later
[09:29:11] jynus: quick question, is it intended that db1139:3311 and db1140:3311 are not using log-slave-updates?
[09:29:24] they don't have binary logs :-)
[09:29:40] so not much of a difference
[09:30:21] cool!
[09:30:29] they will in the future?
[09:30:33] so we can save them?
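For context on the exchange above: the binlog / log-slave-updates state being discussed can be checked directly on the replica. A minimal sketch follows, assuming the instance is reachable over TCP with the plain mysql client (host and port are taken from the conversation; the connection method and credential handling are assumptions, only the variable names are standard MySQL/MariaDB):

    # Sketch: confirm binary logging and log_slave_updates state on one of the
    # backup sources mentioned above (db1139:3311). Connection details are
    # assumed; adjust to however the instance is normally reached.
    mysql -h db1139 -P 3311 -e "SELECT @@log_bin, @@log_slave_updates;"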
[09:31:24] so I don't think anyone decided "they shouldn't have binlogs" - or at least not me - although it helps with performance (they are much more limited than core hosts)
[09:31:46] the plan with binlogs was to store them at dbprovs/backup* hosts directly from the master
[09:31:49] yes, I am not blaming anyone, just wanted to ask about it
[09:32:19] I think that's probably one reason why they are not there - they should never be replicating to anything
[09:32:27] good!
[09:32:32] thanks!
[09:32:36] but it is not, like, a super hard decision
[09:32:55] I think they just inherited it from whatever was at the dbstores
[09:33:16] I guess your question would be: can we set that up?
[09:33:32] assuming that works with no binlogs, I don't think it is a problem?
[09:33:44] no, my question was whether it is intended that they don't have log-slave-updates, because I just saw that in orchestrator. You answered it already :)
[09:34:26] they would fail as masters anyway, because no binlogs
[09:35:43] binlog backups is something I would like to rethink, but I ran into the same issues as you with recovery after a master switch :-/
[09:36:56] so definitely nothing there is super fixed in stone. I am not trying to be defensive, just not sure what the right way is :-(
[09:38:07] no problem, you answered my question. thanks!
[09:41:26] actually, your question was super useful! now that you pointed it out - you gave me another answer to "how to visually identify backup sources": they don't have the ⏩ icon on orchestrator! that's super useful to me!
[09:42:16] yes, that's how I noticed
[12:01:47] going for lunch, will crank up codfw media backups later
[15:00:16] I'm querying hadoop to collect the time to run Special:RecentChanges so we can put a meaning max time for the queries.
[15:00:39] the query has finished, but I think it's been an hour that it's trying to move 500000 rows to my file
[15:00:51] *meaningful
[15:12:29] I am currently backing up at 100MiB/s, which is less than 10% of the codfw swift cluster, with no observable latency changes
[16:59:48] > 500,000 rows selected (10916.84 seconds)
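Going back to the morning's credential-rotation discussion: the "right query to use cumin to run-puppet-agent" could look something like the sketch below. run-puppet-agent is the wrapper mentioned in the conversation; the Puppet class selector 'C:swift::proxy' is hypothetical and the real host query would need to be confirmed (cumin also supports batching/sleeping between hosts if the runs should be staggered):

    # Sketch: force a puppet run on the hosts that template the swift
    # credential, so the new key is picked up promptly after the private
    # puppet change. 'C:swift::proxy' is an assumed host query.
    sudo cumin 'C:swift::proxy' 'run-puppet-agent'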