[00:02:32] are we ready for multi-DC full deployment?
[00:02:56] James_F: in addition to php74 and cpu freq, we also have mainstash: redis > x2-mainstashdb, and tokenstore redis > mcrouter, and chronoprotector redis > mcrouter. ref T212129, T267581
[00:02:57] T267581: Phase out "redis_sessions" cluster and away from memcached cluster - https://phabricator.wikimedia.org/T267581
[00:02:57] T212129: Move MainStash out of Redis to a simpler multi-dc aware solution - https://phabricator.wikimedia.org/T212129
[00:03:05] and various other things at ref T302623
[00:03:05] T302623: FY2022-2023: Improve Backend Pageview Timing - https://phabricator.wikimedia.org/T302623
[00:03:44] https://gerrit.wikimedia.org/r/c/operations/puppet/+/827616 and https://gerrit.wikimedia.org/r/c/operations/puppet/+/827617 ?
[00:04:14] as I said at https://phabricator.wikimedia.org/T279664#8196965 , I want to just roll it out everywhere now
[00:05:19] TimStarling: I think reducing x2 to master-only might help to reduce risk/confusion.
[00:05:30] cdanis has a patch I just +1'ed
[00:05:48] we could do a wmf-config hack first..
[00:06:48] there's a fair amount of debt there that we should clean up by making dbctl output the lbfactoryconf shape more closely since this etcd key is a mirror specifically for MW, so it doesn't really make sense to have so much post-processing that is hard to test/verify. oh well.. anyway, I'll file a task for that for later.
[00:07:20] chrono protection should only affect primary dc though, so I guess fine.
[00:07:30] I support trying it out thursday / your today.
[00:09:15] TimStarling: feel free to trout me for https://gerrit.wikimedia.org/r/c/RunningStat/+/815429/
[00:09:42] you're saying you would support stage 3/4 rollout today if I first do a config hack for x2?
[00:10:06] the silver lining of that RunningStat patch is that (afaik) we currently only use it for in-proc or php-apcu, so not immediately affected by the php72/74 issue.
[00:10:12] but still feels silly in retrospect
[00:11:17] TimStarling: stage 3 without the hack is fine I think. Doing the hack means we exclude a source of issues. But on stage 3 we can also just find out whether there are any issues in the first place.
[00:11:44] it'd be an unknown at this point. we know quite a lot already.
[00:13:02] note I'm traveling in 10h, effectively OOO your "today".
[00:14:38] note that there's also the idea from https://phabricator.wikimedia.org/T306118#8177779 to do a production test of x2 replication failure
[00:16:37] judging by your calendar you'll be working monday/tuesday?
[00:17:31] ack
[00:19:35] I'm fine with waiting for a day for cdanis's dbctl patch, then friday or monday we can revert to stage 0 and do a production test of replication breakage in codfw
[00:20:02] or stage 1 I guess, i.e. try to break testwiki
[00:20:55] then aim for stage 3/4 approximately tuesday my time
[00:33:09] OK
[10:34:20] TimStarling: Krinkle: I'll get the patch merged and deployed in my workday today
[10:34:54] thanks
[12:15:15] done