[14:07:14] PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 7622 https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frdb1002&service=check_mysql
[14:12:18] PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 4292 https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frdb1002&service=check_mysql
[14:17:18] PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1443 https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frdb1002&service=check_mysql
[14:19:54] howdy fr-tech!
[14:22:14] RECOVERY - check_mysql on frdb1002 is OK: Uptime: 4292338 Threads: 18 Questions: 540936764 Slow queries: 2456 Opens: 3005861497 Flush tables: 1 Open tables: 200 Queries per second avg: 126.023 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frdb1002&service=check_mysql
[15:09:57] pcoombe: not important, but when cancelling midflow during a paypal donation I got sent back to an empty page on donate-wiki. https://donate.wikimedia.org/wiki/Ways_to_Give/en-gb?token=EC-4MH53298BB402154S
[15:10:26] deja vu?
[15:14:36] I feel like you added a redirect last year for the en-gb error page
[15:14:48] but this one is Ways_to_Give
[15:52:38] fr-tech is someone sending EOY emails already? I can see an entry in the dashboard https://civicrm.wikimedia.org/civicrm/eoy
[16:26:56] jgleeson|reading: looks like those were sent on Dec 30, maybe by Donor Relations?
[16:30:01] oh thanks ejegg, I wasn't sure if that activity date was the date of send
[16:37:32] oh hey just noticed the chat here
[16:40:12] ejegg: you ready to hit the button?
[16:40:27] jgleeson: are you working today?
[16:41:24] jgleeson: looking at backscroll... Eileen mentioned she might try internal tests
[16:43:01] yes dstrine
[16:43:47] dstrine: I'll be ready in 15-20 min when my wife's done her class
[16:44:16] ejegg: do we wanna run the calculate job now in advance, or maybe just work through it all on the call?
[16:44:47] ok I don't want to bug anyone who is trying to be offline. But if ejegg doesn't need help with the EOY email, I think we need to edit the logic around the Apple Pay name. I think we need to totally cut out billing name and just pull the shipping name
[16:44:58] dstrine: it's a two-stage process: we run a job to prep the sends, and then once that completes we run the sends (or can do after some initial testing)
[16:45:20] oh ok dstrine i can look at that
[16:45:38] also I need help assembling the timeline and starting a conversation with adyen. I'll start an email
[16:45:56] jgleeson: ok thanks. I'm just getting back online. Let me edit that task first
[16:48:45] jgleeson: let's do it all on the call
[16:48:47] 10Fundraising-Backlog, 10Wikimedia-Fundraising-CiviCRM, 10FR-Adyen: dedupe support for apple pay weird names - https://phabricator.wikimedia.org/T298547 (10DStrine)
[16:48:57] sounds good ejegg
[16:50:26] 10Fundraising Sprint Xenomorph Petting Zoo, 10Fundraising-Backlog, 10fundraising sprint Wireless Zipline, 10fr-donorservices, 10MW-1.38-notes (1.38.0-wmf.13; 2021-12-13): applepay donations TY email doesn’t have the donor's name - https://phabricator.wikimedia.org/T296881 (10DStrine) I've split off @Eile...
[16:51:28] ok jgleeson I've updated the task https://phabricator.wikimedia.org/T296881 and split out Eileen's comment into another task. Her idea sounds useful regardless of what we can do to clean up adyen.
[16:52:52] reading dstrine
[16:54:31] Current location???
[16:54:36] how on earth..
[16:56:23] jgleeson: sorry I don't understand.
[16:56:38] what's up with current location?
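The change being floated above — dropping the billing name and pulling the Apple Pay shipping name instead, with a fallback to billing if shipping is incomplete — could be sketched roughly like this. This is an illustrative Python sketch, not the actual DonationInterface code (which is PHP/JavaScript); the `givenName`/`familyName` field names are assumptions borrowed from Apple's contact convention.

```python
def choose_donor_name(shipping_contact, billing_contact):
    """Prefer the shipping contact's name; fall back to the billing
    name only when the shipping name is missing or incomplete."""
    def name_parts(contact):
        contact = contact or {}
        return (contact.get("givenName", "").strip(),
                contact.get("familyName", "").strip())

    first, last = name_parts(shipping_contact)
    if first and last:
        return first, last
    # Incomplete shipping name: revert to whatever billing has
    return name_parts(billing_contact)
```

For example, a donation whose billing name came through as junk like "Current Location" would still pick up "Jane Doe" from the shipping contact.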
[16:57:19] https://phabricator.wikimedia.org/T296881#7591744
[16:57:34] https://civicrm.wikimedia.org/civicrm/contact/view?reset=1&cid=54586471
[16:57:57] jgleeson: oh yeah lol
[16:58:01] we're getting all kinds of junk
[16:59:00] jgleeson: now that I'm thinking about this, what is the quality of the documentation you got from adyen and apple? would it hurt to review any of that? or does it not address this issue?
[16:59:02] dstrine: we can do what you're suggesting, but do we know what proportion of donors are being affected by this glitch? Switching everything over to shipping address is a pretty wholesale change
[16:59:47] jgleeson: there is evidence that bad names are trickling in after our fix
[16:59:56] DR is only getting emails for some of them
[17:00:11] once we send this EOY email we'll get a wave of them for sure and we'll have a better idea
[17:00:17] also do we wanna have a fallback, so that if there isn't a shipping address contact name or last name on shipping, then we revert to the billing name
[17:00:23] The other thing I was wondering is: have we logged what kind of data is in the shipping fields?
[17:00:40] If you are nervous about this, maybe get the patch ready to go out and we'll start an email with adyen?
[17:01:15] XenoRyet: good point, is there any way to investigate that without messing with a donation?
[17:02:28] I'd just hate to make the switch and find out it's just as bad or worse. It logically should be better, but you never know.
[17:03:44] XenoRyet: and jgleeson I just started an email thread with fr-tech and evelyn to get the timeline straight and start a conversation with adyen
[17:03:59] Sounds good
[17:04:33] I'm gonna step away from the keyboard for a while, but I'm around, so someone text me if something goes pear-shaped and I can help.
[17:05:01] ejegg: so are you starting the process? a few of us just happened to join the call and are hanging out
[17:05:09] I just want to know that the process has started
[17:05:42] XenoRyet: wanna hop on the call?
[17:06:09] be right there
[17:22:54] (03PS1) 10Jgleeson: Adyen Checkout: Switch ApplePay logic to use shipping contact by default [extensions/DonationInterface] - 10https://gerrit.wikimedia.org/r/751475 (https://phabricator.wikimedia.org/T296881)
[17:25:30] (03CR) 10jerkins-bot: [V: 04-1] Adyen Checkout: Switch ApplePay logic to use shipping contact by default [extensions/DonationInterface] - 10https://gerrit.wikimedia.org/r/751475 (https://phabricator.wikimedia.org/T296881) (owner: 10Jgleeson)
[17:43:05] (03PS2) 10Jgleeson: Adyen Checkout: Switch ApplePay logic to use shipping contact by default [extensions/DonationInterface] - 10https://gerrit.wikimedia.org/r/751475 (https://phabricator.wikimedia.org/T296881)
[18:03:54] ejegg: has it been 15 minutes?
[18:04:19] dstrine: it just finished
[18:04:24] logs look good
[18:05:28] took ~18m
[18:06:42] gives us a likely run time of ~40h to complete the 654k
[18:06:47] <2 days
[18:07:08] jgleeson: ejegg ok so are you turning it on full throttle?
[18:07:35] ah but it'll be a little longer than that with the gaps
[18:09:20] jgleeson fr-tech didn't we say it has to take less than 15 min? what's the frequency again?
[18:09:40] yeah AndyRussG just looking at that now, it's gonna run over I think if it was 18
[18:09:52] so maybe we should turn down the frequency?
[18:10:22] it has a thing that'll prevent them from running concurrently?
[18:12:54] I can't see anything in the code stopping two jobs picking up the same contact
[18:13:39] we could keep the schedule and just reduce the limits to 4k per batch fr-tech?
[18:13:47] hmmm also an option
[18:13:56] more straightforward change WRT the jobs
[18:14:42] jgleeson: what I meant is, doesn't process-control have a thing (always active, or activatable?) that prevents a job from starting if the previous instantiation of it is still running?
[18:15:17] oh the lockfile?
[18:15:44] i think in this case the issue is that there's two separate jobs AndyRussG
[18:15:50] ahh right jgleeson
[18:16:10] and one could run while the other is looping through its results
[18:16:26] ejegg: felt if this passed the early test then we are good to go full throttle ... ejegg you still there?
[18:16:29] I thought we might have been marking them as being processed
[18:19:05] dstrine: ^ there might be an issue with them running too long on the current schedule?
[18:19:34] ejegg: ^^ any feedback?
[18:19:50] dstrine: sorry, got called away, back now and looking at the timing
[18:22:30] ejegg: jgleeson: we could also switch to 30-min runs to start, and then have a look at how long they're taking on average?
[18:23:14] to do that we could just enable the schedule on one job
[18:23:26] which would run two every hour
[18:23:50] sure AndyRussG jgleeson, even at every 30 min we will get it done in about 3 days
[18:24:13] jgleeson: I think we still want to alternate between MXes
[18:24:33] so I'll just edit those schedules to run one at 4 past and the other at 34 past
[18:24:37] and increase the timeouts
[18:25:13] ejegg: other option suggested before by jgleeson could also be to reduce the batch size
[18:25:14] sounds good ejegg
[18:25:24] AndyRussG: I think your idea is safer
[18:25:40] at least for the first bunch
[18:25:44] AndyRussG: OK, yeah, let's do that
[18:25:53] so does 3000 sound good?
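The ~40h figure quoted above follows directly from the observed batch timing. A quick sanity check with the numbers from the log (654k recipients, 5k per batch, ~18 minutes per batch):

```python
import math

def estimated_runtime_hours(total_emails, batch_size, minutes_per_batch):
    """Back-of-envelope total send time, ignoring gaps between scheduled runs."""
    batches = math.ceil(total_emails / batch_size)
    return batches * minutes_per_batch / 60

# 654k recipients at 5k per ~18-minute batch
print(round(estimated_runtime_hours(654_000, 5_000, 18), 1))  # -> 39.3
```

As noted in the chat, the real elapsed time runs longer than this once the gaps between scheduled runs are counted.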
[18:26:20] 3000 also feels safe
[18:26:29] ahhhh :)
[18:26:58] I guess either way gives us sufficient wiggle room
[18:27:22] I'd like more data since our first one overran the expected time
[18:27:43] whatever is easier for you to do ejegg
[18:27:44] ok, just pushing the new batch size but not enabling the schedule yet
[18:28:29] So I'll run another manual run with the new batch size
[18:28:35] this time with job _two
[18:28:55] see you in 11 minutes
[18:28:57] :)
[18:34:51] if someone could like just tow New Zealand a few thousand km east or west, that would help eileen's schedule
[18:37:38] if any medium-sized islands or continents get in the way they could scoot over too i guess
[18:42:39] ok has the new test completed?
[18:43:06] lol AndyRussG
[18:44:16] still running fr-tech dstrine
[18:44:28] and that's with 3000 recipients
[18:44:48] so it looks like we've got a slight scheduling issue to work out
[18:45:46] running for 16m so far but sending 60% of the previous run which took 18m ..
[18:48:29] argh it's overrun the previous
[18:49:03] AndyRussG: the oracle is strong with you. weekend support indeed
[18:51:08] jgleeson: ahhh sorry heheh
[18:51:30] jgleeson: so maybe run time is not linearly dependent on batch size?
[18:55:05] looks that way AndyRussG. this latest job (which is still running!) is using the alternative MX though, so that might be a factor
[18:55:34] hmmm right
[19:05:36] finished
[19:05:57] 33 minutes
[19:05:59] :|
[19:06:33] ooof
[19:06:49] I wonder if there's somehow some db warmup time?
[19:06:58] also if there are other queries happening at the same time?
[19:07:26] (gotta step away from the keyboard for 10-15 min, back in a bit!)
[19:08:53] ejegg: wanna run job one with 3k recipients so we can get a better comparison?
[19:10:26] AndyRussG: I was checking the processlist on mysql and there were a few queries that had been running for 15 minutes
[19:12:30] We could even run each on only an hourly schedule for a few hours and then evaluate... just a thought
[19:12:41] ah weird, ok, I'll run job 1
[19:13:01] right, could totally have to do with other processes happening at the same time
[19:14:10] we could turn off the donations qc, maybe
[19:14:11] this query looks related to ours
[19:14:28] oh, one of the EOY ones
[19:15:05] ya
[19:15:30] and it was the same query running three times from the same user
[19:15:41] ah wait, that's not user 1 tho
[19:15:50] let's see who that user ID is
[19:16:24] user id was 1982
[19:16:41] oops, looks like wfan
[19:16:51] i'mma kill those queries
[19:17:10] WFan: are you running a search or some reports right now?
[19:17:36] Ha I just logged in and wanted to query a table
[19:18:05] ah, someone else is doing a dedupe
[19:19:19] ejegg: looks quiet now, wanna run job one?
[19:19:33] it's running!
[19:19:40] ah nice
[19:21:38] So the ideal would be to run all the rollup queries on the replica, then only run the update on the primary
[19:22:02] but with the rpow extension, as soon as we do one update it will run all the subsequent queries on the primary
[19:22:10] It would be really nice to be able to reset that
[19:22:31] or have query hints to send to one or the other db like mediawiki does
[19:27:33] ok, that last run on MX 1 took 11 minutes
[19:27:43] trying MX 2 again
[19:32:40] OK, I think we need to add a time limit option to the send job
[19:33:25] since the time per email can differ wildly
[19:33:37] and we don't want to risk running the two simultaneously
[19:33:47] and process-control can't actually kill the job
[19:39:37] (03PS1) 10Ejegg: Add timeLimit option for EOYEmail.Send [wikimedia/fundraising/crm] - 10https://gerrit.wikimedia.org/r/751489
[19:39:54] fr-tech does that look good for a time limit? ^^^^
[19:40:01] just doing a quick smoke test here
[19:41:34] ejegg sounds very reasonable... I have to head out to kid pick-up in a bit but can do CR and testing support after
[19:41:43] oh dang, docker doesn't want to start here
[19:41:56] failed to create endpoint fundraising-dev_database_1 on network fundraising-dev_default: failed to add the host (vethef62da7) <=> sandbox (veth72625cc) pair
[19:42:28] interfaces: operation not supported
[19:43:09] testing with non-docker
[19:49:06] k, works fine with non-docker
[19:49:57] ejegg: does civi api magic resolve the getters and setters?
[19:50:07] jgleeson: yep, civi magic
[19:50:20] (03PS2) 10Ejegg: Add timeLimit option for EOYEmail.Send [wikimedia/fundraising/crm] - 10https://gerrit.wikimedia.org/r/751489
[19:50:48] in AbstractAction:
[19:50:49] use \Civi\Schema\Traits\MagicGetterSetterTrait;
[19:51:20] ha nice
[19:51:25] actually called magic
[19:51:44] php magic methods I guess
[19:51:54] cool, looks good to me ejegg
[19:52:21] drush --user=1 -v cvapi EOYEmail.Send timeLimit=10 version=4
[19:52:26] the time is in seconds right?
[19:53:57] yep, in seconds
[19:54:13] so I tested it by putting a breakpoint and letting it time out
[19:54:33] right after setting initialTime
[19:54:52] since I only had 1 example to send to in my local non-docker db
[19:54:55] ejegg: with docker, try docker-compose down then up again?
[19:55:36] sounds like an issue with docker itself, in the network mgmt part
[19:56:06] thanks AndyRussG
[19:56:57] ejegg: ah I see
[19:57:10] huh, even after down, up gives me a ton of
[19:57:11] failed to add the host (veth56155b6) <=> sandbox (veth5391c6a) pair interfaces: operation not supported
[19:57:49] googling
[19:58:18] ah, perhaps I just need a reboot
[19:58:26] ejegg: try docker network ls
[19:58:40] see if you have a duplicate network running
[19:58:54] ejegg system update recently? something installed another docker as a dependency via pip? some weird lock change in your local docker setup?
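The timeLimit approach described above — the job watches its own elapsed wall-clock time and stops cleanly, since process-control can't kill a drush job — can be sketched like this. This is an illustrative Python sketch; the real EOYEmail.Send action is a CiviCRM API4 class in PHP, and the `send_one` callback is a stand-in for the actual per-recipient send.

```python
import time

def send_with_time_limit(recipients, send_one, time_limit=None):
    """Send until finished or until time_limit seconds have elapsed.

    Checking the clock between recipients (rather than being killed
    externally) means no email is cut off mid-send; the next scheduled
    run simply picks up the remaining queue."""
    initial_time = time.monotonic()
    sent = 0
    for recipient in recipients:
        if time_limit is not None and time.monotonic() - initial_time >= time_limit:
            break
        send_one(recipient)
        sent += 1
    return sent
```

The chat's smoke test is the same idea: set a breakpoint right after `initialTime` is recorded, let the limit expire, and confirm the job exits without sending.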
[19:58:56] or maybe just delete the existing network
[19:59:01] and reup
[19:59:11] ah also what jgleeson said :)
[19:59:31] jgleeson: I tried removing fundraising-dev_default and up -d again
[19:59:34] and same errors
[19:59:50] serverfault suggests a reboot
[19:59:55] ah
[20:00:11] think I might not have rebooted since last kernel update
[20:00:32] back soon
[20:04:28] ok, that worked jgleeson AndyRussG, docker is now happy
[20:04:30] hi eileen!
[20:05:06] eileen: timing seems to be a lot more variable on a system with load, so we're thinking of adding a timeLimit option: https://gerrit.wikimedia.org/r/751489
[20:05:41] unfortunately the timeout at the process-control level can't actually kill jobs running under drush
[20:05:48] ejegg: hi - looking
[20:05:49] I think because of the user change
[20:06:00] also, happy new year!
[20:06:18] happy new year
[20:06:43] so the jobs have started ok but are clashing?
[20:06:52] not quite
[20:07:09] so far jobs have only been run one at a time
[20:07:17] the issue is they're running slower than expected
[20:07:39] first job of 5k took 18m and second run of 3k took 33m
[20:08:16] looks to be possibly related to other processing using the db at the same time
[20:08:39] hmm yeah that is slower
[20:08:54] the generate ran ok?
[20:08:55] I think the third job processed 3k in 11m
[20:09:13] hey fr-tech I've already broken the rule for this week and had a meeting lol. what's the current status? I can't tell from backscroll
[20:09:13] I wonder if the pass-off to smtp can be a factor?
[20:09:18] yeah eileen the calculate one was fine
[20:09:37] the second job (33m) was job_two using the codfw MX
[20:09:55] dstrine: going out but slower than hoped - although my calc was it would take 2 days, which was faster than the expectations you set
[20:10:13] dstrine: eileen: I think we'll try to add a time limit so no jobs overlap
[20:10:16] yeah. you can definitely see that the mailers aren't the bottleneck on the middle run: https://frmon.wikimedia.org/d/sZWA7LpZk/mail?orgId=1&from=now-6h&to=now&viewPanel=8
[20:10:43] (03CR) 10Jgleeson: [C: 03+2] "Looks good to me. let's give it a go!" [wikimedia/fundraising/crm] - 10https://gerrit.wikimedia.org/r/751489 (owner: 10Ejegg)
[20:11:05] ok fr-tech but are you sending out a steady flow of emails or are you still in the investigation phase?
[20:11:09] dstrine: we're just making one tiny code change to account for the variability of running it in production
[20:11:20] ok
[20:11:44] thanks jgleeson!
[20:11:47] dstrine: they are going out already
[20:12:07] eileen: we don't have the schedule on just yet, I've been running the jobs 1 at a time
[20:12:14] ejegg: oh ok
[20:12:51] and since a fixed number of emails was taking a pretty variable amount of time depending on other stuff happening in civi
[20:13:14] we thought it would be best to switch to a time limit regulated by the API command itself
[20:14:36] (03PS1) 10Ejegg: Merge branch 'master' into deployment [wikimedia/fundraising/crm] (deployment) - 10https://gerrit.wikimedia.org/r/751491
[20:16:42] np ejegg
[20:17:41] (03CR) 10Ejegg: [C: 03+2] Merge branch 'master' into deployment [wikimedia/fundraising/crm] (deployment) - 10https://gerrit.wikimedia.org/r/751491 (owner: 10Ejegg)
[20:21:36] dwisehaupt: does that graph indicate that we were sending fewer messages per second to the MX in the second cluster?
[20:21:44] (03Merged) 10jenkins-bot: Add timeLimit option for EOYEmail.Send [wikimedia/fundraising/crm] - 10https://gerrit.wikimedia.org/r/751489 (owner: 10Ejegg)
[20:23:21] any idea why this query
[20:23:22] select year, status, count(*) FROM wmf_eoy_receipt_donor GROUP BY year, status;
[20:23:22] +------+--------+----
[20:23:47] has 28 queued for years other than 2021
[20:24:01] good question eileen, we were wondering that too
[20:24:03] ejegg: spotted the NULL ones
[20:24:24] do we know if they were there before the calc job ran?
[20:24:31] yes they were eileen
[20:24:39] the calc job only added ones for 2021
[20:24:55] hmm - so could be to do with the earlier code versions
[20:24:56] and the send jobs have only been touching those
[20:25:03] so we weren't too worried
[20:25:07] I think maybe delete anything <> 2021
[20:30:28] 10Fundraising-Backlog: Storing many copies of the same file in civicrm (Planned_Giving_Guide, etc) - https://phabricator.wikimedia.org/T297308 (10Dwisehaupt) After the December fundraising, this has grown to 3366 files with 2.4G of disk space used. ` $ sha256sum Planned_Giving_Guide_* | awk '{print $1}' | sort |...
[20:30:44] fr-tech thanks for your help on this, and I'm sorry you're having to fiddle with it. I want to respect everyone's time today and this week. how do we get to a "set it and forget it" situation?
[20:31:04] (03PS1) 10Eileen: Merge branch 'master' of https://gerrit.wikimedia.org/r/wikimedia/fundraising/crm into deployment [wikimedia/fundraising/crm] (deployment) - 10https://gerrit.wikimedia.org/r/751492
[20:31:29] (03CR) 10Eileen: [C: 03+2] Merge branch 'master' of https://gerrit.wikimedia.org/r/wikimedia/fundraising/crm into deployment [wikimedia/fundraising/crm] (deployment) - 10https://gerrit.wikimedia.org/r/751492 (owner: 10Eileen)
[20:31:47] dstrine: we could run one job an hour on schedule and check in on it throughout the next 24 hours? fairly confident we won't have to touch that
[20:31:49] dstrine: we are gonna schedule it once that patch ^^ is out
[20:32:13] ok thanks eileen
[20:32:41] eileen: oops, I've got a competing merge up
[20:33:00] (03CR) 10Ejegg: [V: 03+2 C: 03+2] Merge branch 'master' into deployment [wikimedia/fundraising/crm] (deployment) - 10https://gerrit.wikimedia.org/r/751491 (owner: 10Ejegg)
[20:33:03] ejegg: dang I didn't spot it
[20:33:17] I looked back in the irc log - but not far enough
[20:33:58] !log civicrm revision aaceb4ab -> 328c8542
[20:34:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:34:06] ejegg: anyway - the fix is out
[20:34:40] Shall I add timeLimit to the job
[20:34:44] yes please
[20:36:56] eileen: let's go with 14 min in seconds (840) on the job
[20:37:19] ejegg: I was just about to ask that
[20:37:20] and change the yaml timeout to 15 minutes
[20:38:30] ejegg: what is an example of one with a yaml timeout to copy?
[20:38:48] eileen: i added timeouts to the eoy send jobs
[20:39:51] yeah - I needed to git pull
[20:43:15] !log config b26653a4 -> 40467fc2 (latest)
[20:43:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:43:26] ejegg: so I pushed that - we need to run one manually?
[20:44:46] just kicked off a manual job
[20:46:35] fr-tech I was gonna change the ingenico prod key today (expires Saturday) but due to this I'll update that tomorrow when we're in a better place with EOY
[20:47:11] thanks jgleeson
[20:57:09] eileen: ejegg that job just completed
[20:57:21] yep - it hit 3000
[20:57:36] within the 15 mins
[20:57:39] 11m like the other
[20:57:41] I just kicked off job 2
[20:57:45] oh hah, let's put that limit back up to maybe 5000?
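The schedule settled on above — two hourly jobs at 4 past and 34 past, each capped at 840 seconds (14 minutes) with a 15-minute yaml timeout — can't overlap, because the cap is well under the 30-minute gap between start times. A small hypothetical check of that reasoning:

```python
def hourly_jobs_overlap(offset_a, offset_b, max_runtime_minutes):
    """Two jobs start at the given minute offsets every hour; each is
    capped at max_runtime_minutes. They can only collide if the cap
    exceeds the wrap-around gap between their start times."""
    gap = abs(offset_a - offset_b) % 60
    gap = min(gap, 60 - gap)  # distance within the hour, wrapping at :00
    return max_runtime_minutes > gap

print(hourly_jobs_overlap(4, 34, 14))  # -> False: 14-min cap < 30-min gap
print(hourly_jobs_overlap(4, 34, 33))  # -> True: an uncapped 33-min run collides
```

The second call mirrors the earlier scare: the uncapped 33-minute run would have collided with the other job's slot, which is exactly why the in-job time limit was added.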
[20:58:57] so 11m for 3k is still roughly 17-18m for 5k
[20:59:20] yep, but we have the time limit too now
[21:00:21] awesome
[21:02:20] !log process-control config 40467fc2 -> e58e4e50
[21:02:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:02:46] ok - updated to 5000 - I guess we just have to witness it stopping at 14 mins rather than doing the whole 5k & then we can schedule?
[21:03:24] yeah, seeing it hit that time limit would make me feel comfortable setting it to go on the schedule you have set
[21:04:04] ok - job 2 is still running from before, but I've updated the limits, so as soon as it finishes I'll set 1 going with the new limit
[21:04:43] thanks!
[21:08:06] thx ejegg eileen jgleeson :)
[21:08:15] I'm about again if help is needed
[21:12:24] that job 2 just finished by virtue of hitting the time limit
[21:12:33] twice as slow -
[21:12:41] but I think all good to schedule now
[21:12:52] yep!
[21:14:30] ok - so I expect job 2 to kick off in 5 mins
[21:14:33] at 19 past
[21:15:03] !log process-control checkout revision (e58e4e50 -> eb83f208)
[21:15:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:27:59] we are expecting the job to stop at 33 past & then the other job to start at 34 past
[21:29:42] time limit looks good
[21:30:01] glad it's working! will check in on them tomorrow. bye for now!
[21:31:35] night
[21:34:38] how's it going?
[21:34:41] :)
[21:35:09] dstrine: the jobs are on now - just watching for another 15 mins to check the next transition & then will stop watching
[21:35:25] ok cool! fingers crossed
[21:35:32] 16.5k have gone out from 642k
[21:36:01] great
[21:36:15] this feels like we are setting up a slinky to go down a really tall set of stairs and just walking away hoping it'll make it to the bottom
[21:36:35] job 1 currently running - it should end at 48 past & job 2 should start again at 49 past
[21:36:50] note that job 2 has been much slower in all the runs I've done
[21:37:00] (diff is just the smtp server)
[21:39:21] k, looking good so far. I'mma head to the store and check on the jobs again when I get back.
[21:44:01] (03PS1) 10Eileen: Use new money formatting util for smarty formatting [wikimedia/fundraising/crm/civicrm] - 10https://gerrit.wikimedia.org/r/751499 (https://phabricator.wikimedia.org/T296663)
[21:49:07] ok - that transition worked fine - 3864 sent - job 2 sent around 1300 - seems like a trend
[21:49:59] dstrine: I'm not monitoring anything now - it's a bit slower than I had hoped based on staging - but not slower than the expectations you set
[21:50:34] eileen: ok so can I tell everyone that this is up and running now?
[21:50:48] yep - around 20k have gone out
[21:51:01] I don't want to pressure you for an estimate for when it ends unless you want to throw one out
[21:51:16] I think I created a search kit view if anyone wants to watch
[21:51:24] it seems to be doing around 8k per hour
[21:52:04] which would make it 3-4 days
[21:53:20] eileen: is job 2 going off to the codfw mail servers?
[21:54:04] dwisehaupt: yeah looks like it
[21:54:06] if so, that seems possible. especially if the process opens a new smtp connection for each mail. the added latency of going across the country 1300 times adds up.
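The 3-4 day estimate above checks out against the observed throughput. Using the figures from the log (~642k total, ~20k already sent, ~8k per hour):

```python
def days_remaining(total, already_sent, per_hour):
    """Remaining send time in days at a steady hourly rate."""
    return (total - already_sent) / per_hour / 24

# ~622k still queued at ~8k/hour
print(round(days_remaining(642_000, 20_000, 8_000), 2))  # -> 3.24
```

That lands comfortably inside the expectations dstrine set, assuming the rate holds across both MX routes.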
[21:54:45] yeah - also explains slower on prod than staging if the smtp connections are slower
[21:55:44] unless there is a push to speed it up I think it's OK - I think the 3-ish day range is still comfortably in the expectations dstrine set
[21:56:00] ok I'll let them know
[21:57:06] dstrine: yeah - at 8k per hour it is a bit shy of 3.5 days - I think we are a bit ahead of that
[21:57:58] fr-tech I'm gonna go pick up my dog from daycare. I hope everyone gets to have a chill week
[21:58:03] thanks eileen
[21:58:47] cool - not much chilling here in these temperatures though!
[21:58:59] yeah. the main way i could see to speed it up would be to streamline the messages. so you open one smtp connection, then send across 100 or so messages, then close it and repeat.
[21:59:16] let me check the logs to make sure we aren't already doing that. :)
[21:59:38] I doubt we are
[21:59:43] dstrine: thx, same! :)
[22:00:33] yeah. looks like one message per connection.
[22:01:07] we'd see some win by sending multiple, but i'm not sure how much of a pain it would be to rework the code.
[22:01:14] yeah - I suspect it would be quite a bit of work to change it too
[22:01:23] yeah. that's my thought.
[22:01:50] I'm pretty sure stakeholders were primed for 5 days-ish so I don't think it's worth it
[22:02:47] yeah. definitely not worth it for now. but maybe for the future.
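The rework dwisehaupt sketches above — opening one SMTP connection per ~100 messages instead of one per message, so cross-country connection latency is paid far less often — might look roughly like this. This is a hypothetical Python sketch, not the CiviCRM sender (which is PHP); the connection factory is injected so the batching logic stands alone.

```python
def send_in_batches(messages, connect, batch_size=100):
    """Send (sender, recipient, body) tuples over shared SMTP connections:
    one connection per batch_size messages rather than one per message,
    so connection setup latency is paid ~1/batch_size as often.

    connect() must return an open, context-managed SMTP-like object,
    e.g. lambda: smtplib.SMTP("mx.example.org"). Returns the number of
    connections opened."""
    connections_opened = 0
    for i in range(0, len(messages), batch_size):
        connections_opened += 1
        with connect() as smtp:
            for sender, recipient, body in messages[i:i + batch_size]:
                smtp.sendmail(sender, recipient, body)
    return connections_opened
```

With the 1300 messages mentioned earlier and the default batch size, this opens 13 connections instead of 1300, which is where the latency win would come from.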