[02:27:52] fr-tech is anyone around to take a look at this DonationInterfaceFormSettings change? https://gerrit.wikimedia.org/r/c/mediawiki/extensions/DonationInterface/+/778334
[02:28:11] There are links to test it locally in the gerrit comment
[02:30:12] I can ejegg
[02:30:33] thanks cstone!
[02:34:41] (CR) Cstone: [C: +2] "Thanks for the links, looks good!" [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/778334 (https://phabricator.wikimedia.org/T293508) (owner: Ejegg)
[02:34:48] :)
[02:37:02] (Merged) jenkins-bot: Uncomment DLocal ZA form with weight 0 [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/778334 (https://phabricator.wikimedia.org/T293508) (owner: Ejegg)
[02:38:05] (PS1) Ejegg: Merge branch 'master' into deployment [extensions/DonationInterface] (deployment) - https://gerrit.wikimedia.org/r/778376
[02:38:12] (CR) Ejegg: [C: +2] Merge branch 'master' into deployment [extensions/DonationInterface] (deployment) - https://gerrit.wikimedia.org/r/778376 (owner: Ejegg)
[02:38:57] (Merged) jenkins-bot: Merge branch 'master' into deployment [extensions/DonationInterface] (deployment) - https://gerrit.wikimedia.org/r/778376 (owner: Ejegg)
[02:39:01] (PS1) Ejegg: Update DonationInterface submodule [core] (fundraising/REL1_35) - https://gerrit.wikimedia.org/r/778377
[02:39:03] (CR) Ejegg: [C: +2] Update DonationInterface submodule [core] (fundraising/REL1_35) - https://gerrit.wikimedia.org/r/778377 (owner: Ejegg)
[02:47:29] (Merged) jenkins-bot: Update DonationInterface submodule [core] (fundraising/REL1_35) - https://gerrit.wikimedia.org/r/778377 (owner: Ejegg)
[02:52:11] ten hours of battery life remaining on new laptop insanity
[02:52:43] * ejegg jelly
[02:57:19] Fundraising Sprint Anti-matter doesn't matter, Fundraising Sprint Fibonachos, Fundraising Sprint Xenomorph Petting Zoo, Fundraising Sprint e^🥧👀=yum, and 5 others: Enable South Africa through Dlocal - https://phabricator.wikimedia.org/T293508 (Ejegg) Yep @Pcoombe, uncommenting it with selection_we...
[11:43:04] <_joe_> fr-tech we had another flurry of your failmails
[11:43:19] <_joe_> we are going to delete them as we're getting paged
[12:22:55] argh _joe_ thanks for the heads up will investigate shortly. sorry!
[12:23:42] damilare: im just relocating so i might be a bit late to the call. ill ping you when im on reliable internet
[12:28:21] PROBLEM - check_redis on frqueue1003 is CRITICAL: CRITICAL: jobs-paypal is 10357 10000 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 11 keys, up 152 days 11 hours - memory use is 22.62M (peak 22.67M, 0.33% of max, fragmentation 1.20%), connected_slaves is 3, donations is 531, jobs is 0, jobs-adyen is 303, payments-antifraud is 2320, payments-init is 681, pending is 1430, recurring is 0, refund is 0, unsubscribe is 7 https:/
[12:28:21] ikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis
[12:28:52] fr-tech there's another 17k fr-tech-failmail in the mail queue :-/
[12:33:19] PROBLEM - check_redis on frqueue1003 is CRITICAL: CRITICAL: jobs-paypal is 10640 10000 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 12 keys, up 152 days 11 hours - memory use is 23.09M (peak 23.25M, 0.34% of max, fragmentation 1.20%), connected_slaves is 3, donations is 534, jobs is 0, jobs-adyen is 303, payments-antifraud is 2333, payments-init is 684, pending is 1437, recurring is 1, refund is 0, unsubscribe is 7 https:/
[12:33:19] ikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis
[12:35:22] okk jgleeson
[12:35:31] fr-tech now 19k and still going up, this is comfortably above the page threshold for the mail queue
[12:38:19] PROBLEM - check_redis on frqueue1003 is CRITICAL: CRITICAL: jobs-paypal is 10700 10000 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 12 keys, up 152 days 11 hours - memory use is 23.30M (peak 23.31M, 0.34% of max,
fragmentation 1.20%), connected_slaves is 3, donations is 539, jobs is 0, jobs-adyen is 305, payments-antifraud is 2346, payments-init is 689, pending is 1447, recurring is 1, refund is 0, unsubscribe is 7 https:/
[12:38:19] ikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis
[12:39:06] I can bin another pile of them as before, but this obviously isn't entirely sustainable, especially on Friday afternoon
[12:43:19] PROBLEM - check_redis on frqueue1003 is CRITICAL: CRITICAL: jobs-paypal is 11198 10000 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 12 keys, up 152 days 11 hours - memory use is 24.02M (peak 24.15M, 0.35% of max, fragmentation 1.20%), connected_slaves is 3, donations is 541, jobs is 0, jobs-adyen is 306, payments-antifraud is 2354, payments-init is 691, pending is 1455, recurring is 1, refund is 0, unsubscribe is 7 https:/
[12:43:19] ikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis
[12:45:25] Emperor: I'll deploy a change to stop those shortly
[12:45:43] damilare: I'm back now, wanna jump on a call and we'll fix those emails?
[12:47:47] jgleeson: cool, thanks - when you've done so, shall I discard the ones in the queue again?
[12:48:03] 24k now...
[12:48:19] PROBLEM - check_redis on frqueue1003 is CRITICAL: CRITICAL: jobs-paypal is 11867 10000 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 12 keys, up 152 days 11 hours - memory use is 25.08M (peak 25.09M, 0.36% of max, fragmentation 1.19%), connected_slaves is 3, donations is 544, jobs is 0, jobs-adyen is 307, payments-antifraud is 2371, payments-init is 694, pending is 1463, recurring is 1, refund is 0, unsubscribe is 7 https:/ [12:48:19] ikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis [12:49:55] Emperor: yeah you can probably clear those out now [12:49:57] !log disabled paypal IPN listener failmail [12:49:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:50:12] oh did you just revert the revert ejegg ? [12:50:18] oh hi jgleeson, I just redid that disablement setting [12:50:21] yep [12:50:28] ah cool thanks [12:51:08] so if PayPal can't keep their own server working, maybe they can at least monitor it and stop sending us IPNs when we can't verify them [12:51:43] k, gotta go help with the kiddo [12:51:53] yeah I need to go back to them with a complete explanation [12:51:57] thanks bye now [12:53:17] PROBLEM - check_redis on frqueue1003 is CRITICAL: CRITICAL: jobs-paypal is 12187 10000 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 12 keys, up 152 days 11 hours - memory use is 25.61M (peak 25.70M, 0.37% of max, fragmentation 1.18%), connected_slaves is 3, donations is 546, jobs is 0, jobs-adyen is 311, payments-antifraud is 2381, payments-init is 696, pending is 1473, recurring is 1, refund is 0, unsubscribe is 7 https:/ [12:53:17] ikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis [12:54:42] jgleeson: cleared out again; LMK when you think you've stopped them coming in? 
[12:58:11] Emperor: a fix just got deployed to switch off the error emails [12:58:17] PROBLEM - check_redis on frqueue1003 is CRITICAL: CRITICAL: jobs-paypal is 12277 10000 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 12 keys, up 152 days 11 hours - memory use is 25.77M (peak 25.87M, 0.37% of max, fragmentation 1.18%), connected_slaves is 3, donations is 547, jobs is 0, jobs-adyen is 311, payments-antifraud is 2392, payments-init is 698, pending is 1479, recurring is 1, refund is 0, unsubscribe is 7 https:/ [12:58:17] ikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis [12:58:43] jgleeson: I'll delete the 738 that arrived since my last clearout and see if they stop then :) [13:00:36] jgleeson: still arriving at a fair rate [13:01:36] hmm that might be a back log? ejegg|away just deployed a change which prevents failure emails being sent when paypal api calls fail [13:01:58] are the times from earlier? [13:03:10] most recent one (I think!) has 2022-04-08T13:02:09+00:00 in the text [13:03:17] PROBLEM - check_redis on frqueue1003 is CRITICAL: CRITICAL: jobs-paypal is 12326 10000 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 12 keys, up 152 days 11 hours - memory use is 25.88M (peak 25.97M, 0.37% of max, fragmentation 1.18%), connected_slaves is 3, donations is 552, jobs is 0, jobs-adyen is 313, payments-antifraud is 2407, payments-init is 703, pending is 1490, recurring is 1, refund is 0, unsubscribe is 7 https:/ [13:03:17] ikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis [13:03:25] jgleeson: which is basically now AFAICT [13:03:54] hmm yeah that looks like now [13:04:03] lemme double check ejegg|away pushed that change out [13:08:17] PROBLEM - check_redis on frqueue1003 is CRITICAL: CRITICAL: jobs-paypal is 12436 10000 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 12 keys, up 152 days 12 hours - memory use is 26.10M (peak 26.13M, 0.38% of max, fragmentation 
1.18%), connected_slaves is 3, donations is 556, jobs is 0, jobs-adyen is 315, payments-antifraud is 2425, payments-init is 708, pending is 1499, recurring is 1, refund is 0, unsubscribe is 7 https:/ [13:08:17] ikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis [13:11:01] still coming in - we're approaching the critical threshold for the mail queue again :-/ [13:13:05] Emperor: yeah can see them coming in. the previous fix hasn't fixed them. just looking into why [13:13:17] PROBLEM - check_redis on frqueue1003 is CRITICAL: CRITICAL: jobs-paypal is 12540 10000, pending is 1515 1500 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 12 keys, up 152 days 12 hours - memory use is 26.29M (peak 26.33M, 0.38% of max, fragmentation 1.18%), connected_slaves is 3, donations is 565, jobs is 0, jobs-adyen is 315, payments-antifraud is 2454, payments-init is 717, recurring is 1, refund is 0, unsubscribe is 7 ht [13:13:17] nga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis [13:13:34] do we have a bit of time to figure this out or do you need us to do something immediately Emperor ? [13:14:10] if it's the latter we might have to switch something bigger off [13:14:16] which will have sideeffects [13:14:51] jgleeson: I can keep binning them for now [13:16:00] jgleeson: and can carry on doing so while you work (though obviously repeated bulk-email-deletion makes me a bit nervous!); it is Friday afternoon, though, so I'd like us to be confident of a solution in the next couple of hours if poss? 
[13:16:43] (I think the p.age threshold is 4000 in the queue, I can keep an eye on it and keep it below that, I think) [13:16:45] thanks Emperor, yeah mass email deletion isn't great I agree [13:16:56] we're on a call now working through it [13:17:05] fr-tech if anyone else is around feel free to jump on [13:17:22] jgleeson: OK, I'll keep on top of the mail pileup for now then [13:17:30] meet.google.com/ozf-kznh-wif [13:17:33] thanks Emperor ! [13:18:15] PROBLEM - check_redis on frqueue1003 is CRITICAL: CRITICAL: jobs-paypal is 12554 10000, pending is 1522 1500 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 12 keys, up 152 days 12 hours - memory use is 26.35M (peak 26.41M, 0.38% of max, fragmentation 1.18%), connected_slaves is 3, donations is 569, jobs is 0, jobs-adyen is 315, payments-antifraud is 2472, payments-init is 723, recurring is 1, refund is 0, unsubscribe is 7 ht [13:18:15] nga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis [13:18:44] looks like that's also related to a paypal issue :-/ [13:23:18] PROBLEM - check_redis on frqueue1003 is CRITICAL: CRITICAL: jobs-paypal is 12591 10000, pending is 1531 1500 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 12 keys, up 152 days 12 hours - memory use is 26.47M (peak 26.53M, 0.38% of max, fragmentation 1.18%), connected_slaves is 3, donations is 572, jobs is 0, jobs-adyen is 318, payments-antifraud is 2483, payments-init is 727, recurring is 1, refund is 0, unsubscribe is 7 ht [13:23:18] nga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis [13:23:33] (03Restored) 10Jgleeson: Downgrade logging level for Paypal IPN failures [wikimedia/fundraising/SmashPig] - 10https://gerrit.wikimedia.org/r/777830 (https://phabricator.wikimedia.org/T305553) (owner: 10Jgleeson) [13:25:06] (03CR) 10Damilare Adedoyin: [C: 03+2] Downgrade logging level for Paypal IPN failures [wikimedia/fundraising/SmashPig] - 
10https://gerrit.wikimedia.org/r/777830 (https://phabricator.wikimedia.org/T305553) (owner: 10Jgleeson) [13:25:38] (03Merged) 10jenkins-bot: Downgrade logging level for Paypal IPN failures [wikimedia/fundraising/SmashPig] - 10https://gerrit.wikimedia.org/r/777830 (https://phabricator.wikimedia.org/T305553) (owner: 10Jgleeson) [13:25:47] fr-tech we're gonna push out the code change originally put forward to quiet down the paypal failures as the config change doesn't seem to be working anymore [13:26:00] no idea why but we can dig in more when we turn off the hose [13:28:01] (03PS1) 10Jgleeson: Downgrade logging level for Paypal IPN failures [wikimedia/fundraising/SmashPig] (deployment) - 10https://gerrit.wikimedia.org/r/778508 (https://phabricator.wikimedia.org/T305553) [13:28:18] PROBLEM - check_redis on frqueue1003 is CRITICAL: CRITICAL: jobs-paypal is 12658 10000, pending is 1545 1500 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 12 keys, up 152 days 12 hours - memory use is 26.59M (peak 26.64M, 0.38% of max, fragmentation 1.18%), connected_slaves is 3, donations is 581, jobs is 0, jobs-adyen is 318, payments-antifraud is 2512, payments-init is 736, recurring is 1, refund is 0, unsubscribe is 7 ht [13:28:18] nga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis [13:29:05] (03CR) 10Jgleeson: [C: 03+2] Downgrade logging level for Paypal IPN failures [wikimedia/fundraising/SmashPig] (deployment) - 10https://gerrit.wikimedia.org/r/778508 (https://phabricator.wikimedia.org/T305553) (owner: 10Jgleeson) [13:29:30] (03Merged) 10jenkins-bot: Downgrade logging level for Paypal IPN failures [wikimedia/fundraising/SmashPig] (deployment) - 10https://gerrit.wikimedia.org/r/778508 (https://phabricator.wikimedia.org/T305553) (owner: 10Jgleeson) [13:31:47] ok that fix is out, let's see how we get on [13:32:30] queue currently 865 fr-tech-failmail [13:33:18] PROBLEM - check_redis on frqueue1003 is CRITICAL: CRITICAL: jobs-paypal is 
12729 10000, pending is 1556 1500 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 12 keys, up 152 days 12 hours - memory use is 26.73M (peak 26.79M, 0.38% of max, fragmentation 1.18%), connected_slaves is 3, donations is 587, jobs is 0, jobs-adyen is 320, payments-antifraud is 2532, payments-init is 742, recurring is 1, refund is 0, unsubscribe is 7 ht
[13:33:18] nga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis
[13:34:13] Emperor: I *think* we've switch off the hose
[13:34:19] jgleeson: now 942, so still going up
[13:34:38] Emperor: we just deployed it and I see errors in the logs but no corresponding emails
[13:34:53] 970...
[13:35:03] we can keep waiting a bit, see if it tails off
[13:35:23] I'm fairly confident it will
[13:36:21] jgleeson: do you have access to the system that makes the emails? frpig1001.frack.eqiad.wmnet I think?
[13:37:22] I don't Emperor
[13:37:30] are they still going up?
[13:37:41] yes; 1023
[13:38:00] oh. I don't see them coming into my inbox
[13:38:04] are you blocking them somehow?
[13:38:18] PROBLEM - check_redis on frqueue1003 is CRITICAL: CRITICAL: jobs-paypal is 12781 10000, pending is 1563 1500 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 12 keys, up 152 days 12 hours - memory use is 26.84M (peak 26.91M, 0.38% of max, fragmentation 1.18%), connected_slaves is 3, donations is 588, jobs is 0, jobs-adyen is 325, payments-antifraud is 2544, payments-init is 745, recurring is 1, refund is 0, unsubscribe is 7 ht
[13:38:18] nga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis
[13:38:19] (I'm using that as an indicator as to whether the fix worked or not)
[13:38:34] jgleeson: they're queuing up on mx1001 because google is saying "too much mail"
[13:39:01] jgleeson: and then when the queue gets to large I'm discarding them from the mail queue
[13:39:14] can you let them through please
[13:39:29] just in case it's new failures related to the patch
[13:39:42] we shouldn't be getting failure emails WRT the original issue
[13:39:44] jgleeson: Not being funny, but no. Google is refusing to have them from us.
[13:39:57] jgleeson: I can fish out the body of one from the queue and email it to you?
[13:40:12] ok cool that would help thanks
[13:40:34] fr-tech this is feeling like a major issue now
[13:40:48] jgleeson: will do; give me a mo
[13:40:52] thanks!
[13:41:32] (want to try and make sure I find the most recent one)
[13:42:24] Emperor: need help?
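
The mail-queue watching described above (count the queued failmails, discard before the paging threshold is reached) can be sketched in Python. This is only an illustration under stated assumptions: the 4000 figure is Emperor's "I think" estimate from this log, `warn_fraction` and all function names are hypothetical, and `exim -bpc` is the queue-count command Emperor mentions later in this log. It is not the tooling actually used on mx1001.

```python
import subprocess

# Assumption: taken from "I think the p.age threshold is 4000" in the log.
PAGE_THRESHOLD = 4000


def parse_queue_count(output: str) -> int:
    """Parse the single integer that `exim -bpc` prints."""
    return int(output.strip())


def queue_status(count: int,
                 page_threshold: int = PAGE_THRESHOLD,
                 warn_fraction: float = 0.75) -> str:
    """Classify a queue size as 'ok', 'warn' or 'page'.

    warn_fraction is a hypothetical early-warning margin, not a value
    taken from the log.
    """
    if count >= page_threshold:
        return "page"
    if count >= page_threshold * warn_fraction:
        return "warn"
    return "ok"


def read_exim_queue_count() -> int:
    """Run `exim -bpc` (may need sudo) and return the queue size."""
    result = subprocess.run(["exim", "-bpc"],
                            capture_output=True, text=True, check=True)
    return parse_queue_count(result.stdout)
```

With these assumed values, the 1023 messages reported at 13:37 classify as "ok", while the earlier 17k to 24k backlog would classify as "page".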
[13:43:18] PROBLEM - check_redis on frqueue1003 is CRITICAL: CRITICAL: jobs-paypal is 12811 10000, pending is 1588 1500 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 12 keys, up 152 days 12 hours - memory use is 26.96M (peak 27.40M, 0.39% of max, fragmentation 1.18%), connected_slaves is 3, donations is 595, jobs is 0, jobs-adyen is 325, payments-antifraud is 2578, payments-init is 756, recurring is 1, refund is 0, unsubscribe is 7 ht [13:43:18] nga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis [13:43:35] jhathaway: got one [13:46:18] jhathaway: you should have mail now [13:46:27] checking [13:47:14] not seeing anything Emperor [13:47:14] oh, sorry, sent it to jhathaway not jgleeson [13:47:18] ha! [13:47:19] sorry, I am stupid [13:47:32] I did get it though! [13:47:51] jgleeson: you should have mail now [13:48:15] PROBLEM - check_redis on frqueue1003 is CRITICAL: CRITICAL: jobs-paypal is 12862 10000, pending is 1592 1500 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 12 keys, up 152 days 12 hours - memory use is 27.06M (peak 27.40M, 0.39% of max, fragmentation 1.18%), connected_slaves is 3, donations is 598, jobs is 0, jobs-adyen is 335, payments-antifraud is 2591, payments-init is 759, recurring is 1, refund is 0, unsubscribe is 8 ht [13:48:15] nga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis [13:49:08] OK thanks Emperor i can confirm that there is still an issue [13:49:17] ok we will deploy a *bigger* fix now [13:49:24] sorry about this and thanks for sending that over [13:50:04] fr-tech, lowering the logging down to Logger::notice() still sends failmails so I'm going to temporarily switch off logging entirely for paypal ipncalls until we can figure this out [13:50:07] jgleeson: um, can I bin the current lot of queued mails again? 
we're going to hit p.age threshold again shortly [13:50:15] yes bin them Emperor [13:50:19] ack [13:52:31] jgleeson: the reason I was asking about access to the frpig servers is it's presumably possible that they have a backlog of mails to send? [13:52:37] (03PS1) 10Jgleeson: Remove logging temporarily for paypal IPN validation failures [wikimedia/fundraising/SmashPig] - 10https://gerrit.wikimedia.org/r/778513 (https://phabricator.wikimedia.org/T305553) [13:53:15] PROBLEM - check_redis on frqueue1003 is CRITICAL: CRITICAL: jobs-paypal is 12882 10000, pending is 1601 1500 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 12 keys, up 152 days 12 hours - memory use is 27.16M (peak 27.40M, 0.39% of max, fragmentation 1.17%), connected_slaves is 3, donations is 601, jobs is 0, jobs-adyen is 337, payments-antifraud is 2603, payments-init is 764, recurring is 1, refund is 0, unsubscribe is 8 ht [13:53:15] nga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis [13:53:26] (03CR) 10Damilare Adedoyin: [C: 03+2] Remove logging temporarily for paypal IPN validation failures [wikimedia/fundraising/SmashPig] - 10https://gerrit.wikimedia.org/r/778513 (https://phabricator.wikimedia.org/T305553) (owner: 10Jgleeson) [13:53:54] (03Merged) 10jenkins-bot: Remove logging temporarily for paypal IPN validation failures [wikimedia/fundraising/SmashPig] - 10https://gerrit.wikimedia.org/r/778513 (https://phabricator.wikimedia.org/T305553) (owner: 10Jgleeson) [13:54:03] Emperor: I do have access to frpig1001 [13:54:24] although I'm not familiar with digging into mail exchange stuff [13:54:57] jgleeson: exim -bpc will give you the queue count (probably needs sudo) [13:55:25] (03PS1) 10Jgleeson: Remove logging temporarily for paypal IPN validation failures [wikimedia/fundraising/SmashPig] (deployment) - 10https://gerrit.wikimedia.org/r/778515 (https://phabricator.wikimedia.org/T305553) [13:56:31] (03CR) 10Jgleeson: [C: 03+2] Remove logging temporarily 
for paypal IPN validation failures [wikimedia/fundraising/SmashPig] (deployment) - 10https://gerrit.wikimedia.org/r/778515 (https://phabricator.wikimedia.org/T305553) (owner: 10Jgleeson) [13:57:43] (03Merged) 10jenkins-bot: Remove logging temporarily for paypal IPN validation failures [wikimedia/fundraising/SmashPig] (deployment) - 10https://gerrit.wikimedia.org/r/778515 (https://phabricator.wikimedia.org/T305553) (owner: 10Jgleeson) [13:58:10] Emperor: pushing out new fix now [13:58:15] * jgleeson crosses fingers [13:58:21] PROBLEM - check_redis on frqueue1003 is CRITICAL: CRITICAL: jobs-paypal is 12919 10000, pending is 1609 1500 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 12 keys, up 152 days 12 hours - memory use is 27.25M (peak 27.40M, 0.39% of max, fragmentation 1.18%), connected_slaves is 3, donations is 605, jobs is 0, jobs-adyen is 337, payments-antifraud is 2623, payments-init is 768, recurring is 1, refund is 0, unsubscribe is 8 ht [13:58:21] nga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis [13:58:24] 👍 [13:58:35] (currently 399 fr-tech-fail in the queue) [13:59:36] ok it's live [13:59:49] I can see the effects in the logs but will wait on your side [13:59:56] now 517, let's see [14:02:11] 677 [14:03:04] oh crap I can see another error is triggering them now [14:03:21] PROBLEM - check_redis on frqueue1003 is CRITICAL: CRITICAL: jobs-paypal is 12955 10000, pending is 1614 1500 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 12 keys, up 152 days 12 hours - memory use is 27.28M (peak 27.40M, 0.39% of max, fragmentation 1.18%), connected_slaves is 3, donations is 607, jobs is 0, jobs-adyen is 337, payments-antifraud is 2631, payments-init is 770, recurring is 2, refund is 0, unsubscribe is 8 ht [14:03:21] nga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis [14:03:21] wow ok this one really doesn't want to go away [14:03:56] sorry :-/ [14:03:57] Emperor: I 
need to relocate and I'm gonna be away for about 15 minutes but once I get back I'll continue. sorry about this [14:04:15] jgleeson: NP, I'll zot the queue if it gets near the p.age threshold, otherwise just keep an eye [14:04:25] we know why it's still failing but it's gonna take time to understand how to fix it as the quick hacks arent working [14:08:16] PROBLEM - check_redis on frqueue1003 is CRITICAL: CRITICAL: jobs-paypal is 13048 10000, pending is 1622 1500 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 12 keys, up 152 days 13 hours - memory use is 27.49M (peak 27.55M, 0.39% of max, fragmentation 1.17%), connected_slaves is 3, donations is 608, jobs is 0, jobs-adyen is 337, payments-antifraud is 2640, payments-init is 772, recurring is 2, refund is 0, unsubscribe is 8 ht [14:08:16] nga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis [14:13:21] PROBLEM - check_redis on frqueue1003 is CRITICAL: CRITICAL: jobs-paypal is 13094 10000, pending is 1640 1500 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 12 keys, up 152 days 13 hours - memory use is 27.57M (peak 27.68M, 0.39% of max, fragmentation 1.17%), connected_slaves is 3, donations is 615, jobs is 0, jobs-adyen is 339, payments-antifraud is 2664, payments-init is 781, recurring is 2, refund is 0, unsubscribe is 8 ht [14:13:21] nga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis [14:18:21] PROBLEM - check_redis on frqueue1003 is CRITICAL: CRITICAL: jobs-paypal is 13155 10000, pending is 1646 1500 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 12 keys, up 152 days 13 hours - memory use is 27.69M (peak 27.75M, 0.40% of max, fragmentation 1.17%), connected_slaves is 3, donations is 620, jobs is 0, jobs-adyen is 342, payments-antifraud is 2677, payments-init is 786, recurring is 2, refund is 0, unsubscribe is 8 ht [14:18:21] 
nga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis [14:23:22] PROBLEM - check_redis on frqueue1003 is CRITICAL: CRITICAL: jobs-paypal is 13200 10000, pending is 1655 1500 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 12 keys, up 152 days 13 hours - memory use is 27.79M (peak 27.92M, 0.40% of max, fragmentation 1.17%), connected_slaves is 3, donations is 622, jobs is 0, jobs-adyen is 346, payments-antifraud is 2686, payments-init is 788, recurring is 2, refund is 0, unsubscribe is 8 ht [14:23:22] nga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis [14:24:11] (03PS1) 10Jgleeson: Comment out exception tiggering Paypal IPn validatior failmails. [wikimedia/fundraising/SmashPig] - 10https://gerrit.wikimedia.org/r/778522 [14:24:17] ok damilare that patch should STOP them once and for all ^^^ [14:24:30] also I'll rejoin the call [14:25:00] (03PS1) 10Damilare Adedoyin: Little modificaiton on paypal validation to fix failmail mess [wikimedia/fundraising/SmashPig] - 10https://gerrit.wikimedia.org/r/778523 [14:25:10] oohh [14:25:20] (03CR) 10jerkins-bot: [V: 04-1] Comment out exception tiggering Paypal IPn validatior failmails. [wikimedia/fundraising/SmashPig] - 10https://gerrit.wikimedia.org/r/778522 (owner: 10Jgleeson) [14:25:20] (that patch stop exception being thrown at the end at the higher level) [14:25:41] argh Ci [14:25:50] I'm on the call now [14:26:15] I had a similar patch that should work like yours [14:26:20] (03CR) 10jerkins-bot: [V: 04-1] Little modificaiton on paypal validation to fix failmail mess [wikimedia/fundraising/SmashPig] - 10https://gerrit.wikimedia.org/r/778523 (owner: 10Damilare Adedoyin) [14:26:42] (03PS2) 10Jgleeson: Comment out exception tiggering Paypal IPn validatior failmails. 
[wikimedia/fundraising/SmashPig] - 10https://gerrit.wikimedia.org/r/778522 [14:28:18] joining now [14:28:18] PROBLEM - check_redis on frqueue1003 is CRITICAL: CRITICAL: jobs-paypal is 13244 10000, pending is 1666 1500 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 12 keys, up 152 days 13 hours - memory use is 27.89M (peak 27.99M, 0.40% of max, fragmentation 1.17%), connected_slaves is 3, donations is 626, jobs is 0, jobs-adyen is 350, payments-antifraud is 2699, payments-init is 792, recurring is 2, refund is 0, unsubscribe is 8 ht [14:28:18] nga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis [14:28:25] (03PS2) 10Damilare Adedoyin: Little modificaiton on paypal validation to fix failmail mess [wikimedia/fundraising/SmashPig] - 10https://gerrit.wikimedia.org/r/778523 [14:28:56] (03CR) 10jerkins-bot: [V: 04-1] Little modificaiton on paypal validation to fix failmail mess [wikimedia/fundraising/SmashPig] - 10https://gerrit.wikimedia.org/r/778523 (owner: 10Damilare Adedoyin) [14:30:18] hey jgleeson damilare I'm awake now [14:30:18] (03PS3) 10Damilare Adedoyin: Little modificaiton on paypal validation to fix failmail mess [wikimedia/fundraising/SmashPig] - 10https://gerrit.wikimedia.org/r/778523 [14:31:25] if i can help with anything [14:32:00] (03CR) 10Jgleeson: [C: 03+2] Little modificaiton on paypal validation to fix failmail mess [wikimedia/fundraising/SmashPig] - 10https://gerrit.wikimedia.org/r/778523 (owner: 10Damilare Adedoyin) [14:32:14] just pushing out a patch we think we fix it cstone [14:32:27] we're talking on meet.google.com/ozf-kznh-wif [14:32:27] (03Merged) 10jenkins-bot: Little modificaiton on paypal validation to fix failmail mess [wikimedia/fundraising/SmashPig] - 10https://gerrit.wikimedia.org/r/778523 (owner: 10Damilare Adedoyin) [14:33:18] PROBLEM - check_redis on frqueue1003 is CRITICAL: CRITICAL: jobs-paypal is 13323 10000, pending is 1672 1500 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases 
(db0) with 12 keys, up 152 days 13 hours - memory use is 28.07M (peak 28.07M, 0.40% of max, fragmentation 1.17%), connected_slaves is 3, donations is 627, jobs is 0, jobs-adyen is 351, payments-antifraud is 2705, payments-init is 793, recurring is 2, refund is 0, unsubscribe is 8 ht [14:33:18] nga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis [14:38:00] (03PS1) 10Jgleeson: Little modificaiton on paypal validation to fix failmail mess [wikimedia/fundraising/SmashPig] (deployment) - 10https://gerrit.wikimedia.org/r/778527 [14:38:22] PROBLEM - check_redis on frqueue1003 is CRITICAL: CRITICAL: jobs-paypal is 13372 10000, pending is 1685 1500 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 12 keys, up 152 days 13 hours - memory use is 28.16M (peak 28.25M, 0.40% of max, fragmentation 1.17%), connected_slaves is 3, donations is 635, jobs is 0, jobs-adyen is 352, payments-antifraud is 2736, payments-init is 801, recurring is 2, refund is 0, unsubscribe is 8 ht [14:38:22] nga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis [14:41:36] FWIW, queue currently has 301 fr-tech-failmail messages in [14:41:51] Emperor: are they still growing? [14:42:19] ooh, still 301 [14:42:29] Let me give it a couple of minutes to be sure :) [14:42:42] 302. pah. 
[14:42:55] ok we'e pushing out ANOTHER patch lol [14:42:57] sorry [14:43:02] :) [14:43:18] PROBLEM - check_redis on frqueue1003 is CRITICAL: CRITICAL: jobs-paypal is 13492 10000, pending is 1697 1500 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 12 keys, up 152 days 13 hours - memory use is 28.42M (peak 28.44M, 0.40% of max, fragmentation 1.17%), connected_slaves is 3, donations is 637, jobs is 0, jobs-adyen is 361, payments-antifraud is 2748, payments-init is 803, recurring is 2, refund is 0, unsubscribe is 8 ht [14:43:18] nga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis [14:43:38] (03CR) 10Jgleeson: [C: 03+2] Little modificaiton on paypal validation to fix failmail mess [wikimedia/fundraising/SmashPig] (deployment) - 10https://gerrit.wikimedia.org/r/778527 (owner: 10Jgleeson) [14:44:48] (03Merged) 10jenkins-bot: Little modificaiton on paypal validation to fix failmail mess [wikimedia/fundraising/SmashPig] (deployment) - 10https://gerrit.wikimedia.org/r/778527 (owner: 10Jgleeson) [14:48:22] PROBLEM - check_redis on frqueue1003 is CRITICAL: CRITICAL: jobs-paypal is 13643 10000, pending is 1704 1500 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 12 keys, up 152 days 13 hours - memory use is 28.64M (peak 28.78M, 0.41% of max, fragmentation 1.17%), connected_slaves is 3, donations is 642, jobs is 0, jobs-adyen is 365, payments-antifraud is 2764, payments-init is 808, recurring is 2, refund is 0, unsubscribe is 8 ht [14:48:22] nga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis [14:49:11] Currently 0 fr-tech-failmail messages in the queue \o/ [14:50:04] ok that's good news. 
it looks like paypal stopped hitting us and we also just deployed a patch which will hopefully prevent any further failure emails [14:50:16] thanks for all your help on this Emperor [14:51:14] jgleeson: no worries, glad to be of assistance :) [14:53:18] PROBLEM - check_redis on frqueue1003 is CRITICAL: CRITICAL: jobs-paypal is 13719 10000, pending is 1712 1500 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 12 keys, up 152 days 13 hours - memory use is 28.78M (peak 28.87M, 0.41% of max, fragmentation 1.17%), connected_slaves is 3, donations is 644, jobs is 0, jobs-adyen is 365, payments-antifraud is 2772, payments-init is 810, recurring is 2, refund is 0, unsubscribe is 8 ht [14:53:18] nga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis [14:58:18] PROBLEM - check_redis on frqueue1003 is CRITICAL: CRITICAL: jobs-paypal is 13805 10000, pending is 1724 1500 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 12 keys, up 152 days 13 hours - memory use is 28.97M (peak 29.01M, 0.41% of max, fragmentation 1.17%), connected_slaves is 3, donations is 650, jobs is 0, jobs-adyen is 368, payments-antifraud is 2791, payments-init is 816, recurring is 2, refund is 0, unsubscribe is 8 ht [14:58:18] nga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis [15:03:18] PROBLEM - check_redis on frqueue1003 is CRITICAL: CRITICAL: jobs-paypal is 13852 10000, pending is 1732 1500 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 12 keys, up 152 days 13 hours - memory use is 29.08M (peak 29.13M, 0.42% of max, fragmentation 1.19%), connected_slaves is 3, donations is 657, jobs is 0, jobs-adyen is 375, payments-antifraud is 2810, payments-init is 823, recurring is 2, refund is 0, unsubscribe is 8 ht [15:03:18] nga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis [15:05:43] (03CR) 10Ejegg: "This will discard all IPNs PayPal sends us while their service is 
broken and return a 200 OK to PayPal meaning they will not re-send them." [wikimedia/fundraising/SmashPig] - 10https://gerrit.wikimedia.org/r/778523 (owner: 10Damilare Adedoyin) [15:06:33] damilare / jgleeson: that solution looks risky to me ^^^ [15:06:40] crap! [15:06:43] just seen that comment [15:06:47] ok we wll walk that back [15:07:20] did I botch the settings deploy earlier? Sorry if the failmails kept sending [15:07:37] ejegg|away: were in here if you want to join https://meet.google.com/ozf-kznh-wif?pli=1 [15:07:52] actually ejegg|away [15:07:54] i have to do a couple things here [15:07:58] I don't think it will send back 200 [15:08:11] as $valid will be false [15:08:15] just double checking [15:08:20] oh ok [15:08:20] PROBLEM - check_redis on frqueue1003 is CRITICAL: CRITICAL: jobs-paypal is 13865 10000, pending is 1744 1500 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 12 keys, up 152 days 14 hours - memory use is 29.12M (peak 29.21M, 0.41% of max, fragmentation 1.17%), connected_slaves is 3, donations is 659, jobs is 0, jobs-adyen is 376, payments-antifraud is 2825, payments-init is 826, recurring is 2, refund is 0, unsubscribe is 8 ht [15:08:20] nga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis [15:09:15] oh shoot, are queues not being consumed? [15:09:26] argh, did I leave them off after last night's crm deployt? [15:09:27] no [15:09:28] yeah ejegg|away we were trying to figure that out too [15:09:34] damn damn [15:09:46] ejegg|away: it looks like something went wrong on the jobs deploy but also that config setting for the paypal change didn't work either [15:09:52] it is on the server though, civi1001 [15:10:07] the jobs change is on the server? [15:10:46] ugh, no it's not [15:10:53] really sorry folks [15:10:57] sorry the failmail change [15:11:01] not the consumers [15:11:05] I missed the 'turn jobs back on' step [15:11:19] we saw the revert of the turn them off though? 
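A minimal sketch of the concern in the review comment above, assuming a listener that answers every incoming notification with an HTTP status and a sender that re-sends anything not acknowledged with 200. The function names, the `is_valid` callback and the 403 rejection status are illustrative assumptions, not SmashPig's actual code.

```python
# Hypothetical listener logic; names and status codes are assumptions.

def handle_notification(message, is_valid, queue):
    """Queue a verified message and return the HTTP status to send back."""
    if is_valid(message):
        queue.append(message)
        return 200  # acknowledged: the sender has no reason to re-send
    return 403  # assumed rejection status: the message stays unacknowledged


def handle_notification_discarding(message, is_valid, queue):
    """The risky variant: acknowledge everything, even unverified messages."""
    if is_valid(message):
        queue.append(message)
    return 200  # unverified messages are dropped AND acknowledged


if __name__ == "__main__":
    always_invalid = lambda message: False  # simulates broken validation
    print(handle_notification({"txn": "a"}, always_invalid, []))
    print(handle_notification_discarding({"txn": "a"}, always_invalid, []))
```

With validation failing, the first variant leaves the message unacknowledged so it can be retried later, while the second acknowledges a message that was never queued.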
[15:11:42] ahh okay im confuse [15:11:43] I guess I didn't get the update/rsync right [15:12:01] bah, looks like i rsynced without updating first [15:12:27] !log re-enabled fundraising scheduled jobs [15:12:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:12:33] ok, really gotta |away now [15:13:18] PROBLEM - check_redis on frqueue1003 is CRITICAL: CRITICAL: jobs-paypal is 13929 10000, pending is 1753 1500 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 10 keys, up 152 days 14 hours - memory use is 28.19M (peak 29.36M, 0.41% of max, fragmentation 1.21%), connected_slaves is 3, donations is 671, jobs is 0, jobs-adyen is 379, payments-antifraud is 2472, payments-init is 0, recurring is 2, refund is 0, unsubscribe is 8 http [15:13:18] a.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis [15:13:57] fr-tech, I'm gonna roll back the changes we added to quiet down failmail [15:14:04] or walkback to more specific [15:15:22] (03PS1) 10Jgleeson: Revert "Little modificaiton on paypal validation to fix failmail mess" [wikimedia/fundraising/SmashPig] - 10https://gerrit.wikimedia.org/r/778240 [15:16:08] (03CR) 10Damilare Adedoyin: [C: 03+2] Revert "Little modificaiton on paypal validation to fix failmail mess" [wikimedia/fundraising/SmashPig] - 10https://gerrit.wikimedia.org/r/778240 (owner: 10Jgleeson) [15:16:38] (03Merged) 10jenkins-bot: Revert "Little modificaiton on paypal validation to fix failmail mess" [wikimedia/fundraising/SmashPig] - 10https://gerrit.wikimedia.org/r/778240 (owner: 10Jgleeson) [15:17:17] Emperor: we're gonna have to roll back some of the code we added earlier to stop the failure emails. [15:18:38] jgleeson: do you expect more mail-spam, then? 
[15:19:28] (PS1) Jgleeson: Revert "Little modificaiton on paypal validation to fix failmail mess" [wikimedia/fundraising/SmashPig] (deployment) - https://gerrit.wikimedia.org/r/778531 [15:19:57] Emperor: it's possible but I can't be sure. We're still working on it but one of the patches we pushed out has unwanted side effects [15:20:17] (CR) Jgleeson: [C: +2] Revert "Little modificaiton on paypal validation to fix failmail mess" [wikimedia/fundraising/SmashPig] (deployment) - https://gerrit.wikimedia.org/r/778531 (owner: Jgleeson) [15:20:43] (Merged) jenkins-bot: Revert "Little modificaiton on paypal validation to fix failmail mess" [wikimedia/fundraising/SmashPig] (deployment) - https://gerrit.wikimedia.org/r/778531 (owner: Jgleeson) [15:21:47] ok the latest patch is reverted [15:21:58] jgleeson: OK, cool. I have to hard stop at 16:00 UTC (about 40 mins away), FWIW [15:22:11] (so if we think the risk period extends beyond then, I should try and find a handover) [15:23:28] ok thanks for the heads up Emperor [15:24:31] hkgbugkvidntihnuevlilngh [15:24:54] I hope that wasn't something you didn't want in a public IRC log... [15:25:17] oh no worries just the second half of my yubikey attempt [15:27:03] :) [15:27:13] ejegg|away: for when you get back. For whatever reason that NullLogStream fix did not work. I can see the config on frpig1001 but it's not stopping the failmail. 
[15:27:28] jgleeson: FWIW, no queued-up failmail on mx1001 ATM [15:28:22] PROBLEM - check_redis on frqueue1003 is CRITICAL: CRITICAL: recurring is 13139 9500 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 6 keys, up 152 days 14 hours - memory use is 9.39M (peak 29.36M, 0.21% of max, fragmentation 1.83%), connected_slaves is 3, donations is 117, jobs is 0, jobs-adyen is 2, jobs-paypal is 24, payments-antifraud is 0, payments-init is 0, pending is 0, refund is 0, unsubscribe is 10 https://icinga.wik [15:28:22] g/cgi-bin/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis [15:29:33] great Emperor. The doors are open though so we might get some [15:29:48] currently solutionizing on a call [15:29:55] I don't like that word [15:31:47] * Emperor neither ;-p [15:33:18] PROBLEM - check_redis on frqueue1003 is CRITICAL: CRITICAL: recurring is 13229 9500 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 9 keys, up 152 days 14 hours - memory use is 9.38M (peak 29.36M, 0.18% of max, fragmentation 1.54%), connected_slaves is 3, donations is 12, jobs is 0, jobs-adyen is 0, jobs-paypal is 16, payments-antifraud is 2, payments-init is 0, pending is 2, refund is 0, unsubscribe is 10 https://icinga.wiki [15:33:18] /cgi-bin/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis [15:38:18] PROBLEM - check_redis on frqueue1003 is CRITICAL: CRITICAL: recurring is 13285 9500 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 9 keys, up 152 days 14 hours - memory use is 9.44M (peak 29.36M, 0.17% of max, fragmentation 1.51%), connected_slaves is 3, donations is 5, jobs is 0, jobs-adyen is 0, jobs-paypal is 17, payments-antifraud is 1, payments-init is 1, pending is 0, refund is 0, unsubscribe is 10 https://icinga.wikim [15:38:18] cgi-bin/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis [15:39:47] ACKNOWLEDGEMENT - check_redis on frqueue1003 is CRITICAL: CRITICAL: recurring is 13285 9500 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 
databases (db0) with 9 keys, up 152 days 14 hours - memory use is 9.44M (peak 29.36M, 0.17% of max, fragmentation 1.51%), connected_slaves is 3, donations is 5, jobs is 0, jobs-adyen is 0, jobs-paypal is 17, payments-antifraud is 1, payments-init is 1, pending is 0, refund is 0, unsubscribe is 10 Jeff_Green k [15:39:47] progress https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis [15:53:14] (03CR) 10Wfan: "not sure should we also add the rfdqc ?" [wikimedia/fundraising/dev] - 10https://gerrit.wikimedia.org/r/777496 (owner: 10Ejegg) [16:11:58] 10Fundraising Sprint Fibonachos, 10Fundraising-Backlog, 10Wikimedia-Fundraising-CiviCRM, 10FR-AutoTY-Email, 10FR-Email: From Name in Auto Thank You Emails needs update - https://phabricator.wikimedia.org/T304909 (10KHaggard) 05Open→03Resolved Now that we can edit this ourselves, I'll resolve this tic... [16:15:34] 10Fundraising Sprint Fibonachos, 10Fundraising-Backlog: Manage paypal through the weekend - https://phabricator.wikimedia.org/T305717 (10DStrine) [16:15:45] 10Fundraising Sprint Fibonachos, 10Fundraising-Backlog: Manage paypal through the weekend - https://phabricator.wikimedia.org/T305717 (10DStrine) p:05Triage→03High [16:19:16] 10Fundraising Sprint Fibonachos, 10Fundraising-Backlog: Manage paypal through the weekend - https://phabricator.wikimedia.org/T305717 (10DStrine) [17:22:51] 10Fundraising Sprint Fibonachos, 10Fundraising-Backlog: Manage paypal through the weekend - https://phabricator.wikimedia.org/T305717 (10jgleeson) Paypal Response ` 2022-04-08T14:43:05+00:00 [INFO] Preparing to send POST request to https://ipnpb.paypal.com/cgi-bin/webscr 2022-04-08T14:43:05+00:00 [DEBUG]... 
[17:53:25] Fundraising Sprint Fibonachos, Fundraising-Backlog: Manage paypal through the weekend - https://phabricator.wikimedia.org/T305717 (Cstone) Data hunting from Dallas: 01 Apr 02 Apr 03 Apr 13274 200 04 Apr 13499 200 16 403 05 Apr 15059 200 06 Apr 17348 200 14353 403 07 Apr 15356 200 08 Apr... [18:26:05] Fundraising Sprint Fibonachos, Fundraising-Backlog: Manage paypal through the weekend - https://phabricator.wikimedia.org/T305717 (Dwisehaupt) For clarity, the numbers above are the return code from apache for requests to the smashpig_http_handler. The previous month (March) had ~820k 200's and 5 403's. [18:43:20] RECOVERY - check_redis on frqueue1003 is OK: OK: REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 7 keys, up 152 days 17 hours - memory use is 6.06M (peak 29.36M, 0.13% of max, fragmentation 1.82%), connected_slaves is 3, donations is 11, jobs is 0, jobs-adyen is 42, jobs-paypal is 0, payments-antifraud is 1, payments-init is 1, pending is 0, recurring is 7361, refund is 0, unsubscribe is 1 https://icinga.wikimedia.org/cgi-bin/i [18:43:20] info.cgi?type=2&host=frqueue1003&service=check_redis [19:34:16] AndyRussG: logging off now but we found some other stuff out on the call so I'd recommend when you get back try and catch the last 30 minutes of the recording as we start stepping into the log streams code locally and then Dami came up with a great idea about how to switch them off in config. [19:34:22] bye for now fr-tech o/ [19:34:30] Have a good weekend [19:35:02] I'll be back a bit later XenoRyet but if I don't talk to you before I go next week, I hope your dad continues to get better! [19:49:22] hi jgleeson|away, I've pushed up the update. Please review when you're back [20:15:07] was there more info or config validation needed from me to help with the smashpig issue? 
[20:49:59] Fundraising-Backlog, fundraising-tech-ops: Fundraising access request for Natalie Ngu - https://phabricator.wikimedia.org/T305588 (Dwisehaupt) [20:51:35] Fundraising-Backlog, fundraising-tech-ops: Fundraising access request for Natalie Ngu - https://phabricator.wikimedia.org/T305588 (Dwisehaupt) SSL client certificate created and sent via email. Password sent via SMS. CiviCRM account created and set with random password. Notification email sent with passw... [21:04:17] fr-tech: interesting that we have seen 2 spates of db contention failmail. i don't see anything on the db but i'll watch actively when the next run is scheduled that was tossing the errors [21:05:02] there is a known issue with the recurring one sometimes dwisehaupt lets see if I can find the task [21:16:56] i figured out my ssh nonsense I had misplaced the matching .pub and that's what it needed [21:18:45] ah. interesting. [21:28:44] thanks damilare ! [21:29:11] family just left. kids gone to bed. ahhhhhh bliss [21:29:30] cue the paypal failmails [21:30:04] just normal failmails :P [21:32:17] ha! 
[21:32:43] fr-tech I'm gonna revert out those hacky inline code comments we added earlier today to try to stop the failmail [21:32:52] thanks jgleeson [21:36:52] dwisehaupt: do you have a link handy to how to set up the civi cert I am failing at finding the page [21:38:36] (03PS1) 10Jgleeson: Revert temporary code comments to stop paypal failmail [wikimedia/fundraising/SmashPig] - 10https://gerrit.wikimedia.org/r/778593 [21:39:33] fr-tech that patch ^^^ will remove of hacky stuff from earlier [21:39:43] s/of/our/* [21:40:32] cstone: https://collab.wikimedia.org/wiki/Fundraising/Engineering/SSL_Client_Authentication#Installing_the_Certificate_on_Your_System [21:41:09] thanks dwisehaupt [21:41:24] (03CR) 10Cstone: [C: 03+2] Revert temporary code comments to stop paypal failmail [wikimedia/fundraising/SmashPig] - 10https://gerrit.wikimedia.org/r/778593 (owner: 10Jgleeson) [21:41:53] (03Merged) 10jenkins-bot: Revert temporary code comments to stop paypal failmail [wikimedia/fundraising/SmashPig] - 10https://gerrit.wikimedia.org/r/778593 (owner: 10Jgleeson) [21:42:09] thanks cstone ! [21:43:00] (03PS1) 10Jgleeson: Revert temporary code comments to stop paypal failmail [wikimedia/fundraising/SmashPig] (deployment) - 10https://gerrit.wikimedia.org/r/778594 [21:43:54] (03CR) 10Jgleeson: [C: 03+2] Revert temporary code comments to stop paypal failmail [wikimedia/fundraising/SmashPig] (deployment) - 10https://gerrit.wikimedia.org/r/778594 (owner: 10Jgleeson) [21:44:18] (03Merged) 10jenkins-bot: Revert temporary code comments to stop paypal failmail [wikimedia/fundraising/SmashPig] (deployment) - 10https://gerrit.wikimedia.org/r/778594 (owner: 10Jgleeson) [21:48:11] ill clean up those database contention ones once it clears out the recurring queue [21:51:00] I just had a look over the localsettings update damilare push up and I think we might need to tweak it slightly. 
It looks like we need to get that bit ready before we make a new smashpig code deployment so I'll pull it down locally and see if I can confirm if it works as expected [21:53:21] I knew i was reliant on my terminal color coding but man its hard without it haha need to get that over to new comp [21:59:42] ha yeah was that the neo pink one? [21:59:48] neon* [21:59:52] i have like 6 different ones [22:00:02] one for civi one for redis one for smashpig one for payments etc [22:00:17] ahh lol [22:00:27] civi is pink yeah haha [22:18:59] oh man [22:19:10] I think I know why ejegg|away's config fix didn't take effect earlier [22:19:36] just gonna try it to confirm [22:19:50] but it looks like it was missing the 'logging:' top level key [22:20:35] ooh [22:21:18] actually that's not quite right, the logging: key is there [22:21:30] but they arent the same? [22:21:31] but the log streams config lines don't sit under it [22:22:08] https://github.com/wikimedia/wikimedia-fundraising-SmashPig/blob/561cc9a00e98b59c8f0c6d99fcc09ca4be015772/config/provider-defaults.yaml#L6 [22:22:26] cstone: see the way logging: is the top level key and the streams are children of that [22:22:45] yeah [22:23:07] however in the paypal/main.yaml the log streams lines aren't nested within in 'logging:' they are on the same level [22:23:14] so I think they're being ignored [22:24:03] 10Fundraising-Backlog, 10fundraising-tech-ops: Fundraising access request for Natalie Ngu - https://phabricator.wikimedia.org/T305588 (10Dwisehaupt) [22:25:22] 10Fundraising-Backlog, 10fundraising-tech-ops: Fundraising access request for Natalie Ngu - https://phabricator.wikimedia.org/T305588 (10Dwisehaupt) SSH public key added. mysql accounts created. mysql grants created and applied. Added credentials to mysql config files and verified mysql database access. Verifi... 
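A minimal sketch of why a stream list placed beside `logging:` rather than under it would be ignored, assuming the override is deep-merged into the defaults and the logger only reads from the nested location. The key names (`enabled-log-streams`, `failmail`, `null`) and the merge behaviour are illustrative assumptions, not copied from SmashPig; plain dicts stand in for the parsed YAML.

```python
# Hypothetical defaults: stream settings live under the top-level "logging" key.
DEFAULTS = {
    "logging": {
        "enabled-log-streams": ["syslog", "failmail"],
    }
}


def deep_merge(base, override):
    """Return a new dict with override merged recursively into base."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value  # scalars and lists replace the default
    return merged


def enabled_streams(config):
    """Read the stream list from where the (hypothetical) logger looks for it."""
    return config["logging"]["enabled-log-streams"]


# Stream list written as a sibling of "logging": merged in, but never read.
flat_override = {
    "logging": {},
    "enabled-log-streams": ["syslog", "null"],
}

# Stream list nested under "logging": replaces the default list.
nested_override = {
    "logging": {"enabled-log-streams": ["syslog", "null"]},
}

if __name__ == "__main__":
    print(enabled_streams(deep_merge(DEFAULTS, flat_override)))
    print(enabled_streams(deep_merge(DEFAULTS, nested_override)))
```

In this sketch the flat override raises no error, which is what makes the mistake easy to miss: the setting is present in the merged config, just at a level nothing reads.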
[22:26:28] that's it [22:26:31] just confirmed it [22:27:04] adding in ejegg|away's change with the additional top-level 'logging' key replaces the failmail log stream with the null one [22:27:32] when testing out damilare's update it wasn't working and then I all of a sudden realised it didn't have any indentation [22:27:45] WHY did my eyes not see this 10 hours ago [22:27:51] sigh [22:28:18] there was a lot of chaos jgleeson [22:28:36] you didn't see it cause it literally wasn't there :P [22:29:00] nice find! [22:29:22] did it never work then and paypal just got their shit together at convenient times both times? [22:30:00] i guess the recurrings would tail off too [22:30:31] and yesterday it looks like we sent back no 403s to Paypal based on the logs analysis dallas posted here https://phabricator.wikimedia.org/T305717#7841784 which made it LOOK LIKE it was fixed but instead it seems Paypal just didn't fall over [22:30:46] yes cstone that looks like the explanation [22:30:57] so complicated [22:31:39] much stress [22:31:56] much stress indeed [22:32:10] * jgleeson says that in my Doge internal voice [22:32:58] ok so I'll rejig the original fix and push that up [22:33:16] then we can push that out with the code clean ups and enjoy a failmail free weekend hopefully [22:33:25] dstrine: we figured it out. the weekend is safe [22:34:36] jgleeson: ahh congrats! [22:34:43] I'm back-ish btw as needed [22:34:59] also congrats cstone dwisehaupt :) [22:35:06] I take no credit here haha [22:35:39] jgleeson: very cool! congrats! However I am a little bummed that you should be well into your weekend already [22:36:04] all around team effort with marathon calls and lots of good learnings [22:36:10] def learned A LOT [22:36:12] well that's good [22:36:47] cstone: jgleeson is it written down anywhere? [22:36:51] :) [22:37:10] good point. 
we should add the fix to the ticket for posterity [22:37:18] I'll summarise a bit in my EOD also [22:38:08] jgleeson: cstone perfection is the enemy don't be afraid to dump a bunch of notes in a relatively appropriate wiki page [22:38:20] i was going to do more log investigation and update the ticket too with that [22:38:28] wiki means "quick" just saying... [22:38:30] just like start/stop times of the recurring messages and fail mails [22:39:42] cstone: jgleeson but yeah the phab task is good too. The phab should help us understand what got changed on prod at a given time [22:40:17] Many more people reference our tasks in the future. we often have to backtrack for DR and analytics. [22:40:54] Payments pulls data on how often things fail on us so they have data to yell at PSPs :) [22:42:50] there should be about 13k paypal recurrings at 10am utc on the 9th [22:43:07] fr-tech so to confirm, nothing I should no on this now? apologies for taking a while to get back, was stuck in traffic for 2 hrs in an Uber with 3 teenagers...... [22:47:16] fr-tech could someone double check the latest update to process-control smashpig config before I push it out pls [22:47:22] AndyRussG: ^^^ [22:48:02] oki yee one sec [22:48:17] thanks much [22:48:43] thank u jgleeson! [22:49:57] indentation!!?!?!! noooooooo........... [22:51:05] ha [22:51:21] that's all it was missing [22:51:42] jgleeson: the change looks good to me [22:51:48] wow oki that's just insaaaaane [22:51:58] very happy it was found :) [22:52:11] congrats again [22:52:12] thanks AndyRussG I'll get that out [22:52:28] okiii thx I'm around for anything else too [22:55:30] I'm glad you all figured things out and learned some things. I'm about to sign off in a few minutes. I think everyone here is in an earlier timezone than me. I hope you all have good, chill weekends. [22:56:25] cya dstrine thx same :) [22:56:38] jgleeson: or would you like me to do the deploy? [22:57:28] it just left the servers AndyRussG! 
[22:57:44] have a good one dstrine [22:57:49] ahh wat it ran away, just like that? where did it go to? [22:57:55] :) [22:58:01] were our servers not good enuf for it? [22:58:40] thx much!!!!!!!!!! [22:58:49] thanks jgleeson AndyRussG !! [22:59:38] this train is shaking my desk haha [23:00:41] thanks AndyRussG cstone wfan damilare dwisehaupt for the top notch team work on all that Paypal headache [23:01:13] and ejegg|away for dialing in on his day off earlier! [23:02:01] ah thx and congrats also damilare (sorry I didn't see you were still about in backscroll.... apologies!) [23:03:55] 🎊 [23:04:37] ah congrats also wfan a [23:04:40] AndyRussG: I'm pretty sure he's asleep [23:04:46] also didn't see you were still around! :) sorry!!!!!!!!!! [23:04:51] or watching netflix [23:04:54] ha [23:04:59] as u should be jgleeson ;p [23:05:35] I will be shortly. probably watch a bit of youtube to help me along [23:06:03] midnight there ahhh [23:06:36] Good night and have a great weekend [23:14:21] have a good weekend and next week all, see you the one after! bye for now [23:42:45] cya jgleeson|away wfan :)