[00:26:49] things are still looking good and steady at this point.
[00:45:22] AndyRussG: cstone: eileen: I made that continuation event
[00:45:36] ejegg Hi! just was writing now... Still need a couple minutes here
[00:45:36] thanks ejegg
[00:45:37] https://meet.google.com/gue-ieob-trn
[00:45:42] k, cool
[02:58:43] Fundraising-Backlog, fundraising sprint Wireless Zipline: Review 2021 en6C impression rates - https://phabricator.wikimedia.org/T296803 (AndyRussG)
[03:13:13] fr-tech is anyone looking at those thank_you failmails?
[03:15:14] hmmm haven't done so
[03:15:27] k, i'm taking a look
[03:22:14] hm, they seem to be sporadic, and not killing a whole run of the TY mail sender
[03:22:45] I think we can set those no_thankyou fields back to null to make 'em try again
[03:23:33] no deadlocks
[03:25:42] k, updated, hopefully they go out next run
[03:34:52] hmm... we may have had some geoip update issues.
[03:35:22] specifically in codfw.
[04:17:44] PROBLEM - check_log_messages on frav1002 is CRITICAL: CRITICAL: ipset_error 1 [=1] https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frav1002&service=check_log_messages
[04:22:44] RECOVERY - check_log_messages on frav1002 is OK: OK https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frav1002&service=check_log_messages
[04:39:44] ok. temp workarounds in place for now. more investigation to happen tomorrow.
[07:30:14] PROBLEM - check_mysql on frdev1001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1245 https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frdev1001&service=check_mysql
[07:40:14] PROBLEM - check_mysql on frdev1001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1250 https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frdev1001&service=check_mysql
[07:42:18] PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1269 https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frdb1002&service=check_mysql
[07:45:14] PROBLEM - check_mysql on frdev1001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1462 https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frdev1001&service=check_mysql
[07:47:18] PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1470 https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frdb1002&service=check_mysql
[07:50:14] PROBLEM - check_mysql on frdev1001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1674 https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frdev1001&service=check_mysql
[07:52:18] PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1495 https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frdb1002&service=check_mysql
[07:55:14] PROBLEM - check_mysql on frdev1001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1699 https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frdev1001&service=check_mysql
[07:57:18] PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1683 https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frdb1002&service=check_mysql
[08:00:14] PROBLEM - check_mysql on frdev1001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1911 https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frdev1001&service=check_mysql
[08:00:57] AndyRussG, dstrine, ejegg|away, jgleeson: please see icinga ^
[08:01:10] (Also any reason alerts need to repeat every few minutes)
[08:02:14] PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1612 https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frdb1002&service=check_mysql
[08:03:47] RhinosF1: thanks!!
[08:05:14] PROBLEM - check_mysql on frdev1001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1826 https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frdev1001&service=check_mysql
[08:06:26] I'm inclined not to wake up fr-tech SREs, who are the ones who would probably look at that, at least initially, I think. The user-facing donation flow is fully decoupled from the db
[08:07:00] AndyRussG: do you have access to downtime Icinga at for a few hours
[08:07:04] At least *
[08:07:18] PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1540 https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frdb1002&service=check_mysql
[08:07:26] * RhinosF1 doesn't see the value in them going off this much (and will happily create a task)
[08:07:34] I imagine they also do have phone alerts set up for anything they'd see as critical
[08:07:37] * RhinosF1 gets pinged by every critical alert
[08:08:00] RhinosF1 ohhh
[08:08:24] I think I don't have the ability to do that tho
[08:08:32] I can see if an SRE is around
[08:09:07] I'm just about to go to sleep btw ahhh
[08:09:19] It's 2 am here
[08:09:36] I pinged Luca as they're around
[08:10:00] Oki
[08:10:14] PROBLEM - check_mysql on frdev1001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2041 https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frdev1001&service=check_mysql
[08:10:19] Thx!
[08:12:18] PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1698 https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frdb1002&service=check_mysql
[08:14:32] fundraising-tech-ops, observability: check_mysql on fr* is extremely spammy - https://phabricator.wikimedia.org/T296811 (RhinosF1)
[08:14:42] https://phabricator.wikimedia.org/T296811
[08:15:16] Done for 12 hours
[08:17:06] Cool RhinosF1 thx!
[08:27:13] Fundraising Sprint Visual C Saw, Fundraising-Backlog, fundraising sprint Universal Cereal Bus, fundraising sprint Wireless Zipline, and 3 others: Civi: EOY Auto Thank You Email Receipt - New content is ready for coding - https://phabricator.wikimedia.org/T290253 (TomaszGorski) Hi there, thank you...
[08:32:18] RECOVERY - check_mysql on frdb1002 is OK: Uptime: 1333738 Threads: 27 Questions: 125410227 Slow queries: 835 Opens: 418995953 Flush tables: 1 Open tables: 228 Queries per second avg: 94.029 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1762 https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frdb1002&service=check_mysql
[08:57:31] jgleeson: are you looking at those thank you fail mails?
[09:22:16] Wikimedia-Fundraising-CiviCRM: Thank you fail mail - https://phabricator.wikimedia.org/T296813 (Eileenmcnaughton)
[09:22:51] ok - I did some diagnosis - https://phabricator.wikimedia.org/T296813
[09:25:39] Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: Thank you fail mail - https://phabricator.wikimedia.org/T296813 (Eileenmcnaughton)
[09:26:31] eileen: there has been replag all morning
[09:26:39] https://phabricator.wikimedia.org/T296811
[09:27:31] RhinosF1: makes sense - the server is being absolutely hammered right now
[09:28:06] eileen: icinga got muted because it was so spammy. Andy didn't think it was worth waking up over.
[09:29:47] yeah - it's possibly a bit borderline - since it is the first day of fund-raising banners but I think waiting for am EST seems fine - it seems to be causing few fails in our queue processing
[09:30:10] Ok :)
[09:30:18] but it's probably only 3-4 hours now til Jeff comes online
[09:30:28] Ack
[09:31:34] the impact seems to be that so far 67 people have failed to get failure emails & I think the UI data is a bit laggy in some cases - I might go onto slack & warn donor relations of that
[09:32:34] although - looking at the overall queues - we seem to be less behind than we would be overall at this point in the process
[10:25:13] RECOVERY - check_mysql on frdev1001 is OK: Uptime: 1340093 Threads: 14 Questions: 101237872 Slow queries: 1096427 Opens: 474184728 Flush tables: 1 Open tables: 270 Queries per second avg: 75.545 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 297 https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frdev1001&service=check_mysql
[13:38:59] Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: Thank you fail mail - https://phabricator.wikimedia.org/T296813 (jgleeson) Since the time you posted this ticket and now, some 4 hours later, the count is 82 and the failmail appears to have calmed down also so maybe the replication delays have gone away?
[13:59:14] Fundraising-Backlog: Adyen ApplePay Issuer Unavailable - https://phabricator.wikimedia.org/T296845 (jgleeson)
[14:00:07] Fundraising-Backlog: Adyen ApplePay Issuer Unavailable - https://phabricator.wikimedia.org/T296845 (jgleeson)
[14:45:43] Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: Thank you fail mail - https://phabricator.wikimedia.org/T296813 (Ejegg) We can also reset that no_thank_you to NULL so they get their TY emails.
[16:01:09] Fundraising Sprint Visual C Saw, Fundraising-Backlog, fundraising sprint Universal Cereal Bus, fundraising sprint Wireless Zipline, and 3 others: Civi: EOY Auto Thank You Email Receipt - New content is ready for coding - https://phabricator.wikimedia.org/T290253 (TomaszGorski) Hi everyone, just t...
[16:04:51] Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: Thank you fail mail - https://phabricator.wikimedia.org/T296813 (Ejegg) Last night I reset the no_thank_you reason on 8 that had come in since Nov 30th. This morning I saw a total of 63 since Nov 30th (there were also some from last year) and I reset th...
[16:11:25] Fundraising-Backlog: payments fr-tech-dev server should serve something when ports are not forwarded - https://phabricator.wikimedia.org/T296860 (Ejegg)
[16:12:52] Fundraising Sprint Visual C Saw, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM, fundraising sprint Universal Cereal Bus, fundraising sprint Wireless Zipline: Question about blocked contacts in Civi and how they sync to Acoustic - https://phabricator.wikimedia.org/T293587 (nisrael) Hi @Ei...
[16:23:32] fr-tech I'm going to shuffle around some Ingenico audit files to see if we can clear out that directory a bit
[16:29:30] Fundraising-Backlog: DAF import not importing individuals - https://phabricator.wikimedia.org/T296861 (MDemosWMF)
[16:29:59] Fundraising-Backlog: DAF import not importing individuals - https://phabricator.wikimedia.org/T296861 (MDemosWMF) p:Triage→High
[16:32:49] Fundraising-Backlog, fundraising sprint Wireless Zipline: DAF import not importing individuals - https://phabricator.wikimedia.org/T296861 (DStrine)
[16:33:11] Fundraising-Backlog, fundraising sprint Wireless Zipline: DAF import not importing individuals - https://phabricator.wikimedia.org/T296861 (MDemosWMF) I have also added the record logs from her original import below: {F34801929}
[17:37:42] Fundraising-Backlog, MediaWiki-extensions-CentralNotice, Wikimedia-production-error: TypeError: Cannot read properties of undefined (reading 'top') - https://phabricator.wikimedia.org/T296863 (DStrine)
[17:46:49] Fundraising-Backlog, Wikimedia-Fundraising-Banners, MediaWiki-extensions-CentralNotice, Wikimedia-production-error: TypeError: Cannot read properties of undefined (reading 'top') - https://phabricator.wikimedia.org/T296863 (Pcoombe) a:Pcoombe Thanks. It looks like the errors are coming from d...
[18:00:12] Fundraising-Backlog: Production of Upsell Thank You Email in Remaining Languages - https://phabricator.wikimedia.org/T296868 (CDenes_WMF)
[18:17:29] PROBLEM - check_mysql on frdb1002 is CRITICAL: Slave IO: Yes Slave SQL: No Seconds Behind Master: (null) https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frdb1002&service=check_mysql
[18:18:31] jgleeson|away, ejegg: lag is back ^
[18:20:43] dwisehaupt / Jeff_Green ^^^
[18:20:47] thanks RhinosF1
[18:21:17] ejegg: thanks yes. i'm aware and working to get traffic off it.
[18:22:18] PROBLEM - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2984 https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frdb1002&service=check_mysql
[18:24:58] ACKNOWLEDGEMENT - check_mysql on frdb1002 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2984 Dwisehaupt working to pull traffic off host. https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frdb1002&service=check_mysql
[18:42:47] read traffic is off frdb1002 now. i'll diagnose what is up with it.
[18:44:23] Fundraising-Backlog: Auto Recurring Failure Email - Afrikaans - https://phabricator.wikimedia.org/T296871 (CDenes_WMF)
[18:52:18] Fundraising-Backlog: January Summary Email - New language Afrikaans - https://phabricator.wikimedia.org/T296872 (CDenes_WMF)
[19:53:47] Fundraising Sprint Visual C Saw, Fundraising-Backlog, fundraising sprint Universal Cereal Bus, fundraising sprint Wireless Zipline, and 3 others: Civi: EOY Auto Thank You Email Receipt - New content is ready for coding - https://phabricator.wikimedia.org/T290253 (Eileenmcnaughton) @TomaszGorski i...
[20:17:40] Fundraising-Backlog, Wikimedia-Fundraising-Banners, MediaWiki-extensions-CentralNotice, Wikimedia-production-error: TypeError: Cannot read properties of undefined (reading 'top') - https://phabricator.wikimedia.org/T296863 (Pcoombe) Open→Resolved This was a reoccurrence of T281547 (or at l...
[20:22:18] RECOVERY - check_mysql on frdb1002 is OK: Uptime: 1376338 Threads: 18 Questions: 139355888 Slow queries: 978 Opens: 509720330 Flush tables: 1 Open tables: 200 Queries per second avg: 101.251 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 479 https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frdb1002&service=check_mysql
[20:41:10] Fundraising-Backlog, fr-donorservices: Amazon donations TY email doesn’t have the donor's name - https://phabricator.wikimedia.org/T296881 (SHust)
[20:44:31] Fundraising Sprint Visual C Saw, Fundraising-Backlog, fundraising sprint Universal Cereal Bus, fundraising sprint Wireless Zipline, and 3 others: Civi: EOY Auto Thank You Email Receipt - New content is ready for coding - https://phabricator.wikimedia.org/T290253 (Cstone) @TomaszGorski they needed...
[21:04:52] cstone: thanks for fixing that hebrew one - I just can't seem to win against rtl
[21:21:50] Fundraising Sprint Visual C Saw, Fundraising-Backlog, fundraising sprint Universal Cereal Bus, fundraising sprint Wireless Zipline, and 3 others: Civi: EOY Auto Thank You Email Receipt - New content is ready for coding - https://phabricator.wikimedia.org/T290253 (TomaszGorski) @Cstone Thank you,...