[00:49:00] DBA, Commons, MediaWiki-File-management, MW-1.37-notes (1.37.0-wmf.12; 2021-06-28), and 4 others: Address "image" table capacity problems by storing pdf/djvu text outside file metadata - https://phabricator.wikimedia.org/T275268 (Krinkle) @Xover it seems likely that the column overflow issue corr...
[04:46:27] Blocked-on-schema-change, DBA, ContentTranslation, Language-Team (Language-2021-April-June): Update cx-notification-log table in Production - https://phabricator.wikimedia.org/T284644 (Marostegui) @KartikMistry the task says this needs to be done the week of 21st of June, which is already gone (a...
[04:46:31] Blocked-on-schema-change, DBA, ContentTranslation, Language-Team (Language-2021-April-June): Update cx-notification-log table in Production - https://phabricator.wikimedia.org/T284644 (Marostegui)
[05:20:25] Blocked-on-schema-change, DBA, ContentTranslation, Language-Team (Language-2021-April-June): Update cx-notification-log table in Production - https://phabricator.wikimedia.org/T284644 (KartikMistry) >>! In T284644#7179837, @Marostegui wrote: > @KartikMistry the task says this needs to be done the...
[05:26:50] marostegui: Morning, an update regarding the image table mess. last week the first major patch landed in production and caused several issues which got fixed, I merged the important ones for the next step which goes live this week, I already enabled the thing in beta cluster and made sure that it works fine
[05:27:10] meaning if all goes well, next week I'll flip the switch in production and start migrating
[05:27:36] tell me if that conflicts with some of your plans
[05:29:18] "important ones" -> "important next patches"
[05:34:01] Blocked-on-schema-change, DBA, ContentTranslation, Language-Team (Language-2021-April-June): Update cx-notification-log table in Production - https://phabricator.wikimedia.org/T284644 (KartikMistry)
[05:39:52] Amir1: oh nice, that's good news. we have the DC switchover this week, so please ping me next week once you are ready to start. just to double check everything is ok and we can afford that
[05:40:58] yeah, it's going to be a lot of writes off the table, I'm slightly worried it'll cause replication lag
[05:41:19] the default batch size is 200 files 😱
[05:43:53] Amir1: can that be changed?
[05:44:00] In case we find it is too much/too little?
[05:45:29] I think so, let me try
[05:45:52] the total number of files responsible for the mess is around 3M
[05:47:57] oh and for now I will fix pdf, which is 80% of the table, there is 10% left which is djvu mess, that's a bit more complicated
[05:50:44] well that's already a lot :)
[05:54:19] yeah but I will focus on djvu too just a bit later
[05:54:34] Blocked-on-schema-change, DBA, ContentTranslation, Language-Team (Language-2021-April-June): Update cx-notification-log table in Production - https://phabricator.wikimedia.org/T284644 (Marostegui) So this needs to go to extension1 + testwiki (s3) right?. As you mentioned on irc, you'd like this t...
[06:28:22] Blocked-on-schema-change, DBA, ContentTranslation, Language-Team (Language-2021-April-June): Update cx-notification-log table in Production - https://phabricator.wikimedia.org/T284644 (Marostegui)
[06:28:41] Blocked-on-schema-change, DBA, ContentTranslation, Language-Team (Language-2021-April-June): Update cx-notification-log table in Production - https://phabricator.wikimedia.org/T284644 (Marostegui) Self note: x1 runs RBR, so this needs to be executed directly on the primary master with replication...
[06:43:15] oh and also marostegui The patch for this needs review T284888, do you know who I can bug for it? PET?
[06:43:17] T284888: IndexPager::buildQueryInfo (contributions page unfiltered) LEFT JOIN ores_classification needs tuning - https://phabricator.wikimedia.org/T284888
[06:43:40] Amir1: yeah, PET is probably the best team to look for that
[06:44:19] this also addresses T284419#7140933
[06:44:19] T284419: IndexPager::buildQueryInfo (contributions page unfiltered) query needs tuning - https://phabricator.wikimedia.org/T284419
[06:44:28] I came to the same conclusion as Tim independently
[06:44:34] they are basically the same issue
[06:44:53] I'll bug Tim and Daniel then
[06:45:31] <3
[08:10:19] Data-Persistence-Backup, database-backups, Goal, Patch-For-Review: Upgrade pending stretch backup hosts to buster - https://phabricator.wikimedia.org/T280979 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jynus on cumin1001.eqiad.wmnet for hosts: ` ['db1171.eqiad.wmnet'] ` The log c...
[08:32:59] DBA, Patch-For-Review: Upgrade s5 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T283235 (jcrespo)
[08:33:55] DBA, Patch-For-Review: Upgrade s5 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T283235 (jcrespo) Cleanup of s5 old backup sources should be done, and if everything went ok, instances should be no longer on icinga, tendril, grafana, but please double check!
[08:34:49] Data-Persistence-Backup, database-backups, Goal, Patch-For-Review: Upgrade pending stretch backup hosts to buster - https://phabricator.wikimedia.org/T280979 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1171.eqiad.wmnet'] ` and were **ALL** successful.
[08:36:54] Blocked-on-schema-change, DBA: Schema change for dropping default of img_timestamp and making it binary(14) - https://phabricator.wikimedia.org/T273360 (Marostegui)
[08:36:56] Blocked-on-schema-change, DBA: Schema change for watchlist.wl_notificationtimestamp going binary(14) from varbinary(14) - https://phabricator.wikimedia.org/T268392 (Marostegui)
[08:36:58] Blocked-on-schema-change, DBA: Schema change to turn user_last_timestamp.user_newtalk to binary(14) - https://phabricator.wikimedia.org/T266486 (Marostegui)
[08:37:31] DBA, Patch-For-Review: Upgrade s5 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T283235 (Marostegui) Open→Resolved Thanks! All good (also checked zarcillo)
[08:44:18] Data-Persistence-Backup, database-backups, Goal, Patch-For-Review: Upgrade pending stretch backup hosts to buster - https://phabricator.wikimedia.org/T280979 (jcrespo)
[09:27:10] Blocked-on-schema-change, DBA: Schema change for dropping default of img_timestamp and making it binary(14) - https://phabricator.wikimedia.org/T273360 (jcrespo)
[09:27:12] Blocked-on-schema-change, DBA: Schema change for watchlist.wl_notificationtimestamp going binary(14) from varbinary(14) - https://phabricator.wikimedia.org/T268392 (jcrespo)
[09:27:16] Blocked-on-schema-change, DBA: Schema change to turn user_last_timestamp.user_newtalk to binary(14) - https://phabricator.wikimedia.org/T266486 (jcrespo)
[09:27:36] DBA, Patch-For-Review: Upgrade s3 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T283131 (jcrespo) Stalled→Open
[09:28:12] Blocked-on-schema-change, DBA: Schema change for dropping default of img_timestamp and making it binary(14) - https://phabricator.wikimedia.org/T273360 (jcrespo)
[09:28:14] Blocked-on-schema-change, DBA: Schema change for watchlist.wl_notificationtimestamp going binary(14) from varbinary(14) - https://phabricator.wikimedia.org/T268392 (jcrespo)
[09:28:16] Blocked-on-schema-change, DBA: Schema change to turn user_last_timestamp.user_newtalk to binary(14) - https://phabricator.wikimedia.org/T266486 (jcrespo)
[09:28:38] DBA, Patch-For-Review: Upgrade s3 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T283131 (jcrespo) Open→Resolved a:jcrespo→Kormat There should be no more s3 servers with 10.1 available, that I am aware, but feel free to double check.
[09:49:07] DBA, MediaWiki-Parser, Performance-Team, MW-1.37-notes (1.37.0-wmf.12; 2021-06-28), and 2 others: purgeParserCache.php should not take over 24 hours for its daily run - https://phabricator.wikimedia.org/T282761 (Kormat) The last purge run took 113h (4d 17h). The current purge is currently at 2.3%...
[12:01:40] Data-Persistence-Backup, database-backups, Goal, Patch-For-Review: Upgrade pending stretch backup hosts to buster - https://phabricator.wikimedia.org/T280979 (jcrespo)
[12:02:31] Data-Persistence-Backup, database-backups, Goal, Patch-For-Review: Upgrade pending stretch backup hosts to buster - https://phabricator.wikimedia.org/T280979 (jcrespo) db1171:s3 removed, db1171:s2 moved to db1139, db1171 setup now with s7 and s8 in buster after reimage.
[13:28:56] marostegui: https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?viewPanel=39&from=now-12h&orgId=1&to=now&var-datasource=codfw%20prometheus%2Fops&var-cluster=appserver&var-method=GET&var-code=200
[13:29:02] This looks really bad
[13:29:13] Amir1: OMG!!!!!!!
[13:29:14] :)
[13:29:17] It is me warming up codfw
[13:29:25] ugh
[13:29:33] you scared the *cough* out of me
[13:29:41] Amir1: my apologies for marostegui. he's very rude to people sometimes.
[13:29:45] marostegui: <£
[13:29:50] er, well. <3
[13:29:50] haha
[13:30:19] Amir1: I started a bit on friday and now I am going full warming up mode
[13:31:02] haha, don't worry kormat. I'll just make several more schema change tickets to take my revenge
[13:31:18] That's awesome cause I will assign them to kormat!
[13:31:23] 🥀
[13:31:27] Perfect loop
[13:31:30] marostegui: As long as it's not the site going down, I don't mind
[13:33:05] hahaha I like your preferences
[20:23:56] DBA, SRE, ops-eqiad: Degraded RAID on db1129 - https://phabricator.wikimedia.org/T285715 (Marostegui) p:Triage→Medium Can we get a rma for this failed disk? Thanks
[20:35:16] DBA, SRE, ops-eqiad: Degraded RAID on db1129 - https://phabricator.wikimedia.org/T285715 (wiki_willy) a:Cmjohnson
[20:37:37] DBA, SRE, ops-eqiad: Degraded RAID on db1129 - https://phabricator.wikimedia.org/T285715 (wiki_willy) Hi @Cmjohnson - just a heads up, there's only a couple more months before the warranty expires on this host. Thanks, Willy
[21:38:26] DBA, SRE, Datacenter-Switchover, Patch-For-Review: Figure out how x2 should be handled in DC switchover - https://phabricator.wikimedia.org/T285519 (Legoktm) Unfortunately, my patch to just ignore x2 didn't really work. spicerack gets the list of core_dbs by querying `A:core-db and A:db-role-mast...
[21:54:49] DBA, MediaWiki-Parser, Performance-Team, MW-1.37-notes (1.37.0-wmf.11; 2021-06-21), and 2 others: purgeParserCache.php should not take over 24 hours for its daily run - https://phabricator.wikimedia.org/T282761 (Krinkle) >>! In T282761#7180211, @Kormat wrote: > The last purge run took 113h (4d 17...
[22:05:55] DBA, MediaWiki-Parser, Performance-Team, MW-1.37-notes (1.37.0-wmf.11; 2021-06-21), and 2 others: purgeParserCache.php should not take over 24 hours for its daily run - https://phabricator.wikimedia.org/T282761 (Krinkle)
[22:09:11] DBA, SRE, Datacenter-Switchover, Patch-For-Review: Figure out how x2 should be handled in DC switchover - https://phabricator.wikimedia.org/T285519 (Legoktm) I live hacked this onto cumin1001 for now: ` diff --git a/spicerack/mysql_legacy.py b/spicerack/mysql_legacy.py index a69cc74..be423e9 100...
[22:19:07] DBA, SRE, Datacenter-Switchover, Patch-For-Review: Figure out how x2 should be handled in DC switchover - https://phabricator.wikimedia.org/T285519 (Krinkle) >>! In T285519#7178376, @Legoktm wrote: > Ack, thanks for all the input. For next week we'll just ignore x2, it'll stay RW in both DCs thro...
[22:57:48] so I basically hacked spicerack to add "not A:db-section-x2" to its selector for core dbs
[22:58:45] Krinkle and I discussed why x2 is a "core db" in -operations, I snipped out the relevant lines to https://phabricator.wikimedia.org/P16734
[22:59:16] I think the hack I did is good enough for the switchover tomorrow, but we'll need something else in the long run
[23:00:13] given 1) spicerack expects to be aware of *all* databases that are A:db-core 2) x2 is a core db, 3) x2 is special and not like the other core dbs
[23:00:18] so...one of those needs to change
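
Editor's note: the selector hack described at 22:57:48 amounts to appending a negated alias to an existing host query. The snippet below is only a rough sketch of that idea, not the actual spicerack change (the real diff is truncated in the log above and is not reproduced here). The helper name `exclude_sections` is hypothetical, and `A:db-core` is used as the base query only because it is the alias named in the chat.

```python
from typing import Iterable


def exclude_sections(base_query: str, sections: Iterable[str]) -> str:
    """Return base_query with 'and not A:db-section-<name>' appended per section.

    Hypothetical helper for illustration; it only builds a query string and
    does not call spicerack or cumin.
    """
    query = base_query
    for section in sections:
        # Alias pattern taken from the chat message: "not A:db-section-x2".
        query = f"{query} and not A:db-section-{section}"
    return query


# Base alias as named in the chat; the real selector may differ.
core_query = exclude_sections("A:db-core", ["x2"])
print(core_query)  # A:db-core and not A:db-section-x2
```

As noted in the chat, this kind of exclusion is a stopgap: it keeps x2 out of the switchover's host list without settling whether x2 should count as a core db at all.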