[01:13:10] DBA, MediaWiki-Parser, Performance-Team, MW-1.37-notes (1.37.0-wmf.11; 2021-06-21), and 2 others: purgeParserCache.php should not take over 24 hours for its daily run - https://phabricator.wikimedia.org/T282761 (tstarling) So is this resolved? Runtime was less than 24 hours, right? Either it shou...
[04:33:57] Blocked-on-schema-change, DBA: Schema change for making cuc_id in cu_changes unsigned - https://phabricator.wikimedia.org/T283093 (Marostegui)
[04:47:47] Blocked-on-schema-change, DBA: Schema change for making cuc_id in cu_changes unsigned - https://phabricator.wikimedia.org/T283093 (Marostegui)
[04:54:00] DBA, MediaWiki-Parser, Performance-Team, MW-1.37-notes (1.37.0-wmf.14; 2021-07-12), and 2 others: purgeParserCache.php should not take over 24 hours for its daily run - https://phabricator.wikimedia.org/T282761 (Marostegui) >>! In T282761#7191214, @aaron wrote: > In the long run, with active-acti...
[05:23:37] DBA, CheckUser, Patch-Needs-Improvement: Create index for cu_agents in cu_changes table - https://phabricator.wikimedia.org/T147894 (Aklapper) a:Huji→None Removing task assignee due to inactivity, as this open task has been assigned for more than two years (see emails sent to assignee on May2...
[05:52:45] Blocked-on-schema-change, DBA: Schema change for making cuc_id in cu_changes unsigned - https://phabricator.wikimedia.org/T283093 (Marostegui)
[06:17:28] Blocked-on-schema-change, DBA: Schema change for making cuc_id in cu_changes unsigned - https://phabricator.wikimedia.org/T283093 (Marostegui)
[06:36:44] Blocked-on-schema-change, DBA: Schema change for watchlist.wl_notificationtimestamp going binary(14) from varbinary(14) - https://phabricator.wikimedia.org/T268392 (Marostegui)
[06:41:42] Blocked-on-schema-change, DBA: Schema change for renaming page_timestamp index on revision table to rev_page_timestamp - https://phabricator.wikimedia.org/T283499 (Marostegui) I think this can go
[08:41:24] Blocked-on-schema-change, DBA: Rename name_title index on page to page_name_title - https://phabricator.wikimedia.org/T284375 (Marostegui)
[08:49:46] Blocked-on-schema-change, DBA: Rename name_title index on page to page_name_title - https://phabricator.wikimedia.org/T284375 (Marostegui)
[09:05:48] marostegui: btw, now 16M rows cleaned up (up to 2014), I'm cleaning up by year because then it makes the queries much faster ("if you want the bad index, I give you the bad index")
[09:40:59] Blocked-on-schema-change, DBA: Schema change for renaming page_timestamp index on revision table to rev_page_timestamp - https://phabricator.wikimedia.org/T283499 (Ladsgroup) YAY
[10:53:45] DBA, Data-Services, cloud-services-team (Kanban): Create backups of user tables from decommissioned database servers - https://phabricator.wikimedia.org/T183758 (Nemo_bis) I had no idea this dump existed, I think, so I may be missing something. >>! In T183758#7191399, @bd808 wrote: >The only additio...
[11:41:18] DBA, Data-Services, cloud-services-team (Kanban): Create backups of user tables from decommissioned database servers - https://phabricator.wikimedia.org/T183758 (Urbanecm) >>! In T183758#7193364, @Nemo_bis wrote: > I had no idea this dump existed, I think, so I may be missing something. > >>>! In T1...
[12:15:58] Amir1: The total was around 40M you said, right?
[12:16:33] yup
[12:16:49] sweet
[12:16:50] let me get the new numbers
[12:17:05] so far up to 2018 is cleaned
[12:17:13] oh that's fast
[12:17:39] because it's split by time now, take advantage of the bad index
[12:17:50] hahaha
[12:19:16] 25M rows cleaned now
[12:20:59] So it might be done by Monday?
[12:21:15] oh it'll be mostly done by two or three hours from now
[12:23:11] Once that is done, do you mind creating a task to optimize it?
[12:23:16] So we can at least do eqiad before the switch back
[12:24:24] Blocked-on-schema-change, DBA: Rename name_title index on page to page_name_title - https://phabricator.wikimedia.org/T284375 (Marostegui)
[12:25:06] Sure
[12:25:49] thanks
[12:26:07] just the thing is that there are around fifty different wikis that would benefit from this (to a varying degree)
[12:26:24] sure, I am happy to wait till all of them are done
[12:30:29] Blocked-on-schema-change, DBA: Rename name_title index on page to page_name_title - https://phabricator.wikimedia.org/T284375 (Marostegui)
[12:40:27] DBA, Infrastructure-Foundations, SRE, netops: Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (Marostegui) Thanks for the ping - this needs some thought from the DB side. We have some of our misc db masters on row A - db1159 m1 A6. Affected services: bacula (...
[12:41:13] DBA: Rename dbstore1004 to db1183 and place it on s7 - https://phabricator.wikimedia.org/T284622 (Marostegui) p:Medium→High @Kormat let's give this some higher priority, as we might be able to use db1183 to replace one of the systems at T286032
[12:43:00] DBA, Infrastructure-Foundations, SRE, netops, cloud-services-team (Kanban): Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (Marostegui) @Bstorm @nskaggs please see above - we might need to depool the affected clouddb* hosts.
[12:43:59] DBA, Infrastructure-Foundations, SRE, netops, cloud-services-team (Kanban): Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (Marostegui) dbproxy1013 is the active proxy for m2. I will depool it next week.
[12:46:14] DBA, MediaWiki-Parser, Performance-Team, MW-1.37-notes (1.37.0-wmf.14; 2021-07-12), and 2 others: purgeParserCache.php should not take over 24 hours for its daily run - https://phabricator.wikimedia.org/T282761 (Kormat) p:Unbreak!→Medium >>! In T282761#7191906, @tstarling wrote: > So is t...
[12:51:41] Blocked-on-schema-change, DBA: Rename name_title index on page to page_name_title - https://phabricator.wikimedia.org/T284375 (Marostegui)
[13:00:29] DBA, Infrastructure-Foundations, SRE, netops, cloud-services-team (Kanban): Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (cmooney) Hi @Marostegui thanks for the feedback. > Will this stop traffic on all switches at the same time? Or do you plan to d...
[13:05:05] DBA, Infrastructure-Foundations, SRE, netops, cloud-services-team (Kanban): Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (fgiunchedi) With my observability and swift maintainer hats on, I think we're ok to tolerate a network blip, specifically: * ms-be...
[13:08:30] DBA: Rename dbstore1004 to db1183 and place it on s7 - https://phabricator.wikimedia.org/T284622 (Marostegui)
[13:08:35] DBA, Infrastructure-Foundations, SRE, netops, cloud-services-team (Kanban): Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (Marostegui)
[13:08:55] DBA, Infrastructure-Foundations, SRE, netops, cloud-services-team (Kanban): Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (Marostegui) >>! In T286032#7193704, @cmooney wrote: > Hi @Marostegui thanks for the feedback. > >> Will this stop traffic on al...
[13:11:24] DBA: Move db1124 and db1125 to misc services temporarily - https://phabricator.wikimedia.org/T286042 (Marostegui)
[13:11:43] DBA: Move db1124 and db1125 to misc services temporarily - https://phabricator.wikimedia.org/T286042 (Marostegui) p:Triage→High
[13:12:26] DBA: Rename dbstore1004 to db1183 and place it on m5 - https://phabricator.wikimedia.org/T284622 (Marostegui)
[13:12:59] DBA: Rename dbstore1004 to db1183 and place it on m5 - https://phabricator.wikimedia.org/T284622 (Marostegui) Please use this host (once reimaged) to replace db1128 in m5: T286032#7193722
[13:15:20] DBA, Infrastructure-Foundations, SRE, netops, cloud-services-team (Kanban): Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (Marostegui) @cmooney do you know when you'll know how long this change can take?
[13:18:20] Blocked-on-schema-change, DBA: Rename name_title index on page to page_name_title - https://phabricator.wikimedia.org/T284375 (Marostegui)
[13:19:39] DBA, Infrastructure-Foundations, SRE, netops, cloud-services-team (Kanban): Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (BBlack) Traffic-related bits: * dns1001 will need a manual depool so that it doesn't have knock-on effects on all of the other clus...
[13:20:34] DBA, Infrastructure-Foundations, SRE, netops, cloud-services-team (Kanban): Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (ayounsi) @Marostegui "as the standby host is on row A too" that sounds like SPOF to me and should be moved to a different row. Due...
[13:21:29] DBA, Infrastructure-Foundations, SRE, netops, cloud-services-team (Kanban): Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (Marostegui) >>! In T286032#7193787, @ayounsi wrote: > @Marostegui "as the standby host is on row A too" that sounds like SPOF to me...
[13:23:49] Blocked-on-schema-change, DBA: Rename name_title index on page to page_name_title - https://phabricator.wikimedia.org/T284375 (Marostegui)
[13:23:56] DBA: DB maintenance work to do while eqiad is passive (June 2021) - https://phabricator.wikimedia.org/T285139 (Marostegui)
[13:24:02] Blocked-on-schema-change, DBA: Rename name_title index on page to page_name_title - https://phabricator.wikimedia.org/T284375 (Marostegui) Open→Resolved This is all done
[13:27:51] DBA, Infrastructure-Foundations, SRE, netops, cloud-services-team (Kanban): Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (cmooney) @Marostegui Our only real option to test is on new switches due to be installed under T277340. We are working with DC-Ops...
[13:29:19] DBA, Infrastructure-Foundations, SRE, netops, cloud-services-team (Kanban): Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (Marostegui) Ok, I think what we can do from our side is to get the replacement hosts ready but without failing over things to them,...
[13:29:49] DBA, Infrastructure-Foundations, SRE, netops, cloud-services-team (Kanban): Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (jcrespo) Speaking on behalf of: ` dbprov1001 ms-backup1001 db1116 ` That could cause ongoing backup runs to fail, but that is "norm...
[13:48:30] DBA, MediaWiki-Parser, Performance-Team, MW-1.37-notes (1.37.0-wmf.14; 2021-07-12), and 2 others: purgeParserCache.php should not take over 24 hours for its daily run - https://phabricator.wikimedia.org/T282761 (Kormat) Current status: - pc1: 26.7% (of 33.3%) done after 12.8h => est 16h total - p...
[13:49:16] DBA, Infrastructure-Foundations, SRE, netops, cloud-services-team (Kanban): Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (cmooney)
[14:52:29] DBA: Rename dbstore1004 to db1183 and place it on m5 - https://phabricator.wikimedia.org/T284622 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by kormat@cumin1001 for hosts: `dbstore1004.eqiad.wmnet` - dbstore1004.eqiad.wmnet (**PASS**) - Downtimed host on Icinga - Found physical host...
[15:00:59] DBA, Infrastructure-Foundations, SRE, netops, cloud-services-team (Kanban): Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (fgiunchedi)
[15:03:02] DBA: Rename dbstore1004 to db1183 and place it on m5 - https://phabricator.wikimedia.org/T284622 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by kormat@cumin1001 for hosts: `dbstore1004.eqiad.wmnet` - dbstore1004.eqiad.wmnet (**FAIL**) - **Failed downtime host on Icinga (likely already re...
[15:15:49] DBA, Infrastructure-Foundations, SRE, netops, cloud-services-team (Kanban): Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (cmooney)
[15:16:52] DBA, Infrastructure-Foundations, SRE, netops, cloud-services-team (Kanban): Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (cmooney)
[15:31:58] DBA, MediaWiki-Parser, Performance-Team, MW-1.37-notes (1.37.0-wmf.14; 2021-07-12), and 2 others: purgeParserCache.php should not take over 24 hours for its daily run - https://phabricator.wikimedia.org/T282761 (Krinkle) >>! In T282761#7191906, @tstarling wrote: > So is this resolved? Runtime was...
[16:02:37] DBA, Infrastructure-Foundations, SRE, netops, cloud-services-team (Kanban): Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (cmooney)
[16:10:52] DBA, Infrastructure-Foundations, SRE, netops, cloud-services-team (Kanban): Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (Bstorm) @Andrew Just a heads up that cloudcontrol1003 is in the list. It might be fine and will catch up, but it also could crash r...
[16:12:59] DBA, Infrastructure-Foundations, SRE, netops, cloud-services-team (Kanban): Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (Legoktm)
[16:13:41] DBA, Infrastructure-Foundations, SRE, netops, cloud-services-team (Kanban): Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (Bstorm) @Ottomata One of the cloudbs is clouddb1021. FYI. I understand you likely won't be using it that late in the month, but I w...
[16:14:13] DBA, Infrastructure-Foundations, SRE, netops, cloud-services-team (Kanban): Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (Legoktm) lists1001 is a SPOF currently, we'll probably just announce a downtime when we get closer to the actual time
[16:16:44] DBA, Data-Services, cloud-services-team (Kanban): Create backups of user tables from decommissioned database servers - https://phabricator.wikimedia.org/T183758 (bd808) `lang=irc [10:54] < Nemo_bis> most of it seems maps stuff (...snip...) [14:56] < bd808> Nemo_bis: it makes sense that most of i...
[16:19:05] DBA, Patch-For-Review: Rename dbstore1004 to db1183 and place it on m5 - https://phabricator.wikimedia.org/T284622 (ops-monitoring-bot) Script wmf-auto-reimage was launched by kormat on cumin1001.eqiad.wmnet for hosts: ` db1183.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/202107021618...
[16:35:40] DBA, Infrastructure-Foundations, SRE, netops, cloud-services-team (Kanban): Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (Ladsgroup) I can draft an announcement for downtime of lists.wikimedia.org, maybe we can use the time to increase its capacity (mor...
[16:51:42] DBA, Patch-For-Review: Rename dbstore1004 to db1183 and place it on m5 - https://phabricator.wikimedia.org/T284622 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1183.eqiad.wmnet'] ` Of which those **FAILED**: ` ['db1183.eqiad.wmnet'] `
[16:53:24] DBA, Patch-For-Review: Rename dbstore1004 to db1183 and place it on m5 - https://phabricator.wikimedia.org/T284622 (Kormat) Current status: - Host is renamed - Needs a partitioning scheme configured for the reimaging - Needs a role assigned, and hiera host vars set. And then these final steps need to be...
[16:54:19] DBA, Infrastructure-Foundations, SRE, netops, cloud-services-team (Kanban): Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (nskaggs) Impacted clouddb's will be clouddb1013, clouddb1014, clouddb1021. I believe interrupting traffic on 2 of 4 of the "web" r...
[20:03:03] Data-Persistence-Backup, Data-Persistence (Consultation), MediaWiki-extensions-Translate, Security-Team, and 4 others: Aggregategroups Action API module allows deleting translatable page metadata for any group without trace (CVE-2021-36129) - https://phabricator.wikimedia.org/T282932 (sbassett)
[20:14:29] Data-Persistence-Backup, Data-Persistence (Consultation), MediaWiki-extensions-Translate, Security-Team, and 4 others: Aggregategroups Action API module allows deleting translatable page metadata for any group without trace (CVE-2021-36129) - https://phabricator.wikimedia.org/T282932 (sbassett) ...