[04:42:18] I am going to start disconnecting eqiad -> codfw replication in s1-s8
[04:42:28] Orchestrator will show lag until I also clean up heartbeats
[05:42:18] DBA: DB maintenance work to do while eqiad is passive (June 2021) - https://phabricator.wikimedia.org/T285139 (Marostegui)
[05:42:21] DBA, Orchestrator, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Marostegui) Open→Resolved a:Marostegui This is all done and all hosts are now in Orchestrator
[05:42:28] DBA, Orchestrator, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Marostegui)
[05:56:51] Blocked-on-schema-change, DBA: Extend iwlinks.iwl_prefix to VARBINARY(32) - https://phabricator.wikimedia.org/T277123 (Marostegui)
[05:58:37] Blocked-on-schema-change, DBA: Extend iwlinks.iwl_prefix to VARBINARY(32) - https://phabricator.wikimedia.org/T277123 (Marostegui)
[06:07:00] DBA, Data-Services: Prepare and check storage layer for banwikisource - https://phabricator.wikimedia.org/T284390 (Marostegui) a:Kormat
[06:07:22] DBA, Data-Services: Prepare and check storage layer for shiwiki - https://phabricator.wikimedia.org/T284928 (Marostegui) a:Kormat
[06:07:47] DBA, Data-Services: Prepare and check storage layer for dagwiki - https://phabricator.wikimedia.org/T284456 (Marostegui) a:Kormat
[06:07:56] kormat: ^ good morning to you! :p (let me know if you cannot take care of it, and I will do it next week)
[06:23:57] Blocked-on-schema-change, DBA: Extend iwlinks.iwl_prefix to VARBINARY(32) - https://phabricator.wikimedia.org/T277123 (Marostegui)
[06:33:43] Blocked-on-schema-change, DBA: Extend iwlinks.iwl_prefix to VARBINARY(32) - https://phabricator.wikimedia.org/T277123 (Marostegui)
[06:41:18] Blocked-on-schema-change, DBA: Extend iwlinks.iwl_prefix to VARBINARY(32) - https://phabricator.wikimedia.org/T277123 (Marostegui)
[06:53:46] Amir1: I am going to optimize ruwiki.logging dewiki.logging
[06:53:51] on an eqiad host
[07:50:13] Amir1: https://phabricator.wikimedia.org/P16751
[08:17:03] Blocked-on-schema-change, DBA, ContentTranslation, Language-Team (Language-2021-April-June): Update cx-notification-log table in Production - https://phabricator.wikimedia.org/T284644 (Marostegui) @KartikMistry as we discussed via IRC a few days ago, I have added the column and the index in `test...
[08:17:12] Blocked-on-schema-change, DBA, ContentTranslation, Language-Team (Language-2021-April-June): Update cx-notification-log table in Production - https://phabricator.wikimedia.org/T284644 (Marostegui)
[08:21:18] Blocked-on-schema-change, DBA, ContentTranslation, Language-Team (Language-2021-April-June): Update cx-notification-log table in Production - https://phabricator.wikimedia.org/T284644 (Marostegui) a:Marostegui
[08:35:38] marostegui: 👍
[08:50:05] Blocked-on-schema-change, DBA: Extend iwlinks.iwl_prefix to VARBINARY(32) - https://phabricator.wikimedia.org/T277123 (Marostegui)
[08:56:35] DBA, MediaWiki-Parser, Performance-Team, MW-1.37-notes (1.37.0-wmf.11; 2021-06-21), and 2 others: purgeParserCache.php should not take over 24 hours for its daily run - https://phabricator.wikimedia.org/T282761 (Kormat) Current status of the purge jobs after 14h: - pc1: 32% done - pc2: 31% done -...
[09:17:13] Blocked-on-schema-change, DBA: Extend iwlinks.iwl_prefix to VARBINARY(32) - https://phabricator.wikimedia.org/T277123 (Marostegui)
[09:21:06] DBA, Data-Services: Prepare and check storage layer for banwikisource - https://phabricator.wikimedia.org/T284390 (Kormat) Sanitization is in place, running full private data check now.
[09:21:16] DBA, Data-Services: Prepare and check storage layer for shiwiki - https://phabricator.wikimedia.org/T284928 (Kormat) Sanitization is in place, running full private data check now.
[09:21:19] DBA, Data-Services: Prepare and check storage layer for dagwiki - https://phabricator.wikimedia.org/T284456 (Kormat) Sanitization is in place, running full private data check now.
[09:22:55] marostegui: hurm. how long should `check_private_data.py` take on s5?
[09:23:14] kormat: it can take a while, it has dewiki and cebwiki
[09:23:17] which are big
[09:23:44] hmmm. i'm asking because it completed in only a few minutes on both sanitarium hosts
[09:23:52] 1m16s for a run on db1154
[09:24:28] that fits with how long the last nightly run took, too
[09:24:52] so you are saying it took too long or too little?
[09:25:01] it seems suspiciously quick
[09:25:45] no, it used to take a lot longer on labsdb* cause they had lots of load, right now as we've single instances, it is a lot quicker
[09:26:03] s5 has like 20 wikis or something only?
[09:26:09] yeah
[09:26:11] and most of them are probably empty (as they are new)
[09:26:23] note that i'm running this on db1154/db2094, rather than the clouddb* hosts
[09:26:30] yeah
[09:26:32] i'll try it on a clouddb host too, just to be sure
[09:26:43] yeah, I normally run it on clouddb hosts too, better be safe
[09:28:30] marostegui: my notes say that for 10.1 we needed to create the grants for the views on clouddb; can you confirm that is no longer needed for 10.4?
[09:28:43] kormat: you still need to do that
[09:28:49] 🤬
[09:28:59] and the "_p" database
[09:30:04] updated my notes, sadly.
[09:36:56] DBA, Data-Services, cloud-services-team (Kanban): Prepare and check storage layer for banwikisource - https://phabricator.wikimedia.org/T284390 (Kormat) a:Kormat→None Private data check was clean. `banwikisource_p` and grants created on wikireplicas. Ready for #cloud-services-team to create t...
[09:37:11] marostegui: Thanks. I will start the clean up then
[09:38:06] DBA, Data-Services, cloud-services-team (Kanban): Prepare and check storage layer for shiwiki - https://phabricator.wikimedia.org/T284928 (Kormat) a:Kormat→None Private data check was clean. `shiwiki_p` and grants created on wikireplicas. Ready for #cloud-services-team to create the views.
[09:39:02] DBA, Data-Services, cloud-services-team (Kanban): Prepare and check storage layer for dagwiki - https://phabricator.wikimedia.org/T284456 (Kormat) a:Kormat→None Private data check was clean. `dagwiki_p` and grants created on wikireplicas. Ready for #cloud-services-team to create the views.
[09:42:20] marostegui: having everything in orchestrator is 👌
[09:42:26] \o/
[10:25:36] marostegui: let me know if things in s6 look weird
[10:26:33] Amir1: wilco
[10:27:19] Blocked-on-schema-change, DBA: Schema change for watchlist.wl_notificationtimestamp going binary(14) from varbinary(14) - https://phabricator.wikimedia.org/T268392 (Marostegui) s8 eqiad master is done ` root@db1104.eqiad.wmnet[wikidatawiki]> set session sql_log_bin=0; Query OK, 0 rows affected (0.000 sec...
[10:28:04] Blocked-on-schema-change, DBA: Schema change for watchlist.wl_notificationtimestamp going binary(14) from varbinary(14) - https://phabricator.wikimedia.org/T268392 (Marostegui)
[10:47:41] maxlag went up, I reduced the batch size and increased the sleep time
[10:49:54] Yeah, looks like db2089:3316 had some lag
[10:50:38] The graph doesn't show it, but MW complained a bit
[10:51:12] https://logstash.wikimedia.org/goto/7a233d1e412f519332349ede33733795
[10:56:30] :(((
[10:56:41] Should I reduce the batch size more?
[10:57:05] yeah it looks like it
[10:59:01] yeah
[10:59:15] let's reduce it a bit, especially if this will be running during the weekend
[11:01:24] now it deletes 100 rows in each batch (down from 1000) and waits ten seconds in between
[11:01:43] That's going to take some time then XD
[11:01:47] But yeah, let's see how that goes
[11:04:31] I reworked the query, hope it reads fewer rows now
[11:05:35] yup, I made the query more explicit
[11:05:41] now it's pretty decent
[11:15:31] marostegui: so to recap, I reworked the query and now it's 1000 rows/10s sleep, no MW complaints or bad maxlag seen so far
[11:16:08] Oh sweet
[11:32:27] Blocked-on-schema-change, DBA: Extend iwlinks.iwl_prefix to VARBINARY(32) - https://phabricator.wikimedia.org/T277123 (Marostegui)
[11:32:36] DBA: DB maintenance work to do while eqiad is passive (June 2021) - https://phabricator.wikimedia.org/T285139 (Marostegui)
[11:32:38] Blocked-on-schema-change, DBA: Extend iwlinks.iwl_prefix to VARBINARY(32) - https://phabricator.wikimedia.org/T277123 (Marostegui) Open→Resolved All done
[11:44:54] DBA, MediaWiki-Parser, Performance-Team, MW-1.37-notes (1.37.0-wmf.11; 2021-06-21), and 2 others: purgeParserCache.php should not take over 24 hours for its daily run - https://phabricator.wikimedia.org/T282761 (Kormat) And indeed: ` Jul 1 09:44:18 mwmaint2002 mediawiki_job_purge_parsercache_pc1...
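
For readers following along: the batch-and-sleep cleanup discussed between 10:47 and 11:15 can be sketched roughly as below. This is only an illustration, not the actual maintenance script. It uses Python's built-in `sqlite3` so it runs anywhere; the function names, the demo table and its contents are hypothetical, and only the `logging` table name, the `log_type`/`log_action` condition and the 1000 rows / 10 s figures come from the conversation.

```python
import sqlite3
import time


def delete_in_batches(conn, batch_size=1000, sleep_seconds=10.0):
    """Delete matching rows in batches, sleeping between batches.

    Returns the total number of rows deleted.
    """
    total = 0
    while True:
        # Stock SQLite has no DELETE ... LIMIT, so the batch is picked by
        # rowid here; on MariaDB the LIMIT can sit on the DELETE itself.
        cur = conn.execute(
            "DELETE FROM logging WHERE rowid IN ("
            " SELECT rowid FROM logging"
            " WHERE log_type = 'review' AND log_action = 'approve-a'"
            " LIMIT ?)",
            (batch_size,),
        )
        conn.commit()
        if cur.rowcount == 0:
            return total
        total += cur.rowcount
        time.sleep(sleep_seconds)  # let replicas catch up before the next batch


def make_demo_db(matching=25, other=5):
    """Build a tiny in-memory stand-in for the logging table (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE logging ("
        " log_id INTEGER PRIMARY KEY, log_type TEXT, log_action TEXT)"
    )
    rows = [("review", "approve-a")] * matching + [("delete", "delete")] * other
    conn.executemany(
        "INSERT INTO logging (log_type, log_action) VALUES (?, ?)", rows
    )
    conn.commit()
    return conn
```

Smaller batches and longer sleeps trade total runtime for lower replication lag, which is the trade-off being tuned in the messages above.
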
[12:47:25] marostegui: soooo, rows read again went up, I checked and it seems it picks up the wrong index. Is it okay if I run ANALYZE TABLE logging; on master and replicas?
[12:47:49] Amir1: let's do that on eqiad hosts
[12:48:03] and then check
[12:48:08] do you have the query somewhere?
[12:48:17] wait a sec
[12:49:13] `delete from logging where log_type = 'review' and log_action = 'approve-a' limit 1000;`
[12:49:35] * kormat blindly runs that everywhere
[12:50:18] kormat: ignore the where part when running :D
[12:50:50] but of course. otherwise it wouldn't be everywhere, i guess
[12:50:54] it should pick up `log_type_action` index but it picks up `type_time`
[12:52:08] Amir1: checking the plan before and after the analyze
[12:53:34] Amir1: Will get back to you once the analyze finishes, give it 15-20 minutes
[12:53:46] Thanks. Sorry for the mess.
[12:56:01] I can force the index in running select but I couldn't find a way to force it in delete statements, it seems it's not possible
[12:57:45] yeah, index hints are only valid for selects
[12:58:03] let's see what changes if anything after the analyze
[12:58:44] is there a way for you to delete based on PK? that should speed things up a lot
[13:01:42] Blocked-on-schema-change, DBA: Schema change to make rc_id unsigned and rc_timestamp BINARY - https://phabricator.wikimedia.org/T276150 (Marostegui) Stalled→Open
[13:01:44] DBA: DB maintenance work to do while eqiad is passive (June 2021) - https://phabricator.wikimedia.org/T285139 (Marostegui)
[13:01:46] DBA: Switchover s1 from db1083 to db1163 - https://phabricator.wikimedia.org/T278214 (Marostegui)
[13:01:48] DBA: Switchover s7 from db1086 to db1136 - https://phabricator.wikimedia.org/T274336 (Marostegui)
[13:03:48] I tried it but then I need to write a mw maintenance script from scratch, get it reviewed, merged and backported.
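
A rough sketch of the "delete based on PK" idea from 12:58, for illustration only: pick a batch of primary keys with a SELECT (where an index can be forced), then delete those rows by key. It again uses Python's `sqlite3` so it is runnable as written. The function names and demo data are hypothetical, and `log_id` is assumed to be the primary key; the `log_type_action` index name is the one mentioned in the chat. On MariaDB the SELECT would carry a hint such as `FORCE INDEX (log_type_action)`; SQLite's closest equivalent, `INDEXED BY`, stands in for it here.

```python
import sqlite3


def delete_batch_by_pk(conn, batch_size=1000):
    """Select a batch of primary keys, then delete those rows by key.

    Returns the number of rows deleted in this batch.
    """
    # The index choice is pinned on the SELECT, not on the DELETE.
    ids = [
        (row[0],)
        for row in conn.execute(
            "SELECT log_id FROM logging INDEXED BY log_type_action"
            " WHERE log_type = 'review' AND log_action = 'approve-a'"
            " LIMIT ?",
            (batch_size,),
        )
    ]
    if not ids:
        return 0
    # Deleting by primary key leaves the optimizer no index to get wrong.
    cur = conn.executemany("DELETE FROM logging WHERE log_id = ?", ids)
    conn.commit()
    return cur.rowcount


def make_indexed_demo_db(matching=12, other=3):
    """Build a tiny in-memory logging table with the index (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE logging ("
        " log_id INTEGER PRIMARY KEY, log_type TEXT, log_action TEXT)"
    )
    conn.execute(
        "CREATE INDEX log_type_action ON logging (log_type, log_action)"
    )
    rows = [("review", "approve-a")] * matching + [("delete", "delete")] * other
    conn.executemany(
        "INSERT INTO logging (log_type, log_action) VALUES (?, ?)", rows
    )
    conn.commit()
    return conn
```

Calling `delete_batch_by_pk` in a loop with a sleep between calls gives the same throttling as the plain batched delete, at the cost of one extra SELECT per batch.
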
[13:05:41] nah, after the analyze it keeps picking log_type_time
[13:05:55] oh wait, that's the good one
[13:06:21] let me see what's the behaviour in codfw (I was testing an eqiad host)
[13:06:32] ah no that's the bad index
[13:07:09] I can check the optimizer trace, but we won't be able to change anything
[13:07:44] another thing I can try is to rebuild the table and see what happens, but if we have to rebuild the table on each codfw host...that's going to be a bit of a pain (and we won't be able to rebuild it on the master anyways)
[13:29:42] yeah.. let's not do that right now
[13:29:46] I will find a way
[13:39:03] Blocked-on-schema-change, DBA: Schema change to make rc_id unsigned and rc_timestamp BINARY - https://phabricator.wikimedia.org/T276150 (Marostegui)
[13:53:38] Amir1: https://www.youtube.com/watch?v=dMjQ3hA9mEA
[14:19:52] kormat: I like what this movie is about, really similar to our infrastructure
[14:20:41] lolsob
[14:21:49] Blocked-on-schema-change, DBA: Schema change to turn user_last_timestamp.user_newtalk to binary(14) - https://phabricator.wikimedia.org/T266486 (Marostegui)
[14:22:07] DBA: DB maintenance work to do while eqiad is passive (June 2021) - https://phabricator.wikimedia.org/T285139 (Marostegui)
[14:22:09] Blocked-on-schema-change, DBA: Schema change to turn user_last_timestamp.user_newtalk to binary(14) - https://phabricator.wikimedia.org/T266486 (Marostegui) Stalled→Open
[14:22:34] DBA: DB maintenance work to do while eqiad is passive (June 2021) - https://phabricator.wikimedia.org/T285139 (Marostegui)
[14:22:36] Blocked-on-schema-change, DBA: Schema change for watchlist.wl_notificationtimestamp going binary(14) from varbinary(14) - https://phabricator.wikimedia.org/T268392 (Marostegui) Stalled→Open
[14:27:42] Blocked-on-schema-change, DBA: Schema change to turn user_last_timestamp.user_newtalk to binary(14) - https://phabricator.wikimedia.org/T266486 (Marostegui)
[14:36:33] Blocked-on-schema-change, DBA: Schema change to turn user_last_timestamp.user_newtalk to binary(14) - https://phabricator.wikimedia.org/T266486 (Marostegui)
[15:56:25] DBA, SRE, ops-eqiad: Degraded RAID on db1129 - https://phabricator.wikimedia.org/T285715 (Cmjohnson) Ticket opened with Dell
[15:57:12] Blocked-on-schema-change, DBA: Schema change to turn user_last_timestamp.user_newtalk to binary(14) - https://phabricator.wikimedia.org/T266486 (Marostegui)
[16:08:23] Blocked-on-schema-change, DBA: Schema change to turn user_last_timestamp.user_newtalk to binary(14) - https://phabricator.wikimedia.org/T266486 (Marostegui)
[16:08:38] DBA: DB maintenance work to do while eqiad is passive (June 2021) - https://phabricator.wikimedia.org/T285139 (Marostegui)
[16:08:43] Blocked-on-schema-change, DBA: Schema change to turn user_last_timestamp.user_newtalk to binary(14) - https://phabricator.wikimedia.org/T266486 (Marostegui) Open→Resolved All done
[17:17:07] marostegui: btw that didn't work, now I did something else, if it picks up the timestamp index, I give it a timestamp condition, first running it on log_timestamp like '2013%' and so on
[18:59:13] DBA, SRE, Datacenter-Switchover, Patch-For-Review: Figure out how x2 should be handled in DC switchover - https://phabricator.wikimedia.org/T285519 (Legoktm) Most of the spicerack confusion and trouble is that x2 matches `A:db-core` even though it's more like parsercache. If it didn't match that...
[20:33:03] DBA, MediaWiki-Parser, Performance-Team, MW-1.37-notes (1.37.0-wmf.11; 2021-06-21), and 2 others: purgeParserCache.php should not take over 24 hours for its daily run - https://phabricator.wikimedia.org/T282761 (aaron) In the long run, with active-active multi-DC, the parsercache DELETEs should n...
[22:17:41] DBA, Data-Services, cloud-services-team (Kanban): Create backups of user tables from decommissioned database servers - https://phabricator.wikimedia.org/T183758 (bd808) >>! In T183758#7098705, @Urbanecm wrote: > Hello, > > just a friendly reminder, I noticed this is still in the scratch volume. Do w...
[22:23:36] Data-Persistence-Backup, Data-Persistence (Consultation), MediaWiki-extensions-Translate, Security-Team, and 3 others: Aggregategroups Action API module allows deleting translatable page metadata for any group without trace - https://phabricator.wikimedia.org/T282932 (sbassett)
[22:28:59] DBA, DiscussionTools, Performance-Team, Editing-team (Tracking): Post-deployment: evaluate impact on site performance - https://phabricator.wikimedia.org/T280606 (Krinkle)
[22:29:06] DBA, MediaWiki-Parser, Performance-Team, MW-1.37-notes (1.37.0-wmf.11; 2021-06-21), and 2 others: purgeParserCache.php should not take over 24 hours for its daily run - https://phabricator.wikimedia.org/T282761 (Krinkle)