[04:44:26] DBA, Toolhub, User-bd808: Discuss database needs with the DBA team - https://phabricator.wikimedia.org/T271480 (Marostegui) >>! In T271480#7225145, @bd808 wrote: > * toolhub: user with CRUD rights on all tables in the schema. Access from the kubernetes cluster. Do you have some IPs or a range for th...
[05:11:37] DBA, Toolhub, User-bd808: Discuss database needs with the DBA team - https://phabricator.wikimedia.org/T271480 (Marostegui) @bd808 one more question I thought I asked before, but I didn't (sorry!), what would be the impact if the section goes read-only? (ie: maintenance)
[06:54:25] image pdf cleanup is now 75% done, I think it'll be done before next week
[06:55:06] Ooooh!!!
[06:55:13] Can't wait to optimize it on eqiad
[07:11:56] DBA, serviceops, User-fgiunchedi, cloud-services-team (Kanban): Roll restart haproxy to apply updated configuration - https://phabricator.wikimedia.org/T287574 (dcaro) Ack, I'll be there :+1:
[07:38:01] DBA, serviceops, User-fgiunchedi, cloud-services-team (Kanban): Roll restart haproxy to apply updated configuration - https://phabricator.wikimedia.org/T287574 (fgiunchedi) Excellent, thank you @Andrew and @dcaro ! We're on for **Tues Aug 3rd at 9 UTC**
[07:56:45] marostegui: oh btw, I'll be asking you to drop two indexes from flaggedrevs table next week. The patch stopping reading from it has been merged but not deployed yet.
[08:02:35] Amir1: do you have the name of it? 
I can double check if it is being used
[08:05:52] yeah but it's still being used hopefully will be stopped being read later this week
[08:09:26] sure :)
[08:58:24] DBA, SRE, ops-eqiad: Broken RAM on db1127 - https://phabricator.wikimedia.org/T286763 (Marostegui) p:Triage→Medium
[09:30:32] Amir1: do you have the name of it so I can add it to my already increasing list of .37 work
[09:31:43] RhinosF1: indexes using fr_quality https://phabricator.wikimedia.org/T277883#7250494
[09:36:21] Added
[10:51:56] DBA: Move db1124 and db1125 back to test-cluster section - https://phabricator.wikimedia.org/T286329 (Marostegui)
[10:52:11] DBA: Move db1124 and db1125 to misc services temporarily - https://phabricator.wikimedia.org/T286042 (Marostegui)
[11:11:40] DBA: Failover m1 master (db1159) to a different host to upgrade its kernel - https://phabricator.wikimedia.org/T287852 (Marostegui)
[11:11:52] DBA: Failover m1 master (db1159) to a different host to upgrade its kernel - https://phabricator.wikimedia.org/T287852 (Marostegui) p:Triage→Medium
[11:13:16] DBA: Failover m2 master (db1107) to a different host to upgrade its kernel - https://phabricator.wikimedia.org/T287852 (Marostegui)
[11:13:54] DBA: Failover m2 master (db1107) to a different host to upgrade its kernel - https://phabricator.wikimedia.org/T287852 (Marostegui)
[11:15:57] DBA: Rename dbstore1004 to db1183 and place it on m5 - https://phabricator.wikimedia.org/T284622 (Marostegui) Open→Resolved We are going to take db1183 as a floating host to upgrade mX hosts (first m2: T287852 ), so eventually db1128 (m5 master) will go to s7. Closing this as fixed
[11:16:01] DBA, Analytics-Clusters, Analytics-Kanban, Patch-For-Review: dbstore1004 85% disk space used. 
- https://phabricator.wikimedia.org/T283125 (Marostegui)
[11:16:02] DBA, Infrastructure-Foundations, SRE, Traffic, and 2 others: Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (Marostegui)
[11:16:11] DBA: Failover m2 master (db1107) to a different host to upgrade its kernel - https://phabricator.wikimedia.org/T287852 (Marostegui)
[11:19:18] jynus: dumps days from db1117 is tuesday right?
[11:19:51] 2021-07-27
[11:20:10] yeah
[11:20:12] yeah, I mean next one
[11:20:15] ah
[11:20:16] it will be tomorrow?
[11:20:24] I need to stop db1117:3322
[11:20:28] tonight
[11:20:29] for around 1h or so
[11:20:33] ah cool, I have time then
[11:20:51] I think it starts at 0h UTC
[11:21:51] DBA, Infrastructure-Foundations, SRE, Traffic, and 2 others: Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (Majavah)
[11:21:56] cool, thanks
[11:22:09] DBA, Infrastructure-Foundations, SRE, Traffic, and 2 others: Switch buffer re-partition - Eqiad Row A - https://phabricator.wikimedia.org/T286032 (Majavah)
[11:22:44] in any case, backups are never a blocker- if it fails, it fails, no biggie
[11:23:00] :)
[11:23:32] DBA, Analytics, Infrastructure-Foundations, SRE, and 2 others: Switch buffer re-partition - Eqiad Row C - https://phabricator.wikimedia.org/T286065 (Majavah)
[11:23:34] it will be detected and I can retry when it comes back up
[11:27:37] I am going for lunch, if you have a moment please check the patch I sent your way for formal ok/sanity check. 
I will be reimaging later one backup source (although maybe not today)
[11:28:09] checking
[11:28:22] no rush, will be out for some time
[11:29:22] he: !log restarting gerrit primary server on gerrit1001
[13:02:06] DBA: Failover m2 master (db1107) to a different host to upgrade its kernel - https://phabricator.wikimedia.org/T287852 (Marostegui)
[13:02:19] DBA: Failover m2 master (db1107) to a different host to upgrade its kernel - https://phabricator.wikimedia.org/T287852 (Marostegui)
[13:21:32] DBA, Infrastructure-Foundations, Recommendation-API, SRE, and 2 others: Failover m2 master (db1107) to a different host to upgrade its kernel - https://phabricator.wikimedia.org/T287852 (Marostegui)
[13:21:42] ^ planning to do that failover on thursday at 08:00 AM UTC
[13:25:33] DBA, Infrastructure-Foundations, Recommendation-API, SRE, and 2 others: Failover m2 master (db1107) to a different host to upgrade its kernel - https://phabricator.wikimedia.org/T287852 (MoritzMuehlenhoff) >>! In T287852#7251796, @Marostegui wrote: > @MoritzMuehlenhoff @dpifke @Krinkle @bd808 @hn...
[13:29:46] DBA, Infrastructure-Foundations, Recommendation-API, SRE, and 2 others: Failover m2 master (db1107) to a different host to upgrade its kernel - https://phabricator.wikimedia.org/T287852 (Marostegui)
[13:30:23] DBA, Infrastructure-Foundations, Recommendation-API, SRE, and 2 others: Failover m2 master (db1107) to a different host to upgrade its kernel - https://phabricator.wikimedia.org/T287852 (Marostegui)
[13:30:54] DBA, Infrastructure-Foundations, Recommendation-API, SRE, and 2 others: Failover m2 master (db1107) to a different host to upgrade its kernel - https://phabricator.wikimedia.org/T287852 (Marostegui)
[14:43:25] DBA, Toolhub, User-bd808: Discuss database needs with the DBA team - https://phabricator.wikimedia.org/T271480 (bd808) >>! In T271480#7251095, @Marostegui wrote: >>>! 
In T271480#7225145, @bd808 wrote: >> * toolhub: user with CRUD rights on all tables in the schema. Access from the kubernetes cluster....
[14:51:23] DBA, Toolhub, User-bd808: Discuss database needs with the DBA team - https://phabricator.wikimedia.org/T271480 (bd808) >>! In T271480#7251102, @Marostegui wrote: > @bd808 one more question I thought I asked before, but I didn't (sorry!), what would be the impact if the section goes read-only? (ie: ma...
[14:58:39] DBA, Infrastructure-Foundations, Recommendation-API, SRE, and 2 others: Failover m2 master (db1107) to a different host to upgrade its kernel - https://phabricator.wikimedia.org/T287852 (dpifke) No objection for xhgui.
[15:13:35] DBA, Infrastructure-Foundations, Recommendation-API, SRE, and 2 others: Failover m2 master (db1107) to a different host to upgrade its kernel - https://phabricator.wikimedia.org/T287852 (bd808) iegreview and scholarships should handle the maintenance without major issue.
[15:44:17] !log remove s2 from db1139 T287230
[15:44:18] jynus: Not expecting to hear !log here
[15:44:18] T287230: Upgrade s2 to Buster + MariaDB 10.4 - https://phabricator.wikimedia.org/T287230
[16:56:41] I am reading this wiki page and it says nothing specific about orchestrator https://wikitech.wikimedia.org/wiki/MariaDB/Decommissioning_a_DB_Host#Remove_host_from_tendril_and_zarcillo does it need some actionable when removing an instance?
[17:09:55] DBA, MediaWiki-extensions-SecurePoll, Performance-Team, Platform Engineering, and 2 others: Creating an election with "all wikis" can give a DBTransactionSizeError - https://phabricator.wikimedia.org/T287859 (phuedx)
[17:10:55] Cross-posting from -operations: o/ Who's the best person to talk to about increasing $wgMaxUserDBWriteDuration for a specific wiki. Context: https://phabricator.wikimedia.org/T287859. 
tl;dr is that creating a SecurePoll poll on votewiki is triggering a DBTransactionSizeError AND the Board Election starts in ~1 day
[17:19:42] DBA, MediaWiki-extensions-SecurePoll, Performance-Team, Platform Engineering, and 2 others: Creating an election with "all wikis" can give a DBTransactionSizeError - https://phabricator.wikimedia.org/T287859 (Marostegui) There's not much we can do here as #DBA unfortunately as this is more MW - n...
[17:28:25] DBA, MediaWiki-extensions-SecurePoll, Performance-Team, Platform Engineering, and 2 others: Creating an election with "all wikis" can give a DBTransactionSizeError - https://phabricator.wikimedia.org/T287859 (phuedx) @aaron: @Reedy mentioned that you might be able to help out with this.
[17:33:45] marostegui: what wouldn't be generous?
[17:34:06] Is there like phuedx suggested a reasonable time that the config could be changed to?
[17:34:47] RhinosF1: Not sure what you mean, but having a 3 seconds timeout for writes is already quite generous in my mind
[17:34:55] Oh
[17:34:59] Other way round
[17:35:02] :)
[17:52:41] DBA, MediaWiki-extensions-SecurePoll, Performance-Team, Platform Engineering, and 2 others: Creating an election with "all wikis" can give a DBTransactionSizeError - https://phabricator.wikimedia.org/T287859 (LSobanski) (reposting my IRC comment so it doesn't get lost) There is the immediate pro...
[18:02:00] jynus: good question, I'll take a note and check that
[18:07:25] DBA, MediaWiki-extensions-SecurePoll, Performance-Team, Platform Engineering, and 3 others: Creating an election with "all wikis" can give a DBTransactionSizeError - https://phabricator.wikimedia.org/T287859 (phuedx)
[18:09:52] marostegui, RhinosF1: In case you missed the update in -operations, ^ the above task (DBTransactionSizeError) is no longer a UBN! 
as the election has been delayed by two weeks to give us a good amount of time to make the correct fixes
[18:10:06] Thank you for your time and attention
[18:10:14] Ok phuedx
[18:10:36] phuedx: great! thanks for the heads up
[18:15:12] DBA, MediaWiki-extensions-SecurePoll, Performance-Team, Platform Engineering, and 3 others: Creating an election with "all wikis" can give a DBTransactionSizeError - https://phabricator.wikimedia.org/T287859 (phuedx) p:Unbreak!→High From #wikimedia-operations in IRC: ` 19:03:08
DBA, ops-eqiad: Degraded RAID on db1175 - https://phabricator.wikimedia.org/T287137 (Cmjohnson) a:Jclark-ctr→Cmjohnson Ticket created with Dell Create Dispatch: Service Tag: DYV8773
[18:29:12] DBA, SRE, ops-eqiad: Broken RAM on db1127 - https://phabricator.wikimedia.org/T286763 (Cmjohnson) a:Jclark-ctr→Cmjohnson Dispatch created with Dell, You have successfully submitted request SR1066677487.
[18:45:02] DBA, DC-Ops, SRE, ops-eqiad: db1170 mysql process crashed - https://phabricator.wikimedia.org/T286888 (Cmjohnson) a:Jclark-ctr→Cmjohnson A dell ticket for a new DIMM has been submitted. You have successfully submitted request SR1066678833.
[18:45:40] DBA, MediaWiki-extensions-SecurePoll, Performance-Team, Platform Engineering, and 2 others: Creating an election with "all wikis" can give a DBTransactionSizeError - https://phabricator.wikimedia.org/T287859 (Krinkle) (Untagging library as this isn't a bug with the rdbms library itself. It is rep...
[19:02:58] DBA, MediaWiki-extensions-SecurePoll, Performance-Team, Platform Engineering, and 2 others: Creating an election with "all wikis" can give a DBTransactionSizeError - https://phabricator.wikimedia.org/T287859 (phuedx) Looking at the most recent error that I can find, there seem to be a handful of... 
[19:18:13] DBA, MediaWiki-extensions-SecurePoll, Performance-Team, Platform Engineering, and 2 others: Creating an election with "all wikis" can give a DBTransactionSizeError - https://phabricator.wikimedia.org/T287859 (Tnegrin) Hi everybody -- I wanted to clarify this comment: >>! In T287859#7252963, @phu...
[20:32:02] DBA, MediaWiki-extensions-SecurePoll, Performance-Team, Platform Engineering, and 2 others: Creating an election with "all wikis" can give a DBTransactionSizeError - https://phabricator.wikimedia.org/T287859 (Niharika)
[20:58:38] PROBLEM - MariaDB sustained replica lag on db1163 is CRITICAL: 3 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1163&var-port=9104
[21:07:40] RECOVERY - MariaDB sustained replica lag on db1163 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1163&var-port=9104
[21:31:15] DBA, MediaWiki-extensions-SecurePoll, Performance-Team, Platform Engineering, and 2 others: Creating an election with "all wikis" can give a DBTransactionSizeError - https://phabricator.wikimedia.org/T287859 (Reedy) >>! In T287859#7253352, @Tnegrin wrote: > Hi everybody -- I wanted to clarify thi... 
[21:34:52] PROBLEM - MariaDB sustained replica lag on db1163 is CRITICAL: 4.2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1163&var-port=9104
[21:47:34] RECOVERY - MariaDB sustained replica lag on db1163 is OK: (C)2 ge (W)1 ge 0.8 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1163&var-port=9104
[22:04:26] DBA, MediaWiki-extensions-SecurePoll, Performance-Team, Platform Engineering, and 2 others: Creating an election with "all wikis" can give a DBTransactionSizeError - https://phabricator.wikimedia.org/T287859 (Reedy) https://doc.wikimedia.org/cover-extensions/SecurePoll/ is now live {F34574120 si...
[22:05:41] PROBLEM - MariaDB sustained replica lag on db1163 is CRITICAL: 2.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1163&var-port=9104
[22:09:18] RECOVERY - MariaDB sustained replica lag on db1163 is OK: (C)2 ge (W)1 ge 0.6 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1163&var-port=9104
[22:21:59] DBA, MediaWiki-extensions-SecurePoll, Performance-Team, Platform Engineering, and 2 others: Creating an election with "all wikis" can give a DBTransactionSizeError - https://phabricator.wikimedia.org/T287859 (Legoktm) `$wgMaxUserDBWriteDuration` only affects web requests IIRC, a maintenance scrip... 
[22:40:06] PROBLEM - MariaDB sustained replica lag on db1163 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1163&var-port=9104
[22:47:24] RECOVERY - MariaDB sustained replica lag on db1163 is OK: (C)2 ge (W)1 ge 0.2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1163&var-port=9104