[01:09:27] PROBLEM - MariaDB sustained replica lag on m1 on db1217 is CRITICAL: 2.6 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1217&var-port=13321
[01:10:19] PROBLEM - MariaDB sustained replica lag on m1 on db2160 is CRITICAL: 15.8 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2160&var-port=13321
[01:11:03] RECOVERY - MariaDB sustained replica lag on m1 on db1217 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1217&var-port=13321
[01:11:57] RECOVERY - MariaDB sustained replica lag on m1 on db2160 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2160&var-port=13321
[05:36:59] I am going to start disconnecting codfw -> eqiad
[06:47:20] Replication codfw -> eqiad disabled and GTID enabled
[06:47:25] Maintenance can be resumed
[08:48:43] jynus: let me know when I can stop db1217:3323 (m3 backup source) for like 1h
[08:54:31] Amir1: can you start taking care of the db* at https://phabricator.wikimedia.org/T335838 with the automated reboots?
[09:01:24] marostegui: sure. Let me make coffee and afterwards I'll get to it
[09:01:31] sure!
[09:26:26] marostegui: any time
[09:26:32] excellent thanks
[09:32:17] jynus: with the reboots, it might automatically try to reboot backup sources; it won't do it if a backup is running, so is it fine if I move forward with it in s6 for now?
[09:35:14] Amir1: your script checks if the host is already rebooted, right?
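The alert strings above encode their thresholds inline: "(C)2 ge (W)1" means critical when sustained lag is at or above 2 seconds and warning at or above 1. A minimal sketch of that classification (a hypothetical helper, not the actual Icinga/Prometheus check):

```python
def classify_lag(lag_seconds, warn=1.0, crit=2.0):
    """Map a sustained replica-lag reading (in seconds) to an alert
    state, mirroring the '(C)2 ge (W)1 ge 0' thresholds above."""
    if lag_seconds >= crit:
        return "CRITICAL"
    if lag_seconds >= warn:
        return "WARNING"
    return "OK"
```

Under these thresholds the 2.6 s and 15.8 s readings above are CRITICAL, and both hosts recover once lag drops back toward 0.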
I just rebooted db2124 in s6
[09:35:19] As I needed to migrate it anyways
[09:35:31] marostegui: yeah yeah, it checks the kernel version
[09:35:38] sweet
[09:37:12] marostegui: in case I'll be ooo and you need to do it, the code for it is in cumin /home/ladsgroup/software2/dbtools/auto_schema/rolling_restart.py
[09:37:30] (for future)
[09:37:31] good, ideally it should be a cookbook at some point :)
[09:37:38] yeah. I know
[09:43:02] I accidentally solved a fifteen-year-old ticket https://phabricator.wikimedia.org/T17218#8819711
[09:43:23] xDDDDD
[09:45:40] marostegui: I think we should split dbs out of this into its own ticket: https://phabricator.wikimedia.org/T335838
[09:45:54] I want to use https://phabricator.wikimedia.org/P33282
[09:46:35] Amir1: sure, go for it :)
[09:46:42] Awesome
[09:46:53] I just updated one, so please refresh :)
[09:47:07] Amir1: once done remove the DBA tag please
[09:51:39] Done and done
[09:51:51] thanks!
[09:55:21] please wait until tonight to reboot es1022, es2022, es1025 and es2025, they are about to finish their backups
[09:55:33] yeah, from my side not planning to reboot esX yet
[09:55:49] yup, for now, core hosts only
[11:30:24] Amir1: when decommissioning hosts remember to remove the host from zarcillo (tables: servers, instances, section_instances)
[11:30:44] yeah on it
[11:30:45] https://wikitech.wikimedia.org/wiki/MariaDB/Decommissioning_a_DB_Host
[11:30:52] following this, great doc
[11:31:22] Ah cool I forgot it was there
[11:34:15] marostegui: in the example patch, there are changes to dhcpd: https://gerrit.wikimedia.org/r/c/operations/puppet/+/638352/2/modules/install_server/files/dhcpd/linux-host-entries.ttyS1-115200 I can't find that for db1111 but the dns is in netbox repo according to codesearch https://codesearch.wmcloud.org/search/?q=db1111&files=&excludeFiles=&repos=
[11:34:41] yeah, that is gone
[11:34:44] let me update the doc
[11:36:08] thanks
[11:36:22] done
[11:38:40] Awesome.
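The rolling-restart script is said to skip hosts that are already on the new kernel. A hedged sketch of that check (function names and the comparison strategy are assumptions; the real logic lives in `auto_schema/rolling_restart.py` and may differ):

```python
def needs_reboot(running_kernel: str, target_kernel: str) -> bool:
    """Return True when the running kernel differs from the target
    (newest installed) kernel, i.e. the host still needs a reboot."""
    return running_kernel.strip() != target_kernel.strip()

# On a host one might compare, for example:
#   uname -r                   -> running kernel
#   ls /lib/modules | sort -V  -> installed kernels (newest last)
```

A host that was rebooted out of band (like db2124 above) then compares equal and is skipped automatically.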
Thanks
[11:56:36] Also made this https://wikitech.wikimedia.org/w/index.php?title=MariaDB%2FDecommissioning_a_DB_Host&diff=2073173&oldid=2073156 as now we have two DCOps in codfw as well
[12:12:57] marostegui: to double check, zarcillo is now db1215?
[12:13:05] no, still db1115
[12:13:12] I am planning to switch it next week
[12:13:24] okay
[13:19:34] I am stressing db2139 to see if the memory errors happen again
[13:22:13] Amir1: I wonder if I can handle the s4 backup sources restart or coordinate to do all media backup reboots at the same time
[13:22:30] can your script detect already-rebooted hosts?
[13:22:43] jynus: yeah it does detect them
[13:22:58] It's automatic so if you're doing it manually, you can leave it as is :P
[13:23:41] ok, so I will do those first so we reboot all mediabackup-related hosts in a relatively small window between today and tomorrow
[13:24:19] no big deal if done at another time, but that way I minimize downtime
[13:25:10] sure
[14:14:33] In testing superset as a possible replacement for quarry I'm running into an issue when I add a large number of databases: after about the 90th db, searches will silently not run. Before I go too much further I would like to verify that I understand our DB layout correctly. Is https://quarry.wmcloud.org/query/73378 an accurate picture of our replicas? And would I normally add over 900 datasources to superset?
[14:29:26] Rook: not sure many people here will have a lot of experience with superset
[14:29:53] but maybe, re: production db structure, this can help: https://replag.toolforge.org/
[14:30:07] I apologize, I don't mean for superset to be relevant to the question
[14:30:34] also: https://wikitech.wikimedia.org/wiki/MariaDB#Sections_and_shards
[14:31:48] more here: https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Wiki_Replicas#Service_architecture_by_layers
[14:32:58] "In addition each cluster can be accessed by the name of its Wikimedia production shard which follows the format s${SHARD_NUMBER}.{analytics,web}.db.svc.wikimedia.cloud (for example, s1.analytics.db.svc.wikimedia.cloud hosts the enwiki_p database)": https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database
[14:33:52] So we would expect to add 967 dbs to any tool that we were to use to explore data?
[14:34:58] so I don't have the context, but I would add the 8 sections only
[14:37:35] This shows the section distribution: https://quarry.wmcloud.org/query/73530
[14:37:47] So if I wanted to connect to wikidatawiki.analytics.db.svc.wikimedia.cloud I could do that through one of the s[1-8] shards that would contain it and other databases, perhaps all listed as schemas?
[14:40:20] so that's the part where I don't have the context to suggest something concrete. But if the question is "Can I access all databases using section names?", the answer is yes: you could connect to s8.analytics.db.svc.wikimedia.cloud and then "use wikidatawiki_p"
[14:41:48] That sounds hopeful. I'll see what I can learn from the above and implement.
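The point being made above is that the ~900 wiki databases live on only 8 physical sections, each reachable under the documented s${SHARD_NUMBER}.{analytics,web}.db.svc.wikimedia.cloud naming scheme. A small sketch of building that hostname (the helper itself is hypothetical; the naming pattern and the s8/wikidatawiki_p example come from the discussion):

```python
def section_host(section: int, service: str = "analytics") -> str:
    """Build the Wiki Replicas hostname for a production section,
    following the s${N}.{analytics,web}.db.svc.wikimedia.cloud scheme."""
    if service not in ("analytics", "web"):
        raise ValueError("service must be 'analytics' or 'web'")
    return f"s{section}.{service}.db.svc.wikimedia.cloud"

# e.g. wikidatawiki_p lives on s8, so one would connect to
# section_host(8) and then issue "USE wikidatawiki_p".
```

This is why a tool only needs 8 datasources rather than 900+: each section host exposes its wikis as separate databases/schemas.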
Thank you for the direction :)
[14:43:30] sorry I cannot help more, I don't have context or know much about superset
[14:46:17] in summary: sections are the physical distribution, although generally users (end users or tool developers) shouldn't know or be aware of them
[14:46:30] *shouldn't need to
[14:46:35] Rook: these might be useful https://orchestrator.wikimedia.org/ and https://noc.wikimedia.org/db.php
[14:47:59] Generally people querying quarry only need one wiki. Sometimes a couple (which is now broken), so loading all of the wikis in every query might not be feasible
[16:08:17] apparently db2184 wasn't using gtid and now has a duplicate key issue
[16:13:33] I was going to reload the data anyway, but do you know if there is a way to check replicas for this issue?
[16:25:40] I think I have some loops somewhere, but I'm not next to my laptop now
[16:25:46] I can check tomorrow
[16:29:04] if it's urgent, I can take care of it but I'm in a meeting right now
[16:31:34] Hello. I'm trying to see which user_property TranslationNotifications is using for https://github.com/wikimedia/mediawiki-extensions-TranslationNotifications/blob/master/includes/SpecialTranslatorSignup.php#L223-L232
[16:32:02] they are not exposed in the replicas, and https://www.mediawiki.org/wiki/Manual:$wgDefaultUserOptions does not mention them
[16:35:42] hauskater: I can check tomorrow if that works, I'm not next to my laptop
[16:36:02] and hello btw
[16:36:28] marostegui: sure, unless anyone else is able to see if these are indeed in the user_properties table :)
[16:36:48] sure :)
[16:42:04] hauskater: anyone with deployment rights should be able to answer you
[16:50:43] sorry I was in a meeting, hauskater give me a bit
[16:50:59] thanks Amir1
[16:53:23] hauskater: https://phabricator.wikimedia.org/P47439 would this answer your question?
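The "loops" mentioned for finding replicas in db2184's situation would presumably iterate over hosts and inspect `SHOW SLAVE STATUS`, where MariaDB reports a `Using_Gtid` field (`No`, `Slave_Pos`, or `Current_Pos`). A hedged sketch of the filtering step only; fetching the status rows from each host is left out as it depends on local tooling:

```python
def hosts_without_gtid(statuses):
    """Given {host: SHOW SLAVE STATUS row (as a dict)}, return the
    hosts replicating without GTID (MariaDB: Using_Gtid == 'No')."""
    return sorted(
        host
        for host, row in statuses.items()
        if row.get("Using_Gtid", "No") == "No"
    )
```

Running this across a section's replicas would surface any host that, like db2184 above, silently fell back to binlog-position replication.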
let me see
[16:54:29] Amir1: yep, any idea if these are grouped at https://github.com/wikimedia/mediawiki-extensions-TranslationNotifications/blob/master/includes/SpecialTranslatorSignup.php#L54 or is it a different config?
[16:55:11] I have no clue, you need to ask someone who is familiar with the extension.
[16:55:32] I guess the language team? https://github.com/wikimedia/mediawiki-extensions-TranslationNotifications/graphs/contributors
[16:55:49] #wikimedia-language I think
[16:55:50] thanks Amir1 - I was thinking of asking for some of those fields to be exposed in the replicas
[16:56:58] this helps think about it :)
[16:58:50] yw. Let me know if I can help more
[17:03:12] :)
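The paste above presumably came from querying the `user_properties` table directly in production. A hedged sketch of what such a lookup might look like; the `translationnotifications` prefix is an assumption based on the extension name, not verified against the actual option keys:

```python
# Illustrative query one might run on a production replica (the LIKE
# pattern is an assumed prefix, not a confirmed option name):
QUERY = """
SELECT up_property, COUNT(*) AS users
FROM user_properties
WHERE up_property LIKE 'translationnotifications%'
GROUP BY up_property
"""

def translation_props(prop_names):
    """Filter up_property values down to those the
    TranslationNotifications extension would own (prefix assumed)."""
    return [p for p in prop_names
            if p.startswith("translationnotifications")]
```

Whether these options are grouped under one config key is, as noted above, a question for the extension's maintainers.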