[06:56:53] Hi, I have pooled db2136 (s4) with weight 1 (the min) to collect some queries - it is running 10.11
[06:57:05] if you see something wrong please depool it and ping me
[06:57:14] I won't let this host be pooled outside of working hours
[07:01:05] es5 backup completed at 0 hours, but es4 is still ongoing
[07:02:49] at 0 hours?
[07:02:54] ah you meant midnight?
[07:03:10] 2024-06-26T00:23:41+00:00
[07:03:21] It took 15h 37m
[07:04:53] which is still less than the 38h 34m + 6h 19m that it would have taken pooled
[07:05:17] 45 hours
[07:07:53] but the alert for long-running backups will complain this week (and probably next week)
[07:09:18] I'm not liking mydumper performance-wise because it doesn't use concurrency smartly
[07:12:03] compare: https://grafana.wikimedia.org/goto/-iD2h5QIg?orgId=1
[07:12:45] with https://grafana.wikimedia.org/goto/K6xo25QIR?orgId=1
[07:13:53] assuming it will finish soon and they are repooled, when would you want to schedule the same for eqiad, marostegui?
[07:14:04] jynus: yeah, let's do that
[07:14:39] the difference is crazy to be honest
[07:14:44] between a pooled and not pooled host
[07:14:56] ha ha ha "Me: when?" -> "You: Yes"
[07:15:04] asking because of the switch maintenance
[07:15:09] sorry, I didn't read the when
[07:15:13] just the "would you want" XD
[07:15:26] arnaudb: when is the maintenance happening for the eqiad hosts?
[07:15:32] For the es hosts, that is
[07:15:32] we can try to squeeze them in today or maybe friday
[07:15:37] thursday
[07:16:02] jynus: I am happy either way, today or friday
[07:16:21] I am a bit uncomfortable with the ticking clock for thursday
[07:16:21] Probably better today so we don't leave them depooled for the weekend
[07:16:27] exactly
[07:16:32] yep
[07:16:34] that's the other bad option
[07:16:49] next maintenance comes on tuesday, so that leaves plenty of time
[07:17:18] We can also do it after the switch maintenance
[07:17:26] we cannot
[07:17:35] or at least I won't be able to
[07:17:45] they would finish after I leave
[07:17:47] Then why don't we try now?
[07:17:55] And if it doesn't finish on time, we go for friday?
[07:18:06] I mean, we can always kill an ongoing dump
[07:18:17] that's true, no hard commitment
[07:18:21] even if it takes 24h it will be before the switch
[07:18:23] if it goes long, we abort
[07:18:37] yeah, that's the thing, it should take 14h-ish
[07:18:44] then I think we are pretty okay
[07:18:45] 👌
[07:18:52] but as we have seen, one may take longer
[07:19:01] so then let's do that
[07:19:21] let me repool the one that has finished on codfw
[07:19:25] ok
[07:19:30] arnaudb: when are you planning to switch the es6 master?
[07:19:36] and start the eqiad ones asap, monitor the status this time tomorrow
[07:19:39] tomorrow morning during the maintenance window
[07:19:50] shouldn't affect es4/es5
[07:19:54] arnaudb: ok, I am out tomorrow but I can try to be around for that one
[07:19:59] arnaudb: so let's be on time please
[07:20:11] my alarm clock is already set x)
[07:20:17] ok good
[07:21:43] The plan is to depool es1022.eqiad.wmnet and es1025.eqiad.wmnet on my side, es4 and es5, for hopefully 24 hours
[07:22:05] ^ arnaudb one last green light on that?
[07:22:24] in a few minutes
[07:22:48] if it runs long, we kill the process tomorrow morning and repool
[07:22:54] sure jynus no problem with this
[07:23:32] I will first repool es2025
[07:23:52] ack
[07:24:43] jynus: I am confused with: shouldn't affect es4/es5
[07:24:55] But you are indeed dumping es4 and es5, aren't you?
[07:25:21] yeah, I mean that maintenance on es5/es6
[07:25:28] ah ok yeah
[07:25:32] won't affect my dumps on es4/es5
[07:25:35] sorry for the confusion
[07:25:50] no problem, just making sure we are all aligned
[07:25:53] *es6/es7
[07:26:00] ^ not touching those myself
[07:26:06] yeah, all clear now
[07:26:07] only dumping the old, read-only sections
[07:26:21] yeah, perfect, that is why I wanted to express my intentions
[07:26:30] to make sure everybody understood correctly!
[07:26:35] :)
[07:27:10] the multiple codes and server names don't help :-D
[07:28:22] https://phabricator.wikimedia.org/P65448
[07:31:27] great!
[07:31:29] also by getting rid of this we will avoid the issues with es & dumps on day 1
[07:31:39] 1 of the month
[07:32:05] repool looking great, load at this time is low and the host was warmed up (it wasn't restarted)
[07:40:55] Hi! It would be great if someone could take care of this task: https://phabricator.wikimedia.org/T368066#9912866
[07:41:45] The meta_p.wiki table does not contain an entry for btmwiki and that can confuse tools which use it to find the shard it is hosted on for UNION queries.
[07:42:42] CountCount2: checking
[08:00:53] marostegui: arnaudb: one thing I noticed is that the original weight distribution of eqiad and codfw was different - it may need adjusting after the backups complete
[08:01:10] My guess is you will want to leave the read-only ones with the same weight on all hosts
[08:01:29] jynus: they aren't?
[08:01:33] they should be
[08:01:37] (this is unrelated to my backups, just something I noticed as different)
[08:01:55] codfw wasn't, please have a look
[08:02:20] jynus: I see
[08:02:23] I will get that fixed now
[08:02:26] thanks for letting me now
[08:02:27] know
[08:02:37] not in a hurry, but communicating it now that I realized
[08:03:09] jynus: which ones are you repooling?
[08:03:12] so I don't touch those
[08:03:14] please note es2022 has not finished its backups yet (but it should soon)
[08:03:20] I repooled es2025
[08:03:30] ok I will take es2024 and es2021
[08:04:52] I am going away soon for the doctor but the dumps can be monitored on the dashboard as usual (they are regular backups from the dashboard perspective)
[08:05:00] jynus: good luck :*
[08:07:20] oh, the host I chose, es1025, is a "master"
[08:07:29] so I will use another host for the backup
[08:07:51] jynus: if it is a big deal I can swap it now
[08:07:54] it is just one command
[08:08:12] If it is not a big deal, yes
[08:08:21] as the dump user is already configured there
[08:08:25] sure no problem
[08:08:31] give me a sec
[08:10:21] jynus: done, es1023 is the new master
[08:10:31] thank you, and sorry for the extra work!
[08:10:36] not at all!
[08:10:47] jynus: es1025 remains pooled, do you want me to depool it or will you do it?
[08:10:58] I was about to do it myself
[08:11:01] great
[08:11:54] done, will check that everything looks ok mw-wise and then will start the backup process
[08:12:04] ok!
[08:12:58] There is a mention of es, but I doubt it is related to this: "Argument 1 passed to ExternalStoreDB::getDomainId() must be of the type array, bool given"
[08:14:13] or maybe it is (?)
seems it started recently
[08:17:23] Please keep an eye on it, maybe it was just temporary, only 15 errors in total: https://logstash.wikimedia.org/goto/48163247d6bfe11d8342f6956534cdaa
[08:22:18] ok, looking good, this is the status as I am away for a while: https://phabricator.wikimedia.org/T363812#9925149
[08:23:46] if all is ok we will revisit the status tomorrow morning
[08:25:05] thanks for all the help, see you later
[10:20:00] The codfw one finally finished, it took 24h 25m
[10:20:19] I will soon repool es2022
[13:14:24] I'm about to run the cookbook sre.wikireplicas.add-wiki for btmwiki (T368066)
[13:14:25] T368066: Prepare and check storage layer for btmwiki - https://phabricator.wikimedia.org/T368066
[13:14:40] following the docs at https://wikitech.wikimedia.org/wiki/Add_a_wiki#Maintain_views
[13:19:41] "ERROR : Zone analytics.db.svc.wikimedia.cloud. does not exist"
[13:19:52] I think that is caused by the zone being moved in cloudvps
[13:25:12] yes, that's the issue. this patch should fix it: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1049933
[13:26:11] dhinus: +1
[13:26:17] taavi: thanks :)
[13:28:38] is the cookbook idempotent? can I just re-run it?
[13:30:40] should be
[13:31:36] I'll wait for it to complete, then start it again
[13:32:12] END (PASS) < lol, not really :P
[13:47:22] it's creating 3 more DNS records that are not related to the new wiki: https://phabricator.wikimedia.org/P65482
[13:47:46] huh, those are private wikis
[13:50:36] dhinus: I think the wmcs-wikireplica-dns script just doesn't check that the wikis it creates records for are public. in theory that's fine since they don't get an actual replica database, but it'd be nice to fix regardless
[13:50:36] the script gets the list from https://noc.wikimedia.org/conf/dblists/s5.dblist
[13:51:02] taavi: I'll create a task
[14:01:17] created T368538
[14:01:18] T368538: [wikireplicas] wmcs-wikireplica-dns.py creates DNS records for private wikis - https://phabricator.wikimedia.org/T368538
[14:01:56] the second run of the cookbook completed successfully
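
A note on the depool/repool steps discussed above (es2025, es2021/es2024, es1022): in practice these are a couple of dbctl commands run by hand. The following is only a minimal Python sketch of that workflow; the `instance <host> depool`/`pool` and `config commit -m` subcommands reflect common dbctl usage but should be verified against the current tool, and the wrapper functions themselves are hypothetical.

import subprocess


def dbctl(*args: str) -> None:
    """Run one dbctl subcommand, failing loudly if it errors."""
    subprocess.run(["sudo", "dbctl", *args], check=True)


def depool_for_dump(host: str, task: str) -> None:
    """Depool `host` before a long dump and commit the config change."""
    dbctl("instance", host, "depool")
    dbctl("config", "commit", "-m", f"Depool {host} for es dump ({task})")


def repool_after_dump(host: str, task: str) -> None:
    """Repool `host` once the dump has finished (or was killed)."""
    dbctl("instance", host, "pool")
    dbctl("config", "commit", "-m", f"Repool {host}, dump done ({task})")


if __name__ == "__main__":
    # Host and task number taken from the conversation above.
    depool_for_dump("es1022", "T363812")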
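
On the btmwiki report: tools that build cross-wiki UNION queries first look up the wiki's section ("slice") in meta_p.wiki, so a missing row makes the shard undiscoverable. Below is a minimal sketch of that lookup, assuming the meta_p service hostname, the Toolforge-style replica.my.cnf credentials file, and the dbname/slice columns as documented on Wikitech - verify all of these before relying on them.

import os.path

import pymysql


def shard_for(dbname: str) -> str:
    """Return the section (e.g. 's5.labsdb') hosting `dbname`, per meta_p.wiki."""
    conn = pymysql.connect(
        host="meta.analytics.db.svc.wikimedia.cloud",  # assumed meta_p service name
        database="meta_p",
        read_default_file=os.path.expanduser("~/replica.my.cnf"),
    )
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT slice FROM wiki WHERE dbname = %s", (dbname,))
            row = cur.fetchone()
    finally:
        conn.close()
    if row is None:
        # Exactly the btmwiki situation above: without a meta_p.wiki row,
        # a tool cannot tell which shard to send the UNION query to.
        raise LookupError(f"{dbname} has no meta_p.wiki entry")
    return row[0]


if __name__ == "__main__":
    print(shard_for("btmwiki"))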
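
And on T368538: one possible shape of the fix is to cross-check each section dblist against private.dblist before creating wikireplica DNS records. This is not the actual wmcs-wikireplica-dns code; the private.dblist URL is an assumption modelled on the s5.dblist URL linked above, and the set-difference approach is just one way to skip private wikis.

import requests

DBLISTS = "https://noc.wikimedia.org/conf/dblists"


def dblist(name: str) -> set[str]:
    """Fetch a dblist and return the set of dbnames it contains."""
    resp = requests.get(f"{DBLISTS}/{name}.dblist", timeout=10)
    resp.raise_for_status()
    return {
        line.strip()
        for line in resp.text.splitlines()
        if line.strip() and not line.startswith("#")
    }


def public_wikis(section: str) -> set[str]:
    """Wikis in `section` that should get wikireplica DNS records."""
    return dblist(section) - dblist("private")


if __name__ == "__main__":
    # btmwiki lives on s5 per the discussion above.
    print(sorted(public_wikis("s5")))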