[08:31:04] Amir1: any chance i can get some multi-instance reboots in today?
[08:31:14] (i'm here for the morning)
[08:31:33] https://phabricator.wikimedia.org/T303174#7871371
[08:31:38] kormat: how can I help with that?
[08:32:01] Amir1: the dbmaint map says you're currently running schema changes against s2/s3/s6
[08:32:15] these block all remaining multi-instance core reboots
[08:32:19] s3 should be done now
[08:32:31] (or soon)
[08:32:40] repool is almost done
[08:32:45] the rest are still ongoing
[08:32:47] ok. that'll allow one.
[08:36:44] Amir1: can i get you to not start any schema changes tomorrow morning, so i can get the rest done?
[08:37:12] sure, the user timestamp fix ones are almost done anyway
[08:37:28] the img ones will finish by end of the day and only s1 would be left
[08:37:47] probably already known, but there was a production issue today with db1144:3314
[08:38:14] jynus: define 'production issue'? i depooled it, rebooted it, and it's currently being repooled
[08:38:48] mmmm, wikiadmin complained about it, so maybe a script kept trying to connect to it
[08:39:08] https://logstash.wikimedia.org/goto/48097cc0c6f3af90f3ee9de8c87977d4
[08:40:28] yep, looks like exactly that: maintenance/migrateLinksTable.php
[08:40:44] the other thing is clouddb2001-dev maybe still having an old password?
[08:41:48] do we even run that?
[08:42:21] no idea, it is reported as s11
[08:43:28] jynus: no, it was reported at https://phabricator.wikimedia.org/T303800 so no idea
[08:43:32] `role(wmcs::openstack::codfw1dev::db)`
[08:44:09] we do not own that host, no
[08:44:43] jynus: I fixed that for wikiuser
[08:45:01] ah, then I can maybe create a task for wikiadmin?
[08:45:08] for wmcs
[08:45:55] it also needs to upgrade away from stretch
[08:46:32] kormat: s3 should be done now
[08:47:47] Amir1: should I create a task with both things for cloud, or just leave it be?
[08:48:27] I think that upgrading the clouddb hosts is on their radar, not sure about the -dev though
[08:48:45] Sorry, I thought about something completely different
[08:48:45] T301719 already exists
[08:48:45] T301719: Upgrade clouddb2001-dev to debian Bullseye - https://phabricator.wikimedia.org/T301719
[08:48:55] Thanks taavi
[08:49:02] sobanski: mostly concerned about the noise on db logs
[08:49:12] Amir1: thanks. proceeding with db1154.
[08:49:33] I can comment there about the password or grant issue
[08:50:14] sobanski: eqiad clouddb1xxx hosts and clouddb2001-dev are used for completely different purposes (not a great naming scheme, I know), so probably not included in the wiki replicas upgrade plans
[08:50:27] we haven't changed the wikiadmin password for years now, so if it's not fixed there it means it hasn't worked for years
[08:50:38] I see
[08:51:49] do we really need the labstestwiki?
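[editor's note on the depool/reboot/repool cycle kormat describes at 08:38: a minimal sketch, assuming the standard `dbctl` CLI is on PATH. The instance name, commit messages, and `reboot_host()` stub are illustrative; real reboots and gradual repooling go through dedicated automation, not a direct call like this.]

```python
"""Minimal sketch of a depool -> reboot -> repool cycle via dbctl.

Assumptions: the `dbctl` CLI is available, and `reboot_host()` stands in
for whatever reboot automation is actually used (it is a placeholder).
"""
import subprocess


def dbctl(*args: str) -> None:
    # dbctl edits the desired pooling state; `config commit` publishes it.
    subprocess.run(["dbctl", *args], check=True)


def reboot_host(host: str) -> None:
    # Placeholder: stands in for the real reboot tooling.
    raise NotImplementedError(host)


def reboot_instance(instance: str) -> None:
    dbctl("instance", instance, "depool")
    dbctl("config", "commit", "-m", f"Depool {instance} for reboot")
    reboot_host(instance.split(":")[0])  # "db1144:3314" -> "db1144"
    # Repooling is typically done gradually rather than in one step.
    dbctl("instance", instance, "pool")
    dbctl("config", "commit", "-m", f"Repool {instance} after reboot")
```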
[08:52:02] it's been quite a hassle to maintain
[08:52:12] another thing I can do, alternatively, is skip cloudweb2001-dev from the default logs view
[08:52:30] so we can safely ignore it from the DBError dashboard
[08:52:40] +1 from my side
[08:53:05] I was mostly trying to avoid non-useful noise from the error logs
[08:53:16] Amir1: good question, at least it's the only way for us to currently manage codfw1dev ldap accounts
[08:53:24] then doing that
[08:54:43] taavi: didn't have an edit in the last thirty days (last time I checked)
[08:55:15] I know
[08:55:35] I think I did it: https://logstash.wikimedia.org/app/dashboards#/view/87348b60-90dd-11e8-8687-73968bebd217
[08:57:34] the other issue I reported already has a ticket filed and is being worked on by Amir with other mw hackers (connection handling by maintenance tasks)
[09:05:32] kormat: db1155 is sanitarium master, I think you can just do it, it's not pooled
[09:05:52] doesn't interfere with the schema changes
[09:06:11] alrighty
[09:06:34] s/sanitarium master/sanitarium/
[13:06:55] we have a few key people out today, so the onfire meeting has been postponed
[13:07:46] jynus: YM the 16:00-17:00 UTC meeting today?
[13:16:55] yep, though I think leo already moved it to tomorrow?
[13:17:12] ah, no sorry
[13:17:16] that's another meeting
[13:17:26] it has been cancelled and will probably happen in 2 weeks' time
[13:22:09] this is what I mentioned about stable UIDs that I learned about last week, and that could (don't know) be interesting for DBAs, although probably not as critical as for swift/backups: https://wikitech.wikimedia.org/wiki/UID#reserved_UIDs_%26_GIDs
[13:30:39] yeah, swift is moving to 902, was previously 130, and before that not static
[13:38:20] (I've updated that doc to note the old uid/gid)
[14:05:17] https://ceph.io/en/news/blog/2022/v17-2-0-quincy-released/ <-- Ceph 17 includes per-user and per-bucket rate limits for RGW (S3/Swift layer)
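[editor's note on skipping cloudweb2001-dev from the default logs view (08:52): one way to express that exclusion is a negative filter on the DBError dashboard. A hedged sketch of the corresponding Elasticsearch/OpenSearch bool query follows, written as a Python dict; the `host` field name is an assumption and should be checked against the real index mapping.]

```python
# Hypothetical exclusion filter for the DBError dashboard, expressed as an
# Elasticsearch/OpenSearch bool query. The `host` field name is assumed;
# verify it against the index the dashboard actually reads from.
exclude_cloudweb = {
    "bool": {
        "must_not": [
            {"match_phrase": {"host": "cloudweb2001-dev"}},
        ]
    }
}
```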
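[editor's note on the stable-UID thread at 13:22-13:38: the log says swift is moving to the reserved UID 902 (previously 130). A small self-contained check like the one below could verify whether a host already uses the reserved value; the account name and expected UID come from the conversation, the helper itself is illustrative.]

```python
import pwd

RESERVED_SWIFT_UID = 902  # reserved per the wikitech UID page; was 130 before


def has_reserved_uid(user: str, expected_uid: int) -> bool:
    """Return True if the local account exists and already uses the UID."""
    try:
        return pwd.getpwnam(user).pw_uid == expected_uid
    except KeyError:  # account not present on this host
        return False


if __name__ == "__main__":
    print(has_reserved_uid("swift", RESERVED_SWIFT_UID))
```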
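[editor's note on the Ceph 17 (Quincy) link at 14:05: the release adds per-user and per-bucket rate limits for RGW, managed with `radosgw-admin ratelimit` per the Quincy documentation. The sketch below wraps that CLI; the uid and limit values are invented for illustration, and the flags should be checked against the installed version.]

```python
import subprocess


def radosgw_admin(*args: str) -> None:
    subprocess.run(["radosgw-admin", *args], check=True)


# Cap a single S3/Swift user at 1000 read ops and ~100 MiB written per
# minute window, then enable enforcement. The uid and numbers are made up;
# the `ratelimit` subcommand and flags follow the Ceph Quincy docs.
radosgw_admin("ratelimit", "set", "--ratelimit-scope=user", "--uid=testuser",
              "--max-read-ops=1000", "--max-write-bytes=104857600")
radosgw_admin("ratelimit", "enable", "--ratelimit-scope=user", "--uid=testuser")
```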