[05:16:58] why do we have two s5 hosts depooled?
[05:17:05] db1096:3315 and db1161
[08:26:30] db1161 was depooled this morning: https://phabricator.wikimedia.org/P25876
[08:27:18] db1096:3315 was depooled yesterday: https://phabricator.wikimedia.org/P25582
[08:27:38] marostegui: best guess is that the schema change was running against db1096 when i killed it yesterday (during dbctl issues)
[08:37:40] good news: most of the db reboots are done. bad news: all the ones left are the painful ones.
[08:43:00] i'd love to have a way to pause a schema change to do other maintenance. as it is, an entire section is blocked for a day, and i end up twiddling my thumbs.
[09:17:39] let's make sure db1096 is repooled during the day then
[09:17:48] so we have all hosts ready for the long weekend
[10:11:38] Amir1: for when you wake up, I am repooling db1109
[10:57:26] Hm, our swift container listings don't contain Etag, and rclone ignores 'hash'
[10:58:16] ok, I guess I have to repool db1096:3315 then
[10:58:39] I've punted https://forum.rclone.org/t/swift-sync-checksum-calls-head-on-every-object-so-is-very-slow/30322 at upstream to see what they think (since they seem quite responsive)
[12:28:04] godog: do you know if we have any chunked objects in ms? cf the upstream question at https://forum.rclone.org/t/swift-sync-checksum-calls-head-on-every-object-so-is-very-slow/30322/2
[12:38:06] (I think from grobbling around inside the swiftrepl code we currently (assume we) don't, BICBW)
[13:34:53] marostegui: hey, sorry, i was afk for a lot longer than expected
[13:36:56] marostegui: right now the full list of depooled hosts is: db1132 (which you're working on), and db1179 (s3)
[13:37:35] yep, db1132 can be ignored
[13:37:41] db1179 I think is coming from Amir1's schema change
[13:38:49] confirmed, yeah re: db1179
[13:41:23] Good morning. Everything is marostegui's fault
[13:43:52] I actually checked all terminated schema changes yesterday and repooled the ones that were not repooled. I must have missed that one
[13:43:56] Thanks
[13:44:44] i have a tiny shell script on cumin1001 to show depooled hosts: `~kormat/bin/list-depooled all`
[13:45:19] kormat: oh nice, can you push it somewhere 🥺
[13:45:29] i did. to cumin1001. :P
[13:45:52] * Amir1 trouts kormat
[13:46:05] btw, which sections do you need it to be stopped in?
[13:47:23] Amir1: for example, i'd like to reboot db1154. it's in: s1/s3/s5/s8.
[13:47:56] hmm, I see
[13:48:01] which is basically impossible to do without stopping schema changes
[13:52:53] okay, I won't do any on Sunday/Monday, it should be easy for you to pick them up
[13:53:06] it's hard to stop a running one
[14:01:19] Amir1: monday is a holiday here
[14:47:31] * Emperor shaves yaks
[15:09:36] Amir1: refreshLinkRecommendations.php has been running against db1120 for >1h since it was depooled
[15:27:15] kormat: I think I created a ticket for that a long time ago
[15:27:18] let me double check
[15:27:28] https://phabricator.wikimedia.org/T299021
[15:27:47] A comment there would be amazing :P
[15:29:02] on it
[15:29:32] Amir1: also, i think we have to establish some guidelines for maintenance scripts. "Either you check every 30mins for a depool, or you don't mind if we kill the connection."
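As a rough sketch of kormat's proposed guideline ("check every 30mins for a depool"), here is hypothetical Python pseudocode for the pattern, not MediaWiki's actual maintenance framework; `is_pooled`, `connect`, `pick_pooled_replica` and `process_batch` are invented names standing in for whatever the real script uses:

```python
import time

RECHECK_SECONDS = 30 * 60  # the proposed guideline: re-check pool state every 30 minutes


def is_pooled(host: str) -> bool:
    # Hypothetical helper: ask the config store (e.g. dbctl/etcd) whether
    # `host` is still pooled. The lookup mechanism is an assumption here.
    return True


def run_long_maintenance(connect, pick_pooled_replica, process_batch, batches):
    # `connect`, `pick_pooled_replica` and `process_batch` are injected
    # callables (all hypothetical) so the sketch stays self-contained.
    host = pick_pooled_replica()
    conn = connect(host)
    last_check = time.monotonic()
    for batch in batches:
        if time.monotonic() - last_check >= RECHECK_SECONDS:
            last_check = time.monotonic()
            if not is_pooled(host):
                # The host was depooled for maintenance: move off it promptly
                # instead of forcing the DBAs to kill the connection.
                conn.close()
                host = pick_pooled_replica()
                conn = connect(host)
        process_batch(conn, batch)
    conn.close()
```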
[15:31:33] the underlying problem is that mw's connection manager is trying to handle both "one-minute-tops webrequest" queries and "let's rewrite half of our database in one run" queries
[15:31:45] and does a terrible job at both
[15:32:00] but since it prioritizes the former, the latter suffers
[15:32:42] I think I can make it work by splitting this and making the script use a different connection manager, but that will take time
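A rough illustration of the split Amir1 describes, as a hedged Python sketch (the function names and shapes are invented for illustration; MediaWiki's actual load-balancer code is PHP and structured differently): the web path resolves a replica once and holds the connection only for one short query, while the maintenance path re-resolves and reconnects between batches so a depool or killed connection costs at most one batch.

```python
from typing import Callable, Iterable


def run_web_query(resolve_replica: Callable[[], str],
                  connect: Callable[[str], object],
                  query: str):
    # Web-request path: pick a replica once, run one short query, disconnect.
    # Topology changes during the request are not a concern at this timescale.
    conn = connect(resolve_replica())
    try:
        return conn.execute(query)
    finally:
        conn.close()


def run_maintenance_batches(resolve_replica: Callable[[], str],
                            connect: Callable[[str], object],
                            batches: Iterable[str]):
    # Maintenance path: re-resolve the replica and reconnect between batches,
    # so a depool only interrupts one batch instead of a multi-hour run.
    for batch in batches:
        conn = connect(resolve_replica())
        try:
            conn.execute(batch)
        finally:
            conn.close()
```

Reconnecting per batch would also play nicely with the 30-minute depool check sketched earlier, since each batch starts from the current pool state.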