[09:50:22] https://phabricator.wikimedia.org/T357100 is a duplicate (edited the description)
[09:56:26] yeah, usually happens after reimages
[10:17:50] (PuppetDisabled) firing: Puppet disabled on ms-backup2001:9100 - https://wikitech.wikimedia.org/wiki/Puppet/Runbooks#Puppet_Disabled - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?var-cluster=backup&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DPuppetDisabled
[10:18:27] jynus: is that expected / does downtime need extending?
[10:18:51] oh, I must have forgotten to re-enable it after maintenance, fixing
[10:19:20] actually, wait, that shouldn't be disabled
[10:19:51] yeah, the mistake was disabling it in the first place
[10:20:04] I must have confused it with a backup host
[10:25:38] no, I know what it was, I was about to deploy https://gerrit.wikimedia.org/r/c/operations/puppet/+/995188 but then the network maintenance blocked me
[10:26:52] should be fixed now
[10:28:09] ta :)
[10:32:51] (PuppetDisabled) firing: (2) Puppet disabled on ms-backup1001:9100 - https://wikitech.wikimedia.org/wiki/Puppet/Runbooks#Puppet_Disabled - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?var-cluster=backup&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DPuppetDisabled
[10:33:07] ^that's outdated
[10:33:20] as I just ran it
[10:52:50] (PuppetDisabled) resolved: Puppet disabled on ms-backup2001:9100 - https://wikitech.wikimedia.org/wiki/Puppet/Runbooks#Puppet_Disabled - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?var-cluster=backup&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DPuppetDisabled
[11:48:01] Noting that I'm running a schema change on s5 and s3
[11:48:18] (https://wikitech.wikimedia.org/wiki/Map_of_database_maintenance)
[11:48:23] Changing the PK only
[11:53:02] "only"
[11:58:52] and "only" pagelinks, one of the biggest tables in every wiki :P
[13:21:58] marostegui: maybe I'm miscalculating things, but for s3, just changing the PK is dropping 150 to 160GB. https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=db1150&var-datasource=thanos&var-cluster=mysql&viewPanel=28&from=1707439729207&to=1707484861318
[13:22:10] The same is showing up for db1140 as well
[13:24:43] if it is a smaller PK and the table has a significant number of secondary indexes, total index size can be reduced a lot
[13:25:16] although one should be careful, alters technically do the equivalent of an optimize, so it should be checked in the long term, with more fragmentation
[13:25:32] Yeah, I was going to say just that
[13:25:37] That it might be "temporary"
[13:28:25] I think Amir1 will already know this, but sharing it here so everybody can learn: https://www.slideshare.net/jynus/query-optimization-with-mysql-80-and-mariadb-103-the-basics#129
[13:29:07] yeah, optimize has some impact too
[13:29:29] Dropping the old columns will be a large reduction as well
[13:31:10] The other thing I have in mind is that s3 has a couple of really large botpedias with a pretty large pagelinks table: arzwiki, warwiki, etc.
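A minimal sketch of the kind of check discussed above, assuming a MariaDB replica and an illustrative database name ('examplewiki'); these are not the exact queries that were run. information_schema gives the table's overall data/index footprint plus free space, and mysql.innodb_index_stats breaks that footprint down per index, which is where a narrower primary key pays off, since every InnoDB secondary index stores a copy of the PK columns. Re-running the same queries weeks later shows whether the post-alter saving persists once fragmentation builds up again.

    -- Overall data vs. index footprint and free space for pagelinks
    -- ('examplewiki' is a placeholder database name)
    SELECT table_schema,
           table_name,
           ROUND(data_length  / 1024 / 1024 / 1024, 1) AS data_gb,
           ROUND(index_length / 1024 / 1024 / 1024, 1) AS index_gb,
           ROUND(data_free    / 1024 / 1024 / 1024, 1) AS free_gb
      FROM information_schema.tables
     WHERE table_schema = 'examplewiki'
       AND table_name   = 'pagelinks';

    -- Per-index sizes (pages converted to GB); secondary indexes shrink
    -- together with the PK because each one embeds the PK columns
    SELECT index_name,
           ROUND(stat_value * @@innodb_page_size / 1024 / 1024 / 1024, 1) AS size_gb
      FROM mysql.innodb_index_stats
     WHERE database_name = 'examplewiki'
       AND table_name    = 'pagelinks'
       AND stat_name     = 'size';

The Grafana panel linked above only shows host-level disk usage, so a per-table breakdown like this helps attribute how much of the 150 to 160GB came from the smaller PK versus the implicit rebuild.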
[13:31:47] let me share a random thought too, if that helps (feel free to ignore)
[13:33:21] s3 (or whatever is the default db) used to have lots of small objects, but they were rarely accessed (new wikis, or event-focused wikis). At some point years ago I thought of creating an s0 section with very very low bandwidth dbs, on a VM even
[13:33:43] and when they get enough activity, move them to real hardware
[13:33:56] that way object overhead would be minimized
[13:34:11] but of course it is not a big win, and wasn't a high priority
[13:34:28] but wanted to throw it to you as an old idea
[13:35:18] let me think about it
[13:35:27] s3 itself doesn't have that many replicas
[13:35:39] well, yeah, the size is not that big
[13:35:57] but there is some overhead when backing them up and recovering them
[13:36:10] because of the thousands of small objects
[13:36:33] again, this is not even a suggestion
[13:36:44] just something I thought about, but never got to :-D
[13:37:26] As far as I know s5 is still the least loaded db at the moment
[13:38:15] yep, s5 is half the size of s3
[13:38:25] s5 has the smallest size in terms of total storage, like 400-500GB
[13:38:42] but s3 probably has the lowest number of replicas, let me double check
[13:38:59] so in a way it was done, but differently, by splitting it into s5
[15:33:48] (PuppetFailure) firing: Puppet has failed on restbase1034:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[15:38:48] (PuppetFailure) firing: (3) Puppet has failed on restbase1034:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[15:39:30] I'm not sure we need an email and an IRC message for every puppet failure?
[15:41:13] ugh... is it going to continue repeating?
[15:41:35] * urandom investigates
[15:43:48] (PuppetFailure) firing: (5) Puppet has failed on restbase1034:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[15:48:48] (PuppetFailure) firing: (7) Puppet has failed on restbase1034:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[15:58:48] (PuppetFailure) firing: (7) Puppet has failed on restbase1034:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[16:01:50] It should be good now.
[16:03:32] or not...
[16:03:53] or not, because it's not just restbase1034...
[16:03:56] * urandom sighs
[16:18:53] ok, now it should be good.
[16:19:45] these were recently put up by dcops (server refreshes for restbase) and got caught in some puppet 5-7 limbo, I guess
[16:20:09] imaged as 7 and then given role insetup::data-persistence maybe?
[16:54:18] (PuppetFailure) resolved: (2) Puppet has failed on restbase1039:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure