[04:47:52] <marostegui>	 https://mariadb.org/documentation-as-pdf/
[07:06:55] <kormat>	 marostegui: have you seen https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/787528 ?
[07:35:16] <Emperor>	 OK, so DC team have kindly upgraded the firmware on ms-be2040, and it still won't PXE-boot. Still fails at exactly the same point (loads debian-installer/amd64/linux, loads debian-installer/amd64/initrd.gz, probes EDD OK).
[07:36:33] <Emperor>	 Any further suggestions? We have 8 of these Dell PowerEdge R730xd that are due to be upgraded, and if none of them will PXE-boot, I'm a bit screwed. Even assuming the slightly-less ancient kit (we have 5 different hardware specs on the to-be-upgraded pile) will work :-/ 
[07:42:11] <Emperor>	 (plus the codfw ms cluster is now down one host since ms-be2040 is a brick)
[07:42:29] <kormat>	 Emperor: how sure are you that they don't boot? e.g. can you ping them?
[07:43:32] <Emperor>	 kormat: I have the shiny HTML5 console to look at
[07:43:43] <marostegui>	 kormat: I didn't know. That's good, although I normally disable notifications when doing reimages, so the warn or even critical doesn't page
[07:43:55] <kormat>	 Emperor: sure. i'm just thinking that maybe the _console_ is broken, but maybe it does actually boot
[07:44:28] <Emperor>	 kormat: well, also the reimage cookbook is chuntering through Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' 92/120 and will shortly time out
[07:44:31] <kormat>	 marostegui: this + the downtime that the cookbook does should be sufficient
[07:44:38] <kormat>	 Emperor: ahh. ok.
[07:45:28] <kormat>	 Emperor: i'd blame debian, personally. ;)
[07:45:48] <Emperor>	 I mean I guess I can send it round again with the HTML5 console disconnected against the slim possibility that's confusing the installer, but...
[07:46:48] <kormat>	 i've never heard of anyone using the html5 console here, so that's unusual...
[07:47:08] <kormat>	 seems like a very long shot, but i guess it's worth trying?
[07:47:50] <Emperor>	 well, I'll wait the remaining few minutes for the reimage cookbook to time out then give that a go. I will be _very_ surprised if it helps, but you never know, and frankly I'm out of other ideas.
[07:48:25] <kormat>	 we're into the goat-sacrificing step of troubleshooting
[07:50:22] <kormat>	 Emperor: something else you could try is reimaging to buster
[07:50:32] <kormat>	 that might at least tell you if it's bullseye-specific
[07:51:44] <Emperor>	 Mmm; again, seems low-likelihood, given how early in the process it's failing. I'm just sending it round on bullseye without any console connected, which'll take a while to fail
[07:51:53] <Emperor>	 (timeout is 20 minutes)
[07:53:05] <Emperor>	 [do we have effective smart remote power? If so I could try turning it actually off for a few minutes]
[07:53:34] <kormat>	 Emperor: just what's built into the mgmt interface
[08:07:13] <Emperor>	 so if I want to try actually turning the whole thing off and on again, I have to ask DC folk nicely?
[08:07:43] <kormat>	 yes. though: you can turn off the main chassis, and you can reboot the mgmt interface. which isn't exactly the same thing, but it's pretty close.
[08:09:22] <kormat>	 Emperor: btw, i had an issue yesterday where a machine i was reimaging didn't output anything on console for a good 5+ mins after pxe started. j.bond speculated that the installer console settings weren't correct (presumably the initial kernel console params)
[08:09:39] <kormat>	 e.g. it could be that the kernel is booting, and then hanging, and because its console settings are wrong you aren't seeing anything
[08:10:05] <Emperor>	 Mmm
[08:11:44] <kormat>	 the existing console settings are probably specified in modules/install_server/files/tftpboot/bullseye-installer/pxelinux.cfg/ttyS1-115200 (or the ttyS0 version..)
[08:13:33] <Emperor>	 settings there look essentially the same as the stretch-installer (which presumably is how this system was installed as stretch).
[08:16:24] <Emperor>	 trying a cold reset of the BMC (and then power off)
[08:18:34] * kormat nods
[08:18:46] <kormat>	 (i am just spitballing here)
[08:21:05] <moritzm>	 another thing worth testing is to update the NIC firmware, https://phabricator.wikimedia.org/T286722 is for a different Broadcom NIC model, but it's also a 10G card
[08:21:41] <moritzm>	 and the symptoms are similar, the card worked fine with an older distro, but the combination of never-updated-NIC-firmware along with the 5.10 kernel failed
[08:21:58] <kormat>	 💡 interesting
[08:22:58] <moritzm>	 unfortunately I don't know where/if we have docs describing the NIC firmware update, but it's something that does not require DC ops, Jaime only did it yesterday for one of the backup servers
[08:24:09] <moritzm>	 https://wikitech.wikimedia.org/wiki/SRE/Dc-operations/Platform-specific_documentation/Dell_Documentation#NICs
[08:25:46] <Emperor>	 I'd sort-of expect the kernel to boot if that were the issue
[08:26:16] <kormat>	 such optimism
[08:27:22] <Emperor>	 trying again after resetting the BMC and leaving the chassis off for a while
[08:29:44] <nemo-yiannis>	 godog: Hey! Just checked today, tegola with the new container is pretty stable. I think we can stop copying files.
[08:30:37] <Emperor>	 moritzm: that wikitech page says "With the NIC model you can download the driver from the Netbox shortlink" I'm not sure what it means by that - our netbox system knows where to find NIC firmware updates?
[08:31:38] <kormat>	 Emperor: netbox has a link to the dell config page for the server
[08:31:55] <kormat>	 (which.. TIL)
[08:36:29] <kormat>	 so, at a guess, this is the latest firmware for the 10G nic: https://www.dell.com/support/home/en-uk/drivers/driversdetails?driverid=npnt5&oscode=naa&productcode=poweredge-r730xd
[08:40:45] <Emperor>	 console com2 has something on this time
[08:41:11] <Emperor>	 suggesting it has got into the installer, and then something went wrong
[08:41:31] <Emperor>	 ah
[08:41:49] <Emperor>	 complaining about non-free firmware files to operate the NIC /o\
[08:42:03] <Emperor>	 installer is sitting at the "Load missing firmware from removable media" prompt
[08:42:52] <Emperor>	 Do our installer images not have non-free firmware on?
[08:43:22] <Emperor>	 bnx2x/bnx2x-e2-7.13.21.0.fw 
[08:44:33] <moritzm>	 so we're using the default d-i images, but then we kludge the firmware tarball into it
[08:45:02] <moritzm>	 but actually, now that you mention bnx2x fw in specific
[08:46:25] <Emperor>	 I tried "<yes>" at the load non-free firmware prompt, no joy (it just gave me the same thing again), so I tried "<no>", and the installation is at least progressing...
[08:46:41] <moritzm>	 this might actually be caused by the fix for https://phabricator.wikimedia.org/T306148
[08:46:48] <moritzm>	 the background is:
[08:46:51] <kormat>	 science 🧪!
[08:47:13] <moritzm>	 these Broadcom cards have optional firmware for some features we don't use
[08:47:42] <moritzm>	 the base operation works just fine without it (and IIRC the modules are also not currently packaged in firmware-nonfree)
[08:48:04] <moritzm>	 so this triggered an interactive prompt in T306148
[08:48:05] <stashbot>	 T306148: clouddb1021 missing network firmware bnx2x/bnx2x-e2-7.13.21.0.fw in Debian 11 Bullseye - https://phabricator.wikimedia.org/T306148
[08:48:19] <moritzm>	 let me revert the patch and then re-attempt the installation of ms-be1040
[08:48:58] <moritzm>	 can't wait for this crap to finally be resolved with shipping the firmware in default install media
[08:49:04] <Emperor>	 moritzm: do you want me to do something to abort the current install?
[08:50:35] <Emperor>	 firmware> yeah, I broadly agree with Steve M's blogpost on the subject
[08:50:56] <moritzm>	 wrt "it just gave me the same thing again), so I tried "<no>", and the installation is at least progressing."
[08:51:20] <moritzm>	 -> that actually means that my patch at https://gerrit.wikimedia.org/r/c/operations/puppet/+/784259 didn't work as expected
[08:51:50] <Emperor>	 moritzm: at least you have some useful data from my pain :)
[08:51:57] <moritzm>	 but the upside is that manually connecting to the mgmt and choosing "No" is an adequate workaround to get this server installed
[08:52:11] <moritzm>	 the full story is:
[08:53:03] <moritzm>	 in the past before Bullseye missing firmware was simply silently failing to load in d-i
[08:53:49] <moritzm>	 but with various current GPUs not even being able to render a framebuffer for graphical d-i in the absence of AMD firmware
[08:53:56] <Emperor>	 workaround> I have 7 more of this class to reimage; but they all need doing one-at-a-time and then waiting for swift to sort itself out, so I can do all of them thus if necessary.
[08:54:08] <moritzm>	 hw-detect introduced this https://tracker.debian.org/news/1245038/accepted-hw-detect-1145-source-into-unstable/
[08:54:31] <moritzm>	 and this now detects that firmware is required and prompts for it
[08:54:40] <moritzm>	 but for this specific NIC model type
[08:54:47] <moritzm>	 that's only half of the story
[08:54:56] <Emperor>	 ah, yes
[08:55:07] <moritzm>	 since it _does_ work perfectly fine without firmware 
[08:55:44] <moritzm>	 but still the metadata in the kernel refers to the optional firmware and thus prompts the prompt we're seeing
[08:55:53] <Emperor>	 so we need some sort of "don't worry about these missing firmwares" knob to twiddle
[08:56:16] <moritzm>	 yes, that's what https://gerrit.wikimedia.org/r/c/operations/puppet/+/784259 was supposed to do
[08:56:31] <moritzm>	 but it seems I either made a mistake or something else is needed
[08:57:37] <Emperor>	 you're sure it's not a cached old cfg or somesuch?
[08:57:41] <moritzm>	 I'll poke at this later when I'm done with the Ganeti update, but in the interim let's simply select the "no" prompt as a workaround until a proper fix is found
[08:57:49] <Emperor>	 moritzm: +1
[08:58:21] <moritzm>	 the DHCP config gets written out by Puppet, it should be up-to-date
[08:58:34] <moritzm>	 maybe the syntax is different, I'll poke at hw-detect later
[08:58:53] <moritzm>	 or maybe it's simply broken in d-i and no one noticed :-)
[08:59:02] <Emperor>	 always a possibility :)
[08:59:26] <moritzm>	 on the bright side
[09:00:04] <Emperor>	 I'm now wondering if I missed this set of failures yesterday and the BIOS upgrade wasn't necessary to get it to this point.
[09:00:14] <Emperor>	 I guess I can try another host later
[09:00:31] <moritzm>	 I'm pretty sure we got misled
[09:00:31] <kormat>	 Emperor: it seems most likely to me that you've run into 2 separate issues
[09:00:48] <moritzm>	 so on the bright side we might not need firmware updates
[09:00:56] <Emperor>	 that would be good
[09:01:03] <moritzm>	 soo many colourful hardware errors to run into :-)
[09:01:10] <moritzm>	 never gets boring
[09:01:18] <Emperor>	 The Sanger's kit was all "the text console is hopeless, always try HTML5"
[09:01:45] <Emperor>	 also, you have to think to try ^L at the text console before you get the error message
[09:02:29] <Emperor>	 If this host finishes installing OK and swift looks alright, I can try an eqiad host to see if it'll reimage without the f/w upgrade
[09:02:37] <moritzm>	 John is currently working on automating firmware updates via Spicerack, so hopefully we can at some point simply run these via a cookbook (or even fold them as a regular step into the reimage cookbooks)
[09:02:44] <moritzm>	 ack, sounds good
[09:03:09] <Emperor>	 if not, Willy is expecting bad news from me...
[09:06:16] <Emperor>	 Huh, happy 12-year anniversary of my Erdős number
[09:08:31] <volans|off>	 moritzm: could it be tab vs space in the 'boolean false' part?
[09:10:22] <godog>	 nemo-yiannis: ack! {{done}} and created T307184 for followups
[09:10:23] <stashbot>	 T307184: Followups for Tegola and Swift interactions  - https://phabricator.wikimedia.org/T307184
[09:11:39] <nemo-yiannis>	 thanks godog 
[09:15:53] <godog>	 sure np
[09:18:37] <moritzm>	 volans|off: maybe, I'll have a closer look in a bit
[09:20:44] <Emperor>	 godog: puppet is failing on ms-be2040 post-reimage something about xfs labelling not working...
[09:21:38] <Emperor>	 /dev/sda4 on /srv/swift-storage/sdb4 type xfs (rw,noatime,nodiratime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
[09:22:07] <Emperor>	 is probably not quite right (likewise /dev/sdb4 is mounted at /srv/swift-storage/sda4) - worth a reboot to see if they come back the right way round, or has something gone badly wrong here?
[09:23:26] * Emperor tries a reboot
[09:24:38] <godog>	 Emperor: yeah worth a reboot, I'm guessing the first puppet run labelled them one way and then post-reboot they came back swapped
[09:27:01] <Emperor>	 looking better post-reboot let's see if puppet completes now
[09:28:18] <Emperor>	 yes.
[09:28:40] <Emperor>	 it'll be interesting to see how long it takes to re-populate the swift partitions on the SSDs
[09:30:03] <godog>	 reasonably fast IME, in the order of a couple of hours IIRC
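
Editorial aside: the swapped mounts Emperor pasted above (/dev/sda4 mounted at /srv/swift-storage/sdb4 and vice versa) are the kind of thing that can be checked mechanically. What follows is only a rough sketch, not anything from the Puppet or cookbook code discussed in the channel. It assumes that each mountpoint under /srv/swift-storage/ is named after the device it is meant to hold, which is inferred from the pasted lines. The helper name find_swapped_mounts and the sdc1 sample line are made up for illustration.

```python
import re

# Matches lines in the format printed by `mount`, e.g.
# "/dev/sda4 on /srv/swift-storage/sdb4 type xfs (rw,noatime)"
MOUNT_LINE = re.compile(
    r"^/dev/(?P<dev>\S+) on (?P<mountpoint>\S+) type (?P<fstype>\S+)"
)

# Assumption: swift mountpoints are named after the device they should hold.
SWIFT_PREFIX = "/srv/swift-storage/"


def find_swapped_mounts(mount_output):
    """Return (device, expected_device) pairs where the two names disagree."""
    swapped = []
    for line in mount_output.splitlines():
        match = MOUNT_LINE.match(line.strip())
        if match is None:
            continue  # not a /dev/... mount line
        mountpoint = match.group("mountpoint")
        if not mountpoint.startswith(SWIFT_PREFIX):
            continue  # not a swift storage mount
        expected = mountpoint[len(SWIFT_PREFIX):]
        if match.group("dev") != expected:
            swapped.append((match.group("dev"), expected))
    return swapped


# First two lines mirror the log (mount options shortened);
# the sdc1 line is hypothetical.
sample = """\
/dev/sda4 on /srv/swift-storage/sdb4 type xfs (rw,noatime,nodiratime)
/dev/sdb4 on /srv/swift-storage/sda4 type xfs (rw,noatime,nodiratime)
/dev/sdc1 on /srv/swift-storage/sdc1 type xfs (rw,noatime,nodiratime)
"""

for device, expected in find_swapped_mounts(sample):
    print(f"/dev/{device} is mounted where {expected} was expected")
```

On a real host the same function could be fed the text output of `mount`. It reports only name mismatches and says nothing about why the kernel enumerated the disks in a different order.
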
[09:42:17] <kormat>	 marostegui: poke re: https://gerrit.wikimedia.org/r/c/operations/puppet/+/775330
[10:01:45] <Emperor>	 ms-be1040 (eqiad host, same vintage) gets to the same firmware-needed prompt; let's see if the install works
[10:03:29] <kormat>	 well that's concerning: https://phabricator.wikimedia.org/P27009
[10:05:49] <Emperor>	 server on strike
[10:06:27] <kormat>	 ok, it claims it's on now, but there's zero output on the console
[10:07:54] <marostegui>	 kormat: maybe try a hard reset
[10:08:13] <Emperor>	  /topic all hardware is terrible
[10:08:35] <kormat>	 marostegui: oh, good idea, trying.
[10:10:54] <kormat>	 Emperor: 💯
[10:11:30] <kormat>	 marostegui: any idea how long until a hardreset takes effect?
[10:11:36] <kormat>	 coz i'm still staring into the void
[10:12:56] <moritzm>	 should usually take at most a minute
[10:13:19] <kormat>	 ok. it's been ~5. maybe a `racadm racreset` for good measure?
[10:14:39] <kormat>	 trying it
[10:14:45] <moritzm>	 yeah, if that fails that needs a dc ops ticket
[10:22:42] <kormat>	 feh, nothing. dc ops it is.
[10:23:07] <Emperor>	 worth trying a poke at the web-IPMI? It has more buttons...
[10:23:26] <kormat>	 Emperor: what kind of buttons?
[10:23:57] <Emperor>	 Depends a bit, but often resetting bits of the IPMI system
[10:25:19] <Emperor>	 ugh, ms-be1040 came up with drives mounted in the wrong place, let's see if a reboot helps
[10:27:45] <moritzm>	 Emperor: the d-i setting itself seems just fine to me, but we could try https://gerrit.wikimedia.org/r/c/operations/puppet/+/787704/ on the next swift with such a Broadcom NIC?
[10:29:17] <Emperor>	 moritzm: certainly.
[10:32:21] <kormat>	 Emperor: oh! https://phabricator.wikimedia.org/T307198#7890645
[10:33:12] <Emperor>	 fsck it, the reimage didn't get the ownership right
[10:54:05] <Emperor>	 where do reimage logs end up? The changes I made in https://gerrit.wikimedia.org/r/c/757025 obviously aren't working, but it's difficult to know why...
[11:02:04] <volans|off>	 Emperor: for cookbooks logs see https://wikitech.wikimedia.org/wiki/Spicerack/Cookbooks#Logs, but late_command is executed by d-i, so will not be there
[11:02:32] <Emperor>	 volans|off: Mmm, it's the late_command output (if any) I'd like to see - it _ought_ to be working, but clearly isn't
[11:02:35] <volans|off>	 you can check the logs in the d-i env if they have something, before d-i completes
[11:03:05] <volans|off>	 late command is run via d-i preseed/late_command
[11:03:28] <Emperor>	 ah, it's in /var/log/installer
[11:03:36] <Emperor>	 /tmp/late_command: line 57: stat: not found
[11:03:37] <Emperor>	 FFS
[11:04:07] <Emperor>	 busybox has stat, though.
[11:04:26] <Emperor>	 does d-i have a non-standard busybox or something?
[11:05:05] <volans|off>	 at which stage is d-i? are you in the installer environment?
[11:05:11] <volans|off>	 the new OS is in /target
[11:05:18] <volans|off>	 the chroot
[11:06:31] <Emperor>	 volans|off: late_command
[11:06:34] <volans|off>	 see https://www.debian.org/releases/stable/amd64/apbs05.en.html
[11:07:03] <volans|off>	 B.5.1.
[11:08:28] <Emperor>	 volans|off: late_command has /target available to it, but I thought was running busybox sh, so should have busybox utils available to it?
[11:09:12] <volans|off>	 it's run inside the chroot of the new OS AFAIK, not busybox
[11:09:42] <volans|off>	 all the commands that have in-target 
[11:09:54] <volans|off>	 should be run inside the chroot, but I'm no d-i expert, sorry
[11:10:11] <volans|off>	 your patch doesn't have in-target AFAICT
[11:10:30] <Emperor>	 volans|off: I don't want it running in the target, I want the installer to mount a filesystem and call stat on the contents
[11:11:18] <Emperor>	 and then I call in-target groupadd/useradd based on the outcome
[11:12:52] <Emperor>	 it's successfully mounting the FS OK, but somehow isn't finding a "stat" to call, which I don't understand because isn't the installer shell busybox which has a "stat" builtin?
[11:13:32] <volans|off>	 maybe PATH is not set there and you need the full path?
[11:13:48] <volans|off>	  /usr/bin/stat I guess
[11:14:02] <volans|off>	 but I see other commands working fine
[11:14:04] <volans|off>	 like ip
[11:14:10] <Emperor>	 log excerpts at https://phabricator.wikimedia.org/T300057#7890740
[11:14:31] <Emperor>	 Isn't the point of busybox that they're all builtins?
[11:24:19] <moritzm>	 the commands offered by busybox are all controlled by build flags, it's probably the udeb missing stat?
[11:25:01] <Emperor>	 I wonder if there's any other way of solving this problem then :-/
[11:25:08] <moritzm>	 the whole concept of udebs is really moot at this point, simply using the default debs would reduce so much complexity
[11:25:37] <moritzm>	 and the days of installer hardware needing to squeeze out a few kilobytes are also over for a long time...
[11:26:39] <moritzm>	 one workaround would be to install coreutils in the late install script and then use stat from there?
[11:27:37] <Emperor>	 https://salsa.debian.org/installer-team/busybox/-/blob/master/debian/config/pkg/udeb#L315 <-- confirms that the busybox udeb doesn't build stat :(
[11:29:04] <moritzm>	 meh 
[11:29:08] <Emperor>	 moritzm: and then 'swiftuid=$(/target/usr/bin/stat -c '... ?
[11:29:20] <Emperor>	 worth a try, I guess
[11:29:52] <moritzm>	 yeah, it's not elegant, but then none of the late install script handling swift UIDs is pretty to begin with :-)
[11:33:43] <Emperor>	 I'll put a patch together after lunch, then.
[11:53:44] <moritzm>	 I merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/787704, let me know if you still see the prompt for the next reimage
[12:04:49] <Emperor>	 https://gerrit.wikimedia.org/r/c/operations/puppet/+/787717 +1 perhaps?
[12:13:48] <Emperor>	 godog: ms-be1040 has a bunch of very unhappy xfs partitions :( e.g. sdj1 won't mount, and xfs_repair says to try and mount it before re-attempting xfs_repair
[12:14:00] <Emperor>	 (and offers -L to discard the log with scary warnings about data loss)
[12:14:13] <Emperor>	 at least 4 partitions in this state
[12:18:12] <Emperor>	 dunno if this is latent corruption from stretch we just missed, damage from upgrade or what
[12:22:49] <Emperor>	 godog: do you have a feel if it's worth trying xfs_repair -L ? No media issues reported in kernel.log
[12:26:08] <godog>	 Emperor: no idea tbh but if the partition isn't mountable anyways then might as well try
[12:34:46] <Emperor>	 OK, will give it a go.
[12:37:00] <godog>	 Emperor: are the disks in the expected order though? not related to the corruption but relevant if we're re-formatting of course
[12:49:09] <Emperor>	 godog: oh, no, they were on one reboot, but seem out again. Argh, this is getting very tedious
[12:49:54] * Emperor much prefers UUID-based mounting
[12:51:27] <Emperor>	 this is all a mess :(
[12:51:49] <Emperor>	 once xfs_repair has done its thing I'll reboot again and see if the disks come back more sensibly.
[12:52:42] <Emperor>	 godog: I notice ms-be2040's disks are still mixed up too, and that's after I rebooted it once to get sd{a,b} right again :(
[12:56:33] <godog>	 Emperor: yeah I've seen that happen too, it is unfortunate alright
[13:27:08] <kormat>	 marostegui: sounds like db1164 is going to be out of service for a while, do we need to do any rebalancing in the meantime? T307198
[13:27:08] <stashbot>	 T307198: db1164 fails to POST/boot/etc - https://phabricator.wikimedia.org/T307198
[13:27:13] <Emperor>	 reboot hasn't fixed it, different set of drives permuted
[13:28:26] <marostegui>	 kormat: no, it should be fine
[13:28:35] <kormat>	 marostegui: ok cool, thanks
[13:33:27] <Emperor>	 godog: I've rebooted ms-be2040 4 times now, and each time the drives don't come up in the correct order, and it's a different permutation each time.
[13:36:50] <godog>	 Emperor: mmhh which drives permute ? the ssd or hdd or a mix ?
[13:37:00] <Emperor>	 hdds
[13:37:14] <Emperor>	 other than the initial post-install reboot when the SSDs were wrong, they have remained correct
[13:38:32] <Emperor>	 we've had m->j, j->i, i->k, k->m ; f->g, g->h, h->f, l->m, m->l ; and d->e, e->c, c->d, n->m, m->n, l->k, k->l on the last few reboots
[13:41:59] <godog>	 Emperor: ack, IME one/two reboots are sufficient to fix the order, though nothing is immediately wrong due to labels, can wait next week I think
[13:42:19] <Emperor>	 godog: it may be that with current kernels the order is never going to be stable
[13:44:07] <godog>	 that's certainly possible too
[13:50:03] <Emperor>	 (in the mean time, going to try another codfw backend upgrade, since that cluster is happy)
[14:07:49] <Emperor>	 moritzm: I'm afraid ms-be2041's installer has still got to the non-free firmware prompt under Detect network hardware
[14:08:36] <moritzm>	 meh
[14:08:47] <moritzm>	 then the setting itself is probably broken
[14:10:01] <Emperor>	 once the install's finished, you're welcome to the logs from it :)
[14:16:21] <moritzm>	 I think I'll just axe the setting, it only affects a handful of hosts, so we can treat it as a known bug and with bookworm the whole firmware mess will have vanished
[14:17:02] <moritzm>	 even if we track it down and land a fix, it would need a backport to bullseye's d-i accepted and would only be available in the subsequent bullseye point release
[14:17:18] <moritzm>	 doesn't really seem worth it
[14:17:20] <Emperor>	 Mmm, it's not the most annoying thing about reimaging these systems :-/
[14:30:58] <Emperor>	 swift uid/gid are set right, though.
[14:33:15] <Emperor>	 godog: ms-be2041 reimaged OK, but its hdds are in a jumbled order again
[14:33:44] <godog>	 siiiigh
[15:32:34] <Emperor>	 (in other news, we have about 2 billion objects in the ms- cluster)
[15:56:03] <Emperor>	 godog: ms-be2042 is repeatedly putting its SSDs in the "wrong" place, which is making puppet fail, which is stopping the reimage from completing.
[15:56:14] <Emperor>	 On reboot #3 to try to fix this :-/
[15:59:58] <Emperor>	 #4
[16:00:41] <Emperor>	 it's starting to look like this system is going to consistently put the SSDs the other way round from the installer
[16:04:28] <Emperor>	 which means puppet will never work because it wants to label the partitions but they're not where it expects them to be
[16:10:07] <Emperor>	 and, indeed, it's trying to mkfs on /dev/sdc1 but there's already a fs there
[16:15:06] <Emperor>	 xfs_admin -L swift-sda3 /dev/sda3 keeps failing because /dev/sda3 is already labelled swift-sdb3
[16:16:11] <Emperor>	 puppet is never going to work here, and I've rebooted 6 times now 
[16:17:35] <Emperor>	 and there's a chunk of data in these filesystems already
[16:18:17] <Emperor>	 I guess I could try stopping swift, unmounting the filesystems and adjusting the labels.
[16:24:46] <Emperor>	 done so, puppet now runs to completion. But this approach is v. v. fallible and doesn't scale
[16:25:57] <Emperor>	 I remain confused about why puppet seems to mind mis-labelled hdds less than the SSDs
[16:26:25] <Emperor>	 I mean /dev/sdn1 here is labelled swift-sdl1 and puppet isn't catching fire about that
[16:33:30] <Amir1>	 I'm making a couple of schema changes on db1156 live (T276292)
[16:33:31] <stashbot>	 T276292: Schema change for renaming new_name_timestamp to rc_new_name_timestamp in recentchanges - https://phabricator.wikimedia.org/T276292
[18:24:35] <Emperor>	 I've left ms-be1040 doing a bunch of xfs_repair in a tmux
[18:26:07] <Emperor>	 (and Ack'd the icinga alert until Tuesday)
[18:39:16] * Emperor will try and leave it alone for the rest of the weekend
[18:42:14] <jynus>	 Emperor: if it helps, I broke backup1002 too :-(