[08:03:05] <wikibugs>	 10CAS-SSO, 10Infrastructure-Foundations: Migrate CAS to Bookworm - https://phabricator.wikimedia.org/T357748 (10MoritzMuehlenhoff)
[08:04:26] <wikibugs>	 10CAS-SSO, 10Infrastructure-Foundations: Move CAS to Java 17 - https://phabricator.wikimedia.org/T357749 (10MoritzMuehlenhoff)
[12:31:43] <moritzm>	 in the public homer repo, policies/cr-labs.yaml enables a puppetserver_group, where is the membership of puppetserver_group defined? can't find it in the private or public Homer repos and searching for it in Netbox also yields no results
[12:35:33] <taavi>	 moritzm: it's automatically generated from all netbox devices with that name prefix
[12:36:36] <taavi>	 there's a script https://netbox.wikimedia.org/extras/scripts/capirca.GetHosts/ to refresh those definitions
[12:43:21] <moritzm>	 ah, thanks!
[12:43:58] <moritzm>	 I'll run that (I noticed that some of the codfw1dev cloud test servers failed to connect to our new puppetserver2003)
[12:44:49] <topranks>	 yeah it's come up before 
[12:45:32] <topranks>	 I wonder should we create a cronjob/systemd-timer or something to execute it nightly?
[12:47:17] <moritzm>	 if we complement it with some status output (e.g. mailing a diff if there is one, like we do for the public hosts diff-scan), that sounds useful to me
[12:47:40] <moritzm>	 for now I'll add a note to the Puppet docs on wikitech
[12:49:10] <topranks>	 moritzm: I see your run failed actually 
[12:49:24] <topranks>	 JobTimeoutException: Task exceeded maximum timeout value (300 seconds)
[12:49:55] <moritzm>	 I initially ran it as dry-run only because I was curious what happens under the hood
[12:49:56] <topranks>	 do you want to run it again?  or I will try?  if it keeps doing the same I'll try to work out what the options are 
[12:50:00] <moritzm>	 currently re-trying for real
[12:50:04] <topranks>	 ok
[12:50:33] <topranks>	 this is what it does:
[12:50:34] <topranks>	 https://gerrit.wikimedia.org/r/plugins/gitiles/operations/software/netbox-extras/+/refs/heads/master/customscripts/capirca.py
[12:52:16] <moritzm>	 thx
[12:55:19] <moritzm>	 the non-dry-run also failed, hitting a timeout: https://paste.debian.net/hidden/aae7408c/
[13:37:02] <XioNoX>	 yeah that's getting problematic
[13:37:35] <XioNoX>	 we probably went over a certain threshold in terms of hosts or data in the DB
[13:37:42] <XioNoX>	 hopefully the netbox upgrade will help here
[13:37:53] <XioNoX>	 usually running it again will fix it
[13:37:59] <volans>	 the panacea for all sins
[13:38:00] <volans>	 :D
[13:38:47] <XioNoX>	 it will fix lots of issues I'm sure, not sure how many new ones it will bring :)
[13:38:51] <wikibugs>	 10SRE-tools, 10Infrastructure-Foundations, 10SRE, 10Spicerack: spicerack.redfish needs to know about Jobs as well as Tasks - https://phabricator.wikimedia.org/T357764 (10Volans) p:05Triage→03Medium
[14:20:48] <Emperor>	 Hi. unless I'm missing something, the sre.hardware.upgrade-firmware cookbook is documented as rebooting the host, but if I do an idrac-only update (-c idrac) it seems to not in fact do so. Is that expected?
[14:21:51] <volans>	 idrac updates don't need to reboot the host
[14:21:54] <volans>	 only the bmc
[14:22:30] <Emperor>	 Ah, that wasn't entirely clear; I don't suppose there's a "reboot it anyway" option?
[14:29:27] <volans>	 why would you :D
[14:29:57] <Emperor>	 because the raid-deletion job doesn't work after a firmware update unless the host is rebooted first.
[14:30:09] <Emperor>	 (no, I don't know why I just observe that this is the case)
[14:30:50] <Emperor>	 I can stick it into the convert_disks cookbook myself, it just feels less DRY
[14:31:24] <volans>	 in case of existing hosts it does call sre.hosts.reboot-single
[14:31:42] <Emperor>	 not in all cases (this is an existing host)
[14:32:14] <volans>	 no, what I mean is that when it does the reboot (bios or driver upgrade), it just calls the sre.hosts.reboot-single
[14:32:25] <Emperor>	 oh I see what you mean, sorry
[14:32:38] <volans>	 so if you need a reboot you can just add that 
[14:34:14] <volans>	 that said we could add an option to force a call to self._reboot() with idrac updates too
[15:54:26] <hashar>	 volans: hi, how do you deploy debmonitor nowadays?  I am wondering whether the operations/software/debmonitor/deploy Gerrit repo is still relevant at all? ;)
[15:55:03] <volans>	 was recently migrated to deb package
[15:55:15] <volans>	 so that repo will be archivable I think
[15:55:42] <hashar>	 oh a debian package of course
[15:55:52] <hashar>	 since you have the perms to do so :)
[15:55:58] <volans>	 this is like in the last 2 weeks
[15:56:02] <hashar>	 I guess operations/software/netbox-deploy would be similar if not already?
[15:56:13] <volans>	 no, that's still valid and will stay that way
[15:56:40] <cdanis>	 btw jhathaway moritzm godog -- I think https://gerrit.wikimedia.org/r/1004164 should fix the pcc failures happening for the idp hosts
[15:57:33] <hashar>	 volans: netbox-deploy I think eventually we will have to redo it because it is rather large, though it is not causing immediate troubles ;)
[15:57:36] <hashar>	 thanks for the confirmations!
[15:58:05] <godog>	 cdanis: gah, my bad! thank you
[15:58:15] <volans>	 hashar: we can totally "reset" it if needed
[15:58:26] <cdanis>	 np godog! easy thing to miss and it's not like there's any automated checking of hosts breaking in pcc
[15:58:52] <hashar>	 volans: yes eventually, but there is zero pressure to do it any time soon it is fine keeping it as is
[16:00:07] <volans>	 basically we don't care about the history of artifacts/, which is surely what's making the size large
[16:05:17] <hashar>	 and we can offload them to LFS :)
[16:10:43] <jhathaway>	 cdanis: thanks, +1ed