[02:10:25] <jinxer-wm>	 (SystemdUnitFailed) firing: (2) wmf_auto_restart_redis-server.service on idm1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:10:25] <jinxer-wm>	 (SystemdUnitFailed) firing: (2) wmf_auto_restart_redis-server.service on idm1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:27:12] <moritzm>	 ^ the idm1001 alert is mostly cosmetic, no user-visible impact, to be fixed with https://gerrit.wikimedia.org/r/1024092
[07:35:25] <jinxer-wm>	 (SystemdUnitFailed) firing: (2) wmf_auto_restart_redis-server.service on idm1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:55:44] <wikibugs>	 10Mail, 06Infrastructure-Foundations, 10Znuny: Clean up OTRS/Znuny addresses handles by gsuite - https://phabricator.wikimedia.org/T284145#9743787 (10MoritzMuehlenhoff) >>! In T284145#7218511, @Keegan wrote: > @jbond my utmost apologies for not replying to this earlier! These errors can be ignored, they will...
[07:57:03] <wikibugs>	 10Mail, 06Infrastructure-Foundations, 10Znuny: Clean up OTRS/Znuny addresses handles by gsuite - https://phabricator.wikimedia.org/T284145#9743789 (10LSobanski) 1. Let's review if the new Znuny version enabled removal of unused emails and remove them if possible 2. If not, then let's filter the emails in the...
[11:35:25] <jinxer-wm>	 (SystemdUnitFailed) firing: wmf_auto_restart_redis-server.service on idm2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:45:25] <jinxer-wm>	 (SystemdUnitFailed) resolved: wmf_auto_restart_redis-server.service on idm2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:36:37] <wikibugs>	 10SRE-tools, 06collaboration-services, 06Infrastructure-Foundations, 10Puppet-Core, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619#9744535 (10MoritzMuehlenhoff)
[12:48:38] <arnaudb>	 hello! I'm coming with a problem that jynus and I both faced today; the details are visible here https://phabricator.wikimedia.org/T361087#9744384 and here: https://phabricator.wikimedia.org/T362746#9744589
[12:51:02] <arnaudb>	 since db2155 was reimaging, I'm not sure puppet will be able to resume its normal activity as I had to add --new to retry my run
[14:50:01] <topranks>	 arnaudb: hey just picking up on this 
[14:50:27] <topranks>	 I think puppet will likely be ok as the reimage did not get very far, but I guess we can deal with that when we get to that point 
[14:50:54] <topranks>	 what is the current status? I think we need to downgrade the firmware on the NIC in that host to make the reimage work (known issue with the more recent firmware version)
[14:51:29] <arnaudb>	 hey topranks, I'm using -p7 on moritzm's advice; the server apparently has some issue on reboot after the reimage, as I've had a blinking line displayed on IPMI for 700s now
[14:54:57] <topranks>	 hmmm 
[14:55:19] <topranks>	 yeah I see what you mean... were you following the console output of the reimage in general?
[14:55:34] <topranks>	 i.e. did you see if the PXEboot worked, did it go into the debian installer screen with the blue background?
[14:56:45] <arnaudb>	 topranks: I left IPMI right after the first image rebooted properly
[14:57:12] <arnaudb>	 so I might have missed a few screens :D I've connected again upon seeing retries piling up
[14:57:26] <topranks>	 we should probably give this another reboot now to see what the boot sequence shows 
[14:57:43] <topranks>	 I'm guessing the OS didn't properly install and we need to try again - but worth a manual reboot to get more info first I think 
[14:57:51] <topranks>	 if you are ok for me to do that?
[14:58:35] <arnaudb>	 sure topranks ! go for it
[14:58:46] <topranks>	 ok let's see what happens !
[15:01:27] <arnaudb>	 looks like it worked
[15:01:31] <topranks>	 odd
[15:02:33] <topranks>	 ssh works but doesn't like my pubkey, so seems puppet hasn't set it up 
[15:02:36] <arnaudb>	 thanks topranks :) I hope it'll be able to catch up on its replication lag :D
[15:02:40] <topranks>	 what status is the reimage at now?
[15:02:46] <arnaudb>	 it's not fully reimaged 
[15:02:51] <arnaudb>	 cookbook is still running
[15:02:58] <arnaudb>	 I was at around retry 100+
[15:03:04] <topranks>	 the cookbook is waiting on ssh connection?
[15:03:15] <arnaudb>	 it's signing puppet's cert atm
[15:03:30] <arnaudb>	 (if you want to check out there is a tmux session on my account on cumin1002)
[15:04:12] <arnaudb>	 downtime step has been reached
[15:04:13] <topranks>	 ok...  well that sounds like it is progressing 
[15:04:17] <topranks>	 let's see how it goes 
[15:04:23] <arnaudb>	 yep, will keep you posted!
[15:04:27] <topranks>	 not sure what happened there though 
[15:04:29] <topranks>	 thanks!
[15:04:49] <arnaudb>	 me neither topranks I was not expecting to burn so much time on a reimage :D
[15:29:21] <arnaudb>	 topranks: everything went back to normal, server is catching up on its lag! thanks for the help
[15:31:33] <topranks>	 arnaudb: great!
[15:45:22] <wikibugs>	 10Mail, 06Infrastructure-Foundations, 10Znuny: Clean up OTRS/Znuny addresses handles by gsuite - https://phabricator.wikimedia.org/T284145#9745238 (10Keegan) @MoritzMuehlenhoff I cannot say for sure as I have not worked in this area for several years, but I cannot imagine that the situation has changed.
[21:48:25] <jinxer-wm>	 (SystemdUnitFailed) firing: wmf_auto_restart_prometheus-redis-exporter@6380.service on netbox2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed