[08:27:37] JFTR, I've decommed lists1003, so that should reduce noise over the quiet period (confirmed with Lukasz, "Collaboration Services" will pick up the bookworm Mailman install on the lists1004 hardware that was bought)
[08:40:33] <3
[08:43:03] <3 that helps!
[09:01:32] moritzm: the motd on cumin1001 is nice, but it's buried in the rest of the motd... I guess we either put something larger, like the one on deploy1002, or it will get missed
[09:01:58] optionally colored too :D
[09:42:38] nah, should be fine, there was also a mail to sre-at-large as well
[09:48:46] No no, add "\033[31;5mPlease use cumin1002/cumin2002 for all cookbooks and Cumin\!\033[0m"
[09:49:35] as you want, but I can guarantee no one will read that small uncolored line ;)
[09:50:27] anyway, we can easily check its usage for cookbooks via https://sal.toolforge.org/production?p=0&q=%40cumin1001&d=
[09:51:02] and for cumin from: sudo tail -n1 /var/log/cumin/cumin.log
[11:46:29] 10SRE-tools, 10Dumps-Generation, 10Infrastructure-Foundations, 10serviceops, 10IPv6: Some Service Operations clusters apparently do not support IPv6 - https://phabricator.wikimedia.org/T271142 (10Clement_Goubert) >>! In T271142#9413333, @akosiaris wrote: >>>! In T271142#9382040, @Volans wrote: >> Another...
[12:30:00] volans, slyngs: when the two patches for moving the debmonitor client to the new source package are complete, let's build and deploy 0.3.2-2 and roll it out? Then we have a clean separation between the new client package and the future server package
[12:30:31] Sounds good
[13:00:52] 10netbox, 10Infrastructure-Foundations, 10Patch-For-Review: Netbox: get rid of WMF Production Patches - https://phabricator.wikimedia.org/T310717 (10ayounsi) Submitted https://github.com/netbox-community/netbox/issues/14554 and https://github.com/netbox-community/netbox/issues/14555
[13:03:46] for the ganeti hosts listed in https://phabricator.wikimedia.org/T253173#9412420, should I create a dedicated task or are people OK with working from that one?
[13:23:42] volans, moritzm: about the motd, I opened that task not long ago as it's getting out of hand... https://phabricator.wikimedia.org/T352957
[13:37:56] Homer upgraded, tested on *ulsfo* without any issue; hopefully the Homer diff emails will be fully quiet during the break
[13:39:42] who knows, but fingers crossed!
[13:40:54] 10homer, 10netbox, 10Infrastructure-Foundations: Homer daily diff failing in codfw - https://phabricator.wikimedia.org/T329823 (10ayounsi) 05Open→03Resolved a:03ayounsi Homer 0.6.5 deployed, I'll re-open if the workaround doesn't workaround.
[14:05:25] 10netbox, 10Infrastructure-Foundations, 10SRE, 10cloud-services-team, 10Patch-For-Review: Netbox: Add support for our complex host network setups in provision script - https://phabricator.wikimedia.org/T346428 (10Volans) Thanks for the patch, it will take a bit to do a full pass given the size. I agree...
[14:57:53] 10netbox, 10Infrastructure-Foundations, 10IPv6, 10User-jbond: Some clusters do not have DNS for IPv6 addresses (TRACKING TASK) - https://phabricator.wikimedia.org/T253173 (10MoritzMuehlenhoff) >>! In T253173#9412420, @ayounsi wrote: > Those hosts only have a SLAAC v6 IP: `ganeti[2010,2016,2020].codfw.wmne...
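(Editor's note on the colored-banner idea from the 09:48 message above: the escape sequence quoted there is ANSI red + blink. A minimal sketch of how such a notice could be emitted from an update-motd.d snippet is below; the filename and path are illustrative assumptions, not the actual Puppet-managed motd setup on cumin1001.)

```bash
#!/bin/sh
# Hypothetical /etc/update-motd.d/99-cumin-deprecation (illustrative only;
# the real motd on the cumin hosts is managed via Puppet).
# \033[31;5m = red + blinking text, \033[0m = reset attributes.
printf '\033[31;5mPlease use cumin1002/cumin2002 for all cookbooks and Cumin!\033[0m\n'
```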
[15:26:29] 10SRE-tools, 10Data-Persistence, 10Infrastructure-Foundations, 10Patch-For-Review: Automation to change a server's vlan - https://phabricator.wikimedia.org/T350152 (10Marostegui)
[15:38:56] 10netops, 10Infrastructure-Foundations, 10SRE, 10Patch-For-Review: Netbox network report failing - timeout errors - https://phabricator.wikimedia.org/T321704 (10ayounsi)
[15:57:38] jhathaway: on puppetmaster2001 there are a few failed sessions (session-c107960.scope) for the gitpuppet user. Should those be cleared out? Is that a leftover from a previous issue or an ongoing one?
[15:57:50] the icinga alert says it's been like that for 46 days...
[15:58:30] yeah, I was not aware, but happy to take a look
[15:59:59] this should be https://phabricator.wikimedia.org/T199911
[16:00:16] there's a toil::system_scope_cleanup class which deploys a workaround
[16:00:44] we only enable it on swift and IIRC some backup hosts (where this shows up more often)
[16:01:20] I recall that workaround, but I've never seen it needed on the puppetmasters
[16:01:21] so let's maybe clear out the current ones manually, and if there is a more persistent pattern we can add the class to the puppet servers as well
[16:01:29] so I was wondering if there was a different problem here
[16:01:42] I think it's the same, but we probably already solved it:
[16:02:03] before Jesse enabled the caching mode on the puppet servers we had the situation that they were under high load
[16:02:13] which also aligns with the 46 days
[16:02:31] ok, then a manual clear and waiting to see if it re-occurs seems the way to go
[16:02:46] so I'd say, let's clear these out manually for now; likely this won't happen again with the current setup
[16:02:56] agreed
[16:02:57] ok, I'll clear them out
[16:03:10] thx
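(Editor's note on the manual cleanup agreed above: failed transient session scopes can be cleared with systemctl reset-failed. A minimal sketch follows; the unit name is the one quoted in the log, and this is not necessarily how the toil::system_scope_cleanup class does it.)

```bash
# List session scopes currently in the failed state on the host.
systemctl list-units --type=scope --state=failed 'session-*.scope'

# Clear the failed state for the specific scope mentioned above,
# so the systemd-unit/Icinga check recovers.
sudo systemctl reset-failed 'session-c107960.scope'

# Or clear every failed session scope in one go.
sudo systemctl reset-failed 'session-*.scope'
```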