[15:09:20] <inflatador>	 moritzm Have you ever seen an alert for a "user@499.service" systemd unit failure? uid 499 is the debmonitor user and this seems to happen every night on a few of the wdqs servers
[15:13:38] <inflatador>	 systemd-timedated.service apparently fails at the same time....still investigating but just wondering if you've seen that
[15:16:26] <jhathaway>	 inflatador: we have seen it before, https://phabricator.wikimedia.org/T199911
[15:16:32] <moritzm>	 do you have an example server, can have a look?
[15:16:48] <inflatador>	 wdqs1022, see https://grafana.wikimedia.org/d/000000342/node-exporter-server-metrics?orgId=1&var-node=wdqs1022:9100&from=now-12h&to=now&var-datasource=eqiad%20prometheus%2Fops&var-disk_device=All&var-net_dev=All
[15:18:26] <inflatador>	 these hosts run a slightly different stack than the other wdqs1022 , so it's possible our puppet code is part of the problem
[15:18:33] <inflatador>	 than the other wdqs hosts, that is
[15:18:55] <inflatador>	 more context in https://phabricator.wikimedia.org/T352878
[15:21:06] <moritzm>	 the failure of debmonitor seems like natural fallout of the high load at that time: debmonitor runs a daily systemd timer to ingest package data, and if the host is under high load at that point, the systemd session will fail
[15:21:14] <moritzm>	 as in the task that Jesse linked
[15:21:33] <volans>	 and we do have an automatic cleanup of those
[15:21:42] <volans>	 that can be opt-in IIRC
[15:23:53] <moritzm>	 yeah, there's a toil class which gets applied to the swift hosts (which run into high load from time to time)
[15:23:57] <inflatador>	 ACK, looks like it's this class https://gerrit.wikimedia.org/r/c/operations/puppet/+/636633/6/modules/profile/manifests/mariadb/dbstore_multiinstance.pp 
[15:24:29] <inflatador>	 I don't see high load in our case, but there could be something else triggering it. It's definitely recurring
[15:25:06] <inflatador>	 Anyway, I'll get a patch up for adding this class. Thanks for y'all's help!
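The sketch below shows the kind of cleanup discussed above: it finds failed per-user manager units such as user@499.service and clears their failed state so they stop alerting. It is not the puppet toil class, since we haven't looked at what that class actually runs. The function names and the UID allow-list are hypothetical. The sketch also assumes the output of `systemctl list-units --failed --plain --no-legend` puts the unit name in the first column. Note that `systemctl reset-failed` only clears the failed state and does nothing about whatever caused the failure.

```python
import re
import subprocess

# Per-user manager units look like "user@<uid>.service".
USER_UNIT_RE = re.compile(r"^user@(\d+)\.service$")


def failed_user_units(systemctl_output: str) -> list[tuple[str, int]]:
    """Parse `systemctl list-units --failed --plain --no-legend` output and
    return (unit, uid) pairs for failed user@<uid>.service units."""
    results = []
    for line in systemctl_output.splitlines():
        fields = line.split()
        # Tolerate a leading status bullet in case the output includes one.
        if fields and fields[0] in ("●", "*"):
            fields = fields[1:]
        if not fields:
            continue
        match = USER_UNIT_RE.match(fields[0])
        if match:
            results.append((fields[0], int(match.group(1))))
    return results


def reset_commands(units: list[tuple[str, int]], allowed_uids: set[int]) -> list[list[str]]:
    """Build `systemctl reset-failed` commands, only for UIDs on an allow-list
    (hypothetically, just the debmonitor user, uid 499 on these hosts)."""
    return [["systemctl", "reset-failed", unit] for unit, uid in units if uid in allowed_uids]


def main() -> None:
    """Not called here; running it needs root on a systemd host."""
    out = subprocess.run(
        ["systemctl", "list-units", "--failed", "--plain", "--no-legend"],
        capture_output=True, text=True, check=True,
    ).stdout
    for cmd in reset_commands(failed_user_units(out), allowed_uids={499}):
        subprocess.run(cmd, check=True)
```

Limiting the reset to an explicit UID allow-list keeps the cleanup from hiding failures of other users' sessions. Unrelated failures such as systemd-timedated.service are ignored.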
[16:06:16] <inflatador>	 Can someone help me understand why https://gerrit.wikimedia.org/r/c/operations/puppet/+/984620/8/modules/role/manifests/wdqs/test.pp#14 is a style violation? I based this on https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/role/manifests/swift/storage.pp which I presume passed CI?
[16:08:05] <Emperor>	 I don't want to be too shady, but not all of the swift puppetry is best current practice :-/
[16:08:19] <volans>	 it is a violation :)
[16:08:21] <Emperor>	 (also, what does the CI say?)
[16:08:32] <volans>	 as that is not a profile
[17:22:30] <inflatador>	 OK, I think I got the toil class imported correctly...if anyone has time to look it's here: https://gerrit.wikimedia.org/r/c/operations/puppet/+/984620
[19:20:26] <moritzm>	 inflatador: looks good, +1d
[19:21:17] <inflatador>	 moritzm excellent, thank you
[22:54:47] <ryankemper>	 I ran the decom cookbook on wdqs100[6-8], but it failed in the middle due to failing to acquire a lock for one of the netbox-change cookbooks (accidentally lost the exact log line). Anyway, diffing https://phabricator.wikimedia.org/T351671#9420135 w/ https://phabricator.wikimedia.org/T351671#9407888, seems like it missed the steps to remove from puppetdb/debmonitor & configure linked switch interfaces
[22:55:49] <ryankemper>	 What's the best way to proceed? Do I need to manually run `sre.network.configure-switch-interfaces` and manually remove from puppetdb/debmonitor or is there a better approach?
[22:57:22] <volans>	 ryankemper: try to rerun the decom
[22:57:50] <volans>	 if it didn't go too far should be able to do its job
[22:58:19] <volans>	 it's currently not as idempotent as it should be
[22:58:40] <ryankemper>	 volans: should have mentioned that, it refuses to run due to `spicerack.netbox.NetboxError: Server wdqs1008 does not have any primary IP with a DNS name set.`
[23:04:44] <volans>	 ryankemper: in this case yes you can run the switch cookbook
[23:05:19] <volans>	 as for the rest add me to the task and I'll check it tomorrow
[23:05:41] <volans>	 I'd like to fix the cookbook itself
[23:14:44] <ryankemper>	 volans: excellent. as always, thanks for the help!
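The sketch below is a generic illustration of volans' point about idempotency. It is not spicerack's API or the decom cookbook's actual structure. `Step`, `run_steps`, the step names and the `state` dict are hypothetical stand-ins for Netbox/DNS, PuppetDB, Debmonitor and the switch. Each step checks whether its effect is already present before acting, so a rerun after a mid-run failure resumes instead of refusing to start.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    """One hypothetical decommission step with an explicit 'already done?' check."""
    name: str
    is_done: Callable[[dict], bool]  # inspects current state, no side effects
    run: Callable[[dict], None]      # performs the change


def run_steps(steps: list[Step], state: dict) -> list[str]:
    """Execute only the steps whose check reports they are not done yet.

    Because a step is skipped when its effect is already present, the whole
    sequence can be rerun safely after a failure part-way through.
    Returns the names of the steps executed in this run.
    """
    executed = []
    for step in steps:
        if step.is_done(state):
            continue  # effect already present: skip instead of failing
        step.run(state)
        executed.append(step.name)
    return executed


# Simulated host state (hypothetical stand-in for the real systems).
state = {"dns_name": True, "puppetdb": True, "debmonitor": True, "switch_done": False}
calls = {"puppetdb": 0}


def remove_from_puppetdb(s: dict) -> None:
    # Fails on the first attempt to mimic a transient error such as a lock timeout.
    calls["puppetdb"] += 1
    if calls["puppetdb"] == 1:
        raise RuntimeError("could not acquire lock")
    s["puppetdb"] = False


steps = [
    Step("remove_dns_name", lambda s: not s["dns_name"], lambda s: s.update(dns_name=False)),
    Step("remove_from_puppetdb", lambda s: not s["puppetdb"], remove_from_puppetdb),
    Step("remove_from_debmonitor", lambda s: not s["debmonitor"], lambda s: s.update(debmonitor=False)),
    Step("configure_switch", lambda s: s["switch_done"], lambda s: s.update(switch_done=True)),
]

try:
    run_steps(steps, state)  # first run: removes the DNS name, then fails at PuppetDB
except RuntimeError:
    pass

rerun = run_steps(steps, state)  # second run: skips the DNS step and resumes
```

In this design, the lookup that made the rerun fail (`does not have any primary IP with a DNS name set`) would only be the check of one step. It would not act as a precondition for the whole run. This is a design suggestion under the assumptions above, not a description of how the cookbook is built.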