[08:38:39] (SystemdUnitFailed) firing: update-ubuntu-mirror.service Failed on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:13:01] moritzm: the homer daily diff is still running on cumin1001, and colliding with the homer daily diff running on cumin1002
[09:15:54] I have no idea what it's used for, does that cause any issues or just some alert spam?
[09:17:14] moritzm: it's the "check-homer-diff.timer", better to make sure it never runs on cumin1001 as it's causing spam and unnecessary load on the network devices
[09:19:26] feel free to disable then, I wasn't involved in the move of the homer repo yesterday and don't want to make any unrelated changes a day before the quiet period, if it's just some alert spam, then let's just keep it
[09:19:56] yeah that's why I'd rather have a 2nd pair of eyes double checking my change :)
[09:20:57] moritzm: my understanding of https://github.com/wikimedia/operations-puppet/blob/460a2966ef33c2775e70844f7c8b72860d220606/modules/profile/manifests/homer.pp#L42 is that deleting https://github.com/wikimedia/operations-puppet/blob/460a2966ef33c2775e70844f7c8b72860d220606/hieradata/hosts/cumin1001.yaml#L3 should solve it (as it's already set for cumin1002)
[09:21:29] let me check
[09:24:10] I think it's simpler to simply set profile::homer::disable: true in hieradata/hosts/cumin1001.yaml, isn't it?
[09:24:54] that works for me too
[09:25:03] adding a line vs. removing one :)
[09:25:24] the code is a little strange since there is a $disable_homer check within a code path which already checks for $disable_homer?
[09:26:21] yeah I worry here that setting disable_homer will just not explicitly remove it
[09:26:42] but ignore the "absent" etc
[09:26:58] yeah, the code isn't really written in a way to properly absent it
[09:27:01] (I didn't see the first $disable_homer initially)
[09:27:25] e.g. in https://github.com/wikimedia/operations-puppet/blob/460a2966ef33c2775e70844f7c8b72860d220606/modules/profile/manifests/homer.pp#L32 there's a hard-coded "present"
[09:27:40] yeah...
[09:28:22] I'll run PCC with a patch removing "profile::homer::diff_timer_interval" for cumin1001 and see if it does the right thing
[09:28:40] you need:
[09:28:49] profile::homer::diff_timer_interval: ~
[09:29:03] but yeah, keeping it empty should also work indeed
[09:29:12] noted
[09:29:56] I'll make a followup task to fix up the homer class to properly absent the homer resources, will be useful in the future again
[09:30:23] good idea!
[09:31:43] homer, Infrastructure-Foundations: Update Homer Puppet classes to allow to absent Homer resources - https://phabricator.wikimedia.org/T353932 (MoritzMuehlenhoff)
[09:33:43] XioNoX: +1d
[09:34:05] thx, yeah pcc lgtm too
[09:34:14] sorry for not having followed up on this with top.ranks yesterday
[09:35:15] and +1 for the plan (disable timer now and properly absent stuff first week of Jan.)
[09:41:14] no pb at all! looks like we're all good
[09:44:42] excellent :-)
[10:34:27] netops, Ganeti, Infrastructure-Foundations, SRE: Investigate Ganeti in routed mode - https://phabricator.wikimedia.org/T300152 (ayounsi)
[12:38:39] (SystemdUnitFailed) firing: update-ubuntu-mirror.service Failed on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:38:39] (SystemdUnitFailed) resolved: update-ubuntu-mirror.service Failed on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
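Note on the homer timer discussion (09:20 to 09:35): the two Hiera options weighed for hieradata/hosts/cumin1001.yaml can be sketched roughly as below. This is only an illustration built from the key names quoted in the log. The rest of the file is not shown in the log, and the log does not state whether the merged patch removed the key or set it to `~`, so treat the fragment as an assumption rather than the actual change.

```yaml
# hieradata/hosts/cumin1001.yaml (sketch, only the keys mentioned in the chat)

# Option discussed at 09:28: null out the interval so the
# check-homer-diff timer is not configured on this host.
# In YAML, "~" is the null value.
profile::homer::diff_timer_interval: ~

# Alternative floated at 09:24, shown commented out. Per the chat, the
# profile at that commit may not explicitly absent existing resources
# when this is set, which is what the followup task T353932 is about.
# profile::homer::disable: true
```

Either way, the chat suggests validating the result with PCC for cumin1001 before merging, since the profile had a hard-coded "present" at the time.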