[08:15:44] arturo_afk, dcaro, dhinus: as the EU-side of WMCS, by any chance could someone of you please have a look at https://phabricator.wikimedia.org/T312557#8207777 and see if you have enough context to feel confident to run the sre.dns.netbox cookbook to unblock the capability to propagate DNS changes made in Netbox? Thanks
[08:18:24] volans|off: looking
[08:18:27] current diff is https://phabricator.wikimedia.org/P33742
[08:23:03] thanks
[08:34:04] volans|off: looking
[08:34:44] should be ok
[08:34:55] dhinus: do you want to run it?
[08:35:16] yep, I've just finished reading the wiki page
[08:35:39] it should just be a matter of running 'sudo cookbook sre.dns.netbox' from cumin, right?
[08:36:06] yep, referencing the task and such
[08:36:08] yep
[08:36:10] yep https://wikitech.wikimedia.org/wiki/DNS/Netbox#Update_generated_records
[08:36:55] then we should keep an eye out for misbehaving cloudvirts/openstack stuff (unlikely)
[08:37:12] while you're around, there is an alert for "cloudservices1003 (WMF7225): Device is in PuppetDB but is Decommissioning in Netbox (should be Staged, Active or Failed)"
[08:37:50] in other words, servers with a "Decommissioning" status shouldn't be in PuppetDB anymore
[08:38:36] XioNoX: where is that?
[08:38:58] dcaro: https://netbox.wikimedia.org/extras/reports/puppetdb.PhysicalHosts/ then last run
[08:40:28] XioNoX: does it have any equivalent alert on alertmanager/icinga that we can monitor? (/me trying to avoid having to look at one extra alert source)
[08:41:25] it seems there was some issue with the decom there https://phabricator.wikimedia.org/T304888
[08:41:37] actually T316285
[08:41:37] T316285: decommission cloudservices1003.wikimedia..org - https://phabricator.wikimedia.org/T316285
[08:41:42] sre.dns.netbox cookbook completed successfully
[08:41:56] volans|off: ^ can you validate that's all you needed?
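(Editor's aside: the cookbook run above regenerates DNS records from Netbox data and presents a diff, like the paste at P33742, for the operator to approve. A minimal sketch of that diff step is below; the record names and values are invented for illustration and the real cookbook builds zone data from the Netbox API.)

```python
# Hypothetical sketch of the "show a diff before committing" step of a
# Netbox-to-DNS pipeline. All data here is made up; the actual
# sre.dns.netbox cookbook generates records from Netbox itself.

def diff_records(current: dict, proposed: dict) -> dict:
    """Return records added, removed, and changed between two
    name -> value mappings of generated DNS records."""
    added = {k: proposed[k] for k in proposed.keys() - current.keys()}
    removed = {k: current[k] for k in current.keys() - proposed.keys()}
    changed = {
        k: (current[k], proposed[k])
        for k in current.keys() & proposed.keys()
        if current[k] != proposed[k]
    }
    return {"added": added, "removed": removed, "changed": changed}

# Example with placeholder records:
current = {"host-old.example.org": "192.0.2.10"}
proposed = {"host-new.example.org": "192.0.2.11"}
print(diff_records(current, proposed))
```

An empty diff would mean there is nothing to propagate; a non-empty one is what the operator reviews before confirming.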
[08:42:42] dcaro: they're unfortunately not host specific, for example: https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=netbox1002&service=Netbox+report+puppetdb_physical
[08:43:42] dcaro: yep, if the changes are propagated it's all good from our side, I hope it doesn't cause issues on the services side
[08:44:06] thanks both!
[08:44:17] hmm, and it does not even mention which hosts xd, do you mind if we rely on you to ping us if this happens again? (happy to help to improve the monitoring though)
[08:45:15] the decom though completed, the host should have been powered off and removed from puppet
[08:45:41] so unless it was re-booted later on or the power off didn't actually succeed I'm not sure why it's in puppetdb
[08:45:54] but I can't dig into it today as I'm off
[08:47:05] 👍
[08:47:21] I was going to ask yes, as it says it was removed from puppetdb
[08:47:45] I'm supposed to be off today too xd
[08:48:39] dcaro: it's not urgent, if there is a decom task it's fine to mention it there
[08:50:24] the host is down (no ssh to it), debmonitor still has the host, but was last updated the day of the decom (during actually, same for puppetdb), so I'm guessing that somehow it failed to remove it.
[08:53:44] btw. thanks dhinus :)
[08:54:52] indeed, when you think it's all good you could drop a line in the task, that would be great (so that others will know it's now done).
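(Editor's aside: the alert discussed above comes from the Netbox puppetdb.PhysicalHosts report, whose core check is that any host present in PuppetDB must have a Netbox status of Staged, Active or Failed. A hedged sketch of that check follows; the host names and statuses are invented, and the real report pulls its data from the Netbox and PuppetDB APIs.)

```python
# Sketch of the consistency check behind the puppetdb.PhysicalHosts
# report: flag hosts that are in PuppetDB but whose Netbox status
# says they should not be. Data below is illustrative only.

ALLOWED_IN_PUPPETDB = {"staged", "active", "failed"}

def inconsistent_hosts(netbox_status: dict, puppetdb_hosts: set) -> list:
    """Hosts in PuppetDB whose Netbox status is not allowed there."""
    return sorted(
        host
        for host in puppetdb_hosts
        if netbox_status.get(host, "unknown") not in ALLOWED_IN_PUPPETDB
    )

netbox_status = {
    "cloudservices1003": "decommissioning",
    "cloudservices1004": "active",
}
puppetdb_hosts = {"cloudservices1003", "cloudservices1004"}
print(inconsistent_hosts(netbox_status, puppetdb_hosts))
# → ['cloudservices1003']
```

In the incident above, the decom cookbook reported removing the host from PuppetDB but the entry lingered, which is exactly the mismatch this kind of check surfaces.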
Thanks a lot
[08:55:11] done 👍
[08:55:32] dcaro: np :)
[08:55:34] ah, for the AAAA record xd
[08:55:53] I'll leave that for dhinus :) (if you don't mind)
[08:56:02] sure
[08:57:24] yeah my message was for d.hinus :)
[08:57:33] and for the AAAA records
[08:58:08] icinga recovery also arrived 👍
[08:58:32] done https://phabricator.wikimedia.org/T312557#8207872
[09:00:15] thanks
[10:28:02] This has to be the coolest error I've seen in a long time: ldap.UNWILLING_TO_PERFORM: {'msgtype': 103, 'msgid': 3, 'result': 53, 'desc': 'Server is unwilling to perform', 'ctrls': [], 'info': 'no global superior knowledge'}
[11:00:18] <_joe_> lol
[15:00:59] I am in the middle of some refactoring, is there an easy way to know which of my roles lack: a cluster definition, cumin alias coverage, owner definition?
[19:03:37] number of hosts we are managing with puppet is currently at 2073. more than I thought / last time I checked
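(Editor's aside: the ldap.UNWILLING_TO_PERFORM payload quoted at 10:28 can be decoded field by field. Per RFC 4511, message type 103 (0x67) is a modifyResponse and result code 53 is unwillingToPerform; "no global superior knowledge" is OpenLDAP's diagnostic when the target DN falls outside every naming context the server holds and no default referral is configured. The small decoder below just restates those code tables; the tables are deliberately partial.)

```python
# Decode the fields of a python-ldap error payload like the one in
# the log. Code tables are a small subset of RFC 4511.

RESULT_CODES = {
    0: "success",
    32: "noSuchObject",
    49: "invalidCredentials",
    53: "unwillingToPerform",
}
MSG_TYPES = {
    0x61: "bindResponse",
    0x65: "searchResultDone",
    0x67: "modifyResponse",  # decimal 103, as in the log
    0x69: "addResponse",
}

def describe(err: dict) -> str:
    """Human-readable summary of a python-ldap error info dict."""
    op = MSG_TYPES.get(err["msgtype"], "unknown op")
    result = RESULT_CODES.get(err["result"], "unknown result")
    return f"{op}: {result} ({err.get('info', '')})"

err = {"msgtype": 103, "msgid": 3, "result": 53,
       "desc": "Server is unwilling to perform", "ctrls": [],
       "info": "no global superior knowledge"}
print(describe(err))
# → modifyResponse: unwillingToPerform (no global superior knowledge)
```

So the "coolest error" reads as: a modify was attempted against a DN the server neither holds nor knows whom to refer to.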
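(Editor's aside: the 15:00 question, which roles lack a cluster definition, cumin alias coverage, or an owner, boils down to set subtraction once each metadata source has been collected. The sketch below is entirely hypothetical: the role names and the way the three sets are gathered are invented, and in the real puppet repo they would come from the role manifests, the cumin aliases config, and whatever file holds owner/contact metadata.)

```python
# Hypothetical sketch: given the full set of role names and the set of
# names covered by each metadata source, report which roles are
# missing from each one. Gathering the inputs from the actual repo is
# left out; the data here is invented.

def missing_metadata(roles, clusters, cumin_aliases, owners):
    """Map each check name to the sorted list of roles failing it."""
    roles = set(roles)
    return {
        "no_cluster": sorted(roles - set(clusters)),
        "no_cumin_alias": sorted(roles - set(cumin_aliases)),
        "no_owner": sorted(roles - set(owners)),
    }

report = missing_metadata(
    roles={"role_a", "role_b", "role_c"},
    clusters={"role_a", "role_b"},
    cumin_aliases={"role_b", "role_c"},
    owners={"role_a"},
)
print(report)
```

The hard part in practice is the collection step, not the comparison; once each source yields a set of role names, the gaps fall out directly.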