[10:16:55] hi, I'm working on removing tech debt for network probes in T291946 and noticed puppetdb-api ATM won't allow traffic from prometheus hosts (where the checks are coming from). I looked at the code and I see there's an 'allowed_hosts' hiera variable. I can add the prometheus hosts there, or e.g. pass in another variable where we have the prometheus hosts already, what do you think?
[10:16:55] T291946: Move service::catalog checks (“monitoring” section) to blackbox exporter and Alertmanager - https://phabricator.wikimedia.org/T291946
[10:29:58] godog: My instincts always say use the existing variable, i.e. the fewer places we need to modify the list of Prometheus host IPs if they change the better
[10:32:17] I guess that is offset by how complex / messy doing so might be of course
[10:34:50] topranks: yeah I tend to agree, in this case it is hostnames and the list changes on hardware refresh usually
[10:35:11] not overly messy I think, mostly another variable to pass in
[10:35:39] Cool. I’m far from the authority on this, might be worth
[10:35:53] Seeing what some of the others on the team think also
[10:36:18] cc jbond ^ as I'm seeing commits from you
[10:45:41] which variable do you mean exactly? the cumin hosts have full access to puppetdb in ferm, but services like netbox and unprivileged Cumin only query the puppetdb microservice, you probably mean the latter?
[10:47:34] was going to say, the microservice only exposes some data: https://github.com/wikimedia/puppet/blob/production/modules/profile/files/puppetdb/puppetdb-microservice.py
[10:52:01] moritzm: I mean profile::puppetdb::microservice::allowed_hosts
[10:52:56] but yes the puppetdb-api i.e. microservice
[10:54:12] yeah, so if the data exposed by the microservice is fine for prometheus that seems okay to me, but please add John to reviewers, he's most familiar with it
[10:55:18] will do, what I'm not sure about is between appending prometheus hostnames to that variable vs adding 'prometheus_nodes' to the profile and automatically including that in ferm too
[10:55:52] I'm ok with either FWIW, but wanted to check with you before the code review
[10:58:26] godog: which facts do you need access to? (just curious)
[10:58:55] I agree that passing hostnames is not ideal as they will go stale with time
[10:59:40] XioNoX: no facts needed per se, these are the network probes
[10:59:55] essentially the same check_http_... we run from alert hosts nowadays
[11:00:44] ahhh, ok, so it's to test availability of the microservice?
[11:01:07] yes exactly, availability only
[11:01:33] it works now from alert hosts because those get blanket access in ferm iirc
[11:27:29] moritzm, topranks: looks like we dodged this month's Juniper bullets. https://phabricator.wikimedia.org/T299129 :)
[11:30:04] Cool thanks for going through them XioNoX
[11:33:02] *phew* :-)
[12:24:43] netops, Infrastructure-Foundations: Configuration of New Switches Eqiad Rows E-F - https://phabricator.wikimedia.org/T299758 (cmooney) p:Triage→Medium
[12:24:57] netops, Infrastructure-Foundations: Configuration of New Switches Eqiad Rows E-F - https://phabricator.wikimedia.org/T299758 (cmooney)
[13:01:55] netops, Infrastructure-Foundations, SRE: Configuration of New Switches Eqiad Rows E-F - https://phabricator.wikimedia.org/T299758 (cmooney) Currently waiting on T299759 to be completed to gain console access to these devices and begin the process.
[13:05:23] Quick question - not sure if anyone might know.
[13:05:27] Turns out a friend of a friend applied for the open role on our team.
[13:05:54] This was a few weeks back. They've had no response yet, and through the grapevine my own friend contacted me to ask if I were able to get anyone to let them know the status.
[13:06:12] I guess I could just ask someone in HR?
[13:06:25] topranks: in theory hiring manager first
[13:06:27] I don't know this person so I'm not trying to vouch for them, but I said I'd see if I could poke someone into responding / confirming status for them.
[13:06:29] but dunno who that is right now
[13:06:52] heh yeah that would have been Joanna right? would have been the person I asked alright.
[13:07:20] Not sure if Leo has access to that now, or if there is someone else better suited.
[13:07:52] topranks: I think Lukasz
[13:08:18] Ah ok maybe... he co-ordinated the previous call we had so that might make sense.
[13:08:24] I'll go annoy him. Thanks!
[13:10:37] I can take a look and follow up too if it helps
[13:15:26] but I think Lukasz is the point person for the IC hiring pool
[13:51:28] godog: fyi puppetdb-api, all that sounds good to me, add me as a reviewer and I will take a look Monday. I'm out today and have very limited connectivity
[14:01:14] jbond: ack, will do! thanks
[14:21:57] lmata: thanks Lukasz was able to assist cheers :)
[20:57:41] jhathaway: hey you around?
[20:57:52] topranks: yup
[20:58:04] I'm wondering did I imagine it or did I see you mention something about decommissioning something called "sodium"?
[20:58:20] yes, I decommed it earlier today
[20:58:30] ok cool. I've no idea what that is exactly.
[20:58:40] I believe an issue I'm hitting is due to that though.
[20:58:41] it was the old debian mirror server
[20:58:46] Ah ok cool
[20:59:06] what did I break :)
[20:59:27] Our "capirca" tool which builds ACLs for the core routers is failing, the hostname must be referred to somewhere and now it's not finding it in Netbox.
[20:59:46] It's fine, if it's definitely gone I should be able to remove the offending entry
[20:59:55] give me a minute to dig it out I'll let you know what it is for reference
[21:00:06] ok cool, sorry
[21:01:36] ah it's no hassle
[21:01:39] just saw the daily diff email, let me know if I can help
[21:01:41] yeah you can see where it is here:
[21:01:43] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/homer/public/+/refs/heads/master/policies/cr-analytics.inc#49
[21:02:06] XioNoX: Thanks, I think all is in hand.
[21:02:35] If you're online, that system "sodium" no longer exists, so my thoughts were to remove the terms referring to it in cr-analytics.inc?
[21:02:43] topranks: yeah removing that should be fine
[21:02:58] jhathaway: it got replaced by mirror1001?
[21:03:02] yeah
[21:03:09] the following rule can go as well, rsync-http-https
[21:03:12] perhaps mirror1001 should go there instead
[21:03:24] happy to make the commit as well if you would like
[21:03:30] Reason I'm looking is I was helping Andrew with something else and homer is failing to run now
[21:03:35] https://www.irccloud.com/pastebin/6LYGoM88/
[21:04:20] topranks: ideally use "mirror_group" instead of specific hostnames so it will be dynamic if more hosts are added (or if this one is re-imaged)
[21:04:28] hmm, we do need inbound ssh connectivity to mirror1001
[21:04:45] as debian uses an ssh forced key to push updates
[21:04:55] XioNoX: +1 that makes sense.
[21:04:56] so maybe it is a little more complex
[21:05:15] debian pushes don't come from our analytics networks
[21:05:30] ah, right
[21:05:41] taavi: thanks, forgot this was only from analytics
[21:06:11] yeah, but the existing rule must be there for a (maybe no longer valid) reason
[21:06:11] unsure what to do here tbh
[21:06:14] in that case both of the sodium rules should be okay to pull
[21:06:27] The existing rule is no longer of any consequence - as the target machine doesn't exist.
[21:06:46] true :)
[21:06:48] Whether some other rules should be amended, or new ones added, based on that bit of work I can't say for sure.
[21:06:49] yeah I think the ssh piece was a mistake
[21:06:59] But I'm of the opinion there is no harm removing those existing terms for now.
[21:07:00] +1 on removing it then
[21:07:05] I agree
[21:07:20] And we can review if other modifications are needed Monday (if they are, that problem already exists, we're not making it worse)
[21:07:26] ok thanks
[21:08:15] adds arguments toward https://phabricator.wikimedia.org/T298087 :)
[21:09:32] "You do not have permission to view this object." :(
[21:10:32] yep 100% XioNoX
[21:12:11] taavi: it's restricted to "WMF-NDA" because of security-related discussions, the main topic is whether or not we still need the dedicated analytics vlan (and thus its ACLs
[21:12:17] )
[21:13:52] https://gerrit.wikimedia.org/r/c/operations/homer/public/+/756057
[21:15:55] topranks: you get the medal for the most detailed commit messages :)
[21:16:07] haha
[21:16:27] My biggest problem is trying to say less I think :)
[21:17:10] haha it's great though! especially when referencing old CRs/tasks later on
[21:25:33] Thanks for the help guys, that change has gone through now :)