[03:16:18] How to get this off phone
[12:12:59] taavi: ah, Rook mentioned "And recently the permissions thing was repaired so anyone should be able to deploy magnum to a cloud vps project."
[12:13:00] hmm, so is it currently available to public or no?
[12:20:08] https://phabricator.wikimedia.org/T333874 should make magnum usable without special permissions. Though I don't know if anyone aside from myself has tried to deploy it. I'm waiting for someone from that ticket (though it could be anyone) to try deploying magnum before I believe it is available
[12:21:04] I've tried running the magnum.tf parts of https://github.com/toolforge/tf-infra-test on my cloud VPS project, did not work
[12:21:26] What error did you get?
[12:21:36] timeouts. tbh, it seems any /v1/clusters endpoints are unreachable for me
[12:21:40] https://www.irccloud.com/pastebin/h3qfujEu/
[12:22:29] the terraform itself gave this error: https://www.irccloud.com/pastebin/24yQPjyQ/
[12:26:58] simple curl to that endpoint:
[12:27:02] https://www.irccloud.com/pastebin/UZPDx1CZ/
[12:27:51] Neat, is your code somewhere that I could see it?
[12:28:35] I can publish it. Though, just to check, do I need to be connected to some kind of VPN to access these endpoints? (I'm not currently)
[12:29:17] I don't immediately know. As I've always been running tf from something inside cloudVPS. I don't feel like you have to do that, but I could well be wrong
[12:29:41] Indeed it's a good test for me to run
[12:32:20] ooo! Maybe it doesn't
[12:32:28] Rook: this is my terraform: https://gist.github.com/procii/ac00e0c211e4c30fe2c54d1c9f4d6685
[12:32:34] I seem to be able to deploy other stuff, but so far I'm not seeing the cluster
[12:34:23] Rook: the issue here is that our current firewall rules block access to the magnum api from the general internet, unlike the rest of the openstack apis which are available for terraform use etc. it's technically a very easy fix in puppet, but I thought we agreed to fix the prometheus scraping issues before onboarding more magnum users
[12:34:41] proc: I think you found a problem with it. Could you open a ticket with the openstack-magnum tag describing what you've found?
[12:36:57] taavi: I didn't realize we had agreed to that, if so we should have delayed repair of T333874. regardless it should be fixed, and needs a ticket for such. There is a fair amount of pressure to have it deployable, though if you want to bring it up, I would encourage you to
[12:36:58] T333874: Permission error while trying to create magnum cluster - https://phabricator.wikimedia.org/T333874
[12:42:10] already tracked as T325466. the last time expanding magnum use came up on -admin (2023-06-15) I raised that as a blocker and there seemed to be a general agreement for that
[12:42:10] T325466: Exclude Magnum instances from metricsinfra monitoring - https://phabricator.wikimedia.org/T325466
[12:59:21] I think I might see our difference in view. My understanding is that we're looking to have magnum available, but not in horizon or announced, thus people who look around phabricator and the like can discover it for themselves and try it out, giving bug reports like above. Though lacking announcement and a tab in horizon it wouldn't be widely used. When we are comfortable with releasing it we would add it into production horizon and send
[12:59:22] an announcement. Though we are not there, and have no real timeline for that. In my mind that means it is a work in progress and only those sufficiently curious should try it (and report on bugs) being in some early release stage. Where when we add it to horizon it is publicly available and supported.
[13:12:48] hmm, I see. if the main issue was that projects would be getting fake alerts, then I agree that it'd be fine to just say it's an experimental thing that will have issues. however the main issue I have with magnum nodes being included in the prometheus config is that it totally ruins the data for infrastructure alerting purposes. for example I'd
[13:12:48] like to use prometheus to detect if a large number of VMs in separate projects start having connectivity issues (which usually indicates an infrastructure problem), but I can't do that since magnum regularly creates vms that appear to have no connectivity
[13:28:31] It sounds like you're near to getting the monitoring setup though a patched deb. I suspect if brought up as a problem it could get whatever attention it needs to get the monitoring setup. At which point there is no known problems that magnum would introduce in a supported offering. If there are any resources that would assist in getting the monitoring setup, I would be happy to encourage them to be sent your way
[14:39:06] Rook: I got the config generation scripts updated, and installed a patched deb to one of the prometheus hosts. that's now working fine and correctly filtering out magnum instances. I'll ask Filippo on monday to upload the patched deb to apt.wm.o, after that's done we can update the rest of the prometheus hosts and call it done
[14:40:28] oh, and I need someone to review and merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/936373/ and https://gerrit.wikimedia.org/r/c/operations/puppet/+/936376
[14:42:18] !log tools.lexeme-forms deployed 78711ad373 (l10n updates: ms)
[14:42:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.lexeme-forms/SAL