[07:33:20] greetings
[07:45:15] FYI I was chatting with I/F and they can probably nuke the whole maps project on cloud VPS, that would free 4 instances, 18 vCPU, 36GB RAM, 4 volumes for a total of 13TB, stay tuned :)
[07:45:43] neat
[07:53:07] morning!
[08:14:43] Hey :) komla I suppose the email for the bullseye servers is automated, but just for accuracy's sake (I can also respond to the email if it's more convenient for you to keep track), I'm not an admin of deployment-prep in anything but a db line somewhere :D
[08:16:50] claime: that's based on who has admin rights on the cloud vps project
[08:17:03] volans: I figured
[08:17:11] and for beta it's a lot of people :D
[08:17:26] ok so everyone who's an admin for deployment-prep gets one then
[08:17:30] Just wanted to make sure :P
[08:17:33] keep you on your toes
[08:17:47] collective you, that is
[08:18:19] yes, you weren't singled out or made benevolent dictator of deployment-prep overnight
[08:18:30] don't worry :D
[08:19:02] I mean stranger things have happened
[08:19:52] lol
[08:27:40] FYI, I'll disable the Debian mirror on mirrors.wikimedia.org later, all puppetised Cloud VPS instances should have moved to deb.debian.org, but if there's any manual hacks in VMs, these will fail to "apt-get update"
[08:27:57] claime: hmm, though might be interesting to drop deployment-prep ownership... I mean, give you the opportunity to have a greater organization impact ;)
[08:29:09] dcaro: I love you but also please don't talk to me or my son ever again
[08:29:17] hahahaha
[08:29:24] <3
[10:32:49] * dcaro lunch
[11:05:02] anyone for a quick sanity check of https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/318 ?
[11:05:09] first project deletion for me
[11:05:09] the abovementioned mirror shutdown is now live
[12:15:01] volans: lgtm!
[12:27:51] thx
[13:00:51] volans: isn't the maps project serving tiles for external projects? Or were you talking about a different 'maps'?
[13:01:26] I'm double checking the actual activities on the hosts, and from my quick look it seems somehow used
[13:01:33] I was about to comment on task
[13:02:07] I think dj was involved most recently, maybe ping him on the task as well.
[13:02:30] yeah I checked the list of admins and it's loong
[13:03:34] also, regarding project deletion I'm not 100% sure that tofu clears up things /in/ a project when deleting the project. dcaro do you remember? Don't you still have to manually delete dns domains and instances &c?
[13:05:49] for the one I already did I checked on horizon, the only thing left was the default security group with the default security rules
[13:05:58] before running the cookbook
[13:06:34] oh great :)
[13:07:37] andrewbogott: it seems we have another user that was not added to the project-bastion, what's the procedure? (got pinged in slack)
[13:08:13] ah, found it https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Users_not_in_bastion_project nice
[13:08:34] hm, seems to be happening more and more :/
[13:08:42] but yeah, you can just add them manually.
[13:09:09] I'd want to dig in the logs and see what happened except https://phabricator.wikimedia.org/T421911 which in theory I am working on this week
[13:09:29] 🤦‍♂️
[13:09:44] so I'll go ahead and unblock the user right?
[13:10:04] yep
[13:15:21] Hmmm, I think the user was already in the bastion project
[13:16:48] maybe not, the ldap group just appeared 👍
[13:21:49] did anyone already make a ticket for "Power Supply - Status - issue on cloudbackup2003:9290" ?
[13:22:49] andrewbogott: maintenance on mr router, see -operations
[13:23:46] causes a power-supply warning? oblique
[13:23:51] I meant switch ( msw1-codfw)
[13:24:11] all the ps1 lost connection
[13:24:29] so I can expect some follow up from that as unknown power status or similar
[13:24:36] if it doesn't recover soon then let's look at it
[13:24:39] aaah ok, makes sense
[13:24:43] thx
[13:26:57] since I'm looking at the alerts...
I am hoping based on the recent 'tools-platform owns PAWS' discussion that someone other than me will look at the nfs capacity warning :)
[13:33:13] ...and... the tofu-infra-test failure is a mangum thing which I am fixing now
[13:33:17] *magnum
[13:40:49] andrewbogott: re: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1302748 I'm not sure how to proceed, is the tracking task enough to go forward with the patch ?
[13:41:16] andrewbogott: we have the tools-platform refinement meeting in 20 mins... :) cc aputhin
[13:42:39] godog: I think I meant to comment but not -1. If you're interested in doing the tofu work now we can start on that but I'm also fine with just tracking it in phab.
[13:43:16] andrewbogott: ack ok! thank you, yeah I'm going to push for icinga deprecation first
[13:43:24] works for me!
[13:44:12] cheers, similarly I'm seeking reviewers for https://gerrit.wikimedia.org/r/q/topic:%22bug/T328502%22+is:open
[13:45:54] hm, too bad we don't have ta.avi for followup
[13:46:15] indeed
[13:47:05] that's actually not urgent per se, it is more me fretting about it
[13:47:46] it == burn icinga to the ground
[13:49:35] andrewbogott: while we're on the subject, what's your take on wikitech-static and its icinga monitoring ? https://phabricator.wikimedia.org/T362397
[13:50:39] I think we should continue to monitor it but I'm not totally sure how or from where.
[13:51:00] Do we have any tools to monitor website content other than prometheus/alertmanager?
[13:52:01] internal not afaik
[13:52:11] claime: it is partially automated. The email went out to those listed as project admins. Yes, you can respond or create a Phab ticket.
[13:52:37] andrewbogott: ok I'll draft up a plan and update the task
[13:53:07] I remain hopeful that someday collaboration services owns wikitech-static but they have pushed back on the last couple of attempts.
[13:53:55] I'll put a nice bow on it with refreshed monitoring
[13:54:03] might help!
[13:54:17] the problem is not who owns wikitech static, but who owns wikitech
[13:54:24] from there the -static one is obvious ;)
[13:55:16] mmmmmaybe
[13:58:35] still andrew? xd
[14:05:27] wikitech is 'just a wiki' now so in theory the same folks who own en or office or meta or whatever.
[14:33:13] dhinus: have you ever seen the tofu infra tests say "Failed to create trustee or trust for Cluster"?
[14:45:10] andrewbogott: not that I remember...
[14:45:18] ok
[14:47:17] I suspect some quota filling up related to auth
[14:56:30] oooooh it's one of the recent CVE patches. I wonder if they just broke magnum/heat entirely?
[15:42:58] yep, that was it! https://gerrit.wikimedia.org/r/c/operations/puppet/+/1303476
[15:43:24] * andrewbogott parachuted into the middle of a keystone developers standup and extracted the magic flag
[15:51:39] xd
[16:02:20] oneliner review if anyone around: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1303479
[16:04:36] yep you got it
[16:04:40] thx
[17:13:42] re: wikitech-static - i feel the onus is on us to address and stabilize the current issues that keep reappearing (e.g. / full) before we can enter a conversation about handoff to anyone else.
[17:41:22] * dcaro off, cya tomorrow!
[21:41:58] andrewbogott: neat. upstream merged my patch to fix the bool labels https://github.com/vexxhost/magnum-cluster-api/pull/1064
[21:42:41] nice!