[01:10:16] * bd808 off
[08:26:40] morning
[08:29:29] o/
[08:52:16] morning
[09:00:00] o/
[09:16:30] would it be useful to have, lets say, a script or a manifest of some kind to run against toolforge CLIs to make sure everything works as expected?
[09:17:06] I'm testing k8s 1.24 in lima-kilo, but exercising by hand the different code paths for the different components is very tedious
[09:18:20] anyway, in this case, we know the 1.24 upgrade is fairly simple. So maybe a problem that doesn't need solving at this time
[10:13:09] this would be the best https://phabricator.wikimedia.org/T357977
[10:13:36] and reuse it in prod
[10:48:05] 👍
[10:48:13] * arturo errand, back later
[12:28:21] cloudnet2007-dev neutron netns alert is me, sorry about that
[13:11:43] ack
[13:12:39] could you please review https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/65
[13:27:41] LGTM
[13:28:46] the PuppetConstantChange alert in cloudweb2002-dev is puppet trying (and failing) to start "striker"
[13:29:02] dhinus: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1009350
[13:29:17] taavi: thanks :)
[13:29:27] I'll ack the alert and link there
[13:32:35] dcaro: thanks!
[15:03:05] is this page still up to date? https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Managing_package_upgrades
[15:04:00] it's listed in the "clinic duties" activities, but I wonder if it's something we still want to do
[15:05:34] dhinus: Most things are handled by unattended upgrades, I would guess that that page is there to troubleshoot when security upgrades don't get applied for some reason.
[15:06:21] "Description of the intended workflow and steps for keeping systems updated" is confusing
[15:06:43] it mentions using clush that I think has been replaced with cumin?
[15:07:54] yep
[15:08:09] (as in it was replaced, we had some clush setup for longer, but it's gone afaik)
[15:08:09] arturo wrote most of that page, so he might have more context
[15:08:52] ubuntu was also replaced with debian
[15:10:13] I'm tempted to just archive that whole page as "obsolete" and write a new one
[15:10:40] sounds good to me
[15:12:38] yeah, that seems fine if you're in the mood for writing
[15:13:33] maybe the first section Channels of updates can actually stay
[15:15:26] "Workflow" seems out of date, is there anything in there that is important to retain? maybe something about cluster-wide apt upgrades with Cumin?
[15:16:16] dhinus: i have a vague memory that all of toolforge had unattended upgrades disabled at some point. we re-enabled after I pointed out no-one was ever manually upgrading anything
[15:17:24] taavi: that makes more sense, maybe that's when the idea of running upgrade as part of clinic duty came about
[15:18:27] I will remove the mention of that page from the clinic duties page, I don't think we need to check weekly if there are pending upgrades
[15:19:12] 👍
[15:32:08] I added an "outdated" banner to the "workflow" section with the clush commands. the first time we have to do a cluster-wide apt upgrade we can document the current workflow.
[15:32:16] dhinus: yes, I think that page was relevant in 2018
[15:32:41] arturo: thanks for confirming!
[15:34:38] I think we can archive the whole page
[15:37:27] please approve: https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/109
[15:37:41] arturo: is "archiving" simply adding the "archive" template to the top and adding the "Archive" category?
[15:38:39] Sometimes we move pages to the Obsolete: namespace as well
[15:38:42] dhinus: mmm I don't think I know how to do it
[15:39:09] that would be nice, I think, the obsolete namespace
[15:39:26] and we probably have a bunch of pages that need to go there too
[15:39:28] If you do that move you should leave a redirect behind. Timo rightly gets grumpy if you don't
[15:43:51] should I keep the full name "Portal:Cloud VPS/Admin/Managing package upgrades" in the Obsolete: namespace?
[15:44:10] (that's the default when I use the "Move page" feature)
[15:44:37] (but it looks odd as it keeps the old namespace in there)
[15:44:38] yeah, because "Portal:" is actually a false namespace
[15:44:43] ah I see!
[15:44:57] it's just a title convention and not a proper namespace
[15:45:02] we should fix that to be a real namespace someday
[15:45:05] gotcha
[15:45:51] page moved, leaving a redirect behind. thanks all for your help :)
[16:00:25] taavi: I tried in the first place and was told to just stop. There's a phab task somewhere
[16:01:42] T123427
[16:01:43] T123427: Create Portal namespace on wikitech to give a place for audience specific landing pages - https://phabricator.wikimedia.org/T123427
[16:02:44] T123425 is a trip down memory lane
[16:02:44] T123425: [EPIC] Make wikitech more friendly for the multiple audiences it supports - https://phabricator.wikimedia.org/T123425
[16:58:00] so continuing the eranbot discussion a bit.. I am 100% prepared to be in the minority here but I don't consider it fully unreasonable to break the tool if we get no response at all from MA in two weeks
[16:59:06] taavi: the "evil WMF breaks things for enwiki" discussion is what I would like to see avoided.
[16:59:11] taavi: this is hair-splitting but I think you're right that it wouldn't be /unreasonable/ for us to break it, since we gave plenty of notice.
[16:59:20] But I do think it would be unnecessary.
[17:04:24] andrewbogott: I would like your input on T359412
[17:04:24] T359412: [trove] wrong quota_usages values in project tf-infra-test - https://phabricator.wikimedia.org/T359412
[17:05:41] dhinus: if you aren't interested in diving into the trove code then yeah, zero-ing out the db values is what I'd do.
[17:06:01] thanks, I just wanted to make sure I wasn't making things worse by doing it
[17:06:46] * arturo offline
[17:15:29] hmm "UPDATE command denied to user 'galera_backup'@'localhost'"
[17:16:01] add `-u root` to your `mariadb` command
[17:16:17] thanks
[17:40:01] * dcaro off
[17:40:04] cya tomorrow
[17:41:52] * dhinus off
[21:08:55] Rook: is paws-puppetmaster-2 still meaningful or can it be removed? Its current clients are paws-nfs-1.paws and bastion.paws
[21:09:44] oh, and it doesn't seem to have any local changes at all! huh
[21:10:11] same question for the puppetmaster in quarry
[21:13:39] paws should have some secrets used for the replica.cnf api on the nfs server?
[21:16:09] you're right, it does have local secrets
[21:16:17] OK, so I'll plan to migrate that one soon
[21:16:23] Maybe I'll do metricsinfra first though
[21:17:10] oh no, andrewbogott is on puppet upgrade duty again. I'm sorry this is always your job, but thank you for being excellent at grinding this stuff
[21:17:44] I kind of like it, it's nice to work on something that I can tell is done when it's done
[21:18:30] taavi: looks like you created metricsinfra-puppet-2 but never did anything further, is that right? Or is that project partially migrated like project-proxy
[21:18:31] ?
[21:19:36] I wonder if the secrets for nfs can be included in the ansible for paws
[21:19:56] andrewbogott: I honestly don't remember. that might have been a testbed for the data migration cookbook I was working on at some point
[21:20:12] ok -- want me to delete it or leave it?
[21:20:43] up to you, whichever is easiest for actually migrating the clients over
[21:21:00] ok
[21:21:18] I'm going to delete it in favor of something with 'puppetserver' in the name
[21:26:14] taavi: is there a part of the growing set of Striker fixes you are speed running that you would like my help reviewing or testing at this point? Much <3 for both the work and the acceptance of soft leadership there :)
[21:28:13] bd808: if you could test https://gerrit.wikimedia.org/r/c/labs/striker/+/1009232 that would be great. the keystone container in my env is segfaulting which blocks me from testing it myself and so far I haven't figured out why that is.
[21:29:39] I will give it a shot. And then be very humbled that the fix was this straight forward... ;)