[07:37:16] morning
[07:59:21] morning
[08:40:29] o/
[08:41:05] I'm deploying cleanup changes to tools and toolsbeta related to the PSP deprecation project. No impact expected/intended
[08:41:32] ack
[08:42:36] arturo: can you please take care of updating https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes#Pod_isolation_(PodSecurityPolicy)?
[08:43:49] sure
[08:52:14] can I close T362443 and its subtasks or does someone still want to learn something?
[08:52:15] T362443: Learn how to do what Taavi does - https://phabricator.wikimedia.org/T362443
[09:05:43] * arturo refreshes a few wikitech pages about PSP
[09:10:01] tf-infra-test breaks because the Trove instance flavors have changed: "Flavor 248 could not be found"
[09:10:53] if I create a Trove instance from Horizon I see a lot of flavors (including some g3 and g2)
[09:11:38] do we already have a task to clean up that list?
[09:12:20] I bet we don't, you may be the first to discover this problem
[09:12:36] dhinus: most likely you see those because you're an admin
[09:12:46] taavi: ah that makes sense
[09:12:50] https://openstack-browser.toolforge.org/ has a list of flavors that most people see
[09:13:13] that one looks good
[09:13:27] I will fix the flavor used by tf-infra-test
[09:20:35] https://github.com/toolforge/tf-infra-test/pull/13/files
[09:21:27] +1
[09:21:35] blancadesal: I assigned https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/346 to you
[09:21:56] arturo: 👍
[09:31:28] taavi: if you are not finding anything to do today, the CI seems broken for toolforge-weld https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/47
[09:33:55] arturo: huh. let me have a look
[09:36:39] thanks
[09:37:21] https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/48
[09:38:26] taavi: LGTM, that was fast
[09:38:32] hmm "Flavor b204f489-f1a5-4d91-9a11-af1ae8b66bea is not supported for datastore version 379d8765-8503-4286-a046-0a4c8f8f745a"
[09:38:52] it did work for mysql and postgres, but not mariadb
[09:46:28] hmmm I've never seen that before
[09:52:01] I can reproduce it from the CLI
[09:52:08] arturo: done
[09:52:15] blancadesal: thanks!
[09:52:50] uh https://wikitech.wikimedia.org/wiki/Help:Trove_database_user_guide#Flavor_is_not_supported_for_datastore_version_(HTTP_400)
[09:53:36] file under: searching for the answer on google, and the third result is wikitech!
[09:57:44] https://github.com/toolforge/tf-infra-test/pull/14
[09:58:33] ship it
[09:59:52] * dhinus merges and relaunches tf-infra-test
[10:22:41] dcaro: friendly reminder about the cephmon racking task
[10:23:07] oh you already did, sorry
[10:23:25] please ignore
[10:31:45] toolforge ci/cd is failing for the wmcs-k8s-metrics repo :-(
[10:31:45] https://gitlab.wikimedia.org/repos/cloud/toolforge/wmcs-k8s-metrics/-/jobs/297020
[10:54:20] I believe this is the fix: https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/365
[11:16:58] dhinus: can I assign T367725 to you? I realized I won't be around in two weeks to delete them
[11:16:58] T367725: Get rid of cloud-cumin VMs in cloudinfra project - https://phabricator.wikimedia.org/T367725
[11:23:07] has anyone used the `/srv/ops/replica_cnf_web/functional_tests` recently? They are broken for tool accounts
[11:23:13] https://www.irccloud.com/pastebin/inmIQx1L/
[11:28:33] blancadesal: not me :-(
[12:02:01] taavi: yes I can do it!
[12:06:52] hmm trove is still not happy with the 2GB flavor: "Flavor 41f0ea41-75ca-44a6-be66-bb56b2a90721 is not supported for datastore version 379d8765-8503-4286-a046-0a4c8f8f745a"
[12:08:45] I think it might require 2 cores as well, which seems silly but matches what I found here https://wiki.openstack.org/wiki/TroveFlavorsPerDatastore#Why_does_Trove_requires_filtering_flavors_by_datastore_.3F
[12:20:33] the filter seems to be based on the mysql table "datastore_version_metadata"
[12:20:42] I'm not finding a way to modify that table from the CLI though
[12:22:10] it's only listing associations for the mariadb datastore, so the other datastores don't have any filter
[12:22:50] I think truncating that table might be the easiest fix, unless mariadb is really unable to start with less than 2GB of ram
[12:23:02] I'll wait for a.ndrew's opinion
[12:25:12] relevant comment from the trove source code: https://opendev.org/openstack/trove/src/commit/e19a7d4e50edc399b83d57e905a34f96fafcad1d/trove/datastore/models.py#L762-L764
[12:27:57] this is a good example of tf-infra-test finding a genuine issue: it's currently impossible to create trove mariadb instances using /any/ flavors, because the only allowed ones are old ones
[12:28:09] I'll create a task
[12:30:57] dhinus: :-) :-(
[12:34:50] T368725
[12:37:30] where's stashbot? :looks:
[12:40:32] it's back! T368725
[12:40:33] T368725: [trove] cannot create mariadb instances - https://phabricator.wikimedia.org/T368725
[12:42:22] is there something wrong with gitlab pipelines today? https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/jobs/297110
[12:45:14] blancadesal: just passing by, that error in the tests sounds like you are hitting the wrong version of envvars-api (you can check the journalctl logs of the replica of service), and/or the logs on the envvars-api side
[12:46:09] The ci error looks like ci infra related, maybe ask in the gitlab channel
[12:46:56] dcaro: thanks :) yeah, I figured it's gitlab related. There was some kind of update a few hours ago
[12:47:23] blancadesal: I just hit the 'retry' button
[12:48:08] dcaro: yup, I'm going through the logs and it's def related to envvars-api
[12:49:12] is the toolsbeta test tool meant to have the TOOL_TOOLSDB_USER envvar?
[12:57:11] (disregard, I think I'm just confused. back to reading source code)
[13:22:55] andrewbogott: when you have a sec can you look at T368725?
[13:22:56] T368725: [trove] cannot create mariadb instances - https://phabricator.wikimedia.org/T368725
[13:25:46] dhinus: let's try truncating the table and see if it still requires 2G of ram
[13:25:57] but also I feel like there's a commandline way to do this...
[13:28:51] I didn't do an extensive search, but I couldn't find one
[13:29:07] I could find methods in the code...
[13:29:28] bah
[13:29:39] if you're already in the db you should just truncate
[13:30:06] I'll do it :)
[13:32:01] done. creating an instance with 1gb now
[13:35:08] bah, I forgot some --params I will use horizon instead
[13:39:36] it's working, I can connect and create tables
[13:41:03] ok then! Whatever memory hogging issue it had must be fixed.
[13:47:25] I resolved the issue and updated the docs, thanks for looking into it!
[13:48:18] can you please +1 this revert? https://github.com/toolforge/tf-infra-test/pull/15
[14:02:30] re-running tf-infra-test
[14:25:29] aaand, it passed!
[14:28:09] nice
[15:42:04] * arturo offline
[21:35:31] bd808: just in case you know the answer to this... do we 1) run our toolforge redis in unprotected mode, or do we 2) set a password manually, or is this 3) a new concept with the latest redis that isn't addressed in the current cluster?
[21:35:52] (this question brought to you by me having just built a new node and discovering that it will only talk to localhost)
[21:42:49] andrewbogott: I'm pretty sure "unprotected mode"
[21:43:01] there is no password needed to connect
[21:43:12] ok. I think that can be changed at runtime, I'll see if I can figure that out
[21:43:17] I thought Puppet set that config...
[21:44:08] seems not
[21:45:00] https://gitlab.wikimedia.org/toolforge-repos/containers-redis/-/blob/main/src/containers/redis/templates/redis.conf?ref_type=heads#L2 is what I put in the new container thing
[21:45:23] oh, wait, it's set in sentinel.conf but not in redis.conf
[21:46:58] is the new container thing intended to replace our existing redis cluster? Am I wasting my time upgrading toolsbeta redis with the old puppet classes?
[21:47:56] D.avid slapped a big "not maintained by WMCS" disclaimer on the redis container docs, so I would say not everyone thinks it is a replacement ;)
[21:48:18] hmph
[21:48:27] ok, well, we'll see how far I get via the lazy route here
[21:48:33] * andrewbogott just hoping to kill off a few more buster VMs
[21:48:39] and in practice it is only a partial replacement because it doesn't have durable storage
[21:49:57] andrewbogott: the tools redis instances are bullseye
[21:50:05] https://openstack-browser.toolforge.org/server/tools-redis-5.tools.eqiad1.wikimedia.cloud
[21:50:05] toolsbeta is buster
[21:50:22] I'm just assuming the roles work there
[21:50:40] bullseye vs bookworm
[21:51:36] All I know so far is...
[21:51:39] https://www.irccloud.com/pastebin/sSiqASL3/
[21:51:44] which does not happen for the older nodes
[21:52:20] But I expected CONFIG SET to work via redis-cli which it does not
[21:53:32] :nod: and that new redis-4 node is bookworm which is newer than the tools redis nodes.
[21:54:10] oddly redis-cli prompts me with an autocomplete for CONFIG SET and then redis responds with ERR unknown command 'CONFIG', with args beginning with: 'SET'
[21:54:15] <3
[21:54:51] if you were looking for a quick fix I would match the tools instances rather than jumping up to bookworm which is I assume a newer redis version with some config changes
[21:55:05] that's a good point
[21:55:27] I will do that once I stop raging against redis rejecting the very command that it suggests (and that appears in the help as well)
[21:56:39] heh
[21:56:50] computers == worst
[22:05:42] yep, I guess that mode isn't the default in bullseye
[22:05:57] so this will be left as an exercise for future Andrew
[23:31:32] What should we do about T367555? Is there a Striker deploy in toolsbeta or codfw that takes care of the need for a pre-production deploy test? Should I continue to pretend to have time to keep the project alive or should WMCS take it over?
[23:31:33] T367555: Cloud VPS "striker" project Buster deprecation - https://phabricator.wikimedia.org/T367555
[23:32:29] * bd808 will look for responses in backscroll later