[08:10:33] https://phabricator.wikimedia.org/T389919
[08:53:12] taavi: thanks for quick actions
[09:05:06] arturo: please review https://github.com/toolforge/paws/pull/484
[09:13:18] taavi: did you deploy in tools? I don't see the cookbook run results in the MR
[09:13:53] dcaro: yes https://sal.toolforge.org/log/IqGDzJUBffdvpiTrabTW
[09:14:02] ack thanks
[09:19:37] taavi: I think you need to set me as reviewer? I don't have any review button
[09:20:16] hmmm you seem to be missing from the toolforge org
[09:21:01] arturo: you should have an invite now
[09:21:26] got it
[09:22:40] taavi: MR approved
[09:22:48] do we have a ticket to migrate all github repos to gitlab?
[09:23:11] no idea, but I was also about to make one if we do not
[09:23:18] please do
[09:23:51] T327057
[09:23:51] T327057: [toolforge] repositories move to gitlab - https://phabricator.wikimedia.org/T327057
[09:24:04] paws has T373896 which is sort of related
[09:24:04] T373896: Can gitlab build docker images? - https://phabricator.wikimedia.org/T373896
[09:24:06] T295754
[09:24:07] T295754: Investigate gitlab for PAWS - https://phabricator.wikimedia.org/T295754
[09:24:50] as far as I know, you can do whatever in gitlab runners
[09:25:00] including building docker images
[09:25:33] it was added after yep
[09:25:58] (not sure of the details though)
[09:27:53] optional review: https://gerrit.wikimedia.org/r/c/cloud/wmcs-cookbooks/+/1129305 otherwise I'll just merge
[10:29:36] dhinus: do you know if we have wikireplicas grafana panels that show usage, like how many connections there are, or similar?
[10:30:01] not that I know of, I actually have an open ticket about adding more stats
[10:30:20] we have some generic mysql stats
[10:30:41] let me find the link
[10:30:43] thanks
[10:31:44] https://grafana-rw.wikimedia.org/d/000000273/mysql
[10:32:26] if you select clouddb* hosts you get some rough numbers
[10:32:52] the task to gather more data is T381587
[10:32:53] T381587: [wikireplicas] Gather usage stats - https://phabricator.wikimedia.org/T381587
[10:41:03] thanks
[10:48:19] dhinus: clouddb1019 and 1020 are the entry point proxies?
[10:50:01] nope, the proxies are cloudlbs
[10:50:07] clouddbs are all actual databases, see https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Wiki_Replicas#Physical_database_server_layer
[10:50:14] oh, ok
[10:56:34] I created this to summarize ops/s
[10:56:34] https://grafana-rw.wikimedia.org/d/a688b60e-d3a8-478c-8c75-9b06c1ea32a9/wiki-replicas?orgId=1
[10:57:20] I assume ops means database queries?
[11:21:54] yes that is indeed the number of SQL queries, I double checked in the mysql exporter
[11:27:48] manuel told me it may be misleading, because a 6k rows query can be reported as 6k different ops
[11:29:22] hmm maybe because of row-based-replication. the value comes from the "Queries" values in "SHOW GLOBAL STATUS", but I'm not sure how that value behaves with replication
[11:30:10] https://mariadb.com/kb/en/server-status-variables/#queries
[11:31:04] I guess it will also include replication statements, not just user queries
[11:31:22] so it's not very useful because it's affected by the amount of writes to the prod db
[11:31:51] I see
[11:44:54] maybe we could enable this (currently disabled) https://mariadb.com/kb/en/user-statistics/
[11:45:20] which would also allow us to see the heaviest users
[11:45:36] but I'm not sure if enabling that has a performance impact
[11:47:01] maybe we could try to enable, and monitor it closely for a few days
[11:47:54] yep, and we can also enable it on one host only
[11:49:46] I slightly remember that it does have some impact, not sure if it will be enough to make it not worth having
[11:59:52] mmmm
[12:00:01] I detected a potential problem with the latest tofu-infra refactor
[12:00:24] resources from different deployments may have same state reference name
[12:00:33] example:
[12:00:46] does this `module.project["admin"].openstack_networking_network_v2.network["lan-flat-cloudinstances2b"]` mean the network from eqiad1 or codfw1dev?
[12:00:52] don't we store them in separate tfstate files?
[12:01:28] correct
[12:01:32] maybe, but the code is the same
[12:01:35] so look at this patch
[12:01:36] https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/157/diffs
[12:02:00] the move {} block is misleading and tofu cannot process it when running tofu plan for eqiad1 (which is what the cookbook does)
[12:03:21] I could merge anyway, then run tofu apply in codfw1dev only, then drop the move {} blocks
[12:03:33] but it doesn't feel very elegant
[12:03:52] I see the problem. maybe there's a way to selectively apply a move block (only when you run the plan for codfw)?
[12:05:19] another solution might be to wrap everything in two high-level modules, so you end up with something like module.deployment["codfw"].project["admin"], etc.
[12:05:54] yes, they are the 2 options I have in mind at the moment
[12:06:26] I think to unblock this one I'm fine with the merge anyway+delete... but we should fix it for future cases
[12:07:05] ok
[12:07:51] a third option might be something like https://github.com/antonbabenko/terraform-best-practices/tree/master/examples/medium-terraform
[12:08:31] which is kinda similar to the two high-level modules
[12:08:48] but you don't have any "main.tf" at all outside of those
[12:12:42] I don't have time at the moment for another refactor, so I think I'll just do the trick by hand
[12:14:59] sgtm, I was just brainstorming ideas for the future :)
[12:15:35] thanks :-)
[12:18:14] ok, I will be doing it now, will report after a few minutes how it went :-P
[12:22:01] done! success
[12:23:34] taavi: the new network names display way better in horizon
[12:23:36] https://usercontent.irccloud-cdn.com/file/PD0b6A9U/image.png
[12:24:09] can we hide wan-transport-codfw from the options for non-admins somehow?
[12:25:12] hopefully!
[12:27:09] I'll leave that for our horizon expert
[13:37:05] arturo: 'HA network tenant admin' has shared=False and doesn't show on that panel. wan-transport-codfw has shared=True; I suspect that's why it appears. Is there any reason I shouldn't change the ownership/access on transport-codfw to match the HA network?
[13:37:34] warning, the HA network is automatically created by neutron
[13:37:38] we don't manage that network
[13:38:02] regarding the `shared` settings
[13:38:21] it has consequences on how the shared virtual router can use the network
[13:38:40] basically, it means the network can be used by whatever tenant
[13:38:58] I assume if you set shared=False, then the shared router won't be able to use it
[13:39:08] but I may be misunderstanding the semantics of that setting
[13:39:16] possible. I don't know either, I'll read a bit
[13:39:30] but if every project needs /access/ to that network then it seems unlikely that Horizon will want to hide it
[13:39:48] VMs don't need access to that network
[13:39:52] Part of why I added support for a default network was to avoid confusion about that
[13:39:53] only the shared router
[13:40:10] does the shared router live in a particular project?
[13:40:15] admin
[13:40:41] ok! So we want that project to be scoped to just the admin project. That should certainly be possible...
[13:41:10] well, all shared networks belong to the admin project
[13:46:17] the neutron documentation on this topic is particularly unreadable
[13:47:20] I have been trying to understand it for years
[13:47:49] arturo: I switched that network to unshared and all the network tests passed (and it no longer appears in the UI)
[13:47:51] what else should I check?
[13:48:09] nothing else I guess!
[13:48:25] does creating a new VM still work?
[13:48:31] good question :) I'll try
[13:56:29] yes, VM creation still works
[13:57:33] arturo: want me to do the same with wan-transport-eqiad?
[13:57:59] sure
[13:58:05] you are doing it via tofu-infra, right?
[13:58:24] no but I can
[13:58:42] otherwise whatever setup will be reverted in the next run
[13:58:55] ok
[14:04:09] arturo: https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/161
[14:05:54] andrewbogott: -1
[14:06:23] oops, reading fail
[14:08:29] should be better now, I hope
[14:10:42] yeah
[14:11:33] I'm curious to see how tofu will report the changes
[14:11:52] I will let you apply when you feel like it. there's certainly nothing urgent about the change.
[14:12:02] please you do :-)
[14:12:10] ok!
[14:12:11] tangent: do you have thoughts about the new network panels in codfw1dev? I'm pretty sure they don't allow normal users to do anything dangerous but because I'm usually logged in as admin some of it looks very scary :)
[14:12:36] can you remind me the url / how to navigate to them?
[14:13:27] tofu plan seems normal:
[14:13:30] https://www.irccloud.com/pastebin/mZN3wCZY/
[14:13:38] want me to look for anything else before I confirm
[14:13:39] ?
[14:14:01] andrewbogott: you may want to use the cookbook
[14:14:04] New panels are just under the 'Network' tab in labtesthorizon
[14:14:08] there is a cookbook for this workflow
[14:14:10] Yeah, I am
[14:14:24] but it still prompts for changes
[14:14:55] right, then what is confusing me is that the plan was not run before merging the change
[14:15:15] the workflow is usually MR -> plan -> merge -> apply
[14:15:37] does that mean I should've run the cookbook before merging?
[14:15:53] yes, there are 2 cookbook runs involved, one before merge, other after merge
[14:16:09] oh! I did not know that at all. I only knew about the 'apply' cookbook after merge
[14:16:12] but anyway, the patch has been merged, so I guess you can go ahead and confirm the changes
[14:16:24] what is the pre-merge cookbook?
[14:16:41] it runs `plan` for the MR
[14:16:51] so you can see the actions of tofu before merging the code
[14:17:11] I see, that makes sense.
[14:17:20] Is that something gitlab could/should do in the CI pipeline?
[14:17:55] yes: T370652
[14:17:56] T370652: tofu-infra: introduce additional gitlab-ci automation - https://phabricator.wikimedia.org/T370652
[14:21:00] andrewbogott: regarding the horizon panels, there are a couple of 'delete' buttons that I don't like. I assume they are there because I'm admin
[14:21:29] https://usercontent.irccloud-cdn.com/file/QilBzAWY/image.png
[14:21:53] I doubt you can delete a subnet if it has ports attached to it anyway
[14:22:06] yep, that's the scary bit!
[14:22:16] pair that with the fact that this is all defined in tofu-infra
[14:22:24] Of course even if horizon showed you that, the neutron policies would prevent it.
[14:22:38] When I use my mortal user account the buttons aren't there
[14:22:45] ok
[14:22:56] anyway, subnets being busy, and tofu-infra combination, it seems difficult to fat-finger into a mess
[14:23:16] https://usercontent.irccloud-cdn.com/file/JoxmZFqH/Screenshot%202025-03-25%20at%209.22.48%E2%80%AFAM.png
[14:23:32] ^ that's for someone without admin role
[14:23:58] look in the network topology section
[14:24:26] https://usercontent.irccloud-cdn.com/file/AQ54fgkb/image.png
[14:24:52] wow, I wish the horizon people wouldn't duplicate their action buttons all over the place
[14:25:28] and, dammit, the 'delete' button appears there for a non-admin. Sloppy.
[14:25:36] Guess I need to make an upstream patch.
[14:25:46] :-(
[14:26:19] * arturo food time
[16:17:13] task about tofu-infra state backup: T389964
[16:17:14] T389964: tofu-infra: implement some state backup mechanism - https://phabricator.wikimedia.org/T389964
[16:17:20] thanks!
[16:17:31] +1
[18:35:17] * dcaro off
[18:41:03] btw. dhinus thanks for attending the k8s sig!
[18:45:52] np!
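The following is a minimal Python sketch for the wikireplicas metrics thread (10:56 to 11:49). It assumes the rows have already been fetched from MariaDB with any DB-API driver, so no database access happens here.

The sketch does two things:
- It derives a per-second rate from two samples of the cumulative Queries counter in SHOW GLOBAL STATUS. As noted at 11:31:04, that counter probably includes replicated statements, not just user queries.
- It ranks users from information_schema.USER_STATISTICS, which only has data once the userstat variable is enabled.

The function names are hypothetical. The column names (USER, BUSY_TIME) are assumptions that should be checked against the MariaDB user-statistics page linked at 11:44:54.

```python
"""Hypothetical helpers for inspecting wikireplicas usage (illustrative sketch).

Assumes rows were already fetched from MariaDB; nothing here talks to a server.
"""
from typing import Any, Dict, Iterable, List, Mapping, Tuple


def status_to_counters(rows: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Convert (Variable_name, Value) rows from SHOW GLOBAL STATUS into ints.

    Non-numeric values (e.g. 'ON', 'OFF', empty strings) are skipped.
    """
    counters: Dict[str, int] = {}
    for name, value in rows:
        try:
            counters[name] = int(value)
        except ValueError:
            continue
    return counters


def counter_rate(before: Mapping[str, int], after: Mapping[str, int],
                 interval_s: float, name: str = "Queries") -> float:
    """Per-second rate of a cumulative status counter between two samples.

    Caveat from the discussion: Queries likely also counts statements applied
    by replication, so this rate tracks production write volume as well.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta = after[name] - before[name]
    if delta < 0:
        # Status counters reset on server restart; this sample pair is unusable.
        raise ValueError(f"{name} decreased; was the server restarted?")
    return delta / interval_s


def heaviest_users(rows: Iterable[Mapping[str, Any]], key: str = "BUSY_TIME",
                   top: int = 5) -> List[Tuple[str, Any]]:
    """Rank rows shaped like information_schema.USER_STATISTICS by one column.

    The default column (BUSY_TIME) is an assumption; ROWS_READ is another option.
    """
    ranked = sorted(rows, key=lambda r: r[key], reverse=True)
    return [(r["USER"], r[key]) for r in ranked[:top]]
```

Keeping the database calls out of these helpers makes them easy to test, and the same code could be fed from a single host first, in line with the suggestion at 11:47:54 to enable userstat on one host only.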