[08:29:55] dhinus: good morning. I would like to merge this to re-arrange the imports. https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/25 it should help with the next patch
[08:34:48] morning!
[08:34:52] let me have a look :)
[08:38:58] thanks
[08:42:01] arturo: makes a lot of sense to me, approved
[08:42:17] thanks, merging
[08:42:29] I'll rebase the next one and send your way. That one will be bigger
[08:53:13] dhinus: https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/23 plan in the latest comment
[09:07:58] looking
[09:14:55] I'm not understanding why 2 resources (the port_v2 ones) are modified and not imported... I don't see them in the previous imports, so why are they already existing?
[09:15:12] because I added a description
[09:15:31] and also because I'm assigning a fixed IP, which was otherwise just computed before
[09:19:47] I think they will be imported+modified
[09:19:58] that is how I interpret the plan
[09:20:07] # module.ports.openstack_networking_port_v2.port["cloudinstances3-flat-gw-test"] will be updated in-place
[09:20:07] # (imported from "286aa59e-b288-46c0-a5ea-ee3c5de5af26")
[09:25:39] "updated in-place" usually means they are already in the state
[09:25:52] and I don't understand how they ended up in the state :)
[09:26:14] they are not at the moment. They are added with the patch
[09:26:19] they are imported, then updated
[09:27:10] ok, imported+modified in the same patch!
[09:28:12] yep I think you're right, that's the "imported from" line underneath the "updated in-place"
[09:28:55] I could avoid any modification at all, not setting the name and description
[09:29:14] I could do that in a follow up patch
[09:29:17] no that's fine
[09:29:34] I just was not understanding the plan output, because I never did an import+modify in place
[09:29:43] so I was not used to this output, but I think it's good
[09:30:02] so "cloudinstances2b-gw-lan-flat" is a name you added, and that port is currently unnamed?
[09:30:08] in openstack
[09:30:27] yeah, all the ports are mostly unnamed in the openstack db
[09:30:36] ok all clear!
[09:30:37] because they are usually created automatically
[09:30:46] approved!
[09:30:57] thanks, merging
[09:31:32] yeah it's a classic problem also in AWS, when you create things with CLI or GUI some resources are created automatically, but in tofu you have to specify them all
[09:31:43] which is generally good because you understand better what's going on
[09:31:47] and how many things are actually created
[09:31:59] yeah
[11:46:57] dhinus: I'm seeing this on clouddb1013 logs:
[11:46:59] https://www.irccloud.com/pastebin/knKe9uJQ/
[11:49:00] weird. I haven't touched that body, will have a look
[11:49:27] thanks
[11:52:24] dhinus: what I'm researching is a user report of not being able to connect to either toolsdb or the replicas
[11:52:45] the replica.my.cnf file has been created already
[11:54:07] and maintain-dbusers reports this in the logs
[11:54:09] https://www.irccloud.com/pastebin/GOwT5vQQ/
[11:54:15] I'll open a phab ticket
[11:56:52] thanks
[11:59:03] T371011
[11:59:04] T371011: toolforge: some accounts don't have access even though maintain-dbusers claims it has created everything - https://phabricator.wikimedia.org/T371011
[12:01:17] dcaro: are you available to investigate this?
[12:02:13] hmf I pasted a (hashed) password inadvertently to phab
[12:02:28] we should rotate it for user s56035
[12:02:42] but I see the user and grants do exist in toolsdb
[12:02:55] let me see if I can connect
[12:03:26] I tried this command:
[12:03:27] sql -v local
[12:03:37] from within the tool account
[12:03:39] arturo: let me check it out
[12:03:48] (lexica-tool)
[12:03:49] dcaro: thanks
[12:06:33] the credentials file was created at 11:58 on the 24th
[12:07:07] Jul 24 11:58:34 cloudcontrol1005 maintain-dbusers[1353413]: ERROR [root.inner:160] Request to create replica.my.cnf file for account_type tool and account_id tools.lexica-tool failed without response.
[12:07:08] dhinus: do you know how to rotate the password?
[12:07:17] I can change it manually
[12:07:26] there's a few other tools that failed at that moment too
[12:07:33] but that won't update maintain-db-users
[12:07:45] I guess maintain-db-users has a database of passwords?
[12:07:50] or how does it work?
[12:08:10] I'll try changing the pwd manually for that user, and verify that I can connect with the new pwd
[12:08:14] dcaro: so, maybe it failed the first time, but already had generated the password. Then the next time, created the account in the DBs with a different password?
[12:08:17] then we can figure out the maintain-db part
[12:08:28] arturo: I think so yes
[12:08:52] ok, I think that sounds possible
[12:09:23] dhinus: yep, maintain-dbusers has its own database
[12:09:47] you can remove the replica.conf file and it will be rotated iirc
[12:10:24] there's a lot of logs about one tool not having a home dir, Jul 24 18:58:21 cloudcontrol1005 maintain-dbusers[1371668]: ToolforgeUserFileBackend:Skipping account daxserver: Home directory (/srv/tools/home/daxserver) does not exist yet
[12:10:28] since yesterday
[12:11:17] I don't know what that really means, about the homedir. That feels like some kind of partial account.
I checked maintain-kubeusers already, and it is working just fine
[12:11:45] it's a user
[12:11:53] pwd changed in toolsdb, and I can connect
[12:12:01] so I guess the pwd in replica.my.cnf is not correct
[12:12:20] dhinus: thanks, I think that confirms the earlier theory
[12:12:25] I'm recreating the pw for lexica-tool
[12:13:18] arturo: for users iirc the home is created when the user first logs in
[12:13:55] yeah, makes sense
[12:14:02] by PAM mkhomedir
[12:14:04] in the bastion
[12:15:32] are the docs here up-to-date? https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin#Regenerate_replica.my.cnf
[12:15:41] about regenerating the replica.my.cnf file
[12:15:44] I'm checking the list of tools that failed at the same time as lexica-tool
[12:16:17] arturo: I think so yes
[12:18:11] ok, shall I try it for lexeme-tool?
[12:18:16] sorry, lexica-tool
[12:19:06] the user reports it is working now, so I'm not touching it again
[12:21:11] I think it might be working on toolsdb because I fixed it manually
[12:21:32] but it will not work in wikireplicas, and the pwd in the db for maintain-db-user is also out of adte
[12:21:35] *date
[12:21:51] ok, then let me run it
[12:22:04] yeah let's do the full "regenerate" as described in the wiki
[12:22:54] mmmm, but dcaro already did earlier?
[12:23:00] I'm recreating the pw for lexica-tool
[12:23:05] oh sorry I missed that line!
[12:23:31] I still see my manual pwd in replica.my.cnf though
[12:23:48] maybe that was a race condition.
I'll run this
[12:23:55] aborrero@cloudcontrol1005:~ $ sudo /usr/local/sbin/maintain-dbusers delete tools.lexica-tool --account-type=tool
[12:24:08] lgtm
[12:24:15] ok, running it
[12:24:29] https://www.irccloud.com/pastebin/xFMSKz5r/
[12:24:59] I'm now watching the maintain-dbuser logs
[12:25:37] replica.my.cnf in the tool's dir was deleted
[12:25:43] now maintain-db-user should recreate it
[12:26:23] just now
[12:26:23] Jul 25 12:25:59 cloudcontrol1005 maintain-dbusers[2204447]: INFO [root._populate_new_account:697] Wrote replica.my.cnf for tool tools.lexica-tool
[12:26:46] it's there
[12:26:59] but I don't see the log entry for the DB account creation
[12:27:25] I cannot log in to toolsdb with the new password
[12:27:46] ok, just now
[12:27:47] https://www.irccloud.com/pastebin/mgjpVDqc/
[12:27:48] it will get there, give it a minute
[12:27:55] just rotated all the others too
[12:27:56] * arturo was impatient
[12:28:23] now I can connect!
[12:30:26] all the others can connect too now
[12:30:29] thanks you two for the assistance
[12:31:02] feel free to write your conclusions in T371011
[12:31:03] T371011: toolforge: an account don't have db access even though maintain-dbusers claims it has created everything - https://phabricator.wikimedia.org/T371011
[12:34:41] I'm not sure about the actual cause of the issue, dcaro do you understand what happened exactly?
[12:35:18] arturo: now I see your comment in phab with your theory
[12:35:26] 👍
[12:35:43] and dcaro's comment with the root cause
[12:35:55] I don't have anything else to add :)
[12:36:03] dhinus: it's related to the envvars api path changes, at some point it broke replica_cnf new account creation, yep in the task
[12:36:18] all clear!
[12:36:50] excellent, thanks
[12:37:13] oops, my bad for forgetting to add an s in two places to replica_cnf the other day 🙈
[12:38:09] xd, no problem, we should finish up adding monitoring to maintain-dbusers to avoid that in the future
[12:38:37] T332955
[12:38:37] T332955: [maintain-dbusers] Generate prometheus metrics - https://phabricator.wikimedia.org/T332955
[12:38:46] if anyone wants to give it a go
[12:39:28] there were a couple patches also that never got merged improving the cli iirc/refactoring that we wanted in before the prometheus metrics (as it makes it easier)
[12:39:44] https://gerrit.wikimedia.org/r/c/operations/puppet/+/908843
[12:40:27] arturo: I'll leave you the honor of resolving T371011
[12:40:27] T371011: toolforge: an account don't have db access even though maintain-dbusers claims it has created everything - https://phabricator.wikimedia.org/T371011
[12:40:51] ✅
[12:40:58] again, thanks everyone
[12:41:11] the 'claims it has created everything' in the title is a bit misleading, it did not claim it, it actually said it failed in the logs
[12:41:51] maybe keep just the first part? "toolforge: an account don't have db access"
[12:41:54] or even the account name
[12:42:48] anyway it's explained in the comments so I think that's fine
[12:42:52] it's ok, yep
[14:57:19] arturo: I need to run the same command
[14:57:39] dhinus: which command?
[14:57:49] arturo: I need to run the same command "maintain-dbusers delete tools" for another tool, did you trigger maintain-dbusers after that, or just waited for it to start automatically?
[14:57:59] arturo: sorry the first message was incomplete :P
[14:58:18] context: T326613
[14:58:20] I just ran the command in the shell and waited, watching the service logs
[14:58:30] sounds good, I will do it after the team meeting
[14:58:51] ok
[16:02:01] * arturo offline
[16:02:12] arturo: just checked, validating admission policies don't allow mutation, we will have to be careful with that
[16:24:29] * dhinus offline
[17:11:16] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1056263 is a small bug fix for Striker that could go out when any SRE has the time to do the puppet merge dance.
[17:26:38] bd808: merged :), /me off
[17:27:04] feel free to tm me if you need anything (I'm also oncall if you need)
[17:27:10] cya \o
[17:31:09] thanks dcaro. :)