[08:25:31] morning
[08:53:51] morning
[09:03:18] morning
[11:17:33] ssh root@pawsdev-bastion.pawsdev.codfw1dev.wikimedia.cloud
[11:17:43] wrong window?
[11:17:51] Or send incomplete...
[11:18:04] ssh rook@pawsdev-bastion.pawsdev.codfw1dev.wikimedia.cloud # does not work
[11:18:22] nevermind with the paste window, it is apparently beyond me
[11:18:35] Anyways, I can get into that system as root but not as rook, what am I doing wrong?
[11:19:03] I seem to be a member and reader of the project
[11:20:05] let's have a look
[11:20:11] 2024-01-24T11:18:48.669221+00:00 pawsdev-bastion sshd[77005]: pam_access(sshd:account): access denied for user `rook' from `172.16.128.19'
[11:20:44] at least in LDAP, you're a member of paws-dev but not pawsdev
[11:21:26] but you are a member in the openstack database. odd
[11:22:42] Did these not transfer to ldap? https://www.irccloud.com/pastebin/Cxd3i5g0/
[11:23:14] (back in a moment)
[11:23:31] seems like that's indeed broken. I'm trying to figure out why that is
[11:25:27] ldap.CONNECT_ERROR: {'result': -11, 'desc': 'Connect error', 'ctrls': [], 'info': '(unknown error code)'}
[11:28:33] Oh, a connection error could explain it
[11:29:07] I think it's a TLS issue of some sort
[11:29:17] with the keystone ldap integration, that is
[11:34:01] https://gerrit.wikimedia.org/r/c/operations/puppet/+/992666 should fix it I think
[11:55:34] that didn't seem to be it. I'm going for lunch but will continue looking once I'm back
[11:57:22] Thanks!
[12:13:22] this almost feels like something is not picking up the new config. very odd.
[12:15:31] Rook: I tried restarting things a bit more and it seems like that did it. try now?
[12:17:30] That did it. Thanks!
[12:50:16] https://admin.beta.toolforge.org/ is now running on a Debian 12/containerd-based worker
[13:12:41] \o/
[13:53:08] dcaro: https://phabricator.wikimedia.org/tag/toolforge/ - what's the difference between 'backlog' and 'ready to be worked on'?
[13:54:06] taavi: it's just a clean column to start reviewing from backlog there, I thought I might just keep the new one and hide backlog. The name could have been 'backlog 2' but I thought that 'ready to be worked on' was clearer xd
[13:54:17] ah lol
[13:55:52] feel free to review tasks and move them to that column too, it might take me a couple days to get to all of them before I empty the 'backlog' one, then I'll start with the triage ones, but as a routine chore
[14:44:28] Do we want to set up ELK on toolforge (for toolforge-related logs)? T141500
[14:44:28] T141500: Setup ELK based logging for tool labs infrastructure components - https://phabricator.wikimedia.org/T141500
[14:45:15] If nobody wants to do that soonish, I'd close it and re-create it when we have the need and time to work on it
[15:39:01] taavi: which trove db did you try to back up yesterday?
[15:39:27] I have a fix but a) it only works on mysql and maria and b) I'm only 95% sure it won't break things
[15:39:50] andrewbogott: the one in metricsinfra. that is mariadb, let me take a manual backup first to account for that 5% :D
[15:40:06] thanks!
[15:44:25] ok manual backups taken, what do you want me to try?
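(The log doesn't show how that manual backup was taken. A minimal sketch of one way to do it for a MariaDB Trove instance is below; the hostname, user, and output file are placeholders, not values from the log. A Trove-managed backup via "openstack database backup create" would be another option where backups are configured for the deployment.)

    # Hypothetical example: logical dump of the MariaDB instance before the risky rebuild.
    # TROVE_DB_ADDR and DBUSER are placeholders, not taken from the log.
    mysqldump -h TROVE_DB_ADDR -u DBUSER -p \
      --all-databases --single-transaction --routines --events \
      > metricsinfra-pre-rebuild.sql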
[15:46:57] I can do it or you can... "openstack database instance rebuild 2d99da2f-8690-4deb-8222-3c55899b0287"
[15:47:10] That should transplant the database into a fresh engine with updated config
[15:47:25] I've seen it work! But it definitely doesn't work on postgres
[15:48:48] sure, you can do it as well
[15:50:53] 'k
[15:51:53] if you are used to ssh'ing into the trove VM, this will break the host keys
[15:52:25] k
[15:56:12] well that seems to have not worked at all
[15:56:30] I fear it needs to be based off of a new image in order to upgrade to a new image :/
[15:57:28] I can just migrate this instance to a new one if that's easier
[15:58:35] Well, there's no particular problem I'm trying to solve at this point other than "find a comprehensive way to update existing old Trove VMs"
[15:58:50] which may turn out to require a lot of hands-on work but let me dig a bit deeper
[15:59:03] If you migrated you'd just do a manual dump, build a new db instance, and import?
[15:59:30] yeah
[16:00:21] ok, let's wait on that until I'm fully out of other options
[16:00:27] sure
[16:01:02] the good news is that the upstream folks seem to have fixed a lot of things. The bad news is... we have a lot of users on dbs from before that
[17:44:03] * dcaro off
[17:45:26] hey folks sorry about the late notice but I was wondering about the situation with cloudweb2002-dev
[17:45:46] this is in rack b5, where we are hoping to move all hosts from the old switch to the new switch tomorrow as part of the upgrade cycle
[17:46:33] so that involves moving its uplink cable from one device to another, interrupting comms
[17:46:43] topranks: downtime with that host is just fine, it's short right?
[17:46:59] I know it's not "specifically" a WMCS box, in terms of Wikitech I think we can just deal with the brief outage
[17:47:16] it's strictly test/dev so it's no problem at all
[17:47:17] andrewbogott: yeah we're saying 60 seconds to everyone, but likely 10-20
[17:47:20] ok
[17:47:20] wikitech lives on cloudweb1003/4, that is a testing box that serves no real traffic
[17:47:23] cool thanks!
[17:47:25] yeah that's totally fine
[17:47:28] ah ok thanks
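(For reference: the fallback path sketched around 15:59 - manual dump, build a new db instance, import - would look roughly like the following for a MariaDB source. This is a hypothetical sketch; hostnames, user, and instance parameters are placeholders, and the exact arguments of the Trove CLI vary by client version.)

    # 1. Logical dump from the old instance:
    mysqldump -h OLD_DB_ADDR -u DBUSER -p --all-databases --single-transaction > dump.sql
    # 2. Create a replacement Trove instance on a current guest image
    #    ("openstack database instance create ..."; exact flags depend on the client version).
    # 3. Load the dump into the new instance and repoint clients at it:
    mysql -h NEW_DB_ADDR -u DBUSER -p < dump.sql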