[13:35:40] slyngs: two easy questions. 1) Is there anything special I need to do to get this change deployed? and 2) does the regex look right to you? I want to support redirects to both https://openstack.eqiad1.wikimedia.cloud:25000/protected and also https://keystone.eqiad1.wikimedia.cloud/protected (with no port specified for the latter) https://gerrit.wikimedia.org/r/c/operations/puppet/+/1125249/4/hieradata/role/common/idp.yaml
[13:36:53] um, sorry, those urls should be https://openstack.eqiad1.wikimediacloud.org:25000/protected and https://keystone.openstack.eqiad1.wikimediacloud.org/protected
[13:37:02] (one with a port specified and one without)
[13:37:02] 1) Should deploy automatically when merged. 2) Let me just test, it looks about right, but I'll just check
[13:38:01] Regex seems fine as well
[13:38:21] cool, in that case I have a third question :)
[13:39:00] If you navigate to https://labtesthorizon.wikimedia.org/ you will see that there's a mismatch
[13:39:23] I have seen that it sometimes doesn't work when you change an existing service
[13:39:36] labtest is from a different patch, this one: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1125264
[13:39:50] What we can do is try a quick failover to the other host, after a restart
[13:39:55] hang on
[13:40:04] so, I'm being confusing, talking about two different services.
[13:40:09] I think the prod change is working properly
[13:40:30] but what's not working is the change to cloudidp-dev.wikimedia.org
[13:41:20] If you think it's a caching issue I can just reboot the host that cloudidp-dev is running on since it's not a prod service.
[13:42:39] Cool, let's try that first, again I have seen service changes not being picked up
[13:43:26] ok, rebooting...
[13:44:40] Oh, the regexes aren't the same
[13:45:15] https://(keystone.)?.openstack.*.wikimediacloud.org.*
[13:45:15] https://(keystone.)?openstack.*.wikimediacloud.org.*
[13:45:29] oh! I see it! an extra
[13:45:30] .
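A quick way to check the two patterns from 13:45:15 against the two target URLs is a few lines of Python. This is a sketch that assumes the IdP compares the pattern against the whole redirect URL (approximated here with `re.fullmatch`); the `STRICT` pattern is a hypothetical tightening, not what was merged. Because the dots are unescaped, each `.` matches any single character, so the stray leading `.` in the buggy pattern must consume one character before the literal `openstack`, and neither URL matches. The same looseness means that, in this sketch, the fixed pattern also accepts a host like `openstack.evil.example` as long as the path contains `.wikimediacloud.org`.

```python
import re

# The two patterns from the chat, plus a hypothetical stricter variant.
BUGGY = r"https://(keystone.)?.openstack.*.wikimediacloud.org.*"
FIXED = r"https://(keystone.)?openstack.*.wikimediacloud.org.*"
# Hypothetical: escaped dots, a single region label, an optional port, then a path.
STRICT = r"https://(keystone\.)?openstack\.[a-z0-9-]+\.wikimediacloud\.org(:[0-9]+)?/.*"

URLS = [
    "https://openstack.eqiad1.wikimediacloud.org:25000/protected",
    "https://keystone.openstack.eqiad1.wikimediacloud.org/protected",
]


def matches(pattern: str, url: str) -> bool:
    """Assumes whole-URL matching, hence fullmatch rather than search."""
    return re.fullmatch(pattern, url) is not None


if __name__ == "__main__":
    for name, pattern in [("buggy", BUGGY), ("fixed", FIXED), ("strict", STRICT)]:
        print(name, [matches(pattern, url) for url in URLS])
```

Running it prints `buggy [False, False]`, `fixed [True, True]` and `strict [True, True]`.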
[13:45:41] Yup :-)
[13:47:27] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1125428
[13:47:38] that should help!
[13:48:05] +1
[13:50:59] ok, that fixed it
[13:51:06] which means now I can start working on the /actual/ problem :(
[13:51:22] What's the actual problem?
[13:51:32] I'll only implicate you in that one if you have a couple hours to burn :)
[13:51:58] Sadly I only have about 10 - 15 minutes :-)
[13:51:58] The login cycle works on my hotspot network but not on my wifi network. And other users are reporting the same, that it doesn't work on certain home networks.
[13:52:09] I thought it was the port 25000 thing, so all of the above was about moving things to 443
[13:52:13] but it still breaks for me on 443
[13:52:34] Oh that is weird
[13:52:47] oh, actually...
[13:52:49] I'm looking at https://phabricator.wikimedia.org/T388137
[13:53:05] but that's different from what I get, the message I see is
[13:53:07] https://www.irccloud.com/pastebin/SKMIZOFN/
[13:53:41] What was the request?
[13:54:38] * andrewbogott trying to figure out how to cut and paste from web tools...
[13:55:25] https://usercontent.irccloud-cdn.com/file/AQt8juFp/Screenshot%202025-03-07%20at%209.54.50%E2%80%AFAM.png
[13:55:34] that probably doesn't show the good bit
[13:56:10] Probably in the "Request" tab
[13:58:18] https://usercontent.irccloud-cdn.com/file/TLPnEapY/Screenshot%202025-03-07%20at%209.57.58%E2%80%AFAM.png
[14:00:01] That doesn't seem particularly broken.
[14:01:06] yeah
[14:04:06] If the network I'm behind is blocked someplace it wouldn't produce a 400 would it?
[14:05:03] going to try with a working network and see if I can tell the difference
[14:05:52] Nope, I don't think so... Given that you're seeing it on a hotspot, I do wonder if it's some CGNAT in play.
[14:07:18] actually it works on the hotspot
[14:07:22] and fails with the home fiber network
[14:07:37] Aah... that's worse somehow
[14:16:44] andrewbogott: I have to run, but drop me a message if there's anything you need me to look at.
[14:16:59] ok! You've already gotten me past one typo, thanks for that
[14:35:42] slyngs (when you're back), here is the failure in my apache logs:
[14:35:45] https://www.irccloud.com/pastebin/LDn2793o/
[14:36:09] Can't imagine why a change in network would scramble my cookie
[16:37:23] dhinus: I have a scheme for (maybe) fixing that horizon/401 issue; are you wrapping up for the day or are you interested in having a look?
[16:37:44] oddly unrelated to the recent convo with simon; that change didn't do much
[17:06:14] andrewbogott: I'm still around and I'm interested :)
[17:07:02] I was trying to think of what could explain it but it looks more and more mysterious
[17:09:21] I checked and the mapping between idp and keystone uses the name field. I'm thinking if we map via id instead of name then there's no chance to screw up encodings since IDs are ascii.
[17:10:03] right now the mapping looks like this:
[17:10:24] [{'local': [{'user': {'name': '{0}', 'type': 'local', 'domain': {'name': 'Default'}}}], 'remote': [{'type': 'HTTP_OIDC_NAME'}]}]
[17:10:52] just switching name to id and OIDC_NAME to OIDC_ID doesn't work, but that whole HTTP_OIDC_NAME thing is kind of a black box to me.
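For readers unfamiliar with the mapping format, here is a simplified, hypothetical model of how a rule like the one above is applied: each `remote` entry names an attribute that must be present in the incoming request, the matched values are collected in order, and `{0}`, `{1}`, ... in `local` are replaced with them. This is not Keystone's implementation (that lives in `keystone/federation/utils.py`) and it ignores features such as `any_one_of`. Under this model, a rule whose `remote` type names an attribute the web server never sets, such as a guessed `HTTP_OIDC_ID`, simply never matches.

```python
from __future__ import annotations

from typing import Any, Optional

# Simplified, hypothetical model of applying a Keystone federation mapping
# rule. NOT keystone's code: it only shows '{N}' placeholder substitution
# and that a rule whose remote attribute is absent does not match.

RULES = [{
    "local": [{"user": {"name": "{0}", "type": "local",
                        "domain": {"name": "Default"}}}],
    "remote": [{"type": "HTTP_OIDC_NAME"}],
}]


def substitute(obj: Any, values: list[str]) -> Any:
    """Recursively replace '{N}' placeholders in strings with values[N]."""
    if isinstance(obj, dict):
        return {key: substitute(val, values) for key, val in obj.items()}
    if isinstance(obj, list):
        return [substitute(item, values) for item in obj]
    if isinstance(obj, str):
        return obj.format(*values)
    return obj


def apply_rules(rules: list[dict], assertion: dict[str, str]) -> Optional[list]:
    """Return the 'local' section of the first matching rule, or None."""
    for rule in rules:
        values = []
        for remote in rule["remote"]:
            if remote["type"] not in assertion:
                break  # attribute missing from the request: rule cannot match
            values.append(assertion[remote["type"]])
        else:  # every remote attribute was found
            return substitute(rule["local"], values)
    return None


if __name__ == "__main__":
    # Hypothetical request attributes; only the attribute name comes from the log.
    request = {"HTTP_OIDC_NAME": "Example User"}
    print(apply_rules(RULES, request))

    # Same idea, but keyed on a guessed attribute the server never sets.
    guessed = [{"local": [{"user": {"id": "{0}"}}],
                "remote": [{"type": "HTTP_OIDC_ID"}]}]
    print(apply_rules(guessed, request))  # None: rule does not match
```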
[17:11:18] If we can figure out what the data structure looks like that oid is providing to keystone then we'll know what to call it, probably
[17:15:40] hmmm I wonder how much of this is hardcoded into keystone, but we can always check the source
[17:16:12] my brain is a bit fried on a Friday eve but this looks promising
[17:16:17] I'm kind of assuming that the encoding is hardcoded, hence wanting to work around it
[17:16:34] ack
[17:17:41] I'm more puzzled by your issue where it only works on one network
[17:17:57] at least here we know there's a wrong encoding somewhere and we "just" have to find where, or how to work around it
[17:17:58] yeah, trying to set that one aside for the moment since I already tried the one thing that I thought would work
[17:18:19] agreed we can try fixing the encoding one first, so at least we get that one out of the way
[17:18:27] I was thinking that if you're handier with web tools (and have two working eyes) maybe you could inspect the login dance and see if you can find the blob of data that idp is passing to keystone and/or horizon after you auth with the idp popup?
[17:18:37] let me try
[17:18:47] maybe with labtesthorizon since that's easier to hack
[17:20:11] I'm now getting 401 on labtesthorizon, which I guess is due to your temp changes?
[17:21:35] yes, but that's after the mapping happens...
[17:21:40] (on the phone now sorry)
[17:21:47] yes I'm inspecting the network calls
[17:21:50] cool
[17:24:29] I see a POST call to the keystone endpoint with a big "id_token" in the payload, but that's encoded
[17:24:54] maybe it's just base64
[17:27:53] yep it's 3 base64 strings separated by "."
[17:29:35] the last one is a signature
[17:29:49] I'll send you the decoded payload in a dm
[17:29:51] andrewbogott:
[17:29:52] can you tell what fields?
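Decoding the `id_token` by hand, as done above, can be sketched in Python. A JWT is three base64url segments (header, claims, signature) joined by `.`, with the trailing `=` padding stripped, so the padding has to be restored before decoding. The demo token below is built locally with made-up claim values; the function does not verify the signature, so treat it as an inspection aid while debugging, never as an authentication check.

```python
from __future__ import annotations

import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode one base64url JWT segment, restoring the stripped '=' padding."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def decode_id_token(token: str) -> tuple[dict, dict]:
    """Split an id_token into its header and claims.

    The third segment is the signature; it is NOT verified here.
    """
    header_b64, payload_b64, _signature_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    claims = json.loads(b64url_decode(payload_b64))
    return header, claims


def b64url_encode(data: dict) -> str:
    """Demo helper: JSON-encode a dict, then base64url without padding."""
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


if __name__ == "__main__":
    # Hypothetical token built locally; the claim values are made up.
    fake = ".".join([
        b64url_encode({"alg": "RS256", "typ": "JWT"}),
        b64url_encode({"iss": "https://cloudidp-dev.wikimedia.org/oidc",
                       "preferred_username": "example-user"}),
        "signature-not-checked",
    ])
    header, claims = decode_id_token(fake)
    print(header["alg"], claims["preferred_username"])
```

With a real token, paste the string copied from the browser's network tab into `decode_id_token` and look for the `preferred_username` claim.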
[17:34:42] looks like the id is in the field preferred_username
[17:37:59] retried now, still 401
[17:38:28] we probably have to dig deeper into the keystone code :/
[17:40:24] dhinus: can you try once more? I'm not seeing a failed lookup in the logs...
[17:40:32] although I guess if the mapping is bogus then we wouldn't get that far
[17:41:27] retried, still 401
[17:41:43] * dhinus reads https://docs.openstack.org/keystone/latest/admin/federation/mapping_combinations.html
[17:42:22] yeah, doesn't appear in the logs which makes me think it's choking on that mapping
[17:45:22] I turned on debug logs, will you try once more?
[17:45:28] sure
[17:45:42] (I can't try because of the network thing and also because my username = my id so I get fake success sometimes)
[17:45:51] 401
[17:51:46] try again?
[17:52:45] 401 :/
[17:56:12] ok, time for some debug lines
[17:58:40] there are some notes here https://github.com/openstack/keystone/blob/master/doc/source/admin/federation/openidc.inc#configure-mod_auth_openidc
[17:58:46] logs are saying 'remote_id_parameter = HTTP_OIDC_ISS' which I don't entirely follow but maybe remote_id_parameter isn't what I think it is
[17:59:17] can you try another failed login?
[17:59:37] oh, the vhost, hm...
[18:00:14] ISS apparently stands for "issuer"
[18:00:33] seems like
[18:00:34] OIDCScope "openid email profile"
[18:00:41] should include the field we want? But name already isn't in there
[18:01:00] try another login?
[18:01:03] I was also thinking about it, but it's already in the payload so I'm not sure if adding it there would make a difference
[18:01:13] trying a login
[18:01:26] 401
[18:03:29] hm
[18:03:32] ANDREW, complete idp content is {'id': 'openid', 'domain_id': '3adf532eb4e14cd79dcdcdcb22e096a2', 'enabled': True, 'description': None, 'remote_ids': ['https://cloudidp-dev.wikimedia.org/oidc'], 'authorization_ttl': None}
[18:03:38] absolutely not what i was looking for
[18:05:00] I'm reading keystone/federation/utils.py
[18:05:10] or at least squinting at it
[18:12:31] dhinus: log in again?
[18:12:38] trying
[18:12:41] I didn't change anything but debug lines
[18:13:00] 401
[18:15:48] OIDCScope is correct: scopes are groups of claims, and the "profile" scope contains the "preferred_username" field
[18:17:19] I have to log off... I can do a couple more attempts later if it helps!
[18:17:50] ok. Thank you for helping!
[18:18:11] you're welcome, thanks for looking into this!
[18:18:17] omg I swapped a - with a _
[18:18:24] can you stick around for another 3 minutes while I fix that?
[18:21:21] sure
[18:21:32] try now?
[18:21:57] IT WORKED
[18:22:02] amazing
[18:22:10] the field was OIDC-preferred_username
[18:22:14] the unexpected happy end
[18:22:16] mixed - and _
[18:22:26] ok I'll try rolling this out in eqiad1 and see what it breaks :)
[18:22:29] thank you again
[18:22:35] thank you!
[18:22:40] have a good weekend :)
[18:24:08] you too
[18:47:27] Now I need to sit in the dark for a while. See y'all on the 18th if nothing explodes before then
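The final bug was an attribute name that itself mixes `-` and `_` (`OIDC-preferred_username`). A small, hypothetical helper like the one below could flag that class of near-miss by comparing the name a mapping rule expects against the names actually present in the request, folding case and treating `-` and `_` as equal. It is not part of Keystone or mod_auth_openidc; apart from `OIDC-preferred_username` and `HTTP_OIDC_ISS`, the attribute names in the demo are made up.

```python
from __future__ import annotations

# Hypothetical debugging helper (not part of Keystone or mod_auth_openidc):
# report request attributes that match the expected name only after folding
# case and treating '-' and '_' as the same character.


def normalize(name: str) -> str:
    """Fold case and treat '-' and '_' as equivalent."""
    return name.lower().replace("-", "_")


def near_misses(wanted: str, available: list[str]) -> list[str]:
    """Names in `available` that equal `wanted` only after normalization."""
    if wanted in available:
        return []  # exact match exists, nothing to report
    target = normalize(wanted)
    return [name for name in available if normalize(name) == target]


if __name__ == "__main__":
    # Illustrative attribute names; "OIDC-email" is made up.
    present = ["HTTP_OIDC_ISS", "OIDC-preferred_username", "OIDC-email"]
    print(near_misses("OIDC_preferred_username", present))
```

Here the expected name `OIDC_preferred_username` is reported as a near-miss of `OIDC-preferred_username`, which is exactly the swap found at 18:18.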