[07:54:50] Folks just a heads up - I've depooled eqsin in advance of core router upgrades there which I'll be kicking off in a few minutes.
[09:28:02] Hello, I'm running into a small issue with the dse-k8s bootstrap and I wonder if someone can see what I'm doing wrong please? Essentially I'm not getting the helmfile secrets published on the deploy servers, so I can't deploy cfssl-issuer. It's something related to the profile::kubernetes::deployment_server::helmfile class and the profile::kubernetes::deployment_server_secrets::services hash.
[09:29:45] I've added a secret for cfssl-issuer in /srv/private/hieradata/role/common/deployment_server/kubernetes.yaml but I don't get an `/etc/helmfile-defaults/private/dse-k8s_services` directory on the deploy servers.
[09:39:40] <_joe_> btullis: if no one else looks into it before I'm done with the other two requests I have in queue, I'll look into it
[09:39:54] ditto
[09:39:56] <_joe_> I guess you added your cluster description to the kubernetes_clusters variable, right?
[09:43:16] _joe_: Many thanks. Yes, I added a dse-k8s group and a dse-k8s-eqiad cluster to that hash structure.
[09:53:12] btullis: o/ is the dse-k8s cluster in the list of k8s clusters in puppet? If not IIRC the new secrets will not be added
[09:54:10] mmm seems so
[09:54:50] ah but probably it is hieradata/common/profile/kubernetes/deployment_server.yaml
[09:54:58] there is no dse-k8s entry in there
[09:56:53] Ah, thanks. Will check now.
[09:57:38] basically profile::kubernetes::deployment_server::helmfile
[10:04:08] I don't yet have any *other* services though and profile::kubernetes::deployment_server::services doesn't contain cfssl-issuer for any cluster.
[10:05:14] Do I just put an empty hash structure for dse-k8s into that file before this deep_merge will work?
https://github.com/wikimedia/puppet/blob/production/modules/profile/manifests/kubernetes/deployment_server/helmfile.pp#L27
[10:05:48] yeah I think that there should be, in theory, at least one service in the "public" list to have all merged in
[10:07:53] btullis: maybe you can just add a cfssl-issuer entry in the service list, like the knative-serving one for ml-serve
[10:08:04] it should in theory do the trick, easy to check it via pcc
[10:08:12] Great! Trying that now. Thanks.
[10:11:33] lemme know how it goes! Maybe for clarity we should add the missing entries for other clusters as well
[10:11:58] Yep, will do.
[10:15:58] I think it looks OK. https://puppet-compiler.wmflabs.org/pcc-worker1003/37145/deploy1002.eqiad.wmnet/index.html
[10:16:43] yep!
[11:01:20] All sorted. Many thanks.
[12:03:09] icinga is broken?
[12:04:01] Total Warnings: 0
[12:04:01] Total Errors: 0
[12:04:04] So maybe it is me
[12:06:21] yeah, same here. maybe it's a botched config file, having a look
[12:06:32] I checked the file moritzm
[12:06:35] And it came back clean
[12:06:38] ok
[12:06:52] I ran: /usr/sbin/icinga -v /etc/icinga/icinga.cfg
[12:07:14] <_joe_> restart icinga?
[12:07:23] <_joe_> see what errors it spits
[12:07:25] <_joe_> if any
[12:07:27] yeah, that one (just wondering since the error message states the config is broken)
[12:11:29] I'm trying a restart
[12:11:40] nothing interesting in the logs
[12:14:11] nah, didn't help. ^ godog when you're back
[12:16:04] moritzm: ack, checking
[12:18:20] mmhh I think Ie01dd93fcf might have broken things jbond
[12:18:36] Sep 7 11:58:14 alert1001 puppet-agent[26553]: (/Stage[main]/Icinga/File[/var/log/icinga/icinga.log]/owner) owner changed 'nagios' to 'root'
[12:18:39] Sep 7 11:58:14 alert1001 puppet-agent[26553]: (/Stage[main]/Icinga/File[/var/log/icinga/icinga.log]/group) group changed 'nagios' to 'root'
[12:18:46] though that's the log file, nevermind
[12:19:07] godog: i think that could well be the issue.
currently have some fall out on the puppet masters im trying to debug
[12:19:26] ack, thanks I'll keep investigating
[12:21:46] ok chgrp www-data /var/cache/icinga/ fixes things, I'll send a patch
[12:21:54] thx
[12:40:45] jbond: is there a list somewhere of perms changed after Ie01dd93fcf ? for audit purposes
[12:41:12] godog: trying to generate that
[12:41:58] thank you
[15:25:31] i created a small script to query puppetdb and check what resources were changed during the recent change to the default owner/group permissions. https://phabricator.wikimedia.org/P34123. the report is now clean but see the fixed and fixed_prefixes variables for paths that were affected. i have manually checked all paths and puppet code to make sure the right thing is happening now. most issues
[15:25:37] either related to symlinks or unnecessary ...
[15:25:39] ... permissions. the few real errors have been fixed
[15:25:40] cc godog ^^
[15:26:56] apart from fixing the issue the script may be of interest to others to see the type of stuff you can do with puppetdb and the easier pql api interface
[15:50:42] Would anyone here know how I might be granted admin bits for the quarry repo on gerrit (https://gerrit.wikimedia.org/r/admin/repos/analytics/quarry/web) ?
[15:54:35] <_joe_> Rook: I'd ask in #wikimedia-releng
[15:54:46] Thanks!
[17:57:19] hi sre folks! we had an issue on a fundraising-tech wiki last week relating to cookies being lost. The issue was resolved when y'all reverted this patch here https://gerrit.wikimedia.org/r/c/operations/puppet/+/827566/. it looks like the related problems have come back. bblack it was suggested that you'd be a good person to speak to regarding this? thanks in advance!
[18:01:02] jgleeson: when did they come back? (do we have some kind of functional monitoring that tracks this?)
[18:02:06] bblack: we just heard about it happening again yesterday
[18:02:47] also if you re-load the page a few times, it seems to work
[18:02:48] yeah bblack first report was here https://phabricator.wikimedia.org/T316578#8215105
[18:03:10] make sure you try it in an incognito window
[18:06:50] the hypothesis is that the GeoIP cookie isn't making it to mediawiki?
[18:07:13] (from the browser -> caches -> MW)
[18:09:24] that sounds right
[18:10:37] bblack: I think only when it's set by varnish
[18:10:54] but when it's already coming from the browser then it seems it is making it there
[18:12:51] I don't think varnish ever fake-injects the cookie
[18:13:32] the only way the cookie value reaches mediawiki, for a fresh/incognito session, is some earlier HTTP request results in a Set-Cookie for GeoIP, and then a later request will see it echoed back from the browser
[18:14:17] (or JS loaded in that first view can probably see it in the browser immediately)
[18:15:20] I see the "Sorry, we're having a little trouble [...]" is generated in the main page output server side
[18:15:45] is the correct replacement also generated server-side in response to a cookie being present, or is JS involved in overlaying the Sorry message which serves as a background default?
[18:16:21] I'm trying to unwrap the mechanisms here and understand them, but it's a bit opaque
[18:16:38] yeah
[18:16:52] https://github.com/wikimedia/mediawiki-extensions-FundraiserLandingPage/blob/master/includes/Specials/FundraiserRedirector.php
[18:16:56] bblack: ^
[18:16:57] The sidebar Donate link in my incognito test has the link as having parameter: &country=XX
[18:17:07] if I change that to country=US, it works
[18:17:15] so is it really being carried by the param and not the cookie there?
[18:18:14] bblack: waaaat oki one sec this may be it
[18:18:14] hmmm that link is generated too
[18:18:17] so the flow is:
[18:18:20] huh
[18:18:35] https://en.wikipedia.org/wiki/Main_Page -> Click Donate link, which is this URI:
[18:18:45] https://donate.wikimedia.org/wiki/Special:FundraiserRedirector?utm_source=donate&utm_medium=sidebar&utm_campaign=C13_en.wikipedia.org&uselang=en
[18:19:14] which then issues a 302 redirect to:
[18:19:15] https://donate.wikimedia.org/w/index.php?title=Special:LandingPage&country=XX&utm_medium=spontaneous&utm_source=donate&utm_campaign=spontaneous
[18:20:19] hmmm
[18:20:29] bblack: I don't see the donate link on the sidebar carrying a country=XX param as you mentioned above
[18:20:33] if I send a valid US GeoIP cookie in a curl req to FundraiserRedirector, I do get &country=US in the redirected URL output
[18:20:44] AndyRussG: see above for the missing step between the two
[18:20:44] so.. the problem right now with a fresh browser
[18:20:50] yep right
[18:20:57] is that www.wikipedia.org sets the GeoIP cookie to .wikipedia.org
[18:21:03] and donate.wm.o is outside the scope of the cookie
[18:21:19] so the U-A doesn't send the cookie to donate.wm.o
[18:21:20] yeah, it probably doesn't happen if you start from anywhere in wikimedia.org
[18:22:03] (confirmed that incognito works fine starting from meta.wikimedia.org -> Donate sidebar)
[18:23:22] so basically, the current system would only work if you've somehow already picked up a cookie from somewhere in *.wikimedia.org domains
[18:23:40] (which may be common for some typical user scenarios!)
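The cookie-scope point above (a GeoIP cookie set for .wikipedia.org is never sent to donate.wikimedia.org) can be illustrated with a minimal Python sketch of cookie domain matching. This is only a simplified approximation of what browsers do: the function name `domain_matches` is made up for illustration, and real user agents also handle IP addresses, public suffixes, paths and third-party policy, none of which is modelled here.

```python
def domain_matches(request_host: str, cookie_domain: str) -> bool:
    """Return True if a cookie scoped to cookie_domain would be sent to request_host.

    Simplified sketch of domain matching: the host must equal the cookie
    domain or be a subdomain of it. A leading dot on the cookie domain is ignored.
    """
    host = request_host.lower().rstrip(".")
    domain = cookie_domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)


# Cookie picked up while browsing Wikipedia, scoped to .wikipedia.org:
print(domain_matches("en.wikipedia.org", ".wikipedia.org"))      # in scope
print(domain_matches("donate.wikimedia.org", ".wikipedia.org"))  # out of scope

# Cookie picked up from any *.wikimedia.org host, scoped to .wikimedia.org:
print(domain_matches("donate.wikimedia.org", ".wikimedia.org"))  # in scope
```

Under these assumptions, only the .wikimedia.org-scoped cookie reaches the redirector, which is consistent with the observation that starting from meta.wikimedia.org works while starting from en.wikipedia.org does not.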
[18:23:47] ^ same here, it works fine for me if I click on the sidebar donate link in a fresh incognito session from meta.wikimedia.org
[18:26:10] so, there's this shaky foundation we're standing on here (they need to have loaded something from *.wikimedia.org to get the geoip cookie, *before* hitting that donate redirector link)
[18:26:30] in a "typical" pageview it seems like it should generally work because of a login check
[18:26:51] if I incognito load a random enwiki article, one of the many sub-urls fetched by the browser is:
[18:27:06] https://login.wikimedia.org/wiki/Special:CentralAutoLogin/checkLoggedIn?type=script&wikiid=enwiki&proto=https
[18:27:17] and the response to that request contains a Set-Cookie: Geoip=....
[18:27:53] (with .wikimedia.org as the domain)
[18:29:08] but I think in my incog sessions when I try this, it fails because chrome refuses to obey that set-cookie, because it's considered "3rd party cookie" and default-blocked in the incognito case
[18:29:35] (you can see this in developer console, looking at the checkLoggedIn response, there's a yellow triangle after the set-cookie)
[18:30:52] one way to reliably solve this in the face of all this, would be to have the Special:FundraiserRedirector use a two-step redirect process
[18:31:27] (redirect to itself once with some extra parameter like &phase=1, then strip that away in the final redirect)
[18:31:38] the cookie would get set with the first redirect, and consumed by the second
[18:33:04] thanks bblack.
I guess we need to figure out if this is a new problem or a long standing problem
[18:33:10] that we've just missed
[18:33:12] it feels like it's new
[18:33:15] long standing AFAIK
[18:33:33] or did feel, ha
[18:33:35] we haven't changed how GeoIP cookie is set for a while
[18:33:53] of course browsers could have evolved in how they treat "third-party" cookies
[18:33:59] or what they consider a third-party cookie
[18:34:15] yeah I don't know if default-block-3rd-party-cookies is new in recent Chrome's incognito or has been there a while
[18:34:37] I don't think it does it in regular non-incognito windows, or the whole world of advertising + social mediapulation would collapse
[18:35:55] BTW, I don't see any requests for https://login.wikimedia.org/wiki/Special:CentralAutoLogin/checkLoggedIn?type=script&wikiid=enwiki&proto=https from my browser checking a random article on enwiki
[18:36:19] it might be different if you have some of our existing cookies
[18:36:42] incognito session
[18:36:46] ah
[18:36:47] restarting again the browser..
[18:37:38] so to be clear how I'm seeing it: fresh Chrome incognito -> Open Dev Tools, leave "Network" tab open
[18:37:51] yep
[18:37:57] I had 3 incognito taba
[18:37:58] *tabs
[18:38:01] and those 3 share cookies
[18:38:06] https://en.wikipedia.org/wiki/The_Sirens_of_Titan ->
[18:38:14] yep, I'm seeing it now
[18:38:18] and chrome blocks the cookie
[18:38:37] somewhere in the list of fetched URIs is one labeled: checkLoggedIn?type=script&wikiid=enwiki&proto=https
[18:38:50] (that one loads from login.wm.o, if you click into the details)
[18:39:08] but it does seem architecturally shaky to rely on that as the mechanism for donatewiki redirector seeing the x-domain cookie
[18:40:34] theoretically we could also fake-inject the cookie towards the origin on initial Set-Cookie request, but we haven't done that before, and it's pretty complex and dangerous.
[18:41:13] funny enough..
firefox doesn't report the cookie as blocked
[18:41:24] but it's effectively blocking it
[18:42:09] yeah I would assume that Chrome used to not block 3rd party in incog mode (just start with a clean slate), and now it does, which makes it harder to use incog as a clean slate to test expected non-incog behaviors.
[18:42:23] that's firefox 91.13.0esr shipped with bullseye
[18:42:33] I also get this in FF btw
[18:42:36] you can apparently set a non-default setting to accept them
[18:43:24] yep
[18:43:29] vgutierrez bblack thanks so much for digging into this! I have to run to pick up kids from school, I believe jgleeson may be around for a bit yet (?), and I'll be back online later in my afternoon
[18:43:35] if I disable "enhanced protection tracking" in firefox
[18:43:38] it works as expected
[18:44:15] quick note on urgency is that since if you go to donate.wikimedia.org from a fresh browser session it seems to work. that means that people who are clicking to that domain from links in FR e-mails should not have any issues
[18:44:32] anyways, my recommendation would be that regardless of what triggered this (could be browsers changing, donatewiki changing, or some unrelated change in other parts of MW and/or our CDN), the way it works now isn't very solid, and the double-redirect thing would make it solid.
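The double-redirect idea recommended above could be sketched roughly as follows, in Python rather than the extension's own PHP. This is a hypothetical illustration, not the FundraiserRedirector implementation: the function names, the exact handling of the `phase` parameter, and the assumption that the country code is the first colon-separated field of the GeoIP cookie are all guesses made for the example. It also assumes the edge attaches a first-party Set-Cookie for GeoIP to the response of the first hop, so the browser echoes it back on the second hop.

```python
from urllib.parse import urlencode

REDIRECTOR = "https://donate.wikimedia.org/wiki/Special:FundraiserRedirector"
LANDING = "https://donate.wikimedia.org/w/index.php"


def country_from_cookie(cookies):
    """Extract a two-letter country code from the GeoIP cookie, if present.

    Assumption: the country code is the first colon-separated field.
    """
    code = cookies.get("GeoIP", "").split(":", 1)[0]
    return code if len(code) == 2 and code.isalpha() else None


def redirect_target(params, cookies):
    """Return the Location for the redirect, using at most one extra hop."""
    country = country_from_cookie(cookies)
    passthrough = {k: v for k, v in params.items() if k != "phase"}

    if country is None and "phase" not in params:
        # First hop: no cookie yet, so redirect to ourselves once. The response
        # to this request is where the cookie is expected to get set.
        return REDIRECTOR + "?" + urlencode({**passthrough, "phase": "1"})

    # Final hop: strip the phase marker and fall back to XX if the cookie
    # still did not arrive (for example because the browser refused it).
    final = {"title": "Special:LandingPage", "country": country or "XX", **passthrough}
    return LANDING + "?" + urlencode(final)


# Fresh browser: first request has no cookie, second one carries it.
print(redirect_target({"utm_source": "donate"}, {}))
print(redirect_target({"utm_source": "donate", "phase": "1"}, {"GeoIP": "US:CA:Example"}))
```

The `phase` marker bounds the process to a single extra redirect, so a browser that never accepts the cookie still ends up on the landing page with country=XX instead of looping.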
[18:44:36] https://usercontent.irccloud-cdn.com/file/dOWy6BVZ/image.png
[18:44:59] however we do still get a non-negligible amount of donations from the sidebar donate clicks, which is where it seems to be broken for many users
[18:45:16] ah hmmmm
[18:45:51] yeah the problem is the redirector is assuming it will get a cookie, even though the browser may not have visited a wikimedia.org domain before that click to ever acquire a wikimedia.org cookie
[18:46:32] bblack vgutierrez if there are suggested actions for fr-tech pls don't hesitate to note them here (I'll see backscroll later) or on the ticket (https://phabricator.wikimedia.org/T316578)
[18:46:38] maybe it did work with some browsers in some past contexts, thanks to that arbitrary loginwiki fetch setting what is effectively a 3rd-party-domain cookie ahead of time, but now it doesn't always do so
[18:47:11] hmmm... .. thx!!!!! :) :)
[19:11:17] thanks bblack and vgutierrez. I'll try and summarise on the ticket your thoughts and we can investigate it further
[19:42:24] /ac/ac
[19:49:47] jgleeson: after doing some more reading and looking around on the Chrome side of things, I'm pretty well convinced that 3rd party cookies should still work in default Chrome in a normal (not incog) session. But there are plans to make them default-not-work sometime in 2023 (or later, if there's more pushback?).
[19:50:14] Most likely the cases that are failing now are because someone has changed their settings or installed some privacy or adblock extension which is doing the 3rd-party-cookie blocking.
[19:50:56] (or they're not on Chrome, and the other browser is implementing this policy, either by default or by preference/plugin/extension/whatever)
[19:52:38] thanks bblack
[19:57:06] either way, there might be implications beyond donatewiki, from this general trend towards killing 3rd-party cookies. I don't *think* S:UL relies on them, but I'm not 100% sure.
We might need to audit related things a bit before we get caught in an emergency later