[13:08:21] btullis: you may have a go at merging https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1184878 at your convenience. I do not have much insight as to how dumps work on k8s, thus I am on the fence on deploying it myself
[13:18:33] effie: Cool. Will do, thanks.
[14:01:59] turns out running puppet in podman in WMCS isn't the quickest thing ever, who knew?
[14:16:33] :)
[14:58:06] * inflatador is shocked
[15:14:16] Emperor: that level of inception gave the final hit to my brain (on a Friday afternoon) :D
[16:13:27] I'm sorry to ask on a Friday: would someone be able to help with a small change to the hcaptcha config in operations/puppet? The change would be an update to proxy rules and syncing this to a single urldownloader host
[16:16:51] kostajh: I can take a look
[16:16:58] rzl: thanks so much
[16:17:04] I'll DM you
[16:19:37] ok, we have a proxy set up in between clients and hCaptcha. If you go to https://auth.wikimedia.org/test2wiki/wiki/Special:CreateAccount and click on the "Username" field, you'll see that we start sending requests to https://hcaptcha.wikimedia.org
[16:20:00] This is working fine, except we made one change to use the `secure-api.js` endpoint instead of `api.js`, and this requires one update to the hcaptcha proxy config in operations/puppet
[16:20:31] requests to `https://assets-hcaptcha.wikimedia.org/1/api.js` should be proxied to `https://js.hcaptcha.com/1/api.js`
[16:22:22] There's some documentation here about the hCaptcha proxy setup: https://wikitech.wikimedia.org/wiki/HCaptcha
[16:22:46] oh! I thought you had a patch and you just needed it deployed -- are we also figuring out how to configure the proxy? :)
[16:23:56] rzl: I'm looking into it now. Yes, I could use some help with the nginx side.
[16:24:07] I can take a look, but just to check the obvious first, all the SREs who've actually worked on this are in European time zones so already done for the week?
[16:24:33] rzl: yes, unfortunately.
[16:24:37] (seems like mostly R.aine and e.ffie)
[16:24:41] cool no worries
[16:25:28] just means I'll want to build up a lot more confidence in how this works before fiddling with it on a Friday, so might ask you for some patience
[16:25:39] I'm looking as well
[16:25:42] yeah I understand completely. And I'm seeing if I can figure out a patch for the nginx part.
[16:26:08] I've also asked hCaptcha folks if they have an example config for this, since it's not in the example they already provided to us.
[16:26:22] we might want to be filtering some additional headers
[16:28:41] cdanis: thank you!
[16:32:18] cdanis: see also /srv/git/private/hieradata/common/profile/hcaptcha/proxy.yaml if you haven't found it already
[16:32:29] ty
[16:35:34] fyi, both links to ulsfo are down (Lumen and Arelion are working on it), and we're relying on the last-hope GRE tunnel to reach that site (cc topranks)
[16:35:38] to clarify something: the proxied request also needs to bring along all the query parameters that were in the original request.
[16:36:25] mutante, urandom, could one of you depool ulsfo and monitor the situation? (cf. maint-announce emails)
[16:36:39] I'm about to log off for the weekend
[16:36:57] kostajh: is that a change in behavior from the existing proxying rules?
[16:37:34] Arelion: fiber cut between Denver and Strasburg, Lumen: power failure in Tacna, AZ
[16:37:39] cdanis: yes. We changed the config in operations/mediawiki-config to load `secure-api.js` instead of `api.js`, so we can use something called secure enclave mode. When used with a proxy, this also requires an update to the proxy rules
[16:38:06] XioNoX: ok!
[16:38:42] mutante: thx! once at least 2 of those alerts have cleared it should be good to repool: https://alerts.wikimedia.org/?q=scope%3Dnetwork&q=alertname%3DCoreRouterInterfaceDown
[16:38:55] XioNoX: alright, gotcha
[16:39:10] just depooled ulsfo
[16:39:13] thx
[16:39:32] don't hesitate to ping me if needed but I might not be near my computer
[16:39:43] ok
[16:39:57] monitoring
[16:40:21] I am back home now so I will keep an eye and am close to my laptop if needed
[16:40:27] cool
[16:42:10] looking for maint-announce mails, mainly
[17:52:45] re: fiber cut. "After investigation, our provider is engaging their local fiber repair team, no ETA shared."
[17:52:57] for anyone else following along: c.danis and r.zl have kindly pushed a fix for the hCaptcha proxy to T378188
[17:52:58] T378188: Implement secure enclave mode for hCaptcha - https://phabricator.wikimedia.org/T378188
[17:54:41] provider of our provider is telling their local provider to provide repair - we shall see.
[18:11:14] The Lumen circuit is back up
[18:11:34] I just noticed 2 alerts are gone but 2 alerts are still there.
[18:11:36] https://grafana.wikimedia.org/goto/JDKTD0rHg
[18:11:39] and they did not email about it just yet
[18:12:03] Arelion is still down
[18:12:14] still seeing 2 core router interfaces down.. yea.. that
[18:12:45] I would say let's wait another hour and make a call on what to do
[18:12:53] ack!
[18:16:17] just to confirm: we have 2 connections that make it redundant.. but today.. one had a power failure and one had a fiber cut.. 2 unrelated incidents.. right?
[19:16:57] sorry I had to jet out
[19:17:32] one still down, other stable though
[19:17:46] yes we have two transport links in ulsfo, one from codfw and the other from eqord (which in turn has links to codfw and eqiad)
[19:18:12] should we turn it back on with one link or wait for both to be back?
[19:18:21] still have time to check again later
[19:18:48] we also have a GRE tunnel which goes to eqdfw (in Dallas) which is a poor man's backup that goes over the internet
[19:19:02] I think - based on us having the GRE - we can probably repool the site
[19:19:25] ok, doing that
[19:19:54] ok, I'll keep an eye on how things look after
[19:20:55] done
[19:21:03] thanks
[19:24:12] traffic on the rise again
[21:08:44] on-call handover: we temp depooled ulsfo due to a fiber cut. it's pooled again. no other incidents happened
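For anyone reading the hCaptcha thread later: the rule requested above (proxy `https://assets-hcaptcha.wikimedia.org/1/api.js` to `https://js.hcaptcha.com/1/api.js`, keeping the original query parameters) could be expressed in nginx roughly as in the sketch below. This is a minimal illustration, not the deployed configuration: the server block layout, TLS handling, and the cookie-stripping line are assumptions, the real rules live in operations/puppet plus the private hieradata file mentioned in the log, and the fix that was actually pushed is tracked in T378188.

```nginx
# Minimal sketch only (assumed vhost layout; not the puppet-managed config).
server {
    listen 443 ssl;
    server_name assets-hcaptcha.wikimedia.org;
    # ssl_certificate / ssl_certificate_key omitted for brevity

    location /1/ {
        # proxy_pass with no URI part forwards the original request URI
        # unchanged, so a request for /1/api.js (or /1/secure-api.js) reaches
        # the upstream with its query string intact, per the 16:35 note.
        proxy_pass https://js.hcaptcha.com;
        proxy_set_header Host js.hcaptcha.com;
        proxy_ssl_server_name on;

        # The log mentions possibly filtering additional headers; dropping
        # client cookies before the third party is one example (assumption).
        proxy_set_header Cookie "";
    }
}
```

The deployed change may well be narrower, for example matching only the specific script paths or living inside the existing hcaptcha.wikimedia.org server block; the sketch is only meant to show how the query string survives the proxy hop.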