[13:08:21] btullis: you may have a go at merging https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1184878 at your convenience. I do not have much insight as to how dumps work on k8s, thus I am on the fence on deploying it myself
[13:18:33] effie: Cool. Will do, thanks.
[14:01:59] turns out running puppet in podman in WMCS isn't the quickest thing ever, who knew?
[14:16:33] :)
[14:58:06] * inflatador is shocked
[15:14:16] Emperor: that level of inception gave the final hit to my brain (on a Friday afternoon) :D
[16:13:27] I'm sorry to ask on a Friday: would someone be able to help with a small change to the hcaptcha config in operations/puppet? The change would be an update to proxy rules and syncing this to a single urldownloader host
[16:16:51] kostajh: I can take a look
[16:16:58] rzl: thanks so much
[16:17:04] I'll DM you
[16:19:37] ok, we have a proxy set up in between clients and hCaptcha. If you go to https://auth.wikimedia.org/test2wiki/wiki/Special:CreateAccount and click on the "Username" field, you'll see that we start sending requests to https://hcaptcha.wikimedia.org
[16:20:00] This is working fine, except we made one change to use the `secure-api.js` endpoint instead of `api.js`, and this requires one update to the hcaptcha proxy config in operations/puppet
[16:20:31] requests to `https://assets-hcaptcha.wikimedia.org/1/api.js` should be proxied to `https://js.hcaptcha.com/1/api.js`
[16:22:22] There's some documentation here about the hCaptcha proxy setup: https://wikitech.wikimedia.org/wiki/HCaptcha
[16:22:46] oh! I thought you had a patch and you just needed it deployed -- are we also figuring out how to configure the proxy? :)
[16:23:56] rzl: I'm looking into it now. Yes, I could use some help with the nginx side.
[16:24:07] I can take a look, but just to check the obvious first, all the SREs who've actually worked on this are in European time zones so already done for the week?
[16:24:33] rzl: yes, unfortunately.
[16:24:37] (seems like mostly R.aine and e.ffie)
[16:24:41] cool no worries
[16:25:28] just means I'll want to build up a lot more confidence in how this works before fiddling with it on a Friday, so might ask you for some patience
[16:25:39] I'm looking as well
[16:25:42] yeah I understand completely. And I'm seeing if I can figure out a patch for the nginx part.
[16:26:08] I've also asked hCaptcha folks if they have an example config for this, since it's not in the example they already provided to us.
[16:26:22] we might want to be filtering some additional headers
[16:28:41] cdanis: thank you!
[16:32:18] cdanis: see also /srv/git/private/hieradata/common/profile/hcaptcha/proxy.yaml if you haven't found it already
[16:32:29] ty
[16:35:34] fyi, both links to ulsfo are down (Lumen and Arelion are working on it), and we're relying on the last-hope GRE tunnel to reach that site (cc topranks)
[16:35:38] to clarify something: the proxied request also needs to bring along all the query parameters that were in the original request.
[16:36:25] mutante, urandom, could one of you depool ulsfo and monitor the situation? (cf. maint-announce emails)
[16:36:39] I'm about to log off for the weekend
[16:36:57] kostajh: is that a change in behavior from the existing proxying rules?
[16:37:34] Arelion: fiber cut between Denver and Strasburg, Lumen: power failure in Tacna, AZ
[16:37:39] cdanis: yes. We changed the config in operations/mediawiki-config to load `secure-api.js` instead of `api.js`, so we can use something called secure enclave mode. When used with a proxy, this also requires an update to the proxy rules
[16:38:06] XioNoX: ok!
[16:38:42] mutante: thx! once at least 2 of those alerts have cleared it should be good to repool: https://alerts.wikimedia.org/?q=scope%3Dnetwork&q=alertname%3DCoreRouterInterfaceDown
[16:38:55] XioNoX: alright, gotcha
[16:39:10] just depooled ulsfo
[16:39:13] thx
[16:39:32] don't hesitate to ping me if needed but I might not be near my computer
[16:39:43] ok
[16:39:57] monitoring
[16:40:21] I am back home now so I will keep an eye and am close to my laptop if needed
[16:40:27] cool
[16:42:10] looking for maint-announce mails, mainly
[17:52:45] re: fiber cut. "After investigation, our provider is engaging their local fiber repair team, no ETA shared."
[17:52:57] for anyone else following along: c.danis and r.zl have kindly pushed a fix for the hCaptcha proxy to T378188
[17:52:58] T378188: Implement secure enclave mode for hCaptcha - https://phabricator.wikimedia.org/T378188
[17:54:41] provider of our provider is telling their local provider to provide repair - we shall see.
[18:11:14] The Lumen circuit is back up
[18:11:34] I just noticed 2 alerts are gone but 2 alerts are still there.
[18:11:36] https://grafana.wikimedia.org/goto/JDKTD0rHg
[18:11:39] and they did not email about it just yet
[18:12:03] Arelion is still down
[18:12:14] still seeing 2 core router interfaces down.. yea.. that
[18:12:45] I would say let's wait another hour and make a call on what to do
[18:12:53] ack!
[18:16:17] just to confirm: we have 2 connections that make it redundant.. but today.. one had a power failure and one had a fiber cut.. 2 unrelated incidents.. right?
[19:16:57] sorry I had to jet out
[19:17:32] one still down, other stable though
[19:17:46] yes we have two transport links in ulsfo, one from codfw and the other from eqord (which in turn has links to codfw and eqiad)
[19:18:12] should we turn it back on with one link or wait for both to be back?
[19:18:21] still have time to check again later
[19:18:48] we also have a GRE tunnel which goes to eqdfw (in Dallas) which is a poor man's backup that goes over the internet
[19:19:02] I think - based on us having the GRE - we can probably repool the site
[19:19:25] ok, doing that
[19:19:54] ok, I'll keep an eye on how things look after
[19:20:55] done
[19:21:03] thanks
[19:24:12] traffic on the rise again
[21:08:44] on-call handover: we temp depooled ulsfo due to a fiber cut. it's pooled again. no other incidents happened
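For anyone reading the hCaptcha thread later: the rule requested above (proxy `https://assets-hcaptcha.wikimedia.org/1/api.js` to `https://js.hcaptcha.com/1/api.js`, keeping the original query parameters) could be expressed in nginx roughly as in the sketch below. This is a minimal illustration, not the deployed configuration: the server block layout, TLS handling, and the cookie-stripping line are assumptions, the real rules live in operations/puppet plus the private hieradata file mentioned in the log, and the fix that was actually pushed is tracked in T378188.

```nginx
# Minimal sketch only (assumed vhost layout; not the puppet-managed config).
server {
    listen 443 ssl;
    server_name assets-hcaptcha.wikimedia.org;
    # ssl_certificate / ssl_certificate_key omitted for brevity

    location /1/ {
        # proxy_pass with no URI part forwards the original request URI
        # unchanged, so a request for /1/api.js (or /1/secure-api.js) reaches
        # the upstream with its query string intact, per the 16:35 note.
        proxy_pass https://js.hcaptcha.com;
        proxy_set_header Host js.hcaptcha.com;
        proxy_ssl_server_name on;

        # The log mentions possibly filtering additional headers; dropping
        # client cookies before the third party is one example (assumption).
        proxy_set_header Cookie "";
    }
}
```

The deployed change may well be narrower, for example matching only the specific script paths or living inside the existing hcaptcha.wikimedia.org server block; the sketch is only meant to show how the query string survives the proxy hop.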