[07:23:57] serviceops, collaboration-services, Infrastructure-Foundations, Puppet-Core, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619#9788897 (MoritzMuehlenhoff)
[07:41:22] serviceops, Data-Persistence: Sessionstore's discovery TLS cert will expire before end of May 2024 - https://phabricator.wikimedia.org/T363996#9788961 (JMeybohm) I tend to agree, also for sake of alignment of sessionstore with the rest of our services. Unfortunately this feels like the more involved chan...
[09:03:33] serviceops, MoveComms-Support, MW-on-K8s, SRE, and 2 others: Move 100% of external traffic to Kubernetes (excluding Votewiki and Commons) - https://phabricator.wikimedia.org/T362323#9789300 (Clement_Goubert)
[09:23:00] serviceops, iPoid-Service (iPoid 1.0), Trust and Safety Product Sprint (Sprint 10 (13th May - 24th May)): Define service level indicators and service level objectives - https://phabricator.wikimedia.org/T348935#9789419 (Tchanders)
[11:53:07] serviceops, ChangeProp, collaboration-services, Infrastructure-Foundations, and 10 others: Figure out a plan to move forward with regarding Redis License changes - https://phabricator.wikimedia.org/T360596#9789858 (MoritzMuehlenhoff) Redict is now packaged in Debian: https://tracker.debian.org/pk...
[11:53:48] serviceops, ChangeProp, collaboration-services, Infrastructure-Foundations, and 10 others: Figure out a plan to move forward with regarding Redis License changes - https://phabricator.wikimedia.org/T360596#9789859 (MoritzMuehlenhoff)
[12:40:48] hello folks!
[12:40:52] trying again - https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1029544 :)
[12:41:03] this is to test Tegola with the mesh sidecar basically
[12:44:57] why am I doing it? This is a good question :D
[12:45:10] we'd need to migrate thanos-swift.discovery to PKI
[12:45:26] and for some reason, tegola currently doesn't like the new TLS settings (increase in CPU usage etc..)
[12:58:58] lgtm!
[12:59:07] <3
[13:16:19] of course it doesn't work since the aws sdk forces https
[13:16:29] :(
[13:20:22] there must be a way to specify the protocol, we are not the only ones having issues for sure
[13:24:21] I think e.ffie ran into this as well last time
[13:24:42] (sorry, I did not get around to commenting on this earlier)
[13:31:39] yeah I suspected it might turn out this way
[13:31:42] I am reading https://github.com/go-spatial/tegola/blob/master/cache/s3/s3.go
[13:32:56] we used the python sdk for recommendation-api-ng and the http://etc.. worked
[13:33:07] so I kinda hoped go would have given me the same :D
[13:35:02] maybe setting http://localhost:6022 as endpoint, not sure if it was tried as well
[13:36:54] serviceops, DC-Ops, ops-codfw, SRE: Q4:rack/setup/install kafka-main200[6789] & kafka-main2010 - https://phabricator.wikimedia.org/T363209#9790294 (Jhancock.wm)
[13:37:20] I seem to remember there was something about missing headers (Host,SNI?) there as well...but I don't really recall. Maybe effie knows
[13:38:40] forcing http leads to
[13:38:40] SignatureDoesNotMatch: The request signature we calculated does not match the signature you provided. Check your key and signing method.
[13:38:54] which I suspect is some weird http/https mismatch
[13:43:12] serviceops, Content-Transform-Team, Essential-Work, Wikimedia-Incident: Maps Unavailability due to thanos-swift cfssl rollout (14 Aug 2023) - https://phabricator.wikimedia.org/T344324#9790308 (elukey) To keep archives happy: tried to set up a local sidecar in staging (I think it was attempted bef...
[13:45:15] elukey: we were not able to properly authenticate to s3 via envoy. We didn't use the AWS request signing filter at the time, I do not remember why though, it has been way too long
[13:46:07] effie: ack ack :(
[13:46:24] I'll try running some tests, if I manage to find something I'll report back
[13:47:42] I think the signing filter could be the only option if we are going via envoy, the other being to update tegola altogether
[13:48:24] I am a little ignorant, do you mean that envoy may hide some response/request headers messing up the sdke?
[13:48:27] *sdk?
[13:51:36] I am not 100% sure, we were debugged and deployed that I think in 2021?
[13:51:41] serviceops: Provide nodejs20 base images for production - https://phabricator.wikimedia.org/T362681#9790329 (Jdforrester-WMF) Open→Resolved >>! In T362681#9782772, @Jdforrester-WMF wrote: >>>! In T362681#9781064, @MoritzMuehlenhoff wrote: >> I kicked off a build of the node20 image, it should hop...
[13:51:58] were debugging this*
[13:53:36] serviceops: Provide nodejs20 base images for production - https://phabricator.wikimedia.org/T362681#9790331 (MoritzMuehlenhoff) >>! In T362681#9790329, @Jdforrester-WMF wrote: >>>! In T362681#9782772, @Jdforrester-WMF wrote: >>>>! In T362681#9781064, @MoritzMuehlenhoff wrote: >>> I kicked off a build of...
[14:18:17] serviceops, Infrastructure-Foundations, Prod-Kubernetes, SRE, and 3 others: Site: codfw 1 VM request for staging-codfw kube-apiserver - https://phabricator.wikimedia.org/T363310#9790432 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host kubestage...
[14:18:38] serviceops, Infrastructure-Foundations, Prod-Kubernetes, vm-requests, Kubernetes: Site: eqiad 3 VM request for staging-eqiad kube-apiserver - https://phabricator.wikimedia.org/T364746 (JMeybohm) NEW
[14:28:35] serviceops, Infrastructure-Foundations, Prod-Kubernetes, vm-requests, Kubernetes: Site: eqiad 3 VM request for staging-eqiad kube-apiserver - https://phabricator.wikimedia.org/T364746#9790476 (MoritzMuehlenhoff) p:Triage→Medium LGTM
[14:31:10] I'm trying to 'rake run_locally' on deployment-charts to e.g. test https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1030950 with podman locally (symlinked to docker) but I'm getting "Errno::EACCES: Permission denied @ dir_s_chdir - /src". Has anyone run into that and/or used podman for this setup?
[14:31:59] the full output being https://phabricator.wikimedia.org/P62364
[14:32:23] I'm guessing that's rake inside the container being unhappy
[14:36:33] godog: just guessing it might not be allowed to write to the container fs by default?
[14:38:26] jayme: yeah I thought so too, though /src, which is the workdir, is an rw volume, so I'd expect that to work :|
[14:38:36] this -v /tmp/d20240513-40981-pn6kgo:/src:rw
[14:39:32] https://tenor.com/4JpY.gif
[14:39:58] lolz
[14:42:15] https://c.tenor.com/1-7PRjYcw5AAAAAC/tenor.gif
[14:42:59] eheh, nice one - with homer in the background
[14:43:26] the office delivers
[14:49:51] serviceops, Infrastructure-Foundations, Prod-Kubernetes, SRE, and 3 others: Site: codfw 1 VM request for staging-codfw kube-apiserver - https://phabricator.wikimedia.org/T363310#9790605 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host kubestagemast...
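The `/tmp/d20240513-40981-pn6kgo` path above appears to follow the naming scheme of Ruby's `Dir.mktmpdir` (prefix `d`, date, PID, random suffix), and `Dir.mktmpdir` creates its directory with mode 0700. One hypothesis worth checking, not confirmed in the log: the process inside the container runs as a non-root UID which, under rootless podman, maps to a subordinate host UID, so it counts as "others" and lacks the search permission that `chdir` needs, regardless of the `:rw` volume flag (podman's `--userns=keep-id` is one way to make the UIDs line up). SELinux labeling, i.e. podman's `:Z` volume option, is the other usual suspect. The sketch below only illustrates the permission-bit side, using Python's `tempfile.mkdtemp()`, which likewise creates a directory accessible only by its creator; `others_can_enter` is a hypothetical helper name.

```python
import os
import stat
import tempfile


def others_can_enter(path: str) -> bool:
    """True if the 'others' execute (search) bit is set, i.e. a non-root process
    that is neither the owner nor in the group may chdir into the directory."""
    return bool(os.stat(path).st_mode & stat.S_IXOTH)


# tempfile.mkdtemp(), like Ruby's Dir.mktmpdir, creates the directory as 0700.
workdir = tempfile.mkdtemp()
before = others_can_enter(workdir)  # False: only the owner may enter

# Opening the directory up is one workaround; mapping the container user
# to the directory owner's UID is another.
os.chmod(workdir, 0o755)
after = others_can_enter(workdir)   # True

os.rmdir(workdir)
print(before, after)  # False True
```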
[15:02:02] effie: so I think the issue is https://github.com/aws/aws-sdk-go/issues/1473
[15:02:06] that is not an easy one :(
[15:13:56] serviceops, DC-Ops, ops-codfw, SRE: Q4:rack/setup/install kafka-main200[6789] & kafka-main2010 - https://phabricator.wikimedia.org/T363209#9790688 (Jhancock.wm) @Papaul, This was the last screen I got. The servers all have the OS installed and it failed at the certificate stage. I think it's caus...
[15:17:03] serviceops, DC-Ops, ops-codfw, SRE: Q4:rack/setup/install kafka-main200[6789] & kafka-main2010 - https://phabricator.wikimedia.org/T363209#9790699 (MoritzMuehlenhoff) All insetup roles default to Puppet 7 these days (as does the kafka-main role itself), so these should be installed with Puppet 7.
[15:17:55] serviceops, DC-Ops, ops-codfw, SRE: Q4:rack/setup/install kafka-main200[6789] & kafka-main2010 - https://phabricator.wikimedia.org/T363209#9790707 (MoritzMuehlenhoff) I think the reason the installation failed is because there is no entry in site.pp yet.
[15:54:21] serviceops, CirrusSearch, Discovery-Search (Current work), Patch-For-Review: Implement global ratelimiting in our service mesh - https://phabricator.wikimedia.org/T362310#9790941 (CodeReviewBot) pfischer opened https://gitlab.wikimedia.org/repos/search-platform/cirrus-streaming-updater/-/merge_re...
[16:56:32] Hi, is there a way to force php-fpm to show the nice error message instead of this when the error is of a certain type (here, DBUnexpectedError)? https://phabricator.wikimedia.org/T360930#9791250
[16:57:15] I wrote this message here https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1014101/10/includes/libs/rdbms/loadmonitor/LoadMonitor.php#125
[16:57:28] but as we knew, it's not bubbling up
[17:20:34] hi friends, anything I should worry about before helmfile apply'ing a 200-ish sized DaemonSet? I have at least verified updatestrategy is rolling
[17:40:40] serviceops: deploy1003 implementation tracking - https://phabricator.wikimedia.org/T364417#9791412 (akosiaris) Thanks @dzahn. It's fine as a parent task, thanks for adding it. T364416 already says bullseye for what it's worth. Adding @jijiki for her information.
[17:40:52] (mine answered in -k8s-sig)
[19:58:00] serviceops, [DEPRECATED] wdwb-tech, Citoid, Content-Transform-Team-WIP, and 10 others: Migrate node-based services in production to node18 - https://phabricator.wikimedia.org/T349118#9792043 (Jdforrester-WMF)
[21:19:49] serviceops, DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install kafka-main100[6789] and kafka-main1010 - https://phabricator.wikimedia.org/T363212#9792290 (VRiley-WMF)
[21:24:49] serviceops, DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install kafka-main100[6789] and kafka-main1010 - https://phabricator.wikimedia.org/T363212#9792304 (VRiley-WMF) kafka-main1006 Rack: A 3 U 24 CableID: 1881 Port: 36 kafka-main1007 Rack: B 3 U 34 CableID: 5173 Port: 19 kafka-main1008 Rack: C 3 U...
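For the ~200-pod DaemonSet question above: with a RollingUpdate strategy, the main thing that grows with pod count is rollout time. A back-of-the-envelope sketch, assuming the Kubernetes DaemonSet defaults (`maxUnavailable: 1`, `maxSurge: 0`) and that the chart does not override them: the rollout needs roughly ceil(pods / maxUnavailable) sequential replacement rounds, where a percentage `maxUnavailable` is converted to an absolute number by rounding up. Each round waits for the new pod to become available (plus any `minReadySeconds`), so total time is roughly rounds times per-pod readiness time. The helper names are hypothetical.

```python
import math


def pods_per_wave(desired_pods: int, max_unavailable=1) -> int:
    """Pods a DaemonSet RollingUpdate may take down at once (assumes maxSurge=0)."""
    if isinstance(max_unavailable, str) and max_unavailable.endswith("%"):
        percent = int(max_unavailable[:-1])
        # For DaemonSets, a percentage is converted to an absolute number by rounding up.
        return math.ceil(desired_pods * percent / 100)
    return int(max_unavailable)


def rollout_waves(desired_pods: int, max_unavailable=1) -> tuple:
    """Return (pods replaced per round, approximate number of sequential rounds)."""
    per_wave = pods_per_wave(desired_pods, max_unavailable)
    return per_wave, math.ceil(desired_pods / per_wave)


print(rollout_waves(200))         # (1, 200): default maxUnavailable=1
print(rollout_waves(200, "10%"))  # (20, 10)
print(rollout_waves(200, "7%"))   # (14, 15)
```

Raising `maxUnavailable` shortens the rollout at the cost of more nodes running without the daemon at the same time; the default of one pod at a time is the conservative choice.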