[08:43:51] hi, can anyone who knows about MediaWiki on Kubernetes explain to me why helm takes 6 minutes to deploy?
[08:44:14] example: `helmfile -e codfw --selector name=main apply in /srv/deployment-charts/helmfile.d/services/mw-api-int (duration: 05m 56s)`
[08:44:28] it almost feels like it is updating the workers serially
[08:44:56] <_joe_> hashar: that shouldn't be the norm, but we do limit the number of workers that are replaced at any time
[08:45:12] <_joe_> hashar: is this common or was this a first?
[08:45:18] it always happens
[08:45:19] :)
[08:45:48] I might be able to retrieve the timing in Kibana since scap sends the log there
[08:46:08] <_joe_> so yeah, the speed of deployment is a bit of a balance. If you want to have speedy deployments, then you need to have a higher amount of free resources
[08:46:08] with the duration added to the log context (I added that a few months/years ago)
[08:46:14] <_joe_> and we're in the middle of the transition
[08:46:41] <_joe_> hashar: can you open a task?
serviceops is going to be very busy this week, with the datacenter switchover happening
[08:46:48] no no
[08:46:56] I was merely wondering / thinking out loud
[08:47:21] that will be brought up eventually later on by ahmon / releng, who know about mw on k8s
[08:47:31] I think we will have a shared goal to speed up deployment
[08:48:08] if that is solved by itself once the migration has completed, that is fine
[08:48:24] <_joe_> not necessarily 100%, but yes right now we have less wiggle room
[08:48:26] and even if not, it can be checked or optimized after it has completed
[08:50:06] my stupid rant from yesterday night was "why have we moved to kubernetes if deployment is an order of magnitude slower" :-]
[08:50:14] which really is a teaser to dig into it
[08:56:06] _joe_: you are right, I am filing a task about it, at least to capture the discussion :)
[08:58:06] <_joe_> hashar: moving to kubernetes was never touted as a way to have faster deployments
[08:58:38] <_joe_> actually, I always maintained that given they're going to be consistent, repeatable, and guaranteed to be overall safe, they might be in fact slower
[08:59:13] <_joe_> especially for as long as we ship around a 7GB godzilla of a container.
But the current level of slowness is a construct of the transition
[08:59:33] <_joe_> There's stuff we can do to mitigate this, but this is not the week for my team to think about it
[09:00:50] yes yes I have no complaint/concern
[09:00:59] my rant was merely a question of curiosity :)
[09:01:08] and whether maybe something can be made a little bit faster or not
[09:01:25] and that is definitely not for this week, that is going to be an APP goal I think
[09:01:38] well there is a hypothesis about speeding up deployments
[09:01:54] anyway, just ignore it :)
[09:08:14] serviceops, Commons: Commons thumbnails are broken for certain large sizes of thumbnail images - https://phabricator.wikimedia.org/T358738#9640913 (seav) @TheDJ, thanks for confirming the issue. Is there a way to use your shell trick to determine which images would need purging? I think it's a UX issue i...
[09:08:42] serviceops, Commons: Commons thumbnails are broken for certain large sizes of thumbnail images - https://phabricator.wikimedia.org/T358738#9640916 (seav) Additional note: @tstarling [[ https://wikis.world/@TimStarling/112118792888986414 | said on Mastodon ]] that this is most probably just T344233 manife...
[09:11:06] _joe_: I have captured it in a task (feel free to ignore it, I haven't subscribed you to it) Helm deployment of MediaWiki now takes 6 minutes - https://phabricator.wikimedia.org/T360403
[09:28:16] serviceops, Commons, SRE-swift-storage: Commons thumbnails are broken for certain large sizes of thumbnail images - https://phabricator.wikimedia.org/T358738#9641068 (TheDJ) >>! In T358738#9640913, @seav wrote: > Is there a way to use your shell trick to determine which images would need purging? No...
[09:34:47] serviceops, Commons, SRE-swift-storage: Commons thumbnails are broken for certain large sizes of thumbnail images - https://phabricator.wikimedia.org/T358738#9641119 (akosiaris) >>!
In T358738#9639319, @TheDJ wrote: > ping @akosiaris Ideas on why codfw is out of date and won't correct? Is it out of...
[10:11:20] serviceops, Prod-Kubernetes, Data-Platform-SRE (2024.03.04 - 2024.03.24), Kubernetes, Patch-For-Review: Improve how we address outside k8s infrastructure from within charts (e.g. network policies) - https://phabricator.wikimedia.org/T331894#9641270 (JMeybohm) I now think we might have misund...
[13:02:19] serviceops, Datacenter-Switchover: Update DC switchover cookbooks to handle mw-jobrunners - https://phabricator.wikimedia.org/T359154#9641761 (jijiki) Open→Resolved p:Triage→High a:jijiki This is done, will reopen if something goes south
[13:02:27] serviceops, collaboration-services, Data-Persistence, DC-Ops, and 5 others: ☂️ Northward Datacentre Switchover (March 2024) - https://phabricator.wikimedia.org/T357547#9641765 (jijiki)
[14:20:33] serviceops, collaboration-services, Data-Persistence, DC-Ops, and 5 others: ☂️ Northward Datacentre Switchover (March 2024) - https://phabricator.wikimedia.org/T357547#9642157 (ops-monitoring-bot) jiji@cumin1002 - Cookbook cookbooks.sre.discovery.datacenter depool all services in codfw: Northward...
[14:27:16] serviceops, MoveComms-Support, User-notice: MoveComms support for Northward Datacentre Switchover (March 2024) - https://phabricator.wikimedia.org/T358233#9642248 (Trizek-WMF) The few reactions I observed from communities came from users who thanked me for the information message I sent last Friday.
[14:40:43] serviceops, collaboration-services, Data-Persistence, DC-Ops, and 4 others: ☂️ Northward Datacentre Switchover (March 2024) - https://phabricator.wikimedia.org/T357547#9642359 (ops-monitoring-bot) jiji@cumin1002 - Cookbook cookbooks.sre.discovery.datacenter depool all services in codfw: Northward...
[15:04:12] serviceops, collaboration-services, Infrastructure-Foundations, Puppet-Core, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619#9642547 (MoritzMuehlenhoff)
[15:40:11] serviceops, collaboration-services, Infrastructure-Foundations, Puppet-Core, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619#9642787 (MoritzMuehlenhoff)
[15:55:17] serviceops, collaboration-services, Infrastructure-Foundations, Puppet-Core, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619#9642859 (MoritzMuehlenhoff)
[15:56:05] serviceops, collaboration-services, Data-Persistence, DC-Ops, and 5 others: ☂️ Northward Datacentre Switchover (March 2024) - https://phabricator.wikimedia.org/T357547#9642861 (akosiaris) We had to repool kartotherian in codfw as we had a [CPU exhaustion event](https://grafana.wikimedia.org/d/000...
[16:11:56] serviceops, collaboration-services, Data-Persistence, DC-Ops, and 5 others: ☂️ Northward Datacentre Switchover (March 2024) - https://phabricator.wikimedia.org/T357547#9642921 (Clement_Goubert) Some tweaking of replicas was needed on mw-on-k8s, which was expected as this is the first switchover w...
[17:29:30] serviceops, Prod-Kubernetes: PodSecurityPolicies will be deprecated with Kubernetes 1.21 - https://phabricator.wikimedia.org/T273507#9643267 (JMeybohm) I've summarized my findings at https://wikitech.wikimedia.org/wiki/User:JMeybohm/PSP_Replacement @akosiaris, @elukey: I'd like you to take a look and ask...
[18:08:00] When I see in Gitlab CI that a docker image was pushed to registry: "pushing manifest for docker-registry.discovery.wmnet/repos/sre/ .." but I can't see it on https://docker-registry.wikimedia.org .. is it just caching and waiting for the homepage builder? I remember thinking this last time but then it took quite some time.
[18:08:15] as in days, rather than hours
[18:12:40] mutante: IIRC the homepage is built by some cronjob plus caching - but you should be able to pull the image directly after pushing
[18:13:41] jayme: ACK, thanks, I guess I can just bump the version and deploy to staging
[20:31:22] Heyo. Anyone around to delete https://gitlab.wikimedia.org/jiji/phab/-/merge_requests/9 ? I may have accidentally put half of a password in the title ._.
[20:32:11] (my own, not one of running services. Rotating it all now)
[20:40:33] brett: The merge request was successfully deleted.
[20:40:46] Thank you!
[20:40:52] brett: the best channel for this is -gitlab btw
[20:41:01] ah, thanks
[20:41:23] I wasn't sure I could do that.. I could after "re-auth"
[20:41:36] ~ "enter admin mode" kind of thing
[22:31:16] serviceops, Commons, SRE-swift-storage: Commons thumbnails are broken for certain large sizes of thumbnail images - https://phabricator.wikimedia.org/T358738#9644283 (tstarling) I thought there was no cross-DC replication of thumbnails. T299125#8221206 seems to support that. So it's expected that a b...