[06:58:34] serviceops, SRE, Patch-For-Review, Performance-Team (Radar), User-jijiki: Reduce number of shards in redis_sessions cluster - https://phabricator.wikimedia.org/T280582 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` ['mc2023.codfw.wmnet'...
[07:12:09] serviceops, SRE, Kubernetes, Patch-For-Review: Migrate to helm v3 - https://phabricator.wikimedia.org/T251305 (Jelto)
[07:31:45] serviceops, SRE, Patch-For-Review, Performance-Team (Radar), User-jijiki: Reduce number of shards in redis_sessions cluster - https://phabricator.wikimedia.org/T280582 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mc2023.codfw.wmnet'] ` and were **ALL** successful.
[08:04:05] serviceops, GitLab (Initialization): GitLab Puma reduced availability due to automated restart - https://phabricator.wikimedia.org/T289454 (Jelto)
[08:04:34] serviceops, GitLab (Initialization): GitLab Puma reduced availability due to automated restart - https://phabricator.wikimedia.org/T289454 (Jelto) p:Triage→Low
[08:05:42] serviceops, observability, GitLab (Initialization), Patch-For-Review: Define monitoring for gitlab - https://phabricator.wikimedia.org/T275170 (Jelto) I created a dedicated task for the reduced availability for the puma worker/exporter: https://phabricator.wikimedia.org/T289454
[08:13:19] serviceops, GitLab (Initialization): GitLab Puma reduced availability due to automated restart - https://phabricator.wikimedia.org/T289454 (Jelto)
[09:38:27] serviceops, User-jijiki: Productionise thumbor1005 and thumbor1006 - https://phabricator.wikimedia.org/T285477 (jijiki) Since @Dzahn and @Jelto are finished with the mw* refresh and I am finishing the memcached refresh, to my knowledge, there are no upcoming servers to refresh. It indeed makes sense to u...
[10:32:25] serviceops, User-jijiki: Productionise thumbor1005, thumbor1006, thumbor2005 and thumbor2006 - https://phabricator.wikimedia.org/T285477 (jijiki)
[10:33:16] serviceops, User-jijiki: Productionise thumbor1005, thumbor1006, thumbor2005 and thumbor2006 - https://phabricator.wikimedia.org/T285477 (jijiki)
[10:35:35] serviceops, SRE, Patch-For-Review, Performance-Team (Radar), User-jijiki: Reduce number of shards in redis_sessions cluster - https://phabricator.wikimedia.org/T280582 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` ['mc2025.codfw.wmnet'...
[10:35:46] serviceops, SRE, Patch-For-Review, Performance-Team (Radar), User-jijiki: Reduce number of shards in redis_sessions cluster - https://phabricator.wikimedia.org/T280582 (jijiki)
[11:08:38] serviceops, SRE, Patch-For-Review, Performance-Team (Radar), User-jijiki: Reduce number of shards in redis_sessions cluster - https://phabricator.wikimedia.org/T280582 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mc2025.codfw.wmnet'] ` and were **ALL** successful.
[11:34:41] serviceops, Prod-Kubernetes, SRE, Kubernetes: Migrate default nework policies (default-network-policy-conf.yaml) to GlobalNetworkPolicies - https://phabricator.wikimedia.org/T280125 (akosiaris) Open→Resolved a:akosiaris This has been done. The Network policies have been migrated, the...
[11:34:44] serviceops, Prod-Kubernetes, SRE, Kubernetes, and 2 others: Upgrade Calico - https://phabricator.wikimedia.org/T207804 (akosiaris)
[13:09:52] serviceops, User-jijiki: Productionise thumbor1005, thumbor1006, thumbor2005 and thumbor2006 - https://phabricator.wikimedia.org/T285477 (Dzahn) @Arnoldokoth Happy to help with this !
[13:21:46] serviceops: install racktables on miscweb2002 - https://phabricator.wikimedia.org/T269746 (Dzahn) a:Dzahn
[14:32:24] serviceops, GitLab (Initialization): GitLab Puma reduced availability due to automated restart - https://phabricator.wikimedia.org/T289454 (Jelto) `gitlab1001` just had another rolling restart of puma workers: ` {"timestamp":"2021-08-23T14:06:42.554Z","pid":12276,"message":"PumaWorkerKiller: Rolling Res...
[14:45:19] Hi! sukhe and I are working on envoy puppetization for the caching layer. The next commit to be merged is dual stack (RSA+ECDSA) support: https://gerrit.wikimedia.org/r/c/operations/puppet/+/710507. PCC seems to be happy against existing instances (NOOP at envoy level but some changes on the puppet layer). Could we get a review from somebody on the team?
[14:58:40] running a couple minutes late to the meeting, sorry
[15:14:26] serviceops, Wikimedia-Site-requests, Technical-Debt: Consider splitting search.wikimedia.org out of ops/mediawiki-config into separate service - https://phabricator.wikimedia.org/T289224 (Gehel) The Search Platform team isn't responsible for this service, so from my point of view, feel free to do wha...
[15:24:24] zpapierski: we are not sure that merging your patch would help, would you mind creating a task specifically about this issue
[15:27:32] and how we can reproduce it
[15:27:47] I am trying to make some time to debug this a bit
[15:29:19] serviceops, Patch-For-Review, User-jijiki: Productionise mc10[37-54].eqiad.wmnet - https://phabricator.wikimedia.org/T278225 (jijiki) Open→Resolved
[15:33:16] serviceops, observability, GitLab (Initialization), Patch-For-Review: Define monitoring for gitlab - https://phabricator.wikimedia.org/T275170 (brennen) Open→Resolved > @brennen do you miss some additional alerts? Maybe something more application specific? As I mentioned there are at leas...
[15:59:31] serviceops, Wikidata, Wikidata-Query-Service, wdwb-tech, and 2 others: Deploy Flink (rdf-streaming-updater) to kubernetes (k8s) - https://phabricator.wikimedia.org/T264006 (dcausse)
[16:02:28] serviceops, SRE, Patch-For-Review, Performance-Team (Radar), User-jijiki: Reduce number of shards in redis_sessions cluster - https://phabricator.wikimedia.org/T280582 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` ['mc2027.codfw.wmnet'...
[16:03:23] effie: we merged https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/714364 and it worked
[16:05:04] serviceops, SRE, Patch-For-Review, Performance-Team (Radar), User-jijiki: Reduce number of shards in redis_sessions cluster - https://phabricator.wikimedia.org/T280582 (jijiki)
[16:08:09] dcausse: oh this is beautiful!
[16:17:38] serviceops, GitLab, Release-Engineering-Team (Radar), User-brennen: GitLab patch release: 13.12.10: Resolves "Username ending with MIME type format is not allowed" errors - https://phabricator.wikimedia.org/T288631 (thcipriani) Open→Stalled This is waiting on apt import.
[16:17:40] thank you !
[16:28:52] serviceops, MW-on-K8s, Release-Engineering-Team, SRE: Ensure the code is deployed to mediawiki on k8s when it is deployed to production - https://phabricator.wikimedia.org/T287570 (thcipriani) Open→Resolved >>! In T287570#7251342, @Joe wrote: > The code should now be deployed when merged/...
[16:28:56] serviceops, MW-on-K8s, SRE, Patch-For-Review, User-jijiki: Create a mwdebug deployment for mediawiki on kubernetes - https://phabricator.wikimedia.org/T283056 (thcipriani)
[16:35:23] serviceops, SRE, Patch-For-Review, Performance-Team (Radar), User-jijiki: Reduce number of shards in redis_sessions cluster - https://phabricator.wikimedia.org/T280582 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mc2027.codfw.wmnet'] ` and were **ALL** successful.
[17:18:41] serviceops, Wikimedia-Site-requests, Technical-Debt: Consider splitting search.wikimedia.org out of ops/mediawiki-config into separate service - https://phabricator.wikimedia.org/T289224 (Legoktm) >>! In T289224#7301754, @Gehel wrote: > The Search Platform team isn't responsible for this service, so...
[17:19:23] hey, I'm looking to merge https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/713934 later today, but not super familiar with the `deployment-charts` repo
[17:19:42] after merging will the change automatically be applied, or is there a separate step to actually apply the change?
[17:21:32] ah should have checked wikitech first, can anyone confirm that https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Deploying_with_helmfile are the steps to follow?
[17:22:25] e.g. run `helmfile -e ${CLUSTER} -i apply` from `/srv/deployment-charts/helmfile.d/services/linkrecommendation`
[17:30:18] serviceops, Wikimedia-Site-requests, Technical-Debt, User-Majavah: Consider splitting search.wikimedia.org out of ops/mediawiki-config into separate service - https://phabricator.wikimedia.org/T289224 (Majavah) a:Majavah
[17:32:59] ryankemper: yep, that's it
[17:33:21] +2, let jenkins merge it, then wait a minute or two for it to be pulled onto deploy1002 and then run that helmfile command
[17:33:35] and you'll want to run it against staging cluster first, then eqiad then codfw
[17:41:05] serviceops, Wikimedia-Site-requests, Technical-Debt, User-Majavah: Consider splitting search.wikimedia.org out of ops/mediawiki-config into separate service - https://phabricator.wikimedia.org/T289224 (Legoktm) In the 1/128 sampled web request logs, this service has 1.7k entries over 30 days, so...
[18:23:52] serviceops, Wikimedia-Site-requests, Technical-Debt, User-Majavah: Split search.wikimedia.org out of ops/mediawiki-config into separate service - https://phabricator.wikimedia.org/T289224 (Legoktm)
[18:40:07] serviceops, MW-on-K8s, SRE, Release-Engineering-Team (Radar): The restricted/mediawiki-webserver image should include skins and resources - https://phabricator.wikimedia.org/T285232 (dancy) @Joe As of `docker-registry.wikimedia.org/restricted/mediawiki-webserver:2021-08-04-134912-webserver` it lo...
[18:41:55] serviceops, MW-on-K8s, SRE, Release-Engineering-Team (Radar): The restricted/mediawiki-webserver image should include skins and resources - https://phabricator.wikimedia.org/T285232 (dancy)
[21:28:13] hey serviceops, based off the latest comment on T284346, should we be consulting you before using the node12 images?
[21:31:55] the WVUI project switched to node 12 https://github.com/wikimedia/wvui/blob/master/.nvmrc and wanted to update the project's Dockerfile to use the node 12 version as well. just wanted to make sure there were no glaring problems with making that change
[21:42:53] nikkinikk_: that seems totally fine since wvui isn't being deployed as a service
[21:44:15] legoktm: coo coo thanks 😎
[22:20:43] serviceops, MW-on-K8s, SRE, MW-1.37-notes (1.37.0-wmf.20; 2021-08-23), Patch-For-Review: Make HTTP calls work within mediawiki on kubernetes - https://phabricator.wikimedia.org/T288848 (Legoktm) My deployment plan is: * Turn on envoy proxy nowish, test various requests with curl manually * En...
[23:32:11] serviceops, MW-on-K8s, SRE, MW-1.37-notes (1.37.0-wmf.20; 2021-08-23), Patch-For-Review: Make HTTP calls work within mediawiki on kubernetes - https://phabricator.wikimedia.org/T288848 (Legoktm) >>! In T288848#7303412, @Legoktm wrote: > * Enable proxy in mwdebug k8s deployment too. Note that...
[23:59:02] legoktm: so I've helm-applied the change to staging
[23:59:19] I'd like to do a `kubectl delete po linkrecommendation-staging-7476db744d-2dbq6` to verify the new pod comes up healthy
[23:59:31] doesn't look like the RBAC allows that though
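Editorial sketch of the rollout order given at 17:33 (staging first, then eqiad, then codfw), using the `helmfile -e ${CLUSTER} -i apply` command and the service directory quoted at 17:22. The names `rollout_commands` and `ROLLOUT_ORDER` are hypothetical and only for illustration. The script just prints the commands in order and runs nothing; the wait for the merged change to be pulled onto deploy1002 is not modelled.

```python
import shlex

# Cluster order as stated in the chat: staging first, then eqiad, then codfw.
ROLLOUT_ORDER = ["staging", "eqiad", "codfw"]

# Directory layout taken from the path quoted in the chat.
CHARTS_ROOT = "/srv/deployment-charts/helmfile.d/services"


def rollout_commands(service, clusters=ROLLOUT_ORDER, charts_root=CHARTS_ROOT):
    """Return one shell command per cluster, in rollout order.

    Hypothetical helper: it only builds command strings and never executes them.
    """
    workdir = f"{charts_root}/{service}"
    return [
        f"cd {shlex.quote(workdir)} && helmfile -e {shlex.quote(cluster)} -i apply"
        for cluster in clusters
    ]


if __name__ == "__main__":
    # Print the commands so an operator can review and run them one at a time.
    for command in rollout_commands("linkrecommendation"):
        print(command)
```

Printing rather than executing keeps the interactive `-i` confirmation with the person doing the deploy, who can check each cluster before moving to the next.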