[07:17:44] jayme: o/ just seen your Friday message - I'll proceed with kubestage2004 this morning, and I guess you'll do 1003? (I just merged the patch that you sent for partman)
[07:18:24] we should probably send an email to ops@ to warn people about it, even if we do it early in the EU morning it should go unnoticed
[07:39:19] <_joe_> why would people notice?
[07:39:34] good morning :)
[07:39:40] if they deploy while we reimage
[07:39:47] (to staging)
[07:40:01] <_joe_> don't we still have a node up that can host all of our staging containers?
[07:40:27] yep yep, but if anything comes up people know what we are doing
[07:40:32] anyway, I am reimaging kubestage2002
[08:38:28] kubestage2002 on bullseye :)
[08:38:36] (and uncordoned)
[08:48:53] _joe_ if it is not a problem I can depool kubestage1003 and reimage it as well
[08:52:01] elukey: o/ I wasn't sure if we should do one node in eqiad first and then the other codfw one, but I think it does not really matter
[08:56:04] jayme: ack, I think that we can move the staging eqiad cluster to overlay / bullseye as well, and then think about one kubernetes* node
[08:56:31] +1
[08:56:48] I'll also take care of the ml-serve clusters
[08:57:04] jayme: ok to proceed with 1003 then?
[08:57:42] elukey: yes, sure!
[08:57:50] * elukey proceeds
[09:45:59] kubestage1003 up and running (uncordoned and taking traffic again)
[09:46:18] need to step afk for a bit, if all looks good I can also reimage 1004
[10:01:48] great, thanks!
[10:08:31] * jayme rebalanced pods in staging-codfw
[10:12:16] 10serviceops, 10Add-Link, 10Growth-Team, 10Patch-For-Review: Many repeated config file changed / config file reloaded messages from promehteus statsd exporter - https://phabricator.wikimedia.org/T300629 (10JMeybohm)
[10:23:42] 10serviceops, 10Add-Link, 10Growth-Team, 10Patch-For-Review: Many repeated config file changed / config file reloaded messages from promehteus statsd exporter - https://phabricator.wikimedia.org/T300629 (10JMeybohm) 05In progress→03Resolved I've applied the default change and tested with linkrecommenda...
[10:36:07] jayme: if you are ok I am going to proceed with kubestage1004
[10:36:34] elukey: no objections
[10:40:59] mmm interesting, pulling the istiod image is taking ages
[10:41:29] on 1003 I mean, after draining 1004
[10:41:45] it took 4 minutes
[10:41:49] and now it finished
[10:42:14] jayme: --^
[10:42:17] normal on staging noes?
[10:42:21] *nodes
[10:42:29] uhm... no
[10:42:55] that's pretty long. The eqiad nodes are very new as well
[10:43:24] it is around 150MB, weird
[10:43:56] Normal  Pulling  4m4s  kubelet, kubestage1003.eqiad.wmnet  Pulling image "docker-registry.discovery.wmnet/istio/pilot:1.9.5-5"
[10:45:48] does not seem like there is a problem in the dragonfly p2p network... strange.
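For reference, the depool / drain / reimage / uncordon cycle the two are walking through above has roughly the following shape. This is a sketch only: the conftool invocation, the cookbook name, and the kubectl flags are assumptions based on common practice and are not confirmed anywhere in this log.

  # Drain the node so its pods reschedule onto the remaining staging nodes
  # (flags are an assumption; older kubectl used --delete-local-data)
  $ kubectl drain kubestage1003.eqiad.wmnet --ignore-daemonsets --delete-emptydir-data

  # Depool the host from service (conftool invocation is an assumption)
  $ sudo confctl select name=kubestage1003.eqiad.wmnet set/pooled=no

  # Reimage to bullseye (cookbook name is an assumption)
  $ sudo cookbook sre.hosts.reimage --os bullseye kubestage1003

  # Once the host is back up, allow scheduling again and repool it
  $ kubectl uncordon kubestage1003.eqiad.wmnet
  $ sudo confctl select name=kubestage1003.eqiad.wmnet set/pooled=yes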
[10:47:21] 10serviceops, 10SRE, 10Wikimedia-Mailing-lists, 10User-Ladsgroup: wikimediacz-l does not hold all posts for moderation - https://phabricator.wikimedia.org/T298729 (10MatthewVernon)
[10:49:57] elukey: I think the node might have been pulling a bunch of images in parallel (because of the drain and its pretty cold cache due to the recent reimage)
[10:52:16] jayme: makes sense yes, I think that we can proceed with 1004
[10:52:22] ack
[11:07:11] 10serviceops, 10Prod-Kubernetes: setup/install kubernetes20[19|2(012)] - https://phabricator.wikimedia.org/T302208 (10JMeybohm)
[11:07:32] 10serviceops, 10Prod-Kubernetes: setup/install kubernetes20[19|2(012)] - https://phabricator.wikimedia.org/T302208 (10JMeybohm)
[11:28:48] 1004 back from the reimage, uncordoned and pooled
[11:28:55] so all staging envs are on bullseye now!
[11:36:36] 10serviceops, 10MW-on-K8s, 10Release-Engineering-Team (Done by Feb 23🔥): Make scap deploy to kubernetes together with the legacy systems - https://phabricator.wikimedia.org/T299648 (10MatthewVernon)
[11:37:46] going afk for lunch, lemme know if anything looks weird
[13:11:37] I'd like to further scale up jobqueue, any objections? number of replicas is getting a little high but unfortunately might be necessary for the short-term https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/762418
[13:15:16] 10serviceops, 10DC-Ops, 10SRE: setup/install mc20[38-55] - https://phabricator.wikimedia.org/T302218 (10akosiaris)
[13:43:52] 10serviceops, 10Parsoid: Move testreduce to nodejs 12 - https://phabricator.wikimedia.org/T301303 (10MatthewVernon)
[13:44:38] 10serviceops, 10MediaWiki-extensions-PropertySuggester, 10Wikidata, 10wdwb-tech, 10Service-deployment-requests: New Service Request SchemaTree - https://phabricator.wikimedia.org/T301471 (10MatthewVernon)
[13:45:36] hnowlan: +1 - still looks fine
[14:02:14] 10serviceops, 10WMDE-Technical-Wishes-Maintenance: Migrate kartotherian production service to node12 - https://phabricator.wikimedia.org/T301475 (10MatthewVernon)
[14:02:31] 10serviceops, 10WMDE-Technical-Wishes-Maintenance: Migrate geoshapes production service to node12 - https://phabricator.wikimedia.org/T301476 (10MatthewVernon)
[15:05:01] 10serviceops, 10MediaWiki-extensions-PropertySuggester, 10Wikidata, 10wdwb-tech, 10Service-deployment-requests: New Service Request SchemaTree - https://phabricator.wikimedia.org/T301471 (10Joe) Ok so a few requirements: 1) we need the repository to be on gerrit, and to include a `.pipeline` directory t...
[15:06:29] 10serviceops, 10SRE: Renew puppet cert for etcd.codfw.wmnet - https://phabricator.wikimedia.org/T302153 (10Joe) a:03Joe
[15:13:19] 10serviceops, 10SRE: Renew puppet cert for etcd.codfw.wmnet - https://phabricator.wikimedia.org/T302153 (10Joe) I think this is the old etcd certificate we used to use for etcd in codfw; since we've moved to etcd v3 we're using a new cert created with cergen: ` $ openssl s_client -host conf2004.codfw.wmnet -p...
[16:44:29] 10serviceops, 10MediaWiki-extensions-PropertySuggester, 10Wikidata, 10wdwb-tech, 10Service-deployment-requests: New Service Request SchemaTree - https://phabricator.wikimedia.org/T301471 (10Michaelcochez) @Joe for the base image, would you recommend our current approach of starting from an 'empty' image...
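The certificate inspection Joe quotes in T302153 (truncated above) can be reproduced with standard openssl. A sketch, assuming the etcd default client port 2379 and that conf2004 actually serves the certificate in question on it:

  # Fetch the certificate the server presents and print its subject,
  # issuer, and validity window (notBefore/notAfter)
  $ echo | openssl s_client -connect conf2004.codfw.wmnet:2379 2>/dev/null \
      | openssl x509 -noout -subject -issuer -dates

The notAfter date is what tells you whether the cert flagged by the renewal alert is the one actually being served, or a stale pre-cergen one as Joe suspects.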
[16:48:56] 10serviceops, 10Prod-Kubernetes, 10Kubernetes: setup/install kubernetes10[18-21] - https://phabricator.wikimedia.org/T293728 (10elukey) @akosiaris both staging clusters are on bullseye with overlay, I have updated the hiera settings after some rounds of reimage. I am currently reimaging all ml-serve nodes wi...
[16:59:14] 10serviceops, 10MediaWiki-extensions-PropertySuggester, 10Wikidata, 10wdwb-tech, 10Service-deployment-requests: New Service Request SchemaTree - https://phabricator.wikimedia.org/T301471 (10Joe) >>! In T301471#7726097, @Michaelcochez wrote: > @Joe for the base image, would you recommend our current appro...
[17:02:38] 10serviceops, 10Prod-Kubernetes: setup/install kubernetes20[19|2(012)] - https://phabricator.wikimedia.org/T302208 (10JMeybohm)
[17:02:43] 10serviceops, 10DC-Ops, 10SRE, 10ops-codfw, 10Kubernetes: (Need By: TBD) rack/setup/install kubernetes20[19|2(012)] - https://phabricator.wikimedia.org/T299470 (10JMeybohm)
[17:04:20] 10serviceops, 10Prod-Kubernetes, 10Kubernetes: setup/install kubernetes10[18-21] - https://phabricator.wikimedia.org/T293728 (10JMeybohm) >>! In T293728#7726108, @elukey wrote: > Then once the host is up and running, uncordon/pool/etc.. For new nodes it is easier, maybe we could try to add one with bullseye...
[17:10:11] 10serviceops, 10Product-Infrastructure-Team-Backlog, 10SRE, 10Maps (Geoshapes), and 2 others: New Service Request geoshapes - https://phabricator.wikimedia.org/T274388 (10akosiaris) >>! In T274388#7722815, @MSantos wrote: > @akosiaris and @jijiki how can we move forward with this? > > For context: > - [[...
[18:16:55] ml-serve-codfw on bullseye + overlay! (8 worker nodes in total)
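A quick way to verify claims like "all staging envs are on bullseye" or "ml-serve-codfw on bullseye + overlay" after a round of reimages is to ask the kubelets themselves; this uses only standard kubectl, reading the OS image each node reports in its status:

  # OS-IMAGE and KERNEL-VERSION columns show what each kubelet is running on
  $ kubectl get nodes -o wide

  # Or just node name and OS image, one per line, via jsonpath
  $ kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.osImage}{"\n"}{end}'

Any node still showing a buster OS image after the reimage round would stand out immediately.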