[00:11:38] serviceops, SRE, Datacenter-Switchover: Use encrypted rsync for deployment::rsync - https://phabricator.wikimedia.org/T289857 (Legoktm) a:Legoktm
[01:35:11] serviceops, SRE, Datacenter-Switchover, Patch-For-Review: Use encrypted rsync for deployment::rsync - https://phabricator.wikimedia.org/T289857 (Legoktm) I'm guessing no one has done this until now because deployment::rsync was using hand-rolled rsync + timer rather than quickdatacopy. I gave it...
[01:38:20] serviceops, SRE, Datacenter-Switchover: Use encrypted rsync for releases - https://phabricator.wikimedia.org/T289858 (Legoktm) {T289857} has some notes on how to enable stunnel for this. However the #mw-on-k8s image building process also performs an rsync against the releases host, so it might also n...
[06:40:01] hello folks
[06:40:13] https://github.com/kubeflow/kfserving/tree/master/docs/samples/v1beta1/rollout seems really nice (using kfserving for canary releases)
[06:43:43] just added a third model for itwiki-damaging, all good
[06:43:45] \o/
[06:44:00] I am going to test the canary release in a bit, but we have 3 models now
[07:02:39] serviceops, Analytics, Analytics-Kanban, Prod-Kubernetes, and 2 others: Move eventgate services to use TLS only - https://phabricator.wikimedia.org/T255871 (JMeybohm) @Ottomata that looks unrelated to your chance (but related to yours @Jelto ). We will take a look!
[12:15:26] serviceops, SRE, Datacenter-Switchover: Use encrypted rsync for releases - https://phabricator.wikimedia.org/T289858 (Dzahn) All files sent to releases are meant to be available to the world though. Does it still matter to encrypt traffic internally for something like this?
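Editor's sketch for the canary release mentioned at 06:40:13 and 06:44:00: a minimal illustration, not the manifest actually used in this deployment. It assumes the KFServing v1beta1 `InferenceService` schema, where `canaryTrafficPercent` on the predictor sends that share of traffic to the newest revision. The helper name `canary_inference_service`, the `sklearn` predictor type, and the storage URI are hypothetical placeholders; only the model name `itwiki-damaging` comes from the log.

```python
import json


def canary_inference_service(name, storage_uri, canary_percent):
    """Build an InferenceService manifest (as a dict) for a canary rollout.

    Assumption: KFServing v1beta1, where `canaryTrafficPercent` on the
    predictor routes that percentage of requests to the latest revision
    and the remainder to the previously rolled-out one.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return {
        "apiVersion": "serving.kubeflow.org/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            "predictor": {
                "canaryTrafficPercent": canary_percent,
                # Hypothetical predictor type and model location.
                "sklearn": {"storageUri": storage_uri},
            }
        },
    }


if __name__ == "__main__":
    manifest = canary_inference_service(
        "itwiki-damaging", "s3://example-bucket/itwiki-damaging", 10
    )
    # JSON is valid YAML, so this output could be piped to `kubectl apply -f -`.
    print(json.dumps(manifest, indent=2))
```

Raising the percentage step by step and re-applying the manifest would promote the canary gradually; the right step size depends on the traffic the model receives.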
[12:27:36] I am also testing the knative "serverless" behavior for kubeflow, and it seems working nicely
[12:27:57] it is a matter to set minReplicas: 0 for the InferenceService CRD (that is shipped by kubeflow)
[12:28:31] with the image already pulled on the node, it took ~10s for a request to bootstrap a pod and get a revision score
[12:28:45] (as opposed to ~0.5/0.7s)
[12:31:03] I thought this needed the metric server, but I was wrong (it is needed only to autoscale pods based on cpu/memory, like the istio/knative ones)
[12:31:03] knative autoscales based on requests
[12:31:03] (IIUC)
[12:31:03] really nice :)
[12:56:55] serviceops: install racktables on miscweb2002 - https://phabricator.wikimedia.org/T269746 (Dzahn) racktables should be readonly and never change again. regardless of data center. it just needs to stay around as-is for reading. nothing should try to write to it, and if it would try then it would be good if th...
[13:05:40] serviceops, Analytics, Analytics-Kanban, Prod-Kubernetes, and 2 others: Move eventgate services to use TLS only - https://phabricator.wikimedia.org/T255871 (JMeybohm) >>! In T255871#7320889, @JMeybohm wrote: > @Ottomata that looks unrelated to your chance (but related to yours @Jelto ). We will t...
[13:22:05] serviceops, SRE, Datacenter-Switchover: Use encrypted rsync for releases - https://phabricator.wikimedia.org/T289858 (fgiunchedi) IMHO yes, we should encrypt traffic unless we have reasons not to (e.g. system is going to be retired, too hard/complex to implement vs advantages, etc)
[13:29:32] serviceops: install racktables on miscweb2002 - https://phabricator.wikimedia.org/T269746 (Marostegui) Yeah, there's nothing to do then
[13:32:32] serviceops: install racktables on miscweb2002 - https://phabricator.wikimedia.org/T269746 (Dzahn) Open→Resolved great! thanks again :)
[13:40:17] serviceops, Analytics, Analytics-Kanban, Prod-Kubernetes, and 2 others: Move eventgate services to use TLS only - https://phabricator.wikimedia.org/T255871 (Ottomata) Great! Proceeding...
[14:04:14] serviceops, Anti-Harassment, IP Info, SRE: Update MaxMind GeoIP2 license key and product IDs for application servers - https://phabricator.wikimedia.org/T288844 (mepps) @Niharika Based on my read, it also looks like the 10 day delay would only be when there were holidays too. What's the next step...
[15:05:31] serviceops, Analytics, Analytics-Kanban, Prod-Kubernetes, and 2 others: Move eventgate services to use TLS only - https://phabricator.wikimedia.org/T255871 (Ottomata)
[15:46:54] serviceops, MW-on-K8s, Kubernetes: Kubernetes timeing out before pulling the mediawiki-multiversion image - https://phabricator.wikimedia.org/T284628 (JMeybohm) @Jelto and I where looking at issues in staging today that where caused by high disk and network IO on kubestage1001 (due to the mediawiki C...
[18:54:48] serviceops, Anti-Harassment, IP Info, SRE: Update MaxMind GeoIP2 license key and product IDs for application servers - https://phabricator.wikimedia.org/T288844 (sbassett) >>! In T288844#7321649, @mepps wrote: > It sounds like @sbassett is moving forward with looking into this. Er, whoops, I'm a...
[20:01:47] serviceops, MW-on-K8s, Kubernetes: Kubernetes timeing out before pulling the mediawiki-multiversion image - https://phabricator.wikimedia.org/T284628 (dancy) >>! In T284628#7322098, @JMeybohm wrote: > Maybe we should to with increasing the timeout in staging after all and fast track the replacement o...
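Editor's sketch for the knative scale-to-zero test reported between 12:27:36 and 12:31:03: a minimal illustration under stated assumptions, not the configuration actually applied. It assumes the KFServing v1beta1 `InferenceService` schema, where `minReplicas: 0` on the predictor lets knative remove all pods when no requests arrive. The helper names, the `sklearn` predictor type, and the storage URI are hypothetical; the latency figures are the rough values quoted in the log.

```python
def scale_to_zero_service(name, storage_uri):
    """Build an InferenceService manifest (as a dict) that may scale to zero.

    Assumption: KFServing v1beta1, where `minReplicas: 0` on the predictor
    allows knative to drop to no pods when the service is idle.
    """
    return {
        "apiVersion": "serving.kubeflow.org/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            "predictor": {
                "minReplicas": 0,
                # Hypothetical predictor type and model location.
                "sklearn": {"storageUri": storage_uri},
            }
        },
    }


def cold_start_slowdown(cold_seconds, warm_seconds):
    """Return how many times slower a cold request is than a warm one."""
    if warm_seconds <= 0:
        raise ValueError("warm_seconds must be positive")
    return cold_seconds / warm_seconds


if __name__ == "__main__":
    manifest = scale_to_zero_service(
        "itwiki-damaging", "s3://example-bucket/itwiki-damaging"
    )
    print(manifest["spec"]["predictor"]["minReplicas"])
    # Figures from the log: ~10s cold versus ~0.5s to ~0.7s warm.
    for warm in (0.5, 0.7):
        print(f"warm={warm}s -> {cold_start_slowdown(10, warm):.1f}x slower when cold")
```

With those figures a cold request is roughly 14 to 20 times slower than a warm one, which is the trade-off to weigh against the resources saved by idling at zero pods.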
[23:13:48] serviceops, GitLab, Release-Engineering-Team (Next), User-brennen: GitLab major version upgrade: 14.x - https://phabricator.wikimedia.org/T289802 (brennen) Note upgrade path docs here: https://docs.gitlab.com/ce/update/index.html#upgrade-paths
[23:32:21] effie: when you have time tomorrow, I would love some help figuring out