[06:39:59] serviceops, Data-Persistence-Backup, GitLab (Initialization), Patch-For-Review, User-brennen: Backups for GitLab - https://phabricator.wikimedia.org/T274463 (Sergey.Trofimovsky.SF) Following on a previous discussion, noting some concerns that current implementation (copying over the latest ba...
[06:54:17] serviceops, CX-cxserver, Wikidata, wdwb-tech, Language-Team (Language-2021-July-September): cxserver: https://cxserver.wikimedia.org/v2/suggest/source/Paneer/ca?sourcelanguages=en occasionally fails with HTTP 503 - https://phabricator.wikimedia.org/T285219 (KartikMistry) >>! In T285219#722084...
[07:58:17] serviceops, MW-on-K8s, SRE: Evaluate nginx-controller as an Ingress - https://phabricator.wikimedia.org/T286197 (JMeybohm) My past impression of the nginx-ingress was that while it's okay for low traffic stuff you would start getting trouble with increased traffic. That probably is mostly due to the...
[08:14:39] serviceops, SRE-swift-storage, Wikidata, Wikidata-Query-Service, wdwb-tech: Find a way to make swift Tempauth usable behind envoy - https://phabricator.wikimedia.org/T286935 (Joe) I would say this needs a more thorough change of how we use envoy - I'm specifically thinking of doing something...
[08:40:40] hi, I'm looking for the package (and an example image that's using it) that propagates puppet certs (context is skipping envoy to access swift)
[08:54:22] found wmf-certificates that's probably it
[08:56:35] _joe_: I was discussing with david and nemo-yiannis that we will probably end up letting their apps directlty talk to thanos-swift
[08:56:46] unless we have another solution at hand
[08:57:02] <_joe_> dcausse: correct
[09:07:38] wmf-certificates <-- dcausse nemo-yiannis
[09:07:48] it is in our debian repo
[09:07:57] let me find an example as well
[09:08:50] effie: thanks!
[09:09:38] the other solution for us would be to patch the swift client, that would allow to still use envoy. Might be hard to replicate for other usecases tho
[09:11:45] yeah, lets keep it simple and uniform
[09:13:02] effie: just looked through our blubber file and adding a package seems pretty obvious
[09:13:17] ok cool then
[09:39:49] serviceops, MW-on-K8s, Release Pipeline, Patch-For-Review: Evaluate Dragonfly for distribution of docker images - https://phabricator.wikimedia.org/T286054 (JMeybohm)
[10:39:35] serviceops, MW-on-K8s, SRE, Release-Engineering-Team (Radar): The restricted/mediawiki-webserver image should include skins and resources - https://phabricator.wikimedia.org/T285232 (Joe) Hi and sorry for the late replies, just got back from my break and I'm catching up with the backlog. Just to...
[10:46:40] serviceops, SRE, Kubernetes, Patch-For-Review: Migrate to helm v3 - https://phabricator.wikimedia.org/T251305 (JMeybohm)
[10:47:23] serviceops, SRE, Kubernetes, Patch-For-Review: Migrate to helm v3 - https://phabricator.wikimedia.org/T251305 (JMeybohm)
[12:43:54] kubelets on ml-serve-ctrl100[1,2] may have needed 2 more vcores
[12:43:55] https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=ml-serve-ctrl1001&var-datasource=eqiad%20prometheus%2Fops&var-cluster=ml_serve&from=now-14d&to=now
[13:08:43] <_joe_> elukey: what's happening there? I see cpu utilization increasing linearly
[13:08:50] <_joe_> what is running on that system?>
[13:09:26] <_joe_> 22% of cpu for softirq??
[13:14:13] _joe_ still have to figure out, it seemed creeping over time, but right after I deployed the kubelets + bird for calico
[13:14:25] but I added 2 more vcores to see the differences
[13:14:54] <_joe_> elukey: it doesn't look like a healthy trend tbh
[13:16:28] definitely, I am curious to see if it rehappens during the coming days
[13:17:18] I'll report what I find
[13:42:43] serviceops, MW-on-K8s, SRE: Create a gateway in kubernetes for the execution of our "lambdas" - https://phabricator.wikimedia.org/T261277 (Joe) >>! In T261277#7204714, @JMeybohm wrote: > We also talked about using Istio Ingress in the past (envoy-based) which could be a good fit as well and we could...
[13:54:15] serviceops, MW-on-K8s, SRE: Evaluate istio as an ingress for production usage - https://phabricator.wikimedia.org/T287007 (Joe)
[13:56:42] serviceops, MW-on-K8s, SRE: Evaluate istio as an ingress for production usage - https://phabricator.wikimedia.org/T287007 (Joe) Istio can be configured with native ingress resources, using the annotation: ` kubernetes.io/ingress.class: istio ` see https://istio.io/latest/docs/tasks/traffic-manageme...
[14:40:43] serviceops, MW-on-K8s, SRE: Evaluate istio as an ingress for production usage - https://phabricator.wikimedia.org/T287007 (Joe) Of course, istio also offers its own custom resource definitions for a richer configuration: the istio gateway (https://istio.io/latest/docs/reference/config/networking/gat...
[14:41:04] serviceops, MW-on-K8s, SRE: Evaluate istio as an ingress for production usage - https://phabricator.wikimedia.org/T287007 (Joe)
[14:53:04] jayme: thanks for fixing the CI issues!
[15:01:03] serviceops, MW-on-K8s, SRE: Evaluate istio as an ingress for production usage - https://phabricator.wikimedia.org/T287007 (Joe) Metrics can easily be collected with prometheus - in fact, istio ships with the correct annotations and thus should easily be picked up by our prometheus without adding any...
[15:07:29] serviceops, SRE, decommission-hardware, Patch-For-Review: decom 44 eqiad appservers purchased on 2016-04-12/13 (mw1261 through mw1301) - https://phabricator.wikimedia.org/T280203 (Dzahn)
[15:08:40] serviceops, SRE, decommission-hardware, Patch-For-Review: decom 44 eqiad appservers purchased on 2016-04-12/13 (mw1261 through mw1301) - https://phabricator.wikimedia.org/T280203 (Dzahn)
[16:10:24] serviceops, SRE, decommission-hardware, Patch-For-Review: decom 44 eqiad appservers purchased on 2016-04-12/13 (mw1261 through mw1301) - https://phabricator.wikimedia.org/T280203 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: `mw1289.eqiad.wmnet` - m...
[16:22:00] serviceops, SRE, decommission-hardware, Patch-For-Review: decom 44 eqiad appservers purchased on 2016-04-12/13 (mw1261 through mw1301) - https://phabricator.wikimedia.org/T280203 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: `mw1290.eqiad.wmnet` - m...
[16:27:17] serviceops, SRE, decommission-hardware, Patch-For-Review: decom 44 eqiad appservers purchased on 2016-04-12/13 (mw1261 through mw1301) - https://phabricator.wikimedia.org/T280203 (Dzahn)
[16:29:39] serviceops, SRE, decommission-hardware, Patch-For-Review: decom 44 eqiad appservers purchased on 2016-04-12/13 (mw1261 through mw1301) - https://phabricator.wikimedia.org/T280203 (Dzahn)
[16:33:14] serviceops, SRE, Patch-For-Review: bring 43 new mediawiki appserver in eqiad into production - https://phabricator.wikimedia.org/T279309 (Dzahn)
[16:37:44] serviceops, SRE, decommission-hardware, Patch-For-Review: decom 44 eqiad appservers purchased on 2016-04-12/13 (mw1261 through mw1301) - https://phabricator.wikimedia.org/T280203 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: `mw1297.eqiad.wmnet` - m...