[07:34:33] serviceops, [DEPRECATED] wdwb-tech, Foundational Technology Requests, Wikidata, Wikidata-Query-Service: API Gateway to provide authorization and capacity management for W[CD]QS - https://phabricator.wikimedia.org/T313813#9751867 (Aklapper) a:DAbad→None Removing inactive task assignee....
[07:35:37] serviceops, AQS2.0, Code-Health-Objective, Epic: AQS 2.0 - https://phabricator.wikimedia.org/T263489#9751931 (Aklapper) a:DAbad→None Removing inactive task assignee. (Please do so as part of offboarding - thanks.)
[08:21:29] serviceops, Infrastructure-Foundations, Prod-Kubernetes, SRE, and 2 others: Site: codfw 1 VM request for staging-codfw kube-apiserver - https://phabricator.wikimedia.org/T363310#9752050 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host kubestagemast...
[08:54:14] serviceops, Infrastructure-Foundations, Prod-Kubernetes, SRE, and 2 others: Site: codfw 1 VM request for staging-codfw kube-apiserver - https://phabricator.wikimedia.org/T363310#9752129 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host kubestagemaster20...
[09:15:28] serviceops, Infrastructure-Foundations, Prod-Kubernetes, SRE, and 2 others: Site: codfw 1 VM request for staging-codfw kube-apiserver - https://phabricator.wikimedia.org/T363310#9752198 (JMeybohm) Open→Resolved
[11:41:06] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Co-locate kube-apiserver and etcd on new staging control plane nodes - https://phabricator.wikimedia.org/T363307#9752733 (JMeybohm) p:Triage→High
[12:43:32] akosiaris: I just verified that this works with node18: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1023824
[12:45:56] nemo-yiannis: awesome, thanks for letting me know
[13:13:43] hi folks!
[13:14:14] I have a high level question about redirects with the envoy local/mesh proxy for Mediawiki
[13:14:33] (since it is related to a problem that I am working on in Istio/sidecar land)
[13:15:18] IIUC most of the daemons using the mw api are instructed to use http://localhost:65XX/etc..
[13:15:57] sometimes it happens that the MW API returns a 301 with a Location header, and afaics the header value is always https://some-domain.wikipedia.org/etc..
[13:17:18] in theory the daemons/services that get the Location header with https may have problems if the library used to fetch HTTP data (urllib etc..) automatically follows redirects, since it may try to contact a https://etc.. endpoint that doesn't have any handler in the envoy sidecar
[13:17:56] Is there any trick used in envoy-land to update/fix the Location header, or is it something completely offloaded to the daemon/services using it?
[13:20:10] elukey: it's not handled in the service mesh
[13:21:00] and ofc network policies will block the new request, so it's left up to the applications using it
[13:21:55] akosiaris: ack thanks for confirming, I suspected that.. in the ml python services using the istio/envoy sidecar we have to force http (in the python code) since otherwise we'll not have any HTTP metric etc.., and we had some cases of redirects like yue.wikipedia.org -> zh-yue.wikipedia.org that caused implicit http -> https redirects
[13:22:02] ending up in timeouts etc..
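An application-side workaround for the redirect problem described above could look roughly like the sketch below. It is not taken from the ml services' code. It assumes the local proxy picks the upstream wiki from the Host header, and the helper name `rewrite_redirect` and the proxy address used in the usage note are hypothetical placeholders (the real port is elided in the log as 65XX).

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_redirect(location, proxy_base):
    """Map an absolute redirect target back onto the local proxy.

    Hypothetical helper: returns (url, headers), where url points at the
    proxy over plain http and headers carries the redirect's hostname as
    the Host header (assumes the proxy routes on Host).
    """
    target = urlsplit(location)
    proxy = urlsplit(proxy_base)
    if not target.hostname:
        raise ValueError(f"Location is not an absolute URL: {location!r}")
    # Keep path and query from the redirect, but take scheme and
    # host:port from the proxy so the request never leaves the sidecar.
    url = urlunsplit((proxy.scheme, proxy.netloc, target.path, target.query, ""))
    return url, {"Host": target.hostname}
```

A caller would turn off automatic redirect following in its HTTP library (for example `allow_redirects=False` in requests), and on a 301 pass the Location value plus a placeholder such as `http://localhost:9999` to `rewrite_redirect` before retrying.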
[13:22:15] so yeah we need to add some workaround in the python code
[13:24:25] also, totally different subject - I am rolling out pki tls certs for restbase cassandra instances
[13:24:43] in theory no client should be affected but lemme know if you see anything werid
[13:24:46] *weird
[13:25:33] thanks
[14:02:06] 🔥
[14:10:37] jayme: <3
[14:20:32] serviceops, Machine-Learning-Team: Rename the envoy's uses_ingress option to sets_sni - https://phabricator.wikimedia.org/T346638#9753407 (JMeybohm)
[15:44:38] serviceops: Experiment with Memcached Proxy - https://phabricator.wikimedia.org/T363723 (jijiki) NEW
[15:44:41] serviceops: Experiment with Memcached Proxy - https://phabricator.wikimedia.org/T363723#9753887 (jijiki) p:Triage→Low
[15:55:48] serviceops, MediaWiki-Engineering, Sustainability (Incident Followup): Cache mw-mcrouter service ClusterIP in apcu cache - https://phabricator.wikimedia.org/T363186#9753927 (jijiki) @MSantos It would be great if someone from #mediawiki-engineering could undertake this as it requires changes in mediaw...
[15:57:13] serviceops, MediaWiki-Engineering, Sustainability (Incident Followup): Cache mw-mcrouter service ClusterIP in apcu cache - https://phabricator.wikimedia.org/T363186#9753937 (jijiki)
[16:02:05] serviceops, Machine-Learning-Team, MW-on-K8s, SRE, Patch-For-Review: Migrate ml-services to mw-api-int - https://phabricator.wikimedia.org/T362316#9753974 (elukey) After a lot of tests and config changes, we are almost ready to proceed with prod. Hopefully we'll get to it on April 2nd.
[16:59:23] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Co-locate kube-apiserver and etcd on new staging control plane nodes - https://phabricator.wikimedia.org/T363307#9754317 (JMeybohm) I've added the new, stacked control-plan with some manual intervention as etcd did not come up initially w...
[17:12:37] serviceops, ops-codfw, SRE: Degraded RAID on mw2382 - https://phabricator.wikimedia.org/T362938#9754389 (Jhancock.wm) Apologies for the wait on this one. I checked out the server and the drives look to be working physically. But when I logged into the idrac it sees zero disks. Checked the warranty an...
[20:18:08] serviceops, SRE, Data Products (Data Products Sprint 12), Patch-For-Review, Service-deployment-requests: Commons Impact Metrics AQS 2.0 Deployment to Staging and Production - https://phabricator.wikimedia.org/T361835#9755282 (Scott_French) I believe that's everything that can be done for now,...