[09:12:08] serviceops: Upgrade mc* and mc-gp* hosts to Debian Bullseye - https://phabricator.wikimedia.org/T293216 (jijiki)
[09:13:21] serviceops, SRE, User-Elukey: Test memsniff as possible replacement of memkeys - https://phabricator.wikimedia.org/T228970 (jijiki) For the time being, we have packaged memkeys for bullseye so not to block T293216
[09:32:46] serviceops, Data-Engineering-Planning, Discovery-Search (Current work), Event-Platform Value Stream (Sprint 07), Patch-For-Review: Flink on Kubernetes Helm charts - https://phabricator.wikimedia.org/T324576 (JMeybohm) >>! In T324576#8517742, @Ottomata wrote: > So, I think the JobManager is no...
[09:40:52] serviceops, Abstract Wikipedia team (Phase θ – Throttling): Kubernetes Wikifunctions security and control measures - https://phabricator.wikimedia.org/T326785 (akosiaris)
[10:14:44] serviceops, Datasets-General-or-Unknown, Developer Productivity, Growth-Team (Current Sprint): Allow use of wmf's MW CLI scripts on snapshot hosts instead of bypassing - https://phabricator.wikimedia.org/T314697 (DMburugu)
[10:16:30] serviceops, Datasets-General-or-Unknown, Growth-Team, Developer Productivity: Allow use of wmf's MW CLI scripts on snapshot hosts instead of bypassing - https://phabricator.wikimedia.org/T314697 (DMburugu)
[11:01:44] serviceops, Diffusion-Repository-Administrators, Projects-Cleanup: Archive operations/debs/hhvm repository - https://phabricator.wikimedia.org/T237038 (Clement_Goubert) By "stale docker images" do we mean these? ` releng/composer-hhvm releng/composer-package-hhvm releng/composer-test-hhvm releng/hhvm...
[11:17:12] serviceops, MW-on-K8s, SRE, observability: Logging options for apache httpd in k8s - https://phabricator.wikimedia.org/T265876 (Clement_Goubert) >>! In T265876#8512672, @Joe wrote: > We have now the logs in kafka, and thus should also be ingested in logstash, and create a dashboard. > > Once tha...
[11:24:37] serviceops, MW-on-K8s, SRE, SRE Observability: Ingest php-slowlog in logstash - https://phabricator.wikimedia.org/T326794 (Clement_Goubert)
[11:25:02] serviceops, MW-on-K8s, SRE, SRE Observability: Ingest php-slowlog in logstash - https://phabricator.wikimedia.org/T326794 (Clement_Goubert) Open→In progress p:Triage→Medium
[11:25:09] serviceops, MW-on-K8s, SRE, SRE Observability, Patch-For-Review: Make logging work for mediawiki in k8s - https://phabricator.wikimedia.org/T288851 (Clement_Goubert)
[11:30:03] serviceops, MW-on-K8s, SRE, SRE Observability: Ingest php-slowlog in logstash - https://phabricator.wikimedia.org/T326794 (Clement_Goubert) The retention of the kafka topic is currently the default 7 days. This will be reduced once logstash ingestion is setup.
[12:29:14] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Update staging-codfw to k8s 1.23 - https://phabricator.wikimedia.org/T326340 (JMeybohm)
[14:04:20] serviceops, Diffusion-Repository-Administrators, Projects-Cleanup: Archive operations/debs/hhvm repository - https://phabricator.wikimedia.org/T237038 (Jdforrester-WMF) >>! In T237038#8519019, @Clement_Goubert wrote: > By "stale docker images" do we mean these? > ` > releng/composer-hhvm > releng/com...
[14:07:01] serviceops: Upgrade mc* and mc-gp* hosts to Debian Bullseye - https://phabricator.wikimedia.org/T293216 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1001 for host mc1040.eqiad.wmnet with OS bullseye
[14:34:58] serviceops: Upgrade mc* and mc-gp* hosts to Debian Bullseye - https://phabricator.wikimedia.org/T293216 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1001 for host mc1040.eqiad.wmnet with OS bullseye completed: - mc1040 (**PASS**) - Downtimed on Icinga/Alertmanager - Disa...
[14:52:32] serviceops, Content-Transform-Team-WIP, Maps: Re-import full planet data into eqiad and codfw - https://phabricator.wikimedia.org/T314472 (jijiki) Import to eqiad has been completed and traffic is being served via eqiad.
[14:52:45] serviceops, Content-Transform-Team-WIP, Maps: Re-import full planet data into eqiad and codfw - https://phabricator.wikimedia.org/T314472 (jijiki)
[14:54:23] serviceops, Foundational Technology Requests, Prod-Kubernetes, Shared-Data-Infrastructure, Kubernetes: Metrics changes with Kubernetes v1.23 - https://phabricator.wikimedia.org/T322919 (JMeybohm)
[15:20:34] serviceops, Data-Engineering-Planning, Discovery-Search (Current work), Event-Platform Value Stream (Sprint 07), Patch-For-Review: Flink on Kubernetes Helm charts - https://phabricator.wikimedia.org/T324576 (Ottomata) > Maybe the easier way out is to have the operator chart create a GlobalNet...
[15:25:40] serviceops, Data-Engineering-Planning, Discovery-Search (Current work), Event-Platform Value Stream (Sprint 07), Patch-For-Review: Flink on Kubernetes Helm charts - https://phabricator.wikimedia.org/T324576 (Ottomata) > Getting the NetworkPolicy right might be a bit tricky, though. That one w...
[15:29:17] jayme: why does the networkpolicy to talk to k8s api have to match pods the operator creates?
[15:35:30] low weight pooling k8s thumbor for a few minutes
[15:43:26] serviceops, Data-Engineering-Planning, Discovery-Search (Current work), Event-Platform Value Stream (Sprint 07), Patch-For-Review: Flink on Kubernetes Helm charts - https://phabricator.wikimedia.org/T324576 (JMeybohm) >>! In T324576#8519921, @Ottomata wrote: >> Getting the NetworkPolicy right...
[15:46:15] okay great jayme, i was looking to see if kubernetesMasters was available to the services helm charts, but i'm not sure they are. is there a different values files I can/should specify in the flink-app-example/helmfile.yaml ?
[15:47:07] ottomata: hm, I fear there is not
[15:47:50] That's not something that is usually required by "userland-services", so we only have that in the admin_ng part currently
[15:48:30] could be an argument for a globalnetworkpolicy created by the operator chart ;)
[15:55:18] aye
[15:57:29] ottomata: but I would try to limit that to the pods the operator creates (so the actual flink clusters) rather than allowing all pods in the watchNamespace to access the api
[15:58:15] I'd assume they have some useful labels - or could add some using the podTemplate
[16:03:14] some context about those limitations is in https://phabricator.wikimedia.org/T287491 if you are curious
[16:11:35] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Update staging-codfw to k8s 1.23 - https://phabricator.wikimedia.org/T326340 (JMeybohm)
[16:15:55] <_joe_> ottomata: given you've been my guinea pig... 1) there is a new sextant release with some improvements 2) now there's a skeleton of documentation https://gitlab.wikimedia.org/repos/sre/sextant/-/blob/main/README.md
[16:16:30] <_joe_> and https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/879557 to add a note to the readme of the deployment charts repo
[16:16:50] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Remove the .Values.kubernetesApi hack - https://phabricator.wikimedia.org/T326729 (JMeybohm)
[16:17:21] nice, thanks _joe_ !
[16:35:30] jayme: the flink operator chart iterates through the watchNamespaces to create Roles in the namespace. perhaps the operator can do that for a non global network policy too?
[16:35:36] in the watchNamespace?
[16:42:00] hm difficult to construct the correct name and metadata though...
[16:42:00] hm
[17:07:15] <_joe_> why?
[17:16:01] serviceops, Diffusion-Repository-Administrators, Projects-Cleanup: Archive operations/debs/hhvm repository - https://phabricator.wikimedia.org/T237038 (Clement_Goubert) >>! In T237038#8519512, @Jdforrester-WMF wrote: >>>! In T237038#8519019, @Clement_Goubert wrote: >> By "stale docker images" do we m...
[17:19:31] _joe_: that why for me?
[17:20:06] hang on, lemme get my template rendering and i'll link to change and you will see
[17:33:53] <_joe_> ottomata: yeah it was out of curiosity basically
[17:34:40] <_joe_> I managed to write a validation function in helm templates, I feel like I can do anything with that turd
[17:34:57] _joe_: i'll answer cuz my template is being annoying.
[17:35:00] <_joe_> I'm sure your use case will defeat me though :)
[17:35:03] Next step, write a parser
[17:35:14] we need NetworkPolicy that allows flink app pods to talk to k8s api
[17:35:17] Then recreate mediawiki in helm template
[17:35:17] <_joe_> ottomata: if a helm template is not annoying, you're not doing it right
[17:35:49] kubernetesMasters is not defined in helmfile services values, so no way for the flink-app chart to know it.
[17:36:09] <_joe_> ok
[17:36:16] so, we are trying to make the flink-kubernetes-operator chart install a NetworkPolicy in the service's namespace that allows flink pods to talk to k8s api
[17:36:24] <_joe_> right
[17:36:36] <_joe_> and what is the problem with the networkpolicy name and metadata?
[17:36:40] but, the flink-kubernetes-operator chart does not have all the nice vendor templates and service specific values
[17:36:45] of the actual service
[17:36:52] <_joe_> I guess it needs to be unique for namespace
[17:36:58] <_joe_> given it's a networkpolicy
[17:37:00] yes, we know the namespace
[17:37:12] but, we want to select specific pods
[17:37:13] <_joe_> so you can just have a fixed name for that networkpolicy, right?
[17:37:15] in that namespace
[17:37:17] <_joe_> ahhh
[17:37:21] <_joe_> ok
[17:37:25] i'm going to try just hardcoding
[17:37:26] selector
[17:37:29] app: flink-app
[17:37:38] with a big note saying THIS IS HARDCODED TO MATCH flink-app CHART NAME
[17:37:46] <_joe_> that seems like the best option yes
[17:37:57] <_joe_> I mean it makes sense for that to be hardcoded tbh
[17:38:15] well, if/when we make a chart for flink-session (we probably won't) we'd have to think about this too
[17:38:18] <_joe_> we only want to open the masters to the flink-app pods anyways
[17:38:24] ya
[17:38:50] <_joe_> or we can just make a bigger chart that includes both deployments and activate one or the other using feature flags
[17:39:00] <_joe_> which is a hack, but oh well
[17:39:19] <_joe_> uhm, we could maybe add an additional annotation to those pods
[17:39:45] <_joe_> something like flavour: flink
[17:39:54] <_joe_> but I'd punt that to the day you actually need it
[17:40:06] yeah, i had same thought process
[17:40:12] cool
[18:03:17] serviceops, SRE: Memcached, mcrouter in MediaWiki on Kubernetes - https://phabricator.wikimedia.org/T277711 (jijiki) a:Joe→jijiki
[18:08:28] okay jayme: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/879618 maybe will work?
[18:08:31] no global policy needed?
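
Editor's note: the approach discussed above (one namespaced NetworkPolicy per watched namespace, with a fixed name and a hardcoded `app: flink-app` pod selector, opening egress only to the Kubernetes masters) might look roughly like the sketch below. It is not taken from the linked Gerrit change. The policy name `flink-pod-k8s-api`, the namespace list, the placeholder CIDR and the API port 6443 are all assumptions made for illustration; in the real charts these would come from Helm values such as kubernetesMasters.

```python
import json

# Hypothetical inputs: in the real charts these would come from Helm values.
WATCH_NAMESPACES = ["flink-app-example"]        # assumed namespace name
KUBERNETES_MASTERS_CIDRS = ["192.0.2.1/32"]     # placeholder (TEST-NET), not real masters
API_PORT = 6443                                 # assumed API server port


def api_egress_policy(namespace, master_cidrs, port=API_PORT):
    """Build a namespaced NetworkPolicy manifest as a plain dict."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {
            # Hypothetical fixed name; it only has to be unique per namespace.
            "name": "flink-pod-k8s-api",
            "namespace": namespace,
        },
        "spec": {
            # THIS IS HARDCODED TO MATCH flink-app CHART NAME
            "podSelector": {"matchLabels": {"app": "flink-app"}},
            "policyTypes": ["Egress"],
            "egress": [
                {
                    "to": [{"ipBlock": {"cidr": cidr}} for cidr in master_cidrs],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


# One policy per watched namespace, mirroring how the operator chart
# is described as iterating over watchNamespaces to create Roles.
policies = [api_egress_policy(ns, KUBERNETES_MASTERS_CIDRS) for ns in WATCH_NAMESPACES]
print(json.dumps(policies, indent=2))
```

Because the selector matches on a label rather than on the namespace as a whole, only the pods carrying `app: flink-app` get API access, which is the limitation asked for earlier in the conversation.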
[18:41:38] serviceops, DC-Ops, SRE-swift-storage, ops-eqiad: Q3:rack/setup/install ms-fe1013 - ms-fe1014, thanos-fe1004 - https://phabricator.wikimedia.org/T326846 (RobH)
[18:41:51] serviceops, DC-Ops, SRE-swift-storage, ops-eqiad: Q3:rack/setup/install ms-fe1013 - ms-fe1014, thanos-fe1004 - https://phabricator.wikimedia.org/T326846 (RobH)
[18:43:00] serviceops, SRE-swift-storage: serviceops implementation tracking for ms-fe1013 - ms-fe1014, thanos-fe1004 - https://phabricator.wikimedia.org/T326847 (RobH)
[18:47:18] serviceops, SRE-swift-storage: serviceops implementation tracking for ms-fe2013 - ms-fe2014, thanos-fe2004 - https://phabricator.wikimedia.org/T326849 (RobH)
[20:06:38] serviceops, SRE-swift-storage: serviceops implementation tracking for ms-fe2013 - ms-fe2014, thanos-fe2004 - https://phabricator.wikimedia.org/T326849 (RobH) Open→Invalid actually data persistence this was a mis categorization
[20:07:22] serviceops, DC-Ops, SRE, SRE-swift-storage, ops-eqiad: Q3:rack/setup/install ms-fe1013 - ms-fe1014, thanos-fe1004 - https://phabricator.wikimedia.org/T326846 (RobH)
[20:07:36] serviceops, SRE-swift-storage: serviceops implementation tracking for ms-fe1013 - ms-fe1014, thanos-fe1004 - https://phabricator.wikimedia.org/T326847 (RobH) Open→Invalid in valid this is actually data persistence i had it mislabeled