[07:29:51] serviceops, Performance-Team: Migrate WMF production from PHP 7.4 to PHP 8.1 - https://phabricator.wikimedia.org/T319432 (Krinkle)
[07:54:25] serviceops, Performance-Team: Migrate WMF production from PHP 7.4 to PHP 8.1 - https://phabricator.wikimedia.org/T319432 (akosiaris) Thanks for this task. Just to put this in writing, #serviceops has planned to do the `Per Cluster ramp-up` in the `April-Jun` quarter (this quarter we are focusing on Med...
[11:06:10] serviceops: Upgrade mc* and mc-gp* hosts to Debian Bullseye - https://phabricator.wikimedia.org/T293216 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1001 for host mc1054.eqiad.wmnet with OS bullseye
[11:08:32] serviceops, Data-Engineering, Discovery-Search (Current work), Event-Platform Value Stream (Sprint 07), Patch-For-Review: Flink on Kubernetes Helm charts - https://phabricator.wikimedia.org/T324576 (akosiaris) >>! In T324576#8537916, @Ottomata wrote: > @akosiaris manually edited the flink-pod...
[11:11:11] serviceops, DC-Ops, SRE, ops-eqiad: Q2:rack/setup/install arclamp1001.eqiad.wmnet - https://phabricator.wikimedia.org/T319433 (fgiunchedi)
[11:24:56] serviceops, SRE, Patch-For-Review, User-fgiunchedi: service implementation tracking: arclamp2001.codfw.wmnet - https://phabricator.wikimedia.org/T319429 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by filippo@cumin1001 for hosts: `webperf2004.codfw.wmnet` - webperf2004.codfw.wmn...
[11:26:15] serviceops, DC-Ops, SRE, ops-codfw: Q2:rack/setup/install arclamp2001.codfw.wmnet - https://phabricator.wikimedia.org/T319428 (fgiunchedi)
[11:26:46] serviceops, SRE, Patch-For-Review, User-fgiunchedi: service implementation tracking: arclamp2001.codfw.wmnet - https://phabricator.wikimedia.org/T319429 (fgiunchedi) Open→Resolved a:fgiunchedi This is completed -- arclamp is hosted on arclamp2001 and webperf2004 has been decom'd
[11:27:48] serviceops, Arc-Lamp, Performance-Team (Radar), SRE Observability (FY2022/2023-Q3), User-fgiunchedi: Expand RAM on arclamp hosts and move them to baremetal - https://phabricator.wikimedia.org/T316223 (fgiunchedi) Open→Resolved a:fgiunchedi All done! arclamp now lives on baremetal...
[11:36:47] serviceops: Upgrade mc* and mc-gp* hosts to Debian Bullseye - https://phabricator.wikimedia.org/T293216 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1001 for host mc1054.eqiad.wmnet with OS bullseye completed: - mc1054 (**PASS**) - Downtimed on Icinga/Alertmanager - Disa...
[12:35:05] 1 million dollar question - if I have a change for the changeprop's chart, who should I ask? :D
[12:35:43] (I am adding a couple of new jobs basically, nothing fancy just read from kafka -> call lift wing)
[12:37:15] elukey: in the absence of someone better, probably me? :D will give it a look
[12:37:30] hnowlan: ah so you are taking all the fun!! :D
[12:37:47] still not ready yet, I'll add you as soon as it is readable! Thanks a lot!
[12:38:09] (changeprop looks really nice btw, it fits really nicely with what I need to do)
[12:38:16] (credits to Joe for the hint)
[13:30:30] serviceops, Data-Engineering, Discovery-Search (Current work), Event-Platform Value Stream (Sprint 07), Patch-For-Review: Flink on Kubernetes Helm charts - https://phabricator.wikimedia.org/T324576 (Ottomata) YES! Thank you so much @akosiaris!
Okay just so I understand, setting KUBERNETES...
[13:42:37] akosiaris: why is app flink-app-main? from what i can see with the vendor templates
[13:42:40] app: {{ template "base.name.chart" . }}
[13:42:51] {{- define "base.name.chart" -}}
[13:42:51] {{- default .Chart.Name .Values.chartName | trunc 63 | trimSuffix "-" -}}
[13:44:05] i don't see anything obvious in helmfile that is overriding that, and locally it is app: flink-app
[13:53:37] also, i think i need to set that KUBERNETES_SERVICE_HOST conditionally, only when running in wmf prod clusters. how can I tell?
[13:56:37] i think i'm going to set kubernetesApi in environment helmfile values and vary on that
[14:12:20] ottomata: so, helm template . in charts/flink-app returns
[14:12:26] kind: FlinkDeployment
[14:12:26] metadata:
[14:12:26] name: flink-app-RELEASE-NAME
[14:12:42] I have no idea what the FlinkDeployment CRD is, first time I've ever seen it
[14:13:07] but given that, "main" is the release name as pointed out in helmfile.yaml
[14:13:17] and indeed I see there is a release in there called main
[14:14:12] I have no idea how that resource becomes something else btw, nor what is copied
[14:14:21] I assume it's a thin wrapper over the Deployment resource?
[14:15:31] cause there is indeed a deployment resource there, and the labels are
[14:15:39] Labels: app=flink-app-main
[14:15:39] component=jobmanager
[14:15:39] type=flink-native-kubernetes
[14:15:45] with Pod Template having:
[14:15:52] Pod Template:
[14:15:52] Labels: app=flink-app-main
[14:15:52] component=jobmanager
[14:15:52] release=main
[14:15:52] routed_via=main
[14:15:53] type=flink-native-kubernetes
[14:16:01] so... I guess this is the job of the operator?
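The `base.name.chart` helper quoted at 13:42 boils down to a small string function. A minimal Python sketch of what it computes, assuming the usual Sprig semantics: `default` falls back to `.Chart.Name` when `.Values.chartName` is empty, `trunc 63` keeps at most the first 63 characters, and `trimSuffix "-"` drops one trailing dash. The function name `base_name_chart` is hypothetical, and `str.removesuffix` needs Python 3.9+.

```python
def base_name_chart(chart_name: str, values_chart_name: str = "") -> str:
    """Hypothetical re-implementation of the "base.name.chart" Helm helper.

    Mirrors: default .Chart.Name .Values.chartName | trunc 63 | trimSuffix "-"
    """
    # Sprig `default`: use .Values.chartName if it is non-empty, else .Chart.Name.
    name = values_chart_name or chart_name
    # Sprig `trunc 63`: keep at most 63 characters (Kubernetes' label length limit).
    name = name[:63]
    # Sprig `trimSuffix "-"`: remove a single trailing "-" if present.
    return name.removesuffix("-")


# With no chartName override, the label comes straight from the chart name.
print(base_name_chart("flink-app"))  # flink-app
```

Under this reading the template alone yields `app: flink-app`, matching what was seen locally at 13:44, so the `-main` suffix on the live Deployment has to come from something other than the chart helper.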
[14:17:19] yes, exactly as you say
[14:17:19] but
[14:17:21] ah, interestingly the flinkdeployment resource has a pod template too and there you are correct that
[14:17:26] right
[14:17:30] Labels:
[14:17:30] App: flink-app
[14:17:30] Release: main
[14:17:30] routed_via: main
[14:17:41] and this is starting to smell like an operator bug
[14:17:48] hm
[14:17:53] so app should be flink-app
[14:18:04] and yet it is not
[14:18:14] ok that makes more sense for what i was reading
[14:18:15] okay
[14:19:02] I haven't read the docs btw at all, I am just assuming stuff here
[14:19:37] yeah i'll look into it, you verifying that {{ template "base.name.chart" . }} should be flink-app helps
[15:57:23] inflatador: FYI, in case you haven't seen: https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/operations/upgrade/
[16:00:15] ottomata: good to know... will we need to do this to the existing operator in DSE? I assumed we weren't that far yet
[16:02:29] the operator is installed in dse, so we will need to do that part
[16:02:32] but there is no risk!
[16:02:42] i've _almost_ got an actual flink app deployed there
[16:03:03] it'd be nice to practice the upgrading with existing FlinkDeployments
[16:03:08] but we don't really need to do that
[16:03:14] the example app doesn't have any state
[16:03:45] for the operator, i think for us the process will use helmfile instead of helm, but the steps are the same.
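The two label sets pasted at 14:15 and 14:17 fit a simple hypothesis: the operator starts from the FlinkDeployment pod template labels and overlays its own labels on top, using the FlinkDeployment name `flink-app-main` (from `flink-app-RELEASE-NAME` with release `main`) as `app`. A minimal Python sketch of that hypothesis; the overlay order and the `overlay_labels` helper are assumptions for illustration, not confirmed operator behavior.

```python
# Labels from the FlinkDeployment pod template (14:17 paste). kubectl describe
# printed "App"/"Release"; keys are lowercased here, as the chart writes `app:`.
user_labels = {"app": "flink-app", "release": "main", "routed_via": "main"}

# Labels seen on the generated Deployment that the pod template does not set
# (14:15 paste), plus the conflicting `app` value.
operator_labels = {
    "app": "flink-app-main",
    "component": "jobmanager",
    "type": "flink-native-kubernetes",
}


def overlay_labels(user: dict[str, str], operator: dict[str, str]) -> dict[str, str]:
    """Hypothetical merge: operator-supplied keys win on conflict."""
    merged = dict(user)  # copy so the user's labels are not mutated
    merged.update(operator)
    return merged


pod_labels = overlay_labels(user_labels, operator_labels)
for key in sorted(pod_labels):
    print(f"{key}={pod_labels[key]}")
```

This prints exactly the Pod Template labels shown at 14:15:52, which is consistent with the suspicion that the operator, not the chart, rewrites `app`. It does not prove it; the operator documentation or source would settle the question.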
[16:04:10] oh actually, we aren't upgrading to v1beta1, so the existing FlinkDeployments thing is irrelevant
[16:05:38] Definitely see the benefit of practicing upgrades
[16:42:31] serviceops: Upgrade mc* and mc-gp* hosts to Debian Bullseye - https://phabricator.wikimedia.org/T293216 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1001 for host mc2038.codfw.wmnet with OS bullseye
[17:17:17] serviceops: Upgrade mc* and mc-gp* hosts to Debian Bullseye - https://phabricator.wikimedia.org/T293216 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1001 for host mc2038.codfw.wmnet with OS bullseye completed: - mc2038 (**PASS**) - Downtimed on Icinga/Alertmanager - Disa...
[18:05:58] serviceops, Data-Engineering, Discovery-Search (Current work), Event-Platform Value Stream (Sprint 07), Patch-For-Review: Flink on Kubernetes Helm charts - https://phabricator.wikimedia.org/T324576 (Ottomata) Something is messing with app label I'm setting, and we suspect it is the flink-kube...
[21:11:24] serviceops: Upgrade mc* and mc-gp* hosts to Debian Bullseye - https://phabricator.wikimedia.org/T293216 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1001 for host mc2039.codfw.wmnet with OS bullseye
[21:46:57] serviceops: Upgrade mc* and mc-gp* hosts to Debian Bullseye - https://phabricator.wikimedia.org/T293216 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1001 for host mc2039.codfw.wmnet with OS bullseye completed: - mc2039 (**PASS**) - Downtimed on Icinga/Alertmanager - Disa...