[08:29:48] elukey: \o/
[08:38:47] :)
[08:38:57] I'll do the eqiad ones (4 for the moment) this morning
[09:27:21] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Implement POC for istio ingress - https://phabricator.wikimedia.org/T290966 (JMeybohm)
[12:58:38] Hello. I wonder if I could get some guidance please. I'm attempting to write a helm chart for running DataHub. The pattern I'm attempting to use consists of one umbrella chart and four subcharts: https://phabricator.wikimedia.org/T301454#7727873
[13:00:06] I'm currently stuck trying to build the egress policy for each of the subcharts using the symlinked default-network-policy-conf.yaml file.
[13:01:35] There's a WIP patch here: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/764375
[13:01:35] ...and there's a design document for the DataHub deployment here, in case that's of interest: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/764375
[13:02:57] I bet it has something to do with having subcharts and things assuming specific paths... I can take a look
[13:03:19] jayme: Many thanks.
[13:04:05] all ml-serve nodes on bullseye :)
[13:25:02] nice :-)
[13:59:48] serviceops, Performance-Team (Radar): Migrate WMF production from PHP 7.2 to PHP 7.4 - https://phabricator.wikimedia.org/T271736 (MoritzMuehlenhoff) Before the address this for production please give me a few days headsup and I'll respin the PHP 7.4 packages for the latest versions. There is currently on...
[14:00:54] do we have a dashboard that shows how every k8s worker node is doing cpu/memory/ephemeral-storage wise?
Something like kubectl describe nodes
[14:05:10] serviceops, Performance-Team (Radar): Migrate WMF production from PHP 7.2 to PHP 7.4 - https://phabricator.wikimedia.org/T271736 (Reedy)
[14:05:51] serviceops, Performance-Team (Radar): Migrate WMF production from PHP 7.2 to PHP 7.4 - https://phabricator.wikimedia.org/T271736 (Reedy)
[14:09:35] serviceops, DC-Ops, SRE: setup/install mc20[38-55] - https://phabricator.wikimedia.org/T302218 (Papaul) @akosiaris hello any reason why this task is assigned to me ?
[14:18:44] serviceops, DC-Ops, SRE: setup/install mc20[38-55] - https://phabricator.wikimedia.org/T302218 (akosiaris) a:Papaul→None >>! In T302218#7728106, @Papaul wrote: > @akosiaris hello any reason why this task is assigned to me ? I created it as a subtask of T294962 and forgot to remove the assign...
[14:19:16] serviceops: Productionise mc20[38-55] - https://phabricator.wikimedia.org/T293012 (akosiaris)
[14:19:34] serviceops, DC-Ops, SRE: setup/install mc20[38-55] - https://phabricator.wikimedia.org/T302218 (akosiaris)
[14:40:10] elukey: host overview or cluster overview dashboards work for k8s clusters as well
[14:43:38] jayme: yeah but I meant something like
[14:43:59] Resource Requests Limits
[14:43:59] -------- -------- ------
[14:44:05] memory 13244Mi (10%) 17524Mi (13%)
[14:44:07] etc..
[14:44:20] more k8s related
[14:47:01] ah
[14:47:07] no :-)
[14:48:43] elukey: https://phabricator.wikimedia.org/T264625
[14:49:01] that has all the things
[14:49:48] ah nice
[14:55:45] it's a nice k8s onboarding project actually. If you happen to have someone to onboard :-)
[15:01:49] <_joe_> elukey: oh
[15:02:12] <_joe_> elukey: there is node_top.py in my home on cumin2002 :D
[15:02:23] <_joe_> it only works with codfw-staging at the moment
[15:03:48] :)
[15:45:38] serviceops, Performance-Team (Radar): Migrate WMF production from PHP 7.2 to PHP 7.4 - https://phabricator.wikimedia.org/T271736 (JMeybohm) >>!
In T271736#7728080, @MoritzMuehlenhoff wrote: > Before the address this for production please give me a few days headsup and I'll respin the PHP 7.4 packages for...
[15:51:56] o/ I have "helmfile apply" stuck looping on "1 out of 2 expected pods are ready" and then timing out, there seems to be a pod it fails to kill (always the same one)
[15:54:01] dcausse: by timing out you mean it rolls back?
[15:54:06] yes
[15:54:58] flink-session-cluster-main-taskmanager-644fd8f6dd-sq5jb has an age of 27h so this seems weird
[15:55:11] dcausse: is it in staging or prod?
[15:55:13] which cluster?
[15:55:17] staging sorry
[15:55:23] ah this is interesting
[15:55:36] we moved the cluster to bullseye + overlay recently
[15:56:07] it's also the first time I deploy https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/751070
[15:56:21] I mean I saw this change in the helm diff
[15:58:48] Error creating: pods "flink-session-cluster-main-taskmanager-77767d6549-vdtf6" is forbidden: exceeded quota: quota-compute-resources, requested: limits.memory=3024288k, used: limits.memory=9597152k, limited: limits.memory=10Gi
[15:59:35] this is 14m ago
[16:00:23] oh so not enough res on the cluster
[16:00:27] oopsie :)
[16:00:40] I should tune staging values down a bit
[16:00:56] yeah, that would actually be nice
[16:01:05] ok doing that
[16:06:33] serviceops, Performance-Team (Radar): Migrate WMF production from PHP 7.2 to PHP 7.4 - https://phabricator.wikimedia.org/T271736 (MoritzMuehlenhoff) >>! In T271736#7728632, @JMeybohm wrote: >>>! In T271736#7728080, @MoritzMuehlenhoff wrote: >> Before the address this for production please give me a few d...
[17:00:37] kk
[20:29:44] not only can you have git repos on gerrit, phabricator, github or gitlab.
In addition phabricator also supports git, svn AND mercurial repos :o
[21:41:30] serviceops, MW-on-K8s, Performance-Team, SRE, WikimediaDebug: Ensure WikimediaDebug "log" and "profile" features work with k8s-mwdebug - https://phabricator.wikimedia.org/T288164 (dpifke) Open→Resolved
[21:41:33] serviceops, MW-on-K8s, SRE, Patch-For-Review, User-jijiki: Create a mwdebug deployment for mediawiki on kubernetes - https://phabricator.wikimedia.org/T283056 (dpifke)
[22:04:54] serviceops, MW-on-K8s, SRE, Patch-For-Review, and 2 others: The restricted/mediawiki-webserver image should include skins and resources - https://phabricator.wikimedia.org/T285232 (dduvall) a:dduvall→None
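
Note on the quota rejection logged at 15:58:48: the numbers in that error can be checked by hand. Kubernetes quantities use decimal suffixes (k = 1000) and binary suffixes (Gi = 2^30), so "used" plus "requested" has to be converted to bytes before comparing against the 10Gi limit. The following is a minimal sketch of that arithmetic, assuming integer quantities only; the helper names parse_quantity and fits_quota are hypothetical and not part of kubectl or any Kubernetes client library.

```python
# Sketch only: these helpers are hypothetical, written for illustration.
# They handle plain integer quantities with the common suffixes below.
SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,  # binary suffixes
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,     # decimal suffixes
}


def parse_quantity(text: str) -> int:
    """Convert a quantity string such as '10Gi' or '3024288k' to bytes."""
    # Check two-letter suffixes first so 'Mi' is not mistaken for 'M'.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return int(text[: -len(suffix)]) * SUFFIXES[suffix]
    return int(text)


def fits_quota(used: str, requested: str, limit: str) -> bool:
    """Return True if used + requested stays within the quota limit."""
    return parse_quantity(used) + parse_quantity(requested) <= parse_quantity(limit)


# Values copied from the error message in the log.
used, requested, limit = "9597152k", "3024288k", "10Gi"
headroom = parse_quantity(limit) - parse_quantity(used)

print(fits_quota(used, requested, limit))  # False
print(headroom)                            # 1140266240
```

With those values the namespace had roughly 1.14 GB of limits.memory left, well under the roughly 3.02 GB the new taskmanager pod asked for, which is consistent with the rollout stalling until the staging values were tuned down.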