[03:55:55] Machine-Learning-Team, artificial-intelligence, Edit-Review-Improvements-RC-Page, Growth community maintenance, and 3 others: Enable ORES in RecentChanges for Hindi Wikipedia - https://phabricator.wikimedia.org/T303293 (Halfak) 1. FWIW, I'm only aware of one community who might have not wanted OR...
[05:27:04] Machine-Learning-Team, artificial-intelligence, Edit-Review-Improvements-RC-Page, Growth community maintenance, and 3 others: Enable ORES in RecentChanges for Hindi Wikipedia - https://phabricator.wikimedia.org/T303293 (1997kB) There was a consensus on hiwiki village pump when initial talks were...
[06:01:54] Lift-Wing, Machine-Learning-Team (Active Tasks): Support (or not) the ORES augmented feature output in liftwing - https://phabricator.wikimedia.org/T301766 (elukey) Current status: * we deployed the new editquality image for arwiki, and tested the feature. Next steps: * apply the same change to the ot...
[06:23:26] good morning
[06:26:05] Lift-Wing, Machine-Learning-Team (Active Tasks), Patch-For-Review: Return meaningful HTTP responses in Lift Wing's revscoring backends - https://phabricator.wikimedia.org/T300270 (elukey) @AikoChou the last version of editquality should be the one with your augmented output (only applied to arwiki),...
[08:02:35] Good morning :)
[08:05:28] o/
[09:03:43] Lift-Wing, Machine-Learning-Team (Active Tasks), Patch-For-Review: Return meaningful HTTP responses in Lift Wing's revscoring backends - https://phabricator.wikimedia.org/T300270 (achou) @elukey Yep, the last version of editquality is the one with augmented output. But I found the current image versi...
[09:33:47] ahhh now I recall why the current istio build issue was familiar https://github.com/istio/istio/issues/32978
[09:38:33] Is their recommended solution to change the requirement to 195-patch?
[09:40:27] Or maybe use 196, but who knows what else that would break :-S
[09:45:09] Hmmm. 1.9.6 is a sec release and has few other changes. Might be worth considering
[09:45:15] elukey: https://istio.io/latest/news/releases/1.9.x/announcing-1.9.6/
[09:45:24] That would dodge the 195-patch mess
[09:46:26] I can repro the 195 failure here, but 196 builds fine
[09:46:30] klausman: I think that we just need https://github.com/istio/istio/commit/98a6168b48fd02c2e3d7cc819067d50044004ba5
[09:46:44] that was included in the 1.9.5-patch branch (sorry in a meeting)
[09:46:49] ack.
[09:47:10] We can chat on this later
[09:52:16] back!
[09:52:42] so the main trouble is that in the production-images repository we have the istio-build docker image that currently pulls the 1.9.5 branch
[09:53:06] that is not here anymore :D but we don't need to rebuild the image so not a big deal (binaries on it already compiled and available)
[09:53:33] I'd like to keep the same code to build for the moment on the istio repo so we can have them in sync
[09:53:38] (I know it is lovely to maintain)
[09:56:35] Ok then :)
[09:56:52] I somehow thought you were making a (new) deb, for unclear reasons
[09:57:36] I can explain the various bits
[09:57:49] so we currently have two istio sources:
[09:58:14] (the unclear reasons pertain to my thinking, not your work :D)
[09:59:01] 1) the istioctl deb package - this one is special since we need to keep multiple versions of the istioctl tool on deploy1002, so it basically downloads supported versions from upstream (official amd64 releases) and packs them in a simple deb. It is needed to configure the basic things on k8s land (like when we bootstrap istio etc..)
[09:59:57] 2) the production-images docker images, that are: build (containing all the binaries compiled from the 1.9.5-patch branch), pilot, proxyv2 and operator (that we don't use)
[10:00:17] the build image is used to copy the various binaries to the pilot/proxyv2 ones
[10:00:24] Ah, I am seeing a better picture now
[10:00:42] the missing bit is $something that ships the istio-cni and istio-iptables binaries to the k8s nodes
[10:01:09] so the idea is to have a generic "istio" deb repo that creates several packages, the first of them is the istio-cni one
[10:01:10] Would those need to be in the images or on the host OS?
[10:01:17] on the host os
[10:01:25] *nod*
[10:01:43] since those binaries are called by the kubelet when a pod needs to be created to get the iptables horror
[10:01:55] (we also have calico cni binaries for ipam etc..)
[10:02:44] the great thing would be to deprecate the istioctl deb and fold it in the new istio repo, but I didn't find a way to manage multiple versions of the istioctl binaries yet (since we target a single build version in debian rules etc..)
[10:02:51] so for the moment we can keep them separate
[10:02:59] and see what is best for the future
[10:03:12] this is my brain dump, lemme know if anything needs more details :)
[10:03:58] Do we have external constraints on Istio versions?
[10:06:50] the istio makefile pulls binaries from google, like the envoy ones
[10:07:32] we could in theory inject ours (since serviceops builds envoy) but I am not sure if it is really something that we want (surely an unsupported way)
[10:08:01] Ack
[10:08:03] Janis opened a task to upgrade to a more recent istio version, can't find it yet
[10:08:26] but one nice thing will be to see what an upgrade looks like with istioctl + the cni binaries
[10:08:41] "Nice", he says
[10:09:59] in theory the path should be
[10:10:24] 1) roll out the new deb for istio-cni (istio ensures compatibility between some ranges of istio versions IIUC)
[10:10:29] 2) use istioctl to upgrade
[10:10:38] but of course we'll do it with traffic depooled from a DC :D
[10:11:22] That seems simple enough
[10:56:14] istio-cni finally built :)
[11:02:22] \o/
[11:38:35] * elukey lunch
[13:03:13] elukey: I presume I can just submit https://gerrit.wikimedia.org/r/c/labs/private/+/772430 ? (private repo changes for the staging k8s)
[13:34:24] Machine-Learning-Team, artificial-intelligence, Edit-Review-Improvements-RC-Page, Growth community maintenance, and 3 others: Enable ORES in RecentChanges for Hindi Wikipedia - https://phabricator.wikimedia.org/T303293 (calbon) Awesome, thanks for this. I'll loop in the growth team.
[13:58:38] klausman: yes yes!
[14:12:53] thx :)
[14:26:00] there you go, created https://gerrit.wikimedia.org/r/c/operations/debs/istio/+/771670
[14:26:11] now I am going to file other changes for puppet and deployment-charts
[15:05:46] aiko: for the itemsquality code review feel free to open a new pull request with all your changes etc.., I'll close mine in case
[15:05:54] so you can keep testing and working on it
[15:14:54] elukey: when you have a moment, I am puzzled by a Puppet failure
[15:15:24] Oh feck, Gerrit is down
[15:15:47] sure
[15:15:54] https://puppet-compiler.wmflabs.org/pcc-worker1001/34477/ml-staging-ctrl2002.codfw.wmnet/change.ml-staging-ctrl2002.codfw.wmnet.err
[15:16:46] This indicates to me that hieradata/common.yaml is missing a "ml-staging" stanza around line 76. But it's there. Now with Gerrit down, I of course don't know if that isn't what's causing issues.
[15:17:04] Oh. it's back :)
[15:17:51] https://gerrit.wikimedia.org/r/c/operations/puppet/+/772417/12/hieradata/common.yaml Here's my common.yaml
[15:20:51] the issue seems to be at line 38 of profile::kubernetes::master, that is
[15:20:52] $_tokens = $all_infrastructure_users[$kubernetes_cluster_group].filter |$_,$data| {
[15:21:12] yes, it indicates that the lookup fails (it gets undefined)
[15:21:23] $kubernetes_cluster_group for that machine is ml-staging
[15:22:16] yeah but it also tries to use profile::kubernetes::infrastructure_users
[15:23:04] it is defined in the private repo
[15:23:10] and it is a per-cluster hash
[15:23:12] That's in the private change I submitted (and Amir puppet-merged)
[15:23:38] Mh. Maybe it isn't.
[15:23:55] Yeah, that's missing
[15:24:06] Welp, that sounds fixable :)
[15:24:21] ack :)
[15:26:59] Do changes like that need review?
[15:29:20] for labs private you can generally merge and maybe put others in CC as FYI
[15:29:28] Roger
[15:30:37] elukey: I'll do the review of the istio-cni change once this k8s works (or I grow tired of it :))
[15:32:49] yes yes not really urgent, I am doing the companion puppet etc.. changes
[15:32:53] that will take me a bit
[15:43:24] I'm just looking for excuses not to stare at puppet stuff :)
[16:14:19] And +1'd!
[16:14:46] Not 100% sure about the duplicate license lines in debian/copyright, but I'm sure you know what you're doing :)
[16:15:25] Unrelatedly, the k8s control plane change now is accepted by puppet/pcc, and I'll start picking it apart for gradual updates of prod.
[16:19:22] I have shamelessly copied from the calico-cni package :D
[16:19:28] *copied it
[16:29:51] A time-honored software development and system configuration technique
[17:43:43] Hello all. Do you by any chance have any jobs that make use of the hadoop nodes with GPUs? That's an-worker[1096-1101].
[17:44:07] Hey Ben, feel free to proceed!
[17:44:36] I'm asking because I need to schedule a reboot of those six servers for a kernel update and I'm trying to minimize disruption to everyone. Many thanks.
[17:45:22] Great. Thanks elukey. Will probably proceed from about 09:30 UTC tomorrow, if that's still OK with you.
[17:49:11] btullis: all good we don't use the nodes, and research too IIRC since we only experimented with distributed tensorflow a while ago
[17:49:21] Cc: miriam: --^
[17:49:25] (just to double check)
[17:49:43] miriam: Research doesn't use the hadoop gpus atm right?
[17:55:48] yes not at the moment elukey, I think you can go ahead btullis!
[17:55:54] thanks for checkin :)
[19:17:58] * elukey afk!
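
Note on the Puppet failure discussed at 15:20-15:24: the error came from looking up a per-cluster hash with a cluster group ("ml-staging") that had no entry yet, and then calling .filter on the undefined result. A minimal Python sketch of that failure mode follows. It is only a model, not the code in profile::kubernetes::master: the function name tokens_for, the keep predicate, and the sample data are all hypothetical, and the body of the real filter is not shown in the log.

```python
from typing import Callable, Dict, Optional

# Hypothetical stand-in for the per-cluster hash defined in the private repo.
# The real cluster names, user names and fields are not shown in the log.
ALL_INFRASTRUCTURE_USERS: Dict[str, Dict[str, dict]] = {
    "main": {"example-user": {"token": "dummy"}},
}


def tokens_for(
    cluster_group: str,
    users_by_cluster: Dict[str, Dict[str, dict]],
    keep: Callable[[str, dict], bool] = lambda name, data: True,
) -> Dict[str, dict]:
    """Look up the users of one cluster group, then filter them.

    `keep` is a placeholder predicate; what the real Puppet filter
    selects is an assumption left open here.
    """
    users: Optional[Dict[str, dict]] = users_by_cluster.get(cluster_group)
    if users is None:
        # Equivalent of the Puppet lookup returning undef: fail with a
        # message that names the missing key instead of failing inside
        # the filter step.
        raise KeyError(
            f"no infrastructure_users entry for cluster group {cluster_group!r}"
        )
    return {name: data for name, data in users.items() if keep(name, data)}
```

Calling tokens_for("ml-staging", ALL_INFRASTRUCTURE_USERS) raises KeyError until an "ml-staging" entry is added to the hash, which mirrors the fix agreed on in the chat (adding the missing per-cluster data to the private repo).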